Hello,
I am using FarmCPUpp with my own genotype (GD), genotype map (GM), and phenotype (Y) files.
When I run farmcpu(), I notice that some SNPs in the GWAS output have NA in every result column.
From my observation, these NA results happen whenever the corresponding SNP in the GD file contains at least one missing value (NA).
My questions
- Does FarmCPUpp currently support handling SNPs with missing values?
- Should missing values in the numeric GD file be represented as NA, -9, or something else?
- If missing data are not supported, what is the recommended pre-processing approach (e.g. imputation, filtering)?
Example of my workflow
library(bigmemory)
library(FarmCPUpp)

# Phenotypes and SNP map
myY <- read.table("taxa.txt", header = TRUE, stringsAsFactors = FALSE)
myGM <- read.table("mdp_SNP_information.txt", header = TRUE, stringsAsFactors = FALSE)

# Numeric genotypes as a file-backed big.matrix
myGD <- read.big.matrix("mdp_numeric.txt",
                        type = "double", sep = "\t", header = TRUE,
                        col.names = myGM$SNP, ignore.row.names = FALSE,
                        has.row.names = TRUE,
                        backingfile = "mdp_numeric.bin",
                        descriptorfile = "mdp_numeric.desc")

myResults <- farmcpu(Y = myY, GD = myGD, GM = myGM)
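A minimal sketch of how the NA pattern described above could be checked, using toy data rather than my real files. The matrix `gd` stands in for the GD matrix, and `gwas` is a hypothetical results table with `SNP` and `p.value` columns; the actual structure of the farmcpu() output is an assumption here and may differ.

```r
# Toy genotype matrix standing in for the GD data (rows = taxa, columns = SNPs).
# matrix() fills column by column, so snp2 is the only SNP with a missing call.
gd <- matrix(c(0, 1, 2, 0,
               2, NA, 0, 1,
               1, 1, 1, 2),
             nrow = 4,
             dimnames = list(paste0("taxon", 1:4), c("snp1", "snp2", "snp3")))

# Count missing calls per SNP and list the SNPs with at least one NA.
na_per_snp <- colSums(is.na(gd))
snps_with_na <- names(na_per_snp)[na_per_snp > 0]

# Hypothetical results table (column names are assumptions, not the package's
# documented output format).
gwas <- data.frame(SNP = c("snp1", "snp2", "snp3"),
                   p.value = c(0.03, NA, 0.40),
                   stringsAsFactors = FALSE)
snps_na_result <- gwas$SNP[is.na(gwas$p.value)]

# TRUE if the SNPs with NA results are exactly the SNPs with missing genotypes.
same_set <- identical(sort(snps_na_result), sort(snps_with_na))
print(same_set)
```

With a file-backed big.matrix, the same check would need the genotypes extracted into an ordinary matrix first (for example with `myGD[, ]`), which may not be practical for large data sets.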
Could you please clarify:
- What is the correct way to represent missing genotypes in the numeric GD file?
- If missing values are not supported, is imputation required before running FarmCPUpp?
Thank you for your time and for maintaining this package!