Consider a single array from Agilent Cytogenomics. I preprocessed it with two different methods:
1. rCGH
2. limma, following the instructions in the first part of the cghMCR vignette
The two procedures produce vastly different numbers of segments after normalization and preprocessing: method 1 yields 89 segments, while method 2 yields approximately 300.
This becomes a problem when you need to check specific regions or genes. Even with GC correction, and after making sure the right peak is used for EM normalization, the log2 ratios from method 1 are higher than those from method 2 by at least 1 on the log2 scale (that is, at least twofold). This can lead to spurious copy-number estimates: validation showed a much lower copy number (between 3 and 4, closer to the estimate from method 2), whereas method 1 gave almost 10.
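To see why a gap of 1 on the log2 scale matters so much, here is a minimal sketch of the naive conversion from a segment log2 ratio to an absolute copy number. It assumes a pure (100% tumour) sample with a diploid baseline, so CN = 2 · 2^log2ratio; purity and ploidy corrections are deliberately ignored, and the function names are hypothetical.

```python
import math


def log2ratio_to_copy_number(log2ratio: float, ploidy: int = 2) -> float:
    """Naive copy number from a log2 ratio (assumes 100% purity, fixed ploidy)."""
    return ploidy * 2 ** log2ratio


def copy_number_to_log2ratio(cn: float, ploidy: int = 2) -> float:
    """Inverse of the conversion above, under the same assumptions."""
    return math.log2(cn / ploidy)


# A segment at log2 ratio 0.8 maps to roughly 3.5 copies...
low = log2ratio_to_copy_number(0.8)
# ...and shifting the same segment up by 1 on the log2 scale doubles the estimate.
high = log2ratio_to_copy_number(0.8 + 1.0)
print(f"{low:.2f} {high:.2f} ratio={high / low:.1f}")
```

Under these assumptions, going from about 3.5 copies to about 10 corresponds to a shift of log2(10 / 3.5), roughly 1.5 on the log2 scale, so an offset of "at least 1" between the two pipelines is enough to explain a discrepancy of this size.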
The main reason is that method 1 produces far fewer segments than method 2, and that skews the calculations. Setting the distance for joining segments to 0 (from the 10 kbp default) does not improve the situation.
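As an illustration of how segment count can shift a gene's value, here is a minimal sketch in which each gene takes the mean log2 ratio of the segment covering it, and neighbouring segments on the same chromosome are joined when the gap between them is at most `max_gap` bp, with the merged mean weighted by probe count. This merging rule and the names `Segment`, `merge_close_segments` and `gene_log2ratio` are hypothetical; it is not a description of rCGH's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str
    start: int     # bp, inclusive
    end: int       # bp, inclusive
    n_probes: int
    mean: float    # mean log2 ratio of the probes in the segment


def merge_close_segments(segments, max_gap=10_000):
    """Hypothetical rule: join neighbours on the same chromosome if gap <= max_gap."""
    merged = []
    for seg in sorted(segments, key=lambda s: (s.chrom, s.start)):
        prev = merged[-1] if merged else None
        if prev is not None and prev.chrom == seg.chrom and seg.start - prev.end <= max_gap:
            n = prev.n_probes + seg.n_probes
            # Probe-weighted mean of the two segments being joined.
            mean = (prev.mean * prev.n_probes + seg.mean * seg.n_probes) / n
            merged[-1] = Segment(prev.chrom, prev.start, seg.end, n, mean)
        else:
            merged.append(seg)
    return merged


def gene_log2ratio(segments, chrom, pos):
    """Return the mean log2 ratio of the segment covering chrom:pos, or None."""
    for seg in segments:
        if seg.chrom == chrom and seg.start <= pos <= seg.end:
            return seg.mean
    return None


segments = [
    Segment("chr8", 100_000, 200_000, n_probes=50, mean=0.5),
    Segment("chr8", 205_000, 215_000, n_probes=10, mean=2.0),
]
for gap in (0, 10_000):
    merged = merge_close_segments(segments, max_gap=gap)
    print(gap, len(merged), gene_log2ratio(merged, "chr8", 150_000))
```

With `max_gap=0` the two segments stay separate and a gene at 150 kbp reads 0.5; with `max_gap=10_000` they are joined and the same gene picks up part of the neighbouring gain (0.75). This is one way in which fewer, wider segments can push a gene's log2 ratio upward.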