Segmentation can be a little too aggressive #5

@lbeltrame

Consider a single array from Agilent Cytogenomics. I preprocessed it with two different methods:

  1. rCGH
  2. With limma, following the instructions in the first part of the cghMCR vignette

The two procedures produce vastly different numbers of segments after normalization and preprocessing: 1. yields 89 segments, while 2. yields approximately 300.

This becomes a problem when you have specific regions or genes to check. Even with GC correction, and after making sure the right peak is used for EM normalization, the log2 ratios in 1. are higher than those in 2. by at least 1 unit on the log2 scale, i.e. at least a twofold difference in the underlying ratio. This can lead to bogus copy number estimates: validation showed a much lower copy number (between 3 and 4), close to the estimate from 2., while 1. gave almost 10.
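To make the size of that shift concrete, here is a minimal sketch of the usual naive conversion between log2 ratio and copy number. It assumes a diploid baseline and a pure sample with no purity or ploidy correction; the function names are hypothetical and are not part of rCGH, limma or cghMCR.

```python
import math

def copy_number_from_log2ratio(log2ratio, ploidy=2):
    # Naive estimate: baseline ploidy scaled by the linear ratio.
    # Assumes a pure, diploid sample (an assumption, not something rCGH does for you).
    return ploidy * 2 ** log2ratio

def log2ratio_from_copy_number(copy_number, ploidy=2):
    # Inverse of the conversion above.
    return math.log2(copy_number / ploidy)

# A shift of +1 in log2 ratio doubles the estimated copy number.
for log2ratio in (0.5, 1.0, 1.5, 2.0):
    estimate = copy_number_from_log2ratio(log2ratio)
    print(f"log2ratio={log2ratio:.1f} -> copy number ~ {estimate:.2f}")
```

Under these assumptions, a validated copy number of 3 to 4 corresponds to a log2 ratio of roughly 0.58 to 1, while a copy number near 10 needs a log2 ratio above 2, so an offset of about 1 is enough to explain the gap.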

The main reason is that 1. produces far fewer segments than 2., and that skews the calculations. Setting the distance for joining segments to 0 (down from the 10 kbp default) doesn't improve the situation.
