
Does not work with UNIQUEKEY on longitudinal data #22

@DanielEWeeks


For longitudinal data, the dbGaP Submission Guide indicates the following:

A second subject phenotypes DS may include all the variables that change per event or time for a person. For example, when a dataset has a single SUBJECT_ID listed multiple times due to measures collected at different events, this would be considered a longitudinal dataset. To make a row unique, unique (composite) keys should have scientific significance and aid in searching for covariate data. Unique keys should not be marked for every single variable in the dataset. Going back to the example, in the corresponding DD, mark an "X" under the UNIQUEKEY column for the variables SUBJECT_ID + EVENT. This means that for each subject at some particular event, there are some set of relevant data collected.
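The composite-key rule above can be illustrated with a small Python sketch. It is not gaptools code. The function `find_duplicate_keys`, the example rows and event names, and the 1-based row numbering are all hypothetical. It shows that a subject repeated across events is only a duplicate when the check uses SUBJECT_ID alone. Under the SUBJECT_ID + EVENT key, that subject is a duplicate only if the same subject appears twice at the same event.

```python
from collections import defaultdict


def find_duplicate_keys(rows, key_columns):
    """Return {key_tuple: [row numbers]} for every key that occurs more than once.

    Row numbers are 1-based positions in `rows` (an assumption for this sketch,
    not necessarily how gaptools counts rows).
    """
    seen = defaultdict(list)
    for row_number, row in enumerate(rows, start=1):
        # Build the (possibly composite) key from the chosen columns.
        key = tuple(row[col] for col in key_columns)
        seen[key].append(row_number)
    # Keep only keys that were seen in more than one row.
    return {key: nums for key, nums in seen.items() if len(nums) > 1}


# Hypothetical longitudinal data: SUBJ060 is measured at three different events.
rows = [
    {"SUBJECT_ID": "SUBJ060", "EVENT": "baseline", "BMI": 27.1},
    {"SUBJECT_ID": "SUBJ060", "EVENT": "month6", "BMI": 26.4},
    {"SUBJECT_ID": "SUBJ060", "EVENT": "month12", "BMI": 25.9},
    {"SUBJECT_ID": "SUBJ054", "EVENT": "baseline", "BMI": 31.0},
]

print(find_duplicate_keys(rows, ["SUBJECT_ID"]))
# {('SUBJ060',): [1, 2, 3]}
print(find_duplicate_keys(rows, ["SUBJECT_ID", "EVENT"]))
# {}
```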

However, when gaptools is run on a longitudinal file whose data dictionary has an X in the UNIQUEKEY column for two variables, as the guide directs, it incorrectly reports a duplicated-ID error:

ERROR: E0102_Duplicated_Id (n=71)
DESCRIPTION: IDs are duplicated. Each person should only have a single subject ID; each sample ID should be represented in a single row. Remove repeating IDs.
Example(s):

SUBJECT_ID | Rows
-----------|---------
SUBJ060    | 2,8,120
SUBJ054    | 3,7
SUBJ102    | 4,6,98
SUBJ002    | 5,9
SUBJ052    | 10,14,138
SUBJ088    | 11,17,141
SUBJ092    | 12,15,99
SUBJ074    | 13,16,205
SUBJ037    | 18,20
SUBJ038    | 19,21

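Here is a sketch, under stated assumptions, of one way a validator could respect the data dictionary. It reads the DD, collects the variables marked with an X under UNIQUEKEY, and uses those as the key for the duplicate check instead of SUBJECT_ID alone. This is not how gaptools is implemented, and its internals are not shown here. The function name `unique_key_columns`, the tab-delimited DD layout, the case-insensitive match on "X", and the fallback to SUBJECT_ID when nothing is marked are all hypothetical.

```python
import csv
import io


def unique_key_columns(dd_text, default=("SUBJECT_ID",)):
    """Return the VARNAMEs whose UNIQUEKEY cell is marked "X" in a tab-delimited DD.

    Hypothetical helper: falls back to `default` when no variable is marked.
    """
    reader = csv.DictReader(io.StringIO(dd_text), delimiter="\t")
    marked = [
        row["VARNAME"]
        for row in reader
        # Missing or empty UNIQUEKEY cells are treated as "not part of the key".
        if (row.get("UNIQUEKEY") or "").strip().upper() == "X"
    ]
    return tuple(marked) if marked else tuple(default)


# Hypothetical DD for a longitudinal subject phenotypes file.
dd_text = (
    "VARNAME\tVARDESC\tTYPE\tUNIQUEKEY\n"
    "SUBJECT_ID\tSubject ID\tstring\tX\n"
    "EVENT\tVisit or time point\tstring\tX\n"
    "BMI\tBody mass index\tdecimal\t\n"
)
print(unique_key_columns(dd_text))
# ('SUBJECT_ID', 'EVENT')
```

With the key taken from the DD, a subject listed once per event would pass, and only a repeated SUBJECT_ID + EVENT pair would be reported.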

Thank you,
Dan Weeks
