Does not work with UNIQUEKEY on longitudinal data #22
Description
For longitudinal data, the dbGaP Submission Guide indicates the following:
A second subject phenotypes DS may include all the variables that change per event or time for a person. For example, when a dataset has a single SUBJECT_ID listed multiple times due to measures collected at different events, this would be considered a longitudinal dataset. To make a row unique, unique (composite) keys should have scientific significance and aid in searching for covariate data. Unique keys should not be marked for every single variable in the dataset. Going back to the example, in the corresponding DD, mark an "X" under the UNIQUEKEY column for the variables SUBJECT_ID + EVENT. This means that for each subject at some particular event, there are some set of relevant data collected.
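Here is a minimal sketch of how a validator might read the composite key from a data dictionary. It assumes the DD is tab-delimited, that variable names live in a `VARNAME` column, and that key membership is an "X" in `UNIQUEKEY`. The function name and the DD fragment are hypothetical and do not come from gaptools.

```python
import csv
import io
from typing import List


def unique_key_columns(dd_text: str) -> List[str]:
    """Return the VARNAMEs marked with an X in the UNIQUEKEY column, in DD order.

    Assumes a tab-delimited DD with VARNAME and UNIQUEKEY header columns.
    """
    reader = csv.DictReader(io.StringIO(dd_text), delimiter="\t")
    keys = []
    for row in reader:
        # Missing or blank UNIQUEKEY cells mean the variable is not part of the key.
        mark = (row.get("UNIQUEKEY") or "").strip().upper()
        if mark == "X":
            keys.append(row["VARNAME"])
    return keys


# Hypothetical DD fragment for a longitudinal subject phenotype DS.
DD_EXAMPLE = (
    "VARNAME\tVARDESC\tTYPE\tUNIQUEKEY\n"
    "SUBJECT_ID\tSubject ID\tstring\tX\n"
    "EVENT\tVisit number\tinteger\tX\n"
    "SBP\tSystolic blood pressure\tdecimal\t\n"
)

print(unique_key_columns(DD_EXAMPLE))  # ['SUBJECT_ID', 'EVENT']
```

Under the guide's instructions, the uniqueness check should then run on the whole tuple of returned columns, not on SUBJECT_ID alone.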
However, when gaptools is run on a longitudinal file whose DD has two X's in the UNIQUEKEY column, as directed, it erroneously reports the following error:
ERROR: E0102_Duplicated_Id (n=71)
DESCRIPTION: IDs are duplicated. Each person should only have a single subject ID; each sample ID should be represented in a single row. Remove repeating IDs.
Example(s):
- SUBJECT_ID | Rows
- SUBJ060 | 2,8,120
- SUBJ054 | 3,7
- SUBJ102 | 4,6,98
- SUBJ002 | 5,9
- SUBJ052 | 10,14,138
- SUBJ088 | 11,17,141
- SUBJ092 | 12,15,99
- SUBJ074 | 13,16,205
- SUBJ037 | 18,20
- SUBJ038 | 19,21
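The report above is consistent with a duplicate check that keys on SUBJECT_ID alone and ignores EVENT. The following is a hypothetical reconstruction, not gaptools' actual code. It compares a single-column check with one on the composite key. The toy rows are invented for illustration, and row numbers are assumed to be 1-based data rows excluding the header.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def duplicated_keys(
    rows: Sequence[Row], key_columns: Sequence[str]
) -> Dict[Tuple[str, ...], List[int]]:
    """Map each key value that occurs more than once to its 1-based row numbers."""
    positions: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for row_number, row in enumerate(rows, start=1):
        positions[tuple(row[col] for col in key_columns)].append(row_number)
    return {key: nums for key, nums in positions.items() if len(nums) > 1}


# Hypothetical longitudinal rows: each subject appears once per EVENT.
ROWS = [
    {"SUBJECT_ID": "SUBJ054", "EVENT": "1", "SBP": "120"},
    {"SUBJECT_ID": "SUBJ002", "EVENT": "1", "SBP": "131"},
    {"SUBJECT_ID": "SUBJ054", "EVENT": "2", "SBP": "118"},
    {"SUBJECT_ID": "SUBJ002", "EVENT": "2", "SBP": "127"},
]

# Keying on SUBJECT_ID alone yields an E0102-style "SUBJECT_ID | Rows" listing.
for key, nums in duplicated_keys(ROWS, ["SUBJECT_ID"]).items():
    print(f"{key[0]} | {','.join(map(str, nums))}")
# SUBJ054 | 1,3
# SUBJ002 | 2,4

# Keying on the composite UNIQUEKEY (SUBJECT_ID + EVENT) finds no duplicates.
print(duplicated_keys(ROWS, ["SUBJECT_ID", "EVENT"]))  # {}
```

If the tool built its key from every UNIQUEKEY-marked column, this longitudinal layout would pass.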
Thank you,
Dan Weeks