Merged
50 changes: 3 additions & 47 deletions data/protocol/NCT01797120/README.md
@@ -1,49 +1,5 @@
# Synthetic Subject Data for NCT01797120

Synthetic subject data for the breast cancer study NCT01797120, generated from the results published on ClinicalTrials.gov (`NCT01797120-results.fhir.json`).


## Summary

| Metric | Target | Actual |
|---------------------------|---------------|---------------|
| Subjects (FULEV / FULPL) | 66 / 65 | 66 / 65 ✓ |
| PFS events FULEV | 39 | 39 ✓ |
| PFS events FULPL | 50 | 50 ✓ |
| ORR FULEV | 18.2% | 18.8% ✓ |
| ORR FULPL | 12.3% | 12.7% ✓ |
| CBR FULEV | 63.6% | 68.8% (~close)|
| CBR FULPL | 41.5% | 49.2% (~close)|


**CBR = CR + PR + SD ≥ 24 weeks**

Where:

- CR (Complete Response): all target lesions disappear
- PR (Partial Response): ≥30% decrease in the sum of target lesion diameters, relative to baseline
- SD (Stable Disease): neither CR/PR nor progression, sustained for at least 24 weeks (≈6 months)

The published targets for this trial were:

| Arm | CBR |
|-------------------------------|-------------------------------|
| Fulvestrant + Everolimus | 63.6% (42/66) |
| Fulvestrant + Placebo | 41.5% (27/65) |
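
The CBR rule can be written down as a small function. This is a minimal sketch, assuming each subject is summarized by a best overall response code (`CR`, `PR`, `SD`, `PD`, `NE`) plus, for `SD`, the duration of stable disease in weeks; the class and field names are hypothetical and are not the ones used by the generation scripts. The last two lines reproduce the published percentages from their counts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubjectResponse:
    # Hypothetical per-subject summary: best overall response and,
    # when the best response is SD, how long the stable disease lasted.
    usubjid: str
    best_response: str  # one of "CR", "PR", "SD", "PD", "NE"
    sd_weeks: Optional[float] = None


def has_clinical_benefit(s: SubjectResponse, min_sd_weeks: float = 24.0) -> bool:
    """CBR rule: CR or PR, or SD sustained for at least `min_sd_weeks`."""
    if s.best_response in ("CR", "PR"):
        return True
    if s.best_response == "SD":
        return s.sd_weeks is not None and s.sd_weeks >= min_sd_weeks
    return False


def clinical_benefit_rate(subjects: list[SubjectResponse]) -> float:
    """Percentage of subjects with clinical benefit, rounded to one decimal."""
    if not subjects:
        raise ValueError("no subjects")
    n_benefit = sum(has_clinical_benefit(s) for s in subjects)
    return round(100 * n_benefit / len(subjects), 1)


# The published targets are simple proportions of the randomized arms.
print(round(100 * 42 / 66, 1))  # 63.6 (Fulvestrant + Everolimus)
print(round(100 * 27 / 65, 1))  # 41.5 (Fulvestrant + Placebo)
```

Because the denominator is the whole arm, any extra subjects assigned a CR, PR, or long SD push the rate above target even when the other counts match.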


The small Clinical Benefit Rate (CBR) overcount comes from the 2+4 "still on treatment" subjects (who remain in the EFFFL population) being assigned responses from the CBR-sized pool. PFS events and ORR hit their targets exactly.

Output files in `test_data/`:

| File | Rows | Content |
|-------------------|-------------------|-----------------------------------|
| DM.csv | 131 | Demographics, arm, site, dates |
| EX.csv | 2,099 | Fulvestrant IM doses + everolimus/placebo daily records |
| LB.csv | 19,068 | CBC, chemistry, lipids at every visit |
| VS.csv | 6,810 | BP, HR, temp, weight at every visit |
| TU.csv | 610 | RECIST target lesion diameters at imaging visits |
| RS.csv | 328 | Overall disease response at each assessment |
| ADSL.csv | 131 | Subject-level PFS/OS times, events, response flags |
| ADTTE.csv | 254 | One PFS + one OS record per treated subject |
# Project supporting files and data for NCT01797120

This directory contains the protocol and SAP documents for the NCT01797120 study as well as the results in FHIR format.

From the FHIR results, synthetic data was generated and appears in the `test_data` directory. The functions used to generate this data can be found in the `scripts` directory.
34 changes: 34 additions & 0 deletions data/protocol/NCT01797120/test_data/FEEDBACK.md
@@ -0,0 +1,34 @@
# Recent feedback that led to regenerating these datasets

The `../scripts/cdisc_generation_functions.py` file has been changed in accordance with the feedback below. All CSV files in this directory have been generated with the latest version of `../scripts/cdisc_generation_functions.py`.

## DS Domain contains values not expected

| DSDECOD Value | Comment |
|---------------|-----------|
| RANDOMIZED | This is a value in the Protocol Milestone codelist. The `DSCAT` for the record would be "PROTOCOL MILESTONE" |
| TREATMENT | I don't understand what a record with this value is supposed to mean. This is not a value in any of the codelists for `DSDECOD`. Its dates fall between those of the "RANDOMIZED" records and the records with other `DSDECOD` values, anywhere from a few weeks to a few months after the "RANDOMIZED" record. |
| PROGRESSIVE DISEASE | This is a value in the Completion/Reason for Non-Completion codelist. The `DSCAT` for the record would be "DISPOSITION EVENT". We would also expect a `DSSCAT` value, probably "STUDY TREATMENT", since in this study subjects are (ideally) followed until death, even after they have stopped treatment. Most subjects seem to have two records for the same date with `DSDECOD = "PROGRESSIVE DISEASE"`, one with `EPOCH = "TREATMENT"` and one with `EPOCH = "FOLLOW-UP"`. This doesn't make sense: any particular date falls into only one EPOCH, and there is only one disposition event for ending treatment because of progressive disease. Strictly speaking, since there are two treatments in the study, a subject who stopped the two treatments at different times could have two disposition events for ending treatment, one with `DSSCAT = "FULVESTRANT"` and one with `DSSCAT = "EVEROLIMUS"`. |
| COMPLETED | This is a term in the Completion/Reason for Non-Completion codelist, used for records with `DSCAT = "DISPOSITION EVENT"`. For patients with records with this value, the record is always the third record, after records for "RANDOMIZED" and "TREATED". This doesn't make sense in this trial, where it's not clear what normal completion of the trial would be, since subjects are supposed to be followed until death. If a subject does die, one would expect DSDECOD to be "DEATH". |
| WITHDRAWAL BY SUBJECT | This is a term in the Completion/Reason for Non-Completion codelist, used for records with `DSCAT = "DISPOSITION EVENT"`. For patients with records with this value, the record is always the third record, after records for "RANDOMIZED" and "TREATED". |
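
The category assignments in the table can be encoded as a lookup that sets `DSCAT` from `DSDECOD` and rejects anything unexpected instead of passing it through. This is a sketch only: the grouping comes from the comments above, the function name `classify_ds` is hypothetical, and it is not the logic in `cdisc_generation_functions.py`.

```python
from typing import Optional

# Grouping taken from the review comments above. COMPLETED is a valid codelist
# term, but the review questions its use in this follow-until-death study, so it
# is deliberately left out; TREATMENT is not a DSDECOD term at all.
PROTOCOL_MILESTONES = {"RANDOMIZED"}
DISPOSITION_EVENTS = {"PROGRESSIVE DISEASE", "WITHDRAWAL BY SUBJECT", "DEATH"}


def classify_ds(dsdecod: str, dsscat: Optional[str] = None) -> dict:
    """Return DSDECOD/DSCAT/DSSCAT for one DS record, or raise on unexpected input."""
    term = dsdecod.strip().upper()
    if term in PROTOCOL_MILESTONES:
        # Milestones such as randomization carry no subcategory in this sketch.
        return {"DSDECOD": term, "DSCAT": "PROTOCOL MILESTONE", "DSSCAT": None}
    if term in DISPOSITION_EVENTS:
        if not dsscat:
            # A disposition event must say what ended, e.g. "STUDY TREATMENT",
            # or a single drug such as "FULVESTRANT" or "EVEROLIMUS".
            raise ValueError(f"DSSCAT is required for disposition event {term!r}")
        return {"DSDECOD": term, "DSCAT": "DISPOSITION EVENT", "DSSCAT": dsscat}
    raise ValueError(f"unexpected DSDECOD value: {dsdecod!r}")


print(classify_ds("progressive disease", "EVEROLIMUS"))
```

Raising on unknown terms means a value like "TREATMENT" fails loudly during generation rather than surfacing later in review.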


I compared `DS` with `DM` and saw that everyone in the trial has the same `RFSTDTC`. It's not realistic that all subjects would have started treatment on the same day. For every subject, `RFSTDTC` also matches the date of the `DS` record with `DSDECOD = "RANDOMIZED"`. That part doesn't bother me: some subjects might start treatment a day or two after randomization, but one would try to start treatment as soon as a subject is randomized.
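
Both observations can be checked mechanically. A minimal sketch, assuming `DM` and `DS` rows are available as dictionaries keyed by SDTM variable name and that the randomization date sits in `DSSTDTC`; the function name `rfstdtc_checks` is hypothetical.

```python
def rfstdtc_checks(dm_rows: list[dict], ds_rows: list[dict]) -> tuple[int, set]:
    """Return (number of distinct RFSTDTC values,
    subjects whose RFSTDTC equals their DS randomization date)."""
    # Randomization date per subject, taken from the DS milestone record.
    rand_dates = {
        r["USUBJID"]: r["DSSTDTC"] for r in ds_rows if r["DSDECOD"] == "RANDOMIZED"
    }
    # A single distinct value across many subjects is the unrealistic pattern.
    distinct_starts = {r["RFSTDTC"] for r in dm_rows if r.get("RFSTDTC")}
    same_as_rand = {
        r["USUBJID"]
        for r in dm_rows
        if r.get("RFSTDTC") and r["RFSTDTC"] == rand_dates.get(r["USUBJID"])
    }
    return len(distinct_starts), same_as_rand
```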

Reviewers would expect `DTHFL` to be included in the `DM` dataset, with `DTHDTC` populated when `DTHFL = "Y"`. The fact that a patient died is usually collected in some other domain (probably `DS`) and then added to `DM`.
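
A sketch of that derivation, assuming death is recorded in `DS` as a record with `DSDECOD = "DEATH"` whose date is in `DSSTDTC` (the comment above only says "probably `DS`"). `DTHFL` is "Y" or null, so the non-death case returns `None` for both variables. The function name `derive_death` is hypothetical.

```python
from typing import Optional


def derive_death(usubjid: str, ds_rows: list[dict]) -> tuple[Optional[str], Optional[str]]:
    """Return (DTHFL, DTHDTC) for one subject from DS records."""
    death_records = [
        r for r in ds_rows if r["USUBJID"] == usubjid and r["DSDECOD"] == "DEATH"
    ]
    if not death_records:
        # No death recorded: both variables stay null.
        return None, None
    # Flag the death even if its date is missing; otherwise use the earliest date.
    dates = sorted(r["DSSTDTC"] for r in death_records if r.get("DSSTDTC"))
    return "Y", (dates[0] if dates else None)
```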

`TRT01A` is an ADaM variable, not an SDTM variable. The arm to which a subject was randomized would be represented by some combination of `ARMCD`, `ARM`, `ACTARMCD`, and `ACTARM`. `ARMCD`/`ARM` are the code and text for the planned arm, and `ACTARMCD`/`ACTARM` are the code and text for the actual arm. `ARMCD/ARM` are the same as `ACTARMCD/ACTARM` unless a subject receives no treatment (in which case `ACTARMCD/ACTARM` are null) or a subject receives a treatment other than the one to which they were randomized. I don't think we need to build the treated-wrong situation into the synthetic data, although I think the study included a couple of subjects who were never treated. What's currently in `TRT01A` would probably be in `ARM` and `ACTARM`.
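
The rule above, without the treated-wrong case, reduces to a few lines. This sketch reuses the FULEV/FULPL labels from the earlier summary table as hypothetical `ARMCD` values; the study's real arm codes are not given here, and `derive_arm_vars` is an illustrative name.

```python
# Hypothetical arm codes borrowed from the FULEV/FULPL labels used earlier.
ARMS = {
    "FULEV": "Fulvestrant + Everolimus",
    "FULPL": "Fulvestrant + Placebo",
}


def derive_arm_vars(randomized_armcd: str, treated: bool) -> dict:
    """ARMCD/ARM come from randomization; ACTARMCD/ACTARM repeat them for
    treated subjects and are null for subjects who were never treated.
    Treatment with the wrong arm is deliberately not modelled."""
    arm = ARMS[randomized_armcd]
    return {
        "ARMCD": randomized_armcd,
        "ARM": arm,
        "ACTARMCD": randomized_armcd if treated else None,
        "ACTARM": arm if treated else None,
    }
```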

`EXSEQ` is incorrectly populated. `--SEQ` must uniquely identify each record for a subject within the domain.
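
One way to populate a sequence variable is to number records 1, 2, 3, … within each subject after sorting. A minimal sketch, assuming records are dictionaries and dates are full ISO 8601 strings (which sort correctly as text); `assign_seq` is a hypothetical helper, not part of the generation scripts.

```python
from collections import defaultdict


def assign_seq(rows: list[dict], subject_key: str = "USUBJID",
               date_key: str = "EXSTDTC", seq_key: str = "EXSEQ") -> list[dict]:
    """Return copies of `rows` numbered 1..n within each subject, in date order."""
    counters: defaultdict = defaultdict(int)
    # Sort by subject, then date; the sort is stable, so ties keep input order.
    ordered = sorted(rows, key=lambda r: (r[subject_key], r[date_key]))
    out = []
    for r in ordered:
        counters[r[subject_key]] += 1
        out.append({**r, seq_key: counters[r[subject_key]]})
    return out
```

The same helper works for other domains by changing `date_key` and `seq_key` (for example `DSSTDTC` and `DSSEQ`).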


The `EX` dataset is missing `EXFREQ`.
`EXFREQ` for fulvestrant injections would likely be "ONCE", with one record per injection and `EXSTDTC = EXENDTC`. Given the dosing schedule (Cycle 1 Day 1, Cycle 1 Day 15, then Day 1 of every subsequent cycle), the minimum number of records would be two: one for the first two doses, given every 14 days, and a second for all the remaining injections, given every 28 days. Because patient visits drift off schedule, the one-record-per-dose approach is probably more practical.
A single record could be used for everolimus, with `EXFREQ = "OD"` and `EXSTDTC` and `EXENDTC` set to the start and end dates of the series of tablets.
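
A sketch of that layout for one subject in the everolimus arm, assuming nominal (on-schedule) dose dates, 28-day cycles, and the 500 mg fulvestrant and 10 mg everolimus doses listed in the trial summary. Real data would replace the nominal dates with actual visit dates, and the placebo arm is not covered. `fulvestrant_dose_days` and `ex_records` are hypothetical names, and the `"OD"` default is copied from the comment above; confirm it against the CDISC FREQ codelist in use.

```python
from datetime import date, timedelta


def fulvestrant_dose_days(n_cycles: int) -> list[int]:
    """Day offsets from Cycle 1 Day 1: C1D1, C1D15, then Day 1 of each later 28-day cycle."""
    if n_cycles < 1:
        return []
    return [0, 14] + [28 * (k - 1) for k in range(2, n_cycles + 1)]


def ex_records(usubjid: str, first_dose: date, n_cycles: int,
               everolimus_end: date, everolimus_freq: str = "OD") -> list[dict]:
    """One EX record per fulvestrant injection plus one everolimus record."""
    recs = []
    for offset in fulvestrant_dose_days(n_cycles):
        d = (first_dose + timedelta(days=offset)).isoformat()
        # Single injection visit: start and end dates are the same day.
        recs.append({"USUBJID": usubjid, "EXTRT": "FULVESTRANT", "EXDOSE": 500,
                     "EXDOSU": "mg", "EXFREQ": "ONCE", "EXSTDTC": d, "EXENDTC": d})
    # Daily tablets summarized as one record spanning the whole series.
    recs.append({"USUBJID": usubjid, "EXTRT": "EVEROLIMUS", "EXDOSE": 10,
                 "EXDOSU": "mg", "EXFREQ": everolimus_freq,
                 "EXSTDTC": first_dose.isoformat(),
                 "EXENDTC": everolimus_end.isoformat()})
    return recs
```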


The `EXTRT` value `"STUDY DRUG"` is wrong. `EXTRT` should hold the actual study drug; a blinded representation is possible in the `EC` dataset. In addition, neither of the two study drugs is given at 100 mg. Fulvestrant is given as injections of 250 or 500 mg on particular days, and everolimus or matching placebo pills are given every day. The `ECTRT` value would probably be something like "Everolimus/Placebo" with a unit of "TABLET".


It is likely that this study, which is blinded, would have both `EC` and `EX`. For analysis, `EX` is the one needed. The true drug names would be in `EXTRT` and dose would be expressed in mg for both drugs.
In the current dataset, the dates in `EXSTDTC` and `EXENDTC` might be meant as the first and last dates of any treatment. Those dates are derived from `EX` and represented in the DM dataset in `RFXSTDTC` and `RFXENDTC`.
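
Deriving those `DM` variables from `EX` amounts to taking each subject's earliest start and latest end across all exposure records. A minimal sketch, assuming full ISO 8601 dates (`YYYY-MM-DD`, which compare correctly as strings) and falling back to `EXSTDTC` when `EXENDTC` is empty; partial dates would need extra handling, and `rfx_dates` is a hypothetical name.

```python
def rfx_dates(ex_rows: list[dict]) -> dict:
    """Per subject: RFXSTDTC = earliest EXSTDTC, RFXENDTC = latest EXENDTC."""
    out: dict = {}
    for r in ex_rows:
        subj = r["USUBJID"]
        start = r["EXSTDTC"]
        end = r.get("EXENDTC") or start  # single-day record if no end date
        if subj not in out:
            out[subj] = {"RFXSTDTC": start, "RFXENDTC": end}
        else:
            out[subj]["RFXSTDTC"] = min(out[subj]["RFXSTDTC"], start)
            out[subj]["RFXENDTC"] = max(out[subj]["RFXENDTC"], end)
    return out
```
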
19 changes: 19 additions & 0 deletions data/protocol/NCT01797120/test_data/README.md
@@ -0,0 +1,19 @@
# Synthetic Oncology Clinical Trial Dataset

Synthetic CDISC-compliant datasets modeled after **NCT01797120 (PrE0102)** — a Phase II randomized trial of Fulvestrant ± Everolimus in postmenopausal HR+ metastatic breast cancer.

> **All data is purely synthetic.** Study ID `SYNTH-ONC-001`. No real patient data is present in this repository.

---

## Source Trial Summary

| Field | Value |
|---|---|
| Trial | NCT01797120 (PrE0102) |
| Indication | HR+ HER2− metastatic breast cancer (AI-resistant, postmenopausal) |
| Design | Phase II, randomized, double-blind, placebo-controlled |
| Arms | Fulvestrant 500 mg + Everolimus 10 mg (n=66) vs. Fulvestrant + Placebo (n=65) |
| Primary endpoint | Progression-Free Survival (PFS) |
| Published PFS | 10.3 months (treatment) vs. 5.1 months (placebo) |
| Publication | *Journal of Clinical Oncology*, June 2018 (PMID: 29664714) |