Unexplained drop in patients when Removing events outside window #362
Description
@GvL1992 found this issue and can probably explain it better than I can.
We have identified multiple studies where we see a significant, unexplained drop in the number of persons included in TreatmentPatterns at the step "Removing events outside window".
In the original MultipleMyeloma study we see 778 patients included in Bordeaux treatment patterns results but 1930 patients in the MM cohort. So more than half are excluded. We don't have attrition in the app so it's possible that these patients don't have treatments within the time window.
https://data.darwin-eu.org/EUPAS105033/
In the current AML study we see a drop from 1978 to 900 on CRN when filtering patients without treatments in the time window. The time window for this study was start of AML to end of observation time.
Again this could be because there are no treatments for these patients.
In the current multiple myeloma routine repeated study we see the same drop, but we also have more diagnostic information to help understand the cohorts being fed into treatment patterns.
We only have detailed data for IPCI. In this case we are looking at MM patients with a window between 90 days before index to 5 years after index date.
We can see that we drop from 573 to 281 (roughly 50% drop).
Now in this case we have some data on the timing between the treatment cohort start date and the MM index date. Specifically, consider cyclophosphamide.
We should have 41 patients getting cyclophosphamide. The median [Q25 - Q75] time from MM to cyclophosphamide is 674 [76 - 1,412] days, so most patients should be within the event window for treatment patterns.
And yet we see no patients with cyclophosphamide in TreatmentPatterns.
It's difficult to say for sure that this is an error, or to reproduce it without IPCI. I've looked at the TreatmentPatterns code that does this exclusion and it does appear to be correct.
The key code is
    dplyr::full_join(
      x = andromeda$eventCohorts,
      y = andromeda$targetCohorts,
      by = dplyr::join_by(
        personId == personId,
        subject_id_origin == subject_id_origin,
        y$indexDate <= x$startDate,
        x$startDate <= y$endDate,
      ), suffix = c("Event", "Target")
    )
I'm wondering if there could be an issue with the full outer join and the inequality join condition. I'm going to try changing this to an inner_join on person_id followed by a filter for the date condition and see if we get the same results.
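As a rough sketch of that alternative (not the TreatmentPatterns implementation), the inner join plus filter could look like the following. It uses small in-memory tibbles instead of the Andromeda tables, all values are made up, and `subject_id_origin` is left out to keep it short. The `anti_join()` at the end is an extra, hypothetical diagnostic that lists target persons with no event inside the window, which is the attrition we are currently missing.

```r
library(dplyr)

# Hypothetical toy data standing in for andromeda$targetCohorts and
# andromeda$eventCohorts; only the columns used by the join are included.
targetCohorts <- tibble::tibble(
  personId = c(1, 2, 3),
  indexDate = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  endDate = as.Date(c("2021-01-01", "2021-01-01", "2021-01-01"))
)

eventCohorts <- tibble::tibble(
  personId = c(1, 2, 2),
  startDate = as.Date(c("2020-06-01", "2019-06-01", "2020-03-01")),
  endDate = as.Date(c("2020-07-01", "2019-07-01", "2020-04-01"))
)

# Equality join on the person first, then apply the window as a plain filter.
# Both tables have an endDate column, so the suffix turns them into
# endDateEvent and endDateTarget.
inWindow <- eventCohorts %>%
  inner_join(
    targetCohorts,
    by = join_by(personId),
    suffix = c("Event", "Target")
  ) %>%
  filter(indexDate <= startDate, startDate <= endDateTarget)

# Target persons that end up with no event inside the window.
dropped <- anti_join(targetCohorts, inWindow, by = join_by(personId))

print(inWindow)
print(dropped)
```

In this toy data person 3 has no events at all and person 2 has one event before the index date, so comparing `n_distinct(inWindow$personId)` against the count from the existing join on the real data should show whether the two approaches disagree.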