
Unexplained drop in patients when Removing events outside window #362

@ablack3

@GvL1992 found this issue and can probably explain it better than I can.

We have identified multiple studies where we see a significant, unexplained drop in persons included in TreatmentPatterns at the step "Removing events outside window".

In the original MultipleMyeloma study we see 778 patients included in Bordeaux treatment patterns results but 1930 patients in the MM cohort. So more than half are excluded. We don't have attrition in the app so it's possible that these patients don't have treatments within the time window.

https://data.darwin-eu.org/EUPAS105033/

Image Image

In the current AML study we see a drop from 1978 to 900 on CRN when filtering patients without treatments in the time window. The time window for this study was start of AML to end of observation time.

Image

Again this could be because there are no treatments for these patients.

In the current multiple myeloma routine repeated study we see the same drop, but we also have more diagnostic information to help understand the cohorts being fed into treatment patterns.

We only have detailed data for IPCI. In this case we are looking at MM patients with a window from 90 days before the index date to 5 years after it.

Image

We can see that we drop from 573 to 281 (roughly 50% drop).

Now in this case we have some data on the timing between the treatment cohort start date and the MM index date. Specifically, consider cyclophosphamide.

Image

We should have 41 patients getting cyclophosphamide. The median [Q25 - Q75] time from MM to cyclophosphamide is 674 [76 - 1,412], so most patients should be within the event window for treatment patterns.
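As a quick sanity check on that claim, here is a minimal sketch of the arithmetic. It assumes the reported times are in days (the summary does not state the unit) and approximates 5 years as 5 * 365 days; the variable names are only for illustration.

```r
# Assumption: times from MM index to cyclophosphamide start are in days.
windowStart <- -90       # 90 days before the index date
windowEnd   <- 5 * 365   # 5 years after the index date, ignoring leap days

# Reported quartiles: Q25, median, Q75
quartiles <- c(q25 = 76, median = 674, q75 = 1412)

# TRUE for each quartile that falls inside the event window
insideWindow <- quartiles >= windowStart & quartiles <= windowEnd
print(insideWindow)
```

Under those assumptions all three quartiles fall inside the window, so at least the middle half of the 41 patients should survive the window filter.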

And yet we see no patients with cyclophosphamide in TreatmentPatterns.

Image

It's difficult to say for sure that this is an error, and difficult to reproduce it without IPCI. I've looked at the treatment patterns code that does this exclusion and it does appear to be correct.

The key code is

dplyr::full_join(
    x = andromeda$eventCohorts,
    y = andromeda$targetCohorts,
    by = dplyr::join_by(
      personId == personId,
      subject_id_origin == subject_id_origin,
      y$indexDate <= x$startDate,
      x$startDate <= y$endDate,
    ), suffix = c("Event", "Target")
)

I'm wondering if there could be an issue with the full outer join and the inequality join condition. I'm going to try changing this to an inner_join on person_id followed by a filter for the date condition and see if we get the same results.
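A rough sketch of the comparison I have in mind, on toy in-memory tibbles rather than the real Andromeda tables. The data are made up, the `subject_id_origin` key is left out for brevity, and behaviour on the Andromeda/database backend could differ from plain dplyr, so this only illustrates the idea.

```r
library(dplyr)

# Toy stand-ins for andromeda$eventCohorts and andromeda$targetCohorts
# (made-up data, for illustration only).
eventCohorts <- tibble(
  personId  = c(1, 1, 2, 3),
  startDate = as.Date(c("2020-02-01", "2019-01-01", "2021-06-01", "2020-03-01")),
  endDate   = startDate + 30
)
targetCohorts <- tibble(
  personId  = c(1, 2, 4),
  indexDate = as.Date(rep("2020-01-01", 3)),
  endDate   = as.Date(rep("2020-12-31", 3))
)

# Approach A: full join with the inequality conditions, as in the key code.
# Unmatched rows from either side are kept with NAs, so drop them afterwards.
fullJoined <- full_join(
  x = eventCohorts,
  y = targetCohorts,
  by = join_by(
    personId == personId,
    y$indexDate <= x$startDate,
    x$startDate <= y$endDate
  ),
  suffix = c("Event", "Target")
)
matchedA <- fullJoined %>%
  filter(!is.na(startDate), !is.na(indexDate))

# Approach B: inner join on personId only, then filter on the date window.
matchedB <- inner_join(
  eventCohorts, targetCohorts,
  by = "personId",
  suffix = c("Event", "Target")
) %>%
  filter(indexDate <= startDate, startDate <= endDateTarget)

# Persons retained by each approach; these should agree.
personsA <- sort(unique(matchedA$personId))
personsB <- sort(unique(matchedB$personId))
identical(personsA, personsB)
```

If the two approaches give different person counts on the real data, that would point at the join rather than at the cohorts.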
