Skip to content

Finish deconvolution notebook draft#1011

Merged
sjspielman merged 5 commits into
masterfrom
sjspielman/deconvolution-draft-part2
Jun 1, 2026
Merged

Finish deconvolution notebook draft#1011
sjspielman merged 5 commits into
masterfrom
sjspielman/deconvolution-draft-part2

Conversation

@sjspielman
Copy link
Copy Markdown
Member

Closes #980

This PR wraps up the deconvolution notebook with a heatmap of co-occurring pairs and TSV export, so a pretty short PR (just wanted a fresh slate for reviewing this section). For this, I again use a 0.1 threshold but want to know specifically how results would change if we changed the threshold.

I think just this heatmap wraps the notebook up nicely, but just for posterity there are two things that I thought of but did not do because I think it's just too much.

  • More wrangling/plotting for looking at multiple cell types co-occurring. This is obviously too much, but sounds fun to me personally ;)
  • While I think it's best to be consistent and stick with 0.1, in my heart I want to show a second heatmap at 0.05 as well which brings lymphoid back into the mix, just to compare. Again, this just sounds fun to me personally!
image

As part of review let me know if you'd suggest any other additional or expanded content, but I think this is pretty solid with just the heatmap. This is the least confusing way I thought to write the code, but it manages of course to still be a bit confusing - any ideas for simplifying?
03-deconvolution.nb.html

@sjspielman sjspielman requested a review from jashapiro May 28, 2026 18:10
@sjspielman
Copy link
Copy Markdown
Member Author

sjspielman commented May 28, 2026

Hey look it's PR 1011! We (the royal we specifically) love to see it.

Copy link
Copy Markdown
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I think of co-occurance, What I usually want to see is how often two events co-occur given the frequency of each individual event, so the normalization you are doing here is not what I would expect.

The the co-occurance measure we are interested in is how the count you have compare to the expected count.
The expected count for each pair of cell types will be n_a * n_b / n_total (p(a) * p(b) * n_total, with some cancelling), so you should calculate that for each row, then you can calculate the ratio of the observed co-occurance to that expected by chance and then plot that.

I don't think we need to get into the full statistics of co-occurance analysis here, unless you really like saying "hypergeomtric distribution," but I do think we should show what the expectation is and characterize the results based on that, not just on the raw counts or constant scaling of that).

(I mostly resisted until now talking about how you can calculate these from the binary occurence matrix and a few matrix multiplications instead of joins, because I doubt we should go that route here, but that is how I would really implement it.)

Comment thread spatial/03-deconvolution.Rmd Outdated
Comment on lines +533 to +536
cooccur_df <- present_df |>
dplyr::inner_join(present_df, by = "barcode", relationship = "many-to-many") |>
# rename after self-joining
dplyr::rename(cell_type_a = cell_type.x, cell_type_b = cell_type.y) |>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When starting a join, I prefer to not pipe into the join statement. But the real reason for the comment was to skip the renaming and just set the suffixes in the join statement.

Suggested change
cooccur_df <- present_df |>
dplyr::inner_join(present_df, by = "barcode", relationship = "many-to-many") |>
# rename after self-joining
dplyr::rename(cell_type_a = cell_type.x, cell_type_b = cell_type.y) |>
cooccur_df <- dplyr::inner_join(
present_df,
present_df, # join table to itself
by = "barcode",
relationship = "many-to-many",
suffix = c("_a", "_b")
) |>

@sjspielman
Copy link
Copy Markdown
Member Author

I don't think we need to get into the full statistics of co-occurance analysis here, unless you really like saying "hypergeomtric distribution," but I do think we should show what the expectation is and characterize the results based on that, not just on the raw counts or constant scaling of that).

(I mostly resisted until now talking about how you can calculate these from the binary occurence matrix and a few matrix multiplications instead of joins, because I doubt we should go that route here, but that is how I would really implement it.)

Thanks, and definitely greed with all of this - I was trying to show something hinting at co-occurrence but I did not want to do it "properly" (please never read this sentence out of context 😂 ) aka with full statistics, and a more involved but more accurate implementation seemed like getting too into the weeds. So, we should really just be sure to emphasize that this is highly exploratory and there's a whole world of stats for more complicated analyses.

Either way I'll amend the calculations as suggested to get closer to a "real" analysis.

@sjspielman
Copy link
Copy Markdown
Member Author

Updated here: 03-deconvolution.nb.html

I ended up calculating the log ratio to center it at 0 for easier plot interpretation, and bonus it comes with a real life stats name! I also added some more contextualizing text. Notably, there are several chunks now to break up the calculation, and none of them are live which is good because the notebook is already long and many typos would occur otherwise.

@sjspielman sjspielman requested a review from jashapiro May 29, 2026 16:19
@jashapiro
Copy link
Copy Markdown
Member

Notably, there are several chunks now to break up the calculation, and none of them are live which is good because the notebook is already long and many typos would occur otherwise.

Probably a lot less if you had used matrices! But then you have to convince people that matrix multiplication actually works for co-occurance.

Copy link
Copy Markdown
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this update! Much better for interpretation. I had just a few comments about color and theme, but I don't think I need to see it again.

Comment thread spatial/03-deconvolution.Rmd Outdated
# Calculate ratios
cooccur_df <- observed_df |>
# combine data frames so we have a column for each of:
# expected count
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to have observed here instead? (You have a separate comment about expected)

Suggested change
# expected count
# observed count

Comment thread spatial/03-deconvolution.Rmd Outdated
Comment on lines +618 to +621
theme_classic() +
theme(
axis.title = element_blank(),
axis.text.x = element_text(angle = 30, hjust = 1)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I get picky about the theme here? I don't like having only two axes on a heatmap. I'd rather have the full box or no axes at all. Personally I'd probably go for theme_bw and then remove the gridlines.

Comment thread spatial/03-deconvolution.Rmd Outdated
) +
# add scale that diverges from 0 in the middle
scale_fill_gradient2(
low = "steelblue", mid = "white", high = "firebrick",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the opportunity here to use something close to the Data Lab color scheme, and I say we take it. (feel free to tweak for optimal appearance)

Suggested change
low = "steelblue", mid = "white", high = "firebrick",
low = "blue3", mid = "white", high = "gold1",

Copy link
Copy Markdown
Member Author

@sjspielman sjspielman May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @allyhawkins if you'd like to take this chance to suggest different blues and yellows ;)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not U-M colors, Data Lab colors!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not U-M colors, Data Lab colors!

😢

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said she could suggest, not that I would use them!

@sjspielman sjspielman merged commit 24a715f into master Jun 1, 2026
5 checks passed
@sjspielman sjspielman deleted the sjspielman/deconvolution-draft-part2 branch June 1, 2026 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Draft spatial deconvolution notebook

3 participants