Finish deconvolution notebook draft by sjspielman · Pull Request #1011 · AlexsLemonade/training-modules

sjspielman · 2026-05-28T18:10:37Z

Closes #980

This PR wraps up the deconvolution notebook with a heatmap of co-occurring pairs and a TSV export, so it's a pretty short PR (I just wanted a fresh slate for reviewing this section). For this, I again use a 0.1 threshold, but I want to know specifically how results would change if we changed the threshold.

I think just this heatmap wraps the notebook up nicely, but just for posterity there are two things that I thought of but did not do because I think it's just too much.

- More wrangling/plotting for looking at multiple cell types co-occurring. This is obviously too much, but sounds fun to me personally ;)
- While I think it's best to be consistent and stick with 0.1, in my heart I want to show a second heatmap at 0.05 as well, which brings lymphoid back into the mix, just to compare. Again, this just sounds fun to me personally!

As part of review let me know if you'd suggest any other additional or expanded content, but I think this is pretty solid with just the heatmap. This is the least confusing way I thought to write the code, but it manages of course to still be a bit confusing - any ideas for simplifying?
03-deconvolution.nb.html

sjspielman · 2026-05-28T18:11:08Z

Hey look it's PR 1011! We (the royal we specifically) love to see it.

jashapiro

When I think of co-occurrence, what I usually want to see is how often two events co-occur given the frequency of each individual event, so the normalization you are doing here is not what I would expect.

The co-occurrence measure we are interested in is how the count you have compares to the expected count.
The expected count for each pair of cell types will be n_a * n_b / n_total (p(a) * p(b) * n_total, with some cancelling), so you should calculate that for each row; then you can calculate the ratio of the observed co-occurrence to that expected by chance and plot that.
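The cancellation mentioned above can be written out as follows. This is only a sketch under the assumption that the two cell types occur independently across spots; the symbols $N$ (for n_total), $O_{ab}$, $E_{ab}$, and $R_{ab}$ are notation introduced here, not names from the notebook.

$$
E_{ab} = N \, p(a) \, p(b) = N \cdot \frac{n_a}{N} \cdot \frac{n_b}{N} = \frac{n_a \, n_b}{N},
\qquad
R_{ab} = \frac{O_{ab}}{E_{ab}}
$$

As a made-up example, with $n_a = 30$, $n_b = 20$, and $N = 100$, the expected count is $30 \cdot 20 / 100 = 6$. An observed count of 12 would then give $R_{ab} = 2$, meaning the pair co-occurs twice as often as expected by chance.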

I don't think we need to get into the full statistics of co-occurrence analysis here, unless you really like saying "hypergeometric distribution," but I do think we should show what the expectation is and characterize the results based on that (not just on the raw counts or a constant scaling of them).

(I mostly resisted until now talking about how you can calculate these from the binary occurrence matrix and a few matrix multiplications instead of joins, because I doubt we should go that route here, but that is how I would really implement it.)
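For readers curious about the matrix route mentioned above, here is a minimal base R sketch. It is not the notebook's implementation: the `occurrence` matrix, its values, and the cell type names `cell_a`, `cell_b`, `cell_c` are all made up for illustration.

```r
# Toy binary occurrence matrix (hypothetical data):
# rows are spots (barcodes), columns are cell types,
# 1 means the cell type passed the presence threshold in that spot
occurrence <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    1, 1, 1,
    0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("spot", 1:4), c("cell_a", "cell_b", "cell_c"))
)

n_total <- nrow(occurrence)       # total number of spots
n_present <- colSums(occurrence)  # spots in which each cell type is present

# observed[a, b] = number of spots where both a and b are present
# crossprod(x) is equivalent to t(x) %*% x
observed <- crossprod(occurrence)

# expected[a, b] = n_a * n_b / n_total
expected <- outer(n_present, n_present) / n_total

# ratio of observed to expected co-occurrence
ratio <- observed / expected
```

The diagonal of `observed` holds each cell type's own count, so it would usually be dropped or masked before plotting.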

jashapiro · 2026-05-29T00:33:48Z

+cooccur_df <- present_df |>
+  dplyr::inner_join(present_df, by = "barcode", relationship = "many-to-many") |>
+  # rename after self-joining
+  dplyr::rename(cell_type_a = cell_type.x, cell_type_b = cell_type.y) |>


When starting a join, I prefer to not pipe into the join statement. But the real reason for the comment was to skip the renaming and just set the suffixes in the join statement.

Suggested change

-cooccur_df <- present_df |>
-  dplyr::inner_join(present_df, by = "barcode", relationship = "many-to-many") |>
-  # rename after self-joining
-  dplyr::rename(cell_type_a = cell_type.x, cell_type_b = cell_type.y) |>
+cooccur_df <- dplyr::inner_join(
+  present_df,
+  present_df, # join table to itself
+  by = "barcode",
+  relationship = "many-to-many",
+  suffix = c("_a", "_b")
+) |>
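To see what the `suffix` argument does in a self-join like the one suggested above, here is a small sketch on toy data. The contents of `present_df` are invented for illustration, and it assumes a dplyr version that supports the `relationship` argument (1.1.0 or later).

```r
library(dplyr)

# Hypothetical input: one row per (spot, cell type) that passed the threshold
present_df <- data.frame(
  barcode = c("s1", "s1", "s2"),
  cell_type = c("cell_a", "cell_b", "cell_a")
)

cooccur_df <- dplyr::inner_join(
  present_df,
  present_df, # join table to itself
  by = "barcode",
  relationship = "many-to-many",
  # the clashing non-key column `cell_type` becomes cell_type_a / cell_type_b
  suffix = c("_a", "_b")
)
```

Spot `s1` contributes all four ordered pairs of its two cell types, including the self-pairs, and `s2` contributes one, so the result has five rows and no separate `rename()` step is needed.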

sjspielman · 2026-05-29T14:12:26Z

> I don't think we need to get into the full statistics of co-occurrence analysis here, unless you really like saying "hypergeometric distribution," but I do think we should show what the expectation is and characterize the results based on that (not just on the raw counts or a constant scaling of them).
>
> (I mostly resisted until now talking about how you can calculate these from the binary occurrence matrix and a few matrix multiplications instead of joins, because I doubt we should go that route here, but that is how I would really implement it.)

Thanks, and definitely agreed with all of this - I was trying to show something hinting at co-occurrence, but I did not want to do it "properly" (please never read this sentence out of context 😂), aka with full statistics, and a more involved but more accurate implementation seemed like getting too into the weeds. So, we should really just be sure to emphasize that this is highly exploratory and there's a whole world of stats for more complicated analyses.

Either way I'll amend the calculations as suggested to get closer to a "real" analysis.

sjspielman · 2026-05-29T16:19:10Z

Updated here: 03-deconvolution.nb.html

I ended up calculating the log ratio to center it at 0 for easier plot interpretation, and bonus it comes with a real life stats name! I also added some more contextualizing text. Notably, there are several chunks now to break up the calculation, and none of them are live which is good because the notebook is already long and many typos would occur otherwise.
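As a rough sketch of the log ratio described above (not the notebook's actual chunks), the calculation could look like the following. The data frame `counts_df`, its column names, and its values are hypothetical, and the choice of `log2()` is an assumption, since the log base is not stated here.

```r
library(dplyr)

# Hypothetical observed and expected co-occurrence counts per cell type pair
counts_df <- data.frame(
  cell_type_a = c("cell_a", "cell_a", "cell_b"),
  cell_type_b = c("cell_b", "cell_c", "cell_c"),
  observed = c(2, 1, 1),
  expected = c(2.25, 0.75, 0.75)
)

cooccur_df <- counts_df |>
  # log ratio is 0 when observed equals expected,
  # positive when a pair co-occurs more than expected, negative when less
  dplyr::mutate(log_ratio = log2(observed / expected))
```

One caveat with this form: a pair that is never observed together gives `log2(0)`, which is `-Inf`, so such pairs may need separate handling before plotting.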

jashapiro · 2026-05-29T16:39:35Z

> Notably, there are several chunks now to break up the calculation, and none of them are live which is good because the notebook is already long and many typos would occur otherwise.

Probably a lot fewer if you had used matrices! But then you have to convince people that matrix multiplication actually works for co-occurrence.

jashapiro

Thanks for this update! Much better for interpretation. I had just a few comments about color and theme, but I don't think I need to see it again.

jashapiro · 2026-05-29T16:44:09Z

+# Calculate ratios
+cooccur_df <- observed_df |>
+  # combine data frames so we have a column for each of:
+  # expected count


Did you mean to have observed here instead? (You have a separate comment about expected.)

Suggested change

-  # expected count
+  # observed count

jashapiro · 2026-05-29T16:51:05Z

+  theme_classic() +
+  theme(
+    axis.title = element_blank(),
+    axis.text.x = element_text(angle = 30, hjust = 1)


Can I get picky about the theme here? I don't like having only two axes on a heatmap. I'd rather have the full box or no axes at all. Personally I'd probably go for theme_bw and then remove the gridlines.
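One way the full-box styling described above might look is sketched below, using `theme_bw()` with the gridlines removed. The data frame `toy_df` and its `log_ratio` values are made up so the example is self-contained; only the theme calls relate to the review comment.

```r
library(ggplot2)

# Hypothetical log ratios for every ordered pair of three cell types
toy_df <- expand.grid(
  cell_type_a = c("cell_a", "cell_b", "cell_c"),
  cell_type_b = c("cell_a", "cell_b", "cell_c")
)
toy_df$log_ratio <- c(0.4, -0.2, 0.4, -0.2, 0.4, -1.6, 0.4, -1.6, 2)

heatmap_plot <- ggplot(toy_df) +
  aes(x = cell_type_a, y = cell_type_b, fill = log_ratio) +
  geom_tile() +
  # theme_bw() draws a border around the whole panel
  theme_bw() +
  theme(
    # remove gridlines, which add nothing behind the tiles
    panel.grid = element_blank(),
    axis.title = element_blank(),
    axis.text.x = element_text(angle = 30, hjust = 1)
  )
```

The alternative mentioned in the comment, no axes at all, could be approximated by starting from `theme_minimal()` or `theme_void()` instead.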

jashapiro · 2026-05-29T16:56:04Z

+  ) +
+  # add scale that diverges from 0 in the middle
+  scale_fill_gradient2(
+    low = "steelblue", mid = "white", high = "firebrick",


We have the opportunity here to use something close to the Data Lab color scheme, and I say we take it. (feel free to tweak for optimal appearance)

Suggested change

-    low = "steelblue", mid = "white", high = "firebrick",
+    low = "blue3", mid = "white", high = "gold1",

cc @allyhawkins if you'd like to take this chance to suggest different blues and yellows ;)

No, not U-M colors, Data Lab colors!

> No, not U-M colors, Data Lab colors!

😢

I said she could suggest, not that I would use them!
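For reference, here is a small sketch of the diverging fill scale from the suggestion above applied to a toy heatmap. The data are invented, and `midpoint = 0` is written out explicitly even though it is already the default for `scale_fill_gradient2()`.

```r
library(ggplot2)

# Hypothetical log ratios for three cell type pairs
toy_df <- data.frame(
  cell_type_a = c("cell_a", "cell_a", "cell_b"),
  cell_type_b = c("cell_b", "cell_c", "cell_c"),
  log_ratio = c(-0.2, 0.4, -1.6)
)

scale_plot <- ggplot(toy_df) +
  aes(x = cell_type_a, y = cell_type_b, fill = log_ratio) +
  geom_tile() +
  # add scale that diverges from 0 in the middle
  scale_fill_gradient2(
    low = "blue3", mid = "white", high = "gold1",
    midpoint = 0
  )
```

Both `blue3` and `gold1` are built-in R color names, so no extra palette package is needed.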

sjspielman added 2 commits May 28, 2026 14:00

add the heatmap

86739e0

Merge branch 'master' into sjspielman/deconvolution-draft-part2

b2d5e4e

sjspielman requested a review from jashapiro May 28, 2026 18:10

jashapiro reviewed May 29, 2026

sjspielman added 2 commits May 29, 2026 12:15

observed/expected calculations with expanded text to explain and intr…

ee22e35

…oduce some caveats, updated heatmap accordingly

spelling

8866e71

sjspielman requested a review from jashapiro May 29, 2026 16:19

jashapiro approved these changes May 29, 2026

fix comment and heatmap styling including colors

34b35cc

sjspielman merged commit 24a715f into master Jun 1, 2026
5 checks passed

sjspielman deleted the sjspielman/deconvolution-draft-part2 branch June 1, 2026 14:34
