Skip to content

LDSC implementation#91

Open
LArnoldt wants to merge 14 commits into
mainfrom
la_ldsc_docs_v2
Open

LDSC implementation#91
LArnoldt wants to merge 14 commits into
mainfrom
la_ldsc_docs_v2

Conversation

@LArnoldt

@LArnoldt LArnoldt commented Nov 6, 2025

Copy link
Copy Markdown
Collaborator

No description provided.

@review-notebook-app

Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ayshan4444 This now has mygene and pybiomart. Could we use pybiomart throughout, since it's used everywhere else in the package, so we don't have two tools doing essentially the same task?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ayshan4444 This notebook does not show off the functionality for the continuous annotations (e.g. make_continuous_annot_from_bimfile), only the non-continuous, no? Could we show this and demonstrate the differences in the notebook?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it would be better if we merge make_continuous_annot_from_bimfile and make_annot_from_bimfile/ make_annot_from_donor_data and make_continuous_annot_from_donor_data/ make_annot_from_donor_data and making continuous/ binary an argument in the functions. I think this is less confusing for the user.

@@ -0,0 +1,7306 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #2.    dd = get_onek1k(config_path="cellink/src/cellink/resources/config/onek1k.yaml", data_home="/project/genomics/ayshan/1k1k_dataset", verify_checksum=False)

Same here as in MAGMA tutorial. Ideally we could use get_dummy_onek1k here instead. Also same as mentioned in MAGMA tutorial: data_home should ideally be the default and then symlink to your path, so people can execute without changing anything.


Reply via ReviewNB

@@ -0,0 +1,7306 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Duncan et al 2025 do binary or continuous? I think would be good to show continuous in this tutorial as well?


Reply via ReviewNB

@Ayshan4444 Ayshan4444 Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duncan didnt specify which they did but in person they shared that it was continuous. How should I integrate that?

@@ -0,0 +1,914 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #32.    MAGMA_BIN = "/project/genomics/ayshan/ldsc_analysis/psychmagma/magma"

The magma bin needs to be downloadable in the notebook. See for example the scDRS tutorial (https://github.com/theislab/cellink/blob/main/docs/tutorials/scdrs_seismic.ipynb):

import requests, zipfile, io, os, stat

url = "https://vu.data.surf.nl/public.php/dav/files/1M1d9vHtVidEwvU/?accept=zip"

r = requests.get(url)
r.raise_for_status() 

with zipfile.ZipFile(io.BytesIO(r.content)) as z:
    z.extractall("magma")

for root, dirs, files in os.walk("magma"):
    for file in files:
        path = os.path.join(root, file)
        st = os.stat(path)
        os.chmod(path, st.st_mode | stat.S_IEXEC) 


Reply via ReviewNB

@@ -0,0 +1,914 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #34.    MAGMA_GENE_LOC  = "/project/genomics/ayshan/ldsc_analysis/data_2/annotation_sources/NCBI38.gene.loc"

Same for this file. Needs to be downloadable. If we don't have a downloader for this already (I don't think so), please add it to the pacakge.


Reply via ReviewNB

@@ -0,0 +1,914 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #35.    G1000_BFILE     = "/project/genomics/ayshan/ldsc_analysis/data_2/1000G_EUR_Phase3/g1000_eur"

Same for these files. Needs to be downloadable. The package already has a function for this:

from cellink.resources import get_1000genomes_ld_scores, get_1000genomes_ld_weights
ldscores_path, ldscores_prefix = get_1000genomes_ld_scores(
    config_path="../../src/cellink/resources/config/1000genomes.yaml", population="EUR", return_path=True
)
ldweights_path, ldweights_prefix = get_1000genomes_ld_weights(
    config_path="../../src/cellink/resources/config/1000genomes.yaml", population="EUR", return_path=True
)


Reply via ReviewNB

@@ -0,0 +1,914 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #3.        data_home="/project/genomics/ayshan/1k1k_dataset",

Ideally the tutorial should have no absolute paths, so users can just use it directly without adapting, could you just set a symlink in your home directory or similar, where data_home defaults to?


Reply via ReviewNB

@@ -0,0 +1,914 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is continuous S-LDSC here, and not in the other cell_level_Ldsc_analysis.ipynb tutorial?


Reply via ReviewNB

@@ -0,0 +1,914 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #2.    GWAS_PVAL_FILE = "/project/genomics/ayshan/ldsc_analysis/MODEL/INPUT/magma_zscore/scz_gwas.txt"

Where is this file coming from? People can't reproduce the tutorial without this file then, ideally the tutorial should have no absolute paths, so users can just use it directly without adapting.


Reply via ReviewNB

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is a sample I preprocessed to run here, should I skip it?

@@ -0,0 +1,914 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #5.        bfile=G1000_BFILE,

Here the 1000G files downloaded from the package need to be used.


Reply via ReviewNB

@@ -0,0 +1,914 @@
{

@LArnoldt LArnoldt May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #3.    GENES_RAW = "/project/genomics/ayshan/ldsc_analysis/MODEL/INPUT/magma_zscore/scz_result.genes.raw"

Where is this file coming from? People can't reproduce the tutorial without this file then, ideally the tutorial should have no absolute paths, so users can just use it directly without adapting.


Reply via ReviewNB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants