LDSC implementation#91
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
@Ayshan4444 This now has mygene and pybiomart. Could we use pybiomart throughout, since it's used everywhere else in the package, so we don't have two tools doing essentially the same task?
There was a problem hiding this comment.
@Ayshan4444 This notebook does not show off the functionality for the continuous annotations (e.g. make_continuous_annot_from_bimfile), only the non-continuous, no? Could we show this and demonstrate the differences in the notebook?
There was a problem hiding this comment.
I think, it would be better if we merge make_continuous_annot_from_bimfile and make_annot_from_bimfile/ make_annot_from_donor_data and make_continuous_annot_from_donor_data/ make_annot_from_donor_data and making continuous/ binary an argument in the functions. I think this is less confusing for the user.
| @@ -0,0 +1,7306 @@ | |||
| { | |||
There was a problem hiding this comment.
Line #2. dd = get_onek1k(config_path="cellink/src/cellink/resources/config/onek1k.yaml", data_home="/project/genomics/ayshan/1k1k_dataset", verify_checksum=False)
Same here as in MAGMA tutorial. Ideally we could use get_dummy_onek1k here instead. Also same as mentioned in MAGMA tutorial: data_home should ideally be the default and then symlink to your path, so people can execute without changing anything.
Reply via ReviewNB
| @@ -0,0 +1,7306 @@ | |||
| { | |||
There was a problem hiding this comment.
Does Duncan et al 2025 do binary or continuous? I think would be good to show continuous in this tutorial as well?
Reply via ReviewNB
There was a problem hiding this comment.
Duncan didnt specify which they did but in person they shared that it was continuous. How should I integrate that?
| @@ -0,0 +1,914 @@ | |||
| { | |||
There was a problem hiding this comment.
Line #32. MAGMA_BIN = "/project/genomics/ayshan/ldsc_analysis/psychmagma/magma"
The magma bin needs to be downloadable in the notebook. See for example the scDRS tutorial (https://github.com/theislab/cellink/blob/main/docs/tutorials/scdrs_seismic.ipynb):
import requests, zipfile, io, os, stat
url = "https://vu.data.surf.nl/public.php/dav/files/1M1d9vHtVidEwvU/?accept=zip"
r = requests.get(url)
r.raise_for_status()
with zipfile.ZipFile(io.BytesIO(r.content)) as z:
z.extractall("magma")
for root, dirs, files in os.walk("magma"):
for file in files:
path = os.path.join(root, file)
st = os.stat(path)
os.chmod(path, st.st_mode | stat.S_IEXEC)
Reply via ReviewNB
| @@ -0,0 +1,914 @@ | |||
| { | |||
There was a problem hiding this comment.
Line #34. MAGMA_GENE_LOC = "/project/genomics/ayshan/ldsc_analysis/data_2/annotation_sources/NCBI38.gene.loc"
Same for this file. Needs to be downloadable. If we don't have a downloader for this already (I don't think so), please add it to the pacakge.
Reply via ReviewNB
| @@ -0,0 +1,914 @@ | |||
| { | |||
There was a problem hiding this comment.
Line #35. G1000_BFILE = "/project/genomics/ayshan/ldsc_analysis/data_2/1000G_EUR_Phase3/g1000_eur"
Same for these files. Needs to be downloadable. The package already has a function for this:
from cellink.resources import get_1000genomes_ld_scores, get_1000genomes_ld_weights
ldscores_path, ldscores_prefix = get_1000genomes_ld_scores(
config_path="../../src/cellink/resources/config/1000genomes.yaml", population="EUR", return_path=True
)
ldweights_path, ldweights_prefix = get_1000genomes_ld_weights(
config_path="../../src/cellink/resources/config/1000genomes.yaml", population="EUR", return_path=True
)
Reply via ReviewNB
| @@ -0,0 +1,914 @@ | |||
| { | |||
There was a problem hiding this comment.
Line #3. data_home="/project/genomics/ayshan/1k1k_dataset",
Ideally the tutorial should have no absolute paths, so users can just use it directly without adapting, could you just set a symlink in your home directory or similar, where data_home defaults to?
Reply via ReviewNB
| @@ -0,0 +1,914 @@ | |||
| { | |||
There was a problem hiding this comment.
Why is continuous S-LDSC here, and not in the other cell_level_Ldsc_analysis.ipynb tutorial?
Reply via ReviewNB
| @@ -0,0 +1,914 @@ | |||
| { | |||
There was a problem hiding this comment.
Line #2. GWAS_PVAL_FILE = "/project/genomics/ayshan/ldsc_analysis/MODEL/INPUT/magma_zscore/scz_gwas.txt"
Where is this file coming from? People can't reproduce the tutorial without this file then, ideally the tutorial should have no absolute paths, so users can just use it directly without adapting.
Reply via ReviewNB
There was a problem hiding this comment.
this file is a sample I preprocessed to run here, should I skip it?
| @@ -0,0 +1,914 @@ | |||
| { | |||
There was a problem hiding this comment.
Line #5. bfile=G1000_BFILE,
Here the 1000G files downloaded from the package need to be used.
Reply via ReviewNB
| @@ -0,0 +1,914 @@ | |||
| { | |||
There was a problem hiding this comment.
Line #3. GENES_RAW = "/project/genomics/ayshan/ldsc_analysis/MODEL/INPUT/magma_zscore/scz_result.genes.raw"
Where is this file coming from? People can't reproduce the tutorial without this file then, ideally the tutorial should have no absolute paths, so users can just use it directly without adapting.
Reply via ReviewNB
No description provided.