diff --git a/README.md b/README.md
index 49ec2a1..31a69e9 100644
--- a/README.md
+++ b/README.md
@@ -26,9 +26,9 @@ For example, in a TMAP of pet breed images, following the branch from terriers t
 Because the layout is a tree, you get operations that point clouds can't support:
 ```python
-path = model.path(idx_a, idx_b) # nodes along the tree path
-d = model.distance(idx_a, idx_b # sum of edge weights along the path
-pseudotime = model.distances_from(idx) # tree distance from one point to all others
+path = model.path(idx_a, idx_b) # nodes along the tree path
+d = model.distance(idx_a, idx_b) # sum of edge weights along the path
+pseudotime = model.distances_from(idx) # tree distance from one point to all others
 ```
 ## Installation
@@ -116,22 +116,22 @@ from tmap.utils.singlecell import from_anndata
 | Notebook | Topic |
 |----------|-------|
-| [01 Quick Start](notebooks/01_quickstart.ipynb) | End-to-end walkthrough |
-| [02 MinHash Deep Dive](notebooks/02_minhash_deep_dive.ipynb) | Encoding methods and when to use each |
-| [03 Legacy LSH Pipeline](notebooks/03_legacy_lsh_pipeline.ipynb) | Lower-level MinHash + LSHForest + layout workflow |
-| [04 Notebook Widgets](notebooks/04_jscatter_demo.ipynb) | Selection, filtering, zoom, export |
+| [01 Quickstart](notebooks/01_quickstart.ipynb) | Shortest end-to-end walkthrough on a small molecule table |
+| [02 Cheminformatics](notebooks/02_cheminformatics.ipynb) | SMILES → fingerprints → interactive molecular map |
+| [03 Continuous Embeddings](notebooks/03_continuous_embeddings.ipynb) | Cosine and euclidean on MNIST: when to use each |
+| [04 What's New](notebooks/04_new_functionalities.ipynb) | `add_points`, `transform`, tree paths, save/load, external kNN |
 | [05 Single-Cell](notebooks/05_single_cell.ipynb) | RNA-seq with PBMC 3k, pseudotime, UMAP comparison |
-| [06 Metric Guide](notebooks/06_metric_guide.ipynb) | Choosing the right metric |
-| [07 FAQ](notebooks/07_faq.ipynb) | Troubleshooting and common questions |
-| [08 Cheminformatics](notebooks/08_cheminformatics.ipynb) | Molecules, fingerprints, SAR |
-| [09 Protein Analysis](notebooks/09_protein_analysis.ipynb) | FASTA, ESM embeddings, AlphaFold |
-| [11 Card Configuration](notebooks/11_card_configuration.ipynb) | Pinned card layout, fields, and links |
-| [11 Default Params Benchmark](notebooks/11_default_params_benchmark.ipynb) | Defaults across dataset sizes and types |
-| [12 USearch Jaccard](notebooks/12_usearch_jaccard.ipynb) | Binary Jaccard with USearch backend |
+| [06 FAQ](notebooks/06_faq.ipynb) | Troubleshooting and common questions |
+| [07 MinHash Deep Dive](notebooks/07_minhash_deep_dive.ipynb) | Encoding methods and when to use each |
+| [08 Notebook Widgets](notebooks/08_jscatter_demo.ipynb) | Coloring, tooltips, lasso selection with jupyter-scatter |
+| [09 Card Configuration](notebooks/09_card_configuration.ipynb) | Pinned card layout, fields, and links |
+| [10 Protein Analysis](notebooks/10_protein_analysis.ipynb) | FASTA, ESM embeddings, AlphaFold |
+| [11 USearch Jaccard](notebooks/11_usearch_jaccard.ipynb) | Native binary Jaccard backend (high recall, low memory) |
+| [12 Legacy LSH Pipeline](notebooks/12_legacy_lsh_pipeline.ipynb) | Lower-level MinHash + LSHForest + layout workflow |
 ## Lower-Level Pipeline
-For direct control over indexing, hashing, and layout, see the [legacy pipeline notebook](notebooks/03_legacy_lsh_pipeline.ipynb). The main building blocks:
+For direct control over indexing, hashing, and layout, see the [legacy pipeline notebook](notebooks/12_legacy_lsh_pipeline.ipynb). The main building blocks:
 ```python
 from tmap.index import USearchIndex # dense / binary kNN
diff --git a/notebooks/01_quickstart.ipynb b/notebooks/01_quickstart.ipynb
index 008a728..8ca588d 100644
--- a/notebooks/01_quickstart.ipynb
+++ b/notebooks/01_quickstart.ipynb
@@ -163,7 +163,7 @@
  "source": [
  "## What Next\n",
  "\n",
- "Move to `08_cheminformatics.ipynb` for molecular properties, scaffolds, and richer color layers.\n"
+ "Move to `02_cheminformatics.ipynb` for molecular properties, scaffolds, and richer color layers.\n"
  ]
  }
  ],
diff --git a/notebooks/03_continuous_embeddings.ipynb b/notebooks/03_continuous_embeddings.ipynb
index 5a41426..54e7c90 100644
--- a/notebooks/03_continuous_embeddings.ipynb
+++ b/notebooks/03_continuous_embeddings.ipynb
@@ -220,7 +220,7 @@
  "source": [
  "## What about Jaccard?\n",
  "\n",
- "For binary fingerprints (molecular Morgan, MACCS, ECFP), use `metric=\"jaccard\"`. The estimator auto-routes to USearch with native Jaccard distance on the bits. See `08_cheminformatics.ipynb` for a full chemistry walkthrough.\n",
+ "For binary fingerprints (molecular Morgan, MACCS, ECFP), use `metric=\"jaccard\"`. The estimator auto-routes to USearch with native Jaccard distance on the bits. See `02_cheminformatics.ipynb` for a full chemistry walkthrough.\n",
  "\n",
  "For sparse single-cell data, `metric=\"jaccard\"` with a CSR matrix routes to MinHash and LSHForest. See `05_single_cell.ipynb` for that path.\n"
  ]
diff --git a/notebooks/04_new_functionalities.ipynb b/notebooks/04_new_functionalities.ipynb
index 13f6dc2..ead7ad0 100644
--- a/notebooks/04_new_functionalities.ipynb
+++ b/notebooks/04_new_functionalities.ipynb
@@ -449,11 +449,11 @@
  "source": [
  "## Where to go next\n",
  "\n",
- "- `08_cheminformatics.ipynb`: chemistry workflows with binary fingerprints\n",
- "- `09_protein_analysis.ipynb`: protein sequences and embeddings\n",
+ "- `02_cheminformatics.ipynb`: chemistry workflows with binary fingerprints\n",
+ "- `10_protein_analysis.ipynb`: protein sequences and embeddings\n",
  "- `05_single_cell.ipynb`: large sparse single-cell data\n",
- "- `10_jscatter_demo.ipynb`: interactive notebook widgets\n",
- "- `07_faq.ipynb`: short answers to common questions\n"
+ "- `08_jscatter_demo.ipynb`: interactive notebook widgets\n",
+ "- `06_faq.ipynb`: short answers to common questions\n"
  ]
  }
  ],
diff --git a/notebooks/04_single_cell.ipynb b/notebooks/05_single_cell.ipynb
similarity index 100%
rename from notebooks/04_single_cell.ipynb
rename to notebooks/05_single_cell.ipynb
diff --git a/notebooks/05_faq.ipynb b/notebooks/06_faq.ipynb
similarity index 98%
rename from notebooks/05_faq.ipynb
rename to notebooks/06_faq.ipynb
index 19f80f4..9a98e0a 100644
--- a/notebooks/05_faq.ipynb
+++ b/notebooks/06_faq.ipynb
@@ -24,7 +24,7 @@
  {
  "cell_type": "markdown",
  "metadata": {},
- "source": "## My map changes between runs\n\nSet `seed=42`. If you also pass a `LayoutConfig`, set `cfg.deterministic = True` and `cfg.seed = 42`.\n\nThe `seed` controls the OGDF tree layout, which is fully deterministic: same kNN graph + same seed = identical coordinates.\n\nThe kNN step depends on the backend:\n\n- **MinHash + LSHForest** (sets / strings): deterministic for a given seed.\n- **USearch HNSW** (binary matrices, cosine, euclidean): approximate and multi-threaded. Neighbor sets may vary slightly across runs or platforms, but the resulting trees are nearly identical because the MST is robust to small kNN variations.\n\nIf you need bit-exact reproducibility for binary data, use the MinHash + LSHForest pipeline directly (see [03_legacy_lsh_pipeline.ipynb](03_legacy_lsh_pipeline.ipynb))."
+ "source": "## My map changes between runs\n\nSet `seed=42`. If you also pass a `LayoutConfig`, set `cfg.deterministic = True` and `cfg.seed = 42`.\n\nThe `seed` controls the OGDF tree layout, which is fully deterministic: same kNN graph + same seed = identical coordinates.\n\nThe kNN step depends on the backend:\n\n- **MinHash + LSHForest** (sets / strings): deterministic for a given seed.\n- **USearch HNSW** (binary matrices, cosine, euclidean): approximate and multi-threaded. Neighbor sets may vary slightly across runs or platforms, but the resulting trees are nearly identical because the MST is robust to small kNN variations.\n\nIf you need bit-exact reproducibility for binary data, use the MinHash + LSHForest pipeline directly (see [12_legacy_lsh_pipeline.ipynb](12_legacy_lsh_pipeline.ipynb))."
  },
  {
  "cell_type": "markdown",
diff --git a/notebooks/06_minhash_deep_dive.ipynb b/notebooks/07_minhash_deep_dive.ipynb
similarity index 100%
rename from notebooks/06_minhash_deep_dive.ipynb
rename to notebooks/07_minhash_deep_dive.ipynb
diff --git a/notebooks/07_jscatter_demo.ipynb b/notebooks/08_jscatter_demo.ipynb
similarity index 100%
rename from notebooks/07_jscatter_demo.ipynb
rename to notebooks/08_jscatter_demo.ipynb
diff --git a/notebooks/08_card_configuration.ipynb b/notebooks/09_card_configuration.ipynb
similarity index 100%
rename from notebooks/08_card_configuration.ipynb
rename to notebooks/09_card_configuration.ipynb
diff --git a/notebooks/09_protein_analysis.ipynb b/notebooks/10_protein_analysis.ipynb
similarity index 100%
rename from notebooks/09_protein_analysis.ipynb
rename to notebooks/10_protein_analysis.ipynb
diff --git a/notebooks/10_usearch_jaccard.ipynb b/notebooks/11_usearch_jaccard.ipynb
similarity index 100%
rename from notebooks/10_usearch_jaccard.ipynb
rename to notebooks/11_usearch_jaccard.ipynb
diff --git a/notebooks/11_legacy_lsh_pipeline.ipynb b/notebooks/12_legacy_lsh_pipeline.ipynb
similarity index 99%
rename from notebooks/11_legacy_lsh_pipeline.ipynb
rename to notebooks/12_legacy_lsh_pipeline.ipynb
index eced3c1..195ee24 100644
--- a/notebooks/11_legacy_lsh_pipeline.ipynb
+++ b/notebooks/12_legacy_lsh_pipeline.ipynb
@@ -63,7 +63,7 @@
  "## 2. MinHash\n",
  "\n",
  "`batch_from_binary_array()` is the usual entry point for dense binary fingerprints.\n",
- "See `02_minhash_deep_dive.ipynb` for the full set of `from_*` and `batch_from_*` methods.\n"
+ "See `07_minhash_deep_dive.ipynb` for the full set of `from_*` and `batch_from_*` methods.\n"
  ]
  },
  {