LFX Research Copilot

A local AI-powered research intelligence assistant for literature discovery, evidence synthesis, research gap analysis, hypothesis generation, manuscript development, and research planning.

Key Features

Multi-Source Literature Retrieval: Searches Crossref, OpenAlex, Semantic Scholar, PubMed, arXiv, and CORE in parallel, then deduplicates results by DOI and semantic title similarity.
Adaptive Theme Discovery: Clusters papers into research themes using MiniLM embeddings. Falls back to multi-method consensus (NMF + hierarchical + fixed-k) for small corpora.
Evidence Extraction & Synthesis: Extracts objectives, methods, results, and limitations from abstracts. Compares findings across studies to detect consensus and disagreement.
Citation Intelligence: Builds directed citation graphs via the OpenAlex API. Computes PageRank and HITS scores and identifies foundational papers and hidden gems.
Research Gap Validation: Validates claimed gaps by searching the corpus with semantic similarity. Assigns each gap a confidence label (Confirmed / Uncertain / Not Supported).
Contradiction Detection: Detects opposing claims across the literature using 15 semantic opposition pairs.
Hypothesis Generation: Produces structured, reproducible hypothesis banks with priority scoring, IV/DV specification, and methodology suggestions.
Research Question Optimization: Generates 20 research questions per topic, ranked by novelty, feasibility, funding potential, and translational impact.
Manuscript Generation: Drafts introduction, literature review, methods, and discussion sections with APA-formatted inline citations.
Reviewer Simulation: Evaluates manuscript drafts for weak arguments, missing citations, unsupported claims, and methodological concerns.
Reproducibility Auditing: Scores each paper across 6 dimensions: data availability, code availability, sample size, statistical rigor, validation strategy, and controls.
Meta-Analysis Readiness: Assesses whether the corpus contains comparable studies suitable for quantitative synthesis.
Novelty Scoring: Estimates topic saturation, publication density, and emerging concept presence to classify themes on a scale from Highly Novel to Highly Saturated.
Scientific Claim Graph: Extracts claims from abstracts and builds a directed evidence graph (supporting / contradictory) for RAG applications.
Study Design Advisor: Recommends experimental designs, controls, sample sizes, statistical tests, and validation strategies based on theme maturity.
Bioinformatics Mode: Detects omics data types (genomics, transcriptomics, proteomics, metabolomics, epigenomics, metagenomics), maps them to repositories (GEO, SRA, ArrayExpress, ProteomeXchange, MetaboLights), and recommends pathway tools.
Statistical Consultant: Recommends statistical tests for 6 design types, estimates sample size via normal approximation, and computes post-hoc power.
Protocol Generation: Generates lab protocols (PCR, Western blot) and bioinformatics pipelines (RNA-seq, variant calling) with QC checklists.
Grant Proposal Generation: Drafts grant concepts, specific aims, and project summaries from research gaps and opportunity rankings.
Explainability: Every output includes its evidence source, confidence score, supporting papers, alternative interpretations, and limitations.
Semantic Alerts: Compares knowledge base snapshots between runs to detect new themes, theme shifts, and confidence changes.
Research Dashboard: Aggregates active projects, themes, gaps, datasets, manuscripts, grants, and alerts into a single-page overview.
42-Stage Pipeline: Orchestrates all modules in dependency order. Supports the --quick (17 priority modules), --life-science (adds bioinformatics modules), --skip, and --until flags.
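
The Statistical Consultant feature listed above mentions sample-size estimation via the normal approximation and post-hoc power. The sketch below illustrates the textbook formulas for one common case: a two-sided, two-sample comparison of means with equal group sizes and a standardized effect size (Cohen's d). The function names and defaults are assumptions made for illustration; they are not taken from statistical_consultant.py, which may handle other designs differently.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    Hypothetical helper, not the project's API.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def post_hoc_power(effect_size, n_per_group, alpha=0.05):
    """Approximate achieved power for the same design.

    Ignores the negligible probability of rejecting in the wrong tail.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = abs(effect_size) * math.sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(noncentrality - z_alpha)


if __name__ == "__main__":
    n = sample_size_per_group(0.5)
    print(n, round(post_hoc_power(0.5, n), 3))
```

Because the normal approximation ignores the extra uncertainty of estimating the variance, it tends to give slightly smaller sample sizes than a t-based calculation, so the result is best read as a lower bound.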

System Requirements

Python 3.10+
8 GB RAM minimum (16 GB recommended)
~2 GB disk for the sentence-transformers model cache

Installation

git clone https://github.com/dpikaArya/lfx-research-copilot.git
cd lfx-research-copilot

python -m venv venv
source venv/bin/activate       # Windows: venv\Scripts\activate

pip install -r requirements.txt

Usage

Quick Start

python run_validation.py

This runs the 17 priority pipeline modules on a default biomedical query. Outputs are written to outputs/LFX_Research_Copilot/.

Search for Papers

python src/search_papers.py "deep learning drug discovery" --max 30

The search query is a positional argument. Results are saved to search_results.csv in the current directory. Supported sources: Crossref, OpenAlex, Semantic Scholar, PubMed, arXiv, CORE.
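
Because the same paper can be returned by several sources, results are deduplicated by DOI and title similarity. The following is a minimal sketch of that idea, not the code in search_papers.py. It makes two assumptions: records are dictionaries with `doi` and `title` keys, and lexical similarity from the standard library's `difflib` stands in for the embedding-based semantic similarity the tool uses. The 0.9 threshold and the DOIs in the example are placeholders.

```python
from difflib import SequenceMatcher


def normalize_doi(doi):
    """Lowercase a DOI and strip common prefixes; return '' when missing."""
    doi = (doi or "").strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in (title or "").lower())
    return " ".join(cleaned.split())


def deduplicate(records, title_threshold=0.9):
    """Keep the first record for each DOI and each near-identical title."""
    seen_dois = set()
    kept_titles = []
    unique = []
    for record in records:
        doi = normalize_doi(record.get("doi"))
        title = normalize_title(record.get("title"))
        if doi and doi in seen_dois:
            continue  # exact DOI match with an earlier record
        if title and any(
            SequenceMatcher(None, title, kept).ratio() >= title_threshold
            for kept in kept_titles
        ):
            continue  # title is nearly identical to an earlier record
        if doi:
            seen_dois.add(doi)
        if title:
            kept_titles.append(title)
        unique.append(record)
    return unique


if __name__ == "__main__":
    sample = [
        {"doi": "10.1000/abc", "title": "Deep Learning for Drug Discovery"},
        {"doi": "https://doi.org/10.1000/ABC", "title": "Deep learning for drug discovery."},
        {"doi": "", "title": "Deep Learning for Drug Discovery"},
        {"doi": "10.1000/xyz", "title": "Graph neural networks in chemistry"},
    ]
    print([r["doi"] for r in deduplicate(sample)])
```

The pairwise title comparison is quadratic in the number of records, which is acceptable for corpora of a few dozen papers but would need blocking or an index for much larger result sets.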

Run the Full Pipeline

python src/pipeline.py

Executes all 42 stages in order. On a 21-paper corpus this takes about 5 minutes, though the exact time depends on API latency.

Pipeline Modes

# Quick mode — 17 high-value modules (no paper retrieval, PDF, or full-document stages)
python src/pipeline.py --quick

# Life-science mode — enables bioinformatics, study design, statistical, and protocol modules
python src/pipeline.py --life-science

# Skip specific stages
python src/pipeline.py --skip pdf_manager citation_network_analysis

# Run up to a specific stage
python src/pipeline.py --until hypothesis_generator
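
To make the interaction of these flags concrete, here is a small sketch of how an orchestrator could turn them into a list of stages to run. It is illustrative only and does not reproduce pipeline.py: the stage list is a short made-up subset in an assumed order, the membership of the quick set is assumed, and `--until` is assumed to include the named stage.

```python
import argparse

# Illustrative subset in an assumed order; the real pipeline has 42 stages.
STAGES = [
    "search_papers",
    "pdf_manager",
    "cluster_themes",
    "citation_network_analysis",
    "hypothesis_generator",
    "manuscript_copilot",
]
# Assumed membership of the quick set (the real one has 17 modules).
QUICK_STAGES = {"cluster_themes", "hypothesis_generator"}


def build_parser():
    parser = argparse.ArgumentParser(description="Stage selection sketch")
    parser.add_argument("--quick", action="store_true",
                        help="run only the priority stages")
    parser.add_argument("--skip", nargs="*", default=[], metavar="STAGE",
                        help="stages to leave out")
    parser.add_argument("--until", choices=STAGES, metavar="STAGE",
                        help="last stage to run (assumed inclusive)")
    return parser


def select_stages(args):
    """Return the ordered list of stages implied by the parsed flags."""
    selected = list(STAGES)
    if args.until:
        selected = selected[: selected.index(args.until) + 1]
    if args.quick:
        selected = [stage for stage in selected if stage in QUICK_STAGES]
    skipped = set(args.skip)
    return [stage for stage in selected if stage not in skipped]


if __name__ == "__main__":
    print(select_stages(build_parser().parse_args()))
```

Filtering an ordered list, rather than building a new one from the flags, keeps the dependency order intact no matter which combination of flags is passed.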

Individual Modules

Every module can be run independently:

python src/citation_intelligence.py
python src/contradiction_detector.py
python src/hypothesis_generator.py
python src/manuscript_copilot.py
python src/research_brief.py

Most modules accept --papers, --consensus, --knowledge-base or similar arguments to specify input files. Run any module with --help to see its options.

Outputs

All generated files are organized under outputs/LFX_Research_Copilot/:

Directory	Contents
`reports/`	Executive summaries, research briefs, gap reports, hypothesis banks
`evidence/`	Evidence matrices, synthesis reports
`knowledge_base/`	Machine-readable JSON snapshots, claim graphs
`references/`	APA citation support files
`dashboard/`	Aggregated research dashboard, research memory
`manuscript/`	Generated manuscript drafts
`grants/`	Grant proposal drafts
`protocols/`	Lab and bioinformatics protocol checklists
`statistics/`	Sample size estimates, power analyses
`bioinformatics/`	Omics dataset reports
`citation_network/`	Network analysis reports
`pdf_library/`	PDF library indexes
`figures/`	Figure reference catalogs
`tables/`	Table reference catalogs
`alerts/`	Semantic change detection reports
`explainability/`	Evidence trace reports
`projects/`	Project tracking databases

Major Modules

Module	Purpose	Input	Output
`search_papers.py`	Multi-source literature retrieval	Query string	`search_results.csv`
`cluster_themes.py`	Unsupervised theme discovery	`search_results.csv`	`consensus_themes.csv`, clustering reports
`generate_reports.py`	Executive summary, gaps, knowledge base	`consensus_themes.csv`, embeddings	Reports, `knowledge_base.json`, RAG chunks
`citation_intelligence.py`	Citation graph, PageRank, HITS	`search_results.csv`	Citation metrics, foundational papers
`hypothesis_generator.py`	Structured hypothesis bank	Knowledge base, gaps	`hypothesis_bank.csv`
`manuscript_copilot.py`	Draft manuscript sections	Evidence matrix, knowledge base	`manuscript_draft.md`
`contradiction_detector.py`	Cross-paper claim contradictions	`search_results.csv`, themes	`contradictory_findings.md`
`research_gap_validator.py`	Gap validation with confidence	Papers, evidence, existing gaps	`gap_confidence_scores.csv`
`study_design_advisor.py`	Design recommendations	Knowledge base, evidence strength	`study_design_report.md`
`bioinformatics_mode.py`	Omics data detection	`search_results.csv`	`bioinformatics_report.md`
`statistical_consultant.py`	Test selection, power analysis	CLI parameters	`statistical_report.md`, sample size estimates
`pipeline.py`	42-stage orchestrator	All upstream outputs	Pipeline summary
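
As an illustration of the citation metrics listed for citation_intelligence.py, the sketch below computes PageRank on a small directed citation graph by power iteration. The project itself relies on networkx for graph analysis; this dependency-free version is only meant to show the computation. The edge direction (citing paper to cited paper), the damping factor of 0.85, and the paper IDs are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """PageRank by power iteration over (citing, cited) edge pairs.

    Papers that cite nothing in the corpus (dangling nodes) spread their
    rank evenly over all papers, so the scores always sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Every node gets the teleport share plus its share of dangling rank.
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for node in nodes:
            if out_links[node]:
                share = damping * rank[node] / len(out_links[node])
                for target in out_links[node]:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical corpus: papers B, C, and D cite A; D also cites B.
    citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
    for paper, score in sorted(pagerank(citations).items(), key=lambda kv: -kv[1]):
        print(paper, round(score, 3))
```

With edges pointing from citing to cited paper, a high score indicates a paper that is cited by other well-cited papers, which is one plausible signal for a foundational paper.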

Dependencies

Package	Purpose
pandas	Data processing and CSV I/O
numpy	Numerical computing
scikit-learn	Clustering (NMF, hierarchical), metrics
sentence-transformers	MiniLM text embeddings for semantic similarity
networkx	Citation graph construction and analysis
scipy	Spatial distance computations
requests	HTTP API calls to Crossref, OpenAlex, Semantic Scholar
tqdm	Progress bars for API-heavy operations

PDF backends (pypdf, pdfplumber, pymupdf) are optional and only needed for PDF figure/table extraction and PDF management.

Use Cases

Literature review automation — Search, cluster, and synthesize papers on any research topic
Research gap identification — Detect and validate underexplored areas with confidence scoring
Manuscript preparation — Generate drafts with inline citations and peer-review simulation
Grant writing — Produce proposal components from gap and opportunity analyses
Bioinformatics exploration — Identify omics datasets and recommend analysis pipelines
Reproducibility assessment — Audit papers for data/code availability and statistical rigor

License

MIT License

Citation

If you use this software in your research, teaching, or publications, please cite:

Arya, D. (2026). LFX Research Copilot (Version 1.0) [Computer software]. GitHub. https://github.com/matrixflora/lfx-research-copilot

BibTeX

@software{arya2026lfx,
  author  = {Arya, D.},
  title   = {LFX Research Copilot},
  year    = {2026},
  version = {1.0},
  url     = {https://github.com/matrixflora/lfx-research-copilot}
}

