Merged
18 commits
3088296
everything related to graphtool added as a backup
aritraroy24 Apr 27, 2026
6df68f3
feat: add unresolved composition tracking and improve arithmetic reso…
aritraroy24 Apr 29, 2026
2072f2c
fix: add MaterialParserTool substitution verification and update cita…
aritraroy24 May 1, 2026
31dc74b
fix: json parsing handling improved and all images saving by default …
aritraroy24 May 1, 2026
7a2f065
docs: add SCIENCEDIRECT_INSTTOKEN documentation, env example, changel…
aritraroy24 May 2, 2026
daf28ca
feat: layered value_error_thresholds ranges with tuple-order insensit…
aritraroy24 May 6, 2026
440b14b
fix: auto-create output directory for semantic evaluation result file
aritraroy24 May 6, 2026
7cf47dd
first draft of graphtool blog added
aritraroy24 May 7, 2026
79a919f
docs: align documentation with current code behaviour
aritraroy24 May 8, 2026
4475cc9
overall workflow and data-extraction workflow images are modified
aritraroy24 May 8, 2026
b167efb
blogs section changed to news, comproscanner publication news added
aritraroy24 May 8, 2026
19220cd
blog UI modified for better designing
aritraroy24 May 8, 2026
b1bdd46
docs: limit TOC depth to h3 and add Open Graph/Twitter Card meta tags…
aritraroy24 May 8, 2026
fabbfd0
data related to vlm-test added
aritraroy24 May 8, 2026
a95af91
fix: create vector database for caption-keyword-matched articles in a…
aritraroy24 May 10, 2026
88767b4
feat: add additional_figure_keywords for figure-only extraction witho…
aritraroy24 May 10, 2026
1bf1ccd
graphtool release blog modified
aritraroy24 May 19, 2026
78652b0
fix: springer test case fixed along with some minor changes
aritraroy24 May 19, 2026
12 changes: 10 additions & 2 deletions .env.example
@@ -2,6 +2,7 @@

# Publisher API keys
SCOPUS_API_KEY=YOUR_SCOPUS_API_KEY
SCIENCEDIRECT_INSTTOKEN=YOUR_SCIENCEDIRECT_INSTTOKEN # Optional: institutional token for ScienceDirect full-text access (contact your institution's library)
SPRINGER_OPENACCESS_API_KEY=YOUR_SPRINGER_OPENACCESS_API_KEY
SPRINGER_TDM_API_KEY=YOUR_SPRINGER_TDM_API_KEY/API_METRIC
WILEY_API_KEY=YOUR_WILEY_API_KEY
@@ -14,13 +15,20 @@ DATABASE_PASSWORD=DB_PASSWORD
DATABASE_NAME=DB_NAME

# LLM Providers
GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY
GEMINI_API_KEY=YOUR_GEMINI_API_KEY
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
DEEPSEEK_API_KEY=YOUR_DEEPSEEK_API_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
OPENROUTER_API_KEY=YOUR_OPENROUTER_API_KEY
TOGETHER_API_KEY=YOUR_TOGETHER_API_KEY
COHERE_API_KEY=YOUR_COHERE_API_KEY
FIREWORKS_API_KEY=YOUR_FIREWORKS_API_KEY

# neo4j
NEO4J_URI=YOUR_NEO4J_URI # default URI for Neo4j is bolt://localhost:7687
NEO4J_USER=YOUR_NEO4J_USERNAME
NEO4J_PASSWORD=YOUR_NEO4J_PASSWORD
NEO4J_DATABASE=YOUR_NEO4J_DATABASE_NAME
NEO4J_DATABASE=YOUR_NEO4J_DATABASE_NAME

# Optional model access
HF_TOKEN=YOUR_HUGGINGFACE_TOKEN
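
A minimal sketch of how the optional `SCIENCEDIRECT_INSTTOKEN` entry might be consumed. The changelog in this PR states that the token is sent as the `X-ELS-Insttoken` header. The helper name `build_elsevier_headers`, the reuse of `SCOPUS_API_KEY` as the `X-ELS-APIKey` value, and the `Accept` value are assumptions for illustration, not the code in `ElsevierArticleProcessor`.

```python
import os


def build_elsevier_headers(env=None):
    """Build request headers, adding the institutional token only when it is set."""
    env = os.environ if env is None else env
    headers = {
        "X-ELS-APIKey": env.get("SCOPUS_API_KEY", ""),  # assumed key variable
        "Accept": "text/xml",  # assumed response format
    }
    token = env.get("SCIENCEDIRECT_INSTTOKEN", "").strip()
    if token:  # optional: the header is left out when the variable is unset or blank
        headers["X-ELS-Insttoken"] = token
    return headers
```

Passing a plain dict as `env` keeps the helper testable without touching the real environment.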
14 changes: 3 additions & 11 deletions .gitignore
@@ -178,17 +178,9 @@ cython_debug/
.claude
CLAUDE.md

# Remove example directory primarily
# Remove db directory related files to avoid accidentally committing large files
examples/db/10.*
tests example/
examples/vlm_piezo_test/db/10.*
examples/vlm_piezo_test/db/chroma.sqlite3

applications
vlm_test
examples/vlm_piezo_test

# Test results
db
results
elsevier_test.xml
springer_test.xml
wiley_test.pdf
44 changes: 42 additions & 2 deletions CHANGELOG.md
@@ -1,7 +1,12 @@
# Unreleased

### Added

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix heading level jump under # Unreleased.

Line 3 (and related section headings) jumps from H1 to H3, which triggers MD001 and can fail markdown lint checks. Use H2 headings for Added, Changed, and Fixed under # Unreleased.

Suggested diff
-### Added
+## Added
...
-### Changed
+## Changed
...
-### Fixed
+## Fixed

Also applies to: 47-47, 51-51

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 3-3: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` at line 3, Under the "# Unreleased" section replace the H3
headings "### Added", "### Changed", and "### Fixed" with H2 headings "##
Added", "## Changed", and "## Fixed" so heading levels progress from H1 to H2
and avoid MD001; locate these headings by searching for the "# Unreleased"
header and the literal "Added", "Changed", and "Fixed" headings and update them
accordingly.


- Added `SCIENCEDIRECT_INSTTOKEN` environment variable support in `ElsevierArticleProcessor` for off-campus remote access to subscription-based Elsevier articles and figures. When set, the token is sent as the `X-ELS-Insttoken` header in all ScienceDirect API requests and figure downloads. The variable is optional; omitting it does not affect on-campus access.

- New `value_error_thresholds` parameter added to both `evaluate_semantic()` and `evaluate_agentic()` for range-based absolute error tolerances on numeric property value comparisons:

- Accepts a dict mapping `(min, max)` tuples to absolute error thresholds. When a ground-truth value falls inside a range, the extracted value is accepted if `|extracted - ground_truth| ≤ threshold`. Values outside all configured ranges fall back to exact comparison.
- Accepts a dict mapping `(min, max)` tuples to absolute error thresholds. Ranges are interpreted as **layers**: the narrowest range containing the ground-truth value determines the tolerance. For example, `(-150, 150): 1` applies only to values in (-150, -50) and (50, 150) when `(-50, 50): 0.5` is also present — no need for separate positive/negative sub-ranges. Tuple element order is irrelevant: `(-150, 150)` and `(150, -150)` are equivalent. Values outside all configured ranges fall back to exact comparison.

- **Semantic evaluation**: handled inside `_is_value_in_range()` via the new `_get_error_threshold()` helper in `MaterialsDataSemanticEvaluator`.
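
The layered interpretation of `value_error_thresholds` described above can be sketched as follows. This is an illustration under stated assumptions, not the `_get_error_threshold()` helper itself: the function names are hypothetical, and range bounds are treated as inclusive because the changelog does not say how boundary values are handled.

```python
def get_error_threshold(value, thresholds):
    """Return the tolerance of the narrowest range containing value, or None."""
    best = None  # (width, tolerance) of the narrowest matching range so far
    for bounds, tolerance in thresholds.items():
        low, high = sorted(bounds)  # tuple element order is irrelevant
        if low <= value <= high:  # inclusive bounds are an assumption
            width = high - low
            if best is None or width < best[0]:
                best = (width, tolerance)
    return None if best is None else best[1]


def values_match(extracted, ground_truth, thresholds):
    """Compare with the layered tolerance, or exactly when no range applies."""
    tolerance = get_error_threshold(ground_truth, thresholds)
    if tolerance is None:
        return extracted == ground_truth  # outside all ranges: exact comparison
    return abs(extracted - ground_truth) <= tolerance
```

With `{(-150, 150): 1, (50, -50): 0.5}`, a ground-truth value of 10 gets the 0.5 tolerance of the inner layer, while -100 falls through to the outer layer's tolerance of 1.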

@@ -15,14 +20,49 @@

- New `FigureExtractor` utility — shared helper for caption keyword-based figure filtering and saving, used by all article processors.

- New `caption_keywords` parameter in `process_articles()` and `extract_composition_property_data()`, and new `vlm_model` and `related_figures_base_path` parameters in `extract_composition_property_data()`.
- New `main_figure_keywords` parameter in `process_articles()` and `extract_composition_property_data()`, and new `vlm_model` and `related_figures_base_path` parameters in `extract_composition_property_data()`.

- New unit tests added for all three agent tools in `tests/test_agent_tools/`.

- Added `save_failed_pdf_report` and `failed_pdf_report_path` to `process_articles()`, with filename-derived DOI validation and failed-PDF reporting for local PDF workflows.

- Added `save_failed_automated_report` and `failed_automated_report_path` to `process_articles()` for automated publisher sources (Elsevier, Springer Nature, IOP, Wiley), mirroring the existing PDF failure report. Failed articles are written as tab-separated `doi`, `publisher`, `reason` entries to `results/failed_automated_articles.txt` by default.

- Added image-aware fallback in `DataExtractionFlow.identify_materials_data_presence()`:

- The Materials Data Identifier still runs text RAG first.
- If RAG returns `no`, the flow now checks saved DOI figures with VLM and upgrades the decision to `yes` when relevant graph/figure evidence is found (including doping concentration vs property plots where full formulas are absent).

- Added `is_store_unresolved_compositions` and `unresolved_compositions_file` parameters to `clean_data()` to optionally log split composition-property resolution statistics (`source`, `filtered`, `unresolved`, `resolved` counts) and persist filtered and unresolved composition keys in a JSON file keyed by DOI under `"filtered"` and `"unresolved"` top-level keys.

- Added explicit Equation Tool model control:

- New `equation_model` parameter in `extract_composition_property_data()` (threaded through `DataExtractionFlow` and `CompositionExtractionCrew` into `EquationTool`).
- EquationTool model precedence is now: `equation_model` argument -> API-key-based auto-selection.

- Clarified Equation Tool instruction customization in extraction docs and API:

- `formula_instruction` remains available in `extract_composition_property_data()` for domain-specific formula-derivation guidance, while preserving the built-in default instruction when unset.
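
The failed-article report format listed above (tab-separated `doi`, `publisher`, `reason` entries) could be produced roughly like this. The function name `write_failed_report`, the shape of the `failures` argument, and the choice to overwrite rather than append are assumptions; only the column order and the default path come from the changelog entry.

```python
import csv
from pathlib import Path


def write_failed_report(failures, report_path="results/failed_automated_articles.txt"):
    """Write one tab-separated doi/publisher/reason line per failed article."""
    path = Path(report_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    # Overwriting an existing report is an assumption of this sketch.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for doi, publisher, reason in failures:
            writer.writerow([doi, publisher, reason])
    return path
```

Using `csv.writer` rather than manual string joins keeps the file parseable even if a reason string happens to contain a tab.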

### Changed

- Versioning scheme migrated from [Semantic Versioning](https://semver.org/) (SemVer) to [Calendar Versioning](https://calver.org/) (CalVer) using the `YYYY.MM.DD` format. Starting from this release, version numbers reflect the release date rather than an incrementing major/minor/patch scheme.
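
A small sketch of deriving a date-based version string. Under calver.org's token definitions, `MM` and `DD` are the non-padded forms (`0M` and `0D` are the zero-padded ones); whether the project's release tags pad the month and day is not stated here, and the helper name `calver_from_date` is hypothetical.

```python
from datetime import date


def calver_from_date(release_date):
    """Format a release date as YYYY.MM.DD with non-padded month and day."""
    return f"{release_date.year}.{release_date.month}.{release_date.day}"
```

The non-padded form also matches what PEP 440 normalization yields, since numeric release segments drop leading zeros.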

### Fixed

- `_parse_json_output()` now recovers JSON from mixed-text crew outputs (e.g. `Thought: … { "json": "here" }`) by scanning for the first `{` / `[` and last `}` / `]` and retrying `json.loads()` on the extracted substring, before falling back to `ast.literal_eval()`.

- Composition formatter agent now verifies `MaterialParserTool` output for incomplete variable substitution (e.g. `(1-x-y)` partially resolved as `(0.9-0.010)`) and overrides with the correct fully-substituted BODMAS expression when the tool is wrong.

- `process_articles()` now routes user-provided `doi_list` by `general_publisher` from metadata and sends each DOI only to its matching source processor.

- PNG, GIF, and WEBP figures now convert correctly to JPEG: transparent images are composited onto a white background, animated GIFs are pinned to frame 0, and two additional Springer Nature CDN URL patterns are tried to improve download success for these formats.

- Added and updated tests for new extraction-flow behavior:

- EquationTool model selection tests now cover explicit arg override, env override, and updated model defaults.
- DataExtractionFlow tests now cover figure-based materials-data fallback and `equation_model` forwarding into `CompositionExtractionCrew`.
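
The JSON recovery described in the first Fixed entry can be sketched as below. This follows the steps named in the changelog (first `{` / `[`, last `}` / `]`, retry `json.loads()`, then `ast.literal_eval()`), but the function name, the choice to run `ast.literal_eval()` on the extracted substring, and returning `None` on failure are assumptions rather than the behaviour of `_parse_json_output()`.

```python
import ast
import json


def parse_json_output(text):
    """Parse crew output that may wrap a JSON payload in free text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    candidate = text
    # Locate the first opening and the last closing bracket of either kind.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    ends = [i for i in (text.rfind("}"), text.rfind("]")) if i != -1]
    if starts and ends and min(starts) < max(ends):
        candidate = text[min(starts): max(ends) + 1]
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass
    # Last resort: Python-literal syntax such as single-quoted dicts.
    try:
        return ast.literal_eval(candidate)
    except (ValueError, SyntaxError):
        return None
```

For example, `parse_json_output('Thought: done { "json": "here" }')` yields the dict from the embedded object, and a single-quoted payload such as `"Result: {'a': 1}"` is picked up by the `ast.literal_eval()` fallback.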

---
## [0.1.6] - 2026-04-02
### Changed
8 changes: 6 additions & 2 deletions CITATION.cff
@@ -41,8 +41,12 @@ preferred-citation:
description: "arXiv preprint"
journal: "Digital Discovery"
publisher:
name: "RSC"
status: advance-online
name: "Royal Society of Chemistry"
volume: 5
issue: 4
start: 1794
end: 1808
year: 2026
title: "ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature"
type: article
url: "https://doi.org/10.1039/D5DD00521C"
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
The MIT License (MIT)

Copyright (c) 2025 SLIMES Lab
Copyright © 2025-2026 SLIMES Lab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
59 changes: 12 additions & 47 deletions README.md
@@ -2,7 +2,7 @@
<img src="https://raw.githubusercontent.com/aritraroy24/ComProScanner/refs/heads/main/assets/comproscanner_logo.png" alt="ComProScanner Logo" width="500"/>
</p>

[![Python Version](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue.svg?logo=python&logoColor=white)](https://www.python.org/downloads/) [![License: MIT](https://custom-icon-badges.demolab.com/badge/license-MIT-yellow.svg?logo=law&logoColor=white)](https://opensource.org/licenses/MIT) [![PyPI](https://img.shields.io/pypi/v/comproscanner?logo=pypi&logoColor=white)](https://pypi.org/project/comproscanner/) [![Documentation](https://custom-icon-badges.demolab.com/badge/docs-latest-brightgreen.svg?logo=materialformkdocs&logoColor=white)](https://slimeslab.github.io/ComProScanner/) [![Coverage](https://img.shields.io/codecov/c/github/aritraroy24/ComProScanner?logo=codecov&logoColor=white&label=coverage&color=e62277)](https://codecov.io/gh/aritraroy24/ComProScanner) [![PyPI - Downloads](https://custom-icon-badges.demolab.com/pypi/dm/comproscanner?logo=download&logoColor=white&color=purple)](https://pypistats.org/packages/comproscanner) [![Ask DeepWiki](https://custom-icon-badges.demolab.com/badge/Ask%20DeepWiki-brightgreen.svg?logo=deepwikidevin&logoColor=white&labelColor=grey&color=5ab998)](https://deepwiki.com/slimeslab/ComProScanner) [![Digital Discovery](https://custom-icon-badges.demolab.com/badge/Digital_Discovery-10.1039/D5DD00521C-brightgreen.svg?logo=rsc&logoColor=white&color=c8c300)](https://doi.org/10.1039/D5DD00521C)
[![Python Version](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue.svg?logo=python&logoColor=white)](https://www.python.org/downloads/) [![License: MIT](https://custom-icon-badges.demolab.com/badge/license-MIT-brown.svg?logo=law&logoColor=white)](https://opensource.org/licenses/MIT) [![PyPI](https://img.shields.io/pypi/v/comproscanner?logo=pypi&logoColor=white)](https://pypi.org/project/comproscanner/) [![Documentation](https://custom-icon-badges.demolab.com/badge/docs-latest-brightgreen.svg?logo=materialformkdocs&logoColor=white)](https://slimeslab.github.io/ComProScanner/) [![Coverage](https://img.shields.io/codecov/c/github/aritraroy24/ComProScanner?logo=codecov&logoColor=white&label=coverage&color=e62277)](https://codecov.io/gh/aritraroy24/ComProScanner) [![PyPI - Downloads](https://custom-icon-badges.demolab.com/pypi/dm/comproscanner?logo=download&logoColor=white&color=purple)](https://pypistats.org/packages/comproscanner) [![Ask DeepWiki](https://custom-icon-badges.demolab.com/badge/Ask%20DeepWiki-brightgreen.svg?logo=deepwikidevin&logoColor=white&labelColor=grey&color=5ab998)](https://deepwiki.com/slimeslab/ComProScanner) [![Digital Discovery](https://custom-icon-badges.demolab.com/badge/Digital_Discovery-10.1039/D5DD00521C-brightgreen.svg?logo=rsc&logoColor=white&color=c8c300)](https://doi.org/10.1039/D5DD00521C)

# ComProScanner

@@ -120,43 +120,6 @@ The ComProScanner workflow consists of four main stages:
- Data Visualization
- Evaluation Visualization

## Example Use Cases

### Extract Data from Multiple Sources

```python
scanner.process_articles(
    property_keywords=property_keywords,
    source_list=["elsevier", "springer", "wiley"]
)
```

### Customize RAG Configuration

```python
scanner.extract_composition_property_data(
    main_extraction_keyword="d33",
    rag_chat_model="gemini-2.5-pro",
    rag_max_tokens=2048,
    rag_top_k=5
)
```

### Visualize Results

```python
from comproscanner import data_visualizer, eval_visualizer

# Create knowledge graph
data_visualizer.create_knowledge_graph(result_file="results.json")

# Plot evaluation metrics
eval_visualizer.plot_multiple_radar_charts(
    result_sources=["model1.json", "model2.json"],
    model_names=["GPT-4o", "Claude-3.5"]
)
```

## Requirements

- Python 3.12 or 3.13
@@ -169,15 +132,17 @@ eval_visualizer.plot_multiple_radar_charts(
If you use ComProScanner in your research, please cite:

```bibtex
@Article{roy2026comproscannermultiagentbasedframework,
author ="Roy, Aritra and Grisan, Enrico and Buckeridge, John and Gattinoni, Chiara",
title ="ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature",
journal ="Digital Discovery",
year ="2026",
pages ="Accepted",
publisher ="RSC",
doi ="10.1039/D5DD00521C",
url ="https://doi.org/10.1039/D5DD00521C"
@Article{roy2026comproscanner,
title={ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature},
author={Roy, Aritra and Grisan, Enrico and Buckeridge, John and Gattinoni, Chiara},
journal={Digital Discovery},
volume={5},
number={4},
pages={1794--1808},
year={2026},
publisher={Royal Society of Chemistry},
doi ="10.1039/D5DD00521C",
url ="https://doi.org/10.1039/D5DD00521C"
}
```

Binary file modified assets/overall_workflow.png
46 changes: 43 additions & 3 deletions docs/about/changelog.md
@@ -1,7 +1,12 @@
# Unreleased

### Added

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize heading levels for changelog sections.

Line 3 (and sibling section headings) skips from H1 to H3. This violates MD001 and may break docs linting; these should be H2 under # Unreleased.

Suggested diff
-### Added
+## Added
...
-### Changed
+## Changed
...
-### Fixed
+## Fixed

Also applies to: 47-47, 51-51

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 3-3: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/about/changelog.md` at line 3, The changelog uses H3 headings like "###
Added" under the top-level "# Unreleased" which skips heading levels and breaks
MD001; update those sibling section headings ("### Added", "### Fixed", etc.) in
docs/about/changelog.md to H2 ("## Added", "## Fixed", ...) so they are direct
children of "# Unreleased", ensuring consistent heading hierarchy and fixing the
lint errors referenced.


- Added `SCIENCEDIRECT_INSTTOKEN` environment variable support in `ElsevierArticleProcessor` for off-campus remote access to subscription-based Elsevier articles and figures. When set, the token is sent as the `X-ELS-Insttoken` header in all ScienceDirect API requests and figure downloads. The variable is optional; omitting it does not affect on-campus access.

- New `value_error_thresholds` parameter added to both `evaluate_semantic()` and `evaluate_agentic()` for range-based absolute error tolerances on numeric property value comparisons:

- Accepts a dict mapping `(min, max)` tuples to absolute error thresholds. When a ground-truth value falls inside a range, the extracted value is accepted if `|extracted - ground_truth| ≤ threshold`. Values outside all configured ranges fall back to exact comparison.
- Accepts a dict mapping `(min, max)` tuples to absolute error thresholds. Ranges are interpreted as **layers**: the narrowest range containing the ground-truth value determines the tolerance. For example, `(-150, 150): 1` applies only to values in (-150, -50) and (50, 150) when `(-50, 50): 0.5` is also present — no need for separate positive/negative sub-ranges. Tuple element order is irrelevant: `(-150, 150)` and `(150, -150)` are equivalent. Values outside all configured ranges fall back to exact comparison.

- **Semantic evaluation**: handled inside `_is_value_in_range()` via the new `_get_error_threshold()` helper in `MaterialsDataSemanticEvaluator`.

@@ -15,18 +20,53 @@

- New `FigureExtractor` utility — shared helper for caption keyword-based figure filtering and saving, used by all article processors.

- New `caption_keywords` parameter in `process_articles()` and `extract_composition_property_data()`, and new `vlm_model` and `related_figures_base_path` parameters in `extract_composition_property_data()`.
- New `main_figure_keywords` parameter in `process_articles()` and `extract_composition_property_data()`, and new `vlm_model` and `related_figures_base_path` parameters in `extract_composition_property_data()`.

- New unit tests added for all three agent tools in `tests/test_agent_tools/`.

- Added `save_failed_pdf_report` and `failed_pdf_report_path` to `process_articles()`, with filename-derived DOI validation and failed-PDF reporting for local PDF workflows.

- Added `save_failed_automated_report` and `failed_automated_report_path` to `process_articles()` for automated publisher sources (Elsevier, Springer Nature, IOP, Wiley), mirroring the existing PDF failure report. Failed articles are written as tab-separated `doi`, `publisher`, `reason` entries to `results/failed_automated_articles.txt` by default.

- Added image-aware fallback in `DataExtractionFlow.identify_materials_data_presence()`:

- The Materials Data Identifier still runs text RAG first.
- If RAG returns `no`, the flow now checks saved DOI figures with VLM and upgrades the decision to `yes` when relevant graph/figure evidence is found (including doping concentration vs property plots where full formulas are absent).

- Added `is_store_unresolved_compositions` and `unresolved_compositions_file` parameters to `clean_data()` to optionally log split composition-property resolution statistics (`source`, `filtered`, `unresolved`, `resolved` counts) and persist filtered and unresolved composition keys in a JSON file keyed by DOI under `"filtered"` and `"unresolved"` top-level keys.

- Added explicit Equation Tool model control:

- New `equation_model` parameter in `extract_composition_property_data()` (threaded through `DataExtractionFlow` and `CompositionExtractionCrew` into `EquationTool`).
- EquationTool model precedence is now: `equation_model` argument -> API-key-based auto-selection.

- Clarified Equation Tool instruction customization in extraction docs and API:

- `formula_instruction` remains available in `extract_composition_property_data()` for domain-specific formula-derivation guidance, while preserving the built-in default instruction when unset.
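
The image-aware fallback listed in this Added section amounts to a two-step decision: text RAG first, figures only when RAG says `no`. A minimal sketch, assuming the three callables are supplied by the caller; their names and signatures are hypothetical and do not mirror `DataExtractionFlow.identify_materials_data_presence()`.

```python
def decide_materials_data_presence(doi, rag_check, list_figures, vlm_check):
    """Return "yes" or "no": text RAG first, saved figures only as a fallback."""
    if rag_check(doi) == "yes":
        return "yes"  # text evidence is enough; figures are not inspected
    for figure_path in list_figures(doi):
        if vlm_check(figure_path):  # VLM found relevant graph/figure evidence
            return "yes"  # upgrade the earlier "no"
    return "no"
```

Keeping the figure check behind the RAG result means the VLM is only invoked for articles that text retrieval has already rejected.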

### Changed

- Versioning scheme migrated from [Semantic Versioning](https://semver.org/) (SemVer) to [Calendar Versioning](https://calver.org/) (CalVer) using the `YYYY.MM.DD` format. Starting from this release, version numbers reflect the release date rather than an incrementing major/minor/patch scheme.

### Fixed

- `_parse_json_output()` now recovers JSON from mixed-text crew outputs (e.g. `Thought: … { "json": "here" }`) by scanning for the first `{` / `[` and last `}` / `]` and retrying `json.loads()` on the extracted substring, before falling back to `ast.literal_eval()`.

- Composition formatter agent now verifies `MaterialParserTool` output for incomplete variable substitution (e.g. `(1-x-y)` partially resolved as `(0.9-0.010)`) and overrides with the correct fully-substituted BODMAS expression when the tool is wrong.

- `process_articles()` now routes user-provided `doi_list` by `general_publisher` from metadata and sends each DOI only to its matching source processor.

- PNG, GIF, and WEBP figures now convert correctly to JPEG: transparent images are composited onto a white background, animated GIFs are pinned to frame 0, and two additional Springer Nature CDN URL patterns are tried to improve download success for these formats.

- Added and updated tests for new extraction-flow behavior:

- EquationTool model selection tests now cover explicit arg override, env override, and updated model defaults.
- DataExtractionFlow tests now cover figure-based materials-data fallback and `equation_model` forwarding into `CompositionExtractionCrew`.
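
The `doi_list` routing fix in this Fixed section can be pictured as a grouping step. In this sketch the metadata is assumed to be a dict keyed by DOI with a `general_publisher` field, which may not match how the project stores it; the function name and the lower-casing of publisher names are likewise illustrative.

```python
from collections import defaultdict


def route_dois_by_publisher(doi_list, metadata):
    """Group DOIs by the publisher recorded in metadata; collect unknown ones."""
    routed = defaultdict(list)
    unmatched = []
    for doi in doi_list:
        publisher = metadata.get(doi, {}).get("general_publisher")
        if publisher:
            routed[publisher.lower()].append(doi)  # one processor per publisher
        else:
            unmatched.append(doi)  # no metadata entry or no publisher recorded
    return dict(routed), unmatched
```

Each source processor would then receive only the list stored under its own publisher key, rather than the full `doi_list`.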

---
## [0.1.6] - 2026-04-02
### Changed
- Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
- Updated [README.md](https://github.com/slimeslab/ComProScanner/blob/main/README.md), [CITATION.cff](https://github.com/slimeslab/ComProScanner/blob/main/CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)

### Added
20 changes: 11 additions & 9 deletions docs/about/citation.md
@@ -3,14 +3,16 @@
If you use ComProScanner in your research, please cite our related paper:

```bibtex
@Article{roy2026comproscannermultiagentbasedframework,
author ="Roy, Aritra and Grisan, Enrico and Buckeridge, John and Gattinoni, Chiara",
title ="ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature",
journal ="Digital Discovery",
year ="2026",
pages ="Accepted",
publisher ="RSC",
doi ="10.1039/D5DD00521C",
url ="https://doi.org/10.1039/D5DD00521C"
@Article{roy2026comproscanner,
title={ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature},
author={Roy, Aritra and Grisan, Enrico and Buckeridge, John and Gattinoni, Chiara},
journal={Digital Discovery},
volume={5},
number={4},
pages={1794--1808},
year={2026},
publisher={Royal Society of Chemistry},
doi ="10.1039/D5DD00521C",
url ="https://doi.org/10.1039/D5DD00521C"
}
```