MisinfoEQA

MisinfoEQA is a lightweight Evaluation Quality Assurance harness for misinformation datasets. It loads public or local datasets, normalizes them into a common schema, runs simple baselines, creates stress-test slices, and writes a report showing where model rankings, labels, evidence fields, or dataset assumptions look fragile.

The project is intentionally about dataset QA, not about creating new misinformation or claiming a new state-of-the-art misinformation detector.

Author

Fortune Nwachukwu Onwe
MSc Artificial Intelligence and Data Science, University of Hull, United Kingdom
Institutional email: f.onwe-2025@hull.ac.uk
Personal email: fortonwe@gmail.com
LinkedIn: https://www.linkedin.com/in/fortune-onwe/
GitHub: https://github.com/FortOnwe
Project repository: https://github.com/FortOnwe/misinfo-eqa
DOI: https://doi.org/10.5281/zenodo.19695104

What It Tests

MisinfoEQA currently implements five stressors:

  • keyword_shortcut: masks label-associated tokens and measures performance collapse.
  • temporal_shift: compares random evaluation with date-aware evaluation when dates are available.
  • evidence_ablation: compares claim-only, evidence-only, and combined inputs.
  • ambiguity_slice: isolates unknown, hedged, mixed, or disputed examples.
  • label_rationale_mismatch: flags examples where the evidence appears weak or inconsistent with the assigned label.

The label-rationale stressor uses sentence/window-level evidence relevance instead of whole-document lexical overlap. This matters for long fact-checking articles, where the relevant span may be surrounded by unrelated background.
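
For intuition, here is a minimal sketch of window-level relevance scoring. It is illustrative only: the function name and the Jaccard scoring are assumptions for this example, not the package's actual implementation.

import re

def window_relevance(claim: str, evidence: str, window: int = 3) -> float:
    """Best Jaccard token overlap between the claim and any
    `window`-sentence span of the evidence (illustrative scoring)."""
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", evidence)
    best = 0.0
    for i in range(max(1, len(sentences) - window + 1)):
        span = " ".join(sentences[i:i + window])
        span_tokens = set(re.findall(r"\w+", span.lower()))
        union = claim_tokens | span_tokens
        if union:
            best = max(best, len(claim_tokens & span_tokens) / len(union))
    return best

Scoring the best window rather than the whole document keeps one relevant sentence from being diluted by pages of unrelated background.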

Quick Start

Install the optional dependencies for Hugging Face dataset loading:

python -m pip install -e ".[full]"

Run the MVP configuration:

python -m misinfo_eqa scan --config configs/mvp.yaml
python -m misinfo_eqa run --config configs/mvp.yaml
python -m misinfo_eqa report --latest

Create and summarize a manual audit sheet:

python -m misinfo_eqa audit --latest --per-dataset 25
python -m misinfo_eqa audit-summary --latest

If the misinfo-eqa console script is not on your Windows PATH, use python -m misinfo_eqa ... as shown above.

MVP Dataset Scope

The default MVP uses the ComplexDataLab Hugging Face collection:

  • fever
  • climate_fever
  • pubhealthtab
  • snopes

The pipeline also supports local CSV, JSON, and JSONL files. Local records are normalized into:

id, dataset, split, claim, evidence_text, label3, date, source_url, metadata

Accepted label3 values are true, false, and unknown.
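
For illustration, a single local JSONL record in this schema might look like the following (all values are hypothetical):

{"id": "local-0001", "dataset": "demo_local", "split": "test", "claim": "Boiling water removes dissolved heavy metals.", "evidence_text": "Boiling concentrates, rather than removes, dissolved heavy metals.", "label3": "false", "date": "2023-05-01", "source_url": "https://example.org/fact-check/boiling-water", "metadata": {"topic": "health"}}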

Outputs

Each run writes a timestamped directory under runs/:

  • config.json
  • normalized_examples.jsonl
  • data_summary.json
  • metrics.json
  • stressors.json
  • risk_flags.json
  • flagged_examples.jsonl
  • audit_sheet.csv (after running audit)
  • audit_summary.md and audit_summary.json (after running audit-summary)
  • report.md
  • report.html
  • plots/*.svg

runs/ is ignored by Git so large or sensitive local artifacts are not committed by accident.
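
Since run directories are timestamped, a hypothetical convenience snippet (not a packaged command) for inspecting the most recent run's metrics could look like:

import json
from pathlib import Path

# Pick the most recently modified run directory under runs/
# (assumes at least one completed run exists).
latest = max((p for p in Path("runs").iterdir() if p.is_dir()),
             key=lambda p: p.stat().st_mtime)
metrics = json.loads((latest / "metrics.json").read_text())
print(latest.name, sorted(metrics))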

Validated MVP Result

The current validation run uses max_examples_per_dataset: 1000, seed 42, and the four MVP datasets above. The strongest result is not a model accuracy claim: it is an evidence-quality finding.

Per-dataset evidence coverage (natural-language vs. raw) and main QA finding:

  • climate_fever: natural-language evidence 99.8%, raw evidence 100.0%. Claim+evidence underperformed claim-only by 0.109 macro-F1; manual audit found 76.0% precision among reviewed flags.
  • fever: natural-language evidence 0.0%, raw evidence 88.2%. Evidence fields are mostly structured references, not usable natural-language evidence in this normalized source.
  • pubhealthtab: natural-language evidence 0.0%, raw evidence 0.0%. Evidence-based stressors are skipped because usable evidence is absent.
  • snopes: natural-language evidence 100.0%, raw evidence 100.0%. Long-form evidence produces lower-precision heuristic flags; manual audit precision was 16.0%.

See docs/RESULTS.md for the full interpretation.

Repository Layout

misinfo_eqa/        Python package and CLI implementation
configs/            MVP, smoke, and demo configs
examples/           Tiny local demo dataset
tests/              Unit and smoke tests
docs/               Usage, audit, reproducibility, and result notes
paper/              Paper in Markdown and LaTeX
.github/workflows/  CI test workflow

Development

Run the unit suite:

python -m unittest discover -s tests

Run the dependency-light demo:

python -m misinfo_eqa run --config configs/demo.yaml

The default baselines are intentionally lightweight:

  • majority-class classifier
  • dependency-free hashed TF-IDF plus softmax logistic regression
  • simple heuristic NLI-style cue classifier

This keeps the tool runnable in small environments and makes failures easier to interpret.
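
As a rough illustration of the hashed TF-IDF idea, here is a sketch of a dependency-free hashing featurizer. It is an assumption-laden example, not the package's implementation, and IDF weighting is omitted for brevity.

import re
from hashlib import md5

def hashed_term_frequencies(text: str, n_buckets: int = 4096) -> list[float]:
    """Hash tokens into a fixed-size vector, so no vocabulary is stored."""
    vec = [0.0] * n_buckets
    for tok in re.findall(r"\w+", text.lower()):
        # md5 gives a deterministic bucket index across runs
        vec[int(md5(tok.encode()).hexdigest(), 16) % n_buckets] += 1.0
    norm = sum(v * v for v in vec) ** 0.5  # L2-normalize so long texts do not dominate
    return [v / norm for v in vec] if norm else vec

Hashing trades a small chance of bucket collisions for a fixed memory footprint and zero external dependencies, which fits the small-environment goal above.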

Documentation

Usage, audit, reproducibility, and results notes live in the docs/ directory; see docs/RESULTS.md for the full interpretation of the validated MVP run.

Safety Notes

MisinfoEQA evaluates existing public examples. It does not synthesize new false claims, retrieve live web pages, or publish transformed misinformation. Stressors mask, ablate, slice, or audit existing records.

License

MIT. See LICENSE.
