Pipeline

Getting started

Prerequisites/Dependencies

You will need the following installed on your system:

Python 3.8+
Pip

Setup

If you would like to update the API, follow the instructions below.

Create a local virtual environment and activate it:

python -m venv .venv  # or: py -m venv .venv
source .venv/bin/activate  # Linux/macOS
.venv\Scripts\activate  # Windows

Install the dependencies:
```
pip install -r requirements.txt
```
Add your environment variables. An example is provided in .env.example:
```
cp .env.example .env
```
Make sure to update the values in .env to match your local setup.
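As an illustration of how a script could pick up the values in .env, here is a minimal sketch that uses only the standard library. It assumes the file contains simple KEY=VALUE lines. The helpers `load_env_file` and `apply_env` are hypothetical names and are not part of this repository, which may load its configuration differently (for example in config.py or through a dedicated library).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values):
    """Copy parsed values into os.environ without overriding existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Calling `apply_env(load_env_file())` before reading `os.environ` keeps variables already set in the shell and fills in the rest from the file.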
Format the code:
```
poe format_with_isort
poe format_with_black
```
You can also run poe format to run both commands at once.
Check the code quality:
```
poe typecheck
poe pylint
poe flake8
```
You can also run poe lint to run all three commands at once.
To start the local server, run:
```
poe init # pick python
poe dev
```
This runs func start with the --python flag.

Running Fuji Score Pipeline

The fill-database-fuji.py script queries the database for datasets without Fuji scores and fills them by calling the FUJI API.
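The query-then-fill pattern described here can be sketched with an in-memory SQLite database as a stand-in. This is an illustration only: the real script's database engine and schema are not shown in this README, so the table name `dataset`, the columns `id`, `identifier`, and `evaluation_date`, and the helpers `fetch_unscored` and `save_score` are all assumptions. Only the `score IS NULL` condition comes from the description.

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in database; table and column names other than `score` are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset ("
    "id INTEGER PRIMARY KEY, identifier TEXT, score REAL, evaluation_date TEXT)"
)
conn.executemany(
    "INSERT INTO dataset (identifier, score) VALUES (?, ?)",
    [("example-a", None), ("example-b", 42.0), ("example-c", None)],
)


def fetch_unscored(conn, limit):
    """Return up to `limit` (id, identifier) rows that have no score yet."""
    query = (
        "SELECT id, identifier FROM dataset "
        "WHERE score IS NULL ORDER BY id LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


def save_score(conn, dataset_id, score):
    """Store a score together with the time it was evaluated (UTC)."""
    evaluated_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE dataset SET score = ?, evaluation_date = ? WHERE id = ?",
        (score, evaluated_at, dataset_id),
    )
    conn.commit()
```

Because rows drop out of the `score IS NULL` result once they are updated, a run that is interrupted can simply be started again and will continue with the remaining datasets.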

📖 For detailed instructions, see RUN-FUJI-PIPELINE.md

Quick Start

Option 1: Docker Compose (Recommended)

# Set MINI_DATABASE_URL in the .env file, then:
docker-compose up fill-database-fuji

Option 2: Local Development

# Start FUJI services
docker-compose up -d

# Run the script
python fill-database-fuji.py

What the Script Does

Queries the database for datasets where score IS NULL
Processes datasets in batches using 30 FUJI API instances
Calls the FUJI API in parallel across all endpoints
Updates the database with scores and evaluation dates
Retries failed API calls up to 3 times with exponential backoff
Shows real-time progress with detailed statistics
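The parallel calls and retry behaviour listed above can be sketched roughly as follows. This is not the code from fill-database-fuji.py: the function names `call_with_retry` and `score_batch`, the round-robin assignment of items to endpoints, and the one-second base delay are assumptions made for illustration. The sketch also reads "up to 3 times" as three attempts in total, which may differ from the script.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_retry(func, arg, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(arg), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(arg)
        except Exception:
            if attempt == max_attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


def score_batch(items, endpoints, score_one, **retry_kwargs):
    """Spread items round-robin over endpoints and score them in parallel.

    score_one receives an (endpoint, item) pair. Items that still fail after
    all attempts are reported as None so the rest of the batch can proceed.
    """
    jobs = [(endpoints[i % len(endpoints)], item) for i, item in enumerate(items)]

    def run(job):
        try:
            return call_with_retry(score_one, job, **retry_kwargs)
        except Exception:
            return None

    # One worker per endpoint, so each instance handles one request at a time.
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        return list(pool.map(run, jobs))
```

With the 30 FUJI API instances mentioned above, `endpoints` would hold 30 entries, and `score_one` would wrap the actual HTTP request. Results come back in the same order as `items`, which makes it straightforward to pair each score with its dataset before updating the database.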

See RUN-FUJI-PIPELINE.md for complete documentation, troubleshooting, and configuration options.

