Merged
22 changes: 9 additions & 13 deletions .github/workflows/python-package.yml
@@ -15,19 +15,15 @@ jobs:
       matrix:
         python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
     steps:
-    - uses: actions/checkout@v4
-    - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v5
+    - uses: actions/checkout@v6
+    - name: Install uv
+      uses: astral-sh/setup-uv@v7
       with:
         python-version: ${{ matrix.python-version }}
-        cache: pip
-        cache-dependency-path: '**/pyproject.toml'
-    - name: Install poetry
-      run: |
-        python -m pip install poetry
-    - name: Install project with dependencies
-      run: |
-        poetry install --with dev
+        enable-cache: true
+
+    - name: Install project
+      run: uv sync --all-extras --dev
+
     - name: Run tests
-      run: |
-        poetry run pytest
+      run: uv run pytest tests/
8 changes: 5 additions & 3 deletions .readthedocs.yml
@@ -7,15 +7,17 @@ version: 2
build:
os: ubuntu-22.04
tools:
-    python: "3.10"
+    python: "3.13"

python:
version: 3.10
install:
- method: pip
path: .
extra: docs
- requirements: docs/requirements.txt

sphinx:
configuration: docs/conf.py
fail_on_warning: false

formats:
- htmlzip
81 changes: 80 additions & 1 deletion docs/source/batch_poster_guide.md
@@ -21,6 +21,7 @@ Batch Poster works exclusively with FOLIO Inventory storage records:
| `Instances` | `/instance-storage/batch/synchronous` | Bibliographic records |
| `Holdings` | `/holdings-storage/batch/synchronous` | Holdings records |
| `Items` | `/item-storage/batch/synchronous` | Item records |
| `ShadowInstances` | `/instance-storage/batch/synchronous` | Consortium shadow instances (ECS) |

```{note}
For other data types, use the appropriate tool:
@@ -69,11 +70,12 @@ folio-data-import batch-poster \

| Parameter | Default | Description |
|-----------|---------|-------------|
-| `--object-type` | (required) | Record type: `Instances`, `Holdings`, or `Items` |
+| `--object-type` | (required) | Record type: `Instances`, `Holdings`, `Items`, or `ShadowInstances` |
| `--file-path` | (required) | Path(s) to JSONL file(s). Accepts multiple values and glob patterns |
| `--batch-size` | 100 | Number of records per batch (1-1000) |
| `--upsert` | false | Enable upsert mode to update existing records |
| `--failed-records-file` | none | Path to file for writing failed records |
| `--rerun-failed-records` | false | After main run, reprocess failed records one at a time |
| `--no-progress` | false | Disable progress bar display |

#### Upsert Options
@@ -183,6 +185,40 @@ folio-data-import batch-poster \

This updates **only** `barcode` and `materialTypeId` from your input file while preserving all other fields from the existing record.

## Protected Fields

### Always-Preserved Fields

Certain fields are **always preserved** from existing records during upsert, regardless of configuration:

| Field | Applies To | Reason |
|-------|------------|--------|
| `hrid` | All record types | Human-readable ID - changing it breaks external references |
| `lastCheckIn` | Items | Circulation data - should not be overwritten by migrations |

These fields cannot be overwritten through upsert operations. If you need to change an `hrid`, you must delete and recreate the record.
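
The always-preserved rule can be pictured with a minimal sketch. This is not Batch Poster's actual implementation: the function and constant names are hypothetical, and it assumes that a protected field missing from the existing record is simply left as supplied in the input.

```python
from copy import deepcopy

# Hypothetical names for illustration only; the tool's internals may differ.
ALWAYS_PRESERVED = {"hrid"}            # applies to all record types
ITEM_ONLY_PRESERVED = {"lastCheckIn"}  # applies to Items only


def apply_preserved_fields(existing: dict, incoming: dict, object_type: str) -> dict:
    """Return a copy of `incoming` with protected fields taken from `existing`."""
    merged = deepcopy(incoming)
    protected = set(ALWAYS_PRESERVED)
    if object_type == "Items":
        protected |= ITEM_ONLY_PRESERVED
    for field in protected:
        # Assumption: only fields present on the existing record are restored.
        if field in existing:
            merged[field] = deepcopy(existing[field])
    return merged
```

With this sketch, an incoming Item that carries a different `hrid` and no `lastCheckIn` would come out with both values copied from the stored record, while fields such as `barcode` take the incoming value.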

### MARC-Sourced Instance Protection

When updating Instance records that have a MARC source (i.e., `source` contains "MARC"), Batch Poster automatically restricts which fields can be patched. This protects MARC-managed fields from being overwritten, as they would be reverted on the next SRS update anyway.

**Allowed fields for MARC-sourced Instances:**

| Field | Purpose |
|-------|---------|
| `discoverySuppress` | Discovery suppression flag |
| `staffSuppress` | Staff suppression flag |
| `deleted` | Deletion flag |
| `statisticalCodeIds` | Statistical codes (merged with existing) |
| `administrativeNotes` | Administrative notes (merged with existing) |
| `instanceStatusId` | Instance status |

**Example:** If you try to update the `title` of a MARC-sourced Instance, that change will be ignored to protect the MARC-managed data.

```{note}
This protection is automatic. You don't need to configure anything - MARC-sourced records are detected and handled appropriately.
```
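
As a rough illustration of the rule described above, the following sketch filters an incoming Instance down to the allowed fields. The helper names are hypothetical, and "merged with existing" is assumed here to mean an order-preserving union of the two lists; the real merge logic may differ.

```python
# Hypothetical helper names; the field list is taken from the table above.
MARC_ALLOWED_FIELDS = {
    "discoverySuppress",
    "staffSuppress",
    "deleted",
    "statisticalCodeIds",
    "administrativeNotes",
    "instanceStatusId",
}
# Assumption: these list fields are combined with existing values, not replaced.
MERGED_LIST_FIELDS = {"statisticalCodeIds", "administrativeNotes"}


def is_marc_sourced(record: dict) -> bool:
    """True when the record's `source` contains "MARC" (e.g. "CONSORTIUM-MARC")."""
    return "MARC" in str(record.get("source", ""))


def patch_marc_sourced(existing: dict, incoming: dict) -> dict:
    """Apply only the allowed fields from `incoming` onto a copy of `existing`."""
    patched = dict(existing)
    for field, value in incoming.items():
        if field not in MARC_ALLOWED_FIELDS:
            continue  # e.g. `title` is ignored for MARC-sourced Instances
        if field in MERGED_LIST_FIELDS:
            combined = list(existing.get(field, []))
            combined += [v for v in value if v not in combined]
            patched[field] = combined
        else:
            patched[field] = value
    return patched
```

In this sketch a changed `title` in the input never reaches the result, while `discoverySuppress` is overwritten and `statisticalCodeIds` grows by any new values.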

## Multiple Files

Process multiple files in one run:
@@ -308,6 +344,49 @@ folio-data-import batch-poster \
--patch-paths "barcode"
```

### Rerun Failed Records

When a batch fails, some of its records may still be valid and would succeed if posted individually. The `--rerun-failed-records` flag automatically reprocesses failed records one at a time after the main run completes, giving each record a second chance:

```bash
folio-data-import batch-poster \
--object-type Items \
--file-path items.jsonl \
--upsert \
--failed-records-file failed_items.jsonl \
--rerun-failed-records
```

This will:
1. Process all records in batches (normal operation)
2. If any batches fail, reprocess those failed records individually
3. Write still-failing records to a new file with a `_rerun` suffix (e.g., `failed_items_rerun.jsonl`)

The original failed records file is preserved, and the rerun processes records in a streaming fashion without loading them all into memory.

```{note}
`--rerun-failed-records` requires `--failed-records-file` to be set.
```
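
The `_rerun` file name can be derived from the original path as in the sketch below. The function name is hypothetical, and the sketch only reproduces the naming pattern shown in the example above (`failed_items.jsonl` becomes `failed_items_rerun.jsonl`); how the tool builds the path internally is not shown in this guide.

```python
from pathlib import Path


def rerun_path(failed_records_file: str) -> Path:
    """Insert `_rerun` between the file stem and its extension."""
    path = Path(failed_records_file)
    return path.with_name(f"{path.stem}_rerun{path.suffix}")
```

The directory part of the path is kept, so a file under `out/` would get its rerun file written next to it.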

### Consortium Shadow Instances (ECS)

For FOLIO ECS (consortium) environments, use `ShadowInstances` to post shadow copies to member tenants. This automatically converts the `source` field:

- `MARC` → `CONSORTIUM-MARC`
- `FOLIO` → `CONSORTIUM-FOLIO`

```bash
folio-data-import batch-poster \
--gateway-url https://folio-snapshot-okapi.dev.folio.org \
--tenant-id central \
--member-tenant-id member1 \
--username admin \
--password admin \
--object-type ShadowInstances \
--file-path instances.jsonl \
--upsert
```
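
The `source` conversion listed above amounts to a small lookup. The sketch below is illustrative only: the names are hypothetical, and it assumes that any `source` value other than `MARC` or `FOLIO` is passed through unchanged.

```python
# Mapping taken from the list above; helper name is hypothetical.
SHADOW_SOURCE_MAP = {
    "MARC": "CONSORTIUM-MARC",
    "FOLIO": "CONSORTIUM-FOLIO",
}


def to_shadow_instance(instance: dict) -> dict:
    """Return a copy of `instance` with `source` converted to consortium form."""
    shadow = dict(instance)
    source = shadow.get("source")
    # Assumption: unknown or already-converted sources are left untouched.
    if source in SHADOW_SOURCE_MAP:
        shadow["source"] = SHADOW_SOURCE_MAP[source]
    return shadow
```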

## Environment Variables

Set connection parameters as environment variables to simplify commands:
38 changes: 38 additions & 0 deletions docs/source/concepts.md
@@ -133,6 +133,7 @@ Batch Poster works with FOLIO Inventory storage records:
- **Instances**: Bibliographic records
- **Holdings**: Holdings records attached to instances
- **Items**: Item records attached to holdings
- **ShadowInstances**: Consortium shadow copies (ECS environments)

### Input Format

@@ -171,6 +172,22 @@ When updating existing records, you can preserve specific data:
- `--preserve-temporary-loan-types`: Keep existing temporary loan type (Items only)
- Item status is **preserved by default**; use `--overwrite-item-status` to change

### Always-Protected Fields

Certain fields are **always preserved** from existing records, regardless of configuration:

- `hrid` (human-readable ID): Changing it would break external references
- `lastCheckIn` (Items only): Circulation data that should not be overwritten

### MARC Source Protection

For Instance records with a MARC source (e.g., `source: "MARC"` or `source: "CONSORTIUM-MARC"`), Batch Poster automatically restricts patching to only these fields:

- `discoverySuppress`, `staffSuppress`, `deleted` (suppression and deletion flags)
- `statisticalCodeIds`, `administrativeNotes`, `instanceStatusId`

This prevents overwriting MARC-managed fields like title or contributors, which would be reverted on the next SRS update anyway.

### Selective Patching

For fine-grained updates, use `--patch-existing-records` with `--patch-paths`:
@@ -181,6 +198,27 @@ For fine-grained updates, use `--patch-existing-records` with `--patch-paths`:

This updates only the specified fields while preserving all others from the existing record.
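
Conceptually, selective patching copies only the named fields from the input onto the stored record. The sketch below assumes each patch path is a plain top-level field name such as `barcode`; the function name is hypothetical, and nested paths, if the tool supports them, are not handled here.

```python
def patch_record(existing: dict, incoming: dict, patch_paths: list[str]) -> dict:
    """Copy only the listed top-level fields from `incoming` onto `existing`."""
    patched = dict(existing)
    for path in patch_paths:
        # Assumption: a path absent from the input leaves the stored value as is.
        if path in incoming:
            patched[path] = incoming[path]
    return patched
```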

### Consortium Shadow Instances

For FOLIO ECS (consortium) environments, use `--object-type ShadowInstances` to post shadow copies to member tenants. This automatically converts the `source` field to consortium format:

- `MARC` → `CONSORTIUM-MARC`
- `FOLIO` → `CONSORTIUM-FOLIO`

Use `--member-tenant-id` to specify the target member tenant.

### Rerunning Failed Records

When `--rerun-failed-records` is enabled (along with `--failed-records-file`), the tool automatically reprocesses any failed records one at a time after the main batch run completes:

```bash
folio-data-import batch-poster --object-type Items \
--file-path items.jsonl --upsert \
--failed-records-file failed.jsonl --rerun-failed-records
```

This streams through the failed records file, giving each record an individual retry. Records that still fail are written to a new file with a `_rerun` suffix (e.g., `failed_rerun.jsonl`).
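
The streaming retry can be sketched as a line-by-line pass over the JSONL file. This is an assumption-laden illustration, not the tool's code: `rerun_failed_records` and the `post_one` callback are hypothetical, and `post_one` stands in for whatever posts a single record to FOLIO and raises on failure.

```python
import json
from pathlib import Path
from typing import Callable


def rerun_failed_records(
    failed_file: Path,
    rerun_file: Path,
    post_one: Callable[[dict], None],
) -> tuple[int, int]:
    """Retry each record individually; return (succeeded, still_failed) counts."""
    succeeded = still_failed = 0
    with failed_file.open(encoding="utf-8") as src, \
            rerun_file.open("w", encoding="utf-8") as out:
        for line in src:  # one record per line, so memory use stays flat
            if not line.strip():
                continue
            record = json.loads(line)
            try:
                post_one(record)  # hypothetical single-record post
                succeeded += 1
            except Exception:
                # Record failed again: keep it in the `_rerun` file.
                out.write(json.dumps(record) + "\n")
                still_failed += 1
    return succeeded, still_failed
```

Because the input is read one line at a time and failures are written out immediately, the original failed records file is never modified and the full set of records is never held in memory.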

### Workflow

```
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "folio_data_import"
-version = "0.5.0b4"
+version = "0.5.0b5"
description = "A python module to perform bulk import of data into a FOLIO environment. Currently supports MARC and user data import."
authors = [{ name = "Brooks Travis", email = "brooks.travis@gmail.com" }]
license = "MIT"