Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 58 additions & 0 deletions docs/guides/performance_tuning.md
Original file line number Diff line number Diff line change
Expand Up @@ -395,6 +395,64 @@ The choice of mappers can impact performance.

**Recommendation**: If you need to map values based on data in Odoo, it is much more performant to first export the necessary mapping data from Odoo (e.g., using `fluvo export`) into a Python dictionary or a separate CSV file, and then use the much faster `mapper.map_val` or other in-memory lookups to do the translation.

---

## Minimizing Odoo's ORM Work

For most imports the wall-clock time is dominated by what **Odoo's ORM does on the
server** — resolving each relational value with a `name_search`, recomputing fields,
firing mail/tracking side-effects — not by the client sending data. The biggest wins
come from doing that work on the client, in Polars, *before* `load()`.

### Pre-resolve relations (`--resolve-relation`)

When a column holds a natural key (a country name, a partner reference), Odoo would
normally run a `name_search` **per row** to turn it into a database id. Instead, fluvo
can resolve the whole column in one vectorized Polars join against a cached id-map of
the related model, and hand `load()` an already-resolved `field/id` column — so Odoo
performs **no `name_search`** for that field.

```bash
# 'country' column holds res.country codes -> resolve into country_id, no name_search.
fluvo import --connection-file conf/connection.conf --file partners.csv \
--model res.partner \
--resolve-relation country:res.country:code:country_id
```

Format: `source_column:model:key_field:relation_field[:xmlid|dbid]` (repeatable).
`xmlid` (default) is portable; `dbid` is fastest (zero server resolution) but
database-specific. The id-map is cached to parquet and reused across runs. From a
transform script, the same is available as `Processor.resolve_relation(...)`.

> **Measured:** importing **2,000 `res.partner`** records with `country_id` ran in
> **6.96s** the naive way (Odoo `name_search` per row) versus **4.27s** pre-resolved —
> a **1.6× speedup** *even though the data has only 5 distinct countries* (which Odoo
> caches). The win grows with relation cardinality: a column where most values are
> distinct (suppliers, categories, partner refs) gets no server-side `name_search`
> caching, so pre-resolving it saves far more.

### Skip unchanged records (`--skip-unchanged`)

On a re-import, fluvo can fetch the current field values, compare them to the incoming
rows with a **vectorized Polars anti-join**, and send only the rows that are new or
changed. Re-running an unchanged dataset then sends ~0 rows.

```bash
fluvo import --connection-file conf/connection.conf --file partners.csv \
--model res.partner --skip-unchanged
```

### Suppress side-effects (default) and auto-clean

By default fluvo imports with `tracking_disable`, `mail_create_nolog`, and
`mail_notrack` set, so Odoo skips chatter/tracking work (override any of them with
`--context '{"tracking_disable": false}'`). `--auto-clean` applies safe, type-aware
coercions (whitespace, null tokens, booleans) on the client before load; an
uncoercible value routes that **row** to the fail file rather than aborting the batch.

> These optimizations are all opt-in (except the default side-effect suppression) and
> correctness-preserving: the resulting Odoo state is identical to a naive import.

---
## Performance Strategy for Relational Data (Automatic Two-Pass Import)

Expand Down
54 changes: 54 additions & 0 deletions src/fluvo/__main__.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,46 @@
from .writer import run_write


def _parse_resolve_relation_specs(specs: tuple[str, ...]) -> list[dict[str, Any]]:
"""Parse --resolve-relation strings into resolve_relations specs.

Each string is ``source_column:model:key_field:relation_field[:to]`` where
``to`` is ``xmlid`` (default) or ``dbid``.

Args:
specs: Raw --resolve-relation option values.

Returns:
list[dict[str, Any]]: Spec dicts for run_import's resolve_relations.

Raises:
click.BadParameter: If a spec string is malformed.
"""
parsed: list[dict[str, Any]] = []
for raw in specs:
parts = [p.strip() for p in raw.split(":")]
if len(parts) not in (4, 5) or not all(parts[:4]):
raise click.BadParameter(
f"--resolve-relation {raw!r}: expected "
"'source_column:model:key_field:relation_field[:xmlid|dbid]' "
"with non-empty fields."
)
spec: dict[str, Any] = {
"source_column": parts[0],
"model": parts[1],
"key_field": parts[2],
"relation_field": parts[3],
}
if len(parts) == 5:
if parts[4] not in ("xmlid", "dbid"):
raise click.BadParameter(
f"--resolve-relation {raw!r}: 'to' must be 'xmlid' or 'dbid'."
)
spec["to"] = parts[4]
parsed.append(spec)
return parsed


def _run_dry_run_validation(connection_file: str, **kwargs: Any) -> None:
"""Run dry-run validation mode without importing."""
from .lib.conf_lib import get_connection_from_config, get_connection_from_dict
Expand Down Expand Up @@ -1107,6 +1147,15 @@ def vat_validate_cmd(
help="Apply safe, type-aware coercions before load (strip whitespace, "
"normalize null tokens, canonicalize booleans). Off by default.",
)
@click.option(
"--resolve-relation",
"resolve_relation_specs",
multiple=True,
help="Pre-resolve a relation column in Polars before load, so Odoo performs "
"no name_search for it. Format "
"'source_column:model:key_field:relation_field[:xmlid|dbid]'. Repeatable. "
"Example: --resolve-relation country:res.country:code:country_id",
)
@click.option(
"--fix-missing-variants",
is_flag=True,
Expand Down Expand Up @@ -1401,6 +1450,11 @@ def import_cmd(connection_file: str, **kwargs: Any) -> None: # noqa: C901
context["fallback_values"] = fallback_values

kwargs["context"] = context
resolve_relation_specs = kwargs.pop("resolve_relation_specs", ())
if resolve_relation_specs:
kwargs["resolve_relations"] = _parse_resolve_relation_specs(
resolve_relation_specs
)

# Handle groupby option
groupby = kwargs.get("groupby")
Expand Down
92 changes: 92 additions & 0 deletions tests/test_resolve_relations_cli.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
"""Tests for the --resolve-relation CLI flag and its parser."""

from unittest.mock import MagicMock, patch

import click
import pytest
from click.testing import CliRunner

from fluvo import __main__
from fluvo.__main__ import _parse_resolve_relation_specs


def test_parse_four_part_spec() -> None:
"""A 4-part spec parses to a dict without a 'to' key."""
out = _parse_resolve_relation_specs(("country:res.country:code:country_id",))
assert out == [
{
"source_column": "country",
"model": "res.country",
"key_field": "code",
"relation_field": "country_id",
}
]


def test_parse_five_part_spec_includes_to() -> None:
"""A 5-part spec carries the 'to' target."""
out = _parse_resolve_relation_specs(("c:res.country:code:country_id:dbid",))
assert out[0]["to"] == "dbid"


def test_parse_multiple_specs() -> None:
"""Several specs produce several dicts."""
out = _parse_resolve_relation_specs(
(
"country:res.country:code:country_id",
"parent:res.partner:ref:parent_id:xmlid",
)
)
assert len(out) == 2


def test_parse_rejects_wrong_part_count() -> None:
"""A spec with too few parts is rejected."""
with pytest.raises(click.BadParameter):
_parse_resolve_relation_specs(("country:res.country:code",))
Comment on lines +43 to +46

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure that the new validation for empty fields in the --resolve-relation spec is thoroughly tested, we should add a test case that verifies that specs with empty fields are rejected.

Suggested change
def test_parse_rejects_wrong_part_count() -> None:
"""A spec with too few parts is rejected."""
with pytest.raises(click.BadParameter):
_parse_resolve_relation_specs(("country:res.country:code",))
def test_parse_rejects_wrong_part_count() -> None:
"""A spec with too few parts is rejected."""
with pytest.raises(click.BadParameter):
_parse_resolve_relation_specs(("country:res.country:code",))
def test_parse_rejects_empty_fields() -> None:
"""A spec with empty fields is rejected."""
with pytest.raises(click.BadParameter):
_parse_resolve_relation_specs(("country::code:country_id",))



def test_parse_rejects_empty_fields() -> None:
"""A spec with an empty required field is rejected."""
with pytest.raises(click.BadParameter):
_parse_resolve_relation_specs(("country::code:country_id",))


def test_parse_rejects_invalid_to() -> None:
"""An invalid 'to' value is rejected."""
with pytest.raises(click.BadParameter):
_parse_resolve_relation_specs(("c:res.country:code:country_id:nope",))


@patch("fluvo.__main__.run_import")
def test_cli_resolve_relation_flows_to_run_import(mock_run_import: MagicMock) -> None:
"""--resolve-relation is parsed and passed to run_import as resolve_relations."""
mock_run_import.return_value = {"x": 1}
runner = CliRunner()
with runner.isolated_filesystem():
with open("conn.conf", "w") as f:
f.write("[Connection]")
result = runner.invoke(
__main__.cli,
[
"import",
"--connection-file",
"conn.conf",
"--file",
"my.csv",
"--model",
"res.partner",
"--resolve-relation",
"country:res.country:code:country_id",
],
)
assert result.exit_code == 0
call_kwargs = mock_run_import.call_args.kwargs
assert call_kwargs["resolve_relations"] == [
{
"source_column": "country",
"model": "res.country",
"key_field": "code",
"relation_field": "country_id",
}
]
Loading