pytgaen/fimod


fimod

the data shaper CLI

🏗️ Mold your data, shape your CI, play with your pipelines

🪶 Python-powered molding without Python installed

💡 DRY your pipelines · Slim your container images · Tame your configs



fimod (Flexible Input, Mold Output Data) is a single Rust binary (~3.3 MB in the UPX-compressed standard build) with an embedded Python runtime (Monty). It reads JSON, YAML, TOML, CSV, NDJSON, and plain text - from files or directly from HTTP URLs - lets you transform data with Python expressions, and writes the result in any of those formats. No system Python, no pip install, no dependencies.

Fimod runs molds on Monty, not CPython: use familiar Python syntax and built-ins, with Rust-powered helpers for data shaping.

# 🔍 Filter, reshape, convert - in one command
fimod s -i users.json -e '[u for u in data if u["active"]]' -o active.csv


# ⛓️ Chain transforms like Unix pipes, inside a single process, using built-in helpers such as it_sort_by
fimod s -i data.json -e '[u for u in data if u["age"] > 30]' -e 'it_sort_by(data, "name")'

# 📦 Batch-process entire directories
fimod s -i logs/*.json -m normalize.py -o cleaned/

📦 Install

Linux / macOS

curl -fsSL https://raw.githubusercontent.com/pytgaen/fimod/main/install.sh | sh

The script downloads the right binary, installs it, then runs fimod setup all defaults --if-needed to configure the community registries (example molds) and the recommended sandbox policy (~/.config/fimod/sandbox.toml). Blocks that are already configured are skipped; for missing ones the script prompts you, unless you pre-answer with env vars. When installing slim or fast, the script may also ask whether to install that variant as the default fimod command.

💡 Options via env vars: FIMOD_VARIANT=standard|slim|fast · FIMOD_SET_DEFAULT=yes|no · FIMOD_INSTALL=~/.local/bin · FIMOD_VERSION=0.1.0 · FIMOD_SETUP_ALL=yes|no (or per category: FIMOD_SETUP_REGISTRY / FIMOD_SETUP_SANDBOX)

Variants install as separate commands by default: standard → fimod, slim → fimod-slim, fast → fimod-fast. For slim or fast, answer the prompt or set FIMOD_SET_DEFAULT=yes to also install that variant as the default fimod command.

The standard build includes HTTP/HTTPS and proxy support through reqwest + rustls + AWS-LC. Use FIMOD_VARIANT=slim when binary size matters more than HTTP input or remote mold loading.

The fast build keeps the standard feature set but uses the speed profile (opt-level=3) and ships without UPX compression. Internal smoke tests show roughly 15-25% faster CPU-heavy JSON/CSV/chain workloads; use it for large files, long chains, or repeated batch jobs.

Windows

Option 1: via ubi (no script, antivirus-friendly)

ubi is a universal binary installer available on winget (pre-installed on Windows 10/11):

# 📦 1. Install ubi (one-time, uses winget which is built into Windows)
winget install houseabsolute.ubi

# 🔄 Then restart PowerShell so ubi is found in PATH

# ⬇️ 2. Install fimod (standard variant, includes HTTP support)
ubi --project pytgaen/fimod --matching "fimod-v" --in "$env:USERPROFILE\.local\bin"

# Or install the slim variant (no HTTP support, smaller binary)
# ubi --project pytgaen/fimod --matching "fimod-slim-v" --in "$env:USERPROFILE\.local\bin"

# Or install the fast variant (speed optimized, larger uncompressed binary)
# ubi --project pytgaen/fimod --matching "fimod-fast-v" --in "$env:USERPROFILE\.local\bin"

# 🛤️ 3. Add to PATH (if not already present)
$BinDir = "$env:USERPROFILE\.local\bin"
$UserPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
if ($UserPath -notlike "*$BinDir*") {
    [Environment]::SetEnvironmentVariable('PATH', "$BinDir;$UserPath", 'User')
    $env:PATH = "$BinDir;$env:PATH"
}

# 🗂️ 4. Install missing registries + sandbox policy
fimod setup all defaults --if-needed

Option 2: PowerShell script (execution policy / antivirus may block)

⚠️ If your antivirus blocks this script, use Option 1 (ubi) instead: it downloads a signed binary directly from GitHub Releases, with no script execution.

Download first, then run:

Invoke-RestMethod https://raw.githubusercontent.com/pytgaen/fimod/main/install.ps1 -OutFile "$env:TEMP\fimod-install.ps1"
& "$env:TEMP\fimod-install.ps1"

πŸ’‘ Same env var options as Linux: $env:FIMOD_VARIANT, $env:FIMOD_SET_DEFAULT, $env:FIMOD_INSTALL, $env:FIMOD_VERSION

⚠️ VCRUNTIME140.dll not found?

fimod requires the Microsoft Visual C++ Redistributable, pre-installed on most Windows systems but missing in minimal environments (Windows Sandbox, fresh server installs).

winget install Microsoft.VCRedist.2015+.x64

Or download directly from Microsoft: https://aka.ms/vs/17/release/vc_redist.x64.exe

From source

git clone https://github.com/pytgaen/fimod && cd fimod
cargo build --release   # → target/release/fimod

🤔 Why not jq / yq / awk / sed?

You already know Python. Why learn another DSL?

jq / yq - powerful but you need to learn a custom query language:

# jq: filter users older than 30
jq '[.[] | select(.age > 30)]' users.json

# fimod: same idea, with Python syntax
fimod s -i users.json -e '[u for u in data if u["age"] > 30]'

# jq: project + sort + deduplicate
jq '[.[] | {id, name}] | sort_by(.name) | unique_by(.id)' data.json

# fimod: chain expressions, each feeds the next
fimod s -i data.json -e '[{"id": u["id"], "name": u["name"]} for u in data]' \
  -e 'it_unique_by(it_sort_by(data, "name"), "id")'
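For readers coming from jq, the two chained expressions above compute roughly the following in plain Python. This is a CPython sketch, not fimod's code: it_sort_by and it_unique_by below are hypothetical stand-ins written for illustration, assuming a stable sort and first-occurrence deduplication; fimod's real helpers are implemented in Rust.

```python
def it_sort_by(items, key):
    # Hypothetical stand-in: stable sort of dicts by one field
    return sorted(items, key=lambda item: item[key])


def it_unique_by(items, key):
    # Hypothetical stand-in: keep the first record seen for each key value
    seen = set()
    result = []
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            result.append(item)
    return result


data = [
    {"id": 2, "name": "bob", "age": 41},
    {"id": 1, "name": "alice", "age": 29},
    {"id": 2, "name": "bob", "age": 41},
]

# Step 1 (first -e): project each record down to id and name
data = [{"id": u["id"], "name": u["name"]} for u in data]
# Step 2 (second -e): sort by name, then drop duplicate ids
data = it_unique_by(it_sort_by(data, "name"), "id")
print(data)  # [{'id': 1, 'name': 'alice'}, {'id': 2, 'name': 'bob'}]
```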

Python one-liner: it works, but needs boilerplate:

python3 -c "
import json, sys
data = json.load(sys.stdin)
print(json.dumps([u for u in data if u['active']]))
" < users.json

# fimod: same logic, zero boilerplate, no Python install
fimod s -i users.json -e '[u for u in data if u["active"]]'

👉 See the full feature comparison against jq, yq, and Python

👀 A taste of what fimod can do

🐍 Python-syntax transforms, with Rust-powered I/O, serialization & builtins:

# YAML to JSON, filter active users, sort by name
fimod s -i users.yaml -e '[u for u in data if u["active"]]' -e 'it_sort_by(data, "name")' -o result.json
# Filter active users, then group by role: Unix pipes just work
fimod s -i users.json -e '[u for u in data if u["active"]]' | fimod s -e 'it_group_by(data, "role")'
# Enrich records with Python string methods (try this in jq...)
fimod s -i users.json -e '[{**u, "slug": u["name"].lower().replace(" ", "-"), "domain": u["email"].split("@")[1]} for u in data]'

📦 Registry molds, reusable recipes one @name away:

# 🔀 Patch a YAML config with dot-path assignments
fimod s -i deployment.yaml -m @yaml_merge --arg set="spec.replicas=3,metadata.labels.env=prod" -o deployment.yaml
# 🔐 Anonymize PII fields with SHA-256
fimod s -i users.json -m @anonymize_pii --arg fields=email,phone -o users_anon.json
# 📊 Deduplicate records by a field
fimod s -i data.json -m @dedup_by --arg field=email

📦 More molds in the fimod-powered registry:

| Mold | Description |
|---|---|
| @gh_latest | GitHub release resolver |
| @download | wget-like fetch |
| @poetry_migrate | Poetry → uv/Poetry 2 |
| @skylos_to_gitlab | dead code → GitLab Code Quality |

fimod registry add fimod-powered https://github.com/pytgaen/fimod-powered
🍿 Even more taste... (in-place, regex, log parsing, env templating)
# 🔒 Anonymize emails in-place: replace them with SHA-256 hashes
fimod s -i customers.csv -e '[{**r, "email": hs_sha256(r["email"])} for r in data]' --in-place
# 🕵️ Mask IPs with regex: 192.168.1.42 → 192.168.x.x
fimod s -i logs.json -e '[{**r, "ip": re_sub(r"\d+\.\d+$", "x.x", r["ip"])} for r in data]'
# 📊 Raw log lines → structured JSON records
fimod s -i server.log -m @log_parse \
  --arg regex='(\S+) \[(.+?)\] "(.+?)" (\d+)' \
  --arg fields=ip,timestamp,request,status
# 🔀 Inject environment variables into ${VAR} placeholders
fimod s -i config.json --env 'DB_*' -e '{k: env_subst(v, env) for k, v in data.items()}'
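A rough CPython model of the ${VAR} substitution step, for reasoning about what the expression above produces. It assumes env is a dict of the variables selected by --env; the name env_subst_sketch is hypothetical, and passing non-strings through and leaving unknown placeholders untouched are assumptions, so check the Built-ins Reference for how fimod's env_subst actually behaves.

```python
import re

# Matches ${NAME}, where NAME is letters, digits and underscores
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def env_subst_sketch(value, env):
    # Assumption: only strings are templated; other values pass through unchanged
    if not isinstance(value, str):
        return value
    # Replace known variables; keep unknown placeholders as-is (assumption)
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)


env = {"DB_HOST": "db.internal", "DB_PORT": "5432"}
config = {"url": "postgres://${DB_HOST}:${DB_PORT}/app", "pool": 10, "user": "${DB_USER}"}
print({k: env_subst_sketch(v, env) for k, v in config.items()})
# {'url': 'postgres://db.internal:5432/app', 'pool': 10, 'user': '${DB_USER}'}
```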

Run fimod mold list to browse all built-in molds.

🔋 Batteries included

🗂️ Multi-file slurp

The classic yq/jq slurp use case (merging a base config with environment overrides), but across any mix of formats:

# Merge base.yaml with prod overrides in TOML (impossible with yq)
fimod s -i base.yaml -i prod.toml -s -e '
def transform(data, **_):
    data[0].update(data[1])
    return data[0]
'


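data is a list ordered like the -i flags. Because this mold uses dict.update, keys from later files win on conflict, and the merge is shallow: a nested table in prod.toml replaces the matching table from base.yaml instead of being merged into it.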
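Because dict.update is shallow, a nested section in the override file replaces the whole section from the base file. If nested keys should be merged too, a small recursive mold can do it. A sketch in plain Python, assuming Monty supports the built-ins it uses (isinstance, dict, recursion); deep_merge is a name chosen here, not a fimod built-in.

```python
def deep_merge(base, override):
    # Recursively merge override into a copy of base; override wins on conflicts
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def transform(data, **_):
    # data is the slurped list, ordered like the -i flags; later files win
    result = {}
    for doc in data:
        result = deep_merge(result, doc)
    return result
```

Saved as, say, deep_merge.py (a hypothetical file name), it would be used as fimod s -i base.yaml -i prod.toml -s -m deep_merge.py. Lists are replaced rather than concatenated, which is one design choice among several.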

Named mode: append : to a path to get a dict keyed by filename stem, which is clearer than an index when files have distinct roles:

# Merge base with prod overrides: roles are explicit, no need to count -i flags
fimod s -i base.yaml: -i prod.yaml: -s -e '
def transform(data, **_):
    data["base"].update(data["prod"])
    return data["base"]
'

Explicit aliases, for when two files share the same name:

# Read max_requests from same-named configs in sibling directories
fimod s -i eu/limits.toml:eu -i us/limits.toml:us -s \
  -e '{ region: v["max_requests"] for region, v in data.items() }'

The mold runs once on the combined result. Works across formats (JSON + YAML + TOML + CSV…).
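The key rules described above (a trailing : keys the entry by filename stem, :alias sets the key explicitly, and no colon keeps the entry positional) can be modeled in a few lines. slurp_key is a hypothetical helper written for illustration; it only handles POSIX-style paths and URLs, and it is not fimod's parser.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def slurp_key(spec: str) -> tuple[str, str | None]:
    # Hypothetical parser for one -i value in slurp mode
    path, sep, alias = spec.rpartition(":")
    if not sep or "/" in alias:
        # No colon, or the colon belongs to a URL scheme: positional entry
        return spec, None
    # "file.ext:" -> key is the filename stem; "file.ext:alias" -> key is the alias
    return path, alias or PurePosixPath(path).stem


for spec in ["base.yaml:", "eu/limits.toml:eu", "users.json", "https://example.com/a.json"]:
    print(spec, "->", slurp_key(spec))
```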

⛓️ Chaining

Multiple -e expressions form an in-process pipeline, where the result of each step becomes data for the next:

fimod s -i data.json \
  -e '[u for u in data if u["age"] > 18]' \
  -e 'it_sort_by(data, "name")' \
  -e '[{"name": u["name"], "hash": hs_sha256(u["email"])} for u in data]'
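Conceptually, the chain above is a left fold: each expression is evaluated with data bound to the previous step's result. The CPython sketch below models that data flow with eval; it is an illustration under that assumption, not how fimod runs expressions (fimod evaluates them in its embedded Monty runtime). run_chain and the it_sort_by stand-in are hypothetical, and eval is used only for the illustration; never use it on untrusted input.

```python
def it_sort_by(items, key):
    # Hypothetical stand-in for fimod's Rust helper
    return sorted(items, key=lambda item: item[key])


def run_chain(data, expressions, helpers):
    # Each expression sees the previous result as `data`
    for expr in expressions:
        data = eval(expr, {**helpers, "data": data})
    return data


users = [
    {"name": "zoe", "age": 34},
    {"name": "max", "age": 12},
    {"name": "ann", "age": 52},
]
result = run_chain(
    users,
    ['[u for u in data if u["age"] > 18]', 'it_sort_by(data, "name")'],
    {"it_sort_by": it_sort_by},
)
print(result)  # [{'name': 'ann', 'age': 52}, {'name': 'zoe', 'age': 34}]
```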


🧰 Built-in helpers - no import needed

| Family | Functions | Example |
|---|---|---|
| re_* | search, match, findall, sub, split | re_sub(r"(\w+)@(\w+)", r"\2/\1", text) |
| re_*_fancy | same, with fancy-regex $1/${name} replacement syntax | re_sub_fancy(r"(\w+)@(\w+)", "$2/$1", text) |
| dp_* | get, set (nested dot-path) | dp_set(data, "server.port", 8080) |
| it_* | sort_by, group_by, unique, flatten, ... | it_group_by(data, "status") |
| hs_* | md5, sha1, sha256 | hs_sha256(data["email"]) |
| msg_* | print, info, warn, error (to stderr) | msg_warn("low coverage") |
| gk_* | fail, assert, warn (validation gates) | gk_assert(data.get("version"), "missing version") |
| env_subst | ${VAR} substitution in templates | env_subst("Hello ${NAME}", env) |

Helpers are implemented in Rust. Regex patterns use fancy-regex, a Rust regex engine that adds backreferences and lookaround to Rust regex syntax; it is not Python's re module. re_sub accepts Python-style \1/\g<name> replacements; re_sub_fancy uses $1/${name}.
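To show the idea behind the dp_* family, here is a pure-Python sketch of dot-path access. The _sketch names are hypothetical, and the details (no list indices, intermediate dicts created on set, in-place mutation) are assumptions for illustration; see the Built-ins Reference for fimod's actual rules.

```python
def dp_get_sketch(data, path, default=None):
    # Walk "a.b.c" one key at a time; return default if any key is missing
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def dp_set_sketch(data, path, value):
    # Create missing intermediate dicts (assumes existing ones are dicts), then set the leaf in place
    *parents, leaf = path.split(".")
    current = data
    for key in parents:
        current = current.setdefault(key, {})
    current[leaf] = value
    return data


config = {"server": {"host": "0.0.0.0"}}
dp_set_sketch(config, "server.port", 8080)
print(config)  # {'server': {'host': '0.0.0.0', 'port': 8080}}
print(dp_get_sketch(config, "server.port"))  # 8080
print(dp_get_sketch(config, "server.tls.cert", "none"))  # none
```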

📦 Reusable molds & registries

A mold is a Python file that defines a transform(data, **_) function. When a mold needs context, add named parameters before **_, for example args or pipeline. Keeping **_ is the recommended convention for reusable molds: fimod passes context as keyword arguments, and **_ absorbs any the mold does not name.

# normalize.py
def transform(data, **_):
    return [{"name": u["name"].strip().title(), "email": u["email"].lower()} for u in data]

# Use a local mold
fimod s -i users.json -m normalize.py

# Use a remote mold - fetched and executed on the fly
fimod s -i users.json -m https://example.com/transforms/normalize.py
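Named context parameters are how a mold reads --arg values. A minimal sketch of a local dedup mold, assuming args arrives as a dict of string values keyed by --arg name (check Mold Scripting for the exact shape). The file name dedup_sketch.py and the default field "id" are hypothetical, and this is not the source of the registry's @dedup_by.

```python
def transform(data, args=None, **_):
    # Assumption: --arg field=email arrives as args == {"field": "email"}
    field = (args or {}).get("field", "id")
    seen = set()
    unique = []
    for record in data:
        value = record.get(field)
        # Keep the first record for each value of the chosen field
        if value not in seen:
            seen.add(value)
            unique.append(record)
    return unique
```

It would be invoked as fimod s -i data.json -m dedup_sketch.py --arg field=email. Field values must be hashable (strings, numbers), which covers typical dedup keys.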

Registries are named collections of molds (local directories or GitHub/GitLab repos). The @ prefix resolves molds from registries:

fimod registry add team https://github.com/myorg/molds --default
fimod s -i data.csv -m @clean_csv          # from default registry
fimod s -i data.csv -m @team/clean_csv     # explicit registry
fimod mold list                           # browse available molds
fimod mold show @clean_csv               # inspect metadata & defaults
Private registry with token

For private GitHub/GitLab repos, fimod automatically uses $GITHUB_TOKEN or $GITLAB_TOKEN:

# 1. Export your token (add to .bashrc/.zshrc for persistence)
export GITHUB_TOKEN=ghp_xxx

# 2. Add a private registry
fimod registry add corp https://github.com/myorg/private-molds --default

# 3. Use molds; the token is picked up automatically
fimod s -i data.json -m @corp/sanitize

# Verify token is detected
fimod registry show corp
#   Token:   $GITHUB_TOKEN (auto) — set ✓

You can also use a custom env var per registry:

fimod registry add corp https://github.com/myorg/private-molds --token-env CORP_TOKEN
export CORP_TOKEN=ghp_yyy

For CI and other ephemeral environments, use FIMOD_REGISTRY instead of fimod registry add:

FIMOD_REGISTRY=./molds fimod s -i data.json -m @clean
FIMOD_REGISTRY="ci=./molds,staging=https://github.com/org/molds" fimod s -i data.json -m @ci/clean

fimod ships with a built-in mold catalog covering common tasks (CSV stats, JSON schema extraction, key renaming, PII anonymization, and more).

🔥 HTTP input (goodbye curl | jq)

The -i flag accepts URLs just like file paths. No curl, no wget, no pipes. Fimod fetches, parses, and transforms in a single command.

# Fetch and transform in one shot - replaces curl | jq
fimod s -i https://api.github.com/repos/pytgaen/fimod -e 'data["name"] + ": " + str(data["stargazers_count"]) + " stars"' --output-format txt

# Hit authenticated APIs with custom headers
fimod s -i https://api.github.com/user/repos \
    --http-header "Authorization: Bearer $GITHUB_TOKEN" \
    -e '[r["full_name"] for r in data]'

# 👀 Download binaries - bypass the transform pipeline entirely
fimod s -i https://example.com/archive.tar.gz --output-format raw -O


Powered by reqwest with rustls/AWS-LC, and proxy-aware out of the box (HTTP_PROXY / HTTPS_PROXY / NO_PROXY). The input format is detected automatically from the response's Content-Type header. Use --input-format http for full access to status codes and response headers.

Included in the standard and fast variants. Use FIMOD_VARIANT=slim to exclude HTTP support.

🛡️ Security model

Mold scripts run under a zero-authorization sandbox by default. All I/O stays in Rust: a mold cannot read or write files, reach the network, or inspect the host process. Every fimod s invocation also enforces hard limits (max_duration = 2m, max_memory = 1GB) and exits with code 137 on violation. Safe for remote and untrusted molds.

Opt in to the bits your molds need by writing ~/.config/fimod/sandbox.toml:

[sandbox]
allow_clock  = true              # enable datetime.now() / date.today()
max_duration = "2m"
max_memory   = "1GB"
allow_env    = ["LANG", "TZ_*"]  # glob-matched os.getenv() keys

Bootstrap it with fimod setup sandbox defaults --yes, then tune. Override per-invocation with --sandbox-file <path>; force zero-authorization with --sandbox-file="". See Sandbox policy for the full reference.
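To reason about which keys an allow_env list lets through, Python's fnmatch gives a close approximation of glob matching. This is a sketch under assumptions: that fimod matches whole key names, case-sensitively, with shell-style * wildcards. env_allowed is a hypothetical helper, not part of fimod.

```python
from fnmatch import fnmatchcase

ALLOW_ENV = ["LANG", "TZ_*"]  # mirrors the allow_env example above


def env_allowed(name, patterns=ALLOW_ENV):
    # Case-sensitive glob match against every pattern; deny if none match
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for name in ["LANG", "TZ_OFFSET", "TZ", "HOME", "lang"]:
    print(name, env_allowed(name))
# LANG True, TZ_OFFSET True, TZ False, HOME False, lang False
```

Note that TZ_* does not admit plain TZ: the pattern requires the underscore.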

⚙️ How it works

 Input                  Python transform           Output
┌──────────────┐       ┌───────────────────┐      ┌──────────────┐
│ file / stdin │       │                   │      │ JSON / YAML  │
│ https://...  │──────▶│  your transform   │─────▶│ TOML / CSV   │
│ JSON / YAML  │ Rust  │  runs in Monty    │ Rust │ NDJSON / TXT │
│ TOML / CSV   │ parse │  (embedded Python)│ ser. └──────────────┘
│ NDJSON / TXT │       └───────────────────┘
└──────────────┘
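The three stages map onto a familiar parse → transform → serialize loop. This CPython sketch mirrors the hero example (JSON in, CSV out), using the standard json and csv modules as stand-ins for fimod's Rust parsers and serializers; run is a hypothetical name, and the sketch illustrates the data flow only.

```python
import csv
import io
import json


def transform(data, **_):
    # Mold step: keep only active users (fimod would run this in Monty)
    return [u for u in data if u["active"]]


def run(input_text):
    # Parse: fimod does this in Rust; json.loads stands in here
    data = json.loads(input_text)
    # Transform: call the mold with the parsed value
    data = transform(data)
    if not data:
        return ""
    # Serialize: fimod does this in Rust; csv.DictWriter stands in here
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(data[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(data)
    return out.getvalue()


users = '[{"name": "ann", "active": true}, {"name": "bob", "active": false}]'
print(run(users), end="")
# name,active
# ann,True
```

Keeping parsing and serialization outside the transform is what lets the same mold work unchanged whether the input was YAML, TOML, or JSON.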

📖 Documentation

| 📚 Guides | 🔧 Reference |
|---|---|
| Quick Start | Formats - JSON, YAML, TOML, CSV, TXT, Lines, NDJSON, HTTP |
| Concepts | Built-ins - re_*, dp_*, it_*, hs_*, msg_*, gk_*, env_subst |
| Mold Scripting | Mold Defaults - # fimod: directives |
| CLI Reference | Exit Codes - --check and set_exit() |
| Authoring Molds | Cookbook 🍳 |
| AI Integration & Agents 🤖 | Agent Skill ✨ |

⚠️ Project Status

🧠 Designed by humans, built by AI.

fimod is young software. AI accelerates the velocity; humans own the architecture.

Design decisions, invariants, and architectural boundaries stay explicit; see notes/ for the vision, architecture map, and design log. The discipline you see in the code (cargo deny, #[must_use], layered serde/Monty boundary, conventional commits, ~500 tests + e2e fixtures) is intentional; the speed is the AI.

  • Monty (the embedded Python runtime) is an early-stage project by Pydantic. It is not CPython, and its API may change between releases.
  • fimod depends directly on Monty and inherits that instability. Expect breaking changes as both projects mature.
  • Versioning follows Semantic Release - breaking changes bump the major version.
  • Mold scripts can use Python syntax, common built-ins, and selected stdlib modules, but not arbitrary PyPI packages or full stdlib parity.
  • Built-in helpers (re_*, dp_*, it_*, hs_*, tpl_*, msg_*, gk_*, env_subst) are implemented in Rust as part of fimod's data-shaping API. In particular, regex functions use fancy-regex syntax (Rust regex syntax with PCRE-style extensions such as backreferences and lookaround), not Python's re module - see Built-ins Reference.

Note

Regex: Fimod built-ins vs Monty's re module

Fimod was originally built on Monty v0.0.6, which had no regex support. We introduced re_search, re_sub, re_findall, etc. as Fimod built-in functions to fill that gap, a good example of the challenges of moving fast alongside a young runtime.

Since Monty v0.0.8, import re works: Monty implements a subset of Python's re module. Both approaches now work side by side:

  • Fimod's re_* built-ins: direct access to fancy-regex, including advanced features like variable-length lookbehind/lookahead
  • import re β€” familiar Python API, but only partially implemented in Monty (also backed by fancy-regex under the hood)

The re_* built-ins are here to stay for the foreseeable future (at least until late 2027). As Monty's re module matures, we'll reconsider.

Since import re is already well-known to Python developers, the documentation focuses on the re_* built-ins which are specific to Fimod.

📄 License

Apache License 2.0 - see LICENSE.txt.

About

🏗️ fimod - the data shaper CLI. Transform JSON, YAML, TOML, CSV with Python expressions. No Python install, no deps: single Rust binary 🪶
