Follow-up to the initial import-weight work, after rebasing it onto main:
- fileExplorer/funcs.py raises FlowfileHTTPException instead of fastapi's
HTTPException (new catalog notebook code had pulled fastapi back onto
the frame import path); catch sites and tests updated.
- Defer pyarrow (arrow_reader, flow_data_engine, flow_graph, flow_node
models/state, delta_utils), openpyxl+numpy (read_excel_tables via
create/funcs and flow_graph), requests + websockets (subprocess
operations, sample_users), yaml (flow_graph save, io_flowfile,
notebook_store), docker + httpx (kernel manager via kernel package
and execution), confluent_kafka (shared.kafka package init and
flow_graph), cryptography (secret_manager, auth/secrets), passlib
(settings.PWD_CONTEXT, now lazy and unused in-repo), fastexcel and
the legacy pickle schema map.
- Resolve CatalogService lazily: catalog package __getattr__ plus local
imports in flow_graph and flowfile_frame catalog_reference, keeping
the catalog schema/serializer pydantic build (~150ms) off the frame
import path.
- Extend the test_lazy_imports contract with the newly banned modules.
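The two ideas in the bullets above, resolving a name lazily through a package-level __getattr__ and then asserting that heavy modules stay unloaded, can be sketched together. This is a minimal illustration only, not the project's actual test: demo_pkg, demo_heavy, Service, and loaded_banned_modules are hypothetical names, and demo_heavy stands in for dependencies such as pyarrow or fastapi.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical package: `demo_pkg.Service` is resolved lazily (PEP 562), and
# only `demo_pkg.service` imports the "heavy" stand-in module `demo_heavy`.
INIT_SRC = '''\
def __getattr__(name):  # called only when normal attribute lookup fails
    if name == "Service":
        from .service import Service  # deferred until first access
        return Service
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

SERVICE_SRC = '''\
import demo_heavy  # stand-in for an expensive dependency


class Service:
    value = demo_heavy.VALUE
'''


def build_demo_package(root: Path) -> None:
    """Write the throwaway package and its heavy dependency under `root`."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT_SRC)
    (pkg / "service.py").write_text(SERVICE_SRC)
    (root / "demo_heavy.py").write_text("VALUE = 42\n")


def loaded_banned_modules(statement, banned, search_path):
    """Run `statement` in a fresh interpreter; return banned modules it loaded."""
    probe = (
        "import sys; sys.path.insert(0, {path!r}); {stmt}; "
        "print(','.join(sorted(set(sys.modules) & {banned!r})))"
    ).format(path=str(search_path), stmt=statement, banned=set(banned))
    out = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True, text=True, check=True,
    )
    return [name for name in out.stdout.strip().split(",") if name]


def run_contract_demo():
    """Check the contract twice: on plain import, then after touching Service."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_package(root)
        banned = {"demo_heavy"}
        on_import = loaded_banned_modules("import demo_pkg", banned, root)
        on_access = loaded_banned_modules(
            "import demo_pkg; demo_pkg.Service", banned, root
        )
    return on_import, on_access
```

Running the check in a fresh interpreter matters: inside a long-lived test process, an earlier test may already have imported the heavy module, which would make an in-process sys.modules check unreliable.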
import flowfile_frame: ~2.2s on main, ~1.5s after the first pass,
~0.93s now (python 3.11, warm cache).
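For reference, wall-clock numbers like these can be reproduced roughly as follows. This is a sketch under assumptions, not the measurement procedure used for this commit: time_import is a hypothetical helper, and taking the median over a few fresh interpreters is just one reasonable way to approximate a warm-cache figure.

```python
import statistics
import subprocess
import sys


def time_import(module: str, runs: int = 5) -> float:
    """Median wall-clock seconds to import `module` in a fresh interpreter."""
    probe = (
        "import time; t0 = time.perf_counter(); "
        f"import {module}; "
        "print(time.perf_counter() - t0)"
    )
    samples = []
    for _ in range(runs):
        out = subprocess.run(
            [sys.executable, "-c", probe],
            capture_output=True, text=True, check=True,
        )
        # Use the last stdout line in case the module prints while importing.
        samples.append(float(out.stdout.strip().splitlines()[-1]))
    return statistics.median(samples)


if __name__ == "__main__":
    # Replace "json" with the package under test, e.g. "flowfile_frame".
    print(f"{time_import('json'):.3f}s")
```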
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NH1z1i8P9uXhY8KAL8QvN2
Summary
Makes the Python package much faster to import.
import flowfile_frame drops from ~2.2s on main to ~0.93s (the earlier improvement/flowfile-smaller-to-import work, rebased in here, had gotten it to ~1.48s). This PR contains that original commit rebased onto latest main, plus a second round of import-weight reduction.
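To see which modules dominate an import, CPython's -X importtime flag prints per-module self and cumulative times to stderr. The helper below is a hedged sketch of how such a breakdown could be collected; import_breakdown is a hypothetical name, and nothing here claims to be the tooling used for this PR.

```python
import subprocess
import sys


def import_breakdown(module: str, top: int = 10):
    """Return the `top` (cumulative_us, name) rows from `python -X importtime`."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        # Row shape: "import time: <self us> | <cumulative us> | <module name>"
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative_us = int(parts[1])
        except ValueError:
            continue  # the header row has text instead of numbers
        rows.append((cumulative_us, parts[2].strip()))
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    # Replace "json" with the package under test, e.g. "flowfile_frame".
    for cumulative_us, name in import_breakdown("json"):
        print(f"{cumulative_us / 1000:8.1f} ms  {name}")
```

Cumulative time includes everything a module pulls in, so the top rows usually point at the import chains worth deferring rather than at a single slow file.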
Changes
Rebase of improvement/flowfile-smaller-to-import onto main
- FlowfileHTTPException (fastapi-free) now composes with main's new validate_connection, catalog object-storage storage_options, and RBAC changes.
- [...] ensure_db_initialized) instead of running on import flowfile_core.
FastAPI regression fix
New catalog notebook code had pulled fastapi back onto the frame import path via fileExplorer/funcs.py; its path-security helpers now raise FlowfileHTTPException (catch sites and tests updated).
Newly deferred off the import path
- pyarrow: utils/arrow_reader, flow_data_engine, flow_graph, flow_node/models + state, catalog/delta_utils, kafka consumer
- openpyxl + numpy: read_excel_tables now loads only on the excel read/schema paths
- requests + websockets: subprocess_operations (lazy module proxy; every use is a worker round-trip), streaming, sample_users
- flowfile_core.catalog resolves CatalogService via PEP 562 __getattr__; flow_graph and flowfile_frame.catalog_reference import it at call sites
- docker + httpx: [...] manager.py eagerly
- confluent_kafka: shared.kafka package init is now lazy; flow_graph imports kafka helpers inside add_kafka_source
- [...] settings.PWD_CONTEXT, still resolvable lazily), fastexcel, legacy .flowfile migration schemas
Test hardening
The test_lazy_imports.py banned-module list grows from 8 to 17, so regressions fail CI.
Verification
- flowfile_core suite: 4745 passed (remaining failures are pre-existing/environment-specific: one test assumes a non-root home dir, three project-route tests are ordering flakes that pass in isolation and on the base commit).
- make check_stubs in sync; ruff matches main's baseline.
Remaining import cost (structural, not addressed)
polars (~125ms), sqlalchemy (~135ms, model declarations), the input_schema pydantic build (~115ms), and third-party plugin inits (pl_fuzzy_frame_match etc.).
🤖 Generated with Claude Code
https://claude.ai/code/session_01NH1z1i8P9uXhY8KAL8QvN2