diff --git a/docs/concepts/index.md b/docs/concepts/index.md
index 48504f68..5a066352 100644
--- a/docs/concepts/index.md
+++ b/docs/concepts/index.md
@@ -417,6 +417,8 @@ classDiagram
     class IngestionParams {
         +clear_data: bool
         +n_cores: int
+        +resources: list[str]?
+        +vertices: list[str]?
         +batch_size: int
         +max_items: int?
         +dry: bool
diff --git a/docs/examples/example-5.md b/docs/examples/example-5.md
index 8dc0b01a..b2c100be 100644
--- a/docs/examples/example-5.md
+++ b/docs/examples/example-5.md
@@ -776,6 +776,38 @@ bindings = Bindings(
 )
 ```
 
+### Runtime proxy wiring (no secrets in YAML)
+
+Once your `bindings` block contains `connector_connection` proxy labels, you must
+register the real runtime connection config under each `conn_proxy` and bind the
+manifest connectors to that proxy:
+
+```python
+from graflo.hq.connection_provider import (
+    InMemoryConnectionProvider,
+    PostgresGeneralizedConnConfig,
+)
+from graflo.hq import IngestionParams
+
+provider = InMemoryConnectionProvider()
+provider.register_generalized_config(
+    conn_proxy="postgres_source",
+    config=PostgresGeneralizedConnConfig(config=postgres_conf),
+)
+provider.bind_from_bindings(bindings=bindings)
+
+engine.define_and_ingest(
+    manifest=manifest.model_copy(update={"bindings": bindings}),
+    target_db_config=conn_conf,
+    ingestion_params=IngestionParams(clear_data=True),
+    recreate_schema=False,
+    connection_provider=provider,
+)
+```
+
+For the common single-proxy case you can replace the `register_generalized_config(...)` +
+`bind_from_bindings(...)` steps with `provider.bind_single_config_for_bindings(...)`.
+
 ## Viewing Results in Graph Database Web Interfaces
 
 After successful ingestion, you can explore your graph data using each database's web interface. The default ports and access information are listed below. Check the corresponding `.env` files in the `docker/` directories for custom port configurations.
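Reviewer note: the connector → `conn_proxy` → runtime-config indirection registered above can be pictured with a small self-contained sketch. All names below (`TinyConnectionProvider`, `ConnConfig`) are hypothetical stand-ins for illustration only — they are not graflo's actual classes or API.

```python
# Hypothetical, minimal stand-in for the provider pattern described above:
# proxy labels in the manifest resolve to real connection configs at runtime.
from dataclasses import dataclass


@dataclass
class ConnConfig:
    """Stand-in for a real runtime connection config (host, credentials, ...)."""
    dsn: str


class TinyConnectionProvider:
    """Illustrative only: maps conn_proxy labels to configs, then binds connectors."""

    def __init__(self) -> None:
        self._configs: dict[str, ConnConfig] = {}   # conn_proxy -> config
        self._bindings: dict[str, str] = {}         # connector  -> conn_proxy

    def register(self, conn_proxy: str, config: ConnConfig) -> None:
        self._configs[conn_proxy] = config

    def bind(self, connector: str, conn_proxy: str) -> None:
        self._bindings[connector] = conn_proxy

    def resolve(self, connector: str) -> ConnConfig:
        # Connector -> proxy label -> runtime config: the chain the docs describe.
        return self._configs[self._bindings[connector]]


provider = TinyConnectionProvider()
provider.register("postgres_source", ConnConfig(dsn="postgresql://localhost/demo"))
provider.bind("users", "postgres_source")
print(provider.resolve("users").dsn)  # postgresql://localhost/demo
```

The point of the indirection is that the manifest only ever mentions the label `"postgres_source"`; the credentials live solely in the runtime registration step.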
diff --git a/docs/examples/example-8.md b/docs/examples/example-8.md
new file mode 100644
index 00000000..f562111b
--- /dev/null
+++ b/docs/examples/example-8.md
@@ -0,0 +1,25 @@
+# Example 8: Multi-Edge Weights with Filters and `dress` Transforms
+
+This example ingests ticker CSV data into Neo4j with:
+
+- **Two vertex types** — `ticker` (by `oftic`) and `metric` (by `name` + `value`), where `metric` rows are filtered so only Open, Close, and Volume with positive values become vertices.
+- **One edge** — `ticker` → `metric` with **multiple weights** (`direct` on `t_obs` plus nested `vertices` metadata on the metric endpoint).
+- **Transforms with `dress`** — `round_str` and `int` transforms targeted at specific `(name, value)` pairs via `dress: { key: name, value: value }`, plus a date parse that emits `t_obs`.
+
+## Layout
+
+- `examples/8-multi-edges-weights/manifest.yaml` — logical schema, DB profile (Neo4j indexes, edge specs), transforms, and `ticker_data` resource pipeline.
+- `examples/8-multi-edges-weights/ingest.py` — `FileConnector` + `Bindings`, then `GraphEngine.define_and_ingest(...)`.
+- `examples/8-multi-edges-weights/data.csv` — sample OHLCV-style rows.
+
+## Run locally
+
+From the example directory, with Neo4j running (see repo `docker/neo4j`), run:
+
+```bash
+uv run python ingest.py
+```
+
+## Related
+
+- [Polymorphic routing (Example 7)](example-7.md) uses `vertex_router` / `edge_router` for type-discriminated tables; this example uses **filters** on a vertex type and **multi-weight** edges instead.
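Reviewer note: the filter and `dress`-scoped transform behaviour the new page describes can be sketched in plain Python. This is illustrative only — the function names and row shapes below are hypothetical, and the real filtering/transform semantics are defined declaratively in `manifest.yaml`, not in code like this.

```python
# Illustrative sketch of the row-level logic the manifest expresses declaratively:
# keep only Open/Close/Volume metrics with positive values, then apply a
# transform scoped to specific (name, value) pairs (akin to `dress`).
ALLOWED = {"Open", "Close", "Volume"}


def keep_metric(row: dict) -> bool:
    # Only Open/Close/Volume rows with positive values become `metric` vertices.
    return row["name"] in ALLOWED and row["value"] > 0


def dress_transform(row: dict) -> dict:
    # Name-scoped transforms: round price-like values, cast volume to int.
    if row["name"] in {"Open", "Close"}:
        return {**row, "value": f"{row['value']:.2f}"}
    if row["name"] == "Volume":
        return {**row, "value": int(row["value"])}
    return row


rows = [
    {"name": "Open", "value": 101.2345},
    {"name": "Volume", "value": 0},    # dropped: non-positive value
    {"name": "High", "value": 103.0},  # dropped: not in the allowed set
]
kept = [dress_transform(r) for r in rows if keep_metric(r)]
print(kept)  # [{'name': 'Open', 'value': '101.23'}]
```

Filtering happens before the transform, mirroring the pipeline order the example's resource definition implies.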
diff --git a/docs/examples/example-9.md b/docs/examples/example-9.md
new file mode 100644
index 00000000..8c6f5997
--- /dev/null
+++ b/docs/examples/example-9.md
@@ -0,0 +1,64 @@
+# Example 9: Explicit `connector_connection` Proxy Wiring
+
+This example shows the full proxy chain end-to-end:
+
+`Resource -> Connector -> ConnectionProxy -> RuntimeConnectionConfig`
+
+The manifest stays credential-free: `bindings.connector_connection` only contains proxy labels (`conn_proxy`). The script then registers the real connection config at runtime.
+
+## Manifest: what `connector_connection` looks like
+
+Inside `bindings` you explicitly map each connector to a proxy label:
+
+```yaml
+bindings:
+  connector_connection:
+    - connector: users
+      conn_proxy: postgres_source
+    - connector: products
+      conn_proxy: postgres_source
+    - connector: purchases
+      conn_proxy: postgres_source
+    - connector: follows
+      conn_proxy: postgres_source
+```
+
+In the code, connectors omit `connector.name` and use `connector.resource_name` (so the manifest references are stable and human-readable).
+
+## Runtime: how the proxy label becomes a real DB config
+
+The script wires runtime config and binds the manifest connectors to the chosen proxy:
+
+```python
+from graflo.hq.connection_provider import (
+    InMemoryConnectionProvider,
+    PostgresGeneralizedConnConfig,
+)
+
+provider = InMemoryConnectionProvider()
+
+provider.register_generalized_config(
+    conn_proxy="postgres_source",
+    config=PostgresGeneralizedConnConfig(config=postgres_conf),
+)
+
+provider.bind_from_bindings(bindings=bindings)
+```
+
+For the common single-DB / single-proxy case, you can also use:
+
+```python
+provider.bind_single_config_for_bindings(
+    bindings=bindings,
+    conn_proxy="postgres_source",
+    config=PostgresGeneralizedConnConfig(config=postgres_conf),
+)
+```
+
+## Full script
+
+See:
+
+- `examples/9-connector-connection-proxy/explicit_proxy_binding.py`
+- `examples/9-connector-connection-proxy/README.md`
+
diff --git a/docs/examples/index.md b/docs/examples/index.md
index 826e2d36..47fc754e 100644
--- a/docs/examples/index.md
+++ b/docs/examples/index.md
@@ -6,4 +6,6 @@
 4. [Neo4j Ingestion with Dynamic Relations from Keys](example-4.md)
 5. **[🚀 PostgreSQL Schema Inference and Ingestion](example-5.md)** - **Automatically infer graph schemas from normalized PostgreSQL databases (3NF)** with proper primary keys (PK) and foreign keys (FK). Uses intelligent heuristics to detect vertices and edges - no manual schema definition needed! Perfect for migrating relational data to graph databases.
 6. **[🔗 RDF / Turtle Ingestion with Explicit Resource Mapping](example-6.md)** - **Infer graph schemas from OWL ontologies and ingest RDF data** using explicit `SparqlConnector` resource mapping. Supports local Turtle files and remote SPARQL endpoints. Perfect for knowledge graph pipelines built on semantic web standards.
-7. **[Polymorphic Objects and Relations](example-7.md)** — **Route polymorphic entities and dynamic relations** using `vertex_router` and `edge_router`. One objects table (Person, Vehicle, Institution) and one relations table (EMPLOYED_BY, OWNS, FUNDS, etc.) map to a rich graph with type discriminators and `relation_map`.
\ No newline at end of file
+7. **[Polymorphic Objects and Relations](example-7.md)** — **Route polymorphic entities and dynamic relations** using `vertex_router` and `edge_router`. One objects table (Person, Vehicle, Institution) and one relations table (EMPLOYED_BY, OWNS, FUNDS, etc.) map to a rich graph with type discriminators and `relation_map`.
+8. **[Multi-Edge Weights with Filters and `dress` Transforms](example-8.md)** — **Ticker-style CSV to Neo4j** with vertex filters, multiple edge weights, and `dress`-scoped transforms on metric name/value pairs.
+9. **[Explicit `connector_connection` Proxy Wiring](example-9.md)** — Shows how manifest proxy labels (`conn_proxy`) are resolved at runtime into real DB configs via `ConnectionProvider`.
\ No newline at end of file
diff --git a/docs/getting_started/creating_manifest.md b/docs/getting_started/creating_manifest.md
index 72f17d36..04512145 100644
--- a/docs/getting_started/creating_manifest.md
+++ b/docs/getting_started/creating_manifest.md
@@ -91,6 +91,30 @@ The block can be left empty in-file (`bindings: {}`) and supplied at runtime for
 
 Use `bindings` for **where data comes from** (and optionally **which proxy label** supplies runtime credentials for each SQL/SPARQL connector).
 
+### Runtime proxy wiring (example)
+
+The manifest contains proxy labels only. At runtime you register the real connection config and bind manifest connectors to those proxy labels:
+
+```python
+from graflo.hq.connection_provider import (
+    InMemoryConnectionProvider,
+    PostgresGeneralizedConnConfig,
+)
+
+provider = InMemoryConnectionProvider()
+provider.bind_single_config_for_bindings(
+    bindings=bindings,
+    conn_proxy="postgres_source",
+    config=PostgresGeneralizedConnConfig(config=postgres_conf),
+)
+
+engine.define_and_ingest(
+    manifest=manifest,
+    target_db_config=target_db_config,
+    connection_provider=provider,
+)
+```
+
 ## Authoring tips
 
 - Keep resource names unique across `ingestion_model.resources`.
diff --git a/docs/getting_started/quickstart.md b/docs/getting_started/quickstart.md
index a7246870..fb296f17 100644
--- a/docs/getting_started/quickstart.md
+++ b/docs/getting_started/quickstart.md
@@ -135,6 +135,30 @@ Here `schema` defines the logical graph, while `ingestion_model` defines resourc
 
 For SQL and SPARQL sources, add **`connector_connection`**: a list of `{"connector": "", "conn_proxy": "