2 changes: 2 additions & 0 deletions docs/concepts/index.md
@@ -417,6 +417,8 @@ classDiagram
class IngestionParams {
+clear_data: bool
+n_cores: int
+resources: list[str]?
+vertices: list[str]?
+batch_size: int
+max_items: int?
+dry: bool
32 changes: 32 additions & 0 deletions docs/examples/example-5.md
@@ -776,6 +776,38 @@ bindings = Bindings(
)
```

### Runtime proxy wiring (no secrets in YAML)

Once your `bindings` block contains `connector_connection` proxy labels, you must
register the real runtime connection config under each `conn_proxy` and bind the
manifest connectors to that proxy:

```python
from graflo.hq.connection_provider import (
InMemoryConnectionProvider,
PostgresGeneralizedConnConfig,
)
from graflo.hq import IngestionParams

provider = InMemoryConnectionProvider()
provider.register_generalized_config(
conn_proxy="postgres_source",
config=PostgresGeneralizedConnConfig(config=postgres_conf),
)
provider.bind_from_bindings(bindings=bindings)

engine.define_and_ingest(
manifest=manifest.model_copy(update={"bindings": bindings}),
target_db_config=conn_conf,
ingestion_params=IngestionParams(clear_data=True),
recreate_schema=False,
connection_provider=provider,
)
```

For the common single-proxy case you can replace the `register_generalized_config(...)` +
`bind_from_bindings(...)` steps with `provider.bind_single_config_for_bindings(...)`.

## Viewing Results in Graph Database Web Interfaces

After successful ingestion, you can explore your graph data using each database's web interface. The default ports and access information are listed below. Check the corresponding `.env` files in the `docker/` directories for custom port configurations.
25 changes: 25 additions & 0 deletions docs/examples/example-8.md
@@ -0,0 +1,25 @@
# Example 8: Multi-Edge Weights with Filters and `dress` Transforms

This example ingests ticker CSV data into Neo4j with:

- **Two vertex types** — `ticker` (by `oftic`) and `metric` (by `name` + `value`), where `metric` rows are filtered so only Open, Close, and Volume with positive values become vertices.
- **One edge** — `ticker` → `metric` with **multiple weights** (`direct` on `t_obs` plus nested `vertices` metadata on the metric endpoint).
- **Transforms with `dress`** — `round_str` and `int` transforms targeted at specific `(name, value)` pairs via `dress: { key: name, value: value }`, plus a date parse that emits `t_obs`.
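
The filter and `dress` targeting described above can be mimicked with a small self-contained sketch (plain Python, not the graflo API; the helper names are invented for illustration):

```python
# Stand-in for the two behaviors above: a vertex filter that keeps only
# Open/Close/Volume metrics with positive values, and a dress-style transform
# applied per (name, value) pair, mimicking `dress: { key: name, value: value }`.

def keep_metric(row):
    return row["name"] in {"Open", "Close", "Volume"} and row["value"] > 0

def dress_transform(row, targets):
    # Look up a transform keyed on the row's `name` and rewrite its `value`.
    fn = targets.get(row["name"])
    if fn is not None:
        row = {**row, "value": fn(row["value"])}
    return row

rows = [
    {"name": "Open", "value": 101.2345},
    {"name": "Volume", "value": 5300.0},
    {"name": "High", "value": 102.0},   # not in the allow-list -> dropped
    {"name": "Close", "value": -1.0},   # non-positive value -> dropped
]

# Mimics the manifest's round_str / int transforms scoped by metric name.
targets = {"Open": lambda v: round(v, 2), "Volume": int}
metrics = [dress_transform(r, targets) for r in rows if keep_metric(r)]

assert metrics == [
    {"name": "Open", "value": 101.23},
    {"name": "Volume", "value": 5300},
]
```

See `examples/8-multi-edges-weights/manifest.yaml` for the authoritative YAML syntax.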

## Layout

- `examples/8-multi-edges-weights/manifest.yaml` — logical schema, DB profile (Neo4j indexes, edge specs), transforms, and `ticker_data` resource pipeline.
- `examples/8-multi-edges-weights/ingest.py` — `FileConnector` + `Bindings`, then `GraphEngine.define_and_ingest(...)`.
- `examples/8-multi-edges-weights/data.csv` — sample OHLCV-style rows.

## Run locally

From the example directory, with Neo4j running (see repo `docker/neo4j`), run:

```bash
uv run python ingest.py
```

## Related

- [Polymorphic routing (Example 7)](example-7.md) uses `vertex_router` / `edge_router` for type-discriminated tables; this example uses **filters** on a vertex type and **multi-weight** edges instead.
64 changes: 64 additions & 0 deletions docs/examples/example-9.md
@@ -0,0 +1,64 @@
# Example 9: Explicit `connector_connection` Proxy Wiring

This example shows the full proxy chain end-to-end:

`Resource -> Connector -> ConnectionProxy -> RuntimeConnectionConfig`

The manifest stays credential-free: `bindings.connector_connection` only contains proxy labels (`conn_proxy`). The script then registers the real connection config at runtime.

## Manifest: what `connector_connection` looks like

Inside `bindings` you explicitly map each connector to a proxy label:

```yaml
bindings:
connector_connection:
- connector: users
conn_proxy: postgres_source
- connector: products
conn_proxy: postgres_source
- connector: purchases
conn_proxy: postgres_source
- connector: follows
conn_proxy: postgres_source
```

In the code, connectors omit `connector.name` and use `connector.resource_name` (so the manifest references are stable and human-readable).

## Runtime: how the proxy label becomes a real DB config

The script wires runtime config and binds the manifest connectors to the chosen proxy:

```python
from graflo.hq.connection_provider import (
InMemoryConnectionProvider,
PostgresGeneralizedConnConfig,
)

provider = InMemoryConnectionProvider()

provider.register_generalized_config(
conn_proxy="postgres_source",
config=PostgresGeneralizedConnConfig(config=postgres_conf),
)

provider.bind_from_bindings(bindings=bindings)
```

For the common single-DB / single-proxy case, you can also use:

```python
provider.bind_single_config_for_bindings(
bindings=bindings,
conn_proxy="postgres_source",
config=PostgresGeneralizedConnConfig(config=postgres_conf),
)
```
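
To see the indirection without a live database, the whole chain can be mimicked with a tiny self-contained stand-in (plain Python, not the graflo API; class and method names here are invented for illustration):

```python
# Stand-in that mimics the Resource -> Connector -> ConnectionProxy ->
# RuntimeConnectionConfig chain: proxy labels map to runtime configs, and
# connectors map to proxy labels, so no credentials ever live in the manifest.

class TinyProvider:
    def __init__(self):
        self._configs = {}   # conn_proxy label -> runtime config
        self._bindings = {}  # connector name -> conn_proxy label

    def register(self, conn_proxy, config):
        self._configs[conn_proxy] = config

    def bind(self, connector, conn_proxy):
        self._bindings[connector] = conn_proxy

    def resolve(self, connector):
        # connector -> proxy label -> real config
        return self._configs[self._bindings[connector]]


provider = TinyProvider()
provider.register("postgres_source", {"host": "localhost", "port": 5432})
for connector in ["users", "products", "purchases", "follows"]:
    provider.bind(connector, "postgres_source")

assert provider.resolve("users") == {"host": "localhost", "port": 5432}
```

All four connectors resolve through the same proxy label, which is exactly what the `connector_connection` block in the manifest expresses.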

## Full script

See:

- `examples/9-connector-connection-proxy/explicit_proxy_binding.py`
- `examples/9-connector-connection-proxy/README.md`

4 changes: 3 additions & 1 deletion docs/examples/index.md
@@ -6,4 +6,6 @@
4. [Neo4j Ingestion with Dynamic Relations from Keys](example-4.md)
5. **[🚀 PostgreSQL Schema Inference and Ingestion](example-5.md)** - **Automatically infer graph schemas from normalized PostgreSQL databases (3NF)** with proper primary keys (PK) and foreign keys (FK). Uses intelligent heuristics to detect vertices and edges - no manual schema definition needed! Perfect for migrating relational data to graph databases.
6. **[🔗 RDF / Turtle Ingestion with Explicit Resource Mapping](example-6.md)** - **Infer graph schemas from OWL ontologies and ingest RDF data** using explicit `SparqlConnector` resource mapping. Supports local Turtle files and remote SPARQL endpoints. Perfect for knowledge graph pipelines built on semantic web standards.
7. **[Polymorphic Objects and Relations](example-7.md)** — **Route polymorphic entities and dynamic relations** using `vertex_router` and `edge_router`. One objects table (Person, Vehicle, Institution) and one relations table (EMPLOYED_BY, OWNS, FUNDS, etc.) map to a rich graph with type discriminators and `relation_map`.
8. **[Multi-Edge Weights with Filters and `dress` Transforms](example-8.md)** — **Ticker-style CSV to Neo4j** with vertex filters, multiple edge weights, and `dress`-scoped transforms on metric name/value pairs.
9. **[Explicit `connector_connection` Proxy Wiring](example-9.md)** — Shows how manifest proxy labels (`conn_proxy`) are resolved at runtime into real DB configs via `ConnectionProvider`.
24 changes: 24 additions & 0 deletions docs/getting_started/creating_manifest.md
@@ -91,6 +91,30 @@ The block can be left empty in-file (`bindings: {}`) and supplied at runtime for

Use `bindings` for **where data comes from** (and optionally **which proxy label** supplies runtime credentials for each SQL/SPARQL connector).

### Runtime proxy wiring (example)

The manifest contains proxy labels only. At runtime you register the real connection config and bind manifest connectors to those proxy labels:

```python
from graflo.hq.connection_provider import (
InMemoryConnectionProvider,
PostgresGeneralizedConnConfig,
)

provider = InMemoryConnectionProvider()
provider.bind_single_config_for_bindings(
bindings=bindings,
conn_proxy="postgres_source",
config=PostgresGeneralizedConnConfig(config=postgres_conf),
)

engine.define_and_ingest(
manifest=manifest,
target_db_config=target_db_config,
connection_provider=provider,
)
```

## Authoring tips

- Keep resource names unique across `ingestion_model.resources`.
24 changes: 24 additions & 0 deletions docs/getting_started/quickstart.md
@@ -135,6 +135,30 @@ Here `schema` defines the logical graph, while `ingestion_model` defines resourc

For SQL and SPARQL sources, add **`connector_connection`**: a list of `{"connector": "<connector name or hash>", "conn_proxy": "<label>"}` entries. At runtime, register a `GeneralizedConnConfig` for each `conn_proxy` on an `InMemoryConnectionProvider` (or your own `ConnectionProvider`). `GraphEngine` / `ResourceMapper` call `bind_connector_to_conn_proxy` when building bindings from Postgres or RDF workflows, so HQ and the manifest stay aligned.
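
The shape of that list can be illustrated with a plain-Python sketch (not the graflo API) that groups connectors by proxy label:

```python
# connector_connection is a list of {"connector": ..., "conn_proxy": ...}
# entries; grouping by proxy label shows which connectors will share one
# runtime connection config.
from collections import defaultdict

connector_connection = [
    {"connector": "users", "conn_proxy": "postgres_source"},
    {"connector": "products", "conn_proxy": "postgres_source"},
]

by_proxy = defaultdict(list)
for entry in connector_connection:
    by_proxy[entry["conn_proxy"]].append(entry["connector"])

assert by_proxy["postgres_source"] == ["users", "products"]
```

Connectors that share one `conn_proxy` resolve to the same runtime config at ingestion time.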

### Single-DB quick path (one proxy label)

When all SQL connectors use the same `conn_proxy`, you can wire the runtime config in one call:

```python
from graflo.hq.connection_provider import (
InMemoryConnectionProvider,
PostgresGeneralizedConnConfig,
)

provider = InMemoryConnectionProvider()
provider.bind_single_config_for_bindings(
bindings=bindings,
conn_proxy="postgres_source",
config=PostgresGeneralizedConnConfig(config=postgres_conf),
)

engine.define_and_ingest(
manifest=manifest,
target_db_config=conn_conf,
connection_provider=provider,
)
```

The `ingest()` method takes:
- `target_db_config`: Target graph database configuration (where to write the graph)
- `bindings`: Source data connectors (where to read data from - files or database tables)
10 changes: 10 additions & 0 deletions docs/reference/data_source/index.md
@@ -172,6 +172,16 @@ caster = Caster(schema=schema, ingestion_model=ingestion_model)
ingestion_params = IngestionParams(
batch_size=1000, # Process 1000 items per batch
clear_data=False,
# Optional: restrict ingestion scope.
# `resources` limits which logical resources (from the manifest) are processed.
# `vertices` is an allow-list of vertex types to ingest.
#
# Implementation detail: the allow-list is applied early in the runtime actor
# pipeline (not just as a late DB-write filter). Disallowed vertex types are
# not extracted/assembled, and edges are only emitted when both endpoints
# are allowed.
resources=["your_resource_name"],
vertices=["your_vertex_name"],
)

asyncio.run(
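
The endpoint rule described in the comment above can be mimicked with a small self-contained sketch (plain Python, not graflo internals):

```python
# With a vertex allow-list, disallowed vertex types are dropped, and an edge
# is emitted only when BOTH of its endpoint vertices survive the filter.

def filter_graph(vertices, edges, allowed_types):
    kept_vertices = [v for v in vertices if v["type"] in allowed_types]
    kept_ids = {v["id"] for v in kept_vertices}
    kept_edges = [
        e for e in edges
        if e["source"] in kept_ids and e["target"] in kept_ids
    ]
    return kept_vertices, kept_edges


vertices = [
    {"id": "t1", "type": "ticker"},
    {"id": "m1", "type": "metric"},
    {"id": "u1", "type": "user"},
]
edges = [
    {"source": "t1", "target": "m1"},  # both endpoints allowed -> kept
    {"source": "t1", "target": "u1"},  # "user" not allowed -> dropped
]

kept_v, kept_e = filter_graph(vertices, edges, {"ticker", "metric"})
assert [v["id"] for v in kept_v] == ["t1", "m1"]
assert kept_e == [{"source": "t1", "target": "m1"}]
```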
3 changes: 3 additions & 0 deletions docs/reference/db/cypher/__init__.md
@@ -0,0 +1,3 @@
# `graflo.db.cypher`

::: graflo.db.cypher
3 changes: 3 additions & 0 deletions docs/reference/db/cypher/escape.md
@@ -0,0 +1,3 @@
# `graflo.db.cypher.escape`

::: graflo.db.cypher.escape
3 changes: 3 additions & 0 deletions docs/reference/db/cypher/rel_merge.md
@@ -0,0 +1,3 @@
# `graflo.db.cypher.rel_merge`

::: graflo.db.cypher.rel_merge
3 changes: 3 additions & 0 deletions docs/reference/hq/ingestion_parameters.md
@@ -0,0 +1,3 @@
# `graflo.hq.ingestion_parameters`

::: graflo.hq.ingestion_parameters
2 changes: 2 additions & 0 deletions docs/reference/index.md
@@ -28,6 +28,7 @@ Database connection and management components:
- [Utilities](db/arango/util.md): ArangoDB-specific utility functions
- **[Neo4j](db/neo4j/__init__.md)**:
- [Connection](db/neo4j/conn.md): Neo4j-specific connection implementation
- **[Cypher helpers](db/cypher/__init__.md)** — Shared Cypher utilities ([escape](db/cypher/escape.md), [relationship MERGE](db/cypher/rel_merge.md))
- **[FalkorDB](db/falkordb/__init__.md)**:
- [Connection](db/falkordb/conn.md): FalkorDB-specific connection implementation
- **[TigerGraph](db/tigergraph/__init__.md)**:
@@ -48,6 +49,7 @@ Database connection and management components:
Main graflo functionality:

- **[Caster](hq/caster.md)**: Main data ingestion and transformation engine
- **[Ingestion parameters](hq/ingestion_parameters.md)**: `IngestionParams`, row-error policy types, and batch cast results (`CastBatchResult`, …)
- **[Data Sources](data_source/index.md)**: Data source abstraction layer (files, APIs, SQL, in-memory)
- **[Ontology](onto.md)**: Core data types and enums

12 changes: 6 additions & 6 deletions examples/5-ingest-postgres/generated-manifest.yaml
@@ -1,9 +1,4 @@
db_profile:
db_flavor: tigergraph
vertex_storage_names:
products: products
users: users
graph:
core_schema:
edge_config:
edges:
- relation: follows
@@ -52,5 +47,10 @@ graph:
identity:
- id
name: users
db_profile:
db_flavor: tigergraph
vertex_storage_names:
products: products
users: users
metadata:
name: accounting
13 changes: 13 additions & 0 deletions examples/9-connector-connection-proxy/README.md
@@ -0,0 +1,13 @@
# Example 9: Explicit connector_connection proxy wiring

This example demonstrates the credential-free runtime indirection:

`Resource -> Connector -> ConnectionProxy -> RuntimeConnectionConfig`

Key points:
- The manifest stores only `conn_proxy` labels inside `bindings.connector_connection`.
- The runtime script registers the real `PostgresConfig` under that proxy label
via `InMemoryConnectionProvider`.
- `provider.bind_from_bindings(bindings=...)` connects manifest connectors
to the proxy label so ingestion can resolve `conn_proxy -> config`.
