Skip to content

feat(composio): migrate GitHub provider to memory_tree pipeline (#2885)#2889

Merged
senamakel merged 4 commits into
tinyhumansai:mainfrom
justinhsu1477:feat/github-provider-memory-tree-migration
May 30, 2026
Merged

feat(composio): migrate GitHub provider to memory_tree pipeline (#2885)#2889
senamakel merged 4 commits into
tinyhumansai:mainfrom
justinhsu1477:feat/github-provider-memory-tree-migration

Conversation

@justinhsu1477
Copy link
Copy Markdown
Contributor

@justinhsu1477 justinhsu1477 commented May 29, 2026

Summary

Second of the four-PR follow-up sequence in #2885 (after Notion in #2887). The GitHub Composio provider now writes through ingest_pipeline::ingest_documentmem_tree_chunks instead of MemoryClient::store_skill_sync → UnifiedMemory memory_docs, so memory.search / tree.read_chunk / tree.browse / the agent recall path / summary trees all see GitHub issues and PRs again.

Same shape as the Notion migration in #2887, applied to providers/github/:

  • New providers/github/ingest.rsgithub_source_id, parse_updated_at, render_issue_body, extract_html_url, ingest_issue_into_memory_tree. Pretty-JSON markdown body retains labels/assignees/state/comment count for retrieval signal.
  • Re-ingest cleanup via delete_chunks_by_source before each ingest_document call. Without this, the pipeline's content-blind already_ingested gate would silently drop edited issues — the same trap fixed in vault sync fix(vault): sync writes to memory_tree, not legacy UnifiedMemory (#2705) #2720 and Notion feat(composio): migrate Notion provider to memory_tree pipeline (#2885) #2887.
  • source_id = github:{connection_id}:{issue_id} — stable across re-syncs and across connections. Works for both numeric internal IDs and the owner/repo#number slug fallback from extract_issue_id.
  • source_ref carries the canonical GitHub html_url — audit trails one-click navigable from any downstream UI.
  • provider.rs — dropped persist_single_item + doc_id, threaded updated_at through to the new ingest fn, kept the existing composite-key dedup logic ({issue_id}@{updated_at}) intact so the cursor-boundary fast-path still works.

persist_single_item / store_skill_sync intentionally left in place — they're still used by the remaining unmigrated providers (Linear, ClickUp) and the wider UnifiedMemory removal lives under the #2585 follow-up. Becomes removable once Linear and ClickUp follow (also tracked in #2885).

Test plan

Lib tests (all green locally):

  • cargo test --lib -- memory_sync::composio::providers::github::ingest → 6/6 pass
    • github_source_id_is_stable_and_namespaced — pins the contract the cleanup path relies on
    • parse_updated_at_handles_valid_and_invalid_inputs — falls through to Utc::now() on bad input
    • render_issue_body_includes_title_header_and_pretty_json — pins the chunked-content shape
    • extract_html_url_handles_both_envelope_shapes — handles direct + data.html_url Composio envelopes
    • ingest_issue_writes_to_memory_treethe feat(memory-sync): migrate Notion/ClickUp/Linear/GitHub Composio providers off UnifiedMemory #2885 regression: count_chunks rises and the source_id lands in mem_tree_ingested_sources
    • re_ingesting_edited_issue_replaces_prior_chunks — delete-first guard works; chunk count stays steady (±1 chunker rounding) instead of doubling
  • cargo test --lib -- memory_sync::composio::providers::github → 48/48 pass (no other github-namespaced test broke)
  • cargo check --lib clean
  • cargo fmt --check clean
  • cargo test --tests --no-run — integration tests still compile

Refs #2885. Follows #2887 (Notion). Linear and ClickUp PRs come next.

Summary by CodeRabbit

  • New Features

    • GitHub issues and pull requests can be ingested into the memory system with title extraction, timestamp parsing, source references, and stable per-item source IDs.
  • Bug Fixes

    • Re-ingesting edited items replaces prior in-memory content instead of duplicating.
    • Sync now retains progress when ingest failures occur so failed ranges are retried.
  • Tests

    • Added unit and async regression tests for IDs, timestamp fallback, body rendering, URL extraction, and re-ingest behavior.

Review Change Stack

…humansai#2885)

Second of the four-PR follow-up sequence in tinyhumansai#2885 (after Notion in
tinyhumansai#2887). GitHub issue/PR sync now writes through
`ingest_pipeline::ingest_document` → `mem_tree_chunks` instead of
`MemoryClient::store_skill_sync` → UnifiedMemory `memory_docs`, so
`memory.search` / `tree.read_chunk` / `tree.browse` / the agent recall
path / summary trees all see GitHub items again.

Mirrors the Notion shape:

- `providers/github/ingest.rs` (NEW): `github_source_id`,
  `parse_updated_at`, `render_issue_body`, `extract_html_url`,
  `ingest_issue_into_memory_tree`.
- Re-ingest cleanup via `delete_chunks_by_source` before each ingest,
  so the pipeline's content-blind `already_ingested` gate doesn't drop
  edited issues (same pattern as vault sync tinyhumansai#2720 and Notion tinyhumansai#2887).
- `source_id = github:{connection_id}:{issue_id}` — stable across
  re-syncs and across connections. Works for both numeric internal IDs
  and the `owner/repo#number` slug fallback from `extract_issue_id`.
- `source_ref` carries the canonical GitHub `html_url` so audit trails
  are one-click navigable from any downstream UI.
- `provider.rs`: dropped `persist_single_item` + `doc_id`, threaded
  `updated_at` through to the new ingest fn, kept the existing
  composite-key dedup logic (`{issue_id}@{updated_at}`) intact.

Tests (lib): 6 in `ingest::tests`, all green.

- `github_source_id_is_stable_and_namespaced` — pins the contract the
  `delete_chunks_by_source` cleanup path relies on.
- `parse_updated_at_handles_valid_and_invalid_inputs` — falls through
  to `Utc::now()` on bad input instead of failing the ingest.
- `render_issue_body_includes_title_header_and_pretty_json` — pins
  the chunked-content shape; without it the retrieval body loses
  labels / assignees / state / comment count.
- `extract_html_url_handles_both_envelope_shapes` — handles direct
  and `data.html_url` Composio envelope variants.
- `ingest_issue_writes_to_memory_tree` — the tinyhumansai#2885 regression: asserts
  `count_chunks` rises and the source_id is registered in
  `mem_tree_ingested_sources`.
- `re_ingesting_edited_issue_replaces_prior_chunks` — confirms the
  delete-first guard works; chunk count after re-ingest stays steady
  (±1 chunker rounding) instead of doubling.

`persist_single_item` / `store_skill_sync` are intentionally left in
place — they're still used by the remaining unmigrated providers
(Linear, ClickUp) and the wider UnifiedMemory removal lives under
the senamakel tinyhumansai#2585 follow-up. They become removable once Linear
and ClickUp follow (also tracked in tinyhumansai#2885).

Refs tinyhumansai#2885.
@justinhsu1477 justinhsu1477 requested a review from a team May 29, 2026 02:28
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 617eaca3-de12-4efc-8415-18fd446e5c8f

📥 Commits

Reviewing files that changed from the base of the PR and between 9764b31 and 7d27b5d.

📒 Files selected for processing (1)
  • src/openhuman/memory_sync/composio/providers/github/ingest.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/memory_sync/composio/providers/github/ingest.rs

📝 Walkthrough

Walkthrough

Adds a GitHub-specific ingest module that converts single GitHub issue/PR payloads into canonical memory-tree DocumentInput, conditionally deletes prior chunks for idempotent re-ingest, calls ingest_pipeline::ingest_document, and updates the GitHub provider to route issue persistence through this pipeline.

Changes

GitHub Ingest Pipeline

Layer / File(s) Summary
GitHub ingest module implementation
src/openhuman/memory_sync/composio/providers/github/ingest.rs
Adds GITHUB_PLATFORM, DEFAULT_TAGS, github_source_id, helpers to render stored body (markdown title + fenced pretty JSON), parse updated_at (RFC3339 fallback to Utc::now()), extract source_ref, and implements ingest_issue_into_memory_tree which optionally deletes prior chunks via delete_chunks_by_source (spawn_blocking) and calls ingest_pipeline::ingest_document.
GitHub ingest validation tests
src/openhuman/memory_sync/composio/providers/github/ingest.rs
Unit tests for github_source_id, updated_at parsing fallback, body rendering contract, html_url extraction from envelope shapes; async regression tests verifying writes to memory-tree tables and that re-ingesting an edited issue replaces prior chunks.
Module declaration
src/openhuman/memory_sync/composio/providers/github/mod.rs
Adds mod ingest; to include the new ingest submodule.
Provider integration and routing
src/openhuman/memory_sync/composio/providers/github/provider.rs
Imports and per-issue loop updated to call ingest_issue_into_memory_tree (passing config, connection id, issue id, title, updated_at, and payload). On success marks sync key and increments persisted counter; on error logs a warning and continues. Adds had_ingest_failures to decide whether to advance cursor at end of pass.

Sequence Diagram

sequenceDiagram
  participant Provider as GitHub Provider
  participant Ingest as ingest_issue_into_memory_tree
  participant Cleanup as delete_chunks_by_source
  participant Pipeline as ingest_pipeline::ingest_document
  participant Storage as Memory Tree Storage
  Provider->>Ingest: issue payload + metadata
  Ingest->>Cleanup: spawn_blocking task (is_source_ingested -> delete_chunks_by_source)
  Cleanup->>Storage: delete old chunks for source_id
  Cleanup-->>Ingest: deletion result
  Ingest->>Ingest: build DocumentInput (title, body, modified_at, source_ref, tags)
  Ingest->>Pipeline: ingest_document with DocumentInput
  Pipeline->>Storage: write new chunks
  Storage-->>Pipeline: chunks_written count
  Pipeline-->>Ingest: success result
  Ingest-->>Provider: chunks_written count
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related Issues

Possibly Related PRs

  • tinyhumansai/openhuman#2413: Previously touched the GitHub provider persistence path; this PR routes writes through the new ingest module.
  • tinyhumansai/openhuman#2685: Related frontend/workspace allowlist updates coordinating GitHub as a syncable native source with backend ingest changes.

Suggested labels

memory, working

Suggested reviewers

  • M3gA-Mind
  • senamakel

Poem

🐰 I found a payload, crisp and neat,
I wrapped it in markdown, JSON tucked beneath,
Cleared old crumbs with a careful sweep,
Hopped it through the pipeline—chunks snug and deep,
Now memory-tree sleeps with the issue complete.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: migrating the GitHub provider to use the memory_tree pipeline.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels May 29, 2026
Gate `delete_chunks_by_source` behind `is_source_ingested`. The delete
path uses a `source_kind = ?1` scan + Rust-side source-id filter
(`store::delete_chunks_by_source_filter`), so a first-time ingest of a
never-seen issue would scan every Document-kind chunk just to find zero
matches. `is_source_ingested` is an indexed PK lookup against
`mem_tree_ingested_sources`, so the common fresh-issue case becomes one
cheap `COUNT(*)` and we only pay the scan cost on actual re-ingests of
edited issues.

Same `spawn_blocking` hop, single combined closure — the gate runs
inside the existing blocking task. The 6 ingest tests still pass:
the regression test exercises the false → no-delete path (fresh issue);
the re-ingest test exercises the true → delete path (edited issue).

Same fix carried forward from the parallel Notion PR (tinyhumansai#2887), where
CodeRabbit caught the issue first. Linear and ClickUp PRs will land
with the gate baked in from the start.

Refs tinyhumansai#2885.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/memory_sync/composio/providers/github/ingest.rs (1)

191-193: ⚡ Quick win

Preserve the error chain with the alternate formatter.

{err} renders only the outermost message; the provider then logs this with error = %e (Display), so any underlying cause (DB/embedding failure) is lost. The rest of the codebase uses :# for anyhow errors (e.g., provider.rs lines 90/234). Bake the chain into the message:

♻️ Use the alternate (chain) formatter
-        Err(err) => Err(anyhow::anyhow!(
-            "ingest_document failed for {source_id}: {err}"
-        )),
+        Err(err) => Err(anyhow::anyhow!(
+            "ingest_document failed for {source_id}: {err:#}"
+        )),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_sync/composio/providers/github/ingest.rs` around lines
191 - 193, The error construction in the ingest_document path currently formats
the error with Display (`{err}`) which drops the error chain; change the Err arm
that builds the anyhow error so it uses the alternate (chain) formatter for
`err` (i.e., use `{:#}`) when composing the message for `ingest_document failed
for {source_id}` so underlying causes are preserved; update the Err(...)
expression in ingest.rs where `err` and `source_id` are referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_sync/composio/providers/github/provider.rs`:
- Around line 305-333: The ingest failure branch for
ingest_issue_into_memory_tree can cause lost edits because newest_updated is
bumped and the outer cursor is advanced even when an item failed to re-ingest;
declare a boolean flag (e.g., let mut had_ingest_failures = false) alongside
newest_updated and accumulators, set had_ingest_failures = true inside the Err
arm where you currently log the warning, and then only call
state.advance_cursor(&new_cursor) (or log and hold the cursor) when
had_ingest_failures is false; keep using state.mark_synced and synced_ids to
skip already-synced items on the next pass.

---

Nitpick comments:
In `@src/openhuman/memory_sync/composio/providers/github/ingest.rs`:
- Around line 191-193: The error construction in the ingest_document path
currently formats the error with Display (`{err}`) which drops the error chain;
change the Err arm that builds the anyhow error so it uses the alternate (chain)
formatter for `err` (i.e., use `{:#}`) when composing the message for
`ingest_document failed for {source_id}` so underlying causes are preserved;
update the Err(...) expression in ingest.rs where `err` and `source_id` are
referenced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 82cb18f9-fdbb-489b-b97f-ce021706fa7b

📥 Commits

Reviewing files that changed from the base of the PR and between 5bd47bb and 37facfd.

📒 Files selected for processing (3)
  • src/openhuman/memory_sync/composio/providers/github/ingest.rs
  • src/openhuman/memory_sync/composio/providers/github/mod.rs
  • src/openhuman/memory_sync/composio/providers/github/provider.rs

Comment thread src/openhuman/memory_sync/composio/providers/github/provider.rs
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinhsu1477 hey! the code looks solid to me, but CI is still pending on a few checks (Windows secrets ACL, E2E macOS/Windows/Linux, Tauri build, Rust Core Coverage, macOS smoke). once those are green I'll come back and approve this.

one thing worth addressing independent of CI:

ingest.rs uses log::debug! while the surrounding codebase uses tracing

provider.rs logs with tracing::warn!(issue_id = %issue_id, error = %e, "...") — structured key=value fields, queryable by any tracing subscriber. ingest.rs uses log::debug!("... connection_id={} issue_id={} removed_chunks={}", ...) — positional format strings with no structured fields. Both work if the tracing-log bridge is configured, but the structured fields get lost, which defeats the point of having them.

Swapping the two log::debug! calls in ingest_issue_into_memory_tree to use tracing::debug! with the same field syntax as provider.rs keeps the pattern consistent.

Also echoing CodeRabbit's cursor-advancement finding — that one's worth fixing before merge. The rest of their nitpick on {err:#} is valid too.

Otherwise the migration is clean: github:{connection_id}:{issue_id} source_id is stable, the delete-first re-ingest guard mirrors #2887 and #2720 correctly, and the regression test suite pins all the invariants that matter.

Comment thread src/openhuman/memory_sync/composio/providers/github/ingest.rs Outdated
justinhsu1477 added a commit to justinhsu1477/openhuman that referenced this pull request May 29, 2026
…chain

Two related correctness fixes flagged by CodeRabbit on the parallel
GitHub PR (tinyhumansai#2889) — same shape exists here in Notion, so applying the
fix here proactively rather than waiting for a second round.

## 1. Cursor advances on ingest failure → data-loss window

`provider.rs` updated `newest_edited_time` for every result before
attempting ingest, then advanced the persistent cursor unconditionally
at the end of the loop. Items whose ingest call returned `Err` were
logged + skipped (no `mark_synced`) — but the next sync queried
`last_edited_time > {cursor}`, so the failed item was never re-fetched.

Pre-tinyhumansai#2885 this was just "stale revision": the legacy
`persist_single_item` path wasn't delete-first, so a failed write left
old chunks alone. Post-tinyhumansai#2885 it's worse — `ingest_page_into_memory_tree`
calls `delete_chunks_by_source` *before* the (failing) write, so an
*edited* page that fails to re-ingest is left with neither old nor new
chunks in `mem_tree_chunks` until its next edit.

Fix: track `had_ingest_failures: bool`, set on every `Err`, and gate the
`advance_cursor` at the bottom of the sync loop. Already-synced items
are cheaply skipped via `state.is_synced` on the re-fetch next pass, so
the cost of holding is just one extra search-API page on the failure
range.

A retry-only-failed-items fix (separate `newest_successful_updated`
cursor) would be tighter but adds state and edge cases; the simple gate
matches CodeRabbit's exact suggestion and is provably safe (cursor only
ever advances when every item in the pass succeeded).

## 2. `{err}` drops anyhow context chain

`ingest.rs` wrapped the `ingest_pipeline::ingest_document` failure as
`anyhow!("ingest_document failed for {source_id}: {err}")`, where
`{err}` is Display-only and strips the chain. `provider.rs` then logs
the wrapped error with `tracing::warn!(error = %e)` (also Display), so
the underlying DB / embedding / persist cause was lost.

Fix: swap to `{err:#}` (alternate formatter) so the chain is baked into
the wrapping anyhow's Display impl. Matches the convention `provider.rs`
already uses on its own `ctx.execute(...).map_err(|e| ... "{e:#}")` call
sites (lines 92, 207).

## Tests

27/27 still pass in `memory_sync::composio::providers::notion::*`.
The provider's `sync()` is hard to unit-test without mocking
`ComposioContext` heavily — the invariant is pinned by the rustdoc on
`had_ingest_failures` and the inline gate comment. The ingest-side
tests (regression + re-ingest) cover the ingest function's contract;
the cursor logic lives entirely in `provider.rs`.

Refs CodeRabbit review on tinyhumansai#2889.
…chain

CodeRabbit caught two correctness issues on this PR. Same shape exists
on the parallel Notion PR (tinyhumansai#2887) — fix applied there in parallel.

## 1. Cursor advances on ingest failure → data-loss window

`provider.rs` updated `newest_updated` for every result before attempting
ingest, then advanced the persistent cursor unconditionally at the end
of the sync loop. Items whose ingest call returned `Err` were logged
+ skipped (no `mark_synced`) — but the next sync queried
`updated:>{cursor}`, so the failed item was never re-fetched.

Pre-tinyhumansai#2885 this was just "stale revision": the legacy `persist_single_item`
path wasn't delete-first, so a failed write left old chunks alone.
Post-tinyhumansai#2885 it's worse — `ingest_issue_into_memory_tree` calls
`delete_chunks_by_source` *before* the (failing) write, so an *edited*
issue that fails to re-ingest is left with neither old nor new chunks
in `mem_tree_chunks` until its next edit.

Fix: track `had_ingest_failures: bool`, set on every `Err`, and gate
the `advance_cursor` at the bottom of the sync loop. Already-synced
items are cheaply skipped via `state.is_synced` on the re-fetch next
pass, so the cost of holding is just one extra search-API page on the
failure range. `set_last_sync_at_ms` still fires — it's a heartbeat,
not a fetch-window boundary.

A retry-only-failed-items fix (separate `newest_successful_updated`
cursor) would be tighter but adds state and edge cases; the simple
gate matches CodeRabbit's exact suggestion and is provably safe
(cursor only ever advances when every item in the pass succeeded).

## 2. `{err}` drops anyhow context chain

`ingest.rs` wrapped the `ingest_pipeline::ingest_document` failure as
`anyhow!("ingest_document failed for {source_id}: {err}")`, where
`{err}` is Display-only and strips the chain. `provider.rs` then logs
the wrapped error with `tracing::warn!(error = %e)` (also Display),
so the underlying DB / embedding / persist cause was lost.

Fix: swap to `{err:#}` (alternate formatter) so the chain is baked
into the wrapping anyhow's Display impl. Matches the convention
`provider.rs` already uses on its own `ctx.execute(...).map_err(...)`
sites (lines 90, 234, 406).

## Tests

48/48 still pass in `memory_sync::composio::providers::github::*`.
The provider's `sync()` is hard to unit-test without mocking
`ComposioContext` heavily — the invariant is pinned by the rustdoc on
`had_ingest_failures` and the inline gate comment. The ingest-side
tests (regression + re-ingest) cover the ingest function's contract;
the cursor logic lives entirely in `provider.rs`.

Refs CodeRabbit review on tinyhumansai#2889, tinyhumansai#2885.
@justinhsu1477
Copy link
Copy Markdown
Contributor Author

Pushed `9764b31f` addressing both CodeRabbit findings + responding to @graycyrus's nits.

What's fixed in this push

# Reviewer Issue Fix
B1 CodeRabbit 🟠 + @graycyrus echo Failed ingest advances cursor → edited issue can be silently dropped New `had_ingest_failures: bool` flag in `provider.rs::sync`; set on every `Err` arm; `advance_cursor` gated behind `if !had_ingest_failures`. Already-synced items skip cheaply via `is_synced` on the re-fetch, so the cost of holding is just one extra search-API page on the failure range. `set_last_sync_at_ms` still advances — it's a heartbeat, not a fetch-window boundary.
B2 CodeRabbit 🧹 + @graycyrus echo `{err}` drops anyhow chain `ingest.rs` line 209 changed to `{err:#}` (alternate formatter). Provider.rs's existing `tracing::warn!(error = %e)` will now show the underlying DB / embedding cause instead of just the outer wrap.

I also carried both fixes back to #2887 (Notion) — same shape there, fixed in `0c15fd27` rather than wait for a second round.

On the `log::debug!` vs `tracing::debug!` nit — I'd like to push back politely

@graycyrus, before I swap that one I want to double-check the convention. Walking the existing ingest layer:

…and `provider.rs` files across providers use `tracing::*` with structured fields.

So the codebase already has a two-layer split: provider.rs is the orchestration layer (tracing, structured), ingest.rs / sync.rs are plumbing (log, positional). My `github/ingest.rs` is intentionally matching the Slack / Gmail / vault shape — swapping to `tracing` here would make it the only ingest module using tracing.

Happy to convert if you want me to, but I think the right call is either (a) leave as-is to match the established pattern, or (b) file a follow-up issue to unify all ingest modules onto `tracing` in one sweep — both keep the layer convention consistent. Let me know which you prefer.

Tests

48/48 still pass in `memory_sync::composio::providers::github::*`. The cursor invariant is documented in the new rustdoc on `had_ingest_failures`; the existing 6 ingest tests cover the ingest function's contract.

@coderabbitai coderabbitai Bot added memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. working A PR that is being worked on by the team. labels May 29, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 29, 2026
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinhsu1477 the fixes in the third commit look correct — cursor hold on ingest failure is the right call (holds the full range rather than introducing a separate failure-tracking cursor, which would add state and edge cases), and {err:#} properly bakes the anyhow chain into the Display output so tracing::warn!(error = %e) actually surfaces the underlying cause. CodeRabbit's two findings are fully addressed.

CI is still pending on several checks (Build Tauri App, Rust Core Coverage, E2E Appium runs, Windows secrets ACL). Once those are green I'll approve this.

One holdover from my first review that the third commit didn't touch: log::debug! at ingest.rs lines 171 and 180 still uses the log crate with positional format args instead of tracing::debug! with structured key=value fields. Everything else in this module and in provider.rs uses tracing. Small thing, but please fix it before I approve — it's easy to miss in metrics/log aggregation pipelines that filter by tracing span fields.

… ingest

Per @graycyrus review on tinyhumansai#2889: the two `log::debug!` calls in
`ingest_issue_into_memory_tree` used positional format strings, while
`provider.rs` in the same directory logs via `tracing::*` with
structured key=value fields. Positional log args are flattened to an
opaque message by the tracing-log bridge, so log-aggregation pipelines
that filter on span fields (connection_id, issue_id, …) lose them.

Swap both calls to `tracing::debug!` with named fields, matching the
provider.rs shape exactly:
- re-ingest cleanup: connection_id / issue_id / removed_chunks
- issue persisted: connection_id / issue_id / chunks_written / already_ingested

Scope note: the older ingest modules (slack, gmail, vault::sync) still
use `log::*` — those are pre-existing and out of scope here. The new
memory-tree ingest modules (this + notion in tinyhumansai#2887) now consistently
use tracing alongside their provider.rs. A follow-up sweep to unify the
legacy modules can be filed separately if desired.

Refs tinyhumansai#2889.
justinhsu1477 added a commit to justinhsu1477/openhuman that referenced this pull request May 29, 2026
… ingest

Carries the @graycyrus review note from the parallel GitHub PR (tinyhumansai#2889)
back to Notion so the two new memory-tree ingest modules stay
consistent. The two `log::debug!` calls in `ingest_page_into_memory_tree`
used positional format strings; provider.rs in the same directory logs
via `tracing::*` with structured key=value fields, which log-aggregation
pipelines can filter on (the tracing-log bridge flattens positional
args to an opaque message and loses them).

Swap both to `tracing::debug!` with named fields matching provider.rs:
- re-ingest cleanup: connection_id / page_id / removed_chunks
- page persisted: connection_id / page_id / chunks_written / already_ingested

Scope note: the older ingest modules (slack, gmail, vault::sync) still
use `log::*` — pre-existing, out of scope. A follow-up sweep can unify
them separately if desired.

Refs tinyhumansai#2887, tinyhumansai#2889.
@justinhsu1477
Copy link
Copy Markdown
Contributor Author

Done — pushed 7d27b5d4. You're right that for a new module, matching provider.rs's structured logging is the better call; the observability point (span-field filtering in aggregation pipelines) outweighs my "match the other ingest modules" argument, especially since provider.rs sits right next to it in the same directory.

Both log::debug! calls in ingest_issue_into_memory_tree are now tracing::debug! with named fields:

tracing::debug!(
    connection_id = %connection_id,
    issue_id = %issue_id,
    removed_chunks = removed,
    "[composio:github] ingest: re-ingest cleanup"
);

…and the issue-persisted log now carries chunks_written / already_ingested as fields too.

I also carried the same swap to #2887 (Notion) in 23118615 so the two new memory-tree ingest modules stay consistent with each other.

On the broader codebase: slack/ingest.rs, gmail/ingest.rs, and vault/sync.rs still use log::* — I left those alone as pre-existing and out of scope here, but if you'd like the whole ingest layer unified on tracing I'm happy to file a follow-up issue and take it as a separate sweep. LMK.

48/48 tests still green. Should be clear for approval now — thanks for the careful review on both findings.

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinhsu1477 the tracing fix in 7d27b5d addresses the last open thread — both debug sites are now using tracing::debug! with structured fields as requested. All prior feedback resolved.

Code is clean. Waiting on CI (Build Tauri App, Rust Core Tests + Coverage, E2E suites, Frontend Coverage are all still pending). Will approve once those are green.

@senamakel senamakel self-assigned this May 30, 2026
@senamakel senamakel merged commit 31263d1 into tinyhumansai:main May 30, 2026
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Net-new user-facing capability or product behavior. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants