Merged
2 changes: 2 additions & 0 deletions README.md
@@ -175,9 +175,11 @@ let results = try await library.search("apple guide")
let firstResult = results.first
let matchedFields = firstResult?.matchedFields
let snippetField = firstResult?.snippetField
try await library.removeDocuments(withIDs: ["guide"])
```

`matchedFields` identifies every indexed field that contributed to a search result. `snippetField` identifies the field used to build the returned snippet. Simple result lists can show why a result appeared immediately, while richer UIs can render title evidence differently from body evidence.
Mutation results, including the one returned by `removeAllDocuments()`, report the affected document IDs, so app code can update local UI state without re-deriving which records changed.
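
As a rough illustration of that last point, the sketch below shows one way app code might consume a mutation result. It does not use FetchKit itself: `SketchBatchResult` is a stand-in that only mirrors the `documentIDs`, `count`, and `isEmpty` members shown in this diff, plain `String` IDs stand in for `FetchDocumentID`, and `VisibleRows` is a hypothetical piece of UI state.

```swift
// Stand-in for FetchKitLibrary.BatchResult (assumption: only the members
// visible in this diff are mirrored; String replaces FetchDocumentID).
struct SketchBatchResult: Hashable, Sendable {
    let documentIDs: [String]
    var count: Int { documentIDs.count }
    var isEmpty: Bool { documentIDs.isEmpty }
}

// Hypothetical local UI state: the document IDs currently shown in a list.
struct VisibleRows {
    var ids: [String]

    // Drop every row whose ID the mutation reported as affected.
    mutating func apply(removal result: SketchBatchResult) {
        guard !result.isEmpty else { return }
        let removed = Set(result.documentIDs)
        ids.removeAll { removed.contains($0) }
    }
}

var rows = VisibleRows(ids: ["guide", "doc-apple", "doc-orange"])

// In app code this value would come from a call such as
// `try await library.removeDocuments(withIDs: ["guide"])`.
let result = SketchBatchResult(documentIDs: ["guide"])
rows.apply(removal: result)
```

The `isEmpty` guard lets the caller skip a UI refresh when a mutation touched nothing, which is the kind of check the new property is meant to make cheap.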

On macOS, the persistent conventional-search surface is now also shaped around one library storage location instead of separate store and index URLs:

10 changes: 5 additions & 5 deletions ROADMAP.md
@@ -172,14 +172,14 @@ Completed

### Status

In Progress
Complete

### Scope

- [x] Refine conventional-search ranking and snippet behavior now that the first SearchKit backend works end to end.
- [x] Validate the current refinement pass against a broader checked-in fixture corpus with near-miss ranking and longer-body snippet cases.
- [x] Validate whether the current refinement pass is enough for ordinary app callers against larger real app corpora.
- [ ] Keep the public `FetchKitLibrary` surface polished as the conventional-search side moves from foundation into quality work.
- [x] Keep the public `FetchKitLibrary` surface polished as the conventional-search side moves from foundation into quality work.

### Tickets

@@ -193,15 +193,15 @@ In Progress
- [x] Add a Hugging Face-derived audit micro-corpus that combines short stories, markdown reference records, and line-oriented literary text across the default in-memory and macOS SearchKit-backed paths.
- [x] Add an opt-in Hugging Face corpus audit lane that downloads bounded Dataset Viewer slices, indexes a larger temporary corpus locally, and reports ranking/snippet checks without making default CI network-dependent.
- [x] Audit larger app-like corpus result quality now that field-aware ranking, compact all-term evidence, phrase weighting, truncation cues, multi-term snippets, and field-evidence metadata are in place.
- [ ] Keep the persistent `FetchKitLibrary` construction and search API surface under review as real callers exercise the current design.
- [ ] Explore an opt-in extended snippet surface that can use idle time to precompute short document summaries for larger records, with Apple's [`FoundationModels`](https://developer.apple.com/documentation/foundationmodels) or another local summarization path as the first candidate instead of making foreground full-text search wait on summarization.
- [x] Keep the persistent `FetchKitLibrary` construction and search API surface under review as real callers exercise the current design.
- [x] Defer an opt-in extended snippet surface until real caller corpora show foreground snippets are insufficient; if it returns later, use idle-time precomputed summaries such as Apple's [`FoundationModels`](https://developer.apple.com/documentation/foundationmodels) or another local summarization path instead of making foreground full-text search wait on summarization.
- [x] Decide whether Core Data-backed test helpers should adopt explicit temporary-directory cleanup or keep relying on unique system temporary directories for short-lived local and CI runs.

### Exit Criteria

- [x] Conventional-search results feel intentionally ranked and include useful snippet behavior for ordinary app callers.
- [x] The SearchKit-backed path runs in normal local validation and the default GitHub CI lane.
- [ ] `FetchKitLibrary` still reads like a small Swift-native facade instead of exposing backend detail drift.
- [x] `FetchKitLibrary` still reads like a small Swift-native facade instead of exposing backend detail drift.

## Milestone 5: Semantic Index Persistence

25 changes: 21 additions & 4 deletions Sources/FetchKit/FetchKitLibrary.swift
@@ -1,14 +1,21 @@
import Foundation
import FetchCore

public actor FetchKitLibrary {
    public struct IndexSyncError: Error, Sendable {
    public struct IndexSyncError: Error, LocalizedError, Sendable {
        public let pendingIndexSync: FetchPendingIndexSync
        public let underlyingErrorDescription: String

        public init(pendingIndexSync: FetchPendingIndexSync, underlyingError: Error) {
            self.pendingIndexSync = pendingIndexSync
            self.underlyingErrorDescription = String(describing: underlyingError)
        }

        public var errorDescription: String? {
            let affectedDocumentIDs = pendingIndexSync.affectedDocumentIDs.map(\.rawValue)
            let affectedSummary = affectedDocumentIDs.isEmpty ? "none" : affectedDocumentIDs.joined(separator: ", ")
            return "FetchKit could not apply pending index sync \(pendingIndexSync.id.rawValue) after the corpus store write succeeded; the sync remains queued for retry. Affected documents: \(affectedSummary). Underlying error: \(underlyingErrorDescription)."
        }
    }

    public struct IndexSyncRetryResult: Hashable, Sendable {
@@ -26,6 +33,10 @@ public actor FetchKitLibrary {
        public var count: Int {
            completedSyncIDs.count
        }

        public var isEmpty: Bool {
            completedSyncIDs.isEmpty
        }
    }

    public struct Configuration: Hashable, Sendable {
@@ -52,6 +63,10 @@ public actor FetchKitLibrary {
        public var count: Int {
            documentIDs.count
        }

        public var isEmpty: Bool {
            documentIDs.isEmpty
        }
    }

    private let documentStore: any FetchDocumentStore
@@ -103,11 +118,11 @@ public actor FetchKitLibrary {

    @discardableResult
    public func removeDocument(withID id: FetchDocumentID) async throws -> BatchResult {
        try await removeDocuments([id])
        try await removeDocuments(withIDs: [id])
    }

    @discardableResult
    public func removeDocuments(_ ids: [FetchDocumentID]) async throws -> BatchResult {
    public func removeDocuments(withIDs ids: [FetchDocumentID]) async throws -> BatchResult {
        guard !ids.isEmpty else {
            return BatchResult(documentIDs: [])
        }
@@ -117,9 +132,11 @@ public actor FetchKitLibrary {
        return BatchResult(documentIDs: mutation.affectedDocumentIDs)
    }

    public func removeAllDocuments() async throws {
    @discardableResult
    public func removeAllDocuments() async throws -> BatchResult {
        let mutation = try await documentStore.removeAllDocuments()
        try await applyIndexingChanges(for: mutation)
        return BatchResult(documentIDs: mutation.affectedDocumentIDs)
    }

    public func search(_ query: FetchSearchQuery) async throws -> [FetchSearchResult] {
30 changes: 29 additions & 1 deletion Tests/FetchKitTests/FetchKitLibraryTests.swift
@@ -58,7 +58,7 @@ struct FetchKitLibraryTests {
        let index = RecordingFetchIndex()
        let library = FetchKitLibrary(documentStore: store, index: index)

        let result = try await library.removeDocuments(["doc-apple", "doc-orange"])
        let result = try await library.removeDocuments(withIDs: ["doc-apple", "doc-orange"])

        let removedIDs = await store.removedDocumentIDs
        let appliedChangesets = await index.appliedChangesets
@@ -69,6 +69,31 @@ struct FetchKitLibraryTests {
        #expect(appliedChangesets[0].removedDocumentIDs == ["doc-apple", "doc-orange"])
    }

    @Test("FetchKitLibrary remove all returns affected document IDs")
    func fetchKitLibraryRemoveAllReturnsAffectedDocumentIDs() async throws {
        let library = FetchKitLibrary()
        try await library.addDocuments([
            FetchDocumentRecord(
                id: "doc-apple",
                title: "Apple Guide",
                body: "Apples are bright and crisp."
            ),
            FetchDocumentRecord(
                id: "doc-orange",
                title: "Orange Guide",
                body: "Oranges are bright and tart."
            ),
        ])

        let result = try await library.removeAllDocuments()
        let searchResults = try await library.search("bright", fields: [.body], limit: 5)

        #expect(Set(result.documentIDs) == Set(["doc-apple", "doc-orange"]))
        #expect(result.count == 2)
        #expect(!result.isEmpty)
        #expect(searchResults.isEmpty)
    }

    @Test("FetchKitLibrary search convenience builds queries and delegates to the index")
    func fetchKitLibrarySearchConvenienceDelegates() async throws {
        let store = RecordingFetchDocumentStore()
@@ -226,6 +251,8 @@ struct FetchKitLibraryTests {
            Issue.record("Expected FetchKitLibrary to surface an index sync error.")
        } catch let error as FetchKitLibrary.IndexSyncError {
            #expect(error.pendingIndexSync.changeset.upsertedDocuments == [record.indexDocument])
            #expect(error.errorDescription?.contains("sync remains queued for retry") == true)
            #expect(error.errorDescription?.contains("doc-apple") == true)
        } catch {
            Issue.record("Expected FetchKitLibrary.IndexSyncError but received \(String(describing: error)).")
        }
@@ -261,6 +288,7 @@ struct FetchKitLibraryTests {
        let appliedChangesets = await retryingIndex.appliedChangesets

        #expect(retryResult.count == 1)
        #expect(!retryResult.isEmpty)
        #expect(retryResult.affectedDocumentIDs == ["doc-apple"])
        #expect(pendingSyncsAfterRetry.isEmpty)
        #expect(appliedChangesets.count == 1)
9 changes: 5 additions & 4 deletions docs/maintainers/fetchkit-product-plan.md
@@ -86,6 +86,8 @@ Current status:
- the default in-memory all-term ranker now gives a small compactness boost to tighter evidence, so a focused passage can rank ahead of a scattered near-miss when both documents satisfy the same query terms
- title-only hits intentionally keep a title snippet, and `FetchSearchResult` now reports `matchedFields` plus `snippetField` so consumers can distinguish title evidence from body evidence without losing the simple "why did this result appear?" explanation
- the first checked-in fixture corpus now covers both the default in-memory index path and the macOS SearchKit-backed path, using a tiny attributed Project Gutenberg sample from Hugging Face plus small synthetic near-miss and longer-body records instead of making CI download a live dataset
- the larger live Hugging Face audit pass requested the current cap from each configured dataset, indexed 209 usable documents, and found no ranking or snippet redesign work that should block the v1 conventional-search refinement milestone
- the `FetchKitLibrary` caller surface now keeps batch removal explicit with `removeDocuments(withIDs:)`, returns affected document IDs from `removeAllDocuments()`, exposes empty-result checks on small mutation/retry result types, and gives index-sync failures a localized message that explains the store write succeeded while the index sync remains queued
- the CI investigation on GitHub-hosted macOS found that the Core Data-backed store path could abort under Swift Testing with `Incorrect actor executor assumption`, even after global test parallelism was disabled
- that investigation surfaced two store-shape fixes worth keeping regardless of the runner: the durable Core Data store should use a private-queue background context instead of `viewContext`, and it should use Core Data's async `perform` API directly instead of manually bridging context work through checked continuations
- the Core Data-backed store coverage now lives on XCTest rather than Swift Testing so the package keeps the newer test surface where it is stable while reserving the older runner for framework-heavy Core Data verification
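
The localized index-sync message described in the status list above can be sketched in isolation. The types below are hypothetical stand-ins, not the FetchKit types: `SketchPendingSync` replaces `FetchPendingIndexSync`, plain `String` IDs replace the raw-value ID types, and `DiskFull` is an invented underlying error. The sketch only illustrates the pattern of conforming an error to `LocalizedError` and storing a description of the underlying error so the value stays `Sendable`.

```swift
import Foundation

// Hypothetical stand-in for FetchPendingIndexSync (assumption: only an ID
// and the affected document IDs are needed to build the message).
struct SketchPendingSync: Sendable {
    let id: String
    let affectedDocumentIDs: [String]
}

struct SketchIndexSyncError: Error, LocalizedError, Sendable {
    let pendingSync: SketchPendingSync
    let underlyingErrorDescription: String

    init(pendingSync: SketchPendingSync, underlyingError: Error) {
        self.pendingSync = pendingSync
        // Keep a description instead of the error value so the struct
        // remains Sendable regardless of what the underlying error is.
        self.underlyingErrorDescription = String(describing: underlyingError)
    }

    // LocalizedError hook: explains that the store write succeeded and
    // the index sync is still queued.
    var errorDescription: String? {
        let ids = pendingSync.affectedDocumentIDs
        let summary = ids.isEmpty ? "none" : ids.joined(separator: ", ")
        return "Could not apply pending index sync \(pendingSync.id) after the store write succeeded; the sync remains queued for retry. Affected documents: \(summary). Underlying error: \(underlyingErrorDescription)."
    }
}

// Invented underlying failure, for illustration only.
struct DiskFull: Error {}

let syncError = SketchIndexSyncError(
    pendingSync: SketchPendingSync(id: "sync-1", affectedDocumentIDs: ["doc-apple"]),
    underlyingError: DiskFull()
)
let message = syncError.errorDescription ?? ""
```

Callers that only hold an `Error` value would normally read `localizedDescription`, which Foundation derives from `errorDescription` for `LocalizedError` conformers.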
@@ -159,12 +161,11 @@ That pass landed:
- the stored-record to index changeset boundary
- the first macOS SearchKit-backed implementation path

The next work is refinement, not first architecture:
The next work is no longer conventional-search refinement. That v1 pass is now a settled foundation for the umbrella package:

- keep the persistent `FetchKitLibrary` surface polished as real callers exercise it
- keep the SearchKit-backed path inside ordinary validation unless a future framework regression forces it back out
- use larger app corpora to decide whether the current ranking, snippet, and result-evidence heuristics are already enough for ordinary callers now that the checked-in fixture corpus covers title evidence, body evidence, near misses, and longer-body snippets
- explore opt-in extended snippets later as background summary metadata for larger documents, not as work that foreground full-text search has to perform before returning results
- move the next product-building work toward the `SwiftlyFetch` umbrella facade and one-corpus ingestion flow
- explore opt-in extended snippets later only if real caller corpora show that foreground snippets are not enough; treat background summaries as a future derived-metadata feature, not as work that foreground full-text search has to perform before returning results

## First Core Data Entity Shape
