feat(cli): Add an rg-equivalent CLI interface for fff#561
feat(cli): Add an rg-equivalent CLI interface for fff#561markovejnovic wants to merge 3 commits into
Conversation
| /// Case sensitivity strategy for grep searches. Mirrors `fff::CaseMode` but | ||
| /// with rkyv derives — fff-core doesn't depend on rkyv. |
There was a problem hiding this comment.
this is up for debate whether we want to keep it this way or not.
I wanted to avoid adding another dependency to fff that is only used in the ipc code-path, but with optimizers being what they are, maybe that's not so bad.
either way, i chose this path of having two CaseModes (one in fff and one in fff-ipc-domain), but open to discussion
There was a problem hiding this comment.
we can have optional dependencies - this is absolutely fine
| #[rkyv(derive(Debug))] | ||
| pub struct SearchRequest { | ||
| /// Root directory to search in (must be an absolute path). | ||
| pub directory: String, |
There was a problem hiding this comment.
need to stop passing paths as strings. much better bet is to pass pathbufs as deep as possible until the fff barrier.
3a207ad to
9969bba
Compare
Bootstrap the CLI layer for FFF: - fff-ipc-domain: wire types and IPC protocol (Unix socket, bincode) - fff-daemon: background search daemon with session pooling, rg-compatible output formatting, and ANSI color matching - fff-rg: ripgrep-compatible CLI frontend with daemon/fallback searcher backends Includes 120 e2e tests: - 95 comparison tests (fff-rg vs rg side-by-side) across inline, heading, vimgrep, context, color, quiet, count, regex, unicode, and edge-case modes using test-case crate for parametrization - 25 synthetic repo scale tests (50/200/500 files) verifying match counts, line numbers, output formats, concurrency, and per-needle findability without rg comparison
dmtrKovalenko
left a comment
There was a problem hiding this comment.
some nits on the code I need to go through the way it actually works one more time
| shared_picker.wait_for_scan(Duration::from_secs(120)); | ||
| let file_count = { | ||
| let guard = shared_picker.read().expect("read lock"); | ||
| guard.as_ref().expect("picker present").get_files().len() |
There was a problem hiding this comment.
this would require a major version bump
| //! status byte. Spawns the daemon on first use if it isn't already running. | ||
|
|
||
| use std::io::{Read, Write}; | ||
| use std::os::unix::io::AsRawFd; |
There was a problem hiding this comment.
ideally we should get all the unix specific mode into a separate file cause I would love this to work on windows at some point
| fn into_core(self) -> Self::Core; | ||
| } | ||
|
|
||
| impl IntoCoreExt for fff_ipc_domain::CaseMode { |
| fn main() { | ||
| let args = Args::parse(); | ||
|
|
||
| tracing_subscriber::fmt() |
| /// Default max file size for grep when the client doesn't specify one (4 MiB). | ||
| const DEFAULT_MAX_FILE_SIZE: u64 = 4 * 1024 * 1024; | ||
|
|
||
| use fff::{ |
There was a problem hiding this comment.
do not intermix imports and constants
| )] | ||
| /// Mirrors a subset of `rg` flags so `fff-rg` is a drop-in replacement. | ||
| #[allow(clippy::struct_excessive_bools)] | ||
| pub struct Args { |
There was a problem hiding this comment.
i think we should restructure the crates to be somethjing like this
cli/
daemon
ffd
frg
ipc
this iwll make it much easier to keep backward compatibility with those tools
| use crate::types::cli::Args; | ||
|
|
||
| #[global_allocator] | ||
| static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc; |
There was a problem hiding this comment.
i don't think we need mimaloc here - can drop a dependency
| @@ -0,0 +1,147 @@ | |||
| use crate::util::Dir; | |||
|
|
|||
| const EXTENSIONS: &[&str] = &["rs", "ts", "json", "md", "txt", "toml", "yaml"]; | |||
There was a problem hiding this comment.
for tests I would prefer tests in the big repo using proptest on various flags + queries
| bytesize = "2" | ||
| mimalloc = { workspace = true } | ||
| git2 = { workspace = true } | ||
| which = "8.0.3" |
There was a problem hiding this comment.
im not sure we need it - if it's on a path Command will resolve it
| @@ -0,0 +1,6 @@ | |||
| edition = "2024" | |||
|
@gustav-fff please split out all the changes related to the new case_sensetivity api to a separate PR (keep Co-Authored-by marco) but make sure that we are not doing any breaking changes |
|
[triage-bot] DIRECTED: split case-sensitivity API into #609. Approach: added Marko credited via Co-authored-by trailer on the commit. Once #609 merges, this PR can drop the Honk-Honk 🪿 |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new fff-rg CLI that behaves like an rg-style frontend and (when inside a git worktree) delegates searches to a long-lived fff-daemon over a Unix domain socket, keeping indexes warm for repeated queries. It also expands grep case handling from a boolean smart_case to an explicit CaseMode across the core search options and downstream consumers.
Changes:
- Add
fff-daemon(session pool + query service) andfff-ipc-domain(wire types) to support fd-passing IPC for result output. - Add
fff-rgCLI with backend selection (daemon vs spawningrg) and an initial rg-compat test suite. - Replace
smart_case: boolwithcase_mode: CaseModeinfff-coregrep options and update call sites.
Reviewed changes
Copilot reviewed 42 out of 43 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/fff-nvim/src/lib.rs | Update Neovim binding to map smart_case to core CaseMode. |
| crates/fff-nvim/src/bin/grep_vs_rg.rs | Update benchmark harness to use case_mode. |
| crates/fff-nvim/src/bin/grep_profiler.rs | Update profiler to use case_mode. |
| crates/fff-nvim/src/bin/fuzzy_grep_test.rs | Update test binary to use case_mode. |
| crates/fff-nvim/src/bin/bench_grep_query.rs | Update benchmark to use case_mode. |
| crates/fff-nvim/benches/grep_bench.rs | Update benches to use case_mode. |
| crates/fff-nvim/benches/fuzzy_search_bench.rs | Update benches to use case_mode. |
| crates/fff-core/tests/path_separator_constraint_test.rs | Update test options to case_mode. |
| crates/fff-core/tests/new_directory_watcher_test.rs | Update test options to case_mode. |
| crates/fff-core/tests/grep_integration.rs | Update integration tests to case_mode. |
| crates/fff-core/tests/fuzz_real_repos.rs | Update fuzz tests to case_mode. |
| crates/fff-core/tests/fuzz_git_watcher_stress.rs | Update stress tests to case_mode. |
| crates/fff-core/tests/fuzz_file_operations.rs | Update fuzz tests to case_mode. |
| crates/fff-core/tests/bigram_overlay_integration.rs | Update overlay tests to case_mode. |
| crates/fff-core/tests/bigram_overlay_coherence_test.rs | Update overlay tests to case_mode. |
| crates/fff-core/src/grep.rs | Introduce CaseMode, migrate grep options and matching logic. |
| crates/fff-core/Cargo.toml | Align libc dependency to workspace. |
| crates/fff-c/src/lib.rs | Map C API smart_case boolean into core CaseMode. |
| crates/cli/rustfmt.toml | Add CLI-local rustfmt configuration. |
| crates/cli/fff-rg/tests/rg_compat/util.rs | Add test utilities (daemon bootstrap, normalization, rg comparisons). |
| crates/cli/fff-rg/tests/rg_compat/synthetic.rs | Add synthetic repo generator for scale/format tests. |
| crates/cli/fff-rg/tests/rg_compat/hay.rs | Add fixed test corpora for rg-compat tests. |
| crates/cli/fff-rg/tests/rg_compat.rs | Add rg-compat integration tests for output parity and behaviors. |
| crates/cli/fff-rg/src/types/mod.rs | Add CLI types module root. |
| crates/cli/fff-rg/src/types/cli/mod.rs | Define clap-parsed CLI args for fff-rg. |
| crates/cli/fff-rg/src/types/cli/case_mode.rs | Implement -i/-s/-S case mode parsing and translation. |
| crates/cli/fff-rg/src/searcher/search.rs | Introduce common Search trait for backends. |
| crates/cli/fff-rg/src/searcher/rg.rs | Implement fallback backend that shells out to rg. |
| crates/cli/fff-rg/src/searcher/mod.rs | Implement backend selection (daemon inside git, else rg). |
| crates/cli/fff-rg/src/searcher/fffd.rs | Implement daemon IPC backend (connect/spawn, fd passing, request build). |
| crates/cli/fff-rg/src/main.rs | Add fff-rg entry point and exit code handling. |
| crates/cli/fff-rg/src/app_ctx.rs | Add startup context (cwd/git root/tty/rg discovery/daemon path). |
| crates/cli/fff-rg/Cargo.toml | Define fff-rg crate deps (clap/rkyv/sendfd/etc.). |
| crates/cli/fff-ipc-domain/src/lib.rs | Add IPC wire types + socket path helper. |
| crates/cli/fff-ipc-domain/Cargo.toml | Define IPC domain crate deps. |
| crates/cli/fff-daemon/src/session_pool.rs | Add session pool with LRU + idle eviction thread. |
| crates/cli/fff-daemon/src/query_service.rs | Add UDS accept loop, request parsing, dispatch to fff-core search. |
| crates/cli/fff-daemon/src/output.rs | Add rg-like output formatting for grep and file listings. |
| crates/cli/fff-daemon/src/main.rs | Add daemon entry point and signal handling. |
| crates/cli/fff-daemon/src/convert.rs | Add IPC-to-core type conversions (CaseMode). |
| crates/cli/fff-daemon/Cargo.toml | Define fff-daemon crate deps. |
| Cargo.toml | Add new CLI crates to workspace + add shared deps (rkyv/clap/etc.). |
| Cargo.lock | Lockfile updates for new dependencies/crates. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Max concurrent directory indexes the daemon will keep alive. | ||
| pub const MAX_SESSIONS: usize = 8; | ||
| /// Sessions idle longer than this are evicted. | ||
| const IDLE_TIMEOUT: Duration = Duration::from_mins(10); |
| if n < RequestHeader::SIZE { | ||
| return Err("incomplete header".into()); | ||
| } |
| let header = RequestHeader::decode(raw_header); | ||
| let body_len = header.body_len as usize; | ||
|
|
||
| stream.set_read_timeout(Some(READ_TIMEOUT))?; | ||
| let mut body = smallvec::SmallVec::<[u8; IPC_BODY_INLINE]>::from_elem(0, body_len); | ||
| stream.read_exact(&mut body)?; |
| pub struct GrepSearchOptions { | ||
| pub max_file_size: u64, | ||
| pub max_matches_per_file: usize, | ||
| pub smart_case: bool, | ||
| pub case_mode: CaseMode, | ||
| /// File-based pagination offset: index into the sorted/filtered file list |
| "crates/cli/fff-daemon", | ||
| "crates/cli/fff-ipc-domain", | ||
| "crates/cli/fff-rg", |
| if !args.no_line_number && (args.line_number || pretty || is_tty) { f |= OutputFlags::LINE_NUMBER; } | ||
| if args.column || args.vimgrep { f |= OutputFlags::COLUMN; } | ||
| if !args.no_heading && (args.heading || pretty || is_tty) { f |= OutputFlags::HEADING; } | ||
| if !args.no_filename { f |= OutputFlags::WITH_FILENAME; } |
| @@ -0,0 +1,846 @@ | |||
| #[path = "rg_compat/hay.rs"] | |||
| pub fn assert_rg_match(dir: &Dir, args: &[&str], heading: bool) { | ||
| let fff_out = dir.command().args(args).full_output(); | ||
| let rg_out = dir.rg().args(args).full_output(); | ||
|
|
| fn ensure_daemon() { | ||
| DAEMON.call_once(|| { | ||
| use std::os::unix::net::UnixStream; | ||
|
|
||
| let socket = fff_ipc_domain::daemon_socket_path(); | ||
| if UnixStream::connect(&socket).is_ok() { | ||
| return; | ||
| } |
| @@ -1125,7 +1132,7 @@ const fn is_utf8_char_boundary(b: u8) -> bool { | |||
| /// - The input is passed directly to the regex engine without escaping | |||
| /// - Smart case still applies | |||
This PR adds a CLI interface to
fff. After talking with @dmtrKovalenko, we decided that the best architecture is to have anfffdaemon run in the background to keep the index hot. The command that the user interacts with isfff-rg, which talks to this daemon.Before we proceed, it's worth looking at the architecture diagram here:
flowchart LR subgraph fff-rg rg-searcher fffd-searcher end subgraph fff-daemon query-service subgraph session-pool session-1 session-2 end query-service --> session-1 query-service --> session-2 end rg-bin[[rg]] fffd-searcher <-->|unix sock|query-service rg-searcher <-->|subproc| rg-binLet's walk over each component:
fff-rgfff-rgcan search either through talking through thefff-daemon, or through shelling torg. It picks thefff-daemonif it believes to be working within a git repo. Otherwise, it will fall back torg, and if the user doesn't have it installed, it will abort.The core reasoning for this is that holding an
fff-daemonalive for directories that don't need to be indexed makes no sense, as you'll hold a very large amount of RAM in memory for an effectively one-off search.fff-daemonThe
fff-daemonholds within it aquery-service. Thisquery-serviceis responsible for accepting connections from new clients and then passing the request down to thesession-pool.The
session-poolis a pool of "active" sessions, ie. active git repositories for which we have anfffindex in memory. A couple notable facts:IPC protocol
I spent a good bit of time thinking through how best to handle the IPC protocol, and the epiphany I had is that we don't actually need to shuttle results between the daemon and the client. There may be hundreds, if not thousands of strings we'd need to copy, so if we can avoid it, that would be awesome.
On UNIX, you can pass file descriptors by using SCM_RIGHTS between processes. This enables us to take the stdout fd of the client, pass it to the daemon, and have the daemon directly write to the client's stdout. Neat!
The downside of this approach is that
SCM_RIGHTSrequires a unix domain socket, which means that passing the fd needs to happen over that. I evaluated:iceoryx2for all the other message passing, but ultimately I couldn't justify the maintenance burden. There's a ton of complexity added in figuring synchronizing the req-rep flow of themmaped iceoryx2 channels and the req-rep of the fd transfer, so it made no sense.This lives in the
crates/cli/fff-ipc-domaincrate.Holes in the implementation