fix(resolve): Prevent submodule deletion#12577
fix(resolve): Prevent submodule deletion#12577dkattan wants to merge 3 commits intogitbutlerapp:masterfrom
Conversation
|
@dkattan is attempting to deploy a commit to the GitButler Team on Vercel. A member of the Team first needs to authorize it. |
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug where submodule working trees could be left empty after resolving conflicts in edit mode via but resolve finish. The fix adds best-effort repopulation of existing submodules after workspace checkout operations.
Changes:
- Adds
repopulate_existing_submodules()helper that runsgit submodule update --init --recursiveafter checkout operations - Calls this helper in both git2-based and gix-based checkout paths within
update_uncommitted_changes_with_tree() - Adds regression test proving submodule files remain present after
save_and_return_to_workspace()
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/gitbutler-workspace/src/branch_trees.rs | Implements repopulate_existing_submodules() and integrates it into checkout operations |
| crates/gitbutler-edit-mode/tests/fixtures/edit_mode.sh | Adds test fixture with submodule setup |
| crates/gitbutler-edit-mode/tests/edit_mode.rs | Adds test verifying submodule repopulation works correctly |
| let _ = Command::new("git") | ||
| .arg("-c") | ||
| .arg("protocol.file.allow=always") | ||
| .arg("submodule") | ||
| .arg("update") | ||
| .arg("--init") | ||
| .arg("--recursive") | ||
| .current_dir(workdir) | ||
| .status(); |
There was a problem hiding this comment.
The command execution uses .status() but ignores the result completely with let _ = .... While the comment mentions "best-effort" behavior, there's no logging when the command fails, which could make debugging issues difficult. Consider adding tracing/logging on failure to help diagnose submodule repopulation issues in production, similar to patterns in but-worktrees/src/git.rs where output is logged with tracing::info! and tracing::error!.
| let _ = Command::new("git") | ||
| .arg("-c") | ||
| .arg("protocol.file.allow=always") | ||
| .arg("submodule") | ||
| .arg("update") | ||
| .arg("--init") | ||
| .arg("--recursive") | ||
| .current_dir(workdir) | ||
| .status(); |
There was a problem hiding this comment.
The code should use std::process::Command::from(gix::command::prepare(gix::path::env::exe_invocation())) instead of Command::new("git") to follow the codebase's established pattern for git command execution. The gix::command::prepare() wrapper handles platform-specific concerns (especially on Windows) and proper environment setup, which is particularly important for submodule operations that may need access to the git configuration.
| let _ = Command::new("git") | |
| .arg("-c") | |
| .arg("protocol.file.allow=always") | |
| .arg("submodule") | |
| .arg("update") | |
| .arg("--init") | |
| .arg("--recursive") | |
| .current_dir(workdir) | |
| .status(); | |
| let _ = Command::from(gix::command::prepare( | |
| gix::path::env::exe_invocation(), | |
| )) | |
| .arg("-c") | |
| .arg("protocol.file.allow=always") | |
| .arg("submodule") | |
| .arg("update") | |
| .arg("--init") | |
| .arg("--recursive") | |
| .current_dir(workdir) | |
| .status(); |
|
Addressed the Copilot review feedback in follow-up commits:
Validation rerun: |
|
Hi! Thanks for submitting this PR. Please can you sign your PR's commits. It's an interesting question whether we want to manually restore submodules like this. I'm not super familiar with with submodules so I'm hoping you can help me understand some of the following questions: Does it always make sense to do a What is git's behaviour after a checkout with submodules - can we mirror that exactly? I'm also curious about your reasoning to add this call in the Otherwise, is this a sustainable approach to trying to better support submodules? Are there downsides to calling this after every update of HEAD? If we do want to take this as a more wide-spread approach of maintaining submodules, could you help me understand how this fits into the current state of play with our submodule support. Many thanks, Caleb |
| let mut checkout = git2::build::CheckoutBuilder::new(); | ||
| checkout.force().conflict_style_diff3(true); | ||
| // Submodule worktrees can be deleted by aggressive untracked cleanup. | ||
| // Keep that cleanup only for repos without configured submodules. | ||
| if !has_submodules { | ||
| checkout.remove_untracked(true); | ||
| } | ||
|
|
||
| repo.checkout_index(Some(&mut new_uncommitted_changes), Some(&mut checkout))?; |
There was a problem hiding this comment.
PR description mentions a new repopulate_existing_submodules helper (and running git submodule update --init --recursive via Command) in this file, but the actual change here only disables remove_untracked(true) when submodules are detected. Either update the PR description to match the implemented approach, or add the missing repopulation helper if that behavior is still required (e.g., to recover already-empty submodule worktrees).
| fn is_submodule_related_path(path: &str, submodule_paths: &[String]) -> bool { | ||
| submodule_paths | ||
| .iter() | ||
| .any(|sm| path == sm || path.strip_prefix(sm).is_some_and(|rest| rest.starts_with('/'))) |
There was a problem hiding this comment.
is_submodule_related_path() treats the submodule root path itself as “submodule-related” (path == sm), so collect_checkout_paths() will exclude gitlink changes for that submodule. As a result, entering edit mode can silently skip commits that add/remove/modify submodule entries (or change only the submodule pointer), because the checkout pathspec will be empty and the checkout is skipped. Consider only excluding paths inside submodules (e.g., sm/…) while still allowing the submodule root path so the superproject’s gitlink entry is updated.
| .any(|sm| path == sm || path.strip_prefix(sm).is_some_and(|rest| rest.starts_with('/'))) | |
| .any(|sm| path.strip_prefix(sm).is_some_and(|rest| rest.starts_with('/'))) |
| let Ok(mut entries) = config.entries(Some("submodule\\..*\\.url")) else { | ||
| return repo | ||
| .path() | ||
| .join("modules") | ||
| .read_dir() | ||
| .map(|mut it| it.next().is_some()) | ||
| .unwrap_or(false); | ||
| }; | ||
|
|
||
| entries.next().transpose().ok().flatten().is_some() | ||
| || repo | ||
| .path() | ||
| .join("modules") | ||
| .read_dir() | ||
| .map(|mut it| it.next().is_some()) | ||
| .unwrap_or(false) |
There was a problem hiding this comment.
has_submodules_configured() is duplicated here and in crates/gitbutler-workspace/src/branch_trees.rs with identical logic. To avoid the two copies drifting (and to keep submodule detection consistent across workspace checkouts and edit-mode operations), consider extracting this into a shared utility (e.g., a small helper module in gitbutler-workspace or a common crate) and reusing it from both locations.
| let Ok(mut entries) = config.entries(Some("submodule\\..*\\.url")) else { | |
| return repo | |
| .path() | |
| .join("modules") | |
| .read_dir() | |
| .map(|mut it| it.next().is_some()) | |
| .unwrap_or(false); | |
| }; | |
| entries.next().transpose().ok().flatten().is_some() | |
| || repo | |
| .path() | |
| .join("modules") | |
| .read_dir() | |
| .map(|mut it| it.next().is_some()) | |
| .unwrap_or(false) | |
| let modules_root = repo.path().join("modules"); | |
| let modules_dir_has_entries = modules_root | |
| .read_dir() | |
| .map(|mut it| it.next().is_some()) | |
| .unwrap_or(false); | |
| let Ok(mut entries) = config.entries(Some("submodule\\..*\\.url")) else { | |
| return modules_dir_has_entries; | |
| }; | |
| entries | |
| .next() | |
| .transpose() | |
| .ok() | |
| .flatten() | |
| .is_some() | |
| || modules_dir_has_entries |
22fb543 to
bad49bc
Compare
|
@Caleb-T-Owens What changed:
On your sustainability questions: I agree broad recursive submodule updates after every HEAD movement are too risky/surprising. This revision avoids that and keeps the fix narrowly targeted to preventing destructive checkout behavior in the edit-mode flow covered by the regression test. Human edit: Apologies for the AI slop in the responses. Should be cleaned up now. I swear I'm a human. |
| #[test] | ||
| fn save_and_return_to_workspace_preserves_submodule_worktree() -> Result<()> { | ||
| let (mut ctx, _tempdir) = command_ctx("save_and_return_to_workspace_preserves_submodule_worktree")?; | ||
| let (foobar, worktree_dir) = { | ||
| let repo = ctx.git2_repo.get()?; | ||
| let foobar = repo.head()?.peel_to_commit()?.parent(0)?.id(); | ||
| (foobar, repo.path().parent().unwrap().to_path_buf()) | ||
| }; | ||
| let vb_state = VirtualBranchesHandle::new(ctx.project_data_dir()); | ||
| let stacks = vb_state.list_stacks_in_workspace()?; | ||
| let stack = stacks.first().unwrap(); | ||
| let submodule_probe = worktree_dir.join("submodules/test-module/probe.txt"); | ||
| assert!( | ||
| submodule_probe.exists(), | ||
| "fixture should start with populated submodule worktree" | ||
| ); | ||
|
|
||
| enter_edit_mode(&mut ctx, foobar, stack.id)?; | ||
| assert!( | ||
| submodule_probe.exists(), | ||
| "submodule file should remain after entering edit mode" | ||
| ); | ||
| save_and_return_to_workspace(&mut ctx)?; | ||
|
|
||
| assert!( | ||
| submodule_probe.exists(), | ||
| "submodule file should remain after leaving edit mode" | ||
| ); | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
The test only verifies that the submodule worktree file exists after entering and leaving edit mode, but it doesn't verify that the submodule is still properly configured and functional. Consider adding assertions to check that the submodule can still be updated or that its git state is intact (e.g., checking that submodules/test-module/.git exists and is valid).
| for submodule in &mut submodules { | ||
| let _ = submodule.update(true, None); | ||
| } |
There was a problem hiding this comment.
The error from submodule.update() is silently ignored with let _ =. While this might be intentional to prevent submodule update failures from blocking the main operation, it means users won't be notified if submodules fail to update. Consider logging the error at least at the debug or warn level so that users and developers can diagnose submodule issues. For example: if let Err(e) = submodule.update(true, None) { tracing::warn!("Failed to update submodule: {}", e); }
| for path in &checkout_head_paths { | ||
| checkout_head.path(path); | ||
| } | ||
| } |
There was a problem hiding this comment.
When has_submodules is true but checkout_head_paths is empty (meaning all changed paths are submodule-related), the checkout is skipped entirely. This is correct and preserves the worktree state. However, consider whether HEAD update (repo.set_head(EDIT_BRANCH_REF)) should still proceed even when there's nothing to checkout. The current implementation updates HEAD before the checkout decision, which seems correct, but it's worth documenting this intentional behavior with a comment explaining why skipping checkout is safe when all paths are submodule-related.
| } | |
| } | |
| // When `has_submodules` is true and `checkout_head_paths` is empty, | |
| // all changes are submodule-related. In that case we intentionally | |
| // skip `checkout_head` after updating HEAD above. This preserves the | |
| // existing worktree (including submodules), which may be managed | |
| // independently, while still moving HEAD to `EDIT_BRANCH_REF`. |
| let result = (|| -> Result<()> { | ||
| let current_head_tree = repo.head()?.peel_to_tree()?; | ||
|
|
||
| // Checkout commits's parent | ||
| let commit_parent = find_or_create_base_commit(repo, &commit)?; | ||
| let commit_parent_tree = commit_parent.tree()?; | ||
| let checkout_head_paths = collect_checkout_paths( | ||
| repo, | ||
| ¤t_head_tree, | ||
| &commit_parent_tree, | ||
| &submodule_paths, | ||
| )?; | ||
|
|
||
| // Checkout commits's parent | ||
| let commit_parent = find_or_create_base_commit(repo, &commit)?; | ||
| repo.reference(EDIT_BRANCH_REF, commit_parent.id(), true, "")?; | ||
| repo.set_head(EDIT_BRANCH_REF)?; | ||
| repo.checkout_head(Some(CheckoutBuilder::new().force().remove_untracked(true)))?; | ||
| repo.reference(EDIT_BRANCH_REF, commit_parent.id(), true, "")?; | ||
| repo.set_head(EDIT_BRANCH_REF)?; | ||
| let mut checkout_head = CheckoutBuilder::new(); | ||
| checkout_head.force(); | ||
| if !has_submodules { | ||
| checkout_head.remove_untracked(true); | ||
| } else { | ||
| for path in &checkout_head_paths { | ||
| checkout_head.path(path); | ||
| } | ||
| } | ||
| if !has_submodules || !checkout_head_paths.is_empty() { | ||
| repo.checkout_head(Some(&mut checkout_head))?; | ||
| } | ||
|
|
||
| // Checkout the commit as unstaged changes | ||
| let mut index = get_commit_index(ctx, &commit)?; | ||
|
|
||
| repo.checkout_index( | ||
| Some(&mut index), | ||
| Some( | ||
| CheckoutBuilder::new() | ||
| .force() | ||
| .remove_untracked(true) | ||
| .conflict_style_diff3(true), | ||
| ), | ||
| )?; | ||
| // Checkout the commit as unstaged changes | ||
| let mut index = get_commit_index(ctx, &commit)?; | ||
| let commit_tree = commit.tree()?; | ||
| let checkout_index_paths = collect_checkout_paths( | ||
| repo, | ||
| &commit_parent_tree, | ||
| &commit_tree, | ||
| &submodule_paths, | ||
| )?; | ||
| let mut checkout_index = CheckoutBuilder::new(); | ||
| checkout_index.force().conflict_style_diff3(true); | ||
| if !has_submodules { | ||
| checkout_index.remove_untracked(true); | ||
| } else { | ||
| for path in &checkout_index_paths { | ||
| checkout_index.path(path); | ||
| } | ||
| } | ||
|
|
||
| Ok(()) | ||
| if !has_submodules || !checkout_index_paths.is_empty() { | ||
| repo.checkout_index(Some(&mut index), Some(&mut checkout_index))?; | ||
| } | ||
|
|
||
| // Keep configured submodule worktrees populated after moving into edit mode. | ||
| if has_submodules { | ||
| sync_configured_submodules(repo); | ||
| } | ||
|
|
||
| Ok(()) | ||
| })(); | ||
| result |
There was a problem hiding this comment.
This code has an unnecessary wrapper pattern. The closure returns Result<()>, which is immediately bound to result, and then result is returned unchanged. This pattern doesn't add any value and makes the code less readable. The closure body should either be inlined directly, or if error context needs to be added, use .context() instead of this pattern.
| // When only .git/modules entries exist, derive logical submodule paths from that layout. | ||
| let modules_root = repo.path().join("modules"); | ||
| if modules_root.exists() { | ||
| let mut stack = vec![modules_root.clone()]; | ||
| while let Some(dir) = stack.pop() { | ||
| let Ok(entries) = std::fs::read_dir(&dir) else { | ||
| continue; | ||
| }; | ||
|
|
||
| for entry in entries.flatten() { | ||
| let path = entry.path(); | ||
| if !path.is_dir() { | ||
| continue; | ||
| } | ||
|
|
||
| if path.join("HEAD").is_file() | ||
| && let Ok(relative) = path.strip_prefix(&modules_root) | ||
| { | ||
| paths.insert(relative.to_string_lossy().to_string()); | ||
| } | ||
|
|
||
| stack.push(path); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The recursive directory traversal doesn't distinguish between actual Git submodule repositories and other directories that happen to contain a HEAD file. A directory with just a HEAD file isn't necessarily a valid Git repository or submodule. Consider checking for additional indicators of a valid Git repository (like config, refs, or objects directories) to avoid false positives. Also, this implementation will traverse into subdirectories indefinitely, which could be a performance issue in repositories with deeply nested module structures.
|
Addressed Copilot feedback on PR/body mismatch by updating the PR description to match the current implementation (path-filtered checkout behavior, guarded remove_untracked usage, shared submodule detection helper, and targeted configured-submodule sync). |
|
Thanks @dkattan. Could you also hit the resolve button for everything you addressed, or comment on it before resolving if it wasn't addressed? Lastly, the PR needs rebasing due to conflicts, and I am putting it back to draft now so you can put it back in 'ready for review' when it's truly ready. |
045a786 to
a98e193
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@copilot please review this PR. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce81c5410a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if !is_submodule_related_path(&path, submodule_paths) { | ||
| paths.insert(path); |
There was a problem hiding this comment.
Preserve commit-accurate gitlinks when building checkout paths
Filtering out submodule paths here means the subsequent edit-mode checkouts never apply gitlink entries from commit_parent -> commit. In repositories where the currently checked-out workspace head has a newer submodule pointer (for example, editing a non-tip commit after a later commit changed the same submodule), the edit branch keeps that newer gitlink and save_and_return_to_workspace can silently rewrite the edited commit with the wrong submodule SHA.
Useful? React with 👍 / 👎.
| let _ = std::fs::remove_dir_all(&absolute); | ||
| } else { | ||
| let _ = std::fs::remove_file(&absolute); |
There was a problem hiding this comment.
Propagate failures when deleting untracked paths
This cleanup routine discards filesystem deletion errors, so callers get Ok(()) even when untracked files could not be removed (e.g., read-only or permission-denied paths). That leaves the worktree in an unexpected state after forced cleanup and can leak files into later operations such as index.add_all("*") during save-from-edit-mode. Returning the I/O error would prevent silent corruption of subsequent steps.
Useful? React with 👍 / 👎.
| } | ||
|
|
||
| fn collect_submodule_paths_from_modules_dir(repo: &git2::Repository, paths: &mut HashSet<String>) { | ||
| let modules_root = repo.path().join("modules"); |
There was a problem hiding this comment.
collect_submodule_paths_from_modules_dir() uses repo.path().join("modules"), but in linked-worktree repositories the common git dir (where modules/ lives) may differ from repo.path(). Consider using the repository common dir (e.g. repo.commondir() in libgit2/git2-rs) so submodule detection still works in linked worktrees and you don't fall back to the unsafe broad cleanup behavior.
| let modules_root = repo.path().join("modules"); | |
| let modules_root = repo.commondir().join("modules"); |
| let absolute = workdir.join(path); | ||
| if absolute.is_dir() { | ||
| let _ = std::fs::remove_dir_all(&absolute); | ||
| } else { | ||
| let _ = std::fs::remove_file(&absolute); | ||
| } |
There was a problem hiding this comment.
remove_untracked_excluding_submodule_paths() ignores errors from remove_dir_all/remove_file. That can silently leave untracked files behind (e.g. permission issues) and make later operations behave unpredictably. Prefer returning an error with context for failures other than NotFound, and only suppress errors you explicitly want to treat as best-effort.
| @@ -109,15 +114,16 @@ pub fn update_uncommitted_changes_with_tree( | |||
| } | |||
| } | |||
|
|
|||
| repo.checkout_index( | |||
| Some(&mut new_uncommitted_changes), | |||
| Some( | |||
| git2::build::CheckoutBuilder::new() | |||
| .force() | |||
| .remove_untracked(true) | |||
| .conflict_style_diff3(true), | |||
| ), | |||
| )?; | |||
| let mut checkout = git2::build::CheckoutBuilder::new(); | |||
| checkout.force().conflict_style_diff3(true); | |||
| if submodule_paths.is_empty() { | |||
| checkout.remove_untracked(true); | |||
| } | |||
|
|
|||
| repo.checkout_index(Some(&mut new_uncommitted_changes), Some(&mut checkout))?; | |||
| if !submodule_paths.is_empty() { | |||
| remove_untracked_excluding_submodule_paths(repo)?; | |||
| } | |||
There was a problem hiding this comment.
configured_submodule_paths(repo) is computed here only to decide whether to call remove_untracked(true), but when submodule_paths is non-empty you then call remove_untracked_excluding_submodule_paths(repo), which recomputes configured_submodule_paths() internally. Consider changing remove_untracked_excluding_submodule_paths to accept the already-computed list (or adding an internal helper) to avoid repeated filesystem/config scanning on each checkout.
| pub fn is_submodule_related_path(path: &str, submodule_paths: &[String]) -> bool { | ||
| submodule_paths.iter().any(|sm| { | ||
| path == sm | ||
| || path | ||
| .strip_prefix(sm) | ||
| .is_some_and(|rest| rest.starts_with('/')) | ||
| }) |
There was a problem hiding this comment.
is_submodule_related_path() assumes Git-style / path separators (rest.starts_with('/')), but configured_submodule_paths() builds paths via Path::to_string_lossy() (from repo.submodules() and .git/modules scanning), which will produce \ on Windows. That can cause submodule paths to not match, re-enabling submodule worktree deletion on Windows. Normalize paths to a single separator (e.g. convert \→/ in both path and each sm before comparisons, or store submodule paths as normalized Git paths when collecting).
Summary
This PR is now scoped to the submodule-preservation fix in resolve/edit mode. The recovery guidance/UI changes and overwrite diagnostics that were previously on this branch have been split out.
What changed
gitbutler-workspace::submodulesto detect configured submodules, identify submodule-related paths, and remove untracked files without deleting populated submodule worktrees.update_uncommitted_changes_with_tree(...)to stop using broadremove_untracked(true)cleanup and instead perform targeted cleanup viaremove_untracked_excluding_submodule_paths(...).Key files touched
crates/gitbutler-workspace/src/submodules.rscrates/gitbutler-workspace/src/branch_trees.rscrates/gitbutler-workspace/src/lib.rscrates/gitbutler-edit-mode/src/lib.rscrates/gitbutler-edit-mode/tests/edit_mode.rscrates/gitbutler-edit-mode/tests/fixtures/edit_mode.shcrates/gitbutler-workspace/Cargo.tomlCargo.lockValidation run
cargo test -p gitbutler-edit-mode -p gitbutler-workspace