feat(curvine-cli): auto sync metadata from ufs when mount #712
Pull request overview
This PR adds an automatic metadata resync from UFS on first cv mount for fs_mode mounts, improves resync visibility via progress reporting, and strengthens end-to-end coverage to validate nested/high-volume resync behavior.
Changes:
- Reuse a single resync execution path for both `mount resync` and first-mount auto-resync in `fs_mode`, with periodic progress output.
- Ensure missing Curvine directories are created during resync traversal to avoid list failures.
- Expand `build/tests/resync_e2e.sh` with nested stress uploads, stricter assertions, and more proactive cleanup.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| curvine-common/src/state/mount.rs | Adds unit coverage for fs_mode vs cache_mode guard behavior used by resync logic. |
| curvine-cli/src/cmds/mount.rs | Introduces auto-resync on first mount in fs_mode, unifies resync flow, and adds progress reporting + CV dir creation during traversal. |
| build/tests/resync_e2e.sh | Adds new scenarios (cache-mode rejection, auto-resync), plus high-volume nested stress testing and cleanup helpers. |
curvine-cli/src/cmds/mount.rs
Outdated
    || err.to_string().contains("not exists")
    || err.to_string().contains("not found")
Thanks for catching this. I removed the string-based fallback and now classify missing CV directories by ErrorKind only (FileNotFound/Expired), so unrelated errors are no longer silently treated as missing.
build/tests/resync_e2e.sh
Outdated
    UFS_PREFIX="${UFS_PREFIX:-curvine-test}"
    CV_PATH="${CV_PATH:-/miniocluster/curvine-test}"
    S3_ENDPOINT_URL="${S3_ENDPOINT_URL:-http://127.0.0.1:9009}"
    S3_REGION="${S3_REGION:-cn-beigjing}"
Good catch. Updated the default region typo from cn-beigjing to cn-beijing to avoid confusion in logs/configs.
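The defaults in the script use the `${VAR:-default}` expansion, so the corrected `cn-beijing` value only applies when the caller has not set `S3_REGION`. A minimal sketch of that behavior follows; the function name `resolve_region` is hypothetical and does not appear in the script.

```bash
# Hypothetical helper: prints the region the script would end up using.
resolve_region() {
  # ${S3_REGION:-cn-beijing} falls back when S3_REGION is unset or empty.
  printf '%s\n' "${S3_REGION:-cn-beijing}"
}

unset S3_REGION
resolve_region                           # default applies

# A caller-provided value wins; the subshell keeps the override local.
(S3_REGION="us-east-1"; resolve_region)
```

Because the fallback also triggers on an empty value, exporting `S3_REGION=""` behaves the same as leaving it unset.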
build/tests/resync_e2e.sh
Outdated
    TMP_DIR="$(mktemp -d)"
    trap 'rm -rf "$TMP_DIR"' EXIT
    trap 'cleanup_base_test_data; rm -rf "$TMP_DIR"' EXIT
Agreed. I extended the EXIT trap to run cleanup_previous_test_prefixes in addition to base cleanup, so dynamic auto/stress prefixes are also cleaned when the script exits early.
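The effect of the extended EXIT trap can be sketched as below. The two cleanup functions are stand-ins that only print a message (the real ones remove UFS and Curvine test data), and the order of the calls inside the trap is an assumption.

```bash
# Stand-ins for the script's cleanup functions; they only log here.
cleanup_base_test_data() { echo "cleanup: base test data"; }
cleanup_previous_test_prefixes() { echo "cleanup: dynamic prefixes"; }

# The body runs in a subshell, so the EXIT trap fires when that subshell exits.
run_scenarios() (
  TMP_DIR="$(mktemp -d)"
  # Single quotes defer expansion of "$TMP_DIR" until the trap actually runs.
  trap 'cleanup_base_test_data; cleanup_previous_test_prefixes; rm -rf "$TMP_DIR"' EXIT
  echo "scenario started"
  exit 1   # simulate an early failure
  echo "never reached"
)

run_scenarios || echo "exit status: $?"
```

Both cleanup steps run even though the scenario aborts early, and the original non-zero exit status is preserved for the caller.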
    run_cv mount "s3://$BUCKET/$CACHE_MODE_UFS_PREFIX" "$CACHE_MODE_CV_PATH" \
      --write-type cache_mode \
      --config s3.endpoint_url="$S3_ENDPOINT_URL" \
      --config s3.credentials.access="$S3_ACCESS_KEY" \
      --config s3.credentials.secret="$S3_SECRET_KEY" \
      --config s3.force.path.style="$S3_FORCE_PATH_STYLE" \
      >/dev/null 2>&1
Fixed. Added --config s3.region_name="$S3_REGION" to this cache-mode mount invocation for consistency and AWS compatibility.
    mc cp "$TMP_DIR/auto-b.txt" "$auto_b_mc_path" >/dev/null
    log "scenario F: first mount auto-resync should not fail on missing cv dirs"
    out_f="$(run_cv mount "s3://$BUCKET/$AUTO_UFS_PREFIX" "$AUTO_CV_PATH" \
      --config s3.endpoint_url="$S3_ENDPOINT_URL" \
Fixed. Added --config s3.region_name="$S3_REGION" to the auto-resync scenario mount invocation as well.
    create_nested_stress_files "$STRESS_UFS_PREFIX" "$STRESS_FILE_COUNT" "$TMP_DIR/stress-seed.txt"
    log "scenario G: mount and trigger auto resync for stress path"
    out_g_mount="$(run_cv mount "s3://$BUCKET/$STRESS_UFS_PREFIX" "$STRESS_CV_PATH" \
      --config s3.endpoint_url="$S3_ENDPOINT_URL" \
Fixed. Added --config s3.region_name="$S3_REGION" to the stress-test mount invocation to avoid AWS endpoint failures.
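The same region flag was missing from three separate mount invocations. One way to keep that from recurring is to append the shared `--config` flags in a single place. The wrapper `with_s3_config` below is hypothetical and not part of the script; the flag names are the ones shown in the snippets above, and the variable values are examples only.

```bash
# Hypothetical wrapper: runs the given command with the shared S3 --config
# flags appended, so no mount call can forget the region.
with_s3_config() {
  "$@" \
    --config "s3.endpoint_url=${S3_ENDPOINT_URL}" \
    --config "s3.region_name=${S3_REGION}" \
    --config "s3.credentials.access=${S3_ACCESS_KEY}" \
    --config "s3.credentials.secret=${S3_SECRET_KEY}" \
    --config "s3.force.path.style=${S3_FORCE_PATH_STYLE}"
}

# Example values only; the real script takes these from the environment.
S3_ENDPOINT_URL="http://127.0.0.1:9009"
S3_REGION="cn-beijing"
S3_ACCESS_KEY="example-access"
S3_SECRET_KEY="example-secret"
S3_FORCE_PATH_STYLE="true"

# echo stands in for the real command so the sketch runs without a cluster.
with_s3_config echo mount "s3://bucket/prefix" "/cv/path"
```

In the test script the call would presumably take the form `with_s3_config run_cv mount ...`, assuming `run_cv` forwards its arguments unchanged.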
Use ErrorKind-based missing-dir detection to avoid string-based false positives. Also harden resync e2e by fixing region defaults, adding missing region configs, and extending EXIT cleanup to include stale dynamic prefixes. Made-with: Cursor
Problem
Mount resync on fs_mode lacked first-mount auto synchronization and had weak visibility for long sync runs. Existing e2e coverage was also too shallow for nested/batch metadata synchronization.
Design
Unify manual and auto resync into one reusable flow with in-place progress reporting. Ensure missing Curvine directories are created during traversal, and harden e2e with high-volume nested data plus strict post-sync assertions and cleanup.
Key Changes
- One shared resync flow for `mount resync` and first `mount` auto-resync in fs_mode.
- In-place progress reporting (running/done, elapsed, scanned/recreated/skipped/failed, pending dirs).
- `build/tests/resync_e2e.sh` extended with nested stress distribution (3-5 levels, >=100 dirs, >=1000 files), parallel upload, strict summary/count checks, and proactive/after-run cleanup for both UFS and Curvine test data.

Made with Cursor
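The nested stress distribution described above (3-5 levels, >=100 dirs, >=1000 files) could be generated along these lines. This is a sketch under assumptions: `gen_nested_paths` and its naming scheme are hypothetical, and the script's own `create_nested_stress_files` may lay out files differently.

```bash
# Hypothetical generator: prints file_count relative paths spread over
# dir_count leaf directories, each 3 to 5 directory levels below the prefix.
gen_nested_paths() {
  local prefix="$1" file_count="$2" dir_count="$3"
  local i=0 d depth dir level
  while [ "$i" -lt "$file_count" ]; do
    d=$((i % dir_count))     # leaf directory this file is assigned to
    depth=$((3 + d % 3))     # 3, 4, or 5 directory levels
    dir="$prefix"
    level=1
    while [ "$level" -lt "$depth" ]; do
      dir="$dir/l$level-$((d % 10))"
      level=$((level + 1))
    done
    dir="$dir/dir-$d"        # final level is unique per leaf directory
    printf '%s/file-%05d.txt\n' "$dir" "$i"
    i=$((i + 1))
  done
}

# Small demo: three files, one each at depth 3, 4, and 5.
gen_nested_paths stress 3 3
```

Calling `gen_nested_paths stress 1000 100` yields 1000 paths over 100 distinct leaf directories, which meets the stated thresholds; each path could then be uploaded with `mc cp`.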