Skip to content

fix(macrobenchmarks): remove HNS validation and external model bucket check#942

Merged
zhixiangli merged 2 commits into
fsspec:mainfrom
zhixiangli:fix-macrobench-bugs
Jun 30, 2026
Merged

fix(macrobenchmarks): remove HNS validation and external model bucket check#942
zhixiangli merged 2 commits into
fsspec:mainfrom
zhixiangli:fix-macrobench-bugs

Conversation

@zhixiangli

@zhixiangli zhixiangli commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Remove HNS validation and external model bucket checks from the macrobenchmark initialization script.

Changes:

  • HNS Validation: Removed HNS status detection (via gcloud JSON parsing) and the corresponding validation checks for regional and hns bucket types.
  • Model Bucket Check: Removed the gcloud storage buckets describe check for the external model bucket, which was previously only warning on failure.

gcloud storage buckets describe currently filters out the hierarchicalNamespace
field, causing the verification to fail even when HNS is enabled.
We now only ensure the bucket is not zonal (RAPID).

TAG=agy
CONV=65d58535-7d1a-4671-8201-87c8d64accc2
… check

- Remove HNS check from `validate_bucket` for regional runs.
- Remove external model-weights bucket describe check to prevent false-negative warnings.

TAG=agy
CONV=bb9a8524-f974-44c7-bcb6-6eb30a649e34
# (hierarchicalNamespace vs hierarchical_namespace); accept either so a real
# HNS bucket is never silently read as non-HNS (which would reject hns runs
# and, worse, bypass the regional-run "no HNS" guard).
HNS=$(echo "$JSON" | python3 -c "import sys,json;d=json.load(sys.stdin);print(bool((d.get('hierarchicalNamespace') or d.get('hierarchical_namespace') or {}).get('enabled')))")

@zhixiangli zhixiangli Jun 30, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't working. Delete it to unblock.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the Hierarchical Namespace (HNS) validation checks and the model-weights bucket check from the initialization script. The reviewer noted that removing the HNS validation entirely could lead to invalid benchmark results by allowing runs on incompatible bucket types. They recommended restoring the validation using a more robust Python parser to handle varying CLI formats safely.

I am having trouble creating individual review comments. Click here to see my feedback.

cloudbuild/macrobenchmarks/scripts/init_variables.sh (76-80)

high

Removing the HNS validation entirely allows hns runs to silently execute on non-HNS buckets, and regional runs to execute on HNS buckets. This can lead to invalid benchmark results and defeats the purpose of the validation step.

If the previous implementation was fragile due to potential AttributeErrors (e.g., if hierarchicalNamespace is returned as a boolean instead of a dictionary in some CLI versions), we can make the Python parser more robust instead of removing the check entirely.

Here is how the complete validate_bucket function can be safely restored and improved:

validate_bucket() {
  BPATH="$1"; KIND="$2"
  BUCKET=$(echo "$BPATH" | sed -E 's#^gs://([^/]+).*#\1#')
  JSON=$(gcloud storage buckets describe "gs://$BUCKET" --project=${PROJECT_ID} --format=json 2>/dev/null) || {
    echo "ERROR: cannot describe $KIND bucket gs://$BUCKET (missing or no read access)."; exit 1; }
  LOC=$(echo "$JSON" | python3 -c "import sys,json;print((json.load(sys.stdin).get('location') or '').lower())")
  case "$LOC" in
    $REGION|$REGION-*) : ;;
    *) echo "ERROR: $KIND bucket gs://$BUCKET is in '$LOC', not region '$REGION'."; exit 1 ;;
  esac
  IS_RAPID=no; echo "$JSON" | grep -qiF 'RAPID' && IS_RAPID=yes
  
  # gcloud's JSON key for HNS has varied across CLI versions
  # (hierarchicalNamespace vs hierarchical_namespace); accept either so a real
  # HNS bucket is never silently read as non-HNS (which would reject hns runs
  # and, worse, bypass the regional-run "no HNS" guard).
  HNS=$(echo "$JSON" | python3 -c "import sys,json;d=json.load(sys.stdin);hns=d.get('hierarchicalNamespace') or d.get('hierarchical_namespace');enabled=hns.get('enabled',False) if isinstance(hns,dict) else bool(hns) if isinstance(hns,bool) else False;print(str(enabled).lower())")

  case "${_BUCKET_TYPE}" in
    regional)
      if [ "$IS_RAPID" = yes ]; then echo "ERROR: regional run but $KIND bucket gs://$BUCKET is RAPID/zonal."; exit 1; fi
      if [ "$HNS" = "true" ]; then echo "ERROR: regional run but $KIND bucket gs://$BUCKET has HNS enabled."; exit 1; fi ;;
    zonal)
      if [ "$IS_RAPID" != yes ]; then echo "ERROR: zonal run requires a RAPID $KIND bucket (gs://$BUCKET)."; exit 1; fi
      if ! echo "$JSON" | grep -qiF "${_ZONE}"; then echo "ERROR: zonal $KIND bucket gs://$BUCKET is not placed in zone ${_ZONE}."; exit 1; fi ;;
    hns)
      if [ "$IS_RAPID" = yes ]; then echo "ERROR: hns run but $KIND bucket gs://$BUCKET is RAPID/zonal."; exit 1; fi
      if [ "$HNS" != "true" ]; then echo "ERROR: hns run requires an HNS-enabled $KIND bucket (gs://$BUCKET)."; exit 1; fi ;;
  esac
  echo "OK: $KIND bucket gs://$BUCKET matches ${_BUCKET_TYPE} in $LOC."
}
  # gcloud's JSON key for HNS has varied across CLI versions
  # (hierarchicalNamespace vs hierarchical_namespace); accept either so a real
  # HNS bucket is never silently read as non-HNS (which would reject hns runs
  # and, worse, bypass the regional-run "no HNS" guard).
  HNS=$(echo "$JSON" | python3 -c "import sys,json;d=json.load(sys.stdin);hns=d.get('hierarchicalNamespace') or d.get('hierarchical_namespace');enabled=hns.get('enabled',False) if isinstance(hns,dict) else bool(hns) if isinstance(hns,bool) else False;print(str(enabled).lower())")

@codecov

codecov Bot commented Jun 30, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.68%. Comparing base (7e658e0) to head (ad594ff).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #942   +/-   ##
=======================================
  Coverage   89.68%   89.68%           
=======================================
  Files          16       16           
  Lines        3579     3579           
=======================================
  Hits         3210     3210           
  Misses        369      369           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zhixiangli zhixiangli changed the title Fix macrobench bugs fix(macrobenchmarks): remove HNS validation and external model bucket check Jun 30, 2026
@zhixiangli zhixiangli merged commit 8f47e73 into fsspec:main Jun 30, 2026
10 checks passed
@zhixiangli zhixiangli deleted the fix-macrobench-bugs branch June 30, 2026 03:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants