Merged
27 changes: 24 additions & 3 deletions .github/workflows/sitemap_resources.yaml
@@ -42,7 +42,27 @@ jobs:
  echo "url-count = ${{ steps.sitemap_all.outputs.url-count }}"
  echo "excluded-count = ${{ steps.sitemap_all.outputs.excluded-count }}"

- - name: Generate sitemap for just AI Generated JSON-LD resources
+ - name: Check optional JSON-LD subfolders
+ id: folders
+ run: |
+ echo "earthface=$([ -d data/objects/summoned/earthface ] && echo true || echo false)" >> $GITHUB_OUTPUT
+ echo "generated=$([ -d data/objects/summoned/generated ] && echo true || echo false)" >> $GITHUB_OUTPUT
+
+ - name: Generate sitemap for earthface JSON-LD resources
+ if: steps.folders.outputs.earthface == 'true'
+ id: sitemap_earthface
+ uses: cicirello/generate-sitemap@v1
+ with:
+ base-url-path: https://raw.githubusercontent.com/earthcube/communityCollections/refs/heads/${{ github.ref_name }}/data/objects/summoned/earthface
+ path-to-root: data/objects/summoned/earthface
+ include-html: false
+ include-pdf: false
+ additional-extensions: jsonld json xml
+ exclude-paths:
+ .git .github docs scripts crawler prompts .vscode
+
+ - name: Generate sitemap for generated JSON-LD resources
+ if: steps.folders.outputs.generated == 'true'
  id: sitemap_generated
  uses: cicirello/generate-sitemap@v1
  with:
@@ -58,8 +78,9 @@ jobs:
  run: |
  git config user.name "github-actions[bot]"
  git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
- ls -la data/objects/summoned/sitemap.xml data/objects/summoned/generated/sitemap.xml 2>/dev/null || true
- git add data/objects/summoned/sitemap.xml data/objects/summoned/generated/sitemap.xml
+ git add data/objects/summoned/sitemap.xml
+ [ -f data/objects/summoned/earthface/sitemap.xml ] && git add data/objects/summoned/earthface/sitemap.xml
+ [ -f data/objects/summoned/generated/sitemap.xml ] && git add data/objects/summoned/generated/sitemap.xml
  git status
  if ! git diff --staged --quiet; then
  git commit -m "chore: update JSON-LD sitemaps"
17 changes: 12 additions & 5 deletions .github/workflows/validate_with_dataset_schema.yaml
@@ -7,9 +7,9 @@ on:
  branches-ignore: [ 'gh-pages' ]

  jobs:
- validate-jsonld-generated:
+ validate-jsonld-folders:
  runs-on: ubuntu-latest
- name: Validate generated JSON-LD files
+ name: Validate earthface and generated JSON-LD files
  steps:
  - name: Checkout the repo
  uses: actions/checkout@v4
@@ -21,9 +21,16 @@ jobs:
  with:
  python-version: '3.11'

- - name: Find and validate generated JSON-LD files
+ - name: Validate JSON-LD in earthface and generated folders
  run: |
- python scripts/validate_jsonld_batch.py data/objects/summoned/generated
+ for dir in data/objects/summoned/earthface data/objects/summoned/generated; do
+ if [ -d "$dir" ]; then
+ echo "Validating $dir..."
+ python scripts/validate_jsonld_batch.py "$dir"
+ else
+ echo "Skipping (folder not present): $dir"
+ fi
+ done

  validate-jsonld-summoned:
  runs-on: ubuntu-latest
@@ -47,7 +54,7 @@ jobs:
  dir_ = Path("data/objects/summoned")
  if not dir_.exists():
  print("Directory not found, skipping."); sys.exit(0)
- files = [f for f in dir_.rglob("*.jsonld") if "generated" not in str(f)]
+ files = [f for f in dir_.rglob("*.jsonld") if "earthface" not in str(f) and "generated" not in str(f)]
  if not files:
  print("No JSON-LD files found."); sys.exit(0)
  errs = []
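Review note on the last hunk: the updated filter drops any path whose string form contains `earthface` or `generated`, so a file elsewhere under `data/objects/summoned` whose name merely contains one of those words (for example a hypothetical `regenerated.jsonld`) would be skipped as well. A minimal sketch of a stricter variant follows, assuming the intent is to exclude only files inside those two subfolders. `EXCLUDED_DIRS`, `is_excluded`, and `summoned_files` are illustrative names and are not part of this PR.

```python
from pathlib import Path

# Assumption: these are the only subfolders handled by the other validation job.
EXCLUDED_DIRS = {"earthface", "generated"}


def is_excluded(path: Path, root: Path) -> bool:
    """Return True if `path` lies inside an excluded subfolder of `root`."""
    # Compare whole directory components rather than substrings of the path.
    parent_parts = path.relative_to(root).parts[:-1]
    return any(part in EXCLUDED_DIRS for part in parent_parts)


def summoned_files(root: Path) -> list[Path]:
    """Collect *.jsonld files under `root`, skipping the excluded subfolders."""
    return sorted(f for f in root.rglob("*.jsonld") if not is_excluded(f, root))
```

In the workflow's inline script this would stand in for the list comprehension, e.g. `files = summoned_files(dir_)`. Whether the substring match is actually a problem depends on the file names in the repository, which this diff does not show.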
1 change: 1 addition & 0 deletions .gitignore
@@ -18,4 +18,5 @@ build/
  datasets.csv

  # Generated JSON-LD files
+ #data/objects/summoned/earthface/
  #data/objects/summoned/generated/
5 changes: 3 additions & 2 deletions README.md
@@ -5,7 +5,8 @@

  Documentation, files and code related to the exposure of resource on the web for indexing.

- Sitemaps (generated by [sitemap_resources.yaml](.github/workflows/sitemap_resources.yaml) on push to `master` / `main` / feature branches):
+ Sitemaps (generated by [sitemap_resources.yaml](.github/workflows/sitemap_resources.yaml) on push to `master` / `main`):

  - **All JSON-LD under data/objects/summoned:** [GitHub Pages](https://earthcube.github.io/communityCollections/data/objects/summoned/sitemap.xml) · [Raw (e.g. master)](https://raw.githubusercontent.com/earthcube/communityCollections/master/data/objects/summoned/sitemap.xml)
- - **AI-generated JSON-LD only:** [GitHub Pages](https://earthcube.github.io/communityCollections/data/objects/summoned/generated/sitemap.xml) · [Raw (e.g. master)](https://raw.githubusercontent.com/earthcube/communityCollections/master/data/objects/summoned/generated/sitemap.xml)
+ - **Earthface JSON-LD only:** [GitHub Pages](https://earthcube.github.io/communityCollections/data/objects/summoned/earthface/sitemap.xml) · [Raw (e.g. master)](https://raw.githubusercontent.com/earthcube/communityCollections/master/data/objects/summoned/earthface/sitemap.xml)
+ - **Generated JSON-LD only** (if present): [GitHub Pages](https://earthcube.github.io/communityCollections/data/objects/summoned/generated/sitemap.xml) · [Raw (e.g. master)](https://raw.githubusercontent.com/earthcube/communityCollections/master/data/objects/summoned/generated/sitemap.xml)
2 changes: 1 addition & 1 deletion docs/jsonld-validation-plan.md
@@ -21,6 +21,6 @@ Validate generated JSON-LD against the authoritative dataset webpage, linked dow

  ## Validation

- - Run `python3 scripts/validate_jsonld_batch.py data/objects/summoned/generated`.
+ - Run `python3 scripts/validate_jsonld_batch.py data/objects/summoned/earthface` (and/or `.../generated` when that folder exists).
  - Run `git diff --check`.
  - Review `git diff` for metadata-only changes and confirm no unrelated files are modified.
2 changes: 1 addition & 1 deletion scripts/README.md
@@ -116,7 +116,7 @@ python scripts/generate_jsonld.py --ai-service gemini --csv datasets.csv

  ### Options
  - `--csv`: Path to CSV file (default: `datasets.csv`)
- - `--output-dir`: Output directory for JSON-LD files (default: `data/objects/summoned/generated`)
+ - `--output-dir`: Output directory for JSON-LD files (default: `data/objects/summoned/earthface`)
  - `--ai-service`: Choose `gemini` (default), `nrp`, `openai`, or `anthropic` (optional - defaults to `gemini`)
  - `--api-key`: API key (or use environment variable)
  - `--model`: Model name (optional, uses defaults)
2 changes: 1 addition & 1 deletion scripts/generate_jsonld.py
@@ -994,7 +994,7 @@ def save_jsonld(jsonld_str: str, output_dir: Path, dataset_name: str, url: str)
  def main():
  parser = argparse.ArgumentParser(description='Generate JSON-LD for datasets')
  parser.add_argument('--csv', default='datasets.csv', help='Path to CSV file or URL (e.g. Google Sheets export)')
- parser.add_argument('--output-dir', default='data/objects/summoned/generated', help='Output directory for JSON-LD files')
+ parser.add_argument('--output-dir', default='data/objects/summoned/earthface', help='Output directory for JSON-LD files')
  parser.add_argument('--ai-service', choices=['openai', 'anthropic', 'nrp', 'gemini'], default='gemini', help='AI service to use (default: gemini)')
  parser.add_argument('--api-key', help='API key (or set environment variable)')
  parser.add_argument('--model', help='Model name (optional)')
2 changes: 1 addition & 1 deletion scripts/validate_jsonld_batch.py
@@ -1,6 +1,6 @@
  #!/usr/bin/env python3
  """
- Validate JSON-LD files under a directory (e.g. data/objects/summoned/generated).
+ Validate JSON-LD files under a directory (e.g. data/objects/summoned/earthface).
  Checks: valid JSON, @context, @type, name; spatialCoverage box format; distribution encodingFormat as array.
  WebPage and DataCatalog are accepted with a warning (expected Dataset for dataset files).
  Exits 0 if all pass, 1 if any file fails.
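For readers without the script at hand, the checks named in that docstring can be sketched roughly as below. This is an illustration only, not the contents of `scripts/validate_jsonld_batch.py`: the function names `check_jsonld` and `validate_dir` are hypothetical, the error messages are made up, and the `spatialCoverage` box check and the `WebPage` / `DataCatalog` warning are left out because the diff does not show how the script defines them.

```python
import json
from pathlib import Path


def check_jsonld(text: str) -> list[str]:
    """Return error messages for one JSON-LD document (empty list if it passes)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value is not a JSON object"]

    errors = []
    # Required top-level keys listed in the docstring.
    for key in ("@context", "@type", "name"):
        if key not in doc:
            errors.append(f"missing {key}")

    # Assumption: distribution may be a single object or a list of objects.
    dists = doc.get("distribution", [])
    if not isinstance(dists, list):
        dists = [dists]
    for i, dist in enumerate(dists):
        if isinstance(dist, dict) and "encodingFormat" in dist:
            if not isinstance(dist["encodingFormat"], list):
                errors.append(f"distribution[{i}].encodingFormat is not an array")
    return errors


def validate_dir(root: Path) -> int:
    """Check every *.jsonld file under `root`; return 0 if all pass, else 1."""
    failed = False
    for path in sorted(root.rglob("*.jsonld")):
        for msg in check_jsonld(path.read_text(encoding="utf-8")):
            print(f"{path}: {msg}")
            failed = True
    return 1 if failed else 0
```

A caller could pass the return value of `validate_dir(Path("data/objects/summoned/earthface"))` to `sys.exit` to mirror the exit-code behaviour the docstring describes.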