Add acled-sub-saharan-africa-events: ACLED Sub-Saharan Africa event-level conflict data #119

Open
jonaraphael wants to merge 1 commit into repo-hardening-ci-publish-schema from add-acled-sub-saharan-africa-events

@jonaraphael

Contributor

Summary

Publish new shared dataset acled-sub-saharan-africa-events: ACLED Sub-Saharan Africa Event-Level Conflict Data.

Validation

  • Artifact validation: FGB 47937 MultiPoint WGS84 features with 31 source fields plus feature_id/geometry_hash/properties_hash; PMTiles spec v3 mvt z0-12 metadata-lookup tiles (feature_id only) built via GeoJSONSeq->Tippecanoe v2.79.0 MBTiles->pmtiles convert; release vector contract validation decoded zoom-0 tiles and cross-checked FGB/PMTiles; sidecar 47937 records valid; schema 31 fields valid; manifest valid. check-schema-compatibility blocked by local GDAL 3.6.2 ogrinfo -json limitation (new asset, no prior snapshot).
  • Commands run: uv run ogrinfo -so -al publish/acled-sub-saharan-africa-events.fgb (47937 features, MultiPoint WGS84 EPSG:4326, 34 fields incl. feature_id/geometry_hash/properties_hash); head -c 7 publish/acled-sub-saharan-africa-events.pmtiles | xxd (PMTiles magic bytes confirmed); uv run pmtiles verify publish/acled-sub-saharan-africa-events.pmtiles (completed verify, no errors); uv run pmtiles show publish/acled-sub-saharan-africa-events.pmtiles (spec v3, mvt, z0-12, bounds -25.0921,-34.583,72.4234,34.7519, layer acled_sub_saharan_africa_events); PYTHONPATH=. uv run python build/finalize_acled.py (feature_metadata.validate_release_vector_contract with decode_zoom=0 valid; validate_sidecar_records 47937 valid; validate_release_schema 31 fields valid; validate_release_manifest valid); uv run python scripts/dataset_alerts.py check-schema-compatibility --asset-slug acled-sub-saharan-africa-events --dataset-path publish/acled-sub-saharan-africa-events.fgb (FAILED: local GDAL 3.6.2 lacks ogrinfo -json vector support added in GDAL 3.7; known local-toolchain limitation, new asset has no prior schema snapshot).
  • Catalog docs check passed: True
  • Remote object generations are encoded in the publish plan below.
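
The magic-byte check listed above can also be done without xxd. The following is a minimal sketch, not part of this repo's tooling: it relies only on the PMTiles v3 header layout (a 7-byte `PMTiles` magic followed by a one-byte spec version), and the function name `read_pmtiles_version` is hypothetical.

```python
from pathlib import Path

PMTILES_MAGIC = b"PMTiles"  # 7-byte magic at the start of a PMTiles archive


def read_pmtiles_version(path: str) -> int:
    """Return the spec version byte, or raise ValueError if the magic is missing.

    Hypothetical helper for illustration; it only inspects the first 8 bytes
    and does not validate the rest of the header or any tile data.
    """
    with Path(path).open("rb") as handle:
        header = handle.read(8)  # 7 magic bytes + 1 version byte
    if len(header) < 8 or header[:7] != PMTILES_MAGIC:
        raise ValueError(f"{path} does not start with the PMTiles magic bytes")
    return header[7]
```

For the archive in this PR, a return value of 3 would agree with the `pmtiles show` result (spec v3). This does not replace `pmtiles verify`, which checks far more than the header prefix.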

Dataset Admission

  • Intended consumer(s): SkyTruth internal regional conflict, risk, and exposure analyses; end-to-end pipeline validation for the repo-hardening-ci-publish-schema publish workflow changes (PR #118, "Add CI quality gates and extract tested publish workflow modules").
  • Why this belongs in shared-datasets instead of project storage, scratch storage, or direct upstream access: Provides a reusable, curated private snapshot of ACLED event-level conflict and demonstration records for Sub-Saharan Africa in canonical geospatial formats with the release feature metadata contract, complementing the existing ACLED weekly Admin 1 aggregated assets and the ucdp-ged-events event-level asset. Also serves as the end-to-end test of the current branch publish pipeline.
  • Source, license, and citation status: Source: ACLED (Armed Conflict Location & Event Data) event-level data export for Sub-Saharan Africa; license/terms: ACLED Terms and Conditions; private internal shared-datasets use confirmed by SkyTruth authorization on 2026-05-08 (recorded for the existing ACLED aggregated assets; assumed to cover this event-level export, flagged for reviewer confirmation); citation: Raleigh, C., Linke, A., Hegre, H., & Karlsen, J. (2010). Introducing ACLED: An Armed Conflict Location and Event Dataset. Journal of Peace Research, 47(5), 651-660. ACLED (Armed Conflict Location & Event Data), Sub-Saharan Africa event-level export, accessed 2026-07-02, https://acleddata.com.
  • Named steward: SkyTruth
  • Update expectations: manual
  • Estimated published size in GB, one total across canonical files, companion artifacts, and expected release copies: 0.1
  • Large-data exception, required when the proposed published footprint is >= 10 GB: See contract exception flags in concierge workflow state.
  • Alternatives considered: Project-specific scratch storage, direct ACLED download by each consumer, and keeping the XLSX export only. Shared-datasets is preferred for reviewed private reuse, documented schema, feature identity, metadata sidecars, and catalog discovery.
  • Deprecation or exit policy: Keep dated releases readable; mark deprecated or superseded if ACLED changes terms, schema, or geography, or if a scheduled ingestion replaces this manual snapshot.

Publish Plan

{
  "asset_slug": "acled-sub-saharan-africa-events",
  "breaking_changes": [],
  "promotions": [
    {
      "cache_control": "",
      "compatibility_waiver": null,
      "content_type": "application/octet-stream",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/releases/2026-06-30/acled-sub-saharan-africa-events.fgb",
      "source_generation": "1783047698770778",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.fgb"
    },
    {
      "cache_control": "no-cache, max-age=0, must-revalidate",
      "compatibility_waiver": null,
      "content_type": "application/vnd.pmtiles",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/releases/2026-06-30/acled-sub-saharan-africa-events.pmtiles",
      "source_generation": "1783047702705675",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.pmtiles"
    },
    {
      "cache_control": "",
      "compatibility_waiver": null,
      "content_type": "application/x-ndjson",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/releases/2026-06-30/acled-sub-saharan-africa-events.metadata.ndjson.gz",
      "source_generation": "1783047706387018",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.metadata.ndjson.gz"
    },
    {
      "cache_control": "",
      "compatibility_waiver": null,
      "content_type": "application/json",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/releases/2026-06-30/acled-sub-saharan-africa-events.schema.json",
      "source_generation": "1783047709167017",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.schema.json"
    },
    {
      "cache_control": "",
      "compatibility_waiver": null,
      "content_type": "application/json",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/releases/2026-06-30/acled-sub-saharan-africa-events.manifest.json",
      "source_generation": "1783047711906908",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.manifest.json"
    },
    {
      "cache_control": "",
      "compatibility_waiver": null,
      "content_type": "application/octet-stream",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/latest/acled-sub-saharan-africa-events.fgb",
      "source_generation": "1783047698770778",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.fgb"
    },
    {
      "cache_control": "no-cache, max-age=0, must-revalidate",
      "compatibility_waiver": null,
      "content_type": "application/vnd.pmtiles",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/latest/acled-sub-saharan-africa-events.pmtiles",
      "source_generation": "1783047702705675",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.pmtiles"
    },
    {
      "cache_control": "",
      "compatibility_waiver": null,
      "content_type": "application/x-ndjson",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/latest/acled-sub-saharan-africa-events.metadata.ndjson.gz",
      "source_generation": "1783047706387018",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.metadata.ndjson.gz"
    },
    {
      "cache_control": "",
      "compatibility_waiver": null,
      "content_type": "application/json",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/latest/acled-sub-saharan-africa-events.schema.json",
      "source_generation": "1783047709167017",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.schema.json"
    },
    {
      "cache_control": "",
      "compatibility_waiver": null,
      "content_type": "application/json",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/latest/acled-sub-saharan-africa-events.manifest.json",
      "source_generation": "1783047711906908",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/acled-sub-saharan-africa-events.manifest.json"
    },
    {
      "cache_control": "",
      "compatibility_waiver": null,
      "content_type": "text/markdown",
      "destination_generation": "",
      "destination_uri": "gs://skytruth-shared-datasets-1/400-events-observations/440-field-observations/acled-sub-saharan-africa-events/README.md",
      "source_generation": "1783047714905137",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/README.md"
    },
    {
      "cache_control": "no-cache, max-age=0, must-revalidate",
      "compatibility_waiver": null,
      "content_type": "application/json",
      "destination_generation": "1782963645807307",
      "destination_uri": "gs://skytruth-shared-datasets-1/_catalog/web/catalog.json",
      "source_generation": "1783047717861986",
      "source_uri": "gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/catalog.json"
    }
  ],
  "proposal_id": "add-acled-sub-saharan-africa-events",
  "release_index_asset_slugs": []
}
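
A reviewer may want to sanity-check a plan like the one above before approving it. The sketch below is an illustration only: the three checks (pinned source generations, unique destinations, one generation per staged source) are assumptions about what is worth verifying, not the rules enforced by the protected workflow, and `check_publish_plan` is a hypothetical name.

```python
from collections import defaultdict


def check_publish_plan(plan: dict) -> list[str]:
    """Return human-readable problems found in a publish plan (empty if none).

    Assumed checks, not the repo's actual validation:
      * every promotion pins a numeric source_generation,
      * no destination_uri appears twice,
      * each source_uri is always promoted at the same generation
        (e.g. the releases/ and latest/ copies of one staged object).
    Indexes in messages are 0-based positions in plan["promotions"].
    """
    problems = []
    generations_by_source = defaultdict(set)
    seen_destinations = set()
    for index, promotion in enumerate(plan.get("promotions", [])):
        source = promotion.get("source_uri", "")
        destination = promotion.get("destination_uri", "")
        generation = promotion.get("source_generation", "")
        if not generation.isdigit():
            problems.append(f"promotions[{index}]: source_generation is not pinned")
        if destination in seen_destinations:
            problems.append(f"promotions[{index}]: duplicate destination {destination}")
        seen_destinations.add(destination)
        generations_by_source[source].add(generation)
    for source, generations in generations_by_source.items():
        if len(generations) > 1:
            problems.append(f"{source}: inconsistent generations {sorted(generations)}")
    return problems
```

Loading the plan with `json.loads` and passing the resulting dict to the function is enough to use it; an empty list means none of the assumed problems were found.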

Repo Changes

  • docs/assets/acled-sub-saharan-africa-events.md (new asset doc with admission evidence, data profile, feature identity, and release metadata contract)
  • catalog/shared-datasets-catalog.csv (regenerated, +1 row)
  • docs/assets/index.md (regenerated, +1 row)

Feature Identity Decision

feature_id strategy: source_field copied from event_id_cnty. Exact full-row profile over all 47,937 rows: 47,937 distinct values, 0 blank, 0 non-URL-safe, no duplicates. Generated-sequence fallback not needed. Maintainer confirmed this choice and chose no autogenerated metadata translations for the first upload.
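
The profile described above (distinct, blank, non-URL-safe, duplicate counts) can be sketched as follows. This is not the code that produced the reported numbers; `profile_feature_ids` is a hypothetical helper, and treating "URL-safe" as "RFC 3986 unreserved characters only" is an assumption.

```python
import re
from collections import Counter

# Assumption: "URL-safe" means RFC 3986 unreserved characters only.
URL_SAFE = re.compile(r"[A-Za-z0-9._~-]+")


def profile_feature_ids(values) -> dict:
    """Profile a candidate feature_id column over every row (no sampling)."""
    values = ["" if v is None else str(v) for v in values]
    counts = Counter(values)
    return {
        "rows": len(values),
        "distinct": len(counts),
        "blank": sum(1 for v in values if not v.strip()),
        "non_url_safe": sum(
            1 for v in values if v.strip() and not URL_SAFE.fullmatch(v)
        ),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
    }
```

A source field qualifies for the source_field strategy under this sketch when `rows == distinct`, `blank == 0`, `non_url_safe == 0`, and `duplicates` is empty, which matches the result reported for event_id_cnty.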

Source Evidence (not promoted)

  • gs://skytruth-shared-datasets-1/_scratch/pending-publishes/acled-sub-saharan-africa-events/add-acled-sub-saharan-africa-events/source/ACLED_Dataset_Sub-SaharanAfrica.xlsx (generation 1783047721882269, sha256 76271676f26740fad075acc3c0f8d4144a8e41747ca47b296e1679360ca97a36). Staged as review evidence only; XLSX is not an approved canonical format, so the canonical artifacts above were built from it.
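
To confirm that a local copy of the XLSX matches the staged evidence, the sha256 above can be recomputed. A minimal sketch using only the standard library; the function name and chunk size are illustrative choices.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be compared directly with the digest recorded in the bullet above.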

Review / Merge Notes

  • Author is jonaraphael; GitHub blocks self-review requests, so this PR records restricted self-acceptance per AGENTS.md instead of a reviewer request.
  • This PR is stacked on repo-hardening-ci-publish-schema (PR #118, "Add CI quality gates and extract tested publish workflow modules") because it is the end-to-end test of that branch's publish pipeline. It must be retargeted to main (or re-verified) after #118 merges; promotion only runs after this PR's plan reaches main.
  • Dataset upload announcement: not sent. Promotion has not run; the upload summary is expected from the approved mutation workflow after merge. No manual Slack alert was sent.
  • License note: ACLED terms are restricted-redistribution; access_tier: private. The SkyTruth ACLED authorization recorded 2026-05-08 for the aggregated assets is assumed to cover this event-level export; reviewer should confirm.
  • check-schema-compatibility could not run locally (GDAL 3.6.2 lacks ogrinfo -json vector support, added in 3.7). New asset slug, so there is no prior schema snapshot; the protected workflow's schema preflight is expected to pass with no waiver.
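
The GDAL limitation in the last note could be detected up front instead of surfacing as a failed command. The sketch below is one possible guard, not existing repo code: it assumes the version string has the `GDAL X.Y.Z` form printed by `ogrinfo --version`, and both function names are hypothetical.

```python
import re


def parse_gdal_version(version_output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from text such as 'GDAL 3.6.2, released ...'."""
    match = re.search(r"GDAL (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognized GDAL version output: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def supports_ogrinfo_json(version_output: str) -> bool:
    """True when the GDAL version is at least 3.7, where ogrinfo -json was added."""
    # Tuple comparison avoids the string-compare trap where "3.10" < "3.7".
    return parse_gdal_version(version_output) >= (3, 7, 0)
```

A caller could feed it the output of `subprocess.run(["ogrinfo", "--version"], capture_output=True, text=True).stdout` and skip or clearly report the schema-compatibility step when the result is False.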

Pipeline-Test Observations

  1. scripts/vector_asset.py build mis-detects canonical-JSON GeoJSONSeq sources (lines start with {"geometry":... because keys are sorted), so GDAL's plain GeoJSON driver silently read 1 feature. Ingestion jobs avoid this by forcing the GeoJSONSeq: driver prefix; the manual path had to route through a GPKG intermediate. Worth hardening in vector_asset.py.
  2. scripts/catalog_docs.py generate rewraps prose in four unrelated asset docs (drc-cami-mining-cadastre, lsib, world-polygons, wri-forest-atlas-drc-mining-permits) in this working tree. Those files were intentionally left unstaged to keep this PR focused; if the branch CI check fails on doc drift, that is generator/env drift, not this asset.
  3. publishing_concierge.py staged-object validation reports 1-based indexes in error messages, which reads as an off-by-one when debugging evidence lists.
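
For observation 1, one possible hardening is to sniff the file contents and force the driver explicitly rather than rely on GDAL's auto-detection. The sketch below is an assumption about how that might look, not a patch to vector_asset.py: the helper names are hypothetical, and the only GDAL behavior it relies on is the documented `GeoJSONSeq:` dataset-name prefix mentioned in the observation.

```python
import json


def looks_like_geojsonseq(path: str, sample_lines: int = 5) -> bool:
    """True if the first non-empty lines are each a standalone GeoJSON Feature.

    Key order is irrelevant here, so canonical (sorted-key) lines that start
    with {"geometry": ...} are still recognized.
    """
    checked = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            # Drop surrounding whitespace and any RFC 8142 record separator.
            line = line.strip().lstrip("\x1e")
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                return False  # e.g. a pretty-printed FeatureCollection
            if not isinstance(record, dict) or record.get("type") != "Feature":
                return False
            checked += 1
            if checked >= sample_lines:
                break
    return checked > 0


def gdal_dataset_name(path: str) -> str:
    """Prefix the path so GDAL uses the GeoJSONSeq driver when appropriate."""
    return f"GeoJSONSeq:{path}" if looks_like_geojsonseq(path) else path
```

The returned name would be passed to ogr2ogr or ogrinfo in place of the bare path. Sampling only the first few lines is a trade-off: it keeps the check cheap but will not catch a file that changes shape later on.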

Validation Commands Run

  • uv run ogr2ogr -f GeoJSONSeq ... -dialect sqlite -sql 'SELECT MakePoint(...)' (xlsx -> WGS84 points)
  • PYTHONPATH=. uv run python enrich_acled.py (feature_id/geometry_hash/properties_hash + sidecar + schema via ingestion.common.feature_metadata)
  • uv run python scripts/vector_asset.py build ... --metadata-lookup --pmtiles-detail-hint detailed (FGB + PMTiles, auto maxzoom -> 12)
  • pmtiles verify / pmtiles show / magic-byte check / zoom-0 decode via validate_release_vector_contract
  • uv run python scripts/catalog_docs.py generate && check (25 asset docs current) and export-readmes
  • uv run python scripts/catalog_site.py --out .../catalog-web (catalog.json includes the new asset with metadata_sidecar_schema colorizer)
  • Tooling: GDAL 3.6.2 (/Users/jonathanraphael/miniforge3/bin/ogr2ogr), Tippecanoe v2.79.0 (/usr/local/bin/tippecanoe), pmtiles CLI dev (/usr/local/bin/pmtiles)

New private release-oriented vector asset: ACLED Sub-Saharan Africa
event-level conflict data (47,937 points, events 2025-06-30 through
2026-06-30). feature_id is copied from the verified-unique source field
event_id_cnty. Canonical FGB, metadata-lookup PMTiles, metadata sidecar,
schema, and manifest are staged under _scratch/pending-publishes for
promotion via the reviewed publish plan in the PR. This intake also
serves as the end-to-end test of the publish pipeline on the
repo-hardening-ci-publish-schema branch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jonaraphael jonaraphael temporarily deployed to shared-datasets-production July 3, 2026 12:10 — with GitHub Actions Inactive