5983: DCAT-US v1.1 to v3.0 Translation Script #149
Merged
jbrown-xentity (Collaborator) approved these changes on Jun 2, 2026 and left a comment:
LGTM as a first pass. I tested against the GSA JSON and https://opendata.hawaii.gov/data.json; both gave the expected success and error output.
Having all the translation functions is helpful, thank you!
Comment on lines +204 to +219

def transform_replaces(dataset: dict) -> dict:
    """Normalize the 'replaces' field to conform to the DCAT-US 3.0 schema."""
    if "replaces" not in dataset:
        return dataset

    value = dataset["replaces"]
    if isinstance(value, list):
        return dataset

    new_dataset = copy.deepcopy(dataset)
    del new_dataset["replaces"]

    if isinstance(value, str) and _is_iri(value):
        new_dataset["relation"] = [value]

    return new_dataset

I don't recognize the field `replaces` in the schema: https://resources.data.gov/resources/dcat-us/. Is this necessary?
I'm wondering if this is an "open schema" problem, where people were adding fields in 1.1 that are defined and reserved in 3.0?
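To make the question concrete, here is a small sketch of how the function in the diff treats the three input shapes it distinguishes. The `transform_replaces` body is copied from the diff; `_is_iri` is not shown there, so the version below is a hypothetical stand-in (a scheme check via `urllib.parse`), and the sample datasets are invented for illustration.

```python
import copy
from urllib.parse import urlparse


def _is_iri(value: str) -> bool:
    # Hypothetical stand-in: the PR's real _is_iri is not shown in the diff.
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


def transform_replaces(dataset: dict) -> dict:
    """Copied from the diff under review (lines +204 to +219)."""
    if "replaces" not in dataset:
        return dataset

    value = dataset["replaces"]
    if isinstance(value, list):
        return dataset

    new_dataset = copy.deepcopy(dataset)
    del new_dataset["replaces"]

    if isinstance(value, str) and _is_iri(value):
        new_dataset["relation"] = [value]

    return new_dataset


# Invented sample inputs, one per branch.
list_case = {"title": "A", "replaces": ["https://example.gov/old"]}
iri_case = {"title": "B", "replaces": "https://example.gov/old"}
text_case = {"title": "C", "replaces": "the 2019 release"}

print(transform_replaces(list_case))  # list value: returned untouched
print(transform_replaces(iri_case))   # IRI string: moved to 'relation'
print(transform_replaces(text_case))  # other string: 'replaces' dropped
```

Two things stand out when reading the code this way: a non-IRI string is removed without any warning, and an existing `relation` value on the dataset would be overwritten rather than extended. Whether either matters depends on how often `replaces` actually appears in real v1.1 catalogs.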
Comment on lines +420 to +430

def _parse_bbox(value: str) -> tuple[float, float, float, float] | None:
    """Return (minLon, minLat, maxLon, maxLat) if `value` is a comma-
    separated bbox string, otherwise None."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 4:
        return None
    try:
        nums = tuple(float(p) for p in parts)
    except ValueError:
        return None
    return nums  # type: ignore[return-value]

Since this works, we don't have to replace it; however, a more complete version is here: https://github.com/GSA/datagov-harvester/blob/main/harvester/utils/general_utils.py#L885-L955
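For comparison, one direction a stricter parser could take is to validate the numbers as coordinates, not just as floats. The sketch below is an assumption about what "more complete" might cover; it is not the linked harvester code, and the name `parse_bbox_strict` is hypothetical.

```python
import math
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]


def parse_bbox_strict(value: str) -> Optional[BBox]:
    """Hypothetical stricter variant of _parse_bbox.

    Returns (minLon, minLat, maxLon, maxLat), or None if the string is
    not four finite, in-range coordinates.
    """
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 4:
        return None
    try:
        min_lon, min_lat, max_lon, max_lat = (float(p) for p in parts)
    except ValueError:
        return None

    nums = (min_lon, min_lat, max_lon, max_lat)
    # float() accepts "nan" and "inf", so reject those explicitly.
    if not all(math.isfinite(n) for n in nums):
        return None
    if not (-180.0 <= min_lon <= 180.0 and -180.0 <= max_lon <= 180.0):
        return None
    if not (-90.0 <= min_lat <= 90.0 and -90.0 <= max_lat <= 90.0):
        return None
    # Latitude must be ordered. Longitude order is left unchecked because
    # a box crossing the antimeridian can have minLon > maxLon.
    if min_lat > max_lat:
        return None
    return nums


print(parse_bbox_strict("-160.5, 18.9, -154.8, 22.2"))
print(parse_bbox_strict("nan, 0, 1, 1"))
```

The one behavioral difference worth noting against the version in the diff is that `float("nan")` and `float("inf")` parse successfully, so the original accepts strings such as `"nan, 0, 1, 1"` as a bbox.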
GSA/data.gov#5983
Adds a utility script that converts a valid DCAT-US v1.1 catalog to a valid DCAT-US v3.0 catalog. Catalogs that fail v1.1 validation cannot be converted. The script has been tested against the following valid v1.1 catalogs: