Skip to content

add(eco): new model and transformation rules#521

Open
zubeydecivelek wants to merge 1 commit into
CERNDocumentServer:masterfrom
zubeydecivelek:eco
Open

add(eco): new model and transformation rules#521
zubeydecivelek wants to merge 1 commit into
CERNDocumentServer:masterfrom
zubeydecivelek:eco

Conversation

@zubeydecivelek
Copy link
Copy Markdown
Contributor

@zubeydecivelek zubeydecivelek commented May 5, 2026

DUMPS

# Posters
inveniomigrator dump records -q '980__a:POSTER -595__a:Press -980:DELETED -980:HIDDEN -980__c:MIGRATED -980__a:DUMMY' --file-prefix eco-posters --chunk-size=1000
# ECO Notes
inveniomigrator dump records -q '980:NOTE 710__5:IR -595__a:Press -980:DELETED -980:HIDDEN -980__c:MIGRATED -980__a:DUMMY' --file-prefix eco-notes --chunk-size=1000

Experiment and press brochures excluded since they'll be migrated with experiment collections

# Experiment Brochures
inveniomigrator dump records -q '980__a:BROCHURE 690C_a:CERNEXPERIMENTBROCHURE -595__a:Press -980:DELETED -980:HIDDEN -980__c:MIGRATED -980__a:DUMMY' --file-prefix eco-experiment-brochures --chunk-size=1000

inveniomigrator dump records -q '980__a:CMSOUTREACH 6531_a:Brochure -595__a:Press -980:DELETED -980:HIDDEN -980__c:MIGRATED -980__a:DUMMY' --file-prefix cms-experiment-brochures --chunk-size=1000
# Official Press Brochures
inveniomigrator dump records -q '980__a:BROCHURE 690C_a:CERNOFFICIALPRESSBROCHURE -595__a:Press -980:DELETED -980:HIDDEN -980__c:MIGRATED -980__a:DUMMY' --file-prefix eco-official-press-brochures --chunk-size=1000

@zubeydecivelek zubeydecivelek linked an issue May 5, 2026 that may be closed by this pull request
1 task
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/base.py Outdated
Comment on lines +101 to +105
# TODO: handle photo identifier
if scheme == "phopho":
id_value = StringValue(value.get("a", "")).parse()
new_id = {"scheme": "photo", "identifier": id_value}
raise IgnoreKey("eco_identifiers")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should we handle photo identifiers?

example record: https://cds.cern.ch/record/43679

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be linked in related identifiers with Photo resource type

Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py
# n = script catalogued or via submission
if source not in ["n", "h", "m", "r"]:
raise UnexpectedValue(subfield="s", field=key, value=value)
if source not in ["n", "h", "m", "r", "d"]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does d mean?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

311 record has d in the source field. I checked but couldnt find the meaning of d. Maybe digitized? I'll add a question to curation sheet.
Some example recids: 43247, 43430, 824753, 1221556

},
"paper": {
"relation_type": {"id": "references"},
# TODO: https://cds.cern.ch/record/2948638/export/xm
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo left?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i'll remove it. Just to give an example

Comment on lines +42 to +49
model.over("additional_titles", "(^246_[1_])", override=True)(
additional_titles_bulletin
)

model.over("additional_descriptions", "(^500__)")(additional_descriptions)
model.over("additional_descriptions", "(^590__)")(translated_description)
model.over("internal_notes", "^562__")(internal_notes)
model.over("contributors", "^901__")(organisation)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't we using the noes from base here? Why do they need to be imported here? It shoulnd't be necessary

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they're not in the base model. And some records missing 245 but they have 246, so we can import from bulletin to use 246 as title if 245 missing. Or if you prefer I can add missing title records to curation list.

Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
return contributor[0]


@model.over("eco_report_number", "(^037__)|(^088__)", override=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question, why do you reimplement it ?

Copy link
Copy Markdown
Contributor Author

@zubeydecivelek zubeydecivelek May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to handle emails in 088, since 037 and 088 implemented in the same rule in base, only overriding 088 is not working

scheme = original_scheme.lower()

# TODO: handle photo identifier
if scheme == "phopho":
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I am a bit worried about this... why do we get photo identifiers there? is this a relation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
"PRIVATLAS",
"PUBLATLASSLIDE",
"POSTER",
"PREPRINT",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have example preprints? it is quite unlikely we have research content in this data set

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's only one record with preprint:
https://cds.cern.ch/record/2675049/export/xm

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be checked by the curators if it is really preprint. If not, the tag should be removed both t=from the record and from the code here

Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py
Comment thread cds_migrator_kit/rdm/records/transform/xml_processing/rules/eco.py Outdated
value = dict(value)
affiliation = value.get("u", "").strip()
# Some records have "-" as affiliation: 1614471, 1953712
if affiliation and affiliation == "-":
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shold remove these values from MARC instead of reimplementing the function

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt it faster/easier to ignore these values during migration instead of fixing the MARC?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is that we can't handle all of the corner cases because it makes the code less readable and possibility of a mistake higher, each time we are adding a conditional statement

Copy link
Copy Markdown
Contributor Author

@zubeydecivelek zubeydecivelek May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'lll add these records to curation sheet but if it's +100 records isn't it time waste? editing all the records?

Copy link
Copy Markdown
Contributor

@kpsherva kpsherva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some places for improvement. Overall, I think you should try avoid reimplementing existing code whenever possible

@zubeydecivelek zubeydecivelek force-pushed the eco branch 5 times, most recently from bd0bdc3 to 6ffca0f Compare May 15, 2026 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Q2 simple collections to analyse

2 participants