add(eco): new model and transformation rules #521
| # TODO: handle photo identifier | ||
| if scheme == "phopho": | ||
| id_value = StringValue(value.get("a", "")).parse() | ||
| new_id = {"scheme": "photo", "identifier": id_value} | ||
| raise IgnoreKey("eco_identifiers") |
How should we handle photo identifiers?
example record: https://cds.cern.ch/record/43679
It should be linked in related identifiers with the Photo resource type.
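A minimal sketch of that suggestion, assuming the target is an InvenioRDM-style `related_identifiers` entry. The helper name `photo_related_identifier`, the `references` relation type, and the `image-photo` resource type id are assumptions for illustration, not something this PR confirms; the sample identifier is made up.

```python
def photo_related_identifier(value):
    """Build a related-identifier entry from a photo identifier field.

    Hypothetical helper: `value` is assumed to be a dict of MARC
    subfields, with the identifier in subfield "a" as in the diff.
    """
    identifier = str(value.get("a", "")).strip()
    if not identifier:
        # Nothing to link, so the caller can skip this field.
        return None
    return {
        "identifier": identifier,
        "scheme": "photo",  # scheme name taken from the diff
        "relation_type": {"id": "references"},  # assumption
        "resource_type": {"id": "image-photo"},  # assumption
    }


# Made-up identifier, for illustration only.
entry = photo_related_identifier({"a": "EXAMPLE-PHOTO-ID"})
```

Returning `None` for an empty subfield keeps the decision about skipping or raising in the calling rule.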
| # n = script catalogued or via submission | ||
| if source not in ["n", "h", "m", "r"]: | ||
| raise UnexpectedValue(subfield="s", field=key, value=value) | ||
| if source not in ["n", "h", "m", "r", "d"]: |
311 records have `d` in the source field. I checked but couldn't find the meaning of `d`. Maybe digitized? I'll add a question to the curation sheet.
Some example recids: 43247, 43430, 824753, 1221556
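One way to keep the open question visible in the code is to hold the accepted codes in a single documented set. This is only a sketch: `UnexpectedValue` below is a stand-in for the exception used in the diff, and `validate_source` and `ALLOWED_SOURCES` are hypothetical names.

```python
# Accepted source codes. The meaning of "d" is still unknown and is
# an open question for the curators ("digitized" is only a guess).
ALLOWED_SOURCES = {"n", "h", "m", "r", "d"}


class UnexpectedValue(Exception):
    """Stand-in for the migration error raised in the diff."""

    def __init__(self, subfield, field, value):
        super().__init__(f"unexpected value in {field} ${subfield}: {value!r}")
        self.subfield = subfield
        self.field = field
        self.value = value


def validate_source(source, key, value):
    """Return the source code if it is accepted, otherwise raise."""
    if source not in ALLOWED_SOURCES:
        raise UnexpectedValue(subfield="s", field=key, value=value)
    return source
```

If the curators later confirm what `d` means, only the comment next to the set needs to change.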
| }, | ||
| "paper": { | ||
| "relation_type": {"id": "references"}, | ||
| # TODO: https://cds.cern.ch/record/2948638/export/xm |
Yes, I'll remove it. It was just there to give an example.
| model.over("additional_titles", "(^246_[1_])", override=True)( | ||
| additional_titles_bulletin | ||
| ) | ||
| model.over("additional_descriptions", "(^500__)")(additional_descriptions) | ||
| model.over("additional_descriptions", "(^590__)")(translated_description) | ||
| model.over("internal_notes", "^562__")(internal_notes) | ||
| model.over("contributors", "^901__")(organisation) |
Why aren't we using the ones from base here? Why do they need to be imported here? It shouldn't be necessary.
They're not in the base model. Also, some records are missing 245 but have 246, so we can import the rule from bulletin and use 246 as the title when 245 is missing. Or, if you prefer, I can add the records with a missing title to the curation list.
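The 245-to-246 fallback described here could look roughly like the sketch below. It assumes a simplified record shape (a dict keyed by MARC tag, with 245 as a single subfield dict and 246 as a list of them); `pick_title` is a hypothetical helper, not the rule in this PR.

```python
def pick_title(record):
    """Return (title, source_tag), preferring 245 and falling back to 246.

    Hypothetical helper over a simplified record dict; the real rules
    work on the migrator's own field representation.
    """
    main = record.get("245__", {}).get("a", "").strip()
    if main:
        return main, "245"
    # No usable 245: take the first non-empty 246 title instead.
    for alternative in record.get("246__", []):
        candidate = alternative.get("a", "").strip()
        if candidate:
            return candidate, "246"
    # Neither field has a title: a candidate for the curation list.
    return None, None
```

Returning the source tag alongside the title makes it easy to report which records relied on the fallback.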
| return contributor[0] | ||
| @model.over("eco_report_number", "(^037__)|(^088__)", override=True) |
Same question: why do you reimplement it?
To handle emails in 088. Since 037 and 088 are implemented in the same rule in base, overriding only 088 does not work.
| scheme = original_scheme.lower() | ||
| # TODO: handle photo identifier | ||
| if scheme == "phopho": |
hmm I am a bit worried about this... why do we get photo identifiers there? is this a relation?
| "PRIVATLAS", | ||
| "PUBLATLASSLIDE", | ||
| "POSTER", | ||
| "PREPRINT", |
do you have example preprints? it is quite unlikely we have research content in this data set
there's only one record with preprint:
https://cds.cern.ch/record/2675049/export/xm
The curators should check whether it really is a preprint. If not, the tag should be removed both from the record and from the code here.
| value = dict(value) | ||
| affiliation = value.get("u", "").strip() | ||
| # Some records have "-" as affiliation: 1614471, 1953712 | ||
| if affiliation and affiliation == "-": |
We should remove these values from MARC instead of reimplementing the function.
Isn't it faster and easier to ignore these values during migration instead of fixing the MARC?
The problem is that we can't handle every corner case in code: each conditional statement we add makes the code less readable and a mistake more likely.
I'll add these records to the curation sheet, but if it's 100+ records, isn't editing all of them a waste of time?
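To settle whether this is a handful of records or 100+, the affected recids could be counted before deciding. The sketch below assumes the records are available as `(recid, authors)` pairs, where `authors` is a list of subfield dicts with the affiliation in `u`; that shape and the helper name are assumptions, and the sample data is illustrative apart from the recid mentioned in the diff comment.

```python
# Affiliation values treated as placeholders; "-" is the one from the diff.
PLACEHOLDER_AFFILIATIONS = {"-"}


def records_with_placeholder_affiliation(records):
    """Return the recids that have at least one placeholder affiliation."""
    flagged = []
    for recid, authors in records:
        if any(
            author.get("u", "").strip() in PLACEHOLDER_AFFILIATIONS
            for author in authors
        ):
            flagged.append(recid)
    return flagged


# Illustrative input: author names and the second recid are made up.
sample = [
    (1614471, [{"a": "Author One", "u": "-"}]),
    (1, [{"a": "Author Two", "u": "CERN"}]),
]
flagged = records_with_placeholder_affiliation(sample)
```

The resulting list can go straight into the curation sheet, and its length shows how much manual editing the MARC fix would take.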
kpsherva
left a comment
There is some room for improvement. Overall, I think you should try to avoid reimplementing existing code whenever possible.
bd0bdc3 to 6ffca0f
DUMPS
Experiment and press brochures are excluded since they'll be migrated with the experiment collections.