Merge option create#86
What it does: The method create_archive is usually called in the parsers, so this PR is relevant when a file with the same name is uploaded again and when uploads are reprocessed. With the already existing overwrite option, the old archive is replaced by the new one; merge works a little differently. Even if the term merge may seem a little misleading, it corresponds to the logic that we often use in the normalizers: if quantity not None, then set the quantity as follows. This means that overwriting previous values is not possible, only setting new quantities. At the moment I can't think of a use case where you want to add all the new things without overwriting but do not have the normalizer.
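To make the "only set new quantities" behavior described above concrete, here is a minimal sketch on plain dictionaries. It is not the code from this PR: the function name `merge_add_only` and the example quantities are hypothetical, and it assumes the archive can be treated as nested dicts (as after loading an `.archive.json` file).

```python
from copy import deepcopy
from typing import Any


def merge_add_only(existing: dict[str, Any], parsed: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `existing` with values from `parsed` added only where missing.

    Hypothetical helper for illustration; not part of NOMAD or of this PR.
    """
    merged = deepcopy(existing)
    for key, new_value in parsed.items():
        old_value = merged.get(key)
        if old_value is None:
            # Quantity is missing or None: take the newly parsed value.
            merged[key] = deepcopy(new_value)
        elif isinstance(old_value, dict) and isinstance(new_value, dict):
            # Subsection present on both sides: recurse so nested gaps get filled.
            merged[key] = merge_add_only(old_value, new_value)
        # Otherwise the existing value is kept untouched.
    return merged


# Made-up example data: one manually edited value and one newly parsed column.
existing = {"data": {"name": "sample_1", "thickness": 2.0, "comment": "edited by hand"}}
parsed = {"data": {"name": "sample_1", "thickness": 3.5, "annealing_time": 600}}
merged = merge_add_only(existing, parsed)
```

In this sketch `merged["data"]["thickness"]` stays at the existing value, while `annealing_time` is added because it was not set before.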
- if not archive.m_context.raw_path_exists(file_name) or overwrite:
+ if not archive.m_context.raw_path_exists(file_name) or overwrite or merge:
+     if merge and file_name.endswith('.json'):
I guess this could be even more specific with .archive.json instead of only .json? Just in case one day some group wants to upload JSON data.
Yes, that is true. I also thought about checking whether the file exists, so: or (merge and archive.m_context.raw_path_exists(file_name)) and file_name.endswith('archive.json'), even.
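The combined condition discussed in this thread can be spelled out as a small decision function, which avoids relying on `and`/`or` precedence. This is only a sketch: `archive_action` and its return strings are hypothetical names, the boolean `file_exists` stands in for `archive.m_context.raw_path_exists(file_name)`, and letting overwrite take priority over merge is an assumption.

```python
def archive_action(
    file_exists: bool,
    file_name: str,
    overwrite: bool = False,
    merge: bool = False,
) -> str:
    """Decide what to do with an archive file (hypothetical helper)."""
    if not file_exists:
        # Nothing to merge into or overwrite: always write a fresh archive.
        return "create"
    if overwrite:
        # Assumption: overwrite wins if both flags are set.
        return "overwrite"
    if merge and file_name.endswith(".archive.json"):
        # Merge only into existing archive files, not arbitrary JSON uploads.
        return "merge"
    return "skip"
```

With this shape, `merge=True` on a file that does not exist yet simply falls back to creating it, and a plain `data.json` upload is never merged.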
+ import json

- if not archive.m_context.raw_path_exists(file_name) or overwrite:
+ if not archive.m_context.raw_path_exists(file_name) or overwrite or merge:
merge only seems to work if archive.m_context.raw_path_exists(file_name) is True. Maybe we want to check for that,
because in the case of merge==True and (not archive.m_context.raw_path_exists(file_name))==True I get an error.
Yes, I also wrote something about this above.
|
Ok, it seems that everything parsed from a file gets replaced when uploading a new file and reprocessing. Other quantities that can be set manually are not replaced, and quantities do not get removed. The behavior is also slightly different depending on whether the save button is clicked in between: if it is, a different "new_entry" is used compared to just reprocessing. That's not intuitive for me.
|
So the background for this is the following. Right now we usually do the parsing in the normalizer of an entry. For the huge Excel table in the solar cell field, we actually create all the process entries during parsing, so the create_archive function does not just get an entry with sample_id and file name, but one with all properties filled out. The Excel table creates 30-40 archives, and none of the processes have a data file reference. Now imagine that someone adds a column which is not picked up; then I need a way to have it added on reprocessing. Actually, writing this, I realize that the whole parsing here is completely different from the case where there is a one-to-one correspondence between entry and measurement file. The actual issue was that there was a typo in a column header of the Excel file and I fixed the parsing, but reprocessing did not update the archive, since it already existed, and overwriting might destroy manually changed data. I am really not sure how to deal with this.
|
Ok, I see. I did not test this "one file to many archives" option, but I understand the idea now. Actually this is quite a typical situation when we work with the Excel files. I think Christina also replaced a lot of the already uploaded data by hand in order to reprocess with the new parsing. But I am also not sure how to deal with that. Overwriting is really a tricky thing to do automatically, since NOMAD does not provide a real history of changes, and things could get lost in a really uncontrollable way.
|
Yes, this is why I never overwrite, only add.
|
When uploading a file with an already existing name, this code somehow overwrites all quantities that differ from the existing archive, without even clicking the reprocess button.
Hey Carla, can you review this PR and understand what it does and maybe also test it?
I am not yet 100% sure if this is the way to go, let me know what you think!
Best
Micha