fix: move runNumber in legacy scientificMetada to top#2609
fix: move runNumber in legacy scientificMetada to top#2609
Conversation
da48c89 to
e9a134d
Compare
e9a134d to
66c6208
Compare
| filter: { _id: dataset._id }, | ||
| update: { | ||
| $set: { "scientificMetadata.runNumber": dataset.runNumber }, | ||
| $unset: {runNumber: ""}, |
There was a problem hiding this comment.
Does this script addresses the following runNumber format?
"runNumber" : {
"value" : NumberInt(482),
"unit" : ""
},
There was a problem hiding this comment.
is anyone doing that? runNumber is simply an integer in the top level schema so I guess noone did that? I think we would have a hard time trying to imagine how everyone could have used runNumber in scientificMetadata, but if you reckon this shape is something they used, I can add to the migration
There was a problem hiding this comment.
also, what should the migration do if there are unit? Simply concat?
There was a problem hiding this comment.
I just checked our DB and some scientificMetadata contains the format above.
This change does not affect our side, as we are going to use high level run number anyways.
But I'm not sure would it affect others or not
|
I am not oppose to this migration script but it can only be optional. Also it is a facility decision if they want to migrate legacy data or not and how to convert it from quantity (if they have it as quantity) to a numeric field. |
the migration will run only if there are datasets with scientificMetadata.runNumber. I am not sure I understand the comment fully, doesn't this apply to all migrations then? This is a change in behaviour between the old backend and the new one, so a migration is needed. For example the FE runNumber in the datasets overview used to rely on scientificMetadata.runNumber and now fetches the top level run number. People will always be able to use scientificMetadata.runNumber in the v3 endpoints and this will move it to the topLevel runNumber. One could argue why this field was moved to a top level field without checking the old behaviour, but that's another question... |
The migration should run only if the facility decided to do so. |
nitrosx
left a comment
There was a problem hiding this comment.
This change, although technical and syntactically correct, needs more community discussion
but why not for all migrations then? Also, noone forces anyone to run migrations, they are not run on startup. We do the same exact thing here Say I am a user and I have a look at the POST datasets DTO. I will see a runNumber field. So I will add my runNumber there. Then all historical metadata used to use scientificMetadata.runNumber. If no migration I will have inconsistent behaviour If you think we should not standardise how to use runNumber as a top level element, I think this will create a lot of confusion for the use, but I can manage that from my side |
|
just to summarise the change, given the latest commit:
since the migration is a source of discussion, I am happy to manage it from my side and not have it in this repo |
dc1f63b to
f233b25
Compare
|
any further reason this is blocked @nitrosx ? |
|
@nitrosx , @Junjiequan any news here? |
|
If this PR is urgent we can dismiss review from @nitrosx , as he will mostly not available untill the week after next week. |
Requested changes has been addressed. I don't see any other blockers. If later we find any blockers we can revert the PR by then
If you are happy with the changes, I would indeed proceed with the merge |
|
I just thought about one potential issues here. if we allow overwritting runNumber from scientificMetadata.runNumber if not exists, what about old datasets? wouldn't it causing incosistent datasets? |
That's exactly why we will run a migration at PSI. I think it's indeed not a trivial matter, because:
I am happy with other solutions but this bug is causing complains from our users. Also, I honestly think asking ourselves "what if the user meant something by leaving runNumber empty and used scientificMetadata.runNumber" is a sensible question, but in reality a bit too theoretic, I believe anyone seeing runNumber would interpret it in the same way as scientificMetadata.runNumber and simply leave one of the 2 out (likely the nested one as it's not in the DTO) Last point, when we moved scientificMetadata.runNumber to runNumber not only we caused this possible inconsistence, but we also introduced a breaking change, as any client would not know anymore what field to consume. |
|
After this change I'm afraid It still remains the root issue that what if user’s intent for scientificMeata.runNumber is not meant to be used for dataset.runNumber. |
|
sorry, I updated the comment here quite a bit |
|
and also, we've done these defaultings a lot of time it's done in publishedData DTOs and job DTOs. But maybe to be pragmatic, what if I make another PR where I expose conditionally runNumber as a top level field? In this way, PSI could opt out from having it as a top level field and keep the old behaviour or I can make the change in this PR configurable via ENV? |
|
no, the main problem will be this
|
|
In this PR I think the output part looks fine to me, my only concern was the create/update dto part which tries to predict/ twist user’s intent that it assumes dataset.scientificMeatadata.runNumber = dataset.runNumber |
|
If I understand the whole situation corretctly, the issue is that legacyusers or services that dont want to use or see the top level runNumber , as it can be confusing which one to use. if that's the case, please go ahead and add env variables within this PR and expose the field conditionally. And we can treat top level runNumber as a new feature in v4 |
I think the change cannot happen only in the get dto though, we have dashboarding tools that query the db directly, I'd like to avoid querying for both fields in each query. But that's valuable feedback, I'll propose something different next week |

Description
Since runNumber was moved from scientificMetadata to a top level dataset property #1485 , this PR adds a migration for legacy data and a fallback when posting or patching v3 datasets which moves scientificMetadata.runNumber to runNumber. This 2nd change is to facilitate customers' adoption as they would need to change their ingestions.
Closes #2592
Tests included
Documentation
official documentation info