Backfill missing Wall Street Journal aggregate data#471
Conversation
- Created a command to fix a given orgs monthly aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
- Created a command to fix a given orgs monthly aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
jsnshrmn
left a comment
There was a problem hiding this comment.
Approved, pending the linkevents count == 0 check that we discussed to guard against trying to load linkevents that already exist. Great work! Thanks!
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
suecarmol
left a comment
There was a problem hiding this comment.
Thank you for all of your work on this! I have a few nits and changes that need to be made before we can merge this.
extlinks/aggregates/management/commands/reaggregate_link_archives.py
Outdated
Show resolved
Hide resolved
extlinks/aggregates/management/commands/reaggregate_link_archives.py
Outdated
Show resolved
Hide resolved
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
There was a problem hiding this comment.
We need to change the _get_existing_link_aggregates function to accept a string as a parameter so we can check if user aggregates and page project aggregates exist in the object store. Right now, it's only checking for link aggregates. Other than that, it looks good, although I haven't tested the newest iteration of the code because of problems with restoring a backup in my local environment.
jsnshrmn
left a comment
There was a problem hiding this comment.
approved pending addressing @suecarmol's feedback
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
This should be resolved now. I modified the prefix search param to be more generic to include all aggregate types. |
suecarmol
left a comment
There was a problem hiding this comment.
Thank you for this extensive work!


Bug: T404879
Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
Description
Rationale
Monthly:
python manage.py reaggregate_link_archives --month 202503 --organisation {ORG_ID} --dir "./backup/"Daily:
python manage.py reaggregate_link_archives --day 20250301 --organisation {ORG_ID} --dir "./backup/"Phabricator Ticket
(https://phabricator.wikimedia.org/T404879)
How Has This Been Tested?
Since this is modifying production data, we should make sure to be extra thorough in testing and reviewing it locally.
Screenshots of your changes (if appropriate):
Types of changes
What types of changes does your code introduce? Add an
xin all the boxes that apply: