Cross Reference Feature and basic mappings#876
Cross Reference Feature and basic mappings#876augustjohnson wants to merge 37 commits intostagingfrom
Conversation
…tion and indexing. Update export command logic to simplify model checks.
…ferences for various game elements, enhancing data structure and accessibility.
…sts to verify crossreference inclusion/exclusion for various models.
…alizers - Introduced `get_source_url` utility to generate API URLs for source objects. - Enhanced `CrossReference` model with `reference_api_url` and `source_api_url` methods. - Updated `HasDescription` abstract model to include `get_crossreferences_to_from` method for structured crossreference data. - Modified `GameContentSerializer` to utilize the new crossreference structure. - Updated tests to verify crossreference inclusion and exclusion across various models.
- Introduced `_get_basename_for_object` and `_build_object_url` utility functions to streamline URL generation for reference and source objects. - Updated `get_reference_url` and `get_source_url` to utilize the new utilities, improving code clarity and maintainability. - Enhanced handling for `Item` model to differentiate between magic items and regular items in URL generation.
- Added entries for `references_report.json` and `sources_report.json` to the .gitignore file to prevent tracking of temporary report files.
- Updated URL generation logic in `url_utils.py` to resolve nested models correctly, particularly for models with "Description" in their name. - Introduced `_resolve_to_parent_if_description` function to ensure URLs point to parent objects when applicable. - Modified `REFERENCE_MODEL_TO_BASENAME` mappings for improved consistency in URL generation. - Enhanced cross-reference identification logic in `core.py` to prevent duplicate references from the same source. - Updated `REFERENCE_MODEL_NAMES` to exclude models with "Description" in their name, streamlining reference handling.
…key adjustments - Added multiple new cross-reference entries related to "Proficiency Bonus" and "Resistance" for various game elements. - Updated primary keys for existing entries to ensure uniqueness and consistency in the data structure. - Enhanced the overall cross-reference model to improve data accessibility and integrity.
- Updated primary keys in CrossReference.json to ensure uniqueness for new entries. - Added context suggestions for "acid" damage type and "chain" related to "Chain Lightning" spell in core.py to improve cross-reference functionality. - Enhanced data structure and context handling for better accessibility and accuracy in game mechanics.
…list-item serializers
… for root and list-item serializers
…the correct list.
…t project root, ensuring correct loading regardless of current working directory. Also, add a new entry to the crossref_source_blacklist.txt.
…pdate export functionality to support natural foreign keys.
… and enhance reference matching.
|
@calumbell monster PR incoming! |
There was a problem hiding this comment.
Hey @augustjohnson, nice work getting this PR together. Looking forward to this hitting production!
As previously mentioned, this is a big one. Will likely take me some time to get everything reviewed. But off the bat I spotted that we are experiencing what I suspect are N+1 problems on Views where crossreferences are returned.
I A+B tested a few endpoints after noticing the API on the feature branch was a bit sluggish. I installed the Django Debug Toolbar to help diagnose the issues:
Here is /v2/magicitems on the staging branch. Note the CPU time the request took (512.81ms) and the number of SQL queries made (17)
Compare this with /v2/magicitems on feature branch. Note that the CPU time is more than 2.5x longer (1348.26ms) and the number of SQL queries made has ballooned to 131.
Seems like classic sympthoms of an N+1 problem with the new fields.
I observed similar results on the /spells endpoint, and not on the /creatures endpoint, which doesn't look like it returns the crossreferences field?
Would you be able to investigate? I had previously resolved similar issues by making sure serializers were correctly configured and then leveraging the EagerLoadingMixin on the problematic Views
|
@augustjohnson - auxilary note: do you think I would be a good idea to add Django Debug Toolbar as a development-only dependency to this project? I use it on my local fork and have gotten a lot of value from it. Let me know what you think. If you like the idea i'll open an issue : ) |
|
On it, I'll see what I can find here. I'm not familiar with debug toolbar so I'll have to learn that. |
… now, no longer from.
…es and update related viewsets to reflect changes in crossreference handling.
|
Ok, I've got it to 32 queries, but I'm not yet convinced that's where it needs to be. I'll keep tinkering with it. There's some funky queries in there. Also, I like djang-debug-toolbar, I'd generally agree with adding it. I'm having trouble with dependencies right this minute though, so that may be a different PR. Spells should be ok, and yes, it should return crossreferences. |
|
@augustjohnson sounds like progress! Ping me when you are ready for another review. Yeah, this isn't the PR to add the django-debug-toolbar, i'll open a new issue. Forgot that |
|
Ok, I've got good news and bad news.
Bad news.
|
But TBH as long as you are happy with the schema, we can sweat the specifics of the matching algorithm in a follow up PR. |
…rgs for URL resolution in item serializers.
|
Ok, I believe this is ready again for review @calumbell |
calumbell
left a comment
There was a problem hiding this comment.
Hey @augustjohnson, thanks for making those changes. Looks like the N+1 errors are all fixed. The form of the API response is great, exactly what is needed on the FE.
Only remaining issue is that the matching algorithm is still quite off. The generation of false crossreferences seems very high. Perhaps we could narrow this process further:
- An idea might be to limit the automatic generation of crossreferences to between certain endpoints to solve common use-cases:
a. CreatureAction -> Spell (this is to handle spells imbedded in statblocks)
b. Spell -> Creature (to handle spells that summon/conjure specific creatures)
c. MagicItem -> Spell (to handle magic items that can be used to cast spells)
d. Spell -> Spell (some Spells mention other spells, ie.srd-2024_contingencymentions Water Breathing) - Is it possible to only match a Spell when its name is written in Title Case (as it usually is when it is mentioned elsewhere).
- If the above is not possible, could we add Command to the blacklist?
Honestly, i'd be happy if we just striped the crossreference generator out for now, added a few example crossreferences in to help support getting this all hooked up on the frontend, and handling the crossreference generator in its own PR.
|
Understood and agreed, I will do a data-lite PR. Just as a note, the method I took on matching quality was roughly this:
As I was going through that validation loop, I encountered a ton of references which needed to be blacklisted (best just to never reference this), and a bunch of special cases, which I wrote some code for ("Light" for example). Those were the two tools I used to try and resolve. I'd be interested in your specific examples of bad matches, but I'm OK with the data population being pushed downstream. |
|
Cool, here are some specific examples of weird matches I spotted Most common false match:
Less common false matches include
Technically correct matches, but over-keen, occasionally to the point of absurdity
|
This PR adds a new field into any object with a desc field, where it provides a set of cross-references to other objects.
The overall (empty) data structure looks like this, and appears on (almost) anything with a desc.
Within the from and to objects, is approximately this:
The intent for the "to" dataset is for a UI layer to have data that, within the description of the object you are viewing, the word "Pantheons" reasonably means another object in the API, and that object's url.
The intent for the "from" dataset is to provide the UI with a list of "Things that link here".
How I Tested
The script I used to identify and generate crossreferences had a few "top 10" lists which I spot checked for general accuracy. There were various objects that resulted in false positives, like "Fly" the spell. These were added to special conditions (requiring context) or blacklists to prevent them from being linked at all.
There are quite a few items that have been blacklisted, including some standards. These can be modified or adjusted at a later time, but the list that is crossreferenced now is acceptable in my mind.
I also added a few internal tests, as well as ensured that each modified integration test showed the correct result.
I estimate that I checked probably 60 or 70 objects and their types, focused on the most crossreferenced across the two SRDs. They seemed good enough, but probably not perfect.
Follow Ups
I have added #880 to the backlog to provide the data for a few other documents. It will however require a few scripting and code changes.
Review Suggestions
The scripting part is ugly and intended to be pretty temporary and ad-hoc usage.