Release v1.4 — MCP server, Define-JSON, CRF specializations, and full study entity coverage#254
Open
pendingintent wants to merge 67 commits into
Extended the diff report to cover all USDM entity classes created in the SOA Workbench
Added an HTML export of the SOA Matrix so users can easily review it and deploy it to web servers.
…igned DSS to a BiomedicalConcept
Added the ability to delete an SOA, with double confirmation
Adds 0..N organizations per SOA with name, label, identifier, identifierScheme, type (DDF CT C215480), and optional legalAddress (text, lines, city, district, state, postalCode, country via ISO 3166).
- DB: organization + organization_audit tables with migrations
- Router: JSON CRUD + HTMX add/delete endpoints
- USDM: generate_organizations.py → Organization-Output + Address-Output
- UI: Organizations study-meta-card below Study Metadata on edit page
- Tests: 9 new tests (434 total, 0 failures)
- Resolved merge conflicts with pendingintent-add-titles changes
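The organization commit describes an Organization-Output with an optional nested Address-Output. A minimal sketch of that mapping, assuming each database row arrives as a plain dict keyed by the field names listed above; `build_organization`, `build_address`, and the `<org id>_Address` id scheme are hypothetical and not taken from `generate_organizations.py`:

```python
from typing import Any, Optional


def build_address(row: dict[str, Any], address_id: str) -> dict[str, Any]:
    """Map a hypothetical legal-address row onto a USDM-style Address dict."""
    return {
        "id": address_id,
        "text": row.get("text"),
        "lines": row.get("lines") or [],
        "city": row.get("city"),
        "district": row.get("district"),
        "state": row.get("state"),
        "postalCode": row.get("postalCode"),
        # Assumed to already hold an ISO 3166 country code object.
        "country": row.get("country"),
        "instanceType": "Address",
    }


def build_organization(
    org: dict[str, Any], address: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Map a hypothetical organization row; legalAddress stays None if absent."""
    out: dict[str, Any] = {
        "id": org["id"],
        "name": org["name"],
        "label": org.get("label"),
        "type": org["type"],  # code drawn from DDF CT C215480
        "identifierScheme": org["identifierScheme"],
        "identifier": org["identifier"],
        "legalAddress": None,
        "instanceType": "Organization",
    }
    if address is not None:
        # Hypothetical id scheme for the nested address.
        out["legalAddress"] = build_address(address, f"{org['id']}_Address")
    return out
```

Keeping the address builder separate mirrors the two output types the commit names and lets an organization without a legal address serialize cleanly.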
Replaces hardcoded "1.0.0" with the active branch from `git rev-parse --abbrev-ref HEAD` so each export is traceable to the branch that generated it. Falls back to "unknown" if git is unavailable.
Strips the "release-v-" prefix and ensures three dot-separated components (e.g. release-v-1.4 -> 1.4.0, release-v-1.4.1 -> 1.4.1). Non-release branches remain unchanged.
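The two version commits above can be sketched as follows, assuming the version is read once per export. The function names are hypothetical. The padding rule follows the examples in the commit message (release-v-1.4 -> 1.4.0), and any failure to run git falls back to "unknown":

```python
import subprocess


def current_branch() -> str:
    """Return the active git branch, or "unknown" if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        # OSError: git not installed; SubprocessError: not a repo, timeout, etc.
        return "unknown"
    return result.stdout.strip() or "unknown"


def branch_to_version(branch: str) -> str:
    """Turn release-v-X[.Y[.Z]] into X.Y.Z; other branch names pass through."""
    prefix = "release-v-"
    if not branch.startswith(prefix):
        return branch
    parts = branch[len(prefix):].split(".")
    # Pad with zeros so there are always three dot-separated components.
    while len(parts) < 3:
        parts.append("0")
    return ".".join(parts)
```

For example, `branch_to_version("release-v-1.4")` gives `"1.4.0"`, and `branch_to_version("main")` returns `"main"` unchanged.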
_populate_bcp_locked always deleted existing BCP/alias_code/code rows
before re-inserting, but never guarded against the case where the
replacement data was empty (API failure or timeout). If
_get_biomedical_concept_data returned {}, the delete committed with zero
inserts, permanently destroying BCP rows until the next successful
backfill.
Added _has_insertable_data() which validates that at least one BCP
would actually be written before the delete proceeds. If no data is
available the function logs a warning and returns early, preserving
the existing rows unchanged.
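The guard can be illustrated with a minimal sketch, assuming a single hypothetical `bcp(soa_id, bc_code, name)` table and a payload shaped as `{bc_code: {"properties": [...]}}`. The real `_populate_bcp_locked` also touches alias_code/code rows, and its data shape is not shown in this PR. The key ideas are to check for insertable rows before deleting anything, and to run the delete and the inserts in one transaction:

```python
import logging
import sqlite3
from typing import Any

logger = logging.getLogger(__name__)


def _has_insertable_data(bc_data: dict[str, Any]) -> bool:
    """True if at least one BC in the (hypothetical) payload has a property."""
    return any(bc.get("properties") for bc in bc_data.values())


def populate_bcp(conn: sqlite3.Connection, soa_id: int, bc_data: dict[str, Any]) -> bool:
    """Replace the BCP rows for one SoA, but only if replacements exist."""
    if not _has_insertable_data(bc_data):
        # Empty payload (e.g. API failure or timeout): keep the existing rows.
        logger.warning("No BCP data for SoA %s; keeping existing rows", soa_id)
        return False
    with conn:  # commits delete + inserts together, or rolls both back
        conn.execute("DELETE FROM bcp WHERE soa_id = ?", (soa_id,))
        conn.executemany(
            "INSERT INTO bcp (soa_id, bc_code, name) VALUES (?, ?, ?)",
            [
                (soa_id, code, prop)
                for code, bc in bc_data.items()
                for prop in bc.get("properties", [])
            ],
        )
    return True
```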
Root cause: 189 biomedical_concept rows had alias_code entries whose standard_code (code row) had been deleted by a prior migration. The INNER JOIN on alias_code/code in build_usdm_biomedical_concepts silently excluded every BC with a broken chain; 128 were referenced in activity biomedicalConceptIds but absent from the biomedicalConcepts array.
Two fixes:
1. _migrate_repair_broken_bc_code_chains: at startup, finds BCs whose alias_code exists but whose code row is missing, re-creates the code and alias_code rows from activity_concept.concept_code, and repoints biomedical_concept.code to the new valid alias. All 168 affected BCs are repaired on the next server start.
2. build_usdm_biomedical_concepts: change the INNER JOIN to a LEFT JOIN on alias_code/code so future broken chains do not silently drop BCs from the output. Missing code info emits empty strings rather than excluding the BC entirely.
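The effect of the second fix can be reproduced with an in-memory SQLite database. The three-table schema below is a simplified, hypothetical stand-in for the real tables: `biomedical_concept.code` points at `alias_code.id`, which points at `code.id`. BC 101 has a broken chain because its alias references a code row that no longer exists:

```python
import sqlite3

SCHEMA = """
CREATE TABLE code (id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE alias_code (id INTEGER PRIMARY KEY, standard_code INTEGER);
CREATE TABLE biomedical_concept (id INTEGER PRIMARY KEY, name TEXT, code INTEGER);
INSERT INTO code VALUES (10, 'CODE_A');
-- alias 2 points at code 99, which was deleted
INSERT INTO alias_code VALUES (1, 10), (2, 99);
INSERT INTO biomedical_concept VALUES (100, 'Heart Rate', 1), (101, 'Weight', 2);
"""


def fetch_bcs(conn: sqlite3.Connection, join: str) -> list[tuple]:
    """List BCs using either INNER or LEFT joins along the code chain."""
    if join not in ("INNER", "LEFT"):
        raise ValueError("join must be INNER or LEFT")
    sql = f"""
        SELECT bc.id, bc.name, COALESCE(c.code, '') AS code
        FROM biomedical_concept bc
        {join} JOIN alias_code ac ON ac.id = bc.code
        {join} JOIN code c ON c.id = ac.standard_code
        ORDER BY bc.id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(fetch_bcs(conn, "INNER"))  # [(100, 'Heart Rate', 'CODE_A')]
print(fetch_bcs(conn, "LEFT"))   # [(100, 'Heart Rate', 'CODE_A'), (101, 'Weight', '')]
```

With INNER joins, the Weight BC disappears without any error. With LEFT joins, it survives, and `COALESCE` turns the missing code into an empty string, which matches the behaviour the commit describes.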
…nedPersons.organizationIds
…ly runs if no properties exist in database
…ric values for type if missing for roles and orgs
…nch version used to generate the USDM JSON
Prevents scripts/api_test.py from being picked up by pytest during pre-commit hooks.
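`api_test.py` matches pytest's default `*_test.py` file pattern, so pytest will collect it unless the file is excluded. The commit does not say which exclusion mechanism it uses. One common option, shown here as an assumption, is pytest's module-level `collect_ignore` list in a root `conftest.py`:

```python
# conftest.py at the repository root (hypothetical placement).
# pytest reads this module-level list and skips the listed paths
# (relative to this conftest's directory) during test collection.
collect_ignore = ["scripts/api_test.py"]
```

Restricting `testpaths` to `tests/` in the pytest configuration would achieve the same result for runs without explicit path arguments.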
Branch updated from 7dbb643 to 0e26f12.
- freezes.py: replace str(exc) in 3 error responses with generic messages; log details server-side only
- footnotes.py: cast soa_id to int in all redirect URLs (open redirect)
- bc_surrogates.py: cast soa_id to int in all redirect URLs (open redirect)
- app.py: remove resp.text snippets from logs and cache; suppress exception details from status API endpoints and UI responses
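Two of the patterns above can be sketched in a framework-neutral way. The `/soa/{id}/footnotes` path and both function names are hypothetical. `int()` rejects any value that is not a plain integer, such as `//evil.example`, so the redirect target always stays a same-site relative path. The error helper logs the full exception server-side and returns only a fixed message:

```python
import logging

logger = logging.getLogger(__name__)


def footnotes_redirect_url(soa_id: str) -> str:
    """Build a redirect target; raises ValueError for non-integer input."""
    return f"/soa/{int(soa_id)}/footnotes"


def freeze_error_response(exc: Exception) -> dict[str, str]:
    """Return a generic client message; details go to the server log only."""
    logger.error("Freeze operation failed", exc_info=exc)
    return {"error": "Freeze operation failed"}
```

For example, `footnotes_redirect_url("42")` yields `"/soa/42/footnotes"`, and the response body never contains the text of the underlying exception.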
…posure
- app.py: remove user-controlled 'category' parameter from all log calls in fetch_biomedical_concepts_by_category to satisfy py/clear-text-logging-sensitive-data (5 alerts)
- activities.py: replace href.startswith() SSRF guard with explicit urlparse netloc comparison so CodeQL recognises the host allowlist (py/full-ssrf, 2 alerts)
- amendments.py: replace html.escape(str(exc)) in governance date handler with logger.exception + generic message (py/stack-trace-exposure, 1 alert)
…xposure
- activities.py: reconstruct request URL from trusted _p.scheme/_p.netloc + user-provided path/query via urlunparse, so CodeQL can verify the host is never user-controlled (py/full-ssrf, 2 locations)
- amendments.py: fix missed html.escape(str(exc)) in geographic scope handler; replace with logger.exception + generic message (py/stack-trace-exposure, line 2222)
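The two activities.py SSRF changes can be sketched together. `ALLOWED_HOST` and `build_upstream_url` are hypothetical; the PR only says that the trusted scheme and netloc come from `_p`. A prefix check such as `href.startswith("https://api.example.org")` would also accept `https://api.example.org.evil.test/...`. An exact netloc comparison rejects it, and rebuilding the URL with `urlunparse` means that only the path and query come from user input:

```python
from urllib.parse import urlparse, urlunparse

ALLOWED_HOST = "api.example.org"  # hypothetical allowlisted upstream host


def build_upstream_url(href: str) -> str:
    """Rebuild an outbound URL so scheme and host never come from the user."""
    trusted = urlparse(f"https://{ALLOWED_HOST}")
    p = urlparse(href)
    # Exact netloc comparison instead of a string-prefix check.
    if p.netloc and p.netloc != trusted.netloc:
        raise ValueError("host not allowed")
    # Only the path and query are taken from the user-supplied value.
    return urlunparse((trusted.scheme, trusted.netloc, p.path, "", p.query, ""))
```

For example, `build_upstream_url("/mdr/bc/1?x=1")` returns `"https://api.example.org/mdr/bc/1?x=1"`.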
- SDTM path: prefer DSS variable's dataType over DEC's generic dataType (e.g. a float result in a specialization is more specific than the DEC's generic 'string'); matches Dave Iberson-Hurst's cdisc_bc_library approach
- Add 7 validation tests:
  - exclusion list exactly matches Dave's _process_property reference
  - all excluded SDTM suffixes rejected by _include_property
  - all data-carrying variables pass the filter
  - every required USDM BCP attribute is present and correctly typed
  - isEnabled is always True
  - mandatoryValue: false maps to isRequired: False
  - DSS dataType takes priority over DEC dataType

Closes #218
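The precedence and flag rules exercised by those tests can be sketched as follows, assuming the DSS variable and the DEC are plain dicts. Both function names are hypothetical. Treating a missing `mandatoryValue` as not required is also an assumption; the tests above only pin down the explicit `false` case:

```python
from typing import Any, Optional


def resolve_data_type(dss_variable: dict[str, Any], dec: dict[str, Any]) -> Optional[str]:
    """Prefer the DSS variable's specific dataType; fall back to the DEC's."""
    return dss_variable.get("dataType") or dec.get("dataType")


def to_bcp_flags(dss_variable: dict[str, Any]) -> dict[str, bool]:
    """Derive BCP flags; a missing mandatoryValue is treated as False here."""
    return {
        "isEnabled": True,  # always True, per the validation tests
        "isRequired": bool(dss_variable.get("mandatoryValue", False)),
    }
```

For example, `resolve_data_type({"dataType": "float"}, {"dataType": "string"})` returns `"float"`, and the DEC's `"string"` is used only when the DSS variable has no dataType.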
MD5 is flagged by CodeQL (py/weak-sensitive-data-hashing) because the content passed to this function may include clinical trial data. The function only needs deterministic, stable OIDs for Define-XML, with no cryptographic security requirement, but SHA-256 satisfies both the security scanner and the use case with an identical interface.
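A minimal sketch of a deterministic OID helper built on `hashlib.sha256`. The `make_oid` name, the `prefix.DIGEST` format, and the 12-character truncation are hypothetical. Truncation keeps OIDs short at the cost of some collision resistance. If OIDs are derived this way, switching from MD5 to SHA-256 changes every generated value, so Define files produced before and after this commit will not share OIDs:

```python
import hashlib


def make_oid(prefix: str, content: str, length: int = 12) -> str:
    """Deterministic OID: the same prefix and content always give the same id."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"{prefix}.{digest[:length].upper()}"
```

For example, two calls to `make_oid("IT", "VS.VSORRES")` return the same 15-character string, while different content yields a different suffix.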
What
This release expands the SoA Workbench from a visit/activity scheduling tool into a full USDM study definition platform. v1.4 ships an MCP server for agent-driven workflows, a complete Define-JSON generator, CRF specialization assignment, and twelve new study entity domains (organizations, roles, persons, estimands, indications, study interventions, titles, identifiers, amendment extensions, geographic scopes, BC categories, and SoA bundle import/export).
Why
The workbench previously covered the Schedule of Activities core (visits, activities, arms, epochs) but lacked the surrounding USDM study metadata needed to produce a submission-ready USDM package or a Define-XML/JSON artifact. v1.4 closes those gaps and adds an MCP server so AI agents can drive the workbench programmatically.
How
- `src/soa_builder/mcp/server.py`: 11 tools covering SoA CRUD, visit/activity management, matrix retrieval, and USDM/Define-JSON export; enables Claude and other agents to interact with the workbench via the Model Context Protocol.
- `src/usdm/create_define_json.py`, `generate_define_json.py`: Produces CDISC Define-JSON v2.1 from the workbench database, including concepts and conceptProperties; documented in `help/DEFINE_JSON_GENERATOR_INTEGRATION.md`.
- `routers/activities.py` + `templates/crf_*.html`: Assigns CRF specializations to activity instances, with extensionAttributes in the USDM export.
- `routers/soa_bundle.py`: Export/import of a complete SoA as a portable JSON bundle.
- `templates/soa_matrix_export.html`: Standalone HTML page of the SoA matrix for easy web deployment.
- `audit.py`: Centralised before/after audit logging extracted from inline router code.
- `scripts/cleanup_bcp_response_codes.py`: Fixes stale and mismatched BiomedicalConceptProperty response codes across all SoAs.

Changes
- `src/soa_builder/mcp/` package: `server.py` with 11 tools; `pyproject.toml` entry point added
- `src/usdm/create_define_json.py` (4,501 lines), `generate_define_json.py`, UI at `templates/define_json.html`
- `bc_categories`, `estimands`, `indications`, `organizations`, `persons`, `roles`, `soa_bundle`, `study_identifiers`, `study_interventions`, `study_titles`
- `templates/crf_cell.html`, `crf_specialization_detail.html`, `crf_specializations.html`
- `generate_estimands`, `generate_indications`, `generate_organizations`, `generate_roles`, `generate_study_identifiers`, `generate_study_interventions`, `generate_study_titles`, `generate_bc_categories`
- `web/audit.py` (319 lines) extracted from inline router code
- `migrate_database.py` extended with all new entity tables

Testing
- `tests/test_mcp_server.py` (295 lines)
- `tests/test_define_json_concepts.py`, `test_define_json_generator.py`
- `tests/test_bcp_response_code_cleanup.py`
- `tests/test_routers_activities_crf.py`
- `tests/test_routers_soa_bundle.py`

Notes for reviewers
- The `src/usdm/create_define_json.py` file is large (4,500+ lines); it is a self-contained generator and can be skimmed structurally rather than line-by-line.
- `migrate_database.py` is the source of truth for all schema changes; review the new `ALTER TABLE`/`CREATE TABLE` blocks there.
- Files under `files/subject_data/NCT01797120/` were removed; they are no longer needed in the repo.
- `files/D1_Master Protocol…pdf` (38 MB) was added to `files/` as reference material; confirm this is intentional before merge if storage is a concern.