Release v1.4 — MCP server, Define-JSON, CRF specializations, and full study entity coverage by pendingintent · Pull Request #254 · pendingintent/soa-workbench

pendingintent · 2026-06-22T18:28:54Z

What

This release expands the SoA Workbench from a visit/activity scheduling tool into a full USDM study definition platform. v1.4 ships an MCP server for agent-driven workflows, a complete Define-JSON generator, CRF specialization assignment, and twelve new study entity domains (organizations, roles, persons, estimands, indications, study interventions, titles, identifiers, amendment extensions, geographic scopes, BC categories, and SoA bundle import/export).

Why

The workbench previously covered the Schedule of Activities core (visits, activities, arms, epochs) but lacked the surrounding USDM study metadata needed to produce a submission-ready USDM package or a Define-XML/JSON artifact. v1.4 closes those gaps and adds an MCP server so AI agents can drive the workbench programmatically.

How

MCP server (src/soa_builder/mcp/server.py): 11 tools covering SoA CRUD, visit/activity management, matrix retrieval, and USDM/Define-JSON export — enables Claude and other agents to interact with the workbench via the Model Context Protocol.
Define-JSON generator (src/usdm/create_define_json.py, generate_define_json.py): Produces CDISC Define-JSON v2.1 from the workbench database, including concepts and conceptProperties; documented in help/DEFINE_JSON_GENERATOR_INTEGRATION.md.
CRF specializations (routers/activities.py + templates/crf_*.html): Assigns CRF specializations to activity instances with extensionAttributes in USDM export.
New entity domains — each with router, migration, USDM generator, templates, and full test coverage: Organizations, Persons, Roles, Estimands, Indications, Study Interventions, Study Titles, Study Identifiers, BC Categories.
Amendments extended: Enrollments, geographic scopes (regions + country codes), and governance dates added as sub-entities.
SoA bundle (routers/soa_bundle.py): Export/import of a complete SoA as a portable JSON bundle.
HTML matrix export (templates/soa_matrix_export.html): Standalone HTML page of the SoA matrix for easy web deployment.
Audit trail (audit.py): Centralised before/after audit logging extracted from inline router code.
USDM systemVersion: Stamped from the current git branch name; release-v-* prefix normalized automatically.
BCP response code cleanup (scripts/cleanup_bcp_response_codes.py): Fixes stale and mismatched BiomedicalConceptProperty response codes across all SoAs.
config.env: Centralised environment variable management for deployment.

Changes

MCP: New src/soa_builder/mcp/ package — server.py with 11 tools; pyproject.toml entry point added
Define-JSON: src/usdm/create_define_json.py (4 501 lines), generate_define_json.py, UI at templates/define_json.html
New routers: bc_categories, estimands, indications, organizations, persons, roles, soa_bundle, study_identifiers, study_interventions, study_titles
CRF specializations: templates/crf_cell.html, crf_specialization_detail.html, crf_specializations.html
Amendments: Extended templates for enrollments, geographic scopes, governance dates
USDM generators: New generate_estimands, generate_indications, generate_organizations, generate_roles, generate_study_identifiers, generate_study_interventions, generate_study_titles, generate_bc_categories
Audit: web/audit.py (319 lines) extracted from inline router code
Migrations: migrate_database.py extended with all new entity tables
Docs/help: Moved docs → help; added ALIGNMENT_EAGER_BCP_VS_DEFINE_JSON, BIOMEDICAL_CONCEPT_PROPERTY_EAGER_POPULATION, DEFINE_JSON_GENERATOR_INTEGRATION, DIFF_REPORT_ALL_USDM_ENTITIES, ORGANIZATIONS guides
Output JSON: Updated USDM and Define-JSON snapshots for H2Q-MC-LZZT and NCT01797120
CI: Removed release-* branch match from azure-deploy.yml (deploy only from master)

Testing

Full test suite passes — 18 new test files added covering all new routers and generators
MCP server tested end-to-end (tests/test_mcp_server.py, 295 lines)
Define-JSON generation tested with concept/conceptProperty population (tests/test_define_json_concepts.py, test_define_json_generator.py)
BCP response code cleanup script tested (tests/test_bcp_response_code_cleanup.py)
CRF specialization CRUD and USDM export tested (tests/test_routers_activities_crf.py)
SoA bundle export/import round-trip tested (tests/test_routers_soa_bundle.py)
All new entity routers have dedicated test files

Notes for reviewers

The src/usdm/create_define_json.py file is large (4 500+ lines) — it is a self-contained generator and can be skimmed structurally rather than line-by-line.
migrate_database.py is the source of truth for all schema changes; review the new ALTER TABLE / CREATE TABLE blocks there.
Subject data CSV files (files/subject_data/NCT01797120/) were removed — they are no longer needed in the repo.
The large files/D1_Master Protocol…pdf (38 MB) was added to files/ as reference material; confirm this is intentional before merge if storage is a concern.


Adds 0..N organizations per SOA with name, label, identifier, identifierScheme, type (DDF CT C215480), and optional legalAddress (text, lines, city, district, state, postalCode, country via ISO 3166).
- DB: organization + organization_audit tables with migrations
- Router: JSON CRUD + HTMX add/delete endpoints
- USDM: generate_organizations.py → Organization-Output + Address-Output
- UI: Organizations study-meta-card below Study Metadata on edit page
- Tests: 9 new tests (434 total, 0 failures)
- Resolved merge conflicts with pendingintent-add-titles changes


_populate_bcp_locked always deleted the existing BCP/alias_code/code rows before re-inserting, but never guarded against empty replacement data (API failure or timeout). If _get_biomedical_concept_data returned {}, the delete committed with zero inserts, leaving the concept without BCP rows until the next successful backfill.

Added _has_insertable_data(), which validates that at least one BCP would actually be written before the delete proceeds. If no data is available, the function logs a warning and returns early, preserving the existing rows unchanged.
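The guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the workbench's code: the single `bcp` table, the `properties` key, and the function names `has_insertable_data` and `replace_bcp_rows` are assumptions made for the example.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def has_insertable_data(concept_data: dict) -> bool:
    """True only if at least one property row would actually be written."""
    properties = concept_data.get("properties") or []
    return any(prop.get("name") for prop in properties)


def replace_bcp_rows(conn: sqlite3.Connection, bc_id: int, concept_data: dict) -> bool:
    """Replace the BCP rows for one concept; return False if nothing was changed."""
    # Check BEFORE deleting: an empty API response must not wipe existing rows.
    if not has_insertable_data(concept_data):
        logger.warning("No BCP data for concept %s; keeping existing rows", bc_id)
        return False
    with conn:  # one transaction: the delete and the inserts commit or roll back together
        conn.execute("DELETE FROM bcp WHERE bc_id = ?", (bc_id,))
        conn.executemany(
            "INSERT INTO bcp (bc_id, name) VALUES (?, ?)",
            [(bc_id, p["name"]) for p in concept_data["properties"] if p.get("name")],
        )
    return True
```

Calling `replace_bcp_rows(conn, 1, {})` leaves the table untouched and returns `False`, which is the failure mode the commit is protecting against.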

Root cause: 189 biomedical_concept rows had alias_code entries whose standard_code (code row) had been deleted by a prior migration. The INNER JOIN on alias_code/code in build_usdm_biomedical_concepts silently excluded every BC with a broken chain; 128 of them were referenced in activity biomedicalConceptIds but absent from the biomedicalConcepts array.

Two fixes:
1. _migrate_repair_broken_bc_code_chains: at startup, finds BCs whose alias_code exists but whose code row is missing, re-creates the code and alias_code rows from activity_concept.concept_code, and repoints biomedical_concept.code to the new valid alias. All 168 affected BCs are repaired on next server start.
2. build_usdm_biomedical_concepts: changes INNER JOIN to LEFT JOIN on alias_code/code so that future broken chains do not silently drop BCs from the output. Missing code info emits empty strings rather than excluding the BC entirely.
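To make the join behaviour concrete, here is a small self-contained sketch using an in-memory SQLite database. The three tables are reduced to a few assumed columns and the code values are invented placeholders; the real schema in migrate_database.py will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE code (id INTEGER PRIMARY KEY, code TEXT, decode TEXT);
    CREATE TABLE alias_code (id INTEGER PRIMARY KEY, standard_code INTEGER);
    CREATE TABLE biomedical_concept (id INTEGER PRIMARY KEY, name TEXT, code INTEGER);
    INSERT INTO code VALUES (1, 'X1', 'Systolic');
    -- alias 11 points at code row 2, which no longer exists (the broken chain)
    INSERT INTO alias_code VALUES (10, 1), (11, 2);
    INSERT INTO biomedical_concept VALUES (100, 'SYSBP', 10), (101, 'DIABP', 11);
    """
)

# Same query, parameterised only by the join type.
QUERY = """
    SELECT bc.name, COALESCE(c.code, ''), COALESCE(c.decode, '')
    FROM biomedical_concept AS bc
    {join} alias_code AS ac ON ac.id = bc.code
    {join} code AS c ON c.id = ac.standard_code
    ORDER BY bc.id
"""

inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # DIABP is silently missing
print(left_rows)   # DIABP is present with empty code fields
```

With the inner join, the concept whose code row was deleted disappears from the result without any error; with the left join it is kept and its code fields fall back to empty strings.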


Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.


- freezes.py: replace str(exc) in 3 error responses with generic messages; log details server-side only
- footnotes.py: cast soa_id to int in all redirect URLs (open redirect)
- bc_surrogates.py: cast soa_id to int in all redirect URLs (open redirect)
- app.py: remove resp.text snippets from logs and cache; suppress exception details from status API endpoints and UI responses
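The two patterns in this commit (integer-casting an id before it reaches a redirect URL, and returning a generic error while logging the details) might look something like the sketch below. The URL path and both function names are hypothetical and are not taken from the routers.

```python
import logging

logger = logging.getLogger(__name__)


def soa_redirect_url(soa_id: object) -> str:
    """Build a same-site redirect path; int() raises ValueError for anything non-numeric."""
    # Hypothetical path, used only for illustration.
    return f"/ui/soa/{int(soa_id)}/edit"


def safe_error_message(exc: Exception) -> str:
    """Log the full exception server-side and return a generic message for the client."""
    logger.error("Operation failed", exc_info=exc)
    return "The operation failed. See the server logs for details."
```

Because `int()` rejects values such as `"1/../../evil.example.com"`, a crafted id can no longer change where the redirect points.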


…posure
- app.py: remove user-controlled 'category' parameter from all log calls in fetch_biomedical_concepts_by_category to satisfy py/clear-text-logging-sensitive-data (5 alerts)
- activities.py: replace href.startswith() SSRF guard with explicit urlparse netloc comparison so CodeQL recognises the host allowlist (py/full-ssrf, 2 alerts)
- amendments.py: replace html.escape(str(exc)) in governance date handler with logger.exception + generic message (py/stack-trace-exposure, 1 alert)

…xposure
- activities.py: reconstruct request URL from trusted _p.scheme/_p.netloc + user-provided path/query via urlunparse, so CodeQL can verify the host is never user-controlled (py/full-ssrf, 2 locations)
- amendments.py: fix missed html.escape(str(exc)) in geographic scope handler; replace with logger.exception + generic message (py/stack-trace-exposure, line 2222)
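A rough sketch of the SSRF guard described in these two commits, using only `urllib.parse`. The base URL and the name `build_trusted_url` are assumptions for illustration; the actual code in activities.py is not shown in this PR text.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical trusted API base; the real host is configured elsewhere.
TRUSTED_BASE = "https://api.example.org"


def build_trusted_url(href: str, base: str = TRUSTED_BASE) -> str:
    """Return a URL whose scheme and host always come from the trusted base."""
    trusted = urlparse(base)
    candidate = urlparse(href)
    # Relative hrefs have an empty scheme and netloc; absolute ones must match exactly.
    # A prefix check would wrongly accept https://api.example.org.evil.test/.
    if candidate.netloc and candidate.netloc != trusted.netloc:
        raise ValueError("href points at an untrusted host")
    if candidate.scheme and candidate.scheme != trusted.scheme:
        raise ValueError("href uses an unexpected scheme")
    # Only the path and query are taken from the caller-supplied value.
    return urlunparse(
        (trusted.scheme, trusted.netloc, candidate.path, "", candidate.query, "")
    )
```

Rebuilding the URL from the trusted scheme and host, instead of passing the validated string through, is what lets a static analyser see that the host is never user-controlled.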


- SDTM path: prefer DSS variable's dataType over DEC's generic dataType (e.g. a float result in a specialization is more specific than the DEC's generic 'string'); matches Dave Iberson-Hurst's cdisc_bc_library approach
- Add 7 validation tests:
  - exclusion list exactly matches Dave's _process_property reference
  - all excluded SDTM suffixes rejected by _include_property
  - all data-carrying variables pass the filter
  - every required USDM BCP attribute is present and correctly typed
  - isEnabled is always True
  - mandatoryValue: false maps to isRequired: False
  - DSS dataType takes priority over DEC dataType

Closes #218
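The precedence rule and the two boolean mappings listed above could be expressed along these lines. The input dictionary shapes and the function name `build_bcp` are assumptions made for the sketch; only the rules themselves (DSS dataType first, isEnabled always True, mandatoryValue mapped to isRequired) come from the commit message.

```python
from typing import Any, Optional


def build_bcp(dec: dict[str, Any], dss_variable: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a minimal BCP dict from a DEC and an optional DSS variable."""
    dss_variable = dss_variable or {}
    return {
        "name": dec["name"],
        # The DSS variable's dataType is more specific, so it wins when present.
        "datatype": dss_variable.get("dataType") or dec.get("dataType"),
        # mandatoryValue: false (or absent) maps to isRequired: False.
        "isRequired": bool(dss_variable.get("mandatoryValue", False)),
        "isEnabled": True,
    }
```

For example, a DEC typed as `string` combined with a DSS variable typed as `float` yields a property typed as `float`.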

MD5 is flagged by CodeQL (py/weak-sensitive-data-hashing) because the content passed to this function may include clinical trial data. The function only needs deterministic, stable OIDs for Define-XML and has no cryptographic security requirement, but SHA-256 satisfies both the security scanner and the use case with an identical interface.
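As an illustration of why the swap is interface-compatible, a deterministic OID helper based on `hashlib.sha256` might look like this. The function name, the `prefix.digest` layout, and the truncation length are assumptions; the generator's real OID format is not described here.

```python
import hashlib


def stable_oid(prefix: str, content: str, length: int = 16) -> str:
    """Derive a deterministic OID from content; the same input always gives the same OID."""
    # hashlib.sha256 exposes the same .hexdigest() call as hashlib.md5,
    # so only the constructor name changes at the call site.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"{prefix}.{digest[:length]}"
```

Truncating the hex digest keeps OIDs short while remaining stable across runs, since no salt or randomness is involved.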


pendingintent added 30 commits May 28, 2026 07:42

Added enrollments, geographicScopes, dateValue to amendments

9d32942

Aligned UI partial HTMX

b4d7e86

Geographic scopes now include Region and Country Codes

0bcb8d1

Extended diff report to cover all USDM entities created in the SOA Workbench

8ba0469

Added HTML report export for the SOA Matrix

4fb277d

Merge branch 'pendingintent-extend-diff-report' into release-v-1.4

a980bd1

Extended the diff report to cover all entity classes for USDM created in the SOA Workbench

Merge branch 'pendingintent-add-html-matrix' into release-v-1.4

0b75080

Added an HTML page extract of the SOA Matrix to allow users to easily review and deploy to web servers.

Added TDD Viewer to display TDD in UI

1a23226

Renamed the function and added to generate extensionAttribute for assigned DSS to a BiomedicalConcept

5f5ab3a

Added SOA delete functionality

4393283

Added extensionAttribute to BiomedicalConcept to support DSS

85b70b7

Merge branch 'pendingintent-delete-soa-function' into release-v-1.4

d390579

Added ability to delete an SOA with double confirmation

Added versions.titles

fe391ef

Added organizations feature to the Workbench and USDM generator

ad528a8

Set USDM systemVersion to current git branch name

3444e57

Replaces hardcoded "1.0.0" with the active branch from `git rev-parse --abbrev-ref HEAD` so each export is traceable to the branch that generated it. Falls back to "unknown" if git is unavailable.

Normalize release-v- branch prefix in USDM systemVersion

87a8656

Strips the "release-v-" prefix and ensures three dot-separated components (e.g. release-v-1.4 -> 1.4.0, release-v-1.4.1 -> 1.4.1). Non-release branches remain unchanged.
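The two systemVersion commits above can be sketched together as follows. The function names are hypothetical; the `git rev-parse --abbrev-ref HEAD` call, the "unknown" fallback, and the padding rule are taken from the commit messages.

```python
import subprocess

RELEASE_PREFIX = "release-v-"


def current_branch() -> str:
    """Return the active git branch name, or "unknown" if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a missing git binary, a non-repository directory, and timeouts.
        return "unknown"
    return result.stdout.strip() or "unknown"


def normalize_system_version(branch: str) -> str:
    """Map release-v-X.Y to X.Y.0; leave non-release branch names unchanged."""
    if not branch.startswith(RELEASE_PREFIX):
        return branch
    parts = branch[len(RELEASE_PREFIX):].split(".")
    # Pad with zeros until there are three dot-separated components.
    while len(parts) < 3:
        parts.append("0")
    return ".".join(parts)
```

With this sketch, `normalize_system_version("release-v-1.4")` gives `"1.4.0"`, while a feature branch name passes through untouched.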

Added export/import soa functionality

9b146e2

Added roles to study design and presented on the edit ui page

d850869

Added studyInterventions to study design

9b5049c

Added estimands

167197f

Added indications to the estimand pages

872e087

Added PersonName and AssignedPerson

12be7f6

USDM generator will now populate roles.organizationIds or roles.assignedPersons.organizationIds

3782ecc

Fixed truncation of bc properties at startup and population script only runs if no properties exist in database

7414f0d

Fixed USDM study entity format; added rules for roles attribute format

c2357b5

Fixed USDM generation issues - codeSystem values and fallback to generic values for type if missing for roles and orgs

aa93318

Added datetime suffix to systemVersion to help identify source workbench version used to generate the USDM JSON

316dffa

pendingintent added the documentation and enhancement labels Jun 22, 2026

pendingintent added this to SOA Workbench version 1.4 Jun 22, 2026

github-project-automation Bot moved this to Todo in SOA Workbench version 1.4 Jun 22, 2026

pendingintent added this to the version 1.4 milestone Jun 22, 2026

Copilot AI reviewed Jun 22, 2026

Copilot AI review requested due to automatic review settings June 22, 2026 18:54

Copilot started reviewing on behalf of pendingintent June 22, 2026 18:54

Copilot AI reviewed Jun 22, 2026

pendingintent added 5 commits June 22, 2026 15:06

Removed release-* branch statement from the deployments.

abba38e

Addressed CodeQL reports

e02f9e2

Removed release-* branch statement from the deployments.

b90e577

fix(test): restrict pytest collection to tests/ directory

f362f57

Prevents scripts/api_test.py from being picked up by pytest during pre-commit hooks.

Delete files/D1_Master Protocol 2022-501050-11 redacted.pdf

0e26f12

pendingintent force-pushed the release-v-1.4 branch from 7dbb643 to 0e26f12 Compare June 22, 2026 19:10

Copilot AI review requested due to automatic review settings June 22, 2026 19:26

Copilot started reviewing on behalf of pendingintent June 22, 2026 19:26

Copilot AI reviewed Jun 22, 2026

pendingintent added 2 commits June 22, 2026 15:43

Copilot AI review requested due to automatic review settings June 22, 2026 19:56

Copilot started reviewing on behalf of pendingintent June 22, 2026 19:57

Copilot AI reviewed Jun 22, 2026

pendingintent added 2 commits June 22, 2026 16:11

Copilot AI review requested due to automatic review settings June 22, 2026 20:20

Copilot started reviewing on behalf of pendingintent June 22, 2026 20:20

Copilot AI reviewed Jun 22, 2026

pendingintent moved this from Todo to In Progress in SOA Workbench version 1.4 Jun 23, 2026

pendingintent wants to merge 67 commits into master from release-v-1.4


3 participants
