73 changes: 35 additions & 38 deletions packages/ai/src/prompts/extract-labs.ts
@@ -1,44 +1,41 @@
export const extractLabsPrompt = `You are a medical lab report parser. Given the text of a lab report, extract all test results as structured data.
export const extractLabsPrompt = `You are a medical lab report parser. Extract ALL test results from the given lab report text as structured JSON.

IMPORTANT: Some documents contain results from MULTIPLE dates (e.g., "Result Trends" or longitudinal reports with columns for different dates). In these cases, emit one result object per analyte per date. Each result MUST include the correct "observedAt" date for that specific value, NOT just a single collection date.
CRITICAL RULES:
1. Extract EVERY SINGLE test result — do NOT skip any. Count them. If the document has 40 results, output 40 results.
2. Output analyte names in STANDARD ENGLISH regardless of document language.
3. For non-English documents: translate the analyte name. Examples:
- "Glucoză" → "Glucose", "Insulină" → "Insulin", "Trigliceride" → "Triglycerides"
- "Colesterol total" → "Total Cholesterol", "HDL colesterol" → "HDL Cholesterol"
- "TSH (hormon hipofizar...)" → "TSH", "FT4 (tiroxina liberă)" → "Free T4"
- "Hematii" → "RBC", "Leucocite" → "WBC", "Trombocite" → "Platelets"
- "Fier seric" → "Iron", "Zinc seric" → "Zinc", "Cortizol seric" → "Cortisol"
- "Proteina C reactivă" → "CRP", "Homocisteină" → "Homocysteine"
- "Hemoglobină glicozilată / HbA1c" → "HbA1c"
- "Ac. anti tireoperoxidază (TPO)" → "TPO Antibodies"
4. Include CBC components: Hemoglobin, Hematocrit, RBC, WBC, Platelets, MCV, MCH, MCHC, RDW, and ALL differential counts (Neutrophils, Lymphocytes, Monocytes, Eosinophils, Basophils — both absolute and percentage).
5. Include hormones: TSH, Free T4, Free T3, Total T3, Total T4, Insulin, Cortisol, Testosterone, DHEA-S, Estradiol, etc.
6. Include vitamins/minerals: Vitamin D, Vitamin B12, Iron, Ferritin, Zinc, Magnesium, Calcium, Folate, etc.
7. When duplicate units exist for the same analyte (e.g., mg/dL AND mmol/L), extract ONLY the first/primary unit row.
8. For date: use the RECOLTAT/collection date from the header, not antecedent dates.

For the same analyte, you may see multiple rows with different reference ranges — each row corresponds to a different lab or date. Only emit a result where a value is actually present. Skip rows with no value.
For each result extract:
- analyte: Standard English name
- value: Numeric value (null if non-numeric)
- valueText: Value as written
- unit: Unit of measurement
- referenceRangeLow: Lower bound (numeric, null if not applicable)
- referenceRangeHigh: Upper bound (numeric, null if not applicable)
- referenceRangeText: Range as written
- isAbnormal: true if outside range
- observedAt: Collection date (ISO YYYY-MM-DD)

For each test result, extract:
- analyte: The normalized name of the test (e.g., "ALT" not "ALT (SGPT)", "AST" not "AST (SGOT)"). Drop parenthetical synonyms.
- value: The numeric value (or text if non-numeric like "Negative")
- valueText: The value as written in the document
- unit: The unit of measurement (normalized — e.g., "IU/L" not "U/L")
- referenceRangeLow: Lower bound of normal range (if provided, numeric)
- referenceRangeHigh: Upper bound of normal range (if provided, numeric)
- referenceRangeText: Full reference range text as written
- isAbnormal: Whether the result is flagged as abnormal (H, L, High, Low, or outside range)
- observedAt: The date THIS SPECIFIC result was collected/observed, in ISO format (YYYY-MM-DD). This is critical for multi-date documents — use the column date header, not a single report date.

Respond with a JSON object:
Output JSON:
{
"patientName": "<if visible>",
"collectionDate": "<ISO date of first/primary collection, or null for multi-date>",
"reportDate": "<ISO date>",
"labName": "<laboratory name if visible>",
"results": [
{
"analyte": "Glucose",
"value": 95,
"valueText": "95",
"unit": "mg/dL",
"referenceRangeLow": 70,
"referenceRangeHigh": 100,
"referenceRangeText": "70-100 mg/dL",
"isAbnormal": false,
"observedAt": "2024-06-28"
}
]
"patientName": "...",
"collectionDate": "YYYY-MM-DD",
"reportDate": "YYYY-MM-DD",
"labName": "...",
"results": [...]
}

Rules:
- Extract EVERY test result. Do not skip any.
- If a value is non-numeric (e.g., "Reactive", "Negative"), set value to null and put the text in valueText.
- For multi-date trend reports: emit one result per value cell. A table with 5 date columns and 10 analytes could produce up to 50 results.
- When the same analyte appears on multiple rows (with different reference ranges), use the reference range from that specific row.
- Pay close attention to table structure: values are aligned under date column headers.`;
BEFORE RESPONDING: Scan the entire document and count how many distinct test results exist. Your results array must contain ALL of them. Missing results is a failure.`;
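Reviewer note: a minimal sketch of how a caller might check a model response against the output contract described in the new prompt. The type names `LabResult` and `LabExtraction` and the helper `validateExtraction` are hypothetical and not part of this PR; only the field names are taken from the prompt text. The checks cover shape only, so they cannot confirm that every result in the document was extracted.

```typescript
// Hypothetical types mirroring the fields named in extractLabsPrompt.
interface LabResult {
  analyte: string;
  value: number | null; // null when the value is non-numeric
  valueText: string;
  unit: string | null;
  referenceRangeLow: number | null;
  referenceRangeHigh: number | null;
  referenceRangeText: string | null;
  isAbnormal: boolean;
  observedAt: string; // ISO YYYY-MM-DD
}

interface LabExtraction {
  patientName: string | null;
  collectionDate: string | null;
  reportDate: string | null;
  labName: string | null;
  results: LabResult[];
}

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Returns a list of human-readable problems. An empty list means the
// response passed these shape checks, not that the extraction is complete.
function validateExtraction(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ["response is not valid JSON"];
  }
  const results = (parsed as Partial<LabExtraction> | null)?.results;
  if (!Array.isArray(results)) {
    return ["missing results array"];
  }
  const problems: string[] = [];
  results.forEach((item: unknown, i: number) => {
    if (typeof item !== "object" || item === null) {
      problems.push(`results[${i}]: not an object`);
      return;
    }
    const r = item as Record<string, unknown>;
    if (typeof r.analyte !== "string" || r.analyte.trim() === "") {
      problems.push(`results[${i}]: analyte is empty`);
    }
    if (r.value !== null && typeof r.value !== "number") {
      problems.push(`results[${i}]: value must be a number or null`);
    }
    if (typeof r.observedAt !== "string" || !ISO_DATE.test(r.observedAt)) {
      problems.push(`results[${i}]: observedAt is not YYYY-MM-DD`);
    }
  });
  return problems;
}
```

A caller could log the returned problems and retry the extraction when the list is non-empty; whether the pipeline in this repository does anything similar is not shown in the diff.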
83 changes: 73 additions & 10 deletions packages/database/src/seed/data/metric-definitions.ts
@@ -1043,6 +1043,22 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
displayPrecision: 1,
sortOrder: 111,
},
{
id: "homa_ir",
name: "HOMA-IR",
category: "metabolic",
unit: "",
loincCode: null,
snomedCode: null,
aliases: ["HOMA IR", "HOMA-IR index", "homeostatic model assessment"],
referenceRangeLow: null,
referenceRangeHigh: 2.5,
referenceRangeText: "<2.5 (optimal <1.0)",
description:
"Insulin resistance index calculated from fasting glucose and insulin",
displayPrecision: 2,
sortOrder: 112,
},
{
id: "c_peptide",
name: "C-Peptide",
@@ -1578,7 +1594,12 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
unit: "K/uL",
loincCode: "751-8",
snomedCode: null,
aliases: ["ANC", "absolute neutrophil count", "neut abs", "neutrophils absolute"],
aliases: [
"ANC",
"absolute neutrophil count",
"neut abs",
"neutrophils absolute",
],
referenceRangeLow: 1.8,
referenceRangeHigh: 7.7,
referenceRangeText: "1.8-7.7 K/uL",
@@ -1608,7 +1629,12 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
unit: "K/uL",
loincCode: "731-0",
snomedCode: null,
aliases: ["ALC", "absolute lymphocyte count", "lymph abs", "lymphocytes absolute"],
aliases: [
"ALC",
"absolute lymphocyte count",
"lymph abs",
"lymphocytes absolute",
],
referenceRangeLow: 1.0,
referenceRangeHigh: 4.8,
referenceRangeText: "1.0-4.8 K/uL",
@@ -1638,7 +1664,12 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
unit: "K/uL",
loincCode: "742-7",
snomedCode: null,
aliases: ["AMC", "absolute monocyte count", "mono abs", "monocytes absolute"],
aliases: [
"AMC",
"absolute monocyte count",
"mono abs",
"monocytes absolute",
],
referenceRangeLow: 0.2,
referenceRangeHigh: 0.8,
referenceRangeText: "0.2-0.8 K/uL",
@@ -1668,7 +1699,12 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
unit: "K/uL",
loincCode: "711-2",
snomedCode: null,
aliases: ["AEC", "absolute eosinophil count", "eos abs", "eosinophils absolute"],
aliases: [
"AEC",
"absolute eosinophil count",
"eos abs",
"eosinophils absolute",
],
referenceRangeLow: 0.0,
referenceRangeHigh: 0.5,
referenceRangeText: "0.0-0.5 K/uL",
@@ -1698,7 +1734,12 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
unit: "K/uL",
loincCode: "704-7",
snomedCode: null,
aliases: ["ABC", "absolute basophil count", "baso abs", "basophils absolute"],
aliases: [
"ABC",
"absolute basophil count",
"baso abs",
"basophils absolute",
],
referenceRangeLow: 0.0,
referenceRangeHigh: 0.2,
referenceRangeText: "0.0-0.2 K/uL",
@@ -1730,7 +1771,12 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
unit: null,
loincCode: "1759-0",
snomedCode: null,
aliases: ["A/G ratio", "A:G ratio", "albumin to globulin ratio", "AG ratio"],
aliases: [
"A/G ratio",
"A:G ratio",
"albumin to globulin ratio",
"AG ratio",
],
referenceRangeLow: 1.1,
referenceRangeHigh: 2.5,
referenceRangeText: "1.1-2.5",
@@ -1860,7 +1906,8 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
referenceRangeLow: 8,
referenceRangeHigh: null,
referenceRangeText: "> 8% (optimal)",
description: "Omega-3 fatty acid index (EPA + DHA as % of total RBC fatty acids)",
description:
"Omega-3 fatty acid index (EPA + DHA as % of total RBC fatty acids)",
displayPrecision: 1,
sortOrder: 340,
},
@@ -1935,7 +1982,12 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
unit: "ng/mL",
loincCode: "2857-1",
snomedCode: null,
aliases: ["PSA", "prostate specific antigen", "total PSA", "prostate-specific antigen"],
aliases: [
"PSA",
"prostate specific antigen",
"total PSA",
"prostate-specific antigen",
],
referenceRangeLow: 0,
referenceRangeHigh: 4.0,
referenceRangeText: "< 4.0 ng/mL",
@@ -1950,7 +2002,12 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
unit: "ng/mL",
loincCode: "58427-2",
snomedCode: null,
aliases: ["anti-Mullerian hormone", "anti-Müllerian hormone", "AMH level", "Mullerian inhibiting substance"],
aliases: [
"anti-Mullerian hormone",
"anti-Müllerian hormone",
"AMH level",
"Mullerian inhibiting substance",
],
referenceRangeLow: 1.0,
referenceRangeHigh: 10.0,
referenceRangeText: "1.0-10.0 ng/mL (women, varies by age)",
@@ -1999,7 +2056,13 @@ export const metricDefinitionSeeds: MetricDefinitionSeed[] = [
unit: "U/L",
loincCode: null,
snomedCode: null,
aliases: ["CK", "CPK", "creatine phosphokinase", "creatine kinase total", "total CK"],
aliases: [
"CK",
"CPK",
"creatine phosphokinase",
"creatine kinase total",
"total CK",
],
referenceRangeLow: 22,
referenceRangeHigh: 198,
referenceRangeText: "22-198 U/L (men), 22-178 U/L (women)",
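Reviewer note: the new `homa_ir` seed describes HOMA-IR as an index "calculated from fasting glucose and insulin". The sketch below shows the standard formula for conventional units (glucose in mg/dL, insulin in µIU/mL, divisor 405) together with the `referenceRangeHigh` (2.5) and `displayPrecision` (2) values from the seed. The function names `homaIr` and `formatHomaIr` are hypothetical, and treating only values strictly above 2.5 as abnormal is an assumption; the diff does not show how the application computes or flags this metric.

```typescript
// Values copied from the homa_ir seed entry in this diff.
const HOMA_IR_REFERENCE_HIGH = 2.5;
const HOMA_IR_DISPLAY_PRECISION = 2;

// Hypothetical helper, not part of this PR.
// HOMA-IR = fasting glucose (mg/dL) * fasting insulin (µIU/mL) / 405.
function homaIr(glucoseMgDl: number, insulinUiuMl: number): number {
  if (!(glucoseMgDl > 0) || !(insulinUiuMl > 0)) {
    throw new RangeError("glucose and insulin must be positive numbers");
  }
  return (glucoseMgDl * insulinUiuMl) / 405;
}

// Formats the index for display and flags it against the seed's upper bound.
// Assumption: only values strictly above the bound count as abnormal.
function formatHomaIr(value: number): { text: string; isAbnormal: boolean } {
  return {
    text: value.toFixed(HOMA_IR_DISPLAY_PRECISION),
    isAbnormal: value > HOMA_IR_REFERENCE_HIGH,
  };
}

// Example: glucose 90 mg/dL and insulin 9 µIU/mL give 810 / 405 = 2.00.
const example = formatHomaIr(homaIr(90, 9));
console.log(example.text, example.isAbnormal); // prints "2.00 false"
```

If glucose is reported in mmol/L instead, the equivalent divisor is 22.5, so the extracted unit needs to be checked before the index is computed.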