This document outlines the complete architectural approach for the Material ID Assignment Pipeline, encompassing both Phase 1 (Deterministic & ML Filtering) and Phase 2 (LLM Validation, Assignment & Inference).
The objective of this dual-phase system is to achieve maximum coverage of Material ID assignments while guaranteeing high accuracy and providing human-readable explanations for every decision made.
The following flowchart illustrates the step-by-step data journey through the pipeline.
```mermaid
graph TD
    %% PHASE 1: Data Preparation & ML Matching
    subgraph "Phase 1: Deterministic & Embedding Pipeline"
        A[Raw PO Data: 16,567 rows] --> B(Data Normalization)
        B --> C{Goods vs Service Classifier}
        C -- Non-Goods --> D[Exclude from Assignment]
        C -- Goods --> E(Duplicate Cluster Detection)
        E --> F{Deterministic Matching}
        F -- Fuzzy/Supplier/Price Match --> G[Assign from Historical or Item Master]
        F -- No Exact Match --> H{Embedding Matching Chroma ANN}
        H -- High Confidence Match --> I[Assign from Extracted Vector Index]
        H -- Low Confidence Band --> J[Flag for Review]
        H -- No Match --> K[Flag as Unassigned]
        G -.-> L{Catalog Integrity Check}
        I -.-> L
        L -- Check Assigned IDs<br>against Item Master --> M{Valid?}
        M -- Missing from Master --> N[Flag: Data Provider<br>Discrepancy]
        M -- Valid --> CC[Confirmed Assigned]
    end

    %% PHASE 2: Validation & LLM Processing
    subgraph "Phase 2: LLM Validation & Intelligent Grouping"
        CC --> Q[LLM Processing Engine]
        K -.-> Q
        J -.-> Q
        Q --> R1[RAG Similarity Score Check]
        R1 -- Score Adequate --> R2[Task A: Matching the<br>9.3k Unassigned Cases]
        R1 -- Score Too Low --> V[Task B: Suggest SAP<br>Group & Flag 'Needs Detail']
        R2 --> T{RAG-Assisted<br>LLM Match?}
        T -- Match Found --> U[Assign Existing ID<br>+ Explanation]
        T -- No Catalog Match --> V
        Q --> S[Task C: QA Cross-validation<br>of Phase 1 Matches]
        S --> W[Confirm or Reject<br>Phase 1 Matches]
        U --> Z[Task D: Finalize Goods<br>vs Service Flag]
        V --> Z
        W --> Z
    end

    %% FINAL OUTPUT
    subgraph "Phase 3: Final Output Generation"
        U --> X[Final Consolidated Dataset]
        V --> X
        W --> X
        D --> X
        X --> Y[(Excel Output with Granular Reasons)]
    end
```
Phase 1 acts as the high-throughput, low-cost filter. It processes 16,000+ rows in under 6 minutes.
- Normalization & Cleanup: Strings are normalized, noise is removed, and common abbreviations (e.g., `assy` → `assembly`, `pcs` → `pieces`) are expanded to match SAP catalogs.
- Goods vs Service Evaluation: Utilizing sentence embeddings, the pipeline identifies non-tangible services (e.g., "maintenance labor") and explicitly removes them from the Material ID requirement pool.
- Deduplication & Clustering: Exact duplicates (matching text, supplier, price, and UOM) are clustered. If a known material ID exists within a cluster, it is propagated to the unknown rows.
- Deterministic & Embedding Matching: The system first attempts a deterministic match (exact text and tight fuzzy bounds, corroborated by supplier and price). If that fails, it queries ChromaDB's HNSW vector index to find semantic matches against historical data and the active Item Master.
- Catalog Integrity Check: A strict deterministic check is then run against all successfully assigned IDs. It verifies that the assigned ID actually exists inside the Item Master catalog; if it doesn't, it flags the row with `status: data_provider_discrepancy`.
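The normalization step can be sketched as follows; the abbreviation table here is a small illustrative subset of what the real pipeline would carry:

```python
import re

# Illustrative subset of the abbreviation expansion table.
ABBREVIATIONS = {"assy": "assembly", "pcs": "pieces", "qty": "quantity"}

def normalize_description(text: str) -> str:
    """Lowercase, strip symbol noise, and expand known abbreviations
    so PO text lines up with SAP catalog wording."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9./\s-]", " ", text)  # drop commas, parens, etc.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)
```

Normalizing before deduplication means that `"Pump ASSY, 10 pcs"` and `"pump assembly 10 pieces"` land in the same duplicate cluster.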
Phase 2 acts as the cognitive engine to resolve the ceiling hit by traditional ML (the remaining ~9,300 unassigned rows and the cross-validation of assigned rows).
Before sending an unassigned row to the LLM, we check its Chroma DB vector distance. If the Top 1 candidate's similarity score is below our minimum viable threshold (meaning the catalog returns completely irrelevant "garbage"), we bypass Task A entirely. We route this row directly to Task B and flag it as needs_more_detail. This saves API costs and prevents the LLM from hallucinating matches against irrelevant data.
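A minimal sketch of that gate, assuming Chroma returns cosine distances; the threshold value is an assumption to be tuned against a labeled sample:

```python
MIN_SIMILARITY = 0.60  # assumed cutoff; tune against labeled data

def route_unassigned_row(top1_distance: float) -> str:
    """Route a row to Task A (LLM matching) or Task B (group suggestion).
    Chroma's cosine distance is converted to similarity as 1 - distance."""
    similarity = 1.0 - top1_distance
    if similarity < MIN_SIMILARITY:
        # Catalog returned irrelevant candidates: skip the LLM match call.
        return "task_b_needs_more_detail"
    return "task_a_llm_matching"
```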
For all unassigned rows that pass the similarity gate, we transition from broad ML matching to contextual LLM matching.
- LLM Processing: The LLM reads the raw PO line and evaluates it against the Top 5 curated catalog items. If human logic dictates a match, it assigns the ID and generates a short `llm_explanation`.
If an item fails the similarity gate OR the LLM explicitly determines the unassigned row truly does not exist in the catalog:
- The LLM generates a Recommended Material Group (e.g., `Bearings & Seals`, `IT Hardware`).
- Instead of returning generic blanks, you receive actionable clustered groups predicting which new IDs need creation.
For all assigned rows, the LLM verifies if the existing mapping is logically coherent. If it detects a hallucination, it drops the confidence score and outputs status: llm_rejected_assignment.
For every single row processed by the LLM, the model will output a final_goods_flag. It evaluates the item description against the Phase-1 is_goods prediction. If the ML model mistakenly flagged an item, the LLM catches the context and issues the final verdict.
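A minimal sketch of that final reconciliation (field names are illustrative):

```python
def final_goods_verdict(phase1_is_goods: bool, llm_is_goods: bool) -> dict:
    """The LLM's contextual verdict is authoritative; disagreements are
    recorded so reviewers can audit Phase-1 classifier drift."""
    return {
        "final_goods_flag": llm_is_goods,
        "phase1_overridden": phase1_is_goods != llm_is_goods,
    }
```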
The final Excel output merges the technical tracking of Phase 1 with the contextual reasoning of Phase 2. The output ensures that every single row contains a dedicated explanation.
| Column Name | Description |
|---|---|
| `final_material_id` | The ultimate ID chosen (Phase 1, LLM Assigned, or None). |
| `assignment_stage` | The specific step that claimed the row (e.g., `phase1_duplicate_cluster`, `llm_rag_match`, `llm_rejected`). |
| `final_goods_flag` | The LLM-verified Goods vs. Non-Goods (Service) status. |
| `goods_flag_explanation` | Why the LLM classified it as a good or service. |
| `discrepancy_flag` | Data integrity result from the Catalog Integrity Check (e.g., `missing_from_item_master`, `valid`). |
| `llm_explanation` | A plain-English reasoning string for why the LLM agreed, disagreed, or couldn't match. |
| `suggested_material_group` | Present only for unassigned items, to facilitate new ID creation. |
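Assembling that deliverable is a straightforward pandas step; the row values below are hypothetical, and the `to_excel` call is left commented as the final output action:

```python
import pandas as pd

# Hypothetical row; column names follow the output schema of this pipeline.
rows = [
    {"final_material_id": "MAT-001",
     "assignment_stage": "phase1_duplicate_cluster",
     "final_goods_flag": True,
     "goods_flag_explanation": "Tangible spare part.",
     "discrepancy_flag": "valid",
     "llm_explanation": "Description, vendor and price all corroborate.",
     "suggested_material_group": None},
]
df = pd.DataFrame(rows)
# df.to_excel("material_id_output.xlsx", index=False)  # final deliverable
```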
You might wonder: "Why not just send everything to the LLM and let it figure it out?"
The answer comes down to API Cost and Context Window Limits. Your Item Master contains 1,550 rows. Your pending PO Data contains 9,353 unassigned rows. If we asked the LLM to scan all 1,550 items for every single one of the 9,353 rows, that would amount to millions of tokens of redundant information per request. This would be outrageously expensive, incredibly slow, and would likely cause the LLM to hallucinate or exceed its maximum context window.
Instead, we use a RAG (Retrieval-Augmented Generation) pattern:
- Chroma DB (Fast & Cheap): Instantly queries the 1,550 Item Master list and retrieves the Top 5 most semantically relevant candidates.
- Pre-LLM Score Gating (Quality Control): We evaluate the raw distance score from Chroma. If the score is inadequate (signaling the top candidates are "garbage" relative to the PO text), we do not send it to the LLM for matching. We route it straight to the Material Group suggestion phase.
- Azure OpenAI (Deep & Logical): Receives only the specific PO row and those highly curated Top 5 candidates, allowing it to spend its processing power strictly on deep logical evaluation rather than brute-force searching against irrelevant noise.
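The retrieval step can be sketched independently of Chroma's API: embed the catalog once, then take the top-k nearest candidates by cosine similarity. A toy bag-of-words "embedding" stands in here for the real sentence model:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; the real pipeline uses sentence embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(po_text: str, catalog: dict, k: int = 5):
    """Return the k most similar Item Master entries for one PO line."""
    q = embed(po_text)
    scored = [(cosine(q, embed(desc)), mat_id) for mat_id, desc in catalog.items()]
    return sorted(scored, reverse=True)[:k]
```

Only these k candidates (plus the PO row itself) ever reach the LLM, which keeps each request to a few hundred tokens instead of the full 1,550-item catalog.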
To ensure the LLM makes decisions with the same rigor as an expert Master Data Analyst, we will dynamically construct a strict context window.
For every candidate Chroma fetches, we will extract and provide the LLM with the following 8 critical data fields directly from the Item Master:
- `Plnt` (Site Location)
- `Material` (The ID to assign)
- `Material Description`
- `Material PO Text`
- `Product Category`
- `Vendor`
- `Vendor Name`
- `Net Price`
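Each retrieved candidate can be rendered into a compact prompt block like this; `render_candidate` is a hypothetical helper, and the field names mirror the Item Master columns:

```python
# Item Master columns injected into the LLM context for each candidate.
FIELDS = ["Plnt", "Material", "Material Description", "Material PO Text",
          "Product Category", "Vendor", "Vendor Name", "Net Price"]

def render_candidate(row: dict) -> str:
    """Render one retrieved candidate as a compact key/value block."""
    return "\n".join(f"{f}: {row.get(f, 'N/A')}" for f in FIELDS)
```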
The LLM will be instructed to cross-reference the PO row against those 8 fields using the following strict rules before declaring a match:
- Description & Functional Alignment:
  - Rule: Do not just match keywords; check functional equivalency. E.g., the LLM must understand that "1/2 inch drill bit" and "12.7mm steel bit" are functionally identical, but "Bearing 5mm" and "Bearing 8mm" are critically different items and cannot share a Material ID.
- Pricing Corroboration (Using `Net Price`):
  - Rule: If the `Material Description` looks like a perfect match but the PO Net Price is `$5.00` and the Item Master `Net Price` is `$5,000.00`, the LLM must reject the match. It must infer a mismatch in scope (e.g., purchasing a single replacement part vs. a bulk container or an entire machine assembly).
- Supplier & Brand Context (Using `Vendor` & `Vendor Name`):
  - Rule: If the PO specifies a highly specialized OEM vendor (e.g., "Cisco" or "Caterpillar"), the LLM must severely penalize or reject Item Master candidates whose `Vendor Name` is flagged as an off-brand generic or belongs to a conflicting proprietary vendor.
- Site & Category Awareness (Using `Plnt` & `Product Category`):
  - Rule: If available in the data, verify location constraints. If an Item Master `Material` is strictly designated for a specific `Plnt` and the PO explicitly belongs to a different site (where cross-plant transfer is not applicable), the LLM must reject the assignment and wait for explicit context within the `Product Category` to validate matching intent.
- Unit of Measure (UOM) Logic:
  - Rule: Align packaging constraints. If the PO lists a UOM of `BOX (50)` and the Item Master `Material PO Text` implies `EA (Each)`, the LLM must mathematically validate that they represent the exact same physical material before approving the ID.
- Goods vs. Service Status Override:
  - Rule: The LLM must output a final truth value regarding whether the item is a tangible "Good" or a non-tangible "Service". It must compare this against the Phase-1 predicted Goods flag; if Phase 1 falsely flagged a service as a good (or vice versa), the LLM catches the context and outputs the correct `final_goods_flag`.
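The pricing rule can also be pre-checked deterministically before (or alongside) the LLM call. This sketch assumes a tolerated ratio of one order of magnitude, which is an illustrative setting, not a validated threshold:

```python
MAX_PRICE_RATIO = 10.0  # assumed tolerance; beyond this, infer a scope mismatch

def price_corroborates(po_price: float, master_price: float) -> bool:
    """Veto matches whose prices differ by more than the allowed ratio,
    as in the $5.00-part vs $5,000.00-assembly example."""
    if po_price <= 0 or master_price <= 0:
        return True  # missing/zero price: do not veto on price alone
    ratio = max(po_price, master_price) / min(po_price, master_price)
    return ratio <= MAX_PRICE_RATIO
```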
To maximize the number of assigned rows while maintaining accuracy, the system prompt will include Few-Shot Examples. These examples will train the LLM on how to weigh the available raw data evidence against the Item Master:
- Sufficient Evidence for Match: An example showing a messy PO row where the combination of matching Supplier, comparable Price, and core Item Text provides enough inferred evidence to confidently map to a Material ID.
- Insufficient Evidence / Conflict: An example where the text matches but the Supplier or Price grossly contradicts the Item Master, teaching the LLM to output a detailed rejection reason.
- Service/Non-Tangible Identification: An example showing how to evaluate a raw description (e.g., "installation of pump") and explicitly flag it as `final_goods_flag: false` (Service), generating an explanation anchored to these instructions.
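In chat-message form, those few-shot turns might look like the following; all PO lines, IDs, and JSON outputs here are condensed, hypothetical illustrations:

```python
# Hypothetical few-shot turns appended to the system prompt.
FEW_SHOT = [
    # Sufficient evidence: vendor + price + core text corroborate.
    {"role": "user",
     "content": "PO: 'brg assy 5mm, ACME Bearings, $4.80' | Candidate MAT-104: "
                "'Ball bearing assembly 5mm', Vendor 'ACME Bearings', Net Price $5.00"},
    {"role": "assistant",
     "content": '{"final_material_id": "MAT-104", "llm_explanation": '
                '"Vendor, price and core text all corroborate the match."}'},
    # Service identification: no tangible item, so no Material ID.
    {"role": "user",
     "content": "PO: 'installation of pump at site 3' | No tangible item."},
    {"role": "assistant",
     "content": '{"final_material_id": null, "final_goods_flag": false, '
                '"llm_explanation": "Installation labor is a service, not a good."}'},
]
```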
Outcome: By injecting these 8 fields, 6 strict rules, and few-shot examples into the system prompt, Azure OpenAI acts as a protective yet highly adaptive data-governance layer, maximizing assignments when evidence is sufficient and refusing IDs when logistics, text, or finance present contradictions.