A zero-dependency Java library for extracting fields from Norwegian invoices.
Provides production-tested keyword rules and parsers for matching OCR output to invoice fields. Works with any OCR backend — AWS Textract, Google Vision, Azure Document Intelligence, or plain text. Bring your own OCR; this library handles the Norwegian-specific matching logic.
| Field | Examples |
|---|---|
| Invoice number | Fakturanummer, Invoice No, FACTURA N° |
| Invoice date | Fakturadato, Date of issue, Fecha factura |
| Due date | Forfallsdato, Betalingsfrist, Due Date |
| Amount | Sum inkl.mva, Total, Beløp å betale (NOK) |
| Bank account | Bankkonto, Kontonummer, IBAN |
| KID reference | KID, OCR, Betalingsreferanse/KID |
| Org number | Norwegian 9-digit org numbers from any context |

Basic usage with label-value pairs you have already extracted:

    InvoiceExtractor extractor = new InvoiceExtractor();

    // Convert your OCR output to label-value pairs
    List<OcrLabelValue> pairs = List.of(
        new OcrLabelValue("Fakturanummer", "INV-2024-001"),
        new OcrLabelValue("Fakturadato", "15.03.2024"),
        new OcrLabelValue("Forfallsdato", "15.04.2024"),
        new OcrLabelValue("Sum inkl.mva", "12 500,00"),
        new OcrLabelValue("Bankkonto", "1234.56.78901"),
        new OcrLabelValue("KID", "12345678")
    );

    InvoiceResult result = extractor.extract(pairs, List.of());

    result.invoiceNumber().ifPresent(n -> System.out.println("Invoice: " + n));
    result.amount().ifPresent(a -> System.out.println("Amount: " + a + " NOK"));
    result.dueDate().ifPresent(d -> System.out.println("Due: " + d));
    System.out.printf("Found %d/7 fields%n", result.foundFieldCount());

With AWS Textract, map the expense document's summary fields and LINE blocks to the extractor's inputs:

    // Convert Textract SummaryFields to OcrLabelValue pairs.
    // pageIndex comes from the surrounding loop over pages (not shown).
    List<OcrLabelValue> pairs = expenseDocument.summaryFields().stream()
        .filter(f -> f.labelDetection() != null && f.valueDetection() != null)
        .map(f -> new OcrLabelValue(
            f.labelDetection().text(),
            f.valueDetection().text(),
            f.labelDetection().confidence(),
            pageIndex + 1))
        .toList();

    // LINE blocks for invoice type detection (Faktura vs Kreditnota)
    List<String> lines = expenseDocument.blocks().stream()
        .filter(b -> b.blockType() == BlockType.LINE)
        .map(Block::text)
        .toList();

    InvoiceResult result = extractor.extract(pairs, lines);

The default rule set contains 150 keywords covering invoice labels from Norwegian, Swedish, Danish, English, Spanish, French, Italian and German suppliers. Each keyword has a priority weight calibrated on 10k+ real Norwegian invoices.

    [
      {"t": 1, "c": "Fakturanummer", "w": 102},
      {"t": 1, "c": "Invoice Number", "w": 100},
      {"t": 2, "c": "Fakturadato", "w": 101},
      {"t": 3, "c": "Forfallsdato", "w": 101},
      {"t": 4, "c": "Sum inkl.mva", "w": 101},
      ...
    ]

Load your own rules from JSON:

    List<KeywordRule> rules = NorwegianInvoiceKeywords.fromJson(myJsonString);
    InvoiceExtractor extractor = new InvoiceExtractor(rules);

Or add rules to the defaults:

    List<KeywordRule> rules = NorwegianInvoiceKeywords.defaultRules();
    rules.add(new KeywordRule(1, "Vår referanse", 98));

The amount parser handles every number format seen on real invoices:
| Input | Output | Format |
|---|---|---|
| 8.431,50 | 8431.50 | Norwegian/German |
| 8,431.50 | 8431.50 | English/US |
| 8 431,50 | 8431.50 | Norwegian with space |
| 8.888.431,50 | 8888431.50 | Multi-separator |
| 22750,- | 22750 | Norwegian shorthand |
| (NOK) 178 750,00 | 178750.00 | Currency prefix |
| -269100kr | 269100 | Currency suffix |
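
The normalisation in the table can be approximated with a single rule: the last `.` or `,` is a decimal separator if one or two digits follow it, and every other separator is grouping. The sketch below illustrates that idea only and is not the library's parser; `AmountSketch` and `parse` are hypothetical names. Like the last table row, it discards a leading sign.

```java
import java.math.BigDecimal;

// Illustrative sketch only; the class and method names are hypothetical.
final class AmountSketch {

    static BigDecimal parse(String raw) {
        // Keep digits and separators; drops spaces, currency codes,
        // parentheses, and the dash in "22750,-" (and any leading sign).
        String s = raw.replaceAll("[^0-9.,]", "");
        // A trailing separator ("22750,") carries no decimals.
        s = s.replaceAll("[.,]+$", "");
        if (s.isEmpty()) {
            throw new IllegalArgumentException("No digits in: " + raw);
        }
        int last = Math.max(s.lastIndexOf('.'), s.lastIndexOf(','));
        int digitsAfter = s.length() - last - 1;
        if (last >= 0 && digitsAfter <= 2) {
            // One or two digits after the last separator: treat as decimals.
            String whole = s.substring(0, last).replaceAll("[.,]", "");
            return new BigDecimal(whole + "." + s.substring(last + 1));
        }
        // Otherwise every separator is grouping ("1.000" is read as 1000).
        return new BigDecimal(s.replaceAll("[.,]", ""));
    }

    public static void main(String[] args) {
        System.out.println(parse("8.888.431,50"));
        System.out.println(parse("(NOK) 178 750,00"));
    }
}
```

The three-digit case is the ambiguous one: this sketch always reads `1.000` as one thousand, which is an assumption that suits Norwegian invoices but not every locale.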
The date parser supports 14 date formats plus Norwegian, English, Spanish and German month names:

    16.08.2024 → 2024-08-16
    16/08/24 → 2024-08-16
    4. januar 2024 → 2024-01-04
    20th January 2021 → 2021-01-20
    29. november 2023 → 2023-11-29
    17.02.2024\nNet 15 → 2024-02-17 (trailing line stripped)
    30.01.2024(Netto 30 dager) → 2024-01-30 (parenthetical stripped)
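
A rough sketch of how a few of these cases could be handled: cut the input at the first newline or `(`, then try a numeric pattern followed by a month-name pattern. It covers only Norwegian and English month names, matches them by three-letter prefix, and assumes two-digit years belong to the 2000s. `DateSketch` is a hypothetical name and this is not the library's implementation.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; covers a subset of the formats listed above.
final class DateSketch {
    private static final Pattern NUMERIC =
        Pattern.compile("(\\d{1,2})[./-](\\d{1,2})[./-](\\d{4}|\\d{2})");
    private static final Pattern NAMED =
        Pattern.compile("(\\d{1,2})(?:st|nd|rd|th|\\.)?\\s+(\\p{L}+)\\s+(\\d{4})");
    // Three-letter prefixes shared by Norwegian and English month names.
    private static final Map<String, Integer> MONTHS = Map.ofEntries(
        Map.entry("jan", 1), Map.entry("feb", 2), Map.entry("mar", 3),
        Map.entry("apr", 4), Map.entry("mai", 5), Map.entry("may", 5),
        Map.entry("jun", 6), Map.entry("jul", 7), Map.entry("aug", 8),
        Map.entry("sep", 9), Map.entry("okt", 10), Map.entry("oct", 10),
        Map.entry("nov", 11), Map.entry("des", 12), Map.entry("dec", 12));

    static Optional<LocalDate> parse(String raw) {
        // Drop a trailing line ("\nNet 15") or parenthetical ("(Netto 30 dager)").
        String s = raw.split("[\\n(]", 2)[0].trim();
        try {
            Matcher m = NUMERIC.matcher(s);
            if (m.matches()) {
                int year = Integer.parseInt(m.group(3));
                if (m.group(3).length() == 2) {
                    year += 2000; // assumption: two-digit years are 20xx
                }
                return Optional.of(LocalDate.of(year,
                    Integer.parseInt(m.group(2)), Integer.parseInt(m.group(1))));
            }
            m = NAMED.matcher(s);
            if (m.matches()) {
                String name = m.group(2).toLowerCase(Locale.ROOT);
                Integer month = name.length() < 3 ? null : MONTHS.get(name.substring(0, 3));
                if (month != null) {
                    return Optional.of(LocalDate.of(Integer.parseInt(m.group(3)),
                        month, Integer.parseInt(m.group(1))));
                }
            }
        } catch (DateTimeException e) {
            // Impossible calendar date such as 31.02.2024: fall through to empty.
        }
        return Optional.empty();
    }
}
```

Returning `Optional.empty()` for anything unrecognised keeps the caller in control of what to do with unparseable OCR output.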
Validates KID numbers (Norwegian payment reference numbers) with Mod10 (Luhn) and Mod11 check digits. Handles space-separated OCR output by splitting and picking the longest valid segment.
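
The two check-digit schemes and the "longest valid segment" rule can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `KidSketch` is a hypothetical name, a KID is taken to be 2 to 25 digits, Mod11 uses weights 2 to 7 cycling from the right, and a Mod11 result of 10 (sometimes written as `-`) is simply treated as invalid.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Illustrative sketch only; names and length limits are assumptions.
final class KidSketch {

    // Luhn: double every second digit from the right, starting next to the check digit.
    static boolean mod10(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    // Mod11: weights 2,3,4,5,6,7 repeating, applied right to left before the check digit.
    static boolean mod11(String digits) {
        int sum = 0;
        int weight = 2;
        for (int i = digits.length() - 2; i >= 0; i--) {
            sum += (digits.charAt(i) - '0') * weight;
            weight = (weight == 7) ? 2 : weight + 1;
        }
        int check = (11 - sum % 11) % 11; // a remainder of 0 gives check digit 0
        return check != 10 && check == digits.charAt(digits.length() - 1) - '0';
    }

    static boolean isValid(String s) {
        return s.matches("\\d{2,25}") && (mod10(s) || mod11(s));
    }

    // Split OCR output on whitespace and keep the longest segment that validates.
    static Optional<String> extract(String ocrText) {
        return Arrays.stream(ocrText.trim().split("\\s+"))
            .filter(KidSketch::isValid)
            .max(Comparator.comparingInt(String::length));
    }
}
```

For example, `KidSketch.extract("Ref 12345674 99")` yields `12345674`, because it is the only segment that passes a check.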
Validates Norwegian organisation numbers (9 digits, Mod11 check digit) and extracts them from arbitrary text.
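
A minimal sketch of this check, using the weights 3, 2, 7, 6, 5, 4, 3, 2 over the first eight digits. The class name `OrgNumberSketch` and the candidate pattern (nine digits, optionally grouped in threes with single spaces) are assumptions for illustration, not the library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names and the candidate pattern are assumptions.
final class OrgNumberSketch {
    private static final int[] WEIGHTS = {3, 2, 7, 6, 5, 4, 3, 2};
    // Nine digits, optionally written as "923 609 016", not part of a longer digit run.
    private static final Pattern CANDIDATE =
        Pattern.compile("(?<!\\d)\\d{3} ?\\d{3} ?\\d{3}(?!\\d)");

    static boolean isValid(String s) {
        if (!s.matches("\\d{9}")) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 8; i++) {
            sum += (s.charAt(i) - '0') * WEIGHTS[i];
        }
        int check = (11 - sum % 11) % 11; // a remainder of 0 gives check digit 0
        // A computed check value of 10 means no valid number exists for this prefix.
        return check != 10 && check == s.charAt(8) - '0';
    }

    // Return every candidate in the text that passes the Mod11 check, spaces removed.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            String digits = m.group().replace(" ", "");
            if (isValid(digits)) {
                found.add(digits);
            }
        }
        return found;
    }
}
```

Validating the check digit is what makes extraction "from any context" practical: phone numbers and other nine-digit runs mostly fail the Mod11 test.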
Detects Kreditnota (credit notes) from document text lines — matches "kreditnota" and "Tilgode" as standalone labels.
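
One possible reading of "standalone label" is that the whole line, ignoring case, surrounding whitespace and a trailing colon, equals the keyword. The sketch below follows that interpretation; it is an assumption, and `InvoiceTypeSketch` is a hypothetical name rather than part of the library.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; "standalone" is interpreted as "the entire line".
final class InvoiceTypeSketch {
    private static final Set<String> CREDIT_LABELS = Set.of("kreditnota", "tilgode");

    static boolean isCreditNote(List<String> lines) {
        return lines.stream()
            // Normalise: trim, lower-case, and drop one trailing colon.
            .map(line -> line.trim().toLowerCase(Locale.ROOT).replaceAll(":$", ""))
            .anyMatch(CREDIT_LABELS::contains);
    }
}
```

Requiring the whole line to match avoids false positives from ordinary invoices that merely mention a credit note in running text.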
Runtime dependencies: none. Jackson is optional (only needed for `NorwegianInvoiceKeywords.fromJson()`).
Maven dependency:

    <dependency>
        <groupId>no.skatt</groupId>
        <artifactId>norwegian-invoice-ocr</artifactId>
        <version>1.0.0</version>
    </dependency>

Run the tests with:

    mvn test

License: Apache 2.0