12 changes: 12 additions & 0 deletions .gitattributes
@@ -0,0 +1,12 @@
# Keep the conformance corpus byte-for-byte reproducible across platforms.
# The corpus is hash-bound and signature-verified against exact bytes, so its
# line endings must stay LF regardless of the checkout platform.
src/test/resources/corpus/**/*.json text eol=lf
src/test/resources/corpus/**/*.md text eol=lf
src/test/resources/corpus/**/*.py text eol=lf
src/test/resources/corpus/tools/bip39_english.txt text eol=lf

# Source files: stable LF throughout the repo.
*.java text eol=lf
*.xml text eol=lf
*.md text eol=lf
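Because the corpus is hash-bound and signature-verified against exact bytes, a checkout that ignores these attributes fails in confusing ways. A test-time guard that looks for carriage returns can make the cause obvious. The following is a minimal sketch, not part of this PR; the class `LineEndingGuard` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: detects CR bytes that would break byte-exact corpus hashes. */
public final class LineEndingGuard {

    private LineEndingGuard() {}

    /** Returns true if the byte array contains a carriage return (0x0D). */
    public static boolean hasCarriageReturn(byte[] bytes) {
        for (byte b : bytes) {
            if (b == '\r') {
                return true;
            }
        }
        return false;
    }

    /** Lists every regular file under {@code root} that contains a CR byte. */
    public static List<Path> filesWithCr(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> {
                        try {
                            return hasCarriageReturn(Files.readAllBytes(p));
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .toList();
        }
    }
}
```

In practice the walk would be restricted to the text files the attributes cover (`*.json`, `*.md`, `*.py`, and the BIP-39 word list), since any binary fixture may legitimately contain 0x0D.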
32 changes: 32 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,32 @@
name: CI

on:
  push:
    branches: ["**"]
  pull_request:
    branches: ["**"]

jobs:
  build:
    name: Build and test (JDK 21)
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up JDK 21
        uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "21"
          cache: maven

      - name: Build and run unit tests
        run: mvn -B test

      - name: Verify implementation against the corpus (conformance)
        # Runs the conformance suite that drives all 62 corpus vectors through
        # the validation pipeline and asserts each verdict, diagnostic code, and
        # structured details against corpus.json. This is the code-vs-corpus
        # check: the build fails if the implementation diverges from any vector.
        run: mvn -B test -Dtest=ConformanceTest
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
target/
*.class
.claude/
97 changes: 97 additions & 0 deletions README.md
@@ -1 +1,98 @@
# entangled-api-java

An independent Java reference implementation of the **Entangled v1.0** protocol,
built solely from the specification at
[`samjanny/entangled`](https://github.com/samjanny/entangled) tag `v1.0-rc.27`
(its `specs/`, `docs/`, and `corpus/`).

## Why this exists

This is a *second, isolated reading* of the Entangled specification. The existing
Rust implementation shares an author with the spec, so its conformance does not,
by itself, show that the spec reads unambiguously. This implementation was
written from scratch, by a different reader, **without reference to any other
implementation** of the protocol -- only the specification text and the
conformance corpus. Where the two implementations diverge, that divergence is a
signal about the spec, which is the point of the exercise.

## Status

Passes the full conformance corpus: **62 / 62 vectors** match the recorded
verdict, diagnostic code, and structured `details` byte-identically.

> Note on vector count: the corpus at `v1.0-rc.27` contains **62** vectors
> (`corpus.json` `rc_target: 1.0-rc.27`). Some older release notes refer to "60
> vectors"; the additional vectors are the rc.25-rc.27 additions (the
> manifest-updated future-skew, the runtime-pubkey resurrection, and the
> migration trio). This implementation targets the rc.27 corpus.

## Building and testing

Requires JDK 21 and Maven. The conformance corpus is checked in under
`src/test/resources/corpus` and is read as raw bytes (no normalization).

```sh
export JAVA_HOME=/path/to/jdk-21
mvn test                           # all unit tests + the 62-vector conformance suite
mvn test -Dtest=ConformanceTest    # the code-vs-corpus conformance suite only
```

CI (`.github/workflows/ci.yml`) runs both on every push and pull request, on any branch.
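Reading the corpus "as raw bytes (no normalization)" means never routing it through a `Reader` or `String` before the pipeline sees it. A minimal sketch of such a loader follows; `CorpusBytes` is a hypothetical name, and the resource path is only an example of how files under `src/test/resources/corpus` appear on the test classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: loads a corpus resource byte-for-byte, with no character decoding. */
public final class CorpusBytes {

    private CorpusBytes() {}

    /**
     * Reads a classpath resource such as "corpus/corpus.json" as raw bytes.
     * Decoding through a Reader would silently replace malformed UTF-8 with
     * U+FFFD before Stage 2 (E_INPUT_UTF8) ever saw it.
     */
    public static byte[] load(String resource) {
        try (InputStream in = CorpusBytes.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("resource not found: " + resource);
            }
            return readAll(in);
        } catch (IOException e) {
            // Thrown only by close(); readAll wraps its own read failures.
            throw new UncheckedIOException(e);
        }
    }

    /** Drains a stream into a byte array exactly as delivered. */
    public static byte[] readAll(InputStream in) {
        try {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```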

## Design notes

- **No third-party crypto.** Ed25519 verification, JCS canonicalization,
base64url, SHA, BIP-39 PIP derivation, and Tor v3 address decoding are all
implemented in-tree for byte-level control. The JDK's built-in Ed25519
(`SunEC`) does not implement the strict `verify_strict` profile that section 05
mandates (small-order rejection for both `A` and `R`, canonical `R`, `S < L`,
and the cofactorless verification equation), so verification is
hand-implemented over `BigInteger` field arithmetic.
- **First-failing-stage precedence** (section 10) is enforced by running the
10 pipeline stages in order and turning the first rejecting stage's diagnostic
into the verdict; later stages do not run.
- **The integer grammar** (section 04) is validated as a whole-document Stage 5
pre-pass, before closed-schema field-presence checks, to honor the spec's
requirement that numeric tokens are validated "before any conversion"; corpus
vector 140 fixes this ordering.
- **The Stage 2 byte cap** is selected from the expected document kind in the
fetch context (a real client knows whether it fetched `/manifest.json`, a
content path, or a submit response). The kind-specific cap is enforced before
parsing, so the document's own kind field cannot be used to choose it.
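Two of the `verify_strict` preconditions from the first design note, `S < L` and a canonical point encoding, reduce to little-endian integer comparisons. The sketch below is illustrative only (the class `StrictEd25519Checks` is hypothetical); it uses the constants from RFC 8032 and does not cover small-order rejection, the `x = 0` sign-bit case, or the cofactorless equation.

```java
import java.math.BigInteger;

/** Hypothetical sketch of two verify_strict preconditions over BigInteger. */
public final class StrictEd25519Checks {

    /** Group order L = 2^252 + 27742317777372353535851937790883648493 (RFC 8032). */
    static final BigInteger L = BigInteger.TWO.pow(252)
            .add(new BigInteger("27742317777372353535851937790883648493"));

    /** Field prime p = 2^255 - 19. */
    static final BigInteger P = BigInteger.TWO.pow(255).subtract(BigInteger.valueOf(19));

    private StrictEd25519Checks() {}

    /** Interprets a byte array as an unsigned little-endian integer. */
    static BigInteger littleEndian(byte[] b) {
        byte[] be = new byte[b.length];
        for (int i = 0; i < b.length; i++) {
            be[i] = b[b.length - 1 - i];
        }
        return new BigInteger(1, be);
    }

    /** S (the last 32 bytes of the 64-byte signature) must satisfy S < L. */
    public static boolean scalarIsCanonical(byte[] s) {
        return s.length == 32 && littleEndian(s).compareTo(L) < 0;
    }

    /** An encoding (e.g. R) is canonical only if its y coordinate, sign bit cleared, is < p. */
    public static boolean pointEncodingIsCanonical(byte[] enc) {
        if (enc.length != 32) {
            return false;
        }
        byte[] y = enc.clone();
        y[31] &= 0x7F; // bit 255 carries the sign of x, not part of y
        return littleEndian(y).compareTo(P) < 0;
    }
}
```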
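First-failing-stage precedence, as described in the second design note, amounts to a short-circuiting loop. The sketch below uses hypothetical stand-ins (`Stage`, `Outcome`) rather than this repo's `Verdict` and `RejectException`, whose signatures are not shown here.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of first-failing-stage precedence. */
public final class FirstFailingStage {

    /** A stage either passes (empty) or returns the diagnostic code it rejects with. */
    @FunctionalInterface
    public interface Stage {
        Optional<String> check(byte[] input);
    }

    /** Accepted, or rejected with the 1-based number of the first failing stage and its code. */
    public record Outcome(boolean accepted, int failingStage, String code) {}

    private FirstFailingStage() {}

    /** Runs stages in order and stops at the first rejection. */
    public static Outcome run(List<Stage> stages, byte[] input) {
        for (int i = 0; i < stages.size(); i++) {
            Optional<String> rejection = stages.get(i).check(input);
            if (rejection.isPresent()) {
                // Later stages never run, so their diagnostics cannot replace this one.
                return new Outcome(false, i + 1, rejection.get());
            }
        }
        return new Outcome(true, 0, null);
    }
}
```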
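The integer-grammar note turns on checking the token text before converting it: `1.0` and `1e2` both convert to whole numbers, so a check after conversion would accept them. The sketch below assumes, for illustration only, that an integer token is an optional minus sign followed by `0` or a digit string without leading zeros; the actual section 04 grammar may be stricter (for example, a range limit). `IntegerTokenCheck` is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: validate a numeric token's raw lexeme before any conversion. */
public final class IntegerTokenCheck {

    // ASSUMPTION: illustrative grammar only (no fraction, no exponent, no leading
    // zeros); the normative grammar is defined in section 04.
    private static final Pattern INTEGER = Pattern.compile("-?(0|[1-9][0-9]*)");

    private IntegerTokenCheck() {}

    /** True if the lexeme, exactly as it appeared in the document, is an integer token. */
    public static boolean isIntegerToken(String lexeme) {
        return INTEGER.matcher(lexeme).matches();
    }
}
```

Running this over every numeric lexeme in a whole-document pre-pass, before the closed-schema checks, is one way to get the ordering that vector 140 fixes.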
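Selecting the Stage 2 cap from the fetch context, as the last design note describes, can be sketched as a lookup keyed by what the client expected to receive. The enum names and `ByteCapSelector` are hypothetical, and the caps are injected because the normative values live in the spec; the inclusive comparison is likewise an assumption.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch: choose the byte cap from the fetch context, before parsing. */
public final class ByteCapSelector {

    /** What the client believes it fetched (mirrors the README's three cases). */
    public enum ExpectedKind { MANIFEST, CONTENT, SUBMIT_RESPONSE }

    private final Map<ExpectedKind, Integer> caps;

    /** Caps are supplied by the caller; every kind must have one. */
    public ByteCapSelector(Map<ExpectedKind, Integer> caps) {
        this.caps = new EnumMap<>(caps);
        for (ExpectedKind k : ExpectedKind.values()) {
            if (!this.caps.containsKey(k)) {
                throw new IllegalArgumentException("no cap for " + k);
            }
        }
    }

    /** True if the raw body fits the cap (ASSUMPTION: exactly cap bytes is accepted). */
    public boolean withinCap(ExpectedKind kind, byte[] body) {
        return body.length <= caps.get(kind);
    }
}
```

A body that fails `withinCap` would map to `E_INPUT_BYTE_CAP` without the parser ever running.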

## Ambiguities found

Per the spec's ambiguity protocol, genuine ambiguities encountered at the Java
boundary -- points where two conforming implementations could diverge with no
clear non-conformance, and which no corpus vector constrains -- were filed as
issues against `samjanny/entangled`:

- **AMB-10** (issue #11): the diagnostic for a bad `origin.carrier` value
(e.g. `"i2p"`) is not pinned -- `E_SCHEMA_FIELD_SYNTAX` vs
`E_SCHEMA_ENUM_VIOLATION`. This implementation chose `E_SCHEMA_ENUM_VIOLATION`.
- **AMB-11** (issue #12): the stage/code for an uppercase or otherwise
non-canonical `origin.address` is not pinned -- Stage 5
`E_SCHEMA_FIELD_SYNTAX` vs Stage 9 `E_BIND_ORIGIN`. This implementation chose
Stage 5 `E_SCHEMA_FIELD_SYNTAX`.

Each chosen reading is documented in a code comment citing the spec passages
that motivated it.
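The AMB-11 reading can be sketched as a purely syntactic Stage 5 check that rejects case variants instead of normalizing them and deferring to Stage 9. This sketch assumes, for illustration, that `origin.address` holds a full Tor v3 hostname (56 lowercase base32 characters plus `.onion`); it skips the checksum and version checks a real Tor v3 decoder performs. `OriginAddressSyntax` is a hypothetical name.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch of the AMB-11 reading: non-canonical addresses fail at Stage 5. */
public final class OriginAddressSyntax {

    // ASSUMPTION: 56 base32 characters (a-z, 2-7) followed by ".onion".
    private static final Pattern CANONICAL_V3 = Pattern.compile("[a-z2-7]{56}\\.onion");

    private OriginAddressSyntax() {}

    /** Returns the Stage 5 code for a non-canonical address, or empty if it passes. */
    public static Optional<String> stage5Check(String address) {
        if (CANONICAL_V3.matcher(address).matches()) {
            return Optional.empty();
        }
        // An uppercase variant is rejected here as a syntax error rather than
        // lowercased and left for the Stage 9 origin binding to catch.
        return Optional.of("E_SCHEMA_FIELD_SYNTAX");
    }
}
```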

## Layout

```
src/main/java/org/entangled/
  DiagnosticCode, Diagnostic, Verdict, RejectException   normative codes and outcomes
  json/             strict JSON lexer/parser, JCS canonicalization
  crypto/           strict Ed25519, base64url, SHA, BIP-39 PIP, Tor v3 address
  schema/           closed-schema field/block/document validators (Stage 5)
  pipeline/         the 10-stage validation pipeline and per-stage logic
src/test/java/org/entangled/
  ConformanceTest   drives all 62 corpus vectors
  unit tests for the JSON, JCS, crypto, and schema layers
src/test/resources/corpus/   the spec conformance corpus, verbatim
```

## License

Follows the licensing of the upstream specification corpus it is built against.
44 changes: 44 additions & 0 deletions pom.xml
@@ -0,0 +1,44 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.entangled</groupId>
    <artifactId>entangled-api-java</artifactId>
    <version>0.1.0</version>
    <packaging>jar</packaging>

    <name>entangled-api-java</name>
    <description>Independent Java reference implementation of the Entangled v1.0 protocol, built from the specification.</description>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.release>21</maven.compiler.release>
        <junit.version>5.10.2</junit.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <version>${junit.version}</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.13.0</version>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.2.5</version>
            </plugin>
        </plugins>
    </build>
</project>
63 changes: 63 additions & 0 deletions src/main/java/org/entangled/Diagnostic.java
@@ -0,0 +1,63 @@
package org.entangled;

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/**
 * A single diagnostic: a normative {@link DiagnosticCode} plus an optional
 * structured {@code details} map (section 11 "Structured diagnostic format").
 *
 * <p>The {@code details} map mirrors the structured {@code details} object the
 * spec defines per code (for example {@code component}/{@code declared_bytes}/
 * {@code budget_bytes} for {@code E_SUBMIT_BUDGET}). Conformance vectors that
 * carry {@code diagnostic_details} are matched against this map exactly.
 */
public final class Diagnostic {

    private final DiagnosticCode code;
    private final Map<String, Object> details;

    private Diagnostic(DiagnosticCode code, Map<String, Object> details) {
        this.code = code;
        this.details = Collections.unmodifiableMap(details);
    }

    public static Diagnostic of(DiagnosticCode code) {
        return new Diagnostic(code, new LinkedHashMap<>());
    }

    public static Diagnostic of(DiagnosticCode code, Map<String, Object> details) {
        return new Diagnostic(code, new LinkedHashMap<>(details));
    }

    public DiagnosticCode code() {
        return code;
    }

    public Map<String, Object> details() {
        return details;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Diagnostic other)) {
            return false;
        }
        return code == other.code && details.equals(other.details);
    }

    @Override
    public int hashCode() {
        return Objects.hash(code, details);
    }

    @Override
    public String toString() {
        return details.isEmpty() ? code.name() : code.name() + " " + details;
    }
}
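`Diagnostic.equals` compares `details` with `Map.equals`, which ignores insertion order but is sensitive to boxed types: `Long.valueOf(5)` does not equal `Integer.valueOf(5)`. If expected details are parsed from `corpus.json` into one numeric type and the pipeline builds them with another, an exact match fails spuriously. A minimal normalizing comparison is sketched below; `DetailsMatcher` is hypothetical, and the sample values in the test are illustrative, not taken from the corpus.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: compares expected vs actual details after normalizing integer types. */
public final class DetailsMatcher {

    private DetailsMatcher() {}

    /** Maps Integer/Long/Short/Byte to BigInteger and recurses into maps and lists. */
    static Object normalize(Object v) {
        if (v instanceof Integer || v instanceof Long || v instanceof Short || v instanceof Byte) {
            return BigInteger.valueOf(((Number) v).longValue());
        }
        if (v instanceof Map<?, ?> m) {
            Map<Object, Object> out = new LinkedHashMap<>();
            m.forEach((k, val) -> out.put(k, normalize(val)));
            return out;
        }
        if (v instanceof List<?> l) {
            return l.stream().map(DetailsMatcher::normalize).toList();
        }
        return v;
    }

    /** Exact match, ignoring key order (as Map.equals already does) and integer boxing type. */
    public static boolean matches(Map<String, Object> expected, Map<String, Object> actual) {
        return normalize(expected).equals(normalize(actual));
    }
}
```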
139 changes: 139 additions & 0 deletions src/main/java/org/entangled/DiagnosticCode.java
@@ -0,0 +1,139 @@
package org.entangled;

/**
 * The normative diagnostic codes from specification section 11.
 *
 * <p>Each constant carries its normative {@code severity} and the pipeline
 * {@code stage} (1 through 10) at which it is detected, as defined in sections
 * 10 and 11. Codes that do not map to a pipeline stage use stage {@code 0}.
 *
 * <p>Not every section 11 code is listed. Transport, image, and several
 * state/historical codes are not wire-constructible within the corpus scope
 * (per corpus/README.md). The transport codes and the state/historical codes
 * below are still enumerated so the enum mirrors section 11; the image codes
 * are omitted.
 */
public enum DiagnosticCode {

    // --- Transport diagnostics (Stage 1) ---
    E_TRANSPORT_STATUS(Severity.ERROR, 1),
    E_TRANSPORT_REDIRECT(Severity.ERROR, 1),
    E_TRANSPORT_CONTENT_TYPE(Severity.ERROR, 1),
    E_TRANSPORT_CONTENT_LENGTH(Severity.ERROR, 1),
    E_TRANSPORT_BODY_FAILURE(Severity.ERROR, 1),
    E_TRANSPORT_RATE_LIMITED(Severity.ERROR, 1),
    E_TRANSPORT_NOT_FOUND(Severity.ERROR, 1),
    E_TRANSPORT_METHOD_NOT_ALLOWED(Severity.ERROR, 1),
    E_TRANSPORT_PAYLOAD_TOO_LARGE(Severity.ERROR, 1),
    E_TRANSPORT_UNAVAILABLE(Severity.ERROR, 1),
    E_TRANSPORT_BAD_REQUEST(Severity.ERROR, 1),
    E_TRANSPORT_CONTENT_ENCODING(Severity.ERROR, 1),
    E_TRANSPORT_TRANSFER_ENCODING(Severity.ERROR, 1),

    // --- Input diagnostics (Stage 2) ---
    E_INPUT_BYTE_CAP(Severity.ERROR, 2),
    E_INPUT_UTF8(Severity.ERROR, 2),
    E_INPUT_BOM(Severity.ERROR, 2),

    // --- Parsing diagnostics (Stage 3) ---
    E_PARSE_JSON(Severity.ERROR, 3),
    E_PARSE_NESTING_DEPTH(Severity.ERROR, 3),
    E_PARSE_STRING_LENGTH(Severity.ERROR, 3),
    E_PARSE_ARRAY_LENGTH(Severity.ERROR, 3),
    E_PARSE_OBJECT_KEYS(Severity.ERROR, 3),
    E_PARSE_DUPLICATE_KEY(Severity.ERROR, 3),

    // --- Document kind diagnostics (Stage 4) ---
    E_KIND_MISSING_FIELDS(Severity.ERROR, 4),
    E_KIND_SPEC_VERSION(Severity.ERROR, 4),
    E_KIND_UNKNOWN(Severity.ERROR, 4),

    // --- Schema diagnostics (Stage 5) ---
    E_SCHEMA_REQUIRED_FIELD(Severity.ERROR, 5),
    E_SCHEMA_UNKNOWN_FIELD(Severity.ERROR, 5),
    E_SCHEMA_BLOCK_NOT_PERMITTED(Severity.ERROR, 5),
    E_SCHEMA_FIELD_TYPE(Severity.ERROR, 5),
    E_SCHEMA_FIELD_RANGE(Severity.ERROR, 5),
    E_SCHEMA_FIELD_SYNTAX(Severity.ERROR, 5),
    E_SCHEMA_ENUM_VIOLATION(Severity.ERROR, 5),
    E_SCHEMA_DUPLICATE_ENTRY(Severity.ERROR, 5),
    E_SCHEMA_FIELD_LENGTH(Severity.ERROR, 5),
    E_SCHEMA_NULL_VALUE(Severity.ERROR, 5),
    E_SCHEMA_NON_INTEGER(Severity.ERROR, 5),
    E_SCHEMA_MALFORMED_UNICODE(Severity.ERROR, 5),
    E_SUBMIT_BUDGET(Severity.ERROR, 5),
    E_ORIGIN_INVALID(Severity.ERROR, 5),

    // --- Signature diagnostics (Stage 6) ---
    E_SIG_VERIFICATION(Severity.ERROR, 6),
    E_SIG_INVALID_KEY(Severity.ERROR, 6),
    E_SIG_MALFORMED(Severity.ERROR, 6),

    // --- Trust state diagnostics (Stage 6 pre-check and Stage 7) ---
    E_TRUST_MISMATCH(Severity.ERROR, 6),
    E_TRUST_USER_REJECTED(Severity.ERROR, 6),
    I_TRUST_FIRST_CONTACT(Severity.INFO, 7),
    I_TRUST_TOFU_PINNED(Severity.INFO, 7),
    I_TRUST_VERIFIED(Severity.INFO, 7),

    // --- Canary diagnostics (Stage 8) ---
    E_CANARY_INVALID(Severity.ERROR, 8),
    E_CANARY_DOWNGRADE(Severity.ERROR, 8),
    E_CANARY_CONFLICT(Severity.ERROR, 8),
    W_CANARY_NEAR_EXPIRATION(Severity.WARNING, 8),
    E_CANARY_EXPIRED(Severity.ERROR, 8),
    W_CANARY_GAP(Severity.WARNING, 8),
    W_CANARY_UNAVAILABLE(Severity.WARNING, 8),
    E_CANARY_RUNTIME_REUSE(Severity.ERROR, 8),

    // --- Binding diagnostics (Stage 9) ---
    E_BIND_PATH(Severity.ERROR, 9),
    E_BIND_RESPONSE_PATH(Severity.ERROR, 9),
    E_BIND_REQUEST_ID(Severity.ERROR, 9),
    E_BIND_REQUEST_HASH(Severity.ERROR, 9),
    E_BIND_ORIGIN(Severity.ERROR, 9),
    E_ORIGIN_EXPIRED(Severity.ERROR, 9),
    E_MIGRATION_MISMATCH(Severity.ERROR, 9),
    E_MIGRATION_INVALID(Severity.ERROR, 9),
    E_CONTENT_INDEX_FETCH_FAILED(Severity.ERROR, 9),
    E_CONTENT_INDEX_HASH_MISMATCH(Severity.ERROR, 9),
    E_CONTENT_INDEX_INVALID(Severity.ERROR, 9),
    E_CONTENT_SEQ_MISSING(Severity.ERROR, 9),
    E_CONTENT_SEQ_ROLLBACK(Severity.ERROR, 9),
    E_CONTENT_SEQ_UNCOMMITTED(Severity.ERROR, 9),
    E_CONTENT_HASH_MISMATCH(Severity.ERROR, 9),

    // --- State diagnostics ---
    E_STATE_UNDECLARED(Severity.ERROR, 0),
    E_STATE_VALUE_SIZE(Severity.ERROR, 0),
    E_STATE_TTL(Severity.ERROR, 0),
    E_STATE_OP(Severity.ERROR, 0),
    E_STATE_STORAGE_CAP(Severity.ERROR, 0),
    E_STATE_TRANSMIT_BUDGET(Severity.ERROR, 0),
    E_STATE_DUPLICATE(Severity.ERROR, 0),

    // --- Historical content diagnostics ---
    E_HISTORICAL_NO_AUTHORIZATION(Severity.ERROR, 0),
    E_HISTORICAL_NO_PUBLICATION_PROOF(Severity.ERROR, 0),
    E_HISTORICAL_TRUST_BLOCKED(Severity.ERROR, 0),
    E_HISTORICAL_RUNTIME_AMBIGUOUS(Severity.ERROR, 0);

    /** Severity classes from section 11. */
    public enum Severity { ERROR, WARNING, INFO }

    private final Severity severity;
    private final int stage;

    DiagnosticCode(Severity severity, int stage) {
        this.severity = severity;
        this.stage = stage;
    }

    public Severity severity() {
        return severity;
    }

    public int stage() {
        return stage;
    }
}
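Every constant's name prefix (`E_`, `W_`, `I_`) matches its severity, and every stage falls in 0 through 10, so a small consistency test can catch a mistyped constructor argument as the catalog grows. To run on its own, the sketch below checks a stand-in enum holding a few constants copied from `DiagnosticCode`; `CatalogConsistency` is hypothetical, and a real test would iterate over `DiagnosticCode.values()` instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a catalog consistency check, run against a stand-in enum. */
public final class CatalogConsistency {

    public enum Severity { ERROR, WARNING, INFO }

    /** Stand-in holding a few constants copied from DiagnosticCode. */
    public enum Code {
        E_INPUT_BYTE_CAP(Severity.ERROR, 2),
        W_CANARY_GAP(Severity.WARNING, 8),
        I_TRUST_VERIFIED(Severity.INFO, 7),
        E_STATE_TTL(Severity.ERROR, 0);

        final Severity severity;
        final int stage;

        Code(Severity severity, int stage) {
            this.severity = severity;
            this.stage = stage;
        }
    }

    private CatalogConsistency() {}

    /** Describes every constant whose name prefix or stage is inconsistent. */
    public static List<String> problems(Code[] codes) {
        List<String> out = new ArrayList<>();
        for (Code c : codes) {
            String expectedPrefix = switch (c.severity) {
                case ERROR -> "E_";
                case WARNING -> "W_";
                case INFO -> "I_";
            };
            if (!c.name().startsWith(expectedPrefix)) {
                out.add(c + ": severity " + c.severity + " but prefix is not " + expectedPrefix);
            }
            if (c.stage < 0 || c.stage > 10) {
                out.add(c + ": stage " + c.stage + " outside 0..10");
            }
        }
        return out;
    }
}
```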