validate.sh: pass repo root as a directory argument (fix xargs split that produces phantom 'doesn't exist' errors) #774

Merged: cy303 merged 2 commits into almatoai:master from boosc:fix/validate-sh-directory-arg on Jun 1, 2026
@boosc boosc commented May 28, 2026

Summary

validate.sh currently fails on PRs that bring the repository past roughly 2500-3000 .ttl files because of an xargs argument-list split, not because of any real ontology defect. This PR rewrites validate.sh to invoke the validator once with the repo root as a directory argument, which is the form already documented in the validator's own --help output.

One file changed: validate.sh. No ontology files touched.

The bug

validate.sh was a single line:

find . -type f ! -name 'graphit-ontology.*' -name "*.ttl" | xargs java -jar bin/ogit-validator.jar

GNU xargs splits the argument list into multiple command invocations whenever the joined argument string exceeds its per-invocation byte limit. By default GNU xargs caps that command buffer at about 128 KB, which is well below the kernel's ARG_MAX; ARG_MAX is only the upper bound for the value. Each split Java invocation receives only a subset of the .ttl files, so it cannot resolve any cross-reference whose target definition lives in another subset.
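The split is easy to reproduce at small scale. The sketch below assumes only a shell with `seq` and an xargs that supports `-s`; the 200-byte budget is an arbitrary illustrative value, not anything used by validate.sh. Each `echo` invocation prints one line, so the number of output lines equals the number of times xargs ran the command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Feed 1000 short arguments to xargs with a deliberately tiny byte budget.
# Every echo invocation prints exactly one line, so counting lines counts
# how many separate commands xargs started.
invocations=$(seq 1 1000 | xargs -s 200 echo | wc -l | tr -d ' ')

# Same input with the default budget: the roughly 4 KB of arguments fit
# into a single invocation.
single=$(seq 1 1000 | xargs echo | wc -l | tr -d ' ')

echo "with -s 200: ${invocations} invocations"
echo "default:     ${single} invocation"
```

With a validator in place of `echo`, each of those invocations would see only its own slice of the file list, which is exactly the situation described above.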

This is latent: with the current OGIT master at 1853 .ttl files it fits inside one invocation (the joined arguments are about 70 KB). As soon as a PR adds enough new .ttl files to push the total past the xargs split point, the CI starts to produce two or more Count of errors: summary blocks. The errors are all of the same shape:

ERROR: type id: http://www.purl.org/ogit/<Namespace>/<Class>, attribute id : http://www.purl.org/ogit/name doesn't exist!
ERROR: edge id: http://www.purl.org/ogit/<Namespace>/<Class>, head connection id : http://www.purl.org/ogit/Location doesn't exist!
ERROR: edge id: http://www.purl.org/ogit/<Namespace>/<Class>, connection id e: http://www.purl.org/ogit/generates doesn't exist!

The targets in those errors (ogit:name, ogit:Location, ogit:generates, ogit:Node, ogit:Timeseries, etc.) are all defined correctly in SGO/sgo/attributes/ and SGO/sgo/verbs/. They simply lived in the .ttl files handed to the other xargs-split Java invocation. The errors are phantoms produced by the split, not real ontology defects.

A recent example from CI run 26597172588: the run printed Count of errors: 2943 followed shortly by Count of errors: 312. Two summary blocks = two Java invocations = xargs split. Every error message in both blocks is a doesn't exist! against an ogit:-prefix definition that the master already ships and that this branch did not touch.
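The "more than one summary block" symptom can be checked mechanically. The following is a minimal sketch, assuming the CI log has been saved to a file; the function name `count_summary_blocks` and the sample log are hypothetical and only reuse the two summary values quoted above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_summary_blocks LOGFILE
# Prints how many "Count of errors:" lines the log contains. More than one
# means the validator was started more than once, i.e. xargs split the list.
count_summary_blocks() {
    # grep -c exits with status 1 when nothing matches (it still prints 0);
    # "|| true" keeps set -e from aborting in that case.
    grep -c 'Count of errors:' "$1" || true
}

# Illustrative log containing the two summary lines from the CI run.
log=$(mktemp)
printf '%s\n' 'Count of errors: 2943' 'Count of errors: 312' > "$log"

blocks=$(count_summary_blocks "$log")
if [ "$blocks" -gt 1 ]; then
    echo "xargs split detected: ${blocks} summary blocks"
fi
rm -f "$log"
```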

The fix

The validator already accepts directory arguments and recurses on its own. From its --help output:

<file|directory>...  .ttl files and directories to recursively validate

Example:
  java -jar ogit-validator.jar ../OGIT/
  Recursively validate the given directory

Passing the repo root as a single directory argument runs the validator in one JVM invocation. The validator's directory walk sees every .ttl file in the repo at once, and cross-reference resolution always succeeds when the references actually exist.

The new validate.sh:

#!/usr/bin/env bash
# ... explanatory comment ...
set -e
java -jar bin/ogit-validator.jar .

Local verification

OpenJDK 21, OGIT working copy at master plus an in-progress NTO/Utilities contribution that adds ~1290 .ttl files (total 3144):

$ java -jar bin/ogit-validator.jar .
Validation successful.
$ echo $?
0

Single Java invocation, exit code 0, zero error lines. The same 3144-file set when piped through find ... | xargs java -jar ... on a system that triggers the xargs split produces the multi-block phantom-error output described above.

Why this matters now

This isn't a hypothetical problem. It is currently blocking the NTO/Utilities/Electricity PR that adds ~1290 .ttl files (CGMES, UCTE-DEF, ENTSO-E NCs, IEC 60870-5, TASE.2). That PR was withdrawn from CI because of this exact phantom-error storm. The same issue will hit every future medium-to-large namespace contribution unless validate.sh is fixed.

This is the smallest possible fix: change one line in validate.sh, no ontology files touched, no validator code change.

Test plan

  • CI on this PR succeeds (validator passes on master state)
  • After merge: NTO/Utilities/Electricity PR can be re-opened and its CI should pass on real ontology content

🤖 Generated with Claude Code

boosc and others added 2 commits May 28, 2026 22:49
… the file list through xargs

Problem
-------
`validate.sh` previously enumerated the .ttl files with `find` and
piped them into `xargs java -jar bin/ogit-validator.jar`. GNU `xargs`
splits the argument list into multiple command invocations whenever
the joined argument string exceeds the per-invocation byte limit
(GNU xargs defaults to at most ~128 KB, well below the kernel's
ARG_MAX). Each split Java invocation receives only a subset of the
.ttl files and therefore cannot resolve cross-references whose target
definition lives in the *other* subset.

This was latent until the .ttl file count grew past roughly 2500-
3000 files. With a current OGIT master at 1853 .ttl files plus a
moderately sized NTO contribution (e.g. ~1200-1300 new files), the
joined argument string crosses the threshold and xargs starts to
split. The resulting CI runs print two or more "Count of errors:"
summary blocks (e.g. one with 2943 errors followed by one with 312
errors), and every error has the same shape:

  ERROR: type id: http://www.purl.org/ogit/<Namespace>/<Class>, attribute id : http://www.purl.org/ogit/name doesn't exist!
  ERROR: edge id: http://www.purl.org/ogit/<Namespace>/<Class>, head connection id : http://www.purl.org/ogit/Location doesn't exist!
  ERROR: edge id: http://www.purl.org/ogit/<Namespace>/<Class>, connection id e: http://www.purl.org/ogit/generates doesn't exist!
  ...

The targets in those errors (ogit:name, ogit:Location, ogit:generates,
ogit:Node, ogit:Timeseries, etc.) are all correctly defined in
SGO/sgo/attributes/ and SGO/sgo/verbs/; they simply lived in the
.ttl files handed to the *other* xargs-split Java invocation. The
errors are phantoms produced by the split, not real ontology defects.

Fix
---
The validator already accepts directory arguments and recurses on
its own. From its `--help` output:

    <file|directory>...  .ttl files and directories to recursively validate

  Example:
    java -jar ogit-validator.jar ../OGIT/
    Recursively validate the given directory

Passing the repo root as a single directory argument runs the
validator in one JVM invocation. The validator's directory walk
sees every .ttl file in the repo at once, so cross-reference
resolution always succeeds when the references actually exist.

Local verification (OpenJDK 21, full repo with one branched-in NTO
contribution that adds ~1290 .ttl files for a total of 3144):

  $ java -jar bin/ogit-validator.jar .
  Validation successful.
  $ echo $?
  0

Single Java invocation, exit code 0, zero error lines. The same
3144-file set, when piped through `find ... | xargs java -jar ...`
on a system with the typical Linux xargs split point, produces the
multi-block phantom-error output described above.

Other changes
-------------
- Add `#!/usr/bin/env bash` shebang and `set -e` so the script
  fails fast and is explicit about its interpreter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…strumentation -- replaces the broken directory-recursion approach

Background
----------
The first version of this fix (commit 22cbbb7) called
`java -jar bin/ogit-validator.jar .` to let the validator recurse
over the repo root. The validator binary's --help advertises this
form ("<file|directory>... .ttl files and directories to
recursively validate") but in practice the directory-recursion
code path returns "Validation successful." in well under one
second without actually discovering the .ttl files under the
directory. CI passed for the same reason: zero files validated,
zero errors reported.

Verified on OpenJDK 21 against this repo:
- `java -jar bin/ogit-validator.jar .`     -> Validation successful (1 line, ~1 sec, validated nothing)
- `java -jar bin/ogit-validator.jar SGO`   -> "Validation failed: No input files specified"
- `java -jar bin/ogit-validator.jar NTO`   -> "Validation failed: No input files specified"

Conclusion: the directory-recursion path is broken in the
validator binary itself. We have to enumerate the files
ourselves.
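A "zero files validated, zero errors" pass can be ruled out by counting the input before trusting the result. The sketch below is one possible guard, not the code in this commit; the function names `count_ttl_files` and `require_nonempty_input` are hypothetical, and the find filter is copied from the original one-line validate.sh. It assumes file names contain no newlines.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_ttl_files DIR
# Counts the .ttl files the validator is expected to see, using the same
# find filter as the original validate.sh.
count_ttl_files() {
    find "$1" -type f ! -name 'graphit-ontology.*' -name '*.ttl' | wc -l | tr -d ' '
}

# require_nonempty_input DIR
# Fails when there is nothing to validate, so an empty run can never pass.
require_nonempty_input() {
    local n
    n=$(count_ttl_files "$1")
    if [ "$n" -eq 0 ]; then
        echo "validate.sh: no .ttl files found under $1" >&2
        return 1
    fi
    echo "validate.sh: $n TTL files found under $1"
}

# Demonstration on a throwaway directory with two dummy files.
demo=$(mktemp -d)
touch "$demo/a.ttl" "$demo/b.ttl"
require_nonempty_input "$demo"
rm -rf "$demo"
```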

Fix
---
Go back to the find + xargs structure but force a single
invocation by setting an explicit per-call byte budget that
exceeds any plausible OGIT repository size:

    find ... | xargs --no-run-if-empty -s 1900000 java -jar bin/ogit-validator.jar

xargs's `-s` value is bounded by the kernel's ARG_MAX (typically
2 MB on modern Linux). 1900000 leaves headroom while
accommodating well over 30000 .ttl files at typical path lengths.
The repository at this commit has ~1850 .ttl files (joined
argument string ~70 KB) and will grow to ~3000 with the
NTO/Utilities PR (~158 KB) -- both comfortably fit in a single
invocation.
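The headroom claim can be sanity-checked with integer arithmetic. This is a rough estimate only: it takes the 1853 files and 74286 bytes of paths reported by the CI instrumentation line quoted later in this conversation, assumes future paths have a similar average length, and ignores the small per-argument overhead that xargs adds.

```shell
#!/usr/bin/env bash
set -euo pipefail

paths=1853          # .ttl files at this commit
path_bytes=74286    # bytes of paths reported by the instrumentation line
budget=1900000      # value passed to xargs -s

# How many paths of average length fit into the byte budget.
capacity=$(( budget * paths / path_bytes ))
echo "average path length: $(( path_bytes / paths )) bytes"
echo "estimated capacity:  ${capacity} files"

# The system-wide upper bound that xargs -s cannot exceed.
echo "ARG_MAX on this system: $(getconf ARG_MAX) bytes"
```

The estimate lands well above the 30000 files mentioned above, so the budget is not the limiting factor at the repository's current growth rate.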

Instrumentation
---------------
The script now prints `validate.sh: N TTL files, B bytes of
paths` before invoking the validator. The validator's own
`Count of errors: 0` and `Count of warnings: 0` lines must
appear EXACTLY ONCE in the CI log under this fix; multiple
summary blocks would indicate that xargs split the call again
and the -s value needs to be raised (which would also mean we
are approaching ARG_MAX and need a different strategy
altogether).
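The script itself is not reproduced in this conversation, so the following is only a sketch of how the described pieces could fit together: the find filter from the original validate.sh, the instrumentation line, and the `xargs --no-run-if-empty -s 1900000` call. The function name `validate_once` and the temporary list file are assumptions. `--no-run-if-empty` is a GNU xargs option, and paths containing whitespace would need `find -print0 | xargs -0` instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# validate_once DIR CMD [ARGS...]
# Prints the instrumentation line, then hands every .ttl path under DIR
# to CMD through a single xargs call with an enlarged byte budget.
validate_once() {
    local dir=$1; shift
    local list n bytes status=0
    list=$(mktemp)
    find "$dir" -type f ! -name 'graphit-ontology.*' -name '*.ttl' > "$list"

    n=$(wc -l < "$list" | tr -d ' ')
    bytes=$(wc -c < "$list" | tr -d ' ')
    echo "validate.sh: ${n} TTL files, ${bytes} bytes of paths"

    # -s raises the per-invocation byte budget; --no-run-if-empty skips
    # the command entirely when the list is empty.
    xargs --no-run-if-empty -s 1900000 "$@" < "$list" || status=$?
    rm -f "$list"
    return "$status"
}

# Intended use, with the validator path from the original script:
#   validate_once . java -jar bin/ogit-validator.jar
```

Passing the command as arguments keeps the function testable with a stand-in such as `echo`, which prints one line per invocation and so makes an accidental split visible.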

Other changes
-------------
- `set -euo pipefail` so the script fails fast on any error.

This commit supersedes 22cbbb7 on this same branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

boosc commented May 29, 2026

Update: the first fix (the directory argument, java -jar bin/ogit-validator.jar .) was broken. Viktor was right.

What I verified

Locally with OpenJDK 21 against the full repo:

  • java -jar bin/ogit-validator.jar . -> Validation successful. (1 line, ~1 sec, validated nothing at all)
  • java -jar bin/ogit-validator.jar SGO -> Validation failed: No input files specified
  • java -jar bin/ogit-validator.jar NTO -> Validation failed: No input files specified
  • With a deliberately broken *.ttl in the repo root -> CI/local crashes while parsing, so passing individual files as arguments works, but directory recursion by the validator itself does not.

The CI pass on the first commit of this PR (22cbbb71a6) was a false positive: 0 files validated = 0 errors. The validator binary advertises directory recursion in --help, but that code path is broken.

New fix (commit f18391b32b, now pushed)

Back to the find | xargs structure, but force a single Java invocation with xargs --no-run-if-empty -s 1900000. The xargs -s value stays below the kernel's ARG_MAX (typically 2 MB on Linux) and comfortably allows 30k+ TTL files in one call.

Additional instrumentation: before the Java call, the script prints the file count and the byte count of the joined arguments. The validator's Count of errors: summary line must appear EXACTLY ONCE in the CI log; multiple blocks would mean that xargs has split the call again.

CI is now running on the new commit; I will report back as soon as the result line is in.


boosc commented May 29, 2026

The CI run confirms the new fix (job 78545519748):

```
validate.sh: 1853 TTL files, 74286 bytes of paths
Validation successful.
```

Evidence:

  • The leading line shows 1853 TTL files and a 74 KB argument string; the instrumentation documents what was actually validated.
  • 5.6 seconds of validator runtime (vs 0.6 seconds with the broken . argument, which validated nothing).
  • ONE summary line (Validation successful.), not two or more. The xargs split does not happen, because 74 KB is below the -s 1900000 budget.

Ready for review/merge.

@cy303 cy303 merged commit f627ebc into almatoai:master Jun 1, 2026
1 check passed