fix: use Go-compatible JSON encoding for OCI manifests#348
fix: use Go-compatible JSON encoding for OCI manifests#348sajayantony wants to merge 1 commit intooras-project:mainfrom
Conversation
52000ac to
017a00b
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical JSON serialization issue where System.Text.Json's default encoder escapes + as \u002B, breaking content-addressable storage (CAS) interoperability with oras-go and other OCI tools. The solution introduces a custom JSON serialization layer with Go-compatible escaping rules.
Changes:
- Introduced internal
OciJsonSerializerwith custom converters (OciStringConverter,OciDictionaryConverter) that bypass .NET's default encoder to preserve literal+characters while maintaining proper escaping for quotes, backslashes, control characters, and HTML-sensitive characters (<,>,&) - Centralized all OCI content JSON operations through
OciJsonSerializeracross 7 source files - Moved
GenerateIndexfromIndexclass toOciJsonSerializerfor architectural consistency - Consolidated duplicate 4 MiB size limit constants into shared
OciLimits.MaxManifestBytes - Updated Copilot instructions with serialization conventions and testing best practices learned from this fix
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
OciJsonSerializer.cs |
Central serialization entry point with Go-compatible EscapeJsonString method and 4 MiB size guard |
OciStringConverter.cs |
Custom JsonConverter<string> using WriteRawValue to bypass default encoder |
OciDictionaryConverter.cs |
Custom dictionary converter handling both keys and values (framework doesn't invoke string converter for dict keys) |
OciLimits.cs |
Shared 4 MiB manifest size constant |
Packer.cs |
Updated to use OciJsonSerializer.SerializeToUtf8Bytes |
ManifestStore.cs |
Updated to use OciJsonSerializer.DeserializeAsync and GenerateIndex |
Repository.cs |
Updated to use OciJsonSerializer.Deserialize for OCI content |
Registry.cs |
Updated to use OciJsonSerializer.Deserialize |
FetchableExtensions.cs |
Updated to use OciJsonSerializer.Deserialize |
ResponseException.cs |
Updated to use OciJsonSerializer.Deserialize for error responses |
Error.cs |
Updated to use OciJsonSerializer.FormatErrorDetail |
Index.cs |
Removed GenerateIndex method (moved to serializer) |
CopyGraphOptions.cs |
Uses OciLimits.MaxManifestBytes instead of duplicate constant |
RepositoryOptions.cs |
Uses OciLimits.MaxManifestBytes instead of duplicate constant |
OciStringConverterTest.cs |
Tests escaping rules, round-trip, verifies literal + not \u002B |
OciDictionaryConverterTest.cs |
Tests key/value escaping, null handling, invalid types, round-trip |
ManifestSerializationTest.cs |
Wire-format JSON fixtures, annotation fidelity, size limit enforcement |
PackerTest.cs |
Two new regression tests inspecting raw bytes for literal + |
.github/copilot-instructions.md |
Trimmed 28%, added serialization/testing conventions |
017a00b to
3f67c33
Compare
tests/OrasProject.Oras.Tests/Serialization/OciDictionaryConverterTest.cs
Outdated
Show resolved
Hide resolved
3f67c33 to
57092d1
Compare
57092d1 to
0489b9a
Compare
0489b9a to
14934ba
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #348 +/- ##
==========================================
+ Coverage 91.78% 92.09% +0.31%
==========================================
Files 64 67 +3
Lines 2752 2886 +134
Branches 365 374 +9
==========================================
+ Hits 2526 2658 +132
- Misses 138 139 +1
- Partials 88 89 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
14934ba to
af9d966
Compare
tests/OrasProject.Oras.Tests/Serialization/OciStringConverterTest.cs
Outdated
Show resolved
Hide resolved
af9d966 to
cd1e45b
Compare
cd1e45b to
f16fe71
Compare
f16fe71 to
ba708a3
Compare
|
Resolved by Copilot: Added section comments throughout the |
|
Resolved by Copilot: Good idea. The serialization path is only used for manifests/indexes (bounded at 4 MiB by |
|
Resolved by Copilot: Added a comment explaining the relationship — |
|
Resolved by Copilot: Named the tuple elements — |
1d6f5f4 to
d7a9498
Compare
akashsinghal
left a comment
There was a problem hiding this comment.
Caution
This review was generated by Copilot using Claude Opus 4.6, mimicking the review style of shizhMSFT. This is not the real shizhMSFT.
Solid fix for a real CAS interoperability problem. The escaping logic matches Go's encoding/json.Marshal semantics correctly, and the converter architecture (separate string + dictionary converters) is the right approach given System.Text.Json's limitation that JsonConverter<string> doesn't intercept dictionary keys. A few observations below.
- Revert copilot-instructions.md to upstream (split to issue oras-project#356) - Move GenerateIndex back to Index.cs, call OciJsonSerializer internally - Add byte-level idempotency assertion in round-trip tests - Document size-guard materialization behavior in OciJsonSerializer Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Sajay Antony <sajaya@microsoft.com>
|
Addressed all 4 unresolved review threads in commit 6eacd7d:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
You can also share your feedback on Copilot code review. Take the survey.
src/OrasProject.Oras/Oci/Index.cs
Outdated
| internal static (Descriptor Descriptor, byte[] Content) GenerateIndex( | ||
| IList<Descriptor> manifests) | ||
| { | ||
| var index = new Index(manifests); | ||
| var indexContent = JsonSerializer.SerializeToUtf8Bytes(index); | ||
| return (Descriptor.Create(indexContent, Oci.MediaType.ImageIndex), indexContent); | ||
| var indexContent = | ||
| OciJsonSerializer.SerializeToUtf8Bytes(index); | ||
| return ( | ||
| Descriptor.Create(indexContent, Oci.MediaType.ImageIndex), | ||
| indexContent); |
There was a problem hiding this comment.
PR description mentions Index.GenerateIndex() being removed/moved into the serializer, but this method still exists (now just using OciJsonSerializer). Either update the PR description to match the implementation, or remove/move the method if the intent is to centralize index generation in OciJsonSerializer.
There was a problem hiding this comment.
@copilot apply changes based on this feedback
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.
You can also share your feedback on Copilot code review. Take the survey.
| var first = true; | ||
| foreach (var kvp in value.OrderBy(k => k.Key, StringComparer.Ordinal)) | ||
| { |
There was a problem hiding this comment.
OciDictionaryConverter.Write sorts keys with StringComparer.Ordinal, which compares UTF-16 code units. Go’s encoding/json sorts map keys by Go string order (bytewise over UTF-8), which can differ from UTF-16 ordinal ordering for non-BMP characters (surrogate pairs) and some BMP ranges, potentially reintroducing cross-runtime digest mismatches for non-ASCII annotation keys. Consider sorting by UTF-8 byte sequence (e.g., Encoding.UTF8.GetBytes(key) with lexicographic compare) to match Go’s ordering semantics.
There was a problem hiding this comment.
@copilot apply changes based on this feedback
7df89de to
acfef7c
Compare
Custom JsonConverter classes with Utf8JsonWriter.WriteRawValue() and a Go-compatible escape function matching encoding/json.Marshal semantics. - OciJsonSerializer: single entry point for all JSON serialization - OciStringConverter: Go-compatible string escaping (literal '+') - OciDictionaryConverter: annotation dict with UTF-8 bytewise key sort - OciLimits: shared MaxManifestBytes constant (4 MiB) - All call sites routed through OciJsonSerializer Fixes oras-project#346 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Sajay Antony <sajaya@microsoft.com>
acfef7c to
65f18ef
Compare
Fix:
+escaped as\u002Bin JSON serializationFixes #346
Problem
System.Text.Json's defaultJavaScriptEncoderescapes+as\u002B. OCI media types likeapplication/vnd.oci.image.manifest.v1+jsonbecome...v1\u002Bjson, producing different digests than oras-go and breaking CAS interoperability.Root cause:
JavaScriptEncoder.Defaulthardcodes+on its block list.AllowCharacter('+')doesn't work.UnsafeRelaxedJsonEscapingfixes+but stops escaping<>&— unsafe.Solution
Custom
JsonConverterclasses withUtf8JsonWriter.WriteRawValue()and a Go-compatible escape function matchingencoding/json.Marshalsemantics.Design
Why two converters?
JsonConverter<string>is NOT invoked for dictionary keys — only values.OciDictionaryConverterhandles annotation dictionaries where both keys and values may contain+.Escaping Rules (Go
encoding/json.Marshalcompatible)"\"\\\\b \f \n \r \t\b \f \n \r \t< > &\u003c \u003e \u0026\u2028 \u2029\uXXXX++(literal — the fix)Changes
New files (all
internal):Serialization/OciJsonSerializer.cs— single entry point for all JSON opsSerialization/OciStringConverter.cs— string value converterSerialization/OciDictionaryConverter.cs— annotation dictionary converter (UTF-8 bytewise key sort for Go compatibility)Content/OciLimits.cs— sharedMaxManifestBytesconstant (4 MiB)Modified call sites:
Packer.cs,ManifestStore.cs,Repository.cs,Registry.cs,FetchableExtensions.cs,ResponseException.cs,Error.cs— all JSON throughOciJsonSerializerIndex.cs—GenerateIndex()now delegates toOciJsonSerializer.SerializeToUtf8BytesCopyGraphOptions.cs,RepositoryOptions.cs— use sharedOciLimits.MaxManifestBytesTests (17 new test methods, 438 total passing):
OciStringConverterTest.cs— escape branch coverage + round-tripOciDictionaryConverterTest.cs— keys, values, empty, null, invalid types, round-trip, UTF-8 key sort orderManifestSerializationTest.cs— JSON fixture deserialization, annotation fidelity, index, round-trip, size limitPackerTest.cs— 2 regression tests checking raw bytes for literal+Key Design Decisions
OciJsonSerializerIndex.GenerateIndex()stays onIndexbut delegates toOciJsonSerializerfor encodingOciLimits.MaxManifestBytesreplaces duplicate private constsOciDictionaryConvertersorts annotation keys by UTF-8 byte sequence (not UTF-16 ordinal) to match Go'sencoding/json.Marshalbehavior