feat: implement CSVDiscoverer and catalog registry models#271
feat: implement CSVDiscoverer and catalog registry models#271mnoserat wants to merge 42 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR introduces CSV catalog discovery functionality to locate base game manifests from verified CSV catalogs for Generals and Zero Hour games. It adds configuration models, constants, interface methods, service implementations, a CSVDiscoverer class implementing IContentDiscoverer with multi-language support, dependency injection registration, and comprehensive unit tests spanning 14+ new files. Changes
Sequence DiagramsequenceDiagram
participant Client
participant CSVDiscoverer
participant ConfigService
participant FileSystem
participant ResultBuilder
Client->>CSVDiscoverer: DiscoverAsync(query, cancellationToken)
activate CSVDiscoverer
CSVDiscoverer->>FileSystem: LoadCatalogEntriesAsync()
activate FileSystem
alt Index file exists & in cache
FileSystem-->>CSVDiscoverer: Return cached entries
else Index file not in cache
FileSystem->>FileSystem: Read index.json
FileSystem->>CSVDiscoverer: Return parsed entries
else Index file unavailable
CSVDiscoverer->>ConfigService: GetCsvCatalogConfiguration()
activate ConfigService
ConfigService-->>CSVDiscoverer: Return CsvValidationCatalogs
deactivate ConfigService
FileSystem-->>CSVDiscoverer: Return converted entries
end
deactivate FileSystem
CSVDiscoverer->>CSVDiscoverer: Filter by TargetGame & Language
CSVDiscoverer->>ResultBuilder: CreateSearchResult(entry, language)
activate ResultBuilder
ResultBuilder->>ResultBuilder: Set metadata (URL, gameType, version, language, fileCount)
ResultBuilder-->>CSVDiscoverer: Return ContentSearchResult
deactivate ResultBuilder
CSVDiscoverer-->>Client: Return OperationResult<ContentDiscoveryResult>
deactivate CSVDiscoverer
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📋 Issue PlannerBuilt with CodeRabbit's Coding Plans for faster development and fewer bugs. View plan used: ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Additional Comments (1)
Prompt To Fix With AIThis is a comment left during a code review.
Path: docs/GameInstallationFilesRegistry/index.json
Line: 1:5
Comment:
schema mismatch - `CsvCatalogRegistryIndex` expects `Version` (int) and `Entries` (list), but JSON has `version` (string) and `registries` (list)
How can I resolve this? If you propose a fix, please make it concise. |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/GameInstallationFilesRegistry/index.json (1)
15-15:⚠️ Potential issue | 🟡 MinorMD5 checksum values have incorrect length.
The MD5 hashes are 31 characters but should be 32 hexadecimal characters. These appear to be placeholder values:
- Line 15:
"a1b2c3d4e5f67890123456789012345"(31 chars)- Line 31:
"f6e5d4c3b2a19876543210987654321"(31 chars)Please update with valid checksums or add a TODO comment if these are intentionally placeholder values for later implementation (as mentioned in line 2's version field).
Also applies to: 31-31
🤖 Fix all issues with AI agents
In `@GenHub/GenHub.Core/Models/Content/CsvCatalogConfiguration.cs`:
- Around line 27-53: CsvValidationCatalog currently declares FileCount as a
non-nullable int while CsvCatalogRegistryEntry.FileCount is nullable (int?),
which loses the distinction between "unknown" and "zero" during the conversion
in CSVDiscoverer.LoadCatalogEntriesAsync; change CsvValidationCatalog.FileCount
to int? to match CsvCatalogRegistryEntry, update its default to null, and ensure
CSVDiscoverer.LoadCatalogEntriesAsync continues to map FileCount = c.FileCount
without forcing a default value so null semantics are preserved.
In `@GenHub/GenHub.Core/Models/Content/CsvCatalogRegistryIndex.cs`:
- Around line 10-57: This file currently defines two types
(CsvCatalogRegistryIndex and CsvCatalogRegistryEntry) which triggers SA1402; to
fix, either move CsvCatalogRegistryEntry into its own file named
CsvCatalogRegistryEntry.cs with the same namespace and public class declaration
(keep properties and default initializers unchanged) and remove the nested
declaration from CsvCatalogRegistryIndex, or if you intend to keep them
together, add an explicit SA1402 suppression (e.g. a pragma or SuppressMessage
attribute) above the CsvCatalogRegistryEntry declaration explaining why they
must remain colocated.
In
`@GenHub/GenHub.Tests/GenHub.Tests.Core/Features/Content/CSVDiscovererTests.cs`:
- Line 17: Remove or rephrase the top-of-file AI authorship comment ("the whole
file is made by AI") in the CSVDiscovererTests file: either delete the comment
entirely or replace it with a standard authorship or copyright header; locate
the test class CSVDiscovererTests (and its containing namespace) to find the
file and update the header accordingly.
- Around line 22-93: Tests in CSVDiscovererTests are missing coverage for
language filtering, multi-language result generation, error handling,
cancellation support, and index.json loading for CSVDiscoverer.DiscoverAsync;
add focused unit tests that (1) assert case-insensitive language filtering and
"ALL" handling by passing queries with Language set (e.g., "de" vs "DE"), (2)
verify that when a CsvValidationCatalog lists multiple SupportedLanguages the
discoverer returns one item per language with ResolverMetadata["language"] set,
(3) simulate catalog loading failures (mock IConfigurationProviderService or
underlying loader to throw) and assert DiscoverAsync returns a graceful failure
or empty Data as expected, (4) exercise the CancellationToken by calling
DiscoverAsync with a canceled token and asserting operation respects
cancellation, and (5) add a test that supplies a non-empty IndexFilePath
(simulate loading index.json) to verify the primary index path is used rather
than the fallback configuration; locate tests by class CSVDiscovererTests and
target the DiscoverAsync method to add these cases.
In `@GenHub/GenHub/appsettings.json`:
- Around line 24-41: The CsvValidationCatalogs array is defined at the JSON root
instead of under the GenHub object, breaking binding to CsvCatalogConfiguration;
move the entire CsvValidationCatalogs array into the existing GenHub object so
it becomes GenHub.CsvValidationCatalogs, and while there, verify the two
CsvValidationCatalogs entries (the entries with gameType "Generals" and
"ZeroHour") have the correct distinct "url" values (update the ZeroHour URL if
it was a copy‑paste error) and ensure supportedLanguages/fileCount values are
correct.
In `@GenHub/GenHub/Features/Content/Services/ContentDiscoverers/CSVDiscoverer.cs`:
- Around line 229-230: The ResolverId value in CSVDiscoverer (the property
assignment line setting ResolverId = "CsvCatalogResolver") does not match issue
`#139`; change the string to "CSVResolver" so the ResolverId property in the
CSVDiscoverer class matches the specification (leave RequiresResolution = true
unchanged). Locate the ResolverId assignment in CSVDiscoverer.cs and replace
"CsvCatalogResolver" with "CSVResolver".
- Around line 25-30: CSVDiscoverer holds a SemaphoreSlim field named _cacheLock
which implements IDisposable but is never disposed; implement IDisposable on
CSVDiscoverer and add a Dispose (or DisposeAsync/ IAsyncDisposable if async
cleanup needed) that calls _cacheLock.Dispose() (and suppresses finalization if
you implement the full dispose pattern), ensure the class signature is changed
to public class CSVDiscoverer : IContentDiscoverer, IDisposable and update any
registrations or tests accordingly so the semaphore is released/cleaned up when
the discoverer is disposed.
- Around line 215-218: The hardcoded magic strings used when building manifest
IDs and resolver references (e.g., the "csvregistry" passed to
ManifestIdGenerator.GeneratePublisherContentId and the "CsvCatalogResolver"
usage) should be extracted to named constants; add a small static class like
CsvDiscovererConstants with const string PublisherId and ResolverId, then
replace literal occurrences in CSVDiscoverer (and any related uses) to reference
CsvDiscovererConstants.PublisherId and CsvDiscovererConstants.ResolverId to
centralize the values and remove the TODOs.
- Around line 126-138: The first unchecked read of _cachedEntries before
awaiting _cacheLock uses a non-volatile field which can lead to stale reads;
change the initial check to use Volatile.Read(ref _cachedEntries) (or mark the
backing field volatile) so the double-checked locking in the Get/Discover method
correctly observes updates, and keep the second check inside the lock as-is;
reference the _cachedEntries field and _cacheLock semaphore to locate the
double-check block and update the initial read accordingly.
…CatalogRegistryEntry
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@GenHub/GenHub.Core/Models/Content/CsvCatalogConfiguration.cs`:
- Around line 3-6: Update the XML summary in CsvCatalogConfiguration to
correctly reference the actual configuration key and property: replace the
mention of "CsvCatalogIndexPath" with "GenHub:IndexFilePath" and ensure it
refers to the IndexFilePath property on the CsvCatalogConfiguration class so the
documentation matches ConfigurationKeys.cs and the IndexFilePath member.
In `@GenHub/GenHub.Core/Models/Content/CsvCatalogRegistryEntry.cs`:
- Line 1: The file contains an unused using directive "using System;" at the top
of CsvCatalogRegistryEntry (class/file CsvCatalogRegistryEntry.cs); remove that
unused import so the file only includes necessary namespaces and no unused
System reference.
In `@GenHub/GenHub.Core/Models/Content/CsvCatalogRegistryIndex.cs`:
- Around line 19-23: The property CsvCatalogRegistryIndex.LastUpdatedAt
currently defaults to DateTime.UtcNow which masks missing/omitted JSON; change
it to use a sentinel (DateTime.MinValue) or make it nullable (DateTime?) so you
can distinguish "missing" vs "explicitly set"; update the property declaration
(and any callers/serialization expectations that rely on
JsonPropertyName("lastUpdated")) to remove the UtcNow initializer and handle
MinValue/null appropriately where the timestamp is consumed.
In
`@GenHub/GenHub.Tests/GenHub.Tests.Core/Features/Content/CSVDiscovererTests.cs`:
- Line 42: Update the test data in CSVDiscovererTests by changing the
SupportedLanguages entry from the lowercase "en" to the uppercase "EN" to match
issue `#139`'s CSV language-code specification; locate the SupportedLanguages =
new List<string> { "en" } usage in the test and replace with "EN", and if the
test contains any assertions that expect a normalized or case-sensitive value,
update those expected values to the uppercase form as well (e.g., any asserts
referencing language codes or normalization behavior in CSVDiscovererTests).
In `@GenHub/GenHub/Features/Content/Services/ContentDiscoverers/CSVDiscoverer.cs`:
- Around line 150-153: The Deserialize call in CSVDiscoverer.cs repeatedly
allocates a new JsonSerializerOptions (see the line with
JsonSerializer.Deserialize<CsvCatalogRegistryIndex>(json, new
JsonSerializerOptions { PropertyNameCaseInsensitive = true })); make a single
cached options instance by adding a private static readonly
JsonSerializerOptions (e.g., s_jsonOptions with PropertyNameCaseInsensitive =
true) to the CSVDiscoverer class and replace the inline new
JsonSerializerOptions usage with that static field when calling
JsonSerializer.Deserialize.
- Line 230: Update the CsvConstants.ResolverId constant to match issue `#139` by
changing the value of CsvConstants.ResolverId (referenced in CSVDiscoverer via
ResolverId = CsvConstants.ResolverId) from "CsvCatalogResolver" to
"CSVResolver"; locate the constant definition named ResolverId in the
CsvConstants class and set its string value to "CSVResolver".
- Around line 144-166: There is a TOCTOU race between File.Exists and
File.ReadAllTextAsync; remove the pre-check and attempt reading indexPath
directly using File.ReadAllTextAsync(indexPath, cancellationToken) inside the
existing try block, catch FileNotFoundException (and optionally
DirectoryNotFoundException) to treat as "missing index" and fall back to
configuration, and keep the existing catch(Exception ex) to log other failures;
update references to _config.IndexFilePath, CsvCatalogRegistryIndex,
loadedEntries, loadedFromIndex, and _logger accordingly so the code only reads
and deserializes when the file truly exists at read time.
There was a problem hiding this comment.
♻️ Duplicate comments (3)
GenHub/GenHub/Features/Content/Services/ContentDiscoverers/CSVDiscoverer.cs (3)
111-117:⚠️ Potential issue | 🟠 Major
NormalizeLanguageusesALL, which conflicts with the CSV policy tokenAll.The current canonicalization changes unified-language output to
ALL, which shifts manifest/metadata shape away from the policy/examples (...-All). KeepAllcanonical and uppercase only specific language codes.🔧 Proposed fix
- private static string NormalizeLanguage(string language) - { - if (string.Equals(language, "All", StringComparison.OrdinalIgnoreCase)) - return "ALL"; - return language.ToUpperInvariant(); - } + private static string NormalizeLanguage(string language) + { + var trimmed = language.Trim(); + return string.Equals(trimmed, CsvConstants.AllLanguagesFilter, StringComparison.OrdinalIgnoreCase) + ? CsvConstants.AllLanguagesFilter + : trimmed.ToUpperInvariant(); + }- if (string.IsNullOrWhiteSpace(query.Language) || normalizedQueryLanguage == "ALL") + if (string.IsNullOrWhiteSpace(query.Language) || normalizedQueryLanguage == CsvConstants.AllLanguagesFilter) { languagesToInclude = normalizedEntryLanguages; } - else if (normalizedEntryLanguages.Contains("ALL")) + else if (normalizedEntryLanguages.Contains(CsvConstants.AllLanguagesFilter)) { languagesToInclude = [normalizedQueryLanguage!]; }Also applies to: 187-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@GenHub/GenHub/Features/Content/Services/ContentDiscoverers/CSVDiscoverer.cs` around lines 111 - 117, The NormalizeLanguage canonicalization should preserve the policy token "All" (title case) rather than converting it to "ALL" and only uppercase specific language codes (e.g., "en", "fr") as needed; update the NormalizeLanguage implementation in CSVDiscoverer.cs so it returns "All" when input matches any case-variant of the policy token and otherwise normalizes language codes according to the existing rules, and then adjust any comparisons that rely on normalizedQueryLanguage or normalizedEntryLanguages (e.g., the condition branches using normalizedQueryLanguage == "ALL" and normalizedEntryLanguages.Contains("ALL") near the languagesToInclude logic and the similar block at 187-191) to check for "All" instead of "ALL".
81-97:⚠️ Potential issue | 🟠 MajorUnsupported
TargetGamecurrently broadens results for unknown enum values.When the switch does not map the enum value, the code logs and skips filtering, which returns unrelated game entries instead of honoring the filter intent. Return an empty success result (or explicit failure) for unsupported values.
🔧 Proposed fix
if (query.TargetGame.HasValue) { string? targetGameStr = query.TargetGame.Value switch { GameType.Generals => CsvConstants.GeneralsGameType, GameType.ZeroHour => CsvConstants.ZeroHourGameType, _ => null, }; - if (targetGameStr != null) - { - filteredEntries = filteredEntries.Where(e => e.GameType.Equals(targetGameStr, StringComparison.OrdinalIgnoreCase)).ToList(); - } - else + if (targetGameStr is null) { - _logger.LogWarning("Unsupported game type encountered: {GameType}. Skipping GameType filter.", query.TargetGame.Value); + _logger.LogWarning("Unsupported game type encountered: {GameType}. Returning no results.", query.TargetGame.Value); + return OperationResult<ContentDiscoveryResult>.CreateSuccess(new ContentDiscoveryResult()); } + + filteredEntries = filteredEntries + .Where(e => e.GameType.Equals(targetGameStr, StringComparison.OrdinalIgnoreCase)) + .ToList(); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@GenHub/GenHub/Features/Content/Services/ContentDiscoverers/CSVDiscoverer.cs` around lines 81 - 97, The CSVDiscoverer currently logs and skips when query.TargetGame has an unmapped enum value, which broadens results; update the handling inside the method that processes query.TargetGame so that when the switch yields null (unsupported enum), you do not continue with the original filteredEntries—instead short-circuit to return an empty successful result (or set filteredEntries to an empty list) so no unrelated entries are returned; locate the block using query.TargetGame, the switch mapping to CsvConstants.GeneralsGameType/CsvConstants.ZeroHourGameType, and the _logger.LogWarning call, and replace the "skip filter" behavior with a clear early-return or assignment of an empty collection to filteredEntries.
284-309:⚠️ Potential issue | 🟠 MajorManifest identity is still built from raw
entry.GameTypecasing.
contentNameand metadata rely on source casing, so IDs can diverge (generalsvsGenerals) for semantically identical entries. Use canonical game type derived from parsed enum for stable IDs and metadata.🔧 Proposed fix
+ var canonicalGameType = gameType switch + { + GameType.Generals => CsvConstants.GeneralsGameType, + GameType.ZeroHour => CsvConstants.ZeroHourGameType, + _ => entry.GameType, + }; + - var contentName = $"{entry.GameType}-{entry.Version}-{language}"; + var contentName = $"{canonicalGameType}-{entry.Version}-{language}"; @@ - Name = $"{entry.GameType} {entry.Version} ({language})", - Description = $"Base game installation files for {entry.GameType} v{entry.Version}. Language: {language}", + Name = $"{canonicalGameType} {entry.Version} ({language})", + Description = $"Base game installation files for {canonicalGameType} v{entry.Version}. Language: {language}", @@ - result.ResolverMetadata[CsvConstants.GameTypeMetadataKey] = entry.GameType; + result.ResolverMetadata[CsvConstants.GameTypeMetadataKey] = canonicalGameType;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@GenHub/GenHub/Features/Content/Services/ContentDiscoverers/CSVDiscoverer.cs` around lines 284 - 309, Summary: contentName and GameType metadata are built from raw entry.GameType (source casing) causing non-canonical IDs; use the parsed/canonical gameType instead. Fix: in CSVDiscoverer (where contentName, Id and result are created) replace occurrences of entry.GameType when generating contentName, the manifest Id, and the ResolverMetadata[CsvConstants.GameTypeMetadataKey] with the canonical gameType (e.g., gameType.ToString() or a canonicalGameType variable derived from the parsed enum) so IDs and metadata use stable casing; ensure display strings (result.Name/Description) can still use entry.GameType or formatted gameType if you want user-facing casing preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@GenHub/GenHub/Features/Content/Services/ContentDiscoverers/CSVDiscoverer.cs`:
- Around line 111-117: The NormalizeLanguage canonicalization should preserve
the policy token "All" (title case) rather than converting it to "ALL" and only
uppercase specific language codes (e.g., "en", "fr") as needed; update the
NormalizeLanguage implementation in CSVDiscoverer.cs so it returns "All" when
input matches any case-variant of the policy token and otherwise normalizes
language codes according to the existing rules, and then adjust any comparisons
that rely on normalizedQueryLanguage or normalizedEntryLanguages (e.g., the
condition branches using normalizedQueryLanguage == "ALL" and
normalizedEntryLanguages.Contains("ALL") near the languagesToInclude logic and
the similar block at 187-191) to check for "All" instead of "ALL".
- Around line 81-97: The CSVDiscoverer currently logs and skips when
query.TargetGame has an unmapped enum value, which broadens results; update the
handling inside the method that processes query.TargetGame so that when the
switch yields null (unsupported enum), you do not continue with the original
filteredEntries—instead short-circuit to return an empty successful result (or
set filteredEntries to an empty list) so no unrelated entries are returned;
locate the block using query.TargetGame, the switch mapping to
CsvConstants.GeneralsGameType/CsvConstants.ZeroHourGameType, and the
_logger.LogWarning call, and replace the "skip filter" behavior with a clear
early-return or assignment of an empty collection to filteredEntries.
- Around line 284-309: Summary: contentName and GameType metadata are built from
raw entry.GameType (source casing) causing non-canonical IDs; use the
parsed/canonical gameType instead. Fix: in CSVDiscoverer (where contentName, Id
and result are created) replace occurrences of entry.GameType when generating
contentName, the manifest Id, and the
ResolverMetadata[CsvConstants.GameTypeMetadataKey] with the canonical gameType
(e.g., gameType.ToString() or a canonicalGameType variable derived from the
parsed enum) so IDs and metadata use stable casing; ensure display strings
(result.Name/Description) can still use entry.GameType or formatted gameType if
you want user-facing casing preserved.
| private static void AddCSVPipeline(IServiceCollection services) | ||
| { | ||
| // Register CSV discoverer (concrete and interface) | ||
| services.AddTransient<CSVDiscoverer>(); | ||
| services.AddTransient<IContentDiscoverer, CSVDiscoverer>(); | ||
| } |
There was a problem hiding this comment.
Transient lifetime makes the built-in cache useless and leaks SemaphoreSlim
CSVDiscoverer contains a double-checked locking cache (_cachedEntries + SemaphoreSlim _cacheLock) designed to load and re-use entries across calls. Registering it as Transient means the DI container creates a new instance on every resolution, so the cache is always empty and discarded immediately — providing zero benefit while adding unnecessary SemaphoreSlim allocations.
There is also a resource-leak concern: the .NET DI container does not call Dispose() on Transient IDisposable services (only scoped/singleton registrations enjoy that). The SemaphoreSlim allocated in each discarded instance is therefore never disposed.
Consider Singleton (safest for a read-only, no-user-state cache) or at minimum Scoped:
| private static void AddCSVPipeline(IServiceCollection services) | |
| { | |
| // Register CSV discoverer (concrete and interface) | |
| services.AddTransient<CSVDiscoverer>(); | |
| services.AddTransient<IContentDiscoverer, CSVDiscoverer>(); | |
| } | |
| private static void AddCSVPipeline(IServiceCollection services) | |
| { | |
| // Register CSV discoverer (concrete and interface) | |
| services.AddSingleton<CSVDiscoverer>(); | |
| services.AddSingleton<IContentDiscoverer, CSVDiscoverer>(); | |
| } |
If Singleton is not appropriate, the caching logic should be removed to avoid dead code.
Prompt To Fix With AI
This is a comment left during a code review.
Path: GenHub/GenHub/Infrastructure/DependencyInjection/ContentPipelineModule.cs
Line: 288-293
Comment:
**`Transient` lifetime makes the built-in cache useless and leaks `SemaphoreSlim`**
`CSVDiscoverer` contains a double-checked locking cache (`_cachedEntries` + `SemaphoreSlim _cacheLock`) designed to load and re-use entries across calls. Registering it as `Transient` means the DI container creates a **new instance on every resolution**, so the cache is always empty and discarded immediately — providing zero benefit while adding unnecessary `SemaphoreSlim` allocations.
There is also a resource-leak concern: the .NET DI container does **not** call `Dispose()` on `Transient` `IDisposable` services (only scoped/singleton registrations enjoy that). The `SemaphoreSlim` allocated in each discarded instance is therefore never disposed.
Consider `Singleton` (safest for a read-only, no-user-state cache) or at minimum `Scoped`:
```suggestion
private static void AddCSVPipeline(IServiceCollection services)
{
// Register CSV discoverer (concrete and interface)
services.AddSingleton<CSVDiscoverer>();
services.AddSingleton<IContentDiscoverer, CSVDiscoverer>();
}
```
If `Singleton` is not appropriate, the caching logic should be removed to avoid dead code.
How can I resolve this? If you propose a fix, please make it concise.
Additional Comments (1)
The Use a proper placeholder version (e.g. Prompt To Fix With AIThis is a comment left during a code review.
Path: docs/GameInstallationFilesRegistry/index.json
Line: 2
Comment:
**Placeholder text in a production data file**
The `version` field currently contains `"test this will be implemented later in #156"` — a human-readable TODO comment embedded in a machine-consumed JSON file. This value will be deserialized into `CsvCatalogRegistryIndex.Version` and could surface in logs or diagnostics, making them confusing or misleading.
Use a proper placeholder version (e.g. `"0.0.0"` or `"1.0.0"`) until the real versioning scheme from issue #156 is in place.
```suggestion
"version": "0.0.0",
```
How can I resolve this? If you propose a fix, please make it concise. |
Code Review SummaryStatus: 1 Critical Issue Found | Recommendation: Fix before merge Overview
Issue Details (click to expand)CRITICAL
Other Observations (not in diff)Issues found in unchanged code that cannot receive inline comments:
Files Reviewed (9 files)
Previous Issues StatusCarried forward from previous review (54 comments):
Reviewed by glm-4.7 · 2,442,253 tokens |
Additional Comments (2)
The Prompt To Fix With AIThis is a comment left during a code review.
Path: docs/GameInstallationFilesRegistry/index.json
Line: 2
Comment:
**Placeholder text committed as production data**
The `version` field contains a TODO/placeholder string `"test this will be implemented later in #156"` instead of a real schema version. This value will be deserialized into `CsvCatalogRegistryIndex.Version` and could be surfaced in logs or diagnostics, which is misleading in any non-development environment. Either replace it with a proper version string (e.g. `"1.0"`) or track the schema versioning work in issue #156 and block this file from merging until it's resolved.
```suggestion
"version": "1.0",
```
How can I resolve this? If you propose a fix, please make it concise.
Both MD5 values in this file are 31 hex characters long, but a valid MD5 digest is always exactly 32 hex characters (128 bits). Verification: Any code that validates these checksums against the actual downloaded files will always fail (or silently pass if no length check is performed, which is equally problematic). The SHA-256 values are the correct 64 characters, so only the MD5 values need to be corrected. Replace them with the real MD5 digests of the corresponding CSV files, or remove the Prompt To Fix With AIThis is a comment left during a code review.
Path: docs/GameInstallationFilesRegistry/index.json
Line: 15-18
Comment:
**Invalid MD5 checksums — 31 hex characters instead of 32**
Both MD5 values in this file are 31 hex characters long, but a valid MD5 digest is always exactly 32 hex characters (128 bits). Verification:
```
"a1b2c3d4e5f67890123456789012345" → 31 chars ❌ (Generals entry)
"f6e5d4c3b2a19876543210987654321" → 31 chars ❌ (ZeroHour entry)
```
Any code that validates these checksums against the actual downloaded files will always fail (or silently pass if no length check is performed, which is equally problematic). The SHA-256 values are the correct 64 characters, so only the MD5 values need to be corrected. Replace them with the real MD5 digests of the corresponding CSV files, or remove the `"md5"` field until the correct values are available.
How can I resolve this? If you propose a fix, please make it concise. |
undead2146
left a comment
There was a problem hiding this comment.
Besides a few comments this looks good.
I am not sure if we need the CsvValidationCatalogs in the appsettings.json, as this can using index.json gist url on github seems enough since the registry entries themselves will be there anyway.
Only use case is that maybe users want change the index.json. But changing URL makes no sense, the index.json is a registry that contains a list of all the files in a GameInstallation and their structure.
@undead2146 I will talk about this on Discord |
| /// <summary> | ||
| /// Special language filter value to include all languages. | ||
| /// </summary> | ||
| public const string AllLanguagesFilter = "All"; |
There was a problem hiding this comment.
CRITICAL: Breaking change in AllLanguagesFilter constant value
Changed from "ALL" to "All" (title case) without updating ContentSearchQuery.NormalizeLanguage() which still returns "ALL" (uppercase). This incompatibility will cause CSV discoverer to fail when no language filter is specified in queries, as the comparison normalizedQueryLanguage == CsvConstants.AllLanguagesFilter will always be false ("ALL" != "All").
Fix: Either:
- Update
ContentSearchQuery.NormalizeLanguage()to return"All"instead of"ALL" - Or revert
AllLanguagesFilterback to"ALL"for consistency with existing code
Overview
This PR implements the CSVDiscoverer as specified in issue #139. It enables the discovery of base game manifests (Generals 1.08 and Zero Hour 1.04) from CSV catalogs with full multi-language support.
Changes
Related Issue
Closes #139
Greptile Summary
This PR implements
CSVDiscovererfor discovering base game manifests (Generals 1.08 / Zero Hour 1.04) from CSV catalogs, along with the supportingCsvCatalogRegistryIndex/CsvCatalogRegistryEntrymodels, configuration binding, DI registration, and a test suite. The PR author acknowledges it is not finished and still has outstanding warnings to resolve.Confidence Score: 4/5
Not ready to merge — PR author acknowledges outstanding warnings and the Transient DI / cache issue flagged in a prior thread remains unresolved.
All findings in this pass are P2. However, the prior thread contains an unresolved P1 (Transient lifetime makes the SemaphoreSlim cache useless and leaks disposables) and the author's own note says the PR still needs warning resolution, capping confidence at 4.
ContentPipelineModule.cs (Transient registration), CSVDiscoverer.cs (DefaultIndexFileUrl priority), AppConfiguration.cs (using ordering)
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[DiscoverAsync] --> B{ContentType != GameInstallation?} B -- yes --> C[Return empty success] B -- no --> D[LoadCatalogEntriesAsync] D --> E{_cachedEntries != null?} E -- yes --> F[Return cache] E -- no --> G[GetCatalogSources] G --> H[DefaultIndexFileUrl - always first] H --> I{Entries found?} I -- yes --> J[Cache & return entries] I -- no --> K[Configured IndexFilePath] K --> L{Entries found?} L -- yes --> J L -- no --> M[CsvValidationCatalogs fallback] M --> J J --> N[Filter by GameType] N --> O[Expand per language] O --> P[CreateSearchResult per entry+language] P --> Q[Return ContentDiscoveryResult]Prompt To Fix All With AI
Reviews (8): Last reviewed commit: "fix(csv-discovery): load CSV catalogs fr..." | Re-trigger Greptile
Context used: