Problem
The gene updater application fails with validation errors when it tries to create tags with empty values. The log file gene-updater.log shows the following gRPC validation error:
Error from gRPC update pool for gene DDB_G0280013 (Job ID 6e10e367-ebd9-4c1e-8f5a-4f1958657ccc): failed to AddTag for property description: rpc error: code = InvalidArgument desc = validation error:
- tag.value: value is required [required]
Error Statistics:
- Total errors: 2 out of 8,389 processed genes
- Error rate: ~0.024%
- Affected gene: DDB_G0280013
- Property type: "description"
Root Cause
The issue occurs in the HTML processing pipeline:
- HTML Stripping Function: The stripHTMLWithParser() function in internal/featureannotation/cli/worker_funcs.go processes HTML content from ArangoDB properties
- Empty Results: Some HTML content, when stripped of all tags, results in empty strings (e.g., HTML with only tags and no text content)
- Missing Validation: The code doesn't validate that stripped text is non-empty before sending to gRPC
- gRPC Validation: The downstream gRPC service (modware-annotation) validates that tag values cannot be empty
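The failure mode in the list above can be reproduced in isolation. The sketch below is only an illustration: stripTags is a deliberately naive, hypothetical stand-in for stripHTMLWithParser() (whose implementation is not shown in this issue), and isBlankAfterStrip is likewise a made-up helper name. It shows that markup containing only tags, or only tags and whitespace, reduces to a blank string.

```go
package main

import "strings"

// stripTags is a naive, hypothetical stand-in for stripHTMLWithParser():
// it drops everything between '<' and '>' and keeps the remaining text.
// The real function uses an HTML parser and may behave differently.
func stripTags(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>':
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// isBlankAfterStrip reports whether the markup carries no visible text,
// which is the condition that triggers the gRPC "value is required" error.
func isBlankAfterStrip(s string) bool {
	return strings.TrimSpace(stripTags(s)) == ""
}
```

With this stand-in, an input such as `<p><br/></p>` is blank after stripping, while `<b>kinase</b>` is not.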
Code Location
File: internal/featureannotation/cli/worker_funcs.go
HTML Processing Function (lines ~114-130):
    strippedText, err := stripHTMLWithParser(prop.Value)
    if err != nil {
        return ProcessedGeneData{}, fmt.Errorf(...)
    }
    strippedProps = append(strippedProps, StrippedProperty{
        OriginalName: prop.Name,
        StrippedText: strippedText, // Can be empty
    })
gRPC AddTag Call (lines ~187-195):
    _, err := params.grpcClient.AddTag(params.ctx,
        &fanno.AddTagRequest{
            Id: params.featAnno.Id,
            Tag: &fanno.TagPropertyCreate{
                Tag:       params.prop.OriginalName,
                Value:     params.prop.StrippedText, // Empty value causes error
                CreatedBy: DefaultUserName,
            },
        })
Proposed Solution
Add validation in the HTML processing worker function to filter out properties with empty stripped text:
    // Skip properties with empty stripped text to avoid gRPC validation errors
    if strings.TrimSpace(strippedText) == "" {
        continue
    }
This should be added after the stripHTMLWithParser() call but before appending to strippedProps.
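As a rough, self-contained sketch of the same check together with a debug log for skipped properties: the StrippedProperty type below mirrors only the two fields shown in this issue, and keepNonEmpty, its geneID parameter, and the choice of log/slog (Go 1.21+) are assumptions for illustration, not the project's actual code or logger.

```go
package main

import (
	"log/slog"
	"strings"
)

// StrippedProperty mirrors the two fields shown in this issue;
// the real type may carry more.
type StrippedProperty struct {
	OriginalName string
	StrippedText string
}

// keepNonEmpty is a hypothetical helper that drops properties whose
// stripped text is empty or whitespace-only, logging each skip at
// debug level so the omission stays traceable.
func keepNonEmpty(geneID string, props []StrippedProperty) []StrippedProperty {
	kept := make([]StrippedProperty, 0, len(props))
	for _, p := range props {
		if strings.TrimSpace(p.StrippedText) == "" {
			slog.Debug("skipping property with empty stripped text",
				"gene", geneID, "property", p.OriginalName)
			continue
		}
		kept = append(kept, p)
	}
	return kept
}
```

Filtering before the properties reach the gRPC stage means the AddTag call never sees an empty Value, so the downstream validation rule does not need to change.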
Implementation Requirements
- Validation: Add empty string check in htmlProcessingWorkerFunc
- Logging: Add debug log when skipping empty properties
- Testing: Add unit tests for empty value scenarios
- Documentation: Update comments to explain the validation
Benefits
- Eliminates gRPC validation errors for empty tag values
- Improves processing reliability
- Reduces noise in error logs
- Maintains data integrity by not creating meaningless empty tags
Impact
- Low Risk: Only affects edge cases with empty HTML content
- High Benefit: Eliminates validation errors and improves success rate
- Backward Compatible: No breaking changes to existing functionality