Fix: Add validation to prevent empty tag values in gene updater #536

@cybersiddhu

Problem

The gene updater application fails gRPC validation when it tries to create tags with empty values. The log file gene-updater.log shows the following error:

Error from gRPC update pool for gene DDB_G0280013 (Job ID 6e10e367-ebd9-4c1e-8f5a-4f1958657ccc): failed to AddTag for property description: rpc error: code = InvalidArgument desc = validation error:
 - tag.value: value is required [required]

Error Statistics:

  • Total errors: 2 out of 8,389 processed genes
  • Error rate: ~0.024%
  • Affected gene: DDB_G0280013
  • Property type: "description"

Root Cause

The issue occurs in the HTML processing pipeline:

  1. HTML Stripping Function: The stripHTMLWithParser() function in internal/featureannotation/cli/worker_funcs.go processes HTML content from ArangoDB properties
  2. Empty Results: HTML that contains only markup and no text content is reduced to an empty string once the tags are stripped
  3. Missing Validation: The code does not check that the stripped text is non-empty before sending it to gRPC
  4. gRPC Validation: The downstream gRPC service (modware-annotation) rejects empty tag values
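
To illustrate step 2, here is a minimal sketch of how markup-only input collapses to an empty string. The issue does not show the body of stripHTMLWithParser(), so stripTagsNaive below is a hypothetical, simplified stand-in (it ignores entities, comments, and scripts), and the sample inputs are made up:

```go
package main

import (
	"fmt"
	"strings"
)

// stripTagsNaive is a hypothetical stand-in for stripHTMLWithParser: it drops
// everything between '<' and '>' and keeps the remaining characters.
// It is for illustration only and is not a real HTML parser.
func stripTagsNaive(html string) string {
	var b strings.Builder
	inTag := false
	for _, r := range html {
		switch {
		case r == '<':
			inTag = true
		case r == '>' && inTag:
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	inputs := []string{
		"<p>Putative kinase</p>", // text survives stripping
		"<p><br/></p>",           // markup only, nothing survives
		"<div> </div>",           // only whitespace survives
	}
	for _, in := range inputs {
		out := stripTagsNaive(in)
		fmt.Printf("%q -> %q (empty after trim: %t)\n",
			in, out, strings.TrimSpace(out) == "")
	}
}
```

The second and third inputs are the kind of value that would reach AddTag as an empty or whitespace-only tag.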

Code Location

File: internal/featureannotation/cli/worker_funcs.go

HTML Processing Function (lines ~114-130):

strippedText, err := stripHTMLWithParser(prop.Value)
if err != nil {
    return ProcessedGeneData{}, fmt.Errorf(...)
}
strippedProps = append(strippedProps, StrippedProperty{
    OriginalName: prop.Name,
    StrippedText: strippedText, // Can be empty
})

gRPC AddTag Call (lines ~187-195):

_, err := params.grpcClient.AddTag(params.ctx,
    &fanno.AddTagRequest{
        Id: params.featAnno.Id,
        Tag: &fanno.TagPropertyCreate{
            Tag:       params.prop.OriginalName,
            Value:     params.prop.StrippedText, // Empty value causes error
            CreatedBy: DefaultUserName,
        },
    })

Proposed Solution

Add validation in the HTML processing worker function to filter out properties with empty stripped text:

// Skip properties with empty stripped text to avoid gRPC validation errors
if strings.TrimSpace(strippedText) == "" {
    continue
}

Add this check after the stripHTMLWithParser() call and before the append to strippedProps. It requires the strings package to be imported.
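
The placement could look roughly like the sketch below. It assumes the properties are processed in a loop. The Property type, the stripAndFilter name, and the injected strip function are hypothetical and only make the example self-contained; the real worker calls stripHTMLWithParser() and returns ProcessedGeneData.

```go
package main

import (
	"fmt"
	"strings"
)

// Property is a hypothetical stand-in for the ArangoDB property type.
type Property struct {
	Name  string
	Value string
}

// StrippedProperty mirrors the two fields shown in the excerpt above.
type StrippedProperty struct {
	OriginalName string
	StrippedText string
}

// stripAndFilter strips each property and keeps only non-empty results.
// The strip parameter stands in for stripHTMLWithParser.
func stripAndFilter(props []Property, strip func(string) (string, error)) ([]StrippedProperty, error) {
	strippedProps := make([]StrippedProperty, 0, len(props))
	for _, prop := range props {
		strippedText, err := strip(prop.Value)
		if err != nil {
			return nil, fmt.Errorf("stripping property %q: %w", prop.Name, err)
		}
		// Skip properties with empty stripped text to avoid gRPC validation errors.
		if strings.TrimSpace(strippedText) == "" {
			continue
		}
		strippedProps = append(strippedProps, StrippedProperty{
			OriginalName: prop.Name,
			StrippedText: strippedText,
		})
	}
	return strippedProps, nil
}

func main() {
	// identity pretends the markup has already been removed from the values.
	identity := func(s string) (string, error) { return s, nil }
	props := []Property{
		{Name: "description", Value: ""},
		{Name: "name_description", Value: "sample text"},
	}
	kept, err := stripAndFilter(props, identity)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(kept), kept[0].OriginalName)
}
```

Filtering at this stage means an empty property never reaches the AddTag call, so the gRPC client code does not need to change.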

Implementation Requirements

  1. Validation: Add empty string check in htmlProcessingWorkerFunc
  2. Logging: Add debug log when skipping empty properties
  3. Testing: Add unit tests for empty value scenarios
  4. Documentation: Update comments to explain the validation
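
Requirements 1 to 3 might be approached along the lines of the sketch below. The issue does not say which logger the project uses, so log/slog is an assumption here, and isEmptyTagValue, logSkip, captureSkipLog, and the case table are hypothetical names. In the real repository the table would drive a test in a _test.go file.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// isEmptyTagValue reports whether a stripped value is empty or whitespace-only.
func isEmptyTagValue(s string) bool {
	return strings.TrimSpace(s) == ""
}

// logSkip emits a debug record for a property that is being skipped.
func logSkip(logger *slog.Logger, geneID, property string) {
	logger.Debug("skipping property with empty stripped text",
		"gene_id", geneID, "property", property)
}

// emptyValueCases lists inputs that a table-driven test could iterate over.
var emptyValueCases = []struct {
	name  string
	input string
	want  bool
}{
	{"empty string", "", true},
	{"spaces and newline", " \n\t", true},
	{"non-breaking space", "\u00a0", true}, // TrimSpace treats U+00A0 as space
	{"plain text", "kinase", false},
	{"padded text", "  kinase  ", false},
}

// captureSkipLog writes one skip record to a buffer and returns the text.
func captureSkipLog(geneID, property string) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf,
		&slog.HandlerOptions{Level: slog.LevelDebug}))
	logSkip(logger, geneID, property)
	return buf.String()
}

func main() {
	for _, tc := range emptyValueCases {
		if got := isEmptyTagValue(tc.input); got != tc.want {
			fmt.Printf("FAIL %s: got %t, want %t\n", tc.name, got, tc.want)
		}
	}
	fmt.Print(captureSkipLog("DDB_G0280013", "description"))
}
```

Including the gene ID and property name in the debug record keeps skipped values traceable without adding entries to the error log.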

Benefits

  • Eliminates gRPC validation errors for empty tag values
  • Improves processing reliability
  • Reduces noise in error logs
  • Maintains data integrity by not creating meaningless empty tags

Impact

  • Low Risk: Only affects edge cases with empty HTML content
  • High Benefit: Eliminates validation errors and improves success rate
  • Backward Compatible: No breaking changes to existing functionality
