Skip to content

[RTI][SUB] Write the Ingestion Script for RTIs#5

Open
ChanukaUOJ wants to merge 5 commits intoLDFLK:mainfrom
ChanukaUOJ:ingestion-script
Open

[RTI][SUB] Write the Ingestion Script for RTIs#5
ChanukaUOJ wants to merge 5 commits intoLDFLK:mainfrom
ChanukaUOJ:ingestion-script

Conversation

@ChanukaUOJ
Copy link
Member

This PR includes the Ingestion Script to insert RTI data into the databases through OpenGIN. As the first step this covers the changes to insert Nodes and create the relations in the Graph db.

@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers the foundational ingestion script for the RTI Tracker project, written in Go. Its primary purpose is to automate the population of RTI data into the OpenGIN graph database. The script intelligently processes structured CSV data, creates new RTI entities as nodes, and dynamically links them to existing organizational entities, thereby building a comprehensive relationship graph. This initial step significantly streamlines the data onboarding process for the RTI system.

Highlights

  • RTI Data Ingestion Script: Introduced a new Go-based ingestion script designed to process RTI (Right to Information) data from CSV files and insert it into the OpenGIN graph database.
  • Graph Database Integration: The script handles the creation of RTI document nodes and establishes relationships with receiver institutions within the graph database, leveraging OpenGIN's ingestion and read APIs.
  • Data Processing Logic: Implemented logic to walk through data directories, parse request.csv files, extract relevant RTI request details, and validate input before processing.
  • API Client and Services: Developed internal API clients and services for interacting with the ingestion and read endpoints of the OpenGIN system, including retry mechanisms and custom error handling.
  • Testing and Utilities: Included unit tests for the core RTI service logic, a utility for date conversion, and a test_run.sh script for easy execution of tests and the main application.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an ingestion script for RTI data, which is a great step forward. The code is generally well-structured, separating concerns into different packages. However, I've identified several critical issues related to error handling, resource management, and logic that could lead to runtime panics, silent failures, or data duplication. Key areas for improvement include properly handling errors from file I/O and API calls, ensuring resource cleanup (like closing files), fixing a logical flaw in parent entity lookup, and making the data ingestion process idempotent. Please see my detailed comments for specific suggestions.


// 1. Insert the RTI Entity to Graph
// Create a deterministic UUID so rerunning the script on the same data doesn't duplicate nodes
hashInput := fmt.Sprintf("%s_%s", entity.Created, entity.Index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user tries to create a new rti request with the same created date and accidentally passes an index that already exists, this will give the same id and will overwrite the existing entity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index is passed by user and its uniqueness is not enforced anywhere

Copy link
Member Author

@ChanukaUOJ ChanukaUOJ Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the script the Index retrieves from the folder structure. Lets say in one particular date if multiple RTIs send, it should be stored in separate folders, like 1 , 2, 3, 4... like wise. So the use can't create multiple directories with the same name inside one date folder.

Comment on lines +84 to +113
var parentID string
if len(filteredSearchResult) > 0 {
sort.Slice(filteredSearchResult, func(i, j int) bool {
// Sort in descending order by created date
timeI, errI := time.Parse(time.RFC3339, filteredSearchResult[i].Created)
timeJ, errJ := time.Parse(time.RFC3339, filteredSearchResult[j].Created)
if errI != nil || errJ != nil {
return filteredSearchResult[i].Created > filteredSearchResult[j].Created
}
return timeI.After(timeJ)
})

entityCreatedTime, err := time.Parse(time.RFC3339, entity.Created)
if err != nil {
return nil, fmt.Errorf("failed time parsing")
}

for _, result := range filteredSearchResult {
resultTime, err := time.Parse(time.RFC3339, result.Created)
if err == nil && !resultTime.After(entityCreatedTime) {
parentID = result.ID
break
}
}

// Fallback: if no floor date is found or parse failed, pick the first one
if parentID == "" {
return nil, fmt.Errorf("skipping relation update (receiver not found for the given date): %s", entity.Created)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it comes to ministers and departments we can't rely on the created time of the node, we need to check the created time of the relationship. For each filtered node, you can retrieve all AS_DEPARTMENT/AS_MINISTER (depending on the node's minor kind) relationships that are incoming and then check their active date for which node has a relationship that was active during the rti's created date. If there are multiple nodes with the same name and a relationship active on that date then return an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RTI][SUB] Write the Ingestion Script for RTI node & Relation creation

2 participants