This repository contains ingest scripts created by Data Product for use with ETL.
The Data Product team uses these scripts to insert regulatory data (dockets, documents, comments, summaries, and extracted text) into PostgreSQL tables and OpenSearch indices.
Main entry point for running ingestion scripts:
- OpenSearch: comments, extracted text
- PostgreSQL: documents, dockets, summaries
Handles comment and extracted text ingestion into OpenSearch.
Also supports bulk ingest from .json files in S3.
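As a rough illustration of what a bulk ingest involves, the sketch below builds the newline-delimited JSON body that the OpenSearch `_bulk` endpoint expects from a list of already-parsed records. The function name `build_bulk_body`, the `id_field` parameter, and the `comments` index name are hypothetical and not taken from this repository. Downloading the .json files from S3 and sending the request are left out.

```python
import json
from typing import Iterable, Mapping


def build_bulk_body(index: str, docs: Iterable[Mapping], id_field: str = "id") -> str:
    """Return an NDJSON body for the OpenSearch _bulk endpoint.

    Hypothetical helper: each document becomes two lines, an "index"
    action line followed by the document source itself.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        # Reuse the record's own identifier when it has one, so that
        # re-running the ingest overwrites instead of duplicating.
        if id_field in doc:
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    records = [{"id": "c1", "text": "first comment"}, {"text": "no id here"}]
    print(build_bulk_body("comments", records), end="")
```

Deriving `_id` from the record makes repeated runs over the same file idempotent. Records without an identifier fall back to an ID generated by OpenSearch.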
Scripts for inserting JSON data into PostgreSQL.
Each script extracts the fields it needs from the JSON and converts dates via date.py.
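The contents of date.py are not shown here, so the following is only a sketch of the kind of conversion such a helper might perform, assuming the JSON carries ISO 8601 timestamps. The function name `to_utc_datetime` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def to_utc_datetime(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string into a timezone-aware UTC datetime.

    Hypothetical helper: returns None for missing or blank values so the
    caller can insert NULL into a nullable timestamp column.
    """
    text = (value or "").strip()
    if not text:
        return None
    # datetime.fromisoformat only accepts a trailing "Z" on newer Python
    # versions, so rewrite it as an explicit UTC offset.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    # Assumption: timestamps without an offset are already in UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    print(to_utc_datetime("2024-03-01T08:30:00-04:00"))
```

Normalizing to UTC before insertion keeps timestamps comparable across records. If the source data uses a different convention for offset-free values, the assumption in the comment would need to change.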
Database connection logic:
- sql.py connects to PostgreSQL via AWS Secrets Manager
- opensearch.py supports both local and production OpenSearch
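A minimal sketch of the Secrets Manager pattern is shown below. It is not the code in sql.py. It assumes the secret is stored as a JSON string with `host`, `port`, `dbname`, `username`, and `password` keys, which may not match the actual secret. The function names `fetch_secret_string` and `connection_kwargs` are hypothetical. `boto3` is imported lazily, so the parsing step can be exercised without AWS access.

```python
import json


def fetch_secret_string(secret_id: str, region_name: str) -> str:
    """Read a secret's raw string value from AWS Secrets Manager."""
    import boto3  # third-party; imported here so parsing works without it

    client = boto3.client("secretsmanager", region_name=region_name)
    return client.get_secret_value(SecretId=secret_id)["SecretString"]


def connection_kwargs(secret_string: str) -> dict:
    """Map the secret's JSON keys onto PostgreSQL connection parameters.

    Assumption: the key names below are illustrative and must match
    however the secret was actually created.
    """
    secret = json.loads(secret_string)
    return {
        "host": secret["host"],
        "port": int(secret.get("port", 5432)),
        "dbname": secret["dbname"],
        "user": secret["username"],
        "password": secret["password"],
    }


if __name__ == "__main__":
    sample = (
        '{"host": "db.example.internal", "port": "5432", '
        '"dbname": "regs", "username": "etl", "password": "not-a-real-password"}'
    )
    print(connection_kwargs(sample)["host"])
```

The resulting dictionary uses standard libpq parameter names, so it could be passed as keyword arguments to a driver such as psycopg2, assuming that is the driver in use.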
Refer to requirements.md