UmdTask430_Data605_Spring2026_txtai_for_market_research_1#452

Open
Gauravp2104 wants to merge 3 commits into gpsaggese:master from Gauravp2104:UmdTask430_DATA605_Spring2026_txtai_for_market_research_1

@Gauravp2104

Related to #430

Progress update 1

  • Project template files (Dockerfile, requirements.txt, shell scripts)
  • README with architecture overview and setup instructions

Next steps

  • Implement txtai embeddings pipeline
  • Add data ingestion tools (NewsAPI, SEC EDGAR, web scraper)
  • Build out individual agents (sentiment, diligence, web research, earnings, regulatory)

Reviewers: @gpsaggese @protocorn
Assignee: @Gauravp2104 @SanjanaK1801

Gaurav Prakash and others added 3 commits April 1, 2026 10:22
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Architecture:
- Hot tier: KeyDB for caching and live data
- Warm tier: PostgreSQL + pgvector for filings and embeddings
- Cold tier: MinIO for raw document archive
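
The commit message does not say how reads move between the three tiers. As a minimal sketch, assuming a read-through lookup that checks hot, then warm, then cold, and promotes hits into the hot tier, the idea could look like this. `TieredStore` is a hypothetical name, and plain dicts stand in for KeyDB, PostgreSQL + pgvector, and MinIO; none of this is taken from the PR's code.

```python
from typing import Dict, Optional


class TieredStore:
    """Read-through lookup across hot, warm, and cold tiers.

    Hypothetical illustration: dicts stand in for KeyDB (hot),
    PostgreSQL + pgvector (warm), and MinIO (cold).
    """

    def __init__(self) -> None:
        self.hot: Dict[str, str] = {}
        self.warm: Dict[str, str] = {}
        self.cold: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        # Fast path: the value is already cached in the hot tier.
        if key in self.hot:
            return self.hot[key]
        # Fall back to the slower tiers in order: warm, then cold.
        for tier in (self.warm, self.cold):
            if key in tier:
                value = tier[key]
                # Assumed policy: promote the hit so the next read is fast.
                self.hot[key] = value
                return value
        # The key is not present in any tier.
        return None


store = TieredStore()
store.cold["filing-1"] = "raw document"
first_read = store.get("filing-1")   # served from cold, then promoted
second_read = store.get("filing-1")  # served from hot
```

A real implementation would also need eviction and expiry for the hot tier, which this sketch leaves out.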

Components:
- Storage clients (KeyDB, PostgreSQL, MinIO) with connection pooling
- FilingsManager for high-level warm tier operations
- SEC EDGAR collector with full pipeline support
- Data collectors for news, web, and social sources
- Ingestion pipeline orchestrator
- Docker Compose for local infrastructure

Scripts:
- run_sec_collector.py: CLI for SEC filings collection

Infrastructure:
- docker-compose.yml: KeyDB, pgvector, MinIO services
- sql/init.sql: Database schema with vector indexes
- .gitignore: Comprehensive exclusions for Python, IDE, secrets
@protocorn
Collaborator

This PR currently includes a very large number of unrelated file changes.

Your PR is expected to include only your own project folder under:
class_project/data605/Spring2026/projects/

Please remove unrelated repository-wide changes and ensure that your PR only contains the files required for your project submission.
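
One way to check this locally is sketched below, assuming the list of changed paths has already been collected (for example from `git diff --name-only`). The helper name `unrelated_changes` and the sample paths, including the `example_project` folder, are hypothetical and only illustrate the prefix check.

```python
from typing import Iterable, List

# Folder named in the review comment above; everything else counts as unrelated.
PROJECT_ROOT = "class_project/data605/Spring2026/projects/"


def unrelated_changes(changed_paths: Iterable[str], root: str = PROJECT_ROOT) -> List[str]:
    """Return the changed paths that fall outside the project root."""
    return [path for path in changed_paths if not path.startswith(root)]


# Hypothetical sample input; in practice this would be the PR's changed files.
changed = [
    "class_project/data605/Spring2026/projects/example_project/README.md",
    "docs/unrelated.md",
]
print(unrelated_changes(changed))  # prints ['docs/unrelated.md']
```

An empty result would mean every changed file sits under the expected folder.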
