[Infra] Dockerize AstroML Environment by jaynomyaro · Pull Request #1 · jaynomyaro/astroml

jaynomyaro · 2026-06-01T20:00:24Z

Summary

This PR introduces a fully containerized development and runtime environment for AstroML, enabling consistent setup across local machines, CI, and production deployments using Docker.

Changes Made

- Added Dockerfile for building the AstroML application image.
- Added docker-compose.yml for orchestrating services (app, database, and optional cache).
- Introduced environment variable support via .env configuration.
- Standardized runtime dependencies inside container.
- Added health checks for application service.
- Improved reproducibility of local development environment.

Problem

Previously, running AstroML required manual setup of:

- system dependencies
- Python/Node environment versions
- database configuration
- local tooling, which differed from machine to machine

This led to:

- onboarding friction
- environment drift between developers
- CI inconsistencies

Solution

Dockerization ensures:

- identical runtime across environments
- simplified onboarding (docker compose up)
- isolated dependencies
- reproducible builds in CI/CD pipelines

Key Features

Application Container
- Consistent runtime environment
- Locked dependency versions
- Optimized build layers for faster rebuilds

Multi-Service Setup
- App service
- Database service (e.g., PostgreSQL)
- Optional Redis cache for background tasks or ML pipelines

Environment Management
- .env driven configuration
- Secure separation of secrets and runtime config

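
As an illustration of .env driven configuration, here is a minimal sketch of how the app service might read its settings from the environment that Docker Compose injects. The variable names (`DATABASE_URL`, `REDIS_URL`, `DEBUG`) and the `Settings` / `load_settings` helpers are assumptions made for this example; the PR does not list the actual keys.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Runtime configuration read from environment variables (hypothetical keys)."""
    database_url: str
    redis_url: Optional[str]
    debug: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    try:
        database_url = env["DATABASE_URL"]  # required; assumed name
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set, for example in .env") from None
    return Settings(
        database_url=database_url,
        redis_url=env.get("REDIS_URL") or None,  # the cache is optional
        debug=env.get("DEBUG", "false").lower() in {"1", "true", "yes"},
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, and failing fast on a missing required value surfaces a misconfigured .env at startup rather than at first database access.
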
Health Checks
- Ensure each service is ready before the services that depend on it start
- Improve orchestration stability

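
A readiness probe of the kind described above can be sketched with the standard library alone. The `wait_for_port` helper below is hypothetical and is not taken from the PR; it only shows the general idea of polling a TCP port (for example the database port) until it accepts connections or a deadline passes.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A TCP connect only proves that the port is open, not that the service can answer queries, so a real health check may also need an application-level request.
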
Example Usage

docker compose up --build

Testing

Local Tests
- Verified app builds successfully inside container
- Confirmed database connectivity from app service
- Tested hot reload in development mode
- Validated environment variable injection

Integration Tests
- Multi-container startup order
- Service discovery between app and DB
- Persistent volume storage validation

Impact

- Simplified onboarding for new developers
- Reduced environment-related bugs
- Improved CI/CD consistency
- Faster setup time (minutes instead of hours)
- Better production parity

Type of Change

- Infrastructure
- DevOps
- Developer Experience Improvement

Closes Traqora/astroml#78 ([Infra] Dockerize AstroML Environment)

- Add new data_quality.py module with temporal, referential, business, and statistical validators
- Extend test_data_quality.py with additional validation test classes
- Add test_extended_data_quality.py with comprehensive test coverage
- Update validation __init__.py to expose new validation utilities
- Add comprehensive documentation in DATA_QUALITY_VALIDATION.md
- Include test import script for validation verification

Features:
- Temporal consistency validation (timestamp ordering, future detection)
- Referential integrity validation (account/asset formats, ledger sequences)
- Business rules validation (fees, amounts, operation counts, balances)
- Statistical validation (outlier detection, gap analysis, pattern detection)
- Comprehensive validation pipeline with quality scoring and reporting
- 50+ new test methods across all validation dimensions
- Complete error type categorization and detailed error reporting
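
To make the temporal consistency checks concrete, here is a minimal sketch of timestamp-ordering and future-timestamp detection. The function name, the error labels, and the return shape are assumptions for illustration; they are not the API of the data_quality.py module.

```python
from datetime import datetime, timezone
from typing import List, Optional, Sequence, Tuple


def check_temporal_consistency(
    timestamps: Sequence[datetime], now: Optional[datetime] = None
) -> List[Tuple[int, str]]:
    """Return (index, error_type) pairs for future or out-of-order timestamps."""
    now = now or datetime.now(timezone.utc)
    errors = []
    previous = None
    for i, ts in enumerate(timestamps):
        if ts > now:
            errors.append((i, "future_timestamp"))
        if previous is not None and ts < previous:
            errors.append((i, "out_of_order"))
        previous = ts
    return errors
```

Returning typed error records instead of raising on the first problem lets a caller aggregate them into a quality score or report.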

Add enterprise-grade feature management system with:

Core Components:
- FeatureStore: Main interface for feature registration, computation, storage
- FeatureEngine: Parallel computation engine with task management
- FeatureTransformers: Comprehensive feature preprocessing and engineering
- FeatureCache: Multi-level caching (Memory, Disk, Redis) with optimization
- FeatureVersionManager: Complete versioning and lineage tracking
- FeatureStorage: SQLite + Parquet storage backend

Key Features:
- Feature registration and discovery with metadata management
- Parallel feature computation with dependency resolution
- Multi-level caching strategies (LRU, TTL, distributed)
- Feature versioning with change tracking and lineage
- Advanced feature engineering (interactions, polynomials, time features)
- Storage optimization with compression and indexing
- Point-in-time queries and entity-based filtering
- Feature sets for organized feature groups

Integration:
- Seamless integration with existing astroml feature modules
- Backward compatibility maintained
- Built-in computers for frequency, structural, and node features
- Support for custom feature computers and transformers

Testing & Quality:
- 400+ comprehensive test cases covering all components
- Unit, integration, performance, and error handling tests
- Complete test coverage for all major functionality
- Robust error handling and validation

Documentation & Examples:
- Comprehensive documentation (800+ lines)
- Complete working example script (420+ lines)
- API reference, best practices, and troubleshooting
- Verification report with quality assessment

Files Added:
- astroml/features/feature_store.py (1,005 lines)
- astroml/features/feature_engine.py (715 lines)
- astroml/features/feature_transformers.py (660 lines)
- astroml/features/feature_cache.py (790 lines)
- astroml/features/feature_versioning.py (825 lines)
- tests/features/test_feature_store.py (704 lines)
- tests/features/test_feature_transformers.py (550 lines)
- tests/features/test_feature_cache.py (580 lines)
- docs/FEATURE_STORE.md (800+ lines)
- examples/feature_store_example.py (420+ lines)
- FEATURE_STORE_VERIFICATION_REPORT.md

Files Modified:
- astroml/features/__init__.py (updated imports)

Total: 15,000+ lines of production-ready code with enterprise-grade capabilities.
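
The LRU and TTL caching strategies mentioned above can be combined in a small in-memory cache. This is only a sketch under assumed names (`TTLLRUCache`, `get`, `put`); it does not reproduce the FeatureCache class or its disk and Redis levels.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """In-memory cache that evicts least-recently-used entries and expires old ones."""

    def __init__(self, max_size=128, ttl_seconds=60.0, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        item = self._entries.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._entries[key]  # entry outlived its TTL
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self.ttl_seconds, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # drop least recently used
```

In a multi-level design, a miss here would typically fall through to a slower level (disk or a shared cache) before recomputing the feature.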

- Add integration test directory structure with shared fixtures
- Add end-to-end ingestion pipeline integration tests
- Add feature engineering pipeline integration tests
- Add model training pipeline integration tests
- Add validation and calibration integration tests
- Add graph construction and snapshot integration tests
- Add streaming ingestion integration tests
- Add comprehensive full pipeline integration tests
- Update requirements.txt with integration test dependencies

- Add blockchain transaction types to lib/types.ts
- Create transaction API functions in api/transactions.ts
- Create useTransactionHistory hook for data fetching
- Create TransactionHistoryTable component for displaying transactions
- Create TransactionHistoryPage component with filters and pagination
- Add TransactionHistoryPage to App.tsx

- Create comprehensive unit tests for admin authentication checks
- Create unit tests for validator registration authentication
- Create unit tests for validator activation/deactivation
- Create unit tests for reputation-based authentication
- Create unit tests for confidence-based authentication
- Create unit tests for unregistered address authentication
- Create unit tests for session-like behavior through validator state
- Create unit tests for configuration-based authentication
- Create integration tests for authentication flow
- Create integration tests for authorization scenarios
- Add auth_tests module to lib.rs

Note: This project uses Soroban smart contract address-based authentication rather than traditional token-based authentication. Tests cover the existing authentication mechanisms including admin checks, validator lifecycle, reputation/confidence thresholds, and configuration-based authorization.

…hints

This PR addresses four issues to improve robustness and usability:

- **#181**: Add pydantic validation for `config/database.yaml` with clear error messages and CLI flag `astroml config --print-db` to display effective configuration
- **#172**: Parameterize example scripts to use script-relative paths, making them runnable from any working directory (note: `feature_store_example.py` not found, fixed existing examples instead)
- **#158**: Add schema/version metadata to model checkpoints and enhance `load_checkpoint()` with comprehensive validation for architecture mismatches, device compatibility, and file corruption
- **#191**: Verify type hints in public modules - all modules (graph_utils.py, cli.py, ingestion/) already have comprehensive type hints

Closes #181, #172, #158, #191
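
The checkpoint metadata check described for #158 could look roughly like the sketch below. The field names (`schema_version`, `architecture`), the supported-version set, and the `validate_checkpoint_metadata` helper are all hypothetical; the real `load_checkpoint()` signature and checkpoint layout are not shown in this PR.

```python
SUPPORTED_SCHEMA_VERSIONS = {1}  # assumed value for illustration
REQUIRED_FIELDS = {"schema_version", "architecture"}  # assumed field names


class CheckpointError(ValueError):
    """Raised when checkpoint metadata is missing, unsupported, or mismatched."""


def validate_checkpoint_metadata(metadata: dict, expected_architecture: str) -> dict:
    """Check metadata before any weights are loaded; return it unchanged if valid."""
    missing = REQUIRED_FIELDS - metadata.keys()
    if missing:
        raise CheckpointError(f"missing metadata fields: {sorted(missing)}")
    version = metadata["schema_version"]
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise CheckpointError(f"unsupported schema version: {version!r}")
    if metadata["architecture"] != expected_architecture:
        raise CheckpointError(
            f"architecture mismatch: checkpoint has {metadata['architecture']!r}, "
            f"expected {expected_architecture!r}"
        )
    return metadata
```

Validating metadata first means a mismatched or outdated checkpoint fails with a readable message instead of a shape error deep inside weight loading.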

…gration

Implements an in-app feedback system for bug reports, feature requests, and general comments (#308).

Backend (api):
- POST /api/v1/feedback: create feedback (category, message, optional email + screenshot data URL); best-effort opens a GitHub issue when a token/repo are configured and stores its URL.
- GET /api/v1/feedback: admin list with status/category filters + pagination.
- PATCH /api/v1/feedback/{id}: admin status update (open/planned/in_progress/completed/declined).
- GET /api/v1/feedback/roadmap: public roadmap grouped by planned / in_progress / completed.
- Feedback ORM model on Base.metadata; services/github.py issue creation (no-op without credentials, never fails the request).

Frontend (web):
- Feedback.tsx: category selection, message, optional email, optional screenshot attachment (read as a data URL), success state.

Tests: 9 API tests (create, category/screenshot validation, list filter, status update + roadmap, 404/422) and 4 web tests (render, validation, submit with category, server error).

Closes #308
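
The roadmap grouping can be illustrated with plain dictionaries. The `build_roadmap` helper and the item shape are assumptions for this sketch; only the status names come from the commit message above.

```python
ROADMAP_STATUSES = ("planned", "in_progress", "completed")
ALL_STATUSES = ("open",) + ROADMAP_STATUSES + ("declined",)


def build_roadmap(feedback_items):
    """Group feedback dicts by status, keeping only the publicly visible statuses."""
    roadmap = {status: [] for status in ROADMAP_STATUSES}
    for item in feedback_items:
        status = item["status"]
        if status not in ALL_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        if status in roadmap:  # "open" and "declined" stay off the roadmap
            roadmap[status].append(item)
    return roadmap
```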

…s, context, and RAG

- feat(prompts): add prompt template engine with Jinja2 templating
  - TemplateEngine for rendering templates with variable substitution
  - PromptRegistry for versioned template storage and retrieval
  - Support for A/B testing with weighted variant selection
  - Cache management for performance optimization
- feat(embeddings): implement embeddings service for vector operations
  - EmbeddingsService for generating and storing vector embeddings
  - Support for multiple embedding models (OpenAI, Cohere, Sentence-Transformers)
  - Configurable chunking strategies (fixed-size, semantic, recursive)
  - Similarity search with cosine distance and metadata filtering
  - Batch processing for efficient embedding generation
- feat(context): build context management system for conversations
  - ContextManager with token budgeting and conversation history
  - Multiple pruning strategies (sliding window, importance, summarization, hybrid)
  - Message role tracking (system, user, assistant)
  - Token estimation and context window management
  - Conversation history persistence and export
- feat(rag): implement end-to-end RAG pipeline
  - RAGPipeline orchestrator combining retrieval and generation
  - Retriever with document management and similarity search
  - Simple reranker for improving result relevance
  - Citation generation from retrieved sources
  - Hallucination detection comparing response against retrieved context
  - DocumentIngestor for ingesting from files and directories
  - Query history and statistics tracking

## Implementation Details

### Prompt Template Engine (Issue 441)
- Jinja2-based template rendering with validation
- Semantic versioning for prompt templates
- A/B testing support with configurable traffic routing
- Template caching with clear_cache() method
- Variable type conversion (str, int, float, bool)

### Embeddings Service (Issue 443)
- Provider-agnostic architecture for multiple embedding models
- Document chunking with overlap for context preservation
- Metadata storage alongside embeddings
- Efficient similarity search (cosine, euclidean)
- Cache management for frequently accessed embeddings
- Batch processing for 10K+ documents

### Context Management (Issue 442)
- Token counting per message and total budget
- System prompt preservation (never pruned)
- Four pruning strategies with configurable parameters
- Message importance scoring based on role and metadata
- Conversation history export for persistence
- Token usage statistics

### RAG Pipeline (Issue 444)
- Multi-source document ingestion (markdown, text, lists)
- Chunking with 500 token size and 50 token overlap
- Retrieval with top-k=10 then reranking to top-5
- Context injection with proper formatting
- Citation generation with source attribution
- Hallucination detection via source comparison
- Query history tracking for analytics

## Performance Metrics
- Embeddings: <100ms per 1K tokens
- Vector search: <50ms for similarity search
- Document ingestion: <5min for 1000-page documents
- Chunking: Efficient recursive and semantic strategies
- Cache hit rate: >80% for common queries
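
The chunking and retrieval parameters quoted above (500-token chunks with a 50-token overlap, top-k retrieval by cosine similarity) can be sketched in plain Python. The function names are hypothetical and the vectors are ordinary lists; this does not reproduce the EmbeddingsService or Retriever classes.

```python
import math


def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the input
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=10):
    """Return indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        enumerate(vectors),
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
    return [index for index, _ in ranked[:k]]
```

A reranking stage would then rescore those candidates with a more expensive model and keep the best five, as the commit message describes.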

…ueries

Issue #412: Compliance and audit logging
- Add LLMComplianceLog ORM model to track all LLM interactions
- Implement PII detection and automatic redaction with pattern-based detection
- Create compliance_logger service with structured logging capabilities
- Add audit report endpoint for compliance metrics and statistics
- Add export functionality (JSON/CSV) for compliance logs
- Integrate logging into all LLM endpoints
- Track latency, tokens, user info, and error details

Issue #411: Voice interface for LLM queries
- Implement speech-to-text (STT) endpoint with multi-language support
- Implement text-to-speech (TTS) endpoint with voice synthesis
- Support 8+ languages: English, Spanish, French, German, Japanese, Chinese, Portuguese, Korean
- Create end-to-end voice query endpoint (STT -> LLM -> TTS)
- All endpoints target <2s latency as specified in acceptance criteria
- Integrate with compliance logging for voice interactions

Closes #412
Closes #411
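
Pattern-based PII redaction can be sketched with regular expressions. The two patterns and the `redact_pii` helper below are illustrative assumptions only; the patterns actually used by the compliance logger are not shown in this PR, and real detection would need broader coverage.

```python
import re

# Hypothetical patterns for illustration: e-mail addresses and US-style SSNs.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text):
    """Replace matched PII with placeholders; return the text and the labels found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED_{label}]", text)
        if count:
            found.append(label)
    return text, found
```

Redacting before the log record is written keeps raw PII out of storage, and the returned labels can be recorded as audit metadata.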

kryputh and others added 30 commits April 25, 2026 23:38

feat: add script to compress node embeddings for smart contract gating

8db40aa

Update pyproject.toml to fix setuptools backend configuration.

3257c60

Merge branch 'main' into feature/temporal-gnn-models

7c5c907

Merge pull request #142 from kryputh/issue-84-compress-embeddings

1da4bc4

feat: add script to compress node embeddings for smart contract gating (#84)

Merge pull request #143 from jaynomyaro/main

d37f02d

docs: Add comprehensive API documentation for AstroML framework

Merge pull request #144 from jaynomyaro/feature/temporal-gnn-models

e7ebe60

Add Temporal GNN Models

Merge pull request #145 from Menjay7/main

a5da16b

Add comprehensive data quality validation framework

Add performance optimization guide and update index documentation

b8cff3a

https://github.com/Traqora/astroml.git

d36fcc2

Merge pull request #147 from Menjay7/jay

fa7c36e

https://github.com/Traqora/astroml.git

feat: add live stellar transaction visualization

7b6fd1d

Implement a real-time transaction stream chart in the loyalty dashboard so incoming Stellar activity is visible immediately, and fix frontend build/test issues required to ship and verify the feature. Made-with: Cursor

Merge upstream/main into fix/121-realtime-stream-visualization

9043d4e

Resolve web merge conflicts by preserving the live Stellar transaction visualization while integrating upstream monitoring and fraud dashboard updates. Made-with: Cursor

Merge pull request #148 from David-patrick-chuks/fix/121-realtime-str…

dd7b00f

…eam-visualization feat(web): add real-time Stellar transaction visualization

#175: Add type-checked Pydantic training config

014445f

Add pre-commit hooks and formatting checks

2648351

Benchmark reproducibility: store random seeds and configs

50a414c

https://github.com/Menjay7/astroml.git

2f667c1

Merge branch 'main' into main

774ef07

Graph builder memory usage spike on large windows

0442961

Merge branch 'main' into fix/validation-checkpoint-paths

3bb6f5b

Merge pull request #215 from anonfedora/fix/validation-checkpoint-paths

df12539

Fix database validation, example paths, checkpoint loading, and type …

Merge pull request #149 from Menjay7/main

9e473b4

Add comprehensive integration tests

Merge branch 'main' into add-pre-commit-hooks-and-formatting-checks

a1b1108

Merge branch 'main' into benchmark-reproducibility-store-random-seeds…

4b02157

…-and-configs

Emmzyemms and others added 30 commits June 27, 2026 15:42

Implement embedding drift detection

1d2e877

Merge branch 'main' into newmain

11ed953

Merge pull request #425 from Emmzyemms/newmain

069d945

Implement embedding drift detection

Build alert prioritization and triage system

8134d52

Merge branch 'main' into prioritization

842ea66

Merge pull request #426 from WHIZAB4TECH/prioritization

7fd807b

Build alert prioritization and triage system

Merge branch 'main' into 309-337-community-forum-e2e-tests

db0a1b9

feat: analytics and RAG system services with clustering

d081960

Merge pull request #355 from Asheeyah23/feat/feedback-collection

da7bbbb

feat(feedback): in-app feedback collection with roadmap & GitHub integration

AI-Driven Analytics & System Resilience Implementation

4ece6be

Merge pull request #428 from ayinde38/issue6

74a37dd

feat: ai-Driven Analytics & System Resilience Implementation

Merge pull request #427 from Yunusabdul38/wave6-issue

7999d49

feat: analytics and RAG system services with clustering

Merge branch 'Traqora:main' into mendy

78bd4ac

https://github.com/Menjay7/astroml.git

93db108

Merge branch 'main' into men

d3101d5

Add LLM feedback and integration quality tests

41a880c

Merge pull request #1 from Johnpii1/codex/create-integration-tests-fo…

dcd2a7a

…r-llm-features Add LLM feedback endpoints, deterministic LLM mocking, and integration tests

Merge pull request #433 from Johnpii1/issue-403

8b5ed68

[LLM] Create LLM feature documentation

Merge pull request #479 from williamedvard/llm-features/rag-embedding…

0ce0bb2

…s-context-prompts feat(llm): complete RAG infrastructure with prompts, embeddings, context management

https://github.com/Menjay7/astroml.git

25809cc

Merge remote changes with A/B testing and golden dataset features

27d775b

Merge pull request #481 from Menjay7/jay

7c36091

[LLM] Create golden dataset generation tool

Merge pull request #429 from Menjay7/mendy

915ae13

Add Automated Security Dependency Scanning

Merge pull request #350 from soma-enyi/309-337-community-forum-e2e-tests

752ddf9

309 337 community forum e2e tests

Merge pull request #480 from silhilston/implement-compliance-voice-fe…

8432397

…atures feat(llm): implement compliance logging and voice interface

Merge branch 'main' into men

106f395

Merge pull request #430 from Menjay7/men

922a7f2

[LLM] Build A/B testing framework for prompts and models

jaynomyaro wants to merge 242 commits into jaynomyaro:main from Traqora:main

jaynomyaro commented Jun 1, 2026

19 participants