Soochi is an AI-powered content aggregation and processing system that collects, processes, and analyzes content from various sources, primarily Google Alerts RSS feeds. The system extracts ideas from the content, stores them in a Pinecone vector database, and syncs them with Notion for easy access and management.
- RSS feed aggregation and content extraction
- AI-powered idea extraction using OpenAI and Google Gemini models
- Vector similarity search with Pinecone
- Notion integration for idea management
- Batch processing support for OpenAI
- Pinecone: Primary source of truth for ideas and similarity detection
- Notion: User-friendly interface for viewing and managing ideas, kept in sync with Pinecone
- MongoDB: Storage for URL metadata and batch job tracking
- Content is collected from RSS feeds
- Ideas are extracted using AI models (OpenAI or Google Gemini)
- Ideas are stored in Pinecone for vector similarity search
- Ideas are simultaneously added to Notion for user-friendly management
- Pinecone is used for similarity detection to avoid duplicate ideas
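The duplicate check in the workflow above can be sketched in a few lines. This is a minimal illustration, not Soochi's actual code: `InMemoryIdeaStore` is a hypothetical stand-in for the Pinecone index, the three-dimensional vectors are toy values rather than real embeddings, and the `0.9` similarity threshold is an assumed cutoff.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class InMemoryIdeaStore:
    """Hypothetical stand-in for the vector index (not the Pinecone client)."""

    threshold: float = 0.9  # assumed cutoff; tune for real embeddings
    ideas: list = field(default_factory=list)  # (text, vector) pairs

    def is_duplicate(self, vector):
        # An idea counts as a duplicate if any stored idea is similar enough.
        return any(
            cosine_similarity(vector, stored) >= self.threshold
            for _, stored in self.ideas
        )

    def add_if_new(self, text, vector):
        # Store the idea only when no near-identical idea exists yet.
        if self.is_duplicate(vector):
            return False
        self.ideas.append((text, vector))
        return True


store = InMemoryIdeaStore()
results = [
    store.add_if_new("AI agents for RSS triage", [1.0, 0.0, 0.0]),
    store.add_if_new("RSS triage with AI agents", [0.99, 0.05, 0.0]),
    store.add_if_new("Vector search for recipes", [0.0, 1.0, 0.0]),
]
print(results)  # [True, False, True]
```

In a real deployment the similarity query would be delegated to the vector database, and only ideas that pass the check would be written to both stores.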
- Clone the repository:
  git clone https://github.com/abcvivek/soochi.git
  cd soochi
- Install dependencies using Poetry:
  poetry install
- Set up environment variables: Create a .env file in the root directory using the .env.example file as a reference.
- dev: Development environment (default)
- prod: Production environment
To switch between environments, set the SOOCHI_ENV environment variable before running your application:
# For development (default)
export SOOCHI_ENV=dev
# For production
export SOOCHI_ENV=prod

Run the individual modules:
python -m soochi.openai_publisher
python -m soochi.openai_subscriber
python -m soochi.gemini_processor

- feeds.yaml: Configure your RSS feeds and their enabled/disabled status
- .env: Set environment variables for API keys and logging levels
- .env.prod: Production-specific environment variables
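As a rough sketch of how SOOCHI_ENV could select a configuration file: the helper below is hypothetical (`resolve_env_file` is not part of Soochi), and it assumes that `dev` maps to `.env` and `prod` maps to `.env.prod`, which the file list above suggests but does not state outright.

```python
import os
from pathlib import Path

VALID_ENVS = ("dev", "prod")


def resolve_env_file(root=".", environ=None):
    """Return (environment name, path of the env file to load).

    Hypothetical helper: assumes dev -> .env and prod -> .env.prod.
    """
    env = os.environ if environ is None else environ
    name = env.get("SOOCHI_ENV", "dev")  # dev is the documented default
    if name not in VALID_ENVS:
        raise ValueError(f"Unknown SOOCHI_ENV {name!r}; expected one of {VALID_ENVS}")
    filename = ".env.prod" if name == "prod" else ".env"
    return name, Path(root) / filename


# Passing a dict instead of os.environ keeps the example deterministic.
print(resolve_env_file(environ={}))
print(resolve_env_file(environ={"SOOCHI_ENV": "prod"}))
```

Rejecting unknown values early avoids silently running production jobs against development settings because of a typo in the variable.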
Run the tests using pytest:
pytest

Contributions are welcome! Please submit a pull request or open an issue for any enhancements or bug fixes.
This project is licensed under the MIT License.