Is your feature request related to a problem? Please describe.
Chroma is a popular open-source vector database that's easy to self-host and provides excellent performance. Many developers prefer Chroma for:
- Local development with persistent storage
- Self-hosted deployments without cloud dependencies
- Docker-based deployments
- Python/JavaScript interoperability
Currently, toolpack-knowledge doesn't support Chroma, forcing users to either use in-memory storage (no persistence) or PostgreSQL (requires database setup).
Describe the solution you'd like
Implement a ChromaProvider that integrates with Chroma's JavaScript client for vector storage and retrieval.
Implementation Details
1. Create ChromaProvider Class
File: packages/toolpack-knowledge/src/providers/chroma.ts
Implement a provider that:
- Uses the official chromadb JavaScript client
- Supports connection to local or remote Chroma instances
- Implements collection management
- Handles authentication (token and basic auth)
- Supports multi-tenancy via tenant/database isolation
Configuration Options:
- url - Chroma server URL (default: 'http://localhost:8000')
- collection - Collection name (required)
- tenant - Tenant for multi-tenancy (default: 'default_tenant')
- database - Database name (default: 'default_database')
- auth - Authentication configuration (optional)
  - provider: 'token' or 'basic'
  - credentials: token string or {username, password} object
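The options above could be typed and defaulted roughly as follows. This is only a sketch: the type names and the resolveChromaOptions helper are hypothetical, and only the field names and default values come from the list above.

```typescript
// Hypothetical option types; field names and defaults follow the list above.
type ChromaAuth =
  | { provider: 'token'; credentials: string }
  | { provider: 'basic'; credentials: { username: string; password: string } };

interface ChromaProviderOptions {
  url?: string;
  collection: string;
  tenant?: string;
  database?: string;
  auth?: ChromaAuth;
}

interface ResolvedChromaOptions {
  url: string;
  collection: string;
  tenant: string;
  database: string;
  auth?: ChromaAuth;
}

// Applies the documented defaults and rejects a missing collection name.
function resolveChromaOptions(options: ChromaProviderOptions): ResolvedChromaOptions {
  if (!options.collection || options.collection.trim() === '') {
    throw new Error('ChromaProvider: "collection" is required');
  }
  return {
    url: options.url ?? 'http://localhost:8000',
    collection: options.collection,
    tenant: options.tenant ?? 'default_tenant',
    database: options.database ?? 'default_database',
    // Only set auth when the caller supplied it.
    ...(options.auth ? { auth: options.auth } : {}),
  };
}
```

Resolving defaults once in the constructor keeps the rest of the provider free of fallback logic.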
2. Collection Management
Initialization:
- Initialize ChromaClient with URL and auth config
- Try to get existing collection
- If collection doesn't exist, create it with cosine similarity
- Configure HNSW index parameters
Collection Configuration:
- Use cosine distance for similarity metric
- Set appropriate HNSW parameters for performance
3. Core Methods Implementation
validateDimensions():
- Get collection count to check if it has data
- If data exists, fetch a sample vector to check dimensions
- Compare with embedder dimensions
- Throw DimensionMismatchError if a mismatch is detected
- Store dimensions for future validation
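The comparison step in validateDimensions() might look like the sketch below. It leaves out the Chroma calls (count and sample fetch) and assumes the caller passes the sample vector, or null for an empty collection. The DimensionMismatchError class here is a local stand-in; the real one in toolpack-knowledge may take different arguments.

```typescript
// Hypothetical stand-in for the package's DimensionMismatchError.
class DimensionMismatchError extends Error {
  expected: number;
  actual: number;

  constructor(expected: number, actual: number) {
    super(`Embedding dimension mismatch: expected ${expected}, found ${actual}`);
    this.name = 'DimensionMismatchError';
    this.expected = expected;
    this.actual = actual;
  }
}

/**
 * Compares the embedder's dimensions with a sample vector from the collection.
 * Pass null as the sample when the collection is empty.
 * Returns the dimensions the provider should store for later validation.
 */
function checkDimensions(embedderDimensions: number, sampleVector: number[] | null): number {
  if (sampleVector !== null && sampleVector.length !== embedderDimensions) {
    throw new DimensionMismatchError(embedderDimensions, sampleVector.length);
  }
  return embedderDimensions;
}
```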
add():
- Extract IDs, embeddings, metadatas, and documents from chunks
- Validate that all chunks have vectors
- Use Chroma's upsert() method (automatic upsert behavior)
- Handle errors with KnowledgeProviderError
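As a sketch of the extraction and validation steps in add(), the helper below turns chunks into the parallel arrays that upsert() takes. The Chunk shape, the toUpsertPayload name, and the local KnowledgeProviderError class are all assumptions for illustration; the real types in toolpack-knowledge may differ.

```typescript
// Hypothetical stand-ins for the package's own types.
type MetadataValue = string | number | boolean;

interface Chunk {
  id: string;
  content: string;
  metadata: Record<string, MetadataValue>;
  vector?: number[];
}

class KnowledgeProviderError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'KnowledgeProviderError';
  }
}

interface UpsertPayload {
  ids: string[];
  embeddings: number[][];
  metadatas: Record<string, MetadataValue>[];
  documents: string[];
}

// Splits chunks into parallel arrays and rejects any chunk without a vector.
function toUpsertPayload(chunks: Chunk[]): UpsertPayload {
  const payload: UpsertPayload = { ids: [], embeddings: [], metadatas: [], documents: [] };
  for (const chunk of chunks) {
    if (!chunk.vector || chunk.vector.length === 0) {
      throw new KnowledgeProviderError(`Chunk "${chunk.id}" has no vector`);
    }
    payload.ids.push(chunk.id);
    payload.embeddings.push(chunk.vector);
    payload.metadatas.push(chunk.metadata);
    payload.documents.push(chunk.content);
  }
  return payload;
}
```

The resulting object would then be passed to the collection's upsert() call, wrapped in a try/catch that rethrows as KnowledgeProviderError.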
query():
- Build where clause for metadata filtering
- Use Chroma's query() method with query embeddings
- Specify nResults (limit)
- Include documents, metadatas, distances, and optionally embeddings
- Convert Chroma's distance to similarity score (1 - distance)
- Filter results by threshold
- Return formatted QueryResult array
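The score conversion and threshold filtering could be sketched as below. The response type is a simplified assumption about what query() returns for a single query embedding (one inner array per query), and the QueryResult shape is hypothetical. With cosine distance, 1 - distance is the cosine similarity.

```typescript
// Simplified, assumed shape of a Chroma query response (one inner array per query embedding).
interface ChromaQueryResponse {
  ids: string[][];
  distances: (number | null)[][] | null;
  documents: (string | null)[][] | null;
  metadatas: (Record<string, unknown> | null)[][] | null;
}

// Hypothetical result shape; the real QueryResult in toolpack-knowledge may differ.
interface QueryResult {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
  score: number;
}

// Maps the first query's results to QueryResult, dropping entries below the threshold.
function toQueryResults(response: ChromaQueryResponse, threshold = 0): QueryResult[] {
  const ids = response.ids[0] ?? [];
  const distances = response.distances?.[0] ?? [];
  const documents = response.documents?.[0] ?? [];
  const metadatas = response.metadatas?.[0] ?? [];

  const results: QueryResult[] = [];
  ids.forEach((id, i) => {
    const distance = distances[i];
    if (distance === undefined || distance === null) return; // no distance, no score
    const score = 1 - distance; // distance -> similarity
    if (score < threshold) return;
    results.push({ id, content: documents[i] ?? '', metadata: metadatas[i] ?? {}, score });
  });
  return results;
}
```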
delete():
- Use Chroma's delete() method with array of IDs
- Handle empty arrays gracefully
clear():
- Delete the entire collection
- Recreate it with same configuration
- Reset stored dimensions
getAllChunks():
- Use Chroma's get() method to fetch all data
- Include documents, metadatas, and embeddings
- Map results to Chunk format
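The mapping step in getAllChunks() might look like this. The response type is an assumed, simplified view of what get() returns (flat parallel arrays), and StoredChunk is a hypothetical stand-in for the package's Chunk type.

```typescript
// Assumed, simplified shape of a Chroma get() response (flat parallel arrays).
interface ChromaGetResponse {
  ids: string[];
  embeddings: number[][] | null;
  documents: (string | null)[] | null;
  metadatas: (Record<string, unknown> | null)[] | null;
}

// Hypothetical stand-in for the package's Chunk type.
interface StoredChunk {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
  vector?: number[];
}

// Zips the parallel arrays back into one object per stored chunk.
function toChunks(response: ChromaGetResponse): StoredChunk[] {
  return response.ids.map((id, i) => {
    const chunk: StoredChunk = {
      id,
      content: response.documents?.[i] ?? '',
      metadata: response.metadatas?.[i] ?? {},
    };
    const vector = response.embeddings?.[i];
    if (vector) chunk.vector = vector; // only present when embeddings were included
    return chunk;
  });
}
```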
close():
- Clean up resources (Chroma client doesn't require explicit cleanup)
- Set collection reference to null
4. Metadata Filtering
Implement buildWhereClause() helper to convert MetadataFilter to Chroma format:
- Simple equality: {key: value}
- $in operator: {key: {$in: [values]}}
- $gt operator: {key: {$gt: value}}
- $lt operator: {key: {$lt: value}}
- Chroma natively supports these operators
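A minimal sketch of buildWhereClause() follows. The MetadataFilter shape is an assumption based on the operators listed above. Because the operator syntax already matches, each condition passes through unchanged. One assumption worth verifying against the client version in use: as far as I know, Chroma expects a where clause with several fields to be combined under $and, so the sketch wraps multiple conditions that way.

```typescript
// Hypothetical filter types modelled on the operators listed above.
type Primitive = string | number | boolean;
type FilterCondition = Primitive | { $in: Primitive[] } | { $gt: number } | { $lt: number };
type MetadataFilter = Record<string, FilterCondition>;
type WhereClause = Record<string, unknown>;

// Converts a MetadataFilter into a Chroma-style where clause.
// Returns undefined when there is nothing to filter on.
function buildWhereClause(filter?: MetadataFilter): WhereClause | undefined {
  if (!filter) return undefined;

  // One single-key condition per filter entry; operators pass through unchanged.
  const conditions: WhereClause[] = Object.entries(filter).map(
    ([key, condition]) => ({ [key]: condition }),
  );

  if (conditions.length === 0) return undefined;
  if (conditions.length === 1) return conditions[0];
  // Assumption: multiple conditions must be combined explicitly with $and.
  return { $and: conditions };
}
```

For example, buildWhereClause({ source: 'docs', year: { $gt: 2020 } }) produces an $and clause with one entry per field.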
5. Multi-Tenancy Support
Leverage Chroma's built-in multi-tenancy:
- Use tenant parameter to isolate data by user/organization
- Use database parameter for additional isolation
- Same collection name can exist in different tenants
- Enables efficient multi-user deployments
6. Export from Index
File: packages/toolpack-knowledge/src/index.ts
Add exports for ChromaProvider and ChromaProviderOptions.
7. Add Dependencies
File: packages/toolpack-knowledge/package.json
Add dependency:
- chromadb - Official Chroma JavaScript client
8. Testing
File: packages/toolpack-knowledge/src/__tests__/chroma-provider.test.ts
Create comprehensive tests (requires running Chroma instance):
- Add and retrieve chunks - Basic CRUD operations
- Metadata filtering - Test equality, $in, $gt, $lt operators
- Dimension validation - Test mismatch detection
- Delete operations - Test chunk deletion
- Upsert behavior - Test updating existing chunks (automatic in Chroma)
- Tenant isolation - Test multi-tenancy with same collection name
- Error handling - Test failure scenarios
Use the CHROMA_URL environment variable for the test server connection.
Skip tests if Chroma is not available.
9. Documentation
File: packages/toolpack-knowledge/README.md
Add section on Chroma setup:
- Docker setup instructions
- Local installation guide
- Connection configuration examples
- Multi-tenancy patterns
- Authentication setup
10. Docker Compose Example
File: packages/toolpack-knowledge/examples/docker-compose.yml
Provide Docker Compose configuration for:
- Chroma service with persistent volume
- Health check configuration
- Port mapping
- Environment variables
- Volume management
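Acceptance Criteria
- ChromaProvider implements all KnowledgeProvider methods
- Metadata filtering supports the $in, $gt, and $lt operators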
Describe alternatives you've considered
- Chroma Python client with bridge: Use Python client via child process - rejected because it adds complexity and latency
- REST API directly: Implement REST calls without SDK - rejected because the official SDK is well-maintained
- Embedded Chroma: Run Chroma in-process - not available in JavaScript
Additional context
- Chroma is designed for ease of use and developer experience
- HNSW index provides excellent performance
- Native multi-tenancy support via tenant/database isolation
- Docker deployment is straightforward
- Good choice for self-hosted deployments without cloud dependencies
- Interoperable with Python applications using Chroma
Dependencies:
- chromadb - Official Chroma JavaScript client
- Requires a running Chroma server (Docker or local)
Related Issues:
- Complements PgVectorProvider for different deployment scenarios
- Provides alternative to cloud-based vector databases
- Enables easy local development with persistence