EKBA is a RAG-style chatbot application based on the open-source OPEA project, providing an enterprise-level QnA service with your specified private data as the context.
A chatbot is one of the most widely adopted use cases for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval-augmented generation (RAG) architecture is quickly becoming the industry standard for chatbot development. It combines the benefits of a knowledge base (via a vector store) and generative models to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge.
RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that generated responses remain factual and current. At the core of this architecture are vector databases, which enable efficient, semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.
This guide describes the deployment process for setting up the complete system, which consists of two steps:
- LLM Inference Service: Provides the language model capabilities for generating responses
- Knowledge Base Services: Handles document processing, storage, and retrieval
Note: By default, EKBA is configured with the knowledge base running on CPU, while LLM inference is deployed on HPU (Habana Gaudi). If you need to use different hardware configurations, please modify the corresponding Docker or Helm configuration files and images accordingly.
Deploy the LLM inference service, which by default runs on Intel Gaudi HPU. For detailed deployment instructions, refer to deployment/llm-serving/README.md. In most cases, however, an LLM inference service is already available, so you only need to configure its endpoint, as sketched below.
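As an illustration, the endpoint is typically passed to the other services through an environment variable before deployment. The variable name `LLM_ENDPOINT` and the connectivity check below are assumptions; use whatever names and paths your Docker Compose or Helm files actually expect.

```bash
# Hypothetical variable name; check your compose/Helm files for the exact key
export LLM_ENDPOINT="http://<llm-service-ip>:<llm-service-port>"

# Optional connectivity check (the path depends on the serving backend in use)
curl -sf "${LLM_ENDPOINT}/v1/models" || echo "LLM endpoint is not reachable yet"
```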
Deploy the knowledge base services, which include all the components needed for document processing, storage, and retrieval. You can choose either Docker Compose (deployment/docker-compose/README.md) or Helm deployment (deployment/helm-charts/README.md); a command sketch follows.
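The exact commands depend on the files in each deployment directory; the following is only a sketch, and the release name, chart path, namespace, and values file (`ekba-values.yaml`, mentioned in the logging section below) are assumptions.

```bash
# Option 1: Docker Compose (run from deployment/docker-compose)
docker compose up -d

# Option 2: Helm (run from deployment/helm-charts; release name, chart path,
# namespace, and values file are placeholders)
helm install ekba <chart-path> -n <your-namespace> -f ekba-values.yaml
```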
To start using the system, first ingest your documents using the dataprep service:
```bash
curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "files=@<your-file-absolute-path>" \
  -F "collection_name=<your-collection-name>" \
  http://<dataprep-service-ip>:<dataprep-service-port>/v1/dataprep
```

Replace the following placeholders:

- `<your-file-absolute-path>`: absolute path of the file you want to ingest
- `<your-collection-name>`: name for your document collection
- `<dataprep-service-ip>`: IP address of your dataprep service
- `<dataprep-service-port>`: port number of your dataprep service
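For example, ingesting a local PDF into a collection might look like the following (all values, including the IP, port, file path, and collection name, are illustrative):

```bash
curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "files=@/home/user/reports/nike-annual-report-2023.pdf" \
  -F "collection_name=financial_reports" \
  http://192.168.1.10:6007/v1/dataprep
```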
For more details about dataprep usage, please refer to src/comps/dataprep/README.md.
You can interact with the EKBA (Enterprise Knowledge Base Assistant) system in two ways:
Access the system through your web browser:
```
http://<ui-service-ip>:<ui-port>   # Default port is 5174; it may vary based on your deployment configuration
```
Alternatively, query the chatqna API directly:

```bash
curl http://<host-ip>:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{
    "messages": "What is the revenue of Nike in 2023?",
    "k": 10,
    "score_threshold": 0.5,
    "top_n": 3,
    "max_tokens": 16384
  }'
```

For Kubernetes/Helm deployments, get the service IP and port for the chatqna service:
```bash
# Get chatqna backend service details
export SERVICE_IP=$(kubectl get svc ekba-chatqna -n <your-namespace> -o jsonpath='{.spec.clusterIP}')
export SERVICE_PORT=$(kubectl get svc ekba-chatqna -n <your-namespace> -o jsonpath='{.spec.ports[0].port}')

# Use the service endpoint
curl http://${SERVICE_IP}:${SERVICE_PORT}/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{
    "messages": "What is deep learning?",
    "k": 10,
    "score_threshold": 0.5,
    "top_n": 3,
    "max_tokens": 16384
  }'
```

API Parameters:
- `messages`: your question
- `k`: number of initial search results (default: 10)
- `score_threshold`: minimum similarity score threshold (default: 0.5)
- `top_n`: number of final results to return (default: 3)
- `max_tokens`: maximum number of tokens in the response (default: 0)
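These parameters can be combined to trade retrieval breadth against answer length. Below is a sketch of a query that retrieves more candidates, keeps more of them after reranking, and caps the response length; the values are illustrative, not recommendations:

```bash
curl http://<host-ip>:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{
    "messages": "Summarize the key findings from the ingested reports.",
    "k": 20,
    "score_threshold": 0.3,
    "top_n": 5,
    "max_tokens": 1024
  }'
```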
For detailed API documentation of specific services, refer to their respective README files.
Each service's documentation includes detailed API specifications and usage examples.
To assist with troubleshooting, you can enable detailed logging for all individual services (dataprep, embedding, reranking, retriever, and llm) by setting `LOG_LEVEL=DEBUG` before deployment:
- For Docker Compose deployment: set it in `set-env.sh` or the `.env` file under the working directory
- For Helm deployment: set it in the `ekba-values.yaml` file
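For the Docker Compose case, the entry could look like the sketch below; which file the variable lives in, and whether it is exported, depends on your setup:

```bash
# In set-env.sh (sourced before bringing the stack up)
export LOG_LEVEL=DEBUG

# Or as a plain entry in the .env file (no "export" keyword)
LOG_LEVEL=DEBUG
```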
Under the `scripts/tester` directory, you can run `docker-compose up` to launch the test-case runner tool, which confirms whether each EKBA service works correctly. The test cases are defined in `tests.json` in the same directory.
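For example (assuming the compose file in `scripts/tester` needs no extra configuration):

```bash
cd scripts/tester
# Runs the test-case runner against the deployed EKBA services,
# using the cases defined in tests.json
docker-compose up
```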