ab-dx/pathway-sample-ps

Pathway Sample PS Submission

Task 1: Dockerising a mock Pathway Application

  • Here, I've set up a simple RAG-based application using Pathway's xpack-llm tooling.
  • The application can retrieve information related to this year's Pathway PS based on the user's query.
  • It exposes an HTTP POST endpoint where a user can ask a question about the Pathway PS and receive a response grounded in the knowledge base.
  • The application lives in the src/ directory, which also hosts its Dockerfile.
  • The Dockerfile is built into a Docker image tagged pw-rag (the tag must be provided at build time; run the build from the directory containing the Dockerfile):
docker build -t pw-rag .
  • The application can then be run with:
docker run -it -p 8011:8011 pw-rag

(port 8011 forwarded from the container to the host here)
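
Once the container is running, the POST endpoint can be queried from any HTTP client. The snippet below is a minimal sketch using only the Python standard library. The route (`/`) and the JSON field name (`prompt`) are assumptions made for illustration, as this README does not state them; adjust both to match the application in src/.

```python
import json
import urllib.request


def build_query_request(question: str,
                        base_url: str = "http://localhost:8011",
                        path: str = "/") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the RAG endpoint.

    Assumptions: `path` and the "prompt" field name are placeholders,
    not taken from the application's source.
    """
    payload = json.dumps({"prompt": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, **kwargs) -> str:
    """Send the question to the running container and return the raw response body."""
    request = build_query_request(question, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

With the container up, `print(ask("What is this year's Pathway PS about?"))` would print whatever body the service returns.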

Bonus

  • I've also created an NGINX service (expects the tag nginx-pathway at build time) that load balances requests across 3 containers running the Pathway application, following the default round-robin strategy.
  • All the containers, along with their port forwarding, can be run using Docker Compose once the images have been built:
docker-compose up
  • Additionally, Kubernetes manifests have been generated (using kompose) so that all the containers can be deployed on a Kubernetes cluster, which can scale according to application needs or any other scaling policy.
  • Load balancing across multiple containers running on a distributed Kubernetes cluster makes this a robust and scalable deployment strategy.
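
To make the round-robin strategy concrete, here is a small Python sketch of how successive requests are assigned to the three application containers in turn. It only illustrates the strategy; it is not NGINX's implementation or the configuration used in this repository, and the backend addresses are hypothetical.

```python
from itertools import cycle
from typing import Iterator, List

# Hypothetical upstream addresses for the three application containers.
BACKENDS: List[str] = ["pw-rag-1:8011", "pw-rag-2:8011", "pw-rag-3:8011"]


def round_robin(backends: List[str]) -> Iterator[str]:
    """Yield backends in a fixed rotating order, one per incoming request."""
    return cycle(backends)


picker = round_robin(BACKENDS)

# Seven consecutive requests: each backend is used in turn, then the cycle repeats.
assignments = [next(picker) for _ in range(7)]
for request_number, backend in enumerate(assignments, start=1):
    print(f"request {request_number} -> {backend}")
```

Every backend receives the same share of requests over time, regardless of how long each request takes.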

Load Balancing Architecture

Task 3: Smart KYC Checker

Overview

  • The application accepts images of documents for the KYC process.
  • It validates the information extracted across all the documents, performing fraud detection.

Capabilities

  • Keeps separate contexts for different users via the user_idx.
  • Validates the provided documents against an incrementally built store of user information (expects Aadhaar and PAN cards, but can be extended to other document types).
  • Reports whether the documents match, the exact fields that caused a mismatch, the level of fraud risk, and interpretable reasons for any rejection.

Technical Details

  • The application is an agentic workflow built in LangGraph.
  • The workflow first preprocesses the image to prepare it for information extraction.
  • Textual information is then extracted. Both OCR-based and multimodal-LLM-based text extraction are implemented; the default is Gemini's multimodal extraction, as Tesseract OCR performs poorly on my machine.
  • Extracted information is then parsed into a standardized UserProfile entity using structured output calls from an LLM.
  • The profile is compared with the previously extracted user details in the long-term memory store (if no previously extracted details are available, the profile is initialized into the store at this step).
  • A ValidationResult object is returned, containing details of how each field of the current document matches the stored profile.

Validation Architecture

Task 4: AI-powered Financial Customer Support

Overview

  • The application can take in user input as a string and display its response accordingly, as per the task's requirements.
  • The application also stores important user details, preferences and interests, which lets it improve and personalize its responses over time.
  • Multi-turn conversations are supported, with the conversation history remembered.
  • The agent only answers finance and customer-related queries, and is designed not to go off-topic.
  • The agent can autonomously query its knowledge base, the internet or other stock-based tools.

Capabilities

  • The application has an extensive knowledge base for agentic Retrieval Augmented Generation (several articles scraped from SEBI, RBI, ICICI, FinMin, HDFC, Nippon India, AMFI India, Axis Bank, PayTM, TinNSDL, NSE India, BSE India, mca.gov.in, Motilal Oswal, Kotak Securities, Deloitte and Economic Times; should be useful for the actual PS itself).
  • The answers are always grounded in the sources.
  • If required, the internet can also be searched for queries.
  • Prebuilt Yahoo Finance tools are also provided to the agent for retrieving stock data, balance sheets, etc.
  • The agent can also connect to external MCP tooling (currently commented out, as Alpha Vantage's MCP server is returning 500s).
  • Can maintain separate contexts for different users via thread configuration.

Technical Details

  • It is essentially a LangGraph application with an agentic workflow.
  • Built with an async-first approach, which can be extended to handle real-time events as per Pathway's original requirement.
  • The graph hosts its internal state, an InMemorySaver checkpointer for short-term memory and an InMemoryStore for long-term memory (as mentioned in the bonus points).
  • The InMemoryStore can easily be swapped out for a persistent option such as a PostgreSQL-backed store, so that user details are retained across conversations.
  • The same holds for the InMemorySaver, which can be replaced with a cache-based solution like Redis.
  • The graph consists of two nodes: one for initial inference, understanding and research, and another that interprets this research in the context of the query, stays within the scope of finance customer support, and replies to the query.
  • Makes use of foundation-model-based agents, bound with tools.
  • I preferred an agentic graph-based workflow here over a ReAct agent for more deterministic behaviour and easier extensibility down the line.
  • The application makes use of modern AI application patterns such as MCP tooling, agentic RAG, and short-term and long-term memory, all guardrailed within the context of finance.
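
The two-node flow and the split between short-term and long-term memory can be pictured with the plain-Python sketch below. It deliberately avoids the LangGraph API: the dicts stand in for the InMemorySaver and InMemoryStore, the keyword check is a crude placeholder for the LLM-based guardrail, and all names (`research_node`, `respond_node`, `run_turn`, `FINANCE_KEYWORDS`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Stand-ins for the checkpointer (per-thread history) and the store (per-user details).
short_term: Dict[str, List[Tuple[str, str]]] = {}  # thread_id -> [(role, text), ...]
long_term: Dict[str, Dict[str, str]] = {}          # user_id -> remembered details

# Hypothetical keyword list; the real agent relies on the model to stay on topic.
FINANCE_KEYWORDS = {"stock", "loan", "mutual fund", "tax", "account", "interest"}


def research_node(state: dict) -> dict:
    """First node: understand the query and gather material for it."""
    query = state["query"].lower()
    on_topic = any(keyword in query for keyword in FINANCE_KEYWORDS)
    # A real node would call the knowledge base, web search or stock tools here.
    findings = [f"(placeholder) notes for: {state['query']}"] if on_topic else []
    return {**state, "on_topic": on_topic, "findings": findings}


def respond_node(state: dict) -> dict:
    """Second node: interpret the research and reply within the finance scope."""
    if not state["on_topic"]:
        answer = "I can only help with finance and customer-support questions."
    else:
        preferences = long_term.get(state["user_id"], {})
        answer = (f"Answer based on {len(state['findings'])} finding(s); "
                  f"known preferences: {preferences}")
    return {**state, "answer": answer}


NODES = [research_node, respond_node]  # fixed order gives predictable control flow


def run_turn(query: str, thread_id: str, user_id: str) -> str:
    """Run one conversation turn, keeping history separate per thread."""
    history = short_term.setdefault(thread_id, [])
    state = {"query": query, "user_id": user_id, "history": list(history)}
    for node in NODES:
        state = node(state)
    history.append(("user", query))
    history.append(("assistant", state["answer"]))
    return state["answer"]
```

Because every turn is keyed by a thread identifier, two users (or two conversations of the same user) never see each other's history, while details kept in the long-term store remain available across threads.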

Agent Architecture

About

Preliminary tasks pertaining to AI Agents implementations
