feat: infrastructure for both local GPU setup and Modal by ArnavBharti · Pull Request #11 · ScaledFocus/masala-embed

ArnavBharti · 2025-09-28T12:09:21Z

This PR adds two scripts: vllm_inference_modal.py and vllm_inference_local.py.

The Modal script sets up the vLLM server on an A100 40GB (configurable). The local script works on an NVIDIA RTX 6000 Ada with CUDA 12.4 (BITS Server).

Both scripts were tested with Qwen/Qwen3-Embedding-0.6B.

Modal deployment was tested using examples in README.

First-time setup involves creating the Modal Volumes and populating the cache; subsequent deployments are fast. The HuggingFace model is pinned to a specific commit rather than tracking main.

After volumes are cached, response time on Modal:


ArnavBharti · 2025-10-18T09:36:10Z

This PR now contains code for the entire program flow:
accept a JSON request -> convert it to embeddings -> run a similarity search using FAISS -> respond with the dish name in JSON format.
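
For reviewers who want the shape of that flow without reading the scripts, here is a minimal, dependency-free sketch. It is not the code in this PR: `toy_embed` is a hypothetical stand-in for the embedding model served by vLLM, the brute-force `search` stands in for a FAISS inner-product index, and the `query`/`dish` field names and the dish list are assumptions made for illustration.

```python
import json
import math

# Hypothetical catalogue; the real dish list is not shown in this PR.
DISHES = ["paneer tikka", "masala dosa", "chicken biryani"]


def toy_embed(text, dim=64):
    """Stand-in for the embedding model: hashed character trigrams, L2-normalised."""
    vec = [0.0] * dim
    padded = "  " + text.lower() + " "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # Stable hash (built-in hash() is salted per process for strings).
        bucket = sum(ord(c) * (31 ** k) for k, c in enumerate(trigram)) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# Built once at startup, analogous to building the vector index.
INDEX = [toy_embed(dish) for dish in DISHES]


def search(query_vec, index):
    """Brute-force inner-product search; returns (best_position, score)."""
    scores = [sum(q * x for q, x in zip(query_vec, row)) for row in index]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def handle_request(body):
    """JSON request -> embedding -> similarity search -> JSON response."""
    request = json.loads(body)
    query_vec = toy_embed(request["query"])
    best, score = search(query_vec, INDEX)
    return json.dumps({"dish": DISHES[best], "score": round(score, 4)})
```

Calling `handle_request('{"query": "masala dosa"}')` returns a JSON string naming the closest dish. Because the vectors are normalised, the inner product here equals cosine similarity, which is the usual reason to pair normalised embeddings with an inner-product index.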

Tested on both Modal and the GPU server.
Right now this runs the Qwen 0.6B embedding model, which is text-only.

ArnavBharti added 2 commits September 28, 2025 17:32

feat: add vllm inference for both modal and local server

9c280b4

fix: linting error

84a4291

ArnavBharti requested a review from NirantK September 28, 2025 12:11

vedjaw and others added 9 commits October 2, 2025 22:18

changed model_name in vllm_inference_local.py

6c86453

Update vllm_inference_local.py

e6d49c1

commented out os.environ["CUDA_VISIBLE_DEVICES"] = "0" line.

commented a single line

1785303

Merge branch 'infra' of https://github.com/ScaledFocus/masala-embed i…

bd5c837

…nto infra

chore: lint error fix

0d64ab3

Local GPU full workflow

03efd14

refactor: local setup and convert it to modal

54802a9

fix: off by one error in index

b1b7abc

chore: lint fixes

04cd225

ArnavBharti changed the title ~~feat: add vllm inference for both modal and local server~~ feat: infrastructure for both local GPU setup and Modal Oct 18, 2025

ArnavBharti closed this Nov 20, 2025

ArnavBharti wants to merge 11 commits into main from infra
