It would be helpful to add a package like [xllamacpp](https://github.com/xorbitsai/xllamacpp) to enable local VLM inference rather than relying solely on the Google API. The xllamacpp package supports [Vulkan](https://xorbitsai.github.io/xllamacpp/whl/vulkan) and MPS inference as well as [CUDA](https://xorbitsai.github.io/xllamacpp/whl/cu128). Sample inference code can be reviewed [here](https://github.com/xorbitsai/xllamacpp/blob/f83e5fcd8007f6f2368cc469010fee9d28971863/tests/test_server.py#L295).
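As a rough illustration of what the integration could look like: xllamacpp embeds the llama.cpp server, which speaks an OpenAI-compatible chat API, so a multimodal request could be built as below. This is only a sketch under assumptions, not xllamacpp's actual API — the endpoint URL, host, and port are illustrative, and the payload follows the generic OpenAI-style multimodal chat format rather than anything confirmed from the linked test file.

```python
import base64
import json

# Assumed endpoint: an xllamacpp-hosted llama.cpp server exposing the
# OpenAI-compatible chat-completions route. Host/port are placeholders.
BASE_URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_vlm_request(image_bytes: bytes, prompt: str) -> dict:
    """Build an OpenAI-style multimodal chat payload with the image
    embedded as a base64 data URL."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{image_b64}"
                        },
                    },
                ],
            }
        ],
        "max_tokens": 128,
    }


payload = build_vlm_request(b"\x89PNG...", "Describe this image.")
body = json.dumps(payload)  # would be POSTed to BASE_URL by the client
```

This keeps the calling code backend-agnostic: the same payload could be sent to a local xllamacpp server or any other OpenAI-compatible endpoint, so switching between local inference and the Google API becomes a configuration choice rather than a code change.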