Yep, you can always OCR -> chunk -> annotate the chunks with metadata attached to the documents: where they came from, creation timestamp, relationships to other docs captured via eventEffects, etc.
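The OCR -> chunk -> annotate step above could be sketched roughly like this (the `Chunk` class, chunk size, and metadata keys are all illustrative choices, not a fixed schema):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_document(ocr_text: str, source: str, created_at: str,
                   chunk_size: int = 200) -> list[Chunk]:
    """Split OCR output into fixed-size chunks and attach provenance metadata."""
    chunks = []
    for i in range(0, len(ocr_text), chunk_size):
        chunks.append(Chunk(
            text=ocr_text[i:i + chunk_size],
            # Provenance metadata travels with each chunk so retrieval
            # results can be traced back to the original document.
            metadata={"source": source, "created_at": created_at, "offset": i},
        ))
    return chunks
```

Relationships to other docs (the eventEffects part) would be additional metadata entries, e.g. a list of related document IDs per chunk.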

On the vision-model side (à la ColPali), you can start with single-vector models like https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct [there is also a 2B variant] and pass their embeddings in via CustomSpace in superlinked (or see if you can figure out how to use the ImageSpace to run it within your superlinked server). Of course, being single-vector, the quality is reduced - so make sure to retrieve more candidates and provide them to the vision-capable model (e.g. GPT-4o) that you use to answer.
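The "retrieve more candidates" idea is just over-fetching by cosine similarity before handing pages to the answering model. A minimal sketch, assuming the vectors were precomputed by a single-vector model such as gme-Qwen2-VL (toy 2-d vectors stand in for real embeddings here):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_candidates(query_vec: list[float],
                     doc_vecs: list[list[float]],
                     k: int) -> list[int]:
    """Return indices of the k most similar document vectors.

    Because a single vector per page loses detail, k should be set
    larger than you would for a multi-vector retriever; the surplus
    candidates are then re-ranked by the vision-capable LLM.
    """
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]
```

In superlinked you would instead register the precomputed vectors in a CustomSpace and let its index do this search; the sketch only shows the over-fetch-then-rerank shape of the pipeline.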

Answer selected by octalpixel