5 changes: 5 additions & 0 deletions docs/src/index.md
@@ -208,3 +208,8 @@ You can now use these pids to see which documents match the best against your qu
julia> print(readlines("1kcollection.txt")[pids[1]])
Tl;dr - Yes, it sounds like a possible 1080 fox bait poisoning. Can't be sure though. The traditional fox bait is called 1080. That poisonous bait is still used in a few countries to kill foxes, rabbits, possums and other mammal pests. The toxin in 1080 is Sodium fluoroacetate. Wikipedia is a bit vague on symptoms in animals, but for humans they say: In humans, the symptoms of poisoning normally appear between 30 minutes and three hours after exposure. Initial symptoms typically include nausea, vomiting and abdominal pain; sweating, confusion and agitation follow. In significant poisoning, cardiac abnormalities including tachycardia or bradycardia, hypotension and ECG changes develop. Neurological effects include muscle twitching and seizures... One might safely assume a dog, especially a small Whippet, would show symptoms of poisoning faster than the 30 mins stated for humans. The listed (human) symptoms look like a good fit to what your neighbour reported about your dog. Strychnine is another commonly used poison against mammal pests. It affects the animal's muscles so that contracted muscles can no longer relax. That means the muscles responsible of breathing cease to operate and the animal suffocates to death in less than two hours. This sounds like unlikely case with your dog. One possibility is unintentional pet poisoning by snail/slug baits. These baits are meant to control a population of snails and slugs in a garden. Because the pelletized bait looks a lot like dry food made for dogs it is easily one of the most common causes of unintentional poisoning of dogs. The toxin in these baits is Metaldehyde and a dog may die inside four hours of ingesting these baits, which sounds like too slow to explain what happened to your dog, even though the symptoms of this toxin are somewhat similar to your case. Then again, the malicious use of poisons against neighbourhood dogs can vary a lot. 
In fact they don't end with just pesticides but also other harmful matter, like medicine made for humans and even razorblades stuck inside a meatball, have been found in baits. It is quite impossible to say what might have caused the death of your dog, at least without autopsy and toxicology tests. The 1080 is just one of the possible explanations. It is best to always use a leash when walking dogs in populated areas and only let dogs free (when allowed by local legislation) in unpopulated parks and forests and suchlike places.
```
---

## Tutorials

- [Basic Retrieval Example](tutorials/basic_retrieval.md)
38 changes: 38 additions & 0 deletions docs/src/tutorials/basic_retrieval.md
@@ -0,0 +1,38 @@
# Basic Retrieval Example

This tutorial demonstrates how to use ColBERT.jl for simple document retrieval.

---

## Step 1: Prepare Dataset

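The dataset should be a tab-separated (TSV) file with one document per line:

doc_id \t title \t body

Example (the fields on each line are separated by a single tab character):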

1 Deep Learning Neural networks are powerful
2 Machine Learning Supervised learning is common
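
Because tabs are easy to lose when copying text, it can help to generate the file programmatically. The following is a minimal sketch, assuming the file name `sample.tsv` used in Step 2; the `rows` variable is only illustrative and not part of ColBERT.jl.

```julia
# Illustrative sample rows: (doc_id, title, body).
rows = [
    ("1", "Deep Learning", "Neural networks are powerful"),
    ("2", "Machine Learning", "Supervised learning is common"),
]

open("sample.tsv", "w") do io
    for row in rows
        # join places exactly one tab between neighbouring fields
        println(io, join(row, '\t'))
    end
end

# Sanity check: every line should split into exactly three fields.
for line in readlines("sample.tsv")
    @assert length(split(line, '\t')) == 3
end
```

Writing the separator explicitly as `'\t'` avoids editors that silently convert tabs to spaces.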

---

## Step 2: Build Index

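```julia
using ColBERT

# Point the config at the TSV collection and choose where the index is written.
config = ColBERTConfig(
    collection = "sample.tsv",
    index_path = "index"
)

# Create the indexer (this loads the checkpoint named in the config),
# then build the index.
indexer = Indexer(config)
index(indexer)
```
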
## Step 3: Retrieval

After building the index, it can be used for efficient document retrieval.

The querying interface depends on the installed version of ColBERT.jl.
Refer to the latest examples in the repository for how to run a search.
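
Once a search returns passage ids, they can be mapped back to document text. The sketch below assumes that pids are 1-based positions in the indexed collection (as in the `readlines(...)[pids[1]]` lookup on the main documentation page) and that lines with fewer than two tab-separated fields are skipped at indexing time, so a pid may not equal the raw line number in the TSV file. `load_documents` is a hypothetical helper, not part of ColBERT.jl, and the `pids` values are placeholders rather than real search output.

```julia
# Hypothetical helper mirroring the TSV handling assumed above: drop the
# doc_id column, skip lines with fewer than two fields, join the rest.
function load_documents(path::AbstractString)
    docs = String[]
    for line in eachline(path)
        parts = split(line, '\t')
        length(parts) < 2 && continue
        push!(docs, String(strip(join(parts[2:end], " "))))
    end
    return docs
end

# Small demo collection; the middle line has no tab and is skipped.
write("demo.tsv",
    "1\tDeep Learning\tNeural networks are powerful\n" *
    "not a valid line\n" *
    "2\tMachine Learning\tSupervised learning is common\n")

docs = load_documents("demo.tsv")

# Placeholder pids standing in for real search results.
pids = [2, 1]
for pid in pids
    println(pid, " => ", docs[pid])
end
```

Rebuilding the document list with the same filtering keeps the lookup aligned with the index even when the file contains malformed lines.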
24 changes: 22 additions & 2 deletions src/indexing.jl
@@ -21,12 +21,32 @@ Type representing an ColBERT indexer.
An [`Indexer`] wrapping a [`ColBERTConfig`](@ref) along with the trained ColBERT
model.
"""
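function parse_tsv_line(line::AbstractString)
    parts = split(line, '\t')

    # Skip lines that lack at least a doc_id and one text field
    if length(parts) < 2
        return nothing
    end

    # Drop the doc_id, then combine title, body, and any extra fields.
    # Convert to String so the collection does not hold SubStrings.
    return String(strip(join(parts[2:end], " ")))
end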
function Indexer(config::ColBERTConfig)
    tokenizer, bert, linear = load_hgf_pretrained_local(config.checkpoint)
    bert = bert |> Flux.gpu
    linear = linear |> Flux.gpu
    collection = config.collection isa String ? readlines(config.collection) :
                 config.collection
    collection =
        if config.collection isa String
            lines = readlines(config.collection)

            [
                doc for doc in (parse_tsv_line(line) for line in lines)
                if doc !== nothing
            ]
        else
            config.collection
        end
    punctuations_and_padsym = [string.(collect("!\"#\$%&\'()*+,-./:;<=>?@[\\]^_`{|}~"));
                               tokenizer.padsym]
    skiplist = config.mask_punctuation ?