scrabble-reading

This project is an exploration of digitising Scrabble boards with SAM 3 and Perception Encoder.

The code is not complete, but is a starting point for thinking about how to use zero-shot models for columnar/row-wise OCR.

Here is an example (example.png) showing an input image with the SAM 3 segmentation masks and Perception Encoder labels laid on top:

The notebook processes the detections to return a text version of the game board:

BOARD:

- - - - - - - - - - b r i n e
t * j - - - - - - z e e - - -
- m e h - - - - r - n - - - -
v i t a - - - w e n t - b - -
- - - y e a - i - - s k i d -
- - l - t - - r - - - - - e f
- - - - c - - e - - - - g e l
c - x - h a n d y - a - - - i
o - - - e - - - - - u - - - p
l - - - r - - - n o d - q i s
t a p a s - - - - - i - - - -
s - i - - - - v - r o a m - -
- - n o g - - o - t - d a g -
- - - f - - - l i - - - w - -
- - - - r o u e - - - - - - -

STATS:

# of unique words: 37
len() of longest words: 7
longest word(s): etchers

len() of shortest words: 2
shortest word(s): li, ef, of, rt, bi, re, ad
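The stats above can be reproduced from the text board with a few lines of Python. The following is a minimal sketch, not the notebook's actual code: the function names are hypothetical, and it assumes that `-` marks an empty cell and that a word is any run of two or more adjacent tiles read across a row or down a column. How the notebook treats the `*` cell is not stated, so this sketch treats any cell other than `-` as a letter.

```python
from typing import List

EMPTY = "-"  # marker for an empty cell, as in the board printout above


def parse_board(text: str) -> List[List[str]]:
    """Turn a space-separated board printout into a list of rows."""
    return [line.split() for line in text.strip().splitlines() if line.strip()]


def runs(cells: List[str]) -> List[str]:
    """Return every run of two or more consecutive non-empty cells."""
    words, current = [], ""
    for cell in cells + [EMPTY]:  # the trailing sentinel flushes the last run
        if cell != EMPTY:
            current += cell
        else:
            if len(current) >= 2:
                words.append(current)
            current = ""
    return words


def board_words(board: List[List[str]]) -> set:
    """Collect unique words reading across rows and down columns."""
    found = set()
    for row in board:
        found.update(runs(row))
    for column in zip(*board):  # assumes every row has the same length
        found.update(runs(list(column)))
    return found


def stats(words: set) -> dict:
    """Summarise a set of words in the same shape as the STATS block."""
    longest = max(len(w) for w in words)
    shortest = min(len(w) for w in words)
    return {
        "unique": len(words),
        "longest": sorted(w for w in words if len(w) == longest),
        "shortest": sorted(w for w in words if len(w) == shortest),
    }


# A small made-up board fragment, used only to exercise the functions.
sample = """
- - b r i n e
- - e - - - -
- - n o d - -
- - t - - - -
"""
sample_stats = stats(board_words(parse_board(sample)))
```

On the sample fragment this finds `brine` and `nod` across and `bent` down. Note that this only finds runs of tiles; it does not check them against a dictionary, which is consistent with entries such as `rt` appearing in the output above.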

How it Works

This project:

1. Uses SAM 3 to find the Scrabble game board in an image.
2. Crops the game board.
3. Uses SAM 3 to find all Scrabble letters.
4. Crops all letters.
5. Uses Perception Encoder to classify each letter.
6. Maps each letter classification to its corresponding segmentation mask.
7. Uses the bounding boxes corresponding to each segmentation mask to create a 15x15 grid.
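The final step, turning bounding boxes into a grid, can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's implementation: it assumes the boxes are in the pixel coordinates of the cropped board, that the crop is a straight bird's-eye view, and that each classified tile arrives as a hypothetical `(x_min, y_min, x_max, y_max, letter)` tuple. Each tile is placed in the cell that contains the centre of its box.

```python
from typing import List, Tuple

GRID_SIZE = 15  # a standard Scrabble board is 15x15

# Hypothetical detection format: box corners in board-crop pixels plus the letter.
Detection = Tuple[float, float, float, float, str]


def build_grid(
    detections: List[Detection],
    board_width: float,
    board_height: float,
    empty: str = "-",
) -> List[List[str]]:
    """Place each letter in the grid cell that contains its box centre."""
    cell_w = board_width / GRID_SIZE
    cell_h = board_height / GRID_SIZE
    grid = [[empty] * GRID_SIZE for _ in range(GRID_SIZE)]
    for x0, y0, x1, y1, letter in detections:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        # Clamp so a box touching the crop edge still lands on the board.
        col = min(GRID_SIZE - 1, max(0, int(cx // cell_w)))
        row = min(GRID_SIZE - 1, max(0, int(cy // cell_h)))
        grid[row][col] = letter
    return grid


# Made-up detections on a 750x750 crop, so each cell is 50x50 pixels.
detections = [
    (500, 0, 550, 50, "b"),
    (550, 0, 600, 50, "r"),
    (500, 50, 550, 100, "e"),
]
grid = build_grid(detections, board_width=750, board_height=750)
print(" ".join(grid[0]))  # - - - - - - - - - - b r - - -
```

Using the box centre rather than a corner makes the assignment tolerant of boxes that are slightly larger or smaller than a cell. If two detections fall in the same cell, the later one overwrites the earlier one in this sketch.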

Limitations

This project does not employ perspective correction. Instead, my notebook manually calculates how many rows there are and divides the board into rows. This can introduce errors when the image is not taken straight on. I may experiment with perspective correction in the future, but I was curious how far I could get just by assuming that all tiles are the same height in a bird's-eye image.
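To make the equal-height assumption concrete, here is one way the row count could be estimated without perspective correction. It is a sketch only: the function names are hypothetical, and using the median tile height is my assumption about a reasonable estimator, not necessarily what the notebook does.

```python
from statistics import median
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def estimate_rows(boxes: List[Box], board_height: float) -> Tuple[int, float]:
    """Estimate the row count and row height from the median tile height."""
    tile_height = median(y1 - y0 for _, y0, _, y1 in boxes)
    n_rows = max(1, round(board_height / tile_height))
    return n_rows, board_height / n_rows


def row_index(box: Box, row_height: float, n_rows: int) -> int:
    """Return the row whose band contains the vertical centre of the box."""
    cy = (box[1] + box[3]) / 2
    return min(n_rows - 1, max(0, int(cy // row_height)))


# Made-up boxes with slightly different heights (48, 50 and 52 pixels).
boxes = [(0, 0, 50, 48), (0, 50, 50, 100), (0, 100, 50, 152)]
n_rows, row_height = estimate_rows(boxes, board_height=750)
third_row = row_index(boxes[2], row_height, n_rows)
```

The median keeps a few badly sized boxes from skewing the estimate. In an angled photo, though, tiles further from the camera appear shorter, so a single row height no longer fits the whole board, which is the source of the errors described above.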

This project was designed to process bird's-eye views of Scrabble boards. Angled images, or images with missing rows or columns, will not work.

I think SAM 3 can only return 100 detections at one time. This means that if there are more than 100 tiles on a board, some tiles will be missed.
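Since a truncated result would silently drop tiles, a small guard can at least flag when it may have happened. This is a minimal sketch: the cap of 100 is taken from the note above and is not verified, and `may_be_truncated` is a hypothetical helper, not part of SAM 3 or any Roboflow library.

```python
from typing import Sequence

MAX_DETECTIONS = 100  # assumed cap, taken from the note above; not verified


def may_be_truncated(detections: Sequence, cap: int = MAX_DETECTIONS) -> bool:
    """Return True when the detection count has reached the assumed cap."""
    return len(detections) >= cap
```

A caller could print a warning when this returns `True`, since a full result may mean that some tiles were left out.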

API key

This project needs a Roboflow API key to work.
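One common way to supply a key without writing it into the notebook is an environment variable. The sketch below assumes a variable named `ROBOFLOW_API_KEY`; that name and the `load_api_key` helper are hypothetical, so adjust them to match however the notebook expects the key to be provided.

```python
import os
from typing import Mapping

ENV_VAR = "ROBOFLOW_API_KEY"  # hypothetical variable name, not confirmed by the notebook


def load_api_key(env: Mapping[str, str] = os.environ, name: str = ENV_VAR) -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable to your Roboflow API key."
        )
    return key
```

Calling `load_api_key()` at the top of the notebook fails fast with a clear message, and keeps the key out of version control.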

License

This repository is licensed under the MIT License. See the SAM 3 repository and Perception Encoder repository for their respective licenses.
