Code and resources for the article “Enhancing Technical Question Answering Quality through Multimodal Document Segmentation”.
This module provides functionality for analyzing the structure of documents (images and PDFs) using computer vision and natural language processing. The main classes are LayoutExtractor (structure analysis) and ImageDescription (document element description).
A class for extracting structural elements from documents and processing them.
- Detecting bounding boxes of document elements using YOLOv10
- Processing PDFs and images
- Merging duplicate and overlapping bounding boxes
- Linking related elements (e.g., images with captions)
- Encoding images in base64
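The base64 step in the last bullet can be done with the standard library alone; a minimal sketch (not the module's exact helper):

```python
import base64
from pathlib import Path

def encode_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes into an ASCII string."""
    return base64.b64encode(data).decode("utf-8")

def encode_image(path: str) -> str:
    """Read an image file from disk and return its contents base64-encoded,
    e.g. for embedding in an API request payload."""
    return encode_bytes(Path(path).read_bytes())
```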
Key methods:
- get_bboxes(): main method for retrieving bounding boxes
- merge_duplicated(): merges duplicate bounding boxes
- _find_closest_bboxes(): finds related elements (e.g., image–caption pairs)
- _merge_related_bboxes(): merges related elements
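A common way to merge duplicate detections is to compare boxes by intersection-over-union (IoU). The sketch below illustrates the idea behind merge_duplicated(); the greedy strategy and the 0.8 threshold are illustrative assumptions, not the repository's actual implementation:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def merge_duplicates(boxes, threshold=0.8):
    """Greedily absorb any box whose IoU with an already-kept box
    exceeds the threshold, expanding the kept box to cover both."""
    merged = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if iou(box, kept) > threshold:
                merged[i] = (min(kept[0], box[0]), min(kept[1], box[1]),
                             max(kept[2], box[2]), max(kept[3], box[3]))
                break
        else:
            merged.append(box)
    return merged
```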
Detected element types:
- Titles, body text
- Images, tables, formulas
- Captions for images/tables/formulas
- Table footnotes
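Linking an image to its caption (as _find_closest_bboxes() does) usually reduces to a proximity heuristic over the boxes above. The sketch below uses one hypothetical rule, picking the horizontally overlapping caption nearest below the image, and is an illustration rather than the repository's method:

```python
def find_closest_caption(image_box, caption_boxes, max_gap=50):
    """Return the caption box closest below image_box, or None.
    A candidate must horizontally overlap the image and start within
    max_gap pixels of the image's bottom edge (assumed heuristic)."""
    x1, y1, x2, y2 = image_box
    best, best_gap = None, max_gap
    for cap in caption_boxes:
        cx1, cy1, cx2, cy2 = cap
        gap = cy1 - y2  # vertical distance from image bottom to caption top
        overlaps = min(x2, cx2) > max(x1, cx1)  # horizontal overlap check
        if overlaps and 0 <= gap < best_gap:
            best, best_gap = cap, gap
    return best
```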
A class for describing document elements using language models.
- Generating text descriptions of document elements
- Recognizing text within bounding boxes
- Supporting both API mode (via OpenAI) and local models (transformers)
- Handling both individual bounding boxes and arbitrary regions
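In API mode, a cropped element is typically sent to a vision-capable chat model with the image inlined as a base64 data URL. A sketch of an OpenAI-style message payload (the prompt text and PNG format are assumptions; the actual request would go through the OpenAI client, e.g. client.chat.completions.create):

```python
def build_vision_message(prompt: str, image_b64: str) -> list:
    """Build a chat message in the OpenAI vision format, pairing a text
    instruction with a base64-encoded image passed as a data URL."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }]
```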
Key methods:
- inference(): main method for obtaining an element description
- _parse_json(): post-processes model output
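Language models often wrap their JSON answer in prose or a markdown code fence, so post-processing like _parse_json() has to tolerate that. A minimal sketch of the idea, assuming the response contains a single JSON object:

```python
import json
import re

def parse_json(raw: str):
    """Extract the first JSON object from a model response, tolerating
    surrounding prose or markdown fences. Returns None if nothing parses."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```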
See run.py and /example/segmentation_example.ipynb for usage examples.