SheetReader

SheetReader is a fast, memory-efficient spreadsheet parser for tabular data from Excel OOXML (.xlsx) files, implemented in C++. Unlike many existing spreadsheet loaders, which rely on general-purpose XML parsers, SheetReader is designed specifically for spreadsheet data. By exploiting the fixed structure of .xlsx files, using parallelism at multiple levels, and managing memory carefully, it avoids unnecessary XML overhead and enables efficient ingestion of spreadsheet data.
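To illustrate what "exploiting the fixed structure of .xlsx files" can mean in practice, here is a minimal sketch of scanning worksheet XML for cells with plain substring search instead of a general-purpose XML parser. It is not SheetReader's actual implementation. The names `Cell`, `parseCellRef`, and `scanCells` are hypothetical, and the sketch assumes simplified input: every cell is written as `<c r="A1" ...><v>...</v></c>` with `r` as the first attribute, no self-closing cells, and well-formed references. Parallelism, decompression, and shared-string lookup are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// One parsed cell: zero-based column and row, plus the raw text of its <v> element.
// For cells marked t="s" the raw text is an index into the shared strings table,
// which this sketch does not resolve.
struct Cell {
    std::uint32_t column;
    std::uint32_t row;
    std::string value;
};

// Convert an A1-style reference such as "B10" into zero-based (column, row).
// Column letters form a bijective base-26 number (A=1, ..., Z=26, AA=27).
// Assumes a well-formed reference: uppercase letters followed by digits.
std::pair<std::uint32_t, std::uint32_t> parseCellRef(std::string_view ref) {
    std::uint32_t column = 0;
    std::uint32_t row = 0;
    std::size_t i = 0;
    for (; i < ref.size() && ref[i] >= 'A' && ref[i] <= 'Z'; ++i) {
        column = column * 26 + static_cast<std::uint32_t>(ref[i] - 'A' + 1);
    }
    for (; i < ref.size() && ref[i] >= '0' && ref[i] <= '9'; ++i) {
        row = row * 10 + static_cast<std::uint32_t>(ref[i] - '0');
    }
    return {column - 1, row - 1};
}

// Scan a fragment of worksheet XML and collect every cell that has a <v> value.
// Only the fixed markers "<c r=\"", "<v>", "</v>", and "</c>" are searched for;
// no DOM is built and no other elements or attributes are interpreted.
std::vector<Cell> scanCells(std::string_view xml) {
    constexpr std::string_view cellOpen = "<c r=\"";
    constexpr std::string_view valueOpen = "<v>";
    constexpr std::string_view valueClose = "</v>";
    constexpr std::string_view cellClose = "</c>";
    constexpr std::size_t npos = std::string_view::npos;

    std::vector<Cell> cells;
    std::size_t pos = 0;
    while ((pos = xml.find(cellOpen, pos)) != npos) {
        const std::size_t refBegin = pos + cellOpen.size();
        const std::size_t refEnd = xml.find('"', refBegin);
        if (refEnd == npos) break;
        const std::size_t cellEnd = xml.find(cellClose, refEnd);
        if (cellEnd == npos) break;

        // Look for a value element that lies inside this cell.
        const std::size_t valueBegin = xml.find(valueOpen, refEnd);
        if (valueBegin != npos && valueBegin < cellEnd) {
            const std::size_t textBegin = valueBegin + valueOpen.size();
            const std::size_t textEnd = xml.find(valueClose, textBegin);
            if (textEnd != npos && textEnd <= cellEnd) {
                const auto position = parseCellRef(xml.substr(refBegin, refEnd - refBegin));
                cells.push_back(Cell{position.first, position.second,
                                     std::string(xml.substr(textBegin, textEnd - textBegin))});
            }
        }
        pos = cellEnd + cellClose.size();
    }
    return cells;
}
```

Calling `scanCells` on a decompressed worksheet fragment returns the cells in document order. Because each cell carries its own reference, a scan like this could in principle be run on separate chunks of the document independently, which is one way such a format lends itself to parallel parsing.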

Bindings

We provide bindings for several environments:

R: load spreadsheets into data frames. Also available on CRAN.
Python: load spreadsheets into pandas DataFrames. Also available on PyPI.
PostgreSQL FDW: access spreadsheets from PostgreSQL through a foreign data wrapper and combine them with other PostgreSQL tables. Also available on PGXN.
DuckDB extension: access spreadsheets from DuckDB and combine them with other DuckDB tables. Also available as a community extension.

Scientific Background

SheetReader was introduced in the PolyDB research project (polydbms.org). The initial design and evaluation were published in the journal Information Systems. If you use SheetReader in your research, consider citing the following paper:

@article{GavriilidisHZM23,
  author       = {Haralampos Gavriilidis and Felix Henze and Eleni Tzirita Zacharatou and Volker Markl},
  title        = {SheetReader: Efficient Specialized Spreadsheet Parsing},
  journal      = {Inf. Syst.},
  volume       = {115},
  pages        = {102183},
  year         = {2023},
  url          = {https://doi.org/10.1016/j.is.2023.102183}
}

Acknowledgements

SheetReader includes and uses the following C/C++ libraries:

miniz for ZIP archive operations and decompression
libdeflate for optimized full-buffer decompression
fast_double_parser for optimized number parsing

Logo design by Stefanie Lenk.
