polydbms/sheetreader-core

SheetReader

SheetReader is a fast, memory-efficient spreadsheet parser for tabular data from Excel OOXML (.xlsx) files, implemented in C++. Unlike many existing spreadsheet loaders, which rely on general-purpose XML parsers, SheetReader is designed specifically for spreadsheet data. By exploiting the fixed structure of .xlsx files, using parallelism at multiple levels, and managing memory carefully, it avoids unnecessary XML overhead and enables efficient ingestion of spreadsheet data.
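To make the idea of exploiting the fixed structure concrete, here is a minimal sketch of scanning worksheet XML for cell elements with plain substring searches instead of a general-purpose XML parser. It is not SheetReader's actual code: the names `Cell`, `parse_reference`, and `scan_cells` are hypothetical, and the sketch assumes uppercase A1-style references and values stored in `<v>` elements. It ignores shared strings, inline strings, XML escapes, and parallelism.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// One extracted cell: zero-based column and row, plus the raw text of <v>.
// (Hypothetical type for illustration only.)
struct Cell {
    std::size_t column = 0;
    std::size_t row = 0;
    std::string value;
};

// Convert an A1-style reference such as "B12" into zero-based indices.
// Letters form a base-26 column number; digits form the one-based row.
inline bool parse_reference(std::string_view ref, std::size_t& column, std::size_t& row) {
    std::size_t i = 0, col = 0, r = 0;
    while (i < ref.size() && ref[i] >= 'A' && ref[i] <= 'Z') {
        col = col * 26 + static_cast<std::size_t>(ref[i] - 'A' + 1);
        ++i;
    }
    if (i == 0 || i == ref.size()) return false;  // need letters followed by digits
    for (; i < ref.size(); ++i) {
        if (ref[i] < '0' || ref[i] > '9') return false;
        r = r * 10 + static_cast<std::size_t>(ref[i] - '0');
    }
    if (r == 0) return false;  // rows are one-based in the file
    column = col - 1;
    row = r - 1;
    return true;
}

// Scan worksheet XML for <c r="..."> elements and collect their <v> contents.
// Relies on the predictable element layout rather than building a DOM.
inline std::vector<Cell> scan_cells(std::string_view xml) {
    constexpr std::size_t npos = std::string_view::npos;
    std::vector<Cell> cells;
    std::size_t pos = 0;
    while ((pos = xml.find("<c ", pos)) != npos) {
        const std::size_t tag_end = xml.find('>', pos);
        if (tag_end == npos) break;
        const std::string_view tag = xml.substr(pos, tag_end - pos);
        pos = tag_end + 1;

        // Locate the r="..." attribute inside the opening tag.
        std::size_t ref_begin = tag.find(" r=\"");
        if (ref_begin == npos) continue;
        ref_begin += 4;
        const std::size_t ref_end = tag.find('"', ref_begin);
        if (ref_end == npos) continue;

        Cell cell;
        if (!parse_reference(tag.substr(ref_begin, ref_end - ref_begin), cell.column, cell.row)) {
            continue;
        }
        if (tag.back() == '/') continue;  // self-closing cell carries no value

        const std::size_t close = xml.find("</c>", pos);
        if (close == npos) break;
        const std::string_view body = xml.substr(pos, close - pos);
        const std::size_t v_begin = body.find("<v>");
        if (v_begin != npos) {
            const std::size_t v_end = body.find("</v>", v_begin + 3);
            if (v_end != npos) {
                cell.value = std::string(body.substr(v_begin + 3, v_end - v_begin - 3));
                cells.push_back(cell);
            }
        }
        pos = close + 4;
    }
    return cells;
}
```

Calling `scan_cells` on a fragment such as `<row r="1"><c r="A1"><v>42</v></c></row>` yields a single cell at column 0, row 0 holding the raw text of its `<v>` element. Because the scan only looks for a handful of known byte patterns, it never allocates a tree of XML nodes, which is the kind of overhead a specialized parser can avoid.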

Bindings

We provide bindings for several environments:

  • R: load spreadsheets into data frames. Also available on CRAN.
  • Python: load spreadsheets into pandas DataFrames. Also available on PyPI.
  • PostgreSQL FDW: access spreadsheets from PostgreSQL through a foreign data wrapper and combine them with other PostgreSQL tables. Also available on PGXN.
  • DuckDB extension: access spreadsheets from DuckDB and combine them with other DuckDB tables. Also available as a community extension.

Scientific Background

SheetReader was introduced in the PolyDB research project (polydbms.org). The initial design and evaluation were published in the journal Information Systems. If you use SheetReader in your research, please consider citing the following paper:

@article{GavriilidisHZM23,
  author       = {Haralampos Gavriilidis and Felix Henze and Eleni Tzirita Zacharatou and Volker Markl},
  title        = {SheetReader: Efficient Specialized Spreadsheet Parsing},
  journal      = {Inf. Syst.},
  volume       = {115},
  pages        = {102183},
  year         = {2023},
  url          = {https://doi.org/10.1016/j.is.2023.102183}
}

Acknowledgements

SheetReader includes and uses the following C/C++ libraries:

Logo design by Stefanie Lenk.
