-
Notifications
You must be signed in to change notification settings - Fork 2
FAQ
openpyxl is excellent and battle-tested. Use it if you need features we don't support yet (charts, images, conditional formatting, etc.). Use OpenSheet Core when:
- You're working with large files and need low memory usage (2.5x less memory than openpyxl for reads)
- You want faster read/write performance (up to 14x faster reads, 1.6x faster writes)
- You want zero Python dependencies
- You only need the core read/write/formula/date/styling workflow
No, and there are no plans to. .xls is a proprietary binary format from pre-2007 Excel. .xlsx is the modern open XML standard. If you need .xls support, use xlrd.
No. CSV is a different format entirely. Use Python's built-in csv module or pandas.read_csv().
Not currently. OpenSheet Core is designed for streaming reads and writes. To modify an existing file, read it, make changes in Python, and write a new file. In-place editing would require a DOM model which conflicts with the streaming architecture.
Excel stores all numbers as IEEE 754 doubles internally. OpenSheet Core returns int when the value has no fractional part (e.g., 42.0 → 42) and float otherwise. If you need strict integer types, cast after reading.
Excel stores dates as serial numbers (days since 1900-01-01). OpenSheet Core detects date-formatted cells using the style information in the file and automatically converts them to datetime.date or datetime.datetime Python objects.
Empty cells are returned as None.
Each sheet dict contains:
| Key | Type | Description |
|---|---|---|
"name" |
str |
Sheet name |
"rows" |
list[list] |
Row data with typed Python values |
"merges" |
list[str] |
Merged cell ranges (e.g. ["A1:C1"]) |
"column_widths" |
dict[int, float] |
0-based column index to width in character units |
"row_heights" |
dict[int, float] |
0-based row index to height in points |
"freeze_pane" |
tuple[int, int] or None
|
(rows_frozen, cols_frozen) |
"auto_filter" |
str or None
|
Auto-filter range (e.g. "A1:C3") |
"state" |
str |
Sheet visibility ("visible", "hidden", or "veryHidden") |
"comments" |
list[dict] |
Cell comments with "cell", "author", "text"
|
"hyperlinks" |
list[dict] |
Cell hyperlinks with "cell", "url", "tooltip"
|
"protection" |
dict or None
|
Sheet protection settings |
"tables" |
list[dict] |
Structured table definitions |
Use defined_names() to read and XlsxWriter.define_name() to write:
from opensheet_core import XlsxWriter, defined_names
# Write
with XlsxWriter("output.xlsx") as writer:
writer.add_sheet("Data")
writer.write_row([100])
writer.define_name("Total", "Data!$A$1") # Workbook-scoped
writer.define_name("LocalTotal", "Data!$A$1", sheet_index=0) # Sheet-scoped
# Read
for n in defined_names("output.xlsx"):
print(n["name"], n["value"], n["sheet_index"])When reading, merged cell ranges are returned in the sheet's "merges" list (e.g., ["A1:C1"]). The merged cell's value appears in the top-left cell; other cells in the range are None.
Constant memory regardless of file size. The writer streams data directly to the ZIP archive. Writing 10 rows uses the same memory as writing 10 million rows.
Yes. Call add_sheet("Name") to create a new sheet. Rows are always written to the most recently added sheet.
with XlsxWriter("output.xlsx") as writer:
writer.add_sheet("Sheet1")
writer.write_row(["data for sheet 1"])
writer.add_sheet("Sheet2")
writer.write_row(["data for sheet 2"])| Python type | Excel cell type |
|---|---|
str |
Text |
int, float
|
Number |
bool |
Boolean |
datetime.date |
Date (with date format) |
datetime.datetime |
DateTime (with datetime format) |
Formula(...) |
Formula |
FormattedCell(value, fmt) |
Number with custom format (currency, percentage, etc.) |
StyledCell(value, style) |
Any value with font/fill/border/alignment styling |
None |
Empty cell |
Yes. Install with pip install opensheet-core[pandas] and use read_xlsx_df() to read into DataFrames and to_xlsx() to write DataFrames to XLSX. Pandas is optional — the core library works without it.
from opensheet_core import read_xlsx_df, to_xlsx
df = read_xlsx_df("data.xlsx")
to_xlsx(df, "output.xlsx")Use xlsx_to_markdown() to convert sheets into structured markdown tables that LLMs understand well:
from opensheet_core import xlsx_to_markdown
md = xlsx_to_markdown("data.xlsx")
prompt = f"Analyze this data:\n\n{md}\n\nWhat trends do you see?"Use xlsx_to_chunks() to split large spreadsheets into embedding-sized pieces. Each chunk is a self-contained markdown table with the header row repeated:
from opensheet_core import xlsx_to_chunks
chunks = xlsx_to_chunks("large_dataset.xlsx", max_rows=50)
# Feed each chunk to your embedding model / vector storeOr use the built-in LangChain / LlamaIndex integrations directly:
# LangChain
from opensheet_core.langchain import OpenSheetLoader
docs = OpenSheetLoader("data.xlsx", mode="chunks").load()
# LlamaIndex
from opensheet_core.llamaindex import OpenSheetReader
docs = OpenSheetReader(mode="chunks").load_data("data.xlsx")No. The core extraction functions (xlsx_to_markdown, xlsx_to_text, xlsx_to_chunks) have zero dependencies beyond opensheet-core itself. The LangChain and LlamaIndex integrations are optional — they only require their respective libraries when you import them.
Benchmarks use a 100,000-row x 10-column dataset (1M cells) with mixed types. Run them yourself:
python benchmarks/benchmark.pyFive reasons:
- Rust — compiled native code vs. interpreted Python
- Streaming — constant memory for writes, no DOM tree construction
- Deferred shared-string resolution — strings are stored as indices during parsing and only converted to Python objects at the Rust/Python boundary, avoiding O(N) string clones
- Pre-interned Python strings — the shared string table is converted once; repeated strings reuse the same Python object via reference counting instead of allocating new copies
- Minimal overhead — only two dependencies (quick-xml, zip), no abstraction layers
No. Prebuilt wheels are available on PyPI for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x64). Just pip install opensheet-core. You only need Rust if building from source.
Python 3.9 through 3.13.
Yes. Native aarch64 wheels are published for macOS.
Home | AI/RAG | Migration Guide | Architecture | FAQ | GitHub | PyPI
Built with Rust and PyO3 | Licensed under MIT