Skip to content
Nader Bennour edited this page Mar 31, 2026 · 9 revisions

FAQ

General

Why not just use openpyxl?

openpyxl is excellent and battle-tested. Use it if you need features we don't support yet (charts, images, conditional formatting, etc.). Use OpenSheet Core when:

  • You're working with large files and need low memory usage (2.5x less memory than openpyxl for reads)
  • You want faster read/write performance (up to 14x faster reads, 1.6x faster writes)
  • You want zero Python dependencies
  • You only need the core read/write/formula/date/styling workflow

Does it support .xls (old Excel format)?

No, and there are no plans to. .xls is a proprietary binary format from pre-2007 Excel. .xlsx is the modern open XML standard. If you need .xls support, use xlrd.

Does it support .csv?

No. CSV is a different format entirely. Use Python's built-in csv module or pandas.read_csv().

Can I edit an existing XLSX file in place?

Not currently. OpenSheet Core is designed for streaming reads and writes. To modify an existing file, read it, make changes in Python, and write a new file. In-place editing would require a DOM model which conflicts with the streaming architecture.

Reading

Why are some numbers returned as floats instead of ints?

Excel stores all numbers as IEEE 754 doubles internally. OpenSheet Core returns int when the value has no fractional part (e.g., 42.042) and float otherwise. If you need strict integer types, cast after reading.

How are dates handled?

Excel stores dates as serial numbers (days since 1900-01-01). OpenSheet Core detects date-formatted cells using the style information in the file and automatically converts them to datetime.date or datetime.datetime Python objects.

What happens with empty cells?

Empty cells are returned as None.

What does read_xlsx() return?

Each sheet dict contains:

Key Type Description
"name" str Sheet name
"rows" list[list] Row data with typed Python values
"merges" list[str] Merged cell ranges (e.g. ["A1:C1"])
"column_widths" dict[int, float] 0-based column index to width in character units
"row_heights" dict[int, float] 0-based row index to height in points
"freeze_pane" tuple[int, int] or None (rows_frozen, cols_frozen)
"auto_filter" str or None Auto-filter range (e.g. "A1:C3")
"state" str Sheet visibility ("visible", "hidden", or "veryHidden")
"comments" list[dict] Cell comments with "cell", "author", "text"
"hyperlinks" list[dict] Cell hyperlinks with "cell", "url", "tooltip"
"protection" dict or None Sheet protection settings
"tables" list[dict] Structured table definitions

How do I read or write named ranges?

Use defined_names() to read and XlsxWriter.define_name() to write:

from opensheet_core import XlsxWriter, defined_names

# Write
with XlsxWriter("output.xlsx") as writer:
    writer.add_sheet("Data")
    writer.write_row([100])
    writer.define_name("Total", "Data!$A$1")                    # Workbook-scoped
    writer.define_name("LocalTotal", "Data!$A$1", sheet_index=0) # Sheet-scoped

# Read
for n in defined_names("output.xlsx"):
    print(n["name"], n["value"], n["sheet_index"])

What about merged cells?

When reading, merged cell ranges are returned in the sheet's "merges" list (e.g., ["A1:C1"]). The merged cell's value appears in the top-left cell; other cells in the range are None.

Writing

How much memory does the writer use?

Constant memory regardless of file size. The writer streams data directly to the ZIP archive. Writing 10 rows uses the same memory as writing 10 million rows.

Can I write to multiple sheets?

Yes. Call add_sheet("Name") to create a new sheet. Rows are always written to the most recently added sheet.

with XlsxWriter("output.xlsx") as writer:
    writer.add_sheet("Sheet1")
    writer.write_row(["data for sheet 1"])

    writer.add_sheet("Sheet2")
    writer.write_row(["data for sheet 2"])

What Python types can I write?

Python type Excel cell type
str Text
int, float Number
bool Boolean
datetime.date Date (with date format)
datetime.datetime DateTime (with datetime format)
Formula(...) Formula
FormattedCell(value, fmt) Number with custom format (currency, percentage, etc.)
StyledCell(value, style) Any value with font/fill/border/alignment styling
None Empty cell

Does it work with pandas?

Yes. Install with pip install opensheet-core[pandas] and use read_xlsx_df() to read into DataFrames and to_xlsx() to write DataFrames to XLSX. Pandas is optional — the core library works without it.

from opensheet_core import read_xlsx_df, to_xlsx

df = read_xlsx_df("data.xlsx")
to_xlsx(df, "output.xlsx")

AI/RAG

How do I feed spreadsheet data to an LLM?

Use xlsx_to_markdown() to convert sheets into structured markdown tables that LLMs understand well:

from opensheet_core import xlsx_to_markdown

md = xlsx_to_markdown("data.xlsx")
prompt = f"Analyze this data:\n\n{md}\n\nWhat trends do you see?"

How do I use it with a RAG pipeline?

Use xlsx_to_chunks() to split large spreadsheets into embedding-sized pieces. Each chunk is a self-contained markdown table with the header row repeated:

from opensheet_core import xlsx_to_chunks

chunks = xlsx_to_chunks("large_dataset.xlsx", max_rows=50)
# Feed each chunk to your embedding model / vector store

Or use the built-in LangChain / LlamaIndex integrations directly:

# LangChain
from opensheet_core.langchain import OpenSheetLoader
docs = OpenSheetLoader("data.xlsx", mode="chunks").load()

# LlamaIndex
from opensheet_core.llamaindex import OpenSheetReader
docs = OpenSheetReader(mode="chunks").load_data("data.xlsx")

Do I need LangChain or LlamaIndex installed?

No. The core extraction functions (xlsx_to_markdown, xlsx_to_text, xlsx_to_chunks) have zero dependencies beyond opensheet-core itself. The LangChain and LlamaIndex integrations are optional — they only require their respective libraries when you import them.

Performance

How are the benchmarks measured?

Benchmarks use a 100,000-row x 10-column dataset (1M cells) with mixed types. Run them yourself:

python benchmarks/benchmark.py

Why is it so much faster and more memory-efficient than openpyxl?

Five reasons:

  1. Rust — compiled native code vs. interpreted Python
  2. Streaming — constant memory for writes, no DOM tree construction
  3. Deferred shared-string resolution — strings are stored as indices during parsing and only converted to Python objects at the Rust/Python boundary, avoiding O(N) string clones
  4. Pre-interned Python strings — the shared string table is converted once; repeated strings reuse the same Python object via reference counting instead of allocating new copies
  5. Minimal overhead — only two dependencies (quick-xml, zip), no abstraction layers

Installation

Do I need Rust installed to use it?

No. Prebuilt wheels are available on PyPI for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x64). Just pip install opensheet-core. You only need Rust if building from source.

What Python versions are supported?

Python 3.9 through 3.13.

Does it work on Apple Silicon (M1/M2/M3)?

Yes. Native aarch64 wheels are published for macOS.