Merged
60 changes: 60 additions & 0 deletions .github/workflows/fuzz.yml
@@ -0,0 +1,60 @@
name: Fuzz

on:
  # Run on-demand
  workflow_dispatch:
    inputs:
      duration:
        description: "Fuzz duration in seconds per target"
        required: false
        default: "300"
  # Weekly scheduled run
  schedule:
    - cron: "0 3 * * 1" # Every Monday at 03:00 UTC
  # Also run on PRs that touch the reader
  pull_request:
    paths:
      - "src/reader/**"
      - "fuzz/**"

jobs:
  fuzz:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        target:
          - fuzz_read_xlsx
          - fuzz_read_single_sheet
          - fuzz_read_sheet_names
          - fuzz_read_document_properties
    steps:
      - uses: actions/checkout@v6

      - name: Install Rust nightly
        uses: dtolnay/rust-toolchain@nightly

      - name: Install cargo-fuzz
        run: cargo install cargo-fuzz

      - name: Cache fuzz corpus
        uses: actions/cache@v4
        with:
          path: fuzz/corpus/${{ matrix.target }}
          key: fuzz-corpus-${{ matrix.target }}-${{ github.sha }}
          restore-keys: |
            fuzz-corpus-${{ matrix.target }}-

      - name: Run fuzzer
        run: |
          DURATION=${{ github.event.inputs.duration || '60' }}
          cargo +nightly fuzz run ${{ matrix.target }} -- \
            -max_total_time=$DURATION \
            -max_len=65536

      - name: Upload crash artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: fuzz-crash-${{ matrix.target }}
          path: fuzz/artifacts/${{ matrix.target }}
4 changes: 4 additions & 0 deletions .gitignore
@@ -25,6 +25,10 @@ env/
*.swp
*.swo

# Fuzzing artifacts (crash reproducers and generated corpus are runtime output)
fuzz/artifacts/
fuzz/corpus/

# OS
.DS_Store
Thumbs.db
96 changes: 96 additions & 0 deletions docs/edge-cases.md
@@ -0,0 +1,96 @@
# Known Edge Cases and Limitations

This document records edge cases discovered through fuzzing and corpus
testing, along with the reader's behavior in each case.

## Security Limits

The reader enforces hard limits to prevent denial-of-service:

| Limit | Value | Effect |
|-------|-------|--------|
| Max ZIP entry size | 256 MB (decompressed) | Returns `XlsxError` |
| Max shared strings | 2,000,000 | Returns `XlsxError` |
| Max rows per sheet | 1,048,576 (Excel limit) | Returns `XlsxError` |
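
The entry size limit can be illustrated with a bounded read. This is a minimal sketch using only the standard library, not the reader's actual code: `read_capped` is a hypothetical helper name, and the real implementation may enforce the limit inside its ZIP handling and return `XlsxError` rather than `io::Error`.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads all of `reader`, but fails as soon as the
/// decompressed stream turns out to be larger than `limit` bytes.
fn read_capped<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one byte past the limit so "exactly at the limit" and
    // "over the limit" can be told apart without reading the whole stream.
    reader.take(limit.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "entry exceeds size limit",
        ));
    }
    Ok(buf)
}
```

Because the check is applied to bytes actually produced by the decompressor, it does not depend on the size recorded in the ZIP header, which an attacker controls.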

## ZIP / Archive Edge Cases

| Scenario | Behavior |
|----------|----------|
| Truncated ZIP | Clean `ZipError` |
| Empty ZIP (no entries) | Error: missing `xl/workbook.xml` |
| Non-ZIP data (random bytes) | Clean `ZipError` |
| Missing `xl/workbook.xml` | Error: `InvalidStructure` |
| Missing `_rels/.rels` | Falls back gracefully |
| ZIP bomb (large compression ratio) | Caught by entry size limit |
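
Why truncated or random input fails cleanly can be sketched with the first step any ZIP reader performs: locating the end-of-central-directory (EOCD) record near the end of the data. The code below is an illustration under stated assumptions, not the reader's implementation (which presumably delegates this to its ZIP library); `find_eocd` is a hypothetical name.

```rust
/// Signature that starts the end-of-central-directory record.
const EOCD_SIGNATURE: [u8; 4] = *b"PK\x05\x06";
/// The EOCD record is 22 bytes long when the archive comment is empty.
const EOCD_MIN_LEN: usize = 22;
/// The archive comment length is a u16, so it is at most 65,535 bytes.
const MAX_COMMENT_LEN: usize = u16::MAX as usize;

/// Hypothetical helper: returns the offset of the EOCD record, searching
/// backwards from the end of `data`, or `None` if no record is present.
fn find_eocd(data: &[u8]) -> Option<usize> {
    if data.len() < EOCD_MIN_LEN {
        return None;
    }
    let last_start = data.len() - EOCD_MIN_LEN;
    let first_start = last_start.saturating_sub(MAX_COMMENT_LEN);
    (first_start..=last_start)
        .rev()
        .find(|&i| data[i..i + 4] == EOCD_SIGNATURE)
}
```

An archive with no entries consists of just this record, which is why an empty ZIP gets past the archive layer and fails later on the missing `xl/workbook.xml`.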

## XML Edge Cases

| Scenario | Behavior |
|----------|----------|
| Malformed XML (unclosed tags) | Partial parse; may return truncated data |
| Missing `<sheetData>` element | Empty rows returned |
| Unknown XML elements | Ignored (forward-compatible) |
| XML with BOM | Handled by quick-xml |
| Very deeply nested XML | Parsed normally (no depth limit) |

## Cell & Data Type Edge Cases

| Scenario | Behavior |
|----------|----------|
| Shared string index out of bounds | Returns `"[invalid string index]"` |
| Empty cell element `<c/>` | Returns `Empty` |
| Cell with no `<v>` child | Returns `Empty` |
| Boolean cell with value not 0 or 1 | Treated as truthy/falsy |
| Inline string with rich text runs | Concatenated plain text |
| Formula with cached value | Returns `Formula { formula, cached_value }` |
| Error cell (`t="e"`) | Returns the error string (e.g. `"#REF!"`) |
| Numeric cell with a date number format | Serial number detected via the format code and returned as `Date` |
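
The out-of-bounds shared string case can be sketched as a lookup that never indexes directly into the table. This is a hypothetical illustration: `resolve_shared_string` is not a function from the crate, and treating an unparsable index the same as an out-of-range one is an assumption.

```rust
/// Placeholder text taken from the table above.
const INVALID_INDEX_PLACEHOLDER: &str = "[invalid string index]";

/// Hypothetical helper: maps the text of a `<v>` element in a `t="s"` cell
/// to an entry of the shared string table, falling back to the placeholder.
fn resolve_shared_string<'a>(table: &'a [String], raw_index: &str) -> &'a str {
    raw_index
        .trim()
        .parse::<usize>()
        .ok()
        // `get` returns `None` instead of panicking on a bad index.
        .and_then(|i| table.get(i))
        .map(String::as_str)
        .unwrap_or(INVALID_INDEX_PLACEHOLDER)
}
```

Using `get` rather than `table[i]` is what turns a crafted index into a visible placeholder instead of a panic.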

## Row & Structure Edge Cases

| Scenario | Behavior |
|----------|----------|
| Row number 0 | Treated as row 0 (no crash) |
| Sparse rows (gaps in numbering) | Gaps filled with empty rows |
| Row at max (1,048,576) | Accepted (at limit) |
| Duplicate row numbers | Last row wins |
| Columns beyond XFD (16,384) | Parsed if present |
| Merged cells with no data in merge area | Merge range recorded, cells empty |
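
The gap-filling and last-row-wins rules can be sketched as follows. Everything here is an assumption made for illustration: `densify_rows` is a hypothetical name, cells are simplified to `String`, and mapping 1-based row numbers to indices with `saturating_sub` is only one way to get the "row 0 does not crash" behavior listed above.

```rust
use std::collections::BTreeMap;

/// Excel's row limit, as listed under Security Limits.
const MAX_ROWS: u32 = 1_048_576;

/// Hypothetical helper: turns `(row number, cells)` pairs, in document order,
/// into a dense list where missing rows are empty.
fn densify_rows(parsed: Vec<(u32, Vec<String>)>) -> Result<Vec<Vec<String>>, String> {
    let mut by_index: BTreeMap<u32, Vec<String>> = BTreeMap::new();
    for (number, cells) in parsed {
        if number > MAX_ROWS {
            return Err(format!("row {} exceeds the {}-row limit", number, MAX_ROWS));
        }
        // Row numbers are 1-based; an invalid 0 folds onto index 0 rather
        // than underflowing.
        let index = number.saturating_sub(1);
        // `insert` replaces an earlier row with the same number: last row wins.
        by_index.insert(index, cells);
    }
    let len = by_index
        .keys()
        .next_back()
        .map_or(0, |&last| last as usize + 1);
    // Start with all-empty rows so gaps in the numbering stay empty.
    let mut rows = vec![Vec::new(); len];
    for (index, cells) in by_index {
        rows[index as usize] = cells;
    }
    Ok(rows)
}
```

Checking the row number before allocating matters here: the dense list is sized by the largest row number, so an unchecked value would let a tiny file request an enormous allocation.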

## Sheet & Workbook Edge Cases

| Scenario | Behavior |
|----------|----------|
| Hidden/veryHidden sheets | Parsed with `state` field set |
| Sheet with no rows | Empty `rows` list |
| 100+ sheets in one workbook | All parsed |
| Sheet name with special characters | Preserved as-is |
| Defined names with `hidden="1"` | Included in output |
| Missing `xl/_rels/workbook.xml.rels` | Error: cannot resolve sheet paths |

## File Source Compatibility

The test corpus includes files structured to match output from:

- **Microsoft Excel** (standard OOXML)
- **LibreOffice Calc** (may use different XML namespaces)
- **Google Sheets** (export as .xlsx)
- **opensheet-core writer** (roundtrip testing)
- **Programmatically generated** (minimal valid XLSX)

## Fuzzing

Fuzz targets exercise all reader entry points:

- `fuzz_read_xlsx` — Full workbook parse
- `fuzz_read_single_sheet` — Single sheet extraction
- `fuzz_read_sheet_names` — Sheet name enumeration
- `fuzz_read_document_properties` — Document metadata parse

Run locally (requires a nightly toolchain and `cargo install cargo-fuzz`):
```bash
cargo +nightly fuzz run fuzz_read_xlsx -- -max_total_time=60
```

CI runs fuzzing weekly, on demand via `workflow_dispatch`, and on PRs touching `src/reader/` or `fuzz/`.
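
The property the fuzz targets check is that arbitrary bytes produce an `Ok` or an `Err`, never a panic. A std-only sketch of that property is shown below; `parse_untrusted` is a hypothetical stand-in for a reader entry point, and the real targets rely on libFuzzer rather than `catch_unwind` to detect crashes.

```rust
use std::panic;

/// Hypothetical stand-in for a reader entry point: rejects anything that
/// does not start with a ZIP local file header signature.
fn parse_untrusted(data: &[u8]) -> Result<usize, String> {
    if !data.starts_with(b"PK\x03\x04") {
        return Err("not a ZIP local file header".to_string());
    }
    Ok(data.len())
}

/// Returns true when parsing `data` finishes without panicking.
/// The parse result itself is discarded, as in the fuzz targets.
fn survives(data: &[u8]) -> bool {
    panic::catch_unwind(|| {
        let _ = parse_untrusted(data);
    })
    .is_ok()
}
```

A loop over the files in `tests/fixtures/` calling a check like `survives` would give a quick regression test on stable Rust, without needing nightly or cargo-fuzz.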
40 changes: 40 additions & 0 deletions fuzz/Cargo.toml
@@ -0,0 +1,40 @@
[package]
name = "opensheet-core-fuzz"
version = "0.0.0"
publish = false
edition = "2021"

[package.metadata]
cargo-fuzz = true

[dependencies]
libfuzzer-sys = "0.4"

[dependencies.opensheet-core]
path = ".."
default-features = false

# --- Fuzz targets ---------------------------------------------------

[[bin]]
name = "fuzz_read_xlsx"
path = "fuzz_targets/fuzz_read_xlsx.rs"
doc = false

[[bin]]
name = "fuzz_read_single_sheet"
path = "fuzz_targets/fuzz_read_single_sheet.rs"
doc = false

[[bin]]
name = "fuzz_read_sheet_names"
path = "fuzz_targets/fuzz_read_sheet_names.rs"
doc = false

[[bin]]
name = "fuzz_read_document_properties"
path = "fuzz_targets/fuzz_read_document_properties.rs"
doc = false

[workspace]
members = ["."]
8 changes: 8 additions & 0 deletions fuzz/fuzz_targets/fuzz_read_document_properties.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#![no_main]
use libfuzzer_sys::fuzz_target;
use std::io::Cursor;

fuzz_target!(|data: &[u8]| {
    let cursor = Cursor::new(data);
    let _ = opensheet_core::reader::xlsx::read_document_properties(cursor);
});
8 changes: 8 additions & 0 deletions fuzz/fuzz_targets/fuzz_read_sheet_names.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#![no_main]
use libfuzzer_sys::fuzz_target;
use std::io::Cursor;

fuzz_target!(|data: &[u8]| {
    let cursor = Cursor::new(data);
    let _ = opensheet_core::reader::xlsx::read_sheet_names(cursor);
});
9 changes: 9 additions & 0 deletions fuzz/fuzz_targets/fuzz_read_single_sheet.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#![no_main]
use libfuzzer_sys::fuzz_target;
use std::io::Cursor;

fuzz_target!(|data: &[u8]| {
    let cursor = Cursor::new(data);
    // Try reading the first sheet by index.
    let _ = opensheet_core::reader::xlsx::read_single_sheet(cursor, None, Some(0));
});
9 changes: 9 additions & 0 deletions fuzz/fuzz_targets/fuzz_read_xlsx.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#![no_main]
use libfuzzer_sys::fuzz_target;
use std::io::Cursor;

fuzz_target!(|data: &[u8]| {
    let cursor = Cursor::new(data);
    // We don't care about the result — only that it doesn't panic or hang.
    let _ = opensheet_core::reader::xlsx::read_xlsx(cursor);
});
6 changes: 3 additions & 3 deletions src/lib.rs
@@ -6,9 +6,9 @@ use pyo3::types::{
use std::fs::File;
use std::io::{BufReader, BufWriter};

-mod reader;
-mod types;
-mod writer;
+pub mod reader;
+pub mod types;
+pub mod writer;

use types::CellValue;
use writer::xlsx::StreamingXlsxWriter;
Binary file added tests/fixtures/auto_filter.xlsx
Binary file added tests/fixtures/column_widths.xlsx
Binary file added tests/fixtures/date_cells.xlsx
Binary file added tests/fixtures/defined_names.xlsx
Binary file added tests/fixtures/document_properties.xlsx
Binary file added tests/fixtures/empty_workbook.xlsx
Binary file added tests/fixtures/empty_zip.xlsx
Binary file added tests/fixtures/freeze_panes.xlsx
Binary file added tests/fixtures/huge_row_gap.xlsx
Binary file added tests/fixtures/inline_strings.xlsx
Binary file added tests/fixtures/large_shared_strings.xlsx
Binary file added tests/fixtures/malformed_xml.xlsx
Binary file added tests/fixtures/many_data_types.xlsx
Binary file added tests/fixtures/merged_cells.xlsx
Binary file added tests/fixtures/missing_workbook.xlsx
Binary file added tests/fixtures/multiple_sheets.xlsx
Binary file added tests/fixtures/negative_row_number.xlsx
Binary file added tests/fixtures/rich_text_strings.xlsx
Binary file added tests/fixtures/single_cell.xlsx
Binary file added tests/fixtures/sparse_rows.xlsx
Binary file added tests/fixtures/truncated_zip.xlsx
Binary file added tests/fixtures/unicode_strings.xlsx
Binary file added tests/fixtures/wrong_string_index.xlsx