Add support for generic creation of sparse entries by jebradbury39 · Pull Request #419 · composefs/tar-rs

jebradbury39 · 2025-11-07T19:16:29Z

Problem

Currently, the safest way to create a sparse entry in the Builder is via append_file, which relies on having the data present on the filesystem (at best, you can use tmpfs). However, this has some limitations:

You are stuck with whatever your filesystem supports (w.r.t. sizes/offsets), when in reality your tar archive might be streamed to another destination which CAN support those sizes/offsets.
You're required to hit the disk for this kind of data (e.g. you have some data in memory, which you then need to write to a file, pass to append_file, and then remove).

Solution

Added a new function to the Builder called append_sparse_data, which takes a data reader that implements a new SeekSparse trait. Internally, files on supported Unix systems can implement this trait (to avoid code duplication), but more importantly, anyone using the tar crate can now implement SeekSparse on their own data types.
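To make the shape of such a trait concrete, here is a minimal sketch. It is not the PR's actual definition: the trait name `SparseSource`, its methods, and the `MemSparse` type are all hypothetical, loosely modeled on how `lseek` behaves with SEEK_DATA / SEEK_HOLE. It assumes extents are sorted, non-overlapping, and lie within the logical length.

```rust
use std::io;

/// Hypothetical stand-in for the PR's `SeekSparse` trait; the real
/// signatures may differ.
pub trait SparseSource {
    /// Total logical size in bytes, holes included.
    fn logical_len(&self) -> u64;
    /// First offset >= `from` that holds data, or `None` if only holes remain.
    fn seek_data(&mut self, from: u64) -> io::Result<Option<u64>>;
    /// First offset >= `from` that lies in a hole; the end of the source
    /// counts as a hole.
    fn seek_hole(&mut self, from: u64) -> io::Result<u64>;
}

/// In-memory sparse object: sorted, non-overlapping (offset, bytes) extents.
pub struct MemSparse {
    pub len: u64,
    pub extents: Vec<(u64, Vec<u8>)>,
}

impl SparseSource for MemSparse {
    fn logical_len(&self) -> u64 {
        self.len
    }

    fn seek_data(&mut self, from: u64) -> io::Result<Option<u64>> {
        for (off, bytes) in &self.extents {
            let end = off + bytes.len() as u64;
            if from < end {
                // Either `from` is already inside this extent, or the
                // extent starts later and we jump forward to it.
                return Ok(Some(from.max(*off)));
            }
        }
        Ok(None)
    }

    fn seek_hole(&mut self, from: u64) -> io::Result<u64> {
        let mut pos = from;
        for (off, bytes) in &self.extents {
            let end = off + bytes.len() as u64;
            if pos >= *off && pos < end {
                pos = end; // inside data: skip to the end of this extent
            }
        }
        Ok(pos.min(self.len))
    }
}

/// Walk the source and collect (offset, length) data segments.
pub fn data_segments<S: SparseSource>(src: &mut S) -> io::Result<Vec<(u64, u64)>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while let Some(start) = src.seek_data(pos)? {
        let end = src.seek_hole(start)?;
        out.push((start, end - start));
        pos = end;
    }
    Ok(out)
}
```

A caller would alternate `seek_data` and `seek_hole` as `data_segments` does, which yields the (offset, length) pairs that a sparse map needs without reading the hole regions.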

Note that tar requires blocks to be 512-byte aligned, and with files, you get this for free (data/holes are block-aligned per the filesystem, which is generally some multiple of 512). With SeekSparse data, you have no restrictions about data alignment, which means that some holes may be encoded as zeros (or whatever "empty" byte the SeekSparse Read impl returns when a caller reads from a hole). This translates into some added complexity in the find_sparse_entries_seek function. I think this is worthwhile, because it decouples the caller of append_sparse_data from having to care about what block-alignment tar uses (which is an internal implementation detail of tar).
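The alignment step described above can be illustrated with a small sketch. This is not the PR's find_sparse_entries_seek; `align_segments` is a hypothetical helper that assumes the input segments are sorted, non-overlapping, and inside the logical length. It rounds each data segment outward to 512-byte boundaries, so the hole bytes pulled in by the rounding are the ones that end up stored as zeros.

```rust
const BLOCK: u64 = 512;

/// Expand sorted (offset, length) data segments to 512-byte block boundaries
/// and merge any that touch or overlap afterwards. The final segment is
/// capped at `logical_len` so it never extends past the end of the entry.
pub fn align_segments(segments: &[(u64, u64)], logical_len: u64) -> Vec<(u64, u64)> {
    let mut out: Vec<(u64, u64)> = Vec::new();
    for &(off, len) in segments {
        if len == 0 {
            continue; // nothing to store for an empty segment
        }
        // Round the start down and the end up to a block boundary.
        let start = off / BLOCK * BLOCK;
        let end = ((off + len + BLOCK - 1) / BLOCK * BLOCK).min(logical_len);

        if let Some(last) = out.last_mut() {
            let last_end = last.0 + last.1;
            if start <= last_end {
                // Rounding made this segment touch the previous one: merge.
                last.1 = end.max(last_end) - last.0;
                continue;
            }
        }
        out.push((start, end - start));
    }
    out
}
```

For example, data at (100, 50) and (600, 10) collapses into a single aligned segment (0, 1024), with the gaps around the real data filled by whatever the reader returns for holes.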

xzfc · 2025-11-23T19:07:24Z

The new trait adds a lot of complexity. Could your use case be solved by simpler means, e.g. passing a slice of SparseEntry alongside the &[u8] data?

jebradbury39 · 2025-11-23T20:38:34Z

One of the objectives here is to avoid holding all the data in memory, which is why I didn't want to rely on passing a byte slice. By using a read-based trait, the implementation of the trait can choose to limit the memory that gets loaded.
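As a rough illustration of the memory argument, here is a sketch of copying one data segment from any `Read` to any `Write` through a fixed-size buffer. The function name `copy_segment` and the 8 KiB buffer size are assumptions made for this example and are not taken from the PR.

```rust
use std::io::{self, Read, Write};

/// Copy exactly `remaining` bytes from `src` to `dst` using a fixed 8 KiB
/// buffer, so peak memory stays constant regardless of the segment size.
pub fn copy_segment<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    mut remaining: u64,
) -> io::Result<()> {
    let mut buf = [0u8; 8192];
    while remaining > 0 {
        // Never ask for more than the segment still needs.
        let want = remaining.min(buf.len() as u64) as usize;
        let n = src.read(&mut buf[..want])?;
        if n == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "segment ended early",
            ));
        }
        dst.write_all(&buf[..n])?;
        remaining -= n as u64;
    }
    Ok(())
}
```

With a byte slice, the whole payload has to exist in memory before the call; with a reader, only the buffer does.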

xzfc · 2025-11-23T22:31:52Z

One of my concerns is that this new trait mirrors the Linux-style SEEK_DATA / SEEK_HOLE semantics (as said in the doc comment). I'm not sure whether other sparse APIs (e.g. older Linux's FIEMAP, Windows' FSCTL_QUERY_ALLOCATED_RANGES) would fit into this API well if we decide to support them.

Another concern is that the new abstraction feels too complex/indirect to me, compared to the simplicity of the underlying format (which encodes a list of (offset, length) pairs).

I agree with the problem statement in your PR description. But I'd like to avoid adding an overly generalized API, to reduce the maintenance burden.

I think of the following approaches as an alternative:

A method that accepts sparse_entries: &[SparseEntry] and dense_data: &[u8].
A method that accepts sparse_entries: &[SparseEntry] and returns EntryWriter to write dense data.

Both of these would require allocating the sparse entries (and perhaps the dense data), making them unsuitable for some use cases. But for such cases, a custom tar writer might be a better fit than the general-purpose tar-rs Builder.
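The first alternative could be sketched as follows. Everything here is hypothetical: `SparseEntry` is a local stand-in rather than the tar-rs type, and `split_dense` is an invented helper that only shows the bookkeeping such a method would need, namely checking the sparse map against the dense data and pairing each entry with its bytes.

```rust
/// Local stand-in for a sparse map entry; not the tar-rs type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SparseEntry {
    pub offset: u64,
    pub length: u64,
}

/// Pair each entry with its slice of the densely packed data, checking that
/// entries are ordered, non-overlapping, inside `logical_len`, and that
/// their lengths add up to exactly `dense_data.len()`.
pub fn split_dense<'a>(
    entries: &[SparseEntry],
    dense_data: &'a [u8],
    logical_len: u64,
) -> Result<Vec<(SparseEntry, &'a [u8])>, String> {
    let mut out = Vec::with_capacity(entries.len());
    let mut prev_end = 0u64;
    let mut consumed = 0usize;
    for e in entries {
        let end = e
            .offset
            .checked_add(e.length)
            .ok_or("entry overflows u64")?;
        if e.offset < prev_end || end > logical_len {
            return Err(format!(
                "entry at offset {} is out of order or out of range",
                e.offset
            ));
        }
        let remaining = (dense_data.len() - consumed) as u64;
        if e.length > remaining {
            return Err("dense data is shorter than the sparse map".to_string());
        }
        let len = e.length as usize; // fits: bounded by `remaining`
        out.push((*e, &dense_data[consumed..consumed + len]));
        consumed += len;
        prev_end = end;
    }
    if consumed != dense_data.len() {
        return Err("dense data is longer than the sparse map".to_string());
    }
    Ok(out)
}
```

The sketch makes the trade-off visible: the map and the dense bytes must both be fully materialized before the call, which is simple to reason about but rules out streaming producers.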

Would you please list your use-cases for this feature? This would help us to prioritize what we want to support.

jebradbury39 · 2025-11-23T23:33:01Z

The second option might work, as long as the EntryWriter works with a writer that has only Write and not Seek.

My main use case is reading parts of procfs mem for various pids (which is essentially sparse data) and streaming this as tar sparse entries (one mem entry per pid) to stdout (hence the requirement that the writer must not require Seek). There's a twist: I may not be reading procfs directly (the data may be streamed in from another process via a thin protocol over TCP, one resident page at a time). Stdout may then be piped to other processes. I need to keep memory usage relatively low during this (some of the pids have multi-GB resident mem).
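One piece of this use case, turning a stream of resident pages into (offset, length) runs, can be sketched in a few lines. The helper name `coalesce_pages` is hypothetical, and the sketch assumes page addresses arrive in ascending order and are page-aligned; it only tracks addresses, not page contents.

```rust
/// Coalesce ascending, page-aligned addresses into (offset, length) runs.
/// Only the run list is kept in memory; page contents are never touched.
pub fn coalesce_pages<I>(pages: I, page_size: u64) -> Vec<(u64, u64)>
where
    I: IntoIterator<Item = u64>,
{
    let mut runs: Vec<(u64, u64)> = Vec::new();
    for addr in pages {
        if let Some(last) = runs.last_mut() {
            if last.0 + last.1 == addr {
                // This page directly follows the current run: extend it.
                last.1 += page_size;
                continue;
            }
        }
        // Gap before this page (or first page): start a new run.
        runs.push((addr, page_size));
    }
    runs
}
```

Because the input is any iterator, the addresses could come from a local procfs scan or from a network stream without changing the function.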

I did initially try to just create a custom tar writer, but found that tar-rs did not expose enough information to make this practical (exposing certain APIs to enable custom tar writing might be another solution), and creating a tar-rs alternative felt like the wrong way to go. I did notice that tar-rs already supports files on disk, and I was considering creating a temp sparse file, but that felt like an odd/inefficient solution, plus some filesystems may impose limitations on the logical size of a file.

jebradbury39 added 7 commits February 27, 2026 12:14

wip - debugging

e205d23

tests passing

2fa2b65

fix tests

8079b8e

update docs

739c2d1

update

2a213f0

update

17c511c

add cfg checks

97e55f0

jebradbury39 force-pushed the main branch from fb115c1 to 97e55f0 · February 27, 2026 20:14

jebradbury39 added 2 commits March 23, 2026 08:47

Merge remote-tracking branch 'upstream/main'

248568e

Merge branch 'alexcrichton:main' into main

eee48ff

jebradbury39 wants to merge 9 commits into composefs:main from jebradbury39:main
