Add support for generic creation of sparse entries #419
|
The new trait adds much complexity. Can your use-case be solved with simpler means, e.g. passing a slice of |
|
One of the objectives here is to avoid holding all the data in memory, which is why I didn't want to rely on passing a byte slice. By using a read-based trait, the implementation of the trait can choose to limit the memory that gets loaded. |
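To illustrate the memory argument, here is a minimal sketch of copying one data region from a seekable reader to a plain writer through a fixed-size buffer. The names copy_region and CHUNK are hypothetical and are not part of tar-rs or of this PR; the point is only that a Read-based source never has to be fully resident in memory, and that the destination needs nothing beyond Write.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Copy up to `len` bytes starting at `offset` from `src` to `dst`,
/// holding at most `CHUNK` bytes in memory at any time.
/// Returns the number of bytes actually copied (less than `len` if the
/// source ends early). Hypothetical helper, not a tar-rs API.
fn copy_region<R: Read + Seek, W: Write>(
    src: &mut R,
    dst: &mut W,
    offset: u64,
    len: u64,
) -> io::Result<u64> {
    const CHUNK: usize = 4096;
    let mut buf = [0u8; CHUNK];

    src.seek(SeekFrom::Start(offset))?;
    let mut remaining = len;
    while remaining > 0 {
        // Never ask for more than one buffer's worth of data.
        let want = remaining.min(CHUNK as u64) as usize;
        let n = src.read(&mut buf[..want])?;
        if n == 0 {
            break; // source ended before `len` bytes were available
        }
        dst.write_all(&buf[..n])?;
        remaining -= n as u64;
    }
    Ok(len - remaining)
}
```

The destination only needs Write, so the same loop would work against a non-seekable sink such as stdout.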
|
One of my concerns is that this new trait mirrors the Linux-style
Another concern is that the new abstraction feels too complex/indirect to me, compared to the simplicity of the underlying format (which encodes a list of
I agree with the problem statement in your PR description, but I'd like to avoid adding an overly generalized API, to reduce the maintenance burden. I see the following approaches as alternatives:
Both of these would require allocation for sparse entries (and perhaps for the dense data), making them unsuitable for some use cases. But maybe for such cases a custom tar writer would be a better fit than the general-purpose tar-rs.
Could you please list your use cases for this feature? This would help us prioritize what we want to support. |
|
The second option might work, as long as the EntryWriter works with a writer that has only
My main use case is reading parts of procfs mem for various pids (which is essentially sparse data) and streaming this as tar sparse entries (one mem entry per pid) to stdout (hence the requirement that the writer must not require
I did initially try to just create a custom tar writer, but found that tar-rs did not expose enough information to make this practical (exposing certain APIs to enable custom tar writing might be another solution), and creating a tar-rs alternative felt like the wrong way to go. I also noticed that tar-rs already supports files on disk, and I considered creating a temporary sparse file, but that felt like an odd and inefficient solution; in addition, some filesystems may limit the logical size of a file. |
Problem
Currently, the safest way to create a sparse entry in the Builder is via append_file, which relies on having the data present on the filesystem (at best, you can use tmpfs). However, this has some limitations:
append_file, and then remove that file).
Solution
Added a new function to the Builder called append_sparse_data, which takes a data reader that implements a new SeekSparse trait. Internally, files on supported Unix systems can implement this trait (to avoid code duplication), but it also means that anyone using the tar crate can now implement SeekSparse on their own data types.
Note that tar requires blocks to be 512-byte aligned, and with files you get this for free (data and holes are block-aligned by the filesystem, and the block size is generally some multiple of 512). With SeekSparse data there are no restrictions on data alignment, which means that some holes may be encoded as zeros (or whatever "empty" byte the SeekSparse Read impl returns when a caller reads from a hole). This adds some complexity to the find_sparse_entries_seek function. I think this is worthwhile, because it frees the caller of append_sparse_data from having to care about what block alignment tar uses (which is an internal implementation detail of tar).
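As a rough illustration of the alignment point, the sketch below widens arbitrary data extents to 512-byte boundaries and merges the ones that end up touching. It is not the PR's find_sparse_entries_seek; Extent, align_extents and BLOCK are hypothetical names, and the input is assumed to be sorted, non-overlapping and within the logical size. It shows why a small or unaligned hole can disappear from the sparse map and be written out as ordinary bytes instead.

```rust
/// tar block size in bytes.
const BLOCK: u64 = 512;

/// A run of real data covering [offset, offset + len). Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Extent {
    offset: u64,
    len: u64,
}

/// Widen each extent to 512-byte boundaries and merge extents that touch
/// or overlap after widening. `size` is the logical size of the data; the
/// last extent is clamped to it.
/// Assumes `extents` is sorted by offset, non-overlapping, and within `size`.
fn align_extents(extents: &[Extent], size: u64) -> Vec<Extent> {
    let mut out: Vec<Extent> = Vec::new();
    for e in extents.iter().filter(|e| e.len > 0) {
        // Round the start down and the end up to a block boundary.
        let start = e.offset / BLOCK * BLOCK;
        let end = ((e.offset + e.len + BLOCK - 1) / BLOCK * BLOCK).min(size);

        if let Some(prev) = out.last_mut() {
            let prev_end = prev.offset + prev.len;
            if start <= prev_end {
                // The hole between the two extents is smaller than the
                // alignment allows, so it becomes part of the data run.
                prev.len = end.max(prev_end) - prev.offset;
                continue;
            }
        }
        out.push(Extent { offset: start, len: end - start });
    }
    out
}
```

For example, with data at bytes 100..150 and 300..310, both extents widen to the block 0..512 and merge, so the hole at 150..300 is stored as whatever bytes the reader returns for it rather than as a sparse gap.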