Skip to content

Various safety fixes and defense-in-depth#7887

Merged
Manishearth merged 4 commits intounicode-org:mainfrom
Manishearth:safety-fixes
Apr 21, 2026
Merged

Various safety fixes and defense-in-depth#7887
Manishearth merged 4 commits intounicode-org:mainfrom
Manishearth:safety-fixes

Conversation

@Manishearth
Copy link
Copy Markdown
Member

This fixes a bunch of issues found by @sayrer by running Claude on ICU4X to look for vulnerabilities (reported in https://github.com/unicode-org/icu4x/security/advisories/GHSA-c8m9-943c-5xmw).

Most of the fixes here are defense in depth: the code was already safe, but we can add assertions/etc to make it safer. In other cases the issues found were potential correctness issues, not safety issues.

The only real safety issue found was the multiplication overflows in VarZeroVec validation code: in theory, a 64 bit platform might validate a VZV that is invalid on a 32 bit platform. These have been fixed. They aren't severe enough to warrant a patch release / advisory: the only time this can cause a security issue is if you are trusting VZV data blobs constructed on a 64 bit platform on a 32 bit platform, which only happens with baked data, and 4GB-size VZVs aren't really a thing there.

Thanks, @sayrer !

The first commit was agent assisted but in the end mostly rewritten by me.

Changelog

zerovec: Fix minor soundness issue around unchecked multiplication, add defense in depth against other overflow situations.

resb: Add defense-in-depth around checked multiplication.

icu_provider_fs: Add defense-in-depth against path traversal.

icu_list: Prevent correctness errors from out-of-range indices in list patterns.

@Manishearth Manishearth requested review from a team, robertbastian and sffc as code owners April 20, 2026 20:32
Comment on lines +97 to +99
// Data marker attributes are already validated to be
// alphanumeric + underscore/dash. This is defense-in-depth
// against potential path traversal attacks.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we do the claimed validation? I think we allow attributes to contain /, such as FR/long to be the long French display name, which should by design create a directory in FsDataProvider.

Consider forbidding other characters like . and : to prevent the path traversal

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FR/long is not an attribute. We have the key long and the attribute fr.

My initial code here forbade . and string-initial / and \, but then I checked the attributes docs and realized they forbid slashes anyway.

https://unicode-org.github.io/icu4x/rustdoc/icu_provider/struct.DataMarkerAttributes.html#method.try_from_str

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof, this breaks part of my DisplayNames design, I guess a part I haven't implemented yet but assumed would be implementable. We will want to go back and revisit this.

I have a slight preference for you to pull this assumption out to a separate PR because we should discuss this on the DataMarkerAttributes level, but if you want to land this to assert on the status quo, I guess that's okay.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


#[inline]
unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &Self {
debug_assert!(!bytes.is_empty());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: this one is very 🤷 because the trait's safety invariants have us handled via the check on line 179

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm basically always in favor of adding cheap debug assertions that double-validate things unless it's going to be a big mess.

I guess I wouldn't want this everywhere.

I'm going to lean slightly towards keeping this since the non-emptiness is relevant to type-specific invariants.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sffc is correct that "this one is very 🤷". However, it requires constant vigilance. Maybe this:

#[cfg(test)]
mod misuse_tests {
    use super::OptionULE;
    use crate::ule::ULE;

    /// Direct misuse of the unsafe contract: `from_bytes_unchecked` documents
    /// that callers must pass a slice that has already passed `validate_bytes`,
    /// which rejects empty input. Calling it with `&[]` violates that contract.
    ///
    /// In debug builds the `debug_assert!(!bytes.is_empty())` catches this at
    /// the construction site. In release builds this same call is undefined
    /// behavior (the returned `&OptionULE<u8>` would read an out-of-bounds
    /// discriminant byte on its next use), which is why the test is gated to
    /// debug assertions.
    #[test]
    #[should_panic]
    #[cfg(debug_assertions)]
    fn from_bytes_unchecked_empty_slice_trips_debug_assert() {
        // SAFETY: intentionally unsafe — this test exists to verify that the
        // defense-in-depth assertion fires when the caller violates the
        // documented precondition.
        let _ = unsafe { <OptionULE<u8> as ULE>::from_bytes_unchecked(&[]) };
    }

    /// Same shape, but routed through a "helpful wrapper" — the realistic
    /// refactor hazard. A future author writes a convenience function that
    /// forwards a slice to `from_bytes_unchecked` and forgets the length
    /// check that lived in the original call site. The assertion catches it.
    #[test]
    #[should_panic]
    #[cfg(debug_assertions)]
    fn wrapper_forwarding_empty_slice_trips_debug_assert() {
        unsafe fn construct_without_validating(bytes: &[u8]) -> &OptionULE<u8> {
            // SAFETY: (intentionally unsound — simulates a refactor that drops
            // the validate_bytes call upstream.)
            unsafe { <OptionULE<u8> as ULE>::from_bytes_unchecked(bytes) }
        }

        // SAFETY: see above — deliberately violating the contract.
        let _ = unsafe { construct_without_validating(&[]) };
    }

    /// Sanity check: the *contract-respecting* path does not trip the
    /// assertion. A single-byte slice `[0]` encodes `None` for `OptionULE<u8>`
    /// (discriminant 0, inner byte ignored — here we supply two bytes so the
    /// layout is well-formed regardless of U::SIZE assumptions).
    #[test]
    fn from_bytes_unchecked_valid_slice_is_fine() {
        let bytes = [0u8, 0u8];
        // SAFETY: validate_bytes would accept this slice (length == 1 + u8::SIZE,
        // discriminant is 0 meaning None, inner byte is a valid u8 ULE).
        let opt = unsafe { <OptionULE<u8> as ULE>::from_bytes_unchecked(&bytes) };
        // Just prove we can touch it without UB in debug.
        let _ = opt;
    }
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug_assert!(!bytes.is_empty()); could be added to most from_bytes_unchecked implementations then

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, this is a case where there's some relevant arithmetic just after it.

From an unsafe review perspective this is nicer, though I probably should have added an assertion message.

Comment on lines +363 to +364
let indices_len = F::Index::SIZE.wrapping_mul(len_minus_one as usize);
debug_assert!(F::Index::SIZE.checked_mul(len_minus_one as usize).is_some());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: this is equivalent to the * operator you had before I think

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but now it's intentional 😄

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I like doing this in general in the icu4x codebase, but since this is in one our two most sensitive files (the other being yoke)... ok to be extra explicit I guess

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Also I think this is not the first time someone will run AI tooling to search for vulns, so some defense in depth is good in that it prevents duplicate reports.

Comment on lines +150 to +153
let indices_size = F::Index::SIZE
.checked_mul(len - 1)
.expect(F::Index::TOO_LARGE_ERROR);
debug_assert!(out + F::Len::SIZE + indices_size <= self.entire_slice.len());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, here and elsewhere: I'm not thinking too hard in this review about why you are doing a checked_mul on some and a debug assertion on others. I assume that when you assert, it is a safety thing that is technically reachable, and debug assert is an invariant that is not reachable. I can go back through this again if you want me to.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There are three ways I've done things in this PR:

  • Assertions for safety things that are reachable in an API that is allowed to panic (like VZV construction here). I consider FS data provider to be "allowed to panic" in case of a path traversal; that's pretty serious. Happy to downgrade to debug assert if you don't want panics there, though. I view FS code as very fragile and panicky anyway.
  • A debug assertion for safety things that are not reachable but I'm being cautious.
  • An error for actual reachable safety issues

We only have one real reachable safety issue here: the one with 32 bit overflow in VZV components.

@Manishearth Manishearth requested a review from sffc April 21, 2026 04:03
@sffc
Copy link
Copy Markdown
Member

sffc commented Apr 21, 2026

/gemini review

Copy link
Copy Markdown
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is landable but I requested an LLM review to double check that the changes you're making to math expressions are no-ops, which is easier for an LLM to verify than a human.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several safety enhancements to prevent integer overflows, underflows, and path traversal vulnerabilities across the codebase, specifically within list patterns, binary deserialization, and VarZeroVec components. The changes primarily involve replacing direct arithmetic with checked operations and adding explicit bounds checks. The review feedback identifies a potential denial-of-service vulnerability in the file system provider where an assert! is used for input validation instead of returning an error. Additionally, it points out potential underflow panics when calculating index sizes for empty element collections and suggests using saturating_sub to ensure robustness.

// Data marker attributes are already validated to be
// alphanumeric + underscore/dash. This is defense-in-depth
// against potential path traversal attacks.
assert!(!attr_path.contains(['/', '\\']));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Using assert! for input validation can lead to denial-of-service vulnerabilities if the input can be controlled by an untrusted source (e.g., via a DataRequest). It is better to return a DataError instead of panicking, especially in a library context where robustness is expected.

                    if attr_path.contains(['/', '\\']) {
                        return Err(DataError::custom("Invalid marker attributes").with_path_context(&path));
                    }

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be hit, and DOS is better than path traversal.

reason = "Function contract allows panicky behavior"
)]
let indices_size = F::Index::SIZE
.checked_mul(elements.len() - 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

elements.len() - 1 will underflow and panic in debug mode if elements is empty. Using saturating_sub(1) is safer and prevents this panic, which aligns with the defense-in-depth goals of this pull request.

Suggested change
.checked_mul(elements.len() - 1)
.checked_mul(elements.len().saturating_sub(1))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have checked that elements is nonempty already

reason = "Function contract allows panicky behavior"
)]
let indices_size = F::Index::SIZE
.checked_mul(elements.len() - 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

elements.len() - 1 will underflow and panic in debug mode if elements is empty. Using saturating_sub(1) is safer and prevents this panic.

Suggested change
.checked_mul(elements.len() - 1)
.checked_mul(elements.len().saturating_sub(1))

@Manishearth Manishearth merged commit 27f15b2 into unicode-org:main Apr 21, 2026
34 checks passed
&& (allow_suffix || index_1 == pattern.len() - 3) =>
{
if (index_0 > 0 && !cfg!(test)) || index_1 - 3 >= 256 {
let index1_minus_three = index_1.wrapping_sub(3);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a needless change, if the string contains both {0} and {1}, and index_0 < index_1, then index_1 is at least 3. I can see that when reading the code, we should not complicate our code just because AIs are too stupid to understand such simple dependencies

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I made this change because it's pretty non-obvious to humans too. It took me a while to realize this was checked. An alternate way to do this is to have comments about the invariants. I'm happy to switch to that if you prefer.

This isn't unsafe code, but I was reviewing all of this with my unsafe review hat, where you really want things to be dead obvious.


#[inline]
unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &Self {
debug_assert!(!bytes.is_empty());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug_assert!(!bytes.is_empty()); could be added to most from_bytes_unchecked implementations then

///
/// * `key_hashes` - [`ExactSizeIterator`] over the hashed key values
#[expect(clippy::indexing_slicing, clippy::unwrap_used)]
#[expect(clippy::indexing_slicing, clippy::unwrap_used, clippy::expect_used)]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

justification?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same justification as the preexisting ones? This is a construction API allowed to panic. I could add justification there just wasn't any already so I didn't.

Manishearth added a commit that referenced this pull request Apr 21, 2026
Followup for Robert's comments on
#7887

## Changelog (N/A)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants