Various safety fixes and defense-in-depth#7887
Conversation
39d3ca6 to
912867e
Compare
912867e to
2c1e23d
Compare
| // Data marker attributes are already validated to be | ||
| // alphanumeric + underscore/dash. This is defense-in-depth | ||
| // against potential path traversal attacks. |
There was a problem hiding this comment.
Where do we do the claimed validation? I think we allow attributes to contain /, such as FR/long to be the long French display name, which should by design create a directory in FsDataProvider.
Consider forbidding other characters like . and : to prevent the path traversal
There was a problem hiding this comment.
FR/long is not an attribute. We have the key long and the attribute fr.
My initial code here forbade . and string-initial / and \, but then I checked the attributes docs and realized they forbid slashes anyway.
There was a problem hiding this comment.
oof, this breaks part of my DisplayNames design, I guess a part I haven't implemented yet but assumed would be implementable. We will want to go back and revisit this.
I have a slight preference for you to pull this assumption out to a separate PR because we should discuss this on the DataMarkerAttributes level, but if you want to land this to assert on the status quo, I guess that's okay.
|
|
||
| #[inline] | ||
| unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &Self { | ||
| debug_assert!(!bytes.is_empty()); |
There was a problem hiding this comment.
Thought: this one is very 🤷 because the trait's safety invariants have us handled via the check on line 179
There was a problem hiding this comment.
I'm basically always in favor of adding cheap debug assertions that double-validate things unless it's going to be a big mess.
I guess I wouldn't want this everywhere.
I'm going to lean slightly towards keeping this since the non-emptiness is relevant to type-specific invariants.
There was a problem hiding this comment.
@sffc is correct that "this one is very 🤷". However, it requires constant vigilance. Maybe this:
#[cfg(test)]
mod misuse_tests {
use super::OptionULE;
use crate::ule::ULE;
/// Direct misuse of the unsafe contract: `from_bytes_unchecked` documents
/// that callers must pass a slice that has already passed `validate_bytes`,
/// which rejects empty input. Calling it with `&[]` violates that contract.
///
/// In debug builds the `debug_assert!(!bytes.is_empty())` catches this at
/// the construction site. In release builds this same call is undefined
/// behavior (the returned `&OptionULE<u8>` would read an out-of-bounds
/// discriminant byte on its next use), which is why the test is gated to
/// debug assertions.
#[test]
#[should_panic]
#[cfg(debug_assertions)]
fn from_bytes_unchecked_empty_slice_trips_debug_assert() {
// SAFETY: intentionally unsafe — this test exists to verify that the
// defense-in-depth assertion fires when the caller violates the
// documented precondition.
let _ = unsafe { <OptionULE<u8> as ULE>::from_bytes_unchecked(&[]) };
}
/// Same shape, but routed through a "helpful wrapper" — the realistic
/// refactor hazard. A future author writes a convenience function that
/// forwards a slice to `from_bytes_unchecked` and forgets the length
/// check that lived in the original call site. The assertion catches it.
#[test]
#[should_panic]
#[cfg(debug_assertions)]
fn wrapper_forwarding_empty_slice_trips_debug_assert() {
unsafe fn construct_without_validating(bytes: &[u8]) -> &OptionULE<u8> {
// SAFETY: (intentionally unsound — simulates a refactor that drops
// the validate_bytes call upstream.)
unsafe { <OptionULE<u8> as ULE>::from_bytes_unchecked(bytes) }
}
// SAFETY: see above — deliberately violating the contract.
let _ = unsafe { construct_without_validating(&[]) };
}
/// Sanity check: the *contract-respecting* path does not trip the
/// assertion. A single-byte slice `[0]` encodes `None` for `OptionULE<u8>`
/// (discriminant 0, inner byte ignored — here we supply two bytes so the
/// layout is well-formed regardless of U::SIZE assumptions).
#[test]
fn from_bytes_unchecked_valid_slice_is_fine() {
let bytes = [0u8, 0u8];
// SAFETY: validate_bytes would accept this slice (length == 1 + u8::SIZE,
// discriminant is 0 meaning None, inner byte is a valid u8 ULE).
let opt = unsafe { <OptionULE<u8> as ULE>::from_bytes_unchecked(&bytes) };
// Just prove we can touch it without UB in debug.
let _ = opt;
}
}
There was a problem hiding this comment.
debug_assert!(!bytes.is_empty()); could be added to most from_bytes_unchecked implementations then
There was a problem hiding this comment.
I know, this is a case where there's some relevant arithmetic just after it.
From an unsafe review perspective this is nicer, though I probably should have added an assertion message.
| let indices_len = F::Index::SIZE.wrapping_mul(len_minus_one as usize); | ||
| debug_assert!(F::Index::SIZE.checked_mul(len_minus_one as usize).is_some()); |
There was a problem hiding this comment.
Observation: this is equivalent to the * operator you had before I think
There was a problem hiding this comment.
Correct, but now it's intentional 😄
There was a problem hiding this comment.
I don't think I like doing this in general in the icu4x codebase, but since this is in one our two most sensitive files (the other being yoke)... ok to be extra explicit I guess
There was a problem hiding this comment.
Right. Also I think this is not the first time someone will run AI tooling to search for vulns, so some defense in depth is good in that it prevents duplicate reports.
| let indices_size = F::Index::SIZE | ||
| .checked_mul(len - 1) | ||
| .expect(F::Index::TOO_LARGE_ERROR); | ||
| debug_assert!(out + F::Len::SIZE + indices_size <= self.entire_slice.len()); |
There was a problem hiding this comment.
Note, here and elsewhere: I'm not thinking too hard in this review about why you are doing a checked_mul on some and a debug assertion on others. I assume that when you assert, it is a safety thing that is technically reachable, and debug assert is an invariant that is not reachable. I can go back through this again if you want me to.
There was a problem hiding this comment.
Yes. There are three ways I've done things in this PR:
- Assertions for safety things that are reachable in an API that is allowed to panic (like VZV construction here). I consider FS data provider to be "allowed to panic" in case of a path traversal; that's pretty serious. Happy to downgrade to debug assert if you don't want panics there, though. I view FS code as very fragile and panicky anyway.
- A debug assertion for safety things that are not reachable but I'm being cautious.
- An error for actual reachable safety issues
We only have one real reachable safety issue here: the one with 32 bit overflow in VZV components.
|
/gemini review |
sffc
left a comment
There was a problem hiding this comment.
This is landable but I requested an LLM review to double check that the changes you're making to math expressions are no-ops, which is easier for an LLM to verify than a human.
There was a problem hiding this comment.
Code Review
This pull request introduces several safety enhancements to prevent integer overflows, underflows, and path traversal vulnerabilities across the codebase, specifically within list patterns, binary deserialization, and VarZeroVec components. The changes primarily involve replacing direct arithmetic with checked operations and adding explicit bounds checks. The review feedback identifies a potential denial-of-service vulnerability in the file system provider where an assert! is used for input validation instead of returning an error. Additionally, it points out potential underflow panics when calculating index sizes for empty element collections and suggests using saturating_sub to ensure robustness.
| // Data marker attributes are already validated to be | ||
| // alphanumeric + underscore/dash. This is defense-in-depth | ||
| // against potential path traversal attacks. | ||
| assert!(!attr_path.contains(['/', '\\'])); |
There was a problem hiding this comment.
Using assert! for input validation can lead to denial-of-service vulnerabilities if the input can be controlled by an untrusted source (e.g., via a DataRequest). It is better to return a DataError instead of panicking, especially in a library context where robustness is expected.
if attr_path.contains(['/', '\\']) {
return Err(DataError::custom("Invalid marker attributes").with_path_context(&path));
}There was a problem hiding this comment.
This should not be hit, and DOS is better than path traversal.
| reason = "Function contract allows panicky behavior" | ||
| )] | ||
| let indices_size = F::Index::SIZE | ||
| .checked_mul(elements.len() - 1) |
There was a problem hiding this comment.
elements.len() - 1 will underflow and panic in debug mode if elements is empty. Using saturating_sub(1) is safer and prevents this panic, which aligns with the defense-in-depth goals of this pull request.
| .checked_mul(elements.len() - 1) | |
| .checked_mul(elements.len().saturating_sub(1)) |
There was a problem hiding this comment.
We have checked that elements is nonempty already
| reason = "Function contract allows panicky behavior" | ||
| )] | ||
| let indices_size = F::Index::SIZE | ||
| .checked_mul(elements.len() - 1) |
| && (allow_suffix || index_1 == pattern.len() - 3) => | ||
| { | ||
| if (index_0 > 0 && !cfg!(test)) || index_1 - 3 >= 256 { | ||
| let index1_minus_three = index_1.wrapping_sub(3); |
There was a problem hiding this comment.
this is a needless change, if the string contains both {0} and {1}, and index_0 < index_1, then index_1 is at least 3. I can see that when reading the code, we should not complicate our code just because AIs are too stupid to understand such simple dependencies
There was a problem hiding this comment.
No, I made this change because it's pretty non-obvious to humans too. It took me a while to realize this was checked. An alternate way to do this is to have comments about the invariants. I'm happy to switch to that if you prefer.
This isn't unsafe code, but I was reviewing all of this with my unsafe review hat, where you really want things to be dead obvious.
|
|
||
| #[inline] | ||
| unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &Self { | ||
| debug_assert!(!bytes.is_empty()); |
There was a problem hiding this comment.
debug_assert!(!bytes.is_empty()); could be added to most from_bytes_unchecked implementations then
| /// | ||
| /// * `key_hashes` - [`ExactSizeIterator`] over the hashed key values | ||
| #[expect(clippy::indexing_slicing, clippy::unwrap_used)] | ||
| #[expect(clippy::indexing_slicing, clippy::unwrap_used, clippy::expect_used)] |
There was a problem hiding this comment.
Same justification as the preexisting ones? This is a construction API allowed to panic. I could add justification there just wasn't any already so I didn't.
Followup for Robert's comments on #7887 ## Changelog (N/A)
This fixes a bunch of issues found by @sayrer by running Claude on ICU4X to look for vulnerabilities (reported in https://github.com/unicode-org/icu4x/security/advisories/GHSA-c8m9-943c-5xmw).
Most of the fixes here are defense in depth: the code was already safe, but we can add assertions/etc to make it safer. In other cases the issues found were potential correctness issues, not safety issues.
The only real safety issue found was the multiplication overflows in VarZeroVec validation code: in theory, a 64 bit platform might validate a VZV that is invalid on a 32 bit platform. These have been fixed. They aren't severe enough to warrant a patch release / advisory: the only time this can cause a security issue is if you are trusting VZV data blobs constructed on a 64 bit platform on a 32 bit platform, which only happens with baked data, and 4GB-size VZVs aren't really a thing there.
Thanks, @sayrer !
The first commit was agent assisted but in the end mostly rewritten by me.
Changelog
zerovec: Fix minor soundness issue around unchecked multiplication, add defense in depth against other overflow situations.
resb: Add defense-in-depth around checked multiplication.
icu_provider_fs: Add defense-in-depth against path traversal.
icu_list: Prevent correctness errors from out-of-range indices in list patterns.