Update nesting level data type from u16 to u32 to avoid attempt to add with overflow panic #934

Closed
gliderkite wants to merge 1 commit into tafia:master from gliderkite:nesting-level-u32


@gliderkite

I need to parse large XML files (50MB+) that represent traffic information following the DATEX II schema. The content of the XML looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<d2LogicalModel xmlns="http://datex2.eu/schema/1_0/1_0">
    <payloadPublication xsi:type="">
    <situation id="">
        <situationRecord xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="" id="">...</situationRecord>
        ...
    </situation>
    ...
    </payloadPublication>
</d2LogicalModel>

There can be several thousand <situation> elements. I deserialize the document with quick_xml version 0.39 into custom structs:

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct D2LogicalModel {
    payload_publication: SituationPublication,
}

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct SituationPublication {
    #[serde(rename = "@xsi:type", alias = "@type")]
    xsi_type: String,
    situation: Vec<Situation>,
}

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct Situation {
    #[serde(rename = "@id")]
    id: String,
    situation_record: Vec<SituationRecord>,
}

...

let data = include_bytes!("traffic.xml");
let model: D2LogicalModel = quick_xml::de::from_reader(data.as_slice()).unwrap();

But when I attempt to do so with the largest files I get a panic:

thread 'tests::test_deserialize_situation_publication' (384159) panicked at .cargo/registry/src/index.crates.io-1949cf8c6b5b557f/quick-xml-0.39.0/src/name.rs:658:9:
attempt to add with overflow

This is the line that caused it:

pub fn push(&mut self, start: &BytesStart) -> Result<(), NamespaceError> {
        self.nesting_level += 1;

where nesting_level is a u16 that ends up overflowing.
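
For reference, the arithmetic behind the panic can be reproduced without the library. This is only a sketch: `next_level` is a hypothetical helper, not quick_xml code. It shows that a `u16` has no value after 65535, which is why `+= 1` panics in a debug or test build (release builds wrap silently unless overflow checks are enabled).

```rust
/// Hypothetical helper (not part of quick_xml): returns the next nesting
/// level, or `None` when the `u16` counter is already at its maximum.
fn next_level(level: u16) -> Option<u16> {
    level.checked_add(1)
}

fn main() {
    // Ordinary increments succeed.
    assert_eq!(next_level(0), Some(1));
    assert_eq!(next_level(65_534), Some(65_535));

    // At u16::MAX there is no room left; a plain `level += 1` would panic
    // here in a debug build with "attempt to add with overflow".
    assert_eq!(next_level(u16::MAX), None);

    // Widening to u32 moves the ceiling from 65_535 to 4_294_967_295.
    assert_eq!(u32::from(u16::MAX) + 1, 65_536);
    println!("u16 max = {}, u32 max = {}", u16::MAX, u32::MAX);
}
```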

This patch fixes the issue by simply moving to u32. I am not familiar with this library, so I am not sure if there are other implications; please feel free to suggest better alternatives. As a workaround I can use my fork with this commit, or I could stream all the situations and deserialize them one by one using this version of the library, but I'd rather avoid that.
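
A rough sketch of the "one by one" workaround, using only the standard library. The function name `situation_fragments` is hypothetical, and the approach assumes that `<situation>` elements are never nested inside each other and that the literal strings `<situation ` and `</situation>` do not appear inside comments or CDATA. It is a plain substring search, not an XML parser; each returned fragment could then be handed to the deserializer separately.

```rust
/// Splits `xml` into `<situation ...>...</situation>` fragments by substring
/// search. Hypothetical helper; assumes non-nested `<situation>` elements.
fn situation_fragments(xml: &str) -> Vec<&str> {
    // The trailing space keeps `<situationRecord` from matching.
    const OPEN: &str = "<situation ";
    const CLOSE: &str = "</situation>";

    let mut fragments = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find(OPEN) {
        let tail = &rest[start..];
        // Stop if an opening tag has no matching close.
        let Some(end) = tail.find(CLOSE) else { break };
        let end = end + CLOSE.len();
        fragments.push(&tail[..end]);
        rest = &tail[end..];
    }
    fragments
}

fn main() {
    let xml = r#"<payloadPublication>
        <situation id="a"><situationRecord id="a-1"></situationRecord></situation>
        <situation id="b"></situation>
    </payloadPublication>"#;

    for fragment in situation_fragments(xml) {
        // Each fragment is a complete element that could be deserialized
        // on its own, e.g. with `quick_xml::de::from_str` (not called here).
        println!("{} bytes: {}", fragment.len(), fragment);
    }
}
```

Note that a fragment cut out this way loses namespace declarations made on ancestor elements, so this only helps when the target structs do not depend on them.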

@Mingun
Collaborator

Mingun commented Feb 4, 2026

Hm, an overflow here means that your XML has more than 65535 nested tags. That seems very unlikely. Can you share the XML that triggers this error?

@Mingun
Collaborator

Mingun commented Feb 4, 2026

Actually, I think you hit #597.

@gliderkite
Author

gliderkite commented Feb 4, 2026

> Hm, an overflow here means that your XML has more than 65535 nested tags. That seems very unlikely. Can you share the XML that triggers this error?

No, I cannot share the original full file(s). We are talking about traffic data for whole countries, which may contain sensitive information. This panic is actually quite likely to occur, depending on the size of the file.

The file is not strictly needed to replicate the issue anyway. You can try with something like this:

#[test]
fn test_deserialize_large_publication() {
    const SITUATION: &str = r#"<situation id="TTI-756fxxxxxxxx-48c0-47b1-a219-0d66b5c461cd-TTU36310151436002000">
            <headerInformation>
            <confidentiality>internalUse</confidentiality>
            <informationStatus>real</informationStatus>
            <urgency>normalUrgency</urgency>
            </headerInformation>
            <situationRecord xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="NetworkManagement" id="TTI-756f6290-48c0-47b1-a219-0d66b5c461cd-TTUXXXXXXXXXX151436002000-1">
            <situationRecordCreationTime>2050-02-09T21:00:00Z</situationRecordCreationTime>
            <situationRecordVersion>1</situationRecordVersion>
            <situationRecordVersionTime>2026-02-03T14:55:11Z</situationRecordVersionTime>
            <situationRecordFirstSupplierVersionTime>2050-02-03T14:55:11Z</situationRecordFirstSupplierVersionTime>
            <probabilityOfOccurrence>certain</probabilityOfOccurrence>
            <validity>
                <validityStatus>suspended</validityStatus>
                <validityTimeSpecification>
                <overallStartTime>2050-02-09T21:00:00Z</overallStartTime>
                <overallEndTime>2050-02-10T05:00:00Z</overallEndTime>
                </validityTimeSpecification>
            </validity>
            <generalPublicComment>
                <comment>
                <value lang="EN">Lane closure scheduled due to Roadworks / License - Roadworks works</value>
                </comment>
            </generalPublicComment>
            <groupOfLocations>
                <locationContainedInGroup xsi:type="Linear">
                <locationExtension>
                    <openlr>
                    <binary version="3">CwV9qyHNxjv+AwAZAMQ74AH//gBOO/8A//8AFjvgCgKdAEQ7FQ==</binary>
                    </openlr>
                </locationExtension>
                </locationContainedInGroup>
            </groupOfLocations>
            <situationRecordExtension>
                <alertCEventCode>500</alertCEventCode>
            </situationRecordExtension>
            <networkManagementType>laneOrCarriagewayClosed</networkManagementType>
            </situationRecord>
        </situation>"#;


    let mut situations = String::new();
    for _ in 0..30_000 {
        situations.push_str(SITUATION);
    }

    let xml = d2_logical_model_xml(&situations);
    let model: D2LogicalModel = quick_xml::de::from_reader(xml.as_slice()).unwrap();
}

fn d2_logical_model_xml(situations: &str) -> Vec<u8> {
    format!(
        r#"
            <?xml version="1.0" encoding="UTF-8" ?>
            <d2LogicalModel xmlns="http://datex2.eu/schema/1_0/1_0" modelBaseVersion="1.0">

                <payloadPublication xsi:type="SituationPublication">
                <publicationTime>2023-10-10T12:44:30Z</publicationTime>

                {situations}

                </payloadPublication>
            </d2LogicalModel>
        "#
    )
    .into_bytes()
}

@Mingun
Collaborator

Mingun commented Feb 4, 2026

That is definitely #597. The fix in #598 works for well-formed documents, but behaves unexpectedly when strange documents are parsed. As a short-term solution, you may apply that patch.

I'm closing this PR because it just masks the problem instead of solving it.
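
One way to read "masking the problem", sketched under an assumption: if a depth counter grows with the number of elements rather than with the real nesting depth, a wider integer only postpones the overflow. The `max_level` function and its `leak` flag below are hypothetical and do not describe the actual cause tracked in #597; they only contrast a balanced counter with one that is never decremented.

```rust
/// Hypothetical depth tracker over `siblings` flat elements.
/// With `leak == true` the level is never decremented, modelling a counter
/// that tracks element count instead of depth. Returns `None` on overflow.
fn max_level(siblings: u32, leak: bool) -> Option<u16> {
    let mut level: u16 = 0;
    let mut max: u16 = 0;
    for _ in 0..siblings {
        level = level.checked_add(1)?; // start tag
        max = max.max(level);
        if !leak {
            level -= 1; // end tag restores the previous depth
        }
    }
    Some(max)
}

fn main() {
    // Balanced counter: 30_000 siblings never go deeper than one level.
    assert_eq!(max_level(30_000, false), Some(1));

    // Leaking counter: fine up to 65_535 elements, then it overflows.
    assert_eq!(max_level(65_535, true), Some(65_535));
    assert_eq!(max_level(65_536, true), None);

    println!("a balanced counter is bounded by depth, not by element count");
}
```

With a `u32` the leaking variant would still fail, just much later, which is why fixing the balance is preferable to widening the type.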

Mingun closed this on Feb 4, 2026