jongpie · anthonygiuliano · Mar 3, 2026 · May 17, 2026 · jongpie · May 8, 2026
@@ -0,0 +1,17 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<CustomMetadata xmlns="http://soap.sforce.com/2006/04/metadata" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
+    <label>Data Mask Regex: Chunk Size</label>
+    <protected>false</protected>
+    <values>
+        <field>Comments__c</field>
+        <value xsi:nil="true"/>
+    </values>
+    <values>
+        <field>Description__c</field>
+        <value xsi:type="xsd:string">When data masking is applied to a very long string, the value is processed in chunks of this many characters to avoid the Apex &apos;System.LimitException: Regex too complicated&apos; error (which Salesforce raises when a single regex evaluation is too expensive). Tradeoffs: a LARGER chunk size means fewer chunks and slightly less overlap re-scanning, but each regex evaluation runs against more text and is therefore more likely to hit the LimitException; a SMALLER chunk size is safer against the limit but increases chunk count and overlap re-scan overhead. The chunk size must also be larger than DataMaskRegexOverlapSize plus the longest value any enabled rule can match, or boundary values can be missed. The default (4000) is a deliberately conservative value, roughly 27x below even the worst-case measured failure point. With all four shipped rules applied single-pass (no chunking), the limit was hit between ~110K characters (realistic log-shaped text) and ~220K characters (dense structured input) — diluting matches with ordinary text makes it fail sooner, not later, because the limit is a regex-engine step budget rather than a character count. Note: that LimitException is uncatchable, so without chunking a single oversized log message fails the whole logging call. Benchmarking found chunk size to be a safety knob rather than a performance lever: processing CPU was effectively flat across chunk sizes from 1K to 64K, so raising this value yields no measurable speedup while moving closer to the failure point. Lower this if a custom rule still throws &apos;Regex too complicated&apos; at the default; only raise it after testing your specific rule regexes against representative data. When no record is configured, Nebula Logger falls back to its built-in default of 4000.</value>
+    </values>
+    <values>
+        <field>Value__c</field>
+        <value xsi:type="xsd:string">4000</value>
+    </values>
+</CustomMetadata>
@@ -0,0 +1,17 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<CustomMetadata xmlns="http://soap.sforce.com/2006/04/metadata" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
+    <label>Data Mask Regex: Overlap Size</label>
+    <protected>false</protected>
+    <values>
+        <field>Comments__c</field>
+        <value xsi:nil="true"/>
+    </values>
+    <values>
+        <field>Description__c</field>
+        <value xsi:type="xsd:string">When data masking is applied to a very long string, the value is processed in overlapping chunks to avoid the Apex &apos;System.LimitException: Regex too complicated&apos; error. This integer controls how many characters adjacent chunks overlap, which guarantees that a sensitive value sitting on a chunk boundary is still fully contained within at least one chunk. This value MUST be greater than or equal to the longest value that any enabled LogEntryDataMaskRule__mdt regex can match. The default (20) covers the built-in rules (SSN ~11 chars, credit card ~19 chars with separators); increase it if you add custom rules that match longer values. When no record is configured, Nebula Logger falls back to its built-in default of 20.</value>
+    </values>
+    <values>
+        <field>Value__c</field>
+        <value xsi:type="xsd:string">20</value>
+    </values>
+</CustomMetadata>
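The boundary guarantee described in the Overlap Size record can be tried outside an org. Below is a minimal sketch in plain Java, chosen because Apex's `Pattern` and `Matcher` classes are based on `java.util.regex`. The class name `ChunkedMaskSketch`, the method `mask`, and the fixed literal replacement are assumptions made for illustration and are not code from this change. It scans a line in overlapping windows, records matches by absolute position, and replaces them left to right.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChunkedMaskSketch {
    // Hypothetical helper: replaces every match of `regex` with a fixed literal,
    // scanning `line` in windows of `chunkSize` chars that overlap by `overlap` chars.
    static String mask(String line, String regex, String literal, int chunkSize, int overlap) {
        Pattern pattern = Pattern.compile(regex);
        // Assumption: clamp the step so a misconfigured overlap cannot stall the loop.
        int step = Math.max(1, chunkSize - overlap);
        // Absolute start -> absolute end. TreeMap keeps starts sorted and deduplicated.
        TreeMap<Integer, Integer> endByStart = new TreeMap<>();
        for (int i = 0; i < line.length(); i += step) {
            int chunkEnd = Math.min(i + chunkSize, line.length());
            Matcher m = pattern.matcher(line.substring(i, chunkEnd));
            while (m.find()) {
                // Matcher indexes are window-relative; keep the longest match per start.
                endByStart.merge(i + m.start(), i + m.end(), Integer::max);
            }
        }
        StringBuilder result = new StringBuilder();
        int pos = 0;
        for (Map.Entry<Integer, Integer> match : endByStart.entrySet()) {
            if (match.getKey() < pos) {
                continue; // starts inside a region that was already replaced
            }
            result.append(line, pos, match.getKey()).append(literal);
            pos = match.getValue();
        }
        return result.append(line.substring(pos)).toString();
    }

    public static void main(String[] args) {
        String line = "name: 123-45-6789 end";
        String ssn = "\\d{3}-\\d{2}-\\d{4}";
        System.out.println(mask(line, ssn, "***", 16, 11)); // name: *** end
        System.out.println(mask(line, ssn, "***", 10, 4));  // name: 123-45-6789 end
    }
}
```

In the first call the overlap (11) equals the length of the SSN, so the window starting at index 5 contains the whole value. In the second call the overlap (4) is shorter than the value and no window ever sees all 11 characters, so the value is left unmasked. That is the failure mode the "MUST be greater than or equal to" rule guards against.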
@@ -20,6 +20,59 @@ global with sharing class LogEntryEventBuilder {
   private static final String HTTP_HEADER_FORMAT = '{0}: {1}';
   private static final String NEW_LINE_DELIMITER = '\n';
 
+  // Data-masking regex is applied in overlapping chunks to avoid Apex's
+  // `System.LimitException: Regex too complicated`, which Salesforce throws when a single
+  // regex evaluation exceeds an internal step budget. See issue #639. Salesforce does not
+  // document the threshold; it depends on the input length and the specific rule's
+  // pattern. Each enabled rule is an independent `replaceAll` with its own step budget,
+  // so the single most expensive rule sets the cliff — running more rules does not lower
+  // it (rule count only adds cumulative CPU, a separate limit). Of the four shipped rules
+  // the Mastercard pattern (long alternation + `\3` backreference) is by far the worst;
+  // measured single-pass it alone throws at the same size as all four together.
+  // Critically, this LimitException is UNCATCHABLE — a try/catch around `replaceAll`
+  // does not trap it — so without chunking a single large log message makes the entire
+  // logging call fail unrecoverably.
+  //
+  // DATA_MASK_REGEX_CHUNK_SIZE: the max number of characters fed to a single
+  // `replaceAll`/`Matcher` evaluation. 4000 is a deliberately conservative fixed value,
+  // trading a higher chunk count for a wide safety margin below the measured failure point.
+  //
+  // Measured (Nebula Logger v4.17.3, current Apex regex engine, all four shipped rules
+  // applied single-pass to the whole blob — i.e. the pre-chunking `applyDataMaskRules`
+  // path; reproduced identically on a scratch org and a sandbox). The un-chunked cliff
+  // depends on content shape, not just length, because the limit is a regex-engine STEP
+  // budget (CPU at the cliff was a steady ~25-45 ms across every shape tested):
+  //   - dense, structured near-miss tokens (best case): throws at ~220K chars
+  //   - tokens diluted with inert text (realistic log shape, worst case): throws at ~110K
+  // i.e. diluting matches with ordinary text makes it fail SOONER (more engine steps per
+  // character), not later. The original #639 report at ~35K was an older, lower-threshold
+  // engine. The default chunk size of 4000 is ~27x below even the worst-case cliff.
+  //
+  // Chunk size is a SAFETY knob, not a performance lever: with chunking enabled, CPU was
+  // flat (<6 ms variance) across chunk sizes 1K-64K and roughly linear in input length
+  // (200K chars masked in single-digit ms). Raising the chunk size yields no measurable
+  // speedup and only moves toward the cliff; lowering it adds margin at negligible cost.
+  // Lower it (via the override below) if custom rules push the failure point down.
+  // Overridable at runtime via the optional `LoggerParameter__mdt.DataMaskRegexChunkSize`
+  // record (no deploy required); the constant below is only the default.
+  //
+  // DATA_MASK_REGEX_OVERLAP_SIZE: adjacent chunks overlap by this many characters so a
+  // sensitive value that straddles a chunk boundary is still fully contained within at
+  // least one chunk. This value MUST be >= the longest sensitive value any data-mask rule
+  // can match; 20 covers the built-in rules (SSN ~11 chars, credit card ~19 chars with
+  // separators). It cannot be derived from the rule regexes (a pattern's max match length
+  // is not generally computable), so for orgs whose custom rules match longer values it is
+  // overridable at runtime via the optional `LoggerParameter__mdt.DataMaskRegexOverlapSize`
+  // record (no deploy required); the constant below is only the default.
+  @TestVisible
+  private static final Integer DATA_MASK_REGEX_CHUNK_SIZE = 4000;
+  @TestVisible
+  private static final Integer DATA_MASK_REGEX_OVERLAP_SIZE = 20;
+  // Matches a `$N` capture-group reference (N = one or more digits) inside a replacement
+  // template. Used by expandReplacement(); safe to regex directly since replacement
+  // templates are short config values, never the long log payload.
+  private static final System.Pattern DATA_MASK_REPLACEMENT_TOKEN_PATTERN = System.Pattern.compile('\\$([0-9]+)');
+
   private static String cachedOrganizationEnvironmentType;
 
   @TestVisible
@@ -1150,12 +1203,176 @@ global with sharing class LogEntryEventBuilder {
 
     for (LogEntryDataMaskRule__mdt dataMaskRule : CACHED_DATA_MASK_RULES.values()) {
       if (dataMaskRule.IsEnabled__c) {
-        dataInput = dataInput.replaceAll(dataMaskRule.SensitiveDataRegEx__c, dataMaskRule.ReplacementRegEx__c);
+        dataInput = applyDataMaskRuleToChunkedText(dataInput, dataMaskRule.SensitiveDataRegEx__c, dataMaskRule.ReplacementRegEx__c);
       }
     }
     return dataInput;
   }
 
+  // Chunk size defaults to DATA_MASK_REGEX_CHUNK_SIZE but can be tuned without a deploy via
+  // the optional LoggerParameter__mdt.DataMaskRegexChunkSize record. Lower it if a custom
+  // rule's regex still throws `Regex too complicated` at the default; raise it (carefully)
+  // to reduce chunk count. Resolved once per rule application and passed down so a single
+  // consistent value is used for every boundary calculation in that call.
+  private static Integer getDataMaskRegexChunkSize() {
+    return LoggerParameter.getInteger('DataMaskRegexChunkSize', DATA_MASK_REGEX_CHUNK_SIZE);
+  }
+
+  private static String applyDataMaskRuleToChunkedText(String text, String sensitiveDataRegEx, String replacementRegEx) {
+    if (text == null) {
+      return text;
+    }
+
+    Integer chunkSize = getDataMaskRegexChunkSize();
+
+    // Short enough to mask in a single pass — no chunking needed.
+    if (text.length() <= chunkSize) {
+      return text.replaceAll(sensitiveDataRegEx, replacementRegEx);
+    }
+
+    List<String> lines = text.split('\n', -1);
+    if (lines.size() > 1) {
+      List<String> processedLines = new List<String>();
+      for (String line : lines) {
+        if (line.length() <= chunkSize) {
+          processedLines.add(line.replaceAll(sensitiveDataRegEx, replacementRegEx));
+        } else {
+          processedLines.add(applyDataMaskRuleToLongLine(line, sensitiveDataRegEx, replacementRegEx, chunkSize));
+        }
+      }
+      return String.join(processedLines, '\n');
+    }
+
+    return applyDataMaskRuleToLongLine(text, sensitiveDataRegEx, replacementRegEx, chunkSize);
+  }
+
+  /**
+   * Applies a single data-mask rule to one line that is too long to regex in a single pass.
+   *
+   * `String.replaceAll` cannot be called on the whole line (it would throw the
+   * `Regex too complicated` LimitException), so the line is scanned in overlapping
+   * windows of `chunkSize` characters (the caller-resolved value of
+   * DATA_MASK_REGEX_CHUNK_SIZE / its LoggerParameter override), advancing by `step`
+   * (= chunk size - overlap) each iteration. The overlap guarantees that any sensitive
+   * value sitting on a chunk boundary is fully visible in at least one window.
+   *
+   * Because windows overlap, the same match can be discovered more than once, and
+   * `Matcher` indexes are window-relative — so matches are collected with absolute
+   * positions, deduplicated, sorted, then applied left-to-right in a second pass.
+   *
+   * Worked example (chunk size 16, overlap 11, step 5) masking the 11-character SSN
+   * `123-45-6789` with replacement `***`:
+   *
+   *   line   = "name: 123-45-6789 end"   (length 21)
+   *   chunk0 = line[0..16)  = "name: 123-45-678"  -> SSN cut at right edge, no match
+   *   chunk1 = line[5..21)  = " 123-45-6789 end"  -> matches at window 1 => absStart 6
+   *   chunk2..4 start at 10, 15, 20               -> no match
+   *   collected: { start 6 -> end 17 }
+   *   result = line[0..6) + "***" + line[17..21) = "name: *** end"
+   *
+   * Keeping the *longest* match for a given start (rather than the first one found)
+   * matters because an earlier window may truncate the value at its right edge,
+   * yielding a shorter, less accurate match than a later window with more context.
+   */
+  private static String applyDataMaskRuleToLongLine(String line, String sensitiveDataRegEx, String replacementRegEx, Integer chunkSize) {
+    System.Pattern regex = System.Pattern.compile(sensitiveDataRegEx);
+    // Overlap defaults to DATA_MASK_REGEX_OVERLAP_SIZE but can be raised without a deploy
+    // via the optional LoggerParameter__mdt.DataMaskRegexOverlapSize record, for orgs whose
+    // custom data-mask rules match values longer than the built-in rules.
+    Integer overlapSize = LoggerParameter.getInteger('DataMaskRegexOverlapSize', DATA_MASK_REGEX_OVERLAP_SIZE);
+    Integer step = Math.max(1, chunkSize - overlapSize); // never <= 0, even if overlap >= chunk size
+
+    // Pass 1: scan overlapping windows and record every match by its ABSOLUTE start
+    // position. endByStart maps an absolute start index -> absolute end index; groupsByStart
+    // keeps that match's capture groups (group 0 = full match) so the replacement template
+    // can be expanded later without re-running the regex.
+    Map<Integer, Integer> endByStart = new Map<Integer, Integer>();
+    Map<Integer, List<String>> groupsByStart = new Map<Integer, List<String>>();
+
+    for (Integer i = 0; i < line.length(); i += step) {
+      Integer chunkEnd = Math.min(i + chunkSize, line.length());
+      System.Matcher m = regex.matcher(line.substring(i, chunkEnd));
+      while (m.find()) {
+        // Matcher indexes are window-relative; add the window offset `i` to get
+        // absolute positions within the full line.
+        Integer absStart = i + m.start();
+        Integer absEnd = i + m.end();
+        // First time we see this start, OR a later (overlapping) window found a longer
+        // match starting at the same place — keep the longer one, it has more context.
+        if (!endByStart.containsKey(absStart) || absEnd > endByStart.get(absStart)) {
+          endByStart.put(absStart, absEnd);
+          List<String> groups = new List<String>();
+          for (Integer g = 0; g <= m.groupCount(); g++) {
+            groups.add(m.group(g));
+          }
+          groupsByStart.put(absStart, groups);
+        }
+      }
+    }
+
+    if (endByStart.isEmpty()) {
+      return line;
+    }
+
+    // Apex Map.keySet() has no guaranteed iteration order, so explicitly sort the start
+    // positions to process matches strictly left-to-right in Pass 2.
+    List<Integer> sortedStarts = new List<Integer>(endByStart.keySet());
+    sortedStarts.sort();
+
+    // Pass 2: walk the matches left-to-right, copying the untouched text between matches
+    // ("gaps") verbatim and substituting each match with its expanded replacement.
+    // `pos` tracks how far into the original line has been consumed.
+    String result = '';
+    Integer pos = 0;
+    for (Integer start : sortedStarts) {
+      // This match starts inside a region already replaced by an earlier (longer)
+      // match — skip it to avoid double-masking overlapping hits.
+      if (start < pos) {
+        continue;
+      }
+      result += line.substring(pos, start);
+      result += expandReplacement(replacementRegEx, groupsByStart.get(start));
+      pos = endByStart.get(start);
+    }
+    result += line.substring(pos);
+    return result;
+  }
+
+  /**
+   * Expands `$N` capture-group references in a replacement template. This is a lenient
+   * subset of Java's `Matcher.appendReplacement`, not an exact equivalent.
+   *
+   * Only `$N` tokens that appear in the original `replacement` template are expanded;
+   * a `$N` sequence that happens to occur *inside a captured group's value* is copied
+   * through verbatim (this is why the result is built from the template, not produced by
+   * `String.replace` on the group values). An unresolvable token (`$0`, an out-of-range
+   * group, or a null group) is left as the literal text `$N`.
+   *
+   * Example: replacement `"[$1]-$2"`, groups [full, "A", "B"] -> `"[A]-B"`.
+   * Example: replacement `"$1"`, group 1 = `"price=$3"` -> `"price=$3"` (the `$3` in the
+   * captured value is NOT re-expanded).
+   */
+  private static String expandReplacement(String replacement, List<String> groups) {
+    System.Matcher tokenMatcher = DATA_MASK_REPLACEMENT_TOKEN_PATTERN.matcher(replacement);
+    String result = '';
+    Integer pos = 0;
+    while (tokenMatcher.find()) {
+      // Copy the literal template text preceding this `$N` token.
+      result += replacement.substring(pos, tokenMatcher.start());
+      Integer groupNum = Integer.valueOf(tokenMatcher.group(1));
+      if (groupNum >= 1 && groupNum < groups.size() && groups[groupNum] != null) {
+        result += groups[groupNum];
+      } else {
+        // Not a resolvable group reference — preserve the token text literally.
+        result += tokenMatcher.group();
+      }
+      pos = tokenMatcher.end();
+    }
+    // Copy any literal template text after the last token.
+    result += replacement.substring(pos);
+    return result;
+  }
+
   private static String getJson(SObject record, Boolean isRecordFieldStrippingEnabled) {
     List<SObject> records = new List<SObject>{ record };
     records = isRecordFieldStrippingEnabled == false ? records : stripInaccessible(records);
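The template expansion performed by `expandReplacement` in the hunk above can be checked in isolation. What follows is a hedged Java rendering of the same logic, not the Apex source itself. The class name `ReplacementTemplateSketch` and the method name `expand` are hypothetical. Like the Apex version, it builds the result from the template, so a `$N` inside a captured value is never re-expanded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplacementTemplateSketch {
    // Matches a `$N` token (N = one or more digits) inside a replacement template.
    private static final Pattern TOKEN = Pattern.compile("\\$([0-9]+)");

    // groups.get(0) is the full match; groups.get(n) is capture group n (may be null).
    static String expand(String template, List<String> groups) {
        Matcher token = TOKEN.matcher(template);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (token.find()) {
            // Copy the literal template text that precedes this token.
            out.append(template, pos, token.start());
            int groupNum = Integer.parseInt(token.group(1));
            if (groupNum >= 1 && groupNum < groups.size() && groups.get(groupNum) != null) {
                out.append(groups.get(groupNum));
            } else {
                // `$0`, an out-of-range group, or a null group stays as literal text.
                out.append(token.group());
            }
            pos = token.end();
        }
        // Copy any literal template text after the last token.
        return out.append(template.substring(pos)).toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("[$1]-$2", Arrays.asList("full", "A", "B"))); // [A]-B
        System.out.println(expand("$1", Arrays.asList("full", "price=$3")));    // price=$3
        System.out.println(expand("$0/$9", Arrays.asList("full", "A")));        // $0/$9
    }
}
```

One consequence worth noting: the single-pass path still calls `replaceAll`, where `$0` stands for the whole match in Java-style replacement strings. A rule whose replacement uses `$0` could therefore mask short and long inputs differently.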
@@ -1404,7 +1621,7 @@ global with sharing class LogEntryEventBuilder {
       String maskedTextValue = textValueToMask;
       for (LogEntryDataMaskRule__mdt dataMaskRule : CACHED_DATA_MASK_RULES.values()) {
         if (dataMaskRule.IsEnabled__c) {
-          maskedTextValue = maskedTextValue.replaceAll(dataMaskRule.SensitiveDataRegEx__c, dataMaskRule.ReplacementRegEx__c);
+          maskedTextValue = applyDataMaskRuleToChunkedText(maskedTextValue, dataMaskRule.SensitiveDataRegEx__c, dataMaskRule.ReplacementRegEx__c);
         }
       }