[wip] PKI index optimization using memory-mapped hash table #68891

Open
dwoz wants to merge 18 commits into saltstack:master from dwoz:fastcache-master

@dwoz dwoz commented Apr 4, 2026

Introduce a generic memory-mapped cache utility to eliminate $O(N)$ directory scans when verifying minion IDs on the Master.

  • Create salt/utils/mmap_cache.py for high-performance lockless lookups
  • Add Master configuration options for PKI index management
  • Refactor CkMinions to utilize the mmap index for target verification
  • Update Maintenance daemon to synchronize the index via the event bus
  • Implement pki.rebuild_index runner for manual index maintenance
  • Add comprehensive unit tests and Sphinx documentation
  • With no index file and 100K minion keys, salt-key goes from 40s to 6s because of the scandir optimization.
  • When the index is enabled and exists, salt-key goes from 6s to 0.6s because of the O(1) index lookup.
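
To make the O(1) claim concrete, here is a minimal sketch of a memory-mapped hash table with fixed-size slots and linear probing. It is not the code in salt/utils/mmap_cache.py: the function names, the `NUM_SLOTS` capacity, and the use of `zlib.crc32` as the hash are all assumptions chosen for illustration. Only the 128-byte slot size comes from the commit messages in this PR.

```python
import mmap
import zlib

SLOT_SIZE = 128   # bytes per slot; the commits mention a 128-byte default
NUM_SLOTS = 1024  # hypothetical capacity, not taken from the PR


def _start_slot(key: bytes) -> int:
    # crc32 is a stand-in hash function for this sketch.
    return zlib.crc32(key) % NUM_SLOTS


def create(path: str) -> None:
    # Write real zero bytes so the whole table can be mapped.
    with open(path, "wb") as fh:
        fh.write(b"\x00" * (SLOT_SIZE * NUM_SLOTS))


def put(path: str, key: str) -> bool:
    data = key.encode()
    if not data or len(data) >= SLOT_SIZE:
        raise ValueError("key does not fit in a slot")
    with open(path, "r+b") as fh, mmap.mmap(fh.fileno(), 0) as mm:
        start = _start_slot(data)
        for probe in range(NUM_SLOTS):
            off = ((start + probe) % NUM_SLOTS) * SLOT_SIZE
            slot = mm[off:off + SLOT_SIZE].rstrip(b"\x00")
            if slot == data:
                return True  # already present
            if not slot:
                mm[off:off + len(data)] = data  # claim the empty slot
                mm.flush()
                return True
    return False  # table is full


def contains(path: str, key: str) -> bool:
    data = key.encode()
    with open(path, "rb") as fh, mmap.mmap(
        fh.fileno(), 0, access=mmap.ACCESS_READ
    ) as mm:
        start = _start_slot(data)
        for probe in range(NUM_SLOTS):
            off = ((start + probe) % NUM_SLOTS) * SLOT_SIZE
            slot = mm[off:off + SLOT_SIZE].rstrip(b"\x00")
            if not slot:
                return False  # an empty slot ends the probe sequence
            if slot == data:
                return True
    return False
```

A lookup touches only the slots along one probe sequence, so its cost depends on the table's load factor rather than on the number of key files in the directory. This sketch has no deletion and no locking; both would need extra design (tombstones, a writer lock) in a real index.
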

@dwoz dwoz requested a review from a team as a code owner April 4, 2026 02:16
@dwoz dwoz added the test:full Run the full test suite label Apr 4, 2026
@dwoz dwoz changed the title from "[wip] Implement O(1) PKI index optimization using memory-mapped hash table" to "[wip] PKI index optimization using memory-mapped hash table" Apr 4, 2026
dwoz added 18 commits April 6, 2026 15:05
Introduce a generic memory-mapped cache utility to eliminate $O(N)$
directory scans when verifying minion IDs on the Master.

- Create salt/utils/mmap_cache.py for high-performance lockless lookups
- Add Master configuration options for PKI index management
- Refactor CkMinions to utilize the mmap index for target verification
- Update Maintenance daemon to synchronize the index via the event bus
- Implement pki.rebuild_index runner for manual index maintenance
- Add comprehensive unit tests and Sphinx documentation

Improve system performance by integrating an O(1) memory-mapped
index into the cache driver and CLI. This change ensures minion
lookups are near-instant and reduces Master FD pressure.

- Update cache driver to atomically maintain index
- Fully utilize index in Key and CkMinions classes
- Shift system verification to tracking actual FD usage
- Optimize mmap_cache with sparse files and bulk rebuilds
- Cleanup unused tests and imports
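
The "sparse files and bulk rebuilds" item can be pictured with a sketch like the one below. It is only a guess at the general shape, not the PR's code: `bulk_rebuild` is a hypothetical name, slot placement is simplified to sequential order instead of hashing, and the 2x over-allocation is an arbitrary choice. The idea shown is to build a complete replacement file next to the live index and swap it in with `os.replace`, so readers never see a half-built table.

```python
import os
import tempfile


def bulk_rebuild(index_path: str, records: list, slot_size: int = 128) -> None:
    """Build a fresh index in a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(index_path))
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".rebuild-")
    try:
        with os.fdopen(fd, "r+b") as fh:
            # truncate() extends the file without writing data; many
            # filesystems store the untouched ranges sparsely.
            fh.truncate(slot_size * max(1, len(records) * 2))
            for slot, record in enumerate(records):
                data = record.encode()
                if len(data) >= slot_size:
                    raise ValueError("record does not fit in a slot")
                fh.seek(slot * slot_size)
                fh.write(data)
        os.replace(tmp_path, index_path)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that later commits in this PR move from `truncate()` to explicit `write()` calls for allocation, so the sparse approach here reflects this commit's description only.
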
Align PKI index slot_size with Salt's 128-byte default to prevent
IndexError during lookups.

- Synchronize slot_size to 128 in localfs_key and PkiIndex
- Fix hardcoded defaults to match salt.config
- Fix list_all signature mismatch in localfs_key
- Add file size verification to MmapCache.open()
- Ensure mmap is closed after every operation to prevent FD leaks
- Update verify_env to ensure salt user owns index files
- Fix salt-key fallback to directory scan when index is missing
- Handle cluster_pki_dir in localfs_key and PkiIndex
- Ensure dotfiles like .pki_index.mmap are chowned in verify_env
- Prevent FD leaks by closing mmap after every operation
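
Two of the items above, file size verification on open and the fallback to a directory scan when the index is missing, can be sketched together. The names `index_is_usable` and `list_keys`, the `read_index` callback, and the rule for skipping dotfiles are assumptions made for this example; the actual checks in `MmapCache.open()` may differ.

```python
import os

SLOT_SIZE = 128  # the slot size named in the commit message


def index_is_usable(path: str, num_slots: int) -> bool:
    """Return True only if the index exists and has the expected size."""
    try:
        return os.stat(path).st_size == SLOT_SIZE * num_slots
    except FileNotFoundError:
        return False


def list_keys(pki_dir: str, index_path: str, num_slots: int, read_index) -> list:
    """Use the index when it is valid, otherwise scan the directory once."""
    if index_is_usable(index_path, num_slots):
        return read_index(index_path)
    # Fallback: a single scandir pass. DirEntry.is_file() can usually answer
    # from the directory entry itself, avoiding one stat() call per key.
    with os.scandir(pki_dir) as entries:
        return sorted(
            e.name for e in entries
            if e.is_file() and not e.name.startswith(".")
        )
```

Rejecting a file whose size does not match the expected slot layout is one way to avoid reading past the end of the mapping, which is the kind of `IndexError` the slot_size fix above describes.
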
Fix three critical issues causing package install/upgrade test failures:

1. Clustered environment PKI path bug in rebuild_index()
   When cluster_id is configured, rebuild_index() was incorrectly using
   pki_dir instead of cluster_pki_dir, causing the index to be built
   from the wrong location and breaking minion key authentication.

2. Missing salt.output import in salt/key.py
   Added missing import that caused UnboundLocalError when using
   salt-key with certain output formatters.

3. Robust error handling for list_status() in CkMinions
   Added null check for list_status() return value before accessing
   'minions' key, preventing AttributeError when key listing fails.

These fixes resolve the "No minions matched the target" errors that
were causing widespread package test failures across all platforms
in CI (Debian, Ubuntu, Rocky Linux, macOS, Photon OS).
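
Fixes 1 and 3 in this commit message amount to two small guards, sketched here under assumptions. The option keys `cluster_id`, `cluster_pki_dir`, and `pki_dir` appear in the text above; the helper names `select_pki_dir` and `accepted_minions`, and the shape of the `list_status` callable, are hypothetical.

```python
def select_pki_dir(opts: dict) -> str:
    """Pick the directory the index should be built from."""
    # In a clustered setup the keys live under cluster_pki_dir, not pki_dir.
    if opts.get("cluster_id"):
        return opts["cluster_pki_dir"]
    return opts["pki_dir"]


def accepted_minions(list_status, match: str = "accepted") -> list:
    """Return the 'minions' entry, tolerating a failed key listing."""
    status = list_status(match)
    if not status:
        # A None or empty result would otherwise fail on attribute access.
        return []
    return status.get("minions", [])
```
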
- Fix nested dictionary bug in _check_glob_minions that broke compound matching
- Fix UnboundLocalError in salt/key.py by using aliased local imports
- Fix clustered environment path selection bug in valid_id()
- Restore required cachedir parameter in list_all() for stability
- Use explicit write() instead of truncate() for robust file allocation
- Add flush() calls after mmap writes for better cross-process visibility
- Ensure all local salt.* imports in salt/key.py use aliases to prevent shadowing
- Correct nested dictionary bug in _check_glob_minions
- Fix clustered environment path selection in valid_id()
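
The `UnboundLocalError` fixed by aliasing local imports is a general Python scoping effect, reproduced below with the standard library's `os` package standing in for `salt`. A function-local `import pkg.sub` binds the name `pkg` as a local variable for the whole function, so any use of `pkg` earlier in that function fails. The function names are illustrative only.

```python
import os


def broken() -> str:
    # `import os.path` further down makes `os` local to this function,
    # so this line raises UnboundLocalError at call time.
    cwd = os.getcwd()
    import os.path
    return cwd


def fixed() -> str:
    cwd = os.getcwd()
    # The alias binds only `_ospath`; the module-level `os` is not shadowed.
    import os.path as _ospath
    return _ospath.normpath(cwd)
```

Aliasing every function-local `salt.*` import, as the commit describes, removes the shadowing without having to move the imports to module level.
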
…ssions

- Use _indices dict in localfs_key to isolate index objects by pki_dir
- Ensure PKI index files have 0600 permissions in verify_env
- Move index ownership/permission logic to end of verify_env for efficiency
- Remove expensive open/close on every operation
- Use salt.utils.files.wait_lock for multi-process safe writes
- Use realpath for path comparisons in index caching
- Use explicit write() instead of truncate() for robust file allocation
- Ensure PKI index files have 0600 permissions
- Remove expensive open/close on every operation for better performance
- Use fcntl.flock for multi-process safe writes without deadlock risks
- Robustly handle atomic file swaps in open()
- Ensure full file allocation during initialization for macOS/Windows compatibility
- Move .pki_index.mmap from /etc to /var/cache/salt/master
- Update verify_env signature to accept opts and handle permissions in new location
- Update all verify_env call sites to pass opts
- Fix NameError in verify_env and PermissionError in unit tests
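
Three of the items above (explicit `write()` allocation, 0600 permissions, and `fcntl.flock` around writes) fit into a short sketch. This is an illustration under assumptions, not the PR's implementation: `allocate` and `locked_write` are hypothetical names, and because `fcntl` and `os.fchmod` are POSIX-only, a Windows code path would need a different mechanism.

```python
import fcntl
import os


def allocate(path: str, size: int) -> None:
    """Create the file with mode 0600 and write real zero bytes into it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.fchmod(fd, 0o600)  # enforce the mode regardless of umask
        chunk = b"\x00" * 65536
        remaining = size
        while remaining > 0:
            # Writing actual bytes forces block allocation, unlike truncate().
            remaining -= os.write(fd, chunk[:remaining])
    finally:
        os.close(fd)


def locked_write(path: str, offset: int, data: bytes) -> None:
    """Write one record while holding an exclusive advisory lock."""
    with open(path, "r+b") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other writers release
        try:
            fh.seek(offset)
            fh.write(data)
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

The lock is advisory, so it only protects against other writers that also take it; readers working through a memory map are unaffected.
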