Resource manager hooks#788

Draft
PawelPlesniak wants to merge 6 commits into develop from
PawelPlesniak/ResourceManagerHooks

@PawelPlesniak
Collaborator

@PawelPlesniak PawelPlesniak commented Feb 23, 2026

Description

Fixes #756

This PR implements parsing of the configuration file to determine the set of resources requested for a particular session. A dummy resource manager configuration has been added to config/tests/one-controller-config.data.xml to validate that the parsing works as intended.
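
One way to picture the parsing step is a walk over the session's segment tree that records the resources each segment requests. The sketch below is a minimal illustration only: the `Segment` dataclass is a hypothetical stand-in for the real configuration objects, and `collect_requested_resources` and `session_resources` are illustrative names, not drunc functions.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stand-in for a configured segment (not a drunc or OKS class)."""
    name: str
    resources: set[str] = field(default_factory=set)
    children: list["Segment"] = field(default_factory=list)


def collect_requested_resources(root: Segment) -> dict[str, set[str]]:
    """Walk the segment tree and map each segment name to the resources it requests."""
    requested: dict[str, set[str]] = {}
    stack = [root]
    while stack:
        segment = stack.pop()
        # Copy the set so later changes to the result do not alter the segment.
        requested[segment.name] = set(segment.resources)
        stack.extend(segment.children)
    return requested


def session_resources(requested: dict[str, set[str]]) -> set[str]:
    """Flatten the per-segment mapping into one set for the whole session."""
    return set().union(*requested.values())
```

With no resources defined anywhere in the tree, `session_resources` returns an empty set, which corresponds to the `set()` case exercised in the first manual test below.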

Requirements

Manual testing suggestions

Run the changed config and validate that the requested resources are indeed parsed out of the configuration.

Test the change without a defined set of resources

drunc-unified-shell ssh-standalone pythoncode/drunc/config/tests/one-controller-config.data.xml one-controller-config pawel
[2026/02/26 18:56:25 UTC] INFO       shell.py:197                             drunc.unified_shell                                This session has requested the following resources: set()
[2026/02/26 18:56:25 UTC] INFO       shell.py:201                             drunc.unified_shell                                Setting up to use the process manager with configuration ssh-standalone and configuration id "one-controller-config" from oksconflibs:pythoncode/drunc/config/tests/one-controller-config.data.xml
[2026/02/26 18:56:25 UTC] INFO       shell.py:223                             drunc.unified_shell                                Starting process manager
[2026/02/26 18:56:25 UTC] INFO       process_manager.py:110                   drunc.process_manager                              process_manager communicating through address 10.73.136.71:35857
[2026/02/26 18:56:25 UTC] INFO       shell.py:571                             drunc.unified_shell                                unified_shell ready with process_manager and controller commands
drunc-unified-shell > boot
[2026/02/26 18:56:27 UTC] INFO       process_manager_driver.py:96             drunc.process_manager_driver                       Booting session pawel
[2026/02/26 18:56:27 UTC] INFO       ssh_process_manager.py:368               drunc.process_manager.SSH_SHELL_process_manager    Booted 'local-connection-server' from session 'pawel' with UUID df418eaa-f8cc-499d-a3da-57e68a883ae5
[2026/02/26 18:56:29 UTC] INFO       ssh_process_manager.py:368               drunc.process_manager.SSH_SHELL_process_manager    Booted 'controller-0' from session 'pawel' with UUID 8d3d4dac-5618-4444-9216-b747823da3c4
  Looking for controller-0 on the connectivity service... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 0:00:01
⠋ Trying to talk to the root controller... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ -:--:-- 0:00:00
                                             pawel status                                             
┏━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name         ┃ Info ┃ State   ┃ Substate ┃ In error ┃ Included ┃ Endpoint                          ┃
┡━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ controller-0 │      │ initial │ initial  │ No       │ Yes      │ grpc://np04-srv-029.cern.ch:34919 │
└──────────────┴──────┴─────────┴──────────┴──────────┴──────────┴───────────────────────────────────┘
Waiting on tree initialisation... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--
[2026/02/26 18:56:31 UTC] INFO       commands.py:133                          drunc.unified_shell.boot                           Booted successfully
drunc-unified-shell > terminate
[2026/02/26 18:56:33 UTC] INFO       ssh_process_manager.py:203               drunc.process_manager.SSH_SHELL_process_manager    Terminating
[2026/02/26 18:56:33 UTC] INFO       ssh_process_manager.py:206               drunc.process_manager.SSH_SHELL_process_manager    Killing all the known processes before exiting
[2026/02/26 18:56:33 UTC] INFO       ssh_process_lifetime_manager_shell.py:56 drunc.process_manager.SSH_SHELL_process_manager    --- Shutdown stage: Terminating role 'unknown' from provided UUIDs ---
[2026/02/26 18:56:33 UTC] INFO       ssh_process_lifetime_manager_shell.py:56 drunc.process_manager.SSH_SHELL_process_manager    --- Shutdown stage: Terminating role 'application' from provided UUIDs ---
[2026/02/26 18:56:33 UTC] INFO       ssh_process_lifetime_manager_shell.py:56 drunc.process_manager.SSH_SHELL_process_manager    --- Shutdown stage: Terminating role 'segment-controller' from provided UUIDs ---
[2026/02/26 18:56:33 UTC] INFO       ssh_process_lifetime_manager_shell.py:56 drunc.process_manager.SSH_SHELL_process_manager    --- Shutdown stage: Terminating role 'root-controller' from provided UUIDs ---
[2026/02/26 18:56:33 UTC] INFO       ssh_process_lifetime_manager_shell.py:50 drunc.process_manager.SSH_SHELL_process_manager    Killing 1 process(es) with role 'root-controller' from 2 candidates
[2026/02/26 18:56:35 UTC] INFO       ssh_process_manager.py:305               drunc.process_manager.SSH_SHELL_process_manager    Process 'controller-0' (session: 'pawel', user: 'pplesnia') process exited with exit code 0
[2026/02/26 18:56:35 UTC] INFO       ssh_process_lifetime_manager_shell.py:10 drunc.process_manager.SSH_SHELL_process_manager    Remote process 8d3d4dac-5618-4444-9216-b747823da3c4 (PID 2758205) terminated gracefully following SIGQUIT signal.
[2026/02/26 18:56:36 UTC] INFO       ssh_process_lifetime_manager_shell.py:57 drunc.process_manager.SSH_SHELL_process_manager    --- Shutdown stage: Role 'root-controller' complete ---
[2026/02/26 18:56:36 UTC] INFO       ssh_process_lifetime_manager_shell.py:56 drunc.process_manager.SSH_SHELL_process_manager    --- Shutdown stage: Terminating role 'local-connection-server' from provided UUIDs ---
[2026/02/26 18:56:36 UTC] INFO       ssh_process_lifetime_manager_shell.py:50 drunc.process_manager.SSH_SHELL_process_manager    Killing 1 process(es) with role 'local-connection-server' from 2 candidates
[2026/02/26 18:56:36 UTC] INFO       ssh_process_manager.py:305               drunc.process_manager.SSH_SHELL_process_manager    Process 'local-connection-server' (session: 'pawel', user: 'pplesnia') process exited with exit code 0
[2026/02/26 18:56:37 UTC] INFO       ssh_process_lifetime_manager_shell.py:10 drunc.process_manager.SSH_SHELL_process_manager    Remote process df418eaa-f8cc-499d-a3da-57e68a883ae5 (PID 2758004) terminated gracefully following SIGQUIT signal.
[2026/02/26 18:56:37 UTC] INFO       ssh_process_lifetime_manager_shell.py:57 drunc.process_manager.SSH_SHELL_process_manager    --- Shutdown stage: Role 'local-connection-server' complete ---

Test the change with a defined set of resources

drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config pawel
[2026/02/26 18:57:37 UTC] INFO       shell.py:201                             drunc.unified_shell                                Setting up to use the process manager with configuration ssh-standalone and configuration id "local-1x1-config" from oksconflibs:config/daqsystemtest/example-configs.data.xml
[2026/02/26 18:57:37 UTC] INFO       shell.py:223                             drunc.unified_shell                                Starting process manager
[2026/02/26 18:57:37 UTC] INFO       process_manager.py:110                   drunc.process_manager                              process_manager communicating through address 10.73.136.71:40151
[2026/02/26 18:57:37 UTC] INFO       shell.py:571                             drunc.unified_shell                                unified_shell ready with process_manager and controller commands
drunc-unified-shell > boot
[2026/02/26 18:57:38 UTC] INFO       commands.py:72                           drunc.unified_shell.boot                           Placeholder Requesting objects in the following segments:
[2026/02/26 18:57:38 UTC] INFO       shell_utils.py:180                       drunc.unified_shell.boot                           └── df-segment: {'storage:localhost:.'}
[2026/02/26 18:57:38 UTC] INFO       commands.py:77                           drunc.unified_shell.boot                           Empty segments (skipped): ru-segment, trg-segment, hsi-fake-segment
... 
drunc-unified-shell > exit
[2026/02/26 18:57:52 UTC] INFO       shell.py:435                             drunc.unified_shell                                Shutting down the unified_shell

[2026/02/26 18:57:52 UTC] INFO       shell_utils.py:135                       drunc.utils.ShellContext                           You will not be able to issue commands to the controller anymore.
[2026/02/26 18:57:52 UTC] INFO       shell_utils.py:137                       drunc.utils.ShellContext                           Controller driver has been deleted.
[2026/02/26 18:57:52 UTC] INFO       commands.py:162                          drunc.unified_shell.terminate                      Placeholder Releasing managed objects in the following segments:
[2026/02/26 18:57:52 UTC] INFO       shell_utils.py:180                       drunc.unified_shell.terminate                      └── df-segment: {'storage:localhost:.'}
[2026/02/26 18:57:52 UTC] INFO       commands.py:168                          drunc.unified_shell.terminate                      Empty segments (skipped): ru-segment, trg-segment, hsi-fake-segment
[2026/02/26 18:57:52 UTC] INFO       ssh_process_manager.py:203               drunc.process_manager.SSH_SHELL_process_manager    Terminating
...
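
The boot and terminate messages above report segments that request resources separately from those that request none. A minimal sketch of that split follows, assuming the per-segment mapping is a plain `dict` of segment name to resource set. `split_segments` and `render_report` are hypothetical helpers, and the formatting only approximates the log lines (resources are printed as a sorted list to keep the output deterministic).

```python
def split_segments(requested: dict[str, set[str]]) -> tuple[dict[str, set[str]], list[str]]:
    """Separate segments that request resources from those that request none."""
    populated = {name: res for name, res in requested.items() if res}
    empty = [name for name, res in requested.items() if not res]
    return populated, empty


def render_report(requested: dict[str, set[str]]) -> list[str]:
    """Build tree-style lines for populated segments and one summary line for empty ones."""
    populated, empty = split_segments(requested)
    lines = []
    names = list(populated)
    for index, name in enumerate(names):
        # The last populated segment closes the tree branch.
        branch = "└──" if index == len(names) - 1 else "├──"
        lines.append(f"{branch} {name}: {sorted(populated[name])}")
    if empty:
        lines.append("Empty segments (skipped): " + ", ".join(empty))
    return lines


example = {
    "df-segment": {"storage:localhost:."},
    "ru-segment": set(),
    "trg-segment": set(),
    "hsi-fake-segment": set(),
}
for line in render_report(example):
    print(line)
```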

Type of change

  • Documentation (non-breaking change that adds or improves the documentation)
  • New feature (non-breaking change which adds functionality)
  • Optimization (non-breaking, back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (whatever its nature)

Key checklist

  • All tests pass (e.g. python -m pytest)
  • Pre-commit hooks run successfully (e.g. pre-commit run --all-files)

Further checks

@PawelPlesniak
Collaborator Author

TODOs

  • Store a record of which resources each segment is using in the segment controller metadata.
  • The above should be done with a generic RPC that can later be used to update resources as required.
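
As a rough picture of what these two TODOs could look like together, the sketch below keeps a per-segment resource record and updates it through a single generic entry point. Everything here is an assumption made for illustration: `SegmentMetadata`, `ResourceAction`, and `update_resources` are hypothetical names, no RPC transport is shown, and the plain function merely stands in for whatever handler such an RPC would call.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResourceAction(Enum):
    """Hypothetical actions a generic resource-update call might carry."""
    REQUEST = "request"
    RELEASE = "release"


@dataclass
class SegmentMetadata:
    """Hypothetical metadata record held by a segment controller."""
    name: str
    resources: set[str] = field(default_factory=set)


def update_resources(
    metadata: SegmentMetadata, action: ResourceAction, resources: set[str]
) -> set[str]:
    """Apply one update to the stored record and return a copy of the new state."""
    if action is ResourceAction.REQUEST:
        metadata.resources |= resources
    elif action is ResourceAction.RELEASE:
        metadata.resources -= resources
    else:
        raise ValueError(f"Unsupported action: {action!r}")
    return set(metadata.resources)
```

Keeping one entry point with an explicit action means the same call could cover both the request at boot and the release at terminate.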

@PawelPlesniak
Collaborator Author

This is working as intended and MSQT passes. I will run the full test after @mroda88's approval.

Contributor

@mroda88 mroda88 left a comment

Looks good to me, and it works well for giving an idea to the other WGs.

@PawelPlesniak PawelPlesniak marked this pull request as draft March 11, 2026 13:55
@PawelPlesniak
Collaborator Author

Converted to draft as the scope is currently unclear.

Development

Successfully merging this pull request may close these issues.

[Feature]: Hooks for resource manager

3 participants