This software is pre-production and should not be deployed to production servers.
To build the WCA PEX distribution file, you need:
- GNU make
- docker
To build the PEX file inside Docker (using the Dockerfile), run:
make wca_package_in_docker
The command creates the dist/wca.pex file.
Copy dist/wca.pex to /usr/bin/wca.pex.
To build a distribution file with support for storing metrics in Apache Kafka, follow the Building executable binary with KafkaStorage component enabled guide.
Running WCA requires:
- CentOS 7.6 with kernel 3.10.0-862 or later and support for the resctrl filesystem (WCA should work on earlier CentOS versions and on other Linux distributions, but it is tested only on CentOS 7.6)
- Python 3.6.x
All other WCA dependencies are bundled using PEX.
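A quick way to check the kernel requirement before installing is to compare the running kernel release against 3.10.0-862. The sketch below assumes a release string of the form `major.minor.patch-build...` (as CentOS kernels report); the helper names are hypothetical, and comparing build numbers is only meaningful within the same distribution's kernel series.

```python
import platform
import re

# Minimum tested kernel from the requirements: 3.10.0-862 (CentOS 7.6).
MINIMUM_KERNEL = (3, 10, 0, 862)


def parse_kernel_release(release):
    """Turn a release string such as '3.10.0-862.el7.x86_64' into (3, 10, 0, 862)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    # A missing build number (e.g. '5.4.0') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def kernel_is_supported(release, minimum=MINIMUM_KERNEL):
    """Return True when the release is at least the minimum tested kernel.

    Assumption: build numbers are comparable, which holds for CentOS/RHEL kernels
    but is only an approximation for other distributions.
    """
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    print(release, "OK" if kernel_is_supported(release) else "older than 3.10.0-862")
```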
For RDT related features:
- Hardware with Intel RDT support.
RDT features can be used on the Skylake processor family. However, the errata list the following known issues:
- SKZ4 MBM does not accurately track write bandwidth,
- SKZ17 CMT counters may not count accurately,
- SKZ18 CAT may not restrict cacheline allocation under certain conditions,
- SKZ19 MBM counters may undercount.
To enable RDT, add the kernel boot parameter rdt=cmt,mbmtotal,mbmlocal,l3cat
(see the kernel documentation).
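After rebooting, you can confirm that the boot parameter took effect and that the kernel offers resctrl. This sketch reads /proc/cmdline and /proc/filesystems; it assumes the kernel's `rdt=` syntax, where a feature prefixed with `!` is disabled, and the function names are illustrative only.

```python
# Features requested by the suggested boot parameter rdt=cmt,mbmtotal,mbmlocal,l3cat.
REQUIRED_RDT_FEATURES = {"cmt", "mbmtotal", "mbmlocal", "l3cat"}


def missing_rdt_features(cmdline, required=REQUIRED_RDT_FEATURES):
    """Return the features from `required` not enabled by any rdt= option."""
    enabled = set()
    for token in cmdline.split():
        if token.startswith("rdt="):
            # Entries such as '!mba' turn a feature off, so they are skipped.
            enabled.update(f for f in token[len("rdt="):].split(",") if not f.startswith("!"))
    return required - enabled


def resctrl_supported(filesystems_text):
    """Return True if /proc/filesystems lists the resctrl filesystem."""
    return any(line.split()[-1] == "resctrl"
               for line in filesystems_text.splitlines() if line.strip())


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("missing RDT features:", sorted(missing_rdt_features(f.read())) or "none")
    with open("/proc/filesystems") as f:
        print("resctrl available:", resctrl_supported(f.read()))
```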
Install Python 3:
yum install python3 # CentOS 7.6
Then, verify that Python is installed correctly:
python3 --version
It should output:
Python 3.6.x
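If you want to script this check (for example in a provisioning step), the version line can be parsed and compared to 3.6. The function names below are hypothetical; the sketch merges stderr into stdout because older interpreters printed `--version` to stderr.

```python
import subprocess


def parse_python_version(output):
    """Extract (major, minor) from text such as 'Python 3.6.8'."""
    name, _, version = output.strip().partition(" ")
    if name != "Python" or not version:
        raise ValueError("unexpected version output: %r" % output)
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_expected_python(output, expected=(3, 6)):
    """True when the reported interpreter is a 3.6.x release."""
    return parse_python_version(output) == expected


if __name__ == "__main__":
    result = subprocess.run(["python3", "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    print(result.stdout.strip(), "OK" if is_expected_python(result.stdout) else "not 3.6.x")
```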
WCA processes should not be run with root privileges. To run WCA as a non-root user, the following privileges are needed:
- CAP_DAC_OVERRIDE - to allow a non-root user to use the cgroups filesystem.
- CAP_SETUID capability and SECBIT_NO_SETUID_FIXUP secure bit set - to allow a non-root user to use the resctrl filesystem.
- /proc/sys/kernel/perf_event_paranoid - the content of this file must be set to 0 or -1 to allow a non-root user to collect all the necessary perf event information.
If it is impossible or undesirable to run WCA with the privileges outlined above, you must add the -0 argument (long form: --root)
when starting the process.
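Two of the requirements above can be inspected from /proc: the effective capability mask (`CapEff` in /proc/<pid>/status) and the perf_event_paranoid value. This is a sketch with hypothetical helper names; the bit positions are those from linux/capability.h, and it does not check the SECBIT_NO_SETUID_FIXUP secure bit, which /proc/<pid>/status does not expose. Reading /proc/self/status inspects the Python process itself; point it at /proc/<pid>/status to inspect a running WCA process.

```python
# Capability bit positions from linux/capability.h.
CAP_DAC_OVERRIDE = 1
CAP_SETUID = 7


def effective_capabilities(status_text):
    """Return the CapEff bitmask parsed from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_capability(mask, bit):
    """True when capability number `bit` is set in `mask`."""
    return bool(mask & (1 << bit))


def perf_paranoid_ok(value_text):
    """perf_event_paranoid must be 0 or -1 for non-root perf collection."""
    return int(value_text.strip()) in (0, -1)


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        mask = effective_capabilities(f.read())
    print("CAP_DAC_OVERRIDE:", has_capability(mask, CAP_DAC_OVERRIDE))
    print("CAP_SETUID:", has_capability(mask, CAP_SETUID))
    with open("/proc/sys/kernel/perf_event_paranoid") as f:
        print("perf_event_paranoid acceptable:", perf_paranoid_ok(f.read()))
```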
Assumptions:
- the /var/lib/wca directory exists
- the wca user and group already exist
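Before enabling the unit, the assumptions can be verified with the standard `pwd` and `grp` modules. The `check_assumptions` helper is hypothetical and only reports problems; creating the directory, user, and group is left to your usual provisioning tools.

```python
import grp
import os
import pwd


def check_assumptions(directory="/var/lib/wca", user="wca", group="wca"):
    """Return a list of problems; an empty list means all assumptions hold."""
    problems = []
    if not os.path.isdir(directory):
        problems.append("directory %s does not exist" % directory)
    try:
        pwd.getpwnam(user)
    except KeyError:
        problems.append("user %s does not exist" % user)
    try:
        grp.getgrnam(group)
    except KeyError:
        problems.append("group %s does not exist" % group)
    return problems


if __name__ == "__main__":
    for problem in check_assumptions():
        print("missing:", problem)
```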
Use the following template as the systemd unit file /etc/systemd/system/wca.service:
[Unit]
Description=Workload Collocation Agent
[Service]
ExecStart=/usr/bin/scl enable rh-python36 '/usr/bin/wca.pex \
--config /etc/wca/wca_config.yml \
--register $EXTRA_COMPONENT \
--log info'
User=wca
Group=wca
# CAP_DAC_OVERRIDE allows removing resctrl groups and CAP_SETUID allows changing the effective UID to add tasks to the groups
CapabilityBoundingSet=CAP_DAC_OVERRIDE CAP_SETUID
AmbientCapabilities=CAP_DAC_OVERRIDE CAP_SETUID
# We must avoid dropping capabilities after changing effective uid from root to wca
SecureBits=no-setuid-fixup
Restart=always
RestartSec=5
LimitNOFILE=500000
WorkingDirectory=/var/lib/wca
[Install]
WantedBy=multi-user.target
where:
--register flag is needed only if an external plugin is used.
$EXTRA_COMPONENT should be replaced with the name of a class, e.g. your_custom_module.allocators:YourCustomAllocator.
The class name must comply with the pkg_resources format (module:ClassName).
All dependencies of the class must be available in the currently used PYTHONPATH.
You can use wca.allocators:NOPAllocator, which is already bundled within the dist/wca.pex file and does not have to be registered
(if you decide to use it, remove the registration from the wca.service file).
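To check that a `module:ClassName` string resolves from the current PYTHONPATH before putting it in the unit file, you can approximate the lookup with `importlib`. This is a sketch of the entry-point style syntax, not WCA's own registration code; `load_component` is a hypothetical name, and the plugin string in the usage block is the placeholder from the text above.

```python
import importlib


def load_component(spec):
    """Resolve 'package.module:ClassName' to the named object.

    Approximates the module:attribute syntax of pkg_resources entry points;
    dotted attributes after the colon are followed one by one.
    """
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError("expected 'module:attribute', got %r" % spec)
    obj = importlib.import_module(module_name)  # must be importable from PYTHONPATH
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


if __name__ == "__main__":
    # Placeholder plugin name; it resolves only if that module is on PYTHONPATH.
    try:
        print(load_component("your_custom_module.allocators:YourCustomAllocator"))
    except ImportError as exc:
        print("not importable:", exc)
```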
Note: Running WCA as a dedicated "wca" user is more secure, but requires allowing non-root users to use perf counters.
Reconfigure the perf_event_paranoid sysctl parameter like this:
sudo sysctl -w kernel.perf_event_paranoid=-1
or, for a persistent setting, set kernel.perf_event_paranoid = -1 in /etc/sysctl.conf. More about perf_event_paranoid here.
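For the persistent setting, the edit to /etc/sysctl.conf can be scripted so that an existing assignment is replaced rather than duplicated. `set_sysctl` is a hypothetical helper that works on text only; the usage block prints the result for review instead of writing the file, since writing /etc/sysctl.conf requires root. Commented-out lines are left untouched.

```python
import re


def set_sysctl(conf_text, key, value):
    """Return sysctl.conf text in which `key` is set to `value`.

    An existing uncommented assignment is replaced; otherwise a line is appended.
    """
    pattern = re.compile(r"^[ \t]*%s[ \t]*=.*$" % re.escape(key), re.MULTILINE)
    line = "%s = %s" % (key, value)
    if pattern.search(conf_text):
        return pattern.sub(lambda _match: line, conf_text)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"


if __name__ == "__main__":
    with open("/etc/sysctl.conf") as f:
        print(set_sysctl(f.read(), "kernel.perf_event_paranoid", "-1"), end="")
```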
It is recommended to build a PEX file with the external component and its dependencies bundled. See the prm plugin from platform-resource-manager as an example of this approach.
The configuration file /etc/wca/wca_config.yml must exist. Here is an example configuration file to be used with NOPAllocator:
runner: !AllocationRunner
  config: !AllocationRunnerConfig
    node: !MesosNode
      mesos_agent_endpoint: 'http://127.0.0.1:5051'
      timeout: 5
    interval: 1.
    metrics_storage: !LogStorage
      output_filename: '/tmp/metrics_storage.log'
    extra_labels:
      env_id: "$HOST_IP"
    anomalies_storage: !LogStorage
      output_filename: '/tmp/anomalies_storage.log'
    allocator: !NOPAllocator
    ...
...
The following configuration is required to use the MesosNode component to discover new tasks:
- The Mesos containerizer (--containerizers=mesos) must be used.
- The Mesos agent must be configured to support the following isolators:
filesystem/linux,docker/volume,docker/runtime,cgroups/cpu,cgroups/perf_event.
- The Mesos agent must expose its operator API over a secure socket. WCA TLS can be disabled in the configuration by modifying the
mesos_agent_endpoint property.
- The Mesos agent may be configured to use a Docker registry to fetch images.
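The isolator and TLS requirements can be checked against the agent's `--isolation` flag value and the configured `mesos_agent_endpoint`. This sketch uses hypothetical helper names and assumes that TLS is in use exactly when the endpoint has an `https` scheme (the example configuration above uses plain `http`, i.e. TLS disabled).

```python
from urllib.parse import urlparse

# Isolators listed in the MesosNode requirements above.
REQUIRED_ISOLATORS = {"filesystem/linux", "docker/volume", "docker/runtime",
                      "cgroups/cpu", "cgroups/perf_event"}


def missing_isolators(isolation_flag, required=REQUIRED_ISOLATORS):
    """Return required isolators absent from a comma-separated --isolation value."""
    configured = {item.strip() for item in isolation_flag.split(",") if item.strip()}
    return required - configured


def endpoint_uses_tls(endpoint):
    """Assumption: TLS is used exactly when the endpoint URL scheme is https."""
    return urlparse(endpoint).scheme == "https"


if __name__ == "__main__":
    print("missing isolators:", sorted(missing_isolators("cgroups/cpu,cgroups/perf_event")))
    print("TLS:", endpoint_uses_tls("http://127.0.0.1:5051"))
```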