Hello, I'm seeing what appears to be an intermittent failure that depends on which particular runner picks up the job. The same workflow runs in parallel on the standard aarch64 and amd64 GitHub Actions runners, which do not show the failure, so... I think it's something SSH-adjacent.
In broad strokes, the workflow builds Python wheels and uploads them (rsync over SSH) to a web server, which will later serve as the Python wheel index for some other workflows. The private key is stored in a secret, and printing its public key is simply a check that the key entered is valid before continuing. However, one of the commands in this step sometimes fails on the RISE RISC-V runners, but not always:
if [[ "false" =~ false|False ]] && [ -z "" ]; then
umask g-rwx,o-rwx
chmod -f g-rwx,o-rwx .ssh .ssh/id_rsa .ssh/known_hosts .ssh/wheels-key || true
mkdir -p .ssh
# Write valid wheels-key or wrapped with PEM header-footer
echo "***
...snip...
***" > .ssh/wheels-key
ssh-keygen -y -e -f .ssh/wheels-key 2>&1>/dev/null || \
sed -e '1i-----BEGIN RSA PRIVATE KEY-----' \
-e '$a-----END RSA PRIVATE KEY-----' -i .ssh/wheels-key
ssh-keygen -y -e -f .ssh/wheels-key 2>&1>/dev/null && \
cat .ssh/wheels-key >> .ssh/id_rsa && rm -f .ssh/wheels-key
# Validate & update known_hosts
ssh-keygen -y -e -f .ssh/id_rsa
ssh-keyscan -H "ai6fs.net" >> .ssh/known_hosts
fi
shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
---- BEGIN SSH2 PUBLIC KEY ----
Comment: "3072-bit RSA, converted by runner@riscv-runner-12 from OpenS"
AAAAB3NzaC1yc2EAAAADAQABAAABgQDAFGj/8WmGysWZwN9aXWtwfjVhyy821k7Fg1QQ1k
l4qdQ0GJNyh64oWBK7dNsJLd53gSeijK/Yvz0fFZybp7ZGaR12N/0/baV8IQ0Ga9b+rR2m
+eKI/YinsWJej+iP7yWdZ/VataLDMUtbg2TpufawFbBmeX5m3h/Ffjc7ArD5LAjBqB0NQc
CSldMboNcoESi7bcqAXvUako+69gl3QKryHdEDR1tLykKRPE7ItTt479OkVRUPRdv986l0
dUwXK8Kjf6cqukCAf5eWgoiocyJRm3WdHdm+LfK8AUsWQjk/4yDCqHiWbYxryVW+5ck7Ss
Zem7ennP4VBPxLWFtpd6FYlmLPh7aPNUECDCdY6mKWwwzqFdzaOLj2FmVdF0uFLAxuwnWy
iI13onMyYTHqyq9JOm6uK+pDYXDMETG4ZgfCaQ3XW/xvmy+ev0FPa4sZhDu+dbW+nAdnsb
9hdokl7UdP3ZTlFdnIbj5IVr8XbV1kuZ/a+oGn8UmAFeiW2Q96Ox8=
---- END SSH2 PUBLIC KEY ----
Error: Process completed with exit code 1.
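Because the step runs under `bash -e -o pipefail`, it aborts at the first failing command without saying which one it was. One way to confirm the culprit is an `ERR` trap at the top of the step's script. This is a minimal sketch; the message wording is arbitrary, and `::error::` is the GitHub Actions workflow command that turns the line into an annotation.

```shell
#!/usr/bin/env bash
# Before `bash -e` aborts the step, print the status and text of the
# command that failed. $? must be read first, before echo overwrites it.
# $BASH_COMMAND holds the command that was running when the trap fired.
trap 'rc=$?; echo "::error::command failed with status $rc: $BASH_COMMAND" >&2' ERR
```

Alternatively, `set -x` at the top of the step traces every command as it runs. That is noisier, but it needs no trap.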
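In the failing run, the public key is printed, but ssh-keyscan produces no output before the step exits with code 1. Re-running the exact same job, this time on riscv-runner-28 instead of riscv-runner-12, succeeds: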
if [[ "false" =~ false|False ]] && [ -z "" ]; then
umask g-rwx,o-rwx
chmod -f g-rwx,o-rwx .ssh .ssh/id_rsa .ssh/known_hosts .ssh/wheels-key || true
mkdir -p .ssh
# Write valid wheels-key or wrapped with PEM header-footer
echo "***
...snip...
***" > .ssh/wheels-key
ssh-keygen -y -e -f .ssh/wheels-key 2>&1>/dev/null || \
sed -e '1i-----BEGIN RSA PRIVATE KEY-----' \
-e '$a-----END RSA PRIVATE KEY-----' -i .ssh/wheels-key
ssh-keygen -y -e -f .ssh/wheels-key 2>&1>/dev/null && \
cat .ssh/wheels-key >> .ssh/id_rsa && rm -f .ssh/wheels-key
# Validate & update known_hosts
ssh-keygen -y -e -f .ssh/id_rsa
ssh-keyscan -H "ai6fs.net" >> .ssh/known_hosts
fi
shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
---- BEGIN SSH2 PUBLIC KEY ----
Comment: "3072-bit RSA, converted by runner@riscv-runner-28 from OpenS"
AAAAB3NzaC1yc2EAAAADAQABAAABgQDAFGj/8WmGysWZwN9aXWtwfjVhyy821k7Fg1QQ1k
l4qdQ0GJNyh64oWBK7dNsJLd53gSeijK/Yvz0fFZybp7ZGaR12N/0/baV8IQ0Ga9b+rR2m
+eKI/YinsWJej+iP7yWdZ/VataLDMUtbg2TpufawFbBmeX5m3h/Ffjc7ArD5LAjBqB0NQc
CSldMboNcoESi7bcqAXvUako+69gl3QKryHdEDR1tLykKRPE7ItTt479OkVRUPRdv986l0
dUwXK8Kjf6cqukCAf5eWgoiocyJRm3WdHdm+LfK8AUsWQjk/4yDCqHiWbYxryVW+5ck7Ss
Zem7ennP4VBPxLWFtpd6FYlmLPh7aPNUECDCdY6mKWwwzqFdzaOLj2FmVdF0uFLAxuwnWy
iI13onMyYTHqyq9JOm6uK+pDYXDMETG4ZgfCaQ3XW/xvmy+ev0FPa4sZhDu+dbW+nAdnsb
9hdokl7UdP3ZTlFdnIbj5IVr8XbV1kuZ/a+oGn8UmAFeiW2Q96Ox8=
---- END SSH2 PUBLIC KEY ----
# ai6fs.net:22 SSH-2.0-OpenSSH_10.0p2 Debian-7+deb13u1
# ai6fs.net:22 SSH-2.0-OpenSSH_10.0p2 Debian-7+deb13u1
# ai6fs.net:22 SSH-2.0-OpenSSH_10.0p2 Debian-7+deb13u1
# ai6fs.net:22 SSH-2.0-OpenSSH_10.0p2 Debian-7+deb13u1
# ai6fs.net:22 SSH-2.0-OpenSSH_10.0p2 Debian-7+deb13u1
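If ssh-keyscan is the command that fails, one possible mitigation is to retry it with a longer timeout instead of failing the job on the first attempt. This sketch assumes the failures are transient connection problems on the RISC-V runners, which the logs do not confirm. `keyscan_with_retry` and its retry-count and delay defaults are hypothetical. `-T` and `-H` are real ssh-keyscan options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a host's keys to a known_hosts file,
# retrying ssh-keyscan a few times before giving up.
# Assumption: failures are transient (timeouts/network), not a config issue.
keyscan_with_retry() {
  local host="$1" out="$2" attempts="${3:-5}" delay="${4:-10}"
  local i keys
  for ((i = 1; i <= attempts; i++)); do
    # -T: connection timeout in seconds (ssh-keyscan's default is 5).
    # -H: hash host names, as in the original step.
    # stderr is not redirected, so connection errors still reach the log.
    # Keys are captured first so nothing is appended on a failed attempt.
    # Inside `if`, a failure here does not trigger `bash -e`.
    if keys="$(ssh-keyscan -T 15 -H "$host")" && [[ -n "$keys" ]]; then
      printf '%s\n' "$keys" >> "$out"
      return 0
    fi
    echo "ssh-keyscan $host: attempt $i/$attempts got no keys" >&2
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

In the step, `ssh-keyscan -H "ai6fs.net" >> .ssh/known_hosts` would become `keyscan_with_retry "ai6fs.net" .ssh/known_hosts`. Because stderr stays visible, a failed attempt should also show the reason (for example, a timeout) in the log. If the RISC-V runners turn out to have unreliable IPv6 connectivity (unverified), adding `-4` to the ssh-keyscan call is another thing to try.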
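A side note that is probably unrelated to the intermittent failure: `2>&1>/dev/null` in the `ssh-keygen` validation lines does not silence stderr. Bash applies redirections left to right, so `2>&1` points stderr at wherever stdout goes at that moment (the job log), and only afterwards is stdout sent to /dev/null. The exit status is unaffected, so this only changes what appears in the log. In the small demonstration below, `both_streams`, `as_written`, and `silenced` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ssh-keygen: writes one line to each stream.
both_streams() {
  echo "to stdout"
  echo "to stderr" >&2
}

# As in the workflow: stderr is duplicated onto the current stdout first,
# then stdout alone is sent to /dev/null, so "to stderr" is still printed.
as_written() { both_streams 2>&1>/dev/null; }

# Probable intent: send stdout to /dev/null, then make stderr a copy of it,
# so both streams are discarded.
silenced() { both_streams >/dev/null 2>&1; }
```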