Add reproducible builds support in OHCL-Linux-Kernel#115
namancse wants to merge 5 commits into product/hcl-main/6.12
Add kernel Makefile changes to support reproducible builds across machines.

Changes:
- Makefile: Add KBUILD_BUILD_ID variable (default: sha1) to allow overriding the build-id linker flag for vmlinux and modules
- arch/x86/entry/vdso/Makefile: Use --build-id=none for x86 VDSO
- arch/arm64/kernel/vdso/Makefile: Use --build-id=none for arm64 VDSO
- arch/arm64/kernel/vdso32/Makefile: Use --build-id=none for arm64-32 VDSO

The VDSO changes must remain in kernel Makefiles as VDSO_LDFLAGS are not overridable from the command line.

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Add reproducible build system using NixOS flakes with pinned dependencies.

New files:
- flake.nix: Nix environment with pinned toolchain (GCC 13.2.0, binutils, etc.)
- flake.lock: Locked package versions for reproducibility
- Microsoft/nix-build.sh: Main build script with reproducible environment
- Microsoft/nix-setup.sh: One-time Nix installation helper
- Microsoft/nix-clean.sh: Build artifact cleanup

Modified files:
- Microsoft/build-hcl-kernel.sh: When REPRODUCIBLE_BUILD=1:
  - Pass KBUILD_BUILD_ID=none to disable Build IDs
  - Pass KCFLAGS=-fdebug-prefix-map to normalize debug paths
  - Skip --add-gnu-debuglink to avoid CRC embedding
- .gitignore: Add Nix-related entries

Environment variables set for reproducibility:
- SOURCE_DATE_EPOCH=1609459200 (fixed timestamp)
- KBUILD_BUILD_USER=builder
- KBUILD_BUILD_HOST=nixos
- REPRODUCIBLE_BUILD=1 (flag for build scripts)

Usage:
  ./Microsoft/nix-setup.sh         # One-time Nix installation
  ./Microsoft/nix-build.sh x64     # Build x64 kernel
  ./Microsoft/nix-build.sh arm64   # Build arm64 kernel

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
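The settings listed in this commit message can be gathered into one place. The following is a minimal sketch, not the contents of Microsoft/nix-build.sh or Microsoft/build-hcl-kernel.sh: the function name `repro_make_args` is hypothetical, and mapping the source directory to `.` in `-fdebug-prefix-map` is an assumption, since the commit message does not give the mapping. The variable names and values come from the commit message.

```shell
#!/bin/sh
# Environment variables named in the commit message.
export SOURCE_DATE_EPOCH=1609459200   # fixed timestamp
export KBUILD_BUILD_USER=builder
export KBUILD_BUILD_HOST=nixos
export REPRODUCIBLE_BUILD=1           # flag read by the build scripts

# Hypothetical helper: print, one per line, the extra make arguments
# a reproducible build would pass for the given source directory.
repro_make_args() {
    src_dir="$1"
    # KBUILD_BUILD_ID is the override variable added by this PR.
    printf '%s\n' "KBUILD_BUILD_ID=none"
    # Assumption: rewrite the absolute source path to "." in debug info.
    printf '%s\n' "KCFLAGS=-fdebug-prefix-map=${src_dir}=."
}

# Show the arguments for the current directory.
repro_make_args "$PWD"
```

Printing one argument per line keeps the helper easy to inspect; a real script would more likely pass the arguments to make directly.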
Add build-hcl-kernel-pipeline.sh that implements the full kernel build workflow for Azure DevOps pipelines with reproducible build support.

Features:
- Supports amd64 and arm64 architectures
- CVM config merge support
- Reproducible build mode with Nix environment
- Generates kernel, headers, modules, and debug symbols
- Progress indicators for build stages [1/5] through [5/5]
- SHA256 checksum output for reproducibility verification

Usage:
  ./build-hcl-kernel-pipeline.sh -s <source> -b <build> -c <config> -a <arch>
  ./build-hcl-kernel-pipeline.sh ... --reproducible   # Enable Nix environment
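The SHA256 output is what makes verification possible: two builds are reproducible when the checksums of their artifacts match. A minimal sketch of that comparison follows; the helper name `same_sha256` is hypothetical and is not part of the pipeline script.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when two files have the same SHA256.
# Returns 2 if either file is missing, 1 if the checksums differ.
same_sha256() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    sum_a=$(sha256sum "$1" | cut -d' ' -f1)
    sum_b=$(sha256sum "$2" | cut -d' ' -f1)
    [ "$sum_a" = "$sum_b" ]
}
```

For example, `same_sha256 build-a/vmlinux build-b/vmlinux && echo reproducible` would compare the kernels from two independent builds (the directory names are placeholders).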
Enhance nix-setup.sh to ensure Nix is available in PATH immediately after installation or when sourcing existing profiles.

Changes:
- Add source_nix_profile() helper function
- Check multiple profile locations (~/.nix-profile, /nix/var/nix/profiles)
- Clean up debug prints to informative messages
- Better error handling when Nix is installed but not in PATH
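A helper of this kind might look roughly like the sketch below. It is not the PR's implementation: the two default script paths are the usual locations of the Nix profile scripts under the directories the commit mentions, and `NIX_PROFILE_CANDIDATES` is a hypothetical override added here so the function can be exercised without Nix installed.

```shell
#!/bin/sh
# Sketch: source the first readable Nix profile script.
# Candidate paths are space-separated, so paths containing spaces are not supported.
source_nix_profile() {
    candidates=${NIX_PROFILE_CANDIDATES:-"$HOME/.nix-profile/etc/profile.d/nix.sh /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh"}
    for profile in $candidates; do
        if [ -r "$profile" ]; then
            # Sourcing updates PATH and related variables in the current shell.
            . "$profile"
            return 0
        fi
    done
    echo "Nix profile script not found; is Nix installed?" >&2
    return 1
}
```

A setup script could call it only when `command -v nix` fails, which covers the "installed but not in PATH" case.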
Ensure only Nix-provided tools are used during reproducible builds, preventing system package leakage that could affect reproducibility.

Changes:
- Add --ignore-environment to nix develop for pure shell
- Keep essential env vars: HOME, USER, TERM
- Explicitly set CC=gcc to use Nix's GCC in all scenarios
- Detect host architecture to avoid cross-compiler on native builds
- Add LOCALVERSION= to prevent '+' suffix in version string
- Add shell utilities to flake.nix (getopt, coreutils, rsync, etc.)
- Print SHA256 checksum of vmlinux for verification

This ensures cross-compiled and native builds use the correct compiler identification strings for reproducibility.
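The host-architecture check can be sketched as follows. The function names are hypothetical, and the `<machine>-unknown-linux-gnu-` prefix is an assumption about how the cross toolchains in the Nix shell are named; the PR may use different prefixes.

```shell
#!/bin/sh
# Map the target names used by the build scripts to `uname -m` machine names.
target_machine() {
    case "$1" in
        x64|amd64) echo x86_64 ;;
        arm64)     echo aarch64 ;;
        *)         echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Print the CROSS_COMPILE prefix for a target, or an empty line for a native build.
# The optional second argument overrides the detected host machine.
cross_compile_prefix() {
    target=$(target_machine "$1") || return 1
    host=${2:-$(uname -m)}
    if [ "$target" = "$host" ]; then
        echo ""
    else
        # Assumed toolchain prefix; adjust to match the actual toolchain.
        echo "${target}-unknown-linux-gnu-"
    fi
}

# The pure shell described above would be entered with something like:
#   nix develop --ignore-environment --keep HOME --keep USER --keep TERM
```

Leaving the prefix empty on a native build means plain `gcc` from the Nix shell is used, so the compiler identification string does not depend on which cross toolchain happens to be installed.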
saurabh-sengar left a comment
Can we upstream the Linux kernel changes in this PR?
Is there no way to get reproducible builds with the Linux kernel as it stands today?
Ref: https://docs.kernel.org/kbuild/reproducible-builds.html
      # For reproducible builds, use --build-id=none to avoid non-deterministic Build IDs.
      ldflags-y := -shared -soname=linux-vdso.so.1 \
    -     -Bsymbolic --build-id=sha1 -n $(btildflags-y)
    +     -Bsymbolic --build-id=none -n $(btildflags-y)
Can we upstream this?
The difference comes from the build IDs embedded as SHA hashes in the vmlinux binaries; these depend on the path of the Linux kernel tree. When I move the kernel to the exact same path, I can generate identical binaries without these kernel changes.
See https://reproducible-builds.org/docs/build-path/ and https://docs.kernel.org/kbuild/reproducible-builds.html#absolute-filenames
I have used all of these options, but the first page notes that paths can still end up embedded [1]. I spent quite some time trying to find where the path is added; it seems to come from linker scripts outside the kernel.
[1] "In most cases however, post-processing is required to either remove the build path or to normalize it to a predefined value."
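One way to check whether a build path still leaks into a binary is to search the binary for the path as a literal string. The sketch below is only an illustration of that check; the function name is hypothetical, and it assumes a grep that supports `-a` (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper: print how many lines of a binary contain the build path.
# Prints 0 when the path does not appear.
count_build_path_refs() {
    binary="$1"
    build_path="$2"
    # -a: treat binary data as text, -F: match the path literally, -c: count matching lines.
    # grep exits non-zero when nothing matches, so the status is ignored.
    grep -a -F -c -- "$build_path" "$binary" || true
}
```

Running `count_build_path_refs vmlinux "$PWD"` from the kernel tree would then report 0 when no absolute path was embedded. Note that a build ID computed from path-dependent content would not be caught by this check, because the path itself is hashed rather than stored.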
This leaves us with these options:
1. Carry these changes out of tree (OOT).
2. Try to upstream them as an RFC and see how it goes. They will not be accepted in their current form, but we may learn of some other option.
3. Post-process the binaries to remove the build hashes. I built a proof of concept for this as well: https://github.com/microsoft/OHCL-Linux-Kernel/tree/user/namjain/reprobuild-pipeline-without-kernel-changes
4. Live with these vmlinux differences, since everything else is identical anyway, and leave it to the person reproducing the build to place the kernel at the same path.
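For the post-processing option, one possible approach is to drop the build-ID note section from the finished ELF file. This is a sketch under that assumption and is not necessarily what the linked proof-of-concept branch does; the function name is hypothetical. `.note.gnu.build-id` is the section in which the linker stores the build ID, and `--remove-section` is a standard objcopy option.

```shell
#!/bin/sh
# Allow the tool to be overridden, as the build script does with $OBJCOPY.
OBJCOPY=${OBJCOPY:-objcopy}

# Hypothetical helper: remove the build-ID note from an ELF file in place.
strip_build_id() {
    "$OBJCOPY" --remove-section=.note.gnu.build-id "$1"
}
```

Tools that locate debug information by build ID will no longer be able to do so for files processed this way, which is a trade-off to weigh against carrying the Makefile changes.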
    # Strip debug symbols from original module
    # For reproducible builds, skip --add-gnu-debuglink as it embeds a CRC
    if [[ -n "$REPRODUCIBLE_BUILD" ]]; then
        $OBJCOPY --strip-unneeded "$module_path"
    else
        $OBJCOPY --strip-unneeded "$module_path"
    fi
Looks like a copy-paste error here - probably meant to keep --add-gnu-debuglink in the else block?
Thanks for noticing, will add it.
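With the fix the reviewer suggests, the two branches would differ only in the debuglink option. A sketch of the corrected logic, assuming the separate debug file's path is available in a variable (named `debug_path` here, which is hypothetical; the actual variable in the script is not shown in the diff):

```shell
#!/bin/sh
OBJCOPY=${OBJCOPY:-objcopy}

# Strip a module; add a debuglink only when the build is not in reproducible mode.
strip_module() {
    module_path="$1"
    debug_path="$2"   # hypothetical: path of the separate debug-info file
    if [ -n "${REPRODUCIBLE_BUILD:-}" ]; then
        # The debuglink embeds a CRC of the debug file, so it is skipped here.
        "$OBJCOPY" --strip-unneeded "$module_path"
    else
        "$OBJCOPY" --strip-unneeded --add-gnu-debuglink="$debug_path" "$module_path"
    fi
}
```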
    # Move merged config back to Microsoft directory (overwrites original)
    mv .config "Microsoft/hcl-$CONFIG_ARCH.config"
    echo ">>> CVM config merged: Microsoft/hcl-$CONFIG_ARCH.config"

    cd "$SOURCE_DIR"
I'm not too familiar with what this is doing, but should we keep a backup of the original config somewhere in case something goes wrong?
This handles CVM configs. We take the base arch config, add the CVM-specific options on top of it, run make olddefconfig, and then copy the resulting .config back to the Microsoft/ folder, where it is used to build the kernel. It's fine if we don't keep the original copy, since it does not get reused.
Had it been reused in the pipeline, we would have to copy the original config back.
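If a backup were ever wanted, the merge could keep one with a single extra copy. The sketch below follows the steps described above but is not the pipeline's code: the function name, the `.orig` suffix, and the way the CVM options are appended to the base config are all assumptions.

```shell
#!/bin/sh
MAKE=${MAKE:-make}

# Merge a CVM fragment into a base config, keeping the original as <base>.orig.
# Must be run from the kernel source directory, where .config lives.
merge_cvm_config() {
    base_config="$1"    # e.g. Microsoft/hcl-$CONFIG_ARCH.config
    cvm_fragment="$2"   # hypothetical path to the CVM-specific options
    cp "$base_config" "$base_config.orig" || return 1
    # Later entries take precedence when olddefconfig resolves the file.
    cat "$base_config" "$cvm_fragment" > .config || return 1
    $MAKE olddefconfig || return 1
    mv .config "$base_config"
}
```

The backup costs one file and lets a failed or repeated run start again from the unmodified base config.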
    # Reproducible build environment variables
    reproducibleEnv = {
      # Disable timestamps in build output
      SOURCE_DATE_EPOCH = "1609459200"; # 2021-01-01 00:00:00 UTC
While this works for reproducibility, I think you typically want this to be the timestamp of the last commit on the branch or something like that, and then include the "right" SOURCE_DATE_EPOCH in the recipe that is run to reproduce the build.
The end result is that for a given release there should be a recipe that says "run the build with SOURCE_DATE_EPOCH set to this value to get the same output as the release", and the build picks that value up. I have similarly hardcoded it in the Nix build in the OpenVMM repo but plan to change it as described.
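Deriving the value from the last commit could be sketched as follows. The function name is hypothetical; `git log -1 --format=%ct` prints the committer timestamp of the most recent commit as seconds since the Unix epoch, and the fallback keeps the build working outside a Git checkout.

```shell
#!/bin/sh
# Print the committer timestamp of the last commit in repo_dir,
# or the fallback value when that cannot be determined.
source_date_epoch() {
    repo_dir="$1"
    fallback="$2"
    commit_time=$(git -C "$repo_dir" log -1 --format=%ct 2>/dev/null) || commit_time=""
    if [ -n "$commit_time" ]; then
        echo "$commit_time"
    else
        echo "$fallback"
    fi
}
```

A release recipe could then record the printed value, and anyone reproducing the release would export that recorded value as SOURCE_DATE_EPOCH before building.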
    # ARM64 cross-compilation toolchain
    pkgsCross.aarch64-multiplatform.stdenv.cc
    pkgsCross.aarch64-multiplatform.buildPackages.binutils
Can you run this build on an ARM machine for ARM -> x64 cross-compilation?
OHCL-Linux-Kernel has a Microsoft/build-hcl-kernel.sh script for building the kernel. The build pipelines, however, do not use that script; they carry similar code in the pipeline definition itself.
To implement reproducible builds, add this support to both the local build script (Microsoft/build-hcl-kernel.sh) and the pipeline code. Instead of adding the support to the pipeline directly, move the kernel build code from the pipeline into a new script, "Microsoft/build-hcl-kernel-pipeline.sh", and add the reproducible-build changes there. The buddy/official pipelines would then call this script to build the kernel.