
Add reproducible builds support in OHCL-Linux-Kernel#115

Draft
namancse wants to merge 5 commits into product/hcl-main/6.12 from user/namjain/reprobuild-pipeline-golden-working

Conversation

@namancse
Contributor

OHCL-Linux-Kernel has a script, Microsoft/build-hcl-kernel.sh, which is used to build the kernel. The build pipelines, however, do not use that script; they carry similar code in the pipeline definition itself.
To implement reproducible builds, add this support to both the local build script (Microsoft/build-hcl-kernel.sh) and the pipeline code. Instead of adding the support to the pipeline directly, move the kernel build code from the pipeline into a new script, "Microsoft/build-hcl-kernel-pipeline.sh", and add the reproducible-build changes there. The buddy/official pipelines would then call this script to build the kernel.

Naman Jain added 5 commits January 20, 2026 08:27
Add kernel Makefile changes to support reproducible builds across machines.

Changes:
- Makefile: Add KBUILD_BUILD_ID variable (default: sha1) to allow
  overriding the build-id linker flag for vmlinux and modules
- arch/x86/entry/vdso/Makefile: Use --build-id=none for x86 VDSO
- arch/arm64/kernel/vdso/Makefile: Use --build-id=none for arm64 VDSO
- arch/arm64/kernel/vdso32/Makefile: Use --build-id=none for arm64-32 VDSO

The VDSO changes must remain in kernel Makefiles as VDSO_LDFLAGS are
not overridable from the command line.
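
The default-and-override behaviour described for KBUILD_BUILD_ID can be illustrated in shell. This is only a sketch of the semantics: the actual Makefile hunk is not shown in this conversation, and the helper name build_id_ldflag is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mirrors the described default (sha1) and its override.
build_id_ldflag() {
    # Use the caller's KBUILD_BUILD_ID if set and non-empty, else sha1.
    printf -- '--build-id=%s\n' "${KBUILD_BUILD_ID:-sha1}"
}

build_id_ldflag                        # default linker flag
KBUILD_BUILD_ID=none build_id_ldflag   # override used for reproducible builds
```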

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>

Add reproducible build system using Nix flakes with pinned dependencies.

New files:
- flake.nix: Nix environment with pinned toolchain (GCC 13.2.0, binutils, etc.)
- flake.lock: Locked package versions for reproducibility
- Microsoft/nix-build.sh: Main build script with reproducible environment
- Microsoft/nix-setup.sh: One-time Nix installation helper
- Microsoft/nix-clean.sh: Build artifact cleanup

Modified files:
- Microsoft/build-hcl-kernel.sh: When REPRODUCIBLE_BUILD=1:
  - Pass KBUILD_BUILD_ID=none to disable Build IDs
  - Pass KCFLAGS=-fdebug-prefix-map to normalize debug paths
  - Skip --add-gnu-debuglink to avoid CRC embedding
- .gitignore: Add Nix-related entries

Environment variables set for reproducibility:
- SOURCE_DATE_EPOCH=1609459200 (fixed timestamp)
- KBUILD_BUILD_USER=builder
- KBUILD_BUILD_HOST=nixos
- REPRODUCIBLE_BUILD=1 (flag for build scripts)

Usage:
  ./Microsoft/nix-setup.sh       # One-time Nix installation
  ./Microsoft/nix-build.sh x64   # Build x64 kernel
  ./Microsoft/nix-build.sh arm64 # Build arm64 kernel
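
A rough sketch of how the environment variables and the REPRODUCIBLE_BUILD=1 make arguments listed above might fit together. The variable names and values come from the commit message; the helper name repro_make_args and the prefix-map target "." are assumptions, since the commit message does not show the full -fdebug-prefix-map value.

```bash
#!/usr/bin/env bash
# Sketch only: helper name and the "=." mapping target are assumptions.
repro_make_args() {
    local src_dir="$1"
    local -a args=()
    if [[ "${REPRODUCIBLE_BUILD:-0}" == "1" ]]; then
        args+=("KBUILD_BUILD_ID=none")                      # no Build IDs
        args+=("KCFLAGS=-fdebug-prefix-map=${src_dir}=.")   # normalize paths
    fi
    # Print one argument per line (nothing when the list is empty).
    ((${#args[@]})) && printf '%s\n' "${args[@]}"
    return 0
}

export SOURCE_DATE_EPOCH=1609459200   # fixed timestamp
export KBUILD_BUILD_USER=builder
export KBUILD_BUILD_HOST=nixos
export REPRODUCIBLE_BUILD=1

repro_make_args "$PWD"
```

The printed arguments could then be collected with mapfile and appended to the make command line.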

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>

Add build-hcl-kernel-pipeline.sh that implements the full kernel build
workflow for Azure DevOps pipelines with reproducible build support.

Features:
- Supports amd64 and arm64 architectures
- CVM config merge support
- Reproducible build mode with Nix environment
- Generates kernel, headers, modules, and debug symbols
- Progress indicators for build stages [1/5] through [5/5]
- SHA256 checksum output for reproducibility verification
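
The checksum-based verification mentioned in the last item could look roughly like this. It is a minimal sketch assuming sha256sum from coreutils; the helper name same_sha256 and the demo files are hypothetical and stand in for artifacts from two independent builds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both files have the same SHA256 digest.
same_sha256() {
    local a b
    a="$(sha256sum "$1" 2>/dev/null | cut -d' ' -f1)"
    b="$(sha256sum "$2" 2>/dev/null | cut -d' ' -f1)"
    # An unreadable file yields an empty digest and therefore a mismatch.
    [[ -n "$a" && "$a" == "$b" ]]
}

# Demo with two stand-in files instead of real kernel images.
tmp="$(mktemp -d)"
printf 'kernel image bytes\n' > "$tmp/build1"
printf 'kernel image bytes\n' > "$tmp/build2"
if same_sha256 "$tmp/build1" "$tmp/build2"; then
    echo "checksums match"
else
    echo "checksums differ"
fi
rm -rf "$tmp"
```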

Usage:
  ./build-hcl-kernel-pipeline.sh -s <source> -b <build> -c <config> -a <arch>
  ./build-hcl-kernel-pipeline.sh ... --reproducible  # Enable Nix environment

Enhance nix-setup.sh to ensure Nix is available in PATH immediately
after installation or when sourcing existing profiles.

Changes:
- Add source_nix_profile() helper function
- Check multiple profile locations (~/.nix-profile, /nix/var/nix/profiles)
- Clean up debug prints to informative messages
- Better error handling when Nix is installed but not in PATH

Ensure only Nix-provided tools are used during reproducible builds,
preventing system package leakage that could affect reproducibility.

Changes:
- Add --ignore-environment to nix develop for pure shell
- Keep essential env vars: HOME, USER, TERM
- Explicitly set CC=gcc to use Nix's GCC in all scenarios
- Detect host architecture to avoid cross-compiler on native builds
- Add LOCALVERSION= to prevent '+' suffix in version string
- Add shell utilities to flake.nix (getopt, coreutils, rsync, etc.)
- Print SHA256 checksum of vmlinux for verification

This ensures cross-compiled and native builds use the correct compiler
identification strings for reproducibility.
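
The pure-shell invocation and the host-architecture check from this commit might be sketched as below. The nix develop flags (--ignore-environment, --keep, --command) are standard Nix CLI options; the helper names are hypothetical, and the cross prefix is the one nixpkgs usually assigns to pkgsCross.aarch64-multiplatform, so treat it as an assumption. Only the arm64 target is handled here.

```bash
#!/usr/bin/env bash
# Sketch only: pick a cross prefix unless the host is already aarch64.
cross_prefix() {
    local target="$1" host="${2:-$(uname -m)}"
    if [[ "$target" == "arm64" && "$host" != "aarch64" ]]; then
        echo "aarch64-unknown-linux-gnu-"   # assumed nixpkgs prefix
    fi
    # Native builds print nothing, so CROSS_COMPILE stays empty.
}

# Print (not run) a pure nix develop command that keeps HOME, USER and TERM.
pure_shell_cmd() {
    printf '%s ' nix develop --ignore-environment \
        --keep HOME --keep USER --keep TERM --command "$@"
    echo
}

pure_shell_cmd make CC=gcc LOCALVERSION= "CROSS_COMPILE=$(cross_prefix arm64)"
```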
Contributor

@saurabh-sengar saurabh-sengar left a comment

Can we upstream the Linux kernel changes in this PR?
Is there no way to get reproducible builds with the Linux kernel as it is today?
Ref: https://docs.kernel.org/kbuild/reproducible-builds.html

+# For reproducible builds, use --build-id=none to avoid non-deterministic Build IDs.
 ldflags-y := -shared -soname=linux-vdso.so.1 \
-	-Bsymbolic --build-id=sha1 -n $(btildflags-y)
+	-Bsymbolic --build-id=none -n $(btildflags-y)
Contributor

@saurabh-sengar saurabh-sengar Jan 31, 2026

Can we upstream this ?

Contributor Author

The difference comes from the build IDs embedded as SHA hashes in the vmlinux binaries, which depend on the path of the Linux kernel tree. When I move the kernel to the exact same path, I can generate identical binaries without these kernel changes.

Referring to https://reproducible-builds.org/docs/build-path/ and https://docs.kernel.org/kbuild/reproducible-builds.html#absolute-filenames
I have used all of these options, but the first page notes that paths can still end up embedded [1]. I spent quite some time trying to find where the path is added, and it seems to come from some linker scripts outside of the kernel.

[1] "In most cases however, post-processing is required to either remove the build path or to normalize it to a predefined value."

This leaves us with these options:
1. Carry these changes out of tree (OOT).
2. Try to upstream them as an RFC and see how it goes. They are not going to be accepted in their current form, but we may learn of some other option.
3. Post-process the binaries to remove the build hashes. I did a POC of this as well: https://github.com/microsoft/OHCL-Linux-Kernel/tree/user/namjain/reprobuild-pipeline-without-kernel-changes
4. Live with these vmlinux differences, since everything else is identical anyway, and leave it to anyone reproducing the build to keep the kernel at the same path.
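
One way to check whether two builds differ only in their build IDs is to read the ID from each binary's ELF notes. The sketch below assumes binutils readelf is available; the helper name elf_build_id and the READELF override are hypothetical and are not taken from the PR or the POC branch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the GNU Build ID found in an ELF file's notes.
# READELF may point at a different readelf, e.g. from a cross toolchain.
elf_build_id() {
    "${READELF:-readelf}" -n "$1" 2>/dev/null |
        awk '/Build ID:/ { print $3; exit }'
}

# Usage (paths are placeholders for two build trees):
#   a="$(elf_build_id /path/a/vmlinux)"
#   b="$(elf_build_id /path/b/vmlinux)"
#   [[ "$a" == "$b" ]] && echo "same build ID" || echo "build IDs differ"
```

A binary linked with --build-id=none has no such note, so the helper prints nothing for it.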

Comment on lines +408 to +414
# Strip debug symbols from original module
# For reproducible builds, skip --add-gnu-debuglink as it embeds a CRC
if [[ -n "$REPRODUCIBLE_BUILD" ]]; then
    $OBJCOPY --strip-unneeded "$module_path"
else
    $OBJCOPY --strip-unneeded "$module_path"
fi

Looks like a copy-paste error here - probably meant to keep --add-gnu-debuglink in the else block?

Contributor Author

thanks for noticing, will add it.
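
For reference, the two branches as the reviewer describes them might look like the sketch below. The name debug_path for the separate debug-symbol file is hypothetical, since the quoted hunk does not show which variable the script uses, and the wrapper function strip_module exists only to make the example self-contained.

```bash
#!/usr/bin/env bash
# Sketch of the intended branches; debug_path is a hypothetical name.
strip_module() {
    local module_path="$1" debug_path="$2"
    if [[ -n "${REPRODUCIBLE_BUILD:-}" ]]; then
        # Reproducible mode: no debuglink, so no CRC is embedded.
        "${OBJCOPY:-objcopy}" --strip-unneeded "$module_path"
    else
        # Normal mode: strip and record a link to the debug file.
        "${OBJCOPY:-objcopy}" --strip-unneeded \
            --add-gnu-debuglink="$debug_path" "$module_path"
    fi
}
```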

Comment on lines +290 to +294
# Move merged config back to Microsoft directory (overwrites original)
mv .config "Microsoft/hcl-$CONFIG_ARCH.config"
echo ">>> CVM config merged: Microsoft/hcl-$CONFIG_ARCH.config"

cd "$SOURCE_DIR"

I'm not too familiar with what this is doing, but should we keep a backup of the original config somewhere in case something goes wrong?

Contributor Author

This handles the CVM configs. We take the base arch config, add the CVM-specific configs on top, run make olddefconfig, and then copy the resulting .config back to the Microsoft/ folder to be used for building the kernel. It's fine if we don't keep the original copy, since it does not get reused.

Contributor Author

Had it been reused in the pipeline, we would have had to copy the original config back.
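
If the original config ever did need to survive the merge, a backup step could be added before the overwrite. This is a minimal sketch, not something the PR does; the helper name install_merged_config and the .orig suffix are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: overwrite target with the merged config, keeping the
# previous contents next to it as <target>.orig.
install_merged_config() {
    local merged="$1" target="$2"
    cp -p -- "$target" "$target.orig" || return 1
    mv -- "$merged" "$target"
}

# Usage (CONFIG_ARCH as in the quoted hunk):
#   install_merged_config .config "Microsoft/hcl-$CONFIG_ARCH.config"
```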

# Reproducible build environment variables
reproducibleEnv = {
# Disable timestamps in build output
SOURCE_DATE_EPOCH = "1609459200"; # 2021-01-01 00:00:00 UTC

While this works for reproducibility, I think you typically want this to be the modification time of the last commit on the branch or something like that, and then include the "right" SOURCE_DATE_EPOCH in the recipe when the build is run for reproducibility.

The end result is that for a given release, there should be a recipe that says "run the build with this SOURCE_DATE_EPOCH set to get the same output as the release" which would get picked up as part of the build. I have similarly hardcoded it in the Nix build in the OpenVMM repo but plan to change it as described.

Contributor Author

Sure, will add it.
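
Deriving the value from the last commit, as suggested above, is commonly done with git log. A minimal sketch, assuming the build runs from a git checkout; the helper name commit_epoch is hypothetical, and the fallback is simply the fixed value currently used in flake.nix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: committer timestamp of HEAD in the given repository,
# falling back to the PR's fixed value when git or history is unavailable.
commit_epoch() {
    local repo="${1:-.}"
    git -C "$repo" log -1 --format=%ct 2>/dev/null || echo 1609459200
}

SOURCE_DATE_EPOCH="$(commit_epoch .)"
export SOURCE_DATE_EPOCH
echo "SOURCE_DATE_EPOCH=$SOURCE_DATE_EPOCH"
```

The printed value is what a release recipe would record so that others can rerun the build with the same timestamp.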

Comment on lines +41 to +43
# ARM64 cross-compilation toolchain
pkgsCross.aarch64-multiplatform.stdenv.cc
pkgsCross.aarch64-multiplatform.buildPackages.binutils

Can you run this build on an ARM machine for ARM -> x64 cross-compilation?

Contributor Author

Sure, will add it.


3 participants