
OCPBUGS-69684, OCPBUGS-77091, OCPBUGS-77375, OCPBUGS-77480, OCPBUGS-77572, OCPBUGS-78296, OCPBUGS-78552, OCPBUGS-78711, OCPBUGS-78814: Sync from upstream (29-Mar-2026)#1

Closed
vitus133 wants to merge 57 commits into main from upstream-sync-2026-03-29

Conversation

@vitus133
Owner

Upstream PRs included

  • #574 OCPBUGS-78552: Sync from upstream (24-Mar-2026) (OCPBUGS-78552)
  • #572 Fix upstream-sync.sh macOS compatibility
  • #571 Upstream to downstream staging main
  • #568 Improve upstream-sync script
  • #567 OCPBUGS-77480, OCPBUGS-78711, OCPBUGS-78814: Sync from upstream (19-Mar-2026) (OCPBUGS-77480,OCPBUGS-78711,OCPBUGS-78814)
  • #565 Upstream to downstream staging main
  • #564 Add github action to create a sync PR every hour
  • #563 Upstream to downstream staging main
  • #556 OCPBUGS-77572: Updating ose-linuxptp-daemon-container image to be consistent with ART for 4.22 (OCPBUGS-77572)

sebsoto and others added 30 commits March 5, 2026 12:50
Changes a variable name in hack/gen-configmap-data-source.sh to be more
descriptive for easier readability.
Updates the README to be easier to read. Mainly rewording to fix some
grammatical issues, along with some removal of duplicate information.

Log output at the end was removed as the contents didn't seem relevant
for a user trying to run the daemon.
…penshift-4.22-linuxptp-daemon

OCPBUGS-77572: Updating ose-linuxptp-daemon-container image to be consistent with ART for 4.22
Signed-off-by: Vitaly Grinberg <vgrinber@redhat.com>
Three related issues caused incorrect T-BC behavior during upstream
port failover:
1. ptp4l offsets were never reported to the T-BC state machine, so
   getLargestOffset returned FaultyPhaseOffset (unfilled window) and
   freeRunCondition could not detect large ptp4l offsets. Fix: add a
   tunable 1-second averaged ptp4l offset event (sendPtp4lOffsetEvent)
   fed via a sliding window (ptp4lOffsetEventWindowSize setting), and
   teach freeRunCondition/getLargestOffset to use it while skipping
   empty windows.
2. When all PTP ports are lost, sendPtp4lEvent uses a fallback iface,
   leaving stale LOCKED DataDetails on inactive ports. isSourceLostBC
   then incorrectly reports source as not lost. Fix: AddEvent now
   propagates SourceLost to all LOCKED details, not just the event's
   own interface.
3. downstreamAnnounceIWF runs slow PMC calls in a goroutine. If the BC
   transitions to FREERUN mid-flight, the goroutine unconditionally
   overwrites clockClass with stale upstream data. Fix: add
   context-based cancellation (cancel-on-supersede in
   updateDownstreamData) plus applyIfLockedBC state guards around
   data mutations.
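
The third fix can be sketched roughly as follows. This is a minimal illustration of cancel-on-supersede plus a state guard, not the daemon's actual code: the `bc` type, `startUpdate`, `setFreerun`, and `applyIfLocked` are hypothetical stand-ins for updateDownstreamData and applyIfLockedBC, and the clock class values are only examples.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// clockState is a hypothetical stand-in for the daemon's BC state.
type clockState int

const (
	stateLocked clockState = iota
	stateFreerun
)

// bc is a hypothetical, simplified boundary clock record.
type bc struct {
	mu         sync.Mutex
	state      clockState
	clockClass int
	cancel     context.CancelFunc // cancels the in-flight update, if any
}

// startUpdate cancels any previous in-flight update (cancel-on-supersede)
// and returns a context for the new one.
func (b *bc) startUpdate() context.Context {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.cancel != nil {
		b.cancel()
	}
	ctx, cancel := context.WithCancel(context.Background())
	b.cancel = cancel
	return ctx
}

// setFreerun moves the clock to FREERUN and cancels any in-flight update.
func (b *bc) setFreerun(class int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.state = stateFreerun
	b.clockClass = class
	if b.cancel != nil {
		b.cancel()
		b.cancel = nil
	}
}

// applyIfLocked runs fn only if the request was not cancelled and the
// clock is still LOCKED. It reports whether fn ran.
func (b *bc) applyIfLocked(ctx context.Context, fn func()) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if ctx.Err() != nil || b.state != stateLocked {
		return false
	}
	fn()
	return true
}

func main() {
	b := &bc{state: stateLocked, clockClass: 6}
	ctx := b.startUpdate()
	done := make(chan bool)
	go func() {
		time.Sleep(50 * time.Millisecond) // stands in for slow PMC calls
		done <- b.applyIfLocked(ctx, func() { b.clockClass = 6 })
	}()
	b.setFreerun(248) // the transition happens while the goroutine is in flight
	fmt.Println("applied:", <-done, "clockClass:", b.clockClass)
}
```

In this sketch the late goroutine finds both a cancelled context and a non-LOCKED state, so the stale upstream value is discarded instead of overwriting the FREERUN clock class.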

Assisted by Cursor AI

Signed-off-by: Vitaly Grinberg <vgrinber@redhat.com>
Signed-off-by: Vitaly Grinberg <vgrinber@redhat.com>
Assisted-by: Cursor
…stream_staging_main

Upstream to downstream staging main
Add API to enable / disable leap second sources.
By default (and if leapSources is omitted) all satellite
sources are enabled. To disable sources, specify them
under the plugin gnss->leapSources section:
e825:
  devices:
    - eno8703
  gnss:
    disabled: false
    leapSources:
      navic: false
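
The "enabled unless explicitly disabled" semantics described above could be expressed as in the sketch below. The helper name `leapSourceEnabled` and the map representation are assumptions for illustration, and `gps` is used only as an example of a source that is not listed; only `navic` appears in the configuration above.

```go
package main

import "fmt"

// leapSourceEnabled reports whether a leap second source is enabled.
// Sources that are not listed (including when the whole leapSources
// section is omitted, i.e. a nil map) are treated as enabled.
func leapSourceEnabled(leapSources map[string]bool, name string) bool {
	enabled, listed := leapSources[name]
	return !listed || enabled
}

func main() {
	cfg := map[string]bool{"navic": false} // mirrors the example config
	fmt.Println("navic:", leapSourceEnabled(cfg, "navic"))
	fmt.Println("gps:", leapSourceEnabled(cfg, "gps")) // hypothetical, unlisted source
	fmt.Println("omitted section:", leapSourceEnabled(nil, "navic"))
}
```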

Signed-off-by: Vitaly Grinberg <vgrinber@redhat.com>
When cloud-event-proxy restarts and linuxptp-daemon re-emits cached
port role events, the Raw field does not contain a trailing newline.
This causes all port events to be concatenated on the socket, and
cloud-event-proxy fails to parse them (strconv.ParseInt: parsing
"port": invalid syntax), resulting in missing clock class metrics.

Ensure each port role log line has a trailing newline before writing
to the event socket.
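
A minimal sketch of the idea, assuming a hypothetical helper named `ensureTrailingNewline` (the real change may be structured differently, and the sample strings are placeholders rather than real log lines):

```go
package main

import (
	"fmt"
	"strings"
)

// ensureTrailingNewline returns line unchanged if it already ends with a
// newline, and appends one otherwise, so consecutive writes to a stream
// socket stay separable by the reader.
func ensureTrailingNewline(line string) string {
	if strings.HasSuffix(line, "\n") {
		return line
	}
	return line + "\n"
}

func main() {
	cached := []string{"cached event A", "cached event B\n"}
	var stream strings.Builder
	for _, l := range cached {
		stream.WriteString(ensureTrailingNewline(l))
	}
	fmt.Printf("%q\n", stream.String())
}
```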

Signed-off-by: Jack Ding <jackding@gmail.com>
OCPBUGS-78296: Allow non-GNSS leap second sources
Extend aws-ci action workflow to save artifacts
…pDevice

- Accept lspci VPD results even when PartNumber is empty (provisional VPD).
- Default LinkSpeed and FEC to unknown when ethtool cannot determine them.
- Skip non-PCI and virtual-function NICs before running ethtool, and lower remaining skip messages to debug level, eliminating log noise from container/virtual interfaces.
- Collect VPD once per NIC via the PTP-exposing port.
Fix missing newline in re-emitted port role logs to event socket
…stream_staging_main

Upstream to downstream staging main
Add github action to create a sync PR every hour
This commit fixes an issue where if the clock_id is not set in
synce4lConf, it is incorrectly set to 0, instead of the correct value
previously extracted from the network device list.

This issue was due to the clock_id being pulled from the initial value
of the config, not the actual object that was mutated.
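
The class of bug described here can be illustrated with the sketch below. The `synceConf` type and the function names are hypothetical simplifications, not the daemon's actual types; the point is only that reading from the pre-mutation value loses the filled-in clock ID.

```go
package main

import "fmt"

// synceConf is a hypothetical, simplified stand-in for the synce4l config.
type synceConf struct {
	ClockID uint64
}

// fillClockID sets ClockID from the network device list when it is unset.
func fillClockID(c *synceConf, fromDevice uint64) {
	if c.ClockID == 0 {
		c.ClockID = fromDevice
	}
}

// buggyClockID mutates a working copy but then reads the initial value.
func buggyClockID(initial synceConf, fromDevice uint64) uint64 {
	working := initial
	fillClockID(&working, fromDevice)
	return initial.ClockID // bug: the initial value was never mutated
}

// fixedClockID reads from the object that was actually mutated.
func fixedClockID(initial synceConf, fromDevice uint64) uint64 {
	working := initial
	fillClockID(&working, fromDevice)
	return working.ClockID
}

func main() {
	const deviceID = 0x1234 // example value only
	fmt.Printf("buggy: %#x\n", buggyClockID(synceConf{}, deviceID))
	fmt.Printf("fixed: %#x\n", fixedClockID(synceConf{}, deviceID))
}
```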
Signed-off-by: Vitaly Grinberg <vgrinber@redhat.com>
Fix VPD collection and ensure LinkSpeed/FEC always reported in NodePtpDevice
nocturnalastro and others added 27 commits March 18, 2026 12:00
OCPBUGS-78711: Fix clock_id being set to 0 in SyncE config
Move builder image to non-docker image so that we do not get hit with pull limits
The issue was that expectWorker did not exit when exp was closed; instead, it
returned errors, eventually causing the process to crash due to too many
goroutines.
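
As a rough sketch of the intended behavior, a worker loop can treat a closed session as terminal and return, rather than surfacing errors indefinitely. The `session` interface, `errSessionClosed`, and `worker` below are hypothetical names for illustration, not the daemon's actual expectWorker.

```go
package main

import (
	"errors"
	"fmt"
)

// errSessionClosed is a hypothetical sentinel for a closed expect session.
var errSessionClosed = errors.New("expect session closed")

// session is a hypothetical, minimal view of an expect-style session.
type session interface {
	Expect() (string, error)
}

// fakeSession yields its lines, then reports that it is closed.
type fakeSession struct{ lines []string }

func (f *fakeSession) Expect() (string, error) {
	if len(f.lines) == 0 {
		return "", errSessionClosed
	}
	line := f.lines[0]
	f.lines = f.lines[1:]
	return line, nil
}

// worker forwards lines to handle until the session is closed, then
// returns so its goroutine can end instead of lingering.
func worker(s session, handle func(string)) error {
	for {
		line, err := s.Expect()
		if errors.Is(err, errSessionClosed) {
			return err // terminal: stop the worker
		}
		if err != nil {
			continue // transient error: try again
		}
		handle(line)
	}
}

func main() {
	s := &fakeSession{lines: []string{"first", "second"}}
	var got []string
	err := worker(s, func(l string) { got = append(got, l) })
	fmt.Println(got, err)
}
```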
…ilure

If something goes wrong with gpsd (or we're just unlucky with
initialization timing races), we can attempt to run the ublox init code
before gpsd is actually running, which is silently ignored and can cause
both ts2phc and our daemon's GNSS monitoring to fail.

This change largely fixes the first part by performing ublox protocol
detection as part of the object initialization, and returning an error
if the initialization fails.  The monitoring framework will already
retry this registration every 1s until it succeeds, so with the error
return functioning, we will get appropriate retries until gpsd is
running and ubxtool can talk to it.
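
The pattern described above (fail the constructor, let the caller retry) might look like the following sketch. All names here (`newMonitor`, `registerWithRetry`, `errNotReady`) are hypothetical, the detection function is injected so the example stays self-contained, and the demo uses a short interval where the text above mentions 1s.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical error for "gpsd is not running yet".
var errNotReady = errors.New("gpsd not ready")

// monitor is a hypothetical GNSS monitor that needs a detected protocol.
type monitor struct{ protocol string }

// newMonitor performs protocol detection as part of construction and
// returns an error instead of silently continuing when detection fails.
func newMonitor(detect func() (string, error)) (*monitor, error) {
	p, err := detect()
	if err != nil {
		return nil, fmt.Errorf("protocol detection failed: %w", err)
	}
	return &monitor{protocol: p}, nil
}

// registerWithRetry retries construction at a fixed interval, up to
// maxAttempts, and reports how many attempts were made.
func registerWithRetry(detect func() (string, error), interval time.Duration, maxAttempts int) (*monitor, int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		m, err := newMonitor(detect)
		if err == nil {
			return m, attempt, nil
		}
		lastErr = err
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return nil, maxAttempts, lastErr
}

func main() {
	calls := 0
	detect := func() (string, error) {
		calls++
		if calls < 3 { // simulate gpsd coming up late
			return "", errNotReady
		}
		return "example-protocol", nil
	}
	m, attempts, err := registerWithRetry(detect, 10*time.Millisecond, 5)
	fmt.Println(m != nil, attempts, err)
}
```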

Signed-off-by: Jim Ramsay <jramsay@redhat.com>
Signed-off-by: Jim Ramsay <jramsay@redhat.com>
Generated-by: Cursor
Provides FORK_REMOTE to allow users to push to a fork and
then create the PR against the downstream.

Also uses worktrees to better isolate the changes. Users can control
where the worktree is created (default /tmp). There is also a --keep-worktree
flag for users who wish to inspect the worktree afterwards.
…_fixup-4.22

OCPBUGS-77480: Fix ubxtool initialization race conditions
…process_failure

Fix pmc looping when pmc process is killed
…26-03-19

OCPBUGS-77480, OCPBUGS-78711, OCPBUGS-78814: Sync from upstream (19-Mar-2026)
Centralizes the newline suffix check inside writeLogToSocket so all
callers get consistent newline termination automatically, instead of
each call site handling it independently.

Signed-off-by: Jack Ding <jackding@gmail.com>
When cloud-event-proxy crashes and restarts, clock_class metrics for
most ptp4l configs disappear because:

1. EmitClockClassLogs() skipped configs where pmc.parentDS was nil,
   even though the clock class data was available in clkSyncState.
   Remove this unnecessary guard since EmitClockClass() already
   handles missing data gracefully.

2. emitClockClass() used utils.EmitClockClass() which writes directly
   to the socket without reconnect-and-retry logic. On broken pipe the
   data was silently lost. Switch to writeLogToSocket() which handles
   reconnection, consistent with EmitClockSyncLogs, EmitPortRoleLogs,
   and EmitProcessStatusLog.

3. Clean up now-unused code: utils.EmitClockClass, utils.IsBrokenPipe,
   signalBrokenPipe, brokenPipeCh, and their associated tests.

Signed-off-by: Jack Ding <jackding@gmail.com>
Fix clock class metrics lost after cloud-event-proxy restart
Three major fixes are needed to report GM state correctly:
1. GM state should be s1 if ts2phc is in holdover.
2. ts2phc should exit holdover after the timeout.
3. dpll should stay PTP_NOTSET upon loss of GNSS.
…te_on_gnss_loss

Fix reporting of GM state
Replace GNU-specific sed and grep syntax with POSIX equivalents
so the script works on both macOS and Linux:
- sed multi-line join -> paste -sd ',' + sed
- grep -oP (Perl lookbehind) -> grep -oE with two-stage filter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stream_staging_main

Upstream to downstream staging main
Adds utils.CheckMetricSanity to prevent any metric update from being
emitted with an empty process or interface name across the daemon.
If the labels are missing, the update is dropped and a stack trace
is logged to warn developers of the source error.
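
A minimal sketch of such a guard is shown below. The actual signature of utils.CheckMetricSanity is not given here, so the lowercase `checkMetricSanity`, its parameters, and the map-backed `updateOffset` are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// checkMetricSanity is a hypothetical guard: it reports false, and logs a
// stack trace, when either label is empty.
func checkMetricSanity(process, iface string) bool {
	if process == "" || iface == "" {
		log.Printf("dropping metric update with empty label (process=%q, iface=%q)\n%s",
			process, iface, debug.Stack())
		return false
	}
	return true
}

// updateOffset stores a value only if the labels pass the sanity check.
// The map stands in for a real metrics registry.
func updateOffset(store map[string]float64, process, iface string, v float64) {
	if !checkMetricSanity(process, iface) {
		return
	}
	store[process+"/"+iface] = v
}

func main() {
	store := map[string]float64{}
	updateOffset(store, "ptp4l", "eno8703", 12)
	updateOffset(store, "", "eno8703", 99) // dropped, stack trace logged
	fmt.Println(len(store))
}
```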

Assisted-by: gemini-2.5-pro
Signed-off-by: Jim Ramsay <jramsay@redhat.com>
…e_metric

OCPBUGS-78552: Generic sanity check for metrics
Configure AWS timeout to 2 hours for long jobs
Two issues caused clock_class metrics to be missing after
cloud-event-proxy crashed and recovered:

1. writeLogToSocket returned false when conn was nil (set by another
   goroutine handling a broken pipe) without attempting reconnection.
   Concurrent writers like EmitClockClassLogs silently lost data.
   Fix: call reconnectEventSocket when conn is nil, blocking until
   any in-progress reconnection completes.

2. clkSyncState was never populated with clock class values in T-BC/HA
   configurations. The clockClassRequestCh handler and UpdateClockClass
   set e.clockClass (EventHandler level) but not clkSyncState entries.
   EmitClockClass and the classTicker rely on clkSyncState for
   re-emission, so they had nothing to emit.
   Fix: store clock class in clkSyncState when received via
   clockClassRequestCh, via new storeClockClassLocked helper.
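
The first fix can be sketched as below, assuming a simplified socket wrapper. The `eventSocket` type, its `write` method, and the injected `dial` function are hypothetical; the real writeLogToSocket and reconnectEventSocket may differ. Here a single mutex makes concurrent writers wait for an in-progress reconnection rather than returning early on a nil connection.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDialFailed is a hypothetical error for an unreachable event proxy.
var errDialFailed = errors.New("dial failed")

// socketConn is the minimal connection behavior this sketch needs.
type socketConn interface {
	Write(p []byte) (int, error)
}

// memConn is an in-memory connection used only for the demonstration.
type memConn struct{ data []byte }

func (m *memConn) Write(p []byte) (int, error) {
	m.data = append(m.data, p...)
	return len(p), nil
}

// eventSocket is a hypothetical wrapper around the event socket.
type eventSocket struct {
	mu   sync.Mutex
	conn socketConn
	dial func() (socketConn, error)
}

// write reconnects when conn is nil instead of dropping the message.
// Holding mu means other writers block until reconnection completes.
func (s *eventSocket) write(msg string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.conn == nil {
		c, err := s.dial()
		if err != nil {
			return false
		}
		s.conn = c
	}
	if _, err := s.conn.Write([]byte(msg)); err != nil {
		s.conn = nil // force a reconnect on the next write
		return false
	}
	return true
}

func main() {
	sink := &memConn{}
	attempts := 0
	s := &eventSocket{dial: func() (socketConn, error) {
		attempts++
		if attempts == 1 {
			return nil, errDialFailed // proxy still down
		}
		return sink, nil
	}}
	fmt.Println(s.write("line 1\n"), s.write("line 2\n"), string(sink.data))
}
```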

Signed-off-by: Jack Ding <jackding@gmail.com>
Fix race condition in writeLogToSocket dropping writes during reconnect
OCPBUGS-78552: Sync from upstream (24-Mar-2026)
@vitus133 vitus133 closed this Mar 29, 2026
@vitus133 vitus133 deleted the upstream-sync-2026-03-29 branch March 29, 2026 15:05

8 participants