Re: [PATCH V7 36/37] UefiCpuPkg: Setting initial-count register as the last step

Henz, Patrick

Hi all,

We (Hewlett Packard Enterprise) are also running into a race condition caused by how InitializeApicTimer initializes the APIC timers, and we figured this might be a good place to report our findings. On occasion, we see APs get stuck in the timer interrupt handling code after being woken up by the BSP. It appears that if the CurrentCount timer value provided by the BSP is sufficiently small, the brief window between an AP calling InitializeApicTimer and calling DisableApicTimerInterrupt (see SyncLocalApicTimerSetting for an example) leaves enough room for an APIC timer interrupt to occur.

This becomes more frequent on larger systems with higher processor counts. From what we've gathered, the increased number of locking sequence invocations makes this condition far more likely to occur. We work on scaled systems with node controllers and have really only seen this on larger systems, but it seems to us that it could feasibly happen on smaller systems too.

Our current solution is to add an argument to InitializeApicTimer that lets the caller specify whether APIC timer interrupts are to be enabled for the current thread.
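To illustrate the idea, here is a minimal sketch in plain C against a simulated register block. It is not the actual UefiCpuPkg code: the `SimApic` struct, `init_apic_timer_ex`, and `timer_interrupt_can_fire` are hypothetical names made up for this example, and only the LVT timer layout (mask in bit 16, vector in bits 7:0) is taken from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the local APIC timer registers. Real firmware
   would go through the MMIO/MSR accessors instead of a plain struct. */
typedef struct {
  uint32_t lvt_timer;      /* bit 16 = mask, bits 7:0 = vector */
  uint32_t divide_config;
  uint32_t initial_count;
} SimApic;

#define LVT_MASK_BIT   (1u << 16)
#define LVT_VECTOR(v)  ((uint32_t)(v) & 0xFFu)

/* Sketch of the proposed variant: the caller decides whether the timer
   interrupt ends up unmasked. The name and signature are assumptions. */
static void init_apic_timer_ex(SimApic *apic, uint32_t divide,
                               uint32_t init_count, uint8_t vector,
                               bool enable_interrupt)
{
  assert(apic);

  /* Program everything with the interrupt masked, so there is no window
     in which a small count can expire and raise an interrupt. */
  apic->lvt_timer     = LVT_VECTOR(vector) | LVT_MASK_BIT;
  apic->divide_config = divide;
  apic->initial_count = init_count;

  /* Unmask only when the caller explicitly asked for interrupts. */
  if (enable_interrupt) {
    apic->lvt_timer &= ~LVT_MASK_BIT;
  }
}

/* True when the simulated timer is counting and its interrupt is unmasked. */
static bool timer_interrupt_can_fire(const SimApic *apic)
{
  return apic->initial_count != 0 && (apic->lvt_timer & LVT_MASK_BIT) == 0;
}
```

With this shape, an AP that only needs to mirror the BSP's timer settings would pass `false` and never open the window between initializing the timer and disabling its interrupt.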

Patrick Henz

Enterprise X86 Labs
Hewlett Packard Enterprise

-----Original Message-----
From: [] On Behalf Of Lendacky, Thomas via
Sent: Friday, May 13, 2022 5:13 PM
To:; min.m.xu@...; Ni, Ray <>
Cc: Yao, Jiewen <jiewen.yao@...>; Gerd Hoffmann <kraxel@...>; Anthony Perard <anthony.perard@...>; Julien Grall <julien@...>; Dong, Eric <eric.dong@...>
Subject: Re: [edk2-devel] [PATCH V7 36/37] UefiCpuPkg: Setting initial-count register as the last step

On 5/11/22 19:52, Min Xu via wrote:
On May 11, 2022 10:06 PM, Lendacky, Thomas wrote:
On 5/10/22 21:00, Xu, Min M wrote:
On May 11, 2022 4:30 AM, Tom Lendacky wrote:
I'm replying to this patch since I can't find patch V12 46/47
anywhere in my email.

I've bisected a regression in the Linux kernel to this patch when
an SEV-SNP guest is booted. The following message is issued in the
kernel for every AP being brought online:

APIC: Stale IRR:
00020 ISR:

Possibly a timing issue involving the mode switch with the
interrupt unmasked. If I leave the interrupt masked and only
un-mask it after the programming of the init-count, then the message goes away.
Do you mean that InitializeApicTimer should follow the steps below?
1. Mask LvtTimer (set LvtTimer.Bits.Mask = 1).
2. Do other stuff, including programming the init-count register.
3. Unmask LvtTimer (set LvtTimer.Bits.Mask = 0).
Yes, I believe so. I'm not an expert on the APIC timer, but that
seems reasonable to me.
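The three steps discussed above can be sketched as follows. This is an illustration against a simulated register file, not the actual LocalApicLib implementation: `ApicModel`, `apic_write`, and `program_timer_masked` are hypothetical names, and the write log exists only so the ordering can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Indices into the simulated register file (hypothetical, not real offsets). */
enum { REG_LVT_TIMER, REG_DIVIDE, REG_INIT_COUNT, REG_COUNT };

#define LVT_MASK_BIT (1u << 16)   /* LVT timer mask bit */
#define MAX_WRITES   8

typedef struct {
  uint32_t regs[REG_COUNT];
  int      write_log[MAX_WRITES]; /* order in which registers were written */
  int      writes;
} ApicModel;

/* Record every register write so the sequence can be inspected afterwards. */
static void apic_write(ApicModel *a, int reg, uint32_t value)
{
  assert(a->writes < MAX_WRITES);
  a->regs[reg] = value;
  a->write_log[a->writes++] = reg;
}

static void program_timer_masked(ApicModel *a, uint32_t divide,
                                 uint32_t init_count, uint8_t vector)
{
  /* 1. Mask the LVT timer entry before touching anything else. */
  apic_write(a, REG_LVT_TIMER, (uint32_t)vector | LVT_MASK_BIT);

  /* 2. Do the other setup, ending with the init-count register. */
  apic_write(a, REG_DIVIDE, divide);
  apic_write(a, REG_INIT_COUNT, init_count);

  /* 3. Unmask only after the count has been programmed. */
  apic_write(a, REG_LVT_TIMER, a->regs[REG_LVT_TIMER] & ~LVT_MASK_BIT);
}
```

After a call such as `program_timer_masked(&a, 0xB, 1000, 0x40)`, the log holds four writes: LVT timer (masked), divide, init-count, LVT timer (unmasked).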
I tested this fix in a TD guest and it has no side effects.
I checked the Intel SDM (Vol. 3A, Chap 10.5 Handling Local Interrupts), but it doesn't describe the required ordering between setting LvtTimer.Bits.Mask and programming the init-count register.
@Ni, Ray, what are your thoughts on this?
I guess you can theoretically miss an interrupt if your initial count expires before you unmask the interrupt, so I think your fix is correct and no changes are needed.

I need to double check whether I'm properly resetting the APIC when APs are booted multiple times. Since this only occurs with SNP, I think this is on my end.


