On RPi4 and Juno, gBS->Stall(1) takes 10us and 100us respectively


Rebecca Cran <quic_rcran@...>
 

I've been doing some work on USB and ended up realizing that gBS->Stall(1) is taking much longer than it should: on my Juno R2 it's stalling for 100 us, while on my Raspberry Pi 4 it's 10 us.

This appears to be causing a USB bulk transfer request that asks for a 1ms timeout taking 100ms on the Juno.

I'm measuring the delay with the following code:


UINT64 First = GetPerformanceCounter ();

gBS->Stall (XHC_1_MICROSECOND);

UINT64 Second = GetPerformanceCounter ();

UINT64 FirstNs = GetTimeInNanoSecond (First);
UINT64 SecondNs = GetTimeInNanoSecond (Second);

DEBUG ((DEBUG_INFO, "Stalled for %llu ns (%llu ms)\n", (SecondNs - FirstNs), (SecondNs - FirstNs) / 1024 / 1024));



I see output such as:

Stalled for 10500 ns (0 ms)


--
Rebecca Cran


Andrew Fish
 

Rebecca, 

gBS->Stall() is built on top [1] of the Metronome Architectural Protocol [2]. You should look at how the platform implements the Metronome Architectural Protocol. 

It looks like most platform implement a generic Metronome Driver[3] that just sits on top of the platforms TimerLib [4] implementation. 

You can find the TimerLib implementations via:

$ git grep TimerLib -- \*.inf | grep LIBRARY_CLASS        

ArmPkg/Library/ArmArchTimerLib/ArmArchTimerLib.inf:14:  LIBRARY_CLASS                  = TimerLib

EmbeddedPkg/Library/DebugAgentTimerLibNull/DebugAgentTimerLibNull.inf:19:  LIBRARY_CLASS                  = DebugAgentTimerLib|SEC BASE DXE_CORE

EmulatorPkg/Library/DxeCoreTimerLib/DxeCoreTimerLib.inf:22:  LIBRARY_CLASS                  = TimerLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION

EmulatorPkg/Library/DxeTimerLib/DxeTimerLib.inf:22:  LIBRARY_CLASS                  = TimerLib|DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION

EmulatorPkg/Library/PeiTimerLib/PeiTimerLib.inf:22:  LIBRARY_CLASS                  = TimerLib|PEIM PEI_CORE SEC

MdePkg/Library/BaseTimerLibNullTemplate/BaseTimerLibNullTemplate.inf:23:  LIBRARY_CLASS                  = TimerLib

MdePkg/Library/SecPeiDxeTimerLibCpu/SecPeiDxeTimerLibCpu.inf:30:  LIBRARY_CLASS                  = TimerLib

OvmfPkg/Library/AcpiTimerLib/BaseAcpiTimerLib.inf:17:  LIBRARY_CLASS  = TimerLib|PEI_CORE PEIM DXE_CORE

OvmfPkg/Library/AcpiTimerLib/BaseAcpiTimerLibBhyve.inf:18:  LIBRARY_CLASS  = TimerLib

OvmfPkg/Library/AcpiTimerLib/BaseRomAcpiTimerLib.inf:16:  LIBRARY_CLASS  = TimerLib|SEC

OvmfPkg/Library/AcpiTimerLib/DxeAcpiTimerLib.inf:17:  LIBRARY_CLASS  = TimerLib|DXE_DRIVER DXE_RUNTIME_DRIVER DXE_SMM_DRIVER UEFI_DRIVER UEFI_APPLICATION SMM_CORE

PcAtChipsetPkg/Library/AcpiTimerLib/BaseAcpiTimerLib.inf:21:  LIBRARY_CLASS                  = TimerLib|SEC PEI_CORE PEIM

PcAtChipsetPkg/Library/AcpiTimerLib/DxeAcpiTimerLib.inf:21:  LIBRARY_CLASS                  = TimerLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER DXE_SMM_DRIVER UEFI_APPLICATION UEFI_DRIVER SMM_CORE

PcAtChipsetPkg/Library/AcpiTimerLib/PeiAcpiTimerLib.inf:21:  LIBRARY_CLASS                  = TimerLib|PEI_CORE PEIM

PcAtChipsetPkg/Library/AcpiTimerLib/StandaloneMmAcpiTimerLib.inf:23:  LIBRARY_CLASS                  = TimerLib|MM_CORE_STANDALONE MM_STANDALONE

UefiCpuPkg/Library/CpuTimerLib/BaseCpuTimerLib.inf:18:  LIBRARY_CLASS                  = TimerLib

UefiCpuPkg/Library/SecPeiDxeTimerLibUefiCpu/SecPeiDxeTimerLibUefiCpu.inf:30:  LIBRARY_CLASS                  = TimerLib

UefiPayloadPkg/Library/AcpiTimerLib/AcpiTimerLib.inf:15:  LIBRARY_CLASS                  = TimerLib


So I’d guess your platform is using the ArmArchTimerLi [5]. If that is the case I’d look at the value of PcdArmArchTimerFreqInHz if it is zero then I think you are running on the default ARM timer value. 

If you are using the stock stuff and all your TimerLib instances are the same in all your drivers then the TimerLib is also abstraction the performance measure.

This is probably not helpful but the UEFI Spec defines that Stall() delays for at least the requested amount of time, with no upper bound on how long it can take.  

[1] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Core/Dxe/Misc/Stall.c#L49

On Nov 28, 2022, at 11:02 AM, Rebecca Cran <quic_rcran@...> wrote:

I've been doing some work on USB and ended up realizing that gBS->Stall(1) is taking much longer than it should: on my Juno R2 it's stalling for 100 us, while on my Raspberry Pi 4 it's 10 us.

This appears to be causing a USB bulk transfer request that asks for a 1ms timeout taking 100ms on the Juno.

I'm measuring the delay with the following code:


UINT64 First = GetPerformanceCounter ();

gBS->Stall (XHC_1_MICROSECOND);

UINT64 Second = GetPerformanceCounter ();

UINT64 FirstNs = GetTimeInNanoSecond (First);
UINT64 SecondNs = GetTimeInNanoSecond (Second);

DEBUG ((DEBUG_INFO, "Stalled for %llu ns (%llu ms)\n", (SecondNs - FirstNs), (SecondNs - FirstNs) / 1024 / 1024));



I see output such as:

Stalled for 10500 ns (0 ms)


--
Rebecca Cran







Rebecca Cran <quic_rcran@...>
 

Thanks. RPi4 and Juno use EmbeddedPkg/MetronomeDxe (https://github.com/tianocore/edk2/blob/master/EmbeddedPkg/MetronomeDxe/Metronome.c#L54) which uses the PCD PcdMetronomeTickPeriod.

JunoPkg overrides that to 1000, while RPi4 uses the default of 100. Given that the clocks run at 50MHz and 54MHz, I think the tick period should be 1 (1 100-ns unit, giving a clock frequency of 10MHz)?

--
Rebecca Cran

On 11/28/22 12:51, Andrew Fish via groups.io wrote:
Rebecca,
gBS->Stall() is built on top [1] of the Metronome Architectural Protocol [2]. You should look at how the platform implements the Metronome Architectural Protocol.
It looks like most platform implement a generic Metronome Driver[3] that just sits on top of the platforms TimerLib [4] implementation.
You can find the TimerLib implementations via:
$ git grep TimerLib -- \*.inf | grep LIBRARY_CLASS
ArmPkg/Library/ArmArchTimerLib/ArmArchTimerLib.inf:14:  LIBRARY_CLASS               = TimerLib
EmbeddedPkg/Library/DebugAgentTimerLibNull/DebugAgentTimerLibNull.inf:19:  LIBRARY_CLASS                  = DebugAgentTimerLib|SEC BASE DXE_CORE
EmulatorPkg/Library/DxeCoreTimerLib/DxeCoreTimerLib.inf:22: LIBRARY_CLASS                  = TimerLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION
EmulatorPkg/Library/DxeTimerLib/DxeTimerLib.inf:22:  LIBRARY_CLASS             = TimerLib|DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION
EmulatorPkg/Library/PeiTimerLib/PeiTimerLib.inf:22:  LIBRARY_CLASS             = TimerLib|PEIM PEI_CORE SEC
MdePkg/Library/BaseTimerLibNullTemplate/BaseTimerLibNullTemplate.inf:23:  LIBRARY_CLASS                  = TimerLib
MdePkg/Library/SecPeiDxeTimerLibCpu/SecPeiDxeTimerLibCpu.inf:30: LIBRARY_CLASS                  = TimerLib
OvmfPkg/Library/AcpiTimerLib/BaseAcpiTimerLib.inf:17:  LIBRARY_CLASS  = TimerLib|PEI_CORE PEIM DXE_CORE
OvmfPkg/Library/AcpiTimerLib/BaseAcpiTimerLibBhyve.inf:18: LIBRARY_CLASS  = TimerLib
OvmfPkg/Library/AcpiTimerLib/BaseRomAcpiTimerLib.inf:16:  LIBRARY_CLASS = TimerLib|SEC
OvmfPkg/Library/AcpiTimerLib/DxeAcpiTimerLib.inf:17:  LIBRARY_CLASS  = TimerLib|DXE_DRIVER DXE_RUNTIME_DRIVER DXE_SMM_DRIVER UEFI_DRIVER UEFI_APPLICATION SMM_CORE
PcAtChipsetPkg/Library/AcpiTimerLib/BaseAcpiTimerLib.inf:21: LIBRARY_CLASS                  = TimerLib|SEC PEI_CORE PEIM
PcAtChipsetPkg/Library/AcpiTimerLib/DxeAcpiTimerLib.inf:21: LIBRARY_CLASS                  = TimerLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER DXE_SMM_DRIVER UEFI_APPLICATION UEFI_DRIVER SMM_CORE
PcAtChipsetPkg/Library/AcpiTimerLib/PeiAcpiTimerLib.inf:21: LIBRARY_CLASS                  = TimerLib|PEI_CORE PEIM
PcAtChipsetPkg/Library/AcpiTimerLib/StandaloneMmAcpiTimerLib.inf:23: LIBRARY_CLASS                  = TimerLib|MM_CORE_STANDALONE MM_STANDALONE
UefiCpuPkg/Library/CpuTimerLib/BaseCpuTimerLib.inf:18:  LIBRARY_CLASS               = TimerLib
UefiCpuPkg/Library/SecPeiDxeTimerLibUefiCpu/SecPeiDxeTimerLibUefiCpu.inf:30:  LIBRARY_CLASS                  = TimerLib
UefiPayloadPkg/Library/AcpiTimerLib/AcpiTimerLib.inf:15:  LIBRARY_CLASS                 = TimerLib
So I’d guess your platform is using the ArmArchTimerLi [5]. If that is the case I’d look at the value of PcdArmArchTimerFreqInHz if it is zero then I think you are running on the default ARM timer value.
If you are using the stock stuff and all your TimerLib instances are the same in all your drivers then the TimerLib is also abstraction the performance measure.
This is probably not helpful but the UEFI Spec defines that Stall() delays for at least the requested amount of time, with no upper bound on how long it can take.
[1] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Core/Dxe/Misc/Stall.c#L49 <https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Core/Dxe/Misc/Stall.c#L49>
[2] https://github.com/tianocore/edk2/blob/master/MdePkg/Include/Protocol/Metronome.h#L25 <https://github.com/tianocore/edk2/blob/master/MdePkg/Include/Protocol/Metronome.h#L25>
[3] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Universal/Metronome/Metronome.c#L61 <https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Universal/Metronome/Metronome.c#L61>
[4] https://github.com/tianocore/edk2/blob/master/MdePkg/Include/Library/TimerLib.h <https://github.com/tianocore/edk2/blob/master/MdePkg/Include/Library/TimerLib.h>
[5] https://github.com/tianocore/edk2/blob/master/ArmPkg/Library/ArmArchTimerLib/ArmArchTimerLib.c#L18 <https://github.com/tianocore/edk2/blob/master/ArmPkg/Library/ArmArchTimerLib/ArmArchTimerLib.c#L18>
Thanks,
Andrew Fish

On Nov 28, 2022, at 11:02 AM, Rebecca Cran <quic_rcran@...> wrote:

I've been doing some work on USB and ended up realizing that gBS->Stall(1) is taking much longer than it should: on my Juno R2 it's stalling for 100 us, while on my Raspberry Pi 4 it's 10 us.

This appears to be causing a USB bulk transfer request that asks for a 1ms timeout taking 100ms on the Juno.

I'm measuring the delay with the following code:


UINT64 First = GetPerformanceCounter ();

gBS->Stall (XHC_1_MICROSECOND);

UINT64 Second = GetPerformanceCounter ();

UINT64 FirstNs = GetTimeInNanoSecond (First);
UINT64 SecondNs = GetTimeInNanoSecond (Second);

DEBUG ((DEBUG_INFO, "Stalled for %llu ns (%llu ms)\n", (SecondNs - FirstNs), (SecondNs - FirstNs) / 1024 / 1024));



I see output such as:

Stalled for 10500 ns (0 ms)


--
Rebecca Cran