[edk2-devel] [RFC][PATCH v1] UefiCpuPkg/MpInitLib DXE: Reduce AP status check interval


Ni, Ray
 

The use case is valid, IMO. And the commit message is helpful.

But I really think this constant should be PCD. Here's why I think a
platform might want to control it:

- The best (finest) possible resolution for timer events is platform
dependent, IIUC. The duration of the "idle tick" is platform-specific.
And, it likely makes no sense to set AP_CHECK_INTERVAL to a duration
that's around, or under, what the arch timer resolution allows for.

- In the other direction, CheckAndUpdateApsStatus() contains a loop that
counts up to CpuMpData->CpuCount. In a very large system (hundreds or
maybe thousands of APs) this function may have non-negligible cost.

I suggest introducing a PCD for this (measured in msecs) and using 100
msecs as the default.
Laszlo,
Adding a PCD means platform integrators need to consider which value to set.
Most of the time, they may just use the default PCD value.
Then, why not we add the PCD later when a real case is met?

Thanks,
Ray


Laszlo Ersek
 

On 03/16/20 02:37, Ni, Ray wrote:
The use case is valid, IMO. And the commit message is helpful.

But I really think this constant should be PCD. Here's why I think a
platform might want to control it:

- The best (finest) possible resolution for timer events is platform
dependent, IIUC. The duration of the "idle tick" is platform-specific.
And, it likely makes no sense to set AP_CHECK_INTERVAL to a duration
that's around, or under, what the arch timer resolution allows for.

- In the other direction, CheckAndUpdateApsStatus() contains a loop that
counts up to CpuMpData->CpuCount. In a very large system (hundreds or
maybe thousands of APs) this function may have non-negligible cost.

I suggest introducing a PCD for this (measured in msecs) and using 100
msecs as the default.
Laszlo,
Adding a PCD means platform integrators need to consider which value to set.
Most of the time, they may just use the default PCD value.
Then, why not we add the PCD later when a real case is met?
The patch changes existent behavior; it is not for a newly introduced
feature.

Because most platforms are not in the edk2 tree, we don't know what
platforms could be regressed by increasing the polling frequency
tenfold. (And remember that the polling action has O(n) cost, where "n"
is the logical processor count.)

Under your suggestion, the expression "real case is met" amounts to
"someone reports a regression" (possibly after the next stable tag,
even). I don't think that's a good idea.

In particular, the patch is motivated by RegisterCpuFeaturesLib -- the
CpuFeaturesInitialize() function -- on some platform(s) that Hao uses.
But there are platforms that don't use RegisterCpuFeaturesLib, and still
use MpInitLib.

Thanks,
Laszlo


Ni, Ray
 

Laszlo,
Adding a PCD means platform integrators need to consider which value to set.
Most of the time, they may just use the default PCD value.
Then, why not we add the PCD later when a real case is met?
The patch changes existent behavior; it is not for a newly introduced
feature.

Because most platforms are not in the edk2 tree, we don't know what
platforms could be regressed by increasing the polling frequency
tenfold. (And remember that the polling action has O(n) cost, where "n"
is the logical processor count.)

Under your suggestion, the expression "real case is met" amounts to
"someone reports a regression" (possibly after the next stable tag,
even). I don't think that's a good idea.
In particular, the patch is motivated by RegisterCpuFeaturesLib -- the
CpuFeaturesInitialize() function -- on some platform(s) that Hao uses.
But there are platforms that don't use RegisterCpuFeaturesLib, and still
use MpInitLib.
OK. I agree with your suggestion.


Wu, Hao A
 

-----Original Message-----
From: devel@edk2.groups.io [mailto:devel@edk2.groups.io] On Behalf Of Brian
J. Johnson
Sent: Tuesday, March 24, 2020 12:39 AM
To: devel@edk2.groups.io; Ni, Ray; Laszlo Ersek; Wu, Hao A;
rfc@edk2.groups.io
Cc: Dong, Eric; Kinney, Michael D; Zeng, Star
Subject: Re: [edk2-devel] [RFC][PATCH v1] UefiCpuPkg/MpInitLib DXE: Reduce
AP status check interval

On 3/23/20 9:37 AM, Ni, Ray wrote:
Laszlo,
Adding a PCD means platform integrators need to consider which value to
set.
Most of the time, they may just use the default PCD value.
Then, why not we add the PCD later when a real case is met?
The patch changes existent behavior; it is not for a newly introduced
feature.

Because most platforms are not in the edk2 tree, we don't know what
platforms could be regressed by increasing the polling frequency
tenfold. (And remember that the polling action has O(n) cost, where "n"
is the logical processor count.)

Under your suggestion, the expression "real case is met" amounts to
"someone reports a regression" (possibly after the next stable tag,
even). I don't think that's a good idea.
In particular, the patch is motivated by RegisterCpuFeaturesLib -- the
CpuFeaturesInitialize() function -- on some platform(s) that Hao uses.
But there are platforms that don't use RegisterCpuFeaturesLib, and still
use MpInitLib.
OK. I agree with your suggestion.
Thank you. I agree with Laszlo: MP initialization is tricky to scale,
and platform dependent. My group builds real machines with thousands of
APs, and we have had to do various tweaks to the MP init. code. Having
a PCD to adjust this timeout will be very useful.

Thanks all for the feedbacks. Please grant me some time to prepare a new
version of the patch.

Best Regards,
Hao Wu



Thanks,
--
Brian J. Johnson
Enterprise X86 Lab

Hewlett Packard Enterprise
brian.johnson@...



Brian J. Johnson
 

On 3/23/20 9:37 AM, Ni, Ray wrote:
Laszlo,
Adding a PCD means platform integrators need to consider which value to set.
Most of the time, they may just use the default PCD value.
Then, why not we add the PCD later when a real case is met?
The patch changes existent behavior; it is not for a newly introduced
feature.

Because most platforms are not in the edk2 tree, we don't know what
platforms could be regressed by increasing the polling frequency
tenfold. (And remember that the polling action has O(n) cost, where "n"
is the logical processor count.)

Under your suggestion, the expression "real case is met" amounts to
"someone reports a regression" (possibly after the next stable tag,
even). I don't think that's a good idea.
In particular, the patch is motivated by RegisterCpuFeaturesLib -- the
CpuFeaturesInitialize() function -- on some platform(s) that Hao uses.
But there are platforms that don't use RegisterCpuFeaturesLib, and still
use MpInitLib.
OK. I agree with your suggestion.
Thank you. I agree with Laszlo: MP initialization is tricky to scale, and platform dependent. My group builds real machines with thousands of APs, and we have had to do various tweaks to the MP init. code. Having a PCD to adjust this timeout will be very useful.

Thanks,
--
Brian J. Johnson
Enterprise X86 Lab

Hewlett Packard Enterprise
brian.johnson@...