The use case is valid, IMO. And the commit message is helpful.
But I really think this constant should be PCD. Here's why I think a platform might want to control it:
- The best (finest) possible resolution for timer events is platform dependent, IIUC. The duration of the "idle tick" is platform-specific. And, it likely makes no sense to set AP_CHECK_INTERVAL to a duration that's around, or under, what the arch timer resolution allows for.
- In the other direction, CheckAndUpdateApsStatus() contains a loop that counts up to CpuMpData->CpuCount. In a very large system (hundreds or maybe thousands of APs) this function may have non-negligible cost.
I suggest introducing a PCD for this (measured in msecs) and using 100 msecs as the default. Laszlo, Adding a PCD means platform integrators need to consider which value to set. Most of the time, they may just use the default PCD value. Then, why not we add the PCD later when a real case is met? Thanks, Ray
|
|
On 03/16/20 02:37, Ni, Ray wrote: The use case is valid, IMO. And the commit message is helpful.
But I really think this constant should be PCD. Here's why I think a platform might want to control it:
- The best (finest) possible resolution for timer events is platform dependent, IIUC. The duration of the "idle tick" is platform-specific. And, it likely makes no sense to set AP_CHECK_INTERVAL to a duration that's around, or under, what the arch timer resolution allows for.
- In the other direction, CheckAndUpdateApsStatus() contains a loop that counts up to CpuMpData->CpuCount. In a very large system (hundreds or maybe thousands of APs) this function may have non-negligible cost.
I suggest introducing a PCD for this (measured in msecs) and using 100 msecs as the default. Laszlo, Adding a PCD means platform integrators need to consider which value to set. Most of the time, they may just use the default PCD value. Then, why not we add the PCD later when a real case is met?
The patch changes existent behavior; it is not for a newly introduced feature. Because most platforms are not in the edk2 tree, we don't know what platforms could be regressed by increasing the polling frequency tenfold. (And remember that the polling action has O(n) cost, where "n" is the logical processor count.) Under your suggestion, the expression "real case is met" amounts to "someone reports a regression" (possibly after the next stable tag, even). I don't think that's a good idea. In particular, the patch is motivated by RegisterCpuFeaturesLib -- the CpuFeaturesInitialize() function -- on some platform(s) that Hao uses. But there are platforms that don't use RegisterCpuFeaturesLib, and still use MpInitLib. Thanks, Laszlo
|
|
Laszlo, Adding a PCD means platform integrators need to consider which value to set. Most of the time, they may just use the default PCD value. Then, why not we add the PCD later when a real case is met? The patch changes existent behavior; it is not for a newly introduced feature.
Because most platforms are not in the edk2 tree, we don't know what platforms could be regressed by increasing the polling frequency tenfold. (And remember that the polling action has O(n) cost, where "n" is the logical processor count.)
Under your suggestion, the expression "real case is met" amounts to "someone reports a regression" (possibly after the next stable tag, even). I don't think that's a good idea. In particular, the patch is motivated by RegisterCpuFeaturesLib -- the CpuFeaturesInitialize() function -- on some platform(s) that Hao uses. But there are platforms that don't use RegisterCpuFeaturesLib, and still use MpInitLib.
OK. I agree with your suggestion.
|
|
toggle quoted message
Show quoted text
-----Original Message----- From: devel@edk2.groups.io [mailto:devel@edk2.groups.io] On Behalf Of Brian J. Johnson Sent: Tuesday, March 24, 2020 12:39 AM To: devel@edk2.groups.io; Ni, Ray; Laszlo Ersek; Wu, Hao A; rfc@edk2.groups.io Cc: Dong, Eric; Kinney, Michael D; Zeng, Star Subject: Re: [edk2-devel] [RFC][PATCH v1] UefiCpuPkg/MpInitLib DXE: Reduce AP status check interval
On 3/23/20 9:37 AM, Ni, Ray wrote:
Laszlo, Adding a PCD means platform integrators need to consider which value to
set.
Most of the time, they may just use the default PCD value. Then, why not we add the PCD later when a real case is met? The patch changes existent behavior; it is not for a newly introduced feature.
Because most platforms are not in the edk2 tree, we don't know what platforms could be regressed by increasing the polling frequency tenfold. (And remember that the polling action has O(n) cost, where "n" is the logical processor count.)
Under your suggestion, the expression "real case is met" amounts to "someone reports a regression" (possibly after the next stable tag, even). I don't think that's a good idea. In particular, the patch is motivated by RegisterCpuFeaturesLib -- the CpuFeaturesInitialize() function -- on some platform(s) that Hao uses. But there are platforms that don't use RegisterCpuFeaturesLib, and still use MpInitLib. OK. I agree with your suggestion. Thank you. I agree with Laszlo: MP initialization is tricky to scale, and platform dependent. My group builds real machines with thousands of APs, and we have had to do various tweaks to the MP init. code. Having a PCD to adjust this timeout will be very useful. Thanks all for the feedbacks. Please grant me some time to prepare a new version of the patch. Best Regards, Hao Wu Thanks, -- Brian J. Johnson Enterprise X86 Lab
Hewlett Packard Enterprise brian.johnson@...
|
|
On 3/23/20 9:37 AM, Ni, Ray wrote: Laszlo, Adding a PCD means platform integrators need to consider which value to set. Most of the time, they may just use the default PCD value. Then, why not we add the PCD later when a real case is met? The patch changes existent behavior; it is not for a newly introduced feature.
Because most platforms are not in the edk2 tree, we don't know what platforms could be regressed by increasing the polling frequency tenfold. (And remember that the polling action has O(n) cost, where "n" is the logical processor count.)
Under your suggestion, the expression "real case is met" amounts to "someone reports a regression" (possibly after the next stable tag, even). I don't think that's a good idea. In particular, the patch is motivated by RegisterCpuFeaturesLib -- the CpuFeaturesInitialize() function -- on some platform(s) that Hao uses. But there are platforms that don't use RegisterCpuFeaturesLib, and still use MpInitLib. OK. I agree with your suggestion.
Thank you. I agree with Laszlo: MP initialization is tricky to scale, and platform dependent. My group builds real machines with thousands of APs, and we have had to do various tweaks to the MP init. code. Having a PCD to adjust this timeout will be very useful. Thanks, -- Brian J. Johnson Enterprise X86 Lab Hewlett Packard Enterprise brian.johnson@...
|
|