[PATCH 1/1] UefiCpuPkg/MpInitLib: Reduce reset vector memory pressure


Lendacky, Thomas
 

From: Tom Lendacky <thomas.lendacky@amd.com>

The AP reset vector stack allocation is only required if running as an
SEV-ES guest. Since the reset vector allocation is below 1MB in memory,
eliminate the requirement for bare-metal systems and non SEV-ES guests
to allocate the extra stack area, which can be large if the
PcdCpuMaxLogicalProcessorNumber value is large.

Cc: Garrett Kirkendall <garrett.kirkendall@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
UefiCpuPkg/Library/MpInitLib/MpLib.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index 07426274f639..39af2f9fba7d 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -1152,7 +1152,15 @@ GetApResetStackSize (
VOID
)
{
- return AP_RESET_STACK_SIZE * PcdGet32(PcdCpuMaxLogicalProcessorNumber);
+ //
+ // The AP reset stack is only used by SEV-ES guests. Don't add to the
+ // allocation if not necessary.
+ //
+ if (PcdGetBool (PcdSevEsIsEnabled) == TRUE) {
+ return AP_RESET_STACK_SIZE * PcdGet32(PcdCpuMaxLogicalProcessorNumber);
+ } else {
+ return 0;
+ }
}

/**
--
2.28.0


Ni, Ray
 

It's a good optimization.

Can you eliminate the GetApResetStackSize() and embed the check and calculation in GetApResetVectorSize()?
Reason: GetApResetStackSize() is only called in one place. Removing the additional layer of function call makes
the code easier to read.

Thanks,
Ray

-----Original Message-----
From: Tom Lendacky <thomas.lendacky@amd.com>
Sent: Wednesday, September 23, 2020 3:59 AM
To: devel@edk2.groups.io
Cc: Dong, Eric <eric.dong@intel.com>; Ni, Ray <ray.ni@intel.com>; Laszlo Ersek
<lersek@redhat.com>; Kumar, Rahul1 <rahul1.kumar@intel.com>; Brijesh
Singh <brijesh.singh@amd.com>; Garrett Kirkendall
<garrett.kirkendall@amd.com>
Subject: [PATCH 1/1] UefiCpuPkg/MpInitLib: Reduce reset vector memory
pressure

From: Tom Lendacky <thomas.lendacky@amd.com>

The AP reset vector stack allocation is only required if running as an
SEV-ES guest. Since the reset vector allocation is below 1MB in memory,
eliminate the requirement for bare-metal systems and non SEV-ES guests
to allocate the extra stack area, which can be large if the
PcdCpuMaxLogicalProcessorNumber value is large.

Cc: Garrett Kirkendall <garrett.kirkendall@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
UefiCpuPkg/Library/MpInitLib/MpLib.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index 07426274f639..39af2f9fba7d 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -1152,7 +1152,15 @@ GetApResetStackSize (
VOID
)
{
- return AP_RESET_STACK_SIZE *
PcdGet32(PcdCpuMaxLogicalProcessorNumber);
+ //
+ // The AP reset stack is only used by SEV-ES guests. Don't add to the
+ // allocation if not necessary.
+ //
+ if (PcdGetBool (PcdSevEsIsEnabled) == TRUE) {
+ return AP_RESET_STACK_SIZE *
PcdGet32(PcdCpuMaxLogicalProcessorNumber);
+ } else {
+ return 0;
+ }
}

/**
--
2.28.0


Laszlo Ersek
 

On 09/22/20 21:59, Tom Lendacky wrote:
From: Tom Lendacky <thomas.lendacky@amd.com>

The AP reset vector stack allocation is only required if running as an
SEV-ES guest. Since the reset vector allocation is below 1MB in memory,
eliminate the requirement for bare-metal systems and non SEV-ES guests
to allocate the extra stack area, which can be large if the
PcdCpuMaxLogicalProcessorNumber value is large.

Cc: Garrett Kirkendall <garrett.kirkendall@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
UefiCpuPkg/Library/MpInitLib/MpLib.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index 07426274f639..39af2f9fba7d 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -1152,7 +1152,15 @@ GetApResetStackSize (
VOID
)
{
- return AP_RESET_STACK_SIZE * PcdGet32(PcdCpuMaxLogicalProcessorNumber);
+ //
+ // The AP reset stack is only used by SEV-ES guests. Don't add to the
+ // allocation if not necessary.
+ //
+ if (PcdGetBool (PcdSevEsIsEnabled) == TRUE) {
+ return AP_RESET_STACK_SIZE * PcdGet32(PcdCpuMaxLogicalProcessorNumber);
+ } else {
+ return 0;
+ }
}

/**
Looks like this is a fix for commit 7b7508ad784d ("UefiCpuPkg: Allow AP
booting under SEV-ES", 2020-08-17).

In retrospect, that commit changed "ApResetVectorSize" -- which is
passed to GetWakeupBuffer() -- from value [a]

RendezvousFunnelSize + sizeof (MP_CPU_EXCHANGE_INFO)

to value [b]

ALIGN_VALUE ((RendezvousFunnelSize +
SwitchToRealSize +
sizeof (MP_CPU_EXCHANGE_INFO)),
CPU_STACK_ALIGNMENT) +
AP_RESET_STACK_SIZE * PcdGet32(PcdCpuMaxLogicalProcessorNumber)

The currently proposed patch does not entirely restore
"ApResetVectorSize" to the value it used to carry before commit
7b7508ad784d. Namely, while the patch eliminates the last addend, two
other changes from commit 7b7508ad784d remain in place:

- the addition of "SwitchToRealSize",
- the alignment up to CPU_STACK_ALIGNMENT.

As far as I understand, "SwitchToRealSize" is never zero
(AsmGetAddressMap() populates it with a difference of build-time
constants). I think it's not useful when SEV-ES is inactive.

Furthermore, the alignment for stack purposes is useless if we won't
have AP stacks (i.e., again when SEV-ES is inactive).

(1) Therefore I'd propose:

- folding GetApResetStackSize() into GetApResetVectorSize(),

- modifying GetApResetVectorSize() such that it return the original sum
[a] if SEV-ES is inactive, and the larger sum [b] if SEV-ES is active.

Hmmm. OK, maybe "SwitchToRealSize" should remain in place. It's a small
addition, and it reflects a code portion that is permanent. However, I
do think the alignment is both useless and confusing. If we won't
allocate an array of stacks, the alignment really makes no sense.


(2) A style comment: PcdGetBool()'s return value should not be compared
with TRUE or FALSE; just use PcdGetBool() as the whole controlling
expression for the "if".


(3) Even better... can you modify GetApResetVectorSize() to take
&CpuMpData rather than &CpuMpData->AddressMap, and then check
CpuMpData->SevEsIsEnabled?

Hmmm, wait, that's not really simple, as we call GetApResetVectorSize()
from MpInitLibInitialize() too, way before we set
CpuMpData->SevEsIsEnabled from the PCD.

So I guess we should pass a dedicated BOOLEAN parameter to
GetApResetVectorSize(), called "SevEsIsEnabled". At the call site in
MpInitLibInitialize(), we should pass in the PCD's value. At the call
site in AllocateResetVector(), we should pass in
CpuMpData->SevEsIsEnabled.

The reason I'm suggesting (3) is that I don't feel comfortable with
checking dynamic PCDs outside of entry point functions / initialization
functions.

Thanks,
Laszlo


Laszlo Ersek
 

On 09/23/20 10:14, Laszlo Ersek wrote:
On 09/22/20 21:59, Tom Lendacky wrote:
From: Tom Lendacky <thomas.lendacky@amd.com>

The AP reset vector stack allocation is only required if running as an
SEV-ES guest. Since the reset vector allocation is below 1MB in memory,
eliminate the requirement for bare-metal systems and non SEV-ES guests
to allocate the extra stack area, which can be large if the
PcdCpuMaxLogicalProcessorNumber value is large.

Cc: Garrett Kirkendall <garrett.kirkendall@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
UefiCpuPkg/Library/MpInitLib/MpLib.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index 07426274f639..39af2f9fba7d 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -1152,7 +1152,15 @@ GetApResetStackSize (
VOID
)
{
- return AP_RESET_STACK_SIZE * PcdGet32(PcdCpuMaxLogicalProcessorNumber);
+ //
+ // The AP reset stack is only used by SEV-ES guests. Don't add to the
+ // allocation if not necessary.
+ //
+ if (PcdGetBool (PcdSevEsIsEnabled) == TRUE) {
+ return AP_RESET_STACK_SIZE * PcdGet32(PcdCpuMaxLogicalProcessorNumber);
+ } else {
+ return 0;
+ }
}

/**
Looks like this is a fix for commit 7b7508ad784d ("UefiCpuPkg: Allow AP
booting under SEV-ES", 2020-08-17).

In retrospect, that commit changed "ApResetVectorSize" -- which is
passed to GetWakeupBuffer() -- from value [a]

RendezvousFunnelSize + sizeof (MP_CPU_EXCHANGE_INFO)

to value [b]

ALIGN_VALUE ((RendezvousFunnelSize +
SwitchToRealSize +
sizeof (MP_CPU_EXCHANGE_INFO)),
CPU_STACK_ALIGNMENT) +
AP_RESET_STACK_SIZE * PcdGet32(PcdCpuMaxLogicalProcessorNumber)

The currently proposed patch does not entirely restore
"ApResetVectorSize" to the value it used to carry before commit
7b7508ad784d. Namely, while the patch eliminates the last addend, two
other changes from commit 7b7508ad784d remain in place:

- the addition of "SwitchToRealSize",
- the alignment up to CPU_STACK_ALIGNMENT.

As far as I understand, "SwitchToRealSize" is never zero
(AsmGetAddressMap() populates it with a difference of build-time
constants). I think it's not useful when SEV-ES is inactive.

Furthermore, the alignment for stack purposes is useless if we won't
have AP stacks (i.e., again when SEV-ES is inactive).

(1) Therefore I'd propose:

- folding GetApResetStackSize() into GetApResetVectorSize(),

- modifying GetApResetVectorSize() such that it return the original sum
[a] if SEV-ES is inactive, and the larger sum [b] if SEV-ES is active.

Hmmm. OK, maybe "SwitchToRealSize" should remain in place. It's a small
addition, and it reflects a code portion that is permanent. However, I
do think the alignment is both useless and confusing. If we won't
allocate an array of stacks, the alignment really makes no sense.


(2) A style comment: PcdGetBool()'s return value should not be compared
with TRUE or FALSE; just use PcdGetBool() as the whole controlling
expression for the "if".


(3) Even better... can you modify GetApResetVectorSize() to take
&CpuMpData rather than &CpuMpData->AddressMap, and then check
CpuMpData->SevEsIsEnabled?

Hmmm, wait, that's not really simple, as we call GetApResetVectorSize()
from MpInitLibInitialize() too, way before we set
CpuMpData->SevEsIsEnabled from the PCD.

So I guess we should pass a dedicated BOOLEAN parameter to
GetApResetVectorSize(), called "SevEsIsEnabled". At the call site in
MpInitLibInitialize(), we should pass in the PCD's value. At the call
site in AllocateResetVector(), we should pass in
CpuMpData->SevEsIsEnabled.

The reason I'm suggesting (3) is that I don't feel comfortable with
checking dynamic PCDs outside of entry point functions / initialization
functions.
(4) The product

AP_RESET_STACK_SIZE * PcdGet32(PcdCpuMaxLogicalProcessorNumber)

includes the BSP. Is that intentional? Should we rather use:

AP_RESET_STACK_SIZE * (PcdGet32(PcdCpuMaxLogicalProcessorNumber) - 1)

?

(That would be a separate patch, of course.)

Thanks
Laszlo


Laszlo Ersek
 

On 09/23/20 10:14, Laszlo Ersek wrote:

(3) Even better... can you modify GetApResetVectorSize() to take
&CpuMpData rather than &CpuMpData->AddressMap, and then check
CpuMpData->SevEsIsEnabled?

Hmmm, wait, that's not really simple, as we call GetApResetVectorSize()
from MpInitLibInitialize() too, way before we set
CpuMpData->SevEsIsEnabled from the PCD.

So I guess we should pass a dedicated BOOLEAN parameter to
GetApResetVectorSize(), called "SevEsIsEnabled". At the call site in
MpInitLibInitialize(), we should pass in the PCD's value. At the call
site in AllocateResetVector(), we should pass in
CpuMpData->SevEsIsEnabled.

The reason I'm suggesting (3) is that I don't feel comfortable with
checking dynamic PCDs outside of entry point functions / initialization
functions.
You know what, never mind (3) -- I've just realized that
PcdCpuMaxLogicalProcessorNumber may be a dynamic PCD too. It might
require a lot of work to restrict all dynamic PCD accesses to the init
function only, and I couldn't necessarily justify all that work at the
moment (for myself or for anyone else).

So please consider (1), (2) and (4).

Thanks!
Laszlo


Lendacky, Thomas
 

On 9/23/20 3:31 AM, Laszlo Ersek wrote:
On 09/23/20 10:14, Laszlo Ersek wrote:

(3) Even better... can you modify GetApResetVectorSize() to take
&CpuMpData rather than &CpuMpData->AddressMap, and then check
CpuMpData->SevEsIsEnabled?

Hmmm, wait, that's not really simple, as we call GetApResetVectorSize()
from MpInitLibInitialize() too, way before we set
CpuMpData->SevEsIsEnabled from the PCD.

So I guess we should pass a dedicated BOOLEAN parameter to
GetApResetVectorSize(), called "SevEsIsEnabled". At the call site in
MpInitLibInitialize(), we should pass in the PCD's value. At the call
site in AllocateResetVector(), we should pass in
CpuMpData->SevEsIsEnabled.

The reason I'm suggesting (3) is that I don't feel comfortable with
checking dynamic PCDs outside of entry point functions / initialization
functions.
You know what, never mind (3) -- I've just realized that
PcdCpuMaxLogicalProcessorNumber may be a dynamic PCD too. It might
require a lot of work to restrict all dynamic PCD accesses to the init
function only, and I couldn't necessarily justify all that work at the
moment (for myself or for anyone else).

So please consider (1), (2) and (4).
Yup, will do.

Thanks,
Tom


Thanks!
Laszlo


Lendacky, Thomas
 

On 9/23/20 8:58 AM, Tom Lendacky wrote:
On 9/23/20 3:31 AM, Laszlo Ersek wrote:
On 09/23/20 10:14, Laszlo Ersek wrote:

(3) Even better... can you modify GetApResetVectorSize() to take
&CpuMpData rather than &CpuMpData->AddressMap, and then check
CpuMpData->SevEsIsEnabled?

Hmmm, wait, that's not really simple, as we call GetApResetVectorSize()
from MpInitLibInitialize() too, way before we set
CpuMpData->SevEsIsEnabled from the PCD.

So I guess we should pass a dedicated BOOLEAN parameter to
GetApResetVectorSize(), called "SevEsIsEnabled". At the call site in
MpInitLibInitialize(), we should pass in the PCD's value. At the call
site in AllocateResetVector(), we should pass in
CpuMpData->SevEsIsEnabled.

The reason I'm suggesting (3) is that I don't feel comfortable with
checking dynamic PCDs outside of entry point functions / initialization
functions.
You know what, never mind (3) -- I've just realized that
PcdCpuMaxLogicalProcessorNumber may be a dynamic PCD too. It might
require a lot of work to restrict all dynamic PCD accesses to the init
function only, and I couldn't necessarily justify all that work at the
moment (for myself or for anyone else).

So please consider (1), (2) and (4).
Yup, will do.
Re. #4, the code uses the APIC ID to calculate the stack start, so the
calculation, now in GetApResetVectorSize(), uses the full value of
PcdGet32(PcdCpuMaxLogicalProcessorNumber) instead of subtracting one from
it. I can always update MpInitLibSevEsAPReet() to subtract one from the
APIC ID value returned if we want to eliminate the extra 64 bytes.

Thanks,
Tom


Thanks,
Tom


Thanks!
Laszlo