Missing barrier when switching stack


Alexey Kardashevskiy
 

I am observing weirdness in PeiCheckAndSwitchStack() when SecCoreData is offset twice.

Since the produced physical address is beyond 2GB and there is no any other memory region, this becomes unassigned memory access which almost always points to a bug. QEMU by default ignores these but "-d guest_errors" makes them visible and gdb stops at unassigned memory region hook (and this is the most annoying part).

This is to demonstrate the idea: https://github.com/aik/edk2/commit/2e81b553c189a41a824d2bc3b6e803bd9b26b07f

TRACE1 prints SecCoreData == 81FD20 and TRACE2 prints 7BF55D20 and TRACE3 prints 7BF55D20 (the same as TRACE2).
But if TRACE2 is commented out (as in the patch), TRACE3 prins 0xf768bd20 which is 2*(0x7BF55D20 - 0x81FD20) + 0x81FD20.

The code adjusts SecCoreData before it switches the stack, SecCoreData should be adjusted once but it happens twice. /me brain explodes.

This is AMD EPYC, upstream QEMU, q32 machine, 2GB of RAM, 1 vcpu.

Is some barrier missing here? Thanks,

This is the QEMU cmdline:

/home/aik/pbuild/qemu-snp-localhost-x86_64/qemu-system-x86_64 \
-enable-kvm \
-m 2G \
-smp 1 \
-netdev user,id=USER0,hostfwd=tcp::2223-:22 \
-device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0 \
-machine q35 \
-device virtio-scsi-pci,id=vscsi0,iommu_platform=on,disable-modern=off,disable-legacy=on \
-drive id=DRIVE0,if=none,file=img/u2204_128G_aikbook_sev.qcow2,format=qcow2 \
-device scsi-hd,id=scsi-hd0,drive=DRIVE0 \
-drive if=pflash,format=raw,unit=0,file=/home/aik/OVMF_CODE.fd,readonly=on,id=MYPF \
-nographic \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device isa-serial,id=isa-serial0,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-kernel /boot/vmlinuz \
-append console=ttyS0,115200n1 earlyprintk root=/dev/sda3 \
-d guest_errors

Originally posted on github https://github.com/tianocore/edk2/discussions/3419 but that place looks eerily quiet.


Alexey Kardashevskiy
 

Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm.


Oliver Steffen
 

Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm.
Interesting. Do you know why / what is different on Fedora?

Thanks,
Oliver


Alexey Kardashevskiy
 

On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm.
Interesting. Do you know why / what is different on Fedora?
Absolutely not :( I suspect "-t GCC" but did not have time to dig any deeper, this is how I build it:

source ./edksetup.sh --reconfig

nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc



--
Alexey


Alexey Kardashevskiy
 

On 07/11/2022 22:51, Alexey Kardashevskiy wrote:
On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm.
Interesting.  Do you know why / what is different on Fedora?
Absolutely not :( I suspect "-t GCC" but did not have time to dig any deeper, this is how I build it:
source ./edksetup.sh --reconfig
nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc
This is the culpit:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)

Disabling LTO fixes the problem, more info here (cut-n-paste below):
https://github.com/aik/edk2/commits/lto-off

"-flto=auto" or "-flto=1" do not help, I came across those when found that LTO changed in gcc12.


What I still do not understand - where do -flto and -DUSING_LTO come from? Also, there seems to be no way for disabling LTO via a pragma just for that PeiCheckAndSwitchStack(), or is there? Thanks,



===
The crash happens in AMD SEV-ES guest under QEMU where stack switching
happens - PeiCheckAndSwitchStack() from
MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c

!!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000
RIP - 000000000082043A, CS - 0000000000000018, RFLAGS - 0000000000010046
RAX - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000
RBX - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088
RSI - 000000007BF8C608, RDI - 000000007BF8D088
R8 - 0000000000000000, R9 - 000000007BF8C608, R10 - 00000000F76F9D20
R11 - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001
R14 - 000000007BF8C608, R15 - 000000007B76D000
DS - 0000000000000008, ES - 0000000000000008, FS - 0000000000000008
GS - 0000000000000008, SS - 0000000000000008
CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000
GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000
IDTR - 000000007BF8CD70 000000000000021F, TR - 0000000000000000
FXSAVE_STATE - 000000007BF8B9F0
!!!! Find image based on IP(0x82043A) /home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod
ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140, EntryPoint=00000000
00827E5D) !!!!

Disabling LTO fixes the problem. The problem seems to be introduced
recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock Ubuntu 22.04).

These fail building bootable image unless LTO is off:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
GNU ld version 2.37-36.fc36

Because disabling LTO triggers a lot of uninitialized warnings, this
fixes them too.
===



--
Alexey


Alexey Kardashevskiy
 

On 14/11/2022 13:57, Alexey Kardashevskiy wrote:
On 07/11/2022 22:51, Alexey Kardashevskiy wrote:


On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm.
Interesting.  Do you know why / what is different on Fedora?
Absolutely not :( I suspect "-t GCC" but did not have time to dig any deeper, this is how I build it:

source ./edksetup.sh --reconfig
nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc
This is the culpit:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
Disabling LTO fixes the problem, more info here (cut-n-paste below):
https://github.com/aik/edk2/commits/lto-off
"-flto=auto" or "-flto=1" do not help, I came across those when found that LTO changed in gcc12.
What I still do not understand - where do -flto and -DUSING_LTO come from? Also, there seems to be no way for disabling LTO via a pragma just for that PeiCheckAndSwitchStack(), or is there? Thanks,
btw #pragma GCC optimize ("O0") or "Ofast" wrapped just around PeiCheckAndSwitchStack() fixes it too but not "Os" - this one still produces broken image.


===
The crash happens in AMD SEV-ES guest under QEMU where stack switching
happens - PeiCheckAndSwitchStack() from
MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c
!!!! X64 Exception Type - 0D(#GP - General Protection)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000
RIP  - 000000000082043A, CS  - 0000000000000018, RFLAGS - 0000000000010046
RAX  - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000
RBX  - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088
RSI  - 000000007BF8C608, RDI - 000000007BF8D088
R8   - 0000000000000000, R9  - 000000007BF8C608, R10 - 00000000F76F9D20
R11  - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001
R14  - 000000007BF8C608, R15 - 000000007B76D000
DS   - 0000000000000008, ES  - 0000000000000008, FS  - 0000000000000008
GS   - 0000000000000008, SS  - 0000000000000008
CR0  - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000
GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000
IDTR - 000000007BF8CD70 000000000000021F,   TR - 0000000000000000
FXSAVE_STATE - 000000007BF8B9F0
!!!! Find image based on IP(0x82043A) /home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod
ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140, EntryPoint=00000000
00827E5D) !!!!
Disabling LTO fixes the problem. The problem seems to be introduced
recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock Ubuntu 22.04).
These fail building bootable image unless LTO is off:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
GNU ld version 2.37-36.fc36
Because disabling LTO triggers a lot of uninitialized warnings, this
fixes them too.
===


--
Alexey


Oliver Steffen
 

Quoting Alexey Kardashevskiy (2022-11-14 07:36:17)


On 14/11/2022 13:57, Alexey Kardashevskiy wrote:


On 07/11/2022 22:51, Alexey Kardashevskiy wrote:


On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (==
the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is
selected btw. False alarm.
Interesting.  Do you know why / what is different on Fedora?
Absolutely not :( I suspect "-t GCC" but did not have time to dig any
deeper, this is how I build it:

source ./edksetup.sh --reconfig
nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t
GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc
This is the culpit:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)

Disabling LTO fixes the problem, more info here (cut-n-paste below):
https://github.com/aik/edk2/commits/lto-off

"-flto=auto" or "-flto=1" do not help, I came across those when found
that LTO changed in gcc12.


What I still do not understand - where do -flto and -DUSING_LTO come
from? Also, there seems to be no way for disabling LTO via a pragma just
for that PeiCheckAndSwitchStack(), or is there? Thanks,
btw #pragma GCC optimize ("O0") or "Ofast" wrapped just around
PeiCheckAndSwitchStack() fixes it too but not "Os" - this one still
produces broken image.




===
The crash happens in AMD SEV-ES guest under QEMU where stack switching
happens - PeiCheckAndSwitchStack() from
MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c

!!!! X64 Exception Type - 0D(#GP - General Protection)  CPU Apic ID -
00000000 !!!!
ExceptionData - 0000000000000000
RIP  - 000000000082043A, CS  - 0000000000000018, RFLAGS - 0000000000010046
RAX  - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000
RBX  - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088
RSI  - 000000007BF8C608, RDI - 000000007BF8D088
R8   - 0000000000000000, R9  - 000000007BF8C608, R10 - 00000000F76F9D20
R11  - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001
R14  - 000000007BF8C608, R15 - 000000007B76D000
DS   - 0000000000000008, ES  - 0000000000000008, FS  - 0000000000000008
GS   - 0000000000000008, SS  - 0000000000000008
CR0  - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000
GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000
IDTR - 000000007BF8CD70 000000000000021F,   TR - 0000000000000000
FXSAVE_STATE - 000000007BF8B9F0
!!!! Find image based on IP(0x82043A)
/home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod
ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140,
EntryPoint=00000000
00827E5D) !!!!

Disabling LTO fixes the problem. The problem seems to be introduced
recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock
Ubuntu 22.04).

These fail building bootable image unless LTO is off:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
GNU ld version 2.37-36.fc36

Because disabling LTO triggers a lot of uninitialized warnings, this
fixes them too.
===
Thanks for the patch, Alexey.
It works with the svsm-preview branch too
(https://github.com/AMDESE/ovmf/tree/svsm-preview).

- Oliver


Alexey Kardashevskiy
 

I am pretty sure the correct patch should be some GCC barries inside that stack switching code, I just could not figure it out yet. Disabling optimization for the entire functions seems to be the simplest fix for now. Thanks,

On 18/11/2022 00:10, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-14 07:36:17)


On 14/11/2022 13:57, Alexey Kardashevskiy wrote:


On 07/11/2022 22:51, Alexey Kardashevskiy wrote:


On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (==
the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is
selected btw. False alarm.
Interesting.  Do you know why / what is different on Fedora?
Absolutely not :( I suspect "-t GCC" but did not have time to dig any
deeper, this is how I build it:

source ./edksetup.sh --reconfig
nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t
GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc
This is the culpit:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)

Disabling LTO fixes the problem, more info here (cut-n-paste below):
https://github.com/aik/edk2/commits/lto-off

"-flto=auto" or "-flto=1" do not help, I came across those when found
that LTO changed in gcc12.


What I still do not understand - where do -flto and -DUSING_LTO come
from? Also, there seems to be no way for disabling LTO via a pragma just
for that PeiCheckAndSwitchStack(), or is there? Thanks,
btw #pragma GCC optimize ("O0") or "Ofast" wrapped just around
PeiCheckAndSwitchStack() fixes it too but not "Os" - this one still
produces broken image.




===
The crash happens in AMD SEV-ES guest under QEMU where stack switching
happens - PeiCheckAndSwitchStack() from
MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c

!!!! X64 Exception Type - 0D(#GP - General Protection)  CPU Apic ID -
00000000 !!!!
ExceptionData - 0000000000000000
RIP  - 000000000082043A, CS  - 0000000000000018, RFLAGS - 0000000000010046
RAX  - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000
RBX  - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088
RSI  - 000000007BF8C608, RDI - 000000007BF8D088
R8   - 0000000000000000, R9  - 000000007BF8C608, R10 - 00000000F76F9D20
R11  - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001
R14  - 000000007BF8C608, R15 - 000000007B76D000
DS   - 0000000000000008, ES  - 0000000000000008, FS  - 0000000000000008
GS   - 0000000000000008, SS  - 0000000000000008
CR0  - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000
GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000
IDTR - 000000007BF8CD70 000000000000021F,   TR - 0000000000000000
FXSAVE_STATE - 000000007BF8B9F0
!!!! Find image based on IP(0x82043A)
/home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod
ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140,
EntryPoint=00000000
00827E5D) !!!!

Disabling LTO fixes the problem. The problem seems to be introduced
recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock
Ubuntu 22.04).

These fail building bootable image unless LTO is off:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
GNU ld version 2.37-36.fc36

Because disabling LTO triggers a lot of uninitialized warnings, this
fixes them too.
===
Thanks for the patch, Alexey.
It works with the svsm-preview branch too
(https://github.com/AMDESE/ovmf/tree/svsm-preview).
- Oliver
--
Alexey


Alexey Kardashevskiy
 

Ping?

This is likely to affect more people when gcc12 makes it to more distros. OTOH it only breaks VMs with encrypted memory which is just a fraction of OVMF users :-/

I poked it a bit more but disasm has way too much LTO noise and when I disable LTO - the problem goes away too :-/

On 18/11/2022 15:15, Alexey Kardashevskiy wrote:
I am pretty sure the correct patch should be some GCC barries inside that stack switching code, I just could not figure it out yet. Disabling optimization for the entire functions seems to be the simplest fix for now. Thanks,
On 18/11/2022 00:10, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-14 07:36:17)


On 14/11/2022 13:57, Alexey Kardashevskiy wrote:


On 07/11/2022 22:51, Alexey Kardashevskiy wrote:


On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (==
the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is
selected btw. False alarm.
Interesting.  Do you know why / what is different on Fedora?
Absolutely not :( I suspect "-t GCC" but did not have time to dig any
deeper, this is how I build it:

source ./edksetup.sh --reconfig
nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t
GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc
This is the culpit:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)

Disabling LTO fixes the problem, more info here (cut-n-paste below):
https://github.com/aik/edk2/commits/lto-off

"-flto=auto" or "-flto=1" do not help, I came across those when found
that LTO changed in gcc12.


What I still do not understand - where do -flto and -DUSING_LTO come
from? Also, there seems to be no way for disabling LTO via a pragma just
for that PeiCheckAndSwitchStack(), or is there? Thanks,
btw #pragma GCC optimize ("O0") or "Ofast" wrapped just around
PeiCheckAndSwitchStack() fixes it too but not "Os" - this one still
produces broken image.




===
The crash happens in AMD SEV-ES guest under QEMU where stack switching
happens - PeiCheckAndSwitchStack() from
MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c

!!!! X64 Exception Type - 0D(#GP - General Protection)  CPU Apic ID -
00000000 !!!!
ExceptionData - 0000000000000000
RIP  - 000000000082043A, CS  - 0000000000000018, RFLAGS - 0000000000010046
RAX  - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000
RBX  - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088
RSI  - 000000007BF8C608, RDI - 000000007BF8D088
R8   - 0000000000000000, R9  - 000000007BF8C608, R10 - 00000000F76F9D20
R11  - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001
R14  - 000000007BF8C608, R15 - 000000007B76D000
DS   - 0000000000000008, ES  - 0000000000000008, FS  - 0000000000000008
GS   - 0000000000000008, SS  - 0000000000000008
CR0  - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000
GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000
IDTR - 000000007BF8CD70 000000000000021F,   TR - 0000000000000000
FXSAVE_STATE - 000000007BF8B9F0
!!!! Find image based on IP(0x82043A)
/home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod
ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140,
EntryPoint=00000000
00827E5D) !!!!

Disabling LTO fixes the problem. The problem seems to be introduced
recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock
Ubuntu 22.04).

These fail building bootable image unless LTO is off:
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
GNU ld version 2.37-36.fc36

Because disabling LTO triggers a lot of uninitialized warnings, this
fixes them too.
===
Thanks for the patch, Alexey.
It works with the svsm-preview branch too
(https://github.com/AMDESE/ovmf/tree/svsm-preview).

- Oliver
--
Alexey


Oliver Steffen
 

Quoting Gerd Hoffmann (2022-11-28 07:55:32)
On Thu, Nov 24, 2022 at 04:04:33PM +1100, Alexey Kardashevskiy wrote:
Ping?

This is likely to affect more people when gcc12 makes it to more distros.
OTOH it only breaks VMs with encrypted memory which is just a fraction of
OVMF users :-/

I poked it a bit more but disasm has way too much LTO noise and when I
disable LTO - the problem goes away too :-/
Just found this while checking the status of my local branches.

HTH & take care,
Gerd
This also fixes it for the linux-SVSM case.
Thanks, Gerd!


From ef9557c940e4110f16b6f0c0c73fbf04551f08f6 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@...>
Date: Fri, 10 Jun 2022 07:43:15 +0200
Subject: [PATCH 1/1] tools_def: add -fno-omit-frame-pointer to
GCC48_{IA32,X64}_CC_FLAGS

Fixes problems due to code assuming it runs with frame pointers and thus
updates rbp / ebp registers when switching stacks.

Signed-off-by: Gerd Hoffmann <kraxel@...>
---
BaseTools/Conf/tools_def.template | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
index 73f95b2a3a9f..f1fd6a003062 100755
--- a/BaseTools/Conf/tools_def.template
+++ b/BaseTools/Conf/tools_def.template
@@ -1888,8 +1888,8 @@ DEFINE GCC_DEPS_FLAGS = -MMD -MF $@.deps

DEFINE GCC48_ALL_CC_FLAGS = DEF(GCC_ALL_CC_FLAGS) -ffunction-sections -fdata-sections -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings
DEFINE GCC48_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20
-DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address
-DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address
+DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer
+DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer
DEFINE GCC48_IA32_X64_ASLDLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable
DEFINE GCC48_IA32_X64_DLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map,--whole-archive
DEFINE GCC48_IA32_DLINK2_FLAGS = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON)
--
2.38.1
--
🎩Oliver Steffen (he/him) - Software Engineer, Virtualization
Red Hat GmbH <https://www.redhat.com/de/global/dach>,
Registered seat: Werner-von-Siemens-Ring 12, D-85630 Grasbrunn, Germany
Commercial register: Amtsgericht München/Munich, HRB 153243,
Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill,
Amy Ross

Everyone has different working hours… Please do not feel obligated to
reply outside of your normal work schedule.


Andrew Fish
 

Do we know the root cause of this crash? Turning off frame pointers seems like kind of hack. There are other toolchains like clang that have frame pointers on by default.

I thought the idea was PeiCheckAndSwitchStack() uses SwitchStack(). SwitchStack() calls InternalSwitchStack() -> that calls InternalSwitchStackAsm(). The InternalSwitchStackAsm() assembler should switch the stack and jump to a new function. This should break the chain of reusing a stale frame pointer.

I saw the crash was a GP fault. I can’t tell from the exception dump what caused the GP fault?

Thanks,

Andrew Fish

On Nov 28, 2022, at 1:15 AM, Oliver Steffen <osteffen@...> wrote:

Quoting Gerd Hoffmann (2022-11-28 07:55:32)
On Thu, Nov 24, 2022 at 04:04:33PM +1100, Alexey Kardashevskiy wrote:
Ping?

This is likely to affect more people when gcc12 makes it to more distros.
OTOH it only breaks VMs with encrypted memory which is just a fraction of
OVMF users :-/

I poked it a bit more but disasm has way too much LTO noise and when I
disable LTO - the problem goes away too :-/
Just found this while checking the status of my local branches.

HTH & take care,
Gerd
This also fixes it for the linux-SVSM case.
Thanks, Gerd!


From ef9557c940e4110f16b6f0c0c73fbf04551f08f6 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@...>
Date: Fri, 10 Jun 2022 07:43:15 +0200
Subject: [PATCH 1/1] tools_def: add -fno-omit-frame-pointer to
GCC48_{IA32,X64}_CC_FLAGS

Fixes problems due to code assuming it runs with frame pointers and thus
updates rbp / ebp registers when switching stacks.

Signed-off-by: Gerd Hoffmann <kraxel@...>
---
BaseTools/Conf/tools_def.template | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
index 73f95b2a3a9f..f1fd6a003062 100755
--- a/BaseTools/Conf/tools_def.template
+++ b/BaseTools/Conf/tools_def.template
@@ -1888,8 +1888,8 @@ DEFINE GCC_DEPS_FLAGS = -MMD -MF $@.deps

DEFINE GCC48_ALL_CC_FLAGS = DEF(GCC_ALL_CC_FLAGS) -ffunction-sections -fdata-sections -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings
DEFINE GCC48_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20
-DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address
-DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address
+DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer
+DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer
DEFINE GCC48_IA32_X64_ASLDLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable
DEFINE GCC48_IA32_X64_DLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map,--whole-archive
DEFINE GCC48_IA32_DLINK2_FLAGS = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON)
--
2.38.1
--
🎩Oliver Steffen (he/him) - Software Engineer, Virtualization
Red Hat GmbH <https://www.redhat.com/de/global/dach>,
Registered seat: Werner-von-Siemens-Ring 12, D-85630 Grasbrunn, Germany
Commercial register: Amtsgericht München/Munich, HRB 153243,
Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill,
Amy Ross

Everyone has different working hours… Please do not feel obligated to
reply outside of your normal work schedule.






Gerd Hoffmann <kraxel@...>
 

On Thu, Nov 24, 2022 at 04:04:33PM +1100, Alexey Kardashevskiy wrote:
Ping?

This is likely to affect more people when gcc12 makes it to more distros.
OTOH it only breaks VMs with encrypted memory which is just a fraction of
OVMF users :-/

I poked it a bit more but disasm has way too much LTO noise and when I
disable LTO - the problem goes away too :-/
Just found this while checking the status of my local branches.

HTH & take care,
Gerd

From ef9557c940e4110f16b6f0c0c73fbf04551f08f6 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@...>
Date: Fri, 10 Jun 2022 07:43:15 +0200
Subject: [PATCH 1/1] tools_def: add -fno-omit-frame-pointer to
GCC48_{IA32,X64}_CC_FLAGS

Fixes problems due to code assuming it runs with frame pointers and thus
updates rbp / ebp registers when switching stacks.

Signed-off-by: Gerd Hoffmann <kraxel@...>
---
BaseTools/Conf/tools_def.template | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
index 73f95b2a3a9f..f1fd6a003062 100755
--- a/BaseTools/Conf/tools_def.template
+++ b/BaseTools/Conf/tools_def.template
@@ -1888,8 +1888,8 @@ DEFINE GCC_DEPS_FLAGS = -MMD -MF $@.deps

DEFINE GCC48_ALL_CC_FLAGS = DEF(GCC_ALL_CC_FLAGS) -ffunction-sections -fdata-sections -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings
DEFINE GCC48_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20
-DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address
-DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address
+DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer
+DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer
DEFINE GCC48_IA32_X64_ASLDLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable
DEFINE GCC48_IA32_X64_DLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map,--whole-archive
DEFINE GCC48_IA32_DLINK2_FLAGS = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON)
--
2.38.1


Gerd Hoffmann <kraxel@...>
 

On Mon, Nov 28, 2022 at 10:34:08AM -0800, Andrew Fish wrote:

Do we know the root cause of this crash? Turning off frame pointers
seems like kind of hack. There are other toolchains like clang that
have frame pointers on by default.
InternalSwitchStackAsm(). The InternalSwitchStackAsm() assembler
should switch the stack and jump to a new function. This should break
the chain of reusing a stale frame pointer.
Was discussed a few months back on the list. If I remember correctly:

The assembler code needs to know whenever frame pointers are used or
not (and update rbp or not). Current code assumes frame pointers are
used. Adding -fno-omit-frame-pointer enforces that for gcc.

When gcc does not use framepointers and additionally the optimizer
decides to use the rbp register for something else things go south
when the stack switching code modifies rbp.

take care,
Gerd


Alexey Kardashevskiy
 

On 28/11/2022 17:55, Gerd Hoffmann wrote:
On Thu, Nov 24, 2022 at 04:04:33PM +1100, Alexey Kardashevskiy wrote:
Ping?

This is likely to affect more people when gcc12 makes it to more distros.
OTOH it only breaks VMs with encrypted memory which is just a fraction of
OVMF users :-/

I poked it a bit more but disasm has way too much LTO noise and when I
disable LTO - the problem goes away too :-/
Just found this while checking the status of my local branches.

This helps indeed too, just tested. Are you pushing this out or I repost my pragma optimize patch? Thanks,


HTH & take care,
Gerd
From ef9557c940e4110f16b6f0c0c73fbf04551f08f6 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@...>
Date: Fri, 10 Jun 2022 07:43:15 +0200
Subject: [PATCH 1/1] tools_def: add -fno-omit-frame-pointer to
GCC48_{IA32,X64}_CC_FLAGS
Fixes problems due to code assuming it runs with frame pointers and thus
updates rbp / ebp registers when switching stacks.
Signed-off-by: Gerd Hoffmann <kraxel@...>
---
BaseTools/Conf/tools_def.template | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template
index 73f95b2a3a9f..f1fd6a003062 100755
--- a/BaseTools/Conf/tools_def.template
+++ b/BaseTools/Conf/tools_def.template
@@ -1888,8 +1888,8 @@ DEFINE GCC_DEPS_FLAGS = -MMD -MF $@.deps
DEFINE GCC48_ALL_CC_FLAGS = DEF(GCC_ALL_CC_FLAGS) -ffunction-sections -fdata-sections -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings
DEFINE GCC48_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20
-DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address
-DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address
+DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer
+DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer
DEFINE GCC48_IA32_X64_ASLDLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable
DEFINE GCC48_IA32_X64_DLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map,--whole-archive
DEFINE GCC48_IA32_DLINK2_FLAGS = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON)
--
Alexey