I am observing weirdness in PeiCheckAndSwitchStack() when SecCoreData is offset twice. Since the produced physical address is beyond 2GB and there is no any other memory region, this becomes unassigned memory access which almost always points to a bug. QEMU by default ignores these but "-d guest_errors" makes them visible and gdb stops at unassigned memory region hook (and this is the most annoying part). This is to demonstrate the idea: https://github.com/aik/edk2/commit/2e81b553c189a41a824d2bc3b6e803bd9b26b07fTRACE1 prints SecCoreData == 81FD20 and TRACE2 prints 7BF55D20 and TRACE3 prints 7BF55D20 (the same as TRACE2). But if TRACE2 is commented out (as in the patch), TRACE3 prins 0xf768bd20 which is 2*(0x7BF55D20 - 0x81FD20) + 0x81FD20. The code adjusts SecCoreData before it switches the stack, SecCoreData should be adjusted once but it happens twice. /me brain explodes. This is AMD EPYC, upstream QEMU, q32 machine, 2GB of RAM, 1 vcpu. Is some barrier missing here? Thanks, This is the QEMU cmdline: /home/aik/pbuild/qemu-snp-localhost-x86_64/qemu-system-x86_64 \ -enable-kvm \ -m 2G \ -smp 1 \ -netdev user,id=USER0,hostfwd=tcp::2223-:22 \ -device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0 \ -machine q35 \ -device virtio-scsi-pci,id=vscsi0,iommu_platform=on,disable-modern=off,disable-legacy=on \ -drive id=DRIVE0,if=none,file=img/u2204_128G_aikbook_sev.qcow2,format=qcow2 \ -device scsi-hd,id=scsi-hd0,drive=DRIVE0 \ -drive if=pflash,format=raw,unit=0,file=/home/aik/OVMF_CODE.fd,readonly=on,id=MYPF \ -nographic \ -chardev stdio,id=STDIO0,signal=off,mux=on \ -device isa-serial,id=isa-serial0,chardev=STDIO0 \ -mon id=MON0,chardev=STDIO0,mode=readline \ -kernel /boot/vmlinuz \ -append console=ttyS0,115200n1 earlyprintk root=/dev/sda3 \ -d guest_errors Originally posted on github https://github.com/tianocore/edk2/discussions/3419 but that place looks eerily quiet.
|
|
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm.
|
|
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05) Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm. Interesting. Do you know why / what is different on Fedora? Thanks, Oliver
|
|
On 03/11/2022 00:22, Oliver Steffen wrote: Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm. Interesting. Do you know why / what is different on Fedora? Absolutely not :( I suspect "-t GCC" but did not have time to dig any deeper, this is how I build it: source ./edksetup.sh --reconfig nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc -- Alexey
|
|
On 07/11/2022 22:51, Alexey Kardashevskiy wrote: On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm. Interesting. Do you know why / what is different on Fedora? Absolutely not :( I suspect "-t GCC" but did not have time to dig any deeper, this is how I build it: source ./edksetup.sh --reconfig nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc This is the culpit: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2) Disabling LTO fixes the problem, more info here (cut-n-paste below): https://github.com/aik/edk2/commits/lto-off"-flto=auto" or "-flto=1" do not help, I came across those when found that LTO changed in gcc12. What I still do not understand - where do -flto and -DUSING_LTO come from? Also, there seems to be no way for disabling LTO via a pragma just for that PeiCheckAndSwitchStack(), or is there? Thanks, === The crash happens in AMD SEV-ES guest under QEMU where stack switching happens - PeiCheckAndSwitchStack() from MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c !!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!! ExceptionData - 0000000000000000 RIP - 000000000082043A, CS - 0000000000000018, RFLAGS - 0000000000010046 RAX - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000 RBX - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088 RSI - 000000007BF8C608, RDI - 000000007BF8D088 R8 - 0000000000000000, R9 - 000000007BF8C608, R10 - 00000000F76F9D20 R11 - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001 R14 - 000000007BF8C608, R15 - 000000007B76D000 DS - 0000000000000008, ES - 0000000000000008, FS - 0000000000000008 GS - 0000000000000008, SS - 0000000000000008 CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000 CR4 - 0000000000000668, CR8 - 0000000000000000 DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000 DR3 - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000 GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000 IDTR - 000000007BF8CD70 000000000000021F, TR - 0000000000000000 FXSAVE_STATE - 000000007BF8B9F0 !!!! Find image based on IP(0x82043A) /home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140, EntryPoint=00000000 00827E5D) !!!! Disabling LTO fixes the problem. The problem seems to be introduced recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock Ubuntu 22.04). These fail building bootable image unless LTO is off: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2) GNU ld version 2.37-36.fc36 Because disabling LTO triggers a lot of uninitialized warnings, this fixes them too. === -- Alexey
|
|
On 14/11/2022 13:57, Alexey Kardashevskiy wrote: On 07/11/2022 22:51, Alexey Kardashevskiy wrote:
On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm. Interesting. Do you know why / what is different on Fedora? Absolutely not :( I suspect "-t GCC" but did not have time to dig any deeper, this is how I build it:
source ./edksetup.sh --reconfig nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc This is the culpit: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2) Disabling LTO fixes the problem, more info here (cut-n-paste below): https://github.com/aik/edk2/commits/lto-off "-flto=auto" or "-flto=1" do not help, I came across those when found that LTO changed in gcc12. What I still do not understand - where do -flto and -DUSING_LTO come from? Also, there seems to be no way for disabling LTO via a pragma just for that PeiCheckAndSwitchStack(), or is there? Thanks,
btw #pragma GCC optimize ("O0") or "Ofast" wrapped just around PeiCheckAndSwitchStack() fixes it too but not "Os" - this one still produces broken image. === The crash happens in AMD SEV-ES guest under QEMU where stack switching happens - PeiCheckAndSwitchStack() from MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c !!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!! ExceptionData - 0000000000000000 RIP - 000000000082043A, CS - 0000000000000018, RFLAGS - 0000000000010046 RAX - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000 RBX - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088 RSI - 000000007BF8C608, RDI - 000000007BF8D088 R8  - 0000000000000000, R9 - 000000007BF8C608, R10 - 00000000F76F9D20 R11 - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001 R14 - 000000007BF8C608, R15 - 000000007B76D000 DS  - 0000000000000008, ES - 0000000000000008, FS - 0000000000000008 GS  - 0000000000000008, SS - 0000000000000008 CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000 CR4 - 0000000000000668, CR8 - 0000000000000000 DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000 DR3 - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000 GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000 IDTR - 000000007BF8CD70 000000000000021F,  TR - 0000000000000000 FXSAVE_STATE - 000000007BF8B9F0 !!!! Find image based on IP(0x82043A) /home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140, EntryPoint=00000000 00827E5D) !!!! Disabling LTO fixes the problem. The problem seems to be introduced recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock Ubuntu 22.04). These fail building bootable image unless LTO is off: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2) GNU ld version 2.37-36.fc36 Because disabling LTO triggers a lot of uninitialized warnings, this fixes them too. ===
-- Alexey
|
|
Quoting Alexey Kardashevskiy (2022-11-14 07:36:17)
On 14/11/2022 13:57, Alexey Kardashevskiy wrote:
On 07/11/2022 22:51, Alexey Kardashevskiy wrote:
On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm. Interesting. Do you know why / what is different on Fedora? Absolutely not :( I suspect "-t GCC" but did not have time to dig any deeper, this is how I build it:
source ./edksetup.sh --reconfig nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc This is the culpit: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
Disabling LTO fixes the problem, more info here (cut-n-paste below): https://github.com/aik/edk2/commits/lto-off
"-flto=auto" or "-flto=1" do not help, I came across those when found that LTO changed in gcc12.
What I still do not understand - where do -flto and -DUSING_LTO come from? Also, there seems to be no way for disabling LTO via a pragma just for that PeiCheckAndSwitchStack(), or is there? Thanks,
btw #pragma GCC optimize ("O0") or "Ofast" wrapped just around PeiCheckAndSwitchStack() fixes it too but not "Os" - this one still produces broken image.
=== The crash happens in AMD SEV-ES guest under QEMU where stack switching happens - PeiCheckAndSwitchStack() from MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c
!!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!! ExceptionData - 0000000000000000 RIP - 000000000082043A, CS - 0000000000000018, RFLAGS - 0000000000010046 RAX - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000 RBX - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088 RSI - 000000007BF8C608, RDI - 000000007BF8D088 R8  - 0000000000000000, R9 - 000000007BF8C608, R10 - 00000000F76F9D20 R11 - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001 R14 - 000000007BF8C608, R15 - 000000007B76D000 DS  - 0000000000000008, ES - 0000000000000008, FS - 0000000000000008 GS  - 0000000000000008, SS - 0000000000000008 CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000 CR4 - 0000000000000668, CR8 - 0000000000000000 DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000 DR3 - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000 GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000 IDTR - 000000007BF8CD70 000000000000021F,  TR - 0000000000000000 FXSAVE_STATE - 000000007BF8B9F0 !!!! Find image based on IP(0x82043A) /home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140, EntryPoint=00000000 00827E5D) !!!!
Disabling LTO fixes the problem. The problem seems to be introduced recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock Ubuntu 22.04).
These fail building bootable image unless LTO is off: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2) GNU ld version 2.37-36.fc36
Because disabling LTO triggers a lot of uninitialized warnings, this fixes them too. ===
Thanks for the patch, Alexey. It works with the svsm-preview branch too ( https://github.com/AMDESE/ovmf/tree/svsm-preview). - Oliver
|
|
I am pretty sure the correct patch should be some GCC barries inside that stack switching code, I just could not figure it out yet. Disabling optimization for the entire functions seems to be the simplest fix for now. Thanks,
toggle quoted message
Show quoted text
On 18/11/2022 00:10, Oliver Steffen wrote: Quoting Alexey Kardashevskiy (2022-11-14 07:36:17)
On 14/11/2022 13:57, Alexey Kardashevskiy wrote:
On 07/11/2022 22:51, Alexey Kardashevskiy wrote:
On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm. Interesting. Do you know why / what is different on Fedora? Absolutely not :( I suspect "-t GCC" but did not have time to dig any deeper, this is how I build it:
source ./edksetup.sh --reconfig nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc This is the culpit: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
Disabling LTO fixes the problem, more info here (cut-n-paste below): https://github.com/aik/edk2/commits/lto-off
"-flto=auto" or "-flto=1" do not help, I came across those when found that LTO changed in gcc12.
What I still do not understand - where do -flto and -DUSING_LTO come from? Also, there seems to be no way for disabling LTO via a pragma just for that PeiCheckAndSwitchStack(), or is there? Thanks,
btw #pragma GCC optimize ("O0") or "Ofast" wrapped just around PeiCheckAndSwitchStack() fixes it too but not "Os" - this one still produces broken image.
=== The crash happens in AMD SEV-ES guest under QEMU where stack switching happens - PeiCheckAndSwitchStack() from MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c
!!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!! ExceptionData - 0000000000000000 RIP - 000000000082043A, CS - 0000000000000018, RFLAGS - 0000000000010046 RAX - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000 RBX - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088 RSI - 000000007BF8C608, RDI - 000000007BF8D088 R8  - 0000000000000000, R9 - 000000007BF8C608, R10 - 00000000F76F9D20 R11 - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001 R14 - 000000007BF8C608, R15 - 000000007B76D000 DS  - 0000000000000008, ES - 0000000000000008, FS - 0000000000000008 GS  - 0000000000000008, SS - 0000000000000008 CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000 CR4 - 0000000000000668, CR8 - 0000000000000000 DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000 DR3 - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000 GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000 IDTR - 000000007BF8CD70 000000000000021F,  TR - 0000000000000000 FXSAVE_STATE - 000000007BF8B9F0 !!!! Find image based on IP(0x82043A) /home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140, EntryPoint=00000000 00827E5D) !!!!
Disabling LTO fixes the problem. The problem seems to be introduced recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock Ubuntu 22.04).
These fail building bootable image unless LTO is off: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2) GNU ld version 2.37-36.fc36
Because disabling LTO triggers a lot of uninitialized warnings, this fixes them too. ===
Thanks for the patch, Alexey. It works with the svsm-preview branch too (https://github.com/AMDESE/ovmf/tree/svsm-preview). - Oliver
-- Alexey
|
|
Ping?
This is likely to affect more people when gcc12 makes it to more distros. OTOH it only breaks VMs with encrypted memory which is just a fraction of OVMF users :-/
I poked it a bit more but disasm has way too much LTO noise and when I disable LTO - the problem goes away too :-/
toggle quoted message
Show quoted text
On 18/11/2022 15:15, Alexey Kardashevskiy wrote: I am pretty sure the correct patch should be some GCC barries inside that stack switching code, I just could not figure it out yet. Disabling optimization for the entire functions seems to be the simplest fix for now. Thanks, On 18/11/2022 00:10, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-14 07:36:17)
On 14/11/2022 13:57, Alexey Kardashevskiy wrote:
On 07/11/2022 22:51, Alexey Kardashevskiy wrote:
On 03/11/2022 00:22, Oliver Steffen wrote:
Quoting Alexey Kardashevskiy (2022-11-01 05:32:05)
Turns out it only occurs when build in Fedora36 and it does not (== the fw works fine) when built in Ubuntu 22.04. Go figure :-/ GCC5 is selected btw. False alarm. Interesting. Do you know why / what is different on Fedora? Absolutely not :( I suspect "-t GCC" but did not have time to dig any deeper, this is how I build it:
source ./edksetup.sh --reconfig nice build -q --cmd-len=64436 -DDEBUG_ON_SERIAL_PORT=TRUE -n 20 -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc This is the culpit: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
Disabling LTO fixes the problem, more info here (cut-n-paste below): https://github.com/aik/edk2/commits/lto-off
"-flto=auto" or "-flto=1" do not help, I came across those when found that LTO changed in gcc12.
What I still do not understand - where do -flto and -DUSING_LTO come from? Also, there seems to be no way for disabling LTO via a pragma just for that PeiCheckAndSwitchStack(), or is there? Thanks,
btw #pragma GCC optimize ("O0") or "Ofast" wrapped just around PeiCheckAndSwitchStack() fixes it too but not "Os" - this one still produces broken image.
=== The crash happens in AMD SEV-ES guest under QEMU where stack switching happens - PeiCheckAndSwitchStack() from MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c
!!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!! ExceptionData - 0000000000000000 RIP - 000000000082043A, CS - 0000000000000018, RFLAGS - 0000000000010046 RAX - 0000000000000000, RCX - 000000007BF8C610, RDX - 0000000000000000 RBX - 000000007BF8C608, RSP - 000000007BF8BD90, RBP - 000000007BF8D088 RSI - 000000007BF8C608, RDI - 000000007BF8D088 R8  - 0000000000000000, R9 - 000000007BF8C608, R10 - 00000000F76F9D20 R11 - 00000000F76F9D20, R12 - 000000007BF8D000, R13 - 0000000000000001 R14 - 000000007BF8C608, R15 - 000000007B76D000 DS  - 0000000000000008, ES - 0000000000000008, FS - 0000000000000008 GS  - 0000000000000008, SS - 0000000000000008 CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000000800000 CR4 - 0000000000000668, CR8 - 0000000000000000 DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000 DR3 - 0000000000000000, DR6 - 0000000000000000, DR7 - 0000000000000000 GDTR - 000000007FBFD000 0000000000000027, LDTR - 0000000000000000 IDTR - 000000007BF8CD70 000000000000021F,  TR - 0000000000000000 FXSAVE_STATE - 000000007BF8B9F0 !!!! Find image based on IP(0x82043A) /home/aik/p/ovmf/Build/OvmfX64/DEBUG_GCC5/X64/MdeMod ulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll (ImageBase=0000000000820140, EntryPoint=00000000 00827E5D) !!!!
Disabling LTO fixes the problem. The problem seems to be introduced recently into gcc as gcc 11.3.0 ld 2.38 do not expose the problem (stock Ubuntu 22.04).
These fail building bootable image unless LTO is off: gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2) GNU ld version 2.37-36.fc36
Because disabling LTO triggers a lot of uninitialized warnings, this fixes them too. ===
Thanks for the patch, Alexey. It works with the svsm-preview branch too (https://github.com/AMDESE/ovmf/tree/svsm-preview).
- Oliver
-- Alexey
|
|
Quoting Gerd Hoffmann (2022-11-28 07:55:32) On Thu, Nov 24, 2022 at 04:04:33PM +1100, Alexey Kardashevskiy wrote:
Ping?
This is likely to affect more people when gcc12 makes it to more distros. OTOH it only breaks VMs with encrypted memory which is just a fraction of OVMF users :-/
I poked it a bit more but disasm has way too much LTO noise and when I disable LTO - the problem goes away too :-/ Just found this while checking the status of my local branches.
HTH & take care, Gerd This also fixes it for the linux-SVSM case. Thanks, Gerd! From ef9557c940e4110f16b6f0c0c73fbf04551f08f6 Mon Sep 17 00:00:00 2001 From: Gerd Hoffmann <kraxel@...> Date: Fri, 10 Jun 2022 07:43:15 +0200 Subject: [PATCH 1/1] tools_def: add -fno-omit-frame-pointer to GCC48_{IA32,X64}_CC_FLAGS
Fixes problems due to code assuming it runs with frame pointers and thus updates rbp / ebp registers when switching stacks.
Signed-off-by: Gerd Hoffmann <kraxel@...> --- BaseTools/Conf/tools_def.template | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template index 73f95b2a3a9f..f1fd6a003062 100755 --- a/BaseTools/Conf/tools_def.template +++ b/BaseTools/Conf/tools_def.template @@ -1888,8 +1888,8 @@ DEFINE GCC_DEPS_FLAGS = -MMD -MF $@.deps
DEFINE GCC48_ALL_CC_FLAGS = DEF(GCC_ALL_CC_FLAGS) -ffunction-sections -fdata-sections -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings DEFINE GCC48_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20 -DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address +DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer +DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer DEFINE GCC48_IA32_X64_ASLDLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable DEFINE GCC48_IA32_X64_DLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map,--whole-archive DEFINE GCC48_IA32_DLINK2_FLAGS = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON) -- 2.38.1
-- 🎩Oliver Steffen (he/him) - Software Engineer, Virtualization Red Hat GmbH < https://www.redhat.com/de/global/dach>, Registered seat: Werner-von-Siemens-Ring 12, D-85630 Grasbrunn, Germany Commercial register: Amtsgericht München/Munich, HRB 153243, Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross Everyone has different working hours… Please do not feel obligated to reply outside of your normal work schedule.
|
|
Do we know the root cause of this crash? Turning off frame pointers seems like kind of hack. There are other toolchains like clang that have frame pointers on by default.
I thought the idea was PeiCheckAndSwitchStack() uses SwitchStack(). SwitchStack() calls InternalSwitchStack() -> that calls InternalSwitchStackAsm(). The InternalSwitchStackAsm() assembler should switch the stack and jump to a new function. This should break the chain of reusing a stale frame pointer.
I saw the crash was a GP fault. I can’t tell from the exception dump what caused the GP fault?
Thanks,
Andrew Fish
toggle quoted message
Show quoted text
On Nov 28, 2022, at 1:15 AM, Oliver Steffen <osteffen@...> wrote:
Quoting Gerd Hoffmann (2022-11-28 07:55:32)
On Thu, Nov 24, 2022 at 04:04:33PM +1100, Alexey Kardashevskiy wrote:
Ping?
This is likely to affect more people when gcc12 makes it to more distros. OTOH it only breaks VMs with encrypted memory which is just a fraction of OVMF users :-/
I poked it a bit more but disasm has way too much LTO noise and when I disable LTO - the problem goes away too :-/ Just found this while checking the status of my local branches.
HTH & take care, Gerd This also fixes it for the linux-SVSM case. Thanks, Gerd!
From ef9557c940e4110f16b6f0c0c73fbf04551f08f6 Mon Sep 17 00:00:00 2001 From: Gerd Hoffmann <kraxel@...> Date: Fri, 10 Jun 2022 07:43:15 +0200 Subject: [PATCH 1/1] tools_def: add -fno-omit-frame-pointer to GCC48_{IA32,X64}_CC_FLAGS
Fixes problems due to code assuming it runs with frame pointers and thus updates rbp / ebp registers when switching stacks.
Signed-off-by: Gerd Hoffmann <kraxel@...> --- BaseTools/Conf/tools_def.template | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template index 73f95b2a3a9f..f1fd6a003062 100755 --- a/BaseTools/Conf/tools_def.template +++ b/BaseTools/Conf/tools_def.template @@ -1888,8 +1888,8 @@ DEFINE GCC_DEPS_FLAGS = -MMD -MF $@.deps
DEFINE GCC48_ALL_CC_FLAGS = DEF(GCC_ALL_CC_FLAGS) -ffunction-sections -fdata-sections -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings DEFINE GCC48_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20 -DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address +DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer +DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer DEFINE GCC48_IA32_X64_ASLDLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable DEFINE GCC48_IA32_X64_DLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map,--whole-archive DEFINE GCC48_IA32_DLINK2_FLAGS = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON) -- 2.38.1
-- 🎩Oliver Steffen (he/him) - Software Engineer, Virtualization Red Hat GmbH <https://www.redhat.com/de/global/dach>, Registered seat: Werner-von-Siemens-Ring 12, D-85630 Grasbrunn, Germany Commercial register: Amtsgericht München/Munich, HRB 153243, Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross
Everyone has different working hours… Please do not feel obligated to reply outside of your normal work schedule.
|
|
Gerd Hoffmann <kraxel@...>
On Thu, Nov 24, 2022 at 04:04:33PM +1100, Alexey Kardashevskiy wrote: Ping?
This is likely to affect more people when gcc12 makes it to more distros. OTOH it only breaks VMs with encrypted memory which is just a fraction of OVMF users :-/
I poked it a bit more but disasm has way too much LTO noise and when I disable LTO - the problem goes away too :-/ Just found this while checking the status of my local branches. HTH & take care, Gerd From ef9557c940e4110f16b6f0c0c73fbf04551f08f6 Mon Sep 17 00:00:00 2001 From: Gerd Hoffmann <kraxel@...> Date: Fri, 10 Jun 2022 07:43:15 +0200 Subject: [PATCH 1/1] tools_def: add -fno-omit-frame-pointer to GCC48_{IA32,X64}_CC_FLAGS Fixes problems due to code assuming it runs with frame pointers and thus updates rbp / ebp registers when switching stacks. Signed-off-by: Gerd Hoffmann <kraxel@...> --- BaseTools/Conf/tools_def.template | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template index 73f95b2a3a9f..f1fd6a003062 100755 --- a/BaseTools/Conf/tools_def.template +++ b/BaseTools/Conf/tools_def.template @@ -1888,8 +1888,8 @@ DEFINE GCC_DEPS_FLAGS = -MMD -MF $@.deps DEFINE GCC48_ALL_CC_FLAGS = DEF(GCC_ALL_CC_FLAGS) -ffunction-sections -fdata-sections -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings DEFINE GCC48_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20 -DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address +DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer +DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer DEFINE GCC48_IA32_X64_ASLDLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable DEFINE GCC48_IA32_X64_DLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map,--whole-archive DEFINE GCC48_IA32_DLINK2_FLAGS = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON) -- 2.38.1
|
|
Gerd Hoffmann <kraxel@...>
On Mon, Nov 28, 2022 at 10:34:08AM -0800, Andrew Fish wrote: Do we know the root cause of this crash? Turning off frame pointers seems like kind of hack. There are other toolchains like clang that have frame pointers on by default. InternalSwitchStackAsm(). The InternalSwitchStackAsm() assembler should switch the stack and jump to a new function. This should break the chain of reusing a stale frame pointer. Was discussed a few months back on the list. If I remember correctly: The assembler code needs to know whenever frame pointers are used or not (and update rbp or not). Current code assumes frame pointers are used. Adding -fno-omit-frame-pointer enforces that for gcc. When gcc does not use framepointers and additionally the optimizer decides to use the rbp register for something else things go south when the stack switching code modifies rbp. take care, Gerd
|
|
On 28/11/2022 17:55, Gerd Hoffmann wrote: On Thu, Nov 24, 2022 at 04:04:33PM +1100, Alexey Kardashevskiy wrote:
Ping?
This is likely to affect more people when gcc12 makes it to more distros. OTOH it only breaks VMs with encrypted memory which is just a fraction of OVMF users :-/
I poked it a bit more but disasm has way too much LTO noise and when I disable LTO - the problem goes away too :-/ Just found this while checking the status of my local branches. This helps indeed too, just tested. Are you pushing this out or I repost my pragma optimize patch? Thanks, HTH & take care, Gerd From ef9557c940e4110f16b6f0c0c73fbf04551f08f6 Mon Sep 17 00:00:00 2001 From: Gerd Hoffmann <kraxel@...> Date: Fri, 10 Jun 2022 07:43:15 +0200 Subject: [PATCH 1/1] tools_def: add -fno-omit-frame-pointer to GCC48_{IA32,X64}_CC_FLAGS Fixes problems due to code assuming it runs with frame pointers and thus updates rbp / ebp registers when switching stacks. Signed-off-by: Gerd Hoffmann <kraxel@...> --- BaseTools/Conf/tools_def.template | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.template index 73f95b2a3a9f..f1fd6a003062 100755 --- a/BaseTools/Conf/tools_def.template +++ b/BaseTools/Conf/tools_def.template @@ -1888,8 +1888,8 @@ DEFINE GCC_DEPS_FLAGS = -MMD -MF $@.deps DEFINE GCC48_ALL_CC_FLAGS = DEF(GCC_ALL_CC_FLAGS) -ffunction-sections -fdata-sections -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings DEFINE GCC48_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20 -DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address +DEFINE GCC48_IA32_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m32 -march=i586 -malign-double -fno-stack-protector -D EFI32 -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer +DEFINE GCC48_X64_CC_FLAGS = DEF(GCC48_ALL_CC_FLAGS) -m64 -fno-stack-protector "-DEFIAPI=__attribute__((ms_abi))" -maccumulate-outgoing-args -mno-red-zone -Wno-address -mcmodel=small -fpie -fno-asynchronous-unwind-tables -Wno-address -fno-omit-frame-pointer DEFINE GCC48_IA32_X64_ASLDLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,ReferenceAcpiTable -u ReferenceAcpiTable DEFINE GCC48_IA32_X64_DLINK_FLAGS = DEF(GCC48_IA32_X64_DLINK_COMMON) -Wl,--entry,$(IMAGE_ENTRY_POINT) -u $(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map,--whole-archive DEFINE GCC48_IA32_DLINK2_FLAGS = -Wl,--defsym=PECOFF_HEADER_SIZE=0x220 DEF(GCC_DLINK2_FLAGS_COMMON) -- Alexey
|
|