Hello,

I ran into a Windows boot failure (a page-fault exception) and narrowed it down to the following patch:

MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

The issue always happens after QMP is terminated by <ctrl-C> twice; see the following steps.

1. Boot the Windows VM, then press <ctrl-C> to exit the QMP.
2. Repeat step 1.
3. Boot the Windows VM again; the page fault happens. (Note: Windows should boot into recovery mode in this round because of the previous two consecutive boot failures, see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During the three Windows boots above, the values of the following variables are always the same:

PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the third round because of the patch above (5267926). I modified the return value to "PcdGetBool (PcdSetNxForStack)" in function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, and the page fault is gone with this change.

The patch (5267926) fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show that PcdImageProtectionPolicy also needs NXE to be enabled. But it does cause the page-fault exception in this scenario. Any suggestions?

The page-fault exception is pasted here:

!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!

Thanks
Annie
|
|
Annie,
Can you attach the entire serial log of the boot to give some context to the address ranges? Also please file a BZ.
CR2 is the fault address and I think the ExceptionData is implying a present page with a reserved bit set in one of the page table entries?
Thanks,
Andrew Fish
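For readers following along, the ExceptionData field can be decoded mechanically. The sketch below is not part of the thread; it assumes the standard x86 page-fault error-code layout from the Intel SDM, and the names `PF_ERROR_BITS` and `decode_pf_error` are made up for illustration. The flag letters mirror the ones the exception dump prints (R is the reserved-bit flag).

```python
# Bit positions of the x86 #PF error code (Intel SDM Vol. 3A layout).
# Keys mirror the letters printed by the exception dump in this thread.
PF_ERROR_BITS = {
    "P": 0,     # 1 = fault on a present page, 0 = page not present
    "W": 1,     # 1 = write access, 0 = read access
    "U": 2,     # 1 = user-mode access
    "R": 3,     # 1 = reserved bit set in a paging-structure entry
    "I": 4,     # 1 = instruction fetch
    "PK": 5,    # 1 = protection-key violation
    "SS": 6,    # 1 = shadow-stack access
    "SGX": 15,  # 1 = SGX access-control violation
}


def decode_pf_error(code: int) -> dict:
    """Return every flag of a page-fault error code as 0 or 1."""
    return {name: (code >> bit) & 1 for name, bit in PF_ERROR_BITS.items()}


# ExceptionData value taken from the register dump in the first message.
flags = decode_pf_error(0x0000000000000009)
print(flags)
```

For the value 0x9 only P and R come out as 1, which agrees with the "I:0 R:1 U:0 W:0 P:1" line in the dump and with the reading above: a present page with a reserved bit set in some paging-structure entry.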
|
|
In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, the same toolchain, and the same "build" flags.

(2) Reproduce the issue and capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug

The point of this exercise is to reproduce the issue with an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file and get a disassembly of the TerminalDxe driver, interleaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register dump, matches the (relative) "start address" that "objdump -f" reports,

- we can take the crash offset (RIP - ImageBase), from the register dump, and use that offset into the "objdump -S" disassembly, to narrow down what the terminal driver may have been doing to trigger the crash.

It's not necessarily the terminal driver's fault that we encounter a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:

- your precise edk2 version (if you have local patches, it would be best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo
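The two subtractions suggested above can be worked through with the values from the posted register dump. This is only an arithmetic sketch; `image_offset` is a hypothetical helper name, and whether the addresses in a given ".debug" file are image-relative still has to be checked against the "objdump -f" output.

```python
def image_offset(address: int, image_base: int) -> int:
    """Return the offset of an absolute address inside a loaded image."""
    if address < image_base:
        raise ValueError("address lies below the image base")
    return address - image_base


# Values copied from the exception dump in the first message.
IMAGE_BASE = 0x3E0A5000
ENTRY_POINT = 0x3E0A86E8
RIP = 0x3E0A7C75

entry_offset = image_offset(ENTRY_POINT, IMAGE_BASE)
crash_offset = image_offset(RIP, IMAGE_BASE)

print(f"EntryPoint - ImageBase = 0x{entry_offset:X}")  # 0x36E8
print(f"RIP - ImageBase        = 0x{crash_offset:X}")  # 0x2C75
```

The first number is the one to compare with the start address from "objdump -f"; the second is the offset to look up in the "objdump -S" disassembly.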
|
|
Hello Andrew,

A bug is filed: https://bugzilla.tianocore.org/show_bug.cgi?id=3269

I changed the DEBUG flag to 0x8070FFFF in OvmfPkg/OvmfPkgX64.dsc, hoping it provides a more helpful debug log. The whole log is pretty big, so I'll upload it to this bug; the file name is bootfail.zip. I also added more logging to print out the following variables; the lines are all tagged with "Annie" to distinguish them from the other logs.

ERROR: IsEnableNonExecNeeded Annie PcdSetNxForStack 0
ERROR: IsEnableNonExecNeeded Annie PcdDxeNxMemoryProtectionPolicy 0
ERROR: IsEnableNonExecNeeded Annie PcdImageProtectionPolicy 2
ERROR: IsEnableNonExecNeeded Annie final return 1

Thanks
Annie
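The logged values can be checked against a small model of the decision. The sketch below is an assumption-laden Python model, not the edk2 source: the real function is C code in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, the OR of the three PCDs is inferred from the description of commit 5267926 in this thread, and the `xd_available` parameter (CPU support for the execute-disable bit) is an assumed precondition.

```python
def is_enable_nonexec_needed(set_nx_for_stack: bool,
                             dxe_nx_policy: int,
                             image_protection_policy: int,
                             xd_available: bool = True) -> bool:
    """Model of the decision after commit 5267926 (assumed logic)."""
    if not xd_available:
        return False
    # Any of the three NX-related PCDs being enabled requests NXE.
    return (bool(set_nx_for_stack)
            or dxe_nx_policy != 0
            or image_protection_policy != 0)


def is_enable_nonexec_needed_stack_only(set_nx_for_stack: bool,
                                        xd_available: bool = True) -> bool:
    """Model of the local change described in the first message."""
    return xd_available and bool(set_nx_for_stack)


# PCD values from the "Annie" log lines above.
after_commit = is_enable_nonexec_needed(False, 0, 2)
with_local_change = is_enable_nonexec_needed_stack_only(False)
print(after_commit, with_local_change)
```

Under these assumptions the model returns True for the logged values, matching "final return 1", and False with the stack-only variant, which is the configuration in which the page fault was reported to disappear.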
|
|
Hello Laszlo,

In my previous email, the exception was reproduced with a pretty old code base, from where I started bisecting the commits. This time I reproduced the issue with the code of the upstream branch 'stable/202011'. All the logs I am collecting are from this code base (75ab038). Since the overall size of the logs is pretty big, I'll attach all the data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer? The objdump output is uploaded, as well as the details of the page-fault exception; please check the details there.

Thanks
Annie
|
|
On Mar 18, 2021, at 6:22 AM, Laszlo Ersek <lersek@...> wrote:
(3) Run the following command:
objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug

WARNING: off-topic wish list...

It would be nice to have a debug script that could post-process a serial log file and append the extra information. That tool would need to be toolchain aware: for gcc you do `objdump -f -S TerminalDxe.debug`, for Xcode you would do `lldb -o <lldbCommand> Terminal.dll`. I guess it could also decode the exception and point out that CR2 is the fault address and what ExceptionData means. We could hook something like that into the CI and capture more detailed error reports.

Thanks,
Andrew Fish
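A first cut of the post-processing script wished for in this message might look like the following. It is a hypothetical sketch, not an existing edk2 tool: the names `IMAGE_RE`, `CR2_RE` and `annotate` are invented here, the regular expressions are fitted only to the dump format seen in this thread, only the gcc/objdump case is handled, and swapping ".dll" for ".debug" assumes the build layout shown in the log path.

```python
import re

# Matches the "Find image based on IP(...)" line of the exception dump.
IMAGE_RE = re.compile(
    r"Find image based on IP\(0x(?P<ip>[0-9A-Fa-f]+)\)\s+"
    r"(?P<path>\S+)\s+"
    r"\(ImageBase=(?P<base>[0-9A-Fa-f]+),\s*EntryPoint=(?P<entry>[0-9A-Fa-f]+)\)"
)
# Matches the CR2 field of the register dump.
CR2_RE = re.compile(r"CR2\s+-\s+(?P<cr2>[0-9A-Fa-f]+)")


def annotate(log_text: str) -> list:
    """Return human-readable notes for an exception dump in a serial log."""
    notes = []
    cr2 = CR2_RE.search(log_text)
    if cr2:
        notes.append(f"CR2 (fault address) = 0x{int(cr2['cr2'], 16):X}")
    for match in IMAGE_RE.finditer(log_text):
        ip = int(match["ip"], 16)
        base = int(match["base"], 16)
        # Assumption: the .debug file sits next to the .dll in the build tree.
        debug_file = re.sub(r"\.dll$", ".debug", match["path"])
        notes.append(f"crash offset in image = 0x{ip - base:X}")
        notes.append(f"gcc: objdump -f -S {debug_file}")
    return notes


SAMPLE = (
    "CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000\n"
    "!!!! Find image based on IP(0x3E0A7C75) "
    "/builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/"
    "MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll "
    "(ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!\n"
)

notes = annotate(SAMPLE)
for note in notes:
    print(note)
```

A real tool would also need the Xcode/lldb path mentioned above and a mapping from the build machine's path to a local build tree; both are left out here because the thread does not specify them.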
Stupid question? Is there a reason the page tables are not write protected and the write to the page table would fault?
Thanks,
Andrew Fish
On Mar 18, 2021, at 5:37 PM, Yao, Jiewen <jiewen.yao@...> wrote:
Hi Annie,
I added some of my thoughts to the Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=3269
If you can dump the paging structure info for further analysis, we can help check it.
-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@...>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>; Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
Hello Laszlo,
In my previous email, the exception was reproduced with a pretty old code base, the one from which I started bisecting the commits. This time I reproduced the issue with the code of the upstream branch 'stable/202011'. All the logs I am collecting are from this code base (75ab038). Since the overall size of the logs is pretty big, I'll attach all the data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).
I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer? The objdump output is uploaded as well, along with the details of the page fault exception; please check the details there.
Thanks Annie
Good question. CR0.WP is set. But the page table protection may be turned OFF and ON again if the CPU driver needs to update the page tables to protect an EFI image. Maybe there is a bug somewhere.
I just read the final debug log to see the final result. The RSVD bit in the exception data is weird. I think we need to confirm that first.
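To make the two observations above easier to check, here is a small Python sketch that decodes the page-fault error code and tests CR0.WP, using the ExceptionData and CR0 values from the register dump in this thread. The bit positions follow the x86 architecture definition of the page-fault error code; the names `PF_ERROR_BITS` and `decode_pf_error` are made up for illustration.

```python
# Page-fault error code bits as defined by the x86 architecture.
PF_ERROR_BITS = {
    0: "P",      # 1 = fault on a present page (not a "page not present" fault)
    1: "W",      # 1 = the faulting access was a write
    2: "U",      # 1 = the access originated in user mode
    3: "RSVD",   # 1 = a reserved bit was set in a paging-structure entry
    4: "I",      # 1 = the faulting access was an instruction fetch
    5: "PK",     # 1 = protection-key violation
    6: "SS",     # 1 = shadow-stack access
    15: "SGX",   # 1 = SGX-specific access-control violation
}

CR0_WP = 1 << 16  # CR0.WP: supervisor-mode writes honor read-only pages


def decode_pf_error(code: int) -> dict:
    """Map each known error-code flag name to 0 or 1."""
    return {name: (code >> bit) & 1 for bit, name in PF_ERROR_BITS.items()}


exception_data = 0x9   # "ExceptionData" from the register dump
cr0 = 0x80010033       # "CR0" from the register dump

flags = decode_pf_error(exception_data)
wp_set = bool(cr0 & CR0_WP)

print(flags)
print("CR0.WP set:", wp_set)
```

For these inputs the decoded flags agree with the "I:0 R:1 U:0 W:0 P:1" field printed by the exception handler: the access was a supervisor-mode read of a present page, and the fault was reported because of a reserved bit in a paging-structure entry.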
Yes it is very strange.
Hi Jiewen,
In the DumpCpuContext function in ArchExceptionHandler.c, the exception details are taken from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the paging info there. Is there any related info that can be retrieved from CR2? Can you enlighten me a little bit?
Thanks Annie
Annie,
CR2 holds the faulting linear address, and CR3 points to the first level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand. https://wiki.osdev.org/Paging
Thanks,
Andrew Fish
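As a starting point for walking the tables by hand, the faulting address can be split into its per-level table indices. The Python sketch below uses the CR2 and CR3 values from the register dump in this thread, where CR2 is the faulting linear address and CR3 holds the physical address of the top-level table. It assumes 4-level paging (CR4.LA57 is clear in the dump's CR4 value) and 4 KiB pages; if a large-page entry is found along the way, the walk ends earlier than shown. The function name `split_linear_address` is made up for illustration, and reading the lower-level entries requires access to guest memory, which is not shown.

```python
def split_linear_address(addr: int) -> dict:
    """Split a linear address into 4-level paging indices (4 KiB pages assumed)."""
    return {
        "pml4": (addr >> 39) & 0x1FF,   # index into the PML4 table
        "pdpt": (addr >> 30) & 0x1FF,   # index into the page-directory-pointer table
        "pd": (addr >> 21) & 0x1FF,     # index into the page directory
        "pt": (addr >> 12) & 0x1FF,     # index into the page table
        "offset": addr & 0xFFF,         # byte offset within the 4 KiB page
    }


cr2 = 0x3F2E2010   # faulting linear address, from the register dump
cr3 = 0x3F401000   # physical base of the top-level table, from the register dump

idx = split_linear_address(cr2)

# Each paging-structure entry is 8 bytes; the low 12 bits of CR3 are not
# part of the table address.
pml4_base = cr3 & 0x000FFFFFFFFFF000
pml4e_addr = pml4_base + idx["pml4"] * 8

for level, value in idx.items():
    print(f"{level:>6}: {value:#x}")
print(f"PML4 entry address: {pml4e_addr:#x}")
```

The entry read at each level supplies the physical base of the next table, and the same "base + index * 8" step is repeated with the next index until the final entry for the faulting page is reached.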
On 03/18/21 02:48, Annie Li wrote:
Hello,
I ran into a Windows boot failure (a page fault exception), and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1
This issue always happens after QMP is terminated by <ctrl-C> twice; see the following steps.
1. Boot the Windows VM up, and press <ctrl-C> to exit the QMP
2. Repeat 1
3. Boot the Windows VM, and this page fault issue happens. (Note: Windows should boot into recovery mode in this round, and this is due to the previous two consecutive boot failures, see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)
During the above 3 Windows boots, the values of the following variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2
However, the Windows guest fails to boot into recovery mode in the 3rd round due to the patch above (5267926). I modified the return value to "PcdGetBool (PcdSetNxForStack)" in the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, and the page fault is gone with this change. The patch (5267926) fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show that PcdImageProtectionPolicy also needs NXE to be enabled. But this does cause the page fault exception in this scenario; any suggestions?
The page fault exception is pasted here,
!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!! ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0 RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202 RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001 RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001 RSI - 000000003F2E2010, RDI - 0000000000000001 R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000 R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90 R14 - 00000000001A6B28, R15 - 00000000001AB000 DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030 GS - 0000000000000030, SS - 0000000000000030 CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000 CR4 - 0000000000040668, CR8 - 0000000000000000 DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000 DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400 GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000 FXSAVE_STATE - 00000000001A6700 !!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2- 1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/T erminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
In addition to what Andrew said, I suggest the following:
(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same toolchain, and the same "build" flags.
(2) Reproduce the issue, capture the register dump.
(3) Run the following command:
objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug
The point of this exercise is to reproduce the issue with such an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file, and get a disassembly of the TerminalDxe driver, inter-leaved with the C language source code.
Then, we can do two things:
- we can verify whether (EntryPoint - ImageBase), from the register dump, matches the (relative) "start address" that "objdump -f" reports,
- we can take the crash offset (RIP - ImageBase), from the register dump, and use that offset into the "objdump -S" disassembly, to narrow down what the terminal driver may have been doing to trigger the crash.
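The two checks above come down to simple subtraction. A minimal sketch in Python, using the ImageBase, EntryPoint and RIP values from the register dump in the original report; the helper name `image_offset` is hypothetical and only serves the illustration:

```python
def image_offset(address: int, image_base: int) -> int:
    """Return the offset of an absolute address inside a loaded image."""
    if address < image_base:
        raise ValueError("address lies below the image base")
    return address - image_base

# Values copied from the register dump in the original report.
IMAGE_BASE = 0x3E0A5000
ENTRY_POINT = 0x3E0A86E8
RIP = 0x3E0A7C75

entry_offset = image_offset(ENTRY_POINT, IMAGE_BASE)
crash_offset = image_offset(RIP, IMAGE_BASE)
print(f"entry offset: {entry_offset:#x}")
print(f"crash offset: {crash_offset:#x}")
```

The entry offset is the value to compare with the start address printed by "objdump -f", and the crash offset is the one to look up in the "objdump -S" listing. If the two start addresses differ by a fixed amount, that difference would presumably also have to be applied to the crash offset before the lookup.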
It's not necessarily the terminal driver's fault that we encounter a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.
If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).
Thanks, Laszlo
Annie,
I’ve never used it, but I assume there is a way to dump page tables from the QEMU console. Maybe something like `info mem` ?
Thanks,
Andrew Fish
On Mar 19, 2021, at 10:41 AM, Andrew Fish <afish@...> wrote:
Annie,
CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.
https://wiki.osdev.org/Paging
Thanks,
Andrew Fish
On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@...> wrote:
Hi Jiewen,
In the DumpCpuContext function in ArchExceptionHandler.c, the exception details are taken from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there. Is there any related information that can be retrieved from CR2? Can you enlighten me a little bit?
Thanks Annie
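Besides CR2, the ExceptionData field in the dump is the x86 page-fault error code, and it can be decoded bit by bit. A minimal sketch, with bit positions as I understand them from the Intel SDM and labels mirroring the ones printed in the dump; the function name `decode_pf_error_code` is hypothetical:

```python
# Page-fault error code bits, labelled as in the exception dump.
PF_BITS = {
    "P": 0,     # 1 = fault on a present page (not a missing mapping)
    "W": 1,     # 1 = the access was a write
    "U": 2,     # 1 = the access came from user mode
    "R": 3,     # 1 = a reserved bit was set in a paging-structure entry
    "I": 4,     # 1 = the access was an instruction fetch
    "PK": 5,    # 1 = protection-key violation
    "SS": 6,    # 1 = shadow-stack access
    "SGX": 15,  # 1 = SGX access-control violation
}

def decode_pf_error_code(code: int) -> dict:
    """Map each named error-code bit to 0 or 1."""
    return {name: (code >> bit) & 1 for name, bit in PF_BITS.items()}

flags = decode_pf_error_code(0x9)  # ExceptionData from the dump
print(" ".join(f"{name}:{value}" for name, value in flags.items()))
```

For 0x9 only P and R are set, which matches the dump: the page was present, and a reserved bit was found set in one of the paging-structure entries used for the translation. One possibility worth checking, not confirmed anywhere in this thread, is an entry with bit 63 (NX) set while EFER.NXE is off, since bit 63 counts as reserved in that case.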
On 03/19/21 18:41, Andrew Fish wrote:
> Annie,
> CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.
> https://wiki.osdev.org/Paging
> Thanks,
> Andrew Fish
I think that's a typo: it should be CR3. But, I agree -- you can use QEMU monitor commands to read RAM words, and walk the page tables manually.
(I figured I could help with this, but I couldn't reproduce the issue locally. I used the manual Recovery entry -- click "Reboot" with Shift held down. For me the Windows VM just entered recovery fine.)
Thanks, Laszlo
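Walking the tables by hand starts with splitting the faulting linear address (CR2) into table indices, while CR3 gives the physical base of the top-level table. A minimal sketch assuming 4-level paging with 4 KiB pages (CR4 in the dump has bit 12, LA57, clear); the function name `split_linear_address` is hypothetical:

```python
def split_linear_address(addr: int) -> dict:
    """Split a linear address into 4-level paging indices (9 bits each)."""
    return {
        "pml4": (addr >> 39) & 0x1FF,
        "pdpt": (addr >> 30) & 0x1FF,
        "pd": (addr >> 21) & 0x1FF,
        "pt": (addr >> 12) & 0x1FF,
        "offset": addr & 0xFFF,
    }

CR2 = 0x3F2E2010  # faulting linear address from the dump
CR3 = 0x3F401000  # physical base of the top-level table from the dump

idx = split_linear_address(CR2)
# Each entry is 8 bytes; the low 12 bits of CR3 do not hold address bits.
pml4e_addr = (CR3 & ~0xFFF) + idx["pml4"] * 8
print(idx)
print(f"PML4 entry is at physical address {pml4e_addr:#x}")
```

In the QEMU monitor, a command such as `xp /1gx 0x3f401000` should print that first entry. Bits 51:12 of each present entry give the physical base of the next table, and the walk ends early if an entry at the second or third level has the page-size bit (bit 7) set.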
Hello Laszlo,
Clicking "Reboot" with Shift held down works properly for me too; Windows boots into recovery mode in my environment. I just updated the bug with another, simpler way to reproduce this issue: press F8 on the boot entry to get to "Advanced Boot Option", and then select "Repair your computer". The issue happens every time I do this in my environment, and I no longer need to terminate the QEMU monitor twice with <ctrl-C> as before.
Thanks Annie
Hello Laszlo,
I was thinking of CR2 because it holds the address that caused this exception, and we are supposed to find the information for the page that contains that address, no?
Thanks Annie
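The two registers play different roles in the lookup: CR2 holds the linear address that faulted, while CR3 holds the physical address of the top-level page table from which the translation of that address starts, so both are needed. A minimal sketch of the walk, assuming 4-level paging and run against a made-up toy memory image rather than real guest RAM; `walk`, `read_phys` and all toy entries are hypothetical:

```python
ADDR_MASK = 0x000FFFFFFFFFF000  # bits 51:12, physical base of the next level
PRESENT = 1 << 0
PAGE_SIZE = 1 << 7              # large page (meaningful in PDPTE and PDE)
NX = 1 << 63                    # execute-disable; reserved if EFER.NXE = 0

def walk(cr3, linear, read_phys):
    """Return the (level, entry_address, entry) tuples visited for linear."""
    visited = []
    table = cr3 & ADDR_MASK
    for level, shift in (("PML4E", 39), ("PDPTE", 30), ("PDE", 21), ("PTE", 12)):
        entry_addr = table + ((linear >> shift) & 0x1FF) * 8
        entry = read_phys(entry_addr)
        visited.append((level, entry_addr, entry))
        if not entry & PRESENT:
            break  # not mapped; the walk stops here
        if level in ("PDPTE", "PDE") and entry & PAGE_SIZE:
            break  # 1 GiB or 2 MiB page; there is no lower level
        table = entry & ADDR_MASK
    return visited

# Toy memory: physical address -> 64-bit entry. All values are invented.
toy = {
    0x1000: 0x2000 | 0x3,                     # PML4E[0] -> PDPT at 0x2000
    0x2000: 0x3000 | 0x3,                     # PDPTE[0] -> PD at 0x3000
    0x3000 + 505 * 8: 0x4000 | 0x3,           # PDE[505] -> PT at 0x4000
    0x4000 + 226 * 8: NX | 0x3F2E2000 | 0x3,  # PTE[226], NX bit set
}
steps = walk(0x1000, 0x3F2E2010, lambda addr: toy.get(addr, 0))
for level, addr, entry in steps:
    print(f"{level} @ {addr:#x} = {entry:#018x} NX={int(bool(entry & NX))}")
```

Against a real guest, `read_phys` would be replaced by something that reads 8 bytes of guest physical memory, for example through the QEMU monitor, and the first argument would be the CR3 value from the dump.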
toggle quoted message
Show quoted text
-----Original Message----- From: Laszlo Ersek [mailto:lersek@...] Sent: Friday, March 19, 2021 2:11 PM To: Andrew Fish <afish@...>; Annie Li <annie.li@...> Cc: Yao, Jiewen <jiewen.yao@...>; discuss@edk2.groups.io; Wang, Jian J <jian.j.wang@...>; Aaron Young <aaron.young@...> Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926 On 03/19/21 18:41, Andrew Fish wrote: Annie,
CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand. I think that's a typo: it should be CR3. But, I agree -- you can use QEMU monitor commands to read RAM words, and walk the page tables manually. (I figured I could help with this, but I couldn't reproduce the issue locally. I used the manual Recovery entry -- click "Reboot" with Shift held down. For me the Windows VM just entered recovery fine.) Thanks, Laszlo https://urldefense.com/v3/__https://wiki.osdev.org/Paging__;!!GqivPVa7 Brio!Ns258pdi0mlsnSN0oODh9wsYUe4PNDNF6avU8uN1wSySF8ktrFBAJma0qi5hWw$ <https://urldefense.com/v3/__https://wiki.osdev.org/Paging__;!!GqivPVa 7Brio!Ns258pdi0mlsnSN0oODh9wsYUe4PNDNF6avU8uN1wSySF8ktrFBAJma0qi5hWw$
Thanks,
Andrew Fish
On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@...> wrote:
Hi Jiewen,
In DumpCpuContext function in ArchExceptionHandler.c, the exception details are gotten from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there? Are there some related info that can be retrieved from CR2? can you enlighten me a little bit?
Thanks Annie
-----Original Message----- From: Yao, Jiewen [mailto:jiewen.yao@...] Sent: Thursday, March 18, 2021 8:37 PM To: discuss@edk2.groups.io; Annie Li <annie.li@...>; Laszlo Ersek <lersek@...> Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>; Aaron Young <aaron.young@...>; Yao, Jiewen <jiewen.yao@...> Subject: RE: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
Hi Arie I added some of my thoughts in the Bugzilla. - https://bugzilla.tianocore.org/show_bug.cgi?id=3269
If you can dump paging structure info for further analysis, we can help to check.
-----Original Message----- From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of annie li Sent: Friday, March 19, 2021 3:27 AM To: Laszlo Ersek <lersek@...>; discuss@edk2.groups.io Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>; Aaron Young <aaron.young@...> Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
Hello Laszlo,
In my previous email, the exception was reproduced with a fairly old code base, from which I started bisecting the commits. This time I reproduced the issue with the upstream 'stable/202011' branch. All the logs I am collecting are from this code base (75ab038). Since the logs are large overall, I'll attach all the data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).
I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to this bug. If this log doesn't suffice, can you please suggest the method you prefer? The objdump output is uploaded as well, along with the details of the page fault exception; please check them there.
Thanks Annie
-----Original Message----- From: Laszlo Ersek [mailto:lersek@...] Sent: Thursday, March 18, 2021 9:23 AM To: Annie Li <annie.li@...>; discuss@edk2.groups.io Cc: jian.j.wang@...; Andrew Fish <afish@...> Subject: Re: Windows guest fails to boot into recovery mode due to commit 5267926
On 03/18/21 02:48, Annie Li wrote:
Hello,
I ran into a Windows boot failure (a page fault exception), and narrowed it down to the following patch,
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1
This issue always happens after QMP is terminated by <ctrl-C> twice, see the following steps.
1. Boot the Windows VM up, and <ctrl-C> to exit the QMP
2. Repeat 1
3. Boot the Windows VM, and this page fault issue happens. (Note: Windows should boot into recovery mode in this round, due to the previous two consecutive boot failures, see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)
During the above 3 Windows boots, the values of the following variables are always the same,
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2
However, the Windows guest fails to boot into recovery mode in the 3rd round due to the patch above (5267926). I changed the return value to "PcdGetBool (PcdSetNxForStack)" in the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, and the page fault is gone with this change. The patch (5267926) fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show that PcdImageProtectionPolicy also needs NXE to be enabled. But this does cause the page fault exception in this scenario; any suggestions?
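For readers following along, the decision described in the report can be modelled roughly as below. This is only a Python sketch, not the edk2 C source: the function names `nx_needed_before` and `nx_needed_after` are made up for illustration, the "after" condition is an assumption based on the commit title and on the workaround quoted above, and any CPU capability check the real IsEnableNonExecNeeded may perform is ignored.

```python
def nx_needed_before(set_nx_for_stack: bool) -> bool:
    """Workaround from the report: only PcdSetNxForStack decides."""
    return set_nx_for_stack


def nx_needed_after(set_nx_for_stack: bool,
                    dxe_nx_policy: int,
                    image_protection_policy: int) -> bool:
    """Assumed post-5267926 behaviour: any of the three PCDs requests NXE."""
    return (set_nx_for_stack
            or dxe_nx_policy != 0
            or image_protection_policy != 0)


# PCD values reported as identical across all three boots.
PCDS = {
    "set_nx_for_stack": False,      # PcdSetNxForStack 0
    "dxe_nx_policy": 0,             # PcdDxeNxMemoryProtectionPolicy 0
    "image_protection_policy": 2,   # PcdImageProtectionPolicy 2
}

before = nx_needed_before(PCDS["set_nx_for_stack"])
after = nx_needed_after(**PCDS)
print(f"NXE requested with workaround: {before}, with patch: {after}")
```

Under these assumptions, the reported PCD values request NXE only because PcdImageProtectionPolicy is non-zero, which is consistent with the workaround hiding the fault.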
The page fault exception is pasted here,
!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
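The ExceptionData field in the dump is the x86-64 page-fault error code, and its flags can be re-derived from the raw value. The sketch below decodes 0x9 using the architectural bit positions; the short flag names follow the dump, and the helper name `decode_pf_error` is just an illustration.

```python
# Page-fault error code bit positions, named as in the dump.
PF_BITS = {
    "P": 0,     # 1 = the faulting translation was present
    "W": 1,     # 1 = the access was a write
    "U": 2,     # 1 = the access came from user mode
    "R": 3,     # 1 = a reserved bit was set in a paging-structure entry
    "I": 4,     # 1 = the access was an instruction fetch
    "PK": 5,    # 1 = protection-key violation
    "SS": 6,    # 1 = shadow-stack access
    "SGX": 15,  # 1 = SGX-related fault
}


def decode_pf_error(code: int) -> dict:
    """Split a page-fault error code into its individual flag bits."""
    return {name: (code >> bit) & 1 for name, bit in PF_BITS.items()}


flags = decode_pf_error(0x9)
print(" ".join(f"{name}:{value}" for name, value in flags.items()))
```

P:1 together with R:1 describes a fault on a present translation with a reserved bit set, rather than a missing page. One thing that may be worth checking in this context: bit 63 (XD) of a paging entry counts as reserved while IA32_EFER.NXE is clear.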
In addition to what Andrew said, I suggest the following:
(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same toolchain, and the same "build" flags.
(2) Reproduce the issue, capture the register dump.
(3) Run the following command:
objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug
The point of this exercise is to reproduce the issue with such an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file, and get a disassembly of the TerminalDxe driver, inter-leaved with the C language source code.
Then, we can do two things:
- we can verify whether (EntryPoint - ImageBase), from the register dump, matches the (relative) "start address" that "objdump -f" reports,
- we can take the crash offset (RIP - ImageBase), from the register dump, and use that offset into the "objdump -S" disassembly, to narrow down what the terminal driver may have been doing to trigger the crash.
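As a worked example of the two subtractions, here is a small Python sketch using the ImageBase, EntryPoint and RIP values from the register dump quoted in this thread. Those numbers come from the older build in that dump, so a local rebuild would be expected to produce different values.

```python
# Values copied from the quoted register dump (older build).
IMAGE_BASE = 0x3E0A5000
ENTRY_POINT = 0x3E0A86E8
RIP = 0x3E0A7C75

# Offset to compare against the relative "start address" from "objdump -f".
entry_offset = ENTRY_POINT - IMAGE_BASE

# Offset to look up in the "objdump -S" disassembly.
crash_offset = RIP - IMAGE_BASE

print(f"entry offset: {entry_offset:#x}")
print(f"crash offset: {crash_offset:#x}")
```

How these offsets map onto the addresses printed by objdump depends on the base address the ".debug" file was linked at, so treat them as offsets relative to the image start rather than as absolute addresses.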
It's not necessarily the terminal driver's fault that we encounter a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.
If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).
Thanks, Laszlo
|
|
Never mind, please ignore my previous email. The QMP command 'xp' can dump the memory, so I'll try to reproduce this issue and dump it.
Thanks Annie
|
|
Hello
I added logging to the code to dump the page info, and also ran the QMP command "xp" to walk the page tables manually. The result shows that the corresponding page table entry doesn't exist. I updated the details in the bug; please take a look.
Thanks Annie
|
|
On 03/19/21 20:43, Annie Li wrote:
Hello Laszlo,
I was thinking of CR2 because it holds the address that caused this exception, and we are supposed to find the information for the page that contains that address, no?
CR2 is the address whose access faulted; I mentioned CR3 because Andrew wrote "points the 1st level of the page tables". I understood that expression as the point where you'd start walking the page tables manually -- and that "root pointer" is in CR3.
Thanks Laszlo
Thanks Annie
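To make the CR2 versus CR3 split concrete, the sketch below breaks the faulting address from the quoted dump (CR2 = 0x3F2E2010) into the table indices used in a manual walk that starts at CR3 = 0x3F401000. It assumes 4-level paging, which matches CR4 = 0x40668 having bit 12 (LA57) clear; the helper name `walk_indices` is made up for this example.

```python
CR3 = 0x3F401000   # root of the page tables (from the dump)
CR2 = 0x3F2E2010   # faulting linear address (from the dump)

ADDR_MASK = 0x000FFFFFFFFFF000  # physical address bits 51:12 of CR3 or an entry


def walk_indices(va: int) -> dict:
    """Split a linear address into 4-level paging indices (9 bits each)."""
    return {
        "pml4": (va >> 39) & 0x1FF,
        "pdpt": (va >> 30) & 0x1FF,
        "pd": (va >> 21) & 0x1FF,
        "pt": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }


idx = walk_indices(CR2)

# Each entry is 8 bytes, so the first entry to read sits at base + index * 8.
pml4e_addr = (CR3 & ADDR_MASK) + idx["pml4"] * 8

for level, value in idx.items():
    print(f"{level}: {value:#x}")
print(f"PML4 entry address: {pml4e_addr:#x}")
```

Each next-level table base comes from masking the entry just read with `ADDR_MASK` and adding the next index times 8; in the QEMU monitor an entry can be read with something like `xp /1gx 0x3f401000`. The walk ends early if a PDPT or PD entry has the page-size bit (bit 7) set, since that entry then maps a 1 GiB or 2 MiB page directly.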
-----Original Message----- From: Laszlo Ersek [mailto:lersek@...] Sent: Friday, March 19, 2021 2:11 PM To: Andrew Fish <afish@...>; Annie Li <annie.li@...> Cc: Yao, Jiewen <jiewen.yao@...>; discuss@edk2.groups.io; Wang, Jian J <jian.j.wang@...>; Aaron Young <aaron.young@...> Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
On 03/19/21 18:41, Andrew Fish wrote:
Annie,
CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand. I think that's a typo: it should be CR3.
But, I agree -- you can use QEMU monitor commands to read RAM words, and walk the page tables manually. (I figured I could help with this, but I couldn't reproduce the issue locally. I used the manual Recovery entry -- click "Reboot" with Shift held down. For me the Windows VM just entered recovery fine.)
Thanks, Laszlo
https://urldefense.com/v3/__https://wiki.osdev.org/Paging__;!!GqivPVa7 Brio!Ns258pdi0mlsnSN0oODh9wsYUe4PNDNF6avU8uN1wSySF8ktrFBAJma0qi5hWw$ <https://urldefense.com/v3/__https://wiki.osdev.org/Paging__;!!GqivPVa 7Brio!Ns258pdi0mlsnSN0oODh9wsYUe4PNDNF6avU8uN1wSySF8ktrFBAJma0qi5hWw$
Thanks,
Andrew Fish
On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@...> wrote:
Hi Jiewen,
In DumpCpuContext function in ArchExceptionHandler.c, the exception details are gotten from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there? Are there some related info that can be retrieved from CR2? can you enlighten me a little bit?
Thanks Annie
-----Original Message----- From: Yao, Jiewen [mailto:jiewen.yao@...] Sent: Thursday, March 18, 2021 8:37 PM To: discuss@edk2.groups.io; Annie Li <annie.li@...>; Laszlo Ersek <lersek@...> Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>; Aaron Young <aaron.young@...>; Yao, Jiewen <jiewen.yao@...> Subject: RE: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
Hi Arie I added some of my thought in the Bugzilla. - https://urldefense.com/v3/__https://bugzilla.tianocore.org/show_bug.c gi?id=3269__;!!GqivPVa7Brio!JMob8PcNWJxj_RZSIWy7iwhqFFhIYSwtnR_8i0X6V -UzBkycx-iObkffqGNBrw$
If you can dump paging structure info for further analysis, we can help to check.
-----Original Message----- From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of annie li Sent: Friday, March 19, 2021 3:27 AM To: Laszlo Ersek <lersek@...>; discuss@edk2.groups.io Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>; Aaron Young <aaron.young@...> Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
Hello Laszlo,
In my previous email, the exception is reproduced with pretty old code base from where I started bisecting the comments. This time I reproduce this issue with the code of branch 'stable/202011' of upstream. All the log I am collecting is from this code base(75ab038). Since the overall size of all log is pretty big, I'll attach all the data you required in to this bug(https://urldefense.com/v3/__https://bugzilla.tianocore.org/show_bug.cgi?id=3269__;!!GqivPVa7Brio!JMob8PcNWJxj_RZSIWy7iwhqFFhIYSwtnR_8i0X6V-UzBkycx-iObkffqGNBrw$ ).
I dump the register by qmp-regdump, and the result(regdump) is uploaded into this bug. If this log doesn't suffice, can you please suggest the way you prefer? The objdump is uploaded, as well as the details of page fault exception, please check the details there.
Thanks Annie
-----Original Message----- From: Laszlo Ersek [mailto:lersek@...] Sent: Thursday, March 18, 2021 9:23 AM To: Annie Li <annie.li@...>; discuss@edk2.groups.io Cc: jian.j.wang@...; Andrew Fish <afish@...> Subject: Re: Windows guest fails to boot into recovery mode due to commit 5267926
On 03/18/21 02:48, Annie Li wrote:
Hello,
I ran into a windows booting failure issue(a page fault exception), and narrow down it to the following patch, MdeModulePkg/DxeIpl: support more NX related PCDs https://urldefense.com/v3/__https://github.com/tianocore/edk2/commi t /5
267926134d17e86672b84fd57b438f05ffa68e1__;!!GqivPVa7Brio!PuMaBhjIGEd V v
lQi7PKC_FQeyIy-sjSaIZXk_W_MusXNUlQBxGqsJBRONWwKWw$
This issue always happens after QMP is terminated by <ctrl-C> twice, see following steps.
1. Boot Windows VM up, and <ctrl-C> to exit the QMP
2. Repeat 1
3. Boot Windows VM, and this page fault issue happens. (Note: Windows should boot into recovery mode in this round, and this is due to the previous two consecutive boot failure, see https://urldefense.com/v3/__https://docs.microsoft.com/en-us/windows - hardware/manufacture/desktop/windows-recovery-environment--windows-r e- - technical-reference*entry-points-into- winre__;Iw!!GqivPVa7Brio!PuMaBhjIGEdVvlQi7PKC_FQeyIy- sjSaIZXk_W_MusXNUlQBxGqsJBSkXMCNZA$ )
During above 3 windows booting procedures, the value of following variables are always the same, PcdSetNxForStack 0 PcdDxeNxMemoryProtectionPolicy 0 PcdImageProtectionPolicy 2
However, Windows guest fails to boot up into recovery mode in the 3rd round due to the patch above(5267926). I modified the return value to "(PcdGetBool (PcdSetNxForStack)" in function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, this page fault issue is gone with this change. The patch(5267926) is for fixing bug https://urldefense.com/v3/__https://bugzilla.tianocore.org/show_bug. cg i?id=1 116__;!!GqivPVa7Brio!PuMaBhjIGEdVvlQi7PKC_FQeyIy- sjSaIZXk_W_MusXNUlQBxGqsJBTLSxdsog$ , where the comments show PcdImageProtectionPolicy needs also to enable NXE. But this does cause the page fault exception in this scenario, any suggestion?
The page fault exception is pasted here,
!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
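For readers who want to check the flag letters on the ExceptionData line, the following sketch decodes the x86 page-fault error code by bit position. The bit layout is the architectural one; the helper name decode_pf_error is made up for illustration.

```python
# Architectural bit positions of the x86 page-fault error code,
# using the same short names as the edk2 exception dump.
PF_BITS = [
    ("P", 0),     # 1 = fault on a present page (not a missing mapping)
    ("W", 1),     # 1 = write access
    ("U", 2),     # 1 = user-mode access
    ("R", 3),     # 1 = reserved bit set in a paging-structure entry
    ("I", 4),     # 1 = instruction fetch
    ("PK", 5),    # 1 = protection-key violation
    ("SS", 6),    # 1 = shadow-stack access
    ("SGX", 15),  # 1 = SGX access-control violation
]

def decode_pf_error(code):
    """Return a dict mapping each flag name to 0 or 1 (hypothetical helper)."""
    return {name: (code >> bit) & 1 for name, bit in PF_BITS}

flags = decode_pf_error(0x9)  # ExceptionData from the dump above
print(" ".join(f"{name}:{value}" for name, value in flags.items()))
```

With ExceptionData = 9, only P and R are set: a supervisor-mode data read of a present page that hit a reserved bit in one of the paging-structure entries used to translate the address in CR2.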
In addition to what Andrew said, I suggest the following:
(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same toolchain, and the same "build" flags.
(2) Reproduce the issue, capture the register dump.
(3) Run the following command:
objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug
The point of this exercise is to reproduce the issue with an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file, and get a disassembly of the TerminalDxe driver, interleaved with the C language source code.
Then, we can do two things:
- we can verify whether (EntryPoint - ImageBase), from the register dump, matches the (relative) "start address" that "objdump -f" reports,
- we can take the crash offset (RIP - ImageBase), from the register dump, and use that offset into the "objdump -S" disassembly, to narrow down what the terminal driver may have been doing to trigger the crash.
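As a small worked example of the two subtractions above, using the values from the posted register dump (a sketch; it assumes the addresses in the ".debug" disassembly are relative to the image base, which is exactly what the first check is meant to confirm):

```python
# Values copied from the "Find image based on IP" line and the RIP register.
image_base = 0x3E0A5000
entry_point = 0x3E0A86E8
rip = 0x3E0A7C75

# Offset to compare against the "start address" printed by "objdump -f".
entry_offset = entry_point - image_base

# Offset to look up in the "objdump -S" disassembly.
crash_offset = rip - image_base

print(f"EntryPoint - ImageBase = {entry_offset:#x}")
print(f"RIP        - ImageBase = {crash_offset:#x}")
```

If the start address that "objdump -f" reports for a locally built TerminalDxe.debug differs from the first value, the build does not match the one that crashed, and the second offset should not be trusted either. Note that the dump above comes from the older edk2-1.4.3 build, so these numbers only apply to that binary.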
It's not necessarily the terminal driver's fault that we encounter a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.
If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).
Thanks, Laszlo
Ah, thanks for the clarification.
Thanks Annie
-----Original Message----- From: Laszlo Ersek [mailto:lersek@...] Sent: Monday, March 22, 2021 11:17 AM To: Annie Li <annie.li@...>; Andrew Fish <afish@...> Cc: Yao, Jiewen <jiewen.yao@...>; discuss@edk2.groups.io; Wang, Jian J <jian.j.wang@...>; Aaron Young <aaron.young@...> Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
On 03/19/21 20:43, Annie Li wrote:
Hello Laszlo,
I was thinking of CR2 because it holds the address that caused this exception, and we are supposed to find the information about the page that contains that address, no?
CR2 is the address whose access faulted; I mentioned CR3 because Andrew wrote "points the 1st level of the page tables". I understood that expression as the point where you'd start walking the page tables manually -- and that "root pointer" is in CR3.
Thanks Laszlo
Thanks Annie
-----Original Message----- From: Laszlo Ersek [mailto:lersek@...] Sent: Friday, March 19, 2021 2:11 PM To: Andrew Fish <afish@...>; Annie Li <annie.li@...> Cc: Yao, Jiewen <jiewen.yao@...>; discuss@edk2.groups.io; Wang, Jian J <jian.j.wang@...>; Aaron Young <aaron.young@...> Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
On 03/19/21 18:41, Andrew Fish wrote:
Annie,
CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.
I think that's a typo: it should be CR3.
But, I agree -- you can use QEMU monitor commands to read RAM words, and walk the page tables manually. (I figured I could help with this, but I couldn't reproduce the issue locally. I used the manual Recovery entry -- click "Reboot" with Shift held down. For me the Windows VM just entered recovery fine.)
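To illustrate where such a manual walk would start, the sketch below splits the faulting address from CR2 into its table indices and computes the physical address of the first entry to read. It assumes 4-level paging, which matches CR4 = 0x40668 in the dump (PAE set, LA57 clear), and it ignores large pages (the PS bit in a PDPT or PD entry ends the walk early). The helper names are hypothetical.

```python
# Bits 51:12 of CR3 and of each table entry hold a 4 KiB-aligned physical base.
ADDR_MASK = 0x000FFFFFFFFFF000

def split_va(va):
    """Split a virtual address into 4-level paging indices (9 bits each)."""
    return {
        "pml4": (va >> 39) & 0x1FF,
        "pdpt": (va >> 30) & 0x1FF,
        "pd": (va >> 21) & 0x1FF,
        "pt": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }

def entry_address(table_or_entry, index):
    """Physical address of the 8-byte entry 'index' in the table referenced
    by CR3 or by the previous level's entry value."""
    return (table_or_entry & ADDR_MASK) + 8 * index

CR2 = 0x3F2E2010  # faulting address from the dump
CR3 = 0x3F401000  # root of the page tables from the dump

idx = split_va(CR2)
pml4e_addr = entry_address(CR3, idx["pml4"])

print({name: hex(value) for name, value in idx.items()})
print("first entry to read (PML4E) at physical", hex(pml4e_addr))
# Each following level needs the 64-bit value read from guest RAM, for example
# with a QEMU monitor command along the lines of: xp /1gx 0x3f401000
# The value read is then fed back in: entry_address(value, idx["pdpt"]), etc.
```

At every level, the entry value itself is worth recording in full, since the R:1 flag in the error code points at a reserved bit in one of these entries.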
Thanks, Laszlo
https://wiki.osdev.org/Paging
Thanks,
Andrew Fish
On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@...> wrote:
Hi Jiewen,
In the DumpCpuContext function in ArchExceptionHandler.c, the exception details are obtained from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there? Is there some related info that can be retrieved from CR2? Can you enlighten me a little bit?
Thanks Annie
-----Original Message----- From: Yao, Jiewen [mailto:jiewen.yao@...] Sent: Thursday, March 18, 2021 8:37 PM To: discuss@edk2.groups.io; Annie Li <annie.li@...>; Laszlo Ersek <lersek@...> Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>; Aaron Young <aaron.young@...>; Yao, Jiewen <jiewen.yao@...> Subject: RE: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
Hi Arie, I added some of my thoughts in the Bugzilla. - https://bugzilla.tianocore.org/show_bug.cgi?id=3269
If you can dump paging structure info for further analysis, we can help to check.
-----Original Message----- From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of annie li Sent: Friday, March 19, 2021 3:27 AM To: Laszlo Ersek <lersek@...>; discuss@edk2.groups.io Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>; Aaron Young <aaron.young@...> Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926
Hello Laszlo,
In my previous email, the exception was reproduced with the pretty old code base from which I started bisecting the commits. This time I reproduced the issue with the code of the upstream branch 'stable/202011'. All the logs I am collecting are from this code base (75ab038). Since the overall size of the logs is pretty big, I'll attach all the data you requested to this bug ( https://bugzilla.tianocore.org/show_bug.cgi?id=3269 ).
I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer? The objdump output is uploaded too, as well as the details of the page fault exception; please check the details there.
Thanks Annie
-----Original Message----- From: Laszlo Ersek [mailto:lersek@...] Sent: Thursday, March 18, 2021 9:23 AM To: Annie Li <annie.li@...>; discuss@edk2.groups.io Cc: jian.j.wang@...; Andrew Fish <afish@...> Subject: Re: Windows guest fails to boot into recovery mode due to commit 5267926
On 03/18/21 02:48, Annie Li wrote:
Hello,
I ran into a Windows boot failure (a page fault exception), and narrowed it down to the following patch, MdeModulePkg/DxeIpl: support more NX related PCDs https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1
This issue always happens after QMP is terminated by <ctrl-C> twice, see following steps.
1. Boot Windows VM up, and <ctrl-C> to exit the QMP
2. Repeat 1
3. Boot Windows VM, and this page fault issue happens. (Note: Windows should boot into recovery mode in this round, and this is due to the previous two consecutive boot failures, see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre )
During the above 3 Windows boots, the values of the following variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2
However, the Windows guest fails to boot into recovery mode in the 3rd round due to the patch above (5267926). I modified the return value to "PcdGetBool (PcdSetNxForStack)" in function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, and the page fault is gone with this change. The patch (5267926) fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116 , where the comments show that PcdImageProtectionPolicy also needs NXE to be enabled. But it does cause the page fault exception in this scenario. Any suggestions?
The page fault exception is pasted here,
!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
In addition to what Andrew said, I suggest the following:
(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same toolchain, and the same "build" flags.
(2) Reproduce the issue, capture the register dump.
(3) Run the following command:
objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug
The point of this exercise is to reproduce the issue with an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file, and get a disassembly of the TerminalDxe driver, interleaved with the C language source code.
Then, we can do two things:
- we can verify whether (EntryPoint - ImageBase), from the register dump, matches the (relative) "start address" that "objdump -f" reports,
- we can take the crash offset (RIP - ImageBase), from the register dump, and use that offset into the "objdump -S" disassembly, to narrow down what the terminal driver may have been doing to trigger the crash.
It's not necessarily the terminal driver's fault that we encounter a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.
If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).
Thanks, Laszlo