Windows guest fails to boot into recovery mode due to commit 5267926


annie li
 

Hello,

I ran into a Windows boot failure (a page fault exception) and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP is terminated by <ctrl-C> twice; see the following steps.

1. Boot the Windows VM, then press <ctrl-C> to exit the QMP

2. Repeat step 1

3. Boot the Windows VM again; the page fault occurs. (Note: Windows should boot into recovery mode in this round because of the two previous consecutive boot failures; see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During the three Windows boots above, the values of the following variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the 3rd round because of the patch above (5267926). When I changed the return value to "PcdGetBool (PcdSetNxForStack)" in the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, the page fault went away. The patch (5267926) fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments indicate that PcdImageProtectionPolicy also needs NXE to be enabled. But this does cause the page fault exception in this scenario. Any suggestions?

The page fault exception is pasted here,


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
Thanks
Annie


Andrew Fish
 

Annie,

Can you attach the entire serial log of the boot to give some context to the address ranges? Also please file a BZ.

CR2 is the fault address and I think the ExceptionData is implying a present page with a reserved bit set in one of the page table entries?
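
To make the reading of ExceptionData above concrete, here is a minimal sketch that decodes a page-fault error code into the same flag names the register dump prints. The bit positions follow the x86 page-fault error code layout; the function and table names are made up for illustration and are not part of edk2.

```python
# Page-fault error code bits (x86). Names match the labels in the dump.
PF_BITS = {
    "P": 0,     # 1 = fault on a present page (not a missing mapping)
    "W": 1,     # 1 = the access was a write
    "U": 2,     # 1 = the access came from user mode
    "R": 3,     # 1 = a reserved bit was set in a paging-structure entry
    "I": 4,     # 1 = the access was an instruction fetch
    "PK": 5,    # 1 = protection-key violation
    "SS": 6,    # 1 = shadow-stack access
    "SGX": 15,  # 1 = SGX access-control violation
}


def decode_pf_error_code(code):
    """Return a dict mapping each flag name to 0 or 1 (hypothetical helper)."""
    return {name: (code >> bit) & 1 for name, bit in PF_BITS.items()}


# ExceptionData from the dump in this thread.
flags = decode_pf_error_code(0x9)
print(flags)
```

For 0x9 only P and R come out as 1, which agrees with the "I:0 R:1 U:0 W:0 P:1" line in the dump: a present page, with a reserved bit set somewhere in the paging structures used to translate CR2.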

Thanks,

Andrew Fish



Laszlo Ersek
 

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file, and get a disassembly of the TerminalDxe driver, interleaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register dump, matches the (relative) "start address" that "objdump -f" reports,

- we can take the crash offset (RIP - ImageBase), from the register dump, and use that offset into the "objdump -S" disassembly, to narrow down what the terminal driver may have been doing to trigger the crash.
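
As a worked example of the two offsets described above, the sketch below uses the ImageBase, EntryPoint and RIP values from the register dump at the top of this thread. That dump came from an older packaged build, so the numbers from a local rebuild will differ. It also assumes the offsets can be compared directly against objdump's addresses, which may not hold if the ".debug" file is linked at a nonzero base.

```python
# Values taken from the first register dump in this thread.
image_base = 0x3E0A5000
entry_point = 0x3E0A86E8
rip = 0x3E0A7C75

# Relative start address, to compare with what "objdump -f" reports.
entry_offset = entry_point - image_base

# Offset of the faulting instruction, to look up in "objdump -S" output.
crash_offset = rip - image_base

print(f"entry offset: {entry_offset:#x}")
print(f"crash offset: {crash_offset:#x}")
```

With these inputs the entry offset is 0x36e8 and the crash offset is 0x2c75.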

It's not necessarily the terminal driver's fault that we encounter a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo


annie li
 

Hello Andrew

A bug is filed - https://bugzilla.tianocore.org/show_bug.cgi?id=3269
I changed the DEBUG flag to 0x8070FFFF in OvmfPkg/OvmfPkgX64.dsc, hoping it provides a more helpful debug log. The whole log is pretty big, so I'll upload it to this bug; the file name is bootfail.zip.
I also added more logging to print the following variables; the lines are all tagged with "Annie" to distinguish them from the other logs.
ERROR: IsEnableNonExecNeeded Annie PcdSetNxForStack 0
ERROR: IsEnableNonExecNeeded Annie PcdDxeNxMemoryProtectionPolicy 0
ERROR: IsEnableNonExecNeeded Annie PcdImageProtectionPolicy 2
ERROR: IsEnableNonExecNeeded Annie final return 1
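
The four log lines above can be reproduced with a small model of the decision. This is a sketch reconstructed only from what the thread states (the patch makes the other two PCDs count as well, and the workaround returns just PcdGetBool (PcdSetNxForStack)); it is not copied from VirtualMemory.c, the function names are hypothetical, and any CPU capability checks the real function may perform are omitted.

```python
def nxe_needed_after_patch(set_nx_for_stack, dxe_nx_policy, image_policy):
    """Model (assumption): any of the three PCDs being nonzero requests NXE."""
    return bool(set_nx_for_stack) or dxe_nx_policy != 0 or image_policy != 0


def nxe_needed_workaround(set_nx_for_stack):
    """Model of the local change described earlier: only the stack PCD counts."""
    return bool(set_nx_for_stack)


# PCD values logged in this thread.
patched = nxe_needed_after_patch(0, 0, 2)
workaround = nxe_needed_workaround(0)
print("after patch:", int(patched))
print("workaround: ", int(workaround))
```

Under this model PcdImageProtectionPolicy = 2 alone is enough to yield 1, matching "final return 1", while the workaround yields 0.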

Thanks
Annie




annie li
 

Hello Laszlo,

In my previous email, the exception was reproduced with the pretty old code base from which I started bisecting the commits. This time I reproduced the issue with the code of the upstream branch 'stable/202011'. All the logs I am collecting are from this code base (75ab038). Since the overall size of the logs is pretty big, I'll attach all the data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump is uploaded, as well as the details of page fault exception, please check the details there.

Thanks
Annie



Andrew Fish
 

On Mar 18, 2021, at 6:22 AM, Laszlo Ersek <lersek@...> wrote:

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug
WARNING: off-topic wish list...

It would be nice to have a debug script that could post-process a serial log file and append the extra information. That tool would need to be toolchain aware: for gcc you do `objdump -f -S TerminalDxe.debug`, while for Xcode you would do `lldb -o <lldbCommand> Terminal.dll`. I guess it could also decode the exception and point out that CR2 is the fault address and what ExceptionData means.

We could hook something like that into the CI and capture more detailed error reports.
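
A rough sketch of what such a post-processing script might look like, covering only the gcc/objdump case from this thread. The regular expression is written against the "Find image based on IP" line format shown in the dump above, and the ".dll" to ".debug" substitution mirrors the paths used in this thread. All names are hypothetical, and the lldb path is left out.

```python
import re

# Matches the "Find image based on IP(...)" line as it appears in this thread.
FIND_IMAGE_RE = re.compile(
    r"Find image based on IP\(0x(?P<ip>[0-9A-Fa-f]+)\)\s+(?P<path>\S+)\s+"
    r"\(ImageBase=(?P<base>[0-9A-Fa-f]+),\s*EntryPoint=(?P<entry>[0-9A-Fa-f]+)\)"
)


def annotate(log_text):
    """Return extra notes for every crashing image found in a serial log."""
    notes = []
    for m in FIND_IMAGE_RE.finditer(log_text):
        ip = int(m.group("ip"), 16)
        base = int(m.group("base"), 16)
        entry = int(m.group("entry"), 16)
        # Assumption: the matching debug file sits next to the .dll.
        debug_path = re.sub(r"\.dll$", ".debug", m.group("path"))
        notes.append(f"crash offset (IP - ImageBase): {ip - base:#x}")
        notes.append(f"entry offset (EntryPoint - ImageBase): {entry - base:#x}")
        notes.append(f"suggested command: objdump -f -S {debug_path}")
    return notes


sample = (
    "!!!! Find image based on IP(0x3E0A7C75) "
    "/builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/"
    "MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll "
    "(ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!"
)
notes = annotate(sample)
for line in notes:
    print(line)
```

A CI hook could run something like this over the captured serial output and attach the notes to the failure report.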

Thanks,

Andrew Fish




Yao, Jiewen
 

Hi Annie,
I added some of my thoughts to the Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump paging structure info for further analysis, we can help to check.
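
When dumping the paging structures, it helps to know which entries translate the faulting address. The sketch below splits the CR2 value from the dump into table indexes, assuming 4-level paging with a 4 KiB leaf page. If the address is actually mapped by a 2 MiB or 1 GiB page, the walk ends at the PD or PDPT entry instead. The helper name is made up for illustration.

```python
def paging_indices(vaddr):
    """Split a virtual address into 4-level paging indexes (9 bits each)."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,
        "pdpt": (vaddr >> 30) & 0x1FF,
        "pd": (vaddr >> 21) & 0x1FF,
        "pt": (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,
    }


# CR2 (fault address) from the register dump in this thread.
cr2 = 0x3F2E2010
idx = paging_indices(cr2)
for level, value in idx.items():
    print(f"{level}: {value:#x}")
```

Starting from the table that CR3 points to, these indexes select the entries worth printing at each level; since the error code reports a reserved-bit violation, each of those entries is a candidate.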

-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@...>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>;
Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due
to commit 5267926

Hello Laszlo,

In my previous email, the exception is reproduced with pretty old code base
from where I started bisecting the comments. This time I reproduce this issue
with the code of branch 'stable/202011' of upstream. All the log I am collecting
is from this code base(75ab038). Since the overall size of all log is pretty big, I'll
attach all the data you required in to this
bug(https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dump the register by qmp-regdump, and the result(regdump) is uploaded into
this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump is uploaded, as well as the details of page fault exception, please
check the details there.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@...]
Sent: Thursday, March 18, 2021 9:23 AM
To: Annie Li <annie.li@...>; discuss@edk2.groups.io
Cc: jian.j.wang@...; Andrew Fish <afish@...>
Subject: Re: Windows guest fails to boot into recovery mode due to commit
5267926

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a windows booting failure issue(a page fault exception),
and narrow down it to the following patch,
MdeModulePkg/DxeIpl: support more NX related PCDs
https://urldefense.com/v3/__https://github.com/tianocore/edk2/commit/5
267926134d17e86672b84fd57b438f05ffa68e1__;!!GqivPVa7Brio!PuMaBhjIGEdV
v
lQi7PKC_FQeyIy-sjSaIZXk_W_MusXNUlQBxGqsJBRONWwKWw$

This issue always happens after QMP is terminated by <ctrl-C> twice, see
following steps.

1. Boot Windows VM up, and <ctrl-C> to exit the QMP

2. Repeat 1

3. Boot Windows VM, and this page fault issue happens. (Note: Windows
should boot into recovery mode in this round, and this is due to the previous two
consecutive boot failure, see
https://urldefense.com/v3/__https://docs.microsoft.com/en-us/windows-
hardware/manufacture/desktop/windows-recovery-environment--windows-re--
technical-reference*entry-points-into-
winre__;Iw!!GqivPVa7Brio!PuMaBhjIGEdVvlQi7PKC_FQeyIy-
sjSaIZXk_W_MusXNUlQBxGqsJBSkXMCNZA$ )

During above 3 windows booting procedures, the value of following
variables are always the same, PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0 PcdImageProtectionPolicy 2

However, Windows guest fails to boot up into recovery mode in the 3rd round
due to the patch above(5267926). I modified the return value to "(PcdGetBool
(PcdSetNxForStack)" in function "IsEnableNonExecNeeded" in
MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, this page fault issue is
gone with this change. The patch(5267926) is for fixing bug
https://urldefense.com/v3/__https://bugzilla.tianocore.org/show_bug.cgi?id=1
116__;!!GqivPVa7Brio!PuMaBhjIGEdVvlQi7PKC_FQeyIy-
sjSaIZXk_W_MusXNUlQBxGqsJBTLSxdsog$ , where the comments show
PcdImageProtectionPolicy needs also to enable NXE. But this does cause the
page fault exception in this scenario, any suggestion?

The page fault exception is pasted here,


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS -
0000000000010202 RAX - 8000000000000003, RCX - 0000000000000001, RDX
- 0000000001040001 RBX - 0000000000000001, RSP - 00000000001A6AA0,
RBP - 0000000001040001 RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 -
000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 -
000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 -
0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 -
0000000000000400 GDTR - 000000003F1EE698 0000000000000047, LDTR -
0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-
1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/T
erminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll
(ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same
toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

objdump -f -S
Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/Termin
alDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with such an OVMF build for
which you have a matching "TerminalDxe.debug" file. Once you do that, you can
run "objdump" on the ".debug" file, and get a disassembly of the TerminalDxe
driver, inter-leaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register dump,
matches the (relative) "start address" that "objdump -f" reports,

- we can take the crash offset (RIP - ImageBase), from the register dump, and
use that offset into the "objdump -S" disassembly, to narrow down what the
terminal driver may have been doing to trigger the crash.
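
The two subtractions described above can be worked out directly from the register dump. The following is only a minimal sketch in Python, using the ImageBase, EntryPoint and RIP values quoted in this thread; the helper name `image_offset` is made up for illustration and is not part of edk2 or binutils.

```python
# Values copied from the register dump quoted in this thread.
IMAGE_BASE = 0x3E0A5000
ENTRY_POINT = 0x3E0A86E8
RIP = 0x3E0A7C75


def image_offset(address: int, image_base: int) -> int:
    """Return the offset of `address` relative to the loaded image base.

    Hypothetical helper for illustration only.
    """
    if address < image_base:
        raise ValueError("address lies below the image base")
    return address - image_base


entry_offset = image_offset(ENTRY_POINT, IMAGE_BASE)
crash_offset = image_offset(RIP, IMAGE_BASE)

# entry_offset is the value to compare against the "start address" from
# "objdump -f"; crash_offset is the value to look up in "objdump -S".
print(f"EntryPoint - ImageBase = {entry_offset:#x}")
print(f"RIP        - ImageBase = {crash_offset:#x}")
```

Whether these offsets line up one-to-one with the addresses objdump prints depends on how the ".debug" file was linked, so treat the result as a starting point for the search rather than an exact line number.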

It's not necessarily the terminal driver's fault that we encounter a crash, but
knowing what TerminalDxe was up to might shed light on the actual reason. It's
of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to
reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo





Andrew Fish
 

Stupid question? Is there a reason the page tables are not write protected and the write to the page table would fault?

Thanks,

Andrew Fish

On Mar 18, 2021, at 5:37 PM, Yao, Jiewen <jiewen.yao@...> wrote:

Hi Arie
I added some of my thoughts in the Bugzilla. - https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump paging structure info for further analysis, we can help to check.


-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@...>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>;
Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due
to commit 5267926

Hello Laszlo,

In my previous email, the exception was reproduced with a pretty old code base,
from where I started bisecting the commits. This time I reproduced the issue
with the code of the upstream branch 'stable/202011'. All the logs I am
collecting are from this code base (75ab038). Since the overall size of the logs
is pretty big, I'll attach all the data you requested to this bug
(https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to
this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump output is uploaded as well, along with the details of the page fault
exception; please check the details there.

Thanks
Annie






Yao, Jiewen
 

Good question.
CR0.WP is set. But the page table protection may be turned OFF/ON again if the CPU driver needs to update the page tables to protect an EFI image. Maybe there is a bug somewhere.

I just read the final debug log to see the final result.
The RSVD bit in the exception data is weird. I think we need to confirm that first.
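
For readers following along, the ExceptionData value 0x9 from the dump can be decoded bit by bit. This is only a minimal sketch in Python: the bit positions are the architectural page-fault error-code bits for x86-64, while the names `PF_BITS` and `decode_pf_error` are made up for illustration and do not come from edk2.

```python
# Architectural bit positions of the x86-64 page-fault error code.
PF_BITS = {
    "P": 0,      # 1 = fault on a present page (protection-type fault)
    "W": 1,      # 1 = the faulting access was a write
    "U": 2,      # 1 = the access came from user mode
    "RSVD": 3,   # 1 = a reserved bit was set in a paging-structure entry
    "I": 4,      # 1 = the access was an instruction fetch
    "PK": 5,     # 1 = protection-key violation
    "SS": 6,     # 1 = shadow-stack access
    "SGX": 15,   # 1 = SGX-related violation
}


def decode_pf_error(code: int) -> dict:
    """Split a page-fault error code into named single-bit flags.

    Hypothetical helper for illustration only.
    """
    return {name: (code >> bit) & 1 for name, bit in PF_BITS.items()}


flags = decode_pf_error(0x9)  # ExceptionData from the dump in this thread
print(" ".join(f"{name}:{value}" for name, value in flags.items()))
```

With 0x9 only P and RSVD come out set, which matches the "R:1 ... P:1" fields printed by the firmware. As general background, bit 63 of a paging entry is treated as reserved while IA32_EFER.NXE is clear, which is one way a RSVD fault can be raised; whether that is what happens here is not established in this thread.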

-----Original Message-----
From: Andrew Fish <afish@...>
Sent: Friday, March 19, 2021 9:44 AM
To: Yao, Jiewen <jiewen.yao@...>
Cc: discuss@edk2.groups.io; annie.li@...; Laszlo Ersek
<lersek@...>; Wang, Jian J <jian.j.wang@...>; Aaron Young
<aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due
to commit 5267926

Stupid question? Is there a reason the page tables are not write protected and
the write to the page table would fault?

Thanks,

Andrew Fish






Andrew Fish
 

Yes it is very strange.

On Mar 18, 2021, at 7:17 PM, Yao, Jiewen <jiewen.yao@...> wrote:

Good question.
CR0.WP is set. But the page table protection may be turned OFF/ON again if the CPU driver needs to update the page tables to protect an EFI image. Maybe there is a bug somewhere.

I just read the final debug log to see the final result.
The RSVD bit in the exception data is weird. I think we need to confirm that first.











annie li
 

Hi Jiewen,

In the DumpCpuContext function in ArchExceptionHandler.c, the exception details are taken from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there. Is there any related info that can be retrieved from CR2? Can you enlighten me a little bit?

Thanks
Annie

-----Original Message-----
From: Yao, Jiewen [mailto:jiewen.yao@...]
Sent: Thursday, March 18, 2021 8:37 PM
To: discuss@edk2.groups.io; Annie Li <annie.li@...>; Laszlo Ersek <lersek@...>
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish <afish@...>; Aaron Young <aaron.young@...>; Yao, Jiewen <jiewen.yao@...>
Subject: RE: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Hi Arie
I added some of my thoughts in the Bugzilla. - https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump paging structure info for further analysis, we can help to check.







Andrew Fish
 

Annie,

CR3 points to the 1st level of the page tables (CR2 holds the faulting address). Those entries point to other page tables, so you kind of have to walk it by hand.

https://wiki.osdev.org/Paging
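
As a rough illustration of walking it by hand, the faulting linear address from the dump (CR2 = 000000003F2E2010) can be split into the table indices used at each level, starting from the top-level table whose base is in CR3 (000000003F401000). This is a minimal Python sketch that assumes 4-level paging with 4 KiB pages; a large-page mapping would end the walk one or two levels earlier. The helper names `split_linear_address` and `entry_address` are made up for illustration.

```python
def split_linear_address(va: int) -> dict:
    """Split a linear address into 4-level paging indices (4 KiB pages assumed).

    Hypothetical helper for illustration only.
    """
    return {
        "pml4": (va >> 39) & 0x1FF,   # index into the top-level table
        "pdpt": (va >> 30) & 0x1FF,   # index into the page-directory-pointer table
        "pd": (va >> 21) & 0x1FF,     # index into the page directory
        "pt": (va >> 12) & 0x1FF,     # index into the page table
        "offset": va & 0xFFF,         # byte offset inside the 4 KiB page
    }


def entry_address(table_base: int, index: int) -> int:
    """Physical address of the 8-byte entry `index` in the table at `table_base`."""
    return (table_base & ~0xFFF) + index * 8


CR2 = 0x3F2E2010  # faulting linear address, from the dump
CR3 = 0x3F401000  # top-level table base, from the dump

idx = split_linear_address(CR2)
pml4e_addr = entry_address(CR3, idx["pml4"])

print({name: hex(value) for name, value in idx.items()})
print(f"first entry to read: {pml4e_addr:#x}")
```

Each entry read this way holds the base address of the next table in its address bits, so the same `entry_address` step is repeated with the next index until the final entry for the faulting page is reached. That final entry is the one whose bits are worth posting for analysis.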


Thanks,

Andrew Fish

On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@...> wrote:

Hi Jiewen,

In the DumpCpuContext function in ArchExceptionHandler.c, the exception details are taken from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there. Is there any related info that can be retrieved from CR2? Can you enlighten me a little bit?

Thanks
Annie

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a windows booting failure issue(a page fault exception),
and narrow down it to the following patch,
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP is terminated by <ctrl-C> twice, see
following steps.

1. Boot Windows VM up, and <ctrl-C> to exit the QMP

2. Repeat 1

3. Boot Windows VM, and this page fault issue happens. (Note: Windows
should boot into recovery mode in this round, and this is due to the
previous two consecutive boot failure, see
https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During above 3 windows booting procedures, the value of following
variables are always the same,
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, Windows guest fails to boot up into recovery mode in the 3rd round
due to the patch above(5267926). I modified the return value to
"(PcdGetBool (PcdSetNxForStack)" in function "IsEnableNonExecNeeded"
in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, this page fault
issue is gone with this change. The patch(5267926) is for fixing bug
https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show
PcdImageProtectionPolicy needs also to enable NXE. But this does cause
the page fault exception in this scenario, any suggestion?

The page fault exception is pasted here,


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the
same toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with such an OVMF
build for which you have a matching "TerminalDxe.debug" file. Once you
do that, you can run "objdump" on the ".debug" file, and get a
disassembly of the TerminalDxe driver, inter-leaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register
dump, matches the (relative) "start address" that "objdump -f"
reports,

- we can take the crash offset (RIP - ImageBase), from the register
dump, and use that offset into the "objdump -S" disassembly, to narrow
down what the terminal driver may have been doing to trigger the crash.
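
As a worked example of the two offset calculations above, the following sketch plugs in the values from the register dump quoted in this thread (RIP, ImageBase, EntryPoint). The variable names are only illustrative, and it assumes the addresses shown by objdump are relative to the image base; the results still have to be compared against the actual objdump output, which is not reproduced here.

```python
# Values copied from the register dump in this thread.
IMAGE_BASE = 0x3E0A5000
ENTRY_POINT = 0x3E0A86E8
RIP = 0x3E0A7C75

# Relative entry point: compare with the "start address" from "objdump -f".
entry_offset = ENTRY_POINT - IMAGE_BASE

# Relative crash location: look this up in the "objdump -S" disassembly.
crash_offset = RIP - IMAGE_BASE

print(f"entry offset: {entry_offset:#x}")  # 0x36e8
print(f"crash offset: {crash_offset:#x}")  # 0x2c75
```

If the "start address" printed by "objdump -f" differs from the entry offset, the ".debug" file probably does not match the firmware binary that crashed.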

It's not necessarily the terminal driver's fault that we encounter a
crash, but knowing what TerminalDxe was up to might shed light on the
actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be
best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo





Andrew Fish
 

Annie,

I’ve never used it, but I assume there is a way to dump page tables from the QEMU console. Maybe something like `info mem` ?

Thanks,

Andrew Fish

On Mar 19, 2021, at 10:41 AM, Andrew Fish <afish@...> wrote:

Annie,

CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.

https://wiki.osdev.org/Paging


Thanks,

Andrew Fish





Laszlo Ersek
 

On 03/19/21 18:41, Andrew Fish wrote:
Annie,

CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.

I think that's a typo: it should be CR3.

But, I agree -- you can use QEMU monitor commands to read RAM words, and
walk the page tables manually. (I figured I could help with this, but I
couldn't reproduce the issue locally. I used the manual Recovery entry
-- click "Reboot" with Shift held down. For me the Windows VM just
entered recovery fine.)
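
A minimal sketch of how such a manual walk could start, assuming 4-level paging with 4 KiB pages (a large-page entry would end the walk earlier) and using the CR3 and CR2 values from the register dump in this thread. It only computes the table indices for the faulting address and the physical address of the first-level (PML4) entry, then prints a QEMU monitor `xp` command to read that entry. The helper name `split_va` is hypothetical.

```python
CR3 = 0x3F401000  # from the register dump: physical base of the PML4 table
CR2 = 0x3F2E2010  # from the register dump: faulting linear address

def split_va(va):
    """Split a linear address into 4-level paging indices (9 bits each)."""
    return {
        "pml4": (va >> 39) & 0x1FF,
        "pdpt": (va >> 30) & 0x1FF,
        "pd": (va >> 21) & 0x1FF,
        "pt": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }

idx = split_va(CR2)
for name, value in idx.items():
    print(f"{name:6} = {value:#x}")

# Each entry is 8 bytes; the low 12 bits of CR3 are flags, not address bits.
pml4_entry_addr = (CR3 & ~0xFFF) + 8 * idx["pml4"]

# QEMU monitor command that reads one 64-bit word of guest-physical memory.
print(f"xp /1gx {pml4_entry_addr:#x}")
```

For each following level, the physical address bits of the entry just read (bit 12 and up, excluding the flag bits at the top) give the base of the next table, and the matching index times 8 gives the entry to read next.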

Thanks,
Laszlo


https://wiki.osdev.org/Paging


Thanks,

Andrew Fish






annie li
 

Hello Laszlo,

Clicking "Reboot" with Shift held down works properly for me too; Windows can boot into recovery mode in my environment.
I just updated the bug with another, simpler way to reproduce this issue:
Press F8 on the boot entry to get to "Advanced Boot Options", and then select "Repair your computer". The issue happens every time I do this in my environment, and I no longer need to terminate the QEMU monitor twice with <ctrl-C> as before.

Thanks
Annie






annie li
 

Hello Laszlo,

I was thinking of CR2 because it holds the address that caused this exception, and we are supposed to find the information for the page that contains that address, no?
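
For context, CR2 and ExceptionData complement each other: CR2 gives the faulting linear address, while ExceptionData is the page-fault error code describing the access that faulted. A minimal sketch decoding the ExceptionData value from the dump in this thread, using the x86 page-fault error code bit layout and the dump's own one-letter labels; the function name `decode_pf_error` is hypothetical.

```python
EXCEPTION_DATA = 0x9  # from the register dump in this thread

# Page-fault error code bits, low to high, with the dump's abbreviations.
BITS = [
    ("P", 0, "fault on a present page (not a missing mapping)"),
    ("W", 1, "write access"),
    ("U", 2, "user-mode access"),
    ("R", 3, "reserved bit set in a paging-structure entry"),
    ("I", 4, "instruction fetch"),
]

def decode_pf_error(code):
    """Return the names of the error-code bits that are set."""
    return [name for name, bit, _ in BITS if code & (1 << bit)]

set_bits = decode_pf_error(EXCEPTION_DATA)
for name, bit, meaning in BITS:
    state = 1 if name in set_bits else 0
    print(f"{name}:{state}  {meaning if state else ''}")
```

For 0x9 this yields P and R, matching the "I:0 R:1 U:0 W:0 P:1" line of the dump: a supervisor-mode read that hit a present mapping with a reserved bit set in one of the paging-structure entries. That is why the entries reached from CR3 for the address in CR2 are the ones worth inspecting.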

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@...]
Sent: Friday, March 19, 2021 2:11 PM
To: Andrew Fish <afish@...>; Annie Li <annie.li@...>
Cc: Yao, Jiewen <jiewen.yao@...>; discuss@edk2.groups.io; Wang, Jian J <jian.j.wang@...>; Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

On 03/19/21 18:41, Andrew Fish wrote:
Annie,

CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.
I think that's a typo: it should be CR3.

But, I agree -- you can use QEMU monitor commands to read RAM words, and walk the page tables manually. (I figured I could help with this, but I couldn't reproduce the issue locally. I used the manual Recovery entry -- click "Reboot" with Shift held down. For me the Windows VM just entered recovery fine.)

Thanks,
Laszlo


https://wiki.osdev.org/Paging

Thanks,

Andrew Fish

On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@...> wrote:

Hi Jiewen,

In the DumpCpuContext function in ArchExceptionHandler.c, the exception details are taken from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there. Is there any related info that can be retrieved from CR2? Can you enlighten me a little bit?

Thanks
Annie

-----Original Message-----
From: Yao, Jiewen [mailto:jiewen.yao@...]
Sent: Thursday, March 18, 2021 8:37 PM
To: discuss@edk2.groups.io; Annie Li <annie.li@...>; Laszlo
Ersek <lersek@...>
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish
<afish@...>; Aaron Young <aaron.young@...>; Yao, Jiewen
<jiewen.yao@...>
Subject: RE: [edk2-discuss] Windows guest fails to boot into recovery
mode due to commit 5267926

Hi Annie,
I added some of my thoughts to the Bugzilla:
https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump paging structure info for further analysis, we can help to check.


-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of
annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@...>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish
<afish@...>; Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into
recovery mode due to commit 5267926

Hello Laszlo,

In my previous email, the exception was reproduced with a pretty old
code base, from where I started bisecting the commits. This time I
reproduced the issue with the code of the upstream branch 'stable/202011'.
All the logs I am collecting are from this code base (75ab038).
Since the overall size of the logs is pretty big, I'll attach all the
data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is
uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump output is uploaded as well, along with the details of the page fault
exception; please check the details there.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@...]
Sent: Thursday, March 18, 2021 9:23 AM
To: Annie Li <annie.li@...>; discuss@edk2.groups.io
Cc: jian.j.wang@...; Andrew Fish <afish@...>
Subject: Re: Windows guest fails to boot into recovery mode due to commit 5267926

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a Windows boot failure (a page fault exception),
and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP is terminated by <ctrl-C> twice;
see the following steps.

1. Boot Windows VM up, and <ctrl-C> to exit the QMP

2. Repeat 1

3. Boot Windows VM, and this page fault issue happens. (Note: Windows
should boot into recovery mode in this round, because of the
previous two consecutive boot failures; see
https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During the above 3 Windows boot attempts, the values of the following
variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the
3rd round due to the patch above (5267926). I modified the return value to
"PcdGetBool (PcdSetNxForStack)" in the function "IsEnableNonExecNeeded"
in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, and the page fault
issue is gone with this change. The patch (5267926) fixes bug
https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show
that PcdImageProtectionPolicy also needs NXE to be enabled. But this does
cause the page fault exception in this scenario. Any suggestions?
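
To make the difference between the two return values concrete, here is a rough Python model of the decision as this thread describes it. It is not the edk2 source: the real function is C, both function names below are invented for this sketch, and the exact expression in commit 5267926 (including any CPU capability check) is an assumption based only on the patch title and the description above.

```python
def nxe_needed_stack_only(set_nx_for_stack: bool) -> bool:
    """Model of the modified return value: only PcdSetNxForStack counts."""
    return bool(set_nx_for_stack)


def nxe_needed_all_pcds(set_nx_for_stack: bool,
                        dxe_nx_policy: int,
                        image_protection_policy: int) -> bool:
    """Assumed model of the patched behaviour: any NX related PCD counts."""
    return (bool(set_nx_for_stack)
            or dxe_nx_policy != 0
            or image_protection_policy != 0)


# PCD values reported in this thread.
pcds = {
    "set_nx_for_stack": False,        # PcdSetNxForStack 0
    "dxe_nx_policy": 0,               # PcdDxeNxMemoryProtectionPolicy 0
    "image_protection_policy": 2,     # PcdImageProtectionPolicy 2
}

print("stack-only model:", nxe_needed_stack_only(pcds["set_nx_for_stack"]))
print("all-PCDs model:  ", nxe_needed_all_pcds(**pcds))
```

Under these assumptions, the reported PCD values give different answers in the two models, which is consistent with the observation that changing the return value changes the behaviour.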

The page fault exception is pasted here,


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
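
The flag letters on the ExceptionData line correspond to bits of the x86 page-fault error code. The sketch below decodes the value from the dump; the bit positions follow the architectural definition of the error code, while the function name and the mapping of the dump's short labels (for example "R" for the reserved-bit flag) are my reading of the output, not something confirmed in this thread.

```python
# (bit position, label as printed in the dump, meaning when the bit is set)
PF_ERROR_BITS = [
    (0, "P", "fault on a present page (protection violation)"),
    (1, "W", "access was a write"),
    (2, "U", "access came from user mode"),
    (3, "R", "reserved bit set in a paging-structure entry"),
    (4, "I", "access was an instruction fetch"),
    (5, "PK", "protection-key violation"),
    (6, "SS", "shadow-stack access"),
    (15, "SGX", "SGX-specific access control violation"),
]


def decode_pf_error_code(code: int) -> dict:
    """Split a page-fault error code into the flags shown in the dump."""
    return {label: (code >> bit) & 1 for bit, label, _ in PF_ERROR_BITS}


flags = decode_pf_error_code(0x0000000000000009)
for bit, label, meaning in PF_ERROR_BITS:
    if flags[label]:
        print(f"{label} (bit {bit}): {meaning}")
```

For the value 9, only the P and R flags come out set, which agrees with the "R:1 ... P:1" fields printed by the exception handler.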

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the
same toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with such an
OVMF build for which you have a matching "TerminalDxe.debug" file.
Once you do that, you can run "objdump" on the ".debug" file, and
get a disassembly of the TerminalDxe driver, interleaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register
dump, matches the (relative) "start address" that "objdump -f"
reports,

- we can take the crash offset (RIP - ImageBase), from the register
dump, and use that offset into the "objdump -S" disassembly, to
narrow down what the terminal driver may have been doing to trigger the crash.

It's not necessarily the terminal driver's fault that we encounter a
crash, but knowing what TerminalDxe was up to might shed light on
the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be
best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo





annie li
 

Never mind, please ignore my previous email.
The QMP command 'xp' can dump memory; I'll try to reproduce this issue and dump the relevant memory.
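
Once a 64-bit paging-structure entry has been read with 'xp' (for example something like "xp /1gx <address>" in the QEMU monitor), its fields can be picked apart as sketched below. This is only an illustration: the function name is invented, the sample entry value is hypothetical and not taken from the bug, and the layout assumed is the standard x86-64 4-level paging entry format.

```python
# Bits 12..51 hold the physical address of the next table or of the page frame.
ADDR_MASK = 0x000F_FFFF_FFFF_F000


def decode_paging_entry(entry: int) -> dict:
    """Decode the commonly inspected fields of an x86-64 paging entry."""
    return {
        "present": bool(entry & (1 << 0)),
        "writable": bool(entry & (1 << 1)),
        "user": bool(entry & (1 << 2)),
        # Bit 7 means "large page" only in PDPT and PD entries;
        # in a last-level PT entry the same bit is the PAT bit.
        "bit7_ps_or_pat": bool(entry & (1 << 7)),
        # Bit 63 is execute-disable; it is only a valid flag when
        # IA32_EFER.NXE is set, otherwise it is a reserved bit.
        "nx": bool(entry & (1 << 63)),
        "address": entry & ADDR_MASK,
    }


# Hypothetical entry value, for illustration only.
sample = 0x000000003F402023
for name, value in decode_paging_entry(sample).items():
    shown = hex(value) if name == "address" else value
    print(f"{name}: {shown}")
```

An entry whose "present" field is false ends the walk at that level.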

Thanks
Annie






annie li
 

Hello

I added logging to the code to dump the page info, and also ran the QMP command "xp" to walk the page tables manually. The result shows that the corresponding page table entry doesn't exist. I updated the details in the bug; please take a look.

Thanks
Annie






Laszlo Ersek
 

On 03/19/21 20:43, Annie Li wrote:
Hello Laszlo,

I was thinking of CR2 because it holds the address that caused this exception, and we are supposed to find the information for the page in which that address is located, no?
CR2 is the address whose access faulted; I mentioned CR3 because Andrew
wrote "points the 1st level of the page tables". I understood that
expression as the point where you'd start walking the page tables
manually -- and that "root pointer" is in CR3.
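
As a rough illustration of how the two registers fit together in a manual walk: CR3 supplies the base of the top-level table, and the faulting address in CR2 supplies the index used at each level. The sketch assumes 4-level paging with 4 KiB pages (CR4 in the dump has bit 12, LA57, clear); the function names are invented for this example, and only the first entry address can be computed without reading guest memory.

```python
ADDR_MASK = 0x000F_FFFF_FFFF_F000   # bits 12..51: table base address
CR2 = 0x3F2E2010                    # faulting address, from the dump
CR3 = 0x3F401000                    # root of the page tables, from the dump


def split_virtual_address(va: int) -> dict:
    """Split a virtual address into its 4-level paging indices."""
    return {
        "pml4_index": (va >> 39) & 0x1FF,
        "pdpt_index": (va >> 30) & 0x1FF,
        "pd_index": (va >> 21) & 0x1FF,
        "pt_index": (va >> 12) & 0x1FF,
        "page_offset": va & 0xFFF,
    }


def entry_address(table_base: int, index: int) -> int:
    """Physical address of entry `index` in the table at `table_base`."""
    return (table_base & ADDR_MASK) + index * 8


indices = split_virtual_address(CR2)
for name, value in indices.items():
    print(f"{name}: {value:#x}")

# First word to read in the monitor; each later level needs the address
# field of the entry read at the previous level as its table base.
print(f"PML4 entry at: {entry_address(CR3, indices['pml4_index']):#x}")
```

The remaining levels repeat the same step with pdpt_index, pd_index and pt_index, stopping early if an entry is not present or maps a large page.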

Thanks
Laszlo


Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@...]
Sent: Friday, March 19, 2021 2:11 PM
To: Andrew Fish <afish@...>; Annie Li <annie.li@...>
Cc: Yao, Jiewen <jiewen.yao@...>; discuss@edk2.groups.io; Wang, Jian J <jian.j.wang@...>; Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

On 03/19/21 18:41, Andrew Fish wrote:
Annie,

CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.
I think that's a typo: it should be CR3.

But, I agree -- you can use QEMU monitor commands to read RAM words, and walk the page tables manually. (I figured I could help with this, but I couldn't reproduce the issue locally. I used the manual Recovery entry
-- click "Reboot" with Shift held down. For me the Windows VM just entered recovery fine.)

Thanks,
Laszlo


https://urldefense.com/v3/__https://wiki.osdev.org/Paging__;!!GqivPVa7
Brio!Ns258pdi0mlsnSN0oODh9wsYUe4PNDNF6avU8uN1wSySF8ktrFBAJma0qi5hWw$
<https://urldefense.com/v3/__https://wiki.osdev.org/Paging__;!!GqivPVa
7Brio!Ns258pdi0mlsnSN0oODh9wsYUe4PNDNF6avU8uN1wSySF8ktrFBAJma0qi5hWw$

Thanks,

Andrew Fish

On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@...> wrote:

Hi Jiewen,

In DumpCpuContext function in ArchExceptionHandler.c, the exception details are gotten from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there? Are there some related info that can be retrieved from CR2? can you enlighten me a little bit?

Thanks
Annie

-----Original Message-----
From: Yao, Jiewen [mailto:jiewen.yao@...]
Sent: Thursday, March 18, 2021 8:37 PM
To: discuss@edk2.groups.io; Annie Li <annie.li@...>; Laszlo
Ersek <lersek@...>
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish
<afish@...>; Aaron Young <aaron.young@...>; Yao, Jiewen
<jiewen.yao@...>
Subject: RE: [edk2-discuss] Windows guest fails to boot into recovery
mode due to commit 5267926

Hi Arie
I added some of my thought in the Bugzilla. -
https://urldefense.com/v3/__https://bugzilla.tianocore.org/show_bug.c
gi?id=3269__;!!GqivPVa7Brio!JMob8PcNWJxj_RZSIWy7iwhqFFhIYSwtnR_8i0X6V
-UzBkycx-iObkffqGNBrw$

If you can dump paging structure info for further analysis, we can help to check.


-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of
annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@...>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish
<afish@...>; Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into
recovery mode due to commit 5267926

Hello Laszlo,

In my previous email, the exception is reproduced with pretty old
code base from where I started bisecting the comments. This time I
reproduce this issue with the code of branch 'stable/202011' of
upstream. All the log I am collecting is from this code base(75ab038).
Since the overall size of all log is pretty big, I'll attach all the
data you required in to this bug(https://urldefense.com/v3/__https://bugzilla.tianocore.org/show_bug.cgi?id=3269__;!!GqivPVa7Brio!JMob8PcNWJxj_RZSIWy7iwhqFFhIYSwtnR_8i0X6V-UzBkycx-iObkffqGNBrw$ ).

I dump the register by qmp-regdump, and the result(regdump) is
uploaded into this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump is uploaded, as well as the details of page fault
exception, please check the details there.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@...]
Sent: Thursday, March 18, 2021 9:23 AM
To: Annie Li <annie.li@...>; discuss@edk2.groups.io
Cc: jian.j.wang@...; Andrew Fish <afish@...>
Subject: Re: Windows guest fails to boot into recovery mode due to
commit
5267926

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a windows booting failure issue(a page fault exception),
and narrow down it to the following patch,
MdeModulePkg/DxeIpl: support more NX related PCDs
https://urldefense.com/v3/__https://github.com/tianocore/edk2/commi
t
/5
267926134d17e86672b84fd57b438f05ffa68e1__;!!GqivPVa7Brio!PuMaBhjIGEd
V
v
lQi7PKC_FQeyIy-sjSaIZXk_W_MusXNUlQBxGqsJBRONWwKWw$

This issue always happens after QMP is terminated by <ctrl-C>
twice, see
following steps.

1. Boot Windows VM up, and <ctrl-C> to exit the QMP

2. Repeat 1

3. Boot Windows VM, and this page fault issue happens. (Note: Windows
should boot into recovery mode in this round, and this is due to the
previous two consecutive boot failure, see
https://urldefense.com/v3/__https://docs.microsoft.com/en-us/windows
-
hardware/manufacture/desktop/windows-recovery-environment--windows-r
e-
-
technical-reference*entry-points-into-
winre__;Iw!!GqivPVa7Brio!PuMaBhjIGEdVvlQi7PKC_FQeyIy-
sjSaIZXk_W_MusXNUlQBxGqsJBSkXMCNZA$ )

During the above 3 Windows boots, the values of the following
variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the
3rd round due to the patch above (5267926). I modified the return value to
"PcdGetBool (PcdSetNxForStack)" in function "IsEnableNonExecNeeded"
in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, and the page fault
issue is gone with this change. The patch (5267926) is for fixing bug
https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show
that PcdImageProtectionPolicy also needs NXE to be enabled. But this does
cause the page fault exception in this scenario; any suggestions?

The page fault exception is pasted here,


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
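
For reference, the flag letters on the ExceptionData line can be recomputed from the raw value. This is only a minimal Python sketch, assuming the standard x86 page-fault error-code layout (bit 0 P, bit 1 W, bit 2 U, bit 3 reserved-bit violation, bit 4 instruction fetch, bit 5 PK, bit 6 SS, bit 15 SGX). The helper name decode_pf_error_code is made up for illustration and is not part of edk2.

```python
# Bit positions of the x86 #PF error code, labeled the same way as the
# "I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0" line in the dump.
PF_ERROR_BITS = [
    (0, "P"),     # 1 = fault on a present page (not a missing mapping)
    (1, "W"),     # 1 = the access was a write
    (2, "U"),     # 1 = the access came from user mode
    (3, "R"),     # 1 = a reserved bit was set in a paging-structure entry
    (4, "I"),     # 1 = the access was an instruction fetch
    (5, "PK"),    # 1 = protection-key violation
    (6, "SS"),    # 1 = shadow-stack access
    (15, "SGX"),  # 1 = SGX-related violation
]


def decode_pf_error_code(code):
    """Hypothetical helper: map each flag name to its bit in `code`."""
    return {name: (code >> bit) & 1 for bit, name in PF_ERROR_BITS}


# ExceptionData from the dump above.
flags = decode_pf_error_code(0x0000000000000009)
print(" ".join(f"{name}:{value}" for name, value in flags.items()))
```

For 0x9 this gives P=1 and R=1 with every other flag 0, which matches the dump: the access hit a present mapping and the fault was flagged as a reserved-bit violation in a paging-structure entry.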

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the
same toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with such an
OVMF build for which you have a matching "TerminalDxe.debug" file.
Once you do that, you can run "objdump" on the ".debug" file, and
get a disassembly of the TerminalDxe driver, interleaved with the
C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register
dump, matches the (relative) "start address" that "objdump -f"
reports,

- we can take the crash offset (RIP - ImageBase), from the register
dump, and use that offset into the "objdump -S" disassembly, to
narrow down what the terminal driver may have been doing to trigger the crash.
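
The two offsets mentioned above can be worked out directly from the values in the register dump. This is a small Python sketch of that arithmetic only; whether the resulting numbers line up with the addresses that objdump prints depends on the ".debug" file using image-relative addresses, which is the assumption behind the comparison described above.

```python
# Values copied from the register dump in this thread.
image_base = 0x3E0A5000   # ImageBase of TerminalDxe
entry_point = 0x3E0A86E8  # EntryPoint of TerminalDxe
rip = 0x3E0A7C75          # faulting instruction pointer

# Relative entry point: compare with the "start address" from "objdump -f".
entry_offset = entry_point - image_base

# Relative crash location: look this offset up in the "objdump -S" output.
crash_offset = rip - image_base

print(f"EntryPoint - ImageBase = {entry_offset:#x}")
print(f"RIP - ImageBase        = {crash_offset:#x}")
```

With the values from this dump, the relative entry point comes out as 0x36e8 and the crash offset as 0x2c75.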

It's not necessarily the terminal driver's fault that we encounter a
crash, but knowing what TerminalDxe was up to might shed light on
the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be
best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo





annie li
 

Ah, thanks for the clarification.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@...]
Sent: Monday, March 22, 2021 11:17 AM
To: Annie Li <annie.li@...>; Andrew Fish <afish@...>
Cc: Yao, Jiewen <jiewen.yao@...>; discuss@edk2.groups.io; Wang, Jian J <jian.j.wang@...>; Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

On 03/19/21 20:43, Annie Li wrote:
Hello Laszlo,

I was thinking of CR2 because it holds the address that caused this exception, and we are supposed to find out the information about the page in which that address is located, no?
CR2 is the address whose access faulted; I mentioned CR3 because Andrew wrote "points the 1st level of the page tables". I understood that expression as the point where you'd start walking the page tables manually -- and that "root pointer" is in CR3.

Thanks
Laszlo


Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@...]
Sent: Friday, March 19, 2021 2:11 PM
To: Andrew Fish <afish@...>; Annie Li <annie.li@...>
Cc: Yao, Jiewen <jiewen.yao@...>; discuss@edk2.groups.io; Wang,
Jian J <jian.j.wang@...>; Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery
mode due to commit 5267926

On 03/19/21 18:41, Andrew Fish wrote:
Annie,

CR2 points the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.
I think that's a typo: it should be CR3.

But, I agree -- you can use QEMU monitor commands to read RAM words,
and walk the page tables manually. (I figured I could help with this,
but I couldn't reproduce the issue locally. I used the manual Recovery
entry -- click "Reboot" with Shift held down. For me the Windows VM just
entered recovery fine.)
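
As a starting point for such a manual walk, the faulting address from CR2 can be split into its table indices, and the address of the first entry to read follows from CR3. This is only a Python sketch, assuming 4-level paging with 8-byte entries (CR4 = 0x40668 in the dump has the LA57 bit, bit 12, clear). The helper names table_indices and entry_address are made up for illustration. Each later table base has to come from the entry actually read out of guest RAM, for example with the QEMU monitor "xp" command, so only the first step can be computed from the dump alone.

```python
# Bits 12..51 of CR3 or of a table entry hold the physical address of
# the next-level table.
ADDR_MASK = 0x000F_FFFF_FFFF_F000


def table_indices(va):
    """Hypothetical helper: split a 4-level virtual address into indices."""
    return {
        "pml4": (va >> 39) & 0x1FF,
        "pdpt": (va >> 30) & 0x1FF,
        "pd": (va >> 21) & 0x1FF,
        "pt": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }


def entry_address(table_base, index):
    """Physical address of the 8-byte entry `index` in the given table."""
    return (table_base & ADDR_MASK) + index * 8


cr2 = 0x3F2E2010  # faulting address, from the dump
cr3 = 0x3F401000  # root of the page tables, from the dump

idx = table_indices(cr2)
pml4e_addr = entry_address(cr3, idx["pml4"])

print({name: hex(value) for name, value in idx.items()})
print(f"read the PML4 entry at physical address {pml4e_addr:#x}")
# Next steps (need guest RAM contents, not available from the dump):
#   pdpte_addr = entry_address(<PML4 entry value>, idx["pdpt"])
#   pde_addr   = entry_address(<PDPT entry value>, idx["pd"])
#   pte_addr   = entry_address(<PD entry value>,   idx["pt"])
# The walk stops early if a PDPT or PD entry has bit 7 (PS) set, because
# that entry then maps a 1 GiB or 2 MiB page directly.
```

For the address in this dump the indices come out as PML4 0, PDPT 0, PD 0x1f9, PT 0xe2, with page offset 0x10, and the first entry to read sits at physical address 0x3f401000.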

Thanks,
Laszlo


https://wiki.osdev.org/Paging

Thanks,

Andrew Fish

On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@...> wrote:

Hi Jiewen,

In DumpCpuContext function in ArchExceptionHandler.c, the exception details are gotten from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there? Are there some related info that can be retrieved from CR2? can you enlighten me a little bit?

Thanks
Annie

-----Original Message-----
From: Yao, Jiewen [mailto:jiewen.yao@...]
Sent: Thursday, March 18, 2021 8:37 PM
To: discuss@edk2.groups.io; Annie Li <annie.li@...>; Laszlo
Ersek <lersek@...>
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish
<afish@...>; Aaron Young <aaron.young@...>; Yao, Jiewen
<jiewen.yao@...>
Subject: RE: [edk2-discuss] Windows guest fails to boot into
recovery mode due to commit 5267926

Hi Annie
I added some of my thoughts in the Bugzilla:
https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump paging structure info for further analysis, we can help to check.


-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of
annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@...>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@...>; Andrew Fish
<afish@...>; Aaron Young <aaron.young@...>
Subject: Re: [edk2-discuss] Windows guest fails to boot into
recovery mode due to commit 5267926

Hello Laszlo,

In my previous email, the exception was reproduced with a pretty old
code base from where I started bisecting the commits. This time I
reproduced this issue with the code of branch 'stable/202011' of
upstream. All the logs I am collecting are from this code base (75ab038).
Since the overall size of the logs is pretty big, I'll attach all
the data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is
uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump output is uploaded as well, along with the details of the page fault
exception; please check the details there.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@...]
Sent: Thursday, March 18, 2021 9:23 AM
To: Annie Li <annie.li@...>; discuss@edk2.groups.io
Cc: jian.j.wang@...; Andrew Fish <afish@...>
Subject: Re: Windows guest fails to boot into recovery mode due to commit 5267926

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a Windows boot failure (a page fault exception),
and narrowed it down to the following patch,
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP is terminated by <ctrl-C> twice,
see the following steps.

1. Boot Windows VM up, and <ctrl-C> to exit the QMP

2. Repeat 1

3. Boot Windows VM, and this page fault issue happens. (Note: Windows
should boot into recovery mode in this round, and this is due to the
previous two consecutive boot failures, see
https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During the above 3 Windows boots, the values of the following
variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the
3rd round due to the patch above (5267926). I modified the return value to
"PcdGetBool (PcdSetNxForStack)" in function "IsEnableNonExecNeeded"
in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, and the page fault
issue is gone with this change. The patch (5267926) is for fixing bug
https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show
that PcdImageProtectionPolicy also needs NXE to be enabled. But this does
cause the page fault exception in this scenario; any suggestions?

The page fault exception is pasted here,


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and
the same toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with such an
OVMF build for which you have a matching "TerminalDxe.debug" file.
Once you do that, you can run "objdump" on the ".debug" file, and
get a disassembly of the TerminalDxe driver, interleaved with the
C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register
dump, matches the (relative) "start address" that "objdump -f"
reports,

- we can take the crash offset (RIP - ImageBase), from the register
dump, and use that offset into the "objdump -S" disassembly, to
narrow down what the terminal driver may have been doing to trigger the crash.

It's not necessarily the terminal driver's fault that we encounter a
crash, but knowing what TerminalDxe was up to might shed light on
the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be
best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo