
Re: Windows guest fails to boot into recovery mode due to commit 5267926

annie li
 

Never mind, please ignore my previous email.
The QEMU monitor (HMP) command 'xp' can dump guest physical memory. I'll try to reproduce this issue and dump the memory.

Thanks
Annie

-----Original Message-----
From: discuss@edk2.groups.io [mailto:discuss@edk2.groups.io] On Behalf Of annie li
Sent: Friday, March 19, 2021 3:44 PM
To: Laszlo Ersek <lersek@redhat.com>; Andrew Fish <afish@apple.com>
Cc: Yao, Jiewen <jiewen.yao@intel.com>; discuss@edk2.groups.io; Wang, Jian J <jian.j.wang@intel.com>; Aaron Young <aaron.young@oracle.com>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Hello Laszlo,

I was thinking of CR2 because it holds the address that caused this exception, and we are supposed to find the information about the page in which that address is located, no?

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Friday, March 19, 2021 2:11 PM
To: Andrew Fish <afish@apple.com>; Annie Li <annie.li@oracle.com>
Cc: Yao, Jiewen <jiewen.yao@intel.com>; discuss@edk2.groups.io; Wang, Jian J <jian.j.wang@intel.com>; Aaron Young <aaron.young@oracle.com>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

On 03/19/21 18:41, Andrew Fish wrote:
Annie,

CR2 points to the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.
I think that's a typo: it should be CR3.

But, I agree -- you can use QEMU monitor commands to read RAM words, and walk the page tables manually. (I figured I could help with this, but I couldn't reproduce the issue locally. I used the manual Recovery entry
-- click "Reboot" with Shift held down. For me the Windows VM just entered recovery fine.)
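Walking the tables by hand, as suggested above, means splitting the faulting linear address (CR2) into four 9-bit indices and following each entry's physical address, starting from CR3. The sketch below is a minimal illustration, not edk2 code. It assumes 4-level paging (CR4.LA57 is clear in the dump, CR4 = 0x40668), and the read_phys callback and the fake_ram contents are hypothetical. Against a live guest, each read_phys call corresponds to one "xp /gx ADDR" in the QEMU monitor. For CR3 = 0x3F401000 and CR2 = 0x3F2E2010 from the dump, the indices are 0, 0, 505 and 226; only the table contents are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 4-level paging-structure entry bits (Intel SDM Vol. 3A, chapter 4). */
#define PTE_PRESENT   (1ULL << 0)
#define PTE_PS        (1ULL << 7)            /* 1 GiB page in a PDPTE, 2 MiB page in a PDE */
#define PTE_XD        (1ULL << 63)           /* execute-disable */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL  /* physical address, bits 51:12 */

/* Hypothetical accessor for one 64-bit word of guest-physical memory.
   On a live guest, each call corresponds to one "xp /gx ADDR" monitor command. */
typedef uint64_t (*read_phys_fn)(uint64_t paddr);

/* Walks the tables rooted at CR3 for LINEAR, printing every entry it reads.
   Returns the translated physical address, or UINT64_MAX if an entry is not present. */
static uint64_t walk(uint64_t cr3, uint64_t linear, read_phys_fn read_phys)
{
  static const char *const names[] = { "PML4E", "PDPTE", "PDE", "PTE" };
  uint64_t table = cr3 & PTE_ADDR_MASK;

  assert(read_phys != NULL);
  for (int level = 0; level < 4; level++) {
    unsigned shift = 39 - 9 * level;  /* 39, 30, 21, 12 */
    unsigned index = (unsigned)((linear >> shift) & 0x1FF);
    uint64_t entry_addr = table + 8 * (uint64_t)index;
    uint64_t entry = read_phys(entry_addr);

    printf("%-5s[%3u] @ 0x%010llx = 0x%016llx%s\n", names[level], index,
           (unsigned long long)entry_addr, (unsigned long long)entry,
           (entry & PTE_XD) ? "  XD" : "");
    if (!(entry & PTE_PRESENT))
      return UINT64_MAX;
    if ((level == 1 || level == 2) && (entry & PTE_PS)) {
      /* Large page: the remaining low bits of LINEAR are the page offset. */
      uint64_t page_mask = (1ULL << shift) - 1;
      return (entry & PTE_ADDR_MASK & ~page_mask) | (linear & page_mask);
    }
    table = entry & PTE_ADDR_MASK;  /* next table, or the 4 KiB frame after the PTE */
  }
  return table | (linear & 0xFFF);
}

/* Invented table contents standing in for guest RAM (illustration only). */
static const struct { uint64_t addr, value; } fake_ram[] = {
  { 0x3F401000ULL, 0x3F402003ULL },  /* PML4E[0] -> PDPT at 0x3F402000 */
  { 0x3F402000ULL, 0x3F403003ULL },  /* PDPTE[0] -> PD at 0x3F403000 */
  { 0x3F403FC8ULL, 0x3F404003ULL },  /* PDE[505] -> PT at 0x3F404000 */
  { 0x3F404710ULL, 0x3F2E2003ULL },  /* PTE[226] -> 4 KiB frame at 0x3F2E2000 */
};

static uint64_t fake_read_phys(uint64_t paddr)
{
  for (size_t i = 0; i < sizeof fake_ram / sizeof fake_ram[0]; i++)
    if (fake_ram[i].addr == paddr)
      return fake_ram[i].value;
  return 0;  /* everything else reads back as a non-present entry */
}
```

Calling walk(0x3F401000, 0x3F2E2010, fake_read_phys) prints the four entries and returns 0x3F2E2010. With real guest memory, the entries printed along the way are the paging-structure information needed for the analysis.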

Thanks,
Laszlo


https://wiki.osdev.org/Paging

Thanks,

Andrew Fish

On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@oracle.com> wrote:

Hi Jiewen,

In the DumpCpuContext function in ArchExceptionHandler.c, the exception details are obtained from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page information there. Is there related information that can be retrieved from CR2? Can you enlighten me a little?

Thanks
Annie

-----Original Message-----
From: Yao, Jiewen [mailto:jiewen.yao@intel.com]
Sent: Thursday, March 18, 2021 8:37 PM
To: discuss@edk2.groups.io; Annie Li <annie.li@oracle.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Wang, Jian J <jian.j.wang@intel.com>; Andrew Fish <afish@apple.com>; Aaron Young <aaron.young@oracle.com>; Yao, Jiewen <jiewen.yao@intel.com>
Subject: RE: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Hi Annie,
I added some of my thoughts in the Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump paging structure info for further analysis, we can help to check.


-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@redhat.com>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@intel.com>; Andrew Fish <afish@apple.com>; Aaron Young <aaron.young@oracle.com>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Hello Laszlo,

In my previous email, the exception was reproduced with a pretty old code base, from where I started bisecting the commits. This time I reproduced the issue with the code of the upstream branch 'stable/202011'. All the logs I am collecting are from this code base (75ab038). Since the logs are quite large overall, I'll attach all the data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump output is uploaded as well, along with the details of the page fault exception; please check the details there.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, March 18, 2021 9:23 AM
To: Annie Li <annie.li@oracle.com>; discuss@edk2.groups.io
Cc: jian.j.wang@intel.com; Andrew Fish <afish@apple.com>
Subject: Re: Windows guest fails to boot into recovery mode due to commit 5267926

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a Windows boot failure (a page fault exception) and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP is terminated with <ctrl-C> twice; see the following steps.

1. Boot the Windows VM, and press <ctrl-C> to exit QMP.

2. Repeat 1

3. Boot the Windows VM again; the page fault occurs. (Note: Windows should boot into recovery mode in this round because of the previous two consecutive boot failures; see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre )

During the above three Windows boots, the values of the following PCDs are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the 3rd round because of the patch above (5267926). When I change the return value of the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c to "PcdGetBool (PcdSetNxForStack)", the page fault goes away. The patch (5267926) fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, whose comments show that PcdImageProtectionPolicy also needs NXE to be enabled. But this does cause the page fault exception in this scenario; any suggestions?

The page fault exception is pasted here:


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
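ExceptionData is the page-fault error code pushed by the CPU, and the I/R/U/W/P/PK/SS/SGX legend above is its bit-by-bit decoding (R stands for RSVD). A small sketch, assuming the error-code layout documented in the Intel SDM, makes the mapping explicit: 0x9 sets only P and RSVD, i.e. a supervisor-mode data read that hit a reserved bit in some paging-structure entry. One reserved bit worth keeping in mind in an NX discussion: per the SDM, bit 63 (XD) of a paging entry is reserved while IA32_EFER.NXE is 0. The dump alone does not say which entry or bit triggered the fault. The helper name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Page-fault error code bits (Intel SDM Vol. 3A, "Interrupt 14 - Page-Fault Exception"). */
#define PF_P    (1u << 0)   /* 0: non-present page; 1: protection violation */
#define PF_WR   (1u << 1)   /* 1: the access was a write */
#define PF_US   (1u << 2)   /* 1: user-mode access */
#define PF_RSVD (1u << 3)   /* 1: reserved bit set in a paging-structure entry */
#define PF_ID   (1u << 4)   /* 1: instruction fetch */
#define PF_PK   (1u << 5)   /* 1: protection-key violation */
#define PF_SS   (1u << 6)   /* 1: shadow-stack access */
#define PF_SGX  (1u << 15)  /* 1: SGX-specific access-control violation */

/* Formats EC with the same letters and order as the firmware's dump line.
   Returns the snprintf() result (number of characters that would be written). */
static int format_pf_error_code(uint64_t ec, char *buf, size_t len)
{
  assert(buf != NULL && len > 0);
  return snprintf(buf, len, "I:%d R:%d U:%d W:%d P:%d PK:%d SS:%d SGX:%d",
                  !!(ec & PF_ID), !!(ec & PF_RSVD), !!(ec & PF_US),
                  !!(ec & PF_WR), !!(ec & PF_P), !!(ec & PF_PK),
                  !!(ec & PF_SS), !!(ec & PF_SGX));
}
```

Formatting 0x9 into a 64-byte buffer reproduces the "I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0" string from the dump.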

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the
same toolchain, and the same "build" flags.

(2) Reproduce the issue and capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file and get a disassembly of the TerminalDxe driver, interleaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register
dump, matches the (relative) "start address" that "objdump -f"
reports,

- we can take the crash offset (RIP - ImageBase), from the register
dump, and use that offset into the "objdump -S" disassembly, to
narrow down what the terminal driver may have been doing to trigger the crash.
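Both comparisons are plain subtractions. The helper below is a trivial sketch of that arithmetic; it assumes, as the comparison with the "start address" printed by "objdump -f" implies, that the addresses in TerminalDxe.debug are relative to the image base. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Image-relative offset of ADDR (hypothetical helper). */
static uint64_t image_offset(uint64_t addr, uint64_t image_base)
{
  assert(addr >= image_base);
  return addr - image_base;
}

/* Prints the two offsets to compare against the objdump output. */
static void print_offsets(uint64_t rip, uint64_t entry_point, uint64_t image_base)
{
  printf("EntryPoint - ImageBase = 0x%llx (compare with objdump -f \"start address\")\n",
         (unsigned long long)image_offset(entry_point, image_base));
  printf("RIP - ImageBase        = 0x%llx (look up in objdump -S output)\n",
         (unsigned long long)image_offset(rip, image_base));
}
```

With the values from the dump, print_offsets(0x3E0A7C75, 0x3E0A86E8, 0x3E0A5000) reports an entry point offset of 0x36e8 and a faulting RIP offset of 0x2c75.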

It's not necessarily the terminal driver's fault that we encounter a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be
best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo





Re: Google Summer of Code 2021 interested student

Laszlo Ersek
 

CC Nate

On 03/17/21 21:16, Ayush Dwivedi wrote:
Hello everyone,
My name is Ayush Dwivedi. I am a third-year B.Tech student in Computer Science and Engineering. I am interested in operating systems, firmware and game programming. I have written some personal projects in C and C++ and have experimented with x86_64 assembly. Recently I have been using QEMU with the OVMF UEFI firmware, which I built from source using edk2. I have tried to write my own EFI applications (in an attempt to understand how UEFI differs from BIOS). I wish to be part of the TianoCore community and want to contribute. I would like to know what skills and knowledge are needed for the task "MinPlatform Qemu Support". The task description says that MinPlatform needs to be ported to QEMU, so I have started looking into OvmfPkg (since it already runs on QEMU) and Platform/Intel/MinPlatformPkg, but the edk2 and edk2-platforms source trees are huge, and I am unsure how and where I should start. I am looking forward to guidance from the community.

Thank you for your precious time.

Regards,
Ayush Dwivedi





Re: Windows guest fails to boot into recovery mode due to commit 5267926

annie li
 

Hello Laszlo,

I was thinking of CR2 because it holds the address that caused this exception, and we are supposed to find the information about the page in which that address is located, no?

Thanks
Annie






Re: Windows guest fails to boot into recovery mode due to commit 5267926

annie li
 

Hello Laszlo,

Clicking "Reboot" with Shift held down works properly for me too; Windows boots into recovery mode in my environment.
I just updated the bug with another, simpler way to reproduce this issue:
Press F8 on the boot entry to enter "Advanced Boot Options", and then select "Repair your computer". This issue happens every time I do this in my environment, and I no longer need to terminate the QEMU monitor twice with <ctrl-C> as before.

Thanks
Annie






Re: Windows guest fails to boot into recovery mode due to commit 5267926

Laszlo Ersek
 

On 03/19/21 18:41, Andrew Fish wrote:
Annie,

CR2 points to the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.
I think that's a typo: it should be CR3.

But, I agree -- you can use QEMU monitor commands to read RAM words, and
walk the page tables manually. (I figured I could help with this, but I
couldn't reproduce the issue locally. I used the manual Recovery entry
-- click "Reboot" with Shift held down. For me the Windows VM just
entered recovery fine.)

Thanks,
Laszlo


https://wiki.osdev.org/Paging


Thanks,

Andrew Fish

On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@oracle.com> wrote:

Hi Jiewen,

In DumpCpuContext function in ArchExceptionHandler.c, the exception details are gotten from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there? Are there some related info that can be retrieved from CR2? can you enlighten me a little bit?

Thanks
Annie

-----Original Message-----
From: Yao, Jiewen [mailto:jiewen.yao@intel.com]
Sent: Thursday, March 18, 2021 8:37 PM
To: discuss@edk2.groups.io; Annie Li <annie.li@oracle.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Wang, Jian J <jian.j.wang@intel.com>; Andrew Fish <afish@apple.com>; Aaron Young <aaron.young@oracle.com>; Yao, Jiewen <jiewen.yao@intel.com>
Subject: RE: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Hi Arie
I added some of my thought in the Bugzilla. - https://urldefense.com/v3/__https://bugzilla.tianocore.org/show_bug.cgi?id=3269__;!!GqivPVa7Brio!JMob8PcNWJxj_RZSIWy7iwhqFFhIYSwtnR_8i0X6V-UzBkycx-iObkffqGNBrw$

If you can dump paging structure info for further analysis, we can help to check.


-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of
annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@redhat.com>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@intel.com>; Andrew Fish
<afish@apple.com>; Aaron Young <aaron.young@oracle.com>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery
mode due to commit 5267926

Hello Laszlo,

In my previous email, the exception was reproduced with a pretty old code base, from where I started bisecting the commits. This time I reproduced the issue with the code of the upstream branch 'stable/202011'. All the logs I am collecting are from this code base (75ab038). Since the overall size of the logs is pretty big, I'll attach all the data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump is uploaded, as well as the details of the page fault exception; please check the details there.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, March 18, 2021 9:23 AM
To: Annie Li <annie.li@oracle.com>; discuss@edk2.groups.io
Cc: jian.j.wang@intel.com; Andrew Fish <afish@apple.com>
Subject: Re: Windows guest fails to boot into recovery mode due to commit 5267926

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a Windows boot failure (a page fault exception), and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP has been terminated with <ctrl-C> twice; see the following steps.

1. Boot the Windows VM, and press <ctrl-C> to exit QMP.

2. Repeat step 1.

3. Boot the Windows VM; the page fault happens. (Note: in this round Windows should boot into recovery mode because of the two previous consecutive boot failures; see
https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre )

During the three Windows boots above, the values of the following PCDs are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the 3rd round because of the patch above (5267926). When I change the return value of the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c to "PcdGetBool (PcdSetNxForStack)", the page fault is gone. The patch (5267926) fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, whose comments say that PcdImageProtectionPolicy also needs NXE to be enabled. But this does cause the page fault exception in this scenario. Any suggestions?
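The effect of the three PCD values can be made concrete with a simplified C model of the decision. It assumes, as the description above suggests, that after 5267926 NXE is enabled if any one of the three NX-related PCDs asks for it. This is not the edk2 source. It leaves out any CPU-capability check the real IsEnableNonExecNeeded() may perform, and both function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model (NOT the edk2 source) of whether DxeIpl turns on IA32_EFER.NXE.
   Assumption: after commit 5267926, any one of the three NX-related PCDs is enough.
   Any CPU-capability check done by the real function is intentionally omitted. */
static bool NxNeededAfter5267926(bool set_nx_for_stack, uint64_t dxe_nx_policy,
                                 uint32_t image_protection_policy)
{
  return set_nx_for_stack || dxe_nx_policy != 0 || image_protection_policy != 0;
}

/* The local experiment described above: only PcdSetNxForStack is consulted. */
static bool NxNeededLocalChange(bool set_nx_for_stack)
{
  return set_nx_for_stack;
}
```

With the values reported above (FALSE, 0, 2), the first model returns true because PcdImageProtectionPolicy is non-zero, while the second returns false. That is consistent with the page fault disappearing when only PcdSetNxForStack is consulted.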

The page fault exception is pasted here:


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll
(ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
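The ExceptionData value in this dump is the x86 page-fault error code. The edk2 dump prints its RSVD bit as "R". Below is a minimal C sketch that decodes the code, using the bit positions from the Intel SDM (Vol. 3A, "Paging"). The struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* #PF error code bits (Intel SDM Vol. 3A). */
typedef struct {
  bool P;    /* bit 0:  0 = not-present page, 1 = other cause (protection or reserved bit) */
  bool W;    /* bit 1:  the access was a write */
  bool U;    /* bit 2:  the access was made in user mode (CPL 3) */
  bool RSVD; /* bit 3:  a reserved bit was set in a paging-structure entry */
  bool I;    /* bit 4:  the access was an instruction fetch */
  bool PK;   /* bit 5:  protection-key violation */
  bool SS;   /* bit 6:  shadow-stack access */
  bool SGX;  /* bit 15: SGX-specific access-control violation */
} PageFaultErrorCode;

static PageFaultErrorCode DecodePageFaultErrorCode(uint64_t code)
{
  PageFaultErrorCode e;
  e.P    = (code >> 0) & 1;
  e.W    = (code >> 1) & 1;
  e.U    = (code >> 2) & 1;
  e.RSVD = (code >> 3) & 1;
  e.I    = (code >> 4) & 1;
  e.PK   = (code >> 5) & 1;
  e.SS   = (code >> 6) & 1;
  e.SGX  = (code >> 15) & 1;
  return e;
}
```

DecodePageFaultErrorCode(0x9) yields P = 1 and RSVD = 1, with every other flag clear. In other words, this was a supervisor-mode data read (W:0, U:0, I:0) whose translation hit a reserved bit. Per the SDM, RSVD can only be reported together with P = 1, because reserved bits are not checked in non-present entries.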

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the
same toolchain, and the same "build" flags.

(2) Reproduce the issue and capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file and get a disassembly of the TerminalDxe driver, interleaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register
dump, matches the (relative) "start address" that "objdump -f"
reports,

- we can take the crash offset (RIP - ImageBase), from the register
dump, and use that offset into the "objdump -S" disassembly, to narrow
down what the terminal driver may have been doing to trigger the crash.
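Both checks above are plain subtractions. Below is a minimal C sketch using the values from the register dump above. The macro and function names are illustrative.

```c
#include <stdint.h>

/* Values copied from the register dump above. */
#define IMAGE_BASE   UINT64_C(0x3E0A5000)
#define ENTRY_POINT  UINT64_C(0x3E0A86E8)
#define FAULTING_RIP UINT64_C(0x3E0A7C75)

/* Converts an absolute address inside the loaded image to an image-relative
   offset. Such an offset is comparable with addresses in the objdump listing
   only if (EntryPoint - ImageBase) matches the start address that
   "objdump -f" reports; that is the first check described above. */
static uint64_t ImageRelative(uint64_t absolute, uint64_t image_base)
{
  return absolute - image_base;
}
```

This gives 0x36E8 for the entry point and 0x2C75 for the faulting RIP. If "objdump -f" reports a matching start address, the instruction at offset 0x2c75 in the "objdump -S" output should be the one that faulted.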

It's not necessarily the terminal driver's fault that it encounters a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be
best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo





Re: Windows guest fails to boot into recovery mode due to commit 5267926

Andrew Fish
 

Annie,

I’ve never used it, but I assume there is a way to dump page tables from the QEMU console. Maybe something like `info mem`?

Thanks,

Andrew Fish

On Mar 19, 2021, at 10:41 AM, Andrew Fish <afish@apple.com> wrote:

Annie,

CR2 points to the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.

https://wiki.osdev.org/Paging


Thanks,

Andrew Fish





Re: Windows guest fails to boot into recovery mode due to commit 5267926

Andrew Fish
 

Annie,

CR2 points to the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.

https://wiki.osdev.org/Paging


Thanks,

Andrew Fish
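Walking the tables by hand repeats one step per level:

- The table base comes from CR3 for the first level, or from bits 51:12 of the previous entry.
- The index comes from the faulting address (CR2).
- The walk stops at a non-present entry or at a large page (PS set in a PDPTE or PDE).

Below is a minimal C sketch of that loop. The memory reader is a callback, so it could be backed by anything that returns guest-physical memory, for instance values parsed from the QEMU monitor's xp command. That glue is hypothetical and not shown, and all names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_P     (UINT64_C(1) << 0)           /* entry is present */
#define PTE_PS    (UINT64_C(1) << 7)           /* 1 GiB page in a PDPTE, 2 MiB page in a PDE */
#define ADDR_MASK UINT64_C(0x000FFFFFFFFFF000) /* bits 51:12: next-level table address */

/* Reads one 64-bit word of guest-physical memory (hypothetical interface). */
typedef uint64_t (*ReadPhys64Fn)(void *ctx, uint64_t paddr);

/* Walks 4-level paging for linear address 'la', starting at the table in CR3.
   Visited entries go to entries[0..3] (PML4E, PDPTE, PDE, PTE). Returns the
   number of levels visited; stops early at a non-present entry or large page. */
static int WalkPageTables(ReadPhys64Fn read, void *ctx, uint64_t cr3,
                          uint64_t la, uint64_t entries[4])
{
  static const int kShift[4] = { 39, 30, 21, 12 };
  uint64_t table = cr3 & ADDR_MASK;

  for (int level = 0; level < 4; level++) {
    uint64_t index = (la >> kShift[level]) & 0x1FF;
    uint64_t entry = read(ctx, table + 8 * index);

    entries[level] = entry;
    if ((entry & PTE_P) == 0)
      return level + 1;                  /* not present: translation ends here */
    if ((level == 1 || level == 2) && (entry & PTE_PS) != 0)
      return level + 1;                  /* large page maps 'la' directly */
    table = entry & ADDR_MASK;           /* descend to the next-level table */
  }
  return 4;
}

/* Example reader over a flat buffer standing in for guest RAM at 'base'. */
typedef struct {
  const uint64_t *words;
  uint64_t base;
  size_t count;
} FlatMemory;

static uint64_t ReadFlat(void *ctx, uint64_t paddr)
{
  const FlatMemory *m = ctx;
  uint64_t i = (paddr - m->base) / 8;    /* wraps to a huge value if paddr < base */
  return i < m->count ? m->words[i] : 0; /* out-of-range reads look "not present" */
}
```

The values collected in entries[] are the kind of paging-structure information requested in this thread. The flag bits of each entry (P, R/W, U/S, PS, XD) can be inspected directly from them.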






Re: Windows guest fails to boot into recovery mode due to commit 5267926

annie li
 

Hi Jiewen,

In the DumpCpuContext function in ArchExceptionHandler.c, the exception details are obtained from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". How can I dump the page information there? Is there any related information that can be retrieved from CR2? Can you enlighten me a little bit?

Thanks
Annie






Re: Windows guest fails to boot into recovery mode due to commit 5267926

Andrew Fish
 

Yes, it is very strange.

On Mar 18, 2021, at 7:17 PM, Yao, Jiewen <jiewen.yao@intel.com> wrote:

Good question.
CR0.WP is set. But the page table protection may be turned OFF and ON again if the CPU driver needs to update the page tables to protect an EFI image. Maybe it is a bug somewhere.
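The statement that CR0.WP is set can be checked against the register dump in this thread. Below is a minimal C sketch using bit positions from the Intel SDM; the macro names are made up. Two caveats apply. First, the dump captures CR0 only at the moment of the fault, so it cannot show whether WP was cleared and set again earlier. Second, CR0.WP governs supervisor writes, while the error code in the dump reports a read (W:0).

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_WP  (UINT64_C(1) << 16) /* write protect: supervisor writes honour read-only pages */
#define CR0_PG  (UINT64_C(1) << 31) /* paging enabled */
#define CR4_PAE (UINT64_C(1) << 5)  /* PAE paging structures (required for 4-level paging) */

/* Values copied from the register dump in this thread. */
#define DUMP_CR0 UINT64_C(0x0000000080010033)
#define DUMP_CR4 UINT64_C(0x0000000000040668)

/* True if every bit in 'flag' is set in 'reg'. */
static bool FlagSet(uint64_t reg, uint64_t flag)
{
  return (reg & flag) == flag;
}
```

FlagSet(DUMP_CR0, CR0_WP), FlagSet(DUMP_CR0, CR0_PG) and FlagSet(DUMP_CR4, CR4_PAE) are all true for the dumped values.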

I just read the final debug log to see the final result.
The RSVD bit in the exception data is weird. I think we need to confirm that first.
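Per the Intel SDM, two kinds of bits in a paging-structure entry are reserved:

- bit 63 (XD), when IA32_EFER.NXE = 0;
- physical-address bits at or above MAXPHYADDR.

Commit 5267926 affects whether NXE gets enabled, so one hypothesis worth ruling in or out is a present entry on the path to CR2 that has XD set while EFER.NXE is clear at the time of the fault. EFER is not part of the register dump, so its value would have to be obtained separately.

The C sketch below checks only these two rules; other reserved bits (for example PS in a PML4E) are ignored. The function name and the MAXPHYADDR parameter are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P  (UINT64_C(1) << 0)  /* present */
#define PTE_XD (UINT64_C(1) << 63) /* execute-disable: reserved unless IA32_EFER.NXE = 1 */

/* Checks two of the SDM's reserved-bit rules for a 4-level paging-structure
   entry: XD set while EFER.NXE = 0, and address bits from MAXPHYADDR up to
   bit 51. Assumes 0 < maxphyaddr <= 52. Other rules are deliberately ignored. */
static bool HitsReservedBit(uint64_t entry, bool efer_nxe, unsigned maxphyaddr)
{
  if ((entry & PTE_P) == 0)
    return false;                /* reserved bits are not checked in non-present entries */
  if ((entry & PTE_XD) != 0 && !efer_nxe)
    return true;                 /* XD is reserved while NXE is off */
  if (maxphyaddr < 52) {
    uint64_t high = ((UINT64_C(1) << 52) - 1) & ~((UINT64_C(1) << maxphyaddr) - 1);
    if ((entry & high) != 0)
      return true;               /* address bit beyond MAXPHYADDR */
  }
  return false;
}
```

For example, an entry value of 0x8000000000000003 (XD, R/W and P set) is reported as hitting a reserved bit when efer_nxe is false, and as valid when it is true.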


-----Original Message-----
From: Andrew Fish <afish@apple.com>
Sent: Friday, March 19, 2021 9:44 AM
To: Yao, Jiewen <jiewen.yao@intel.com>
Cc: discuss@edk2.groups.io; annie.li@oracle.com; Laszlo Ersek <lersek@redhat.com>; Wang, Jian J <jian.j.wang@intel.com>; Aaron Young <aaron.young@oracle.com>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Stupid question: is there a reason the page tables are not write-protected, so that a write to the page table would fault?

Thanks,

Andrew Fish

On Mar 18, 2021, at 5:37 PM, Yao, Jiewen <jiewen.yao@intel.com> wrote:
Hi Annie
I added some of my thoughts to the Bugzilla entry:
https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump the paging structure info for further analysis, we can help check it.


-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@redhat.com>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@intel.com>; Andrew Fish <afish@apple.com>; Aaron Young <aaron.young@oracle.com>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Hello Laszlo,

In my previous email, the exception was reproduced with a pretty old code base, from where I started bisecting the commits. This time I reproduced the issue with the code of the upstream branch 'stable/202011'. All the logs I am collecting are from this code base (75ab038). Since the overall size of the logs is pretty big, I'll attach all the data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump is uploaded, as well as the details of the page fault exception; please check the details there.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, March 18, 2021 9:23 AM
To: Annie Li <annie.li@oracle.com>; discuss@edk2.groups.io
Cc: jian.j.wang@intel.com; Andrew Fish <afish@apple.com>
Subject: Re: Windows guest fails to boot into recovery mode due to commit
5267926

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a windows booting failure issue(a page fault exception),
and narrow down it to the following patch,
MdeModulePkg/DxeIpl: support more NX related PCDs
https://urldefense.com/v3/__https://github.com/tianocore/edk2/commit/5
267926134d17e86672b84fd57b438f05ffa68e1__;!!GqivPVa7Brio!PuMaBhjIGEdV
v
lQi7PKC_FQeyIy-sjSaIZXk_W_MusXNUlQBxGqsJBRONWwKWw$

This issue always happens after QMP is terminated by <ctrl-C> twice, see
following steps.

1. Boot Windows VM up, and <ctrl-C> to exit the QMP

2. Repeat 1

3. Boot Windows VM, and this page fault issue happens. (Note: Windows
should boot into recovery mode in this round, and this is due to the previous
two
consecutive boot failure, see
https://urldefense.com/v3/__https://docs.microsoft.com/en-us/windows-
hardware/manufacture/desktop/windows-recovery-environment--windows-
re--
technical-reference*entry-points-into-
winre__;Iw!!GqivPVa7Brio!PuMaBhjIGEdVvlQi7PKC_FQeyIy-
sjSaIZXk_W_MusXNUlQBxGqsJBSkXMCNZA$ )

During above 3 windows booting procedures, the value of following
variables are always the same, PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0 PcdImageProtectionPolicy 2

However, Windows guest fails to boot up into recovery mode in the 3rd
round
due to the patch above(5267926). I modified the return value to
"(PcdGetBool
(PcdSetNxForStack)" in function "IsEnableNonExecNeeded" in
MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, this page fault issue
is
gone with this change. The patch(5267926) is for fixing bug
https://urldefense.com/v3/__https://bugzilla.tianocore.org/show_bug.cgi?id=1
116__;!!GqivPVa7Brio!PuMaBhjIGEdVvlQi7PKC_FQeyIy-
sjSaIZXk_W_MusXNUlQBxGqsJBTLSxdsog$ , where the comments show
PcdImageProtectionPolicy needs also to enable NXE. But this does cause the
page fault exception in this scenario, any suggestion?

The page fault exception is pasted here,


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS -
0000000000010202 RAX - 8000000000000003, RCX - 0000000000000001,
RDX
- 0000000001040001 RBX - 0000000000000001, RSP - 00000000001A6AA0,
RBP - 0000000001040001 RSI - 000000003F2E2010, RDI -
0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 -
0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 -
000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS -
0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 -
000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 -
0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 -
0000000000000400 GDTR - 000000003F1EE698 0000000000000047, LDTR -
0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-
1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/T
erminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll
(ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same
toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

objdump -f -S
Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/Termin
alDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with such an OVMF build
for
which you have a matching "TerminalDxe.debug" file. Once you do that, you
can
run "objdump" on the ".debug" file, and get a disassembly of the TerminalDxe
driver, inter-leaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register dump,
matches the (relative) "start address" that "objdump -f" reports,

- we can take the crash offset (RIP - ImageBase), from the register dump, and
use that offset into the "objdump -S" disassembly, to narrow down what the
terminal driver may have been doing to trigger the crash.

It's not necessarily the terminal driver's fault that encounter a crash, but
knowing what TerminalDxe was up to, might shed light on the actual reason.
It's
of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to
reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo









Re: Windows guest fails to boot into recovery mode due to commit 5267926

Yao, Jiewen
 

Good question.
CR0.WP is set. But the page table protection may be turned off and on again if the CPU driver needs to update it to protect an EFI image. Maybe it is a bug somewhere.
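The CR0.WP claim can be checked against the register dump. Below is a minimal Python sketch that decodes the CR0 and CR4 values from the banner; the bit names come from the Intel SDM. It shows that CR0.WP (bit 16) and CR0.PG are set, that CR4.PAE is set, and that CR4.LA57 is clear, which means 4-level paging. The dump does not include IA32_EFER, so whether NXE was set at the time of the fault cannot be read from it.

```python
# Control-register bit names per the Intel SDM (Vol. 3A, "Control Registers").
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
CR4_BITS = {0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
            6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT",
            11: "UMIP", 12: "LA57", 13: "VMXE", 14: "SMXE", 16: "FSGSBASE",
            17: "PCIDE", 18: "OSXSAVE", 20: "SMEP", 21: "SMAP", 22: "PKE",
            23: "CET"}


def set_flags(value, names):
    """Return the names of the known bits that are set in value, low bit first."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


cr0 = 0x0000000080010033  # from the exception banner
cr4 = 0x0000000000040668
print("CR0:", set_flags(cr0, CR0_BITS))  # ['PE', 'MP', 'ET', 'NE', 'WP', 'PG']
print("CR4:", set_flags(cr4, CR4_BITS))  # ['DE', 'PAE', 'MCE', 'OSFXSR', 'OSXMMEXCPT', 'OSXSAVE']
```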

I just read the final debug log to see the final result.
The RSVD bit in the exception data is weird. I think we need to confirm that first.
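For reference on the RSVD bit: in x86-64 4-level paging, a #PF with RSVD=1 means the page walk hit a paging-structure entry that has a reserved bit set. One well-known source is an entry with the XD bit (bit 63) set while IA32_EFER.NXE is clear, because bit 63 is reserved in that case. Another source is a physical-address bit at or above MAXPHYADDR. The thread does not establish whether either case applies here. The sketch below is a simplified check for a 4 KiB PTE only. MAXPHYADDR=40 and the sample entry value are assumptions chosen for illustration.

```python
def reserved_bits_set(entry, maxphyaddr, nxe):
    """Return the reserved bits that are set in a 4 KiB-page PTE (simplified).

    Models only two sources of reserved bits in 4-level paging:
      * physical-address bits maxphyaddr..51, and
      * bit 63 (XD), which is reserved when IA32_EFER.NXE = 0.
    Large-page PDEs/PDPTEs have extra reserved bits that are not modelled.
    """
    mask = ((1 << 52) - 1) & ~((1 << maxphyaddr) - 1)  # bits maxphyaddr..51
    if not nxe:
        mask |= 1 << 63                                 # XD is reserved without NXE
    return entry & mask


entry = 0x8000000000000003  # hypothetical PTE value: P, R/W and XD set
print(hex(reserved_bits_set(entry, maxphyaddr=40, nxe=False)))  # 0x8000000000000000
print(hex(reserved_bits_set(entry, maxphyaddr=40, nxe=True)))   # 0x0
```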






Re: Windows guest fails to boot into recovery mode due to commit 5267926

Andrew Fish
 

Stupid question: is there a reason the page tables are not write-protected, so that a write to the page table would fault?

Thanks,

Andrew Fish

On Mar 18, 2021, at 5:37 PM, Yao, Jiewen <jiewen.yao@intel.com> wrote:

Hi Annie,
I added some of my thoughts to the Bugzilla entry: https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump the paging structures for further analysis, we can help check them.







Re: Windows guest fails to boot into recovery mode due to commit 5267926

Yao, Jiewen
 

Hi Annie,
I added some of my thoughts to the Bugzilla entry: https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump the paging structures for further analysis, we can help check them.
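This may help with dumping the paging structures. The dump shows CR4.PAE set and CR4.LA57 clear, so the CPU uses 4-level paging. That means the faulting address in CR2 can be split into four table indices, and the tables can be walked starting from CR3. The Python sketch below expects a caller-supplied `read_qword(phys_addr)` function. That function is hypothetical: it could, for example, wrap the QEMU monitor's `xp /1gx <addr>` command. The demo uses made-up table contents, not data from this guest.

```python
ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51: next table / page frame address
PRESENT = 1 << 0
PAGE_SIZE_BIT = 1 << 7          # PS: 1 GiB page in a PDPTE, 2 MiB page in a PDE


def table_indices(va):
    """Split a 48-bit linear address into 4-level paging indices."""
    return {
        "pml4": (va >> 39) & 0x1FF,
        "pdpt": (va >> 30) & 0x1FF,
        "pd": (va >> 21) & 0x1FF,
        "pt": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }


def walk(cr3, va, read_qword):
    """Return [(level, entry_address, entry), ...] for the walk of va.

    Stops early at a non-present entry or at a large-page mapping.
    """
    idx = table_indices(va)
    table = cr3 & ADDR_MASK
    steps = []
    for level in ("pml4", "pdpt", "pd", "pt"):
        entry_addr = table + idx[level] * 8
        entry = read_qword(entry_addr)
        steps.append((level, entry_addr, entry))
        if not (entry & PRESENT):
            break
        if level in ("pdpt", "pd") and (entry & PAGE_SIZE_BIT):
            break
        table = entry & ADDR_MASK
    return steps


# Demo: CR3 and CR2 come from the dump; the table contents are invented.
cr3, cr2 = 0x3F401000, 0x3F2E2010
idx = table_indices(cr2)
fake_memory = {
    0x3F401000 + idx["pml4"] * 8: 0x3F402003,
    0x3F402000 + idx["pdpt"] * 8: 0x3F403003,
    0x3F403000 + idx["pd"] * 8: 0x3F404003,
    0x3F404000 + idx["pt"] * 8: 0x3F2E2003,
}
for level, addr, entry in walk(cr3, cr2, lambda a: fake_memory.get(a, 0)):
    print(f"{level:>4}: entry at {addr:#x} = {entry:#018x}")
```

To walk the real tables, replace `fake_memory` with reads of guest-physical memory. Each printed entry can then be checked for reserved bits, for example bit 63 when NXE might be clear.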






Re: Windows guest fails to boot into recovery mode due to commit 5267926

Andrew Fish
 

On Mar 18, 2021, at 6:22 AM, Laszlo Ersek <lersek@redhat.com> wrote:

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a Windows boot failure (a page fault exception) and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP has been terminated with <Ctrl-C> twice; see the following steps.

1. Boot the Windows VM, and press <Ctrl-C> to exit QMP.

2. Repeat step 1.

3. Boot the Windows VM again; this time the page fault occurs. (Note: Windows should boot into recovery mode in this round, because of the two previous consecutive boot failures; see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During all three boots above, the following PCDs always have the same values:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the third round because of the patch above (5267926). I changed the return value of the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c to "PcdGetBool (PcdSetNxForStack)", and the page fault disappears with this change. Patch 5267926 fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, whose comments say that PcdImageProtectionPolicy also needs NXE to be enabled. But in this scenario it does cause the page fault exception; any suggestions?
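The small Python model below illustrates the PCD logic described above. It is not the edk2 C code. It assumes two things. First, after 5267926, `IsEnableNonExecNeeded` requests IA32_EFER.NXE when any of the three NX-related PCDs is non-zero and the CPU supports XD. Second, the experimental change reduces that condition to PcdSetNxForStack alone.

```python
def nxe_needed_after_5267926(set_nx_for_stack, dxe_nx_policy,
                             image_protection_policy, xd_available=True):
    """Assumed post-5267926 decision: any NX-related PCD enables NXE."""
    if not xd_available:
        return False
    return (bool(set_nx_for_stack)
            or dxe_nx_policy != 0
            or image_protection_policy != 0)


def nxe_needed_with_workaround(set_nx_for_stack, dxe_nx_policy,
                               image_protection_policy, xd_available=True):
    """Experimental change: only PcdSetNxForStack is consulted."""
    if not xd_available:
        return False
    return bool(set_nx_for_stack)


# PCD values reported for all three boots in this thread.
pcds = dict(set_nx_for_stack=0, dxe_nx_policy=0, image_protection_policy=2)
print(nxe_needed_after_5267926(**pcds))    # True: NXE gets enabled
print(nxe_needed_with_workaround(**pcds))  # False: NXE stays disabled
```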

The page fault exception is pasted here:


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same toolchain, and the same "build" flags.

(2) Reproduce the issue and capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug

WARNING: off-topic wish list...

It would be nice to have a debug script that could post-process the serial log file and append the extra information. That tool would need to be toolchain-aware: for GCC you would run `objdump -f -S TerminalDxe.debug`, and for Xcode you would run `lldb -o <lldbCommand> TerminalDxe.dll`. I guess it could also decode the exception, pointing out that CR2 holds the faulting address and explaining what the ExceptionData bits mean.

We could hook something like that into CI and capture more detailed error reports.
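As a small step toward that wish list, decoding ExceptionData is mechanical. Below is a minimal Python sketch. The short names follow the letters in the edk2 exception banner above, and the bit meanings follow the page-fault error code description in the Intel SDM. The helper name is hypothetical.

```python
# Page-fault error code bits; short names follow the edk2 banner
# ("I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0").
PF_ERROR_BITS = [
    (0, "P", "page-level protection violation (0 = page not present)"),
    (1, "W", "write access (0 = read)"),
    (2, "U", "user-mode access (0 = supervisor)"),
    (3, "R", "reserved bit set in a paging-structure entry (RSVD)"),
    (4, "I", "instruction fetch"),
    (5, "PK", "protection-key violation"),
    (6, "SS", "shadow-stack access"),
    (15, "SGX", "SGX-specific access-control violation"),
]


def decode_page_fault(error_code, cr2):
    """Return human-readable lines for a #PF error code and CR2 value."""
    lines = [f"CR2 (faulting linear address): {cr2:#018x}"]
    for bit, name, meaning in PF_ERROR_BITS:
        if error_code & (1 << bit):
            lines.append(f"{name}: {meaning}")
    return lines


for line in decode_page_fault(0x9, 0x3F2E2010):
    print(line)
```

For the fault in this thread (ExceptionData 0x9), only P and R are set. This indicates a supervisor-mode data read that hit a reserved bit in a paging-structure entry.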

Thanks,

Andrew Fish


The point of this exercise is to reproduce the issue with an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file and get a disassembly of the TerminalDxe driver, interleaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register dump, matches the (relative) "start address" that "objdump -f" reports,

- we can take the crash offset (RIP - ImageBase), from the register dump, and use that offset into the "objdump -S" disassembly, to narrow down what the terminal driver may have been doing to trigger the crash.
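Both checks above are plain subtraction on the values in the banner. Below is a Python sketch that extracts them from the serial log. The regex is an assumption about the banner format, based only on the single line quoted in this thread. The approach also assumes that objdump prints image-relative addresses for the .debug file. The entry-point comparison is the check that validates that assumption.

```python
import re

BANNER = re.compile(
    r"Find image based on IP\(0x(?P<ip>[0-9A-Fa-f]+)\)\s+(?P<image>\S+)\s+"
    r"\(ImageBase=(?P<base>[0-9A-Fa-f]+), EntryPoint=(?P<entry>[0-9A-Fa-f]+)\)"
)


def image_offsets(log_text):
    """Extract image-relative RIP and entry-point offsets from an edk2 log."""
    m = BANNER.search(log_text)
    if m is None:
        raise ValueError("no 'Find image based on IP' banner in log")
    ip, base, entry = (int(m.group(k), 16) for k in ("ip", "base", "entry"))
    return {
        "image": m.group("image"),
        "rip_offset": ip - base,       # look this up in the 'objdump -S' listing
        "entry_offset": entry - base,  # compare with the 'objdump -f' start address
    }


log = (
    "!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/"
    "Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/"
    "TerminalDxe/DEBUG/TerminalDxe.dll "
    "(ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!"
)
offsets = image_offsets(log)
print(offsets["image"].rsplit("/", 1)[-1])  # TerminalDxe.dll
print(hex(offsets["rip_offset"]))           # 0x2c75
print(hex(offsets["entry_offset"]))         # 0x36e8
```

For this crash, RIP - ImageBase is 0x2C75 and EntryPoint - ImageBase is 0x36E8.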

It's not necessarily the terminal driver's fault that it encountered a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. Of course, it's also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo




Re: Windows guest fails to boot into recovery mode due to commit 5267926

annie li
 

Hello Laszlo,

In my previous email, the exception was reproduced with a fairly old code base, from which I started bisecting the commits. This time I reproduced the issue with the upstream 'stable/202011' branch, and all the logs I collected are from this code base (75ab038). Since the logs are quite large overall, I have attached all the data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is uploaded to this bug. If this log doesn't suffice, can you please suggest the way you prefer?
The objdump output is uploaded as well, along with the details of the page fault exception; please check the details there.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, March 18, 2021 9:23 AM
To: Annie Li <annie.li@oracle.com>; discuss@edk2.groups.io
Cc: jian.j.wang@intel.com; Andrew Fish <afish@apple.com>
Subject: Re: Windows guest fails to boot into recovery mode due to commit 5267926

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a Windows boot failure (a page fault exception) and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP has been terminated with <ctrl-C> twice; see the following steps.

1. Boot the Windows VM, and press <ctrl-C> to exit QMP.

2. Repeat step 1.

3. Boot the Windows VM again, and the page fault happens. (Note: Windows should boot into recovery mode in this round, because of the two preceding consecutive boot failures; see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During the above 3 Windows boots, the values of the following variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the 3rd round because of the patch above (5267926). When I changed the return value of the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c to "PcdGetBool (PcdSetNxForStack)", the page fault went away. Patch 5267926 fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show that PcdImageProtectionPolicy also requires NXE to be enabled. But this change does cause the page fault exception in this scenario; any suggestions?

The page fault exception is pasted here:


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with such an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file, and get a disassembly of the TerminalDxe driver, interleaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register dump, matches the (relative) "start address" that "objdump -f" reports,

- we can take the crash offset (RIP - ImageBase), from the register dump, and use that offset into the "objdump -S" disassembly, to narrow down what the terminal driver may have been doing to trigger the crash.

It's not necessarily the terminal driver's fault that we encounter a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo


Re: Windows guest fails to boot into recovery mode due to commit 5267926

annie li
 

Hello Andrew

A bug has been filed: https://bugzilla.tianocore.org/show_bug.cgi?id=3269
I changed the DEBUG flag to 0x8070FFFF in OvmfPkg/OvmfPkgX64.dsc, hoping it provides a more helpful debug log. The whole log is pretty big, so I'll upload it to this bug; the file name is bootfail.zip.
I also added more logging to print out the following variables; these messages are all tagged with "Annie" to distinguish them from the other logs.
ERROR: IsEnableNonExecNeeded Annie PcdSetNxForStack 0
ERROR: IsEnableNonExecNeeded Annie PcdDxeNxMemoryProtectionPolicy 0
ERROR: IsEnableNonExecNeeded Annie PcdImageProtectionPolicy 2
ERROR: IsEnableNonExecNeeded Annie final return 1

Thanks
Annie

-----Original Message-----
From: discuss@edk2.groups.io [mailto:discuss@edk2.groups.io] On Behalf Of Andrew Fish via groups.io
Sent: Wednesday, March 17, 2021 11:28 PM
To: discuss <discuss@edk2.groups.io>; Annie Li <annie.li@oracle.com>
Cc: jian.j.wang@intel.com; lersek@redhat.com
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Annie,

Can you attach the entire serial log of the boot to give some context to the address ranges? Also please file a BZ.

CR2 is the fault address and I think the ExceptionData is implying a present page with a reserved bit set in one of the page table entries?

Thanks,

Andrew Fish

On Mar 17, 2021, at 6:48 PM, annie li <annie.li@oracle.com> wrote:

Hello,

I ran into a Windows boot failure (a page fault exception) and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP has been terminated with <ctrl-C> twice; see the following steps.

1. Boot the Windows VM, and press <ctrl-C> to exit QMP.

2. Repeat step 1.

3. Boot the Windows VM again, and the page fault happens. (Note: Windows should boot into recovery mode in this round, because of the two preceding consecutive boot failures; see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During the above 3 Windows boots, the values of the following variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the 3rd round because of the patch above (5267926). When I changed the return value of the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c to "PcdGetBool (PcdSetNxForStack)", the page fault went away. Patch 5267926 fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show that PcdImageProtectionPolicy also requires NXE to be enabled. But this change does cause the page fault exception in this scenario; any suggestions?

The page fault exception is pasted here:


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
Thanks
Annie





Google Summer of Code 2021 interested student

Ayush Dwivedi <21cencturyayush@...>
 

Hello everyone,
My name is Ayush Dwivedi. I am currently studying Computer Science and
Engineering and am in the 3rd year of my B.Tech program. I am interested in
operating systems, firmware and game programming. I have written some
personal projects in C and C++ and have experimented with x86_64 assembly.
Recently I have been using QEMU with the OVMF UEFI firmware, which I built
from source using edk2. I have also tried to write my own EFI applications
(as an attempt to understand how UEFI differs from BIOS). I wish to be a
part of the TianoCore community and want to contribute. I would like to
know what skills and knowledge are needed for the task "MinPlatform Qemu
Support". The task description says the goal is to port MinPlatform to
QEMU, so for now I have started looking into OvmfPkg (since it already runs
on QEMU) and Platform/Intel/MinPlatformPkg, but the edk2 and edk2-platforms
source trees are huge, so I am unsure how and where I should start. I am
looking forward to guidance from the community.

Thank you for your valuable time.

Regards,
Ayush Dwivedi


Re: Windows guest fails to boot into recovery mode due to commit 5267926

Laszlo Ersek
 

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a Windows boot failure (a page fault exception) and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP has been terminated with <ctrl-C> twice; see the following steps.

1. Boot the Windows VM, and press <ctrl-C> to exit QMP.

2. Repeat step 1.

3. Boot the Windows VM again, and the page fault happens. (Note: Windows should boot into recovery mode in this round, because of the two preceding consecutive boot failures; see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During the above 3 Windows boots, the values of the following variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the 3rd round because of the patch above (5267926). When I changed the return value of the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c to "PcdGetBool (PcdSetNxForStack)", the page fault went away. Patch 5267926 fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show that PcdImageProtectionPolicy also requires NXE to be enabled. But this change does cause the page fault exception in this scenario; any suggestions?

The page fault exception is pasted here:


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the same toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with such an OVMF build for which you have a matching "TerminalDxe.debug" file. Once you do that, you can run "objdump" on the ".debug" file, and get a disassembly of the TerminalDxe driver, interleaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register dump, matches the (relative) "start address" that "objdump -f" reports,

- we can take the crash offset (RIP - ImageBase), from the register dump, and use that offset into the "objdump -S" disassembly, to narrow down what the terminal driver may have been doing to trigger the crash.

It's not necessarily the terminal driver's fault that we encounter a crash, but knowing what TerminalDxe was up to might shed light on the actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo


Re: Windows guest fails to boot into recovery mode due to commit 5267926

Andrew Fish
 

Annie,

Can you attach the entire serial log of the boot to give some context to the address ranges? Also please file a BZ.

CR2 is the fault address and I think the ExceptionData is implying a present page with a reserved bit set in one of the page table entries?

Thanks,

Andrew Fish

On Mar 17, 2021, at 6:48 PM, annie li <annie.li@oracle.com> wrote:

Hello,

I ran into a Windows boot failure (a page fault exception) and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP has been terminated with <ctrl-C> twice; see the following steps.

1. Boot the Windows VM, and press <ctrl-C> to exit QMP.

2. Repeat step 1.

3. Boot the Windows VM again, and the page fault happens. (Note: Windows should boot into recovery mode in this round, because of the two preceding consecutive boot failures; see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During the above 3 Windows boots, the values of the following variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the 3rd round because of the patch above (5267926). When I changed the return value of the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c to "PcdGetBool (PcdSetNxForStack)", the page fault went away. Patch 5267926 fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show that PcdImageProtectionPolicy also requires NXE to be enabled. But this change does cause the page fault exception in this scenario; any suggestions?

The page fault exception is pasted here:


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
Thanks
Annie





Windows guest fails to boot into recovery mode due to commit 5267926

annie li
 

Hello,

I ran into a Windows boot failure (a page fault exception) and narrowed it down to the following patch:
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP has been terminated with <ctrl-C> twice; see the following steps.

1. Boot the Windows VM, and press <ctrl-C> to exit QMP.

2. Repeat step 1.

3. Boot the Windows VM again, and the page fault happens. (Note: Windows should boot into recovery mode in this round, because of the two preceding consecutive boot failures; see https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre)

During the above 3 Windows boots, the values of the following variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the 3rd round because of the patch above (5267926). When I changed the return value of the function "IsEnableNonExecNeeded" in MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c to "PcdGetBool (PcdSetNxForStack)", the page fault went away. Patch 5267926 fixes bug https://bugzilla.tianocore.org/show_bug.cgi?id=1116, where the comments show that PcdImageProtectionPolicy also requires NXE to be enabled. But this change does cause the page fault exception in this scenario; any suggestions?

The page fault exception is pasted here:


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
Thanks
Annie


Re: lost stderr in console

Laszlo Ersek
 

On 03/17/21 04:23, wenyi,xie via groups.io wrote:
Hi everyone,

I used to set MAKEFLAGS as below, so that only messages from stderr are shown on the console while compiling, and at the same time messages from both stderr and stdout are saved in xxx.log.
COMMAND MAKEFLAGS= set -eo pipefail && sh xxx.sh 2>&1 >> ${LOG_FILE_DIR}/xxx.log | tee -a ${LOG_FILE_DIR}/xxx.log
Redirections are processed in the following order:
- pipe first (redirects stdout)
- then left to right, as seen on the command line

So what we have for "xxx.sh" is:

step#1: file descriptor 1 refers to the (nameless) pipe's file description

step#2: file descriptor 2 refers to the same (nameless) pipe's file
description (i.e., to the file description that file descriptor 1
*currently* refers to)

step#3: file descriptor 1 now refers to a file description that refers
to the inode ("file") originally looked up by the name "xxx.log". At the
file description level, the O_APPEND status flag is set.

So "tee" will only see stderr from "xxx.sh". Furthermore, the stdout of
"xxx.sh" will only go to the log file ("xxx.log").


Let's consider "tee" then: "tee" opens the inode (the "file") looked up
by the name "xxx.log" separately from when the shell opens "xxx.log",
for the "xxx.sh" redirection. This means that, in the kernel, a separate
file description exists, for the "xxx.log" inode. This file description
also has the O_APPEND status flag set, but it doesn't matter -- the file
description that "xxx.sh" writes through, and the file description that
"tee" writes through, are independent. The "file offset" property is at
the file description level. Therefore "tee" and "xxx.sh" do not share
the file offset (and O_APPEND is useless, in both file descriptions),
and they will mutually overwrite parts of each other's output.

In other words, your command line is buggy.


In general:

file descriptor --> file description --> file (inode)

When you open() the same file by name, you get this:

file descriptor --> file description \
                                      --> file (inode)
file descriptor --> file description /

Whereas, if you use fork() or dup(), this is what you get:

file descriptor \
                 --> file description --> file (inode)
file descriptor /

O_APPEND and the file offset both exist in the *file description*
object. So in the first case, you get no coordination from the kernel,
and in the second case, you do.

Note that even in the second case, that is, when both file descriptors
refer to the same file description, it is not guaranteed that
*concurrent* writes will not be *interleaved*. No data will be
*overwritten*, for sure, but the granularity of "atomic" writes is not
an easy question. If the file description refers to a pipe, then there
are some guarantees from POSIX, as long as the writes are "small enough"
(PIPE_BUF):
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html>.


So... what you want to do is actually difficult. There are two
approaches, but both are broken.

Approach (1): fuse stdout and stderr into a single stream, capture the
common stream in the log file, and print the "diagnostic" lines to the
console:

sh xxx.sh 2>&1 | tee -- xxx.log | grep -- 'what exactly?'

This is broken because you cannot identify the diagnostic-only lines
(the original stderr) by content.

Approach (2): duplicate the original stderr into two streams. The first
instance will go to the console. The second instance, together with the
script's stdout, will be written to the log file.

# fd 1: original stdout
# fd 2: original stderr
# make fd 3 point to the original stdout as well

exec 3>&1

(
  # fd 1: pipe leading to the log file
  # fd 2: original stderr
  # fd 3: original stdout
  # make fd 4 point to the pipe leading to the log file as well

  exec 4>&1

  (
    # fd 1: pipe carrying the main script's stderr
    # fd 2: original stderr
    # fd 3: original stdout
    # fd 4: pipe leading to the log file

    # main script's stderr goes to the duplicator pipe
    # main script's stdout goes to the log file

    xxx.sh 2>&1 >&4
  ) | (

    # fd 0: pipe carrying the main script's stderr
    # fd 1: pipe leading to the log file
    # fd 2: original stderr
    # fd 3: original stdout
    # fd 4: pipe leading to the log file

    # duplicate the main script's stderr to the original stdout (3)
    # and to the log file (1)

    tee /dev/fd/3
  )
) | cat > xxx.log


In short form (no comments and no useless subshells):

exec 3>&1
(
  exec 4>&1
  xxx.sh 2>&1 >&4 | tee /dev/fd/3
) | cat > xxx.log

This will not lose messages. It will also not interleave write()
syscalls that do not exceed PIPE_BUF (usually 4KB) individually -- this
is the justification for the outermost pipe, and "cat".

However, it is still broken: dependent on process scheduling between
"xxx.sh" and "tee", it's now possible that stdout and stderr lines from
"xxx.sh" will be reordered, relative to each other. Let's say line#1 is
a diagnostic message, while line#2 is a normal message. "xxx.sh" writes
them in line#1, line#2 order. With the above, line#1 goes from "xxx.sh"
to "tee" to "cat", while line#2 goes from "xxx.sh" to "cat". If "tee" is
"slow", then "cat" could see the messages in line#2, line#1 order.

Thanks
Laszlo
