Re: Windows guest fails to boot into recovery mode due to commit 5267926


Andrew Fish
 

Annie,

I’ve never used it, but I assume there is a way to dump page tables from the QEMU console. Maybe something like `info mem` ?

Thanks,

Andrew Fish

On Mar 19, 2021, at 10:41 AM, Andrew Fish <afish@apple.com> wrote:

Annie,

CR2 holds the faulting address; CR3 points to the 1st level of the page tables. Those entries point to other page tables, so you kind of have to walk it by hand.

https://wiki.osdev.org/Paging
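
A rough sketch of such a manual walk follows. It assumes 4-level paging with 8-byte entries, takes CR3 as the root table and CR2 as the faulting address (both values are from the register dump in this thread), and relies on a hypothetical `read_u64` callback that returns 8 bytes of guest physical memory. None of this is edk2 code; it only illustrates the index arithmetic.

```python
# Illustrative 4-level x86-64 page-table walk (not edk2 code).
PRESENT = 1 << 0                  # entry is present
LARGE_PAGE = 1 << 7               # PS bit: PDPT/PD entry maps a 1 GiB / 2 MiB page
ADDR_MASK = 0x000FFFFFFFFFF000    # physical address bits of the next table

LEVELS = (("PML4", 39), ("PDPT", 30), ("PD", 21), ("PT", 12))


def split_indices(linear_address):
    """Return the 9-bit table index used at each paging level."""
    return {name: (linear_address >> shift) & 0x1FF for name, shift in LEVELS}


def walk(cr3, linear_address, read_u64):
    """Follow the tables from CR3; return (level, entry_address, entry) tuples.

    read_u64 is a hypothetical callback that reads 8 bytes of physical memory.
    """
    visited = []
    table = cr3 & ADDR_MASK
    for name, shift in LEVELS:
        index = (linear_address >> shift) & 0x1FF
        entry_address = table + index * 8
        entry = read_u64(entry_address)
        visited.append((name, entry_address, entry))
        if not entry & PRESENT:
            break                 # walk ends at a non-present entry
        if name in ("PDPT", "PD") and entry & LARGE_PAGE:
            break                 # large page: no lower-level table
        table = entry & ADDR_MASK
    return visited


# Values taken from the register dump in this thread.
CR2 = 0x3F2E2010
CR3 = 0x3F401000
print({name: hex(index) for name, index in split_indices(CR2).items()})
```

For CR2 = 0x3F2E2010 the PML4 and PDPT indices are 0, the PD index is 0x1F9, and the PT index is 0xE2, so the first entry to inspect sits at CR3 itself (0x3F401000).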


Thanks,

Andrew Fish

On Mar 19, 2021, at 9:56 AM, Annie Li <annie.li@oracle.com> wrote:

Hi Jiewen,

In the DumpCpuContext function in ArchExceptionHandler.c, the exception details are taken from either "SystemContextX64->ExceptionData" or "SystemContextX64.xxx". I am wondering how I can dump the page info there. Is there any related info that can be retrieved from CR2? Can you enlighten me a little bit?

Thanks
Annie

-----Original Message-----
From: Yao, Jiewen [mailto:jiewen.yao@intel.com]
Sent: Thursday, March 18, 2021 8:37 PM
To: discuss@edk2.groups.io; Annie Li <annie.li@oracle.com>; Laszlo Ersek <lersek@redhat.com>
Cc: Wang, Jian J <jian.j.wang@intel.com>; Andrew Fish <afish@apple.com>; Aaron Young <aaron.young@oracle.com>; Yao, Jiewen <jiewen.yao@intel.com>
Subject: RE: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Hi Annie,
I added some of my thoughts in the Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=3269

If you can dump paging structure info for further analysis, we can help to check.


-----Original Message-----
From: discuss@edk2.groups.io <discuss@edk2.groups.io> On Behalf Of annie li
Sent: Friday, March 19, 2021 3:27 AM
To: Laszlo Ersek <lersek@redhat.com>; discuss@edk2.groups.io
Cc: Wang, Jian J <jian.j.wang@intel.com>; Andrew Fish <afish@apple.com>; Aaron Young <aaron.young@oracle.com>
Subject: Re: [edk2-discuss] Windows guest fails to boot into recovery mode due to commit 5267926

Hello Laszlo,

In my previous email, the exception was reproduced with a pretty old code
base, from where I started bisecting the commits. This time I
reproduced the issue with the code of branch 'stable/202011' of
upstream. All the logs I am collecting are from this code base (75ab038).
Since the overall size of the logs is pretty big, I'll attach all the
data you requested to this bug (https://bugzilla.tianocore.org/show_bug.cgi?id=3269).

I dumped the registers with qmp-regdump, and the result (regdump) is
uploaded to this bug. If this log doesn't suffice, can you please suggest the method you prefer?
The objdump output is uploaded as well, along with the details of the
page fault exception; please check the details there.

Thanks
Annie

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, March 18, 2021 9:23 AM
To: Annie Li <annie.li@oracle.com>; discuss@edk2.groups.io
Cc: jian.j.wang@intel.com; Andrew Fish <afish@apple.com>
Subject: Re: Windows guest fails to boot into recovery mode due to commit 5267926

On 03/18/21 02:48, Annie Li wrote:
Hello,

I ran into a Windows boot failure (a page fault exception),
and narrowed it down to the following patch,
MdeModulePkg/DxeIpl: support more NX related PCDs
https://github.com/tianocore/edk2/commit/5267926134d17e86672b84fd57b438f05ffa68e1

This issue always happens after QMP is terminated by <ctrl-C> twice;
see the following steps.

1. Boot Windows VM up, and <ctrl-C> to exit the QMP

2. Repeat 1

3. Boot Windows VM, and this page fault issue happens. (Note: Windows
should boot into recovery mode in this round, and this is due to the
previous two consecutive boot failures, see
https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-recovery-environment--windows-re--technical-reference#entry-points-into-winre )

During the above 3 Windows boots, the values of the following
variables are always the same:
PcdSetNxForStack 0
PcdDxeNxMemoryProtectionPolicy 0
PcdImageProtectionPolicy 2

However, the Windows guest fails to boot into recovery mode in the
3rd round due to the patch above (5267926). When I changed the return
value to "PcdGetBool (PcdSetNxForStack)" in the function
"IsEnableNonExecNeeded" in
MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c, the page fault
went away. The patch (5267926) fixes bug
https://bugzilla.tianocore.org/show_bug.cgi?id=1116 , where the
comments show that PcdImageProtectionPolicy also needs NXE to be
enabled. But this does cause the page fault exception in this
scenario. Any suggestions?
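
To make the difference concrete, here is a small model of the two decisions. It is a sketch, not the edk2 source. The "local change" variant returns only PcdSetNxForStack, as in the local change above. The "patched" variant assumes that commit 5267926 enables NXE when any of the three PCDs is non-zero; that is an assumption drawn from the commit title and the bug comments, and the real C function may check further conditions.

```python
# Model of the NXE decision in IsEnableNonExecNeeded (assumed logic, not edk2 code).

def nx_needed_local_change(set_nx_for_stack, dxe_nx_policy, image_policy):
    """Behavior of the local change: only PcdSetNxForStack is consulted."""
    return bool(set_nx_for_stack)


def nx_needed_patched(set_nx_for_stack, dxe_nx_policy, image_policy):
    """Assumed behavior after commit 5267926: any NX-related PCD requests NXE."""
    return bool(set_nx_for_stack) or dxe_nx_policy != 0 or image_policy != 0


# PCD values reported for all three boots in this thread.
pcds = dict(set_nx_for_stack=0, dxe_nx_policy=0, image_policy=2)

print("local change:", nx_needed_local_change(**pcds))
print("patched:     ", nx_needed_patched(**pcds))
```

Under that assumption, the reported PCD values give "not needed" with the local change and "needed" with the patch, which is the only difference between the working and the failing boot here.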

The page fault exception is pasted here,


!!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000009 I:0 R:1 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP - 000000003E0A7C75, CS - 0000000000000038, RFLAGS - 0000000000010202
RAX - 8000000000000003, RCX - 0000000000000001, RDX - 0000000001040001
RBX - 0000000000000001, RSP - 00000000001A6AA0, RBP - 0000000001040001
RSI - 000000003F2E2010, RDI - 0000000000000001
R8 - 0000000000000000, R9 - 000000003E0AEC90, R10 - 0000FFFFFFFFF000
R11 - 00000000001A6E90, R12 - 0000000000000000, R13 - 000000003E0AEC90
R14 - 00000000001A6B28, R15 - 00000000001AB000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 000000003F2E2010, CR3 - 000000003F401000
CR4 - 0000000000040668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000003F1EE698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000003ECCA018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 00000000001A6700
!!!! Find image based on IP(0x3E0A7C75) /builddir/build/BUILD/edk2-1.4.3/Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.dll (ImageBase=000000003E0A5000, EntryPoint=000000003E0A86E8) !!!!
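
The ExceptionData field in the dump is the x86 page-fault error code. A minimal decoder is sketched below. The bit positions follow the architectural error-code layout, and the short flag names are chosen to match the labels printed in the dump; the function name is made up for this example.

```python
# Decode an x86 page-fault error code into the flags shown in the dump.
PF_ERROR_BITS = {
    "P": 0,     # 1 = fault on a present page (protection), 0 = not present
    "W": 1,     # 1 = write access
    "U": 2,     # 1 = user-mode access
    "R": 3,     # 1 = reserved bit set in a paging-structure entry
    "I": 4,     # 1 = instruction fetch
    "PK": 5,    # 1 = protection-key violation
    "SS": 6,    # 1 = shadow-stack access
    "SGX": 15,  # 1 = SGX-related fault
}


def decode_pf_error(code):
    """Return a dict mapping each flag name to 0 or 1."""
    return {name: (code >> bit) & 1 for name, bit in PF_ERROR_BITS.items()}


flags = decode_pf_error(0x0000000000000009)  # ExceptionData from the dump
print(" ".join(f"{name}:{value}" for name, value in flags.items()))
```

For 0x9 this yields P:1 and R:1 with everything else 0, matching the dump. R:1 means some paging-structure entry on the walk for CR2 had a reserved bit set. On x86-64, bit 63 of an entry is reserved while IA32_EFER.NXE is 0, which may be relevant given the NX-related commit, although the dump alone does not prove that this is the cause.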

In addition to what Andrew said, I suggest the following:

(1) Please rebuild OVMF *locally*, using the same edk2 tree, and the
same toolchain, and the same "build" flags.

(2) Reproduce the issue, capture the register dump.

(3) Run the following command:

  objdump -f -S Build/OvmfX64/DEBUG_GCC48/X64/MdeModulePkg/Universal/Console/TerminalDxe/TerminalDxe/DEBUG/TerminalDxe.debug


The point of this exercise is to reproduce the issue with such an OVMF
build for which you have a matching "TerminalDxe.debug" file. Once you
do that, you can run "objdump" on the ".debug" file, and get a
disassembly of the TerminalDxe driver, inter-leaved with the C language source code.

Then, we can do two things:

- we can verify whether (EntryPoint - ImageBase), from the register
dump, matches the (relative) "start address" that "objdump -f"
reports,

- we can take the crash offset (RIP - ImageBase), from the register
dump, and use that offset into the "objdump -S" disassembly, to narrow
down what the terminal driver may have been doing to trigger the crash.
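
The two subtractions above, worked through with the values from the register dump in this thread (the variable names are only for illustration):

```python
# Offsets relative to the TerminalDxe image base, using values from the dump.
IMAGE_BASE = 0x3E0A5000
ENTRY_POINT = 0x3E0A86E8
RIP = 0x3E0A7C75

entry_offset = ENTRY_POINT - IMAGE_BASE  # compare with "start address" from objdump -f
crash_offset = RIP - IMAGE_BASE          # look this up in the objdump -S listing

print(f"EntryPoint - ImageBase = {entry_offset:#x}")
print(f"RIP - ImageBase        = {crash_offset:#x}")
```

This gives 0x36e8 for the entry point offset and 0x2c75 for the crash offset. Whether those line up directly with the addresses in the listing depends on how the ".debug" file was linked, which is exactly what the first check is meant to confirm.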

It's not necessarily the terminal driver's fault that we encounter a
crash, but knowing what TerminalDxe was up to might shed light on the
actual reason. It's of course also possible that TerminalDxe *is* at fault. We'll see.

If possible, please post:
- your precise edk2 version (if you have local patches, it would be
best to reproduce with an upstream-only tree),
- your full firmware log (feel free to compress it),
- the register dump from serial,
- the objdump (disassembly) output (feel free to compress it).

Thanks,
Laszlo


