Re: Request for help understanding MemoryOverwriteRequestControl


Martyn Welch <martyn.welch@...>
 

On Tue, 2020-09-29 at 15:06 +0200, Laszlo Ersek wrote:
+Jiewen, comments below

On 09/24/20 11:36, Martyn Welch wrote:
Hi,

We have a number of MinnowBoards (both Turbot and Max variants) here
that are used for various Linux development purposes, including a
number in our LAVA farm which, among other things, runs Linux KernelCI
jobs. The firmware on these devices is currently
"MNW2MAX1.X64.0101.R01.1908071815", as downloaded from the Intel site:

https://software.intel.com/content/www/us/en/develop/articles/minnowboard-maxturbot-uefi-firmware.html

We are seeing the following message during boot on *some* of the
boards, but not others:

Clear memory in MRC per MOR request Start, Please wait for some minutes...
Side comment:

Wow! I very much suspected that the memory overwrite is performed on
physical platforms via proprietary DIMM or board access, not via plain
RAM writes. The above "MRC" reference confirms it.
Yeah, I don't think this platform has any fancy way of doing this.

We have "CONFIG_RESET_ATTACK_MITIGATION" set in the Linux kernel
configuration, which I understand will cause the
"MemoryOverwriteRequest" bit to be set during boot and hence trigger
this behaviour (unless explicitly cleared before the board is reset).
Yes, see commit ccc829ba3624 ("efi/libstub: Enable reset attack
mitigation", 2017-08-26).

(Side comment regarding historical Fedora kernels:
<https://bugzilla.redhat.com/show_bug.cgi?id=1498159>.)

The config knob's documentation was later extended in commit
a5c03c31af22 ("x86/efi: Clarify that reset attack mitigation needs
appropriate userspace", 2018-01-19).

Some of our boards seem to only be exposing the related
"MemoryOverwriteRequestControlLock" EFI variable:

Shell> dmpstore -guid e20939be-32d4-41be-a150-897f85d49829
dmpstore: No matching variables found. Guid E20939BE-32D4-41BE-A150-897F85D49829
Shell> dmpstore -guid bb983ccf-151d-40e1-a07b-4a17be168292
Variable NV+RT+BS 'BB983CCF-151D-40E1-A07B-4A17BE168292:MemoryOverwriteRequestControlLock' DataSize = 0x01
  00000000: 00  *.*

Thus this behaviour isn't triggered. Others expose both
"MemoryOverwriteRequestControl" and
"MemoryOverwriteRequestControlLock":

Shell> dmpstore -guid e20939be-32d4-41be-a150-897f85d49829
Variable NV+RT+BS 'E20939BE-32D4-41BE-A150-897F85D49829:MemoryOverwriteRequestControl' DataSize = 0x01
  00000000: 01  *.*

Shell> dmpstore -guid bb983ccf-151d-40e1-a07b-4a17be168292
Variable NV+RT+BS 'BB983CCF-151D-40E1-A07B-4A17BE168292:MemoryOverwriteRequestControlLock' DataSize = 0x01
  00000000: 00  *.*
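
For reference, the same values can be inspected from a running Linux system. The following is only a rough sketch, assuming the usual efivarfs layout (a 4-byte little-endian attribute word followed by the variable data) and that bit 0 of the MOR Control byte is the "clear memory" request; the function names are made up for illustration.

```python
import struct

# UEFI variable attribute bits, in the order dmpstore printed them above.
_ATTRIBUTE_BITS = (
    (0x1, "NV"),  # EFI_VARIABLE_NON_VOLATILE
    (0x4, "RT"),  # EFI_VARIABLE_RUNTIME_ACCESS
    (0x2, "BS"),  # EFI_VARIABLE_BOOTSERVICE_ACCESS
)


def decode_efivarfs(blob: bytes):
    """Split raw efivarfs file contents into (attribute names, payload).

    Assumption: the file starts with a 4-byte little-endian attribute
    word, followed by the variable data.
    """
    if len(blob) < 4:
        raise ValueError("efivarfs image is shorter than the attribute header")
    (attrs,) = struct.unpack_from("<I", blob, 0)
    names = [name for bit, name in _ATTRIBUTE_BITS if attrs & bit]
    return names, blob[4:]


def mor_requested(payload: bytes) -> bool:
    """Return True if bit 0 (the memory overwrite request) is set."""
    if not payload:
        raise ValueError("empty MOR Control payload")
    return bool(payload[0] & 0x01)
```

With the second board's dump above, the file contents would be the bytes `07 00 00 00 01`, which decode to NV+RT+BS with the request bit set.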
MemoryOverwriteRequestControlLock is a Microsoft- (not TCG-) originated
hardening:

https://docs.microsoft.com/en-us/windows-hardware/drivers/bringup/device-guard-requirements

It's been a while since I last thought about it, but basically it's a
way to prevent the attacker from even attempting to clear the MOR bit
in the original MemoryOverwriteRequestControl variable, before they'd
force a platform reset.


The situation where you see the MOR Control Lock variable but not the
MOR Control variable was a bug in edk2. It has been fixed under

https://bugzilla.tianocore.org/show_bug.cgi?id=727

(We first encountered this issue in
<https://bugzilla.redhat.com/show_bug.cgi?id=1496170>.)
Ah! Awesome, this was the bit I was missing!

My understanding is that we should be seeing both these EFI variables
being exposed. I'm rather unfamiliar with the EDK codebase and have not
been able to work out how I would end up with
"MemoryOverwriteRequestControlLock" and not
"MemoryOverwriteRequestControl".
More precisely, the valid cases are:

- none of them present (= system doesn't support the Platform Reset
Attack Mitigation from the TCG)

- MOR Control is present, but MOR Control Lock is not (= the TCG spec
is supported, but the Microsoft-defined hardening is not)

- both MOR Control and MOR Control Lock are present (= both specs are
supported)
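
The three valid cases, plus the out-of-spec fourth one seen on some of the boards, can be sketched as a small classifier. This is an illustration only; `classify_mor_state` and its return strings are hypothetical and do not come from edk2 or the kernel.

```python
def classify_mor_state(has_control: bool, has_lock: bool) -> str:
    """Map the presence of the two variables onto the cases listed above."""
    if not has_control and not has_lock:
        return "no Platform Reset Attack Mitigation support"
    if has_control and not has_lock:
        return "TCG spec supported, Microsoft hardening not supported"
    if has_control and has_lock:
        return "both specs supported"
    # Lock without Control: the out-of-spec state tracked as TianoCore#727.
    return "invalid: MOR Control Lock present without MOR Control"
```

A board that shows only "MemoryOverwriteRequestControlLock" in dmpstore falls into the last branch.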

I've tried using `J7` to reset the NVRAM on a board just exposing
"MemoryOverwriteRequestControlLock", following the process described
here to see if it would have an effect and it hasn't:

https://uchan.hateblo.jp/entry/2018/01/09/075230
Yes, with the bug present, the firmware would re-create MOR Control
Lock. See TianoCore#727 (link above).
I suspect this is exactly what we are seeing.

One of the boards in our LAVA instance was initially only exposing the
lock variable, but then seemingly randomly started to expose the other
variable and perform the erase at boot. I've not been able to determine
what triggered this change in behaviour.
This seems vaguely consistent with the OS kernel being buggy too (that
is, <https://bugzilla.redhat.com/show_bug.cgi?id=1498159>), *or else*
with your Linux userspace not clearing the MOR bit in the MOR Control
variable, as a part of the controlled OS shutdown.
Since this board is in our LAVA farm and used by a few different
things, I guess there's a reasonable chance that one of these ran a
kernel of a sufficient vintage to have triggered the buggy behaviour.

Any help/pointers would be much appreciated.
I would suggest:

(1) Upgrade the platform firmware to a version that contains the edk2
commit range fixing TianoCore#727 (namely 35ac962b5473..fda8f631edbb).
This prevents the out-of-spec situation where only MOR Control Lock
exists. (While that situation is not your acute problem now, it's best
to get it solved too.)
Based on the behaviour I'm seeing, I believe the latest offering on the
Intel site, released in Aug 2019, doesn't include these changes. It
also reports as `UEFI v2.60 (EDK II, 0x00010000)`. I'm unsure exactly
how the versioning in EDK2 is managed, but the
`EFI_SYSTEM_TABLE_REVISION` that I think is used here was changed to
`2.70` in Dec 2017, so I guess the MinnowBoard firmware is based on an
old tree.
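
As a quick sanity check on the version comparison, the UEFI revision field packs the major number into the upper 16 bits and the minor number into the lower 16 bits, so 2.60 and 2.70 are simply `(2 << 16) | 60` and `(2 << 16) | 70`. A minimal sketch, assuming the "UEFI v2.60" string is derived from that field (the helper name is made up):

```python
def decode_uefi_revision(revision: int) -> str:
    """Render a packed UEFI revision as 'major.minor'."""
    major = revision >> 16      # upper 16 bits
    minor = revision & 0xFFFF   # lower 16 bits, e.g. 60 or 70
    return f"{major}.{minor:02d}"


def is_older_than(revision: int, major: int, minor: int) -> bool:
    """Compare a packed revision against a major/minor pair."""
    return revision < ((major << 16) | minor)
```

For example, `is_older_than((2 << 16) | 60, 2, 70)` holds for the firmware discussed here.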

Now that I know this is probably a bug, I can live with it to be honest.

(2) Make sure your kernel does not *create* MOR Control under any
circumstances, only modifies it if it exists.
The one I really care about is currently a 5.7.x stable release, so
should be good there.

(3) Either remove CONFIG_RESET_ATTACK_MITIGATION from your kernel
config, or verify that your userspace clears the MOR bit in MOR Control
before a controlled OS shutdown. (To be honest, I don't know what Linux
distributions satisfy the userspace requirement, as
CONFIG_RESET_ATTACK_MITIGATION is not enabled in RHEL8 for example, as
far as I can see.)
Yes, this is looking like it may be the best way to handle this.
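
For anyone who wants to keep the config option instead, the userspace side of (3) could look roughly like the sketch below, run late in a clean shutdown. It is untested on real hardware and rests on several assumptions: efivarfs is mounted at the usual place, the file starts with a 4-byte attribute word, bit 0 is the MOR bit, and the firmware accepts the write (it will not if MOR Control Lock is in the locked state, and efivarfs may also mark the file immutable). The helper names are made up.

```python
from pathlib import Path

# Assumed efivarfs path: variable name plus the lowercase vendor GUID.
MOR_PATH = Path(
    "/sys/firmware/efi/efivars/"
    "MemoryOverwriteRequestControl-e20939be-32d4-41be-a150-897f85d49829"
)


def with_mor_cleared(blob: bytes) -> bytes:
    """Return the efivarfs contents with bit 0 of the first data byte cleared."""
    if len(blob) < 5:
        raise ValueError("expected 4 attribute bytes plus at least 1 data byte")
    header, data = blob[:4], bytearray(blob[4:])
    data[0] &= ~0x01 & 0xFF  # clear only the MOR bit, keep the other bits
    return header + bytes(data)


def clear_mor(path: Path = MOR_PATH) -> bool:
    """Clear the MOR bit if the variable exists; never create it."""
    if not path.exists():
        return False
    # Attributes and data are written back together in a single write.
    path.write_bytes(with_mor_cleared(path.read_bytes()))
    return True
```

Returning early when the file is missing mirrors point (2): the variable is only modified if the firmware already exposes it.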

Thanks for your help,

Martyn
