Re: Request for help understanding MemoryOverwriteRequestControl


Laszlo Ersek
 

+Jiewen, comments below

On 09/24/20 11:36, Martyn Welch wrote:
> Hi,
>
> We have a number of MinnowBoards (both Turbot and Max variants) here
> that are used for various Linux development purposes, including a
> number used in our LAVA farm, which among other things runs Linux
> KernelCI jobs. The firmware on these devices is currently
> "MNW2MAX1.X64.0101.R01.1908071815", as downloaded from the Intel site:
>
> https://software.intel.com/content/www/us/en/develop/articles/minnowboard-maxturbot-uefi-firmware.html
>
> We are seeing the following message during boot on *some* of the
> boards, but not others:
>
> Clear memory in MRC per MOR request Start, Please wait for some
> minutes...

Side comment:

Wow! I had very much suspected that the memory overwrite is performed on
physical platforms via proprietary DIMM or board access, not via plain
RAM writes. The above "MRC" (Memory Reference Code) reference confirms
it.


> We have "CONFIG_RESET_ATTACK_MITIGATION" set in the Linux kernel
> configuration, which I understand will cause the
> "MemoryOverwriteRequest" bit to be set during boot and hence trigger
> this behaviour (unless explicitly cleared before the board is reset).

Yes, see commit ccc829ba3624 ("efi/libstub: Enable reset attack
mitigation", 2017-08-26).

(Side comment regarding historical Fedora kernels:
<https://bugzilla.redhat.com/show_bug.cgi?id=1498159>.)

The config knob's documentation was later extended in commit
a5c03c31af22 ("x86/efi: Clarify that reset attack mitigation needs
appropriate userspace", 2018-01-19).
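For reference, MemoryOverwriteRequestControl (GUID e20939be-32d4-41be-a150-897f85d49829) holds a single byte. As I understand the TCG Platform Reset Attack Mitigation spec, mirrored by edk2's MOR_CLEAR_MEMORY_BIT_MASK (0x01) and MOR_DISABLEAUTODETECT_BIT_MASK (0x10), bit 0 (ClearMemory) requests the memory overwrite and bit 4 (DisableAutoDetect) tells the firmware not to try detecting an orderly shutdown on its own; the remaining bits are reserved. A minimal decoding sketch follows; the function name is made up for illustration.

```python
MOR_CLEAR_MEMORY_BIT_MASK = 0x01       # bit 0: ClearMemory
MOR_DISABLEAUTODETECT_BIT_MASK = 0x10  # bit 4: DisableAutoDetect
# Every other bit is reserved (0xEE).
MOR_RESERVED_MASK = 0xFF & ~(MOR_CLEAR_MEMORY_BIT_MASK | MOR_DISABLEAUTODETECT_BIT_MASK)


def decode_mor_control(value: int) -> dict:
    """Split a one-byte MemoryOverwriteRequestControl value into its flags."""
    if not 0 <= value <= 0xFF:
        raise ValueError("MOR Control holds a single byte")
    return {
        "clear_memory": bool(value & MOR_CLEAR_MEMORY_BIT_MASK),
        "disable_auto_detect": bool(value & MOR_DISABLEAUTODETECT_BIT_MASK),
        "reserved_bits": value & MOR_RESERVED_MASK,
    }


if __name__ == "__main__":
    # 0x01 is the value dmpstore reported on the boards that perform the erase.
    print(decode_mor_control(0x01))
```

Running it prints `{'clear_memory': True, 'disable_auto_detect': False, 'reserved_bits': 0}`, i.e. the 0x01 in the dmpstore output quoted below is a plain ClearMemory request.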


> Some of our boards seem to only be exposing the related
> "MemoryOverwriteRequestControlLock" EFI variable:
>
> Shell> dmpstore -guid e20939be-32d4-41be-a150-897f85d49829
> dmpstore: No matching variables found. Guid E20939BE-32D4-41BE-A150-897F85D49829
> Shell> dmpstore -guid bb983ccf-151d-40e1-a07b-4a17be168292
> Variable NV+RT+BS 'BB983CCF-151D-40E1-A07B-4A17BE168292:MemoryOverwriteRequestControlLock' DataSize = 0x01
> 00000000: 00 *.*
>
> Thus this behaviour isn't triggered. Others expose both
> "MemoryOverwriteRequestControl" and
> "MemoryOverwriteRequestControlLock":
>
> Shell> dmpstore -guid e20939be-32d4-41be-a150-897f85d49829
> Variable NV+RT+BS 'E20939BE-32D4-41BE-A150-897F85D49829:MemoryOverwriteRequestControl' DataSize = 0x01
> 00000000: 01 *.*
>
> Shell> dmpstore -guid bb983ccf-151d-40e1-a07b-4a17be168292
> Variable NV+RT+BS 'BB983CCF-151D-40E1-A07B-4A17BE168292:MemoryOverwriteRequestControlLock' DataSize = 0x01
> 00000000: 00 *.*

MemoryOverwriteRequestControlLock is a Microsoft- (not TCG-) originated
hardening:

https://docs.microsoft.com/en-us/windows-hardware/drivers/bringup/device-guard-requirements

It's been a while since I last thought about it, but basically it's a
way to prevent an attacker from even attempting to clear the MOR bit in
the original MemoryOverwriteRequestControl variable before they'd force
a platform reset.
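To make the intent of the lock concrete, here is a deliberately simplified model (a hypothetical class, not edk2 or Windows code). It assumes the lock encoding I understand edk2's MOR_LOCK_DATA_* constants to use, 0x00 for unlocked and 0x01 for locked without a key; the keyed variant (0x02) is not modeled. Once the lock is engaged, writes to MOR Control (and to the lock itself) are refused until the next platform reset. The 0x00 seen in the dmpstore output above is the unlocked state.

```python
MOR_LOCK_DATA_UNLOCKED = 0x00
MOR_LOCK_DATA_LOCKED_WITHOUT_KEY = 0x01


class MorLockModel:
    """Toy model of MOR Control guarded by a keyless MOR Control Lock.

    Hypothetical illustration only; not the edk2 or Windows implementation.
    """

    def __init__(self, mor_control: int = 0x00) -> None:
        self.mor_control = mor_control & 0xFF
        self.lock = MOR_LOCK_DATA_UNLOCKED

    def set_lock(self) -> bool:
        """Engage the lock; once engaged, the lock itself is read-only too."""
        if self.lock != MOR_LOCK_DATA_UNLOCKED:
            return False
        self.lock = MOR_LOCK_DATA_LOCKED_WITHOUT_KEY
        return True

    def set_mor_control(self, value: int) -> bool:
        """Write MOR Control; refused while the lock is engaged."""
        if self.lock != MOR_LOCK_DATA_UNLOCKED:
            return False
        self.mor_control = value & 0xFF
        return True

    def platform_reset(self) -> None:
        """MOR Control is non-volatile and survives; the lock starts over.

        The memory clear the firmware would perform here is not modeled.
        """
        self.lock = MOR_LOCK_DATA_UNLOCKED
```

For example, with `m = MorLockModel(0x01)`, after `m.set_lock()` a call to `m.set_mor_control(0x00)` returns False and the MOR bit stays set, so an attacker who later forces a reset still triggers the memory clear.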


The situation where you see the MOR Control Lock variable but not the
MOR Control variable was a bug in edk2. It has been fixed under

https://bugzilla.tianocore.org/show_bug.cgi?id=727

(We first encountered this issue in
<https://bugzilla.redhat.com/show_bug.cgi?id=1496170>.)

> My understanding is that we should be seeing both these EFI variables
> being exposed. I'm rather unfamiliar with the EDK codebase and have not
> been able to work out how I would end up with
> "MemoryOverwriteRequestControlLock" and not
> "MemoryOverwriteRequestControl".

More precisely, the valid cases are:

- none of them present (= system doesn't support the Platform Reset
Attack Mitigation from the TCG)

- MOR Control is present, but MOR Control Lock is not (= the TCG spec is
supported, but the Microsoft-defined hardening is not)

- both MOR Control and MOR Control Lock are present (= both specs are
supported)
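The same check can be done from a running Linux system instead of the UEFI shell. A minimal sketch, assuming efivarfs is mounted at /sys/firmware/efi/efivars and names its files "<VariableName>-<lowercase GUID>"; the function name and the return labels are made up for illustration, with "invalid" standing for the buggy lock-only situation.

```python
import os

EFIVARS_DIR = "/sys/firmware/efi/efivars"
MOR_CONTROL_FILE = "MemoryOverwriteRequestControl-e20939be-32d4-41be-a150-897f85d49829"
MOR_LOCK_FILE = "MemoryOverwriteRequestControlLock-bb983ccf-151d-40e1-a07b-4a17be168292"


def classify_mor_support(efivars_dir: str = EFIVARS_DIR) -> str:
    """Map the presence of the two variables onto the cases listed above.

    "none"     -> Platform Reset Attack Mitigation not supported
    "tcg-only" -> TCG spec supported, Microsoft lock not supported
    "tcg+lock" -> both specs supported
    "invalid"  -> lock without MOR Control, the edk2 bug (TianoCore#727)
    """
    has_control = os.path.exists(os.path.join(efivars_dir, MOR_CONTROL_FILE))
    has_lock = os.path.exists(os.path.join(efivars_dir, MOR_LOCK_FILE))
    if has_control and has_lock:
        return "tcg+lock"
    if has_control:
        return "tcg-only"
    if has_lock:
        return "invalid"
    return "none"


if __name__ == "__main__":
    print(classify_mor_support())
```

On the boards that show only the lock variable, this would report "invalid".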


> I've tried using `J7` to reset the NVRAM on a board just exposing
> "MemoryOverwriteRequestControlLock", following the process described
> here, to see if it would have an effect, and it hasn't:
>
> https://uchan.hateblo.jp/entry/2018/01/09/075230

Yes, with the bug present, the firmware would re-create MOR Control
Lock. See TianoCore#727 (link above).

> One of the boards in our LAVA instance was initially only exposing the
> lock variable, but then seemingly randomly started to expose the other
> variable and perform the erase at boot. I've not been able to determine
> what triggered this change in behaviour.

This seems vaguely consistent with the OS kernel being buggy too (that
is, <https://bugzilla.redhat.com/show_bug.cgi?id=1498159>), *or else*
with your Linux userspace not clearing the MOR bit in the MOR Control
variable, as a part of the controlled OS shutdown.
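On the userspace side, the controlled-shutdown step amounts to clearing bit 0 of MOR Control while keeping the variable's attributes and its other bits. A minimal sketch, assuming access through Linux efivarfs, where each file holds a little-endian 32-bit attribute word followed by the variable data and a write must supply that whole payload at once; the function names are hypothetical, and on some kernels the efivarfs file may first need its immutable flag cleared (e.g. with chattr -i). Where to hook this into the shutdown sequence depends on the distribution.

```python
import os
import struct

MOR_CLEAR_MEMORY_BIT_MASK = 0x01


def split_efivarfs_payload(raw: bytes) -> tuple[int, bytes]:
    """efivarfs files start with a little-endian u32 of attributes, then the data."""
    if len(raw) < 4:
        raise ValueError("efivarfs payload shorter than the attribute header")
    (attrs,) = struct.unpack_from("<I", raw)
    return attrs, raw[4:]


def mor_cleared_payload(raw: bytes) -> bytes:
    """Return a new payload with ClearMemory off; attributes and other bits kept."""
    attrs, data = split_efivarfs_payload(raw)
    if len(data) != 1:
        raise ValueError("MOR Control is expected to hold exactly one byte")
    return struct.pack("<IB", attrs, data[0] & ~MOR_CLEAR_MEMORY_BIT_MASK & 0xFF)


def clear_mor_bit(path: str) -> None:
    """Rewrite the variable in place with a single write() of the full payload."""
    with open(path, "rb") as f:
        payload = mor_cleared_payload(f.read())
    fd = os.open(path, os.O_WRONLY)
    try:
        if os.write(fd, payload) != len(payload):
            raise OSError("short write to efivarfs")
    finally:
        os.close(fd)
```

For the variable shown by dmpstore as NV+RT+BS with value 01, the payload goes from attributes 0x7 plus 0x01 to attributes 0x7 plus 0x00.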

> Any help/pointers would be much appreciated.

I would suggest:

(1) Upgrade the platform firmware to a version that contains the edk2
commit range fixing TianoCore#727 (namely 35ac962b5473..fda8f631edbb).
This prevents the out-of-spec situation where only MOR Control Lock
exists. (While that situation is not your acute problem now, it's best
to get it solved too.)

(2) Make sure your kernel does not *create* MOR Control under any
circumstances, only modifies it if it exists.

(3) Either remove CONFIG_RESET_ATTACK_MITIGATION from your kernel
config, or verify that your userspace clears the MOR bit in MOR Control
before a controlled OS shutdown. (To be honest, I don't know which Linux
distributions satisfy the userspace requirement; for example,
CONFIG_RESET_ATTACK_MITIGATION is not enabled in RHEL8, as far as I can
see.)
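Point (2) boils down to "modify MOR Control only if the firmware already exposes it". A minimal sketch of that rule, modeling the firmware variable store as a plain dict keyed by (GUID, name); the dict and the function name are hypothetical stand-ins, not a kernel or edk2 API.

```python
MOR_GUID = "e20939be-32d4-41be-a150-897f85d49829"
MOR_NAME = "MemoryOverwriteRequestControl"
MOR_CLEAR_MEMORY_BIT_MASK = 0x01
# EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS | EFI_VARIABLE_RUNTIME_ACCESS,
# i.e. the "NV+RT+BS" that dmpstore prints for this variable.
NV_BS_RT = 0x1 | 0x2 | 0x4


def request_memory_clear(store: dict) -> bool:
    """Set ClearMemory in MOR Control, but never create the variable.

    `store` is a hypothetical stand-in for the firmware variable store,
    mapping (guid, name) -> (attributes, data).
    Returns True if MOR Control was updated, False if it was absent.
    """
    key = (MOR_GUID, MOR_NAME)
    if key not in store:
        # Comparable to GetVariable() reporting EFI_NOT_FOUND: the firmware
        # does not implement MOR, so leave the variable store alone.
        return False
    _, data = store[key]
    store[key] = (NV_BS_RT, bytes([data[0] | MOR_CLEAR_MEMORY_BIT_MASK]))
    return True
```

The important part is the early return: on a platform without MOR Control, nothing is written, so the out-of-spec "variable created by the OS" state cannot arise.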

Thanks
Laszlo
