[edk2-rfc] [edk2-devel] UEFI Variable SMI Reduction


Kubacki, Michael A
 

Hello,

I would appreciate any feedback you may have for this proposal.

Overview
--------------
This is a proposal to reduce SMM usage when using VariableSmmRuntimeDxe with VariableSmm. It does so by eliminating SMM usage for the vast majority of runtime service GetVariable ( ) and GetNextVariableName ( ) invocations. In typical systems, once the variable store has been initialized (e.g. in manufacturing boots), most UEFI variable usage comes from GetVariable ( ) and GetNextVariableName ( ), not SetVariable ( ). GetVariable ( ) calls can regularly exceed 100 per boot, while SetVariable ( ) calls typically remain below 10 per boot. By focusing on the common case, the majority of the overhead associated with SMM can be avoided while still using existing and proven code for operations, such as variable authentication, that require an isolated execution environment.

* Advantage: Reduces overall system SMM usage
* Disadvantage: Requires more Runtime data memory usage

Initial Performance Observations
----------------------------------------------
* With these proposed changes, GetVariable ( ) time for an existing variable on an Intel Atom based SoC dropped from ~220 us to ~5 us.

Major Changes
---------------------
1. Two UEFI variable caches will be maintained.
   a. "Runtime Cache" - Maintained in VariableSmmRuntimeDxe. Used to serve runtime service GetVariable ( ) and GetNextVariableName ( ) callers.
   b. "SMM Cache" - Maintained in VariableSmm. Used to serve SMM GetVariable ( ) and GetNextVariableName ( ) callers.
      i. A cache in SMRAM is retained so that SMM modules do not operate on data outside SMRAM.
2. A new UEFI variable read and write flow will be used as described below.

At any given time, the two caches are expected to be coherent. On a variable write, the runtime cache is updated only after validation in SMM and, for a non-volatile UEFI variable, only after the variable has also been successfully written to non-volatile storage.

Primary Concern
-----------------------
The primary item that I believe warrants feedback is whether there are substantial concerns with keeping a variable store cache that is used to serve UEFI Runtime Services callers in a buffer of EfiRuntimeServicesData type.

Proof-of-Concept Implementation
----------------------------------------------
The implementation is available in the following commit - check the commit message for some more details.
https://github.com/makubacki/edk2/commit/d812d43412a26e44581d283382596a863c1ae825

Please note this is "POC" level code quality and there will be cleanup of lock interfaces used and some other minor changes. Please feel free to leave any comments on the changes. This code was tested with the master branch of edk2 on an Intel Whiskey Lake U reference validation platform.

Why Keep SMM on Variable Writes
------------------------------------------------
* SMM provides a fairly ubiquitous isolated execution environment in x86 for authenticated UEFI variables.
* Platforms that restrict BIOS region SPI flash writes to SMM today can retain that restriction.

Today's UEFI Variable Cache
--------------------------------------
* Maintained in SMRAM via VariableSmm.
* A "write-through" cache of variable data in the form of a UEFI variable store.
* Non-volatile and volatile variables are maintained in separate buffers (variable stores).

Runtime & SMM Cache Coherency
----------------------------------------------
The non-volatile cache should always accurately reflect non-volatile storage contents (done today) and the "SMM cache" and "Runtime cache" should always be coherent on access. The runtime cache is updated by VariableSmm.

Updating both caches from within an SMM SetVariable ( ) operation is fairly straightforward, but a race condition can occur if an SMI occurs while runtime code is reading from the runtime cache. To handle this case, a runtime cache read lock is introduced that explicitly moves pending updates from SMM to the runtime cache if an SMM update occurs while the runtime cache is locked. Note that a Runtime Services call is not expected to interrupt SMM processing since all cores rendezvous in SMM.

New Key Elements for Coherence
---------------------------------------------
Runtime DXE (VariableSmmRuntimeDxe)
1. RuntimeCacheReadLock - A global lock used to lock read access to the runtime cache.
2. RuntimeCachePendingUpdate - A global flag used to notify runtime code of a pending cache update in SMM.

SMM (VariableSmm)
1. FlushRuntimeCachePendingUpdate SMI - A SW SMI handler that synchronizes the runtime cache buffer with the SMM cache buffer.

Proposed Runtime DXE Read Flow
----------------------------------------------
1. Wait for RuntimeCacheReadLock to be free
2. Acquire RuntimeCacheReadLock
3. If RuntimeCachePendingUpdate flag is set then:
3.a. Trigger FlushRuntimeCachePendingUpdate SMI
3.b. Verify RuntimeCachePendingUpdate flag is cleared
4. Perform read from RuntimeCache
5. Release RuntimeCacheReadLock
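
The read flow above can be sketched in plain C as follows. This is a minimal illustration, not the POC code: CACHE_SIZE, TriggerFlushSmi ( ), RuntimeCacheRead ( ) and the 0 / -1 return convention are hypothetical names chosen for the example, and the SMI is simulated by an ordinary function that copies the whole SMM cache. The wait-then-acquire pair here is not atomic; a real implementation would presumably need an atomic acquire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 64  /* hypothetical size, for illustration only */

static unsigned char RuntimeCache[CACHE_SIZE];  /* EfiRuntimeServicesData in the proposal */
static unsigned char SmmCache[CACHE_SIZE];      /* would live in SMRAM */
static volatile bool RuntimeCacheReadLock;
static volatile bool RuntimeCachePendingUpdate;

/* Hypothetical stand-in for triggering the FlushRuntimeCachePendingUpdate SW SMI. */
static void TriggerFlushSmi(void)
{
  memcpy(RuntimeCache, SmmCache, CACHE_SIZE);
  RuntimeCachePendingUpdate = false;
}

/* Returns 0 on success, -1 on a bad range or a failed flush. */
static int RuntimeCacheRead(size_t Offset, void *Buffer, size_t Length)
{
  if (Offset > CACHE_SIZE || Length > CACHE_SIZE - Offset) {
    return -1;
  }

  while (RuntimeCacheReadLock) {
    /* 1. Wait for RuntimeCacheReadLock to be free */
  }
  RuntimeCacheReadLock = true;              /* 2. Acquire the lock */

  if (RuntimeCachePendingUpdate) {          /* 3. Pending update from SMM? */
    TriggerFlushSmi();                      /* 3.a. Trigger the flush SMI */
    if (RuntimeCachePendingUpdate) {        /* 3.b. Verify the flag is cleared */
      RuntimeCacheReadLock = false;
      return -1;
    }
  }

  memcpy(Buffer, RuntimeCache + Offset, Length);  /* 4. Read from RuntimeCache */
  RuntimeCacheReadLock = false;                   /* 5. Release the lock */
  return 0;
}
```

In this sketch the lock is released on every exit path, including the failure case in step 3.b, so a failed flush does not leave later readers spinning.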

Proposed FlushRuntimeCachePendingUpdate SMI
-------------------------------------------------------------------
1. If RuntimeCachePendingUpdate flag is not set:
1.a. Return
2. Copy the data at RuntimeCachePendingOffset of RuntimeCachePendingLength to RuntimeCache
3. Clear the RuntimeCachePendingUpdate flag
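
As a rough sketch of the handler steps above, again in plain C rather than actual SMM code: CACHE_SIZE and the two static buffers are assumptions made for the example, and the range checks are an addition that the steps above do not mention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 64  /* hypothetical size, for illustration only */

static unsigned char RuntimeCache[CACHE_SIZE];
static unsigned char SmmCache[CACHE_SIZE];
static bool   RuntimeCachePendingUpdate;
static size_t RuntimeCachePendingOffset;
static size_t RuntimeCachePendingLength;

/* Simulated body of the FlushRuntimeCachePendingUpdate SW SMI handler. */
static void FlushRuntimeCachePendingUpdate(void)
{
  /* 1. Nothing to do if no update is pending. */
  if (!RuntimeCachePendingUpdate) {
    return;
  }

  /* Assumption: the pending range is always within the cache. */
  assert(RuntimeCachePendingOffset <= CACHE_SIZE);
  assert(RuntimeCachePendingLength <= CACHE_SIZE - RuntimeCachePendingOffset);

  /* 2. Copy the pending chunk from the SMM cache to the runtime cache. */
  memcpy(RuntimeCache + RuntimeCachePendingOffset,
         SmmCache + RuntimeCachePendingOffset,
         RuntimeCachePendingLength);

  /* 3. Clear the pending flag. */
  RuntimeCachePendingUpdate = false;
}
```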

Proposed SMM Write Flow
-------------------------------------
1. Perform variable authentication and non-volatile write. If either fails, return an error to the caller.
2. If RuntimeCacheReadLock is set then:
2.a. Set RuntimeCachePendingUpdate flag
2.b. Update RuntimeCachePendingOffset and RuntimeCachePendingLength to cover a superset of the pending chunk (for simplicity, the entire variable store is currently synchronized).
3. Else:
3.a. Update RuntimeCache
4. Update SmmCache
- Note: RT read cannot occur during SMI processing since all cores are locked in SMM.
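
The write flow might look roughly like the following in plain C. This is a sketch under stated assumptions, not the POC implementation: AuthenticateAndWriteNonVolatile ( ), StubWriteShouldFail, SmmWriteVariableData ( ) and CACHE_SIZE are hypothetical, and the pending range is tracked as the union of the outstanding chunks rather than the entire variable store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 64  /* hypothetical size, for illustration only */

static unsigned char RuntimeCache[CACHE_SIZE];
static unsigned char SmmCache[CACHE_SIZE];
static volatile bool RuntimeCacheReadLock;
static bool   RuntimeCachePendingUpdate;
static size_t RuntimeCachePendingOffset;
static size_t RuntimeCachePendingLength;
static bool   StubWriteShouldFail;  /* hypothetical knob to exercise the error path */

/* Hypothetical stub for variable authentication plus the non-volatile write. */
static bool AuthenticateAndWriteNonVolatile(const void *Data, size_t Length)
{
  (void)Data;
  (void)Length;
  return !StubWriteShouldFail;
}

/* Returns 0 on success, -1 on error. */
static int SmmWriteVariableData(size_t Offset, const void *Data, size_t Length)
{
  if (Offset > CACHE_SIZE || Length > CACHE_SIZE - Offset) {
    return -1;
  }

  /* 1. Authenticate and write to non-volatile storage first. */
  if (!AuthenticateAndWriteNonVolatile(Data, Length)) {
    return -1;
  }

  if (RuntimeCacheReadLock) {
    /* 2. A runtime reader holds the lock: defer the runtime cache update. */
    size_t Start = Offset;
    size_t End   = Offset + Length;
    if (RuntimeCachePendingUpdate) {
      /* Widen the range so it is a superset of every pending chunk. */
      size_t OldEnd = RuntimeCachePendingOffset + RuntimeCachePendingLength;
      if (RuntimeCachePendingOffset < Start) Start = RuntimeCachePendingOffset;
      if (OldEnd > End) End = OldEnd;
    }
    RuntimeCachePendingUpdate = true;
    RuntimeCachePendingOffset = Start;
    RuntimeCachePendingLength = End - Start;
  } else {
    /* 3. No reader is active: update the runtime cache directly. */
    memcpy(RuntimeCache + Offset, Data, Length);
  }

  /* 4. Always update the SMM cache. */
  memcpy(SmmCache + Offset, Data, Length);
  return 0;
}
```

Because the SMM cache is always updated in step 4, a later flush can copy the pending range from it even when several writes were deferred while the lock was held.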
