Re: [PATCH] UefiCpuPkg/MpInitLib: Allocate a separate SEV-ES AP reset stack area


Laszlo Ersek
 

On 05/17/21 00:15, Marvin Häuser wrote:
Am 16.05.2021 um 03:17 schrieb Laszlo Ersek:
On 05/14/21 17:44, Marvin Häuser wrote:
On 14.05.21 17:23, Lendacky, Thomas wrote:
On 5/14/21 10:04 AM, Marvin Häuser wrote:
+      // Check to be sure that the "allocate below" behavior hasn't
changed.
+      // This will also catch a failed allocation, as "-1" is
returned on
+      // failure.
+      //
+      if (CpuMpData->SevEsAPResetStackStart >=
CpuMpData->WakeupBuffer) {
+        DEBUG ((DEBUG_ERROR,
+          "SEV-ES AP reset stack is not below wakeup buffer\n"));
+
+        ASSERT (FALSE);
Should the ASSERT not only catch the broken "allocate below"
behaviour,
i.e. not trigger on failed allocation?
I think it's best to trigger on a failed allocation as well rather than
continuing and allowing a page fault or some other problem to occur.
Well, it should handle the error in a safe way, i.e. the deadloop below.
To not ASSERT on plausible conditions is a common design guideline in
most low-level projects including Linux kernel.

Best regards,
Marvin

Thanks,
Tom

+        CpuDeadLoop ();
"DEBUG + ASSERT(FALSE) + CpuDeadLoop()" is a pattern in edk2.

In RELEASE builds, it will lead to a CpuDeadLoop(). That's the main goal
-- don't continue execution if the condition controlling the whole block
fired.

In DEBUG and NOOPT builds, the pattern will lead to a debug message
(usually at the "error" level), followed by an assertion failure. The
error message of the assertion failure is irrelevant ("FALSE"). The
point of adding ASSERT ahead of CpuDeadLoop() is that the way ASSERT
hangs execution is customizable, via "PcdDebugPropertyMask", unlike
CpuDeadLoop(). In many cases, ASSERT() uses CpuDeadLoop() itself, so the
effect is the same -- the explicit CpuDeadLoop is not reached. In other
configs, ASSERT() can raise a debug exception (CpuBreakpoint()).
I absolutely do not *expect* Tom to change this, it was just a slight
remark (as many places have this anyway). I'll still try to explain why
I made that remark, but for whom it is of no interest, I do not expect
it to be read. I'm fine with the patch as-is myself. Thank you a lot, Tom!



I know it, unfortunately, is a pattern in EDK II - taking this pattern
too far is what caused the 8-revision patch regarding untrusted inputs
we submitted previously. :)

There are many concerns about unconventional ASSERTs, though I must
admit none but one (and that one barely) really apply here, which is why
I have trouble explaining why I believe it should be changed. Here are
some reasons outside the context of this patch:

1) Consistency between DEBUG and RELEASE builds: I think one can justify
to have a breakpoint on a condition that may realistically occur. But a
deadloop can give a wrong impression about how production code works.
E.g. it also is a common pattern in EDK II to ASSERT on memory
allocation failure but *not* have a proper check after, so DEBUG builds
will nicely error or deadloop, while RELEASE goes ahead and causes a CPU
exception or memory corruption depending on the context. Thus,
real-world error handling cannot really be tested. This does not apply
because there *is* a RELEASE deadloop.

2) Static analysis: Some static analysers use ASSERT information for
their own analysis, and try to give hints about unsafe or unreachable
code based on own annotations. This kind of applies, but only when
substituting EDK II ASSERT with properly recognisable ASSERTs (e.g.
__builtin_unreachable).

2) Dynamic analysis: ASSERTs can be useful when fuzzing for example.
Enabled Sanitizers will only catch unsafe behaviour, but maybe you have
some extra code in place to sanity-check the results further. An ASSERT
yields an error dump (usually followed by the worker dying). However, as
allocation failures are perfectly expected, this can cause a dramatic
about of False Positives and testing interruption. This does not apply
because deadloop'd code cannot really be fuzz-tested anyway.

ASSERTs really are designed as unbreakable conditions, i.e. 1)
preconditions 2) invariants 3) postconditions. No allocator in early
kernel-space or lower can really guarantee allocation success, thus it
cannot be a postcondition of any such function. And while it might make
debugging look a bit easier (but you will see from the backtrace anyway
where you halted), it messes with all tools that assume proper usage.

Also, I just realised, you can of course see it from the address value
when debugging, but you cannot see it from the ASSERT or DEBUG message
*which* of the two logical error conditions failed (i.e. broken
allocator or OOM). Changing the ASSERT would fix that. :)
I'm OK if the ASSERT() is dropped; from my perspective it's really just
a small convenience / developer/debugging aid. We still have the debug
message and the explicit deadloop.

Thanks
Laszlo

Join devel@edk2.groups.io to automatically receive all group messages.