Re: [PATCH EDK2 v2 1/1] SecurityPkg/DxeImageVerificationLib:Enhanced verification of Offset

Bret Barkelew <bret.barkelew@...>



I don’t completely agree with your second point. If the underlying issue is already out of embargo, and we’ve just failed to fix it, that’s not a new issue. I would want to see fixes to the fixes fast-tracked (or at least heavily prioritized), rather than re-entering a full embargo period.


- Bret


From: Yao, Jiewen via
Sent: Tuesday, August 18, 2020 6:12 AM
To:; lersek@...
Cc: xiewenyi2@...; Wang, Jian J; huangming23@...; songdongkuang@...; Marvin Häuser; Vitaly Cheptsov
Subject: [EXTERNAL] Re: [edk2-devel] [PATCH EDK2 v2 1/1] SecurityPkg/DxeImageVerificationLib:Enhanced verification of Offset


Hi Laszlo
I agree with on your most points.
It is all about *return of investment* or *risk control*. Like we cannot pursue 100% security because we will bankrupt ourselves if so.

Here I just raise my concern.
1) If we alway allow developers’ low quality patch without test, the overall quality will become lower and lower. Personally I have no confidence to help catch all those issues. You are the role model on code review. But not all people review the code like you. We need both expertise and time for code review.

2) I purposely separate security fix from non security one, because the process is different. The embargo might be 6 months. What if we found the security patch does not fix the problem after embargo? Unlike we send one more patch tomorrow, we need wait for another 6 months...

thank you!
Yao, Jiewen

> 2020818日,下午6:17Laszlo Ersek <lersek@...> 写道:
> Hi Jiewen,
> (+Marvin, +Vitaly)
> On 08/18/20 01:23, Yao, Jiewen wrote:
>>> -----Original Message-----
>>> From: <> On Behalf Of Laszlo
>>> Ersek
>>> Sent: Tuesday, August 18, 2020 12:53 AM
>>> To: Yao, Jiewen <jiewen.yao@...>;;
>>> xiewenyi2@...; Wang, Jian J <>
>>> Cc: huangming23@...; songdongkuang@...
>>> Subject: Re: [edk2-devel] [PATCH EDK2 v2 1/1]
>>> SecurityPkg/DxeImageVerificationLib:Enhanced verification of Offset
> [...]
>> However, I do think the producer is mandatory for a fix or at least a
>> security fix.
>> The owner to fix the issue should guarantee the patch is good.
>> The owner shall never rely on the code reviewer to figure out if the
>> patch is good and complete.
>> I have some bad experience that bug owner just wrote a patch and tried
>> to fix a problem, without any test.
>> And it happened passed code review from someone who does not well
>> understand the problem, but give rb based upon the time pressure.
>> Later, the fix was approved to be useless.
>> In my memory, at least 3 cases were security fix. They are found, just
>> because they are sensitive, more people took a look later.
>>    It was simple. It was one-line change.
>>   But it has not test, and it was wrong.
>> "It was ridiculous" -- commented by the people who find the so-called
>> security fix does not fix the issue.
> Just because sloppy/rushed reviews exist, and just because reviewers
> operate under time pressure, we should not automatically reject security
> fixes that come without a reproducer.
> Some organizations do develop reproducers, but they never share them
> publicly (for fear of abuse by others).
> But more importantly, in an open development project, a developer could
> have time and expertise to contribute a fix, but not to create a
> reproducer.
> - If we make contributing harder, fewer people will upstream their
>  fixes.
> - If we make contributing harder, then contributions that do make it to
>  the tree will be of higher quality.
> Both statements ring true to me -- so it's a tradeoff.
> (By "we", I mean the edk2 community.)
>>> Additionally, the exact statement that the bug report does make,
>>> namely
>>>  it's possible to overflow Offset back to 0 causing an endless loop
>>> is wrong (as far as I can tell anyway). It is not "OffSet" that can
>>> be overflowed to zero, but the *addend* that is added to OffSet can
>>> be overflowed to zero. Therefore the infinite loop will arise because
>>> OffSet remains stuck at its present value, and not because OffSet
>>> will be re-set to zero.
>>> For the reasons, we can only speculate as to what the actual problem
>>> is, unless Jian decides to join the discussion and clarifies what he
>>> had in mind originally.
>> [Jiewen] Would you please clarify what do you mean "we" here?
>> If "we" means the bug dispatcher, it is totally OK. The dispatcher
>> just assign the bug.
>> If "we" means the developer assigned to fix the bug, it is NOT OK. The
>> developer should take the responsibility to understand the problem.
> By "we", I mean the edk2 community.
>>> We can write a patch based on code analysis. It's possible to
>>> identify integer overflows based on code analysis, and it's possible
>>> to verify the correctness of fixes by code review. Obviously testing
>>> is always good, but many times, constructing reproducers for such
>>> issues that were found by code review, is difficult and time
>>> consuming. We can say that we don't fix vulnerabilities without
>>> reproducers, or we can say that we make an effort to fix them even if
>>> all we have is code analysis (and not a reproducer).
>> [Jiewen] I would say: yes and no.
>> Yes, I agree with you that it might be difficult and time consuming to
>> construct the reproducer.
>> However, "obviously" is a subject term. Someone may think something is
>> obvious, but other people does not.
>> We should be clear the responsibility of the patch provider is to
>> provide high quality patch.
>> Having basic unit test is the best way to prove that the fix is good.
>> I have seen bad cases when I ask for the test for patch, then the
>> answer I got is: "I test the windows boot".
>> But the test - windows boot - has nothing related to the patch. It
>> only proves no regression, but cannot prove the issue described is
>> resolved.
> Right. It would be ideal if every patch came with a unit test. But that
> also means some folks will contribute less.
> Consider normal (not security) patches. We require that all function
> return values be checked (unless it really doesn't matter if a function
> call fails). If a function call fails, we need to roll back the actions
> taken thus far. Release resources and so on. This is why we have the
> "cascade of error handling labels" pattern.
> But, of course, we don't test every possible error path in the code. So
> what's the solution there:
> - reject such patches that carefully construct the error paths, but do
>  not provide unit tests with complete error path coverage?
> - say that we don't care about thorough error paths, so let's just hang,
>  or leak resources, whenever something fails?
> Personally I prefer the detailed error paths. They need to be written
> and reviewed carefully. And they can be accepted even if they are not
> tested with complete coverage.
> Some people think otherwise; they say no untested (untestable) code
> should ever be merged.
> Back to security patches -- creating reproducers usually requires a
> setup (tools, expertise, time allocation etc) that is different from a
> "normal" setup. It may require specialized binary format editors,
> expertise in "penetration testing", and so on.
> I mostly know the C language rules that pertain to integer and buffer
> overflows, so I can usually spot their violations in C code, and propose
> fixes for them too. But I'm not a security researcher, so I don't write
> exploits as a norm -- I don't even investigate, generally speaking, the
> potential practical impact of "undefined behavior". When there's a
> buffer overflow or integer overflow in the code, that's the *end* of the
> story for me, while it's the *start* of the work for a security
> researcher.
> When you require reproducers for all security patches, you restrict the
> potential contributor pool to security researchers.
>> Let's think again in this case, if the patch provider does some basic
>> unit test, he/she may find out the problem by himself/herself.
>> That can save other people's time to review.
>> I don't prefer to move the responsibility from patch provider to the
>> code reviewer to check if the fix is good.
>> Otherwise, the code reviewer may be overwhelmed.
>> We may clarify and document the role and responsibility in EDKII
>> clearly. Once that is ready, we can follow the rule.
>> Before that is ready, in this particular case, I still prefer we have
>> producer to prove the patch is good.
> OK, thanks for explaining.
> Given that I'm unable to create such a PE file (from scratch or by
> modifying another one), I won't post the patches stand-alone.
>>> So the above paragraph concerns "correctness". Regarding
>>> "completeness", I guarantee you that this patch does not fix *all*
>>> problems related to PE parsing. (See the other BZ tickets.) It does
>>> fix *one* issue with PE parsing. We can say that we try to fix such
>>> issues gradually (give different CVE numbers to different issues, and
>>> address them one at a time), or we can say that we rewrite PE parsing
>>> from the ground up. (BTW: I have seriously attempted that in the
>>> past, and I gave up, because the PE format is FUBAR.)
>> [Jiewen] Maybe there is misunderstanding.
>> I do not mean to let patch provider to fix all issue in PE parsing.
>> Just like we cannot file one Bugzilla to fix all issue in EDKII - it
>> is unfair.
>> What I mean is that the patch provider should guarantee the
>> correctness and completeness of the issue in the bug report.
>> One faked bad example of correctness is:
>>    A bug report is file to say: the code has overflow class A.
>>    The factor is: the code has overflow class A at line X and line Y.
>>    The patch only modified some code at line X, but the overflow
>>    class A at line X still exists.
>> One faked bad example of completeness is:
>>    A bug report is file to say: the code has overflow class A.
>>    The factor is: the code has overflow class A at line X and line Y.
>>    The patch only fixed the overflow class A at line X but not line
>>    Y.
>> The patch provider should take responsibility to do that work
>> seriously to find out issue in line X and line Y and fix them.
>> He/she may choose to just fix line X and line Y. Rewrite is whole
>> module is NOT required.
> I agree completely.
> My point was that we need the bug report to be precise, in the first
> place. If the bug report doesn't clearly identify lines X and Y, we will
> likely not get the completeness part right.
> "Clearly identify" may mean spelling out lines X and Y specifically. Or
> it may mean defining "class A" sufficiently clearly that someone else
> reading the affected function can find X and Y themselves.
>> If I can give some comment, I would think about the provide the fix in
>> BasePeCoffLib.
> From a software design perspective, you are 100% right.
> Unfortunately, I can't do it.
> That's what I mentioned before -- I had tried rewriting BasePeCoffLib,
> because in my opinion, BasePeCoffLib is unsalvageable in its current
> form. And I gave up on the rewrite.
> Please see the following email. I sent it to some people off-list, on
> 2020-Feb-14:
>> There are currently four (4) TianoCore security BZs (1957, 1990, 1993,
>> 2215), embargoed, that describe various ways in which cunningly
>> crafted PE images can evade Secure Boot verification.
>> [...]
>> Primarily, I just couldn't find my peace with the idea that fixing
>> such PE/COFF parsing mistakes (integer overflows, buffer overflows)
>> *must* be a one-by-one fixing game. I wanted an approach that would
>> fix these *classes* of vulnerabilities, in PE/COFF parsing.
>> So, beginnning of this February I returned to this topic, and spent
>> two days on prototyping / researching a container / interval based
>> approach. Here's one of the commit messages, as a way of explaining:
>>    OvmfPkg/DxePeCoffValidatorLib: introduce CONTAINER type and helper funcs
>>    For validating the well-formedness of a PE/COFF file, introduce the
>>    CONTAINER type, and some workhorse functions. (The functions added in this
>>    patch will not be called directly from the code that will process PE/COFF
>>    structures.)
>>    The CONTAINER type describes a contiguous non-empty interval in a PE/COFF
>>    file (on-disk representation, or in-memory representation). Containers can
>>    be nested. The data from scalar-sized containers can be read out, as part
>>    of their creation. For on-disk representations of PE/COFF files, scalar
>>    reads are permitted; for in-memory representations, no data access is
>>    permitted (only CONTAINER tracking / nesting).
>>    The goals of CONTAINER are the following:
>>    - enforce the proper nesting of PE/COFF structures (structure boundaries
>>      must not be crossed by runs of data);
>>    - prevent integer overflows and buffer overflows;
>>    - prevent zero-size structures;
>>    - prevent infinite nesting by requiring proper sub-intervals;
>>    - prevent internal PE/COFF pointers from aliasing each other (unless they
>>      point at container and containee structures);
>>    - terminate nesting at scalar-sized containers;
>>    - assuming an array of pointers is processed in increasing element order,
>>      enforce that the pointed-to objects are located at increasing offsets
>>      too;
>>    - assign human-readable names to PE/COFF structures and fields, for
>>      debugging PE/COFF malformations.
>> Because, several of the vulnerabilities exploited cross-directed and
>> aliased internal pointers in PE/COFF files.
>> Two days of delirious spec reading and coding later, and 2000+ lines
>> later, I decided that my idea was unviable. The PE/COFF spec was so
>> incredibly mis-designed and crufty that enforcing the *internal*
>> consistency of all the size fields and the internal pointers would
>> inevitably fall into one of the following categories:
>> - the checks wouldn't be strict enough, and some nasty images would
>>  slip through,
>> - the checks would be too strict, and some quirky, but valid, images
>>  would be unjustifiedly caught.
>> So I gave up and I've accepted that it remains a whack-a-mole game.
>> [...]
>> (NB: I don't claim that ELF is not similarly brain-damaged.)
> So now, I've only considered contributing patches for bug#2215 because
> the code in question resides in DxeImageVerificationLib, and *not* in
> BasePeCoffLib. I'm not going to touch BasePeCoffLib -- in my opinion,
> BasePeCoffLib is unfixable without a complete rewrite.
> I would *like* if BasePeCoffLib were fixable incrementally, but I just
> don't see how that's possible.
> In support of my opinion, please open the following bugzilla ticket:
> and search the comments (with the browser's in-page search feature, such
> as Ctrl+F) for the following expression:
>  new PE loader
> I understand exactly what Vitaly and Marvin meant in those comments. :(
> Thanks,
> Laszlo


Join { to automatically receive all group messages.