Re: [edk2-devel] [edk2-rfc] GitHub Pull Request based Code Review Process


Nate DeSimone
 

Hi Sean,

My recent spelling fix patch series is a good example of why this is a bad idea actually:

https://edk2.groups.io/g/devel/message/59779
https://edk2.groups.io/g/devel/message/59780
https://edk2.groups.io/g/devel/message/59781
https://edk2.groups.io/g/devel/message/59782
https://edk2.groups.io/g/devel/message/59783
https://edk2.groups.io/g/devel/message/59784
https://edk2.groups.io/g/devel/message/59785

Notice that I split along package boundaries, because the maintainers for each package is a different set of people. If my patch series was squashed at merge time... how do I know who reviewed what? If the commit set is not correct.. I tend to say so in my feedback :). The only sane way to squash this series would be to have a human re-write all the commit messages, which I am against.

Generally those that prefer an easily bisectable history have such preference mostly due to the usage of validators that immediately resort bisecting as a method to root cause an issue since they tend to not understand the code very well. Edk2 already has 12 years of non-bisectable history, so this method is going to be ineffective anyway.

With regard to sending squashed commits, I understand that those who are new may have difficulty sending a properly formatted patch series, but frankly attempting to shield them from having to learn I am strongly against. I suggest that Microsoft invest in its human capital similar to how Intel does. If you cannot figure out how to send a properly formatted patch series... then do your work on the internal codebase (or perhaps MU.) Within the Intel, having the skillset to contribute to TianoCore is considered a mark of prestige, and thus needs to be earned.

TLDR, I will reject squashed commits on any packages that I maintain.

Thanks,
Nate

-----Original Message-----
From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Sean
Sent: Tuesday, May 19, 2020 9:54 AM
To: rfc@edk2.groups.io; Desimone, Nathaniel L
<nathaniel.l.desimone@...>; lersek@...;
bret.barkelew@...; devel@edk2.groups.io; Kinney, Michael D
<michael.d.kinney@...>
Subject: Re: [edk2-devel] [edk2-rfc] GitHub Pull Request based Code Review
Process

Nate/Laszlo,

Regarding a squash merge workflow. I agree it can be abused and we all
have seen terrible examples. But a patch series that contains 500+ file
changes isn't really much better. Just because it is broken into multiple
commits doesn't mean it is the right set of commits.

Anyway a squash merge workflow works amazingly well with and is
optimized for a web based review and PR processes. It allows a user to
respond to changes, fix issues, learn thru the PR process, all while keeping
complete track of the progression. Then once all "status"
checks and reviews are complete, it is squashed into a neat commit for
mainline, containing only the relevant data in the message.

So, the ask is that we don't exclude squash merge workflows. Those
reviewing the PR can decide what is appropriate for the PR content
submitted. Just as you would request changes to the contents (or
ordering) of a commit in a series, if the reviewers don't agree that the PR
contents should be in a single commit then obviously it shouldn't be
squashed to one.

Contributions like spelling fixes, typos, minor bug fixes, documentation
additions/fixes, etc all are great examples of PRs that can easily leverage
squash merges and this workflow significantly lowers the burden of the
contribution and review process. This workflow is also are much easier for
casual or first time contributors.

I don't exactly know how we would enable this but I assume we could
leverage tags or make it clear in the PR description. First step is to get
alignment that a squash merge workflow, while not appropriate for all
contributions, is not something to be excluded.

Thanks
Sean



On 5/19/2020 12:21 AM, Nate DeSimone wrote:
Hi All,



I tend to agree with most of Laszlo's points. Specifically, that moving to pull
requests will not fix the fact that maintainers are usually busy people and
don't always give feedback in a punctual manner. Like Laszlo, I would also
prefer that we do not squash patch series. My biggest reason for not
squashing patch series is because when you put everything into a single
commit, I have had to review commits with 500+ files changed. Opening git
difftool on a commit like that is awful.



However, I would like to register my general endorsement for pull requests
or some other web based system of code review… and I don’t have an
Instagram account by the way :) Personally, I prefer Gerrit as I use it a lot with
coreboot and other projects. But since we are using Github for hosting, pull
requests are an easy switch and a logical choice. My main reason for being
excited about pull requests mostly has to do with the amount of manual
effort required to be a TianoCore maintainer right now. I have set up my
email filter so that the mailing list is categorized like so:



[cid:image001.png@...]



Implementing the logic to parse the contents of emails to categorize them
like this required me to define no less than 12 email filter rules in Microsoft
Outlook, and I have to change my filtering logic every time I am
added/removed from a Maintainers.txt file. I’m sure every other maintainer
has spent a time separately implementing filtering logic like I have. This helps,
but still for every thread, I have to go and check if one of the other
maintainers has already reviewed/pushed that patch series yet, and if not
review/push it. If I have ] feedback on a patch series, I have to categorize it
as awaiting response from author and check up on it from time to time,
sometimes I ping the author directly and remind them to send a new patch
series. Implementing this state machine is a lot of manual work and it kind of
feels like I’m a telephone operator in the 1950s. I greatly welcome
automation here as I am sure it will increase the number of patch series I am
able to review per hour.



Thanks,

Nate



-----Original Message-----
From: rfc@edk2.groups.io <rfc@edk2.groups.io> On Behalf Of Laszlo
Ersek
Sent: Friday, May 15, 2020 2:08 AM
To: rfc@edk2.groups.io; bret.barkelew@...;
devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@...>
Subject: Re: [EXTERNAL] Re: [edk2-devel] [edk2-rfc] GitHub Pull
Request based Code Review Process



On 05/15/20 06:49, Bret Barkelew via groups.io wrote:



I would far prefer the approach of individual PRs for commits to
allow
for the squash flexibility (and is the strategy I think I would
pursue
with my PRs). For example, the VarPol PR would be broken up into 9
PRs
for each final commit, and we can get them in one by one.
Ideally, each one would be a small back and forth and then in. If it
had been done that way to begin with, it would be over in a week and
a
half or so, rather than the multiple months that we’re now verging
on.


This differs extremely from how we've been working on edk2-devel (or
from how any git-based project works that I've ever been involved with).

And I think the above workflow is out of scope, for migrating the edk2
process to github.



Again, the structuring of a patch series is a primary trait. Iterating only on
individual patches does not allow for the reordering / restructuring of the
patch series (dropping patches, reordering patches, inserting patches,
moving hunks between patches).



It's common that the necessity to revise an earlier patch emerges while
reworking a later patch. For instance, the git-rebase(1) manual dedicates a
separate section to "splitting commits".



In the initial evaluation of "web forges", Phabricator was one of the
"contestants". Phabricator didn't support the "patch series" concept at all, it
only supported review requests for individual patches, and it supported
setting up dependencies between them. So, for example, a 27-patch series
would require 27 submissions and 26 dependencies.



Lacking support for the patch series concept was an immediate deal
breaker with Phabricator.



The longest patch series I've ever submitted to edk2-devel had 58 patches.
It was SMM enablement for OVMF. It went from v1 to v5 (v5 was merged),
and the patch count varied significantly:



v1: 58 patches (25 Jul 2015)

v2: 41 patches ( 9 Oct 2015)

v3: 52 patches (15 Oct 2015)

v4: 41 patches ( 3 Nov 2015)

v5: 33 patches (27 Nov 2015)



(The significant drop in the patch count was due to Mike Kinney open
sourcing and upstreaming the *real* PiSmmCpuDxeSmm driver (which was
huge work in its own right), allowing me to drop the Quark-originated
32-bit-only PiSmmCpuDxeSmm variant, from my series.)



The contribution process should make difficult things possible, even if that
complicates simple things somewhat. A process that makes simple things
simple and difficult things impossible is useless. This is what the Instagram
generation seems to be missing.





I don't know why the VariablePolicy work took months. I can see the
following threads on the list:



* [edk2-devel] [PATCH v1 0/9] Add the VariablePolicy feature

Fri, 10 Apr 2020 11:36:01 -0700



* [edk2-devel] [PATCH v2 00/12] Add the VariablePolicy feature

Mon, 11 May 2020 23:46:23 -0700



I have two sets of comments:



(1) It's difficult to tell in retrospect (because the series seem to have been
posted with somewhat problematic threading), but the delay apparently
came from multiple sources.



(1a) Review was slow and spotty.



The v1 blurb received some comments in the first week after it was posted.
But the rest of the v1 series (the actual patches) received feedback like this:



- v1 1/9: no feedback

- v1 2/9: 12 days after posting

- v1 3/9: 16 days after posting

- v1 4/9: no feedback

- v1 5/9: no feedback

- v1 6/9: no feedback

- v1 7/9: no feedback

- v1 8/9: no feedback

- v1 9/9: no feedback



(1b) There was also quite some time between the last response in the v1
thread (Apr 26th, as far as I can see), and the posting of the v2 series (May
11th).



(1c) The v2 blurb got almost immediate, and numerous feedback (on the
day of posting, and the day after). Regarding the individual patches, they
didn't fare too well:



- v2 01/12: superficial comment on the day of posting from me (not a

designated MdeModulePkg review), on the day of posting;
no

other feedback thus far

- v2 02/12: ditto

- v2 03/12: no feedback

- v2 04/12: superficial (coding style) comments on the day of posting

- v2 05/12: no feedback

- v2 06/12: no feedback

- v2 07/12: no feedback

- v2 08/12: no feedback

- v2 09/12: no feedback

- v2 10/12: no feedback

- v2 11/12: reasonably in-depth review from responsible co-maintainer

(yours truly), on the day of posting

- v2 12/12: no feedback



In total, I don't think the current process takes the blame for the delay. If
reviewers don't care (or have no time) now, that problem will not change
with the transition to github.com.





(2) The VariablePolicy series is actually a good example that patch series
restructuring is important.



(2a) The patch count went from 9 (in v1) to 12 (in v2).



(2b) And under v2, Liming still pointed out: "To keep each commit build
pass, the patch set should first add new library instance, then add the library
instance into each platform DSC, last update Variable driver to consume new
library instance."



Furthermore, I requested enabling the feature in ArmVirtPkg too, and
maybe (based on owner feedback) UefiPayloadPkg.



Thus, the v2->v3 update will most likely bring about both patch order
changes, and an increased patch count.



Thanks

Laszlo










Join rfc@edk2.groups.io to automatically receive all group messages.