Re: Git commit message RFC


Michael D Kinney
 

Hi Artem,

We discussed this topic briefly in the EDK II Community meeting this morning With Felix.

For the tags topic, I believe the feature being asked for here is a simple way to
identify and search for commits that have one or more of the tag attributes.

These tags do not have to be in the subject line to find them easily. Laszlo's suggestion
to use a Name: Value pair in the commit body can support this type of search using
standard git commands.

For example, the git log command supports the --grep flag to search for contents of the
commit messages that match a specific pattern. For example, if you want to find all the
commits that resolve a CVE issue from 2019, I can use the following command:

c:\work\GitHub\tianocore\edk2>git log --oneline --grep=CVE-2019
ffde22468e SecurityPkg/TcgPei: Use Migrated FV Info Hob for calculating hash (CVE-2019-11098)
d7c9de51d2 UefiCpuPkg/CpuMpPei: Enable paging and set NP flag to avoid TOCTOU (CVE-2019-11098)
012809cdca SecurityPkg/Tcg2Pei: Use Migrated FV Info Hob for calculating hash (CVE-2019-11098)
4b68cef04c MdeModulePkg/Core: Create Migrated FV Info Hob for calculating hash (CVE-2019-11098)
479613bd06 UefiCpuPkg/SecMigrationPei: Add initial PEIM (CVE-2019-11098)
60b12e69fb UefiCpuPkg/CpuMpPei: Add GDT migration support (CVE-2019-11098)
9bedaec05b MdeModulePkg/PeiCore: Enable T-RAM evacuation in PeiCore (CVE-2019-11098)
1facb8fdef MdeModulePkg: Add new PCD to control the evacuate temporary memory feature (CVE-2019-11098)
1d3215fd24 NetworkPkg/ArpDxe: Recycle invalid ARP packets (CVE-2019-14559)
c230c002ac SecurityPkg/DxeImageVerificationLib: change IsCertHashFoundInDatabase name (CVE-2019-14575)
b1c1147059 SecurityPkg/DxeImageVerificationLib: Differentiate error/search result (2) (CVE-2019-14575)
cb30c8f251 SecurityPkg/DxeImageVerificationLib: plug Data leak in IsForbiddenByDbx() (CVE-2019-14575)
5cd8be6079 SecurityPkg/DxeImageVerificationLib: tighten default result (CVE-2019-14575)
a83dbf008c SecurityPkg/DxeImageVerificationLib: Differentiate error/search result (1) (CVE-2019-14575)
adc6898366 SecurityPkg/DxeImageVerificationLib: refactor db/dbx fetching code (CVE-2019-14575)
929d1a24d1 SecurityPkg/DxeImageVerificationLib: avoid bypass in fetching dbx (CVE-2019-14575)
9e56970090 SecurityPkg/DxeImageVerificationLib: fix wrong fetch dbx in IsAllowedByDb (CVE-2019-14575)
c13742b180 SecurityPkg/DxeImageVerificationLib: reject CertStack.CertNumber==0 per DBX (CVE-2019-14575)
fbb9607223 SecurityPkg/DxeImageVerificationLib: Fix memory leaks (CVE-2019-14575)
578bcdc260 NetworkPkg/Ip4Dxe: Check the received package length (CVE-2019-14559).
e36d5ac7d1 MdeModulePkg/SdMmcPciHcDxe: Fix double PciIo Unmap in TRB creation (CVE-2019-14587)
f1d78c489a MdeModulePkg/DisplayEngine: Zero memory before free (CVE-2019-14558)
764e8ba138 MdeModulePkg/String.c: Zero memory before free (CVE-2019-14558)
c32be82e99 MdeModulePkg/HiiDB: Remove configuration table when it's freed (CVE-2019-14586)
322ac05f8b MdeModulePkg/PiDxeS3BootScriptLib: Fix potential numeric truncation (CVE-2019-14563)
e2fc508128 NetworkPkg/HttpDxe: Set the HostName for the verification (CVE-2019-14553)
703e7ab21f NetworkPkg/TlsDxe: Add the support of host validation to TlsDxe driver (CVE-2019-14553)
1e72b1fb2e CryptoPkg/TlsLib: TlsSetVerifyHost: parse IP address literals as such (CVE-2019-14553)
8d16ef8269 CryptoPkg/Crt: import "inet_pton.c" (CVE-2019-14553)
2ac41c12c0 CryptoPkg/Crt: satisfy "inet_pton.c" dependencies (CVE-2019-14553)
eb520d94db CryptoPkg/Crt: turn strchr() into a function (CVE-2019-14553)
2ca74e1a17 CryptoPkg/TlsLib: Add the new API "TlsSetVerifyHost" (CVE-2019-14553)
31efec8279 MdePkg/Include/Protocol/Tls.h: Add the data type of EfiTlsVerifyHost (CVE-2019-14553)
b26691c471 MdeModulePkg RegularExpressionDxe: Update Oniguruma from v6.9.0 to v6.9.3

Notice that the last one does not have CVE in the subject line. It is in the body and the
git log command found that one too:

c:\work\GitHub\tianocore\edk2>git show b26691c471
commit b26691c47188ce255b8a4d920bf07ddf1431e2cd
Author: Liming Gao <liming.gao@intel.com>
Date: Thu Aug 8 19:53:03 2019 +0800

MdeModulePkg RegularExpressionDxe: Update Oniguruma from v6.9.0 to v6.9.3

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2066
Update Oniguruma to the latest version v6.9.3.
Oniguruma https://github.com/kkos/oniguruma
This release is the security fix release. It includes the changes:
Fixed CVE-2019-13224
Fixed CVE-2019-13225
Fixed many problems (found by libfuzzer programs)

Verify VS2015, GCC5 build.
Verify RegularExpressionProtocol GetInfo() and Match() function.

Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Hao A Wu <hao.a.wu@intel.com>
Cc: Cinnamon Shia <cinnamon.shia@hpe.com>
Signed-off-by: Liming Gao <liming.gao@intel.com>
Reviewed-by: Hao A Wu <hao.a.wu@intel.com>

The main requirement is to make sure the Name: Value tag line for the nature of the
commit is unique enough so a grep regular expression will not falsely match a different
commit. I like the **<type name>** syntax to help with uniqueness. We can even
define a Name: Value tag syntax that supports a list of types if a single commit is
doing more than one type of change

TYPE: **fix** | **feat** | **imp** | **doc** | **style** | **chore**

To search for all commits that begin the line with TYPE: and have **fix** in the same
line, you can use the following command:

git log --oneline --extended-regexp --grep=^TYPE:.*\*\*fix\*\*

To search for all commits that begin the line with TYPE: and have either **fix** OR
**feat** in the same line, you can use the following command:

git log --oneline --extended-regexp --grep=^TYPE:.*\*\*fix\*\* --grep=^TYPE:.*\*\*feat\*\*

To search for all commits that begin the line with TYPE: and have **fix** AND
**feat** in the same line or across multiple TYPE: lines, you can use the following command:

git log --oneline --extended-regexp --all-match --grep=^TYPE:.*\*\*fix\*\* --grep=^TYPE:.*\*\*feat\*\*

As a double check to avoid false positives, I did a search of the entire git log
history for all commits that contain **<any word>**. It hit on 2 commits that
contained ***To*** and ***On***.

c:\work\GitHub\tianocore\edk2>git log --extended-regexp --grep=\*\*[a-zA-Z]+\*\*
commit 186bc15553cb1a2dcfcc2e60091b82d4870a657b
Author: qwang12 <qwang12@6f19259b-4bc3-4df7-8a09-765794883524>
Date: Wed Jan 21 09:48:54 2009 +0000

Rename module name from ***To*** to ***On***. AAAOnBBB means this module produce AAA Protocol/PPI based on BBB. This change improves the readability.

git-svn-id: https://edk2.svn.sourceforge.net/svnroot/edk2/trunk/edk2@7329 6f19259b-4bc3-4df7-8a09-765794883524

commit 4c9c0f719df00f211790eadf04d21a93a0cdf76c
Author: qwang12 <qwang12@6f19259b-4bc3-4df7-8a09-765794883524>
Date: Wed Jan 21 09:47:43 2009 +0000

Rename module name from ***To*** to ***On***. AAAOnBBB means this module produce AAA Protocol/PPI based on BBB. This change improves the readability.

git-svn-id: https://edk2.svn.sourceforge.net/svnroot/edk2/trunk/edk2@7328 6f19259b-4bc3-4df7-8a09-765794883524

Best regards,

Mike

-----Original Message-----
From: rfc@edk2.groups.io <rfc@edk2.groups.io> On Behalf Of Artem Shchygel
Sent: Monday, August 3, 2020 1:24 PM
To: Laszlo Ersek <lersek@redhat.com>; rfc@edk2.groups.io
Subject: Re: [edk2-rfc] Git commit message RFC

On Sat, Aug 1, 2020 at 03:47 PM, Laszlo Ersek wrote:


On 08/01/20 00:19, Artem Shchygel wrote:
Hi, All

In this RFC we're proposing slight changes to Git commit message format to
make it more informative and more friendly for automation tools.

Here is the list of changes proposed

1. Start subject line with the tag that describes nature of the commit. List
of tags is as follows:
- **fix:** For commits that are fixing the bug
- **feat:** For commits that introduce a new feature
- **imp:** For commits that introduce improvement of the existing code
logic without changing functionality
- **doc:** For commits that don't change code logic but deal with
documentation (including adding comments, fix spelling, etc.)
- **style:** For commits that don't change code logic but deal with coding
style (indents, whitespaces, etc.)
- **chore:** For commits that don't affect output target (like changes to
CI scripts)
Having tagged commits will help closed source maintainers to decide which
commits to cherry-pick for projects that are in "maintenance mode" as opposite
to "active development mode"

Hints are very welcome, but not in subject lines. Please use

Name: value
Name: value # comment

style tags near the end of the commit messages instead.
Here are the reasons I believe subject line tags should be in subject line:
1. It is expected for those tags to be mandatory. This requirement will force committer to think one more time about what
commit changes are about. This may (hopefully) lead to more structured commits, where fixes, improvements, features and
style changes are not mixed together. But when you try to make something mandatory it's better to minimize the
friction/cognitive load for actually doing the thing. From this point of view short tag in subject line is easier to add
than separate line with "Name: value" in the commit body, which for trivial changes may even not be there.
2. I believe these tags, while certainly be machine-friendly still have some human-readable value. For example we can
easily say that commits marked as doc/style/chore have no regression impact even if the commit itself is pretty large. We
can also assume that "fix" commits will be smaller and easier to merge/backport than "feat" or "imp" commits.

Subject lines are primarily for human readers. And because edk2 has long
filenames, and we're supposed to include package names and component
names in subject lines, hardly any space is left for actual meaningful
subjects. CVE identifiers are the only exceptions (they *should* be
mentioned in subjects lines; and PatchCheck.py already permits more
characters in subjects that contain CVE identifiers).

2. Remove "Package/Module" reference from subject line. Since subject line
length is limited it's better to be allocated for commit description which is
more important

I'm opposed to this. "Package/Module" is the absolute key by which I
orient myself for determining impact / maintainership relevance /
regression risk. I regularly browse the git history for new commits, and
I entirely orient myself after the Package/Module prefixes.
As you've said above adding "Package/Module" description leave almost no space for meaningful subjects, practically
rendering them useless. Add to that the fact that this information is redundant in the sense, that it's already included
in changed files paths. Now I'm obviously don't know actual workflow maintainers use to perform their tasks, so it's hard
to come up with replacements. But filtering commits changing files in folders of the particular interest can be easily
done with git log command if you work with actual repo. If you look for particular folders in email inbox I think there is
the way to have email subject include "Package/Module" reference, while not having it in the commit subject line

3. Remove "CVE" number from subject line for the same reason. CVE number (if
present) should be placed on separate line after long description (see example
below)

I agree that mentioning CVE IDs in subject lines is an exceptions. I
could let go of that, if contributors were *very* disciplined about
stating CVE IDs properly in commit messages.
I think including CVE reference (which you probably can't easily type right off your head, but have to look it up) takes
the same amount of discipline regardless of whether it's included in the subject line or on the separate line in commit's
body
4. Add optional tag to the long description. List of tags is as follows:
- **[BREAKING CHANGE]:** For commits that break backward compatibility
- **[SECURITY FIX]:** For commits that fix known security vulnerability
Explicitly stating such traits is very welcome (even if not with this
specific format, perhaps) in commit message bodies.
The format of optional tag is flexible, any suggestions are welcome

5. Move bugzilla reference to separate line after long description and
(possible) CVE line (see example below). This move follows the logic of
evaluation of commit nature (read subject, if unclear read long description,
if still unclear see bugzilla)

Bugzilla numbers should already be stated as:

Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=2129

or

Link: https://bugzilla.tianocore.org/show_bug.cgi?id=2129

or

Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=2129

tags.
We're not proposing format changes for bugzilla references, just their placement in the commit body. Instead of being in
first lines we propose to move them to the last lines


An example would be:
```
imp: Short one line description of change

Several lines of
description for the
change.

REF:https://bugzilla.tianocore.org/show_bug.cgi?id=1000

Signed-off-by: Contributor Name <contributor@email.server>
Reviewed-by: Reviewer Name <reviewer@reviewer-email.server>
```

A CVE example would be:
```
fix: Short one line description of change

[SECURITY FIX]: Several lines of
description for the
change.

CVE-2018-12180

REF:https://bugzilla.tianocore.org/show_bug.cgi?id=2000

Signed-off-by: Contributor Name <contributor@email.server>
Reviewed-by: Reviewer Name <reviewer@reviewer-email.server>
```
I entirely support including as much information as possible in commit
messages; if we can do that in machine readable format, that's even
better. But subject lines are not the place; the commit message bodies are.

I recommend the "Name: value" and "Name: value # comment" formats, but
don't insist on those.

What really matters to me is that we don't litter subject lines with
machine-readable artifacts.

Obviously: this is just my personal opinion.

Thanks
Laszlo
Thank you for taking your time to review our proposal. Your comments allowed me to express our intentions more clearly

Artem

Join rfc@edk2.groups.io to automatically receive all group messages.