Git commit message RFC
Artem Shchygel
Hi, All In this RFC we're proposing slight changes to Git commit message format to make it more informative and more friendly for automation tools. Here is the list of changes proposed
An example would be:
A CVE example would be:
|
|
Laszlo Ersek
On 08/01/20 00:19, Artem Shchygel wrote:
Hi, AllHints are very welcome, but not in subject lines. Please use Name: value Name: value # comment style tags near the end of the commit messages instead. Subject lines are primarily for human readers. And because edk2 has long filenames, and we're supposed to include package names and component names in subject lines, hardly any space is left for actual meaningful subjects. CVE identifiers are the only exceptions (they *should* be mentioned in subjects lines; and PatchCheck.py already permits more characters in subjects that contain CVE identifiers). 2. Remove "Package/Module" reference from subject line. Since subject line length is limited it's better to be allocated for commit description which is more importantI'm opposed to this. "Package/Module" is the absolute key by which I orient myself for determining impact / maintainership relevance / regression risk. I regularly browse the git history for new commits, and I entirely orient myself after the Package/Module prefixes. 3. Remove "CVE" number from subject line for the same reason. CVE number (if present) should be placed on separate line after long description (see example below)I agree that mentioning CVE IDs in subject lines is an exceptions. I could let go of that, if contributors were *very* disciplined about stating CVE IDs properly in commit messages. 4. Add optional tag to the long description. List of tags is as follows:Explicitly stating such traits is very welcome (even if not with this specific format, perhaps) in commit message bodies. Bugzilla numbers should already be stated as: Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=2129 or Link: https://bugzilla.tianocore.org/show_bug.cgi?id=2129 or Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=2129 tags. I entirely support including as much information as possible in commit messages; if we can do that in machine readable format, that's even better. But subject lines are not the place; the commit message bodies are. I recommend the "Name: value" and "Name: value # comment" formats, but don't insist on those. What really matters to me is that we don't litter subject lines with machine-readable artifacts. Obviously: this is just my personal opinion. Thanks Laszlo |
|
Artem Shchygel
On Sat, Aug 1, 2020 at 03:47 PM, Laszlo Ersek wrote:
Here are the reasons I believe subject line tags should be in subject line: 1. It is expected for those tags to be mandatory. This requirement will force committer to think one more time about what commit changes are about. This may (hopefully) lead to more structured commits, where fixes, improvements, features and style changes are not mixed together. But when you try to make something mandatory it's better to minimize the friction/cognitive load for actually doing the thing. From this point of view short tag in subject line is easier to add than separate line with "Name: value" in the commit body, which for trivial changes may even not be there. 2. I believe these tags, while certainly be machine-friendly still have some human-readable value. For example we can easily say that commits marked as doc/style/chore have no regression impact even if the commit itself is pretty large. We can also assume that "fix" commits will be smaller and easier to merge/backport than "feat" or "imp" commits. > > Subject lines are primarily for human readers. And because edk2 has long > filenames, and we're supposed to include package names and component > names in subject lines, hardly any space is left for actual meaningful > subjects. CVE identifiers are the only exceptions (they should be > mentioned in subjects lines; and PatchCheck.py already permits more > characters in subjects that contain CVE identifiers). > > > 2. Remove "Package/Module" reference from subject line. Since subject line > length is limited it's better to be allocated for commit description which is > more important > > I'm opposed to this. "Package/Module" is the absolute key by which I > orient myself for determining impact / maintainership relevance / > regression risk. I regularly browse the git history for new commits, and > I entirely orient myself after the Package/Module prefixes. As you've said above adding "Package/Module" description leave almost no space for meaningful subjects, practically rendering them useless. Add to that the fact that this information is redundant in the sense, that it's already included in changed files paths. Now I'm obviously don't know actual workflow maintainers use to perform their tasks, so it's hard to come up with replacements. But filtering commits changing files in folders of the particular interest can be easily done with git log command if you work with actual repo. If you look for particular folders in email inbox I think there is the way to have email subject include "Package/Module" reference, while not having it in the commit subject line > > > 3. Remove "CVE" number from subject line for the same reason. CVE number (if > present) should be placed on separate line after long description (see example > below) > > I agree that mentioning CVE IDs in subject lines is an exceptions. I > could let go of that, if contributors were very disciplined about > stating CVE IDs properly in commit messages. > I think including CVE reference (which you probably can't easily type right off your head, but have to look it up) takes the same amount of discipline regardless of whether it's included in the subject line or on the separate line in commit's body > > 4. Add optional tag to the long description. List of tags is as follows: > > - [BREAKING CHANGE]: For commits that break backward compatibility > > - [SECURITY FIX]: For commits that fix known security vulnerability > > Explicitly stating such traits is very welcome (even if not with this > specific format, perhaps) in commit message bodies. The format of optional tag is flexible, any suggestions are welcome > > > > 5. Move bugzilla reference to separate line after long description and > (possible) CVE line (see example below). This move follows the logic of > evaluation of commit nature (read subject, if unclear read long description, > if still unclear see bugzilla) > > Bugzilla numbers should already be stated as: > > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=2129 > > or > > Link: https://bugzilla.tianocore.org/show_bug.cgi?id=2129 > > or > > Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=2129 > > tags. > We're not proposing format changes for bugzilla references, just their placement in the commit body. Instead of being in first lines we propose to move them to the last lines
>
> >
> > An example would be:
> > Thank you for taking your time to review our proposal. Your comments allowed me to express our intentions more clearly Artem > > |
|
Michael D Kinney
Hi Artem,
toggle quoted message
Show quoted text
We discussed this topic briefly in the EDK II Community meeting this morning With Felix. For the tags topic, I believe the feature being asked for here is a simple way to identify and search for commits that have one or more of the tag attributes. These tags do not have to be in the subject line to find them easily. Laszlo's suggestion to use a Name: Value pair in the commit body can support this type of search using standard git commands. For example, the git log command supports the --grep flag to search for contents of the commit messages that match a specific pattern. For example, if you want to find all the commits that resolve a CVE issue from 2019, I can use the following command: c:\work\GitHub\tianocore\edk2>git log --oneline --grep=CVE-2019 ffde22468e SecurityPkg/TcgPei: Use Migrated FV Info Hob for calculating hash (CVE-2019-11098) d7c9de51d2 UefiCpuPkg/CpuMpPei: Enable paging and set NP flag to avoid TOCTOU (CVE-2019-11098) 012809cdca SecurityPkg/Tcg2Pei: Use Migrated FV Info Hob for calculating hash (CVE-2019-11098) 4b68cef04c MdeModulePkg/Core: Create Migrated FV Info Hob for calculating hash (CVE-2019-11098) 479613bd06 UefiCpuPkg/SecMigrationPei: Add initial PEIM (CVE-2019-11098) 60b12e69fb UefiCpuPkg/CpuMpPei: Add GDT migration support (CVE-2019-11098) 9bedaec05b MdeModulePkg/PeiCore: Enable T-RAM evacuation in PeiCore (CVE-2019-11098) 1facb8fdef MdeModulePkg: Add new PCD to control the evacuate temporary memory feature (CVE-2019-11098) 1d3215fd24 NetworkPkg/ArpDxe: Recycle invalid ARP packets (CVE-2019-14559) c230c002ac SecurityPkg/DxeImageVerificationLib: change IsCertHashFoundInDatabase name (CVE-2019-14575) b1c1147059 SecurityPkg/DxeImageVerificationLib: Differentiate error/search result (2) (CVE-2019-14575) cb30c8f251 SecurityPkg/DxeImageVerificationLib: plug Data leak in IsForbiddenByDbx() (CVE-2019-14575) 5cd8be6079 SecurityPkg/DxeImageVerificationLib: tighten default result (CVE-2019-14575) a83dbf008c SecurityPkg/DxeImageVerificationLib: Differentiate error/search result (1) (CVE-2019-14575) adc6898366 SecurityPkg/DxeImageVerificationLib: refactor db/dbx fetching code (CVE-2019-14575) 929d1a24d1 SecurityPkg/DxeImageVerificationLib: avoid bypass in fetching dbx (CVE-2019-14575) 9e56970090 SecurityPkg/DxeImageVerificationLib: fix wrong fetch dbx in IsAllowedByDb (CVE-2019-14575) c13742b180 SecurityPkg/DxeImageVerificationLib: reject CertStack.CertNumber==0 per DBX (CVE-2019-14575) fbb9607223 SecurityPkg/DxeImageVerificationLib: Fix memory leaks (CVE-2019-14575) 578bcdc260 NetworkPkg/Ip4Dxe: Check the received package length (CVE-2019-14559). e36d5ac7d1 MdeModulePkg/SdMmcPciHcDxe: Fix double PciIo Unmap in TRB creation (CVE-2019-14587) f1d78c489a MdeModulePkg/DisplayEngine: Zero memory before free (CVE-2019-14558) 764e8ba138 MdeModulePkg/String.c: Zero memory before free (CVE-2019-14558) c32be82e99 MdeModulePkg/HiiDB: Remove configuration table when it's freed (CVE-2019-14586) 322ac05f8b MdeModulePkg/PiDxeS3BootScriptLib: Fix potential numeric truncation (CVE-2019-14563) e2fc508128 NetworkPkg/HttpDxe: Set the HostName for the verification (CVE-2019-14553) 703e7ab21f NetworkPkg/TlsDxe: Add the support of host validation to TlsDxe driver (CVE-2019-14553) 1e72b1fb2e CryptoPkg/TlsLib: TlsSetVerifyHost: parse IP address literals as such (CVE-2019-14553) 8d16ef8269 CryptoPkg/Crt: import "inet_pton.c" (CVE-2019-14553) 2ac41c12c0 CryptoPkg/Crt: satisfy "inet_pton.c" dependencies (CVE-2019-14553) eb520d94db CryptoPkg/Crt: turn strchr() into a function (CVE-2019-14553) 2ca74e1a17 CryptoPkg/TlsLib: Add the new API "TlsSetVerifyHost" (CVE-2019-14553) 31efec8279 MdePkg/Include/Protocol/Tls.h: Add the data type of EfiTlsVerifyHost (CVE-2019-14553) b26691c471 MdeModulePkg RegularExpressionDxe: Update Oniguruma from v6.9.0 to v6.9.3 Notice that the last one does not have CVE in the subject line. It is in the body and the git log command found that one too: c:\work\GitHub\tianocore\edk2>git show b26691c471 commit b26691c47188ce255b8a4d920bf07ddf1431e2cd Author: Liming Gao <liming.gao@...> Date: Thu Aug 8 19:53:03 2019 +0800 MdeModulePkg RegularExpressionDxe: Update Oniguruma from v6.9.0 to v6.9.3 BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2066 Update Oniguruma to the latest version v6.9.3. Oniguruma https://github.com/kkos/oniguruma This release is the security fix release. It includes the changes: Fixed CVE-2019-13224 Fixed CVE-2019-13225 Fixed many problems (found by libfuzzer programs) Verify VS2015, GCC5 build. Verify RegularExpressionProtocol GetInfo() and Match() function. Cc: Jian J Wang <jian.j.wang@...> Cc: Hao A Wu <hao.a.wu@...> Cc: Cinnamon Shia <cinnamon.shia@...> Signed-off-by: Liming Gao <liming.gao@...> Reviewed-by: Hao A Wu <hao.a.wu@...> The main requirement is to make sure the Name: Value tag line for the nature of the commit is unique enough so a grep regular expression will not falsely match a different commit. I like the **<type name>** syntax to help with uniqueness. We can even define a Name: Value tag syntax that supports a list of types if a single commit is doing more than one type of change TYPE: **fix** | **feat** | **imp** | **doc** | **style** | **chore** To search for all commits that begin the line with TYPE: and have **fix** in the same line, you can use the following command: git log --oneline --extended-regexp --grep=^TYPE:.*\*\*fix\*\* To search for all commits that begin the line with TYPE: and have either **fix** OR **feat** in the same line, you can use the following command: git log --oneline --extended-regexp --grep=^TYPE:.*\*\*fix\*\* --grep=^TYPE:.*\*\*feat\*\* To search for all commits that begin the line with TYPE: and have **fix** AND **feat** in the same line or across multiple TYPE: lines, you can use the following command: git log --oneline --extended-regexp --all-match --grep=^TYPE:.*\*\*fix\*\* --grep=^TYPE:.*\*\*feat\*\* As a double check to avoid false positives, I did a search of the entire git log history for all commits that contain **<any word>**. It hit on 2 commits that contained ***To*** and ***On***. c:\work\GitHub\tianocore\edk2>git log --extended-regexp --grep=\*\*[a-zA-Z]+\*\* commit 186bc15553cb1a2dcfcc2e60091b82d4870a657b Author: qwang12 <qwang12@6f19259b-4bc3-4df7-8a09-765794883524> Date: Wed Jan 21 09:48:54 2009 +0000 Rename module name from ***To*** to ***On***. AAAOnBBB means this module produce AAA Protocol/PPI based on BBB. This change improves the readability. git-svn-id: https://edk2.svn.sourceforge.net/svnroot/edk2/trunk/edk2@7329 6f19259b-4bc3-4df7-8a09-765794883524 commit 4c9c0f719df00f211790eadf04d21a93a0cdf76c Author: qwang12 <qwang12@6f19259b-4bc3-4df7-8a09-765794883524> Date: Wed Jan 21 09:47:43 2009 +0000 Rename module name from ***To*** to ***On***. AAAOnBBB means this module produce AAA Protocol/PPI based on BBB. This change improves the readability. git-svn-id: https://edk2.svn.sourceforge.net/svnroot/edk2/trunk/edk2@7328 6f19259b-4bc3-4df7-8a09-765794883524 Best regards, Mike -----Original Message----- |
|
Artem Shchygel
Hello Mike, Thanks for your feedback. There are several levels of intent behind our proposal. The one that you mentioned (search and identify commits by certain tag attribute) is important, but not the only one. Moreover one of your examples where you can have both "fix" and "feat" tag in the same commit is exactly the thing we're trying to avoid. And I also want to emphasize the notion that short tag in subject line must be mandatory. To me personally adding short tag in the subject line coupled with removing "Package/Module" and CVE references from it serves the following purposes: 1. Mandatory tag forces committer to explicitly state what commit is about. This should prevent mixed commits (like bug fix and improvement together) 2. Removing "Package/Module" reference allows commits across packages boundaries where it makes sense. It also allows reviewer to see changes in their logical context instead of package-boundary context. Now I'm not advocating for huge commits, just commit partitioning better be on logical level than package boundary level. 3. Removing CVE reference as well as "Package/Module" reference frees up the space in the subject line for more meaningful messages. Moreover your example shows why having CVE refs in subject line is bad - what if one commit fixes several of them. 4. Placing short tag in the subject line allows for commits without long description, that can be created from the command line without the need of firing up the editor, as opposite to adding "Package/Module" ref and named tag inside the commit body. For mandatory things ergonomic matters All in all we understand that following our proposal to the letter will create disruptions for maintainers/reviewers workflow while simplify things for committers/users. But we still believe this is change for the better. Regards, Artem |
|
Laszlo Ersek
On 08/06/20 20:43, Kinney, Michael D wrote:
The main requirement is to make sure the Name: Value tag line for the nature of theI'd like to suggest "Type: " rather than "TYPE: ". Furthermore, assuming a consistent use of the "Type: " name (at the beginning of the line), do we really need the '**' value prefix/suffix? On one hand, I find '**value**' a bit too "loud"; on the other, the asterisk is a regexp special character, so it requires escaping with backslash in git-log's "--grep" option argument, which makes it uncomfortable. I think the lines: Type: fix Type: feature Type: refactoring Type: optimization Type: documentation Type: style Type: chore would read very nice. (I used the "refactoring" and "optimization" values as two (more specific) replacements for the "**imp**" value -- "improvement of the existing code logic without changing functionality". Another one I can think of is "cleanup", but I think it's included under "refactoring".) At this time, $ git log --oneline --grep='^Type: ' master returns zero hits, so I think the '^Type: ' prefix makes these tags unique, without the "**" value prefix/suffix. Also, I think we should not combine multiple values on a single "Type: " line. In practice, we've applied this "single-value" rule to all other "Name: "s. (You have showed that "--all-match" matches multiple lines in a commit message.) Finally, I'd like to mention that "Type: fix" and "Type: feature" are not always mutually exclusive. Sometimes, there are issues that cannot be surgically / incrementally fixed and that call for a whole-sale replacement (rewrite) of a function. I wouldn't go as far as recommending "Type: rewrite" for this, I'm just saying that in some (rare!) cases, "Type: fix" and "Type: feature" may both show up in the same commit message. Like, we have a long-standing design issue that emerges over time, and it gets replaced with new (better designed) functionality. That's both a fix and a feature. Thanks! Laszlo |
|
Laszlo Ersek
On 08/06/20 22:05, Artem Shchygel wrote:
Hello Mike,I find that regrettable, because in some (rare!) cases, a commit would really deserve both tags. And I also want to emphasize the notion that short tag in subject line must be mandatory.I agree this is a good general principle; it's always good to make a patch author think explicitly about the nature of their patch, and to make them capture the outcome of that thought process. But this does not need to be encoded in the subject line, plus see my point above about the (infrequent) case when a patch is both a fix and a feature. 2. Removing "Package/Module" reference allows commits across packages boundaries where it makes sense. It also allows reviewer to see changes in their logical context instead of package-boundary context. Now I'm not advocating for huge commits, just commit partitioning better be on logical level than package boundary level.We've managed very well to split up patches *both* at logical boundaries *and* at package boundaries, at the same time. What you advocate for in this point is 100% correct for a self-contained open source project, one that explicitly disregards forks (such as the Linux kernel); I don't think it's a good fit for edk2 though. 3. Removing CVE reference as well as "Package/Module" reference frees up the space in the subject line for more meaningful messages. Moreover your example shows why having CVE refs in subject line is bad - what if one commit fixes several of them.This is a good point. I'm in favor of including CVEs in subject lines (they are extremely important, and also exceptional). But including multiple CVE numbers in the subject is indeed a tough nut. 4. Placing short tag in the subject line allows for commits without long description, that can be created from the command line without the need of firing up the editor, as opposite to adding "Package/Module" ref and named tag inside the commit body. For mandatory things ergonomic mattersAs a reviewer, I reject patches with empty or meaningless message bodies, as a norm (for the packages that I co-maintain / co-review), and as a community member, I protest against such patches for other packages too, if I notice them on the list. A commit message that consists of a subject line and a Signed-off-by, plus (perhaps) some mechanic boilerplate included from a template, is entirely useless. The message on a commit is nearly as important as the code in the commit; sometimes *much more* important. It's entirely normal to write a full page of rationale for a one-liner code change. I've been on a crusade for years to train community members to write more expressive commit messages. Your point#4 is a huge red flag to me, personally. Unfortunately, in any open source project that professes "no patch merged without review", the bottleneck is *always* on the reviewer/maintainer side, and not on the contributor / consumer side. Any change that makes things harder for reviewers / maintainers will decrease code quality and long-term coherence (regressions!), and/or worsen review feedback (such as response time), and/or erode commitment to the above principle ("no patch merged without review"). Thanks Laszlo |
|