[RFC] BaseTools/Source/Python as a standalone python package in independent repo


Matthew Carlson
 

Hello Tianocore Community,

I'm submitting an RFC proposing that the BaseTools folder in edk2 be moved to a separate repo and treated as a standalone Python project.

We talked about this during the April 16th design meeting and on devel (https://edk2.groups.io/g/devel/topic/73069134#58048); feel free to look over that discussion.

Here’s a basic overview of the what and the why behind this proposal:

Why a separate repo?
The recent efforts to expand the role of CI in the platform and core code of edk2 will pay big dividends in the future, leading to higher quality code and easier integrations for everyone. Having basetools as its own repo would simplify adding a similar CI/linting process and unit tests to the basetools Python code, again leading to higher quality code.

A second major benefit is that it would allow others who write tools for UEFI and edk2 to leverage this vast resource of Python code through standard Python package inclusion. It would let those tools be decoupled from edk2 source and provide a consistent, managed user experience. The Python project would be published as a pip module for those who want to use the basetools modules the same way they use the rest of the existing Python ecosystem. Packaging basetools as a pip module would reach the most developers and provide the most flexibility and versatility. There are numerous ways this could be used; pip is just one method suggested here. Other ways to leverage it are described below.

Why a pip module?
The investment in basetools is sizable, and it has some amazing functionality that is difficult to reproduce. For example, the DSC, FDF, INF, and DEC parsers handle an incredible number of edge cases. If I wanted to write a tool that does a CI validation check or builds a UEFI capsule, today I would need to clone all of edk2 just to get basetools. If it were in a separate repo and available as a wheel, I could, as a developer, include it via pip and have that dependency managed in a virtual environment or in the global cache. In addition, other tools that are currently difficult to build would become possible with access to the basetools functionality.

However, some concerns have been expressed about having a global basetools and the impact this has on developers with multiple workspaces (potentially at different versions). There are several tools and strategies to consider for managing this dependency; some are outlined below.

How will this change your workflow?
If this moves, there will have to be a change for all platforms using edk2, and we have been evaluating options. These could become steps a developer must take before building edk2, or, with minimal effort, they could be added to the edksetup or build scripts. They can also be more easily isolated if Python virtual environments are used.

For those just consuming released versions of basetools python code:
Option A: leverage Python package management and install a released version from PyPI into a per-project virtual environment.
Option B: leverage pip to install from the git repo at a known tag/release (via pip-requirements).

For those wanting to do active development of basetools within an edk2 project:
Option C: clone the Python package source and install the package locally (pip install -e ./). All changes made in the local package are reflected in your Python site-packages.

Option D: an offline server that builds edk2. You would need to download the wheel, tarball, or git repo beforehand. Pip supports installing from a variety of sources (https://packaging.python.org/tutorials/installing-packages/), such as a tar.gz, a local folder, or a wheel. Example commands for each option are sketched below.
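
To make these concrete, here are example commands for each option. The package name (edk2-basetools), repository URL, and version numbers below are placeholders for illustration, not final decisions:

Option A (released version from PyPI into a per-project virtual environment):
    python -m venv .venv
    source .venv/bin/activate    # on Windows: .venv\Scripts\activate
    pip install edk2-basetools==1.0.0

Option B (install from the git repo at a known tag):
    pip install git+https://github.com/tianocore/edk2-basetools.git@v1.0.0

Option C (editable install for active development; local edits take effect immediately):
    git clone https://github.com/tianocore/edk2-basetools.git
    pip install -e ./edk2-basetools

Option D (offline install from a pre-downloaded wheel, tarball, or clone):
    pip install --no-index --find-links ./offline-packages edk2-basetools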

PIP vs Submodules:
The issue with a submodule, and with not using Python packages in general (pip is really just a very convenient helper that installs them consistently), is that it requires the paradigm of file-system Python path management. Think of this as edk1 vs edk2 public includes (if you remember back that far): you basically figure out the path from your file to the file you want and add it. It gives you no consistent way to access modules, and in the end it means you just have a folder with Python scripts/modules in it. It causes major pain for downstream consumers when/if there is ever refactoring (and we know there needs to be refactoring), and it leaves others unable to extend or leverage these modules without introducing significant fragility.

What pip does is give you a consistent way to install and access the Python modules. You can pip install from a local copy of the git repo, or you can pip install a "release" from somewhere like PyPI. Either way, it doesn't change how consuming code accesses the Python modules. That is the value of pip in this case.

If there is a strong desire from the developer community for a workflow that avoids pip, I believe a little documentation could handle that. Pip is just a helper; Python packages can be built from source and installed without it. I see no reason to recommend this, as it requires several more commands, but if avoiding pip is your goal, it is possible.
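
For completeness, a rough sketch of that pip-free path, assuming the package ships a standard setuptools setup.py (repo URL again a placeholder):

    git clone https://github.com/tianocore/edk2-basetools.git
    cd edk2-basetools
    python setup.py sdist      # build a source distribution without pip
    python setup.py install    # install into the current Python environment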

More value for python package management (PIP):
As we start looking at refactoring, and potentially moving tools from C to Python, we can take advantage of package management to lower our maintenance burden. For example, Brotli has a PyPI release, so we wouldn't need to carry that code ourselves: https://pypi.org/project/Brotli/

Versioning and Dependencies:
To minimize dependency challenges and preserve "bisectability", I would suggest we leverage the versioning capabilities within pip together with repo tagging. With versioning you have lots of options: you can lock to a specific version, which requires an explicit update each time, or you can use a floating constraint within part of the version tuple (xx.yy.zz). These two tools make this quite flexible.
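
As a sketch (package name again hypothetical), the spectrum of pinning strategies in a pip-requirements.txt would look something like:

    edk2-basetools==1.2.3    # hard lock: fully reproducible, needs an explicit bump for every release
    edk2-basetools~=1.2.3    # compatible release: accepts any 1.2.x patch release, but not 1.3
    edk2-basetools~=1.2      # floats within the major version: accepts 1.2, 1.3, ..., but not 2.0

edk2 itself would likely sit toward the stricter end of that spectrum; the workflow below shows how a version bump would flow through.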

In a scenario of DEC or INF syntax change, the suggested workflow would be:
1. Create the issue for basetools
2. Update basetools python
3. Write the unit test that shows it works as expected
4. Check in and make a release
5. Update edk2's pip-requirements.txt to require at least this new version (see the example after this list). This gives you the tracking necessary to stay aligned with the tools.
6. Use the new feature in the edk2 firmware code.
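
Concretely, step 5 might be a one-line change in edk2's pip-requirements.txt (name and numbers hypothetical):

    edk2-basetools>=1.2.0    # was >=1.1.0; 1.2.0 is the release that understands the new syntax

Because that line is committed alongside the edk2 change that needs it, checking out any edk2 commit also tells you which basetools release it expects.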

In the scenario of a change in BaseTools that causes a break for a downstream project (like edk2 or a closed-source platform), it is easy to check out the previous pip-requirements and pip install the previous release, or to install locally (using Option C), checking out the suspect commit in basetools and inspecting the git history to debug the issue. It is also easy for a developer to modify the source locally and test against that local code; this, too, is covered by workflow Option C. It is very easy to manage and is one of the reasons we are proposing this change.
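
As a sketch of that debugging flow (directory layout and names hypothetical):

    # roll edk2 back to the requirements it had at a known-good commit
    git -C edk2 checkout <known-good-commit> -- pip-requirements.txt
    pip install -r edk2/pip-requirements.txt

    # or debug against local basetools source, per Option C
    git -C edk2-basetools checkout <suspect-commit>
    pip install -e ./edk2-basetools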

Demo:
We have a demo of what this would look like: https://github.com/matthewfcarlson/edk2-pytool-base/
And an edk2 branch that leverages it: https://github.com/matthewfcarlson/edk2/tree/feature/pip-basetools

Contribution/Dev Process:
Since this is a separate repo, it will follow a slightly different contribution and code review process.
1. Github PR process will be used for contributions and code review feedback
a. The yet-to-be-released "Tianocore PR archiver" will be used to send PRs to a dedicated list for a basetools patch review archive
2. PRs will only be committed if they keep linear history (no merge commits)
3. The PR review must be approved by at least 2 members of the basetools team (not including the author)
4. The PR must pass all automated checks
a. Formatting/style
b. Unit tests
c. Code coverage (a change that would decrease the overall % cannot be committed)
d. DCO enforcement - https://probot.github.io/apps/dco/
e. See other python requirements from the Python coding standard
5. Github Issues will be used for non-security sensitive bugs/issues/feature requests

Releases:
1. Versions will follow Semantic Versioning (https://semver.org/): xx.yy.zz
a. X is the major version; it will be updated on incompatible changes to the core basetools API
b. Y is the minor version; it will be updated when new functionality is added in a compatible manner
c. Z is the patch version; it will be updated with each change
d. Target 1.0.0 for the 2020 Q3 stable tag timeframe
2. Git tags will be created for each version
3. Github “release” will be created for each version
4. Git branches will not be created proactively. If servicing is determined to be needed then a branch can be created from the tag.
5. Pypi release will be made for each version
6. Github milestones will be used for tracking content in every minor version

What happens next?
Right now, we're gathering feedback and seeing if anyone has any concerns or platforms that this would not work for. We'd love to hear what you have to say. Barring any serious concerns, we'd move forward with:
1. Create new GitHub repo on tianocore for the basetools project
2. Develop the testing, PR, and release process
3. Release the initial version to pypi
4. Delete the source folder in the edk2 repo and replace it with a readme and a method to get the pip version installed (this patch will land after the Q2 stable tag to give developers time to adjust).
5. Continually improve basetools and add more testing

This RFC will close May 11th-12th; please respond with comments and questions before that date.

What’s the long-term plan?
The current tentative long-term plan is to merge some or all of basetools into the existing edk2-pytool-library repo. This would likely involve converting the C-based basetools to Python-based ones. This is still an active conversation, and we'd like to hear your thoughts.


Matthew Carlson
Core UEFI
Microsoft


Matthew Carlson
 

Since we haven't had any feedback and the deadline is quickly approaching, we are going to move ahead by creating a new repo inside of TianoCore, creating patches post-stable-tag, and submitting them to the mailing list as soon as the stable tag is made.

If there are any other comments or feedback, feel free to chime in. If anyone has any basetools Python code changes in flight, please coordinate with us, as the patch will remove BaseTools from inside the edk2 repo. We want to make sure no changes are lost or misplaced.


Laszlo Ersek
 

Hi Matthew,

On 05/13/20 00:40, Matthew Carlson via groups.io wrote:
Since we haven't had any feedback and the deadline is quickly approaching, we are going to move ahead by creating a new repo inside of TianoCore, creating patches post-stable-tag, and submitting them to the mailing list as soon as the stable tag is made.

If there are any other comments or feedback, feel free to chime in. If anyone has any basetools Python code changes in flight, please coordinate with us, as the patch will remove BaseTools from inside the edk2 repo. We want to make sure no changes are lost or misplaced.
I didn't provide any feedback in this specific thread because I thought
our discussion earlier was sufficient feedback from me.

(just commenting on the particular "we haven't had any feedback" bit)

Thanks,
Laszlo


Laszlo Ersek
 

On 04/29/20 02:33, Matthew Carlson via groups.io wrote:

Versioning and Dependencies:
To minimize dependency challenges and preserve "bisectability", I would suggest we leverage the versioning capabilities within pip together with repo tagging. With versioning you have lots of options: you can lock to a specific version, which requires an explicit update each time, or you can use a floating constraint within part of the version tuple (xx.yy.zz). These two tools make this quite flexible.

In a scenario of DEC or INF syntax change, the suggested workflow would be:
1. Create the issue for basetools
2. Update basetools python
3. Write the unit test that shows it works as expected
4. Check in and make a release
5. Update edk2's pip-requirements.txt to require at least this new version. This gives you the tracking necessary to stay aligned with the tools.
6. Use the new feature in the edk2 firmware code.
Here's an example of why the above procedure (i.e., strict, lock-step versioning) is important:

https://bugzilla.tianocore.org/show_bug.cgi?id=2719

When looking at a particular commit in edk2, for example for backporting
purposes, or else when looking at the whole edk2 tree *at* a particular
commit, it must be clear to the reader what basetools *state* was able
to build that commit / that tree, at the time. Because then the reader
will also be able to either backport the necessary basetools patches
too, or else (perhaps more simply) upgrade their separate basetools
component to that particular version.
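
(To make that concrete: if edk2 records the requirement in a pip-requirements.txt file as proposed, then recovering that state for any commit could be as simple as

    git show <commit>:pip-requirements.txt

and installing the basetools version listed there, using the hypothetical package name from the examples earlier in the thread.)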

So, I agree with the suggested workflow; I just wanted to emphasize how important it is.

Thanks
Laszlo


Matthew Carlson
 

Sorry, I wasn't clearer in my initial message; we meant feedback in this thread specifically. The discussion in the earlier forum brought up several great points and outlined a pretty solid list of must-haves in terms of workflow and process. I've been brainstorming ways to get the lock-step versioning you're describing, and I agree that it's quite important.