FLOP:FAEPyPI

Created on
2019/12/10
Original Author(s)
digifuzzy
Current Maintainer(s)
digifuzzy
Status

Funtoo Linux Optimization Proposal: FAEPyPI

Funtoo Automated Ebuilds - Python Packaging Index - a Python 3 based application that will periodically query PyPI (Python Package Index) servers for metadata of identified packages. If an update is necessary, the application will further parse the metadata and transform the information to produce a new ebuild file.

FAEPyPI boils down to the idea...

PyPI metadata spec --> each package metadata --> JSON import --> Funtoo ebuild.

Last Update 2020/01/02

Motivation

During the transition from the Funtoo 1.3 to 1.4 release, numerous instances were encountered where the currently available Python version was not set in an ebuild file's PYTHON_COMPAT string. This required a maintainer to manually add the new Python version, test, and upload a new ebuild file with either a -r revision or, more often, a new version, as upstream had released newer software.

Ebuilds for Python Package Index (PyPI) packages require maintenance that can become quite tedious and burdensome. For each package, a maintainer must determine the latest version, review the ebuild file for needed changes, and update the ebuild for the new version. Given the diversity of PyPI packages and the range of different user needs, ebuild file maintenance becomes a whack-a-mole affair. Updates are handled on a case-by-case basis as time is available. Complicating matters is maintaining the quality of ebuilds, given different sources and developer preferences over time. Some ebuilds require correction due to eclass usage or poor use of Portage features. Maintaining all the ebuilds for a variety of user needs is a huge task. Overall, these dev-python ebuilds lack a sense of "best practices". Maintaining them becomes a chore that consumes maintainer time and other resources that could be spent elsewhere.

PyPI expects software to be installed via pip. Usage of this install method is antithetical to maintaining a clean Funtoo system. Clashes between pip-installed and ebuild-installed packages are not uncommon. Choosing a single install method, as suggested on numerous Python support pages, and doing so via ebuilds further burdens maintainers, who must keep the ebuilds up-to-date.

Background

During the release transition, it was discovered that PyPI publishes quite a bit of metadata for each package that is uploaded to their servers. This metadata is intended to be used by pip when installing a package. Further investigation found that metadata content is governed by PyPI's Core-Metadata specification. It is the processed Core-Metadata that users see when accessing a package's PyPI landing page (i.e. https://pypi.org/project/[package name]/). Reviewing supporting documentation and web searches led to the realization that package metadata can be accessed through means other than pip and that the metadata is available to public users.

PyPI's Warehouse Codebase describes the server architecture and software employed to automate the process users follow to upload their packages, process metadata, and update the public-facing information about that package on PyPI's servers. Package upload and information updating are done with only minimal input or intervention by PyPI maintainers. The Warehouse supporting documentation also revealed that an API exists to access package metadata in JSON format.

With experimentation, a proof-of-concept was developed to confirm that a process could be employed to automate metadata retrieval, parse the obtained JSON-formatted metadata, and produce something close to an operational ebuild file. The current proof-of-concept code, building toward a FAEPyPI release, is maintained here. (Update: Development now at Bitbucket - FAEPyPI)
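The retrieval step can be illustrated with a minimal sketch. This is not the proof-of-concept code itself. It assumes the public JSON endpoint https://pypi.org/pypi/[package name]/json, and the names metadata_url and fetch_metadata (and the injectable opener parameter) are hypothetical, chosen only for this example.

```python
import json
import urllib.request

# Public JSON endpoint served by PyPI's Warehouse for a single project.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def metadata_url(name):
    """Return the JSON API URL for a PyPI project name."""
    return PYPI_JSON_URL.format(name=name)


def fetch_metadata(name, opener=urllib.request.urlopen):
    """Download and decode the metadata document for one package.

    `opener` is a hypothetical hook: any callable that takes a URL and
    returns a context manager with a read() method. It lets the function
    be exercised without network access.
    """
    with opener(metadata_url(name)) as response:
        return json.loads(response.read().decode("utf-8"))
```

On a networked machine, fetch_metadata("requests") would return a dictionary whose top-level keys include 'info', 'last_serial', 'releases', and 'urls'. Only the standard library is used, in keeping with a bare-minimum install.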

Proposal

Proposed is FAEPyPI, or Funtoo Automated Ebuilds - Python Packaging Index. This Python 3 based application will periodically query PyPI servers for metadata of identified packages. If an update is necessary, the application will further parse the metadata and transform the information to produce a new ebuild file.

The process of ebuild production will be guided by curated global and per-package data. This data, such as slotting, mirror URLs, or other factors specific to installation or implementation on a Funtoo Linux system, would be maintained under git revision control.
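One possible shape for this data is sketched below, assuming a JSON file that maps package names to overrides. The file format, the GLOBAL_DEFAULTS values, and the function names are all hypothetical and are not taken from the FAEPyPI repository.

```python
import json

# Hypothetical global defaults applied to every generated ebuild.
GLOBAL_DEFAULTS = {
    "slot": "0",
    "keywords": "*",
    "python_compat": ["python3_6", "python3_7"],
}


def load_overrides(text):
    """Parse a JSON document mapping package name -> dict of overrides."""
    return json.loads(text)


def settings_for(package, overrides, defaults=GLOBAL_DEFAULTS):
    """Merge the global defaults with one package's overrides.

    Values from the per-package entry win over the defaults; packages
    without an entry simply receive a copy of the defaults.
    """
    merged = dict(defaults)
    merged.update(overrides.get(package, {}))
    return merged
```

Keeping the overrides in a plain text format such as JSON means changes can be reviewed as ordinary git diffs.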

Again, FAEPyPI boils down to the idea... PyPI metadata spec --> each package metadata --> JSON import --> Funtoo ebuild
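The last step of that chain, JSON to ebuild, might look roughly like the sketch below. It uses string.Template from the standard library purely so the example has no third-party dependencies. The template text, the function names, and the default PYTHON_COMPAT value are assumptions made for illustration. Note also that the license string published on PyPI does not necessarily match a Portage license name, so a real tool would need a mapping step.

```python
from string import Template

# Hypothetical, heavily simplified ebuild template.
EBUILD_TEMPLATE = Template('''\
# Distributed under the terms of the GNU General Public License v2

EAPI=7

PYTHON_COMPAT=( ${python_compat} )

inherit distutils-r1

DESCRIPTION="${description}"
HOMEPAGE="${homepage}"
SRC_URI="${src_uri}"

LICENSE="${license}"
SLOT="0"
KEYWORDS="*"
''')


def sdist_url(metadata):
    """Return the source distribution URL of the current release, or ""."""
    for entry in metadata.get("urls", []):
        if entry.get("packagetype") == "sdist":
            return entry["url"]
    return ""


def ebuild_filename(metadata):
    """Build the ebuild file name from the package name and version."""
    info = metadata["info"]
    return "{}-{}.ebuild".format(info["name"].lower(), info["version"])


def render_ebuild(metadata, python_compat=("python3_7",)):
    """Fill the template from the 'info' and 'urls' keys of the metadata."""
    info = metadata["info"]
    return EBUILD_TEMPLATE.substitute(
        python_compat=" ".join(python_compat),
        # Double quotes would break the quoted shell assignment.
        description=(info.get("summary") or "").replace('"', "'"),
        homepage=info.get("home_page") or "",
        src_uri=sdist_url(metadata),
        license=info.get("license") or "",
    )
```

Missing or null fields fall back to empty strings here; a production tool would more likely flag such packages for manual review.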

Design Considerations

  • Minimal Python packaging dependency - the point of the exercise is to produce an ebuild from Python package data. A maintainer with a bare-minimum core install should be able to perform updating actions.
  • Caching - see this PyPI Warehouse doc. PyPI does not rate limit pulls from their Warehouse. However, the linked reference does highlight PyPI maintainer concerns about pulling large data sets. As a "consumer" of data, Funtoo should honour their requests and coordinate access if required.
  • Updating - A package's PyPI JSON metadata contains the key ['last_serial']. No specific description of its purpose could be located in the API documentation. From example and observation, this key appears to hold a value indicating when the package metadata was last updated (similar in idea to the serial value of a DNS zone file). If true, this value can be accessed directly, with no other JSON parsing needed, and compared to a stored, distributed value recorded when the last automated update of the package in Funtoo was performed. This cache of last_serial values needs to be managed and distributed to minimize work overlap between maintainers. Note: Core Metadata is a subset of Warehouse data. The JSON keys immediately below the root, ['last_serial'], ['releases'], and ['urls'], are a part of the Warehouse specification supporting a package. Package metadata (as defined by PEP 566) is stored under the key ['info'].
  • Local Preferences - not all maintainers keep git checkouts in the same location on their machines. Having local settings makes FAEPyPI more adaptable to different workflows.
  • Per-package Data - not every PyPI-based package follows the same methodology or design. There will be deviations between some packages. To allow for this, git-controlled information would be stored that a FAEPyPI install on a maintainer's machine can consult to determine if and how a package deviates from the expected install.
  • Uploads/Merges - De-centralizing the updating process and not requiring a single user to do all the work helps cut down workload of our BDFL. And if he suddenly wins the lottery and retires to a Caribbean Island, Funtoo would still have some means to continue.
  • Specification Adherence and Maintenance - FAEPyPI will be dependent on the published Core-Metadata specification from PyPI (see link to PEP 566 below). Any code should be annotated and commented so that multiple developers can contribute to its maintenance without the excessive burden of a steep learning curve just to understand how the code works. The goal is clear, concise, simple code that follows PEP8 conventions, with documented and agreed-upon deviations. Pylint/Flake8 with clean runs will be encouraged.
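The update check described under "Updating" can be sketched as follows. It rests on the assumption stated there, that ['last_serial'] is an integer that only increases when a package's data changes. The serial_cache dictionary stands in for the shared, distributed cache, whose real storage format is not defined here, and both function names are hypothetical.

```python
def needs_update(metadata, serial_cache):
    """Return True if PyPI reports a newer serial than the cached one.

    A package with no cache entry is always treated as needing an update.
    """
    name = metadata["info"]["name"]
    return metadata["last_serial"] > serial_cache.get(name, -1)


def record_update(metadata, serial_cache):
    """Store the serial of the metadata that was just processed."""
    serial_cache[metadata["info"]["name"]] = metadata["last_serial"]
```

A maintainer's run would call needs_update first, regenerate the ebuild only when it returns True, and then call record_update so other maintainers sharing the cache do not repeat the work.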

Update (2019 Dec 28)

Changes:

  • Changed all references of PyPi to the correct identifier PyPI
  • PoC code modified and cleaned up to better conform to Python packaging practices. All development moved to a new public facing repository at Bitbucket - FAEPyPI.
  • Discovered that aspects of Core Metadata are covered by Python Enhancement Proposals (PEPs). For example, Core Metadata is covered by PEP 566 -- Metadata for Python Software Packages 2.1. Other aspects of version usage and requirements are also covered by PEPs released by PyPA (Python Packaging Authority). Where these PEPs are found relevant, they should be annotated in code commentary/docstrings.
  • Initial PoC code had a hand-written package to handle version information. This was found to be not ideal. The existing Python package named packaging, maintained by PyPA, will be used instead. Note that packaging is not part of the Python standard library and must be available as a separate package.
  • Ebuild files will be generated using Jinja2 Templates. This makes code easier to manage going forward but creates a dependency on an external package (including its dependencies). This is an acceptable trade-off.
  • See README.txt in repository root for more information on usage.

Update (2020 Jan 01)

Changes:

  • Corrected JSON key error in 'Design Considerations'; 'Updating'. Correct key now shown.
  • Added clearer information on how Warehouse and Core Metadata relate to each other.
  • Added pylint/flake8 usage - encouraged for now - may become mandatory.
  • Typo and error corrections.

Update (2020 Jan 02)

Changes (code):

  • Correct error in retrieve - wrong output folder name being used. Method created to consistently retrieve correct path to warehouse data store.
  • Update all headers to current year.