FLOP:FAEPyPI

From Funtoo
Revision as of 07:03, December 28, 2019 by Digifuzzy (talk | contribs) (Digifuzzy moved page FLOP:FAEPyPi to FLOP:FAEPyPI: Reflect proper identity of Python Packaging Index)
Created on
2019/12/10
Original Author(s)
digifuzzy
Current Maintainer(s)
digifuzzy
Status

Funtoo Linux Optimization Proposal: FAEPyPI

Funtoo Automated Ebuilds - PyPI (FAEPyPI): a Python 3 application that periodically queries PyPI servers for the metadata of identified packages. If an update is necessary, the application parses the metadata further and transforms it into a new ebuild file.

FAEPyPI boils down to the idea...

PyPI metadata spec --> each package metadata --> JSON import --> Funtoo ebuild.

Motivation

During the transition from the Funtoo 1.3 release to 1.4, there were numerous cases where the currently available Python version was missing from an ebuild file's PYTHON_COMPAT string. A maintainer then had to add the new Python version manually, test, and upload a new ebuild file, either as a -r revision or, more often, as a new version because upstream had released newer software in the meantime.

Ebuilds for Python Package Index (PyPI) packages require maintenance that can become quite tedious and burdensome: determine the latest version, review the ebuild file for changes, and update the ebuild for the new version. Given the diversity of PyPI packages and the range of user needs, ebuild maintenance becomes a whack-a-mole affair, with updates handled case by case as time allows. Complicating matters, ebuild quality is hard to maintain because the ebuilds come from different sources and reflect developer preferences that have shifted over time. Some ebuilds need correction because of eclass usage or poor use of Portage. Maintaining all the ebuilds for such a variety of user needs is a huge task. Overall, these dev-python ebuilds lack a sense of "best practices", and keeping them current becomes a chore that consumes maintainer time and other resources that could be spent elsewhere.

PyPI expects software to be installed via pip. This install method is antithetical to maintaining a clean Funtoo system, and clashes between pip-installed and ebuild-installed packages are not uncommon. Numerous Python support pages suggest choosing a single install method; choosing ebuilds further burdens maintainers, who must then keep those ebuilds up to date.

Background

During the release transition, it was discovered that PyPI publishes quite a bit of metadata for each package uploaded to its servers. This metadata is intended for use by pip when it installs a package. Further investigation found that the metadata content is governed by PyPI's Core-Metadata specification. It is the processed Core-Metadata that users see when they visit a package's PyPI landing page (i.e. https://pypi.org/project/[package name]/). Reviewing the supporting documentation and searching the web showed that package metadata can be accessed through means other than pip and that it is available to public users.

PyPI's Warehouse codebase documentation describes the server architecture and software that automate the process of uploading packages, processing their metadata, and updating the public-facing information about each package on PyPI's servers. Package uploads and information updates happen with only minimal input or intervention from PyPI maintainers. The Warehouse supporting documentation also revealed that an API exists to access package metadata in JSON format.
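As a rough illustration of that JSON access, the sketch below retrieves one package's metadata using only the Python standard library. The URL pattern https://pypi.org/pypi/[package name]/json is PyPI's public JSON endpoint; the function names and the `opener` parameter are hypothetical and are not taken from the proof-of-concept code.

```python
import json
from urllib.request import urlopen

# Public JSON endpoint served by PyPI for a single project.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def metadata_url(name):
    """Build the JSON metadata URL for a package name."""
    return PYPI_JSON_URL.format(name=name)


def fetch_metadata(name, opener=urlopen, timeout=30):
    """Download and decode the JSON metadata for one package.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; it defaults to urllib's urlopen.
    """
    with opener(metadata_url(name), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_metadata("requests")["info"]["version"]` would return the latest published version string for that project. Sticking to `urllib` keeps the tool usable on a bare core install.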

Through experimentation, a proof-of-concept was developed to confirm that a process could automate metadata retrieval, parse the JSON-formatted metadata, and produce something close to an operational ebuild file. The current proof-of-concept code, which is building toward a FAEPyPI release, is maintained here
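The parse-and-produce step might look something like the following minimal sketch. It is not the proof-of-concept code: the ebuild template, the default PYTHON_COMPAT value, and the function names are all assumptions made for illustration. The keys read from `info` (name, version, summary, home_page, license) are fields PyPI publishes in its JSON metadata.

```python
# Hypothetical ebuild skeleton; a real template would be agreed on by maintainers.
EBUILD_TEMPLATE = '''# Distributed under the terms of the GNU General Public License v2

EAPI=7
PYTHON_COMPAT=( {python_compat} )

inherit distutils-r1

DESCRIPTION="{description}"
HOMEPAGE="{homepage}"
SRC_URI="mirror://pypi/${{PN:0:1}}/${{PN}}/${{P}}.tar.gz"

LICENSE="{license}"
SLOT="0"
KEYWORDS="*"
'''


def escape_for_double_quotes(text):
    """Escape characters that are special inside a double-quoted shell string."""
    for char in ("\\", '"', "$", "`"):
        text = text.replace(char, "\\" + char)
    return text


def render_ebuild(metadata, python_compat="python3_7"):
    """Return (filename, ebuild text) for one package's JSON metadata."""
    info = metadata["info"]
    text = EBUILD_TEMPLATE.format(
        python_compat=python_compat,
        description=escape_for_double_quotes(info.get("summary") or ""),
        homepage=info.get("home_page") or "",
        # PyPI license strings are free-form; mapping them to Portage
        # license names is left out of this sketch.
        license=info.get("license") or "",
    )
    return "{}-{}.ebuild".format(info["name"], info["version"]), text
```

The output is only "something close to" an ebuild: dependencies, license-name mapping, and slotting would still need the regulated data described in the proposal.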

Proposal

Proposed is FAEPyPI, or Funtoo Automated Ebuilds - PyPI. This Python 3 application will periodically query PyPI servers for the metadata of identified packages. If an update is necessary, the application will parse the metadata further and transform it into a new ebuild file.

Ebuild production will be guided by regulated data, both overall and per-package. This data, such as slotting, mirror URLs, or other factors specific to installation or implementation on a Funtoo Linux system, would be maintained under git revision control.
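One possible shape for that regulated data is a pair of JSON files kept in git, one holding overall defaults and one holding per-package overrides. The sketch below is an assumption about layout only: the file format, the key names, and the function names are hypothetical. JSON is used because it needs nothing beyond the standard library.

```python
import json


def load_json(path):
    """Read one git-tracked JSON data file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def settings_for(package, overall, per_package):
    """Merge overall defaults with any overrides recorded for `package`.

    `overall` is a dict of defaults (e.g. {"slot": "0"}); `per_package`
    maps package names to dicts of overriding values. Both layouts are
    assumptions for illustration.
    """
    settings = dict(overall)  # copy so the shared defaults stay untouched
    settings.update(per_package.get(package, {}))
    return settings
```

For example, with `overall = {"slot": "0"}` and `per_package = {"demo": {"slot": "2"}}`, the merged settings for "demo" carry slot "2", while every other package keeps slot "0".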

Again, FAEPyPI boils down to the idea... PyPI metadata spec --> each package metadata --> JSON import --> Funtoo ebuild

Design Considerations

  • Minimal Python packaging dependency - the point of the exercise is to produce an ebuild from Python package data. A maintainer with a bare-minimum core install should be able to perform updates.
  • Caching - see this PyPI Warehouse doc. PyPI does not rate-limit pulls from Warehouse; however, the linked document does highlight PyPI maintainers' concerns about pulling large data sets. As a "consumer" of data, Funtoo should honour their requests and coordinate access if required.
  • Updating - A package's PyPI JSON metadata contains the key ['info']['last_serial']. No specific description of its purpose could be located in the API documentation. From examples and observation, the value appears to indicate when the package metadata was last updated (similar in idea to the serial value of a DNS zone file). If so, the value can be read directly, with no other JSON parsing needed, and compared against a stored, distributed value recorded when the package was last updated automatically in Funtoo. This cache of last_serial values needs to be managed and distributed to minimize overlapping work between maintainers.
  • Local Preferences - not all maintainers keep git checkouts in the same location on their machines. Having local settings makes FAEPyPI more flexible across workflows.
  • Per-package Data - not every PyPI-based package follows the same methodology or design, so some packages will deviate from the norm. To allow for this, git-controlled information would be stored so that FAEPyPI, installed on a maintainer's machine, can look up whether and how a package deviates from the expected install.
  • Uploads/Merges - Decentralizing the updating process, rather than requiring a single user to do all the work, helps cut down the workload of our BDFL. And if he suddenly wins the lottery and retires to a Caribbean island, Funtoo would still have some means to continue.
  • Specification Adherence and Maintenance - FAEPyPI will depend on the Core-Metadata specification published by PyPI. Any code should be annotated and commented so that multiple developers can contribute to its maintenance without the burden of a steep learning curve just to understand how the code works. The code should be clear, concise, and simple, following PEP8 conventions with documented and agreed-upon deviations.
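The update check described under "Updating" can be sketched as a comparison between a package's last_serial and a cached value. Everything here is an assumption for illustration: the function names, the cache being a JSON file that maps package names to serials, and the lookup order. The proposal locates the key under ['info'], while PyPI's JSON responses appear to expose last_serial at the top level, so the sketch checks both places.

```python
import json
from pathlib import Path


def extract_serial(metadata):
    """Return last_serial from the top level, falling back to ['info']."""
    serial = metadata.get("last_serial")
    if serial is None:
        serial = metadata.get("info", {}).get("last_serial")
    return serial


def load_cache(path):
    """Load the name -> serial mapping; a missing file means an empty cache."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def needs_update(name, metadata, cache):
    """True when the serial differs from the cached one or cannot be determined."""
    serial = extract_serial(metadata)
    if serial is None:
        return True  # no serial found: be conservative and regenerate
    return cache.get(name) != serial


def record_update(name, metadata, cache, path):
    """Store the serial just processed and write the cache back to disk."""
    cache[name] = extract_serial(metadata)
    Path(path).write_text(
        json.dumps(cache, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
```

Keeping the cache as sorted, indented JSON makes it friendly to git diffs, which fits the idea of distributing it between maintainers; the path to the file could come from each maintainer's local preferences.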