Wednesday, September 1, 2021

Cooperative Package Management for Python

By Jake Edge
August 31, 2021

A longstanding tug-of-war between system package managers and Python's own installation mechanisms (primarily pip, but there are others) looks to be on its way to being resolved, or at least regularized. PEP 668 ("Graceful cooperation between external and Python package managers") has been created to provide ways for the two types of package installation to work together rather than at cross-purposes. Since many operating systems depend on Python tools, with package versions that may differ from those needed by users' Python applications, making the two play together nicely should result in more stable systems.

The root cause of the problem is that distribution package managers and Python package managers ("pip" is used as shorthand for all of them in the rest of this article) often share the same "site‑packages" directory for storing installed packages. Updating a package or, worse yet, removing one may make perfect sense in the context of one package manager, but can completely foul up the other. As the PEP notes, that can cause real havoc:

This may pose a critical problem for the integrity of distros, which often have package-management tools that are themselves written in Python. For example, it's possible to unintentionally break Fedora's dnf command with a pip install command, making it hard to recover.

The sys.path system parameter governs where Python looks for modules when it encounters an import statement; it gets initialized from the PYTHONPATH environment variable, with some installation- and invocation-specific directories added. sys.path is a Python list of directories that get consulted in order, much like the shell PATH environment variable that it is modeled on. Python programs can manipulate sys.path to redirect the search, which is part of what makes virtual environments work.
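The effect of that ordering can be seen in a small experiment. The sketch below is illustrative only: the helper import_with_search_order() and the module name pep668_demo_mod are made up for this example. It creates two temporary directories that each hold a module of the same name, then imports the module with the directories in either order.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_with_search_order(module_name, directories):
    """Import module_name with `directories` placed first on sys.path, in order.

    Hypothetical helper for this demonstration; sys.path and sys.modules
    are restored afterward so the experiment leaves no trace.
    """
    saved_path = list(sys.path)
    sys.modules.pop(module_name, None)
    try:
        sys.path[:0] = [str(d) for d in directories]
        importlib.invalidate_caches()  # make the finders notice the new files
        return importlib.import_module(module_name)
    finally:
        sys.path[:] = saved_path
        sys.modules.pop(module_name, None)


with tempfile.TemporaryDirectory() as tmp:
    local_dir = Path(tmp, "local")
    system_dir = Path(tmp, "system")
    for directory, origin in ((local_dir, "local"), (system_dir, "system")):
        directory.mkdir()
        (directory / "pep668_demo_mod.py").write_text(f"ORIGIN = {origin!r}\n")

    # The same import finds a different file depending on the directory order.
    first = import_with_search_order(
        "pep668_demo_mod", [local_dir, system_dir]
    ).ORIGIN
    second = import_with_search_order(
        "pep668_demo_mod", [system_dir, local_dir]
    ).ORIGIN

print(first, second)  # local system
```

Whichever directory appears earlier wins, which is why the relative position of local and distribution directories on sys.path matters for the rest of the discussion.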

Using virtual environments with pip, instead of installing packages system-wide, has been the recommended practice to avoid conflicts with OS-installed packages for quite some time. But it is not generally mandatory, so users sometimes still run into problems. One goal of PEP 668 is to allow distributions to indicate that they provide another mechanism for managing Python packages, which will then change the default behavior of pip. Users will still be able to override that default, but having to do so explicitly will hopefully alert them to the problems that could arise.

A distribution that wants to opt into the new behavior will tell pip that it manages Python packages with its tooling by placing a configuration file called EXTERNALLY‑MANAGED in the directory where the Python standard library lives. If pip finds the EXTERNALLY‑MANAGED file there and is not running within a virtual environment, it should exit with an error message unless the user has explicitly overridden the default with a command-line flag; the PEP recommends ‑‑break‑system‑packages for the flag name. The EXTERNALLY‑MANAGED file can contain an error message that pip should return when it exits due to those conditions being met; the messages can be localized in the file as well. The intent is for the message to give distribution-specific information guiding the user to the proper way to create a virtual environment.
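The check described above might look roughly like the following sketch. It is not pip's implementation. The function names are invented for illustration, and the file layout is an assumption based on a reading of the PEP text: an INI-style file with an [externally-managed] section, an Error key, and optional Error-<language> keys. The demonstration uses a temporary directory as a stand-in for the standard-library directory.

```python
import configparser
import sys
import sysconfig
import tempfile
from pathlib import Path

MARKER_NAME = "EXTERNALLY-MANAGED"
SECTION = "externally-managed"  # assumed section name, per the PEP text
FALLBACK = "This Python environment is externally managed."


def marker_path(stdlib_dir=None):
    """Location of the marker file; defaults to the real stdlib directory."""
    base = stdlib_dir if stdlib_dir is not None else sysconfig.get_path("stdlib")
    return Path(base) / MARKER_NAME


def should_refuse_install(stdlib_dir=None, in_venv=None, override=False):
    """True if an installer following the PEP should stop with an error."""
    if in_venv is None:
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        in_venv = sys.prefix != sys.base_prefix
    if override or in_venv:
        return False
    return marker_path(stdlib_dir).is_file()


def read_error_message(path, language_code=None):
    """Pick the most specific message available, else a generic fallback."""
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
        section = parser[SECTION]
    except (configparser.Error, KeyError, OSError, UnicodeDecodeError):
        return FALLBACK
    candidates = []
    if language_code:
        candidates.append(f"Error-{language_code}")
        short = language_code.replace("-", "_").split("_")[0]
        if short != language_code:
            candidates.append(f"Error-{short}")
    candidates.append("Error")
    for key in candidates:
        if key in section:
            return section[key]
    return FALLBACK


# Demonstration against a temporary stand-in for the stdlib directory.
with tempfile.TemporaryDirectory() as fake_stdlib:
    marker = marker_path(fake_stdlib)
    refuse_before = should_refuse_install(fake_stdlib, in_venv=False)
    marker.write_text(
        "[externally-managed]\n"
        "Error=Use the distro package manager or a virtual environment.\n"
        "Error-de=Bitte den Paketmanager oder eine virtuelle Umgebung nutzen.\n",
        encoding="utf-8",
    )
    refuse_after = should_refuse_install(fake_stdlib, in_venv=False)
    refuse_in_venv = should_refuse_install(fake_stdlib, in_venv=True)
    refuse_overridden = should_refuse_install(
        fake_stdlib, in_venv=False, override=True
    )
    message = read_error_message(marker)
    message_de = read_error_message(marker, "de_DE")

print(refuse_before, refuse_after, refuse_in_venv, refuse_overridden)
print(message)
```

The override parameter stands in for the command-line flag. Both the flag and the virtual-environment test short-circuit the check before the file is ever consulted.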

Another problem arises when pip removes packages from system-wide installs. If, for example, the user installs a package system-wide and runs into a problem, the "obvious" solution may cause bigger problems:

There is a worse problem with system-wide installs: if you attempt to recover from this situation with sudo pip uninstall, you may end up removing packages that are shipped by the system's package manager. In fact, this can even happen if you simply upgrade a package - pip will try to remove the old version of the package, as shipped by the OS. At this point it may not be possible to recover the system to a consistent state using just the software remaining on the system.

A second change proposed in the PEP would limit pip to only operating on the directories specified for its use. The idea is that distributions can separate the two kinds of packages into their own directories, which is something that several Linux distributions already do:

For example, Fedora and Debian (and their derivatives) both implement this split by using /usr/local for locally-installed packages and /usr for distro-installed packages. Fedora uses /usr/local/lib/python3.x/site‑packages vs. /usr/lib/python3.x/site‑packages. (Debian uses /usr/local/lib/python3/dist‑packages vs. /usr/lib/python3/dist‑packages as an additional layer of separation from a locally-compiled Python interpreter: if you build and install upstream CPython in /usr/local/bin, it will look at /usr/local/lib/python3/site‑packages, and Debian wishes to make sure that packages installed via the locally-built interpreter don't show up on sys.path for the distro interpreter.)

So the proposal would require pip to query the location where it is meant to place its packages and only modify files in that directory. Since the locally installed packages are normally placed ahead of the system-wide packages on sys.path, though, this can lead to pip "shadowing" a distribution package. Shadowing an installed package can, of course, lead to some of the problems mentioned, so it is recommended that pip emit a warning when this happens.
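Both halves of that idea, staying inside one directory and noticing shadowed copies, can be sketched in a few lines. This is a simplified illustration rather than anything pip does: the helper names and the package name examplepkg are hypothetical, and the lookup only recognizes top-level packages and single-file modules, ignoring installed metadata, namespace packages, and extension modules.

```python
import os
import tempfile
from pathlib import Path


def is_inside(path, root):
    """True if `path` resolves to a location at or under `root`."""
    path, root = os.path.realpath(path), os.path.realpath(root)
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # e.g. different drives on Windows
        return False


def find_providers(name, search_path):
    """Directories on search_path, in order, holding a top-level `name`."""
    hits = []
    for directory in search_path:
        base = Path(directory)
        if (base / name).is_dir() or (base / f"{name}.py").is_file():
            hits.append(str(base))
    return hits


def shadowed_copies(name, install_dir, search_path):
    """Later directories whose copy of `name` is hidden by install_dir's."""
    providers = find_providers(name, search_path)
    install_dir = str(Path(install_dir))
    if install_dir not in providers:
        return []
    return providers[providers.index(install_dir) + 1:]


with tempfile.TemporaryDirectory() as tmp:
    local_dir = Path(tmp, "local")    # stands in for the installer's directory
    system_dir = Path(tmp, "system")  # stands in for the distro's directory
    for directory in (local_dir, system_dir):
        (directory / "examplepkg").mkdir(parents=True)
        (directory / "examplepkg" / "__init__.py").write_text("")

    local_file = local_dir / "examplepkg" / "__init__.py"
    system_file = system_dir / "examplepkg" / "__init__.py"
    may_touch_local = is_inside(local_file, local_dir)
    may_touch_system = is_inside(system_file, local_dir)
    shadowed = shadowed_copies("examplepkg", local_dir, [local_dir, system_dir])

print(may_touch_local, may_touch_system)  # True False
for directory in shadowed:
    print(f"warning: examplepkg in {directory} is shadowed")
```

An installer built along these lines would refuse to modify anything for which is_inside() returns False, and would turn a non-empty shadowed_copies() result into the warning the PEP recommends.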

The PEP has an extensive analysis of the use cases and the impact these changes will have. "The changed behavior in this PEP is intended to 'do the right thing' for as many use cases as possible." In particular, the changes to allow distributions to have two different locations for packages and for pip not to change the system-wide location are essentially standardizing the current practice of some distributions. The "Recommendations for distros" section of the PEP specifically calls out that separation as a best practice moving forward.
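To see how a particular interpreter splits these locations today, the standard library's sysconfig and site modules can be queried. The sketch below only reports what the running interpreter says. The function name package_locations() is made up for this example, the results vary by distribution and by whether a virtual environment is active, and using sysconfig for this lookup is an assumption here rather than something the article spells out.

```python
import site
import sysconfig


def package_locations():
    """Collect the directories this interpreter associates with packages."""
    return {
        # Where the standard library lives.
        "stdlib": sysconfig.get_path("stdlib"),
        # Default install targets for pure-Python and platform-specific code.
        "purelib": sysconfig.get_path("purelib"),
        "platlib": sysconfig.get_path("platlib"),
        # Directories that site.py adds to sys.path at startup.
        "site_packages": site.getsitepackages(),
        "user_site": site.getusersitepackages(),
    }


for name, value in package_locations().items():
    print(f"{name}: {value}")
```

On a distribution that implements the split, the install targets and the distribution's own package directory would be expected to differ. On others, they may well be the same directory.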

There are situations where distributions would not want to default to this new behavior, however. Containers for single applications may not benefit from the restrictions, so the PEP recommends that distributions change their behavior for those container images:

Distros that produce official images for single-application containers (e.g., Docker container images) should remove the EXTERNALLY‑MANAGED file, preferably in a way that makes it not come back if a user of that image installs package updates inside their image (think RUN apt‑get dist‑upgrade). On dpkg-based systems, using dpkg‑divert ‑‑local to persistently rename the file would work. On other systems, there may need to be some configuration flag available to a post-install script to re-remove the EXTERNALLY‑MANAGED file.

In general, the PEP seems not to be particularly controversial. The PEP discussion thread is positive for the most part, though Paul Moore, who may be the PEP-Delegate deciding on the proposal, is concerned that those affected may not even know about it:

One thing I would be looking for is a bit more discussion - the linux-sig discussion mentioned was only 6 messages since May, and there's only a couple of messages here. I'm not convinced that "silence means approval" is sufficient here, it's difficult to be sure where interested parties hang out, so silence seems far more likely to imply "wasn't aware of the proposal" in this case. In fact, I'd suggest that the PEP gets a section listing distributions that have confirmed their intent to support this proposal, including the distribution, and a link to where the commitment was made.

Assuming said confirmations are forthcoming, or that any objections and suggestions can be accommodated, PEP 668 seems like a nice step forward for Python. Having tools like DNF and apt fight with pip and others has obviously caused problems in the past and will do so again. Finding a way to cooperate without causing any major backward-compatibility headaches is important. Ensuring that other distributions are on board with these changes, all of which are ultimately optional anyway, should lead to more stability and, ultimately, happier users, both for Python and for the distributions.
