Tuesday, December 20, 2022

A not so unfortunate sharp edge in Pipenv

I’ve been a proponent of pipenv for several years now, particularly for application development (rather than library development). While the features around virtual environment management and the integration with pyenv to automatically install the version of Python necessary for an application are nice, the features that I’ve really advocated for are the separation of direct dependencies and transitive dependencies, via Pipfile and Pipfile.lock, and the hash validation provided by Pipfile.lock. I find it helpful in improving the deterministic nature of builds (not solving, mind you, but improving), making sure everyone in the engineering organization is using the same versions of packages as everyone else. It’s also a minor reassurance against supply chain attacks, which is sort of what I want to write about today.

When you install a package with pipenv install Django, for instance, pipenv will automatically add Django to your Pipfile as a direct dependency, and then add Django’s dependencies as transitive dependencies in Pipfile.lock. Say I install Django today and the latest version is 4.1.4, and then tomorrow Django releases 4.1.5. Pipfile.lock ensures that when my coworkers run pipenv sync (or when our Dockerfile does), they get 4.1.4 – the version that I originally installed. Of course we can move to newer versions with pipenv update, but until we do, everyone installs the same versions of the same packages I have installed. And because Pipfile.lock also contains the hashes of the distribution files for Django==4.1.4, if someone were to try to publish new distribution files for Django==4.1.4 then pipenv sync would fail, because the hashes of the files on PyPI would no longer match the hashes recorded in Pipfile.lock.
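To make the hash check concrete, here is a minimal sketch of the comparison involved. It is not pipenv’s or pip’s actual implementation; the function names sha256_of and matches_lock are made up for illustration, and the only thing taken from Pipfile.lock is the "sha256:<hex>" entry format shown later in this post.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lock(path: Path, lock_hashes: list[str]) -> bool:
    """True if the file's digest appears among the 'sha256:<hex>' lock entries.

    Hypothetical helper: a downloaded sdist or wheel is acceptable only if
    its digest is one of the hashes recorded for that package version.
    """
    return f"sha256:{sha256_of(path)}" in lock_hashes
```

A file whose digest is not listed is rejected, no matter how legitimate it is. That is the whole point, and it is also what sets up the rest of this story.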

Therein lies a small problem, though. While Django was just an example using a popular package, I’m going to switch gears to a real life scenario.

Real world example - python-crontab

We’ve got an application, and that application relies on a package called python-crontab. From its own description on PyPI:

Crontab module for reading and writing crontab files and accessing the system cron automatically and simply using a direct API.

We installed python-crontab back in late October of 2021, and we installed the latest version at the time – 2.6.0. pipenv install python-crontab gives us an entry in our Pipfile.lock that looks like this (along with some dependencies, not shown):

        "python-crontab": {
            "hashes": [
                "sha256:1e35ed7a3cdc3100545b43e196d34754e6551e7f95e4caebbe0e1c0ca41c2f1b"
            ],
            "version": "==2.6.0"
        }

Fast forward to today: our build is breaking and we aren’t really sure why. We automatically update dependencies every Monday morning, but after that happened the build was still working just fine. We also happened to update to the latest version of pipenv yesterday, but that still passed through the build fine. Something else had to have happened. Someone on our team attempted to rebuild our container environment with no cache and got this error output from our pipenv sync step.

Installing dependencies from Pipfile.lock (2cdb99)...
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    python-crontab==2.6.0 from https://files.pythonhosted.org/packages/8a/65/ee4f4db956d14b42aa6cf0dbd0b77217a206484b99f1d4aa11326cd3952a/python_crontab-2.6.0-py3-none-any.whl (from -r /tmp/pipenv-uqho_tzz-requirements/pipenv-2hctl0kl-hashed-reqs.txt (line 183)):
        Expected sha256 1e35ed7a3cdc3100545b43e196d34754e6551e7f95e4caebbe0e1c0ca41c2f1b
             Got        f308a64b8b1d072da4a235e9320398a242e92d080c1d8143bd0c600b24e160f8
Installing initially failed dependencies...
ERROR: Disabling PEP 517 processing is invalid: project specifies a build backend of setuptools.build_meta in pyproject.toml

If we install the same version of python-crontab today into a new virtual environment with pipenv install python-crontab==2.6.0 then we get this entry:

        "python-crontab": {
            "hashes": [
                "sha256:1e35ed7a3cdc3100545b43e196d34754e6551e7f95e4caebbe0e1c0ca41c2f1b",
                "sha256:f308a64b8b1d072da4a235e9320398a242e92d080c1d8143bd0c600b24e160f8"
            ],
            "index": "pypi",
            "version": "==2.6.0"
        }

Investigating

Interesting. It looks like the python-crontab wheel changed. Did my annoying insistence on using pipenv to help us combat supply chain attacks finally yield some fruit? Is this what vindication feels like? Well… sorta.

The hash that pipenv was expecting actually corresponds to the hash for the sdist from the initial release (October 19th, 2021) – the .tar.gz file. But a wheel is available now, as of December 19th 2022, which apparently wasn’t available before. The installer picked up the new wheel, its hash was not in our lock file, and so the hash validation failed, which caused our build to fail, which caused us to get a hundred messages deep in a Slack thread trying to understand what had happened.
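One way to work out which file a lock hash belongs to is to look at the release metadata from PyPI’s JSON API, which lists each uploaded file with its packagetype and sha256 digest. The sketch below is one possible approach and not how we did it at the time; fetch_release and describe_hashes are names I made up, and fetch_release needs network access.

```python
import json
from urllib.request import urlopen


def fetch_release(name: str, version: str) -> dict:
    """Fetch release metadata from PyPI's JSON API (requires network)."""
    with urlopen(f"https://pypi.org/pypi/{name}/{version}/json") as resp:
        return json.load(resp)


def describe_hashes(release: dict, lock_hashes: list[str]) -> dict[str, str]:
    """Map each lock hash to '<packagetype> <filename>', or 'unknown'.

    `release` is the parsed JSON payload; its "urls" list has one entry
    per uploaded file for that version.
    """
    by_digest = {
        "sha256:" + file["digests"]["sha256"]:
            f"{file['packagetype']} {file['filename']}"
        for file in release["urls"]
    }
    return {h: by_digest.get(h, "unknown") for h in lock_hashes}
```

Calling describe_hashes(fetch_release("python-crontab", "2.6.0"), hashes_from_lock) would label each locked hash as belonging to an sdist or a wheel, which is the distinction that mattered here.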

I pulled down the source for python-crontab-2.6.0.tar.gz and the source for the new wheel python_crontab-2.6.0-py3-none-any.whl. I unzipped the wheel and then I went through each of the files in the wheel and diff’d them against the files available from the sdist.

[Image: output of the diff command on each of the three suspicious files; there is no diff]
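The same comparison can be scripted rather than done by hand. This is a rough sketch, not what I actually ran: wheel_vs_sdist is a made-up name, and it assumes a flat project layout where the sdist keeps its modules directly under the top-level "<name>-<version>/" directory (a src/ layout would need extra path handling).

```python
import tarfile
import zipfile
from pathlib import PurePosixPath


def wheel_vs_sdist(wheel_path: str, sdist_path: str) -> dict[str, str]:
    """Compare each non-metadata file in a wheel with the sdist's copy.

    Returns {wheel member name: "same" | "different" | "missing"}.
    """
    sdist_files = {}
    with tarfile.open(sdist_path, "r:gz") as sdist:
        for member in sdist.getmembers():
            if member.isfile():
                # Drop the leading "<name>-<version>/" directory so the
                # paths line up with the wheel's layout (assumed flat).
                rel = PurePosixPath(*PurePosixPath(member.name).parts[1:])
                sdist_files[str(rel)] = sdist.extractfile(member).read()

    report = {}
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            # Skip directories and generated metadata (METADATA, RECORD, ...).
            if name.endswith("/") or ".dist-info/" in name:
                continue
            if name not in sdist_files:
                report[name] = "missing"
            elif wheel.read(name) == sdist_files[name]:
                report[name] = "same"
            else:
                report[name] = "different"
    return report
```

Anything reported as "different" or "missing" is what you would want to read carefully before trusting the new wheel.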

No differences in any of the files in the wheel compared to the versions that we were running from the sdist for the past year. We can dig deeper and look at the issue tracker for python-crontab, where we see an issue created yesterday asking for Python wheels to be published to PyPI. This corresponds to when the new wheel showed up. We can also see a ticket opened earlier today complaining about the same thing we noticed in our build system – the hash changed for version 2.6.0.

Overall this is pipenv working exactly as it should. This could have very easily been a wheel containing different code than what we’d previously been installing, and pipenv tried its hardest to make sure we investigated that before running the new code. If this had been an actual supply chain attack, we would have avoided deploying the malicious code into production. Hooray! And thankfully, it was just a developer trying to be helpful by providing a prebuilt wheel for an old package version. Just a minor pain point that those of us doing hash validation in our build pipelines had to investigate.

Resolving

In resolving this incident, we opted for the smallest change that would not break the build. In this case, we simply edited the hash in our Pipfile.lock by hand so that it matched the file pipenv was now downloading. Note that we didn’t add a hash; we replaced the sdist hash with the wheel hash. pipenv accepts this fine, which seems to mean that if a wheel matching your environment (in this case, any) is present, it doesn’t actually care whether there is an sdist hash at all. That means it is possible (though certainly not ergonomic) to go through all of your dependencies and forbid pipenv from installing from source distributions, provided you have a fairly tightly scoped development/deployment environment, which is pretty neat. Perhaps I’ll write a little utility to strip out sdist hashes from my Pipfile.lock files.
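Such a utility might look something like the sketch below. I haven’t written or run this against a real project; strip_sdist_hashes is a made-up name, and it assumes you have already built a mapping from each "sha256:<hex>" hash to its package type (for example "sdist" or "bdist_wheel") from your index’s metadata.

```python
import copy


def strip_sdist_hashes(lock: dict, packagetype_by_hash: dict[str, str]) -> dict:
    """Return a copy of parsed Pipfile.lock data without sdist hashes.

    A package keeps its hashes untouched if dropping the sdist hashes
    would leave it with none. Hashes missing from the mapping are kept.
    """
    result = copy.deepcopy(lock)
    for section in ("default", "develop"):
        for entry in result.get(section, {}).values():
            hashes = entry.get("hashes", [])
            kept = [h for h in hashes if packagetype_by_hash.get(h) != "sdist"]
            if kept:
                entry["hashes"] = kept
    return result
```

You would load Pipfile.lock with json.load, pass it through, and write the result back out. Keep in mind that pipenv will presumably regenerate the full set of hashes the next time it locks, so this would need to be rerun after every lock.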

But now, to the sharp edge.

Because this developer released a wheel for an old version, and because we are tied to using pipenv to set up our application, we can no longer do a build from scratch using any version of our repo between October 19th 2021 and December 19th 2022 without first making modifications. pipenv sync will fail for all of those commits, because it will prefer the wheel over the sdist (rightfully so, mind you, setup.py considered harmful), and the wheel hash doesn’t match the only hash our lock file recorded for over a year of commits to our application. This will make bisecting problems difficult, to say the least. As far as I can tell, pipenv sync does not provide a flag that will avoid this, but if I’m wrong then please @ me:

➜ pipenv sync --help
Usage: pipenv sync [OPTIONS]

  Installs all packages specified in Pipfile.lock.

Options:
  --system            System pip management.  [env var: PIPENV_SYSTEM]
  --bare              Minimal output.
  --sequential        Install dependencies one-at-a-time, instead of
                      concurrently.  [env var: PIPENV_SEQUENTIAL]

  -d, --dev           Install both develop and default packages  [env var:
                      PIPENV_DEV]

  --keep-outdated     Keep out-dated dependencies from being updated in
                      Pipfile.lock.  [env var: PIPENV_KEEP_OUTDATED]

  --pre               Allow pre-releases.
  --python TEXT       Specify which version of Python virtualenv should use.
  --three / --two     Use Python 3/2 when creating virtualenv.
  --clear             Clears caches (pipenv, pip, and pip-tools).  [env var:
                      PIPENV_CLEAR]

  -v, --verbose       Verbose mode.
  --pypi-mirror TEXT  Specify a PyPI mirror.
  -h, --help          Show this message and exit.

I suppose the behavior I’d like to see here is that if my Pipfile.lock has hashes in it for a distribution, then even if additional distributions are available, pipenv sync should be allowed to continue to install from the distribution corresponding to the “trusted” hash. Emit a warning, sure, but I’m not sure that it warrants failing a build because a developer added a new wheel to an old version. If the hash you have is for an sdist, then you’re already fine installing from that sdist with that hash. Adding a wheel is nice to have, sure, but we’re already installing with the sdist. Even if a new sdist was available for the “same” version of the same package (such as 2.6.0-1, which I was informed about in a video by anthonywritescode), as long as the old dist is still available with the same hash, it should still be able to install. The existence of other distributions for a package version does not negate the validity of the build for the hashed versions I already know about. (Even if those have bugs, were bad releases, whatever, the hash of the contents is still correct.)
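Spelled out as code, the policy I have in mind looks roughly like this. It is a sketch of the behavior I’d like, not a description of what pipenv or pip does today; pick_trusted_file and the shape of the available list are made up for illustration.

```python
import warnings


def pick_trusted_file(available: list[dict], trusted_hashes: set[str]) -> dict:
    """Pick the first available file whose hash is in the lock.

    `available` is a hypothetical list of {"filename": ..., "sha256": ...}
    dicts in the installer's preference order (best candidate first).
    Files the lock does not know about are skipped with a warning
    instead of failing the install.
    """
    trusted = [f for f in available if "sha256:" + f["sha256"] in trusted_hashes]
    if not trusted:
        # Nothing we locked is still on the index: this should fail loudly.
        raise LookupError("no available file matches a locked hash")
    untrusted = [f["filename"] for f in available if f not in trusted]
    if untrusted:
        warnings.warn("ignoring files not in the lock: " + ", ".join(untrusted))
    return trusted[0]
```

With our old lock file, this would skip the new wheel with a warning and install the sdist we had been using all along. It would still fail if the sdist itself disappeared or changed.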

Please reach out to me @dade@crime.st if you have insight into a change we can make in our system to not get inconvenienced by this, or if you’re aware of efforts to make this change in pipenv already. Or maybe you have a different way of ensuring repeatability that you’d like to share. I’d love to chat about it. Just please don’t try to convince me to use poetry.


