Monday, December 26, 2022

Extending Python with Rust

Sometimes a pure Python script just can't deliver the performance we need. When that's the case we have to resort to writing our logic in a "fast" compiled language like C or Rust and exposing it through a Python module. This way we get the best of both worlds. Today I focus on how to use Rust for writing such extensions. I choose Rust over C because it is simply nicer to use and less of a minefield of gotchas waiting for you to trip them. Also, since as a data scientist I spend most of my time manipulating Numpy arrays, I will focus on how to pass them to and return them from Rust. To accomplish this I'll make use of the PyO3 and numpy crates.

The code is borrowed from the rust-numpy examples and is meant just to showcase how to write Rust extensions.

Setup

In order to get through the following steps you will need to have the Rust toolchain and pyenv installed.

Let's start by creating a virtual environment with all the necessary dependencies and a new Rust library. Our Python dependencies are:

  • Numpy
  • Maturin: to help with building and installing the Rust library.

# Create a new Rust library project
cargo new --lib Rumpy

# Prep virtualenv, Python must be >=3.6
pyenv virtualenv 3.8.5 Rumpy
pyenv activate Rumpy
python -m pip install --upgrade pip
pip install numpy maturin

Next, to configure our Rust project we update the Cargo.toml file with the following dependencies. The name under [lib] has to match both the module name we define in the Rust code and the name used in import statements from Python.

[lib]
name = "rust_ext"
crate-type = ["cdylib"]

[dependencies]
numpy = "0.13"
ndarray = "0.14"

[dependencies.pyo3]
version = "0.13"
features = ["extension-module"]

With all that ready, let's have a look at the actual code.

The library

The library will provide two simple examples. The first one, axpy, multiplies an array by a scalar value and adds the result to a second array (a * x + y). The other function, mult, multiplies an array by a scalar in place.

First we annotate the function that will ultimately represent the Python module with the #[pymodule] attribute. This function takes two arguments: _py, a token which shows that we are holding the GIL, and the module itself. The macro takes care of exporting the module's initialization function.

When defining functions we first write the pure Rust logic, in this case axpy and mult, together with wrapper functions, axpy_py and mult_py. The wrapper functions, which are what eventually gets exported, must be annotated with #[pyfn(m, "axpy")]. The first argument of the annotation is the Python module that was passed to the module function, and the second is the name the exported function will take; this registers the functions with the module. More details on the PyO3 annotations can be found in its documentation.
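Putting it all together, here is a minimal sketch of what src/lib.rs can look like, adapted from the rust-numpy examples and assuming the pyo3 0.13 and numpy 0.13 APIs pinned in the Cargo.toml above (the exact method signatures vary a bit between releases):

use ndarray::{ArrayD, ArrayViewD, ArrayViewMutD};
use numpy::{IntoPyArray, PyArrayDyn, PyReadonlyArrayDyn};
use pyo3::prelude::{pymodule, PyModule, PyResult, Python};

// The function name must match the `name` set under [lib] in Cargo.toml.
#[pymodule]
fn rust_ext(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    // Pure Rust logic: a * x + y over dynamically dimensioned array views.
    fn axpy(a: f64, x: ArrayViewD<'_, f64>, y: ArrayViewD<'_, f64>) -> ArrayD<f64> {
        a * &x + &y
    }

    // Pure Rust logic: multiply x by a in place.
    fn mult(a: f64, mut x: ArrayViewMutD<'_, f64>) {
        x *= a;
    }

    // Wrapper exported to Python as rust_ext.axpy.
    #[pyfn(m, "axpy")]
    fn axpy_py<'py>(
        py: Python<'py>,
        a: f64,
        x: PyReadonlyArrayDyn<f64>,
        y: PyReadonlyArrayDyn<f64>,
    ) -> &'py PyArrayDyn<f64> {
        axpy(a, x.as_array(), y.as_array()).into_pyarray(py)
    }

    // Wrapper exported to Python as rust_ext.mult; mutates the input array in place.
    #[pyfn(m, "mult")]
    fn mult_py(_py: Python<'_>, a: f64, x: &PyArrayDyn<f64>) -> PyResult<()> {
        let x = unsafe { x.as_array_mut() };
        mult(a, x);
        Ok(())
    }

    Ok(())
}

Keeping the numeric logic in plain ndarray functions and the PyO3 glue in thin wrappers means the core routines can be unit-tested from Rust without ever touching the interpreter.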

Compilation is as simple as running:

maturin develop --release

This will take care of compiling the module with optimizations turned on and installing it into your environment so you can test it immediately.

Benchmarks

Finally, let's do some simple benchmarks to see how well the Rust implementation compares against both the natural Numpy solution and a naive pure Python implementation. We are just interested in a quick check, so the IPython %%timeit magic is enough here. The IPython session looked something like this.

[Figure: quick snapshot of the IPython session used for benchmarking.]
[Figure: Rumpy vs Numpy benchmark.]

As expected, the pure Python implementation is comically slow and won't be considered further. What's more interesting is that the Rust implementation is only a factor of about 1.23 slower (for large arrays) than just using Numpy. PyO3 apparently introduces essentially zero call overhead, and for smaller inputs the Rust implementation was actually marginally faster than Numpy. In exchange for a slight loss in performance we get code that reads exactly like the Numpy implementation, with stronger guarantees of correctness than if we had written the algorithm in C and exposed it through CFFI.

Of course, you would never go down the route of writing a compiled extension when the algorithm can be expressed this simply with vectorized Numpy operations. However, when writing more complex logic and algorithms that can't easily be expressed with Numpy ops, I am willing to accept a small (relatively speaking) overhead in exchange for a modern programming language that is both nicer to use and safer than C for writing Python extensions.

Farewells

In future blog posts I will explore other alternatives for accelerating Python code, such as Cython, Numba, async programming, and the multiprocessing library. Stay tuned!



