Friday, May 12, 2023

Extreme Multi-Threading: C++ and Julia 1.9 Integration

In this tutorial we demonstrate how to call Julia libraries with multiple threads from C++. With the introduction of Julia 1.9 in May 2023, the runtime can dynamically "adopt" external threads, enabling the integration of Julia libraries into multi-threaded codebases written in other languages, such as C++. This article is written in collaboration with Evangelos Paradas, the maestro of algorithm deployment at ASML. Evangelos has been responsible for heavily testing and debugging this multi-threading feature. I humbly repeated the final results after his many trial-and-error attempts and summarized everything for you in this article.

Julia in production

Julia is a general-purpose language designed for scientific and numerical computing, striking a balance between speed and simplicity. The adoption of Julia in the industry is growing every year, but complex cases require enhanced deployment capabilities in the core of the language. One such crucial improvement we needed was the ability to call Julia libraries with multiple threads from another language. Fortunately, this is now possible in Julia version 1.9. Since we have been involved in testing this new feature extensively, we would like to share this tutorial with you to accelerate your journey with external threads in Julia.

Weaving threads across multiple programming languages is an extreme sport in software engineering. You do so at your own risk. Incorrect usage of this technology will crash your production systems. You have been duly warned.

Before starting, make sure you are working with Julia 1.9, either by using juliaup or by downloading Julia 1.9 manually and adding it to your path.

Introduction to C++ embedding

In the past, I spent quite some time writing a tutorial about how to embed Julia libraries into C++. It's not trivial. At a high level, the steps involved are:

  • Create a Julia package with the Julia c-interface functions

  • Write the C++ code that will call those Julia functions

  • Compile the Julia code to a library with PackageCompiler.jl

  • Compile C++ and link it to the Julia library

I won't delve into all the specifics above, so if you wish to reproduce the results of this article, it's advisable to first read my previous article. Prior to embarking on a multi-threaded adventure, make sure that you are intimately familiar with embedding in a single-threaded manner. Having multiple C++ threads call into Julia is an exceptionally advanced subject, particularly if you have limited prior experience with C++ and multi-threading. Take your time to learn the ropes.

Julia code example

We wrote a very simple Julia function that throws an error depending on the input value. The exact Julia functionality doesn't matter in this article. As mentioned, you can read the extensive blog post on my previous blog for details, but here are the important highlights for making a Julia function ready for C/C++ embedding:

  • Use Base.@ccallable to make sure the Julia function can be called from C/C++

  • Use C types on the interface. In this example I only use Cint types. Note that Cint is an alias for Int32, so Julia integers and C integers actually have the same memory layout.

Base.@ccallable function divide_function(input::Cint)::Cint
    if input > 10
        throw(ErrorException("You cannot divide by more than 10"))
    end
    outputValue::Cint = div(12, input)
    return outputValue
end
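
On the C++ side, the function is reached through plain C linkage. The sketch below is an assumption about what a matching declaration could look like (the real one belongs in whichever header you ship with your library), together with a hypothetical helper, fits_in_cint, that checks a wider integer before narrowing it to the 32-bit interface type.

```cpp
#include <cstdint>
#include <limits>

// Cint on the Julia side is Int32; verify that C++ int matches on this platform.
static_assert(sizeof(int) == sizeof(std::int32_t), "Cint is expected to be 32 bits wide");

// Assumed shape of the declaration for the @ccallable function above.
// It is only declared here, never called, so no Julia library is needed to compile.
extern "C" int divide_function(int input);

// Hypothetical helper: true if `value` can be passed as a Cint without truncation.
bool fits_in_cint(long long value)
{
    return value >= std::numeric_limits<std::int32_t>::min() &&
           value <= std::numeric_limits<std::int32_t>::max();
}
```

A caller holding a 64-bit value could run fits_in_cint first and reject the input on the C++ side instead of silently truncating it at the language boundary.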

When you place this function inside a Julia package, you can compile it to a library with PackageCompiler. An example build script can be found in the GitHub repository that accompanies this article.

Initializing Julia

Here are some of the interfaces that are important for initializing the Julia library correctly, so that it can adopt external threads from C++. We asked for advice on using many of these functions, as we're not experts in this either. The C API of the Julia runtime (those jl_* functions) could definitely use some more documentation.

  • The init_julia function comes from a header file that is created together with your compiled Julia library. Nothing special here.

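  • The code with jl_is_initialized has to go into a try/catch block because, when Julia is not initialized, this symbol is not available in memory and the call causes a segfault. A surprising gotcha. Be aware that a standard C++ try/catch does not intercept a genuine segfault on most platforms, so treat this guard as best effort.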

  • Make sure to lock and unlock the initialization of Julia, so that no other thread can accidentally try to start Julia as well, while this thread is busy initializing Julia.

  • jl_adopt_thread enables this C++ thread to be used by Julia. This is the most important C API function to remember for external multi-threading. It's available since Julia 1.9.

  • The job of jl_gc_safe_enter is to mark the thread as GC-safe, so that the garbage collector (GC) can run concurrently with that thread. By using this function, you promise not to do any GC-visible work, such as allocating new memory. The parentheses around the function name in the code simply prevent it from being expanded as the function-like macro of the same name.

  • jl_enter_threaded_region sets Julia to multi-threading mode, I believe. This function is also used, for example, by the Julia @threads macro, but it lacks any documentation.

According to the Julia release news about thread adoption, @ccallable Julia functions automatically adopt threads. This is true, but what if you execute a Julia function or macro before the @ccallable function? In that case you get a segmentation fault, because the thread is not yet adopted. For example, when you want to capture Julia errors, you need to call the JL_TRY macro before the @ccallable function. In the next section, we will show how to use such macros within a multi-threaded environment. In this initialization section, we show that the safest approach is to perform the thread adoption explicitly by calling jl_adopt_thread.

Putting it all together, we use these functions to initialize the compiled Julia library as follows. I have kept the code example concise to highlight what matters.

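#include <mutex>
#include <julia.h>       // jl_* functions of the Julia C API
#include "julia_init.h"  // init_julia, created together with the compiled library

// Guards initialization so that only one thread can start Julia.
static std::mutex mtx;

bool is_julia_initialized()
{
    try
    {
        return jl_is_initialized() != 0;
    }
    catch (...)
    {
        return false;
    }
}

void initialize_julia(int argc, char *argv[])
{
    // Hold the lock for the whole check-and-initialize sequence;
    // it is released automatically when the function returns.
    std::lock_guard<std::mutex> lock(mtx);

    if (!is_julia_initialized())
    {
        init_julia(argc, argv);      // start the Julia runtime
        jl_adopt_thread();           // let Julia use this C++ thread
        (jl_gc_safe_enter)();        // mark this thread as GC-safe
        jl_enter_threaded_region();  // enter Julia's threaded region
    }
}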
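
The lock, check, initialize pattern above can also be expressed with std::call_once from the C++ standard library. The sketch below is not the approach used in the article's repository. It uses a hypothetical stand-in, fake_init_runtime, in place of init_julia and the jl_* calls so that it compiles without the Julia headers, and the names initialize_once and race_to_initialize are likewise made up for illustration.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

std::once_flag runtime_once;      // guards the one-time initialization
std::atomic<int> init_calls{0};   // counts how often the stand-in ran

// Hypothetical stand-in for init_julia(argc, argv) plus the jl_* setup calls.
void fake_init_runtime()
{
    init_calls.fetch_add(1);
}

// Safe to call from any thread: the initializer runs exactly once.
void initialize_once()
{
    std::call_once(runtime_once, fake_init_runtime);
}

// Starts n threads that all race to initialize; returns the number of init calls.
int race_to_initialize(int n)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(initialize_once);
    for (auto& t : threads)
        t.join();
    return init_calls.load();
}
```

However many threads race, the stand-in runs once. In a real program you would probably still call the initializer from the main thread before starting any workers, as the article does.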

The main C++ code

Let's write a simple wrapper around our lovely c-callable Julia function and show you how to catch any errors thrown by Julia. All in a multi-threaded way. Remember, the Julia function divide_function is a trivial function that uses integers and throws an exception when the input integer is larger than 10.

We use jl_get_pgcstack to check if a thread is already adopted by Julia. If you attempt to adopt a thread twice, you will encounter a segmentation fault. This is one way to avoid making that mistake accidentally.

The JL_TRY macro will check if an error occurred in the adopted thread. This macro only works if the thread is actually adopted, else you get yet another segmentation fault. Inside the macro we call the function from the Julia library.

If you want to retrieve the actual Julia error inside the JL_CATCH, you will need to call into the Julia runtime. I have some example code in a previous article about catching Julia exceptions from C++ on my personal blog. In the example here, we kept it simple and just printed a message.


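#include <iostream>

// Calls the Julia function and reports whether it threw an error.
void call_and_catch(int x)
{
    // Adopt this thread only if Julia does not know it yet.
    if (jl_get_pgcstack() == NULL)
        jl_adopt_thread();

    // JL_TRY and JL_CATCH require an adopted thread.
    JL_TRY
    {
        divide_function(x);
        std::cout << "Succeeded for x = " << x << std::endl;
    }
    JL_CATCH
    {
        std::cout << "Caught error for x = " << x << std::endl;
    }
}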
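
Because adopting a thread twice is fatal, it can help to centralize the check in one place. The sketch below is an alternative under stated assumptions: it tracks adoption with a C++ thread_local flag rather than jl_get_pgcstack, and it uses a hypothetical fake_adopt_thread in place of jl_adopt_thread so that it compiles without Julia. The flag only knows about adoptions made through this helper, so it would not notice a thread that Julia adopted by other means.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> adopt_calls{0};

// Hypothetical stand-in for jl_adopt_thread().
void fake_adopt_thread() { adopt_calls.fetch_add(1); }

// Adopts the calling thread at most once, however often it is called.
void ensure_adopted()
{
    thread_local bool adopted = false;  // one flag per thread
    if (!adopted) {
        fake_adopt_thread();
        adopted = true;
    }
}

// Each of n threads calls ensure_adopted() three times; returns total adopt calls.
int count_adoptions(int n)
{
    adopt_calls = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] {
            for (int k = 0; k < 3; ++k)
                ensure_adopted();
        });
    for (auto& t : threads)
        t.join();
    return adopt_calls.load();
}
```

With four threads calling the helper three times each, the stand-in is invoked four times, once per thread.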

We can now write a piece of multi-threaded C++ code and call our Julia function. The easiest way is to first create a pool of threads. If you want to make this example more complicated, you'll have to learn a bit more about C++, which is beyond the scope of this article. But this is a good example to get you started.


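#include <thread>

int main(int argc, char *argv[])
{
    const size_t n_of_threads(15);
    initialize_julia(argc, argv);

    // Start one thread per input value, 1 through 15.
    std::thread all_threads[n_of_threads];
    for (size_t i = 0; i < n_of_threads; i++)
        all_threads[i] = std::thread(call_and_catch, static_cast<int>(i + 1));

    // Wait for all threads to finish.
    for (auto& thread : all_threads)
        thread.join();

    return 0;
}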
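
Since the threads print directly to std::cout, their messages can arrive in any order. If you need the results in a deterministic form, one option is to let each thread write into its own slot and inspect the slots after joining. The sketch below assumes a hypothetical fake_divide that mimics divide_function by throwing a C++ exception for inputs above 10; real Julia errors would still be handled with JL_TRY and JL_CATCH as in call_and_catch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical stand-in for the Julia divide_function: fails for inputs above 10.
int fake_divide(int input)
{
    if (input > 10)
        throw std::runtime_error("You cannot divide by more than 10");
    return 12 / input;
}

// Runs fake_divide(1..n) on n threads; slot i holds 1 if input i+1 succeeded.
std::vector<char> run_all(int n)
{
    // char rather than bool, so that every slot is a separate object in memory.
    std::vector<char> succeeded(static_cast<std::size_t>(n), 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&succeeded, i] {
            try {
                fake_divide(i + 1);
                succeeded[i] = 1;   // each thread writes only its own slot
            } catch (const std::runtime_error&) {
                succeeded[i] = 0;
            }
        });
    for (auto& t : threads)
        t.join();
    return succeeded;
}
```

Because no two threads touch the same slot, no mutex is needed, and the result for each input can be read in order once all threads have joined.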

Compiling

Make sure to add the -lpthread flag; it links the system library that is required for C++ threads. I've already added this flag to the Makefile in my repository. Other than that, compilation is identical to regular Julia embedding in C++.

After compiling with the makefile, I can run the generated executable, and we see 15 printed messages, as expected. They appear in somewhat random order, due to the nature of multi-threading, but the erroring threads appear last, probably because the error handling takes additional time.

If you ever manage to arrive at this same point, please congratulate yourself! This is tricky business.

Succeeded for x = 2 
Succeeded for x = 1
Succeeded for x = 4
Succeeded for x = 3
Succeeded for x = 5
Succeeded for x = 8
Succeeded for x = 10
Succeeded for x = 9
Succeeded for x = 7
Succeeded for x = 6
Caught error for x = 15
Caught error for x = 12
Caught error for x = 13
Caught error for x = 14
Caught error for x = 11

Pitfalls to avoid

In general, multi-threading requires a lot of attention due to many possible pitfalls, such as thread-safety issues, deadlocks, race conditions, and much more. Adding external multi-threading to the mix makes everything even more complicated. Consider carefully whether you really want to go down this route with multiple languages. If you want to continue, here are a few complexities we encountered along the way, but be aware that you may find many more.

We encountered some issues with BLAS and other libraries. It's best to set the number of threads to one via LinearAlgebra.BLAS.set_num_threads(1); otherwise every thread in Julia spawns multiple threads in the BLAS library. The same goes for MKL and any other third-party library you use. Things may work fine, but your performance might not be optimal. You probably don't want your 4 external C++ threads accidentally spawning 16 BLAS threads or more.
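
If you would rather cap the BLAS threads from the C++ side, one option is to set an environment variable before the Julia runtime starts. This sketch assumes that your Julia build uses OpenBLAS and that it honors OPENBLAS_NUM_THREADS at startup; MKL and other libraries use their own variables, so check their documentation. The helper name cap_blas_threads is hypothetical, and setenv is POSIX-only.

```cpp
#include <stdlib.h>   // setenv (POSIX)
#include <cstdlib>    // std::getenv
#include <string>

// Hypothetical helper: caps the BLAS thread count via the environment.
// It has to run before the Julia runtime, and with it BLAS, is initialized.
// Returns true on success.
bool cap_blas_threads(int n)
{
    const std::string value = std::to_string(n);
    // The final argument 1 means "overwrite an existing value".
    return setenv("OPENBLAS_NUM_THREADS", value.c_str(), 1) == 0;
}
```

Calling LinearAlgebra.BLAS.set_num_threads(1) inside Julia remains the more direct route; the environment variable is only a convenience when the host program controls startup.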

In general, be sure to test every binary artifact you want to use in production and consider the implications for your multi-threading setup. This is good advice for any software development project you undertake, independent of Julia.

We encountered a pitfall with Java when embedding our library into Spark. In this article, we will not go into the details of passing Java threads (via C++) to Julia, but we noticed some issues with the Java signal handler. Make sure that your library is explicitly aware of the Java signal handling library, for example via export LD_PRELOAD=/path/to/libjsig.so. Otherwise Julia will produce a segmentation fault and your application will crash. This is a language interoperability issue that we had to work around.

Big lesson learned from the above: never disable the Julia signal handler, because otherwise only Java handles the signals. These are operating system signals, such as SIGSEGV (segfaults), SIGABRT, or SIGINT (sent when you hit Ctrl+C to interrupt something). If Julia cannot handle those signals, you've got a serious problem. We made this mistake while figuring out the previous pitfall.

Conclusion

Integrating C++ and Julia with multiple threads can be a complex task, but it offers powerful capabilities for incorporating Julia libraries into multi-threaded C++ codebases. By carefully initializing the Julia runtime and handling potential pitfalls, developers can successfully combine these two languages for improved performance and functionality. However, it's crucial to be mindful of complicated multi-threading challenges to ensure the reliability of the final product.



