Saturday, September 26, 2020

Is the C runtime and library a legitimate part of the Unix API? (2017)

One of the knocks against Go is, to quote from "Debugging an evil Go runtime bug":

Go also happens to have a (rather insane, in my opinion) policy of reinventing its own standard library, so it does not use any of the standard Linux glibc code to call vDSO, but rather rolls its own calls (and syscalls too).

Ordinary non-C languages on Unixes generally implement a great many low level operations by calling into the standard C library. This starts with things like making system calls, but also includes operations such as getaddrinfo(3). Go doesn't do this; it implements as much as possible itself, going straight down to direct system calls in assembly language. Occasionally problems ensue.

A few Unixes explicitly say that the standard C library is the stable API and point of interface with the system; one example is Solaris (and now Illumos). Although they don't casually change the low level system call implementation, as far as I know Illumos officially reserves the right to change all of their actual system calls around, breaking any user space code that isn't dynamically linked to libc. If your code breaks, it's your fault; Illumos told you that dynamic linking to libc is the official API.

Other Unixes simply do this tacitly and by accretion. For example, on any Unix using nsswitch.conf, it's very difficult to always get the same results for operations like getaddrinfo() without going through the standard C library, because these may use arbitrary and strange dynamically loaded modules that are accessed through libc and require various random libc APIs to work. This points out one of the problems here: once you start (indirectly) calling random bits of the libc API, they may quite reasonably make assumptions about the runtime environment that they're operating in. How to set up a limited standard C library runtime environment is generally not documented; instead the official view is generally 'let the standard C library runtime code start your main() function'.
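Go exposes this choice fairly directly in its net package, which makes for a small illustration. The sketch below resolves a name once with Go's own resolver and once with the platform default, using the PreferGo field of net.Resolver. The lookup helper and the five second timeout are assumptions made for the example, and whether the default path actually goes through libc depends on how the program was built and on the system's configuration.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookup (a hypothetical helper) resolves host with a net.Resolver.
// With pureGo set, the resolver is asked to prefer Go's built-in DNS
// client. Without it, Go picks its default, which on Unix may be the C
// library's getaddrinfo() if the binary was built with cgo.
func lookup(host string, pureGo bool) ([]string, error) {
	r := &net.Resolver{PreferGo: pureGo}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return r.LookupHost(ctx, host)
}

func main() {
	for _, pureGo := range []bool{true, false} {
		addrs, err := lookup("localhost", pureGo)
		fmt.Printf("PreferGo=%v: %v (err: %v)\n", pureGo, addrs, err)
	}
}
```

On a machine with a plain /etc/hosts and DNS setup the two answers will usually match. On one where nsswitch.conf pulls in additional lookup modules, they may not, which is the problem in miniature.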

I'm not at all sure that all of this requirement and entanglement with the standard C library and its implicit runtime environment is a good thing. The standard C library's runtime environment is designed for C, and it generally contains a tangled skein of assumptions about how things work. Forcing all other languages to fit themselves into these undocumented constraints is clearly confining, and the standard C library generally isn't designed to be a transparent API; in fact, at least GNU libc deliberately manipulates what it does under the hood to be more useful to C programs. Whether these manipulations are useful or desired for your non-C language is an open question, but the GNU libc people aren't necessarily going to even document them.

(Marcan's story shows that the standard C library behavior would have been a problem for any language environment that attempted to use minimal stacks while calling into 'libc', here in the form of a kernel vDSO that's designed to be called through libc. This also shows another aspect of the problem: as far as I know, how much stack space you must provide when calling the standard C library is generally not documented. It's just assumed that you will have 'enough', whatever that is. C code will; people who are trying to roll their own coroutines and thread environment, maybe not.)

This implicit assumption has a long history in Unix. Many Unixes have only really documented their system calls in the form of the standard C library interface to them, quietly eliding the distinction between the kernel API to user space and the standard C library API to C programs. If you're lucky, you can dig up some documentation on how to make raw system calls and what things those raw system calls return in unusual cases like pipe(2). I don't think very many Unixes have ever tried to explicitly and fully document the kernel API separately from the standard C library API, especially once you get into cases like ioctl() (where there are often C macros and #defines that are used to form some of the arguments, which are of course only 'documented' in the C header files).
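As an illustration of the ioctl() case, here is a sketch of what a non-C language has to do to get one request number: re-implement the C macro by hand. It assumes the generic Linux _IOC bit layout used on x86 and ARM (some other architectures lay the bits out differently), and the constant and function names are invented for this example. The value it computes corresponds to EVIOCGVERSION, which the Linux input header defines as _IOR('E', 0x01, int).

```go
package main

import "fmt"

// Bit layout of an ioctl request number, transcribed by hand from the
// generic Linux _IOC macro. Assumption: x86/ARM style layout; this is
// not portable to every architecture Linux runs on.
const (
	iocNRBits   = 8
	iocTypeBits = 8
	iocSizeBits = 14

	iocNRShift   = 0
	iocTypeShift = iocNRShift + iocNRBits
	iocSizeShift = iocTypeShift + iocTypeBits
	iocDirShift  = iocSizeShift + iocSizeBits

	iocRead = 2 // direction bits for "kernel writes, user space reads"
)

// ioc mirrors the C _IOC(dir, type, nr, size) macro.
func ioc(dir, typ, nr, size uint32) uint32 {
	return dir<<iocDirShift | typ<<iocTypeShift | nr<<iocNRShift | size<<iocSizeShift
}

// ior mirrors _IOR(type, nr, argtype), with the argument size passed
// explicitly because there is no C sizeof() to lean on.
func ior(typ, nr, size uint32) uint32 {
	return ioc(iocRead, typ, nr, size)
}

func main() {
	// _IOR('E', 0x01, int) with a 4-byte int.
	fmt.Printf("%#x\n", ior('E', 0x01, 4)) // prints 0x80044501
}
```

None of the shifts or the size of the argument are written down anywhere except the C headers, so this transcription is only as correct as your reading of them.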



from Hacker News https://ift.tt/2lxCdYV