Wednesday, March 31, 2021

Never use environment variables for configuration

Suppose you need to create a function for adding two numbers together in plain C. How would you write it? What sort of an API would it have? One possible implementation would be this:

int add_numbers(int one, int two) {

    return one + two;

}


// to call it you'd do

int three = add_numbers(1, 2);

Seems reasonable? But what if it was implemented like this instead:

int first_argument;

int second_argument;


void add_numbers(void) {

    return first_argument + second_argument;

}


// to call it you'd do

first_argument = 1;

second_argument = 2;

int three = add_numbers();

This is, I trust you all agree, terrible. This approach is plain wrong, against all accepted coding practices and would get immediately rejected in any code review. It is left as an exercise to the reader to come up with ways in which this architecture is broken. You don't even need to look into thread safety to find correctness bugs.

And yet we have environment variables

Environment variables is exactly this: mutable global state. Envvars have some legitimate usages (such as enabling debug logging) but they should never, ever be used for configuring core functionality of programs. Sadly they are used for this purpose a lot and there are some people who think that this is a good thing. This causes no end of headaches due to weird corner, edge and even common cases.

Persistance of state

For example suppose you run a command line program that has some sort of a persistent state.

$ SOME_ENVVAR=... some_command <args>

Then some time after that you run it again:

$ some_command <args>

The environment is now different. What should the program do? Use the old configuration that had the env var set or the new one where it is not set? Error out? Try to silently merge the different options into one? Something else?

The answer is that you, the end user, can not now. Every program is free to do its own thing and most do. If you have ever spent ages wondering why the exact same commands work when run from one terminal but not the other, this is probably why.

Lack of higher order primitives

An environment variable can only contain a single null-terminated stream of bytes. This is very limiting. At the very least you'd want to have arrays, but it is not supported. Surely that is not a problem, you say, you can always do in-band signaling. For example the PATH environment variable has many directories which are separated by the : character. What could be simpler? Many things, it turns out.

First of all the separator for paths is not always :. On Windows it is ;. More generally every program is free to choose its own. A common choice is space:

CFLAGS='-Dfoo="bar" -Dbaz' <command>

Except what if you need to pass a space character as part of the argument? Depending on the actual program, shell and the phase of the moon, you might need to do this:

ARG='-Dfoo="bar bar" -Dbaz'

or this:

ARG='-Dfoo="bar\ bar" -Dbaz'

or even this:

ARG='-Dfoo="bar\\ bar" -Dbaz'

There is no way to know which one of these is the correct form. You have to try them all and see which one works. Sometimes, as an implementation detail, the string gets expanded multiple times so you get to quote quote characters. Insert your favourite picture of Xzibit here.

For comparison using JSON configuration files this entire class os problems would not exist. Every application would read the data in the same way, because JSON provides primitives to express these higher level constructs. In contrast every time an environment variable needs to carry more information than a single untyped string, the programmer gets to create a new ad hoc data marshaling scheme and if there's one thing that guarantees usability it's reinventing the square wheel.

There is a second, more insidious part to this. If a decision is made to configure something via an environment variable then the entire design goal changes. Instead of coming up with a syntax that is as good as possible for the given problem, instead the goal is to produce syntax that is easy to use when typing commands on the terminal. This reduces work in the immediate short term but increases it in the medium to long term.

Why are environment variables still used?

It's the same old trifecta of why things are bad and broken:

  1. Envvars are easy to add
  2. There are existing processes that only work via envvars
  3. "This is the way we have always done it so it must be correct!"

The first explains why even new programs add configuration options via envvars (no need to add code to the command line parser, so that's a net win right?).

The second makes it seem like envvars are a normal and reasonable thing as they are so widespread.

The third makes it all but impossible to improve things on a larger scale. Now, granted, fixing these issues would be a lot of work and the transition would unearth a lot of bugs but the end result would be more readable and reliable.



from Hacker News https://ift.tt/3sHcUoJ

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.