SymmetricalDataSecurity: Writing Consistent Tools (2019)

Monday, August 2, 2021

February 20th, 2019

"Style is a set of different repeated microdecisions, each made the same way whenever it arises, even though the context may be different. [...] I believe that consistency underlies all principles of quality." -- Frederick P. Brooks, The Design of Design

With the above quote in mind, I strive to be eventually consistent in writing simple tools: wherever reasonably possible, I support, implement, and consider the same interface, options, features, and behavioral traits. These properties, not surprisingly, align with the Unix Philosophy, and some of them I've covered before. However, perhaps as a reminder to myself, here are my primary guidelines:

Most of my commands will support the same command-line options with the same, consistent meaning:

-V -- print the version number of the tool and exit
-h -- generate a terse usage statement on stdout; if an invalid option is specified, the same usage statement is printed to stderr together with a preceding error message
-j -- generate output in JSON format (if applicable); see below for details
-p -- provide a password (if needed); see below for details
-u -- provide a username (if needed); see below for details
-v -- increase verbosity; by default, my tools are silent except for the desired output (and errors or warnings going to stderr). -v can be used multiple times to increase the verbosity of the tool, generating increasingly detailed messages about what it is doing

I almost always only use short options, because I'm lazy and don't like to type long options. If long options are needed, then two dashes are required; standard getopt(3) (or equivalent) behavior applies.
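As an illustration only, the common options above could be wired up in Python with argparse roughly as follows. The names `build_parser`, `VERSION`, and the `dest` attribute names are hypothetical, and argparse's behavior differs slightly from the description: -h prints a fuller help text rather than a terse usage line, and on an invalid option the usage line is printed to stderr before the error message, not after it.

```python
import argparse

VERSION = "0.1"  # hypothetical version string, for illustration only


def build_parser(prog="tool"):
    """Build a parser offering the common short options described above."""
    # argparse adds -h on its own: help goes to stdout, while an invalid
    # option prints the usage line plus an error message to stderr.
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("-V", action="version", version=f"{prog} {VERSION}",
                        help="print the version number and exit")
    parser.add_argument("-j", dest="json", action="store_true",
                        help="generate output in JSON format")
    parser.add_argument("-p", dest="password", metavar="method",
                        help="how to obtain the password")
    parser.add_argument("-u", dest="user", metavar="user",
                        help="user to authenticate as")
    # action="count" lets -v be repeated (-v -v or -vv) to raise verbosity.
    parser.add_argument("-v", dest="verbosity", action="count", default=0,
                        help="increase verbosity; may be repeated")
    parser.add_argument("inputs", nargs="*",
                        help="entries to process; read from stdin if omitted")
    return parser
```

Calling `build_parser().parse_args()` then yields an object whose `verbosity` attribute counts how often -v was given, with 0 meaning the silent default.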

My tools frequently need to access different resources that require some form of authentication via a secret, a token, or a password. Passing passwords on the command-line has its own set of considerations and pitfalls, so my tools will generally support the following methods as arguments to the -p flag:

`env:var`	Obtain the password from the environment variable var. Since the environment of other processes may be visible via e.g. ps(1), this option should be used with caution.
`file:pathname`	The first line of pathname is the password. pathname need not refer to a regular file: it could for example refer to a device or fifo. Note that standard Unix file access controls should be used to protect this file.
`keychain:service`	Use the (Mac OS X) `security(1)` tool to retrieve the generic password for the service from the keychain.
`pass:password`	The actual password is password. Since the password is visible to utilities such as ps(1) this form should only be used where security is not important. (My tool may try to overwrite `argv`, but you shouldn't rely on that.)

If none of the above options is provided, then my tool will prompt the user for a password on the tty, so as to allow input from stdin to continue to be processed (see below).
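A minimal sketch of how the argument to -p might be resolved, assuming Python. The function name `resolve_password` is hypothetical. The `keychain:` branch assumes the macOS `security find-generic-password -s service -w` invocation, and the prompt relies on `getpass.getpass()`, which on Unix tries the controlling terminal first and so leaves stdin free for data.

```python
import getpass
import os
import subprocess


def resolve_password(spec=None):
    """Return the password described by a -p argument; prompt if spec is None."""
    if spec is None:
        # No -p given: prompt on the tty so that stdin can keep carrying input.
        return getpass.getpass("Password: ")
    # Split on the first colon only, so the password itself may contain colons.
    method, sep, value = spec.partition(":")
    if not sep:
        raise ValueError(f"invalid password specification: {spec!r}")
    if method == "env":
        return os.environ[value]  # raises KeyError if the variable is unset
    if method == "file":
        # Only the first line counts; works for fifos and devices as well.
        with open(value) as fh:
            return fh.readline().rstrip("\n")
    if method == "keychain":
        # Assumption: macOS security(1); -w prints only the password.
        result = subprocess.run(
            ["security", "find-generic-password", "-s", value, "-w"],
            check=True, capture_output=True, text=True)
        return result.stdout.rstrip("\n")
    if method == "pass":
        return value
    raise ValueError(f"unknown password method: {method!r}")
```

This sketch makes no attempt to overwrite `argv`, so a `pass:` argument stays visible in the process table for the lifetime of the process.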

Specifying the user to authenticate as is done via the -u flag. If -u is not specified, the tool will use the $USER environment variable or, if that is not set, getlogin(2).
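The fallback order for the username can be sketched in a few lines of Python. `resolve_user` is a hypothetical helper; `os.getlogin()` is Python's wrapper around the system's getlogin call and may raise `OSError` when there is no controlling terminal.

```python
import os


def resolve_user(user=None):
    """Return the -u value if given, else $USER, else the login name."""
    if user:
        return user
    # An unset or empty $USER falls through to getlogin.
    return os.environ.get("USER") or os.getlogin()
```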

Most of my tools tend to operate on large sets of input, such as a long list of hostnames pushing the full command well beyond ARG_MAX. As such, my tools will accept input from stdin, but allow iteration over optional arguments:

$ <file-with-many-hostnames tool
$ tool hostname1 hostname2
$ grep something file | tool hostname1 hostname2 hostname3

If, for some reason, I need to be able to support reading input from a file via e.g., a command-line option (-f or -c for a configuration file), then my tools will always also support an argument of - to read from stdin.
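One way to sketch this input handling in Python is shown below. Both function names are hypothetical, and the sketch makes one simplifying assumption that the text does not state: when arguments are given, stdin is not consulted at all.

```python
import sys


def iter_entries(args, stdin=None):
    """Yield entries from the arguments if any were given, else from stdin."""
    if args:
        yield from args
        return
    stream = sys.stdin if stdin is None else stdin
    # Iterating over the stream reads lazily, one line at a time.
    for line in stream:
        entry = line.strip()
        if entry:  # skip blank lines
            yield entry


def open_input(pathname):
    """Open the file named by e.g. -f; a pathname of '-' means stdin."""
    if pathname == "-":
        return sys.stdin
    return open(pathname)
```

With this, `tool hostname1 hostname2` and `<file-with-many-hostnames tool` both reduce to a loop of the form `for entry in iter_entries(args): ...`.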

Whenever possible, I try to iterate over input as a stream and begin processing it as it comes in, one line at a time. That is, I try to avoid reading all input into memory and only then working through it. This allows my tool to work as a true filter and not block processing.

Errors encountered should not cause the program to abort; instead, the tool should generate a suitable error message and move on to the next entry.
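Streaming and per-entry error handling combine naturally in a generator. The following is a sketch under assumptions: `process_stream` and its `handle` callback are hypothetical names, and the error message format is made up for illustration.

```python
import sys


def process_stream(lines, handle):
    """Lazily apply handle() to each input line, yielding each result.

    A failing entry produces a message on stderr and is skipped; the
    loop then carries on with the next entry.
    """
    for number, line in enumerate(lines, start=1):
        entry = line.rstrip("\n")
        try:
            result = handle(entry)
        except Exception as exc:
            print(f"tool: line {number}: {entry!r}: {exc}", file=sys.stderr)
            continue
        yield result
```

A caller would typically write `for result in process_stream(sys.stdin, handle): print(result, flush=True)`, so that each result leaves the tool as soon as it is ready and nothing accumulates in memory.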

One of my favorite signals (besides SIGSNOW) is SIGINFO. Given that my tools frequently have to operate on large input sets, I equally frequently would like to know how far along they are. Passing -v to generate verbose output may work, but I also often want no output and yet still be able to see which record the tool is currently working on.

For this, I install a signal handler to catch SIGINFO and spit out some statistic or diagnostic message. This way, I can run the tool in non-verbose mode, then hit Ctrl+T, et voilà, I can haz info!

$ <input tool
[... time elapses ...]
^T
load: 1.23  cmd: tool 57943 running 0.11u 0.05s
Resolving 'hostname33' (33/120).
[... time elapses ...]
^T
load: 1.23  cmd: tool 57943 running 0.14u 0.07s
Frobbing hobknobbin on 'hostname18' (18/120).
[...]
$
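A status handler like the one in the session above could be sketched in Python as follows, on Unix only. The `Progress` class and `install_info_handler` are hypothetical names. SIGINFO exists on the BSDs and macOS but not on Linux, so this sketch falls back to SIGUSR1 there, which is a substitution of mine and means using `kill -USR1 pid` instead of Ctrl+T. Writing the message to stderr is likewise an assumption.

```python
import signal
import sys


class Progress:
    """Track which record is being worked on so a signal can report it."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self.current = None

    def message(self):
        return f"Working on {self.current!r} ({self.done}/{self.total})."

    def report(self, signum=None, frame=None):
        # Signal handler: print the status without touching stdout.
        print(self.message(), file=sys.stderr)


def install_info_handler(progress):
    """Install progress.report for SIGINFO, or SIGUSR1 where it is missing."""
    signum = getattr(signal, "SIGINFO", signal.SIGUSR1)
    signal.signal(signum, progress.report)
    return signum
```

The main loop only has to update `progress.current` and `progress.done` as it goes; the tool itself stays silent until the signal arrives.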

My tools generally produce line-based output on stdout, as you would expect. If the output consists of multiple fields, it is either space- or comma-separated. (I have not yet gone so far as to develop a need to standardize on inspection of the OFS environment variable.)

If the output generated is more complex, then many of my tools support JSON as an alternative output format via the -j command-line option. This is particularly useful, since I can then more easily post-process the results via jq(1) and continue on with my pipe.
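The two output modes can be sketched with a single formatting function. `format_record` and the field names in the usage example are hypothetical. Emitting one JSON object per line is an assumption of this sketch; jq(1) reads such a stream of objects without further options.

```python
import json


def format_record(record, as_json=False, sep=" "):
    """Format one record (a dict) as a separated line or as a JSON object."""
    if as_json:
        return json.dumps(record)
    # Plain mode: field values only, in insertion order, joined by sep.
    return sep.join(str(value) for value in record.values())
```

For example, `format_record({"host": "h1", "addr": "192.0.2.1"})` gives `h1 192.0.2.1`, while passing `as_json=True` gives `{"host": "h1", "addr": "192.0.2.1"}`.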

As noted above, output is generally generated on stdout. Input is read from stdin. That is, I try to avoid any and all file I/O wherever possible. However, sometimes it cannot be avoided. To ensure I do not fall victim to the many pitfalls of temporary files, my tools will generally:

set a restrictive umask
generate a safe temporary directory using mkdtemp(3)
remove the temporary directory and all contents via an exit handler

This blog post covers the handling of temporary files in more detail.
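The three steps listed above translate fairly directly to Python. This is a sketch, not a complete treatment: `make_private_tmpdir` and the `tool.` prefix are hypothetical, and `tempfile.mkdtemp()` is Python's wrapper that creates a directory accessible only to the creating user.

```python
import atexit
import os
import shutil
import tempfile


def make_private_tmpdir(prefix="tool."):
    """Create a private temporary directory that is removed at exit."""
    # Restrictive umask: files created later are accessible to the owner only.
    os.umask(0o077)
    path = tempfile.mkdtemp(prefix=prefix)
    # Exit handler: remove the directory and all of its contents.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```

Note that atexit handlers do not run when the process is killed by a signal it does not handle, so a tool that wants cleanup on SIGINT or SIGTERM needs signal handlers in addition.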

Obviously not all of my tools always support all of the above options or behaviors. However, I've often found that with time I end up going back and adding them (or at least wish I had added them). That is, I'm striving for consistency, for my tools to behave more or less the same, and to consistently fit into the Unix ecosystem.

"Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency." -- Richard P. Gabriel, The Rise of Worse is Better

May my tools always be worse; they're better that way.
