Sunday, January 9, 2022

Avoid Meaningless Binary Labels

Imagine this: you're getting some medical test for some condition, and it's falsely returned a negative even though you should be testing positive. What should we call that scenario? That's right, a "Type II error"! WTF?

Or imagine this: you're thinking about a pop psychology book you read a decade ago, which mentioned the two main ways that people think. Sometimes it's carefully considered and logical, what you think of when you think "thinking". But sometimes it's unconscious and emotional, more like an instinctive reaction than anything deliberate. Now - without looking it up, which one of these was "System 1" and which is "System 2"? Are you sure?

Daniel Kahneman, if you're reading this - how could you torture us like this? You literally titled the book Thinking, Fast and Slow! Why isn't it "Fast" and "Slow"?

Worst of all, imagine this: you've just started an internship at a tech company with a microservices-based architecture. When service A initiates a message to service B, one of them is called the "upstream", and the other is the "downstream". Which one is which? There's no inherent direction, so even within the industry, people don't agree!


Meaningless binary labels are the laziest way to be bad at naming things. Please say "false positive" instead of "Type I error". Or "client + server" or "caller + callee" instead of "upstream + downstream". I'll admit "fast system" and "slow system" are a little lame... but they're way harder to mix up than "System 1" and "System 2". Hopefully by the time you start to share your binary categorization with others, you'll know enough to label them descriptively.
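
To make the point concrete for programmers, here's a minimal Python sketch of the medical test example. The names `Outcome` and `classify` are made up for illustration, not taken from any library. The idea is just that descriptive labels let the code explain itself, while the numbered names survive only as comments.

```python
from enum import Enum


class Outcome(Enum):
    """Descriptive labels for the four possible results of a binary test."""
    TRUE_POSITIVE = "true positive"
    FALSE_POSITIVE = "false positive"  # what statisticians call a "Type I error"
    TRUE_NEGATIVE = "true negative"
    FALSE_NEGATIVE = "false negative"  # what statisticians call a "Type II error"


def classify(has_condition: bool, test_says_positive: bool) -> Outcome:
    """Label a test result by comparing it with the ground truth."""
    if test_says_positive:
        return Outcome.TRUE_POSITIVE if has_condition else Outcome.FALSE_POSITIVE
    return Outcome.FALSE_NEGATIVE if has_condition else Outcome.TRUE_NEGATIVE
```

Calling `classify(has_condition=True, test_says_positive=False)` gives `Outcome.FALSE_NEGATIVE`, and nobody reading that has to remember which Roman numeral it was supposed to be.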

Some related thoughts on naming:

  • Even if you have a lot of options you can still give them meaningful names. For example, 16Personalities assigns a cute name like "Virtuoso" or "Advocate" for each of the Myers-Briggs results, which makes them much more memorable. [footnote 1]
  • It's probably okay to assign arbitrary-ish names if there's really no meaningful distinction except that they're different: think the ABO blood type system. [footnote 2]
  • Equations that are named after people also tend to be meaningless; what if "Boyle's law" were called the "pressure-volume inverse relation"? Admittedly, that is a mouthful. In electrical engineering, thankfully, people do often refer to Kirchhoff's first and second laws as "Kirchhoff's current law / junction rule" and "Kirchhoff's voltage law / loop rule" instead.
  • On the other hand, it is sometimes useful to have names that are specifically meaningless; for example in programming, foo and bar, or in computer security, Alice and Bob.

Thanks Yee Aun for bouncing these ideas around with me. Not-thanks to whoever coined and popularized the usage of "upstream and downstream" in microservice architecture.


[footnote 1] Of course, if you don't believe in the Myers-Briggs Type Indicator, that means you're an INTJ (to paraphrase Scott Alexander quoting Tumblr user tropylium).


[footnote 2] And in fact, the Wikipedia page's history section describes how blood types were at various points in history described using the completely arbitrary [A, B, C], as well as two mutually-incompatible versions of [I, II, III, IV], until Landsteiner's 1927 proposal of the modern ABO system and consequent adoption by the National Research Council. This was done because if you mix up blood types, you can kill people; so unfortunately, we are unlikely to see the disappearance of "Type I and II errors" unless statisticians get a lot more hardcore.




