Wednesday, September 1, 2021

On Variance and Extensibility

If you do see an extensible system working in the wild on the Input, Process and Output side, that means it's got at least one de facto standard driving it. Either different Inputs and Outputs have agreed to convert to and from the same intermediate language... or different middle blocks have agreed to harmonize their data and instructions the same way.

This must either flatten over format-specific nuances, or be continually forked to support every new concept in use. Most likely it is a body of practice that has grown organically around the task at hand. Given enough time, you can draw a boundary around a "kernel" and a "user land" anywhere. To make this easier, a runtime can auto-convert between versions or types. But somebody still has to write those conversions.
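To make that concrete, here's a minimal sketch in TypeScript, assuming a hypothetical document format with numbered schema versions. A runtime can chain per-version upgrade functions so old data is lifted to the current shape before any code sees it:

```typescript
// Hypothetical versioned document: v1 stored one bare string,
// v2 splits it into a title and a body.
type DocV1 = { version: 1; text: string };
type DocV2 = { version: 2; title: string; body: string };
type Doc = DocV1 | DocV2;

// One upgrade function per version boundary. The runtime owns the
// machinery, but somebody still has to write each conversion by hand.
const upgrades: Record<number, (doc: Doc) => Doc> = {
  1: (doc) => {
    const { text } = doc as DocV1;
    const [title, ...rest] = text.split("\n");
    return { version: 2, title, body: rest.join("\n") };
  },
};

// Lift any stored document to the current version before use.
function toCurrent(doc: Doc): DocV2 {
  while (doc.version < 2) {
    doc = upgrades[doc.version](doc);
  }
  return doc as DocV2;
}

console.log(toCurrent({ version: 1, text: "Hello\nWorld" }));
// { version: 2, title: "Hello", body: "World" }
```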

This describes exactly what happened with web browsers. They cloned each other's new features, while web developers added workarounds for the missing pieces. Not to make it work differently, but to keep it all working exactly the same. Eventually people got fed up and just adopted a React-like.

That is, you never really apply extensibility on all three fronts at the same time. It doesn't make sense: arbitrary code can't work usefully on arbitrary data. The input and output need to have some guarantees about the process, or vice versa.

Putting data inside a free-form key/value map doesn't change things much. It's barely an improvement over having an unknownData byte[] mix-in on each native type. It only pays off if you actually adopt a decomposable model and stick with it. That way the data is never unknown, but always provides a serviceable view on its own. Arguably this is the killer feature of a dynamic language: the benefit of "extensible data" is mainly "fully introspectable without recompilation."
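The difference shows up immediately in code. A sketch, with all names hypothetical: the first shape just smuggles opaque bytes along, while the second commits to a decomposable model, so even data you've never seen is still made of shapes that generic code can traverse and print without recompilation:

```typescript
// The opaque escape hatch: "extensible" in name only.
type NodeWithBlob = {
  id: string;
  unknownData: Uint8Array; // nobody downstream can do anything with this
};

// A decomposable model: every value, known or unknown, is built from
// the same few shapes, so it always offers a serviceable view.
type Value =
  | { kind: "number"; value: number }
  | { kind: "string"; value: string }
  | { kind: "list"; items: Value[] }
  | { kind: "record"; fields: Record<string, Value> };

// Generic code that works on data it was never compiled against.
function describe(v: Value): string {
  switch (v.kind) {
    case "number": return String(v.value);
    case "string": return JSON.stringify(v.value);
    case "list":   return `[${v.items.map(describe).join(", ")}]`;
    case "record":
      return `{${Object.entries(v.fields)
        .map(([key, field]) => `${key}: ${describe(field)}`)
        .join(", ")}}`;
  }
}

console.log(describe({ kind: "record", fields: { n: { kind: "number", value: 42 } } }));
// {n: 42}
```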

You need a well-defined single type T that sets the ground rules for both data and code, which means T must be a common language. It must be able to work equally well as an A, a B and a C, needs which must have been anticipated. Yet it should be built such that you can just use a D of your own, without inconvenient dependencies. The key quality to aim for is not creativity but discipline.
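In TypeScript terms, and purely as an illustrative sketch: T is an interface whose ground rules were designed with A, B and C in mind, yet which a third party can implement as a D without depending on any of them:

```typescript
// T: the common language. Its methods are the anticipated needs.
interface Shape {
  area(): number;
  bounds(): { w: number; h: number };
}

// A and B: the implementations the interface was designed around.
class Circle implements Shape {
  constructor(private r: number) {}
  area() { return Math.PI * this.r ** 2; }
  bounds() { return { w: 2 * this.r, h: 2 * this.r }; }
}
class Rect implements Shape {
  constructor(private w: number, private h: number) {}
  area() { return this.w * this.h; }
  bounds() { return { w: this.w, h: this.h }; }
}

// D: someone else's type, written later, with no dependency on A or B.
class Glyph implements Shape {
  constructor(private metrics: { w: number; h: number }) {}
  area() { return this.metrics.w * this.metrics.h; }
  bounds() { return this.metrics; }
}

// Code written against T never needs to know D exists.
const totalArea = (shapes: Shape[]) => shapes.reduce((sum, s) => sum + s.area(), 0);
```

The discipline lives entirely in keeping T small and its contract exact; the moment one implementation needs a method the others lack, the common language is gone.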

If you can truly substitute a type with something else everywhere, it can't be arbitrarily extended or altered; it must retain the exact same interface. In the real world, that means it must actually do the same thing, only marginally better or in a different context. A tool like ffmpeg only exists because we invented a bajillion different ways to encode the same video, and the only solution is to make one thing that supports everything. It's the Unicode of video.
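A hypothetical decoder contract makes the point: substitutes compete on how they satisfy the interface, never on what the interface is:

```typescript
// The contract every codec must satisfy exactly.
interface VideoDecoder {
  decodeFrame(packet: Uint8Array): { width: number; height: number; pixels: Uint8Array };
}

// Two substitutes: same interface, different guts. (Stubbed out,
// since the actual decoding is beside the point.)
const h264: VideoDecoder = {
  decodeFrame: (packet) => ({ width: 1920, height: 1080, pixels: new Uint8Array(packet.length) }),
};
const vp9: VideoDecoder = {
  decodeFrame: (packet) => ({ width: 1920, height: 1080, pixels: new Uint8Array(packet.length) }),
};

// A player written against the contract cannot tell them apart,
// which is exactly why either can replace the other.
function play(decoder: VideoDecoder, packets: Uint8Array[]) {
  for (const p of packets) decoder.decodeFrame(p);
}
```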

If you extend something into a new type, it's not actually a substitute, it's a fork trying to displace the old standard. As soon as it's used, it creates a data set that follows a new standard. Even when you build your own parsers and/or serializers, you are inventing a spec of your own. Somebody else can do the same thing to you, and that somebody might just be you 6 months from now. Being a programmer means being an archivist-general for the data your code generates.

* * *

If you actually think about it, extensibility and substitution are opposites in the design space. If you wish to retain the option of substituting a thing with something simpler yet equivalent for your needs, you must not extend it; you must decompose it. The other direction is one-way: extension only ever adds complexity, which can only be manually refactored out again.
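Concretely, and again only as a sketch: extension stacks up requirements that every future substitute must carry, while decomposition keeps each capability separable, so a simpler equivalent remains legal:

```typescript
// Extension: one-way. Every subclass widens the contract, and anything
// written against FancyStore now needs all of it, forever.
class Store {
  get(key: string): string | undefined { return undefined; }
}
class FancyStore extends Store {
  history: string[] = [];
  subscribe(fn: (key: string) => void) { /* notify on change */ }
}

// Decomposition: capabilities as separate pieces. Consumers ask only
// for what they use.
interface Gettable { get(key: string): string | undefined; }
interface Watchable { subscribe(fn: (key: string) => void): void; }

function render(store: Gettable): string {
  return store.get("title") ?? "untitled";
}

// A trivially simple substitute still satisfies the consumer's needs.
const fake: Gettable = {
  get: (key) => ({ title: "Hello" } as Record<string, string>)[key],
};
console.log(render(fake)); // "Hello"
```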

If someone is trying to sell you on something "extensible," look closely. Is it actually à la carte? Does it come with a reasonable set of proven practices on how to use it? If not, they are selling you a fairy tale, and possibly themselves too. They haven't actually made it reusable yet: if two different people started using it to solve the same new problem, they would not end up with compatible solutions. You will have 4 standards: the original, the two new variants, and the attempt to resolve all 3.

Usually it is circumstance, hierarchy and timing that decides who adapts their data and their code to whom, instead of careful consideration and iteration. Conway's law reigns, and most software is shaped like the communication structure of the people who built it. "Patch," "minor" or "major" release is just the difference between "Pretty please?", "Right?" and "I wasn't asking."

We can do a lot better. But the mindset it requires at this point is not extensibility. The job at hand is salvage.


