Saturday, September 17, 2022

A Mathematical Theory of License Plates

Tags: math-that-isnt-useful

Preface

“You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight?”

Richard Feynman, 1962

Introduction

The year is 3000.

All buildings, bridges, roads, vehicles, plants, animals, and fossils on Earth are gone. All that is left is every license plate in America.

Intelligent life enters Earth’s atmosphere and lands its ship in Cambridge, MA, along what humans used to call Memorial Drive.

What will they find?

In Massachusetts alone, they will find over three million aluminum rectangles of uniform shape and size (30.48 cm × 15.24 cm), all bearing the proud declaration that, in fact, the state they have landed in embodies The Spirit of America, a nation long since lost.

Figure 1: Computer rendering of a standard-issue Massachusetts license plate

Upon these three million plates, they will find a rich universe of symbols indicating that whatever process created these plates, living or otherwise, was anything but random.

Part 1: The Distribution of Symbols in a Sample of Standard Issue Massachusetts License Plates

Some basic rules govern the sequence of characters on standard Massachusetts license plates (see Part 2 for a discussion of vanity plates):

  • Plates are at most 6 characters long
  • Only alphanumeric characters are allowed
  • Letters must be uppercase
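As a rough sketch of how much room these three rules alone leave, the snippet below checks a string against them and counts every string they permit. It models only the three bullets (not the excluded letters or the position patterns discussed below), so the count is a loose upper bound rather than the RMV's real space of plates; `PLATE_RE`, `follows_basic_rules`, and `count_possible_plates` are hypothetical names chosen for illustration.

```python
import re
import string

# Encodes only the three rules above: 1 to 6 characters, A-Z and 0-9, uppercase.
# Position patterns and excluded letters (discussed below) are NOT modeled.
PLATE_RE = re.compile(r"[A-Z0-9]{1,6}")
ALPHABET = string.ascii_uppercase + string.digits  # 26 letters + 10 digits = 36 symbols


def follows_basic_rules(plate: str) -> bool:
    """Return True if `plate` satisfies the three basic rules."""
    return PLATE_RE.fullmatch(plate) is not None


def count_possible_plates(max_len: int = 6) -> int:
    """Count every string of length 1..max_len over the 36-symbol alphabet."""
    return sum(len(ALPHABET) ** k for k in range(1, max_len + 1))


print(follows_basic_rules("3LSB99"))   # True
print(follows_basic_rules("3lsb99"))   # False: lowercase letters
print(follows_basic_rules("ABC-123"))  # False: hyphen, and 7 characters
print(count_possible_plates())         # 2238976116
```

Even this loose bound, about 2.24 billion strings, is hundreds of times larger than the three million or so plates in Massachusetts.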

The distribution of characters on Massachusetts plates is very roughly estimated by the following histogram, generated from a sample of 700 Massachusetts plates collected in August 2021 in Cambridge, MA, and on a two-hour trip northbound on I-93.

Figure 2: Character frequencies in a sample of MA license plates
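A histogram like Figure 2 amounts to counting every character across every plate. The minimal sketch below does that with `collections.Counter`; the function name is hypothetical, and the sample list is placeholder data (only 3LSB99 comes from this post), not the 700-plate dataset.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase + string.digits  # the 36 permitted symbols


def character_frequencies(plates):
    """Relative frequency of each of the 36 symbols across all characters of all plates.

    Symbols never observed (as with Q in the real sample) get frequency 0.0.
    """
    counts = Counter(ch for plate in plates for ch in plate)
    total = sum(counts.values())
    if total == 0:
        return {ch: 0.0 for ch in ALPHABET}
    return {ch: counts[ch] / total for ch in ALPHABET}


sample = ["3LSB99", "1ABC23", "2MXY45"]  # hypothetical placeholder data
freqs = character_frequencies(sample)
print(freqs["9"])  # 2 of 18 characters
print(freqs["Q"])  # 0.0
```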

Notably, the digits 1 and 2 appear much more frequently in this sample than the other digits. I, O, and 0 are all less frequent because they are so easily confused with one another (see https://xkcd.com/1105/). It is reasonable to assume that the two O’s and the one I in this dataset appear only because of data-entry mistakes. Q doesn’t appear at all; our extraterrestrial visitors will not know about the letter Q. M also appears with relatively high frequency, though less so than 1 and 2: many government vehicles, whose plates are prefixed with M, were parked outside the Cambridge Water Works, which was visited during data collection because of its proximity to Fresh Pond, a local point of interest. The relatively low frequencies of 8 and E are not as easily explained, and a larger sample would help separate the signal from the noise here.
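To put a rough scale on "signal versus noise", one can compare observed counts with a simple null model. The sketch below assumes, purely for illustration, that every sampled plate has 6 characters and that each character is drawn independently and uniformly from 36 symbols (real plates are not generated this way). Under that model each symbol's count is binomial, with mean n·p and standard deviation sqrt(n·p·(1 − p)); `uniform_null` is a hypothetical name.

```python
import math


def uniform_null(num_plates: int, chars_per_plate: int = 6, alphabet_size: int = 36):
    """Mean and standard deviation of one symbol's count if every character were
    drawn independently and uniformly (an illustrative null model only)."""
    n = num_plates * chars_per_plate  # total characters observed
    p = 1 / alphabet_size             # chance that a given character is one specific symbol
    mean = n * p                      # binomial mean
    sd = math.sqrt(n * p * (1 - p))   # binomial standard deviation
    return mean, sd


for plates in (700, 7000, 70000):
    mean, sd = uniform_null(plates)
    # Relative noise sd / mean shrinks like 1 / sqrt(number of plates).
    print(f"{plates:>6} plates: expected {mean:8.1f} +/- {sd:5.1f} ({sd / mean:.1%})")
```

Under these assumptions, with 700 plates a symbol's count would fluctuate by roughly ±11 around an expected 117 from chance alone, so shortfalls of about that size are hard to tell apart from noise; ten times as many plates would cut the relative noise by a factor of √10 ≈ 3.2.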

A grid chart of the plates, with character position on the X axis, character on the Y axis, and color representing how often that character was observed in that position, gives a clearer picture of the sample.

Figure 3: Character frequencies by position in a sample of MA license plates
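The grid in Figure 3 is a table of counts indexed by (position, character). A minimal sketch that builds it from a list of plates follows; the function names are hypothetical and the sample is placeholder data, not the real dataset. `digit_share` summarizes one column as the fraction of digits, which is the quantity behind the observation about indices 0, 4, and 5.

```python
from collections import Counter


def positional_counts(plates, length: int = 6):
    """Return `length` Counters; entry i counts the characters seen at index i.

    Shorter plates simply contribute nothing to the later positions.
    """
    grid = [Counter() for _ in range(length)]
    for plate in plates:
        for i, ch in enumerate(plate[:length]):
            grid[i][ch] += 1
    return grid


def digit_share(column: Counter) -> float:
    """Fraction of the observations at one position that are digits."""
    total = sum(column.values())
    if total == 0:
        return 0.0
    return sum(n for ch, n in column.items() if ch.isdigit()) / total


sample = ["3LSB99", "1ABC23", "2MXY45"]  # hypothetical placeholder data
grid = positional_counts(sample)
print([digit_share(column) for column in grid])  # [1.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```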

The most easily detected pattern here is that indices 0, 4, and 5 (counting from 0) are very often digits in the dataset. The string 3LSB99 from Figure 1 is an example of this phenomenon. Other common patterns and their frequencies are shown in Figure 4, where N denotes a digit and L denotes a letter.

Pattern    Frequency
NLLLNN     34.2%
NLLNNN     26.3%
NNNLLN     12.4%
NNLLNN      5.8%
Total      78.9%
Figure 4: Frequency of selected character patterns in a sample of MA license plates
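Mapping a plate to its N/L pattern, as in Figure 4, is a per-character translation, and counting the patterns gives the table's frequencies. The sketch below uses hypothetical function names and placeholder plates; it also shows that Feynman’s ARW357 maps to LLLNNN, which is not one of the patterns listed in Figure 4.

```python
from collections import Counter


def plate_pattern(plate: str) -> str:
    """Map each character to N (digit) or L (letter), e.g. '3LSB99' -> 'NLLLNN'.

    Assumes the plate already passed the alphanumeric check, so every
    non-digit is a letter.
    """
    return "".join("N" if ch.isdigit() else "L" for ch in plate)


def pattern_frequencies(plates):
    """Relative frequency of each pattern in the sample, most common first."""
    counts = Counter(plate_pattern(p) for p in plates)
    total = len(plates)
    return [(pattern, n / total) for pattern, n in counts.most_common()]


sample = ["3LSB99", "1ABC23", "2MXY45", "1AB234"]  # hypothetical placeholder data
for pattern, freq in pattern_frequencies(sample):
    print(f"{pattern}  {freq:.1%}")

print(plate_pattern("ARW357"))  # LLLNNN
```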

The frequencies of these common patterns help explain the following chart of the negative log likelihood (NLL) of transitioning from one character (char1, X-axis) to the next (char2, Y-axis). They also suggest that Feynman’s plate, “ARW357”, would likely not be issued by today's RMV: its pattern, LLLNNN, is not among the common patterns above.

Figure 5: Bigram frequencies in a sample of MA license plates
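One plausible way to compute a chart like Figure 5 is to count adjacent character pairs within each plate and take the negative log of the conditional probability P(char2 | char1) = count(char1, char2) / count(char1, any). The post does not state its estimator, log base, or smoothing, so the choices below (natural log, maximum-likelihood counts, no smoothing) are assumptions; `bigram_nll` is a hypothetical name and the sample is placeholder data.

```python
import math
from collections import Counter, defaultdict


def bigram_nll(plates):
    """Map (char1, char2) -> -ln P(char2 | char1), estimated from adjacent pairs.

    Maximum-likelihood estimate with no smoothing: pairs never observed are
    absent from the result (their NLL would be infinite).
    """
    pair_counts = Counter()
    for plate in plates:
        pair_counts.update(zip(plate, plate[1:]))  # adjacent (char1, char2) pairs

    # Number of transitions leaving each first character.
    outgoing = defaultdict(int)
    for (c1, _), n in pair_counts.items():
        outgoing[c1] += n

    # -ln(n / outgoing) written as ln(outgoing / n) so a certain transition gives 0.0
    return {(c1, c2): math.log(outgoing[c1] / n) for (c1, c2), n in pair_counts.items()}


sample = ["3LSB99", "1ABC23", "2MXY45", "1AB234"]  # hypothetical placeholder data
nll = bigram_nll(sample)
print(nll[("1", "A")])  # 0.0: in this sample "1" is always followed by "A"
print(nll[("B", "9")])  # ln 3, since "B" is followed by "9", "C" and "2" once each
```

Low values mark near-certain transitions; missing pairs, like any transition out of Q, correspond to the blank cells one would expect in such a chart.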

Part 2: Vanity Plates

To be continued


