Of Course "Changeme" Is Valid Base64
By Susam Pal on 24 Oct 2020
Today, I came across this blog post in which the author used the string "changeme" as test data while testing Base64 decoding functionality in their application. The author incorrectly believed that this test data was not a valid Base64-encoded string and would therefore fail to decode. To their surprise, they found that the string "changeme" does in fact decode successfully.
The post did not go on to explain why "changeme" is a valid Base64-encoded string and why it can be decoded successfully into binary data. It appears that the author was using the Base64 encoding scheme as a black box.
I think it is worth noting and illustrating that any alphanumeric string with a length that is a multiple of 4 is a valid Base64-encoded string. Here are some examples that illustrate this:
$ printf AAAA | base64 --decode | od -tx1
0000000 00 00 00
0000003
$ printf AAAAAAAA | base64 --decode | od -tx1
0000000 00 00 00 00 00 00
0000006
$ printf AQEB | base64 --decode | od -tx1
0000000 01 01 01
0000003
$ printf AQID | base64 --decode | od -tx1
0000000 01 02 03
0000003
$ printf main | base64 --decode | od -tx1
0000000 99 a8 a7
0000003
$ printf scrabble | base64 --decode | od -tx1
0000000 b1 ca da 6d b9 5e
0000006
$ printf 12345678 | base64 --decode | od -tx1
0000000 d7 6d f8 e7 ae fc
0000006
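For readers who do not have the base64 and od tools at hand, the same checks can be reproduced with a small script. This is only a sketch using Python's standard base64 module; the names `examples` and `decode_hex` are made up for illustration and do not come from the original post.

```python
import base64

# The same test strings as in the shell session above.
examples = ["AAAA", "AAAAAAAA", "AQEB", "AQID", "main", "scrabble", "12345678"]


def decode_hex(text: str) -> str:
    """Decode text strictly as Base64 and return the bytes as spaced hex."""
    # validate=True makes b64decode reject characters outside the Base64
    # alphabet instead of silently discarding them.
    return base64.b64decode(text, validate=True).hex(" ")


for text in examples:
    print(f"{text:<8} -> {decode_hex(text)}")
```

With strict validation turned on, each of these strings should still decode without an error, which matches what the od output shows.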
Further, since + and / are also used as symbols in Base64 encoding (for binary 111110 and 111111, respectively), we also have a few more interesting examples:
$ printf 1+2+3+4+5/11 | base64 --decode | od -tx1
0000000 d7 ed be df ee 3e e7 fd 75
0000011
$ printf "\xd7\xed\xbe\xdf\xee\x3e\xe7\xfd\x75" | base64
1+2+3+4+5/11
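The 6-bit values of + and / can be checked by looking up their positions in the Base64 alphabet. The sketch below assumes the standard alphabet from RFC 4648 and uses Python's standard base64 module for the round trip; `ALPHABET` and `six_bits` are illustrative names only.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def six_bits(symbol: str) -> str:
    """Return the 6-bit group that a Base64 symbol stands for."""
    return format(ALPHABET.index(symbol), "06b")


print(six_bits("+"), six_bits("/"))  # 111110 111111

# Round trip: decode the string to bytes, then encode the bytes again.
raw = base64.b64decode("1+2+3+4+5/11", validate=True)
print(raw.hex(" "))                           # d7 ed be df ee 3e e7 fd 75
print(base64.b64encode(raw).decode("ascii"))  # 1+2+3+4+5/11
```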
I think it is good to understand why any alphanumeric string with a length that is a multiple of 4 turns out to be a valid Base64-encoded string. The Base64 encoding scheme encodes each group of 6 bits in the binary input as one ASCII character. Every possible 6-bit value is assigned one of 64 carefully chosen ASCII characters: the uppercase and lowercase letters of the English alphabet, the ten Arabic digits, the plus sign (+) and the forward slash (/). For example, the bits 000000 are encoded as A, the bits 000001 are encoded as B, and so on. This means that every group of 3 bytes (24 bits) of binary data is translated to 4 ASCII characters in its Base64-encoded string. Thus the entire input data is divided into groups of 3 bytes each and then each group of 3 bytes is encoded into 4 ASCII characters. What if the last group is less than 3 bytes long? The equals sign (=) is used for padding in such cases, but I will not discuss the padding rules in this post. For more details, see RFC 4648.
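The 3-bytes-to-4-characters step can be sketched in a few lines. This is a minimal illustration that assumes the standard alphabet and ignores padding entirely; `ALPHABET` and `encode_group` are hypothetical names, and the standard library's base64 module is used only to cross-check the result.

```python
import base64
import string

# Index i in this string is the character assigned to the 6-bit value i.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles full 3-byte groups only")
    value = int.from_bytes(three_bytes, "big")  # one 24-bit integer
    # Cut the 24 bits into four 6-bit groups, most significant group first.
    return "".join(ALPHABET[(value >> shift) & 0b111111] for shift in (18, 12, 6, 0))


print(encode_group(b"\x01\x02\x03"))  # AQID
print(encode_group(b"\x99\xa8\xa7"))  # main
```

For any 3-byte input, `encode_group` should agree with `base64.b64encode`, since no padding is involved when the input is a full group.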
Now as a natural result of the encoding scheme, it turns out that any 4 alphanumeric characters form a valid Base64 encoding of some binary data. That's because for every alphanumeric character, we can find a 6-bit value that is translated to it during Base64 encoding, so 4 such characters correspond to 24 bits, that is, exactly 3 bytes. This is the reason why any alphanumeric string with a length that is a multiple of 4 is a valid Base64-encoded string and can be successfully decoded to some binary data.
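To make this concrete, here is a sketch of a decoder that does nothing more than look up each character's 6-bit value and pack every 4 characters into 3 bytes. It assumes the standard alphabet and unpadded input; `ALPHABET` and `decode_by_hand` are illustrative names, not part of any library.

```python
import base64
import string

# Index i in this string is the character assigned to the 6-bit value i.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_by_hand(text: str) -> bytes:
    """Decode unpadded Base64 text whose length is a multiple of 4."""
    if len(text) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for start in range(0, len(text), 4):
        group = 0
        for ch in text[start:start + 4]:
            # str.index raises ValueError for characters outside the alphabet.
            group = (group << 6) | ALPHABET.index(ch)
        out += group.to_bytes(3, "big")  # 4 x 6 bits = 24 bits = 3 bytes
    return bytes(out)


print(decode_by_hand("changeme").hex(" "))  # 72 16 a7 81 e9 9e
```

Every alphanumeric character has an index in `ALPHABET`, so the lookup cannot fail for such input, and the only remaining requirement is the length being a multiple of 4.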