Monday, December 25, 2023

Building a decentralized name system on top of IRC

The requirements

Over the past few years I’ve been slowly building a peer-to-peer networking library called ‘P2PD.’ The library is designed to make it easier to build peer-to-peer applications. But if I’m to build a library for software that’s supposed to be architecturally resilient – it makes sense for the library not to require centralization itself. My goal is that if I disappear the library should keep working without me having to manage things. The only way to make this possible is to design the system in such a way that it can utilize pre-existing public infrastructure. And it would be even better if that infrastructure were already serving multiple purposes to exclude the possibility of infrastructure disappearing. I know this is a little abstract but bare with me.

The problem

As an example: In P2PD there’s a need to be able to exchange meta-data signaling messages to peers. So given the requirements above – what protocols and infrastructure do you choose? Well, there are public servers that offer ‘pub-sub.’ MQTT is widely used and it’s used by enough devices that it’s unlikely to disappear. So if I have a list of stable, high-quality, MQTT servers, and build this into an address format to contact a node, then I don’t have to host these servers myself. I can disappear and it keeps working. Now, why am I saying all this? Because there was a piece missing: the DNS part.

DNS is the thing that makes the Internet user-friendly. But think about this: while there seems to be public infrastructure for all kinds of services on the Internet – nothing exists for something as simple as a permissioned key-value store. We can say that DNS is a paid, permissioned, key-value store, and that it’s good enough for the Internet today. But DNS has a few key problems. Firstly (1) there are no widespread standards to programmatically register domains and (2) DNS costs money to use (which I argue is not suitable for all uses.)

What alternatives are there to DNS? Well, there is Opennic – it’s actually great but again it’s a fork of the existing DNS system (and so has no standardized registration system.) There is the ‘Ethereum Name System’ – but it needs money to use (and we’ve learned the hard way that blockchains aren’t the right solution for everything.) There was no protocol for programmatic and free registration of names…

IRC as a DNS system

IRC is an ancient protocol used by multiple large networks. The protocol is popular for chat but flexible enough that all kinds of services have been built on top of it. A protocol based on IRC won’t easily die due to its long history as a chat protocol (there are IRC servers so old they’ve been online for decades.) Such infrastructure can be considered public, ubiquitous, open, and has withstood the test of time. Crucially: it has all the features needed to build a permissioned key-value store on.

The idea sounded promising to me in theory but before I went overboard I needed to verify if it would work. What I mean by that is would it be possible for a program to register a channel without the need for a human to verify their account? I knew that email registration was required on many popular IRC networks but I wasn’t sure if any were truly ‘open.’ So I manually tested about 20 different servers and eventually found one that allowed for account registration without email verification. This validated my theory that IRC could be used to build a name system. But how many of these servers were there?

To answer this question I needed to code a specialized scanner. The tool should test an IRC server by attempting to register an account and then register a channel. I rapidly wrote this code and used the server I already found to test it. Next I needed a list of IRC servers to scan. A big list. I ended up compiling every IRC server list from all major IRC clients by looking at their source code. The scanner could rapidly test servers as it was written to be asynchronous. I was able to find around 11 servers representing a prevalence of 1 server every 20 teste. Of these 11 servers – 5 were dual-stack and supported SSL.

I decided to reduce my list to only the dual-stack servers. The requirement for SSL was due to the fact IRC is a plaintext protocol and I needed to send a password over the connection. At least with SSL the passwords would be encrypted. Then I filtered the servers that supported both IPv4 and IPv6.

How it should work

For my design I wanted names to be stored across IRC servers as channels. It should be possible for the software to keep working if some of the servers go down for short periods. I wanted registration to succeed if registration worked on a minimum number of servers. And for name lookups to be able to get by on only needing to check a sub-set of servers (until getting enough results implied registration must have succeeded.) So while registration may require say: 5 / 7 channel registrations to succeed — lookup should only need 3 successes (2 is the failure number + 1 implies that registration must have succeeded for that name.)

Registering a name:
– Given N IRC servers
– At least M names must be registered (minimum)
– Where M = N – (N * 0.4)
– Measures are taken to make this atomic (don’t register any if M names can’t be registered)

Looking up a name:
– A name may be registered on any of the IRC servers.
– Given that there can be up to failures = (N * 0.4)
– Finding failures + 1 must imply a name was successfully registered.
– Names are grouped by an owner’s ECDSA public key.
– Lookups succeed when result no per pub key >= failures + 1.
– A lookup isn’t guaranteed and may fail if all servers are tested.

Password management with a single seed

The first decision to make was how to manage passwords. If I were to generate passwords for every account, the software would need to have a horrible amount of state… Probably with a database, with a schema, with just the right queries, with just the right everything, and all the code to manage that. To mitigate that I opted to use a cryptographic seed to deterministically generate passwords.


base64_encode(
    sha3_256(
        IRC_PREFIX + 
        ":pass:" + 
        irc_server_domain + 
        seed
    )
)

I chose SHA3 to avoid length-extension attacks; Base64 encoding was used as a way to avoid breaking plaintext IRC protocol messages. The prefix portion is just a unique number that helped me out with testing. Otherwise, I couldn’t register accounts without generating a new seed. Using the IRC servers domain results in a unique password per server. The inclusion of the “:pass:” portion is used to differentiate the construct from other similar hash-based derivations (later username, nick, and email.) These fields are the same but use a more restrictive encoding to be compatible with the IRC protocol. The encoding for them is base36 (0-9A-Z).

Channel names for ‘domains’

At first my design to convert names to channel names looked like this:


"#" +
base36_encode(
    hash160(
        dns_name
    )
)

The reason for using hash160 is there’s quite a small limit for the length of a channel name. I have found it can be reliably assumed to be at least 32 bytes. Hash160 returns 20 bytes that need to be encoded in base32 to work as a channel name. That increases the size by reducing the symbols that can be used in a byte. This encoding scheme should allow for the digest to be stored in the channel name. But there is one more problem. Suppose you want to register a name and an attacker is watching the channel list. The moment they see your channel name they can quickly register the name on the other IRC servers potentially hijacking the name.

The fix I came up with is inspired by so-called ‘decentralized exchanges’. In order to prevent race conditions in name registration: names will be uniquely determined by a time-lock encrypted nonce. The registering party can “save-up” the time-locks in advance so that every channel name can be registered at once. The channel names will further be derived per server so that attackers need to decrypt the initial time-lock to determine the masked name for registration on other servers: by which time the channel names will have already been registered by the names rightful owner. The construct is as follows.


def get_chan_name(name, tld, irc_server, optional_pw=""):
    msg = "{name} {tld} {optional_pw} {irc_server_domain}"
    time_lock = argon2(
        msg,
        salt="some custom salt",
        time_cost=2, # Iterations
        memory_cost=1024 * 2, # KB needed.
        parallelism=1 # Threads needed.
    )   
    
    "#" + 
    base36_encode(
        hash160(
            sha3_256(
               msg + time_lock
            )
        )
     )

Argon2 is a nice hash algorithm as you can adjust memory usage, time cost, and threading. The final output of Argon2 becomes the time-lock. Here sha3_256 is used again but this time inside hash160. The reason for this is stacking hash functions can help reduce the chances of finding collisions (you would need to find a collision in both functions.) Note the inclusion of the specific IRC server hosting the ‘name’ and the introduction of a TLD. The user can be encouraged to use anything for the TLD so this becomes a quasi-password.

If an attacker wanted to generate rainbow tables they would need the TLD used and Argon2 would massively increase the cost of generating such a table. The introduction of an optional password field allows for names to be masked with a password. Attackers would then need to know the password in addition to the TLD and name.

Channel topics to store values

IRC channel names translate the names in this system while topics store their values. Since this is a ‘threshold’ design where a sub-set of servers may fail the topic format needs to account for that. I initially chose a scheme that contained a version, ECDSA public key, record signature, and record portion. The good thing about the topic field is the characters it can support are quite extensive. In my tests every server supported unicode topics. However, I decided to use a more restrictive encoding scheme (using only the full range of characters on a standard keyboard) to ensure that binary data would be safely encoded across servers.


ecdsa_priv = sha3_256("{name}{tld}{pw}{seed}")
ecdsa_keys = ECDSA.generate(ecdsa_priv)
topic =  "p2pd.net/irc" + " " +
base92_encode(ecdsa_keys.pub) + " " +
base92_encode(ecdsa_keys.sign(record)) + " " +
base92_encode(record)

Names are owned by public keys. Their ECDSA private key is built from a hash. Since it’s possible a resulting key will be invalid. The private key is hashed until the result is a valid key. Shortly after this I remembered an old trick that allowed the ECDSA public key to be recovered from a signature if you have the message. The recovery returns multiple keys that need to be saved. The original public key can be determined by creating a list of public keys for each topic component and choosing the one that passes the validity threshold (of course: keys still need to be able to verify signatures.) Thus, I was able to shorten the topic portion to three fields.


def encrypt(msg, otp):
    # Make otp long enough.
    while len(otp) < len(msg):
        otp += sha3_256(otp).digest()

    # Xor MSG with OTP.
    buf = ""
    for i in range(0, len(msg)):
        buf += msg[i] ^ otp[i]

    return buf
    
record = encrypt(record, time_lock)

Above is a simple algorithm for a one-time pad that uses hashing to do key-stretching. It is used as a way to encrypt the record portion of the topic and signature portions without increasing the message size. The fields are encrypted with encrypt(field, sha3_256(“{field_name}:” + time_lock)) so that each field uses a different pad. The reason the signature portion is encrypted is that names across servers could be correlated by observing which names share the same public key (like mentioned above.) The security guarantee for this is at least tied to the time-lock provided by Argon2 across the name meta-data.

Handling expiry

When nicknames and channels are registered on an IRC server they usually expire if they’re not used. The duration that nicknames and channels stay active is generally between 12 – 60 days. This means that being able to monitor how close accounts and channels are to expiry is important. I’ve created a basic function that automatically handles expiry and refreshes everything. The refresh code also attempts to register names on servers that had previously failed (where servers may have temporarily been down.)

In IRC there’s a few interesting mechanisms that might help prevent channels from being lost. (1) IRC supports ‘successors’ that gain control over a channel if the owner’s account expires. (2) Another option to help prevent channel expiry would be to utilize bots. This would be very easy to make low-trust. Naturally, names and accounts are automatically ‘refreshed’ when they’re used.

Mitigating disruption

I didn’t want to create something that may negatively impact IRC service. So my software takes a number of measures:

  1. The channels registered are set to private to avoid flooding the channel listing.
  2. Channel names are uniquely determined by an algorithm that requires CPU work to be done from clients to map names to values. This sets a limit on channel registrations and is required by the protocol for security.
  3. Channels will expire naturally due to chanserv optimizations and the software doesn’t aggressively attempt to refresh channels (refreshing nicknames and channels has to be done manually by the user.)
  4. The design means that lookups for names can be load-balanced across servers.

My software currently has few users so speculation about possible disruption remains theoretical. But if it should become a problem in the future I’d be happy to work with IRC operators to minimize abuse.

Usage for the software

That’s really all I have to say for now. I worked hard to build a prototype over the last month and have a basic library that can be used now. Details of the software can be seen at https://p2pd.readthedocs.io/en/latest/built/irc_kvs.html I’m also looking for a new role so if anyone is interested in working with someone who is reasonably skilled and can think outside the box hit me up.

My resume is here: https://github.com/robertsdotpm/resume/blob/main/resume.pdf

Merry Haxmas!



from Hacker News https://ift.tt/QiOhJnV

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.