Saturday, June 26, 2021

Lamport Clock

Problem

When values are stored across multiple servers, there needs to be a way to know which values were stored before the other. The system timestamp can not be used, because wall clocks are not monotonic and clock values from two different servers should not be compared.

The system timestamp, which represents the time of the day, is measured by a clock machinery generally built with an crystal oscillator. The known problem with this mechanism is that it can drift away from the actual time of the day, based on how fast or slow the crystals oscillate. To fix this, computers typically have a service like NTP which synchronizes computer clocks with well known time sources on the internet. Because of this, two consecutive readings of the system time on a given server can have time going backwards.

As there is no upper bound on clock drift across servers, it is impossible to compare timestamps on two different servers

Solution

Lamport Clock maintains a single number to represent timestamp as following:

class LamportClock…

  class LamportClock {
      int latestTime;
  
      public LamportClock(int timestamp) {
          latestTime = timestamp;
      }

Every cluster node maintains an instance of a Lamport Clock.

class Server…

  MVCCStore mvccStore;
  LamportClock clock;

  public Server(MVCCStore mvccStore) {
      this.clock = new LamportClock(1);
      this.mvccStore = mvccStore;
  }

Whenever a server carries out any write operation, it should advance the Lamport Clock, using the `tick()` method

class LamportClock…

  public int tick(int requestTime) {
      latestTime = Integer.max(latestTime, requestTime);
      latestTime++;
      return latestTime;
  }

This way, the server can be sure that the write is sequenced after the request and after any other action the server has carried out since the request was initiated by the client. The server returns the timestamp that was used for writing the value to the client. The requesting client then uses this timestamp to issue any further writes to other set of servers. This way, the causal chain of requests is maintained.

Causality, Time and Happens-Before

When an event A in a system happens before another event B, it might have a causal relationship. Causal relationship means that A might have some role in causing B. This 'A happens before B' relationship is established by attaching a timestamp to each event. If A happens before B, the timestamp attached to A will be lower than the timestamp attached to B. But because we can not rely on system time, we need some way to make sure that the happens-before relationship is maintained for the timestamp attached to the events. Leslie Lamport suggested a solution to use logical timestamps to track happens-before relationships, in his seminal paper Time, Clocks and Ordering Of Events. So this technique of using logical timestamps to track causality is named as the Lamport Timestamp.

It is useful to note that in a database, events are about storing data. So Lamport Timestamps are attached to the values stored. This also fits very well with versioned storage mechanism discussed in Versioned Value

An example key value store

Consider an example of a simple key value store with multiple server nodes. There are two servers, Blue and Green. Each server is responsible for storing a specific set of keys. This is a typical scenario when data is partitioned across a set of servers. Values are stored as Versioned Value with the Lamport Timestamp as a version number.

Figure 1: Two servers each responsible for specific keys

The receiving server compares and updates its own timestamp and uses it to write a versioned key value.

class Server…

  public int write(String key, String value, int requestTimestamp) {
      //update own clock to reflect causality
      int writeAtTimestamp = clock.tick(requestTimestamp);
      mvccStore.put(new VersionedKey(key, writeAtTimestamp), value);
      return writeAtTimestamp;
  }

The timestamp used for writing the value is returned to the client. The client keeps track of the maximum timestamp, by updating its own timestamp. It uses this timestamp to issue any further writes.

class Client…

  LamportClock clock = new LamportClock(1);
  public void write() {
      int server1WrittenAt = server1.write("name", "Alice", clock.getLatestTime());
      clock.updateTo(server1WrittenAt);

      int server2WrittenAt = server2.write("title", "Microservices", clock.getLatestTime());
      clock.updateTo(server2WrittenAt);

      assertTrue(server2WrittenAt > server1WrittenAt);
  }

The sequence of requests look like following:

Figure 2: Two servers each responsible for specific keys

The same technique works even when the client is communicating with a leader with Leader and Followers groups, with each group responsible for specific keys. The client sends requests to the leader of the group as detailed above. The Lamport Clock instance is maintained by the leader of the group, and is updated exactly the same way as discussed in the previous section.

Figure 3: Different leader follower groups storing different key values

Partial Order

The values stored by Lamport Clock are only partially ordered. If two clients store values in two separate servers, the timestamp values cannot be used to order the values across servers. In the following example, the title stored by Bob on server Green is at timestamp 2. But it can not be determined if Bob stored the title before or after Alice stored the name on server Blue.

Figure 4: Partial Order

A single server/leader updating values

For a single leader-follower group of servers, where a leader is always responsible for storing values, the basic implementation discussed in Versioned Value is enough to maintain causality.

Figure 5: Single leader-follower group saving key values

In this case, the key value store keeps an integer version counter. It increments the version counter every time the key value write command is applied from the Write Ahead Log. It then constructs the new key with the incremented version counter. Only the leader is responsible for incrementing the version counter, and followers use the same version number.

class ReplicatedKVStore…

  int version = 0;
  MVCCStore mvccStore = new MVCCStore();

  @Override
  public CompletableFuture<Response> put(String key, String value) {
      return server.propose(new SetValueCommand(key, value));
  }

  private Response applySetValueCommand(SetValueCommand setValueCommand) {
      getLogger().info("Setting key value " + setValueCommand);
      version = version + 1;
      mvccStore.put(new VersionedKey(setValueCommand.getKey(), version), setValueCommand.getValue());
      Response response = Response.success(version);
      return response;
  }


from Hacker News https://ift.tt/3gPCuEH

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.