Thursday, May 28, 2020

The Writings of Leslie Lamport

LaTeX: A Document Preparation System
Addison-Wesley, Reading, Mass. (1986).
No electronic version available.
In the early 80s, I was planning to write the Great American Concurrency Book. I was a TeX user, so I would need a set of macros. I thought that, with a little extra effort, I could make my macros usable by others. Don Knuth had begun issuing early releases of the current version of TeX, and I figured I could write what would become its standard macro package. That was the beginning of LaTeX. I was planning to write a user manual, but it never occurred to me that anyone would actually pay money for it. In 1983, Peter Gordon, an Addison-Wesley editor, and his colleagues visited me at SRI. Here is his account of what happened.

Our primary mission was to gather information for Addison-Wesley "to publish a computer-based document processing system specifically designed for scientists and engineers, in both academic and professional environments." This system was to be part of a series of related products (software, manuals, books) and services (database, production). (La)TeX was a candidate to be at the core of that system. (I am quoting from the original business plan.) Fortunately, I did not listen to your doubt that anyone would buy the LaTeX manual, because more than a few hundred thousand people actually did. The exact number, of course, cannot accurately be determined, inasmuch as many people (not all friends and relatives) bought the book more than once, so heavily was it used.

Meanwhile, I still haven't written the Great American Concurrency Book.
    On Interprocess Communication--Part I: Basic Formalism, Part II: Algorithms
    Distributed Computing 1, 2 (1986), 77-101. Also appeared as SRC Research Report 8.
    Postscript - Compressed Postscript - PDF
    Copyright 1986 by Springer-Verlag.
    Most computer scientists regard synchronization problems, such as the mutual exclusion problem, as problems of mathematics. How can you use one class of mathematical objects, like atomic reads and writes, to implement some other mathematical object, like a mutual exclusion algorithm? I have always regarded synchronization problems as problems of physics. How do you use physical objects, like registers, to achieve physical goals, like not having two processes active at the same time?

    With the discovery of the bakery algorithm (see [12]), I began considering the question of how two processes communicate. I came to the conclusion that asynchronous communication requires some object whose state can be changed by one process and observed by the other. We call such an object a register. This paper introduced three classes of registers. The weakest class with which arbitrary synchronization is possible is called safe. The next strongest is called regular, and the strongest, generally assumed by algorithm writers, is called atomic.
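    The difference between the three classes can be made concrete with a toy model. The sketch below (Python; the function and its arguments are my own illustration, since the paper defines the classes axiomatically rather than operationally) enumerates the values a single read of a single-writer register may return, depending on whether it overlaps a write:

    ```python
    def allowed_reads(kind, old, overlapping_writes, domain):
        """Values a read of a single-writer register may return.

        kind               -- "safe", "regular", or "atomic"
        old                -- value of the last write completed before the read
        overlapping_writes -- values of writes concurrent with the read
        domain             -- all values the register can hold
        """
        if not overlapping_writes:
            # All three classes agree: a read that overlaps no write
            # returns the most recently written value.
            return {old}
        if kind == "safe":
            # A safe read that overlaps a write may return anything at all.
            return set(domain)
        if kind in ("regular", "atomic"):
            # A regular read returns the old value or one of the values
            # being written concurrently. Atomicity further constrains
            # *sequences* of operations (each appears to occur at a single
            # instant, so successive reads cannot go backward); for one read
            # viewed in isolation, the permitted values match the regular case.
            return {old} | set(overlapping_writes)
        raise ValueError(kind)

    # A read of a safe register during a write may return any value:
    assert allowed_reads("safe", 0, [1], {0, 1, 2}) == {0, 1, 2}
    # A regular register narrows this to the old or concurrently written value:
    assert allowed_reads("regular", 0, [1], {0, 1, 2}) == {0, 1}
    ```

    A safe read overlapping a write may return any value whatsoever, which is why safe registers are so weak; the surprising result is that they nonetheless suffice for arbitrary synchronization.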

    I had obtained all the results presented here in the late 70s and had described them to a number of computer scientists. Nobody found them interesting, so I never wrote them up. Around 1984, I saw a paper by Jay Misra, motivated by VLSI, that was heading in the general direction of my results. It made me realize that, because VLSI had started people thinking about synchronization from a more physical perspective, they might now be interested in my results about registers. So, I wrote this paper. As with [61], the first part describes my formalism for describing systems with nonatomic operations. This time, people were interested--perhaps because it raised the enticing unsolved problem of implementing multi-reader and multi-writer atomic registers. It led to a brief flurry of atomic register papers.

    Fred Schneider was the editor who processed this paper. He kept having trouble understanding the proof of my atomic register construction. After a couple of rounds of filling in the details of the steps that Schneider couldn't follow, I discovered that the algorithm was incorrect. Fortunately, I was able to fix the algorithm and write a proof that he, I, and, as far as I know, all subsequent readers did believe.

    Some fifteen years later, Jerry James, a member of the EECS department at the University of Kansas, discovered a small error in Proposition 1 when formalizing the proofs with the PVS mechanical verification system. He proved a corrected version of the proposition and showed how that version could be used in place of the original one. A PDF file containing a note by James describing the error and its correction can be obtained by clicking here.
    The Byzantine Generals (with Danny Dolev, Marshall Pease, and Robert Shostak)
    In Concurrency Control and Reliability in Distributed Systems, Bharat K. Bhargava, editor, Van Nostrand Reinhold (1987) 348-369.
    PDF
    All copyrights reserved by Van Nostrand Reinhold 1987.
    I have only a vague memory of this paper. I believe Bhargava asked me to write a chapter about the results in [41] and [46]. I was probably too lazy and asked Dolev to write a chapter that combined his more recent results on connectivity requirements with our original results. I would guess that he did all the work, though I must have at least read and approved of what he wrote.
    A Formal Basis for the Specification of Concurrent Systems
    In Distributed Operating Systems: Theory and Practice, Paker, Banatre and Bozyigit, editors, Springer-Verlag (1987), 1-46.
    Postscript - Compressed Postscript - PDF
    Copyright 1987 by Springer-Verlag.
    This paper describes the transition axiom method I introduced in [50]. It was written for a NATO Advanced Study Institute that took place in Turkey in August, 1986, and contains little that was new.
    A Fast Mutual Exclusion Algorithm
    ACM Transactions on Computer Systems 5, 1 (February 1987), 1-11. Also appeared as SRC Research Report 7.
    Postscript - Compressed Postscript - PDF
    Copyright © 1987 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library -- http://www.acm.org/dl/.
    Soon after I arrived at SRC, I was approached by some people at WRL (Digital's Western Research Laboratory) who were building a multiprocessor computer. They wanted to avoid having to add synchronization instructions, so they wanted to know how efficiently mutual exclusion could be implemented with just read and write instructions. They figured that, with properly designed programs, contention for a critical section should be rare, so they were interested in efficiency in the absence of contention. I deduced the lower bound on the number of operations required and the optimal algorithm described in this paper. They decided that it was too slow, so they implemented a test-and-set instruction.

    I find it remarkable that, 20 years after Dijkstra first posed the mutual exclusion problem, no one had thought of trying to find solutions that were fast in the absence of contention. This illustrates why I like working in industry: the most interesting theoretical problems come from implementing real systems.
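    The algorithm itself is short. The sketch below is a from-memory rendering in Python of its fast path and contention fallback; the variable names x, y, and b follow my recollection of the paper, not its exact pseudocode. On the contention-free path a process performs only a constant number of reads and writes; the inner loop runs only when two processes collide:

    ```python
    import threading
    import time

    N = 3                  # number of processes
    x = 0                  # communication variables; process ids are 1..N,
    y = 0                  # and 0 means "no process"
    b = [False] * (N + 1)  # b[i] is True while process i is contending

    counter = 0            # shared state the mutex protects

    def lock(i):
        global x, y
        while True:
            b[i] = True
            x = i
            if y != 0:                     # someone else holds or wants the lock
                b[i] = False
                while y != 0:
                    time.sleep(0)          # spin, yielding the processor
                continue
            y = i
            if x != i:                     # collision on the fast path
                b[i] = False
                for j in range(1, N + 1):  # slow path: wait out all contenders
                    while b[j]:
                        time.sleep(0)
                if y != i:
                    while y != 0:
                        time.sleep(0)
                    continue
            return                         # acquired: O(1) steps if uncontended

    def unlock(i):
        global y
        y = 0
        b[i] = False

    def worker(i):
        global counter
        for _ in range(50):
            lock(i)
            tmp = counter                  # deliberately non-atomic increment:
            time.sleep(0)                  # a lost update here would reveal
            counter = tmp + 1              # a mutual-exclusion failure
            unlock(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(1, N + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)                         # 150 when mutual exclusion holds
    ```

    This sketch relies on CPython's effectively sequentially consistent execution of these reads and writes; on real hardware with weaker memory ordering, accesses to x, y, and b would need appropriate fences.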

    Derivation of a Simple Synchronization Algorithm
    Rejected by Information Processing Letters (February 1987).
    Postscript - Compressed Postscript - PDF
    Chuck Thacker posed a little synchronization problem to me, which I solved with Jim Saxe's help. At that time, deriving concurrent algorithms was the fashion--the idea that you discover the algorithm by some form of black magic and then verify it was considered passé. So, I decided to see if I could have derived the algorithm from approved general principles. I discovered that I could--at least, informally--and that this informal derivation seemed to capture the thought process that led me to the solution in the first place.
    Distribution
    Email message sent to a DEC SRC bulletin board at 12:23:29 PDT on 28 May 87.
    Text File
    This message is the source of the following observation, which has been quoted (and misquoted) rather widely:

    A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.


    Document Production: Visual or Logical?
    Notices of the American Mathematical Society (June 1987), 621-624.
    Postscript - Compressed Postscript - PDF
    Copyright 1987 by the American Mathematical Society.
    Richard Palais ran a column on mathematical typesetting in the AMS Notices, and he invited me to be guest columnist. This is what I wrote--a short exposition of my ideas about producing mathematical documents.
    Synchronizing Time Servers
    SRC Research Report 18 (June 1987).
    Postscript - Compressed Postscript - PDF
    When I joined DEC in 1985, they were the world leader in networking. Using their VMS operating system, I could type a simple copy command to a computer in California, specifying a file and machine name, to copy a file from a computer in Massachusetts. Even today, I can't copy a file from Massachusetts to California nearly as easily with Unix or Windows.

    The people responsible for DEC's network systems were the Network and Communications group (NAC). Around 1987, NAC asked for my help in designing a network time service. I decided that there were two somewhat conflicting requirements for a time service: delivering the correct time, and keeping the clocks on different computers closely synchronized. This paper describes the algorithms I devised for doing both.

    I withdrew the paper because Tim Mann observed that the properties I proved about the algorithms were weaker than the ones needed to make them interesting. The major problem is that the algorithms were designed to guarantee both a bound epsilon on the synchronization of each clock with a source of correct time and an independent bound delta on the synchronization between any two clocks that could be made much smaller than epsilon. Mann observed that the bound I proved on delta was not the strong one independent of epsilon that I had intended to prove. We believe that the algorithms do satisfy the necessary stronger properties, and Mann and I began rewriting the paper with the stronger results. But that paper is still only partly written and is unlikely ever to see the light of day.

    Control Predicates Are Better than Dummy Variables for Representing Program Control
    ACM Transactions on Programming Languages and Systems 10, 2 (April 1988), 267-281. Also appeared as SRC Research Report 11.
    Postscript - Compressed Postscript - PDF
    Copyright © 1988 by the Association for Computing Machinery, Inc.
    This paper describes an example I came across in which the explicit control predicates introduced in [47] lead to a simpler proof than do dummy variables. This example served as an excuse. The real reason for publishing it was to lobby for the use of control predicates. There used to be an incredible reluctance by theoretical computer scientists to mention the control state of a program. When I first described the work in [40] to Albert Meyer, he immediately got hung up on the control predicates. We spent an hour arguing about them--I saying that they were necessary (as was first proved by Susan in her thesis), and he saying that I must be doing something wrong. I had the feeling that I was arguing logical necessity against religious belief, and there's no way logic can overcome religion.
    "EWD 1013"
    Unpublished. Probably written around April, 1988.
    Text File
    Dijkstra's EWD 1013, Position Paper on "Fairness", argues that fairness is a meaningless requirement because it can't be verified by observing a system for a finite length of time. The weakness in this argument is revealed by observing that it applies just as well to termination. To make the point, I wrote this note, which claims to be an early draft of EWD 1013 titled Position Paper on "Termination". It is, of course, essentially the same as EWD 1013 with fairness replaced by termination. Because of other things that happened at that time, I was afraid that Dijkstra might not take it in the spirit of good fun in which it was intended, and that he might find it offensive. So, I never showed it to anyone but a couple of friends. I think the passage of time has had enough of a mellowing effect that no one will be offended any more by it. It is now of more interest for the form than for the content.
    Another Position Paper on Fairness (with Fred Schneider)
    Software Engineering Notes 13, 3 (July, 1988) 1-2.
    Postscript - Compressed Postscript - PDF
    Copyright © 1988 by the Association for Computing Machinery, Inc.
    This is a more traditional response to Dijkstra's EWD 1013 (see [79]). We point out that Dijkstra's same argument can be applied to show that termination is a meaningless requirement because it can't be refuted by looking at a program for a finite length of time. The real argument in favor of fairness, which we didn't mention, is that it is a useful concept when reasoning about concurrent systems.
    A Lattice-Structured Proof of a Minimum Spanning Tree Algorithm (with Jennifer Welch and Nancy Lynch)
    Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing (August, 1988).
    PDF
    Copyright © 1988 by the Association for Computing Machinery, Inc.
    In 1983, Gallager, Humblet, and Spira published a distributed algorithm for computing a minimum spanning tree. For several years, I regarded it as a benchmark problem for verifying concurrent algorithms. A couple of times, I attempted to write an invariance proof, but the invariant became so complicated that I gave up. On a visit to M.I.T., I described the problem to Nancy Lynch, and she became interested in it too. I don't remember exactly how it happened, but we came up with the idea of decomposing the proof not as a simple hierarchy of refinements, but as a lattice of refinements. Being an efficient academic, Lynch got Jennifer Welch to do the work of actually writing the proof as part of her Ph.D. thesis. This paper is the conference version, written mostly by her.

    There were three proofs of the minimum spanning-tree algorithm presented at PODC that year: ours, one by Willem-Paul de Roever and his student Frank Stomp, and the third by Eli Gafni and his student Ching-Tsun Chou. Each paper used a different proof method. I thought that the best of the three was the one by Gafni and Chou--not because their proof method was better, but because they understood the algorithm better and used their understanding to simplify the proof. If they had tried to formalize their proof, it would have turned into a standard invariance proof. Indeed, Chou eventually wrote such a formal invariance proof in his doctoral thesis.

    The Gallager, Humblet, and Spira algorithm is complicated and its correctness is quite subtle. (Lynch tells me that, when she lectured on its proof, Gallager had to ask her why it works in a certain case.) There doesn't seem to be any substitute for a standard invariance proof for this kind of algorithm. Decomposing the proof the way we did seemed like a good idea at the time, but in fact, it just added extra work. (See [124] for a further discussion of this.)
    A Simple Approach to Specifying Concurrent Systems
    Communications of the ACM 32, 1 (January 1989), 32-45. Also appeared as SRC Research Report 15.
    Postscript - Compressed Postscript - PDF
    Copyright © 1989 by the Association for Computing Machinery, Inc.
    This is a "popular" account of the transition-axiom method that I introduced in [50]. To make the ideas more accessible, I wrote it in a question-answer style that I copied from the dialogues of Galileo. The writing in this paper may be the best I've ever done.
    Pretending Atomicity (with Fred Schneider)
    SRC Research Report 44 (May 1989).
    Postscript - Compressed Postscript - PDF
    Reasoning about concurrent systems is simpler if they have fewer separate atomic actions. To simplify reasoning about systems, we'd like to be able to combine multiple small atomic actions into a single large one. This process is called reduction. This paper contains a reduction theorem for multiprocess programs. It was accepted for publication, subject to minor revisions, in ACM Transactions on Programming Languages and Systems. However, after writing it, I invented TLA, which enabled me to devise a stronger and more elegant reduction theorem. Schneider and I began to revise the paper in terms of TLA. We were planning to present a weaker, simpler version of the TLA reduction theorem that essentially covered the situations considered in this report. However, we never finished that paper. A more general TLA reduction theorem was finally published in [123].
    Realizable and Unrealizable Specifications of Reactive Systems (with Martín Abadi and Pierre Wolper)
    Automata, Languages and Programming, Springer-Verlag (July 1989) 1-17.
    Postscript - Compressed Postscript - PDF
    Copyright 1989 by Springer-Verlag.
    Abadi and I came upon the concept of realizability in [97]. Several other people independently came up with the idea at around the same time, including Wolper. Abadi and Wolper worked together to combine our results and his into a single paper. Abadi recollects that the section of the paper dealing with the general case was mostly ours, and Wolper mostly developed the finite case, including the algorithms. He remembers adopting the term "realizability" from realizability in intuitionistic logic, and thinking of the relation with infinite games after seeing an article about such games in descriptive set theory in the Journal of Symbolic Logic. As I recall, I wasn't very involved in the writing of this paper.
    A Temporal Logic of Actions
    SRC Research Report 47 (April 1990).
    Postscript - Compressed Postscript - PDF
    This was my first attempt at TLA, and I didn't get it quite right. It is superseded by [102].
    win and sin: Predicate Transformers for Concurrency
    ACM Transactions on Programming Languages and Systems 12, 3 (July 1990) 396-428. Also appeared as SRC Research Report 17 (May 1987).
    Postscript - Compressed Postscript - PDF
    Copyright © 1990 by the Association for Computing Machinery, Inc.
    I had long been fascinated with algorithms that, like the bakery algorithm of [12], do not assume atomicity of their individual operations. I devised the formalism first published in [33]for writing behavioral proofs of such algorithms. I had also long been a believer in invariance proofs, which required that the algorithm be represented in terms of atomic actions. An assertional proof of the bakery algorithm required modeling its nonatomic operations in terms of multiple atomic actions--as I did in [12]. However, it's easy to introduce tacit assumptions with such modeling. Indeed, sometime during the early 80s I realized that the bakery algorithm required an assumption about how a certain operation is implemented that I had never explicitly stated, and that was not apparent in any of the proofs I had written. This paper introduces a method of writing formal assertional proofs of algorithms directly in terms of their nonatomic operations. It gives a proof of the bakery algorithm that explicitly reveals the needed assumption. However, I find the method very difficult to use. With practice, perhaps one could become facile enough with it to make it practical. However, there don't seem to be enough algorithms requiring reasoning about nonatomic actions for anyone to acquire that facility.
    A Theorem on Atomicity in Distributed Algorithms
    Distributed Computing 4, 2 (1990), 59-68. Also appeared as SRC Research Report 28.
    Postscript - Compressed Postscript - PDF
    Copyright 1990 by Springer-Verlag.
    This paper gives a reduction theorem for distributed algorithms (see the discussion of [83]). It includes what I believe to be the first reduction result for liveness properties.
    Distributed Computing: Models and Methods (with Nancy Lynch)
    Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics, Jan van Leeuwen, editor, Elsevier (1990), 1157-1199.
    Postscript - Compressed Postscript - PDF
    All copyrights reserved by Elsevier Science 1990.
    Jan van Leeuwen asked me to write a chapter on distributed systems for this handbook. I realized that I wasn't familiar enough with the literature on distributed algorithms to write it by myself, so I asked Nancy Lynch to help. I also observed that there was no chapter on assertional verification of concurrent algorithms. (That was probably due to the handbook's geographical origins, since process algebra rules in Europe.) So I included a major section on proof methods. As I recall, I wrote most of the first three sections and Lynch wrote the fourth section on algorithms pretty much by herself.
    A Completeness Theorem for TLA
    Unpublished (October, 1990).
    Postscript - Compressed Postscript - PDF
    This is the beginning of a note that states and proves a relative completeness result for the axioms of TLA in the absence of temporal existential quantification (variable hiding). (The ancient LaTeX macros used to format the note only work on the first part.) Text versions of the complete note and of all other TLA notes are available here. There are undoubtedly errors in the proof, but I think they're minor.
    The Concurrent Reading and Writing of Clocks
    ACM Transactions on Computer Systems 8, 4 (November 1990), 305-310. Also appeared as SRC Research Report 27.
    Postscript - Compressed Postscript - PDF
    Copyright © 1990 by the Association for Computing Machinery, Inc.
    This paper uses the results from [25] to derive a couple of algorithms for reading and writing multi-word clocks. These algorithms are in the same vein as the ones in [25], involving reading and writing multi-digit numbers in opposite directions. In fact, I think I knew the algorithms when I wrote [25]. When the problem of reading and writing a two-word clock arose in a system being built at SRC, I was surprised to discover that the solution wasn't in [25]. I don't know why it wasn't, but I welcomed the opportunity to publish a paper reminding people of the earlier results.
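    As I understand the opposite-direction results of [25], a value written digit by digit from right to left can be read from left to right without ever observing a value greater than the final one, though the read may dip below the initial one. The brute-force check below (Python; my paraphrase of that property, not the paper's algorithm) enumerates every interleaving of a single two-digit clock tick:

    ```python
    from itertools import product

    def possible_reads(old, new):
        """All values a left-to-right, digit-by-digit read may obtain while
        `old` (digits, most significant first) is being overwritten by `new`
        from right to left, under every interleaving of the two processes."""
        m = len(old)
        results = set()
        # ws[r] = number of digit-writes completed before read step r.
        # Reads proceed left to right, so the counts must be nondecreasing.
        for ws in product(range(m + 1), repeat=m):
            if any(ws[i] > ws[i + 1] for i in range(m - 1)):
                continue
            # After w right-to-left writes, the last w digits hold new values.
            results.add(tuple(new[r] if r >= m - ws[r] else old[r]
                              for r in range(m)))
        return results

    reads = possible_reads((1, 9), (2, 0))   # a two-digit clock ticking 19 -> 20
    assert all(v <= (2, 0) for v in reads)   # no read exceeds the final value
    assert (1, 0) in reads                   # but a read may dip below 19
    ```

    The dual pair of directions bounds a read from below instead, and combining such bounds is, as I recall, roughly how the paper builds a correct multi-word clock read without locking.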
    The Mutual Exclusion Problem Has Been Solved
    Communications of the ACM 34, 1 (January 1991), 110.
    Postscript - Compressed Postscript - PDF
    Copyright © 1991 by the Association for Computing Machinery, Inc.
    In 1990, CACM published one of their self-assessment procedures, this time on concurrent programming. The "correct" answer to one of the questions implied that mutual exclusion can be implemented only using atomic operations that are themselves implemented with lower-level mutual exclusion. It seemed appropriate to point out that this was wrong, and that I had actually solved the mutual exclusion problem 16 years earlier in [12]--ironically, an article in CACM. So, I submitted this short note to that effect. The quotation from Samuel Johnson at the end is one that Bob Taylor likes very much and taught to everyone at SRC when he was lab director.

    The original version, which I no longer have, quoted all the sources cited in support of the "correct" answer, showing how all those experts in the field had no idea that the mutual exclusion problem had been solved. However, the editor of CACM took it upon himself to handle my submission personally. He insisted that all this material be deleted, along with the accompanying sarcasm. Although I didn't like this editorial bowdlerization, I didn't feel like fighting.

    I was appalled at how this note finally appeared. I have never seen a published article so effectively hidden from the reader. I defy anyone to take that issue of CACM and find the note without knowing the page number.

    The Existence of Refinement Mappings (with Martín Abadi)
    Theoretical Computer Science 82, 2 (May 1991), 253-284. (An abridged version appeared in Proceedings of the Third Annual Logic In Computer Science Conference (July 1988).) Also appeared as SRC Research Report 29.
    Postscript - Compressed Postscript - PDF
    All copyrights reserved by Elsevier Science 1991.
    The method of proving that one specification implements another by using a refinement mapping was well-established by the mid-80s. (It is clearly described in [54], and it also appears in [50].) It was known that constructing the refinement mapping might require adding history variables to the implementation. Indeed, Lam and Shankar essentially constructed all their refinement mappings with history variables. Jim Saxe discovered a simple example showing that history variables weren't enough. To handle that example, he devised a more complicated refinement-mapping rule. I realized that I could eliminate that complicated rule, and use ordinary refinement, by introducing a new kind of dummy variable that I called a prophecy variable. A prophecy variable is very much like a history variable, except it predicts the future instead of remembering the past. (Nancy Lynch later rediscovered Saxe's rule and used it to "simplify" refinement proofs by eliminating prophecy variables.) I then remembered a problematic example by Herlihy and Wing in their classic paper Linearizability: A Correctness Condition for Concurrent Objects that could be handled with prophecy variables.

    This paper was my first collaboration with Abadi. Here's my recollection of how it was written. I had a hunch that history and prophecy variables were all one needed. Abadi had recently joined SRC, and this seemed like a fine opportunity to interest him in the things I was working on. So I described my hunch to him and suggested that he look into proving it. He came back in a few weeks with the results described in the paper. My hunch was right, except that there were hypotheses needed that I hadn't suspected. Abadi, however, recalls my having had a clearer picture of the final theorem, and that we worked out some of the details together when writing the final proof.

    I had just developed the structured proof style described in [101], so I insisted that we write our proofs in this style, which meant rewriting Abadi's original proofs. In the process, we discovered a number of minor errors in the proofs, but no errors in the results.

    This paper won the LICS 1988 Test of Time Award (awarded in 2008).

    Preserving Liveness: Comments on `Safety and Liveness from a Methodological Point of View' (with Martín Abadi et al.)
    Information Processing Letters 40, 3 (November 1991), 141-142.
    Postscript- Compressed Postscript- PDF
    All copyrights reserved by Elsevier Science 1991.
    This is a very short article--the list of authors takes up almost as much space as the text. In a note published in IPL, Dederichs and Weber rediscovered the concept of non-machine-closed specifications. We observed here that their reaction to those specifications was naive.
    Critique of the Lake Arrowhead Three
    Distributed Computing 6, 1 (1992), 65-71.
    Postscript- Compressed Postscript- PDF
    Copyright 1992 by Springer-Verlag.
    For a number of years, I was a member of a committee that planned an annual workshop at Lake Arrowhead, in southern California. I was finally pressured into organizing a workshop myself. I got Brent Hailpern to be chairman of a workshop on specification and verification of concurrent systems. A large part of the conference was devoted to a challenge problem of specifying sequential consistency. This was a problem that, at the time, I found difficult. (I later learned how to write the simple, elegant specification that appears in [126].)

    Among the presentations at the workshop session on the challenge problem, there were only two serious attempts at solving the problem. (As an organizer, I felt that I shouldn't present my own solution.) After a long period of review and revision, these two solutions, together with a third one written later, appeared in a special issue of Distributed Computing. This note, which I wrote for that special issue, is a critique of the three solutions.

    The Reduction Theorem
    Unpublished (April, 1992).
    Postscript- Compressed Postscript- PDF
    This note states and proves a TLA reduction theorem. See the discussion of [123]. Text versions of this and all other TLA notes are available here.
    Mechanical Verification of Concurrent Systems with TLA (with Urban Engberg and Peter Grønning)
    Computer-Aided Verification, G. v. Bochmann and D. K. Probst editors. (Proceedings of the Fourth International Conference, CAV'92.) Lecture Notes in Computer Science, number 663, Springer-Verlag, (June, 1992) 44-55.
    Postscript- Compressed Postscript- PDF
    Copyright 1992 by Springer-Verlag.
    When I developed TLA, I realized that, for the first time, I had a formalism that really was completely formal--so formal that mechanically checking TLA proofs should be straightforward. Working out a tiny example (the specification and trivial implementation of mutual exclusion) using the LP theorem prover, I confirmed that this was the case. I used LP mainly because we had LP experts at SRC--namely, Jim Horning and Jim Saxe.

    My tiny example convinced me that we want to reason in TLA, not in LP. To do this, we need to translate a TLA proof into the language of the theorem prover. The user should write the proof in the hierarchical style of [101], and the prover should check each step. One of the advantages of this approach turned out to be that it allows separate translations for the action reasoning and temporal reasoning. This is important because about 95% of a proof consists of action reasoning, and these proofs are much simpler if done with a special translation than with the same translation that handles temporal formulas. (In action reasoning, x and x' are two completely separate variables; x' is handled no differently than the variable y.) So, I invited Urban Engberg and Peter Grønning, who were then graduate students in Denmark, to SRC for a year to design and implement such a system. The result of that effort is described in this paper. For his doctoral research, Engberg later developed the system into one he called TLP.
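    The advantage of treating x and x' as independent variables can be seen in a toy check (an editorial sketch, not TLP's actual translation): the proof obligation that an action preserves an invariant is then ordinary non-temporal reasoning, here brute-forced over a small finite domain standing in for the integers:

```python
# In action reasoning, x and its primed version x' are independent variables;
# an action is simply a predicate on the pair.  We check the standard proof
# obligation  Inv(x) /\ Next(x, x') => Inv(x')  for a toy invariant and
# action by quantifying over both variables separately.
DOMAIN = range(-50, 50)

def inv(x):                  # invariant: x is nonnegative
    return x >= 0

def next_action(x, xp):      # action: x' = x + 1  (xp plays the role of x')
    return xp == x + 1

preserved = all(inv(xp)
                for x in DOMAIN
                for xp in DOMAIN
                if inv(x) and next_action(x, xp))
print(preserved)  # True: the action preserves the invariant
```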

    Georges Gonthier demonstrated how successful this system was in his mechanical verification of the concurrent garbage collector developed by Damien Doligez and hand-proved in his thesis. Gonthier estimated that using TLP instead of working directly in LP reduced the amount of time it took him to do the proof by about a factor of five. His proof is reported in:

    Georges Gonthier, Verifying the Safety of a Practical Concurrent Garbage Collector, in Rajeev Alur, Thomas A. Henzinger (Ed.): Computer Aided Verification, 8th International Conference, CAV '96. Lecture Notes in Computer Science, Vol. 1102, Springer, 1996, 462-465.

    TLP's input language was essentially a very restricted subset of TLA+ (described in [127])--a language that did not exist when TLP was implemented. I regarded TLP as a proof of concept and I did not use it after it was built. Around 2006, work began on a theorem prover for real TLA+ specifications. It led to TLAPS, the TLA+ proof system. TLAPS uses several back-end theorem provers, including Isabelle, Zenon, and SMT solvers.
    Composing Specifications (with Martín Abadi)
    ACM Transactions on Programming Languages and Systems 15, 1 (January 1993), 73-132. Also appeared as SRC Research Report 66. A preliminary version appeared in Stepwise Refinement of Distributed Systems, J. W. de Bakker, W.-P. de Roever, and G. Rozenberg editors, Springer-Verlag Lecture Notes in Computer Science Volume 430 (1989), 1-41.
    Postscript- Compressed Postscript- PDF
    Copyright © 1993 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library -- http://www.acm.org/dl/.
    Since the late 80s, I had vague concerns about separating the specification of a system from requirements on the environment. The ability to write specifications as mathematical formulas (first with temporal logic and then, for practical specifications, with TLA) provides an answer. The specification is simply E implies M, where E specifies what we assume of the environment and M specifies what the system guarantees. This specification allows behaviors in which the system violates its guarantee and the environment later violates its assumption--behaviors that no correct implementation could allow. So, we defined the notion of the realizable part of a specification and took as the system specification the realizable part of E implies M. We later decided that introducing an explicit realizable-part operator was a mistake, and that it was better to replace implication with a temporal while operator that made the specifications realizable. That's the approach we took in [112], which supersedes this paper.

    This is the second paper in which we used structured proofs, the first being [92]. In this case, structured proofs were essential to getting the results right. We found reasoning about realizability to be quite tricky, and on several occasions we convinced ourselves of incorrect results, finding the errors only by trying to write structured proofs.
    Verification of a Multiplier: 64 Bits and Beyond (with R. P. Kurshan)
    Computer-Aided Verification, Costas Courcoubetis, editor. (Proceedings of the Fifth International Conference, CAV'93.) Lecture Notes in Computer Science, number 697, Springer-Verlag (June, 1993), 166-179.
    Postscript- Compressed Postscript- PDF
    Copyright 1993 by Springer-Verlag.
    As I observed in [124], verifying a system by first decomposing it into separate subsystems can't reduce the size of a proof and usually increases it. However, such a decomposition can reduce the amount of effort if it allows much of the resulting proof to be done automatically by a model checker. This paper shows how the decomposition theorem of [112] can be used to decompose a hardware component (in this case, a multiplier) that is too large to be verified by model checking alone. Here is Kurshan's recollection of how this paper came to be. (My CAV'92 talk was mostly about [101], and the "Larch support" refers to [96].)
    I cornered you after your invited address at CAV92. At CAV, you talked about indenting (and TLA, and its Larch support). I challenged you with a matter I had been thinking about since at least 1990, the year of the first CAV. In the preface to the CAV90 proceedings, I stated as a paramount challenge to the CAV community, to create a beneficial interface between automated theorem proving, and model checking.

    I asked you if you thought that linking TLA/Larch with S/R (which should be simple to do on account of their very close syntax and semantics for finite-state models), could be useful. I suggested the (artificial) problem of verifying a multiplier constructed from iterated stages of a very complex 8x8 multiplier. The 8x8 multiplier would be too complex to verify handily with a theorem prover. A (say) 64x64 multiplier could be built from the 8x8 one. We'd use the model checker (cospan) to verify the 8x8, and Larch/TLA to verify the induction step. You liked the idea, and we did it, you working with Urban and I working with Mark [Foissoitte].

    Sadly, with the interface in place, I was unable to come up with a non-artificial feasible application. To this day, although there have been a succession of such interfaces built (I believe ours was the first), none has really demonstrated a benefit on a real application. The (revised) challenge is to find an application in which the combination finds a bug faster than either one could by itself.


    Verification and Specification of Concurrent Programs
    A Decade of Concurrency: Reflections and Perspectives, J. W. de Bakker, W.-P. de Roever, and G. Rozenberg editors. Lecture Notes in Computer Science, number 803, Springer-Verlag, (June, 1993) 347-374.
    Postscript- Compressed Postscript- PDF
    Copyright 1993 by Springer-Verlag.
    In keeping with the theme of the workshop, this paper provides a brief, biased overview of 18 years of verifying and specifying concurrent systems, along with an introduction to TLA. Looking at it almost 10 years later, I find it a rather nice read.
    Hybrid Systems in TLA+
    Hybrid Systems, Robert L. Grossman, Anil Nerode, Hans Rischel, and Anders P. Ravn, editors. Lecture Notes in Computer Science, number 736, Springer-Verlag (1993), 77-102.
    Postscript- Compressed Postscript- PDF
    Copyright 1993 by Springer-Verlag.
    In the early 90s, hybrid systems became a fashionable topic in formal methods. Theoreticians typically respond to a new problem domain by inventing new formalisms. Physicists don't have to revise the theory of differential equations every time they study a new kind of system, and computer scientists shouldn't have to change their formalisms when they encounter a new kind of system. Abadi and I showed in [106] that TLA can handle real-time specifications by simply introducing a variable to represent the current time. It's just as obvious that it can handle hybrid systems by introducing variables to represent other physical quantities. It is often necessary to demonstrate the obvious to people, and the opportunity to do this arose when there was a workshop in Denmark devoted to a toy problem of specifying a simple gas burner and proving the correctness of a simple implementation. I was unable to attend the workshop, but I did write this TLA+ solution. (The version of TLA+ used here is slightly different from the more current version that is described in [127].)

    The correctness conditions given in the problem statement included an ad hoc set of rules about how long the gas could be on if the flame was off. The purpose of those conditions was obviously to prevent a dangerous build-up of unburned gas. To demonstrate the power of TLA+, and because it made the problem more fun, I wrote a higher-level requirement stating that the concentration of gas should be less than a certain value. Assuming that the dissipation of unburned gas satisfied a simple differential equation, I proved that their conditions implied my higher-level specification--under suitable assumptions about the rate of diffusion of the gas. This required, among other things, formally specifying the Riemann integral, which took about 15 lines. I also sketched a proof of the correctness of the next implementation level. All of this was done completely in TLA+. The introduction to the volume says that I "extend[ed] ... TLA+ with explicit variables that denote continuous states and clocks." That, of course, is nonsense. Apparently, by their logic, you have extended C if you write a C program with variables named time and flame instead of t and f.

    How to Write a Proof
    American Mathematical Monthly 102, 7 (August-September 1995), 600-608. Also appeared in Global Analysis in Modern Mathematics, Karen Uhlenbeck, editor. Publish or Perish Press, Houston. Also appeared as SRC Research Report 94.
    Postscript- Compressed Postscript- PDF
    TLA gave me, for the first time, a formalism in which it was possible to write completely formal proofs without first having to add an additional layer of formal semantics. I began writing proofs the way I and all mathematicians and computer scientists had learned to write them, using a sequence of lemmas whose proofs were a mixture of prose and formulas. I quickly discovered that this approach collapsed under the weight of the complexity of any nontrivial proof. I became lost in a maze of details, and couldn't keep track of what had and had not been proved at any point. Programmers learned long ago that the way to handle complexity is with hierarchical structuring. So, it was quite natural to start structuring the proofs hierarchically, and I soon developed a simple hierarchical proof style. It then occurred to me that this structured proof style should be good for ordinary mathematical proofs, not just for formal verification of systems. Trying it out, I found that it was great. I now never write old-fashioned unstructured proofs for myself, and use them only in some papers for short proof sketches that are not meant to be rigorous.

    I first presented these ideas in a talk at a celebration of the 60th birthday of Richard Palais, my de jure thesis advisor, collaborator, and friend. I was invited along with all of Palais' former doctoral students, and I was the only non-mathematician who gave a talk. (I believe all the other talks presented that day appear among the articles in the volume edited by Uhlenbeck.) Lots of people jumped on me for trying to take the fun out of mathematics. The strength of their reaction indicates that I hit a nerve. Perhaps they really do think it's fun having to recreate the proofs themselves if they want to know whether a theorem in a published paper is actually correct, and to have to struggle to figure out why a particular step in the proof is supposed to hold. I republished the paper in the AMM Monthly so it would reach a larger audience of mathematicians. Maybe I should republish it again for computer scientists.

    The Temporal Logic of Actions
    ACM Transactions on Programming Languages and Systems 16, 3 (May 1994), 872-923. Also appeared as SRC Research Report 79.
    Postscript- Compressed Postscript- PDF
    Copyright © 1994 by the Association for Computing Machinery, Inc.
    This paper introduces TLA, which I now believe is the best general formalism for describing and reasoning about concurrent systems. The new idea in TLA is that one can use actions--formulas with primed and unprimed variables--in temporal formulas. An action describes a state-transition relation. For example, the action x' = x + 1 means approximately the same thing as the programming-language statement x := x+1. However, the action is much simpler because it talks only about x and says nothing about another variable y, while the assignment statement may (or may not) assert that y doesn't change. TLA allows you to write specifications essentially the same way advocated in [82]. However, the specification becomes a single mathematical formula. This opens up a whole new realm of possibilities. Among other things, it provides an elegant way to formalize and systematize all the reasoning used in concurrent system verification.
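    For concreteness, here is the standard hour-clock example from later TLA+ expositions (a sketch added for illustration, not a formula taken from this paper). The entire specification is a single temporal formula built from an initial predicate and an action:

```latex
% Hour clock: hr starts in 1..12; every step either advances the hour,
% wrapping from 12 back to 1, or is a stuttering step that leaves hr
% unchanged (which is what the [ ... ]_{hr} brackets permit).
HC \triangleq (hr \in 1 .. 12) \land \Box[\, hr' = (hr \bmod 12) + 1 \,]_{hr}
```

    Because HC is just a formula, "implementation is implication": a lower-level spec implements the clock iff it implies HC.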

    The moral of TLA is: if you're not writing a program, don't use a programming language. Programming languages are complicated and have many ugly properties because a program is input to a compiler that must generate reasonably efficient code. If you're describing an algorithm, not writing an actual program, you shouldn't burden yourself with those complications and ugly properties. The toy concurrent programming languages with which computer scientists have traditionally described algorithms are not as bad as real programming languages, but they are still uglier and more complicated than they need to be. Such a toy program is no closer to a real C or Java program than is a TLA formula. And the TLA formula is a lot easier to deal with mathematically than is a toy program. (Everything I say about programming languages applies just as well to hardware description languages. However, hardware designers are generally too sensible to try to use low-level hardware languages for higher-level system descriptions.) Had I only realized this 20 years ago!

    The first major step in getting beyond traditional programming languages to describe concurrent algorithms was Misra and Chandy's Unity. Unity simply eliminated the control state, so you just had a single global state that you reasoned about with a single invariant. You can structure the invariant any way you want; you're not restricted by the particular programming constructs with which the algorithm is described. The next step was TLA, which eliminated the programming language and allowed you to write your algorithm directly in mathematics. This provides a much more powerful and flexible way of describing the next-state relation.

    An amusing footnote to this paper is that, after reading an earlier draft, Simon Lam claimed that he deserved credit for the idea of describing actions as formulas with primed and unprimed variables. A similar notation for writing postconditions dates from the 70s, but that's not the same as actually specifying the action in this way. I had credited Rick Hehner's 1984 CACM article, but I figured there were probably earlier instances. After a modest amount of investigation, I found one earlier published use--in [50].
    Decomposing Specifications of Concurrent Systems (with Martín Abadi)
    Programming Concepts, Methods and Calculi, Ernst-Rüdiger Olderog editor. (Proceedings of the IFIP TC2/WG2.1/WG2.2/WG2.3 Working Conference, Procomet '94, San Miniato, Italy.) North-Holland, (1994) 327-340.
    Postscript- Compressed Postscript- PDF
    All copyrights reserved by Elsevier Science 1994.
    See the discussion of [112].
    Open Systems in TLA (with Martín Abadi)
    Proceedings of the Thirteenth Annual ACM Symposium on Principles of Distributed Computing (August 1994), 81-90.
    Postscript- Compressed Postscript- PDF
    Copyright © 1994 by the Association for Computing Machinery, Inc.
    See the discussion of [112].
    TLZ (Abstract)
    Z User's Workshop, Cambridge 1994. J.P. Bowen and J.A. Hall (Eds.) 267-268.
    Postscript- Compressed Postscript- PDF
    Copyright 1994 by Springer-Verlag.
    Z is a formal specification language that describes a system by writing actions--essentially the same kinds of actions that appear in a TLA specification. It was developed by Mike Spivey and others at Oxford for specifying sequential programs. When someone develops a method for sequential programs, they usually think that it will also handle concurrent programs--perhaps by adding an extra feature or two. I had heard that this wasn't true of the Z developers, and that they were smart enough to realize that Z did not handle concurrency. Moreover, Z is based on mathematics, not programming languages, so it is a fairly nice language.

    TLA assumes an underlying logic for writing actions. The next step was obvious: devise a language for specifying concurrent systems that extends Z with the operators of TLA. Equally obvious was the name of such a language: TLZ.

    In the Spring of 1991, I visited Oxford and gave a talk on TLA, pointing out how naturally it could be combined with Z. The idea was as popular as bacon and eggs at Yeshiva University. Tony Hoare was at Oxford, and concurrency at Oxford meant CSP. The Z community was interested only in combining Z with CSP--which is about as natural as combining predicate logic with C++.

    A couple of years later, I was invited to give a talk at the Z User's Meeting. I dusted off the TLZ idea and presented it at the meeting. Again, I encountered a resounding lack of interest.

    Had TLA been adopted by the Z community, it might have become a lot more popular. On the other hand, not being attached to Z meant that I didn't have to live with Z's drawbacks and was free to design a more elegant language for specifying actions. The result was TLA+, described in [127].
    An Old-Fashioned Recipe for Real Time (with Martín Abadi)
    ACM Transactions on Programming Languages and Systems 16, 5 (September 1994) 1543-1571. Also appeared as SRC Research Report 91. A preliminary version appeared in Real-Time: Theory in Practice, J. W. de Bakker, C. Huizing, W. P. de Roever, and G. Rozenberg, editors (1992), Springer-Verlag, 1-27.
    Postscript- Compressed Postscript- PDF
    Copyright © 1994 by the Association for Computing Machinery, Inc.
    As explained in the discussion of [51], it's been clear for a long time that assertional methods for reasoning about concurrent algorithms can easily handle real-time algorithms. Time just becomes another variable. That hasn't stopped academics from inventing new formalisms for handling time. (Model checking real-time algorithms does raise new problems, since they are inherently not finite-state.) So, when de Roever held a workshop on formalisms for real-time systems, it was a good opportunity to show off how easily TLA deals with real-time algorithms. We also proved some new results about nonZeno specifications. I believe this paper introduced the terms Zeno and nonZeno, though the notion of Zeno behaviors had certainly occurred to others. It does seem to have been the first to observe the relation between nonZenoness and machine closure. Abadi has the following to say about this paper:

    For me, this paper was partly a vindication of some work I had done with [Gordon] Plotkin [A Logical View of Composition, Theoretical Computer Science 114, 1 (June 1993), 3-30], where we explored the definition and properties of the "while" operator (-|>). I believe that you thought that the work was a formal game, so I was pleased to find that we could use it in this paper.
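    The "old-fashioned recipe" of the title can be sketched as follows (a simplified paraphrase of the paper's RTnow construction; the details here are approximate). Time is represented by an ordinary variable now, advanced by an explicit action that changes nothing else:

```latex
% v is the tuple of all other specification variables; Tick steps advance
% now without changing v.  Conjoining RT to a specification adds real time
% to it; timing constraints are then ordinary assertions about now.
Tick \triangleq now' \in \{ r \in \mathbb{R} : r > now \} \land (v' = v)
RT \triangleq (now \in \mathbb{R}) \land \Box[\, Tick \,]_{now}
```

    NonZenoness is the requirement that now can always grow without bound, which is where machine closure enters.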

    The paper uses as an example a mutual exclusion protocol due to Michael Fischer. This example has an amusing history. When I was working on [73], I sent Fischer email describing my algorithm and asking if he knew of a better solution. He responded:

    No, I don't, but the practical version of the problem sounds very similar to the problem of bus allocation in contention networks. I wonder if similar solutions couldn't be used? For example, how about...

    He followed this with a simple, elegant protocol based on real-time delays. Because of its real-time assumptions, his protocol didn't solve the problem that motivated [73]. I mentioned this algorithm in [73], but had forgotten about it by the time of de Roever's workshop. Fred Schneider reminded me of it when he used it as an example in his talk at the workshop. We then incorporated the example in our paper. I later mentioned this to Fischer, who had no memory of the protocol and even claimed that it wasn't his. Fortunately, I happen to have saved his original email, so I had the proof. The message, complete with message number, is cited in the paper--the only instance of such a citation that I know of.
    Specifying and Verifying Fault-Tolerant Systems (with Stephan Merz)
    Formal Techniques in Real-Time and Fault-Tolerant Systems, H. Langmaack, W.-P. de Roever, J. Vytopil editors. Lecture Notes in Computer Science, number 863, Springer-Verlag, (September 1994) 41-76.
    Postscript- Compressed Postscript- PDF
    Copyright 1994 by Springer-Verlag.
    Willem-Paul de Roever invited me to give a talk at this symposium. I was happy to have a podium to explain why verifying fault-tolerant, real-time systems should not be a new or especially difficult problem. This was already explained in [106] for real-time systems, but I knew that there would be people who thought that fault-tolerance made a difference. Moreover, de Roever assured me that Lübeck, where the symposium was held, is a beautiful town. (He was right.) So, I decided to redo in TLA the proof from [51]. However, when the time came to write the paper, I realized that I had overextended myself and needed help. The abstract states:

    We formally specify a well known solution to the Byzantine generals problem and give a rigorous, hierarchically structured proof of its correctness. We demonstrate that this is an engineering exercise, requiring no new scientific ideas.

    However, there were few computer scientists capable of doing this straightforward engineering exercise. Stephan Merz was one of them. So, I asked him to write the proof, which he did with his usual elegance and attention to detail. I think I provided most of the prose and the initial version of the TLA specification, which Merz modified a bit. The proof was all Merz's.
    How to Write a Long Formula
    FACJ 6(5) (September/October 1994) 580-584. Also appeared as SRC Research Report 119.
    Postscript- Compressed Postscript- PDF
    Copyright 1994 by Springer-Verlag.
    Specifications often contain formulas that are a page or two long. Mathematicians almost never write formulas that long, so they haven't developed the notations needed to cope with them. This article describes my notation for using indentation to eliminate parentheses in a formula consisting of nested conjunctions and disjunctions. I find this notation very useful when writing specifications. The notation is part of the formal syntax of TLA+ (see [127]).
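    The notation works as follows: a list of formulas, each bulleted with /\ or \/, denotes their conjunction or disjunction, and the scope of each list is determined entirely by indentation, so no parentheses are needed. For example (a made-up formula, shown only to illustrate the layout):

```
Next == /\ x' = x + 1
        /\ \/ y' = y
           \/ y' = y + x
        /\ UNCHANGED z
```

    The outer /\-list conjoins three formulas; its second conjunct is itself a \/-list of two disjuncts, with all grouping given by column alignment.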
    Introduction to TLA
    SRC Technical Note 1994-001 (December 1994).
    Postscript- Compressed Postscript- PDF
    This is a very brief (7-page) introduction to what TLA formulas mean.
    Adding "Process Algebra" to TLA
    Unpublished (January 1995).
    Postscript- Compressed Postscript- PDF
    At the Dagstuhl workshop described in the discussion of [114], I was impressed by the elegance of the process-algebraic specification presented by Rob van Glabbeek. The ability to encode control state in the process structure permits one to express some specifications quite nicely in CCS. However, pure CCS forces you to encode the entire state in the process structure, which is impractical for real specifications. I had the idea of trying to get the best of both worlds by combining CCS and TLA, and wrote this preliminary note about it. I hoped to work on this with van Glabbeek but, although he was interested, he was busy with other things and we never discussed it, and I never did anything more with the idea. When I wrote this note, I wasn't sure if it was a good idea. I now think that examples in which adding CCS to TLA would significantly simplify the specification are unlikely to arise in practice. So, I don't see any reason to complicate TLA in this way. But, someone else may feel otherwise.
    What Process Algebra Proofs Use Instead of Invariance
    Unpublished (January 1995).
    Postscript- Compressed Postscript- PDF
    Working on [110] got me thinking about how process-algebraic proofs work. This draft note describes my preliminary thoughts about what those proofs use instead of invariance. I never developed this far enough to know if it's right.
    Conjoining Specifications(with Martín Abadi)
    ACM Transactions on Programming Languages and Systems 17, 3 (May 1995), 507-534. Also appeared as SRC Research Report 118.
    Postscript- Compressed Postscript- PDF
    Copyright © 1995 by the Association for Computing Machinery, Inc.
    The obvious way to write an assume/guarantee specification is in the form E implies M, where E specifies what we assume of the environment and M specifies what the system guarantees. That is what we did in [97]. However, such a specification allows behaviors in which the system violates the guarantee and the environment later violates its assumption. This paper presents a better way to write the specification that we discovered later. Instead of E implies M, we take as the specification the stronger condition that M must remain true at least one step longer than E is. This enabled us to simplify and strengthen our results.

    This paper contains two major theorems, one for decomposing closed-system specifications and another for composing open-system specifications. A preliminary conference version of the result for closed systems appeared in [103]. A preliminary conference version of the second appeared in [104].

    Although the propositions and theorems in this paper are not in principle difficult, it was rather hard to get the details right. We couldn't have done it without writing careful, structured proofs. So, I wanted those proofs published. But rigorous structured proofs, in which all the details are worked out, are long and boring, and the referees didn't read them. Since the referees hadn't read the proofs, the editor didn't want to publish them. Instead, she wanted simply to publish the paper without proofs. I was appalled that she was willing to publish theorems whose proofs hadn't been checked, but was unwilling to publish the unchecked proofs. But, I sympathized with her reluctance to kill all those trees, so we agreed that she would find someone to referee the proof and we would publish the appendix electronically. The referee read the proofs carefully and found three minor errors, which were easily corrected. Two of the errors occurred when we made changes to one part of the proof without making corresponding changes to another. The third was a careless mistake in a low-level statement. When asked, the referee said that the hierarchical structure, with all the low-level details worked out, made the proofs quite clear and easy to check.

    When I learned that ACM was going to publish some appendices in electronic form only, I was worried about their ability to maintain an electronic archive that would enable people to obtain an appendix twenty or fifty years later. Indeed, when I checked in August of 2011, none of the methods for obtaining a copy from the ACM that were printed with the article worked, and the appendix did not seem to be on their web site. It was still available from a Princeton University ftp site. (The link above is to a version of the paper containing the appendix.)

    TLA in Pictures
    IEEE Transactions on Software Engineering SE-21, 9 (September 1995), 768-775. Also appeared as SRC Research Report 127.
    Postscript- Compressed Postscript- PDF
    Copyright © 1995 Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
    Back in the 50s and 60s, programmers used flowcharts. Eventually, guided by people like Dijkstra and Hoare, we learned that pictures were a bad way to describe programs because they collapsed under the weight of complexity, producing an incomprehensible spaghetti of boxes and arrows. In the great tradition of learning from our mistakes how to make the same mistake again, many people decided that drawing pictures was a wonderful way to specify systems. So, they devised graphical specification languages.

    Not wanting to be outdone, I wrote this paper to show that you can write TLA specifications by drawing pictures. It describes how to interpret as TLA formulas the typical circles and arrows with which people describe state transitions. These diagrams represent safety properties. I could also have added some baroque conventions for adding liveness properties to the pictures, but there's a limit to how silly I will get. When I wrote the paper, I actually did think that pictures might be useful for explaining parts of specifications. But I have yet to encounter any real example where they would have helped.

    This paper contains, to my knowledge, the only incorrect "theorem" I have ever published. It illustrates that I can be as lazy as anyone in not bothering to check "obvious" assertions. I didn't publish a correction because the theorem, which requires an additional hypothesis, was essentially a footnote and didn't affect the main point of the paper. Also, I was curious to see if anyone would notice the error. Apparently, no one did. I discovered the error in writing [115].
    The RPC-Memory Specification Problem: Problem Statement (with Manfred Broy)
    Formal Systems Specification: The RPC-Memory Specification Case Study, Manfred Broy, Stephan Merz, and Katharina Spies editors. Lecture Notes in Computer Science, number 1169, (1996), 1-4.
    Postscript- Compressed Postscript- PDF
    Copyright 1996 by Springer-Verlag.
    I don't remember how this came about, but Manfred Broy and I organized a Dagstuhl workshop on the specification and verification of concurrent systems. (I'm sure I agreed to this knowing that Broy and his associates would do all the real organizing.) We decided to pose a problem that all participants were expected to solve. This is the problem statement.

    There is an interesting footnote to this workshop. As explained in the discussion of [50], I don't believe in writing specifications as a conjunction of the properties that the system should satisfy. Several participants used this approach. I thought that the high-level specification was sufficiently trivial that, by now, people would be able to specify it in this way. However, Reino Kurki-Suonio noticed an error that was present in all the "purely axiomatic" specifications--that is, ones that mentioned only the interface, without introducing internal variables.
    A TLA Solution to the RPC-Memory Specification Problem (with Martín Abadi and Stephan Merz)
    Formal Systems Specification: The RPC-Memory Specification Case Study, Manfred Broy, Stephan Merz, and Katharina Spies editors. Lecture Notes in Computer Science, number 1169, (1996), 21-66.
    Postscript- Compressed Postscript- PDF
    Copyright 1996 by Springer-Verlag.
    Since the problem posed in [114] was devised by both Broy and me, I felt it was reasonable for me to present a TLA solution. Martín Abadi, Stephan Merz, and I worked out a solution that Merz and I presented at the workshop. Afterwards, we worked some more on it and finally came up with a more elegant approach that is described in this paper. I believe that Abadi and I wrote most of the prose. Merz wrote the actual proofs, which he later checked using his embedding of TLA in Isabelle. We all contributed to the writing of the specifications.

    This is the only example I've encountered in which the pictures of TLA formulas described in [113] were of some use. In fact, I discovered the error in [113] when I realized that one of the pictures in this paper provided a counterexample to its incorrect theorem.
    How to Tell a Program from an Automobile
    In A Dynamic and Quick Intellect, John Tromp editor (1996)--a Liber Amicorum issued by the CWI in honor of Paul Vitanyi's 25-year jubilee.
    Postscript- Compressed Postscript- PDF
    I wrote this brief note in January, 1977. It came about because I was struck by the use of the term program maintenance, which conjured up in my mind images of lubricating the branch statements and cleaning the pointers. So, I wrote this to make the observation that programs are mathematical objects that can be analyzed logically. I was unprepared for the strong emotions this stirred up among my colleagues at Massachusetts Computer Associates, who objected vehemently to my thesis. So, I let the whole thing drop. Years later, when I was invited to submit a short paper for the volume honoring Vitanyi, I decided that this paper would be appropriate because he had learned to drive only a few years earlier. The paper retains its relevance, since programmers still don't seem to understand the difference between a program and an automobile.
    Refinement in State-Based Formalisms
    SRC Technical Note 1996-001 (December 1996).
    Postscript- Compressed Postscript- PDF
    A brief (7-page) note explaining what refinement and dummy variables are all about. It also sneaks in an introduction to TLA. In September 2004, Tommaso Bolognesi pointed out an error in the formula on the bottom of page 4 and suggested a correction. Instead of modifying the note, I've decided to leave the problem of finding and correcting the error as an exercise for the reader.
    Marching to Many Distant Drummers(with Tim Mann)
    Unpublished (May 1997).
    Postscript- Compressed Postscript- PDF
    In 1990, there were two competing proposals for a time service for the Internet. One was from the DEC networking group and the other was in an RFC by David Mills. The people in DEC asked me for theoretical support for their belief that their proposal was better than that of Mills. I asked Tim Mann to help me. We decided that we didn't like either proposal very much, and instead we wrote a note with our own idea for an algorithm to obtain the correct time in an Internet-like environment. We sat on the idea for a few years, and eventually Tim presented it at a Dagstuhl workshop on time synchronization. We then began writing a more rigorous paper on the subject. This is as far as we got. The paper is mostly finished, but it contains some minor errors in the statements of the theorems and the proofs are not completed. We are unlikely ever to work on this paper again.
    Processes are in the Eye of the Beholder
    Theoretical Computer Science, 179, (1997), 333-351. Also appeared as SRC Research Report 132.
    Postscript- Compressed Postscript- PDF
    All copyrights reserved by Elsevier Science 1997.
    The notion of a process has permeated much of the work on concurrency. Back in the late 70s, I was struck by the fact that a uniprocessor computer could implement a multiprocess program, and that I had no idea how to prove the correctness of this implementation. Once I had realized that a system was specified simply as a set of sequences of states, the problem disappeared. Processes are just a particular way of viewing the state, and different views of the same system can have different numbers of processes.

    A nice example of this is an N-buffer producer/consumer system, which is usually viewed as consisting of a producer and a consumer process. But we can also view it as an N-process system, with each buffer being a process. Translating the views into concrete programs yields two programs that look quite different. It's not hard to demonstrate their equivalence with a lot of hand waving. With TLA, it's easy to replace the hand waving by a completely formal proof. This paper sketches how.

    I suspected that it would be quite difficult and perhaps impossible to prove the equivalence of the two programs with process algebra. So, at the end of the paper, I wrote "it would be interesting to compare a process-algebraic proof ... with our TLA proof." As far as I know, no process algebraist has taken up the challenge. I figured that a proof similar to mine could be done in any trace-based method, such as I/O automata. But, I expected that trying to make it completely formal would be hard with other methods. Yuri Gurevich and Jim Huggins decided to tackle the problem using Gurevich's evolving algebra formalism (now called abstract state machines). The editor processing my paper told me that they had submitted their solution and suggested that their paper and mine be published in the same issue, and that I write some comments on their paper. I agreed, but said that I wanted to comment on the final version. I heard nothing more about their paper, so I assumed that it had been rejected. I was surprised to learn, three years later, that the Gurevich and Huggins paper, Equivalence is in the Eye of the Beholder, appeared right after mine in the same issue of Theoretical Computer Science. They chose to write a "human-oriented" proof rather than a formal one. Readers can judge for themselves how easy it would be to formalize their proof.

    How to Make a Correct Multiprocess Program Execute Correctly on a Multiprocessor
    IEEE Transactions on Computers 46, 7 (July 1997), 779-782. Also appeared as SRC Research Report 96.
    Postscript- Compressed Postscript- PDF
    Copyright © 1997 Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
    This paper was inspired by Kourosh Gharachorloo's thesis. The problem he addressed was how to execute a multiprocess program on a computer whose memory did not provide sequential consistency (see [35]), but instead required explicit synchronization operations (such as Alpha's memory barrier instruction). He presented a method for deducing what synchronization operations had to be added to a program. I realized that, if one proved the correctness of an algorithm using the two-arrow formalism of [33], the proof would tell you what synchronization operations were necessary. This paper explains how.
    Substitution: Syntactic versus Semantic
    SRC Technical Note 1998-004 (March 1998). Rejected by Information Processing Letters.
    Postscript- Compressed Postscript- PDF
    What I find to be the one subtle and somewhat ugly part of TLA involves substitution in Enabled predicates. In the predicate Enabled A, there is an implicit quantification over the primed variables in A. Hence, mathematical substitution does not distribute over the Enabled operator. This four-page note explains that the same problem arises in most program logics because there is also an implicit quantification in the sequential-composition (semicolon) operator, so substitution does not distribute over semicolon. Apparently, no one had noticed this before because they hadn't tried using programming logics to do the sort of things that are easy to do in TLA.
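A small invented example (not from the note itself) shows the failure concretely. Take the action A defined below; Enabled A existentially quantifies the primed variables, so substituting x for y inside the Enabled and outside it give different answers:

```latex
A \;\triangleq\; (x' = 0) \land (y' = 1)
\qquad
\textsc{Enabled}\,A \;\equiv\; \exists\, x', y' : A \;\equiv\; \textsc{true}

(\textsc{Enabled}\,A)[y \leftarrow x] \;\equiv\; \textsc{true}
\quad\text{but}\quad
\textsc{Enabled}\,\bigl(A[y \leftarrow x]\bigr)
\;\equiv\; \exists\, x' : (x' = 0) \land (x' = 1)
\;\equiv\; \textsc{false}
```

The substitution captures the implicitly quantified primed variable, which is exactly why it fails to distribute.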
    The Part-Time Parliament
    ACM Transactions on Computer Systems 16, 2 (May 1998), 133-169. Also appeared as SRC Research Report 49. This paper was first submitted in 1990, setting a personal record for publication delay that has since been broken by [60].
    Postscript- Compressed Postscript- PDF
    Copyright © 1998 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library -- http://www.acm.org/dl/.
    A fault-tolerant file system called Echo was built at SRC in the late 80s. The builders claimed that it would maintain consistency despite any number of non-Byzantine faults, and would make progress if any majority of the processors were working. As with most such systems, it was quite simple when nothing went wrong, but had a complicated algorithm for handling failures based on taking care of all the cases that the implementers could think of. I decided that what they were trying to do was impossible, and set out to prove it. Instead, I discovered the Paxos algorithm, described in this paper. At the heart of the algorithm is a three-phase consensus protocol. Dale Skeen seems to have been the first to have recognized the need for a three-phase protocol to avoid blocking in the presence of an arbitrary single failure. However, to my knowledge, Paxos contains the first three-phase commit algorithm that is a real algorithm, with a clearly stated correctness condition and a proof of correctness.
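The shape of that consensus protocol can be sketched in a few dozen lines. The single-decree sketch below is an illustration of the prepare/promise/accept structure, not code from the paper; all class and function names are invented here, and failures are modeled only by a proposer failing to gather a majority.

```python
# Minimal single-decree Paxos sketch (illustrative, synchronous, in-memory).

class Acceptor:
    def __init__(self):
        self.promised = 0        # highest ballot this acceptor has promised
        self.accepted = None     # (ballot, value) last accepted, or None

    def prepare(self, ballot):
        """Phase 1b: promise to ignore lower ballots; report prior accept."""
        if ballot > self.promised:
            self.promised = ballot
            return self.accepted
        return "nack"

    def accept(self, ballot, value):
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    """One proposer round; returns the value chosen, or None on failure."""
    # Phase 1a: send prepare to all acceptors and collect promises.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [p for p in promises if p != "nack"]
    if len(granted) <= len(acceptors) // 2:
        return None              # no majority of promises
    # Key safety rule: adopt the highest-ballot value already accepted.
    prior = [p for p in granted if p is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2a: ask the acceptors to accept the (possibly adopted) value.
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks > len(acceptors) // 2 else None
```

Running two proposers in sequence shows the safety rule at work: a later proposer with a higher ballot is forced to adopt the value already chosen, rather than its own.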

    I thought, and still think, that Paxos is an important algorithm. Inspired by my success at popularizing the consensus problem by describing it with Byzantine generals, I decided to cast the algorithm in terms of a parliament on an ancient Greek island. Leo Guibas suggested the name Paxos for the island. I gave the Greek legislators the names of computer scientists working in the field, transliterated with Guibas's help into a bogus Greek dialect. (Peter Ladkin suggested the title.) Writing about a lost civilization allowed me to eliminate uninteresting details and indicate generalizations by saying that some details of the parliamentary protocol had been lost. To carry the image further, I gave a few lectures in the persona of an Indiana-Jones-style archaeologist, replete with Stetson hat and hip flask.

    My attempt at inserting some humor into the subject was a dismal failure. People who attended my lecture remembered Indiana Jones, but not the algorithm. People reading the paper apparently got so distracted by the Greek parable that they didn't understand the algorithm. Among the people I sent the paper to, and who claimed to have read it, were Nancy Lynch, Vassos Hadzilacos, and Phil Bernstein. A couple of months later I emailed them the following question:

    Can you implement a distributed database that can tolerate the failure of any number of its processes (possibly all of them) without losing consistency, and that will resume normal behavior when more than half the processes are again working properly?

    None of them noticed any connection between this question and the Paxos algorithm.

    I submitted the paper to TOCS in 1990. All three referees said that the paper was mildly interesting, though not very important, but that all the Paxos stuff had to be removed. I was quite annoyed at how humorless everyone working in the field seemed to be, so I did nothing with the paper. A number of years later, a couple of people at SRC needed algorithms for distributed systems they were building, and Paxos provided just what they needed. I gave them the paper to read and they had no problem with it. Here is Chandu Thekkath's account of the history of Paxos at SRC.

    When Ed Lee and I were working on Petal, we needed some sort of commit protocol to make sure global operations in the distributed system completed correctly in the presence of server failures. We knew about 3PC and studied a description of it in Bernstein, Hadzilacos, and Goodman's book Concurrency Control and Recovery in Database Systems. We found the protocol a bit difficult to understand and therefore abandoned our attempts at implementing it. At around this time, Mike Schroeder told us about a protocol for consensus that Leslie Lamport had invented and suggested we ask him about it. Leslie gave Ed a copy of the Part-Time Parliament tech report, which we both enjoyed reading. I particularly liked its humour and to this day, cannot understand why people don't like that tech report. Paxos had all the necessary properties we wanted for our system and we figured we could implement it. Leslie provided essential consulting help as well, which resulted in the first implementation of the Paxos algorithm (including dynamic reconfiguration) as far as I am aware. A year later, when we needed a distributed lock server for the Frangipani file system, we used Paxos again.

    So, I thought that maybe the time had come to try publishing it again.

    Meanwhile, the one exception in this dismal tale was Butler Lampson, who immediately understood the algorithm's significance. He mentioned it in lectures and in a paper, and he interested Nancy Lynch in it. De Prisco, Lynch, and Lampson published their version of a specification and proof. Their papers made it more obvious that it was time for me to publish my paper. So, I proposed to Ken Birman, who was then the editor of TOCS, that he publish it. He suggested revising it, perhaps adding a TLA specification of the algorithm. But rereading the paper convinced me that the description and proof of the algorithm, while not what I would write today, were precise and rigorous enough. Admittedly, the paper needed revision to take into account the work that had been published in the intervening years. As a way of both carrying on the joke and saving myself work, I suggested that instead of my writing a revision, it be published as a recently rediscovered manuscript, with annotations by Keith Marzullo. Marzullo was willing, Birman agreed, and the paper finally appeared.

    There was an amusing typesetting footnote to this. To set off Marzullo's annotations, I decided that they should be printed on a gray background. ACM had recently acquired some wonderful new typesetting software, and TOCS was not accepting camera-ready copy. Unfortunately, their wonderful new software could not do shading. So, I had to provide camera-ready copy for the shaded text. Moreover, their brilliant software could accept this copy only in floating figures, so Marzullo's annotations don't appear quite where they should. Furthermore, their undoubtedly expensive software wasn't up to typesetting serious math. (After all, it's a computing journal, so why should they have to typeset formulas?) Therefore, I had to provide the camera-ready copy for the definitions of the invariants in section A2, which they inserted as Figure 3 in the published version. So, the fonts in that figure don't match those in the rest of the paper.

    This paper won an ACM SIGOPS Hall of Fame Award in 2012.

    Reduction in TLA (with Ernie Cohen)
    CONCUR'98 Concurrency Theory, David Sangiorgi and Robert de Simone editors. Lecture Notes in Computer Science, number 1466, (1998), 317-331.
    Postscript- Compressed Postscript- PDF
    Copyright 1998 by Springer-Verlag.
    Reduction is a method of deducing properties of a system by reasoning about a coarser-grained model--that is, one having larger atomic actions. Reduction was probably first used informally for reasoning about multiprocess programs to justify using the coarsest model in which each atomic operation accesses only a single shared variable. The term reduction was coined by Richard Lipton, who published the first paper on the topic. Reduction results have traditionally been based on an argument that the reduced (coarser-grained) model is in some sense equivalent to the original. For terminating programs that simply produce a result, equivalence just means producing the same result. But for reactive programs, it has been difficult to pin down exactly what equivalence means. TLA allowed me for the first time to understand the precise relation between the original and the reduced systems. In [95], I proved a result for safety specifications that generalized the standard reduction theorems. This result formulated reduction as a temporal theorem relating the original and reduced specifications--that is, as a property of individual behaviors. This formulation made it straightforward to extend the result to handle liveness, but I didn't get around to working on the extension until late in 1996.

    Meanwhile, Ernie Cohen had been working on reduction using Kleene algebra, obtaining elegant proofs of nice, general results for safety properties. I showed him the TLA version and my preliminary results on liveness, and we decided to collaborate. This paper is the result. We translated his more general results for safety into TLA and obtained new results for liveness properties. The paper promises a complete proof and a more general result in a later paper. The result exists, but the later paper is unlikely ever to appear. A draft of the complete proof is available as a Postscript, compressed Postscript, or PDF file.
    Composition: A Way to Make Proofs Harder
    Compositionality: The Significant Difference (Proceedings of the COMPOS'97 Symposium), Willem-Paul de Roever, Hans Langmaack, and Amir Pnueli editors. Lecture Notes in Computer Science, number 1536, (1998), 402-423.
    Postscript- Compressed Postscript- PDF
    Copyright 1998 by Springer-Verlag.
    Systems are complicated. We master their complexity by building them from simpler components. This suggests that to master the complexity of reasoning about systems, we should prove properties of the separate components and then combine those properties to deduce properties of the entire system. In concurrent systems, the obvious choice of component is the process. So, compositional reasoning has come to mean deducing properties of a system from properties of its processes.

    I have long felt that this whole approach is rather silly. You don't design a mutual exclusion algorithm by first designing the individual processes and then hoping that putting them together guarantees mutual exclusion. Similarly, anyone who has tried to deduce mutual exclusion from properties proved by considering the processes in isolation knows that it's the wrong way to approach the problem. You prove mutual exclusion by finding a global invariant, and then showing that each process's actions maintain the invariant. TLA makes the entire reasoning process completely mathematical--the specifications about which one reasons are mathematical formulas, and proving correctness means proving a single mathematical formula. A mathematical proof is essentially decompositional: you apply a deduction rule to reduce the problem of proving a formula to that of proving one or more simpler formulas.
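That global-invariant style of reasoning is easy to mechanize for small examples. The sketch below, invented for illustration (explicit-state enumeration in Python, not TLA), checks the mutual-exclusion invariant of Peterson's two-process algorithm by visiting every reachable state and every atomic action from it.

```python
# Check mutual exclusion for Peterson's algorithm by state enumeration.
from collections import deque

def steps(state):
    """Yield all successors; each process takes one atomic action.
    pc values: 0 = noncritical, 1 = set turn, 2 = wait, 3 = critical."""
    pc, flag, turn = state
    for i in (0, 1):
        j = 1 - i
        pc2, flag2 = list(pc), list(flag)
        if pc[i] == 0:                         # request entry: raise flag
            flag2[i] = True; pc2[i] = 1
            yield (tuple(pc2), tuple(flag2), turn)
        elif pc[i] == 1:                       # give priority to the other
            pc2[i] = 2
            yield (tuple(pc2), tuple(flag2), j)
        elif pc[i] == 2:                       # busy-wait guard
            if not flag[j] or turn == i:
                pc2[i] = 3
                yield (tuple(pc2), tuple(flag2), turn)
        elif pc[i] == 3:                       # leave the critical section
            flag2[i] = False; pc2[i] = 0
            yield (tuple(pc2), tuple(flag2), turn)

def check_mutex():
    """BFS over reachable states, asserting the invariant in each.
    Returns the number of reachable states examined."""
    init = ((0, 0), (False, False), 0)
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        pc, _, _ = s
        assert not (pc[0] == 3 and pc[1] == 3), "mutual exclusion violated"
        for t in steps(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return len(seen)
```

The check succeeds because the invariant holds of the initial state and is preserved by every action; note that the proof obligation is global (it quantifies over all reachable states), not per-process.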

    This paper explains why traditional compositional reasoning is just one particular, highly constrained way of decomposing the proof. In most cases, it's not a very natural way and results in extra work. This extra work is justified if it can be done by a computer. In particular, decomposition along processes makes sense if the individual processes are simple enough to be verified by model checking. TLA is particularly good for doing this because, as illustrated by [119], it allows a great deal of flexibility in choosing what constitutes a process.
    Proving Possibility Properties
    Theoretical Computer Science 206, 1-2, (October 1998), 341-352. Also appeared as SRC Research Report 137.
    Postscript- Compressed Postscript- PDF
    All copyrights reserved by Elsevier Science 1998.
    One never wants to assert possibility properties as correctness properties of a system. It's not interesting to know that a system might produce the correct answer. You want to know that it will never produce the wrong answer (safety) and that it eventually will produce an answer (liveness). Typically, possibility properties are used in branching-time logics that cannot express liveness. If you can't express the liveness property that the system must do something, you can at least show that the system might do it. In particular, process algebras typically can express safety but not liveness. But the trivial system that does nothing implements any safety property, so process algebraists usually rule out such trivial implementations by requiring bisimulation--meaning that the implementation allows all the same possible behaviors as the specification.

    People sometimes argue that possibility properties are important by using the ambiguities of natural language to try to turn a liveness property into a possibility property. For example, they may say that it should be possible for the user of a bank's ATM to withdraw money from his account. However, upon closer examination, you don't just want this to be possible. (It's possible for me to withdraw money from an ATM, even without having an account, if a medium-sized meteorite hits it.) The real condition is that, if the user presses the right sequence of buttons, then he must receive the money.

    Since there is no reason to prove possibility properties of a system, I was surprised to learn from Bob Kurshan--a very reasonable person--that he regularly uses his model checker to verify possibility properties. Talking to him, I realized that although verifying possibility properties tells you nothing interesting about a system, it can tell you something interesting about a specification, which is a mathematical model of the system. For example, you don't need to specify that a user can hit a button on the ATM, because you're specifying the ATM, not the user. However, we don't reason about a user interacting with the ATM; we reason about a mathematical model of the user and the ATM. If, in that mathematical model, it were impossible for the button to be pushed, then the model would be wrong. Proving possibility properties can provide sanity checks on the specification. So, I wrote this paper explaining how you can use TLA to prove possibility properties of a specification--even though a linear-time temporal logic like TLA cannot express the notion of possibility.
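This kind of sanity check amounts to plain reachability. The toy transition system below is entirely invented for illustration; it plays the role of the ATM specification, and showing that a dispensing state is reachable is exactly the sort of possibility check described above, done here by graph search rather than temporal logic.

```python
# Sanity-check a possibility property of a (toy) specification by
# searching for a reachable state that satisfies it.

def successors(state):
    """State is (screen, card_in); transitions of an invented toy ATM spec."""
    screen, card_in = state
    if screen == "idle" and not card_in:
        yield ("pin", True)            # user inserts a card
    if screen == "pin":
        yield ("menu", card_in)        # correct PIN entered
        yield ("idle", False)          # wrong PIN, card returned
    if screen == "menu":
        yield ("dispense", card_in)    # withdrawal selected
    if screen == "dispense":
        yield ("idle", False)          # cash and card handed back

def reachable(init, goal):
    """Return True iff some state satisfying `goal` is reachable from init."""
    seen, stack = {init}, [init]
    while stack:
        s = stack.pop()
        if goal(s):
            return True
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return False
```

If the dispensing state were unreachable in this model, that would indicate an error in the specification (for instance, a guard that can never be satisfied), which is precisely the kind of modeling bug the sanity check is meant to catch.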

    I originally submitted this paper to a different journal. However, the editor insisted that, to publish the paper, I had to add a discussion about the merits of branching-time versus linear-time logic. I strongly believe that it's the job of an editor to judge the paper that the author wrote, not to get him to write the paper that the editor wants him to. So, I appealed to the editor-in-chief. After receiving no reply for several months, I withdrew the paper and submitted it to TCS.

    A Lazy Caching Proof in TLA (with Peter Ladkin, Bryan Olivier, and Denis Roegel)
    Distributed Computing 12, 2/3, (1999), 151-174.
    Postscript- Compressed Postscript- PDF
    Copyright 1999 by Springer-Verlag.
    At some gathering (I believe it was the workshop where I presented [103]), Rob Gerth told me that he was compiling a collection of proofs of the lazy caching algorithm of Afek, Brown, and Merritt. I decided that this was a good opportunity to demonstrate the virtues of TLA, so there should be a TLA solution. In particular, I wanted to show that the proof is a straightforward engineering task, not requiring any new theory. I wanted to write a completely formal, highly detailed structured proof, but I didn't want to do all that dull work myself. So, I enlisted Ladkin, Olivier (who was then a graduate student of Paul Vitanyi in Amsterdam), and Roegel (who was then a graduate student of Dominique Mery in Nancy), and divided the task among them. However, writing a specification and proof is a process of continual refinement until everything fits together right. Getting this done in a long-distance collaboration is not easy, and we got tired of the whole business before a complete proof was written. However, we had done enough to write this paper, which contains specifications and a high-level overview of the proof.
    Specifying Concurrent Systems with TLA+
    Calculational System Design. M. Broy and R. Steinbrüggen, editors. IOS Press, Amsterdam, (1999), 183-247.
    Postscript- Compressed Postscript- PDF
    I was invited to lecture at the 1998 Marktoberdorf summer school. One reason I accepted was that I was in the process of writing a book on concurrency, and I could use the early chapters of the book as my lecture notes. However, I decided to put aside (perhaps forever) that book and instead write a book on TLA+. I was able to recycle much material from my original notes for the purpose. For the official volume of published notes for the course, I decided to provide this, which is a preliminary draft of the first several chapters of [144].
    TLA+ Verification of Cache-Coherence Protocols (with Homayoon Akhiani et al.)
    Rejected from Formal Methods '99 (February 1999).
    Postscript- Compressed Postscript- PDF
    Mark Tuttle, Yuan Yu, and I formed a small group applying TLA to verification problems at Compaq. Our two major projects, in which we have had other collaborators, have been verifications of protocols for two multiprocessor Alpha architectures. We thought it would be a good idea to write a paper describing our experience doing verification in industry. The FM'99 conference had an Industrial Applications track, to include "Experience reports [that] might describe a case study or industrial project where a formal method was applied in practice." So, we wrote this paper and submitted it. It was rejected. One of the referees wrote, "The paper is rather an experience report than a scientific paper." Our paper is indeed short on details, since neither system had been released at that time and almost all information about them was still company confidential. However, I think it still is worth reading if you're interested in what goes on in the industrial world.
    Should Your Specification Language Be Typed? (with Larry Paulson)
    ACM Transactions on Programming Languages and Systems 21, 3 (May 1999) 502-526. Also appeared as SRC Research Report 147.
    Postscript- Compressed Postscript- PDF
    Copyright © 1999 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library -- http://www.acm.org/dl/.
    In 1995, I wrote a diatribe titled Types Considered Harmful. It argued that, although types are good for programming languages, they are a bad way to formalize mathematics. This implies that they are bad for specification and verification, which should be mathematics rather than programming. My note apparently provoked some discussion, mostly disagreeing with it. I thought it might be fun to promote a wider discussion by publishing it, and TOPLAS was, for me, the obvious place. Andrew Appel, the editor-in-chief at the time, was favorably disposed to the idea, so I submitted it. Some of my arguments were not terribly sound, since I know almost nothing about type theory. The referees were suitably harsh, but Appel felt it would still be a good idea to publish a revised version along with rebuttals. I suggested that it would be better if the referees and I cooperated on a single balanced article presenting both sides of the issue. The two referees agreed to shed their anonymity and participate. Larry Paulson was one of the referees. It soon became apparent that Paulson and I could not work with the other referee, who was rabidly pro-types. (At one point, he likened his situation to someone being asked by a neo-Nazi to put his name on a "balanced" paper on racism.) So, Paulson and I wrote the paper by ourselves. We expected that the other referee would write a rebuttal, but he apparently never did.
    Model Checking TLA+ Specifications (with Yuan Yu and Panagiotis Manolios)
    In Correct Hardware Design and Verification Methods (CHARME '99), Laurence Pierre and Thomas Kropf editors. Lecture Notes in Computer Science, number 1703, Springer-Verlag, (September 1999) 54-66.
    Postscript- Compressed Postscript- PDF
    Copyright 1999 by Springer-Verlag.
    Despite my warning him that it would be impossible, Yuan Yu wrote a model checker for TLA+ specifications. He succeeded beyond my most optimistic hopes. This paper is a preliminary report on the model checker. I was an author, at Yu's insistence, because I gave him some advice on the design of the model checker (more useful advice than just "don't do it"). Manolios worked at SRC as a summer intern and contributed the state-compression algorithm that is described in the paper, but which ultimately was not used in the model checker.
    How (La)TeX changed the face of Mathematics
    Mitteilungen der Deutschen Mathematiker-Vereinigung 1/2000 (January 2000), 49-51.
    Postscript- Compressed Postscript- PDF
    Günther Ziegler interviewed me by email for this note, which delves into the history of LaTeX.
    Fairness and Hyperfairness
    Distributed Computing 13, 4 (2000), 239-245. Also appeared as SRC Research Report 152. A preliminary version of this paper was rejected by Concur99.
    Postscript- Compressed Postscript- PDF
    Copyright 2000 by Springer-Verlag.
    In 1993, Attie, Francez, and Grumberg published a paper titled Fairness and Hyperfairness in Multi-Party Interactions. This was a follow-up to the 1988 paper Appraising Fairness in Languages for Distributed Programming by Apt, Francez, and Katz, which attempted to define fairness. I have long believed that the only sensible formal definition of fairness is machine closure, which is essentially one of the conditions mentioned by Apt, Francez, and Katz. (They called it feasibility and phrased it as a condition on a language rather than on an individual specification.) I refereed the Attie, Francez, and Grumberg paper and found it rather uninteresting because it seemed to be completely language-dependent. They apparently belonged to the school, popular in the late 70s and early 80s, that equated concurrency with the CSP programming constructs. I wrote a rather unkind review of that paper, which obviously didn't prevent its publication. Years later, it suddenly struck me that there was a language-independent definition of hyperfairness--or more precisely, a language-independent notion that seemed to coincide with their definition on a certain class of CSP programs. I published this paper for three reasons: to explain the new definition of hyperfairness; to explain once again that fairness is machine closure and put to rest the other two fairness conditions of Apt, Francez, and Katz; and, in some small way, to make up for my unkind review of the Attie, Francez, and Grumberg paper.
    Archival References to Web Pages
    Ninth International World Wide Web Conference: Poster Proceedings (May 2000), page 74.
    Web Page
    On several occasions, I've had to refer to a web page in a published article. The problem is that articles remain on library shelves for many decades, while URLs are notoriously transitory. This short note describes a little trick of mine for referring to a web page by something more permanent than a URL. You can discover the trick by looking here. Although my idea is ridiculously simple and can be used by anyone right now, I've had a hard time convincing anyone to use it. (Because it's only a pretty good solution and not perfect, people prefer to do nothing and wait for the ideal solution that is perpetually just around the corner.) Since I was going to be in the neighborhood, I decided to try to publicize my trick with this poster at WWW9. Some people I spoke to there thought it was a nice idea, but I'm not optimistic that anyone will actually use it.

    Unfortunately, the version posted by the conference is missing a gif file. A version with the gif is available here.
    Disk Paxos (with Eli Gafni)
    Distributed Computing 16, 1 (2003) 1-20.
    Postscript- gzipped Postscript- PDF
    Copyright 2003 by Springer-Verlag.
    In 1998, Jim Reuter of DEC's storage group asked me for a leader-election algorithm for a network of processors and disks that they were designing. The new wrinkle to the problem was that they wanted a system with only two processors to continue to operate if either processor failed. We could assume that the system had at least three disks, so the idea was to find an algorithm that achieved fault tolerance by replicating disks rather than processors. I convinced them that they didn't want a leader-election protocol, but rather a distributed state-machine implementation (see [27]). At the time, Eli Gafni was on sabbatical from UCLA and was consulting at SRC. Together, we came up with the algorithm described in this paper, which is a disk-based version of the Paxos algorithm of [122].
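The core idea of achieving fault tolerance by replicating disks can be sketched in a few lines. This is a minimal, hypothetical illustration of the disk-access pattern (all names and data layout are invented here, not taken from the published algorithm): each processor owns a block on every disk, and it makes progress in a round by writing its own block to the disks and reading the other processors' blocks back, succeeding once a majority of disks has responded.

```python
# Hypothetical sketch of the Disk Paxos access pattern, not the published
# algorithm. Each of N_DISKS disks holds one block per processor.
N_DISKS = 3
PROCS = ["p1", "p2"]

# disks[d][p] is processor p's block on disk d (simulated in memory).
disks = [{p: None for p in PROCS} for _ in range(N_DISKS)]

def access_round(proc, block):
    """Write proc's block to every disk and read the other processors'
    blocks from each; succeed iff a majority of disks responds."""
    others_seen = []
    responding = 0
    for disk in disks:               # in a real system, some disks may be down
        disk[proc] = block           # write our own block
        others_seen.append({p: disk[p] for p in PROCS if p != proc})
        responding += 1
    return others_seen if responding > N_DISKS // 2 else None

result = access_round("p1", {"ballot": 1, "value": "x"})
```

Because only a majority of disks must respond, the system keeps running with a single processor and any majority of the disks, which is what lets fault tolerance come from replicating disks rather than processors.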

    Gafni devised the initial version of the algorithm, which didn't look much like Paxos. As we worked out the details, it evolved into its current form. Gafni wanted a paper on the algorithm to follow the path with which the algorithm had been developed, starting from his basic idea and deriving the final version by a series of transformations. We wrote the first version of the paper in this way. However, when trying to make it rigorous, I found that the transformation steps weren't as simple as they had appeared. I found the resulting paper unsatisfactory, but we submitted it anyway to PODC'99, where it was rejected. Gafni was then willing to let me do it my way, and I turned the paper into its current form.

    A couple of years after the paper was published, Mauro J. Jaskelioff encoded the proof in Isabelle/HOL and mechanically checked it. He found about a dozen small errors. Since I have been proposing Disk Paxos as a test example for mechanical verification of concurrent algorithms, I have decided not to update the paper to correct the errors he found. Anyone who writes a rigorous mechanically-checked proof will find them.

    Disk Paxos (Conference Version) (with Eli Gafni)
    Distributed Computing: 14th International Conference, DISC 2000, Maurice Herlihy, editor. Lecture Notes in Computer Science number 1914, Springer-Verlag, (2000) 330-344.
    Postscript- Compressed Postscript- PDF
    Copyright 2000 by Springer-Verlag.
    This is the abridged conference version of [134].
    When Does a Correct Mutual Exclusion Algorithm Guarantee Mutual Exclusion? (with Sharon Perl and William Weihl)
    Information Processing Letters 76, 3 (March 2000), 131-134.
    Postscript- Compressed Postscript- PDF
    All copyrights reserved by Elsevier Science 2000.
    Mutual exclusion is usually defined to mean that two processes are not in their critical section at the same time. Something Dan Scales said during a conversation made me suddenly realize that conventional mutual exclusion algorithms do not satisfy that property. I then conjectured how that property could be satisfied, and Perl and Weihl proved that my conjecture was correct. This paper explains why mutual exclusion had not previously been achieved, and how to achieve it--all in less than five pages.
    Lower Bounds on Consensus
    Unpublished note (March 2000).
    Postscript- Compressed Postscript- PDF
    This short note is described by its abstract:

    We derive lower bounds on the number of messages and the number of message delays required by a nonblocking fault-tolerant consensus algorithm, and we show that variants of the Paxos algorithm achieve those bounds.

    I sent it to Idit Keidar, who told me that the bounds I derived were already known, so I forgot about it. About a year later, she mentioned that she had cited the note in:

    Idit Keidar and Sergio Rajsbaum. On the Cost of Fault-Tolerant Consensus When There Are No Faults - A Tutorial. MIT Technical Report MIT-LCS-TR-821, May 24 2001. Preliminary version in SIGACT News 32(2), Distributed Computing column, pages 45-63, June 2001 (published in May 15th).

    I then asked why they had cited my note if the results were already known. She replied,

    There are a few results in the literature that are similar, but not identical, because they consider slightly different models or problems. This is a source of confusion for many people. Sergio and I wrote this tutorial in order to pull the different known results together. Hopefully, it can help clarify things up.


    The Wildfire Challenge Problem (with Madhu Sharma, Mark Tuttle, and Yuan Yu)
    Rejected from CAV 2001 (January 2001).
    Postscript- Compressed Postscript- PDF
    From late fall 1996 through early summer 1997, Mark Tuttle, Yuan Yu, and I worked on the specification and verification of the cache-coherence protocol for a computer code-named Wildfire. We worked closely with Madhu Sharma, one of Wildfire's designers. We wrote a detailed specification of the protocol as well as a specification of the memory model that it was supposed to implement. We then proved various properties, but did not attempt a complete proof. In early 2000, Madhu, Mark, and I wrote a specification of a higher-level abstract version of the protocol.

    There was one detail of the protocol that struck me as particularly subtle. I had the idea of publishing an incorrect version of the specification with that detail omitted as a challenge problem for the verification community. I did that and put it on the Web in June 2000. To further disseminate the problem, we wrote this description of it for the CAV (Computer Aided Verification) conference.
    Paxos Made Simple
    ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001) 51-58.
    Postscript- Compressed Postscript- PDF
    Copyright © 2001 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library -- http://www.acm.org/dl/.
    At the PODC 2001 conference, I got tired of everyone saying how difficult it was to understand the Paxos algorithm, published in [122]. Although people got so hung up on the pseudo-Greek names that they found the paper hard to understand, the algorithm itself is very simple. So, I cornered a couple of people at the conference and explained the algorithm to them orally, with no paper. When I got home, I wrote down the explanation as a short note, which I later revised based on comments from Fred Schneider and Butler Lampson. The current version is 13 pages long, and contains no formula more complicated than n1 > n2.

    In 2015, Michael Deardeuff of Amazon informed me that one sentence in this paper is ambiguous, and interpreting it the wrong way leads to an incorrect algorithm. Deardeuff found that a number of Paxos implementations on GitHub implemented this incorrect algorithm. Apparently, the implementors did not bother to read the precise description of the algorithm in [122]. I am not going to remove this ambiguity or reveal where it is. Prose is not the way to precisely describe algorithms. Do not try to implement the algorithm from this paper. Use [122] instead.
    Specifying and Verifying Systems with TLA+ (with John Matthews, Mark Tuttle, and Yuan Yu)
    Proceedings of the Tenth ACM SIGOPS European Workshop (2002), 45-48.
    Postscript- gzipped Postscript- PDF
    This describes our experience at DEC/Compaq using TLA+ and the TLC model checker on several systems, mainly cache-coherence protocols. It is shorter and more up-to-date than [128].
    Arbiter-Free Synchronization
    Distributed Computing 16, 2/3, (2003) 219-237.
    Postscript- gzipped Postscript- PDF
    Copyright 2003 by Springer-Verlag.
    With the bakery algorithm of [12], I discovered that mutual exclusion, and hence all conventional synchronization problems, could be solved with simple read/write registers. However, as recounted in the description of [22], such a register requires an arbiter. This leads to the question: what synchronization problems can be solved without an arbiter? Early on, I devised a more primitive kind of shared register that can be implemented without an arbiter, and I figured out how to solve the producer/consumer problem with such registers. I think that hardware designers working on self-timed circuits probably already knew that producer/consumer synchronization could be implemented without an arbiter. (If not, they must have figured it out at about the same time I did.) Hardware people used Muller C-elements instead of my shared registers, but it would have been obvious to them what I was doing.

    In Petri nets, arbitration appears explicitly as conflict. A class of Petri nets called marked graphs, which were studied in the early 70s by Anatol Holt and Fred Commoner, are the largest class of Petri nets that are syntactically conflict-free. Marked-graph synchronization is a natural generalization of producer/consumer synchronization. It was clear to me that marked-graph synchronization can be implemented without an arbiter, though I never bothered writing down the precise algorithm. I assumed that marked graphs describe precisely the class of synchronization problems that could be solved without an arbiter.
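The reason producer/consumer synchronization needs no arbiter is that every shared variable can be given a single writer, so no two processes ever race to write the same thing. A minimal sketch of that idea (an illustration I am supplying here, not Lamport's register construction or a marked-graph algorithm) is the classic single-producer/single-consumer ring buffer:

```python
# Sketch of arbiter-free producer/consumer synchronization: each shared
# counter has exactly one writer, so no write-write arbitration is needed.
class SPSCQueue:
    def __init__(self, size):
        self.buf = [None] * size
        self.size = size
        self.head = 0   # written only by the consumer
        self.tail = 0   # written only by the producer

    def put(self, item):                      # called by the producer
        if self.tail - self.head == self.size:
            return False                      # full; try again later
        self.buf[self.tail % self.size] = item
        self.tail += 1                        # single writer: no arbiter
        return True

    def get(self):                            # called by the consumer
        if self.head == self.tail:
            return None                       # empty
        item = self.buf[self.head % self.size]
        self.head += 1                        # single writer: no arbiter
        return item

q = SPSCQueue(2)
q.put("a")
q.put("b")
```

Each process only reads the other's counter and writes its own, which is exactly the kind of one-way communication a Muller C-element style circuit can implement without arbitration.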

    That marked-graph synchronization can be implemented without an arbiter is undoubtedly obvious to people like Anatol Holt and Chuck Seitz, who are familiar with multiprocess synchronization, Petri nets, and the arbiter problem. However, such people are a dying breed. So, I thought I should write up this result before it was lost. I had been procrastinating on this for years when I was invited to submit an article for a special issue of Distributed Computing celebrating the 20th anniversary of the PODC conference. The editors wanted me to pontificate for a few pages on the past and future of distributed computing--something I had no desire to do. However, it occurred to me that it would be fitting to contribute some unpublished 25-year-old work. So, I decided to write about arbiter-free synchronization.

    Writing the paper required me to figure out the precise arbiter-free implementation of marked graphs, which wasn't hard. It also required me to prove my assumption that marked graphs were all one could implement without an arbiter. When I tried, I realized that my assumption was wrong. There are multiprocess synchronization problems not describable by marked graphs that can be solved without an arbiter. The problem was more complicated than I had realized.

    I wish I knew exactly what can be done without an arbiter, but I don't. It turns out that I don't really understand arbiter-free synchronization. Lack of understanding leads to messy exposition. I understand the results about the equivalence of registers, and I have nice, crisp ways of formalizing and proving these results. But the other results in the paper are a mess. They're complicated and I don't have a good way of formalizing them. Had I written this twenty-five years ago, I would probably have kept working on the problem before publishing anything. But I don't have the time I once did for mulling over hard problems. I decided it was better to publish the results I have, even though they're messy, and hope that someone else will figure out how to do a better job.

    A Discussion With Leslie Lamport
    An interview in IEEE Distributed Systems Online 3, 8 (Web Page).
    PDF
    Copyright © 2002. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
    In the spring of 2002, Dejan Milojicic proposed interviewing me for an IEEE on-line publication. He volunteered to send me the questions in advance, and to send me the transcript afterwards for correction. This seemed pretty silly, so I just wrote my answers. The "interview" was conducted by a few email exchanges.
    Lower Bounds for Asynchronous Consensus
    Future Directions in Distributed Computing, André Schiper, Alex A. Shvartsman, Hakim Weatherspoon, and Ben Y. Zhao, editors. Lecture Notes in Computer Science number 2584, Springer, (2003) 22-23.
    Postscript- gzipped Postscript- PDF
    Copyright 2003 by Springer-Verlag.
    The FuDiCo (Future Directions in Distributed Computing) workshop was held in a lovely converted monastery outside Bologna. I was supposed to talk about future directions, but restrained my natural inclination to pontificate and instead presented some new lower-bound results. The organizers wanted to produce a volume telling the world about the future of distributed computing research, so everyone was supposed to write a five-page summary of their presentations. I used only two pages. Since I didn't have rigorous proofs of my results, and I expected to discover special-case exceptions, I called them approximate theorems. This paper promises that future papers will give precise statements and proofs of the theorems, and algorithms showing that the bounds are tight. Despite my almost perfect record of never writing promised future papers, I actually wrote up the case of non-Byzantine failures in [153]. I intend some day to write another paper with the general results for the Byzantine case. Really.
    Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers
    Addison-Wesley (2002).
    Available On-Line
    The complete book of TLA+. The first seven chapters (83 pages) are a rewritten version of [127]. That and the chapter on the TLC model checker are about as much of the book as I expect people to read. The web page contains errata and some exercises and examples.
    Checking Cache-Coherence Protocols with TLA+ (with Rajeev Joshi, John Matthews, Serdar Tasiran, Mark Tuttle, and Yuan Yu)
    Formal Methods in System Design 22, 2 (March 2003) 125-131.
    Postscript- Compressed Postscript- PDF
    All copyrights reserved by Kluwer Academic 2003.
    Yet another report on the TLA+ verification activity at Compaq. It mentions some work that's been done since we wrote [140].
    High-Level Specifications: Lessons from Industry (with Brannon Batson)
    Formal Methods for Components and Objects, Frank S. de Boer, Marcello M. Bonsangue, Susanne Graf, and Willem-Paul de Roever, editors. Lecture Notes in Computer Science number 2852, Springer, (2003) 242-262.
    Postscript- gzipped Postscript- PDF
    Copyright 2003 by Springer-Verlag.
    I was invited to speak about TLA at the FMCO symposium. I didn't feel that I had anything new to say, so I asked Brannon Batson, who was then at Intel, to help me prepare the talk and the paper. Brannon is a hardware designer who started using TLA+ while at Compaq and has continued using it at Intel. The most interesting part of this paper is Section 4, which is mainly devoted to Brannon's description of how his group is using TLA+ in their design process. Section 5 was inspired by the symposium's call for papers, whose list of topics included such fashionable buzzwords as "object-oriented", "component-based", and "information hiding". It explains why those concepts are either irrelevant to or are a bad idea for high-level specification.
    The Future of Computing: Logic or Biology
    Text of a talk given at Christian Albrechts University, Kiel on 11 July 2003.
    Postscript- gzipped Postscript- PDF
    I was invited to give a talk to a general university audience at Kiel. Since I'm not used to giving this kind of non-technical talk, I wrote it out in advance and read it. Afterwards, I received several requests for a copy to be posted on the Web. So, here it is.
    Consensus on Transaction Commit (with Jim Gray)
    ACM Transactions on Database Systems 31, 1 (2006), 133-160. Also appeared as Microsoft Research Technical Report MSR-TR-2003-96 (February 2005).
    Available On-Line
    In [143], I announced some lower-bound results for the consensus problem. One result states that two message delays are required to choose a value, and a relatively large number of processors are needed to achieve that bound. When writing a careful proof of this result, I realized that it required the hypothesis that values proposed by two different processors could be chosen in two message delays. This led me to realize that fewer processors were needed if there were only one processor whose proposed value could be chosen in two message delays, and values proposed by other processors took longer to be chosen. In fact, a simple modification to the Paxos algorithm of [122] accomplished this.

    I then looked for applications of consensus in which there is a single special proposer whose proposed value needs to be chosen quickly. I realized there is a "killer app"--namely, distributed transaction commit. Instead of regarding transaction commit as one consensus problem that chooses the single value commit or abort, it could be presented as a set of separate consensus problems, each choosing the commit/abort desire of a single participant. Each participant then becomes the special proposer for one of the consensus problems. This led to what I call the Paxos Commit algorithm. It is a fault-tolerant (non-blocking) commit algorithm that I believed had fewer message delays in the normal (failure-free) case than any previous algorithm. I later learned that an algorithm published by Guerraoui, Larrea, and Schiper in 1996 had the same normal-case behavior.

    Several months later, Jim Gray and I got together to try to understand the relation between Paxos and the traditional Two-Phase Commit protocol. After a couple of hours of head scratching, we figured out that Two-Phase Commit is the trivial version of Paxos Commit that tolerates zero faults. That realization and several months of procrastination led to this paper, which describes the Two-Phase Commit and Paxos Commit algorithms and compares them. It also includes an appendix with TLA+ specifications of the transaction-commit problem and of the two algorithms.
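The structure described above can be sketched very compactly. In this hypothetical fragment (names and value encoding are mine, not from the paper), each participant is the special proposer for its own consensus instance, and the transaction commits only if every instance chooses that participant's Prepared vote:

```python
# Sketch of the Paxos Commit decision rule: one consensus instance per
# participant; the transaction commits iff every instance chooses
# "prepared". With zero tolerated faults this degenerates to the
# coordinator's decision rule in Two-Phase Commit.
def commit_decision(chosen_votes):
    """chosen_votes maps each participant to the value its consensus
    instance chose: 'prepared' or 'aborted'."""
    if all(v == "prepared" for v in chosen_votes.values()):
        return "commit"
    return "abort"
```

The fault tolerance comes entirely from running each instance with Paxos: the commit/abort rule itself is the same trivial conjunction used by Two-Phase Commit.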

    On Hair Color in France (with Ellen Gilkerson)
    Annals of Improbable Research, Jan/Feb 2004, 18-19.
    Postscript- gzipped Postscript- PDF
    While traveling in France, Gilkerson and I observed many blonde women, but almost no blonde men. Suspecting that we had stumbled upon a remarkable scientific discovery, we endured several weeks of hardship visiting the auberges and restaurants of France to gather data. After several years of analysis and rigorous procrastination, we wrote this paper. Much of our magnificent prose was ruthlessly eliminated by the editor to leave space for less important research.
    Formal Specification of a Web Services Protocol (with James E. Johnson, David E. Langworthy, and Friedrich H. Vogt)
    Electronic Notes in Theoretical Computer Science 105, M. Bravetti and G. Zavattaro editors. (December 2004) 147-158.
    Postscript- gzipped Postscript- PDF
    Fritz Vogt spent part of a sabbatical at our lab during the summer and fall of 2003. I was interested in getting TLA+ used in the product groups at Microsoft, and Fritz was looking for an interesting project involving distributed protocols. Through his contacts, we got together with Jim Johnson and Dave Langworthy, who work on Web protocols at Microsoft in Redmond. Jim and Dave were interested in the idea of formally specifying protocols, and Jim suggested that we look at the Web Services Atomic Transaction protocol as a simple example. Fritz and I spent part of our time for a couple of months writing it, with a lot of help from Jim and Dave in understanding the protocol. This paper describes the specification and our experience writing it. The specification itself can be found by clicking here. This was a routine exercise for me, as it would have been for anyone with a moderate amount of experience specifying concurrent systems. Using TLA+ for the first time was a learning experience for Fritz. It was a brand new world for Jim and Dave, who had never been exposed to formal methods before. They were happy with the results. Dave began writing specifications by himself, and has become something of a TLA+ guru for the Microsoft networking group. We submitted this paper to WS-FM 2004 as a way of introducing the Web services community to formal methods and TLA+.

    References [4] and [5] are no longer at the URLs given in the reference list. Reference [5] no longer seems to be on the Web. Reference [4] seems to be the document now available here. However, that paper seems to be a revised version of the one on which we based our formal spec, and it could have been influenced by our spec.
    Cheap Paxos (with Mike Massa)
    Proceedings of the International Conference on Dependable Systems and Networks (DSN 2004), held in Florence in June-July 2004.
    Postscript- gzipped Postscript- PDF
    Copyright © 2004. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
    A system that can tolerate a single non-Byzantine failure requires three processors. It has long been realized that only two of those processors need to maintain the system state, but the third processor must take part in every decision to maintain consistency. Mike Massa, at the time an engineer at Microsoft, observed that if we weaken the fault-tolerance guarantee, then the third processor needs to be used only in the case of failure or repair of one of the other two processors. The third processor can then be a less powerful machine or a process run occasionally on a computer devoted to other tasks. I generalized his idea to a variation of the Paxos algorithm of [122] called Cheap Paxos that tolerates up to f failures with f+1 main processors and f auxiliary ones. A paper on this algorithm was rejected from the PODC and DISC conferences. Most of the referees thought that it just presented the old idea that only the main processors need to maintain the system state, not realizing that it differed from the old approach because the remaining f processors need not take part in every decision. One review contained a silly assertion that it was easy to solve the problem in a certain way. When trying to prove his or her assertion false, I discovered a somewhat simpler version of Cheap Paxos that achieved the same result as the original version. (The new algorithm wasn't at all what the referee said should be done, which was impossible.) This paper describes the simpler algorithm. The original algorithm actually has some advantage over the new one, but it remains unpublished.
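The processor arithmetic above can be made concrete with a small sketch (my own illustration of the counting argument, not code from the paper): with 2f+1 processors in total, any f+1 of them form a majority quorum, and in particular the f+1 main processors alone do, which is why the f auxiliary processors sit idle until a main processor fails or is being repaired.

```python
# Sketch of the quorum counting behind Cheap Paxos: 2f+1 processors in
# total, but the f+1 main processors by themselves form a majority, so
# the f auxiliary processors participate only during failure or repair.
def is_quorum(group, f):
    """A quorum is any majority of the 2f+1 processors, i.e. f+1 of them."""
    return len(group) >= f + 1

f = 2
main = {"m0", "m1", "m2"}                         # f+1 main processors
aux = {"a0", "a1"}                                # f auxiliary processors
normal_case = is_quorum(main, f)                  # main processors suffice
after_failure = is_quorum({"m0", "m1", "a0"}, f)  # an auxiliary steps in
```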
    Implementing and Combining Specifications
    Unpublished note (September 2004).
    PDF
    I wrote this note to help some Microsoft developers understand how they could write TLA+ specifications of the software they were designing. Their biggest problem was figuring out how to specify an API (Application Programming Interface) in TLA+, since there were no published examples of such specifications. The note also explains two other things they didn't understand: what it means to implement an API specification and how to use an API specification in specifying a system that calls the API.
    Lower Bounds for Asynchronous Consensus
    Distributed Computing 19, 2 (2006), 104-125. Also appeared as Microsoft Research Technical Report MSR-TR-2004-72 (July 2004, revised August 2005).
    PDF
    Copyright 2006 by Springer-Verlag.
    This paper contains the precise statements and proofs of the results announced in [143] for the non-Byzantine case. It also includes another result showing that a completely general consensus algorithm cannot be faster than the Paxos algorithm of [122] in the presence of conflicting requests. However, there are two exceptional cases in which this result does not hold, and the paper presents potentially useful optimal algorithms for both cases.
    Generalized Consensus and Paxos
    Microsoft Research Technical Report MSR-TR-2005-33 (15 March 2005).
    Available On-Line
    In [153], I proved lower bounds for the number of message delays required to reach consensus. I showed that the best algorithms can reach consensus in the normal case in 2 message delays. This result in turn led me to a new version of the Paxos algorithm of [122] called Fast Paxos, described in [158], that achieves this bound. However, Fast Paxos can take 3 message delays in the event of conflict, when two values are proposed concurrently. I showed in [153] that this was unavoidable in a general algorithm, so this seemed to be the last word.

    It then occurred to me that, in the state-machine approach (introduced in [27]), such conflicting proposals arise because two different commands are issued concurrently by two clients, and both are proposed as command number i. This conflict is necessary only if the two proposed commands do not commute. If they do, then there is no need to order them. This led me to a new kind of agreement problem that requires dynamically changing agreement on a growing partially ordered set of commands. I realized that generalizing from partially ordered sets of commands to a new mathematical structure I call a c-struct leads to a generalized consensus problem that covers both ordinary consensus and this new dynamic agreement problem. I also realized that Fast Paxos can be generalized to solve this new problem. I wrote up these results in March 2004. However, I was in the embarrassing position of having written a paper generalizing Fast Paxos without having written a paper about Fast Paxos. So, I just let the paper sit on my disk.
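    The commutativity test that decides whether two commands conflict can be sketched in a few lines (my toy example, not the paper's c-struct formalism): two commands commute when applying them in either order yields the same state.

```python
# Toy illustration (my example): two commands must be ordered by
# consensus only if applying them in different orders can produce
# different states. Commands that commute need no relative order.
def commute(cmd_a, cmd_b, state):
    s1 = cmd_b(cmd_a(dict(state)))
    s2 = cmd_a(cmd_b(dict(state)))
    return s1 == s2

def deposit(amount, acct):
    def cmd(state):
        state[acct] = state.get(acct, 0) + amount
        return state
    return cmd

def set_balance(amount, acct):
    def cmd(state):
        state[acct] = amount
        return state
    return cmd

state = {"alice": 10, "bob": 20}

# Two deposits to the same account commute: addition is commutative.
assert commute(deposit(5, "alice"), deposit(7, "alice"), state)

# Two overwrites of the same account conflict: the last writer wins.
assert not commute(set_balance(5, "alice"), set_balance(7, "alice"), state)
```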

    I was invited to give a keynote address at the 2004 DSN conference, and I decided to talk about fast and generalized Paxos. Fernando Pedone came up after my talk and introduced himself. He said that he and André Schiper had already published a paper with the same generalization from the command sequences of the state-machine approach to partially ordered sets of commands, together with an algorithm that achieved the same optimal number of message delays in the absence of conflict. It turns out that their algorithm is different from the generalized Paxos algorithm. There are cases in which generalized Paxos takes only 2 message delays while their algorithm takes 3. But the difference in efficiency between the two algorithms is insignificant. The important difference is that generalized Paxos is more elegant.

    I've been sitting on this paper for so long because it doesn't seem right to publish a paper on a generalization of Fast Paxos before publishing something about Fast Paxos itself. Since generalized Paxos is a generalization, this paper also explains Fast Paxos. But people's minds don't work that way. They need to understand Fast Paxos before they can really understand its generalization. So, I figured I would turn this paper into the second part of a long paper or monograph whose first part explains Fast Paxos. However, in recent years I've been discovering new Paxonian results faster than I can write them up. It therefore seems silly not to release a paper that I've already written about one of those results. So, I added a brief discussion of the Pedone-Schiper result and a citation to [153] and am posting the paper here. Now that I have written the Fast Paxos paper and submitted it for publication, I may rewrite this paper as part two of that one.
    Real Time is Really Simple
    Microsoft Research Technical Report MSR-TR-2005-30 (4 March 2005). Rejected by Formal Methods in Systems Design.
    Available On-Line
    It should be quite obvious that no special logic or language is needed to write or reason about real-time specifications. There's a simple way to do it: just use a variable to represent time. Martín Abadi and I showed in [106] that this can be done very elegantly in TLA. A simpler, more straightforward approach works with any sensible formal method, but it's too simple and obvious to publish. So instead, hundreds of papers and theses have been written about special real-time logics and languages--even though, for most purposes, there's no reason to use them instead of the simple, obvious approach. And since no one writes papers about the simple way of handling real time, people seem to assume that they need to use a real-time logic. Naturally, I find this rather annoying. So when I heard that a computer scientist was planning to write a book about one of these real-time logics, I decided it was time to write another paper explaining the simple approach.

    Since you can't publish a new paper about an old idea, no matter how good the idea may be, I needed to find something new to add. The TLC model checker provided the opportunity I needed. The method described in [106] and [144] is good for specifying and reasoning about real-time systems, but it produces specifications that TLC can't handle. TLC works only with the simpler approach, so I had an excuse for a new paper.

    There's a naive approach for checking real-time specifications with TLC that I had thought for a while about trying. It involves checking the specification for all runs up to some maximum time value that one hopes is large enough to find any bugs. So I did that using as examples two versions of Fischer's mutual exclusion protocol, which is mentioned in the discussion of [106].
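    The "time is just a variable" idea and the naive bounded check are easy to sketch. The example below is my own toy timer specification, not the paper's Fischer model: time advances as an ordinary state variable `now`, and a breadth-first search explores every reachable state with `now` at most some maximum, checking an invariant in each.

```python
from collections import deque

# A minimal sketch (my example, not the paper's Fischer protocol):
# `now` is just another state variable, and the naive check explores
# all reachable states up to MAX_TIME, verifying an invariant in each.
DEADLINE, MAX_TIME = 3, 10

def successors(state):
    now, fired = state
    if now < MAX_TIME:
        yield (now + 1, fired)           # time advances by one tick
    if not fired and now >= DEADLINE:
        yield (now, True)                # the timer may fire once due

def invariant(state):
    now, fired = state
    return not fired or now >= DEADLINE  # the timer never fires early

def check(init):
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        assert invariant(state), f"invariant violated in {state}"
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

print(check((0, False)))   # number of states explored up to MAX_TIME
```

As the text notes, the weakness of this approach is the hope that MAX_TIME is large enough: a bug that first appears after the time bound is silently missed.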

    One possible reason to use a special real-time approach is for model checking. I figured that model checkers using special algorithms for real time should do much better than this naive approach, so I wanted some justification for using TLA+ and TLC instead. Looking through the literature, I found that all the real-time model checkers seemed to use low-level languages that could describe only simple controllers. So I added the example of a distributed algorithm that they couldn't represent. Then I discovered that, since the papers describing it had been published, the Uppaal model checker had been enhanced with new language features that enabled it to model this algorithm. This left me no choice but to compare TLC with Uppaal on the example.

    I asked Kim Larsen of Aalborg University, the developer of Uppaal, for help writing an Uppaal spec of the algorithm. Although I really did this because I'm lazy, I could justify my request because I had never used Uppaal and couldn't see how to write a nice model with it. Larsen got his colleague Arne Skou to write a model that was quite nice, though it did require a bit of a "hack" to encode the high-level constructs of TLA+ in Uppaal's lower-level language. Skou was helped by Larsen and his colleague, Gerd Behrmann. As I expected, Uppaal was much faster than TLC--except in one puzzling case in which Uppaal ran out of memory.

    I put the paper aside for a while. When I got back to it, I realized that there's a simple way of using TLC to do complete checking of these real-time specifications that is much faster than what I had been doing. The idea is so simple that I figured it was well known, and I kicked myself for not seeing it right away. I checked with Tom Henzinger, who informed me that the method was known, but it had apparently not been published. It seems to be an idea that is obvious to the experts and unknown to others. So this provided another incentive for publishing my paper, with a new section on how to use an explicit-time model checker like TLC to check real-time specs. Henzinger also corrected a basic misunderstanding I had about real-time model checkers. Rather than trying to be faster, most of them try to be better by using continuous time. He wrote:

    If you are happy with discrete time, I doubt you can do any better [than my naive approach]. Uppaal, Kronos etc. deal with real-numbered time, and therefore rely on elaborate and expensive clock region constructions.

    I was then inspired to do some more serious data gathering. I discovered the explanation of that puzzling case: Uppaal runs out of memory when the ratio of two parameters becomes too large. The results reported in the paper show that neither TLC nor Uppaal comes out clearly better on this example.

    The Uppaal distribution comes with a model of a version of Fischer's algorithm, and I decided to get some data for that example too. Uppaal did clearly better than TLC on it. However, I suspected that the reason was not because real-time model checkers are better, but because TLC is less efficient for this kind of simple algorithm than a model checker that uses a lower-level language. So I got data for two ordinary model checkers that use lower-level languages, Spin and SMV. I was again lazy and got the developers of those model checkers, Gerard Holzmann and Ken McMillan, to do all the work of writing and checking the models.

    I submitted this paper to the journal Formal Methods in Systems Design. I thought that the part about model checking was interesting enough to be worth putting into a separate conference paper. I therefore wrote [157], which was accepted at the 2005 Charme conference. However, the journal submission was rejected because it didn't contain enough new ideas.
    How Fast Can Eventual Synchrony Lead to Consensus? (with Partha Dutta and Rachid Guerraoui)
    Proceedings of the International Conference on Dependable Systems and Networks (DSN 2005).
    Postscript- gzipped Postscript- PDF
    During a visit I made to the EPFL in March 2004, Dutta and Guerraoui explained a problem they were working on. Asynchronous consensus algorithms like Paxos [122] maintain safety despite asynchrony, but are guaranteed to make progress only when the system becomes synchronous--meaning that messages are delivered in a bounded length of time. Dutta and Guerraoui were looking for an algorithm that always reaches agreement within a constant number of message delays after the system becomes synchronous. This is a hard problem only if messages sent before the system becomes synchronous can be delivered arbitrarily far in the future. I took the solution they had come up with and combined it with Paxos to obtain the algorithm described in this paper. It's a nice solution to a mildly interesting theoretical problem with no apparent practical application. As I recall, I wanted to include a sentence in the paper saying this, but my co-authors sensibly pointed out that doing so would ensure the paper's rejection. (My co-authors don't remember this.) Computer scientists in this field must keep up the pretense that everything they do is practical.
    Real-Time Model Checking is Really Simple
    Correct Hardware Design and Verification Methods (CHARME 2005), Dominique Borrione and Wolfgang J. Paul, editors. Springer-Verlag Lecture Notes in Computer Science Volume 3725 (2005), 162-175.
    PDF
    This is an abridged version of [155], containing only the material on model checking.
    Fast Paxos
    Distributed Computing 19, 2 (October 2006), 79-103. Also appeared as Microsoft Research Technical Report MSR-TR-2005-112 (14 July 2005).
    Available On-Line
    The Paxos consensus algorithm of [122] requires two message delays between when the leader proposes a value and when other processes learn that the value has been chosen. Since inventing Paxos, I had thought that this was the optimal message delay. However, sometime in late 2001 I realized that in most systems that use consensus, values aren't picked out of the air by the system itself; instead, they come from clients. When one counts the message from the client, Paxos requires three message delays. This led me to wonder whether consensus in two message delays, including the client's message, was in fact possible. I proved the lower-bound result announced in [143] that an algorithm that can make progress despite f faults and can achieve consensus in two message delays despite e faults requires more than 2e+f processes. The proof of that result led me pretty quickly to the Fast Paxos algorithm described here. Fast Paxos generalizes the classic Paxos consensus algorithm. It can switch between learning in two or three message delays depending on how many processes are working. More precisely, it can achieve learning in two message delays only in the absence of concurrent conflicting proposals, which [153] shows is the best a general algorithm can do.
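    The 2e+f bound is easy to tabulate. The helper below is my own framing of the stated result, not code from the paper: it computes the smallest number of processes consistent with the lower bound for given fault parameters.

```python
# The stated lower bound: an algorithm that makes progress despite f
# faults and learns in two message delays despite e faults requires
# more than 2e + f processes. (My helper, not code from the paper.)
def min_processes(e, f):
    """Smallest n satisfying n > 2e + f."""
    assert 0 <= e <= f, "fast-path fault tolerance e cannot exceed f"
    return 2 * e + f + 1

# Tolerating f = 1 fault with fast learning despite e = 1 fault already
# requires 4 processes; making every tolerated fault fast (e = f)
# pushes the requirement to 3f + 1 processes.
assert min_processes(1, 1) == 4
assert all(min_processes(f, f) == 3 * f + 1 for f in range(1, 6))
```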
    Measuring Celebrity
    Annals of Improbable Research, Jan/Feb 2006, 14-15.
    PDF
    In September 2005, I had dinner with Andreas Podelski, who was visiting Microsoft's Cambridge Research Laboratory. He mentioned that his home page was the fourth item returned by a Google search on his first name. His casual remark inspired the scientific research reported here.
    Checking a Multithreaded Algorithm with +CAL
    In Distributed Computing: 20th International Conference, DISC 2006, Shlomi Dolev, editor. Springer-Verlag (2006) 11-163.
    PDF
    Copyright 2006 by Springer-Verlag.
    Yuan Yu told me about a multithreaded algorithm that was later reported to have a bug. I thought that writing the algorithm in PlusCal (formerly called +CAL) [161] and checking it with the TLC model checker [127] would be a good test of the PlusCal language. This is the story of what I did. The PlusCal specification of the algorithm and the error trace it found are available here.
    The PlusCal Algorithm Language
    Theoretical Aspects of Computing - ICTAC 2009, Martin Leucker and Carroll Morgan, editors. Lecture Notes in Computer Science, number 5684, 36-60.
    PDF
    PlusCal (formerly called +CAL) is an algorithm language. It is meant to replace pseudo-code for writing high-level descriptions of algorithms. An algorithm written in PlusCal is translated into a TLA+ specification that can be checked with the TLC model checker [127]. This paper describes the language and the rationale for its design. A language manual and further information are available here.

    An earlier version was rejected from POPL 2007. Based on the reviews I received and comments from Simon Peyton-Jones, I revised the paper and submitted it to TOPLAS, but it was again rejected. It may be possible to write a paper about PlusCal that would be considered publishable by the programming-language community. However, such a paper is not the one I want to write. For example, two of the three TOPLAS reviewers wanted the paper to contain a formal semantics--something that I would expect people interested in using PlusCal to find quite boring. (A formal TLA+ specification of the semantics is available on the Web.) I therefore decided to publish it as an invited paper in the ICTAC conference proceedings.

    TLA+
    Chapter in Software Specification Methods: An Overview Using a Case Study, Henri Habrias and Marc Frappier, editors. Hermes, April 2006.
    PDF
    I was asked to write a chapter for this book, which consists of a collection of formal specifications of the same example system written in a multitude of different formalisms. The system is so simple that the specification should be trivial in any sensible formalism. I bothered writing the chapter because it seemed like a good idea to have TLA+ represented in the book, and because it wasn't much work since I was able to copy a lot from the Z specification in Jonathan Bowen's chapter and simply explain how and why the Z and TLA+ specifications differ. Bowen's chapter is available here.

    Because the example is so simple and involves no concurrency, its TLA+ specification is neither interesting nor enlightening. However, my comments about the specification process may be of some interest.

    Implementing Dataflow With Threads
    Distributed Computing 21, 3 (2008), 163-181. Also appeared as Microsoft Research Technical Report MSR-TR-2006-181 (December 2006).
    Available On-Line
    Copyright 2008 by Springer-Verlag.
    In the summer of 2005, I was writing an algorithm in PlusCal [161] and essentially needed barrier synchronization as a primitive. The easiest way to do this in PlusCal was to write a little barrier synchronization algorithm. I used the simplest algorithm I could think of, in which each process maintains a single 3-valued variable--the Barrier1 algorithm of this paper. The algorithm seemed quite nice, and I wondered if it was new. A Web search revealed that it was. (In 2008, Wim Hesselink informed me that he had discovered this algorithm in 2001, but he had "published" it only in course notes.) I was curious about what barrier synchronization algorithm was used inside the Windows operating system and how it compared with mine, so I asked Neill Clift. He and John Rector found that my algorithm outperformed the one inside Windows. Meanwhile, I showed my algorithm to Dahlia Malkhi, who suggested some variants, including the paper's Barrier2 algorithm.

    By around 1980, I knew that the producer/consumer algorithm introduced in [23] should generalize to an arbitrary marked graph, but I never thought it important enough to bother working out the details. (Marked graphs, which specify dataflow computation, are described in the discussion of [141].) I realized that these new barrier synchronization algorithms should also be instances of that generalization. The fact that the barrier algorithms worked well on a real multiprocessor made the general algorithm seem more interesting. Further thought revealed that the good performance of these barrier algorithms was not an accident. They have optimal caching behavior, and that optimal behavior can be achieved in the general case. All this makes the general synchronization algorithm relevant for the coming generation of multicore processor chips.
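    The flavor of a barrier built from one 3-valued variable per process can be conveyed with a toy simulation. This is my own sketch of the general idea, not the paper's Barrier1 algorithm: a process may enter its next phase only when no peer is still a full phase behind it, so the processes can never occupy all three phase values at once.

```python
import random

# Toy simulation (my sketch of the idea, not the paper's Barrier1
# algorithm): each of N processes keeps one phase variable in {0,1,2}.
# A process advances only when no peer is a full phase behind it.
N, STEPS = 4, 1000
phase = [0] * N
rng = random.Random(1)   # fixed seed for a reproducible schedule

def may_advance(i):
    behind = (phase[i] - 1) % 3
    return all(phase[j] != behind for j in range(N))

for _ in range(STEPS):
    i = rng.randrange(N)          # adversarial-ish random scheduler
    if may_advance(i):
        phase[i] = (phase[i] + 1) % 3
    # Barrier safety: at most two (consecutive) phases are occupied at
    # any moment, so no process gets more than one phase ahead of the
    # slowest process.
    assert len(set(phase)) <= 2
print("ok")
```

The safety check is the essence of a barrier: a process that has started phase k+1 can be sure every other process has at least reached phase k.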
    Leslie Lamport: The Specification Language TLA+
    In Logics of Specification Languages, Dines Bjørner and Martin C. Henson, editors. Springer (2008), 616-620.
    PDF
    Copyright 2008 by Springer-Verlag.
    This is a "review" of a chapter by Stephan Merz in the same book. It is mainly a brief account of the history behind TLA and TLA+. It includes an interesting quote from Brannon Battson. (See [146].)
    Computation and State Machines
    Unpublished (February 2008).
    PDF
    I have long thought that computer science is about concepts, not languages. On a visit to the University of Lugano in 2006, the question arose of what that implied about how computer science should be taught. This is a first, tentative attempt at an answer.
    A TLA+ Proof System (with Kaustuv Chaudhuri, Damien Doligez, and Stephan Merz)
    Proceedings of the LPAR Workshops, CEUR Workshop Proceedings No. 418, 17-37 (2008).
    PDF
    This is a description of the TLA+ constructs for writing formal proofs, and a preliminary description of the TLA proof system. It includes an appendix with a formal semantics of TLA+ proofs.
    The Mailbox Problem (with Marcos Aguilera and Eli Gafni)
    Distributed Computing 23, 2 (2010), 113-134. (A shorter version appeared in Proceedings of the 22nd International Symposium on Distributed Computing (DISC 2008), 1-15.)
    PDF
    Copyright 2010 by Springer-Verlag.
    This paper addresses a little synchronization problem that I first thought about in the 1980s. When Gafni visited MSR Silicon Valley in 2008, I proposed it to him and we began working on it. I thought the problem was unsolvable, but we began to suspect that there was a solution. Gafni had an idea for an algorithm, but instead of trying to understand the idea, I asked for an actual algorithm. We then went through a series of iterations in which Gafni would propose an algorithm, I'd code it in PlusCal (see [161]) and let the model checker find an error trace, which I would then give to him. (At some point, he learned enough PlusCal to do the coding himself, but he never installed the TLA+ tools and I continued to run the model checker.) This process stopped when Aguilera joined MSR and began collaborating with us. He turned Gafni's idea into an algorithm that the model checker approved of. Gafni and Aguilera came up with the impossibility results. Aguilera and I did most of the actual writing, which included working out the details of the proofs.
    Teaching Concurrency
    ACM SIGACT News Volume 40, Issue 1 (March 2009), 58-62.
    PDF
    Idit Keidar invited me to submit a note to a distributed computing column in SIGACT News devoted to teaching concurrency. In an introduction, she wrote that my note "takes a step back from the details of where, what, and how, and makes a case for the high level goal of teaching students how to think clearly." What does it say about the state of computer science education that one must make a case for teaching how to think clearly?
    Vertical Paxos and Primary-Backup Replication (with Dahlia Malkhi and Lidong Zhou)
    Proceedings of the 28th Annual ACM Symposium on Principles of Distributed Computing, PODC 2009, Srikanta Tirthapura and Lorenzo Alvisi, editors. ACM (2009), 312-313.
    PDF
    This paper came out of much discussion between Malkhi, Zhou, and myself about reconfiguration. Some day, what we did may result in a long paper about state-machine reconfiguration containing these results and others that have not yet been published. The ideas here are related to the original, unpublished version of [151].
    Computer Science and State Machines
    Concurrency, Compositionality, and Correctness (Essays in Honor of Willem-Paul de Roever). Dennis Dams, Ulrich Hannemann, and Martin Steffen, editors. Lecture Notes in Computer Science, number 5930 (2010), 60-65.
    PDF
    This is the six-page version of [165]. I think it is also the first place I have mentioned the Whorfian syndrome in print. It is structured around a lovely simple example in which an important hardware protocol is derived from a trivial specification by substituting an expression for the specification's variable. This example is supporting evidence for the thesis of [168] that computation should be described with mathematics. (Substitution of an expression for a variable is an elementary operation of mathematics, but is meaningless in a programming language.)
    Reconfiguring a State Machine (with Dahlia Malkhi and Lidong Zhou)
    ACM SIGACT News Volume 41, Issue 1 (March 2010).
    PDF
    This paper describes several methods of reconfiguring a state machine. All but one of them can be fairly easily derived from the basic state-machine reconfiguration method presented in the Paxos paper [122]. We felt that it was worthwhile publishing them because few people seemed to understand the basic method. (The basic method has a parameter α that I took to be 3 in [122] because I stupidly thought that everyone would realize that the 3 could be any positive integer.) The one new algorithm, here called the "brick wall" method, is just sketched. It is described in detail in [172].
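    The basic method with parameter α can be sketched in a few lines (my own coding of the idea, not from the paper): the configuration that chooses the command for slot i is the one in effect after executing commands 0 through i−α, so a reconfiguration command in slot j first governs slot j+α.

```python
# A sketch of the basic reconfiguration method (my coding of the idea,
# not the paper's text): the configuration choosing command i is the one
# in effect after command i - ALPHA, so reconfigurations take effect
# ALPHA slots after they are chosen.
ALPHA = 3
initial_config = {"A", "B", "C"}

# A log of commands; ("reconfig", new_config) entries change the config.
log = [
    ("put", "x"),                     # slot 0
    ("reconfig", {"A", "B", "D"}),    # slot 1
    ("put", "y"),                     # slot 2
    ("put", "z"),                     # slot 3
    ("put", "w"),                     # slot 4
]

def config_for_slot(i):
    """Configuration in effect after executing commands 0 .. i - ALPHA."""
    config = initial_config
    for kind, arg in log[: max(0, i - ALPHA + 1)]:
        if kind == "reconfig":
            config = arg
    return config

# The reconfig chosen at slot 1 first governs slot 1 + ALPHA = 4.
assert config_for_slot(3) == {"A", "B", "C"}
assert config_for_slot(4) == {"A", "B", "D"}
```

The lag of α slots is what lets up to α commands be chosen concurrently while everyone still agrees on which configuration chooses each slot; as the text notes, α = 3 in [122] was just one choice of this parameter.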

    This paper was rejected by the 2008 PODC conference. Idit Keidar invited us to submit it as a tutorial to her distributed computing column in SIGACT News.

    Stoppable Paxos (with Dahlia Malkhi and Lidong Zhou)
    Unpublished (April 2009).
    PDF
    This paper contains a complete description and proof of the "brick wall" algorithm that was sketched in [171]. It was rejected from the 2008 DISC conference.
    Verifying Safety Properties With the TLA+ Proof System (with Kaustuv Chaudhuri et al.)
    Fifth International Joint Conference on Automated Reasoning (IJCAR), Edinburgh, UK. (July 2010) 142-148.
    Available from Math arXiv.
    This was essentially a progress report on the development of the TLAPS proof system. I believe it describes the state of the system, largely implemented by Chaudhuri, at the end of his post-doc position on the project.
    Byzantizing Paxos by Refinement
    Distributed Computing: 25th International Symposium: DISC 2011, David Peleg, editor. Springer-Verlag (2011) 211-224.
    PDF
    The Castro-Liskov algorithm (Miguel Castro and Barbara Liskov, Practical Byzantine Fault Tolerance and Proactive Recovery, TOCS 20:4 [2002] 398-461) intuitively seems like a modification of Paxos [122] to handle Byzantine failures, using 3n+1 processes instead of 2n+1 to handle n failures. In 2003 I realized that a nice way to think about the algorithm is that 2n+1 non-faulty processes are trying to implement ordinary Paxos in the presence of n malicious processes--each good process not knowing which of the other processes are malicious. Although I mentioned the idea in lectures, I didn't work out the details.

    The development of TLAPS, the TLA+ proof system, inspired me to write formal TLA+ specifications of the two algorithms and a TLAPS-checked proof that the Castro-Liskov algorithm refines ordinary Paxos. This paper describes the results. The complete specifications and proof are available here.
    Leaderless Byzantine Paxos
    Distributed Computing: 25th International Symposium: DISC 2011, David Peleg, editor. Springer-Verlag (2011) 141-142.
    PDF
    This two-page note describes a simple idea that I had in 2005. I have found the Castro-Liskov algorithm and other "Byzantine Paxos" algorithms unsatisfactory because they use a leader and, for progress, they require detecting and removing a malicious leader. My idea was to eliminate the leader by using a synchronous Byzantine agreement algorithm to implement a virtual leader. The note is too short to discuss the practical details, but they seem to be straightforward.
    Euclid Writes an Algorithm: A Fairytale
    International Journal of Software and Informatics 5, 1-2 (2011) Part 1, 7-20.
    PDF
    This was an invited paper for a festschrift in honor of Manfred Broy's 60th birthday. It's a whimsical introduction to TLA+, including proofs. Judged as literature, it's probably the best thing I have ever written.
    How to Write a 21st Century Proof
    Journal of Fixed Point Theory and Applications, doi:10.1007/s11784-012-0071-6 (6 March 2012).
    PDF
    Copyright 2012 by Springer-Verlag.
    I was invited to give a talk at a celebration of the 80th birthday of Richard Palais. It was at a celebration of his 60th birthday that I first gave a talk about how to write a proof--a talk that led to [101]. So, I thought it would be fun to give the same talk, updated to reflect my 20 years of experience writing structured proofs. The talk was received much more calmly than my earlier one, and the mathematicians were open to considering that I might have something interesting to say about writing proofs. Perhaps in the last 20 years I have learned to be more persuasive, or perhaps the mathematicians in the audience had just grown older and calmer. In any case, they were still not ready to try changing how they write their own proofs.

    My experience preparing and giving the talk made me realize it was time for a new paper on the subject. This paper is built around a simple example--a lemma from Michael Spivak's calculus text. I tried to show how a mathematician can easily transform the proofs she now writes into structured proofs. The paper also briefly describes how formal structured proofs are written in TLA+, and an appendix contains a machine-checked proof of Spivak's lemma. While mathematicians will not write formal proofs in the foreseeable future, I argue that learning how to write them is a good way to learn how to write rigorous informal proofs.

    TLA+ Proofs(with Denis Cousineau et al.)
    Proceedings of the 18th International Symposium on Formal Methods (FM 2012), Dimitra Giannakopoulou and Dominique Mery, editors. Springer-Verlag Lecture Notes in Computer Science, Volume 7436 (2012) 147-154.
    PDF
    This is a short paper describing TLAPS, the TLA+ proof system being developed at the Microsoft Research-INRIA Joint Centre.
    Why We Should Build Software Like We Build Houses
    Wired (on-line version), 25 January 2013.
    web publication
    I was approached by an editor at Wired to write an article for them. After a great deal of discussion and rewriting, we finally came up with this compromise between what Wired wanted and what I was willing to sign my name to. The basic message of the piece is that, when programming, it's a good idea to think before you code. I was surprised when the posted comments revealed that this is a controversial statement.
    Adaptive Register Allocation with a Linear Number of Registers (with Delporte-Gallet et al.)
    Proceedings of the 27th International Symposium on Distributed Computing (DISC 2013), 269-283.
    PDF
    I had little to do with the algorithms in this paper. I was mainly responsible for writing them in PlusCal and getting a TLA+ proof written.
    Coalescing: Syntactic Abstraction for Reasoning in First-Order Modal Logics (with Damien Doligez et al.)
    Proceedings of the Workshop on Automated Reasoning in Quantified Non-Classical Logics (ARNL 2014).
    PDF
    When using a theorem prover that reasons about a certain class of mathematical operators to reason about expressions containing a larger class of operators, we have to hide from the prover the operators it can't handle. For example, a prover for ordinary mathematics would (incorrectly) deduce that the TLA+ action formula (x = y) => (x' = y') is a tautology if it were to treat priming ( ') as an ordinary mathematical operator. We call this kind of hiding coalescing. This paper shows that coalescing is somewhat more subtle than one might think, and explains how to do it correctly in some important cases.
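    Why the formula is not a tautology can be shown with a concrete step. The encoding below is my own toy illustration: primed variables denote values in the next state, which is free to break an equality that holds in the current state.

```python
# A concrete counterexample (my toy encoding): priming refers to the
# next state, so (x = y) => (x' = y') fails on a step where x = y holds
# before the transition but not after. Treating prime as an ordinary
# operator would wrongly make the formula a tautology.
def action(s, t):
    """The action (x = y) => (x' = y') evaluated on a step from s to t."""
    return (s["x"] != s["y"]) or (t["x"] == t["y"])

before = {"x": 1, "y": 1}    # x = y holds in the current state
after = {"x": 1, "y": 2}     # but the next state need not preserve it

assert not action(before, after)   # hence the formula is not a tautology
```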
    Who Builds a House without Drawing Blueprints?
    Communications of the ACM 58, 4 (April 2015), 38-41.
    Available on ACM web site.
    This article discusses informal specification. It is an expanded version of [179].
    The Computer Science of Concurrency: The Early Years
    Communications of the ACM, June 2015, Vol. 58 No. 6, Pages 71-76.
    PDF
    This is the written version of my Turing lecture, which I gave at the PODC conference in Paris in July, 2014.
    Auxiliary Variables in TLA+ (with Stephan Merz)
    Unpublished, arXiv paper 1703.05121 (May 2017).
    PDF
    Although most of the ideas were already well-established at the time, paper [92] has become the standard reference on refinement mappings and auxiliary variables--variables added to a specification to permit constructing a refinement mapping under which it implements a higher-level specification. A major new idea that paper introduced was prophecy variables, which are auxiliary variables that predict the future. Prophecy variables seemed very elegant, being history variables run backwards. They had one serious problem: they were extremely difficult even for me to use in practice.

    A 2015 paper by Martín Abadi introducing a method for making prophecy variables easier to use inspired me to take a new look at them. I came up with a completely new kind of prophecy variable--one that I find quite easy to use. I believe that Abadi and I did not discover this kind of variable in 1988 because I had not yet invented TLA+, so we were thinking only in terms of states and not in terms of actions (predicates on pairs of states). After I had the initial idea, I asked Stephan Merz to work with me to get the details right.

    This paper is written for engineers. It contains a TLA+ module for each type of auxiliary variable and shows how to use operators defined in that module to add such an auxiliary variable to a specification. These modules, along with the modules for all the examples in the paper, are available on the Web.

    There are three types of auxiliary variables, each with its own module. In addition to history variables that record the past and prophecy variables that predict the future, there are stuttering variables that add stuttering steps (steps that do nothing). Paper [92] used prophecy variables to add stuttering steps, but we have long known that it's better to use a separate kind of variable for that purpose.

    The auxiliary variables described in the paper can be defined semantically; they are not specific to the TLA+ language. We hope in a later paper to define them in a language-independent fashion, and to prove a completeness result similar to that of paper [92]. We believe that the new kind of prophecy variable is more powerful than the original one, and that completeness does not require the hypothesis of finite internal nondeterminism needed for the original prophecy variables.
    If You’re Not Writing a Program, Don’t Use a Programming Language
    Bulletin of EATCS (The European Association for Theoretical Computer Science), No. 125, June 2018.
    web publication
    In January, 2018 I was invited to contribute an article to the Distributed Computing column of the EATCS Bulletin. I replied that I had nothing to say on that subject to the EATCS community, and I offered instead to write something along the lines of [168]. That offer was accepted, and I wrote this article. It discusses in detail what I could only sketch in my earlier article: how to describe algorithms with mathematics.

