Drill Bits
Decentralized Computing
Terence Kelly
Welcome back to Drill Bits, the new column about programming. This second installment explains how decentralized computing relates to distributed computing and parallel computing, describes decentralized solutions to commercially important problems, and provides working example code. To be concrete, I'll begin with practical wireless networking problems; later I'll apply the solutions in other domains.
A decentralized method employs several distinct computations that consume separate inputs and emit separate outputs; the outputs collectively constitute a solution to a given problem. Crucially, no computation accesses all of the inputs required to solve the overall problem. In interesting cases, a change to a single input may change many of the outputs. Computations may, but need not, execute simultaneously and/or in different places; in other words, decentralization is orthogonal to parallelism and distribution.
Practical examples illustrate these distinctions: Internet routing algorithms are inherently distributed because routers are physically dispersed, and each computes its own routes using information gathered from others. The "link state" family of distributed routing algorithms, however, is the antithesis of decentralized because every router assembles a complete map of the topology to compute routes.8 By contrast, the wireless networking problems considered here admit truly decentralized solutions involving only local computations upon local data.
Software and patterns that support distribution can facilitate decentralization, but the former do not define the latter. For example, a decentralized design may freely leverage old-fashioned remote procedure calls,10 tried-and-true service-oriented architecture,1 or newfangled microservices.4 The essence of decentralized computing, however, is not infrastructure but rather ignorance: No entity amasses enough information to solve the problem on its own. Each participant sees a piece of the overall input and generates a piece of the output; in the end everything works out just so, as if a single all-knowing solver were at work.
Why decentralize? Sometimes the inputs for a given problem are inherently scattered, and collecting them would be a bother. Privacy concerns recommend decentralization when entrusting a single entity with all relevant information is feasible yet unwise; just as striving for efficiency reduces effort,3 striving for decentralization reduces disclosure. Designing decentralized algorithms can yield insights that improve parallel and distributed algorithms. Most importantly, decentralized methods are sometimes the best way to solve practical problems.
Mobile Wireless Problems and Solutions
Wireless networks, particularly for mobile devices, are more challenging than networks of utility-powered computers linked by fixed wires. Mobile devices have limited energy storage. They move around, so their opportunities for mutual aid and mutual annoyance fluctuate. Devices may be owned and operated by different parties, with no obvious single authority to dictate their behavior.
Figure 1 depicts two thorny problems. Several small devices must communicate with a distant base station. Some pairs of small devices are within close range of one another. If all of the devices talk to the base directly, as in figure 1(a), then they may consume excessive power and/or jam one another.
FIGURE 1: Wireless networking problems & solutions
A stream of patents spanning two decades underscores the commercial importance of these problems. (Patent applications can cost tens of thousands of dollars in fees and demand years of intermittent focused attention from well-paid engineers and lawyers. Seldom does anyone squander such time, money, and sustained commitment on unimportant problems or unworkable solutions.)
Let's consider three of these patents. Two prevent mutual jamming by selecting a subset of devices permitted to communicate with the base station simultaneously, as in figure 1(b); no two devices in the subset are close enough to jam one another.5,6 A third patent reduces overall power consumption by selecting a subset of representative devices to communicate with the base on behalf of nearby peers, as in figure 1(c); every device is either a representative or is within range of one.9 To account for movement of the devices and to spread benefits or burdens evenly across them, each of the three schemes periodically chooses a new subset.
The remarkable thing about these approaches, and other practical solutions to related wireless networking problems, is that they all select subsets of devices the same way. They all compute an MIS (maximal independent set) of devices based on communication/interference adjacencies: a subset of devices such that no two in the MIS are adjacent, and every device outside the MIS is adjacent to at least one that is in it.14 A given topology may have more than one MIS. In a hub-and-spoke topology, for example, the outer circle of devices surrounding the hub constitutes one MIS, and the hub alone is another.
Sometimes you may prefer some MISes over others (e.g., based on the number of devices selected or their characteristics). Often, however, any MIS will suffice, as is the case in several wireless networking applications.
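To make the MIS definition concrete, here is a minimal checker sketched in Python, using an adjacency-map representation of my own choosing (the code tarball described at the end of this column includes its own, separate checker script):

def is_mis(adj, subset):
    """adj maps each device to the set of its neighbors;
    subset is the candidate MIS."""
    for d in subset:          # independence: no two members are adjacent
        if adj[d] & subset:
            return False
    for d in adj:             # maximality: every outsider has a member neighbor
        if d not in subset and not (adj[d] & subset):
            return False
    return True

# Hub-and-spoke topology: hub 0 linked to each of six outer devices.
hub = {0: {1, 2, 3, 4, 5, 6},
       1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}, 6: {0}}
print(is_mis(hub, {0}))                  # True: the hub alone
print(is_mis(hub, {1, 2, 3, 4, 5, 6}))   # True: the outer circle
print(is_mis(hub, {0, 1}))               # False: 0 and 1 are adjacent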
A central solver with full knowledge of the network topology can easily compute an MIS: While unclassified devices remain, it repeatedly picks one arbitrarily, classifies it IN the MIS, and classifies adjacent devices OUT. Far more sophisticated centralized algorithms exist, but let's focus instead on a way for devices to classify themselves, based solely on local communication, without an all-knowing central solver.
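First, though, here is that greedy centralized loop as a short Python sketch, reusing the hub-and-spoke topology above (the representation is my own choice; the tarball's mis.c works differently):

def central_mis(adj):
    """Greedy centralized MIS: adj maps each device to its neighbor set."""
    mis = set()
    unclassified = set(adj)
    while unclassified:
        d = unclassified.pop()     # pick an arbitrary unclassified device
        mis.add(d)                 # classify it IN the MIS...
        unclassified -= adj[d]     # ...and classify its neighbors OUT
    return mis

hub = {0: {1, 2, 3, 4, 5, 6},
       1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}, 6: {0}}
print(central_mis(hub))   # either {0} or the outer circle, depending on
                          # which device pop() happens to pick first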
TalkDown
The decentralized MIS algorithm, called TalkDown, operates asynchronously: Devices exchange messages, and each device eventually classifies itself. TalkDown leverages the fact that wireless devices have unique IDs, or names; that any two devices capable of communicating know each other's names; and that names can be compared (e.g., numerically or lexicographically). Each device creates two unordered sets, HI and LO, containing the names of adjacent devices that compare, respectively, greater and less than its own name.
To compute an MIS, each device first enters a "ListenUp" phase in which it collects messages from peer devices in its HI set. If it receives "I'm OUT" messages from every member of its HI set, then a device declares itself IN, but if it hears "I'm IN" from any HI peer, it declares itself OUT. It then enters a "TalkDown" phase that notifies all devices in its LO set of its decision. Devices with empty HI sets start the ball rolling by immediately declaring themselves IN and telling their LO peers, "I'm IN."
In the example topology of figure 2, devices 5, 7, and 9 have empty HI sets, so they skip the ListenUp phase, declare themselves IN the MIS, and announce their decisions to their LO peers. This causes devices 2, 4, 6, and 8 to receive "I'm IN" messages during their ListenUp phases, so they declare themselves OUT. Device 3 receives an "I'm OUT" message from the sole device in its HI set, device 8, during its ListenUp phase, so it declares itself IN, enters its TalkDown phase, and sends "I'm IN" to device 1. Device 1 declares itself OUT. At this point all of the devices have classified themselves either IN (3, 5, 7, and 9) or OUT (1, 2, 4, 6, and 8).
FIGURE 2: Example topology
Figure 3 shows the stages in which devices classify themselves and inform their peers. Solid circles and arrows indicate devices that end up IN the MIS and the "I'm IN" messages they send; dashed circles and arrows indicate OUT. Devices 5, 7, and 9 classify themselves immediately. Devices 2, 4, 6, and 8 can classify themselves after receiving a single message. Device 3 must wait for 8 to classify itself, and 1 must wait for 3.
FIGURE 3: Classifications and message propagation
Worst-case performance for the TalkDown algorithm, in terms of time to classify all devices, occurs when device names and network topology conspire malevolently. Consider the chain network ⓐ-ⓑ-ⓒ-...-ⓧ-ⓨ-ⓩ. TalkDown messages propagate from ⓩ to ⓐ in this star-crossed topology, one after the other. Fortunately, as you would expect, the worst case is neither common enough nor bad enough to rule out TalkDown.
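You can reproduce the serialization with the talkdown() simulation sketched earlier: in a chain whose names increase from one end to the other, every device except the highest-named must wait to hear from its higher neighbor.

import string

# Worst case for the talkdown() sketch: a 26-device chain a-b-...-z.
names = string.ascii_lowercase
chain = {d: set() for d in names}
for a, b in zip(names, names[1:]):
    chain[a].add(b)        # each device is adjacent to its chain neighbors
    chain[b].add(a)
verdict = talkdown(chain)
print(verdict['z'], verdict['a'])   # 'z' classifies first; 'a' classifies
                                    # last, after 25 sequential messages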
TalkDown has several nice properties. It's simple and easy to implement, and it's frugal with communication: Exactly one bit of information travels between each pair of adjacent devices. TalkDown preserves privacy in the sense that no device learns anything it didn't already know beyond the final IN/OUT classification of adjacent devices—which it might learn or infer anyway if an MIS were computed by other means.
In some practical wireless networking applications, a new and different MIS must be computed from time to time. TalkDown can handle this requirement. Every device can replace its own name and the names of all adjacent devices with temporary pseudonyms used only for the purpose of computing a new MIS. For example, each pseudonym might be fashioned as a hash of the corresponding real name concatenated with a salt string. The times at which TalkDown must be run and the salts for making pseudonyms can follow a convention known to all devices, or in the scenario of figure 1, the base station can provide them.
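Here is one way such pseudonyms might be minted, assuming SHA-256 as the hash; the shell example in the next section applies the same idea with sha256sum:

import hashlib

def pseudonym(name, salt):
    """Temporary pseudonym for one round of TalkDown: a hash of the
    real name concatenated with a salt. Hex digests are distinct in
    practice and compare lexicographically, as TalkDown requires."""
    return hashlib.sha256((name + salt).encode()).hexdigest()

salt = '31 October 2020'              # e.g., a date known to all devices
print(pseudonym('device-42', salt))   # 'device-42' is a made-up name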
Other Applications
Decentralization isn't just for wireless networks, nor is TalkDown.
Turning from wireless networks to social networks, we find strong motivations for decentralized computing. Centralized solutions to social-networking problems raise anxiety over the concentration of power that inevitably accompanies concentration of knowledge.7 But social networking problems inherently require divulging all relevant information to a central solver—don't they?
Not always. Imagine a community that wants to throw a "mixer" party where strangers can meet: Anyone may attend so long as no two partygoers know one another. It's easy to see that a maximal independent set of the community's acquaintance network should attend. Individuals can invite/exclude themselves by executing TalkDown, obviating the need for everyone to disclose their acquaintances to a central solver. Luddites may execute TalkDown using pencil, paper, and postcards, though a simple software implementation such as the mixer.py script provided at the end of this column is more convenient. To hold a mixer party every Saturday with different invitees each time, replace all names with pseudonyms for the purpose of computing the MIS as described earlier. For example, concatenate each real name with the date of the party and compute a hash, as in
$ echo 'John Smith 31 October 2020' | sha256sum
0c1fcd5c3a822559f32fbfc8951a5d712e9aa241a6b0279fa2feda9d6dd3e415
Going Further
Decentralized computing is a large topic, and this episode of Drill Bits has drilled a mere pilot hole, so to speak. Inquisitive readers can drill down further on their own, check out my example code, and try the suggested exercises.
Research literature has explored several decentralized computing models and paradigms. The quintessential abstract model is cellular automata (e.g., Conway's Game of Life, made famous by screen savers in the 1990s13). The Game of Life is Turing-complete and therefore can in principle perform any computation—though seldom conveniently. "Market-oriented programming," a general paradigm intended for practical use, involves local computations that interact via auction-like mechanisms to solve optimization problems.12 Similar ideas later appeared with different motivations and emphases in the literature on "algorithmic mechanism design."2,11 Finally, some algorithms advertised as "distributed" are also decentralized or can be made so with modest effort; hunting for such algorithms can be fun.
Bits
Example code for this column is at https://queue.acm.org/downloads/2020/Drill_Bits_02_sample_code.tar.gz. The tarball contains two MIS programs. The Python program mixer.py is decentralized, distributed, and parallel; the C program mis.c is none of these. Shell scripts show how I run the two implementations on the example network of figure 2; one calls a separate script to check that the output is indeed an MIS.
Contrasting the two MIS programs highlights the tradeoffs between decentralized and centralized approaches. For example, the centralized solver can get away with storing edges in one direction only, sometimes cutting the memory footprint nearly in half (see the sketch below). The decentralized implementation would be prettier were it not for fussy networking code and the hoops it must jump through to ensure that various write()s are atomic; see the comments in the shell script for details.
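Storing edges in one direction amounts to keeping, so to speak, only each device's HI set. A tiny illustration under my adjacency-map representation (not necessarily how mis.c does it):

# Undirected adjacency sets record each edge twice:
adj = {1: {2}, 2: {1, 3}, 3: {2}}                    # 4 entries, 2 edges
# A centralized solver can keep each edge once, under its lower endpoint:
fwd = {d: {n for n in adj[d] if n > d} for d in adj}
print(fwd)                                           # {1: {2}, 2: {3}, 3: set()}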
Drills
Ambitious readers, grab the sample code tarball and unpack it. Here are a few questions to ponder and exercises in ascending order of difficulty:
1. The mixer.py program uses lists for HI and LO. Would sets or dictionaries be better (e.g., more efficient) when checking for duplicates?
2. Would SOCK_RDM be better than the socket type that mixer.py currently uses?
3. Modify mixer.py so that a device classifies itself OUT immediately upon receiving an "I'm IN" message from a HI peer, instead of waiting to hear from all of its HI peers. Conduct tests on large topologies to quantify the reduction in time for all processes to classify themselves.
4. (Theory) Given a topology and one of its MISes, is it always possible to rename nodes so that the TalkDown algorithm computes the given MIS?
5. Modify mis.c to "stop when done"3 (i.e., terminate when the last classification happens). Measure the performance improvement. How many unnecessary messages are sent by mixer.py?
6. Implement a fault-tolerant distributed TalkDown (hint: try building it atop Ken15).
7. Organize a mixer party using TalkDown. Which is harder, preparing the inputs or running the algorithm?
Acknowledgments
Timothy Chow and ACM Fellow/SIAM Fellow Robert Schreiber reviewed early drafts of this work, and Antonio Lain reviewed the example programs; all three provided valuable feedback.
References
1. Coatta, T. 2008. From here to there, the SOA way. acmqueue 5(6); https://queue.acm.org/detail.cfm?id=1388788.
2. Feigenbaum, J., Shenker, S. 2002. Distributed algorithmic mechanism design: recent results and future directions. In Proceedings of the Sixth International Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications (September), 1-13; https://doi.org/10.1145/570810.570812.
3. Kelly, T. 2020. Efficient graph search. acmqueue 18(4); https://queue.acm.org/detail.cfm?id=3424304.
4. Killalea, T. 2016. The hidden dividends of microservices. acmqueue 14(3); https://queue.acm.org/detail.cfm?id=2956643.
5. Lee, J. G., Cheung, G., Lee, S.-L., Sharma, P. 2013. Channel assignment for a wireless network. U.S. Patent # 8,531,956 (September).
6. Natarajan, K. S. 1993. Robust scheduling mechanism for efficient bandwidth usage in multicell wireless local networks. U.S. Patent # 5,210,753 (May).
7. Onion News Network. 2011. CIA's "Facebook" program dramatically cut agency's costs; https://www.youtube.com/watch?v=ZJ380SHZvYU and https://www.theonion.com/cias-facebook-program-dramatically-cut-agencys-costs-1819594988.
8. Perlman, R. 1999. Interconnections: Bridges, Routers, Switches, and Internetworking Protocols, second edition. Addison-Wesley. Chapter 12.
9. Schreiber, R. S., Kelly, T. P. 2014. Determination of maximal independent sets of mobile devices. U.S. Patent # 8,849,325 (September).
10. Tay, B. H., Ananda, A. L. 1990. A survey of remote procedure calls. ACM SIGOPS Operating Systems Review 24(3): 68-79; https://doi.org/10.1145/382244.382832.
11. Varian, H. R. 2008. Designing the perfect auction. Communications of the ACM 51(8): 9-11; https://doi.org/10.1145/1378704.1378708.
12. Wellman, M. P. 1996. Market-oriented programming: some early lessons. In Market-Based Control: A Paradigm for Distributed Resource Allocation. Chapter 4. World Scientific; http://strategicreasoning.org/wp-content/uploads/2010/03/mbc95.pdf.
13. Wikipedia. Conway's Game of Life; https://en.wikipedia.org/wiki/Conway's_Game_of_Life.
14. Wikipedia. Maximal independent set; https://en.wikipedia.org/wiki/Maximal_independent_set.
15. Yoo, S., Killian, C., Kelly, T., Cho, H. K., Plite, S. 2012. Composable reliability for asynchronous systems. In Proceedings of the USENIX Annual Technical Conference (June); https://www.usenix.org/system/files/conference/atc12/atc12-final206-7-20-12.pdf.
Terence Kelly ([email protected]) is a Distinguished Member and a Lifetime Member of the ACM. He studied computer science at Princeton and the University of Michigan followed by 18 years as an industrial researcher (HP Labs) and software engineer. Scatterbrained computing is his favorite kind.
Copyright © 2020 held by owner/author. Publication rights licensed to ACM.
Originally published in Queue vol. 18, no. 5.