What
Distributed computing is becoming increasingly relevant across many fields. The following is an overview of technologies and algorithms that have stood out to me.
Background
Decentralized computing beyond the internet itself, and true decentralized peer-to-peer software, emerged in universities and was probably first used at large scale for mostly nefarious purposes like botnets and copyright infringement, but also for cryptocurrencies [10], some of which have the ability to destabilize small economies.
It has also become a pillar of modern computing: highly resilient web backend infrastructures, censorship-free internet, distributed IoT sensor networks and even satellite communication.
Decentralized computing often has a significantly disruptive aspect to it.
Remote Access Tools and Botnets
My first contact with distributed computing was writing Remote Access Tools (RATs) in Visual Basic to play pranks on my classmates in school - of course always with their consent. The inspiration must have come from code on Planet-Source-Code.com [10]. Controlling physical devices or other computers is intriguing, and learning about computer networks and writing network software in general was an interesting challenge.
A direct network connection is easy to trace back, so it becomes interesting to bounce the communication off intermediate nodes. Commonly, the ICQ messenger was used by RAT servers to tell the attacker's client that they were online and at which IP address they could be reached.
The next level of escalation was web interfaces, written in PHP, the language of choice back then, to collect the alive messages from the nodes. Instead of connecting directly at all, it may be much more covert to either chat with the RAT via ICQ or use such a web interface, hosted on some bulletproof hosting platform, to send commands and receive the responses.
Alternatively, even today, Internet Relay Chat (IRC) is used for that purpose.
At this point we stop speaking of RATs and call them bots. Multiple bots form botnets. Bots expect commands from a central service.
It would be even better, if the nodes didn’t rely on any central service, but instead would talk to each other.
There is significant financial incentive for nefarious actors to run botnets, which is why we see so many of them. A large-scale botnet can take down entire networks with Distributed Denial-of-Service (DDoS) attacks to manipulate stock markets, send spam emails, collect user credentials and so forth. Botnets also run large-scale password-guessing/brute-force attacks on servers (see Brute-Force attacks on my server).
KaZaA and Skype
Roughly around that time KaZaA, the peer-to-peer (P2P) file-sharing network, and Skype, an internet telephony network, came about. Both originated from the same developers.
For the longest time telecommunication providers had charged insane fees for long-distance calls - in part, of course, to offset the cost of infrastructure, but in part also in pursuit of profit.
With Skype it became possible to have affordable long-distance phone calls, either from Skype client to Skype client via the internet or via relay stations to landlines.
This kind of technology was extremely unpopular with the telecommunication providers, just as P2P file sharing was unpopular with the music industry, and I believe neither would have survived had they not been almost entirely decentralized and hard to take down.
Mainstream Usages
By now there are vast use cases for distributed computing, whether for
- decentralized P2P services that resist take-down for
- internet telephony (e.g. Skype)
- file-sharing (e.g. BitTorrent, KaZaA, Gnutella)
- distributed ledgers for blockchain cryptocurrencies (e.g. Bitcoin)
- censorship-free internet (e.g. InterPlanetary File System (IPFS))
- distributed networks of IoT devices
- mesh networks (e.g. LoRaWAN)
- IoT power metering (e.g. for the smart grid)
- backend infrastructures of large web services
- container orchestration (e.g. Docker containers orchestrated with Kubernetes)
- consistent distributed storage/data replication and consensus algorithms (e.g. Paxos, Raft, etcd)
- distributed mobile apps
- federated clouds (e.g. Diaspora, Mastodon)
Traditional Client/Server Architectures
Traditionally, in the most simple case, in computer networking you would have a central server and multiple clients connecting to that server.
This has the downside that the server can go down, which is frustrating for users at best and detrimental for an online business at worst.
Levels of Distribution
That problem can be combated by distributing the service over multiple replica nodes.
The level of distribution can vary:
- a fully decentralized peer-to-peer network has no single central server
- each node runs the same exact code
- the nodes either all have the same role and send data to one another (e.g. Gossip Protocol [1])
- they could also take on different roles like “follower” or “leader” via a distributed election scheme (e.g. Raft Consensus algorithm [2])
- a federated cloud has multiple servers that synchronize with each other from time to time
- some social networks use this approach (e.g. Diaspora [3], Mastodon [4])
- a master-slave architecture may have a centralized controlling instance, but distribute the databases and compute nodes
- in CI/CD, running multiple nodes for more build speed (e.g. Jenkins master and worker nodes, GitHub runners)
- in backend web infrastructures for fault-tolerance
- this is often used with micro-services that are run in containers (e.g. Docker)
- and distributed onto worker nodes by a central server (e.g. Kubernetes Master [5])
- such that they can be redistributed and rebooted to achieve high availability
- in mesh networks of IoT devices, in space and the internet itself
- where messages should reach a peer and can be bounced off of other nodes or relay stations (e.g. routers)
- swarm communication across spacecraft and satellites (“Inter-Spacecraft Communication” [9])
Differing Requirements
The requirements on the distributed system differ substantially:
- in fully decentralized peer-to-peer
- there needs to be protection against nefarious actors (e.g. proof-of-work for Bitcoin [6])
- in IoT mesh networks and satellite communication
- the round-trip times can be substantial
- packets frequently get lost
- throughput can be a lot lower
- whereas in CI/CD
- a failing build node can just be restarted generally without any issues
Common Challenges and Limitations in Theory
While orchestrating and rebooting nodes is generally not a big problem, the issues arise with consistent data synchronization.
For most relational databases we speak of “transactions” that should fulfill the ACID criteria [7]. Interestingly, the ACID model was developed by Härder and Reuter of the Universities of Kaiserslautern and Heidelberg, who have close ties to the professor who taught me database implementation at the University of Erlangen-Nürnberg.
In simplified terms:
- atomicity - transactions either occur fully or not at all
- consistency - transactions can only bring the database from one consistent state to another
- isolation - concurrently running transactions do not see each other's intermediate states
- durability - transactions that have been committed to the database remain there in the future
For a distributed data store we additionally need
- consistency - a read at any node always retrieves the most recent data
- availability - there is always at least one node that can return the data
- partition tolerance - even when nodes are cut off and two or more subnets form, the above criteria are still fulfilled
Unfortunately, the CAP theorem [8] shows that a distributed data store can only ever guarantee two of those three criteria.
How
Technologies
With that preamble for motivation, let’s look into the current landscape of technologies and what aspects of distributed computing they attempt to solve.
Containers - to make workloads restartable, gain parallelism and easily transfer them from node to node
Self-contained containerized applications that can easily be started and stopped on hosts are important in large corporate infrastructures for achieving fault-tolerance.
Docker
Often unpopular due to misuse as a build-environment setup tool or build tool, Docker provides a platform abstraction via reproducible text-based recipes (Dockerfiles) that create Docker containers. To get software onto the build environment, be it bare-metal or containerized, it makes sense to use a package manager (see below) to pull the software in an organized way instead of downloading it (e.g. via wget) from the Dockerfile.
Container Orchestration - to run and coordinate multiple containers across nodes
If we want to run multiple containers, e.g. Docker containers, and want to automatically reboot or redistribute these containers among a cloud of nodes we need an orchestration system.
Kubernetes
Kubernetes is an option for this. It automatically distributes Containers over nodes.
OpenShift
OpenShift was built on top of Kubernetes by Red Hat and adds enhancements and security features.
Packaging - to get the most recent software into containers and onto nodes
A package manager serves to get software onto machines or into containers, typically combined with caching. This is an important pillar for getting useful software onto nodes.
Apt, Yum, Cargo, Conan, Pip
- for Linux we have the “apt” and “yum” package managers and their packaging formats *.deb and *.rpm to install software on Linux computers
- programming languages have their own package managers, like Rust's Cargo, Python's “pip” with packages from PyPI, and C++'s Conan with its recipes and *.tar.gz packages
The reason I bring them up is that an online apt or yum repository is in a sense a federated cloud. Many universities, internet service providers and large corporations run FTP and HTTP servers that synchronize among themselves to provide endpoints for the package managers to retrieve packages from.
Data Verification - to check correctness of data
With multiple nodes and data distributed and shared among the nodes, there needs to be data verification. This is an important step towards synchronization.
Merkle Trees
In computer science a tree is a data structure with nodes, where each node connects to a set of other nodes.
A hash is a large, typically 256-bit, number that is computed by a hash function from binary contents and is designed to make collisions - a collision occurs, when two different binary contents yield the same hash - highly unlikely.
In Merkle trees every parent node contains the hash of its children. They are used in Git and Bitcoin, as well as in the InterPlanetary File System (IPFS) and Apache Cassandra. The use of Merkle trees [11] guarantees consistent data and allows efficient verification of the data.
By recomputing the hash of a parent node from its children and comparing it, any node in a network can verify the correctness of the data. It is often sufficient to only look at the top n nodes and act further only when an inconsistency is detected.
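The idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular production scheme; the choice to duplicate the last hash on odd levels is one common convention (Bitcoin does this), and the transaction contents are made up:

```python
import hashlib

def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaves: list[bytes]) -> str:
    """Hash each leaf, then repeatedly hash concatenated pairs upwards
    until a single root hash remains."""
    level = [sha(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:       # duplicate the last hash on odd levels
            level.append(level[-1])
        level = [sha((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(blocks)
# Changing any single block changes the root, so comparing one hash
# is enough to detect a difference anywhere in the data:
assert merkle_root([b"tx1", b"tx2", b"txX", b"tx4"]) != root
```

Two nodes only need to compare their root hashes to know whether their data sets differ at all; they descend into subtrees only when the roots disagree.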
Example - Git
Git is a distributed source code management system that is the backbone of almost all software development. It quickly displaced other solutions like CVS, Mercurial, SVN and MS SourceSafe.
With Git every developer has a “checkout” of the code repository and can make their changes offline. Afterwards, changes are committed and, when online, synchronized with the repositories of other developers. The Git commit history can be seen as a distributed log file.
For consistency it uses Merkle trees [11], and as such there is debate on whether Git, which appeared in 2005, was not already using a blockchain similar to the one in Bitcoin, which appeared in 2009.
Example - Blockchain, Bitcoin
Cryptocurrencies like Bitcoin also have to tackle the challenge of P2P distributing what is essentially a log file. Here the log file is a ledger of financial transactions.
Data Consensus - to solve conflicts and agree on one version of that data to move forward
With multiple verified sets of the data spanned across the nodes we need to achieve “consensus” on which set of data is the correct version to continue with.
Paxos
Paxos is an algorithm first described by Leslie Lamport (in a paper submitted in 1990 and published in 1998) that seems to be the first attempt at solving the consensus problem, with a solution similar to elections in democratic parliaments.
Raft
Raft can be seen as a simplification of Paxos. Here nodes take on the role of “leader” or “follower”. A new leader gets elected when the current leader disappears. Extra precaution has to be taken when the network partitions, because then two leaders could be elected at the same time. For this there is a “term” system that counts election terms: when the partitions come back together, the leader with the higher term number becomes the overall leader. There are two excellent videos on YouTube about this, an overview [20] and some more technical details from a developer of the Go implementation [21].
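The term comparison at partition healing can be sketched as a toy model - this is only the single rule described above, not a full Raft implementation, and the node names and terms are made up:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    term: int           # monotonically increasing election term
    is_leader: bool

def heal_partition(x: Node, y: Node) -> Node:
    """When two leaders meet after a partition heals, the one with the
    higher term wins; the other steps down and adopts that term."""
    winner, loser = (x, y) if x.term > y.term else (y, x)
    loser.is_leader = False
    loser.term = winner.term
    return winner

stale = Node("n1", term=3, is_leader=True)   # leader of the minority side
fresh = Node("n2", term=5, is_leader=True)   # elected later, higher term
leader = heal_partition(stale, fresh)
assert leader.name == "n2" and not stale.is_leader
```

Because terms only ever increase, every node can locally decide which of two competing leaders is legitimate without any further coordination.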
Example - etcd
etcd - from the Linux “/etc” configuration directory plus “d” for daemon - is software originally from CoreOS that uses Raft underneath. It synchronizes configuration data across multiple nodes.
Example - nuRaft
There is a C++ implementation from eBay of Raft called nuRaft [22]. It has an interesting example implementation of a distributed calculator [23] that serves as inspiration for all sorts of projects.
Example - Apache Cassandra
Apache Cassandra is a distributed NoSQL database similar to Dynamo and Google’s BigTable. The latter is specifically designed to handle web caches for the Google search engine [24].
Network Traversal - to be able to form direct peer-to-peer network connections
In order for distributed nodes to be able to communicate with each other we need network traversal schemes. In corporate networks often switched networks or VPN connections are used. For customer applications, in order to get bi-directional communication into home networks, we need some form of Network Address Translation (NAT). There are different options here.
UPnP
Routers with Universal Plug and Play (UPnP) enabled allow software running behind the router to open public facing network ports.
Stun & UDP Hole-Punching
An intriguing aspect of the Skype telephony protocol is how it was consistently able to connect even through private networks without UPnP and without prior exposed ports.
The trick here is what became known as STUN. Through, for example, an unblocked HTTP internet connection to a STUN server, a UDP connection between two peers is coordinated.
Now UDP, in contrast to TCP, has the property that a router cannot tell whether a packet is an initial request or a response to a request. So if both peers send a few UDP packets to one another from both sides of the router or firewall, they can “punch” a “hole” through it. The routers on both sides will expect responses and leave the ports open for some time. The software can then perform direct bidirectional P2P communication even though both nodes are in private networks.
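The "send first from both sides" pattern can be demonstrated with two UDP sockets. Note the limits of this sketch: both sockets live on localhost, so no actual NAT is traversed - on the real internet each peer would first learn the other's public address from a STUN-style rendezvous server:

```python
import socket

# Two UDP sockets standing in for the two peers.
a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
a.bind(("127.0.0.1", 0)); a.settimeout(2)
b.bind(("127.0.0.1", 0)); b.settimeout(2)
addr_a, addr_b = a.getsockname(), b.getsockname()

# The "punch": each side sends first. A NAT would record the outgoing
# mapping and therefore let the peer's incoming packets through.
a.sendto(b"punch from A", addr_b)
b.sendto(b"punch from B", addr_a)

# Both sides can now receive directly from each other:
msg_at_b, _ = b.recvfrom(1024)
msg_at_a, _ = a.recvfrom(1024)
a.close(); b.close()
```

The key observation is that both peers act as "client" simultaneously; neither ever needs an explicitly opened inbound port.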
Data Distribution - to efficiently replicate data onto the nodes
With distributed nodes and the ability to verify the correctness of data, we are still missing a way to replicate data onto our nodes - as in, when to send what data where. It should ideally minimize excess communication.
Gossip Protocol
A naive approach is to use the Gossip Protocol [12]. It is also called an “epidemic protocol”, because it randomly spreads information in all directions in the hope that after some time all nodes will have heard it and we arrive at a consistent state.
From what I see in the code of the early Bitcoin implementation [13], it used a gossip/epidemic approach to spread the Merkle-tree distributed ledger (verification needed).
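A tiny simulation shows the epidemic behavior. This is a deliberately simplified model (nodes may "tell" peers that already know, the fanout value is arbitrary, and a fixed seed keeps it reproducible):

```python
import random

def gossip_rounds(n_nodes: int, fanout: int = 2, seed: int = 42) -> int:
    """Simulate epidemic spreading: each round, every informed node tells
    `fanout` random peers. Returns the number of rounds until all nodes
    are informed."""
    rng = random.Random(seed)
    informed = {0}                      # node 0 starts with the message
    rounds = 0
    while len(informed) < n_nodes:
        for _node in list(informed):
            for _ in range(fanout):
                informed.add(rng.randrange(n_nodes))
        rounds += 1
    return rounds

print(gossip_rounds(100))   # converges in roughly O(log n) rounds
```

The informed set grows multiplicatively each round, which is why gossip reaches all nodes quickly even though no node has a global view of the network.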
Chord
A more efficient approach than gossip is to use a distributed hash table (DHT). One approach that utilizes a DHT is Chord [14]. It was first described in a paper from 2001 and found use in Gnutella and BitTorrent [14].
Conceptually, Chord orders the nodes of the network in a circle and defines which nodes must synchronize with which subset of other nodes (by modulo arithmetic [28]) in order to achieve high availability.
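The ring placement and the "finger table" shortcuts can be sketched as follows. This is a toy with a small identifier space and made-up node names, and it omits Chord's join/leave and stabilization protocols entirely:

```python
import hashlib

M = 16                     # identifier bits: a toy ring of 2**16 positions
RING = 2 ** M

def ring_id(name: str) -> int:
    """Place a node or key on the ring by hashing modulo the ring size."""
    return int(hashlib.sha256(name.encode()).hexdigest(), 16) % RING

def successor(key: int, nodes: list[int]) -> int:
    """The node responsible for a key is the first node clockwise from it."""
    for n in sorted(nodes):
        if n >= key:
            return n
    return min(nodes)      # wrap around the ring

def finger_table(n: int, nodes: list[int]) -> list[int]:
    """Chord's shortcuts: the successor of n + 2**i for i = 0..M-1,
    enabling O(log n) lookups instead of walking the whole circle."""
    return [successor((n + 2 ** i) % RING, nodes) for i in range(M)]

nodes = [ring_id(f"node{i}") for i in range(5)]   # hypothetical node names
key = ring_id("some-file.txt")
print("key", key, "is stored on node", successor(key, nodes))
```

Because every node only needs to know its successors at power-of-two distances, lookups take logarithmically many hops regardless of network size.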
Example - Skype
Apart from the UDP hole-punching, Skype, being the first P2P telephony protocol, is also very interesting to study. It is proprietary, but there are many papers analyzing its behavior. The Skype protocol defines different roles, “Nodes”, “Super Nodes” and such, much like the consensus algorithms mentioned above. In 2011 some source code was leaked [19].
Data Structure
To transfer data between nodes we need data schemas and protocols for compatibility.
JSON
Currently JSON and XML are very commonly used, but binary serialization formats like the one defined by Protobuf are common as well.
Protobuf Serialisation
To serialize data across different nodes in a network that may have different CPU architectures (big-endian, little-endian, bit widths, etc.), we need robust serialization. Protobuf is a set of libraries for a large number of programming languages that allows serializing and deserializing objects to store them or send them over the network.
As with all serialization, there are drawbacks:
- it serializes structs and classes, but needs an additional compile step to intermediary formats
- the output format is not human readable
- the compiled size is large and the code can be inefficient compared with purpose-built solutions
- it doesn’t meet the requirements of high throughput communication like media streaming
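The endianness problem that makes explicit serialization necessary can be seen with Python's built-in struct module - a minimal sketch:

```python
import struct

value = 0x12345678

# Pack the same 32-bit integer with an explicit byte order:
little = struct.pack("<I", value)   # little-endian
big    = struct.pack(">I", value)   # big-endian ("network byte order")

assert little == b"\x78\x56\x34\x12"
assert big    == b"\x12\x34\x56\x78"
# Round-tripping works on any machine because the order is spelled out:
assert struct.unpack(">I", big)[0] == value
```

Formats like Protobuf fix the byte order (and much more, like field tags and variable-length encoding) in their wire specification, so two machines with different native endianness always agree on the bytes.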
Communication Protocols
We also need communication protocols, or, the format of the bit stream that is sent from one computer node in a network to another.
REST API Endpoints
Here “Representational State Transfer” (REST) is often used. Developed by Roy Fielding (the term was coined in his 2000 dissertation), it mandates some key properties.
In simplified terms:
- client-server - multiple clients talk to a server
- stateless - the server is not required to keep client session state between requests, which saves memory on its side
- cacheable - like a pure function, the same set of parameters is expected to yield the same output; this is important to effectively cache requests
- uniform interface - the interface is independent of the underlying architecture, so it can be kept stable and compatible
- layered - the client does not need to be aware which backend server is handling the request
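The stateless and cacheable properties can be illustrated with a toy in-process "server": one pure function per (method, path) route. The routes and payloads are entirely made up, and lru_cache stands in for an HTTP cache:

```python
from functools import lru_cache

# Because the handler is stateless and deterministic, its responses can
# be cached safely - here literally, with lru_cache.
@lru_cache(maxsize=128)
def handle(method: str, path: str) -> tuple[int, str]:
    routes = {
        ("GET", "/users/42"): (200, '{"id": 42, "name": "alice"}'),
        ("GET", "/health"):   (200, '{"status": "ok"}'),
    }
    return routes.get((method, path), (404, '{"error": "not found"}'))

status, body = handle("GET", "/users/42")
assert status == 200
handle("GET", "/users/42")                 # identical request...
assert handle.cache_info().hits == 1       # ...answered from the cache
```

A handler that kept per-client state or returned different results per call could not be cached this way, which is exactly why REST forbids it.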
Remote Procedure Call (RPC)
Often, in distributed systems, we want to call a function or procedure on a different node. There exist libraries that facilitate these “remote procedure calls”.
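As a minimal self-contained illustration of the concept - deliberately not any of the specific libraries discussed below - Python's standard-library XML-RPC lets a client call a function on another process as if it were local:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Expose one procedure on an ephemeral local port; in a real setup the
# server would run on a different, more powerful node.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client calls add() as if it were local; the library serializes the
# arguments, ships them over HTTP and deserializes the result.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)
server.shutdown()
assert result == 5
```

All RPC systems share this shape: a stub on the caller's side serializes arguments, the network carries them, and a dispatcher on the callee's side invokes the real function.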
Example - .Net Remoting
The .Net languages have a library called “Remoting” [29] that I used around ’08. Back then I was automating online web-browser games (Macromedia, later Adobe Flash at the time). My laptop, an LG C1 Dual Express tablet, with its ultra-low-voltage processor couldn’t handle the load I was putting on it in order to analyze a video stream in real time.
With .Net Remoting it was easy to run the heavy lifting on a more powerful desktop machine with an Intel Core2Quad and run the rest of the software on the laptop. Code-wise not much had changed: there was a standard function call, but it would run somewhere else. The compute duration was shorter and, as expected, the function call itself took significantly longer than a call to a local function. The rest was fairly transparent.
Example - gRPC w/ Protobuf
With gRPC, which uses Protobuf underneath, we gain remote procedure calls much like “Network OLE”, the “Distributed Component Object Model” (DCOM) [25] and .Net Remoting mentioned above: the ability to run a function as if it were local, but have it run over the network on another node.
Example - MPI
In scientific computing, for so-called “Beowulf Clusters” (term from the 90’s), the “Message Passing Interface” (MPI) is commonly used.
The quantum chemistry software Gaussian, for example, has an extension “G09 with Linda”, now “TCP Linda 9”, to run on compute clusters, which uses a message-passing approach as its remote procedure call interface.
Data Protection
If we’re going over the internet and have critical information we may seek to protect the data from eavesdropping.
There are two aspects to this:
- encryption - to make sure no unauthorized actors can read the data
- certificates - to make sure the data was sent from the correct origin and not injected by an attacker (man-in-the-middle) [26].
HTTP-over-TLS
To handle encryption, a common method is HTTP-over-TLS, or HTTPS. Most of the internet nowadays uses https:// connections. Here, certificates signed by a root authority are used to validate the true identity of the parties involved in the connection.
Certificates
Certificates are used to safely identify the communicating parties to each other. Without them, the system may become susceptible to man-in-the-middle attacks [26].
Public-key cryptography
Additionally, often data has to be encrypted in order to prevent eavesdropping.
A way to do this is to generate key pairs “private” and “public”-key.
- in asymmetric encryption if User A encrypts the data with User B’s public key, then only User B’s private key can be used to decrypt it.
That is already useful, but still susceptible to brute-force attacks.
- the Diffie-Hellman key exchange goes further
- one's own private key, combined with the other node's public key, is used to generate a shared secret
- the other side uses the same approach to arrive at that same shared secret
- from there on, that shared secret is used as a password for encrypting and decrypting the data
- the process can be improved by additional “salting” to increase the length of the shared secret
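The exchange above can be sketched with a few lines of modular arithmetic. This is a toy: the prime and generator are illustrative only, and real deployments use standardized 2048+ bit groups or elliptic curves:

```python
import secrets

# Public parameters, known to everyone (toy sizes for illustration):
p = 4294967291            # a prime just below 2**32
g = 5

a_priv = secrets.randbelow(p - 2) + 1    # Alice's secret exponent
b_priv = secrets.randbelow(p - 2) + 1    # Bob's secret exponent

a_pub = pow(g, a_priv, p)                # exchanged over the open network
b_pub = pow(g, b_priv, p)

# Each side combines its own private key with the peer's public key:
secret_a = pow(b_pub, a_priv, p)
secret_b = pow(a_pub, b_priv, p)
assert secret_a == secret_b              # same shared secret on both sides
```

Both sides compute g^(ab) mod p, but an eavesdropper who only sees g, p and the two public values cannot feasibly recover it (for realistically sized parameters).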
Example - SSH
SSH supports a variety of Key-Exchange algorithms
ssh -Q kex
diffie-hellman-group1-sha1
diffie-hellman-group14-sha1
diffie-hellman-group14-sha256
diffie-hellman-group16-sha512
diffie-hellman-group18-sha512
diffie-hellman-group-exchange-sha1
diffie-hellman-group-exchange-sha256
ecdh-sha2-nistp256
ecdh-sha2-nistp384
ecdh-sha2-nistp521
curve25519-sha256
curve25519-sha256@libssh.org
sntrup761x25519-sha512@openssh.com
and ciphers
ssh -Q cipher
3des-cbc
aes128-cbc
aes192-cbc
aes256-cbc
aes128-ctr
aes192-ctr
aes256-ctr
aes128-gcm@openssh.com
aes256-gcm@openssh.com
chacha20-poly1305@openssh.com
Sometimes, when security vulnerabilities are found, they are removed.
This has happened with
arcfour
arcfour128
arcfour256
They are then classified as “weak ciphers” by OpenSSH and need to be explicitly enabled, e.g. with `ssh -c arcfour128`.
This occurs for instance when you’re dealing with an unsafe router running an old version of OpenWrt [27].
Conclusion
Distributed computing is a topic we may come into contact with from multiple angles:
- Routers flashed with home brew software,
- Beowulf clusters for distributed scientific computing,
- Multiplayer games written in Flask and Socket.IO during the pandemic,
- out of an interest for Blockchain technologies,
- WebAssembly in order to write C/C++-code for the web,
- Home servers running Plex, NextCloud, Git, SSH, etc.
- CI/CD, Cloud service providers, CDNs and Kubernetes in professional careers
There are only a few infrastructures worldwide large enough to require large-scale redundancy and failover. Additionally, many large corporations run their own custom and proprietary software; the details are often considered trade secrets and kept under wraps. We can of course run our own simulated environments using open-source projects and experiment, in a similar fashion to Netflix’s “Chaos Monkey”, with randomly shooting down microservices, and see how a simulated network redistributes data and workloads.
[1] https://en.wikipedia.org/wiki/Gossip_protocol
[2] https://en.wikipedia.org/wiki/Raft_(algorithm)
[3] https://en.wikipedia.org/wiki/Diaspora_(social_network)
[4] https://de.wikipedia.org/wiki/Mastodon_(Soziales_Netzwerk)
[5] https://de.wikipedia.org/wiki/Kubernetes
[6] https://en.wikipedia.org/wiki/Proof_of_work
[7] https://en.wikipedia.org/wiki/ACID
[8] https://en.wikipedia.org/wiki/CAP_theorem
[9] https://ntrs.nasa.gov/citations/20170009051
[10] https://en.wikipedia.org/wiki/Pyramid_scheme
[11] https://en.wikipedia.org/wiki/Merkle_tree
[12] https://en.wikipedia.org/wiki/Gossip_protocol
[13] https://github.com/bitcoin/bitcoin/commit/4405b78d6059e536c36974088a8ed4d9f0f29898
[14] https://en.wikipedia.org/wiki/Chord_(peer-to-peer)
[15] https://en.wikipedia.org/wiki/Kademlia
[16] https://en.wikipedia.org/wiki/Skype_protocol
[17] Baset, Schulzrinne - "An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol" - https://arxiv.org/abs/cs/0412017v1
[18] https://en.wikipedia.org/wiki/STUN
[19] https://www.techspot.com/news/44093-skype-protocol-reverse-engineered-source-posted-online.html
[20] https://www.youtube.com/watch?v=ZyqAbQkpeUo
[21] https://www.youtube.com/watch?v=ro2fU8_mr2w
[22] https://github.com/eBay/NuRaft/tree/master
[23] https://github.com/eBay/NuRaft/blob/master/examples/calculator/calc_server.cxx
[24] https://de.wikipedia.org/wiki/Bigtable
[25] https://de.wikipedia.org/wiki/Distributed_Component_Object_Model
[26] https://en.wikipedia.org/wiki/Man-in-the-middle_attack
[27] https://de.wikipedia.org/wiki/OpenWrt
[28] https://en.wikipedia.org/wiki/Modulo
[29] https://www.dotnetheaven.com/article/remoting-using-vb.net