
1. Remote Procedure Calls (RPC)

TL;DR — RPC lets you call a function on a remote node as if it were local. Under the hood, an RPC framework handles marshalling arguments, sending messages over the network, and unmarshalling results. While the abstraction is powerful, it can’t hide the realities of distributed systems — networks fail, messages get lost, and retries introduce their own problems. Modern RPC takes many forms, from RESTful HTTP APIs to frameworks like gRPC that use Protocol Buffers and IDLs to bridge different languages and services.

Figure: Client-server RPC example (online payments). A client (online shop) sends an RPC to a payments service and receives a success response.

An everyday example of a distributed system is buying something online with a credit or debit card. When you enter your card number in an online shop, that shop sends a payment request over the Internet to a service that specialises in processing card payments (e.g. Stripe).
The payments service in turn communicates with a card network such as Visa or MasterCard, which communicates with the bank that issued your card in order to take the payment.

Let's see what the code might look like at the online shop:

# Online shop handling customer's card details
card = Card()
card.set_card_number("1234 5678 8765 4321")
card.set_expiry_date("10/2024")
card.set_cvc("123")

result = payments_service.process_payment(card, 10.99, Currency.USD)
#                         ⬆ Implementation of this function is on another node!

if result.is_success():
    fulfil_order()

Calling the process_payment function looks like calling any other function, but behind the scenes the shop is sending a request to the payments service, waiting for a response, and then returning the response it received. The actual implementation of process_payment (the logic that communicates with the card network and the banks) does not exist in the code of the shop: it is part of the payments service, which is another program running on another node belonging to a different company.
This type of interaction, where code on one node appears to call a function on another node, is called a Remote Procedure Call (RPC). In Java it is called Remote Method Invocation (RMI). The software that implements RPC is called an RPC framework or middleware.


Let's take a deeper look at how this framework works:

Figure: RPC flow (marshalling, network transport, unmarshalling). The RPC framework marshals arguments, sends them over the network, and unmarshals the result.

When an application wishes to call a function on another node, the RPC framework provides a stub in its place. The stub has the same type signature as the real function, but instead of executing the real function, it encodes the function arguments in a message and sends that message to the remote node, asking for that function to be called. The process of encoding the function arguments is known as marshalling. In this example, JSON is used for marshalling, but various other formats are also used in practice.
The sending of the message from the RPC client to the RPC server may happen over HTTP (in which case this is also called a web service), or one of a range of different network protocols may be used. On the server side, the RPC framework unmarshals (decodes) the message and calls the desired function with the provided arguments. When the function returns, the same happens in reverse: the function’s return value is marshalled, sent as a message back to the client, unmarshalled by the client, and returned by the stub. Thus, to the caller of the stub, it looks as if the function had executed locally.
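
The round trip described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular RPC framework is built: the network is replaced by a direct function call (send_over_network), and the names process_payment_stub, server_handle, and SERVER_FUNCTIONS are made up for this example.

```python
import json


# --- Server side (hypothetical payments service) ---
def process_payment(card_number, amount, currency):
    """The real implementation, which lives only on the server."""
    return {"status": "success", "charged": amount, "currency": currency}


# Functions the server is willing to expose over RPC.
SERVER_FUNCTIONS = {"process_payment": process_payment}


def server_handle(request_bytes):
    """Unmarshal the request, call the named function, marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    function = SERVER_FUNCTIONS[request["function"]]
    result = function(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


# --- Client side ---
def send_over_network(request_bytes):
    """Stand-in for the network; a real framework would use a socket or HTTP."""
    return server_handle(request_bytes)


def process_payment_stub(card_number, amount, currency):
    """Takes the same arguments as the real function, but only forwards the call."""
    request = {"function": "process_payment", "args": [card_number, amount, currency]}
    response_bytes = send_over_network(json.dumps(request).encode("utf-8"))
    return json.loads(response_bytes.decode("utf-8"))["result"]


# To the caller, this looks like an ordinary local function call.
result = process_payment_stub("1234 5678 8765 4321", 10.99, "USD")
```

The caller never sees the JSON messages; it only sees the stub's return value, which is what makes the call appear local.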

Ideally, RPC makes a call to a remote function look the same as a local function call. This property is called location transparency: the system hides where a resource is located.
But in practice, some hard questions arise:

  • What if the service crashes during the function call?
  • What if the message is lost?
  • What if a message is delayed?
  • If something goes wrong, is it safe to retry?

The difficulty with RPC is that many things can go wrong, as networks and nodes might fail. If the client sends an RPC request but receives no response, it doesn’t know whether or not the server received and processed the request. It could resend the request if it doesn’t hear back for a while, but that might cause the request to be performed more than once (e.g. charging a credit card twice). Even if we retry, there is no guarantee that the retried messages will get through either. Waiting forever is not a good approach, so in practice the client will have to give up after some timeout.
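
To make the retry problem concrete, here is a small simulation. It assumes one common mitigation that is not part of the text above: the client attaches a unique request ID to each payment and reuses it on every retry, so the server can recognise duplicates. All class and function names (PaymentServer, FlakyNetwork, pay_with_retries) are hypothetical, and the "network" is simulated in-process.

```python
import uuid


class ResponseLost(Exception):
    """Raised by the simulated network when no response arrives in time."""


class PaymentServer:
    def __init__(self):
        self.charges = []        # every charge actually performed
        self.seen_requests = {}  # request_id -> response already sent

    def process_payment(self, request_id, amount):
        # Deduplicate: a retried request gets the earlier response back
        # instead of charging the card a second time.
        if request_id in self.seen_requests:
            return self.seen_requests[request_id]
        self.charges.append(amount)
        response = {"status": "success", "amount": amount}
        self.seen_requests[request_id] = response
        return response


class FlakyNetwork:
    """Delivers every request, but drops the first `drop_responses` replies."""

    def __init__(self, server, drop_responses):
        self.server = server
        self.drop_responses = drop_responses

    def call(self, request_id, amount):
        response = self.server.process_payment(request_id, amount)
        if self.drop_responses > 0:
            self.drop_responses -= 1
            raise ResponseLost()  # stands in for a client-side timeout
        return response


def pay_with_retries(network, amount, max_attempts=3):
    request_id = str(uuid.uuid4())  # the same ID is reused for every retry
    for _ in range(max_attempts):
        try:
            return network.call(request_id, amount)
        except ResponseLost:
            continue  # no reply: the client cannot tell whether the charge happened
    return None  # give up instead of waiting forever


server = PaymentServer()
result = pay_with_retries(FlakyNetwork(server, drop_responses=1), 10.99)
```

In this run the first response is lost, the client retries, and the server still records a single charge. Without the request ID check, the same retry would have charged the card twice.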

Despite these challenges, RPC has evolved significantly over the decades, with the goal of making it easier to program distributed systems. Today, the most common form of RPC is implemented using JSON data sent over HTTP. A popular set of design principles for such HTTP-based APIs is known as representational state transfer, or REST, and APIs that adhere to these principles are called RESTful. These principles include:

  • communication is stateless (each request is self-contained and independent from other requests)
  • resources (objects that can be inspected and manipulated) are represented by URLs
  • the state of a resource is updated by making an HTTP request with a standard method type, such as POST or PUT, to the appropriate URL.
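
As a rough illustration of these principles, the snippet below builds (but does not send) an HTTP request using Python's standard urllib.request module. The URL and the JSON fields are invented for this example; api.example.com is a placeholder and not a real payments API.

```python
import json
import urllib.request

# Hypothetical resource URL: the payment with ID 12345 is identified by its URL.
url = "https://api.example.com/payments/12345"

# The request is self-contained: everything the server needs is in this one message.
body = json.dumps({"amount": "10.99", "currency": "USD"}).encode("utf-8")

request = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="PUT",  # standard method type used to update the resource at this URL
)

# The request is only constructed here. Sending it would be
# urllib.request.urlopen(request), which needs a real server to talk to.
```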

Even though RESTful APIs and HTTP-based RPC originated on the web (where the client is JavaScript running in a web browser), they are now also commonly used with other types of client (e.g. mobile apps), or for server-to-server communication. Such server-to-server RPC is especially common in large enterprises, whose software systems are too large and complex to run in a single process on a single machine. To manage this complexity, the system is broken down into multiple services, which are developed and administered by different teams and which may even be implemented in different programming languages. RPC frameworks facilitate the communication between these services.

When different programming languages are used, the RPC framework needs to convert datatypes such that the caller’s arguments are understood by the code being called, and likewise for the function’s return value. A typical solution is to use an Interface Definition Language (IDL) to provide language-independent type signatures of the functions that are being made available over RPC. From the IDL, software developers can then automatically generate marshalling/unmarshalling code and RPC stubs for the respective programming languages of each service and its clients.
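
The idea of generating stubs from an interface description can be sketched as follows. This is a toy: the INTERFACE format is invented for this example and is not the syntax of any real IDL, and real tools such as the Protocol Buffers compiler generate source code ahead of time rather than building functions at runtime. The names generate_stub, PYTHON_TYPES, and fake_transport are likewise hypothetical.

```python
import json

# A toy, language-independent description of one remote function.
INTERFACE = {
    "process_payment": {
        "params": [("card_number", "string"), ("amount", "double"), ("currency", "string")],
        "returns": "bool",
    }
}

# How this particular language (Python) maps the toy IDL's types onto its own.
PYTHON_TYPES = {"string": str, "double": float, "bool": bool}


def generate_stub(name, interface, transport):
    """Build a client stub for `name` from the interface description."""
    spec = interface[name]

    def stub(*args):
        if len(args) != len(spec["params"]):
            raise TypeError(f"{name} expects {len(spec['params'])} arguments")
        # Check each argument against the type declared in the interface.
        for value, (param, idl_type) in zip(args, spec["params"]):
            if not isinstance(value, PYTHON_TYPES[idl_type]):
                raise TypeError(f"{param} must be of IDL type {idl_type}")
        message = json.dumps({"function": name, "args": list(args)})  # marshal
        reply = json.loads(transport(message))                        # unmarshal
        return PYTHON_TYPES[spec["returns"]](reply["result"])

    return stub


def fake_transport(message):
    """Stand-in for the network and the remote service; always reports success."""
    return json.dumps({"result": True})


process_payment = generate_stub("process_payment", INTERFACE, fake_transport)
ok = process_payment("1234 5678 8765 4321", 10.99, "USD")
```

A service written in another language would use the same interface description with its own type mapping, which is what lets the two sides agree on the messages they exchange.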