Distributed Systems Architectures

11 minute read

Updated: February 05, 2020

Architectural Style
System Architectures
Summary

Goal

To understand the different ways on how to view the organisation of a distributed system.

Introduction

It’s all about software components and these components are software, apps, processes.
Components interact with each other, on a diff machine on a diff service
Software architecture – tells us how the various software components should be organized and interact.
- Imp goal: to separate the applications form underlying platforms. To have ONE SINGLE COHERENT VIEW → through the middleware!!
Once you instantiate software architecture (placing soft. components on a real machines)→ it becomes a system architecture
WHAT ARE THE difference between them?

Architectural Style

Formulated in terms of:
- Components with well-defined interfaces
- the way they are connected to each other – mechanism that mediates communication, coordination among components
- data exchanged btwn components
- how they are jointly configured into a system.
E.g. As the dff components can run on a diff machine, A run on machine 2, BCD on machine 1.
- Component B interacting with D – local invocation. Share the same server and memory
  - Interface of D is made available to B – B needs a copy of D’s interface
- When they are in a diff machine, A still needs a copy of D’s interface. Definitely! Invocation has to go through a network –
Important is that components are connected, and provide interfaces so that other components can use it

1. Layered Architecture

A – pure layered organization
- Only downcalls to the next lower layer are made.
- In TCP IP, there are 5 layers. (application, transport, network, link, physical) and you can only talk to the next one, not n-2.
- Layer n-1 will provide an interface to n, and if you wanna talk to me, u’d have to do it through this interface
B – mixed layered organisation.
- E.g. take layer n-1. There’s an app called A. this will invoke an OS library that’s available in layer n-3. AS WELL as n-3, A will call layer n-2,wihich holds a maths library . maths library itself relies on OS library in layer n-3! So n-2 has to call n-3 as well.
C – layered organisation with upcalls
- Have a lower layer do an upcall to its next higher layer
- E.g. OS signals the occurrence of an event
  - Its to do with handle – Its possible to subscribe to events, and when they become available, it gets an automatic notification.
  - This case, n-1 is interested in event in n-2, so n-2 notifies n-1 that the event (handle) happened, using an upcall.

E.g. Layered Communication Protocol

In TCP/IP protocol stack, each layer provides services and functions.
Each layer offers an interface and in order for n to invoke n-1, it exhibits an interface that could be used by layer n. but it has no idea how the functionalities in n-1 is implemented. What’s important is the interface!!! And that it hides the implementation.
Protocol is a set of rules that parties will follow in order to exchange info. (communication btwn parties)
Important to understand the difference between a service offered by a layer

Application Layering: the PAD model

Taking a layered arch and applying in a modern software. It ends up as a PAD model
- Presentation layer – you sitting in front of computer, and about to use a browser.
- application layer – processing layer. When you send a search request to somewhere, it does a core functionality of a search. !!it will rely on other services that will hold the data.
- Data layer – responsible for the storage where the app layer operates

E.g. a Web Search Engine

RHS you have three levels: UI, processing, data
(UI) User types in a search keyword → (Processing) Query generator needs db → (Data) db query is taken care of → (Data) the info (page titles and info) is sent back to processing level → (processing) All results are ranked → (processing) generates HTML pages→ (UI) its passed back to UI level.

A search on Google cloud: How it is handled

Query is sent through google – in the border router of the google cloud, and it needs to balance the influx of query load. Load balancer tells you to go this way!! It will get taken by a rack of servers. And its sent back to me. This happens sooooosoososos quick!
Hierarchy of the switches is very complex
Refer to processing and data level from prev!

2. Object based style

Objects corresponds to components, connected and communicate through a procedure call mechanism. If two objects reside on the same system → method call, over a network → remote procedure call.
Object encapsulate the data (state), as they exhibit the interface, but neverrrr shows how its implemented.

Architecture Screenshot 2020-02-13 at 9 33 29 am

Client-side stub (proxy)

Client invokes a method
Server gives a copy of interface to client → called proxy. Proxy is loaded into clients address space.
- Proxy marshal the method invocation client made
- and unmarshal reply messages to return the result of the method invocation to the client.
The marshalled invocation is passed across network.

Server-side stub (skeleton)

Incoming invocation requests are first sent to a server stub (skeleton)
- Which unmarshals the invocation client sent, and actually make method invocation that client wants at the server through an interface.
- Also creates a reply and marshals them and forwards reply msgs to the client-side proxy - Proxy and the skeleton are referred to as stubs!!! BOARD

3. Resource based Architectures

View a distributed system as a set of resources where machines, individually manged by components . Resources may be added, deleted, modified, etc by (remote) apps
Characteristics of RESTful architecture:
- Resources need an identifier→ is usually accessed through URI (uniform resource identifier)
- All the services offer same interface. e.g. put, get, delete, post
- Messages are fully self-described. E.g. when sending HTML, say that it is HTML. Send its media type!!
- After executing a service, component forgets about the caller
  - Once the sever gets the request, that server takes the rq, process it, sends back the resource and forgets it → is memoryless execution보드
  - Is prominent in web services and REST
Operations – put, get, delete, post → CRUD operation!! 보드
- Create (PUT), Read (GET), Update (POST), Delete (DELETE)

E.g. Amazon’s Simple storage service (Amazon S3) – RESTful in practice

Objects (=files) are placed into buckets (=directories)
By placing a file in a bucket, file is automatically uploaded to the Amazon cloud
ObjectName contained in BucketName –access through: http://BucketName.s3.amazonaws.com/ObjectName
URI - Operations are carried out by sending HTTP requests –
- PUT - request through HTTP
- GET – to see if the object is contained in the BucketName
- S3 – access a file in s3 web service
- Specific object in that bucket called ObjectName
Simple as long as you know the URI→ in detail in further lecture

4. Event based architecture

As processes join and leave, its important that dependencies btwn processes are as loose as possible → hence, architecture that has strong separation between processing and coordination
- So more of autonomously operating processes
Here we emphasize the coordination!! → it encompasses the communication and cooperation btwn processes and machines
Two design:
- Mailbox coordination- 2 processors, to work and exchange info – they use shared mailbox , and they communicate through this shared mailbox
  - Write and fetch to the mailbox
  - No real communication btwn the two
- Event-based coordination – coordination btwn processors will happen once the event occurs
  - Processer 1 publish a notification describing the occurrence of event 1, and if ur interested, you subscribe, and will be notified!! And will have access to it
  - Publish and subscribe
Example of a Coordination Model: Shared Data Space

Event based architectural style – publish subscribe is key.
- Event bus – mechanism which the publishers and subscribers are matched → what coordinates these events
Shared data space architectural style – there’s a db which is persistent and liable
- The components will communicate entirely through tuples which is saved in a saved db, and other one does a quick search to see if the tuple exists any tuple that matches is returned.
- Tuples: a structured data records with number of fields - Can be combined w event based – process subscribes to certain tuples publish \&subscribe

System Architectures

System architecture is the instantiation (placing soft. components on a real machines) of a software architecture. but how do I actually place them on hardware?????
There are 3 ways to organise this architecture: centralised, decentralised, hybrids

1. Centralised Organisations

Basic Client Server Model

One of the core organization of the system architecture
Server – process implementing a specific service – e.g. file or db service
Client – a process that requests a service from server
The way it works: Clients and servers could be in a diff machine – both follows request-reply model
- Client sends the request, server accepts, server does the service, server replies → would take a bit of time
- By means of a simple connectionless protocol- efficient.
But making the protocol resistant to transmission failure is not trivial

2-Tiered Architecture

UI is split btwn the client and server
- E.g. if ur in front of the machine and filling in the form - it can be implemented on both server or client side prev lec
5 diff ways to implement PAD model
- A - only terminal dependent UI is in client side, but also exists in server side.
- B – entire UI on the client side, which communicates with rest of the app through a protocol on the server. Client does no processing
- C –processing is partly done in the client side. Front end checks for the correctness of the form, and more complex in server.
- D – client is not only in charge of UI, but the processing! But for data, gotta ask the server
- E – client’s local disk has partly some data → doing a lot → FAT client!! Not good lmao
  - E.g. web browsing – build huge cache on local disk
If we have more functionalities on the client side, it has to be more end-user resilient, and has to cope with a lot of different platform, needing for a multiple version- hence not optimal.

3-Tiered Architecture

Client – application server – database server. Processing layer are executed by a separate server
- !
- UI – sends request to the application and present the result to the user
- Application server – doesn’t have the data although it understands the business logic. Get the data from db server and process it, and send it up
- Database server – take the rq, process and send it to application layer
E.g. transaction processing. A separate process, a transaction processing monitor, coordinates all transactions across different data servers
Q: So how much did user wait??? And is the user aware that there’s a db server?
- They don’t know!! Encapsulation

Is 2 or 3 tier better?

Depends on archi – but for good design, it’ll be better to have 3 as everything is separate.

2. Decentralised

Peer to Peer (P2P) architecture
- They communicate, and could join and leave, but the interaction is always theatre.
- There is no master node – processes are all equal → each process will act as a client AND server
- Sometime process will need functionality hence would have to send the request
E.g. Structured P2P
- Nodes are organised in an overlay with specific topology. E.g. binary tree, grid → for data lookup
  - Based on using semantic-free index.
- Each data item is associated with a key, which is used as an index. Its common to use a hash function. Key (data item) = hash (data item’s value)
  - each node responsible for strong (key, value) pairs - thus implementing a distributed hash table
- Any node can be asked to lookup a given key , which then comes down to efficiently routing that lookup request to the node responsible for storing the data associated with the given key .
- Question is, what happens when node 7 is requested to look up data that key 14 has? key 14 is stored on node 14. Key = node.
  - 7 is connected to 15, then 14 has to connect to 15→ two hop solution
  - There exists a lookup table that has a structure of the system
- Want to find a shortest path from 1101 – 1011 → routing problem by using the lookup table which helps you decide the next node.

3. Hybrid

Need to combine centralized and decentralized→ we experience on a daily basis
Ideally ud like to have and ISP (Internet Service Providers) server very close to where you live – due to bandwidth and latency issue. So what we do is combine, we will end up in number of edge servers all around the network. As a end user we connect to closest edge server that could serve me
- Sometimes when the edge server doesn’t have the info I need, It will connect to another edge server
- This is what the content providers tend to do! Have the edges spread → Netflix
Interesting problem – how many edge? How do we optimize the content itself? And how do you distribute the apps all over the edges?

Summary

Reviewed various architectural styles and system architectures
Discussed how the client-server model generalises to the idea of multi-tier systems
Described the logical layering of a client-server system according to the PAD model
Emphasized A LOT on client server model and the PAD model – to understand where UI, interface and the data is