Distributed Systems Architectures

11 minute read

Updated:

Goal

  • To understand the different ways on how to view the organisation of a distributed system.

Introduction

  • It’s all about software components and these components are software, apps, processes.
  • Components interact with each other, on a diff machine on a diff service
  • Software architecture – tells us how the various software components should be organized and interact.
    • Imp goal: to separate the applications form underlying platforms. To have ONE SINGLE COHERENT VIEW → through the middleware!!
  • Once you instantiate software architecture (placing soft. components on a real machines)→ it becomes a system architecture
  • WHAT ARE THE difference between them?

Architectural Style

  • Formulated in terms of:
    • Components with well-defined interfaces
    • the way they are connected to each other – mechanism that mediates communication, coordination among components
    • data exchanged btwn components
    • how they are jointly configured into a system.
  • image

  • E.g. As the dff components can run on a diff machine, A run on machine 2, BCD on machine 1.
    • Component B interacting with D – local invocation. Share the same server and memory
      • Interface of D is made available to B – B needs a copy of D’s interface
    • When they are in a diff machine, A still needs a copy of D’s interface. Definitely! Invocation has to go through a network –
  • Important is that components are connected, and provide interfaces so that other components can use it

1. Layered Architecture

  • Screenshot 2020-02-13 at 9 30 46 am

  • A – pure layered organization
    • Only downcalls to the next lower layer are made.
    • In TCP IP, there are 5 layers. (application, transport, network, link, physical) and you can only talk to the next one, not n-2.
    • Layer n-1 will provide an interface to n, and if you wanna talk to me, u’d have to do it through this interface
  • B – mixed layered organisation.
    • E.g. take layer n-1. There’s an app called A. this will invoke an OS library that’s available in layer n-3. AS WELL as n-3, A will call layer n-2,wihich holds a maths library . maths library itself relies on OS library in layer n-3! So n-2 has to call n-3 as well.
  • C – layered organisation with upcalls
    • Have a lower layer do an upcall to its next higher layer
    • E.g. OS signals the occurrence of an event
      • Its to do with handle – Its possible to subscribe to events, and when they become available, it gets an automatic notification.
      • This case, n-1 is interested in event in n-2, so n-2 notifies n-1 that the event (handle) happened, using an upcall.

E.g. Layered Communication Protocol

  • Screenshot 2020-02-13 at 9 32 06 am
  • In TCP/IP protocol stack, each layer provides services and functions.
  • Each layer offers an interface and in order for n to invoke n-1, it exhibits an interface that could be used by layer n. but it has no idea how the functionalities in n-1 is implemented. What’s important is the interface!!! And that it hides the implementation.
  • Protocol is a set of rules that parties will follow in order to exchange info. (communication btwn parties)
  • Important to understand the difference between a service offered by a layer

Application Layering: the PAD model

  • Taking a layered arch and applying in a modern software. It ends up as a PAD model
    • Presentation layer – you sitting in front of computer, and about to use a browser.
    • application layer – processing layer. When you send a search request to somewhere, it does a core functionality of a search. !!it will rely on other services that will hold the data.
    • Data layer – responsible for the storage where the app layer operates

E.g. a Web Search Engine

  • Screenshot 2020-02-13 at 9 32 38 am
  • RHS you have three levels: UI, processing, data
  • (UI) User types in a search keyword → (Processing) Query generator needs db → (Data) db query is taken care of → (Data) the info (page titles and info) is sent back to processing level → (processing) All results are ranked → (processing) generates HTML pages→ (UI) its passed back to UI level.

A search on Google cloud: How it is handled

  • Query is sent through google – in the border router of the google cloud, and it needs to balance the influx of query load. Load balancer tells you to go this way!! It will get taken by a rack of servers. And its sent back to me. This happens sooooosoososos quick!
  • Hierarchy of the switches is very complex
  • Refer to processing and data level from prev!

2. Object based style

  • Objects corresponds to components, connected and communicate through a procedure call mechanism. If two objects reside on the same system → method call, over a network → remote procedure call.
  • Object encapsulate the data (state), as they exhibit the interface, but neverrrr shows how its implemented.

Architecture Screenshot 2020-02-13 at 9 33 29 am

Client-side stub (proxy)

  1. Client invokes a method
  2. Server gives a copy of interface to client → called proxy. Proxy is loaded into clients address space.
    • Proxy marshal the method invocation client made
    • and unmarshal reply messages to return the result of the method invocation to the client.
  3. The marshalled invocation is passed across network.

Server-side stub (skeleton)

  1. Incoming invocation requests are first sent to a server stub (skeleton)
    • Which unmarshals the invocation client sent, and actually make method invocation that client wants at the server through an interface.
    • Also creates a reply and marshals them and forwards reply msgs to the client-side proxy - Proxy and the skeleton are referred to as stubs!!! BOARD

3. Resource based Architectures

  • View a distributed system as a set of resources where machines, individually manged by components . Resources may be added, deleted, modified, etc by (remote) apps
  • Characteristics of RESTful architecture:
    • Resources need an identifier→ is usually accessed through URI (uniform resource identifier)
    • All the services offer same interface. e.g. put, get, delete, post
    • Messages are fully self-described. E.g. when sending HTML, say that it is HTML. Send its media type!!
    • After executing a service, component forgets about the caller
      • Once the sever gets the request, that server takes the rq, process it, sends back the resource and forgets it → is memoryless execution보드
      • Is prominent in web services and REST
  • Operations – put, get, delete, post → CRUD operation!! 보드
    • Create (PUT), Read (GET), Update (POST), Delete (DELETE)

E.g. Amazon’s Simple storage service (Amazon S3) – RESTful in practice

  • Objects (=files) are placed into buckets (=directories)
  • By placing a file in a bucket, file is automatically uploaded to the Amazon cloud
  • ObjectName contained in BucketName –access through: http://BucketName.s3.amazonaws.com/ObjectName
  • URI - Operations are carried out by sending HTTP requests –
    • PUT - request through HTTP
    • GET – to see if the object is contained in the BucketName
    • S3 – access a file in s3 web service
    • Specific object in that bucket called ObjectName
  • Simple as long as you know the URI→ in detail in further lecture

4. Event based architecture

  • As processes join and leave, its important that dependencies btwn processes are as loose as possible → hence, architecture that has strong separation between processing and coordination
    • So more of autonomously operating processes
  • Here we emphasize the coordination!! → it encompasses the communication and cooperation btwn processes and machines
  • Two design:
    • Mailbox coordination- 2 processors, to work and exchange info – they use shared mailbox , and they communicate through this shared mailbox
      • Write and fetch to the mailbox
      • No real communication btwn the two
    • Event-based coordination – coordination btwn processors will happen once the event occurs
      • Processer 1 publish a notification describing the occurrence of event 1, and if ur interested, you subscribe, and will be notified!! And will have access to it
      • Publish and subscribe
  • Example of a Coordination Model: Shared Data Space

image

  1. Event based architectural style – publish subscribe is key.
    • Event bus – mechanism which the publishers and subscribers are matched → what coordinates these events
  2. Shared data space architectural style – there’s a db which is persistent and liable
    • The components will communicate entirely through tuples which is saved in a saved db, and other one does a quick search to see if the tuple exists any tuple that matches is returned.
    • Tuples: a structured data records with number of fields - Can be combined w event based – process subscribes to certain tuples publish \&subscribe

System Architectures

  • System architecture is the instantiation (placing soft. components on a real machines) of a software architecture. but how do I actually place them on hardware?????
  • There are 3 ways to organise this architecture: centralised, decentralised, hybrids

1. Centralised Organisations

Basic Client Server Model

  • One of the core organization of the system architecture
  • Server – process implementing a specific service – e.g. file or db service
  • Client – a process that requests a service from server
  • The way it works: Clients and servers could be in a diff machine – both follows request-reply model
    • Client sends the request, server accepts, server does the service, server replies → would take a bit of time
    • By means of a simple connectionless protocol- efficient.
  • But making the protocol resistant to transmission failure is not trivial

2-Tiered Architecture

  • UI is split btwn the client and server
    • E.g. if ur in front of the machine and filling in the form - it can be implemented on both server or client side prev lec
  • 5 diff ways to implement PAD model
    • A - only terminal dependent UI is in client side, but also exists in server side.
    • B – entire UI on the client side, which communicates with rest of the app through a protocol on the server. Client does no processing
    • C –processing is partly done in the client side. Front end checks for the correctness of the form, and more complex in server.
    • D – client is not only in charge of UI, but the processing! But for data, gotta ask the server
    • E – client’s local disk has partly some data → doing a lot → FAT client!! Not good lmao
      • E.g. web browsing – build huge cache on local disk
  • If we have more functionalities on the client side, it has to be more end-user resilient, and has to cope with a lot of different platform, needing for a multiple version- hence not optimal.

3-Tiered Architecture

  • Client – application server – database server. Processing layer are executed by a separate server
    • !image
    • UI – sends request to the application and present the result to the user
    • Application server – doesn’t have the data although it understands the business logic. Get the data from db server and process it, and send it up
    • Database server – take the rq, process and send it to application layer
  • E.g. transaction processing. A separate process, a transaction processing monitor, coordinates all transactions across different data servers
  • Q: So how much did user wait??? And is the user aware that there’s a db server?
    • They don’t know!! Encapsulation

Is 2 or 3 tier better?

  • Depends on archi – but for good design, it’ll be better to have 3 as everything is separate.

2. Decentralised

  • Peer to Peer (P2P) architecture
    • They communicate, and could join and leave, but the interaction is always theatre.
    • There is no master node – processes are all equal → each process will act as a client AND server
    • Sometime process will need functionality hence would have to send the request
  • E.g. Structured P2P
    • Nodes are organised in an overlay with specific topology. E.g. binary tree, grid → for data lookup
      • Based on using semantic-free index.
    • Each data item is associated with a key, which is used as an index. Its common to use a hash function. Key (data item) = hash (data item’s value)
      • each node responsible for strong (key, value) pairs - thus implementing a distributed hash table
    • Any node can be asked to lookup a given key , which then comes down to efficiently routing that lookup request to the node responsible for storing the data associated with the given key .

    • Screenshot 2020-02-13 at 9 35 35 am

    • Question is, what happens when node 7 is requested to look up data that key 14 has? key 14 is stored on node 14. Key = node.
      • 7 is connected to 15, then 14 has to connect to 15→ two hop solution
      • There exists a lookup table that has a structure of the system
    • Want to find a shortest path from 1101 – 1011 → routing problem by using the lookup table which helps you decide the next node.

3. Hybrid

  • Need to combine centralized and decentralized→ we experience on a daily basis
  • Ideally ud like to have and ISP (Internet Service Providers) server very close to where you live – due to bandwidth and latency issue. So what we do is combine, we will end up in number of edge servers all around the network. As a end user we connect to closest edge server that could serve me
    • Sometimes when the edge server doesn’t have the info I need, It will connect to another edge server
    • This is what the content providers tend to do! Have the edges spread → Netflix
  • Interesting problem – how many edge? How do we optimize the content itself? And how do you distribute the apps all over the edges?

Summary

  • Reviewed various architectural styles and system architectures
  • Discussed how the client-server model generalises to the idea of multi-tier systems
  • Described the logical layering of a client-server system according to the PAD model
  • Emphasized A LOT on client server model and the PAD model – to understand where UI, interface and the data is

Leave a comment