Distributed Systems Architectures
Goal
- To understand the different ways to view the organisation of a distributed system.
Introduction
- It is all about software components: the pieces of software (apps, processes) that make up the system.
- Components interact with each other, possibly on different machines or through different services.
- Software architecture: tells us how the various software components should be organised and how they should interact.
- Important goal: to separate the applications from the underlying platforms, giving ONE SINGLE COHERENT VIEW → achieved through the middleware!
- Once you instantiate a software architecture (placing software components on real machines) → it becomes a system architecture.
- So what is the difference between them? The software architecture is the logical organisation of the components; the system architecture is that organisation realised on actual machines.
Architectural Style
- Formulated in terms of:
- Components with well-defined interfaces
- The way they are connected to each other: the mechanism that mediates communication and coordination among components
- The data exchanged between components
- How they are jointly configured into a system.
- E.g. the different components can run on different machines: A runs on machine 2, while B, C and D run on machine 1.
- Component B interacting with D is a local invocation: they share the same server and memory.
- The interface of D is made available to B, so B needs a copy of D's interface.
- When they are on different machines, A still needs a copy of D's interface, but now the invocation has to go through the network.
- What matters is that components are connected and provide interfaces so that other components can use them.
1. Layered Architecture
- A – pure layered organization
- Only downcalls to the next lower layer are made.
- In the five-layer view of the TCP/IP stack (application, transport, network, link, physical), a layer can only talk to the next one down, not to layer n-2.
- Layer n-1 provides an interface to layer n: if layer n wants to talk to layer n-1, it has to do so through this interface.
- B – mixed layered organisation.
- E.g. an application A sits in layer n-1. It invokes an OS library in layer n-3 directly, and it also calls a maths library in layer n-2. The maths library itself relies on the OS library, so layer n-2 has to call layer n-3 as well.
- C – layered organisation with upcalls
- A lower layer makes an upcall to the next higher layer.
- E.g. the OS signals the occurrence of an event.
- This works through a handle: the higher layer subscribes to an event by passing a handle to the lower layer, and is notified automatically when the event occurs.
- In this case, layer n-1 is interested in an event in layer n-2, so layer n-2 notifies layer n-1 that the event happened by making an upcall through the handle.
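The downcall and upcall patterns above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Layer` class and the names `request`, `signal` and `on_event` are assumptions made for this example, not part of any real protocol stack.

```python
class Layer:
    """One layer of a stack. `lower` is the next lower layer, or None."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower
        self.handle = None      # upcall handle registered by the layer above
        self.events = []        # events this layer was notified about
        if lower is not None:
            # Subscribe: pass a handle (callback) down to the next lower layer.
            lower.handle = self.on_event

    def request(self, payload):
        """Downcall: only the next lower layer is invoked, never n-2 directly."""
        if self.lower is None:
            return [self.name]
        return [self.name] + self.lower.request(payload)

    def signal(self, event):
        """Upcall: notify the next higher layer through its registered handle."""
        if self.handle is not None:
            self.handle(event)

    def on_event(self, event):
        self.events.append(event)


lower = Layer("n-2")
upper = Layer("n-1", lower=lower)
path = upper.request("read")     # visits n-1, then n-2
lower.signal("data-ready")       # n-2 upcalls n-1
```

The higher layer never polls the lower one: it registers its handle once and is called back when the event occurs.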
E.g. Layered Communication Protocol
- In the TCP/IP protocol stack, each layer provides services and functions.
- Each layer n-1 exposes an interface that layer n uses to invoke it. Layer n has no idea how the functionality of layer n-1 is implemented. What matters is the interface, and that it hides the implementation.
- A protocol is a set of rules that parties follow in order to exchange information (communication between parties).
- It is important to understand the difference between a service offered by a layer, the interface through which that service is made available, and the protocol the layer implements.
Application Layering: the PAD model
- Taking a layered architecture and applying it to modern software, we end up with the PAD model.
- Presentation layer: the user interface, e.g. you sitting in front of a computer using a browser.
- Application layer: the processing layer. When you send a search request, this layer performs the core search functionality. It relies on other services that hold the data.
- Data layer: responsible for storing the data on which the application layer operates.
E.g. a Web Search Engine
- There are three levels: UI, processing, data.
- (UI) The user types in a search keyword → (Processing) the query generator turns it into a database query → (Data) the database query is executed → (Data) the results (page titles and information) are sent back to the processing level → (Processing) all results are ranked → (Processing) an HTML page is generated → (UI) the page is passed back to the UI level.
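The flow above can be sketched as three small groups of functions, one per level. This is a toy sketch: the `PAGES` data, the function names and the ranking rule (count keyword occurrences) are all assumptions for illustration, not how a real search engine works.

```python
# Data level: a tiny in-memory "database" of pages (hypothetical sample data).
PAGES = {
    "Distributed systems intro": "distributed systems run on distributed machines",
    "Cooking pasta": "boil water and add pasta",
    "Middleware basics": "middleware hides distributed platforms",
}


def db_query(keyword):
    """Data level: return (title, text) pairs whose text contains the keyword."""
    return [(title, text) for title, text in PAGES.items() if keyword in text]


def rank(results, keyword):
    """Processing level: toy ranking by number of keyword occurrences."""
    return sorted(results, key=lambda r: r[1].count(keyword), reverse=True)


def to_html(results):
    """Processing level: generate the HTML page handed back to the UI."""
    items = "".join(f"<li>{title}</li>" for title, _ in results)
    return f"<ul>{items}</ul>"


def search(keyword):
    """UI level entry point: takes the typed keyword, returns HTML to display."""
    keyword = keyword.lower()
    return to_html(rank(db_query(keyword), keyword))


page = search("Distributed")
```

Each level only talks to the one directly below it, which is what lets the levels be placed on different machines later.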
A search on Google cloud: How it is handled
- The query is sent to Google: it arrives at the border router of the Google cloud, which needs to balance the incoming query load. A load balancer directs the query to a rack of servers, which handles it and sends the result back to the user. All of this happens very quickly.
- The hierarchy of switches is very complex.
- The racks of servers correspond to the processing and data levels from the previous example.
2. Object based style
- Objects correspond to components, and they are connected and communicate through a procedure call mechanism. If two objects reside on the same system → (local) method call; if they are on different machines → remote procedure call over the network.
- Objects encapsulate data (state): they expose an interface but never show how it is implemented.
Architecture
Client-side stub (proxy)
- The client invokes a method.
- The server gives a copy of its interface to the client → this is called the proxy. The proxy is loaded into the client's address space.
- The proxy marshals the method invocation made by the client
- and unmarshals reply messages to return the result of the method invocation to the client.
- The marshalled invocation is passed across the network.
Server-side stub (skeleton)
- Incoming invocation requests are first passed to a server stub (skeleton),
- which unmarshals the invocation sent by the client and makes the actual method invocation on the object at the server through its interface.
- It also marshals the reply and forwards the reply message to the client-side proxy.
- The proxy and the skeleton are both referred to as stubs!
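The proxy and skeleton roles can be sketched as follows. This is a minimal sketch under stated assumptions: JSON stands in for the marshalling format, a plain function call stands in for the network, and `Calculator`, `Proxy` and `Skeleton` are hypothetical names chosen for the example.

```python
import json


class Calculator:
    """The actual server-side object (hypothetical example service)."""

    def add(self, a, b):
        return a + b


class Skeleton:
    """Server-side stub: unmarshals requests and invokes the real object."""

    def __init__(self, target):
        self.target = target

    def handle(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))      # unmarshal
        method = getattr(self.target, request["method"])
        result = method(*request["args"])                        # real invocation
        return json.dumps({"result": result}).encode("utf-8")    # marshal reply


class Proxy:
    """Client-side stub: offers the same interface as Calculator."""

    def __init__(self, transport):
        # `transport` maps request bytes to reply bytes; here it stands in
        # for sending the message over a network.
        self.transport = transport

    def add(self, a, b):
        request = json.dumps({"method": "add", "args": [a, b]}).encode("utf-8")
        reply = self.transport(request)
        return json.loads(reply.decode("utf-8"))["result"]       # unmarshal reply


skeleton = Skeleton(Calculator())
proxy = Proxy(skeleton.handle)
total = proxy.add(2, 3)
```

From the client's point of view, `proxy.add(2, 3)` looks like a local method call; only bytes cross the boundary between the two stubs.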
3. Resource based Architectures
- View a distributed system as a collection of resources, individually managed by components. Resources may be added, deleted, modified, etc. by (remote) applications.
- Characteristics of a RESTful architecture:
- Resources are identified through a single naming scheme → usually a URI (Uniform Resource Identifier).
- All services offer the same interface, e.g. PUT, GET, DELETE, POST.
- Messages are fully self-described. E.g. when sending HTML, say that it is HTML: send its media type!
- After executing an operation at a service, the component forgets everything about the caller.
- Once the server gets a request, it processes it, sends back the resource and forgets about it → stateless (memoryless) execution.
- This style is prominent in web services (REST).
- Operations: PUT, GET, DELETE, POST → CRUD operations!
- Create (PUT), Read (GET), Update (POST), Delete (DELETE)
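The four characteristics can be sketched with a toy in-memory server. This is a hedged sketch, not a real web framework: the `ResourceServer` class and its `handle` signature are assumptions for illustration, and the method-to-CRUD mapping follows the one given in these notes. The numbers returned are standard HTTP status codes.

```python
class ResourceServer:
    """Toy RESTful server: resources are addressed by a URI-like path."""

    def __init__(self):
        # Only resources are stored; nothing is remembered about callers.
        self.resources = {}

    def handle(self, method, uri, body=None, media_type=None):
        """Uniform interface: every resource accepts the same four methods."""
        if method == "PUT":                       # Create
            self.resources[uri] = (media_type, body)
            return 201, None
        if uri not in self.resources:
            return 404, None
        if method == "GET":                       # Read
            return 200, self.resources[uri]       # reply carries its media type
        if method == "POST":                      # Update
            self.resources[uri] = (media_type, body)
            return 200, None
        if method == "DELETE":                    # Delete
            del self.resources[uri]
            return 204, None
        return 405, None


server = ResourceServer()
created = server.handle("PUT", "/notes/1", "<p>hi</p>", "text/html")
fetched = server.handle("GET", "/notes/1")
updated = server.handle("POST", "/notes/1", "<p>bye</p>", "text/html")
deleted = server.handle("DELETE", "/notes/1")
gone = server.handle("GET", "/notes/1")
```

Every request carries everything the server needs (method, URI, body, media type), so any request can be handled without knowing what the same client did before.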
E.g. Amazon’s Simple storage service (Amazon S3) – RESTful in practice
- Objects (= files) are placed into buckets (= directories).
- By placing a file in a bucket, the file is automatically uploaded to the Amazon cloud.
- An object ObjectName contained in bucket BucketName is accessed through the URI: http://BucketName.s3.amazonaws.com/ObjectName
- Operations are carried out by sending HTTP requests to that URI:
- PUT: create (upload) the object through an HTTP request
- GET: retrieve the object, which also shows whether it is contained in BucketName
- The host part (BucketName.s3.amazonaws.com) addresses the bucket in the S3 web service
- The path part (ObjectName) addresses the specific object in that bucket
- It is simple as long as you know the URI → covered in detail in a later lecture.
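As a rough sketch of how the URI scheme above maps onto HTTP requests, the following builds (but never sends) PUT and GET requests with Python's standard library. The bucket name `mybucket` and object name `notes.txt` are made up for the example, and a real S3 request would normally also need authentication headers, which are left out here.

```python
from urllib.parse import quote
from urllib.request import Request


def object_uri(bucket_name, object_name):
    """Build the URI for an object, following the scheme given in the notes."""
    return f"http://{bucket_name}.s3.amazonaws.com/{quote(object_name)}"


def build_request(method, bucket_name, object_name, data=None):
    """Construct an HTTP request object for the URI; nothing is sent."""
    return Request(object_uri(bucket_name, object_name), data=data, method=method)


# Hypothetical bucket and object names, for illustration only.
put_request = build_request("PUT", "mybucket", "notes.txt", data=b"hello")
get_request = build_request("GET", "mybucket", "notes.txt")
```

The same URI is used for both operations; only the HTTP method changes, which is the uniform-interface idea of REST.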
4. Event based architecture
- As processes join and leave, it is important that the dependencies between processes are as loose as possible → hence an architecture with a strong separation between processing and coordination.
- So the system is more a collection of autonomously operating processes.
- Here we emphasise coordination! → it encompasses the communication and cooperation between processes and machines.
- Two designs:
- Mailbox coordination: two processes that need to work together and exchange information use a shared mailbox, and communicate only through it.
- They write to and fetch from the mailbox.
- There is no direct communication between the two.
- Event-based coordination: coordination between processes happens once an event occurs.
- Process 1 publishes a notification describing the occurrence of event 1. If you are interested, you subscribe, and you will be notified and get access to it.
- This is publish and subscribe.
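Event-based coordination can be sketched with a very small in-process event bus. The `EventBus` class and the topic names below are assumptions for illustration; real publish-subscribe systems add networking, persistence and filtering on top of this idea.

```python
from collections import defaultdict


class EventBus:
    """Matches publishers and subscribers by topic name."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, data):
        """Notify every subscriber of the topic; return how many were notified."""
        callbacks = self.subscribers.get(topic, [])
        for callback in callbacks:
            callback(data)
        return len(callbacks)


bus = EventBus()
received = []
bus.subscribe("event1", received.append)            # interested process
delivered = bus.publish("event1", "temperature=21") # publisher
ignored = bus.publish("event2", "nobody listens")   # no subscribers
```

The publisher never refers to a subscriber directly: both sides only know the bus and the topic, which is what keeps the processes loosely coupled.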
- Example of a coordination model: shared data space
- Event-based architectural style: publish-subscribe is key.
- Event bus: the mechanism by which publishers and subscribers are matched → it is what coordinates these events.
- Shared data space architectural style: there is a database which is persistent and reliable.
- Components communicate entirely through tuples saved in the shared database: one component writes a tuple, another does a search to see whether a matching tuple exists, and any tuple that matches is returned.
- Tuples: structured data records with a number of fields.
- This can be combined with the event-based style: a process subscribes to certain tuples (publish and subscribe).
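A shared data space can be sketched as a list of tuples with template matching. This is a simplified in-memory sketch: the `TupleSpace` class, its `write` and `read` methods and the use of `None` as a wildcard are assumptions made for the example, and the persistence mentioned above is not modelled.

```python
class TupleSpace:
    """Toy shared data space: components communicate only through tuples."""

    def __init__(self):
        self.tuples = []

    def write(self, tup):
        """Store a tuple in the shared space."""
        self.tuples.append(tup)

    def read(self, template):
        """Return the first tuple matching the template, or None.

        A None field in the template acts as a wildcard.
        """
        for tup in self.tuples:
            if len(tup) == len(template) and all(
                wanted is None or wanted == actual
                for wanted, actual in zip(template, tup)
            ):
                return tup
        return None


space = TupleSpace()
space.write(("temperature", "room-1", 21))
space.write(("temperature", "room-2", 19))
match = space.read(("temperature", "room-2", None))
missing = space.read(("humidity", None, None))
```

The writer and the reader do not need to run at the same time or know about each other: the reader describes what it is looking for, not who produced it.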
System Architectures
- A system architecture is the instantiation of a software architecture (placing software components on real machines). But how do we actually place them on hardware?
- There are 3 ways to organise this architecture: centralised, decentralised, hybrid.
1. Centralised Organisations
Basic Client Server Model
- One of the core organisations of a system architecture
- Server: a process implementing a specific service, e.g. a file or database service
- Client: a process that requests a service from a server
- The way it works: clients and servers can be on different machines, and both follow the request-reply model.
- The client sends the request, the server accepts it, performs the service and replies → the client waits in the meantime.
- This can be done by means of a simple connectionless protocol, which is efficient.
- But making the protocol resistant to transmission failures is not trivial: when no reply arrives, the client cannot tell whether the request or the reply was lost.
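One common way to cope with a lost reply is for the client to resend the request and for the server to recognise duplicates. The sketch below simulates this without real sockets; the `Server`, `LossyChannel` and `call` names, the request-id scheme and the reply cache are all assumptions for illustration, not a prescribed protocol.

```python
class Server:
    """Adds amounts to a counter; the operation is not safe to repeat blindly."""

    def __init__(self):
        self.counter = 0
        self.replies = {}   # request id -> cached reply, used to spot duplicates

    def handle(self, request_id, amount):
        if request_id in self.replies:       # retransmission: do not redo the work
            return self.replies[request_id]
        self.counter += amount
        self.replies[request_id] = self.counter
        return self.counter


class LossyChannel:
    """Delivers every request but drops the replies whose numbers are listed."""

    def __init__(self, server, drop_replies):
        self.server = server
        self.drop_replies = set(drop_replies)
        self.sent = 0

    def send(self, request_id, amount):
        self.sent += 1
        reply = self.server.handle(request_id, amount)
        # None models a timeout at the client: the reply never arrived.
        return None if self.sent in self.drop_replies else reply


def call(channel, request_id, amount, max_tries=3):
    """Request-reply with retransmission on timeout."""
    for _ in range(max_tries):
        reply = channel.send(request_id, amount)
        if reply is not None:
            return reply
    raise TimeoutError("no reply received")


server = Server()
channel = LossyChannel(server, drop_replies={1})   # lose the first reply
result = call(channel, request_id="req-1", amount=10)
```

Without the request id and reply cache, the retransmitted request would add the amount a second time, which is exactly why failure handling on a connectionless protocol is not trivial.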
2-Tiered Architecture
- The application is split between the client and the server, starting with the UI.
- E.g. if you are in front of the machine filling in a form, the form handling can be implemented on either the server side or the client side.
- There are 5 different ways to split the PAD model:
- A – only the terminal-dependent part of the UI is on the client side; the rest of the UI is on the server side.
- B – the entire UI is on the client side, which communicates with the rest of the application on the server through a protocol. The client does no processing.
- C – part of the processing is done on the client side. E.g. the front end checks a form for correctness, and the more complex processing is done on the server.
- D – the client is in charge not only of the UI but also of the processing! For data, it has to ask the server.
- E – the client's local disk also holds part of the data → the client is doing a lot → FAT client! Not good.
- E.g. web browsing: the browser builds a huge cache on the local disk.
- The more functionality we put on the client side, the more the client software has to be robust against end-user errors and cope with many different platforms, which requires multiple versions. Hence fat clients are not optimal.
3-Tiered Architecture
- Client – application server – database server. The processing layer is executed by a separate server.
- UI: sends requests to the application server and presents the results to the user.
- Application server: understands the business logic but does not hold the data. It gets the data from the database server, processes it and sends the result up to the UI.
- Database server: takes the request, processes it and sends the result to the application server.
- E.g. transaction processing: a separate process, the transaction processing monitor, coordinates all transactions across different data servers.
- Q: How long did the user wait? And is the user aware that there is a database server?
- They don't know! Encapsulation.
Is 2 or 3 tier better?
- It depends on the architecture, but for a good design 3 tiers are usually better, as each layer is kept separate.
2. Decentralised
- Peer-to-peer (P2P) architecture
- Processes communicate and can join and leave, but the interaction is always symmetric.
- There is no master node: processes are all equal → each process acts as a client AND a server.
- Sometimes a process needs functionality from another process, and so has to send a request.
- E.g. structured P2P
- Nodes are organised in an overlay with a specific topology, e.g. a binary tree or a grid → used for data lookup.
- It is based on a semantic-free index.
- Each data item is associated with a key, which is used as an index. It is common to use a hash function:
key(data item) = hash(data item's value)
- Each node is responsible for storing (key, value) pairs, thus implementing a distributed hash table.
- Any node can be asked to look up a given key, which then comes down to efficiently routing that lookup request to the node responsible for storing the data associated with that key.
- The question is: what happens when node 7 is asked to look up the data with key 14? Key 14 is stored on node 14 (key = node identifier).
- Node 7 is connected to node 15, and node 15 is connected to node 14 → a two-hop solution.
- Each node has a lookup table that reflects the structure of the system.
- To find the shortest path from 1101 to 1011 → this is a routing problem, solved using the lookup table, which helps you decide the next node.
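The lookup examples above can be sketched in code under one explicit assumption: that the overlay is a 4-bit hypercube, in which two nodes are neighbours when their identifiers differ in exactly one bit. The notes do not name the topology, but this choice is consistent with the 7 → 15 → 14 example. The function names and the use of SHA-1 as the hash are also assumptions for illustration.

```python
import hashlib

BITS = 4  # 16 nodes with identifiers 0000..1111 (assumed overlay size)


def key_for(value):
    """Semantic-free key: hash the data item's value into the identifier space."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << BITS)


def route(source, key):
    """Route a lookup from node `source` to the node whose id equals `key`.

    Each hop moves to the neighbour that fixes the highest differing bit.
    """
    path = [source]
    current = source
    for bit in reversed(range(BITS)):
        mask = 1 << bit
        if (current ^ key) & mask:
            current ^= mask          # hop to the neighbour differing in this bit
            path.append(current)
    return path


hops_7_to_14 = route(7, 14)               # 0111 -> 1111 -> 1110
hops_13_to_11 = route(0b1101, 0b1011)     # 1101 -> 1001 -> 1011
```

The number of hops equals the number of bits in which the two identifiers differ, so each node only needs to know its one-bit neighbours to find a shortest path.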
3. Hybrid
- We need to combine centralised and decentralised organisations → we experience this on a daily basis.
- Ideally you would like to have an ISP (Internet Service Provider) server very close to where you live, because of bandwidth and latency. So we combine the two, and end up with a number of edge servers all around the network. As an end user, you connect to the closest edge server that can serve you.
- Sometimes, when the edge server does not have the information you need, it connects to another edge server.
- This is what content providers tend to do: spread edge servers around → e.g. Netflix.
- Interesting problems: how many edge servers? How do we optimise the placement of the content itself? And how do we distribute the applications over all the edges?
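The edge-server behaviour described above can be sketched as a cache that first asks neighbouring edge servers and only then falls back to a central origin server. The `Origin` and `EdgeServer` classes, the fallback order and the sample content are assumptions for illustration; real content delivery networks use far more elaborate placement and routing.

```python
class Origin:
    """Central server holding all content (the centralised part)."""

    def __init__(self, content):
        self.content = content
        self.hits = 0

    def fetch(self, name):
        self.hits += 1
        return self.content[name]


class EdgeServer:
    """Edge server close to the user, cooperating with peer edge servers."""

    def __init__(self, origin, peers=()):
        self.origin = origin
        self.peers = list(peers)
        self.cache = {}

    def get(self, name):
        if name in self.cache:                    # served locally
            return self.cache[name]
        for peer in self.peers:                   # ask another edge server
            if name in peer.cache:
                self.cache[name] = peer.cache[name]
                return self.cache[name]
        self.cache[name] = self.origin.fetch(name)  # last resort: the origin
        return self.cache[name]


origin = Origin({"movie": b"frames"})
edge_a = EdgeServer(origin)
edge_b = EdgeServer(origin, peers=[edge_a])
first = edge_a.get("movie")    # fetched from the origin
second = edge_b.get("movie")   # copied from the peer edge server
```

In this run the origin is contacted only once, even though two edge servers end up holding the content.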
Summary
- Reviewed various architectural styles and system architectures
- Discussed how the client-server model generalises to the idea of multi-tier systems
- Described the logical layering of a client-server system according to the PAD model
- Emphasised the client-server model and the PAD model A LOT: to understand where the UI, the processing and the data are placed.