Timing and Synchronization
Updated:
- Time in Distributed System
- Synchronization model
- Synchronization algorithms
- Summary
Time in Distributed System
- We have DS and have 4 machines and it will give 4 diff time as they have their own global time → hence time is unambiguous in a centralised system.
- In order for these machines to work, we need a mechanism so that there’s some form of agreement on global time
- Q: is there any way of synchronizing all the clocks? YES!
Internal Physical Clock
- Machines these days have built in time hardware called a precisely machined quartz crystal.
- Internal clock on PC has a crystal and it oscillates. There’s a tension due to oscillation, which oscillates at a well-defined frequency.
- Each oscillation will decrement a counter by 1.
- What happens when it gets 0? Can simply generate timer interrupt or clock tick and counter is reloaded from a holding register
- In older machine, there’s a reference where this internal clock will always reference to, which is called epoch, first generated in 1/1/1970.
- If you have DS with n machines, each machine has crystals and each crystals will run at slightly different rates → results in ambiguity which we refer as clock skew
Time Standards
- Since the invention of atomic clock – 1 second is defined as the time for a ‘caesium 133’ atom to make 9 billion electronic transition
- If we are able to define atomic and solar second – can we agree on it being mapped?
- History – BIH – Bureau International Heure – if all the people \ take the average. → International Atomic Time (TAI)
- Why are we doing this?
- Bc the mean solar time would need semantic – the concept of time is changing bc the atmospheric change, the amt of coal, hence the notion of time needs to be caught up
- Hence BIH introduce leap seconds whenever discrepancy between TAI and means solar time reaches 800ms. → we need to artificially introduce a second
The Worldwide standard for time: UTC
- There is discrepancy and mean solar and TAI do not agree. If this
difference >800ms, then we introduce a leap second.
- 31st of December midnight – we introduce leap second! Without us knowing,
- If we take time based on the TAI and introduce leap time – this corrected time IS the time and is Universal Coordinated Time (UTC)– the time u, me, mobile phones etc reference
- UTC synchronise signals need to be broadcasted so we know what time it is. It is broadcasted through radio transmitters and satellites.
Clock Synchronization Algorithms
- If we know the UTC, we can synchronize our time.
- One is to have a machine with UTC receiver, we can connect to a satellite and ask for UTC → Cristian
- OR machines in DS have no UTC receiver, so you need to make an agreement on global time → Berkeley
- → The point is to always minimize clock skew
- E.g. of skew – difference of 10-6 per second. you will end up with 1 second in every 11.6 days – could be 1 sec behind or faster
Synchronization model
- Assumptions: DS with n machines and own clock and timer, and it issues H interrupts per second.
- Interrupt handler adds 1 to software clock, and keeping track the number of ticks since some standard time in the past – value of clock is C.
- At UTC time t, the value of ith clock in the system is
Ci(t)
- Ideally, Ci(t) = t + γ for all i and t in every DS.
- The variation of dc/dt is always 1!! Which means everytime there’s a tick, there’s 1 tick. Then there’s a clear synchronization
- However, in practice there are some variations, resulting a drift.
- Bc their frequency is not perfect and affected by external sources such as temperature, clocks on different machines will gradually start showing different values for time
- Variation of tick over time/ and 1-ρ → where ρ is a maximum drift
rate.
- 1−ρ ≤ dC/dt ≤ 1 + ρ
- Its ahead or behind.
- Since we are looking at number of TICKS. So, 61 ticks mean it’s faster.
- E.g. 1 hour = 3600 second.
- 1 second = 60 ticks. → 216000 ticks per hour
- If I have a relative error which gives about 10-5, within an hour we end up with 2 ticks as my maximum drift rate. This means that in my DS, I can have clocks that are generating 215998 and 216002 ticks.
Drift as a function of UTC
- It’d be easy to have a perfect clock where dc/dt = 1, slow clock when < 1, and fast clock when >1.
- When UTC is drifting in diff direction, at a time ∆t after synchronization, they must be as much as 2p∆t apart.
- This can be a concern in DS coz if you are not careful about clock
skew, this may lead to some catastrophic event.
- There was a software error in the clock, and it was a time for missile, and this has killed 100s of people.
Cristian’s Algorithm
- A client with process running and time server running the processors.
- Server has UTC, and at intervals not exceeding ∆C/2ρ sec, each machine requests current time to UTC server
- Server gets the request and look at UTC and sends back.
- Bc by the time server’s time gets to the client, the time would’ve
passed. And there could be a network delay. Client needs to ensure
that that delay is taken into consideration.
- So what client does is they estimate the round trip time. Set a timer and calculate the time, and take half and add it to the UTC time that server sends.
Propagation time
- T0 sends the request, server gets it hand handles it, with I – interrupt handling time. It then sends back the current UTC to T1.
- End of the day if you know the time client sends and receives the request, and if you can handle the interrupt handling time, you can calculate the propagation time.
- (T1-T0-I)/2, where I is time that server needs to handle the interrupt
Berkeley Algorithm
- Easy to implement coz there is no UTC server
- There’s always the leader- the time daemon.
- Daemon asks each machine for its current time. And each reply with their clock time.
- Based on the answer daemon calculates the difference in time and sends the NEW time to all the other 2 including itself.
- You add all of them, 0+25-10 / 3 = +5. 3. Then daemon tells everyone how to adjust their clock.
- What happens now is the time daemon sets it global time to 3;05. And other clocks as well to 3:05.
- Problem is, it won’t scale. Too much overhead due to too much messages in btwn
Election Algorithms
- Deciding who is going to play the daemon
- In DS there’s a need for a coordinator. Hence, we use election algorithm to pick a coordinator (leader election)
- E.g. Take over the role of a failed process, pick a master in Berkeley clock synchronization algorithm. → through election algorithm: Bully algorithm.
Bully Algorithms
- Ds machines all have unique numerical ID, and the processes communicates with each other, through a reliable connection.
- The key idea is it selects the highest id, and this one will start
the election.
- If it realises that the corrdiantor failed, the election is initiated.
- 3 msg types – election, OK, I won
- Several processes can initiate an election but need a consistent result to have one elected leader
- O(n2) – quite large complexity, coz it needs to talk to each other
- Steps
- P sends election messages to all process with higher ids and waits for OK messages
- P becomes the cord if no OK was received and send out I WON, or if it receives OK, the higher ID will take responsibility
Bully Algorithm Example
- You have 8 processes, and imagine that process 7 was the coordinator. But process 7 crashed and can’t be a leader. Then processes realized this failure of 7 due to timeout, so 4 holds the election. It calls process 5 and 6, and they respond and tell 4 to stop. And 5 and 6 each holds election. Since they dunno 7 crashed, they send request to 7. 6 tells 5 to stop as 6>5. 6 wins and sends everyone ‘I WON’ to all the system
- As long as you have semantics attached to msgs you send, you good
Network Time Protocol
- Enables clients across internet to be synchronized to UTC.
- If we have a hierarchy of servers, and each server will connect to a
primary, you are guaranteed that at some point the UTC becomes more
accurate and accurate.
- Need mechanism for servers to talk to each other and request this UTC !!
- It provides a reliable service that can survive losses of connectivity
- Authenticates timing data to protect against malicious or accidental interference with service
- Protocol; will allow clients to resynchronise sufficiently frequently to offset drift rates common to most computers
Structure of NTP
- Servers form a logical hierarchy – called synchronisation subnet.
- Primary server – Stratum 1 - They are directly connected to a time source, such as radio clock reviving UTC. Server that has the most accurate time. – ntp2.ja.net
- Secondary server - Stratum 2 – connects to stratum 2, and
synchronise with primary servers
- dnsserv2.leeds.ac.uk connects to ntp2.ja.net
- Stratum 3 – another server that syncs to stratum2.
- Feng gps connects to dnssrv.
- Idea is leaf servers would be the one to be executed in users’ workstations.
- Accuracy decreases as we move down through the strata.
Summary
- There is no global agreement on time in a DS; internal clocks of a system’s computers will run at different rates.
- UTC provides an external global standard, to which we can attempt to synchronise a DS.
- Cristian’s algorithm can be used to synchronise machines on a LAN to the UTC time received by a time server.
- The Berkeley algorithm can be used to achieve reasonable consistency in the clocks of machines on a LAN, without synchronising to UTC.
- NTP provides a decentralised time service suitable for synchronising machines on the Internet.
- Introduced the notion of election as well as the Bully algorithm
- Alternative of Bully algorithm is the Ring Algorithm
- Question: can you explain this alternative algorithm?
Leave a comment