Timing and Synchronization

8 minute read

Updated:

Time in Distributed System

  • We have DS and have 4 machines and it will give 4 diff time as they have their own global time → hence time is unambiguous in a centralised system.
  • In order for these machines to work, we need a mechanism so that there’s some form of agreement on global time
  • Q: is there any way of synchronizing all the clocks? YES!

Internal Physical Clock

  • Machines these days have built in time hardware called a precisely machined quartz crystal.
    • Internal clock on PC has a crystal and it oscillates. There’s a tension due to oscillation, which oscillates at a well-defined frequency.
  • Each oscillation will decrement a counter by 1.
  • What happens when it gets 0? Can simply generate timer interrupt or clock tick and counter is reloaded from a holding register
  • In older machine, there’s a reference where this internal clock will always reference to, which is called epoch, first generated in 1/1/1970.
  • If you have DS with n machines, each machine has crystals and each crystals will run at slightly different rates → results in ambiguity which we refer as clock skew

Time Standards

  • Since the invention of atomic clock – 1 second is defined as the time for a ‘caesium 133’ atom to make 9 billion electronic transition
  • If we are able to define atomic and solar second – can we agree on it being mapped?
  • History – BIH – Bureau International Heure – if all the people \ take the average. → International Atomic Time (TAI)
  • Why are we doing this?
    • Bc the mean solar time would need semantic – the concept of time is changing bc the atmospheric change, the amt of coal, hence the notion of time needs to be caught up
  • Hence BIH introduce leap seconds whenever discrepancy between TAI and means solar time reaches 800ms. → we need to artificially introduce a second

The Worldwide standard for time: UTC

  • There is discrepancy and mean solar and TAI do not agree. If this difference >800ms, then we introduce a leap second.
    • 31st of December midnight – we introduce leap second! Without us knowing,
  • If we take time based on the TAI and introduce leap time – this corrected time IS the time and is Universal Coordinated Time (UTC)– the time u, me, mobile phones etc reference
  • UTC synchronise signals need to be broadcasted so we know what time it is. It is broadcasted through radio transmitters and satellites.

Clock Synchronization Algorithms

  • If we know the UTC, we can synchronize our time.
    1. One is to have a machine with UTC receiver, we can connect to a satellite and ask for UTC → Cristian
    2. OR machines in DS have no UTC receiver, so you need to make an agreement on global time → Berkeley
  • → The point is to always minimize clock skew
  • E.g. of skew – difference of 10-6 per second. you will end up with 1 second in every 11.6 days – could be 1 sec behind or faster

Synchronization model

  • Assumptions: DS with n machines and own clock and timer, and it issues H interrupts per second.
  • Interrupt handler adds 1 to software clock, and keeping track the number of ticks since some standard time in the past – value of clock is C.
  • At UTC time t, the value of ith clock in the system is Ci(t)
    • Ideally, Ci(t) = t + γ for all i and t in every DS.
    • The variation of dc/dt is always 1!! Which means everytime there’s a tick, there’s 1 tick. Then there’s a clear synchronization
  • However, in practice there are some variations, resulting a drift.
    • Bc their frequency is not perfect and affected by external sources such as temperature, clocks on different machines will gradually start showing different values for time
  • Variation of tick over time/ and 1-ρ → where ρ is a maximum drift rate.
    • 1−ρ ≤ dC/dt ≤ 1 + ρ
    • Its ahead or behind.
    • Since we are looking at number of TICKS. So, 61 ticks mean it’s faster.
  • E.g. 1 hour = 3600 second.
    • 1 second = 60 ticks. → 216000 ticks per hour
    • If I have a relative error which gives about 10-5, within an hour we end up with 2 ticks as my maximum drift rate. This means that in my DS, I can have clocks that are generating 215998 and 216002 ticks.

Drift as a function of UTC

  • image
  • It’d be easy to have a perfect clock where dc/dt = 1, slow clock when < 1, and fast clock when >1.
  • When UTC is drifting in diff direction, at a time ∆t after synchronization, they must be as much as 2p∆t apart.
  • This can be a concern in DS coz if you are not careful about clock skew, this may lead to some catastrophic event.
    • There was a software error in the clock, and it was a time for missile, and this has killed 100s of people.

Cristian’s Algorithm

  • image
  • A client with process running and time server running the processors.
  • Server has UTC, and at intervals not exceeding ∆C/2ρ sec, each machine requests current time to UTC server
  • Server gets the request and look at UTC and sends back.
  • Bc by the time server’s time gets to the client, the time would’ve passed. And there could be a network delay. Client needs to ensure that that delay is taken into consideration.
    • So what client does is they estimate the round trip time. Set a timer and calculate the time, and take half and add it to the UTC time that server sends.

Propagation time

  • Screenshot 2020-02-25 at 9 20 02 pm
  • T0 sends the request, server gets it hand handles it, with I – interrupt handling time. It then sends back the current UTC to T1.
  • End of the day if you know the time client sends and receives the request, and if you can handle the interrupt handling time, you can calculate the propagation time.
  • (T1-T0-I)/2, where I is time that server needs to handle the interrupt

Berkeley Algorithm

  • image
  • Easy to implement coz there is no UTC server
  • There’s always the leader- the time daemon.
    1. Daemon asks each machine for its current time. And each reply with their clock time.
    2. Based on the answer daemon calculates the difference in time and sends the NEW time to all the other 2 including itself.
    • You add all of them, 0+25-10 / 3 = +5. 3. Then daemon tells everyone how to adjust their clock.
    • What happens now is the time daemon sets it global time to 3;05. And other clocks as well to 3:05.
  • Problem is, it won’t scale. Too much overhead due to too much messages in btwn

Election Algorithms

  • Deciding who is going to play the daemon
  • In DS there’s a need for a coordinator. Hence, we use election algorithm to pick a coordinator (leader election)
  • E.g. Take over the role of a failed process, pick a master in Berkeley clock synchronization algorithm. → through election algorithm: Bully algorithm.

Bully Algorithms

  • Ds machines all have unique numerical ID, and the processes communicates with each other, through a reliable connection.
  • The key idea is it selects the highest id, and this one will start the election.
    • If it realises that the corrdiantor failed, the election is initiated.
    • 3 msg types – election, OK, I won
    • Several processes can initiate an election but need a consistent result to have one elected leader
  • O(n2) – quite large complexity, coz it needs to talk to each other
  • Steps
    • P sends election messages to all process with higher ids and waits for OK messages
    • P becomes the cord if no OK was received and send out I WON, or if it receives OK, the higher ID will take responsibility

Bully Algorithm Example

  • Screenshot 2020-02-25 at 9 21 08 pm
  • You have 8 processes, and imagine that process 7 was the coordinator. But process 7 crashed and can’t be a leader. Then processes realized this failure of 7 due to timeout, so 4 holds the election. It calls process 5 and 6, and they respond and tell 4 to stop. And 5 and 6 each holds election. Since they dunno 7 crashed, they send request to 7. 6 tells 5 to stop as 6>5. 6 wins and sends everyone ‘I WON’ to all the system
  • As long as you have semantics attached to msgs you send, you good

Network Time Protocol

  • Enables clients across internet to be synchronized to UTC.
  • If we have a hierarchy of servers, and each server will connect to a primary, you are guaranteed that at some point the UTC becomes more accurate and accurate.
    • Need mechanism for servers to talk to each other and request this UTC !!
  • It provides a reliable service that can survive losses of connectivity
  • Authenticates timing data to protect against malicious or accidental interference with service
  • Protocol; will allow clients to resynchronise sufficiently frequently to offset drift rates common to most computers

Structure of NTP

  • Servers form a logical hierarchy – called synchronisation subnet.
  • image
  • Primary server – Stratum 1 - They are directly connected to a time source, such as radio clock reviving UTC. Server that has the most accurate time. – ntp2.ja.net
  • Secondary server - Stratum 2 – connects to stratum 2, and synchronise with primary servers
    • dnsserv2.leeds.ac.uk connects to ntp2.ja.net
  • Stratum 3 – another server that syncs to stratum2.
    • Feng gps connects to dnssrv.
  • Idea is leaf servers would be the one to be executed in users’ workstations.
  • Accuracy decreases as we move down through the strata.

Summary

  • There is no global agreement on time in a DS; internal clocks of a system’s computers will run at different rates.
  • UTC provides an external global standard, to which we can attempt to synchronise a DS.
  • Cristian’s algorithm can be used to synchronise machines on a LAN to the UTC time received by a time server.
  • The Berkeley algorithm can be used to achieve reasonable consistency in the clocks of machines on a LAN, without synchronising to UTC.
  • NTP provides a decentralised time service suitable for synchronising machines on the Internet.
  • Introduced the notion of election as well as the Bully algorithm
  • Alternative of Bully algorithm is the Ring Algorithm
  • Question: can you explain this alternative algorithm?

Leave a comment