Randomness

8 minute read

Updated:

6. Randomness

Objectives

  • To consider how good computers are at generating sequences of random numbers
  • To examine the security implications of bad RNGs
  • To explore secure RNG implementations

‘Random’ Number Generation

  • XKCD cartoon – its not really random. | | | —————————————————————————————————————————— | | “Anyone who considers arithmetical methods of producing random numbers is, of course, in a state of sin.” – John von Neumann - |
  • So We cannot use algorithmic method to truly generate random numbers!!
  • E.g. C program that generates a random number, but when you run it again it generates the same 5 rand numbers
  • Computers are deterministic – they are predictable. If we run the program and feed the input, we expect a certain output predicted output

Pseudo-random number generators (PRNGs)

  • What we really talking when we say ‘random’ is PRNGs
  • When you run this, since its based on an algo, it will produce a definite sequence of values
  • It may not look like there’s a pattern, but if you see it long enough it will wrap around
  • Period – the length of time it takes to see it wrap around
  • Determined by seed – typically a 32, 64-bit integer value
    • Given seed, you always get the same sequence. Which is why we saw 2 same set of numbers
  • This deterministic nature is useful at times- software testing, when the test data is reproducible.
    • Can use a fixed seed to always generate same number.
    • But from security standpoint, you don’t want that.

Attacks against PRNGs

  • Cryptanalytic approach– attack using a cipher, using mathematical methods – analysing the sequence of numbers. will tell us that its not all that random, and is rather algorithmic
  • Brute force – since there’s a seed which determines where ur sequence starts, there’s a certain number of possible starting points or sequences. you can match up generated sequence and observed sequence, you can guess the seed.
    • 32 bit is not that great as there are possibility of computing it
  • Discovery of internal state of the generator such that you can easily know how to generate a seed
    • If you know the seed, you got an entirely predictable behaviour so it could be easier
    • If we take 32 bit output, and squash it down to 6 possible values, then you increase the number of observations that is necessary to figure out the internal state

Look at standard PNRGs and weaknesses and practical examples of how its been exploited, and how to do better

Linear Congruential Generator (LCG)

  • image
  • Values are generated by multiplication and addition hence linear.
  • ni – previous value of the sequence
  • To start = n0 = seed value
  • Period <= m.
    • Glibc – standard lib of Linux, m = 231, a = 11024214, c= 12345
  • Our value’s output range are gonna be 0 - 231-1 in glibc
  • Such a simple arithmetic that is fast, doesn’t require much memory
  • BUT Not good – period is too small for security purposes
  • And the numbers aren’t good either – this is what’s used for c program he demonstrated

Mersenne Twister

  • If ur using other languages, other than C- python and ruby uses Mersenne twister
  • Slower and requires more storage
  • But the period is enormous 219937 – 1
    • U will never see it repeat for entire universe time
  • Better quality number and large quantity better for simulations and stuff
  • If you observe 624 consecutive values, you can work backwords and predict the internal state better than LCG but no good in security purposes

What’s wrong with the code?

  • Basic c program calling randsame seed is used so get the same sq of values
  • So we added a line of srand(time(NULL))
    • srand to seed my generator, and rand produces next number in the sq
    • image
  • Question is, what’s wrong?
    • Time is predictable – not a good thing to use to seed not good for security purposes
    • Since time ticks over, we get diff thing, but the time is predictable, so the sq is also predictable

Cigital’s Internet Poker Exploit – consequences of it being predictable

  • First online gambling – download and install on PC
  • U can’t see others card, only community card and own card
  • To compute what everyone’s cards are – company was worried that hunters wouldn’t trust that its random. So they published the deck shuffling algorithm
    • Algo was non-cryptographic like LCG - 32-bit seed was used= around 4 billion possible shuffles, when in reality is 2226 possible shuffles
  • If you choose a shit seed
    • particular seed value used for this gambling was milliseconds since midnight – which reduced the shuffle to 86400000
    • AND time is predictable! if we know the time that server uses, we can narrow down the possibilities to less than 86 million
  • Why do we know what time server clock to a reasonable accuracy?
    • NTP – time protocol – server can sync up to other servers atomic clock, using ntp so you can predict to a few hundred millisecond what time it is in that server
  • So security advisors built a little app that could figure it all out
    • Put time in, shuffle, and you specify ur cards that you can see, and 3 community cards
    • It then computes all the possible shuffles that matches to the 5 cards and figure out other player’s cards, and the hidden cards
    • So they changed the algorithm

Does this really matter?

  • Its not just gambling, random number generation is used to generate keys too!!
  • TLS – uses session key (randomly generated symmetric cipher key) for encryption and decryption of data. And public key cryptography is used to exchange that key between the browser
    • We need to break the encryption to find. But if we know what the session key is, there’s no need to break it
    • Seed value for session key could be predicted by the time of the day, and process id
  • Keys – have to be large enough for brute force, but also have to be random enough so they can’t be predicted

Another issue: TCP ISNs

  • TCP allows us to create byte stream connection between 2 machines. To set this up you need a handshake between client and server
  • Initial stage: SYN – client synchronizes and sends the sequence number nc
  • Then server to prove that it received, sends back ACK, incrementing the counter by 1, and also its own initial sq no. ns , along with the SYN flag
  • Client ACK the sq number, and increment it, and incrementing the client counter by 1
  • Sq number has been synchronised on both side – can be used to ensure TCP segments are put together in a correct order using sq number
  • For this to work, we shouldn’t be able to predict what the sq numbers are TCP connection could be hijacked
    • Initial sequence number (ISN) on both sides HAS to be random.

ISN predictability

  • Can see the correlation and pattern
  • Large distributed – good ISN, sufficiently random
  • If you see a structure – high predictability
  • how OS used to be bad at this

We’re still getting wrong - Much more recent areas

  • E.g. Sony’s PS3, they don’t want you to play it in hardware that hasn’t been authorized
    • So bootloader checks that its been sighed by sony’s private key – use elliptic curve
    • Signature needs a random nonce and that is never reused – but Sony used a fixed value tf
      • Which could be recovered by looking at enough signatures
    • Sony wasn’t happy, sued, and wanted to have a restraining order to not publish the key
  • E.g.2. Bitcoin – that their bitcoin was disappearing (stolen) due to a pseudo number bug
  • -> Randomness is used in many different places where the key shouldn’t be predictable!!!

!! Better PRNGs

  • Our secure PRNGs use cryptographic techniques
    1. Can run symmetric cipher in counter mode
    2. Or take secret counter and hast that – output is the hash, and we are incrementing the counter, where the initial value of the counter is the seed
  • we are relying on reversibility - crypto’s characteristic of it being hard to reverse
    • Hash can’t be recovered to generate original msg
  • Problem we have is that we still need to seed this well So, the initial counter value needs to be as random as possible

Entropy Collection

  • Randomness = entropy. More entropy we have, more genuine randomness there is
  • We talk about it in bits of entropy; Coins – heads or tail, 0,1
  • Bc algos are deterministic, we can’t use simple aglo as a good source for entropy
    • Can use radioactive decay – genuine quantum mechanical randomness
    • Chaotic processes or random noise
      • E.g. HotBits generator – click sound, counting the radioactive decay and convert this as a true randomness
      • Even keyboard coulddd be used – could give reasonable amount of entropy
  • Our OS holds a lot of resources

LavaRnd

  • LavaRand – digital photographs of coloured oil forming and rising – pattern is unpredictable, but not a lot of entropy
  • More modern – digital images and put it in a dark container
    • Ccd sensor produces an image that consists of random pixel values, generated by the thermal noise
  • Then you process it, and get soo many random bytes from processing one single image

Randomness via the OS

  • Could rely on the OS – not good as radioactive/ thermal noise but is still good and its available to us
  • Urandom.py 10000
  • Using OS as a source and we go to low level – use of API to access this device

Higher level approaches

  • python, use secrets module
    • token bytes telling how many bytes you want
    • or create SystemRandom() object
  • java – SecureRandom class

Summary

  • Output from traditional PRNGs is insufficiently random for security purposes
  • Real cases where insufficient randomness has been exploited– Casino, Sony, Bitcoin
  • Instead, use good cryptographically strong PRNGs - symmetric cipher or hash counter –
  • Use good source of seed for entropy – OS, LavaRand

Leave a comment