Clicky

20150826

Random numbers created with a TPM (test yourself!)


Random numbers are the base of many cryptographic functions. E.g. they are used for the creation of a new identity or when a secure connection is set up between a client and a server. When the randomness is predictable, so not as random as you expect, the protection of the cryptographic functions are flawed.

The Trusted Platform Module (TPM) has a build-in Random Number Generator (RNG):

Trusted Platform Block schema

I did some research to the quality of the random numbers as produced by the TPM chip. To make a long story short, the quality (randomness) of TPM based random numbers is much better than random number generators from Operating Systems. This means that the level of protection, when TPM based random numbers are used, is better.

How random is my random data?



This was a pretty hard question until Karl Pearson defined the “Chi Square test” in the early 20th century. The Chi square test takes the count that a certain number is occurring and calculates the accuracy against the expected random number frequency. 
 
Example: if you throw a dice 60 times you expect that each side of the dice is represented 10 times. So, 10 times the 1, 10 times the 2 until 10 times the 6. This is the theory. In practice there will be a deviation of these numbers. 

The Chi square test is able to calculate if the returned values from the dices are still random (enough) or if you should suspect an unbalanced dice that favor a particular side of the dice (see this link for a YouTube movie that explains the Chi square test with dice).
When we apply the Chi square test on random data, the test can show that certain random data is not as random as we expected. In other words, when “bad” random data is used in computations, it might be possible for an attacker to predict (a portion of the) random data and break the security.
Since the Chi square test is in the statistics domain, we have to make assumptions to define a “bad” or “good” RNG. In this case we assume that when the “Distribution” value is around 256 and the “Exceed” value is between 10% and 90%, the RNG is providing “good” random data. These assumptions are based on the theory as described by Ronald Knuth in is programmers reference books: “The Art of Computer Programming” (Third edition, Volume 2, Chapter 3.3.1). See also links below on more information around the Chi square test. 

DIY TPM random number testing


You do not have to believe me. You can check for your self! The only thing you need is a computer with a TPM and a small program that produces random numbers with the TPM.

Disclaimer: the download of 'tpmrng' is free but the use of 'tpmrng' is (of course) at your own risk.


Step 1: download 'tpmrng' here (in English or Deutsch)
Step 2: unzip somewhere on your harddisk
Step 3: execute (user mode, default amount of random data: 1MB)

You should see something like this:


The 'tpmrng' application displays the state of the TPM (so also a handy tool to quickly check if your TPM is ready for use!), the RNG buffer size (which is different per TPM chip manufacturer) and the performance of the RNG.

You can use the file 'random.bin' for testing with the Chi square tests.

Verify randomness with 'Ent'


The 'Ent' utility analyzes random numbers. Download Ent here (link) and use Ent like this:



Explanation of Ent's tests:

 

(The reader may also check the Ent website for a comprehensive explanation of the Ent tests.)

 

Entropy

A good random number gives all the 256 possibilities that are possible from the 8 bits. If the RNG only gives the 26 letters of the alphabet (see ‘Compression’) only 4.7 bits out of 8 bits are used. The entropy for the 26 letters of the alphabet is therefore: 4.7.
A Good RNG: The Entropy value must be (or very close to) 8.

Compression

Computers calculate with ‘byte’. A byte consists of 8 ‘bits’. A bit is a memory location that can have 2 values: 0 (zero) or 1 (one). With some math we can calculate that one byte can have the number of values (2) to power of positions (8): 28 = 256 values (or characters).
Consider the alphabet that consists of 26 characters. It is clear that we do not have to use all the 8 bits to present the 26 letters of the alphabet. We can calculate the amount of bits by the division of the logarithm of characters (26) to the logarithm of values (2) : log(26) / log(2) = 4.7 bits. Since the result is a fraction, we need actually 5 bits (rounded up) to represent all the letters of the alphabet. Using 5 bits gives some “spare” characters because number of values (2) to the power of positions (5) is 25 = 32 characters. Compression programs, like ZIP,7z or RAR use this “spare” to compress data. E.g. imagine a 1000 byte document with only letters of the alphabet, a compression program can reduce the size from 1000 to 26/256*1000=101 bytes. A reduction of almost 90%.
A Good RNG: Optimum compression should be (very close to) 0%

Chi Square test

The Chi Square test is extremely sensitive for ‘un-random’ RNGs. We will use the Chi Square test for deeper analyses of the TPM and the Windows and Linux RNGs. See links below for explanations of the Chi square test.
A Good RNG: The Chi Square distribution value must be around 256 and the exceed value must be in the range 10-90%.

Average

Calculating the average is the easiest of all tests. You add all the random bytes and divide them to the amount of bytes. One byte can have a value between 0 and 255 se the average should be 255/2 = 127.5
 A Good RNG: has a value (very close to) 127.5

Monte Carlo value of Pi

This tests takes the random numbers for an x-value and y-value and places a dot in an x-y quadrant.
Imagine a very large number of dots. The relation of dots that are inside and outside of the circle is dependent on the surface inside of the circle and the combined 4 areas outside of the circle.  If the square’s surface = 1 (x=1 and y=1) than the circle has a surface of Ï€ * r2 = 3.14159 * 0.52=0.785398. This means that 785398 dots are inside the circle and 1000000-785398=214602 dots in the 4 areas outside of the circle.
The Monte Carlo value of Pi uses this fact to calculate the number inside and outside of the circle and derive the number Pi from the total number of random data.
A Good RNG: The value for the Monte Carlo value of Pi must be (very close) to: 3.14159265359

Serial correlation

If we use a file with random data that is sequenced 0-255-1-254-2-253… etc., it is clear that there is a relation between the sequence of numbers. This Serial correlation test calculates that relationship between consecutive random bytes.
A Good RNG: both values should be (very close) to 0 

TPM random numbers compared with those from Windows and Linux


To show the differences in RNGs from the TPM, Windows and Linux, we created 10 files of 1MB of data by each RNG and we analyzed the data with the Chi square test. The next table gives the results that were collected from a TPM, Windows and Linux RNG:


Disclaimer: Since everything is “random”, it can happen that a single test fails. E.g. see the one occasion where the TPM random data would be not good enough (91,59%). This means that for scientific proof of the quality of the randomness, the tests must be applied on much more data and with different analytic methods. In this case we use a “lightweight” method to support the statements around the quality of the RNGs in this article. 

Conclusion TPM based Random Numbers Generation

The table shows that the TPM is producing “better” random data than Windows and Linux RNGs. The Windows and Linux random data are “less random” (see the “Exceed” values outside 10-90%). This does not imply that the Windows and Linux random numbers are not random enough for day-by-day computations. It just shows that the random numbers from the TPM are “more random" and provide more trust and security in the cryptographic functions that are executed with TPMs.


No comments :

Post a Comment

Real Time Web Analytics