New cellular automata design based on transformation sequence programming

NEW CELLULAR AUTOMATA DESIGN BASED ON TRANSFORMATION SEQUENCE PROGRAMMING

TAN SYN KIAT
(B.Eng. (Hons.), NATIONAL UNIVERSITY OF SINGAPORE)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2005

Acknowledgement

I am most grateful to my supervisor, Associate Professor Guan Sheng-Uei, for his continuous guidance, understanding and encouragement during this work. The four years spent together on this thesis have been a taxing period, and this thesis would not have been possible had he not restored my focus on the numerous occasions when I drifted away from the research goals. I would also like to express my appreciation to the many anonymous reviewers of our work. Many thanks to my fellow researchers from the Computer Communication Networks Laboratory, who made the work atmosphere enjoyable, especially Zhu Fang Ming, Zhang Shu, Marie Therese Quieta and Eric Poon for their help during this work. This work was supported by a research scholarship awarded by the National University of Singapore.
Contents
Acknowledgments
Summary
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Introduction to Work Presented in this Thesis
  1.2 Overview on Pseudorandom Number Generators (PRNG)
  1.3 Applications of Pseudorandom Number Generators
    1.3.1 Cryptology
    1.3.2 Built-in Self Test
  1.4 Contribution of the Thesis
  1.5 Overview of the Thesis
Chapter 2 CA and LFSM based PRNG: Introduction and Literature Survey
  2.1 Linear Finite State Machines (LFSM)
  2.2 Cellular Automata
  2.3 Literature Survey of CA based PRNGs
    2.3.1 Uniform CA
    2.3.2 Hybrid CA
    2.3.3 Approaches based on Genetic Algorithms
    2.3.4 Two-Dimensional (2-d) CA
    2.3.5 Configurable CA
  2.4 Summary
Chapter 3 Self-Programmable CA
  3.1 Details of the Self-Programmable CA
  3.2 State Transformations of SPCA f90 ↔ f165 and SPCA f150 ↔ f105
  3.3 Experimental Results and Discussion
  3.4 Summary
Chapter 4 Transformation Sequence With Maximum Length Matrix
  4.1 Maximum length LFSM
  4.2 Transformation Sequence With Maximum Length Matrix
  4.3 Extension to Arbitrary PRNGs
  4.4 Preview to Following Chapters
  4.5 Summary
Chapter 5 L-Layered CA
  5.1 Transformation Sequence of L-LCA
  5.2 Analysis of Sequence Period for L-LCA
  5.3 Experimental Results and Discussion
    5.3.1 DIEHARD
    5.3.2 Linear Complexity
  5.4 Summary
Chapter 6 Programmable Transformations
  6.1 Introduction
  6.2 Programmable Transformations
  6.3 Experimental Results and Discussion
    6.3.1 DIEHARD
    6.3.2 Linear Complexity
    6.3.3 Implementation Issues
  6.4 Summary
Chapter 7 Exponential Transformation Matrix Machines
  7.1 Introduction
  7.2 Exponential Transformation Matrix Machines
  7.3 Experimental Results and Discussion
    7.3.1 DIEHARD
    7.3.2 Driving Ability
    7.3.3 Implementation Issues
  7.4 Summary
Chapter 8 Conclusion
  8.1 Conclusion
  8.2 Future Work
Bibliography
Appendix
  A.1 Experimental Setup and Test Evaluation
  A.2 DIEHARD Randomness Test Suite

Summary

Cellular automata based pseudorandom number generators (CA PRNG) are widely used in various fields. Complex CA that generate highly random sequences have been found to use register transformation functions with more inputs drawn from a non-local neighborhood, as well as time-varying transformations. Their sequences have been shown to pass all the DIEHARD randomness tests; however, such designs are difficult to analyze, and desirable sequence properties such as period and uniform distribution are not guaranteed. Conversely, well-analyzed designs such as maximum length CA can be shown to have desirable sequence properties, but the randomness quality of their sequences is not always satisfactory.

The proposed Transformation Sequence with Maximum Length Matrix (TSMLM) concept facilitates the design of analyzable PRNG through a top-down approach: the existing transformation function of every register is changed so that it possesses the identified characteristics of complex CA, and the concerted effect of all modified registers is hypothesized to generate sequences whose randomness quality is at least as good as that of those complex CA. The transformation sequence of any maximum length CA or Linear Finite State Machine (LFSM) is first obtained, and a suitable mapping is then applied to permute it. The resulting PRNG, driven by the permuted transformation sequence, retains both the analyzability and the desirable sequence properties of the maximum length CA/LFSM. We also developed several theorems to support the design of new PRNG, and the TSMLM concept can be applied to arbitrary PRNG under certain conditions.

The new Programmable Transformations (PT) scheme uses cascaded, programmable linear transformations that are equivalent to a nonlinear transformation.
These programmable transformations can be keyed so that a large class of $2^n - 1$ different nonlinear transformations can be used. Each resulting nonlinear transformation generates n-bit sequences with period $2^n - 1$, whereas the n single-bit sequences are not cyclically equivalent (unlike the single-bit sequences from the registers of a maximum length LFSM). The PT scheme can also replace nonlinear transformations in many cryptosystems, such as S-boxes. Because of its low cost, nonlinear transformations with a large number of inputs become feasible. Furthermore, the keyable nature of the PT scheme improves overall security. We tested the randomness quality of sequences generated by several 16- to 48-bit PT schemes using DIEHARD; their results consistently outperform those of maximum length CA. For 24-bit and longer PT schemes, more than 18 DIEHARD tests are passed consistently. The linear complexity of these sequences is bounded by $LC > n(2^n - q)$, where q [...]

Appendix A Description of Experiment Setup and Randomness Tests

[...] 0.999. Note that in this case these values do not represent a test conducted at the 0.001 significance level of typical hypothesis testing. The DIEHARD source code and documentation are available at [33]; a brief description of each test is reproduced here.

Overlapping sum test (Test 1): Random integers are transformed into floating-point numbers in the range [0,1), giving a new series $u_1, u_2, \ldots$, from which the overlapping sums $s_1 = u_1 + \cdots + u_{100}$, $s_2 = u_2 + \cdots + u_{101}$, … are formed. These sums are normally distributed with a known covariance matrix; they are then converted to uniform variables for a KS test.

Runs test (Test 2): This test analyzes the ascending and descending runs in a sequence of numbers. The random integers are transformed into floating-point numbers in the range [0,1), and runs are counted for sequences of length 10,000. The known runs-up and runs-down covariance matrices lead to a chi-square test.

3D sphere test (Test 3): Choose 4000 random points in a cube of side 1000 units. At each point, center a sphere with radius r large enough to reach the next closest point. The volume (or equivalently $r^3$) of the smallest such sphere is exponentially distributed with mean 30 (this mean is obtained through extensive simulation). The test generates 4000 such spheres 20 times, and each minimum $r^3$ is transformed to a uniform variable by applying $1 - \exp(-r^3/30)$. A KS test is performed on the 20 resulting p-values.

Parking lot test (Test 4): Consider a square parking lot of side 100. Cars are parked one at a time at random positions, and an attempt fails if the car would bump into an already parked car. After 12,000 attempts, the number of successes is theoretically normally distributed with mean 3523 and standard deviation 21.9. The deviation of the observed value from this distribution gives a p-value, and a KS test is done on the p-values.

Birthday spacing test (Test 5): Choose m birthdays in a year of n days and sort them. Compute the spacings between consecutive birthdays and let j be the number of spacing values that occur more than once. j should be asymptotically Poisson distributed with mean $m^3/(4n)$. In this test n = $2^{24}$ or n = $2^{18}$, and the mean is 2. Each sample is composed of 500 values of j and yields one p-value; a KS test is done on the p-values.

Count the ones test (Tests 6 and 10): Consider a sequence of bytes. Each byte contains between zero and eight ones, and the theoretical frequencies (out of 256) of the nine possible counts are 1, 8, 28, 56, 70, 56, 28, 8, 1. Bytes containing zero to two ones are grouped into one category and bytes containing six to eight ones into another, so the theoretical distribution becomes 37, 56, 70, 56, 37. These five categories are denoted A to E, giving $5^5 = 3125$ possible five-letter words. From a string of 256,000 five-letter words, the frequency of each word is counted. The quadratic form in the weak inverse of the covariance matrix of the cell counts provides a chi-square test.
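The letter mapping and the theoretical frequencies used by the count-the-ones test can be reproduced in a few lines. The following Python sketch is illustrative only: it is not DIEHARD's code, and the names `ones_letter`, `letter_weights` and `expected_word_count` are hypothetical. It maps each byte to one of the letters A to E, recovers the weights 37, 56, 70, 56, 37 out of 256, and computes the expected count of a five-letter word among 256,000 words, assuming independent, uniformly distributed bytes.

```python
from itertools import product
from math import comb

LETTERS = "ABCDE"


def ones_letter(byte: int) -> str:
    """Map a byte to its count-the-ones letter.

    A: 0 to 2 ones, B: 3, C: 4, D: 5, E: 6 to 8 ones.
    """
    ones = bin(byte & 0xFF).count("1")
    return LETTERS[min(max(ones - 2, 0), 4)]


def letter_weights() -> dict:
    """Count how many of the 256 byte values map to each letter."""
    weights = {letter: 0 for letter in LETTERS}
    for byte in range(256):
        weights[ones_letter(byte)] += 1
    return weights


def expected_word_count(word: str, n_words: int = 256_000) -> float:
    """Expected occurrences of a five-letter word among n_words words,
    assuming independent, uniformly distributed bytes (an assumption of this sketch)."""
    weights = letter_weights()
    p = 1.0
    for letter in word:
        p *= weights[letter] / 256
    return n_words * p


# Usage
print(letter_weights())  # {'A': 37, 'B': 56, 'C': 70, 'D': 56, 'E': 37}
print(sum(1 for _ in product(LETTERS, repeat=5)))  # 3125 = 5**5
# Cross-check letter A against the binomial coefficients C(8, k), k = 0, 1, 2.
assert letter_weights()["A"] == comb(8, 0) + comb(8, 1) + comb(8, 2)
```

With the tails grouped, even the least likely words (such as AAAAA or EEEEE) have an expected count of about 16 among 256,000 words.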
Binary rank tests for n×m matrices (Tests 7, 8 and 9): These three tests depend on the number of integers tested, n, and the corresponding number of bits, m. Matrices are constructed in various ways from the given sequence, and the rank of each matrix is calculated for a large number of cases. The ranks are placed in categories, and a chi-square test compares the empirical rank frequencies with the theoretical ones.

The next four tests are variants of the missing words test. Consider a string of bits and an alphabet comprising different combinations of zeros and ones. For example, a two-bit alphabet is composed of four letters ((0,0), (0,1), (1,0), (1,1)). A word is a string of n letters, and the entire sequence is treated as a series of overlapping words. Each test counts the words that do not appear in the sequence and compares this count with the theoretical distribution of missing words. If the count is too far from the mean of the theoretical distribution, the series is considered not random. In these tests, the string is always $2^{21}$ words long.
• Bitstream test (Test 11): A 1-bit alphabet (0,1) is used. The words are 20 letters long (with overlapping).
• OPSO test (Test 16): A 10-bit alphabet (1024 letters) is used. The words are two letters long (with overlapping).
• OQSO test (Test 17): A 5-bit alphabet (32 letters) is used. The words are four letters long (with overlapping).
• DNA test (Test 18): A 2-bit alphabet (4 letters) is used. The words are 10 letters long (with overlapping).

Craps test (Test 12): Play 200,000 games of craps and record the number of wins and the number of throws needed to end each game. The number of wins should be approximately normally distributed with mean $200000p$ and variance $200000p(1-p)$, where $p = 244/495$ is the probability of winning a single game (mean ≈ 98,586, standard deviation ≈ 223.6).
The distribution of the number of throws needed to end a game is checked with a chi-square test.

Minimum distance test (Test 13): Choose n = 8000 random points in a square of side 10,000 and let d be the minimum distance among the $(n^2 - n)/2$ pairs of points. If the points are independent, $d^2$ is exponentially distributed with mean 0.995, so $1 - \exp(-d^2/0.995)$ should be uniformly distributed in the range [0,1). A KS test on the values obtained measures the uniformity of the points in the square.

The overlapping 5-permutation test (Test 14): Consider a sample of one million 32-bit integers. Each set of five consecutive integers can be in one of $5! = 120$ possible states (relative orderings). Over many such overlapping sets, the number of occurrences of each state is counted cumulatively, and the counts are compared with the theoretical distribution of the frequencies.

Squeeze test (Test 15): Random integers are transformed into floating-point numbers in the range [0,1). Starting with $k = 2^{31}$, the test repeatedly applies $k \leftarrow \lceil kU \rceil$, where U is the next number in the sequence, and records j, the number of iterations needed to reduce k to one. The value of j is measured 100,000 times, and the observed frequencies are compared with the theoretical ones through a chi-square test.

[...] (2-LFSR)
SPCA self-programmable CA
L-LCA layered CA with L layers of memory
TSMLM transformation sequence with maximum length matrix
PT programmable transformations
ETMM exponential transformation matrix machines
ETMM(CA) ETMM constructed from the transformation matrix of a CA
ETMM(1-LFSR) ETMM constructed from the transformation matrix of a 1-LFSR
ETMM(2-LFSR) ETMM constructed from the transformation matrix of a 2-LFSR
$S^{(t)}$ [...]
[...] transformation sequence using maximum length LFSM
[...] maximum length CA
[...] transformation matrix used at time t to transform the CA state
[...] transformation sequence of arbitrary PRNG
[...] general transformation sequence for linear sequences
[...] set of maximum length matrices
[...] integer or floor function
[...] channel separation between the register pair [...] layer; used to derive [...]
[...] denotes the transformation matrix of a maximum length CA
$a_{ij}$ binary element in the ith row and jth column of transformation matrix A
$A_f$ CA transformation matrix computed as f [...]
$S^{(t+1)} = A S^{(t)}$ basic CA state transformation equation
[...] time-invariant CA transformation such that $S^{(t+1)} = [...]$
[...] time-varying CA transformations such that [...]

[...] distribution etc., are usually based on experimental results. In previous works, there has been considerable focus on the CA structure, primarily because of the strong influence of the initial research areas on CA, such as self-reproducing systems [5,8] and the Game of Life [6]. The following are some examples of how the CA structure is modified so that the CA transformation is changed: boundary conditions, type [...] inputs, number of registers in a CA, dimensionality of the CA, initial state configuration of the CA, the individual function used for each register, etc. Ultimately, the type of sequence generated still depends on the CA transformation. Four characteristics common to CA transformations that pass all DIEHARD tests are identified in Table 2.2. The high randomness quality of the sequences generated is possibly attributed [...]

[...] registers and their functions? Can a CA-level modification be applied to an existing CA transformation that is close to the solution we seek, such that each register function is specifically modified and the concerted effect of all modified registers generates the sequence properties we seek?
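To make the state transformation equation $S^{(t+1)} = A S^{(t)}$ and the structural choices listed above (boundary conditions, number of registers, the function used for each register) concrete, here is a minimal Python sketch. It assumes a one-dimensional null-boundary hybrid CA restricted to the two linear elementary rules 90 and 150 (Wolfram numbering); the names `hybrid_ca_matrix`, `step` and `period` are hypothetical and not taken from the thesis.

```python
from typing import List, Tuple

Matrix = List[List[int]]
State = Tuple[int, ...]


def hybrid_ca_matrix(rules: List[int]) -> Matrix:
    """Transformation matrix A of a null-boundary 90/150 hybrid CA.

    Row i holds the register function of cell i:
      rule 90:  s_i(t+1) = s_{i-1}(t) XOR s_{i+1}(t)
      rule 150: s_i(t+1) = s_{i-1}(t) XOR s_i(t) XOR s_{i+1}(t)
    Neighbors outside the array are fixed at 0 (null boundary).
    """
    n = len(rules)
    a = [[0] * n for _ in range(n)]
    for i, rule in enumerate(rules):
        if rule not in (90, 150):
            raise ValueError("this sketch only handles rules 90 and 150")
        if i > 0:
            a[i][i - 1] = 1
        if i < n - 1:
            a[i][i + 1] = 1
        if rule == 150:
            a[i][i] = 1
    return a


def step(a: Matrix, state: State) -> State:
    """One clock of S(t+1) = A S(t), with arithmetic over GF(2)."""
    return tuple(sum(a_ij & s_j for a_ij, s_j in zip(row, state)) % 2 for row in a)


def period(a: Matrix, seed: State) -> int:
    """Number of clocks until the CA returns to seed."""
    limit = 2 ** len(seed)
    state, t = step(a, seed), 1
    while state != seed:
        if t >= limit:
            raise ValueError("seed does not lie on a cycle")
        state, t = step(a, state), t + 1
    return t


# Usage
print(period(hybrid_ca_matrix([150, 150, 90, 150]), (1, 0, 0, 0)))  # 15
print(period(hybrid_ca_matrix([90, 90, 90, 90]), (1, 0, 0, 0)))      # 6
```

The rule vector [150, 150, 90, 150] visits all $2^4 - 1 = 15$ nonzero states (a maximum length CA), whereas a uniform rule-90 CA with the same four registers and the same seed cycles through only 6 states, illustrating that the sequence properties follow from the CA transformation rather than from the number of registers alone.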
The answer to the second question differs from that to the first, and previous approaches [...]

[...] used may not keep up with new encryption/decryption requirements. Stream ciphers [26,30] are best suited for such communication environments because they can operate at very high speed, have low-cost construction in terms of gates and memory, and have limited error propagation. A central problem in any stream cipher scheme is to design a PRNG that generates long sequences of highly nonlinear and unpredictable [...]

[...] are shown to generate sequences passing at least 18 DIEHARD tests. The Layered-CA also increases linear complexity substantially when simple nonlinear functions are used. The core of this thesis is the TSMLM concept for designing analyzable CA PRNG. The transformation sequence of an LFSM is defined, and the TSMLM concept revolves around techniques to permute this transformation sequence. We developed several [...]

[...] set of cascaded, programmable linear transformations that “induce” nonlinearity into the generated sequence indirectly. The PT scheme can replace nonlinear transformations in many cryptosystems, such as S-boxes (substitution boxes). Under certain conditions, a special low-cost scheme requiring approximately $n^2$ XOR gates can be derived. Due to its low cost, nonlinear transformations with a large number of inputs are [...]

[...] next state transformation of the CA is thus considerably more random. Single-bit sequences from both CA and LFSR can be considered pseudorandom, but often multiple bits are required at each sampling instance. This cannot be done with the LFSR because of the high correlation between adjacent registers.

The 256 transformation functions (including [...]
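The remark that multiple bits cannot be sampled from an LFSR because adjacent registers are highly correlated can be checked directly. The sketch below assumes a 4-bit Fibonacci LFSR with feedback $s_{t+4} = s_{t+3} \oplus s_t$ (characteristic polynomial $x^4 + x^3 + 1$, chosen here purely for illustration); the names `lfsr_step`, `register_sequences` and `cyclic_offset` are hypothetical. It runs one full period and measures the cyclic offset between the bit sequences of neighboring registers.

```python
from typing import List, Sequence, Tuple

State = Tuple[int, ...]


def lfsr_step(state: State, taps: Sequence[int]) -> State:
    """Clock a Fibonacci LFSR once.

    state[0] holds the newest bit. The feedback bit is the XOR of the tapped
    positions and is shifted in at position 0; every other register copies
    its left neighbor.
    """
    feedback = 0
    for i in taps:
        feedback ^= state[i]
    return (feedback,) + state[:-1]


def register_sequences(seed: State, taps: Sequence[int]) -> List[List[int]]:
    """Run one full period from seed; return the bit sequence seen in each register."""
    seqs: List[List[int]] = [[] for _ in seed]
    state = seed
    for _ in range(2 ** len(seed)):
        for i, bit in enumerate(state):
            seqs[i].append(bit)
        state = lfsr_step(state, taps)
        if state == seed:
            return seqs
    raise ValueError("seed does not lie on a cycle of this LFSR")


def cyclic_offset(a: List[int], b: List[int]) -> int:
    """Smallest k with b[t] == a[t - k] for all t (indices mod len(a)), or -1."""
    n = len(a)
    for k in range(n):
        if all(b[t] == a[(t - k) % n] for t in range(n)):
            return k
    return -1


# Usage: taps (0, 3) implement s(t+4) = s(t+3) XOR s(t).
seqs = register_sequences((1, 0, 0, 0), taps=(0, 3))
print(len(seqs[0]))  # 15 = 2**4 - 1, i.e. maximum length
print([cyclic_offset(seqs[i], seqs[i + 1]) for i in range(3)])  # [1, 1, 1]
```

Each register simply repeats its neighbor's sequence one clock later, so bits sampled from several registers at the same instant are trivially predictable from one another.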