Document: Cryptographic Algorithms on Reconfigurable Hardware - P7

Document information

Algorithm 6.5 Modular Reduction Using General Irreducible Polynomials
Require: The degree $m$ of the irreducible polynomial $P$; the operand $C$ to be reduced; and $k$, the number of bits that can be reduced at once.
Ensure: The field element $C = C \bmod P$, with a length of $m$ bits.
2: shift = 2m − 2 − k − 1;
3: for i from 0 to N_k do
4:   A = C_{n−k·i} C_{(n−k·i)−1} ... C_{(n−k·i)−k+1};
5:   S = Highdivtable[A];
6:   Pshifted = LeftShift(Paddedtable[S], shift);
7:   C = C + Pshifted;
8:   shift = shift − k;
9: end for
10: Return C

In line 2, the amount of shift needed to properly apply the method outlined in Figure 6.7 is computed. Then, in each iteration of the loop in lines 3-9, $k$ bits of $C$ are reduced. In line 4, the $k$ bits of $C$ to be reduced are obtained. This information is used in line 5 to compute the appropriate scalar $S$ needed to obtain the result of Eq. (6.23). In line 6, the $S$-th entry of the table Paddedtable is left-shifted by shift positions, so that in line 7 the operation $C + 2^{\text{shift}}(S \cdot P)$ can finally be computed, which reduces $k$ bits at once. Then, in line 8, the variable shift is updated in order to continue the reduction process.

Algorithm 6.5 performs a total of $N_k = \lceil (m-1)/k \rceil$ iterations. At each iteration, the look-up tables Highdivtable and Paddedtable are accessed once each, and in line 7 an XOR addition is executed. The complexity cost of the general reduction method discussed in this section is therefore given as
$$\text{Additions} = 2N_k, \qquad \text{Look-up table size (in bits)} = 2^k(m + 2k). \quad (6.24)$$

6.1.6 Interleaving Multiplication
In this subsection we discuss one of the simplest and most economical binary field multiplier schemes: the serial interleaving multiplication algorithm.

Multiplication by a Primitive Element
Let $P(x) = p_0 + p_1x + p_2x^2 + \cdots + p_{m-1}x^{m-1} + x^m$ be an $m$-degree irreducible polynomial over GF(2). Let also $\alpha$ be a root of $P(x)$, i.e., $P(\alpha) = 0$.
Then, the set $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$ is a basis for GF($2^m$), commonly called the polynomial (canonical) basis of the field [221]. An element $A \in$ GF($2^m$) is expressed in this basis as $A = \sum_{i=0}^{m-1} a_i\alpha^i$. Let $A(\alpha)$ be an arbitrary element of GF($2^m$). Then, the product $C = \alpha \cdot A(\alpha)$ can be expressed as
$$C = \alpha\,(a_0 + a_1\alpha + \cdots + a_{m-1}\alpha^{m-1}) = a_0\alpha + a_1\alpha^2 + \cdots + a_{m-1}\alpha^m. \quad (6.25)$$

[Fig. 6.8. $\alpha \cdot A(\alpha)$ Multiplication]

Using the fact that $\alpha$ is a root of the irreducible polynomial, and that addition and subtraction coincide in GF(2), we can write
$$\alpha^m = p_0 + p_1\alpha + \cdots + p_{m-1}\alpha^{m-1}. \quad (6.26)$$
Substituting Eq. (6.26) into Eq. (6.25), we obtain $C = c_0 + c_1\alpha + \cdots + c_{m-1}\alpha^{m-1}$, where $c_0 = a_{m-1}p_0$ and $c_i = a_{i-1} + a_{m-1}p_i$, for $i = 1, \ldots, m-1$.

A realization of the above operation is shown in Fig. 6.8. The main building block is an $m$-tap LFSR register. That register is initially loaded with the $m$ coordinates of the field element $A$, namely $(a_0, a_1, a_2, \ldots, a_{m-1})$. The signals $p_i$ represent the coefficients of the irreducible polynomial: whenever a coefficient is on, i.e., $p_i = 1$, the corresponding branch of the circuit is a short circuit; otherwise, if $p_i = 0$, the branch acts as an open circuit. After $m$ clock cycles, the new register content will be the value of the field element $C$.

Serial Multiplication
Using the multiplication procedure outlined above, the product of two arbitrary field elements can be computed by a procedure inspired by the well-known Horner's scheme. Let us consider two arbitrary field elements $A$ and $B$ expressed in polynomial basis as
$$A = \sum_{i=0}^{m-1} a_i\alpha^i, \qquad B = \sum_{i=0}^{m-1} b_i\alpha^i.$$
Then, the product $A \cdot B$ can be expressed as
$$C(\alpha) = A(\alpha)B(\alpha) \bmod P(\alpha) = A(\alpha)\left(\sum_{i=0}^{m-1} b_i\alpha^i\right) \bmod P(\alpha) = \left(\sum_{i=0}^{m-1} b_iA(\alpha)\alpha^i\right) \bmod P(\alpha).$$
Therefore,
$$C(\alpha) = \left(b_0A(\alpha) + b_1A(\alpha)\alpha + b_2A(\alpha)\alpha^2 + \cdots + b_{m-1}A(\alpha)\alpha^{m-1}\right) \bmod P(\alpha).$$
Algorithm 6.6 shows the standard procedure for computing the above equation using Horner's rule.

Algorithm 6.6 LSB-First Serial/Parallel Multiplier
Require: An irreducible polynomial $P(\alpha)$ of degree $m$; two elements $A, B \in$ GF($2^m$).
Ensure: $C(\alpha) = A(\alpha)B(\alpha) \bmod P(\alpha)$.
1: C = 0;
2: for i = 0 to m − 1 do
3:   C = b_i·A + C;
4:   A = A·α mod P(α);
5: end for
6: Return(C).

The multiplier realization of Algorithm 6.6 is shown in Fig. 6.9. The architecture consists of two LFSR registers plus extra circuitry. As mentioned previously, the signals $p_i$ in the first LFSR block represent the coefficients of the irreducible polynomial, and their values (either ones or zeroes) determine the LFSR structure. Furthermore, a gate array is included in order to compute the multiplication operation, as explained below. Initially, register C is set to zero, whereas the register in the upper part of Fig. 6.9 is loaded with the $m$ coefficients of the field element $A$. Thereafter, each time the clock signal is applied to the registers, the content of the upper register is multiplied by $\alpha$. Meanwhile, the coefficients of $B$, namely $b_0, b_1, b_2, \ldots, b_{m-1}$, are serially introduced in that order, thus generating the values $b_iA\alpha^i$, for $i = 0, 1, \ldots, m-1$, which are accumulated in register C until all $m$ product coefficients $c_0, c_1, c_2, \ldots, c_{m-1}$ are collected.

[Fig. 6.9. LSB-First Serial/Parallel Multiplier]

6.1.7 Matrix-Vector Multipliers
The GF($2^m$) multiplication given by (6.1) can be described in terms of matrix-vector operations. There are mainly two different approaches based on matrix-vector operations to compute a field product:
1. The polynomial multiplication part is performed by any method. Then, the resulting product is reduced by using a reduction matrix.
2. The polynomial multiplication and modular reduction parts are performed in a single step by using the so-called Mastrovito matrix.

Let $a(x)$ and $b(x)$ denote two polynomials of degree at most $m-1$ representing elements of GF($2^m$), and let $c(x) = a(x)b(x) \bmod P(x)$ denote their field product. The coefficient vectors of these polynomials are
$$\mathbf{a} = [a_0, a_1, \ldots, a_{m-1}]^T, \qquad \mathbf{b} = [b_0, b_1, \ldots, b_{m-1}]^T, \qquad \mathbf{c} = [c_0, c_1, \ldots, c_{m-1}]^T.$$
Also, let us define the polynomials
$$\begin{aligned} d(x) &= a(x)b(x) = d_0 + d_1x + \cdots + d_{2m-2}x^{2m-2},\\ d^{(L)}(x) &= d_0 + d_1x + \cdots + d_{m-1}x^{m-1},\\ d^{(H)}(x) &= d_m + d_{m+1}x + \cdots + d_{2m-2}x^{m-2}. \end{aligned} \quad (6.27)$$
The coefficient vectors representing these polynomials are
$$\mathbf{d} = [d_0, d_1, \ldots, d_{2m-2}]^T, \qquad \mathbf{d}^{(L)} = [d_0, d_1, \ldots, d_{m-1}]^T, \qquad \mathbf{d}^{(H)} = [d_m, d_{m+1}, \ldots, d_{2m-2}]^T.$$
The work in [284] reduces the polynomial product $d(x)$ using an $m \times (m-1)$ reduction matrix $Q$ to obtain the field product $c(x)$ as
$$\mathbf{c} = \mathbf{d}^{(L)} + Q \cdot \mathbf{d}^{(H)}. \quad (6.28)$$

Mastrovito Multiplier
The so-called Mastrovito matrix is constructed from the coefficients of the first multiplicand and of the irreducible polynomial defining the field. The polynomial multiplication and modular reduction steps are then performed together using this matrix. The papers [351, 128, 401] follow the Mastrovito multiplication scheme outlined below:
$$\mathbf{c} = M \cdot \mathbf{b}, \quad (6.29)$$
where $M$ is the $m \times m$ Mastrovito matrix, whose entries are functions of the coefficients of $a(x)$ and $P(x)$. The Mastrovito matrix $M$ is related to the reduction matrix $Q$ by
$$M = L + Q \cdot U, \quad (6.30)$$
where $L$ and $U$ are the following $m \times m$ and $(m-1) \times m$ matrices:
$$L = \begin{bmatrix} a_0 & 0 & 0 & \cdots & 0 & 0\\ a_1 & a_0 & 0 & \cdots & 0 & 0\\ a_2 & a_1 & a_0 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ a_{m-2} & a_{m-3} & a_{m-4} & \cdots & a_0 & 0\\ a_{m-1} & a_{m-2} & a_{m-3} & \cdots & a_1 & a_0 \end{bmatrix}, \qquad U = \begin{bmatrix} 0 & a_{m-1} & a_{m-2} & \cdots & a_2 & a_1\\ 0 & 0 & a_{m-1} & \cdots & a_3 & a_2\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & 0 & a_{m-1} \end{bmatrix}. \quad (6.31)$$
This is because $d(x) = a(x)b(x)$ can be written in vector notation as
$$\mathbf{d} = \begin{bmatrix} \mathbf{d}^{(L)} \\ \mathbf{d}^{(H)} \end{bmatrix} = \begin{bmatrix} L\,\mathbf{b} \\ U\,\mathbf{b} \end{bmatrix}.$$
Then, $\mathbf{c} = \mathbf{d}^{(L)} + Q \cdot \mathbf{d}^{(H)} = L \cdot \mathbf{b} + Q \cdot U \cdot \mathbf{b} = (L + Q \cdot U) \cdot \mathbf{b} = M \cdot \mathbf{b}$.

The Mastrovito and reduction matrices are studied thoroughly in [284, 401] for various types of irreducible polynomials. In [351], a comprehensive study of the Mastrovito multiplier for irreducible trinomials was presented. The authors of [401] proposed a practical and systematic design approach for a general Mastrovito multiplier. In [388], it was shown that non-Mastrovito multipliers using direct modular reduction also provide competitive performance; moreover, efficient non-Mastrovito multipliers for irreducible trinomials were also proposed.

6.1.8 Montgomery Multiplier
In this section we explain the Montgomery multiplication method in GF($2^m$). Once again, let $P(x)$ be an irreducible polynomial over GF(2) that defines the field GF($2^m$). Rather than computing Eq. (6.1), Montgomery multiplication calculates
$$C(x) = A(x)B(x)R^{-1}(x) \bmod P(x), \quad (6.32)$$
where $R(x)$ is a fixed element with $\gcd(R(x), P(x)) = 1$. Because of Bezout's identity¹, one can find two polynomials $R^{-1}(x)$ and $P'(x)$ such that
$$R(x)R^{-1}(x) + P(x)P'(x) = 1, \quad (6.33)$$
where $R^{-1}(x)$ is the inverse of $R(x)$ modulo $P(x)$. These two polynomials can be calculated with the extended Euclidean algorithm.
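The Bezout pair in (6.33) can be obtained with a textbook extended Euclidean algorithm over GF(2)[x]. The Python sketch below is only an illustration, not the authors' implementation. It assumes polynomials are stored as Python integers (bit i holds the coefficient of $x^i$), uses the hypothetical helper names `clmul`, `pdivmod`, `ext_euclid` and `montgomery_mul`, takes $R(x) = x^m$, and uses the pentanomial $x^{163} + x^7 + x^6 + x^3 + 1$ only as an example modulus. The last function exercises definition (6.32) through the usual three-step reduction (compute $T$, then $U$, then divide $T + UP$ by $R$).

```python
# Sketch: polynomials over GF(2) stored as Python ints (bit i = coefficient of x^i).

def clmul(a: int, b: int) -> int:
    """Carry-less product a(x) * b(x) in GF(2)[x], without any reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pdivmod(a: int, b: int) -> tuple[int, int]:
    """Quotient and remainder of a(x) / b(x) in GF(2)[x]; b must be nonzero."""
    q = 0
    deg_b = b.bit_length() - 1
    while a.bit_length() - 1 >= deg_b:
        shift = a.bit_length() - 1 - deg_b
        q ^= 1 << shift
        a ^= b << shift
    return q, a


def ext_euclid(r: int, p: int) -> tuple[int, int]:
    """Return (r_inv, p_prime) such that r*r_inv + p*p_prime = 1, as in (6.33)."""
    old_r, cur_r = r, p
    old_s, cur_s = 1, 0  # running coefficients of r
    old_t, cur_t = 0, 1  # running coefficients of p
    while cur_r:
        q, rem = pdivmod(old_r, cur_r)
        old_r, cur_r = cur_r, rem
        old_s, cur_s = cur_s, old_s ^ clmul(q, cur_s)
        old_t, cur_t = cur_t, old_t ^ clmul(q, cur_t)
    if old_r != 1:
        raise ValueError("R(x) and P(x) must be coprime")
    return old_s, old_t


def montgomery_mul(a: int, b: int, p: int, m: int) -> int:
    """C = A*B*R^-1 mod P with R(x) = x^m: T = AB, U = TP' mod R, C = (T + UP)/R."""
    r = 1 << m
    _, p_prime = ext_euclid(r, p)
    t = clmul(a, b)                  # T(x) = A(x) B(x)
    u = clmul(t, p_prime) & (r - 1)  # U(x) = T(x) P'(x) mod x^m: keep the low m bits
    num = t ^ clmul(u, p)            # T(x) + U(x) P(x) is divisible by x^m
    assert num & (r - 1) == 0
    return num >> m                  # division by x^m is a right shift


# Example modulus chosen for illustration: x^163 + x^7 + x^6 + x^3 + 1
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
```

Checking the result against the definition only needs the identity $C(x)R(x) \equiv A(x)B(x) \pmod{P(x)}$, which follows from (6.32).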
Koç and Acar [182, 388] selected $R(x) = x^m$ for high-performance modular reduction in the Montgomery multiplication algorithm, which can be given as follows:

Algorithm 6.7 Montgomery Modular Multiplication Algorithm
Require: $A(x), B(x), R(x), P'(x)$
Ensure: $C(x) = A(x)B(x)R^{-1}(x) \bmod P(x)$
1: T(x) = A(x)B(x);
2: U(x) = T(x)P'(x) mod R(x);
3: C(x) = [T(x) + U(x)P(x)]/R(x);
4: Return C

To prove the correctness of this algorithm, we note that Step 2 implies that there exists a polynomial $H(x)$ such that
$$U(x) = T(x)P'(x) + H(x)R(x). \quad (6.34)$$
Using (6.34), we write $C(x)$ in Step 3 as
$$C(x) = \frac{1}{R(x)}\left[T(x) + T(x)P'(x)P(x) + H(x)R(x)P(x)\right] = \frac{1}{R(x)}\left[T(x)\left(1 + P'(x)P(x)\right) + H(x)R(x)P(x)\right].$$
From (6.33), and since addition and subtraction coincide over GF(2), we can write $1 + P(x)P'(x) = R(x)R^{-1}(x)$ and substitute it into the last expression:
$$C(x) = \frac{1}{R(x)}\left[T(x)R(x)R^{-1}(x) + H(x)R(x)P(x)\right] = T(x)R^{-1}(x) + H(x)P(x) \equiv A(x)B(x)R^{-1}(x) \bmod P(x).$$

¹ For more details on Bezout's identity, the reader is referred to §6.3.1.

The degree of $C(x)$ can be verified from Step 3 as follows:
$$\begin{aligned} \deg[C(x)] &\le \max\{\deg[T(x)],\ \deg[U(x)] + \deg[P(x)]\} - \deg[R(x)]\\ &\le \max\{2m-2,\ \deg[R(x)] - 1 + m\} - \deg[R(x)]\\ &= \max\{2m-2-\deg[R(x)],\ m-1\}. \end{aligned}$$
Hence, $\deg[C(x)] \le m-1$ whenever $\deg[R(x)] \ge m-1$. In particular, if we choose $R(x) = x^m$, the result $C(x)$ has degree at most $m-1$.

It can be shown [182] that Algorithm 6.7 has an associated computational cost of $2m^2$ coefficient multiplications (ANDs) and $2m^2 - 3m - 1$ coefficient additions (XORs), whereas the total time complexity is $3T_A + (2\lceil\log_2 m\rceil + \lceil\log_2(m-1)\rceil)T_X$.

6.1.9 A Comparison of Field Multiplier Designs
In this subsection we compare some of the most representative designs of GF($2^m$) multipliers considering three metrics: speed, compactness and efficiency. Table 6.3 shows the fastest designs reported to date for GF($2^m$) field multiplication. It can be observed that Karatsuba-Ofman multipliers (KOM) are much faster than other schemes such as the recursive classical multiplier or the Massey-Omura scheme. From a theoretical point of view, this can be explained by the fact that KOM algorithms enjoy sub-quadratic complexity.

Table 6.3. Fastest Reconfigurable Hardware GF(2^m) Multipliers

Work                                        Platform  Field      Cost        Cycles  Timing    bits/(Slices x timing)
KOM variant by [47], implemented by [326]   Virtex 2  GF(2^163)  5307 CLBs   1       12.56 ns  2.445M
KOM variant by [85], implemented by [326]   Virtex 2  GF(2^163)  5409 CLBs   1       13.37 ns  2.254M
KOM variant by [293], implemented by [326]  Virtex 2  GF(2^163)  5840 CLBs   1       14.73 ns  1.895M
KOM [106]                                   Virtex 2  240 bits   1480 CLBs   30      378 ns    0.429M
Recursive Classical [106]                   Virtex 2  240 bits   1582 CLBs   56      523 ns    0.290M
KOM [117]                                   Virtex 2  240 bits   1660 CLBs   54      655 ns    0.221M
Massey-Omura [118]                          Virtex 2  240 bits   36857 LUTs  50      800 ns    0.0336M (est.)

In Table 6.4 we show a selection of some of the most compact reconfigurable hardware multiplier designs. This category is dominated by interleaved and Montgomery multiplier schemes.

Table 6.4. Most Compact Reconfigurable Hardware GF(2^m) Multipliers

Work                Platform  Field      Cost             Cycles  Timing    bits/(Slices x timing)
Interleaved [104]   Virtex    GF(2^239)  359 CLBs         239     3.1 μs    0.215M
Montgomery [97]     Virtex    GF(2^233)  425 CLBs (est.)  466     2.81 μs   0.195M
Class.+Montg. [18]  Virtex    GF(2^160)  1049 CLBs        80      1.11 μs   0.137M
Montgomery [118]    Virtex    GF(2^160)  1427 CLBs        160     1.66 μs   0.0675M
Interleaved [266]   Virtex    n/a        420 CLBs (est.)  210     12.3 μs   0.042M

We measure efficiency as the number of bits processed divided by the product of the number of slices and the time delay achieved by the design, namely,
$$\text{Efficiency} = \frac{\text{bits}}{\text{Slices} \times \text{timing}}.$$
For instance, consider the KOM variant proposed by [47] and implemented by [326]. As shown in Table 6.3, working over GF($2^{163}$), that design achieved a time delay of just 12.56 ns at a cost of 5307 slices. Therefore its efficiency is
$$\frac{\text{bits}}{\text{Slices} \times \text{timing}} = \frac{163}{5307 \times 12.56\ \text{ns}} \approx 2.445\text{M}.$$
Comparing the designs featured in Tables 6.3 and 6.4, the most efficient multipliers are the Karatsuba-Ofman variants reported in [47, 85, 293]. This is a quite remarkable feature, which implies that the Karatsuba-Ofman multipliers are both the fastest and the most efficient of all the multiplier designs studied in this chapter.

6.2 Field Squaring and Field Square Root for Irreducible Trinomials
Let us consider binary extension fields constructed using irreducible trinomials of the form $P(x) = x^m + x^n + 1$, with $m > 2$. It is convenient to consider, without loss of generality, the additional restriction $1 \le n \le \lfloor m/2 \rfloor$.²

² It is known that if $P(x) = x^m + x^n + 1$ is irreducible over GF(2), so is $P(x) = x^m + x^{m-n} + 1$ [228]. Hence, provided that at least one irreducible trinomial of degree $m$ exists, it is always possible to find an irreducible trinomial of that degree whose middle coefficient $n$ satisfies the restriction $1 \le n \le \lfloor m/2 \rfloor$.
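The reciprocity property in the footnote can be checked exhaustively for small degrees. The sketch below is illustrative only: `pmod`, `is_irreducible`, `irreducible_trinomials` and `smallest_middle_term` are hypothetical helper names, polynomials are Python integers (bit i is the coefficient of $x^i$), and irreducibility is tested by naive trial division, which is practical only for small $m$ (cryptographic degrees call for a proper test such as Rabin's or Ben-Or's).

```python
def pmod(a: int, b: int) -> int:
    """Remainder of a(x) modulo b(x) in GF(2)[x]."""
    deg_b = b.bit_length() - 1
    while a.bit_length() - 1 >= deg_b:
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a


def is_irreducible(p: int) -> bool:
    """Naive test: try every divisor of degree 1 .. deg(p)//2. Small degrees only."""
    m = p.bit_length() - 1
    if m < 1:
        return False
    for d in range(2, 1 << (m // 2 + 1)):  # all polynomials of degree 1 .. m//2
        if pmod(p, d) == 0:
            return False
    return True


def irreducible_trinomials(m: int) -> list[int]:
    """Middle exponents n (0 < n < m) for which x^m + x^n + 1 is irreducible."""
    return [n for n in range(1, m) if is_irreducible((1 << m) | (1 << n) | 1)]


def smallest_middle_term(m: int):
    """Smallest admissible n, or None; by reciprocity it satisfies n <= m // 2."""
    ns = irreducible_trinomials(m)
    return ns[0] if ns else None
```

For example, `irreducible_trinomials(7)` lists the exponents 1, 3, 4 and 6, which come in the reciprocal pairs $(n, m-n)$, while degree 8 admits no irreducible trinomial at all.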
The rest of this section is organized as follows. First, in Subsection 6.2.1, we give the formulae needed for computing the field squaring operation when considering arbitrary irreducible trinomials. Those equations are then used in Subsection 6.2.2 to find the corresponding ones for the field square root operator.

6.2.1 Field Squaring Computation
Let $A = \sum_{i=0}^{m-1} a_ix^i$ be an arbitrary element of GF($2^m$). Then, according to Eq. (6.16), its square $A^2$ can be represented by the $2m$-coefficient vector
$$A^2(x) = [0, a_{m-1}, 0, a_{m-2}, \ldots, 0, a_1, 0, a_0] = [a'_{2m-1}, a'_{2m-2}, \ldots, a'_{m+1}, a'_m, a'_{m-1}, \ldots, a'_2, a'_1, a'_0], \quad (6.35)$$
where $a'_i = 0$ for $i$ odd. Hence, the upper half of $A^2$ (i.e., its $m$ most significant bits) in Eq. (6.35) is mapped into the first $m$ coordinates by performing addition and shift operations only.

In order to investigate the exact cost of the field squaring operation, we categorize all the irreducible trinomials over GF(2) into four different types. For all four types, by means of Eqs. (6.35) and (6.21), the following explicit formulae for the field squaring operation were found, for $i = 0, 1, \ldots, m-1$.

Type I: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, $m$ even, $n$ odd and $n < m/2$:
$$c_i = \begin{cases} a_{i/2} + a_{(m+i)/2}, & i \text{ even},\ i < n \text{ or } i \ge 2n,\\ a_{i/2} + a_{(m+i)/2} + a_{m-n+i/2}, & i \text{ even},\ n < i < 2n,\\ a_{m+(i-n)/2}, & i \text{ odd},\ i < n,\\ a_{(m-n+i)/2}, & i \text{ odd},\ i \ge n. \end{cases} \quad (6.36)$$
It can be verified that Eq. (6.36) has an associated cost of $(m+n-1)/2$ XOR gates and $2T_X$ delays.

Type II: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, $m$ even, $n$ odd and $n = m/2$:
$$c_i = \begin{cases} a_{i/2} + a_{(m+i)/2}, & i \text{ even},\ i < n,\\ a_{i/2}, & i \text{ even},\ i > n,\\ a_{m+(i-n)/2}, & i \text{ odd},\ i < n,\\ a_{(n+i)/2}, & i \text{ odd},\ i \ge n. \end{cases} \quad (6.37)$$
It can be verified that Eq. (6.37) has an associated cost of $(m+2)/4$ XOR gates and one $T_X$ delay.
Type III: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, $m$ and $n$ odd and $n \le (m-1)/2$:
$$c_i = \begin{cases} a_{i/2}, & i \text{ even},\ i < n,\\ a_{i/2} + a_{(m-n+i)/2} + a_{m-n+i/2}, & i \text{ even},\ n < i < 2n,\\ a_{i/2} + a_{(m-n+i)/2}, & i \text{ even},\ i \ge 2n,\\ a_{(m+i)/2} + a_{m+(i-n)/2}, & i \text{ odd},\ i < n,\\ a_{(m+i)/2}, & i \text{ odd},\ i \ge n. \end{cases} \quad (6.38)$$
It can be verified that Eq. (6.38) has an associated cost of $(m+n-2)/2$ XOR gates and $2T_X$ delays.

Type IV: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, $m$ odd, $n$ even and $n \le (m-1)/2$:
$$c_i = \begin{cases} a_{i/2} + a_{m+(i-n)/2}, & i \text{ even},\ i < n,\\ a_{i/2} + a_{m-n+i/2}, & i \text{ even},\ n \le i < 2n,\\ a_{i/2}, & i \text{ even},\ i \ge 2n,\\ a_{(m+i)/2}, & i \text{ odd},\ i < n,\\ a_{(m+i)/2} + a_{(m-n+i)/2}, & i \text{ odd},\ i > n. \end{cases} \quad (6.39)$$
It can be verified that Eq. (6.39) has an associated cost of $(m+n-1)/2$ XOR gates and one $T_X$ delay.

The complexity costs found in Eqs. (6.36) through (6.39) are consistent with the ones analytically derived in [386, 387].

6.2.2 Field Square Root Computation
In the following, we keep the assumption that the middle coefficient $n$ of the generating trinomial $P(x) = x^m + x^n + 1$ satisfies the restriction $1 \le n \le \lfloor m/2 \rfloor$. Clearly, Eqs. (6.36) through (6.39) are a consequence of the fact that in binary extension fields squaring is a linear operation. The linear nature of binary extension field squaring allows us to describe this operator in terms of an $m \times m$ matrix as
$$C = A^2 \;\Longrightarrow\; C = MA. \quad (6.40)$$
Furthermore, based on Eq. (6.40), computing the square root of an arbitrary field element $A$ means finding a field element $D = \sqrt{A}$ such that $D^2 = MD = A$. Hence,
$$D = M^{-1}A. \quad (6.41)$$
Eq. (6.41) is especially attractive for fields GF($2^m$) of sufficiently large order, i.e., $m \gg 2$, where the matrices $M$ corresponding to Eqs. (6.36) through (6.39) are all highly sparse (each row has at most three nonzero values).

[...]
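Because squaring is linear, both $M$ in (6.40) and $M^{-1}$ in (6.41) can be built column by column from the squares of the basis elements $x^j$. The Python sketch below assumes polynomial-basis elements stored as Python integers; `reduce_trinomial`, `gf_sqr`, `square_matrix_rows`, `invert_gf2`, `mat_vec` and `gf_sqrt` are hypothetical names, the reduction is a generic bit-by-bit loop rather than the hardwired Eqs. (6.36) through (6.39), and the trinomial $x^{233} + x^{74} + 1$ (a Type IV case) is only an example choice.

```python
def reduce_trinomial(c: int, m: int, n: int) -> int:
    """Reduce c(x) modulo P(x) = x^m + x^n + 1, clearing bits from the top down."""
    p = (1 << m) | (1 << n) | 1
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= p << (i - m)  # clears bit i and only touches lower bits
    return c


def gf_sqr(a: int, m: int, n: int) -> int:
    """A^2 mod P: interleave zeros between the bits of A (Eq. 6.35), then reduce."""
    spread = int("0".join(bin(a)[2:]), 2)
    return reduce_trinomial(spread, m, n)


def square_matrix_rows(m: int, n: int) -> list[int]:
    """Rows of M in (6.40): bit j of rows[i] is M[i][j]; column j is (x^j)^2."""
    cols = [gf_sqr(1 << j, m, n) for j in range(m)]
    return [sum(((cols[j] >> i) & 1) << j for j in range(m)) for i in range(m)]


def invert_gf2(rows: list[int], m: int) -> list[int]:
    """Gauss-Jordan inversion of an m x m matrix over GF(2), rows as bitmasks."""
    aug = [[rows[i], 1 << i] for i in range(m)]  # [left half, right half]
    for col in range(m):
        piv = next(r for r in range(col, m) if (aug[r][0] >> col) & 1)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col and (aug[r][0] >> col) & 1:
                aug[r][0] ^= aug[col][0]
                aug[r][1] ^= aug[col][1]
    return [right for _, right in aug]


def mat_vec(rows: list[int], a: int) -> int:
    """y = rows * a over GF(2): y_i is the parity of (rows[i] AND a)."""
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & a).count("1") & 1) << i
    return y


M_EXAMPLE, N_EXAMPLE = 233, 74                     # example trinomial x^233 + x^74 + 1
M_ROWS = square_matrix_rows(M_EXAMPLE, N_EXAMPLE)  # matrix M of (6.40)
M_INV = invert_gf2(M_ROWS, M_EXAMPLE)              # matrix M^-1 of (6.41)


def gf_sqrt(a: int) -> int:
    """D = M^-1 A, so that D^2 = A in GF(2^233)."""
    return mat_vec(M_INV, a)
```

Since $M^{-1}$ depends only on the trinomial, it is computed once; afterwards each square root is a single GF(2) matrix-vector product.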
6.3 Multiplicative Inverse
Among the customary finite field arithmetic operations, namely addition, subtraction, multiplication and inversion of nonzero elements, the computation of the latter is the most time-consuming one. Multiplicative inversion of a nonzero element $a \in$ GF($2^m$) is defined as the process of finding the unique element $a^{-1} \in$ GF($2^m$) such that $a \cdot a^{-1} = 1$. Several algorithms for computing ... multiplication computations are required. On the other hand, notice that at each iteration $i$, a total of $u_{i_2}$ field squarings are performed. Notice also that, by definition, the addition chain guarantees that for each $u_i$, $1 \le i \le t$, the relation $u_{i_1} = u_i - u_{i_2}$ holds. Hence, one can show by induction that the total number of field squaring operations performed right after the execution of the $i$-th iteration ...

... by the addition chain $U$ as discussed above. We assess the computational complexity of the algorithm shown in Fig. 6.10 as follows. The algorithm performs one field multiplication in each of its $t$ iterations, yielding a total of $t$ field multiplications. Furthermore, at each iteration $i$, a total of $u_{i_2}$ field square roots are performed. Since, by definition, the addition chain guarantees ...

6.4 Other Arithmetic Operations
In this section we briefly describe some important binary finite field arithmetic operations, such as the computation of the trace function, the half-trace function and binary exponentiation. The first two operations are key building blocks for halving an elliptic curve point, which will be studied in §10.7.

6.4.1 Trace Function
Given $C \in$ GF($2^m$), the trace function can be defined as: $Tr(C)$ ...
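Using the standard definition $Tr(C) = \sum_{i=0}^{m-1} C^{2^i}$, the statement about the pentanomial $p(x) = x^{163} + x^7 + x^6 + x^3 + 1$ made in this subsection can be checked in software. The Python sketch below is an illustration under assumptions: field elements are Python integers in polynomial basis, the names `gf_reduce`, `gf_sqr`, `trace` and `trace_163` are hypothetical, and `trace_163` is the one-XOR shortcut valid for this particular field only.

```python
M163 = 163
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # x^163 + x^7 + x^6 + x^3 + 1


def gf_reduce(c: int, p: int = P163, m: int = M163) -> int:
    """Reduce c(x) modulo p(x), clearing bits from the top down."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= p << (i - m)
    return c


def gf_sqr(a: int, p: int = P163, m: int = M163) -> int:
    """Square in GF(2^m): insert a zero between consecutive bits, then reduce."""
    return gf_reduce(int("0".join(bin(a)[2:]), 2), p, m)


def trace(c: int, p: int = P163, m: int = M163) -> int:
    """Tr(C) = C + C^2 + C^4 + ... + C^(2^(m-1)); the sum always lies in GF(2)."""
    acc, t = 0, c
    for _ in range(m):
        acc ^= t
        t = gf_sqr(t, p, m)
    assert acc in (0, 1)
    return acc


def trace_163(c: int) -> int:
    """One-XOR shortcut for this pentanomial: Tr(C) = c_0 + c_157."""
    return (c ^ (c >> 157)) & 1
```

Because the trace is linear, evaluating it on the basis elements $x^i$ is enough to derive the shortcut: only $x^0$ and $x^{157}$ have trace one.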
... reduction polynomial $p(x) = x^{163} + x^7 + x^6 + x^3 + 1$. Then, $Tr(x^i) = 1$ if and only if $i \in \{0, 157\}$. The implementation of the trace function in reconfigurable hardware therefore needs only one XOR gate, which adds bits 0 and 157 of the input polynomial.

6.4.2 Solving a Quadratic Equation over GF($2^m$)
In order to solve the quadratic Equation (10.26), we may use the half-trace function. Let ...

... Itoh-Tsujii algorithm with the concept of addition chains. Then, we showed that for this version of the Itoh-Tsujii algorithm the multiplicative inverse of an arbitrary nonzero field element in GF($2^m$) can be computed by performing exactly $m-1$ field squarings and $t$ multiplications, where $t$ is the length of the optimal addition chain for $m-1$. One of the main conclusions of this section is that according to ...

... stipulated by the addition chain $U$ as discussed above. We assess the computational complexity of the algorithm shown in Fig. 6.9 as follows. The algorithm performs $t$ iterations (where $t$ is the length of the addition chain $U$) and one field multiplication per iteration. Thus, we conclude that a total ...

6.4.3 Exponentiation over Binary Finite Fields
Exponentiation over binary finite fields is used for inverse computation via Fermat's little theorem [295] and for key agreement schemes such as the Diffie-Hellman protocol, among other applications. For binary extension fields GF($2^m$), generated using the $m$-degree irreducible polynomial ...
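As a software illustration of binary exponentiation and of Fermat-based inversion $a^{-1} = a^{2^m - 2}$, the following Python sketch implements left-to-right square-and-multiply. Field products use an LSB-first shift-and-add loop in the style of Algorithm 6.6. The function names and the choice of the pentanomial $x^{163} + x^7 + x^6 + x^3 + 1$ are assumptions made for the example, and this is not the book's Algorithm 6.13.

```python
M163 = 163
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # x^163 + x^7 + x^6 + x^3 + 1


def mul_alpha(a: int, p: int, m: int) -> int:
    """A * alpha mod P: shift left once and fold x^m back with P (one LFSR step)."""
    a <<= 1
    if (a >> m) & 1:
        a ^= p
    return a


def gf_mul(a: int, b: int, p: int = P163, m: int = M163) -> int:
    """LSB-first serial/parallel product: C += b_i * A, then A = A * alpha mod P."""
    c = 0
    for i in range(m):
        if (b >> i) & 1:
            c ^= a
        a = mul_alpha(a, p, m)
    return c


def gf_pow(a: int, e: int, p: int = P163, m: int = M163) -> int:
    """Left-to-right binary exponentiation: square for every bit, multiply on 1s."""
    result = 1
    for bit in bin(e)[2:]:
        result = gf_mul(result, result, p, m)
        if bit == "1":
            result = gf_mul(result, a, p, m)
    return result


def gf_inv_fermat(a: int, p: int = P163, m: int = M163) -> int:
    """a^-1 = a^(2^m - 2) for nonzero a, by Fermat's little theorem."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return gf_pow(a, (1 << m) - 2, p, m)
```

Inverting this way costs about $m-1$ squarings plus one multiplication per one-bit of $2^m - 2$. The Itoh-Tsujii approach discussed above cuts the multiplications down to the length of an addition chain for $m-1$.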
... platforms. We included a detailed analysis of the complexities of binary field operations such as multiplication, squaring, square root and multiplicative inverse computation, among others.

Algorithm 6.13 Squaring and Square Root Parallel Exponentiation
Require: The irreducible polynomial $P(x)$, $a \in$ GF($2^m$), $e = (e_{m-1} \cdots$ ...

... multiplication and squaring operations;
2. ITMIA is a competitive design option only when using normal basis representation; and
3. The recursive nature of the ITMIA algorithm makes its parallelization rather difficult, if not impossible, forcing the ITMIA procedure to be implemented in a sequential manner.
In the rest of this section we describe efficient implementations of the binary ...
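The Itoh-Tsujii count quoted earlier (exactly $m-1$ squarings and $t$ multiplications) can be reproduced in software. The Python sketch below is an illustration under stated assumptions, not the book's design. It uses $\beta_k(a) = a^{2^k - 1}$ with the update $\beta_{u_i} = \beta_{u_{i_1}}^{2^{u_{i_2}}} \cdot \beta_{u_{i_2}}$ and the final step $a^{-1} = \beta_{m-1}(a)^2$. It also relies on the example addition chain 1, 2, 4, 8, 16, 32, 64, 128, 160, 162 for $m - 1 = 162$ (so $t = 9$) and on hypothetical helper names such as `itoh_tsujii_inverse` and `CHAIN_162`.

```python
M163 = 163
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # x^163 + x^7 + x^6 + x^3 + 1


def gf_reduce(c: int, p: int, m: int) -> int:
    """Reduce c(x) modulo p(x), clearing bits from the top down."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= p << (i - m)
    return c


def gf_mul(a: int, b: int, p: int, m: int) -> int:
    """Schoolbook carry-less product followed by reduction modulo p(x)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return gf_reduce(r, p, m)


def gf_sqr(a: int, p: int, m: int) -> int:
    """Square by interleaving zeros between bits, then reduce."""
    return gf_reduce(int("0".join(bin(a)[2:]), 2), p, m)


# Steps (j, k) of an addition chain for m - 1 = 162, each meaning u_new = u[j] + u[k]:
# u = 1, 2, 4, 8, 16, 32, 64, 128, 160 (= 128 + 32), 162 (= 160 + 2), so t = 9.
CHAIN_162 = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 5), (8, 1)]


def itoh_tsujii_inverse(a: int, p: int, m: int,
                        steps: list[tuple[int, int]]) -> tuple[int, int, int]:
    """Return (a^-1, number of squarings, number of multiplications)."""
    u, beta = [1], [a]  # beta[i] = a^(2^u[i] - 1), starting from beta_1 = a
    squarings = multiplications = 0
    for j, k in steps:
        t = beta[j]
        for _ in range(u[k]):  # raise beta[j] to the power 2^u[k]
            t = gf_sqr(t, p, m)
            squarings += 1
        t = gf_mul(t, beta[k], p, m)
        multiplications += 1
        u.append(u[j] + u[k])
        beta.append(t)
    if u[-1] != m - 1:
        raise ValueError("the addition chain must end at m - 1")
    inverse = gf_sqr(beta[-1], p, m)  # (a^(2^(m-1) - 1))^2 = a^(2^m - 2) = a^-1
    return inverse, squarings + 1, multiplications
```

With this chain, every step extends the previous chain element, so the squarings add up to $161$ plus the final one, i.e. $m - 1 = 162$, alongside $t = 9$ multiplications.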

Posted: 22/01/2014, 00:20

Contents

Front-Matter

1 Introduction

2 A Brief Introduction to Modern Cryptography

3 Reconfigurable Hardware Technology

4 Mathematical Background

5 Prime Finite Field Arithmetic

6 Binary Finite Field Arithmetic

7 Reconfigurable Hardware Implementation of Hash Functions

8 General Guidelines for Implementing Block Ciphers in FPGAs

9 Architectural Designs For the Advanced Encryption Standard

10 Elliptic Curve Cryptography

Back-Matter
