FPGARelated.com
Blogs

One Clock Cycle Polynomial Math

Mike November 20, 201514 comments

Error correction codes and cryptographic computations are most easily performed working with $GF(2^n)$  polynomials.  By using very special values of $n$ we can build circuits which multiply and square in one clock cycle on an FPGA. These circuits come about by flipping back and forth between a standard polynomial basis and a normal basis representation of elements in $GF(2^n)$.

A normal basis is yet another form of polynomial but instead of adding powers of $\beta$ we add squares of powers: $$a_{n-1}\beta^{2^{n-1}}+a_{n-2}\beta^{2^{n-2}}+...+ a_2\beta^{2^2}+a_1\beta^2+a_0\beta$$ Note that the last term is $\beta^{2^0}$.  The first piece of magic to take notice of is that $\beta^{2^n}=\beta$. This comes from Fermat's "little" theorem.  The advantage of this representation is that the operation of squaring becomes a simple left rotation - a single clock cycle on an FPGA.  

To see why this works, we first notice that when squaring a polynomial in $GF(2^n)$ all the cross terms cancel.  The only terms that don't cancel are $a_j\beta^{2^j} \ast a_j\beta^{2^j}=a_j\beta^{2^{j+1}}$. The coefficient $a_j$ is now in front of the term $\beta^{2^{j+1}}$ which is the next term to the left.  The term $a_{n-1}$ goes down to $a_0$ due to Fermat's "little" theorem.  We have just computed the square of a polynomial using a left rotation by one bit.

A normal basis is "optimal" when it takes the fewest number of terms to compute a multiply.  For specific values of $n$, a lot of magic happens which makes the circuits to implement the math exceptionally simple.  A "type 1 normal basis" has the following rules: $n+1$ is prime and 2 is a generator for $Z_{n+1}$. The values of $n<500$ for which this is true are: 2, 4, 10, 12,18, 28, 36, 52, 58, 60, 66, 82, 100, 106, 130, 138, 148, 162, 172, 178, 180, 196, 210, 226, 268, 292, 316, 346, 348, 372, 378, 388, 418, 420, 429, 442, 460, 466 and 491.

There is one other optimal normal basis form but the type 1 form has a lot of advantages in an FPGA as well as microcontrollers.   The fundamental advantage of type 1 is the ability to flip between a normal basis and a polynomial basis representation for the same value. The second rule is what allows this.  Since $2$ is a generator, $2^j$ will be one of the values between $1$ and $n$. For an example, let's look at $n=10$:

j
0 1 2 3 4 5 6 7 8 9 10
$2^j \mod 11$
1 2 4 8 5 10 9 7 3 6 1

A polynomial basis (or standard basis) requires we work modulo an irreducible polynomial.  By choosing the irreducible polynomial of $\sum_{i=0}^{i=n}\beta^i$ the transform between normal basis and polynomial basis is a simple permutation.  Why this choice works is way beyond what I want to discuss, for now we'll just say the mathematicians figured it out a long time ago and we will run with it.

To understand what the above table means let's look at several terms in both bases. $$\begin{array}{c}\beta^{2^0}\rightarrow\beta\\ \beta^{2^1}\rightarrow\beta^2\\ \beta^{2^8}\rightarrow\beta^3\end{array}$$ But something strange happens at $j=5$: 

$$\beta^{2^5}\rightarrow\beta^{10}=\beta^9+\beta^8+\beta^7+\beta^6+\beta^5+\beta^4+\beta^3+\beta^2+\beta+1$$ modulo the irreducible polynomial. To map $1 = \beta^0$ into $\beta^{2^j}$ we need to sum over all the $\beta^{2^j}$ terms because only $\beta^{2^5}$ carries the $1$ with it.  So we have $$1\rightarrow\sum_{j=0}^{j=n-1}\beta^{2^j}$$

Our polynomial in standard basis can be transformed to normal basis for squaring, then we can use the permutation table to transform back.  To multiply, we follow the form described by J. K. Wolf in Discrete Mathematics 106/107 (1992) pg 497: $$\sum_{i=0}^{i=n-1} a_i\beta^i \ast \sum_{j=0}^{j=n-1} b_j\beta^j = \sum_{k=0}^{k=n-1} c_k\beta^k$$   $$ c'_k = \sum_{j=0}^{j=n-1} b_j a_{k-j\mod n+1} \text{  for  } k=0\dotso n$$ The fundamental trick here is the coefficient $a_n = 0$ and the term $c_n$ is used to modify the result with $$c_k = c_n + c'_k$$ Setting $a_n=0$ effectively "knocks out" one term in each sum.  The extra term $c_n$ actually comes from the use of an extension into $\beta^{n+1} + 1$ as the basis polynomial which is $(\beta+1)$ times the "all ones" irreducible polynomial.  

So what does all that mean?  The multiplication is just an AND gate, the sums are XOR gates.  The last step is a complement of the resulting $c'_k$ if $c_n$ is set, otherwise nothing happens and $c_k = c'_k$.  We can clock in the $a_i$ coefficients and $b_j$ coefficients and the result falls out $c_k$.  Obviously the clock timing depends on the cascade of gates.  

Another approach is to use n clocks to perform the multiply.  This drastically reduces the area consumed by the logic. On each clock $j$, if $b_j$ is set, XOR the $a$ register with the $c$ register, then rotate $a$ left. On clock n, use $c_n$ to complement the output if $c_n=1$.  Note that $a$ and $c$ are $n+1$ bits wide and the bit $a_n$ is set to zero.  This $0$ rotates around and "takes out" the correct bit each time.  On a microcontroller, this is the most sensible way to do the computation.

The point of all this is speed.  By using the right size field, polynomial multiplies can be computed in a single clock cycle on an FPGA.  The same field which allows this also can be represented in a normal basis which makes squaring a polynomial possible in one clock cycle as well.  Switching between the two representations is a simple permutation (plus complement if a certain bit is set).  Everything else we need to do can be done with these basic functions.  If you need to go fast, then one of the above listed field sizes is the best place to start.


[ - ]
Comment by drmikeNovember 20, 2015
An editor on the comments would be really useful!!!!
[ - ]
Comment by drmikeNovember 20, 2015
It's called an "extension field". So a field over $GF(2^n)$ as an extension field has all coefficients in $GF(2)$ with maximum polynomial order $n?1$. An extension field with coefficients of polynomials is a polynomial of polynomials which is what microsoftware is describing. If $a$ and $b$ are in an extension field of $GF(2^n)$ then each one has only $n$ terms. $a+b$ also has only 10 terms, once reduced the prime polynomial. Everything I'm talking about is simple, not complicated. Each of the $a_j$ in the above expressions are in $GF(2)$. So the whole field is over $GF(2^n)$. I'm not sure how you define polynomials of polynomials.

The point is you are going way, way too deep. This is trivially simple. As you point out $(a+b)^2$ has $2ab$ as a cross term. But $2\mod 2 = 0, so the cross term is 0. The other way to describe it is that you get a cross term from each group and when you add them together you get 0. That?s why XOR works as the sum of terms. I?ve always seen it written as a "field over $GF(2^n)$" , and they always have coefficients in $GF(2)$. I suppose to be correct they should say "an extension field over $GF(2^n)$". Otherwise, XOR doesn't work as the summation operator.
[ - ]
Comment by jms_nhNovember 20, 2015
You mention $GF(2^n)$, but do you mean polynomials over $GF(2)$? (e.g. coefficients are 0 or 1) I get why cross terms in squaring disappear in polynomials over $GF(2)$, because you have $(a+b)^2 = $a^2 + 2ab + b^2 = a^2 + b^2$, but I don't get how that works in $GF(2^n)$.
[ - ]
Comment by jms_nhNovember 20, 2015
let's try that again: You mention $GF(2^n)$, but do you mean polynomials over $GF(2)$? (e.g. coefficients are 0 or 1) I get why cross terms in squaring disappear in polynomials over $GF(2)$, because you have $(a+b)^2=a^2 + 2ab + b^2 = a^2 + b^2$, but I don?t get how that works in $GF(2^n)$.
[ - ]
Comment by microsoftwareNovember 20, 2015
An example: At GF(2^3) consider the polynomials a=7(x^3)+ 5(x^2)+1 and b=3(x^2)+2x or the (7 5 0 1) and (3 2 0) the only permissible coefficients are the integers 0... throught 7 .
we have :

a^2= 0x6+4x4+1+0x5+0x3+3x2=4x4+3x2+1 , 4x4 means 4*x^4 etc.

b^2=2x4 + 5x3 + 4x2

a*b= 0(x^5) + 0(x^4) + 1(x^4) +3(x^3) + 3(x^2)+2x
2*a*b= 2(x^4) +6(x^3) + 6(x^2)+4x

a^2+2*a*b+b^2= x4+4x3+6x2+4x+1
=======================================
a+b=7x3+x2+2x+1

(a+b)^2=(a+b)*(a+b)= (7x3+x2+2x+1 ) * ( 7x3+x2+2x+1)=0x6+ 7x5 + 0x4 + 7x3 + 7x5 + x4 + 2x3 + x2 + 0x4 + + 2x3 + 4x2 + 2x + 7x3 +x2 + 2x + 1=
=x4 + 4x3 + 6x2 + 4x + 1

so in GF(2^3) (a+b)^2 is not equal to a^2 + b^2 . It will be happened (a+b)^2=a^2+b^2 at the case where the multiplication (to ensure that the field is closed undrer multiplication , no coefficient longer than 3 bits ) was defined and moduloP(x) , where P(x) is a prime polynomial in such a way that 2*a*b moduloP(x)=0.

[ - ]
Comment by drmikeNovember 20, 2015
so the cross termis 0. The other way to describe it is that you get across term from each group and when you add them together you get 0. That's why XOR works as the sum of terms. I've always seen it written as a "field over $GF(2^n)$" , and they always have coefficients in $GF(2)$. I suppose to be correct they should say "an extension field over $GF(2^n)$". Otherwise, XOR doesn't work as the summation operator.
[ - ]
Comment by microsoftwareNovember 21, 2015
Hello,
Given the polynomial x2+1 in G(8) we will calculate (x2+1)^2 using multiplication as in G(2) and addition as XOR . Multiplication is based on P(x) = x3+x2+1 to ensure coefficients of 3 bits long .We have:

2X2+1 = (2 0 1)

x 2 0 1
2 0 1
-----------------
2 0 1
4 0 2 0 0
-----------------------------
= 4 0 0 0 1
0r 100 000 000 000 001

it seems that the coefficients of normal base will be:
110 000 000 000 000 or 6 0 0 0 0

so, one cyclic left shift gives 4 0 0 0 1.

How it is calculated the 6 0 0 0 0 using a normal base?
I’ll try it. Interesting.
Thanks.

[ - ]
Comment by microsoftwareNovember 21, 2015
x 2 0 1
2 0 1
-----------------
2 0 1
4 0 2 0 0
-----------------------------
= 4 0 0 0 1

[ - ]
Comment by microsoftwareNovember 22, 2015
how we will represent the elements 0...7 of G(8) under a normal basis??
[ - ]
Comment by microsoftwareNovember 22, 2015
A normal basis representation of the elements of G(8) is, may be :
element normal base
000 000
001 100
010 010
011 110
100 111
101 011
110 101
111 001
[ - ]
Comment by drmikeNovember 22, 2015
Since 8+1 is not prime, the type 1 normal basis does not work. Wikipedia points to

Primitive Normal Bases for Finite Fields
H. W. Lenstra, Jr. and R. J. Schoof
Mathematics of Computation
Vol. 48, No. 177 (Jan., 1987), pp. 217-231

which proves "any finite extension of a finite field has a normal basis consisting of primitive roots." So there _is_ a normal basis over GF(8), but I don't know how you map it back to a standard basis (without reading that reference!).

I like your order, the square of each element goes to the symmetric position.
[ - ]
Comment by microsoftwareNovember 29, 2015
Using the p(x)=x3+x2+1 or the1101 irreducible polynomial to construct G(8) we have :
A root ? of p(x)=0 is the ?=010.
The set of the distinct roots of p(x) is:
B={ ?, ?2, ?4 } , ?^4= ?*?3=?2+?+1 . The elements of the set B are linearly independent and so they are a normal basis. And ?=010 and ?^2=100 and b^3=101mod(p(x) and ?^4=111 mod(p(x))
We have now:

Element normal basis representation
000 000
001 111
010 001
011 110
100 010
101 101
100 011
111 100

http://faculty.washington.edu/manisoma/ee540/EE540finite.pdf

Amela Mutarovic-Ribic, University of Sarajevo

Peng Ning , Yiqun Lisa Yin North Carolina State University
[ - ]
Comment by drmikeNovember 29, 2015
Nice - what I was trying to say before is described in that presentation as a composite field. Very useful for error correction codes! Thanks!
[ - ]
Comment by microsoftwareNovember 30, 2015
110------------>011 corrective, Thanks

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Please login (on the right) if you already have an account on this platform.

Otherwise, please use this form to register (free) an join one of the largest online community for Electrical/Embedded/DSP/FPGA/ML engineers: