Article directory
- Overview
- Symbol definition
- Montgomery modular multiplication
- REDC
- Montgomery modular multiplication
- Modular exponentiation
Introduction
In parts (1)-(4) of this series, RSA and its big-number arithmetic library were implemented and optimized step by step. As mentioned before, the efficiency of RSA depends on division: modular exponentiation needs modular reduction, reduction is done by division, so in the end everything comes down to division.
There is another approach, however: use the Montgomery algorithm for modular exponentiation. It replaces the division in modular reduction with relatively cheap multiplications, additions and shifts.
Incidentally, concise Chinese-language material on this topic is not easy to find online. The write-ups from NSFOCUS and FreeBuf are very clear, and there used to be a very good article on CSDN, which should be the first hit in search engines. The most concise and accessible explanation may still be the Wikipedia article on the Montgomery algorithm, but it is in English.
Algorithm
Overview
Modular exponentiation by the Montgomery method needs two building blocks first: Montgomery reduction and Montgomery modular multiplication. These are then combined with the repeated-squaring method, with every ordinary modular reduction replaced by its Montgomery counterpart.
Symbol definition
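Let N be the modulus and let R be the smallest power of 2 greater than N, R = 2^m. N must be odd (an RSA modulus always is) so that R and N are coprime and the inverses below exist.
R·R’ ≡ 1 (mod N), (-N)·N’ ≡ 1 (mod R), -N ≡ R-N (mod R)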
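To make the symbols concrete, here is a minimal sketch that computes m, R, R’ and N’ for a small modulus. It uses plain uint64_t instead of the BN type from this series and assumes N is odd and below 2^31; the names MontConsts, inv_mod_2_64 and mont_consts are made up for this illustration. The inverse modulo a power of 2 is found with a Newton iteration, which is one of several possible methods (extended Euclid would work as well).

```cpp
#include <cstdint>

// Hypothetical helper type: the Montgomery constants for one odd modulus N < 2^31.
struct MontConsts {
    uint64_t N;   // modulus (odd)
    unsigned m;   // R = 2^m, the smallest power of two greater than N
    uint64_t R;   // 2^m
    uint64_t Rp;  // R' with R * R' = 1 (mod N)
    uint64_t Np;  // N' with (-N) * N' = 1 (mod R)
};

// Inverse of an odd n modulo 2^64 by Newton iteration.
// Each step doubles the number of correct low bits.
static uint64_t inv_mod_2_64(uint64_t n) {
    uint64_t x = n;               // n*n = 1 (mod 8), so 3 bits are already correct
    for (int i = 0; i < 5; ++i)   // 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits
        x *= 2 - n * x;           // arithmetic wraps modulo 2^64 on purpose
    return x;
}

MontConsts mont_consts(uint64_t N) {
    MontConsts c{};
    c.N = N;
    while ((uint64_t{1} << c.m) <= N) ++c.m;               // smallest m with 2^m > N
    c.R = uint64_t{1} << c.m;
    c.Np = (uint64_t{0} - inv_mod_2_64(N)) & (c.R - 1);    // N' = -N^{-1} mod R
    c.Rp = (1 + N * c.Np) / c.R;                           // from R*R' - N*N' = 1
    return c;
}
```

For N = 97 this gives m = 7, R = 128, N’ = 95 and R’ = 72; indeed 128·72 = 9216 = 95·97 + 1.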
Montgomery modular multiplication
Suppose we want to compute
z = x·y (mod N) ①
We solve this with Montgomery reduction, abbreviated REDC. REDC(T) returns T·R’ (mod N). So if z is fed into REDC() in a suitable “form”, the final result comes out directly.
This “form” is called Montgomery form. In Montgomery form:
x is expressed as x·R (mod N)
y is expressed as y·R (mod N)
z in Montgomery form is then z·R = x·y·R = (x·R)·(y·R)·R’ = REDC(REDC(x·R·R)·REDC(y·R·R)) (mod N). Applying REDC once more, REDC(z·R), converts the result back to normal form, which is z (mod N).
In this process, chained products such as y·R·R must not be allowed to overflow: reduce as you multiply, that is, compute R·R (mod N) first and then multiply by it.
Montgomery modular multiplication algorithm
Calculate z=x·y (mod N)
x’=REDC(x·(R·R mod N))
y’=REDC(y·(R·R mod N))
z’=REDC(x’·y’)
z=REDC(z’)
return(z)
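The four steps above can be sketched on machine words as follows. This is only an illustration under stated assumptions, not the BN code of this series: N is odd and below 2^31, x and y are already less than N, and the caller supplies N’ (Np), R·R mod N (RR) and m. The function names redc and mont_mul are chosen for this sketch.

```cpp
#include <cstdint>

// REDC(T) = T * R' (mod N) with R = 2^m, valid for 0 <= T < R*N.
static uint64_t redc(uint64_t T, uint64_t N, uint64_t Np, unsigned m) {
    const uint64_t mask = (uint64_t{1} << m) - 1;
    uint64_t q = ((T & mask) * Np) & mask;
    uint64_t t = (T + q * N) >> m;
    return (t >= N) ? t - N : t;
}

// z = x*y (mod N), following the four steps literally.
// Assumes x < N and y < N so that every REDC input stays below R*N.
uint64_t mont_mul(uint64_t x, uint64_t y, uint64_t N,
                  uint64_t Np, uint64_t RR, unsigned m) {
    uint64_t xp = redc(x * RR, N, Np, m);   // x' = x*R (mod N)
    uint64_t yp = redc(y * RR, N, Np, m);   // y' = y*R (mod N)
    uint64_t zp = redc(xp * yp, N, Np, m);  // z' = x*y*R (mod N)
    return redc(zp, N, Np, m);              // z  = x*y (mod N)
}
```

With N = 97, m = 7, N’ = 95 and R·R mod N = 88, mont_mul(45, 76, ...) returns 25, which equals 45·76 mod 97.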
Montgomery reduction
Montgomery reduction is the building block of Montgomery modular multiplication. REDC(T) returns T·R’ (mod N), so REDC(T·R) returns T (mod N). The input must satisfy 0 ≤ T < R·N.
REDC(T) algorithm
m = ( ( T % R ) * N’ ) % R;
t = ( T + m * N ) / R;
if ( t >= N )
return( t - N );
else
return( t );
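As a minimal sketch, the pseudocode maps onto 64-bit integers like this. It assumes N is odd and below 2^31 so that nothing overflows, and that 0 ≤ T < R·N; the variable called m in the pseudocode is renamed q here because m is used for the exponent of R. The function name redc is chosen for this sketch.

```cpp
#include <cstdint>

// REDC(T) = T * R' (mod N) for R = 2^m.
// Np is N' with (-N) * N' = 1 (mod R).
uint64_t redc(uint64_t T, uint64_t N, uint64_t Np, unsigned m) {
    const uint64_t mask = (uint64_t{1} << m) - 1;  // x & mask equals x % R
    uint64_t q = ((T & mask) * Np) & mask;         // ((T % R) * N') % R
    uint64_t t = (T + q * N) >> m;                 // (T + q*N) / R, an exact division
    return (t >= N) ? t - N : t;                   // one conditional subtraction
}
```

With N = 97, N’ = 95 and m = 7 (R = 128), redc(1, ...) returns 72, which is R’, and redc(100, ...) returns 22, which is 100·72 mod 97.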
Look at this algorithm closely. The goal is a reduction modulo N, yet nothing here divides by N; the work has been transformed into taking a remainder modulo R and dividing by R. Because R is deliberately chosen as a power of 2, the remainder is just a matter of keeping the low m bits and the division is a right shift by m bits, both far cheaper than a real division. The costly operations that remain are additions, subtractions and multiplications.
Efficiency
The algorithm involves R, N, N’ and R·R (mod N). Of these, N’ and R·R (mod N) can be computed in advance: within one modular exponentiation they only need to be computed once, at the start. After that, the running time is dominated by multiplication.
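One possible way to precompute R·R (mod N) without any division is to start from 1 and double it 2m times, subtracting N whenever the value reaches N. The sketch below shows this on a machine word; it assumes N is odd, at least 3 and below 2^63, and the name rr_mod_n is made up for this example. A single ordinary big-number division would of course work too, since it happens only once.

```cpp
#include <cstdint>

// R*R (mod N) for R = 2^m, using only doubling and conditional subtraction.
uint64_t rr_mod_n(uint64_t N, unsigned m) {
    uint64_t v = 1;                      // 2^0, already reduced because N > 1
    for (unsigned i = 0; i < 2 * m; ++i) {
        v <<= 1;                         // v = 2*v, still below 2*N
        if (v >= N) v -= N;              // bring v back into [0, N)
    }
    return v;                            // 2^(2m) mod N
}
```

For N = 97 and m = 7 this returns 88, since 128·128 = 16384 = 168·97 + 88.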
Code implementation
REDC
// Montgomery reduction, result = DT*R’ (mod N)
// Note: parts of this listing are missing from the text; each gap is marked with "..."
void mont_redc(BN DT, BN N, BN Np, BN R, BN & result)
{
    BN temp1 = { 0 };
    BND temp2 = { 0 };
    BN m = { 0 }, t = { 0 };
    unsigned int R_bits = getbits_b(R);   // reducing mod R means keeping this many bits
    int Bits = (R_bits + 31) / 32;        // converted to the number of 32-bit big-number words
    int remain = R_bits - 32 * (Bits-1);  // bit [remain-1] of the top word is the 1 bit of R; only bits [remain-2]..[0] are kept
    // if (remain > 1) keep R[0] words; otherwise keep all of the lower R[0]-1 words
    // in word R[Bits], keep bits [remain-2]..[0]
    // if (Bits > 1), keep words R[Bits-1]..R[1]

    // Compute temp1 = T % R
    if (getbits_b(DT) >= R_bits) // a remainder is only needed when DT is at least as long as R, e.g. R = 0100 and T = 1101; the top bit must not be kept
    {
        for (unsigned int i = 1; i ...
        ... 1) // the top word is at least binary 10 (remain = 2), so 1 bit can be kept
        {
            temp1[R[R[0]]= (DT[R[R[0]]& (uint32_t)(1U ...
    ...
    ... = R_bits) // same check as above, this time for temp2
    {
        for (unsigned int i = 1; i ...
        ... 1) // the top word is at least binary 10 (remain = 2), so 1 bit can be kept
        {
            m[R[R[0]]= (temp2[R[R[0]]& (uint32_t)(1U ...
    ...
    ... 0; i--)
    {
        shr_b(temp2);
    }
    //cout ...
    ... = 0)
    {
        sub(t, N, t); // t = t - N
    }
    cpy_b(result, t);
}
Montgomery modular multiplication
// Modular multiplication used by the modular exponentiation
void mont_modmul(BN x, BN y, BN R, BN N, BN Np, BN RRN, BN & result)
{
    BND temp1 = { 0 }, temp2 = { 0 }, temp3 = { 0 }; // may exceed 1024 bits, so a 1024-bit BN is not enough
    BN Xp = { 0 }, Yp = { 0 }, Zp = { 0 };
    mul(x, RRN, temp1);
    mul(y, RRN, temp2);
    mont_redc(temp1, N, Np, R, Xp);     // x’ = REDC(x·(R·R mod N))
    mont_redc(temp2, N, Np, R, Yp);     // y’ = REDC(y·(R·R mod N))
    mul(Xp, Yp, temp3);
    mont_redc(temp3, N, Np, R, Zp);     // z’ = REDC(x’·y’)
    mont_redc(Zp, N, Np, R, result);    // z = REDC(z’)
}

// Stand-alone variant that derives R itself. The rest of this listing is missing from the text; gaps are marked with "..."
void mont_modmul(BN x, BN y, BN N, BN & result)
{
    BND temp1 = { 0 }, temp2 = { 0 }, temp3 = { 0 }; // may exceed 1024 bits, so a 1024-bit BN is not enough
    ... // R = 2^m must be greater than N
    BN RRN = { 0 }; // R*R (mod N)
    BN R = { 0 }, Rp = { 0 }, Np = { 0 };
    int Bits = (m + 31) / 32;          // converted to the number of 32-bit big-number words
    int remain = m - 32 * (Bits - 1);  // bit [remain-1] of the top word is set to 1; bits [remain-2]..[0] stay 0
    R[0] = Bits;
    R[Bits] = (uint32_t)(1U ...
    ...
Modular exponentiation
// Montgomery modular exponentiation, a^b mod N
// Note: parts of this listing are missing from the text; each gap is marked with "..."
void mont_modexp(BN a, BN b, BN N, BN & result)
{
    int m = getbits_b(N); // the modulus is essentially never even, least of all in encryption and decryption; R = 2^m must be greater than N
    BN RRN = { 0 }; // R*R (mod N)
    BN R = { 0 }, Rp = { 0 }, Np = { 0 };
    BN a_t = { 1,1 }, b_t;            // a_t = 1; the exponent is expanded in binary
    BN temp1 = { 0 }, temp2 = { 0 };  // scratch values, because result is cleared below
    ...
    // bit [remain-1] of the top word is set to 1; bits [remain-2]..[0] stay 0
    R[0] = Bits;
    R[Bits] = (uint32_t)(1U ...
    ...
    // a^b mod N
    memset(result, 0, sizeof(result));
    cpy_b(b_t, a); // b_t = a, initialization
    uint32_t *nptr, *mnptr;
    nptr = LSDPTR_B(b);
    mnptr = MSDPTR_B(b); //!!!!!!!
    char binform[33]; // each 32-bit word is converted to a binary string and its bits are taken out one by one
    int i = 0;
    while (nptr ...
        ... = 0; j--) // start the modular squaring
        {
            if (i >= 0) // bits of this word remain; otherwise only square b_t
            {
                if (binform[i] == '1')
                {
                    mont_modmul(a_t, b_t, R, N, Np, RRN, a_t);
                }
                i--;
            }
            mont_modmul(b_t, b_t, R, N, Np, RRN, b_t);
        }
        nptr++;
    }
    cpy_b(result, a_t);
    RMLDZRS_B(result);
}
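For comparison, here is a small self-contained sketch of the same idea on 64-bit integers. It is not the BN implementation above, and it makes one different design choice: the base and the accumulator are converted into Montgomery form once and stay there for the whole square-and-multiply loop, so each step costs one multiplication and one REDC. It assumes N is odd, at least 3 and below 2^31; the names Mont, mont_init and redc are made up for this illustration.

```cpp
#include <cstdint>

// Hypothetical context holding the constants that are computed once per modulus.
struct Mont {
    uint64_t N;   // odd modulus, assumed 3 <= N < 2^31
    uint64_t Np;  // N' with (-N) * N' = 1 (mod R)
    uint64_t RR;  // R*R (mod N)
    unsigned m;   // R = 2^m
};

// REDC(T) = T * R' (mod N), for 0 <= T < R*N.
static uint64_t redc(const Mont& c, uint64_t T) {
    const uint64_t mask = (uint64_t{1} << c.m) - 1;
    uint64_t q = ((T & mask) * c.Np) & mask;
    uint64_t t = (T + q * c.N) >> c.m;
    return (t >= c.N) ? t - c.N : t;
}

Mont mont_init(uint64_t N) {
    Mont c{};
    c.N = N;
    while ((uint64_t{1} << c.m) <= N) ++c.m;          // smallest m with 2^m > N
    uint64_t inv = N;                                  // Newton iteration: N^{-1} mod 2^64
    for (int i = 0; i < 5; ++i) inv *= 2 - N * inv;
    c.Np = (uint64_t{0} - inv) & ((uint64_t{1} << c.m) - 1);
    c.RR = 1;                                          // 2^(2m) mod N by repeated doubling
    for (unsigned i = 0; i < 2 * c.m; ++i) {
        c.RR <<= 1;
        if (c.RR >= N) c.RR -= N;
    }
    return c;
}

// a^b mod N by right-to-left square-and-multiply in Montgomery form.
uint64_t mont_modexp(uint64_t a, uint64_t b, uint64_t N) {
    const Mont c = mont_init(N);
    uint64_t base = redc(c, (a % N) * c.RR);  // a*R (mod N); a % N is the only division
    uint64_t acc = redc(c, c.RR);             // 1*R (mod N)
    while (b != 0) {
        if (b & 1) acc = redc(c, acc * base); // multiply step
        base = redc(c, base * base);          // square step
        b >>= 1;
    }
    return redc(c, acc);                      // leave Montgomery form
}
```

For example, mont_modexp(4, 13, 497) returns 445. Whether keeping values in Montgomery form would also help the BN version is untested here; it is only a possible direction to look at.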
Running results
It runs much slower than the division-based version. The likely cause is inefficient code: I suspect the shift function shr_b() is not written cleverly enough. The multiplication function should not be the problem. Compared with the original multiplication, fast multiplication does not contribute much of a speedup, and at only 1024 bits its advantage does not show.
Since this version is, for now, slower than the original (or perhaps genuinely slower), it has not been pushed to GitHub yet. I hope experts can point out the shortcomings here. Thank you very much.