Mathematics AA HL's Sample Internal Assessment

Mathematics AA HL's Sample Internal Assessment

Comparing the suitability of encryption algorithms using Iterative procedures in discrete mathematics

6/7
6/7
11 mins read
11 mins read
Candidate Name: N/A
Candidate Number: N/A
Session: N/A
Word count: 2,057

Table of content

Rationale

Since my childhood, I am really fascinated with internet. At a younger age, I used to wonder how on computers we can get all sort of data in fraction of seconds. I was always mesmerized with beautiful designs of webpages and all sorts of animations that used to come while scrolling down the webpages. Personally, cartoons and animated videos never gained much value in my mind as the animations on webpages have gained. I always wanted to learn how these webpages work and how to design such beautiful websites. But as there was lack of knowledge as well as programming skills, I was unable to achieve my dreams of designing beautiful webpages. Day by day, I have started to learn programming language and finally in last summer, I had successfully completed a course on Webpage Development. Initially I was very much interested in webpage development but very soon my happy days in developing animated webpages turned gloomy. Whenever I launched any webpage on the internet, either it got attacked by some unidentified user or whatever data transmission happened between my website and any user, the data got leaked. In the days of my childhood, I always thought about the animation and designs of the websites but never thought about the security of the website. Once we were having a video chat with relatives who reside in the US over Skype and suddenly a thought crossed my mind. I was worried that whatever data gets transferred to and from my website gets leaked I had never seen any case of cross connection in Skype. This thought led my interest to learn more about Cyber Security and Ethical Hacking. After attending three sessions, I was completely able to hack my router and was able to access every packet of data that was getting transferred through that router. But I was unable to crack the information present in the packets of the data whereas whenever I tried to use my own website as a user, I could access the information that were transferred from my website to the user as I hacked my home router. Later on, I got to know from my Cyber Security tutorial classes that data transmitted and received to and from my website was getting leaked as there was no encryption of data in my website. Moreover, there was no certification of my website as well which led to continuous cyber-attacks in my website.

 

As a web-developer I would like to put forward different ways to deal with the cyber-attacks so that the users of website designers may include such procedures to keep their website safe from hackers and also the users can get assured that their data will not get leaked under any circumstances. In this IA, I would like to discuss a few algorithms for data encryption so that even if one’s data gets leaked, then how that encrypted data packet can be decrypted without any key by the hacker in order to increase the efficiency of the algorithm and also, I would do a comparative study based on the algorithms.

Aim

The main motive of this exploration is to compare the suitability of the encryption algorithms using the procedure of discrete mathematics.

Background information

Modulo arithmetic

This domain of arithmetic primarily deals with the application of remainders in any division involving integral values. It is generally used when we calculate using numbers which tend to ‘wrap around’ after a few steps. An intuitive example is the case of a 12 - hour time system derived from a 24 - hour system. Whenever we try to note the time after 12 - noon in 24 - hour system by converting it to the 12 - hour system, we divide the hour value by 12 and obtain the remainder. The remainder is basically the hour’s value in the desired 12 - hour system. For instance, the time 15:30 in 12 - hour system is 3:30 p.m. which is obtained by dividing 15 by 12 and noting the remainder 3. This, mathematically, is denoted as (15 mod 12) = 3.

 

Two numbers a and b are said to be congruent (modulo N) denoted by a ≡ b (mod N) when they leave the same remainder when divided by the common number N.

Iterative calculation

This process of calculation is based on the fact that a mapping takes in a constant initial value and each subsequent calculation through he mapping takes in the previous value obtained. This results in the following mapping defined as:

 

Xv+1 =  f(xv)  where f denotes a function.

 

Examples of iterative calculations include the following:

 

  • Logistic Map: Xv+1 ρXv(1 - Xv )
  • Collatz Conjecture: Xv+1 = ( 3X+1) when Xv mod 2 = 1 \(\frac{X_v}{2}\) when Xv  mod 2 = 0

Matrix multiplication

It refers to a rectangular array of numbers or to symbols that are usually positioned in columns and rows. The order of this matrix is characterized by the number of columns and rows. The entries refer to the numbers within the matrix and each such number is referred to as the element. Size of the matrix is known as the ‘n by m’ matrix and it is mentioned as m×n. Here, m refers to the number of columns and n refers to the number of rows. To further explain this, having a 4×5 matrix would mean that the number of columns is 5 and number of rows is 4.

Matrix representation

It refers to a procedure utilized by a computer language in order to save non-singular matrices in the memory. It is a set of squares which are used to fulfil the multiplication table of the specific group when the matrices are multiplied by the use of ordinary rules of the multiplication of matrix.

Permutation combination

It refers to ways in which a group of objects can be represented by choosing them based on sets and by creating subsets.  It describes the different methods in which a particular group of data can be arranged. Permutation refers to the selection of data from a particular group and combination refers to the arrangement in which they are selected.

Prime factor

Factors refer to numbers that if multiplied together can give another number and the division of a specific number by all its factors will give 0 as the remainder. Hence, factors refer to the exact divisor of a particular number. Prime factor refers to the factors of a given number and the factors are also prime numbers. This means that prime factor refers to prime numbers that can be multiplied to find a particular number.

Euler totient function

It refers to a mathematical multiplicative function that is used to count positive integers till a specified integer and is usually known as ‘n’. This function is utilised in order to find out the total number of prime numbers till a given integer which is ‘n’.

Experimental methodology

In this exploration, three encryption systems have first been employed to encrypt a string using a definite predefined random key of fixed length. The encrypted strings were then tried for cryptanalysis without knowledge of private key. This showed us the probability or chances of an intruder deciphering the exact message by intercepting data during transmission.

Process of calculation

Choice of word and private key

For next few steps of algorithmic analysis, we would be using the following string S as the plaintext and the private key K.

 

For the sake of simplicity, we would be using a 4 - letter long string that will permit easier calculation as follows:

 

S = Math

 

The chosen private key has been obtained using a Python program. So, we have:

 

K = zlZeOtV0Hhbuta92

Figure 1 - Python Program To Determine Private Key

Encryption I - using the caesar cipher

The Caesar Cipher works on the following principle for encryption.

 

The positional value Εv of character γ with positional value λ in the new alphabet shifted by v is given as follows where Ω  represents the old and Ξ represents the new alphabets:

 

Ev: Ω → Ξ, Ev(γ) = (λ + v) mod 26

 

Mapping A to 0, B to 1 Z to 25 and so on, we can construct a Caesar Cipher. Assuming shift value to be 3, we will have A mapped to (0 + 3) mod 26 = 3, B to (1 + 3) mod 26 = 4 … Z to (26 + 3) mod 26 = 2 and so on.

 

Encrypting the Caesar Cipher using a private key, we construct a new shift alphabet by placing the characters of the private key initially mapped to the first letters of the existing alphabet, followed by all the remaining letters of the existing alphabet not appearing in the encryption key. In case a single value appears more than once in the key, only the first occurrence is heeded. Using our key C, we create a new alphabet of lowercase characters, uppercase characters and digits. We can then conveniently shift the new alphabet backwards relative to the first using a shift value as well.

 

Following the above-mentioned steps, we have the following mapping for no shift value:

Existing AlphabetNew Alphabet
Az
Bl
CZ
De

\(\vdots\)

\(\vdots\)

O9
P2
QA
RB
SC

\(\vdots\)

\(\vdots\)

Figure 2

This correspondence has been mapped using a Python program and using it we obtain the following ciphertext C:

 

C = Iekb

 

Where the key used is K, plaintext is S and shift value is 3.

 

Code used for the calculations is shown below:

import string

class Caesar:

    def__init__(self, string, key, shift) - > None:

       self.string = string

       self.key = key

       self.shift = shift

   def encrypt(self):

       s = self.string

       key = self.key

       shift = self.shift

       trans = []

       alpha = [i for i in string.ascii_letters+string.digits]

      out = []

      for j in key[shift%62:]:

          trans.append(j)

      for k in alpha:

          if k not in trans and k not in key[:3]:

             trans.append(k)

      for n in key[:3]:

          trans.append(n)

      mapping = list(zip(alpha, trans))

      for l in s:

          for m in mapping:

             if m[0]==l:

                out.append(m[1])

      encrypted = ''.join(out)

      return encrypted

Encryption II- using the hill cipher

For this we construct a code - space such that the letter a maps to 0, b to 1, c to 2, …, z to 25 then A to 26, B to 27, …, Z to 51, followed by 0 to 52, 1 to 53, and so on till 61.

 

The plaintext S is represented in the form of a 1×4 vectorized matrix using our coded values of the characters:

 

Smatrix = 38 0 19 7

 

The secret key K is also written as a matrix after converting the characters to our coded values and placing them in groups of 4 in a 4×4 matrix:

 

Kmatrix = 25 11 51 4 40 19 47 52 33 7 1 20 19 0 61 54

 

Multiplying the Smatrix by Kmatrix, we have ciphertext matrix Cmatrix computed using another Python program:

 

Cmatrix = (25 11 51 4 40 19 47 52 33 7 1 20 19 0 61 54)⋅(38 0 19 7) = (1947 2777 1413 2259) ≡ (25 49 49 27)(mod 62)

 

Here, using the modulo operator helps us determine the code value of encoded character by restricting the values to the code space (of 62 characters including 26 letters each of lowercase and uppercase and 10 digits).

 

So mapping each value of the Cmatrix to our code space, we have the coded message as follows:

 

C = zXXB

 

Code used for the calculations is shown below:

import string

import numpy as np

 

class Hill:

   def __init__(self, string, key) -> None:

      self.string = string   

       self.key = key

  def encrypt(self):

       s = self.string

       key = self.key

       sMat = []

       kMat = []

       out = []

       code = []

      x = 0

       alpha = [i for i in string.ascii_letters+string.digits]

       for q in range(62):

            code.append(q)

       comb = {c:a for c, a in zip(code, alpha)}

       for i in s:

            sMat.append([get(comb, i)])

       for j in range(len(s)):

            kMat.append([])

            for k in range(len(s)):

                kMat[j].append(get(comb,key[x]))

                x += 1

       out = np.dot(kMat, sMat)

       out = out.tolist()

       for a in range(len(out)):

           out[a][0] = out[a][0]%62

       encrypted = []

       for c in out:

           encrypted.append(comb[c[0]])

       encrypted = ''.join(encrypted)

       return encrypted

Encryption III - using the RSA algorithm

In this algorithm, we assume two large prime number (though here we consider small ones for sake of simplicity). Let these be p = 31 and q = 37.

 

We compute a number N = p × q = 31 × 37 = 1147. Then we obtain a number T = (p - 1) × (q - 1) = 30 × 36 = 1080.

 

We now calculate two numbers e and d as encryption and decryption keys respectively such that e × d  (mod ϕ (N)) = 1.

 

Here ϕ(N) is Euler’s Totient Function, giving us the number of integers less than which are relatively prime to N. For a prime number p, we have p = (p - 1) since all numbers lesser than p are not factors of p.

 

For us, ϕ(N) = ϕ(p)∙ϕ(q) = (p - 1) (q - 1) = T = N - (p + q) +1. So ϕ(1147) = 1080. We now come up with our public key that is encryption key and private key or decryption key that is δ such that gcd (ϵ,ϕ(N)) = 1 and δ ≡ϵ-1 (mod N).

 

Given ϕ(N= 1080, we check upon inspection that for ϵ = 7, we have gcd gcd (ϵ, ϕN =1). So we assume ϵ to be 1.

 

Now,

 

δ ≡ ϵ-1 (mod N)

 

δ × ϵ ≡ 1  (mod N)

 

δ × ϵ (mod N) = 1 (mod N)

 

δ × ϵ (mod N) = 1

 

δ × 7 (mod 1147) = 1

 

Again upon inspection we have for δ = 164, δ × ϵ ≡ 1 mod(N).

 

So our sets of public key Kpublic = {ϵ, N} = {7, 1147} and private key K = {δ, N} = {164, 1147}

 

Now, we encrypt our plaintext S = Math using the notation explained in section 4.3. So,

 

becomes a series of four characters 38, 0, 19, 7

 

By the RSA Algorithm, our ciphertext C = Sϵ mod (N). Using a Python Program, we compute this value to be equal to a ciphertext C = 0741000008751144 where the message is encoded in terms of each of the letters being represented by numbers taken in groups of 4.

 

Code used for the calculations is shown below:

import string

 

class RSA:

   def __init__(self, string) -> None:

      self.string = string

   def encrypt(self):

      s = self.string

       p, q, e, d, encoded, enc = 31, 37, None, None, [], []

       n = p*q

       for i in range(2, phi(n)):

           if max(comfac(i, phi(n))) == 1:

              e = i

              break

      for i in range(2,n):

           if (e*i)%n == 1:

              d = i

      x = []

      alpha = [x for x in string.ascii_letters+string.digits]

      order = [y for y in range(len(alpha))]

      for t in s:

          for i,o in zip(alpha,order):

              if t == i:

                 x.append(o)

      for z in x:

          crypt = (z**e)%n

          encoded.append(crypt)

      for a in encoded:

          enc.append(f'{a:04d}')

      return ''.join(enc)

Cryptanalysis without knowledge of private key

  • Caesar Cipher:

 While decrypting the Caesar Cipher, we assume that an intruder has no knowledge of the secret key or the shift value except the ciphertext. While he can safely assume that the codespace consists of 62 characters: 26 each for the uppercase and lowercase alphabets and 10 digits, we find that there will be several permutations for the secret key.

 

If the secret key is n characters long, we find that the permutations for m characters each fitting into one of the spaces of the secret key, are given by

 

nmP = \(\frac{m!}{(m-n)!}\) = n62P =\(\frac{62!}{(62-m)!}\)

 

Considering length of secret key to be 1 character, 2 characters, 3 characters, …, and so on, we have possible permutations:

 

162P = \(\frac{62!}{61!}\) = 62

 

262P = \(\frac{62!}{60!}\) =3782

 

362P = \(\frac{62!}{59!}\) = 226920

 

462P = \(\frac{62!}{58!}\) = 13388280

 

These values correspond to probabilities of choosing a correct sequence for a correctly chosen length as follows:

 

P(1) = \(\frac{1}{62}\) = 0.01612

 

P(2) = \(\frac{1}{3782}\) = 0.0002644

 

P(3) = \(\frac{1}{226920}\) = 0.000004406

 

P(4) = \(\frac{1}{13388280}\) = 0.00000007469

 

Plotting the graph of probability (taken logarithm to the base 10 for depiction of nature) with respect to length:

Figure 3 - Length Of Key Vs Probability Of Correctly Guessing Key

In our assumed case, we chosen a length of 16 characters.

 

So in theory, the total number of possible permutations is

 

\(\frac{62!}{46!}\) = 5719086709283091520696320000

 

This gives us a probability of correctly guessing it as P(16) = 1.7485 × 10-28  ≈ 0 Thus it is firstly nearly impossible to correctly guess the secret key. Then, the intruder must choose a shift value from 0 to 62 (if he correctly assumes in the first place that total code points in our code space is 62). Once the secret key has been deciphered, the hacker must again create 62 combinations from the 62 shift values before correctly reaching the correct shift alphabet resulting in a probability

 

P \(\frac{1}{62}\) × 1.7485 × 10-28 = 0.02818 × 10-28 = 2.818 × 10-30

 

This value is extremely small and it is nearly impossible to try out the cryptanalysis using brute force method alone.

 

  • Hill Cipher:

In this method, we used the concept that a message of n characters was to be encrypted by vectorizing it into a n × 1 matrix and using a key fitted into a n × n matrix. Using this idea, a hacker can again calculated the length of the secret key to be used that is n × n = n2.

 

In our case, the final message of 4 characters hinted a possible secret key of 16 characters. If the intruder is aware of the code space with 62 code points, he again needs to use 5719086709283091520696320000 permutations for correctly guessing the key as shown above. This already gives us a meagre probability of 1.7485×10-28. Now for arranging the characters in a 4 × 4  matrix, he needs to undergo 16! = 20922789888000 permutations. This gives us a probability of 0.00000000000004779. Now the final probability of computing the correct key matrix, is 1.7485 × 10-28 × 4.779×10-14  = 8.356 × 10-42.

 

This gives us a final secret key arranged in a matrix. Then the hacker needs to compute the inverse of the matrix and multiply that to the encrypted message to obtain the plaintext.

 

  • RSA Algorithm:

This algorithm makes public an encryption key consisting of the values {ϵ,N}. To calculate the decryption key  from this information alone, we have to find a possible value such that δ ≡ ϵ-1 (mod N). This simplifies to ϵ∙δ mod N = 1 mod N and this becomes δ∙ϵ mod N = 1. For sufficiently large values of N and ϵ, this becomes highly complicated. We thus observe that N being a very large product (of 200 digits) of two large primes (each of 100 digits), gives rise to a comparatively difficult process for obtaining . This is because given δ × ϵ = κ∙N + 1, for some integer k,k can assume indefinite values but  is worked out only for those values of which yield integral results.

 

Thus unlike the previous methods, this relies on a pseudo-trapdoor function which works on the principle that multiplication of two known, large primes is relatively, but the factorization of a large product of two primes into the constituent, distinct primes, is not feasible in reasonable time.

 

It must also be noted that in each of the above cases, we assume that the hacker has knowledge of the exact encoding scheme we employ. In all probability, the chances of successful decryption might be further limited by assuming separate, discrete and bespoke encoding schemes for each of the sender and receiver. Otherwise, the decrypted value in numerals might not be of any use again.

Conclusion

  • The Caesar and Hill ciphers are symmetric encryption systems in which knowledge of one key is sufficient to encrypt and decrypt a message. The RSA algorithm employs two separate keys (private and public or decryption and encryption).
  • The probability of an attacker for finding the correct secret key in case of Caesar Cipher, assuming he exactly knows the encoding scheme and length of secret key, is 2.818×10-30.
  • In case of Hill cipher, the probability of an attacker for finding the correct secret, assuming he exactly knows the encoding scheme and length of secret key, is 8.356×10-42.
  • The probability of a hacker finding the correct decryption key in case of the RSA algorithm, is limited to nearly zero since infinitely many combinations must be tested for obtaining the correct key assuming that given conditions (N and ϵ) are also equally long.
  • Moreover, the decryption might not even be possible at all if the hacker has no knowledge of our encoding scheme and is obtaining insignificant numeric values upon decryption.
  • The most secure algorithm is the RSA algorithm followed by the Hill Cipher and the Caesar Cipher. (based on the probability score).

Reflection