Mathematics AA HL

Sample Internal Assessment

Table of content

Rationale

Aim

Background information

Process of calculation

Conclusion

Reflection

Bibliography

11 mins Read

2,057 Words

Since my childhood, I am really fascinated with internet. At a younger age, I used to wonder how on computers we can get all sort of data in fraction of seconds. I was always mesmerized with beautiful designs of webpages and all sorts of animations that used to come while scrolling down the webpages. Personally, cartoons and animated videos never gained much value in my mind as the animations on webpages have gained. I always wanted to learn how these webpages work and how to design such beautiful websites. But as there was lack of knowledge as well as programming skills, I was unable to achieve my dreams of designing beautiful webpages. Day by day, I have started to learn programming language and finally in last summer, I had successfully completed a course on Webpage Development. Initially I was very much interested in webpage development but very soon my happy days in developing animated webpages turned gloomy. Whenever I launched any webpage on the internet, either it got attacked by some unidentified user or whatever data transmission happened between my website and any user, the data got leaked. In the days of my childhood, I always thought about the animation and designs of the websites but never thought about the security of the website. Once we were having a video chat with relatives who reside in the US over Skype and suddenly a thought crossed my mind. I was worried that whatever data gets transferred to and from my website gets leaked I had never seen any case of cross connection in Skype. This thought led my interest to learn more about Cyber Security and Ethical Hacking. After attending three sessions, I was completely able to hack my router and was able to access every packet of data that was getting transferred through that router. But I was unable to crack the information present in the packets of the data whereas whenever I tried to use my own website as a user, I could access the information that were transferred from my website to the user as I hacked my home router. Later on, I got to know from my Cyber Security tutorial classes that data transmitted and received to and from my website was getting leaked as there was no encryption of data in my website. Moreover, there was no certification of my website as well which led to continuous cyber-attacks in my website.

As a web-developer I would like to put forward different ways to deal with the cyber-attacks so that the users of website designers may include such procedures to keep their website safe from hackers and also the users can get assured that their data will not get leaked under any circumstances. In this IA, I would like to discuss a few algorithms for data encryption so that even if one’s data gets leaked, then how that encrypted data packet can be decrypted without any key by the hacker in order to increase the efficiency of the algorithm and also, I would do a comparative study based on the algorithms.

The main motive of this exploration is to compare the suitability of the encryption algorithms using the procedure of discrete mathematics.

This domain of arithmetic primarily deals with the application of remainders in any division involving integral values. It is generally used when we calculate using numbers which tend to ‘wrap around’ after a few steps. An intuitive example is the case of a 12 - hour time system derived from a 24 - hour system. Whenever we try to note the time after 12 - noon in 24 - hour system by converting it to the 12 - hour system, we divide the hour value by 12 and obtain the remainder. The remainder is basically the hour’s value in the desired 12 - hour system. For instance, the time 15:30 in 12 - hour system is 3:30 p.m. which is obtained by dividing 15 by 12 and noting the remainder 3. This, mathematically, is denoted as (15 mod 12) = 3.

Two numbers *a *and *b *are said to be congruent (modulo *N*) denoted by *a ≡ b* (mod *N*) when they leave the same remainder when divided by the common number *N*.

This process of calculation is based on the fact that a mapping takes in a constant initial value and each subsequent calculation through he mapping takes in the previous value obtained. This results in the following mapping defined as:

*X _{v+1 }= f(x_{v}) * where

Examples of iterative calculations include the following:

- Logistic Map:
*X*_{v+}_{1}ρ_{ }=*X*(1_{v}*- X*)_{v} - Collatz Conjecture:
*X*_{v+}_{1}( 3_{ }=*X*+1) when_{v }*X*mod 2 = 1 \(\frac{X_v}{2}\) when_{v}*X*mod 2 = 0_{v}

It refers to a rectangular array of numbers or to symbols that are usually positioned in columns and rows. The order of this matrix is characterized by the number of columns and rows. The entries refer to the numbers within the matrix and each such number is referred to as the element. Size of the matrix is known as the ‘*n *by *m*’ matrix and it is mentioned as *m×n*. Here, *m* refers to the number of columns and *n *refers to the number of rows. To further explain this, having a 4×5 matrix would mean that the number of columns is 5 and number of rows is 4.

It refers to a procedure utilized by a computer language in order to save non-singular matrices in the memory. It is a set of squares which are used to fulfil the multiplication table of the specific group when the matrices are multiplied by the use of ordinary rules of the multiplication of matrix.

It refers to ways in which a group of objects can be represented by choosing them based on sets and by creating subsets. It describes the different methods in which a particular group of data can be arranged. Permutation refers to the selection of data from a particular group and combination refers to the arrangement in which they are selected.

Factors refer to numbers that if multiplied together can give another number and the division of a specific number by all its factors will give 0 as the remainder. Hence, factors refer to the exact divisor of a particular number. Prime factor refers to the factors of a given number and the factors are also prime numbers. This means that prime factor refers to prime numbers that can be multiplied to find a particular number.

It refers to a mathematical multiplicative function that is used to count positive integers till a specified integer and is usually known as ‘*n*’. This function is utilised in order to find out the total number of prime numbers till a given integer which is ‘*n*’.

In this exploration, three encryption systems have first been employed to encrypt a string using a definite predefined random key of fixed length. The encrypted strings were then tried for cryptanalysis without knowledge of private key. This showed us the probability or chances of an intruder deciphering the exact message by intercepting data during transmission.

For next few steps of algorithmic analysis, we would be using the following string *S *as the plaintext and the private key *K*.

For the sake of simplicity, we would be using a 4 - letter long string that will permit easier calculation as follows:

*S = Math*

The chosen private key has been obtained using a Python program. So, we have:

*K = zlZeOtV0Hhbuta92*

The Caesar Cipher works on the following principle for encryption.

The positional value Ε* _{v}* of character

E*v*: Ω → Ξ, E*v*(γ) = (λ + *v*) mod 26

Mapping A to 0, B to 1 … Z to 25 and so on, we can construct a Caesar Cipher. Assuming shift value to be 3, we will have A mapped to (0 + 3) mod 26 = 3, B to (1 + 3) mod 26 = 4 … Z to (26 + 3) mod 26 = 2 and so on.

Encrypting the Caesar Cipher using a private key, we construct a new shift alphabet by placing the characters of the private key initially mapped to the first letters of the existing alphabet, followed by all the remaining letters of the existing alphabet not appearing in the encryption key. In case a single value appears more than once in the key, only the first occurrence is heeded. Using our key *C*, we create a new alphabet of lowercase characters, uppercase characters and digits. We can then conveniently shift the new alphabet backwards relative to the first using a shift value as well.

Following the above-mentioned steps, we have the following mapping for no shift value:

Existing Alphabet

New Alphabet

A

z

B

l

C

Z

D

e

\(\vdots\)

\(\vdots\)

O

9

P

2

Q

A

R

B

S

C

\(\vdots\)

\(\vdots\)

This correspondence has been mapped using a Python program and using it we obtain the following ciphertext *C*:

*C = Iekb*

Where the key used is *K*, plaintext is *S *and shift value is 3.

Code used for the calculations is shown below:

`import string`

class Caesar:

self.string = string

self.key = key

self.shift = shift

def encrypt(self):s = self.string

key = self.key

shift = self.shift

trans = []

alpha = [i for i in string.ascii_letters+string.digits]

out = []

for j in key[shift%62:]:

trans.append(j)

for k in alpha:

if k not in trans and k not in key[:3]:

trans.append(k)

for n in key[:3]:

trans.append(n)

mapping = list(zip(alpha, trans))

for l in s:

for m in mapping:

if m[0]==l:

out.append(m[1])

encrypted = ''.join(out)

return encrypted

For this we construct a code - space such that the letter a maps to 0, b to 1, c to 2, …, z to 25 then A to 26, B to 27, …, Z to 51, followed by 0 to 52, 1 to 53, and so on till 61.

The plaintext *S *is represented in the form of a 1×4 vectorized matrix using our coded values of the characters:

S_{matrix }= 38 0 19 7

The secret key *K *is also written as a matrix after converting the characters to our coded values and placing them in groups of 4 in a 4×4 matrix:

K_{matrix }= 25 11 51 4 40 19 47 52 33 7 1 20 19 0 61 54

Multiplying the *S _{matrix}* by

C_{matrix }= (25 11 51 4 40 19 47 52 33 7 1 20 19 0 61 54)⋅(38 0 19 7) = (1947 2777 1413 2259) ≡ (25 49 49 27)(mod 62)

Here, using the modulo operator helps us determine the code value of encoded character by restricting the values to the code space (of 62 characters including 26 letters each of lowercase and uppercase and 10 digits).

So mapping each value of the* C _{matrix}* to our code space, we have the coded message as follows:

*C = zXXB*

Code used for the calculations is shown below:

`import string`

import numpy as np

`class Hill:`

def __init__(self, string, key) -> None:

self.string = string `self.key = key`

s = self.string

key = self.key

sMat = []

kMat = []

out = []

code = []

x = 0alpha = [i for i in string.ascii_letters+string.digits]

for q in range(62):

code.append(q)

comb = {c:a for c, a in zip(code, alpha)}

for i in s:

sMat.append([get(comb, i)])

for j in range(len(s)):

kMat.append([])

for k in range(len(s)):

kMat[j].append(get(comb,key[x]))

x += 1

out = np.dot(kMat, sMat)

out = out.tolist()

for a in range(len(out)):

out[a][0] = out[a][0]%62

encrypted = []

for c in out:

encrypted.append(comb[c[0]])

encrypted = ''.join(encrypted)

return encrypted

In this algorithm, we assume two large prime number (though here we consider small ones for sake of simplicity). Let these be *p = *31 and *q = *37.

We compute a number *N = p × q = *31 *× *37 *= *1147. Then we obtain a number *T *= (*p *- 1) × (*q *- 1) = 30 × 36 = 1080.

We now calculate two numbers *e *and *d *as encryption and decryption keys respectively such that *e × d *(*mod ϕ *(*N*)) = 1.

Here *ϕ*(*N*) is Euler’s Totient Function, giving us the number of integers less than *N *which are relatively prime to *N*. For a prime number *p*, we have *p *= (*p* - 1) since all numbers lesser than *p *are not factors of *p*.

For us, ϕ(*N*) = ϕ(*p*)∙ϕ(*q*) = (*p *- 1) (*q *- 1) = *T *= *N *- (*p *+ *q*) +1. So ϕ(1147) = 1080. We now come up with our public key that is encryption key and private key or decryption key that is *δ *such that gcd (*ϵ,ϕ*(*N*)) = 1 and δ ≡_{ϵ}^{-1} (*mod N*).

Given ϕ(*N*) = 1080, we check upon inspection that for ϵ = 7, we have gcd *gcd *(ϵ, ϕN =1). So we assume ϵ to be 1.

Now,

δ ≡ ϵ^{-1} (*mod N*)

δ × ϵ ≡ 1 (*mod N*)

δ × ϵ (*mod N*) = 1 (*mod N*)

δ × ϵ (*mod N*) = 1

δ × 7 (*mod *1147) = 1

Again upon inspection we have for δ = 164, δ × ϵ ≡ 1 *mod*(*N*).

So our sets of public key *K*_{public} = {ϵ, N} = {7, 1147} and private key *K *= {δ, N} = {164, 1147}

Now, we encrypt our plaintext *S *= Math using the notation explained in section 4.3. So, *S *

becomes a series of four characters 38, 0, 19, 7

By the RSA Algorithm, our ciphertext *C = S*^{ϵ} mod (*N*). Using a Python Program, we compute this value to be equal to a ciphertext C = 0741000008751144 where the message is encoded in terms of each of the letters being represented by numbers taken in groups of 4.

Code used for the calculations is shown below:

`import string`

`class RSA:`

` def __init__(self, string) -> None:`

` self.string = string`

` def encrypt(self):`

` s = self.string`

` p, q, e, d, encoded, enc = 31, 37, None, None, [], []`

` n = p*q`

` for i in range(2, phi(n)):`

` if max(comfac(i, phi(n))) == 1:`

` e = i`

` break`

` for i in range(2,n):`

` if (e*i)%n == 1:`

` d = i `

` x = []`

` alpha = [x for x in string.ascii_letters+string.digits]`

` order = [y for y in range(len(alpha))]`

` for t in s:`

` for i,o in zip(alpha,order):`

` if t == i:`

` x.append(o)`

` for z in x:`

` crypt = (z**e)%n `

` encoded.append(crypt) `

` for a in encoded:`

` enc.append(f'{a:04d}') `

` return ''.join(enc)`

- Caesar Cipher:

While decrypting the Caesar Cipher, we assume that an intruder has no knowledge of the secret key or the shift value except the ciphertext. While he can safely assume that the codespace consists of 62 characters: 26 each for the uppercase and lowercase alphabets and 10 digits, we find that there will be several permutations for the secret key.

If the secret key is n characters long, we find that the permutations for m characters each fitting into one of the spaces of the secret key, are given by

nmP = \(\frac{m!}{(m-n)!}\) = n62P =\(\frac{62!}{(62-m)!}\)

Considering length of secret key to be 1 character, 2 characters, 3 characters, …, and so on, we have possible permutations:

162P = \(\frac{62!}{61!}\) = 62

262P = \(\frac{62!}{60!}\) =3782

362P = \(\frac{62!}{59!}\) = 226920

462P = \(\frac{62!}{58!}\) = 13388280

These values correspond to probabilities of choosing a correct sequence for a correctly chosen length as follows:

P(1) = \(\frac{1}{62}\) = 0.01612

P(2) = \(\frac{1}{3782}\) = 0.0002644

P(3) = \(\frac{1}{226920}\) = 0.000004406

P(4) = \(\frac{1}{13388280}\) = 0.00000007469

Plotting the graph of probability (taken logarithm to the base 10 for depiction of nature) with respect to length:

In our assumed case, we chosen a length of 16 characters.

So in theory, the total number of possible permutations is

\(\frac{62!}{46!}\) = 5719086709283091520696320000

This gives us a probability of correctly guessing it as *P*(16) = 1.7485 × 10^{-28 }≈ 0 Thus it is firstly nearly impossible to correctly guess the secret key. Then, the intruder must choose a shift value from 0 to 62 (if he correctly assumes in the first place that total code points in our code space is 62). Once the secret key has been deciphered, the hacker must again create 62 combinations from the 62 shift values before correctly reaching the correct shift alphabet resulting in a probability

*P *= \(\frac{1}{62}\) × 1.7485 × 10^{-28} = 0.02818 × 10^{-28} = 2.818 × 10^{-30}

This value is extremely small and it is nearly impossible to try out the cryptanalysis using brute force method alone.

- Hill Cipher:

In this method, we used the concept that a message of *n *characters was to be encrypted by vectorizing it into a *n *× 1 matrix and using a key fitted into a *n × n* matrix. Using this idea, a hacker can again calculated the length of the secret key to be used that is n × n = n^{2}.

In our case, the final message of 4 characters hinted a possible secret key of 16 characters. If the intruder is aware of the code space with 62 code points, he again needs to use 5719086709283091520696320000 permutations for correctly guessing the key as shown above. This already gives us a meagre probability of 1.7485×10^{-28}. Now for arranging the characters in a 4 × 4 matrix, he needs to undergo 16! = 20922789888000 permutations. This gives us a probability of 0.00000000000004779. Now the final probability of computing the correct key matrix, is 1.7485 × 10^{-28 }× 4.779×10-^{14 }= 8.356 × 10^{-42}.

This gives us a final secret key arranged in a matrix. Then the hacker needs to compute the inverse of the matrix and multiply that to the encrypted message to obtain the plaintext.

- RSA Algorithm:

This algorithm makes public an encryption key consisting of the values {ϵ,*N*}. To calculate the decryption key from this information alone, we have to find a possible value such that δ ≡ ϵ^{-1} (*mod N*). This simplifies to ϵ∙δ *mod N = 1* *mod N* and this becomes δ∙ϵ *mod **N *= 1. For sufficiently large values of *N *and ϵ, this becomes highly complicated. We thus observe that *N *being a very large product (of 200 digits) of two large primes (each of 100 digits), gives rise to a comparatively difficult process for obtaining . This is because given δ × ϵ = κ∙N + 1, for some integer k,k can assume indefinite values but is worked out only for those values of which yield integral results.

Thus unlike the previous methods, this relies on a pseudo-trapdoor function which works on the principle that multiplication of two known, large primes is relatively, but the factorization of a large product of two primes into the constituent, distinct primes, is not feasible in reasonable time.

It must also be noted that in each of the above cases, we assume that the hacker has knowledge of the exact encoding scheme we employ. In all probability, the chances of successful decryption might be further limited by assuming separate, discrete and bespoke encoding schemes for each of the sender and receiver. Otherwise, the decrypted value in numerals might not be of any use again.

- The Caesar and Hill ciphers are symmetric encryption systems in which knowledge of one key is sufficient to encrypt and decrypt a message. The RSA algorithm employs two separate keys (private and public or decryption and encryption).
- The probability of an attacker for finding the correct secret key in case of Caesar Cipher, assuming he exactly knows the encoding scheme and length of secret key, is 2.818×10
^{-30}. - In case of Hill cipher, the probability of an attacker for finding the correct secret, assuming he exactly knows the encoding scheme and length of secret key, is 8.356×10
^{-42}. - The probability of a hacker finding the correct decryption key in case of the RSA algorithm, is limited to nearly zero since infinitely many combinations must be tested for obtaining the correct key assuming that given conditions (
*N*and*ϵ*) are also equally long. - Moreover, the decryption might not even be possible at all if the hacker has no knowledge of our encoding scheme and is obtaining insignificant numeric values upon decryption.
- The most secure algorithm is the RSA algorithm followed by the Hill Cipher and the Caesar Cipher. (based on the probability score).

- The exploration analyzed three major cryptographic encryption systems and this strengthened the credibility of derived results as the conclusive evidences are also applicable to the real world: banks, ATMs, social media, and so on.
- The usage of a customized encoding scheme has also ensured that a hacker who gets to know the decrypted numeric values will never get to know the actual message as the numbers can correspond to different characters on different encoding schemes. Moreover, the usage of ASCII characters can be replaced by Unicode characters to increase the code-space and thus the possible values.
- The visual representation and methodological algorithmic procedure adopted has helped in comprehensive evaluation at each step.

- The usage of three algorithmic descriptions excludes other encryption standards like DES, AES and so on.
- The analysis with a single string and key can be made more comprehensive by application of strings of varying lengths and keys of varying strengths.

A more comprehensive evaluation can be undertaken by considering varying algorithms with strings and keys of varying lengths. The analysis can encompass statistical measures for analyzing the degree of randomness. On another side, an investigation on the amplification of randomness by employing dynamic ergodic cryptosystems can also be analyzed. The investigation could be entitled *“Analyzing the ergodicity of chaotic cryptosystems by considering varied forms of plaintext data.”*

- Euler’s Totient Function | Brilliant Math & Science Wiki.https://brilliant.org/wiki/eulers-totient-function/. Accessed 28 Oct. 2021.
- Eyewitness Testimony | Simply Psychology.https://www.simplypsychology.org/eyewitness-testimony.html. Accessed 25 Oct. 2021.
- “Matrix Multiplication.” VEDANTU,https://www.vedantu.com/maths/matrix-multiplication,
- Matrix Representation - an Overview | ScienceDirect Topics.https://www.sciencedirect.com/topics/mathematics/matrix-representation. Accessed 28 Oct. 2021.
- “Permutation and Combination (Definition, Formulas and Examples).” BYJUS,https://byjus.com/maths/permutation-and-combination/. Accessed 28 Oct. 2021.
- “Prime Factor.” GeeksforGeeks, 10 Dec. 2018,https://www.geeksforgeeks.org/prime-factor/.