Since my childhood, I am really fascinated with internet. At a younger age, I used to wonder how on computers we can get all sort of data in fraction of seconds. I was always mesmerized with beautiful designs of webpages and all sorts of animations that used to come while scrolling down the webpages. Personally, cartoons and animated videos never gained much value in my mind as the animations on webpages have gained. I always wanted to learn how these webpages work and how to design such beautiful websites. But as there was lack of knowledge as well as programming skills, I was unable to achieve my dreams of designing beautiful webpages. Day by day, I have started to learn programming language and finally in last summer, I had successfully completed a course on Webpage Development. Initially I was very much interested in webpage development but very soon my happy days in developing animated webpages turned gloomy. Whenever I launched any webpage on the internet, either it got attacked by some unidentified user or whatever data transmission happened between my website and any user, the data got leaked. In the days of my childhood, I always thought about the animation and designs of the websites but never thought about the security of the website. Once we were having a video chat with relatives who reside in the US over Skype and suddenly a thought crossed my mind. I was worried that whatever data gets transferred to and from my website gets leaked I had never seen any case of cross connection in Skype. This thought led my interest to learn more about Cyber Security and Ethical Hacking. After attending three sessions, I was completely able to hack my router and was able to access every packet of data that was getting transferred through that router. But I was unable to crack the information present in the packets of the data whereas whenever I tried to use my own website as a user, I could access the information that were transferred from my website to the user as I hacked my home router. Later on, I got to know from my Cyber Security tutorial classes that data transmitted and received to and from my website was getting leaked as there was no encryption of data in my website. Moreover, there was no certification of my website as well which led to continuous cyber-attacks in my website.
As a web-developer I would like to put forward different ways to deal with the cyber-attacks so that the users of website designers may include such procedures to keep their website safe from hackers and also the users can get assured that their data will not get leaked under any circumstances. In this IA, I would like to discuss a few algorithms for data encryption so that even if one’s data gets leaked, then how that encrypted data packet can be decrypted without any key by the hacker in order to increase the efficiency of the algorithm and also, I would do a comparative study based on the algorithms.
The main motive of this exploration is to compare the suitability of the encryption algorithms using the procedure of discrete mathematics.
This domain of arithmetic primarily deals with the application of remainders in any division involving integral values. It is generally used when we calculate using numbers which tend to ‘wrap around’ after a few steps. An intuitive example is the case of a 12 - hour time system derived from a 24 - hour system. Whenever we try to note the time after 12 - noon in 24 - hour system by converting it to the 12 - hour system, we divide the hour value by 12 and obtain the remainder. The remainder is basically the hour’s value in the desired 12 - hour system. For instance, the time 15:30 in 12 - hour system is 3:30 p.m. which is obtained by dividing 15 by 12 and noting the remainder 3. This, mathematically, is denoted as (15 mod 12) = 3.
Two numbers a and b are said to be congruent (modulo N) denoted by a ≡ b (mod N) when they leave the same remainder when divided by the common number N.
This process of calculation is based on the fact that a mapping takes in a constant initial value and each subsequent calculation through he mapping takes in the previous value obtained. This results in the following mapping defined as:
Xv+1 = f(xv) where f denotes a function.
Examples of iterative calculations include the following:
It refers to a rectangular array of numbers or to symbols that are usually positioned in columns and rows. The order of this matrix is characterized by the number of columns and rows. The entries refer to the numbers within the matrix and each such number is referred to as the element. Size of the matrix is known as the ‘n by m’ matrix and it is mentioned as m×n. Here, m refers to the number of columns and n refers to the number of rows. To further explain this, having a 4×5 matrix would mean that the number of columns is 5 and number of rows is 4.
It refers to a procedure utilized by a computer language in order to save non-singular matrices in the memory. It is a set of squares which are used to fulfil the multiplication table of the specific group when the matrices are multiplied by the use of ordinary rules of the multiplication of matrix.
It refers to ways in which a group of objects can be represented by choosing them based on sets and by creating subsets. It describes the different methods in which a particular group of data can be arranged. Permutation refers to the selection of data from a particular group and combination refers to the arrangement in which they are selected.
Factors refer to numbers that if multiplied together can give another number and the division of a specific number by all its factors will give 0 as the remainder. Hence, factors refer to the exact divisor of a particular number. Prime factor refers to the factors of a given number and the factors are also prime numbers. This means that prime factor refers to prime numbers that can be multiplied to find a particular number.
It refers to a mathematical multiplicative function that is used to count positive integers till a specified integer and is usually known as ‘n’. This function is utilised in order to find out the total number of prime numbers till a given integer which is ‘n’.
In this exploration, three encryption systems have first been employed to encrypt a string using a definite predefined random key of fixed length. The encrypted strings were then tried for cryptanalysis without knowledge of private key. This showed us the probability or chances of an intruder deciphering the exact message by intercepting data during transmission.
For next few steps of algorithmic analysis, we would be using the following string S as the plaintext and the private key K.
For the sake of simplicity, we would be using a 4 - letter long string that will permit easier calculation as follows:
S = Math
The chosen private key has been obtained using a Python program. So, we have:
K = zlZeOtV0Hhbuta92
The Caesar Cipher works on the following principle for encryption.
The positional value Εv of character γ with positional value λ in the new alphabet shifted by v is given as follows where Ω represents the old and Ξ represents the new alphabets:
Ev: Ω → Ξ, Ev(γ) = (λ + v) mod 26
Mapping A to 0, B to 1 … Z to 25 and so on, we can construct a Caesar Cipher. Assuming shift value to be 3, we will have A mapped to (0 + 3) mod 26 = 3, B to (1 + 3) mod 26 = 4 … Z to (26 + 3) mod 26 = 2 and so on.
Encrypting the Caesar Cipher using a private key, we construct a new shift alphabet by placing the characters of the private key initially mapped to the first letters of the existing alphabet, followed by all the remaining letters of the existing alphabet not appearing in the encryption key. In case a single value appears more than once in the key, only the first occurrence is heeded. Using our key C, we create a new alphabet of lowercase characters, uppercase characters and digits. We can then conveniently shift the new alphabet backwards relative to the first using a shift value as well.
Following the above-mentioned steps, we have the following mapping for no shift value:
Existing Alphabet | New Alphabet |
---|---|
A | z |
B | l |
C | Z |
D | e |
\(\vdots\) | \(\vdots\) |
O | 9 |
P | 2 |
Q | A |
R | B |
S | C |
\(\vdots\) | \(\vdots\) |
This correspondence has been mapped using a Python program and using it we obtain the following ciphertext C:
C = Iekb
Where the key used is K, plaintext is S and shift value is 3.
Code used for the calculations is shown below:
import string
class Caesar:
self.string = string
self.key = key
self.shift = shift
def encrypt(self):s = self.string
key = self.key
shift = self.shift
trans = []
alpha = [i for i in string.ascii_letters+string.digits]
out = []
for j in key[shift%62:]:
trans.append(j)
for k in alpha:
if k not in trans and k not in key[:3]:
trans.append(k)
for n in key[:3]:
trans.append(n)
mapping = list(zip(alpha, trans))
for l in s:
for m in mapping:
if m[0]==l:
out.append(m[1])
encrypted = ''.join(out)
return encrypted
For this we construct a code - space such that the letter a maps to 0, b to 1, c to 2, …, z to 25 then A to 26, B to 27, …, Z to 51, followed by 0 to 52, 1 to 53, and so on till 61.
The plaintext S is represented in the form of a 1×4 vectorized matrix using our coded values of the characters:
Smatrix = 38 0 19 7
The secret key K is also written as a matrix after converting the characters to our coded values and placing them in groups of 4 in a 4×4 matrix:
Kmatrix = 25 11 51 4 40 19 47 52 33 7 1 20 19 0 61 54
Multiplying the Smatrix by Kmatrix, we have ciphertext matrix Cmatrix computed using another Python program:
Cmatrix = (25 11 51 4 40 19 47 52 33 7 1 20 19 0 61 54)⋅(38 0 19 7) = (1947 2777 1413 2259) ≡ (25 49 49 27)(mod 62)
Here, using the modulo operator helps us determine the code value of encoded character by restricting the values to the code space (of 62 characters including 26 letters each of lowercase and uppercase and 10 digits).
So mapping each value of the Cmatrix to our code space, we have the coded message as follows:
C = zXXB
Code used for the calculations is shown below:
import string
import numpy as np
class Hill:
def __init__(self, string, key) -> None:
self.string = string self.key = key
s = self.string
key = self.key
sMat = []
kMat = []
out = []
code = []
x = 0alpha = [i for i in string.ascii_letters+string.digits]
for q in range(62):
code.append(q)
comb = {c:a for c, a in zip(code, alpha)}
for i in s:
sMat.append([get(comb, i)])
for j in range(len(s)):
kMat.append([])
for k in range(len(s)):
kMat[j].append(get(comb,key[x]))
x += 1
out = np.dot(kMat, sMat)
out = out.tolist()
for a in range(len(out)):
out[a][0] = out[a][0]%62
encrypted = []
for c in out:
encrypted.append(comb[c[0]])
encrypted = ''.join(encrypted)
return encrypted
In this algorithm, we assume two large prime number (though here we consider small ones for sake of simplicity). Let these be p = 31 and q = 37.
We compute a number N = p × q = 31 × 37 = 1147. Then we obtain a number T = (p - 1) × (q - 1) = 30 × 36 = 1080.
We now calculate two numbers e and d as encryption and decryption keys respectively such that e × d (mod ϕ (N)) = 1.
Here ϕ(N) is Euler’s Totient Function, giving us the number of integers less than N which are relatively prime to N. For a prime number p, we have p = (p - 1) since all numbers lesser than p are not factors of p.
For us, ϕ(N) = ϕ(p)∙ϕ(q) = (p - 1) (q - 1) = T = N - (p + q) +1. So ϕ(1147) = 1080. We now come up with our public key that is encryption key and private key or decryption key that is δ such that gcd (ϵ,ϕ(N)) = 1 and δ ≡ϵ-1 (mod N).
Given ϕ(N) = 1080, we check upon inspection that for ϵ = 7, we have gcd gcd (ϵ, ϕN =1). So we assume ϵ to be 1.
Now,
δ ≡ ϵ-1 (mod N)
δ × ϵ ≡ 1 (mod N)
δ × ϵ (mod N) = 1 (mod N)
δ × ϵ (mod N) = 1
δ × 7 (mod 1147) = 1
Again upon inspection we have for δ = 164, δ × ϵ ≡ 1 mod(N).
So our sets of public key Kpublic = {ϵ, N} = {7, 1147} and private key K = {δ, N} = {164, 1147}
Now, we encrypt our plaintext S = Math using the notation explained in section 4.3. So, S
becomes a series of four characters 38, 0, 19, 7
By the RSA Algorithm, our ciphertext C = Sϵ mod (N). Using a Python Program, we compute this value to be equal to a ciphertext C = 0741000008751144 where the message is encoded in terms of each of the letters being represented by numbers taken in groups of 4.
Code used for the calculations is shown below:
import string
class RSA:
def __init__(self, string) -> None:
self.string = string
def encrypt(self):
s = self.string
p, q, e, d, encoded, enc = 31, 37, None, None, [], []
n = p*q
for i in range(2, phi(n)):
if max(comfac(i, phi(n))) == 1:
e = i
break
for i in range(2,n):
if (e*i)%n == 1:
d = i
x = []
alpha = [x for x in string.ascii_letters+string.digits]
order = [y for y in range(len(alpha))]
for t in s:
for i,o in zip(alpha,order):
if t == i:
x.append(o)
for z in x:
crypt = (z**e)%n
encoded.append(crypt)
for a in encoded:
enc.append(f'{a:04d}')
return ''.join(enc)
While decrypting the Caesar Cipher, we assume that an intruder has no knowledge of the secret key or the shift value except the ciphertext. While he can safely assume that the codespace consists of 62 characters: 26 each for the uppercase and lowercase alphabets and 10 digits, we find that there will be several permutations for the secret key.
If the secret key is n characters long, we find that the permutations for m characters each fitting into one of the spaces of the secret key, are given by
nmP = \(\frac{m!}{(m-n)!}\) = n62P =\(\frac{62!}{(62-m)!}\)
Considering length of secret key to be 1 character, 2 characters, 3 characters, …, and so on, we have possible permutations:
162P = \(\frac{62!}{61!}\) = 62
262P = \(\frac{62!}{60!}\) =3782
362P = \(\frac{62!}{59!}\) = 226920
462P = \(\frac{62!}{58!}\) = 13388280
These values correspond to probabilities of choosing a correct sequence for a correctly chosen length as follows:
P(1) = \(\frac{1}{62}\) = 0.01612
P(2) = \(\frac{1}{3782}\) = 0.0002644
P(3) = \(\frac{1}{226920}\) = 0.000004406
P(4) = \(\frac{1}{13388280}\) = 0.00000007469
Plotting the graph of probability (taken logarithm to the base 10 for depiction of nature) with respect to length:
In our assumed case, we chosen a length of 16 characters.
So in theory, the total number of possible permutations is
\(\frac{62!}{46!}\) = 5719086709283091520696320000
This gives us a probability of correctly guessing it as P(16) = 1.7485 × 10-28 ≈ 0 Thus it is firstly nearly impossible to correctly guess the secret key. Then, the intruder must choose a shift value from 0 to 62 (if he correctly assumes in the first place that total code points in our code space is 62). Once the secret key has been deciphered, the hacker must again create 62 combinations from the 62 shift values before correctly reaching the correct shift alphabet resulting in a probability
P = \(\frac{1}{62}\) × 1.7485 × 10-28 = 0.02818 × 10-28 = 2.818 × 10-30
This value is extremely small and it is nearly impossible to try out the cryptanalysis using brute force method alone.
In this method, we used the concept that a message of n characters was to be encrypted by vectorizing it into a n × 1 matrix and using a key fitted into a n × n matrix. Using this idea, a hacker can again calculated the length of the secret key to be used that is n × n = n2.
In our case, the final message of 4 characters hinted a possible secret key of 16 characters. If the intruder is aware of the code space with 62 code points, he again needs to use 5719086709283091520696320000 permutations for correctly guessing the key as shown above. This already gives us a meagre probability of 1.7485×10-28. Now for arranging the characters in a 4 × 4 matrix, he needs to undergo 16! = 20922789888000 permutations. This gives us a probability of 0.00000000000004779. Now the final probability of computing the correct key matrix, is 1.7485 × 10-28 × 4.779×10-14 = 8.356 × 10-42.
This gives us a final secret key arranged in a matrix. Then the hacker needs to compute the inverse of the matrix and multiply that to the encrypted message to obtain the plaintext.
This algorithm makes public an encryption key consisting of the values {ϵ,N}. To calculate the decryption key from this information alone, we have to find a possible value such that δ ≡ ϵ-1 (mod N). This simplifies to ϵ∙δ mod N = 1 mod N and this becomes δ∙ϵ mod N = 1. For sufficiently large values of N and ϵ, this becomes highly complicated. We thus observe that N being a very large product (of 200 digits) of two large primes (each of 100 digits), gives rise to a comparatively difficult process for obtaining . This is because given δ × ϵ = κ∙N + 1, for some integer k,k can assume indefinite values but is worked out only for those values of which yield integral results.
Thus unlike the previous methods, this relies on a pseudo-trapdoor function which works on the principle that multiplication of two known, large primes is relatively, but the factorization of a large product of two primes into the constituent, distinct primes, is not feasible in reasonable time.
It must also be noted that in each of the above cases, we assume that the hacker has knowledge of the exact encoding scheme we employ. In all probability, the chances of successful decryption might be further limited by assuming separate, discrete and bespoke encoding schemes for each of the sender and receiver. Otherwise, the decrypted value in numerals might not be of any use again.