Let’s say you have two bit strings:
1 0 0 1 1 0 1 1
1 1 0 0 1 1 0 1
The number of bit positions at which the two strings differ, that is, the number of bit flips needed to turn one string into the other, is the Hamming distance. In our example the Hamming distance is 4.
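To make the definition concrete, here is a minimal sketch that counts the differing positions of the two strings above. The function name hamming_distance is my own choice for illustration.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


first = "10011011"
second = "11001101"
print(hamming_distance(first, second))  # 4
```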
Before we go any further, let’s talk about a simple encryption scheme: repeating-key XOR.
Repeating-key XOR encryption takes a key x and XORs your string y with it, cycling through the bytes of x for as long as the string lasts.
Let’s say y = “foobargoo” and x = 0xf2, 0x9c, 0x31.
The encrypted XOR would be:
f o o b a r g o o
^ 0xf2 0x9c 0x31 0xf2 0x9c 0x31 0xf2 0x9c 0x31
The result is 0x94 0xf3 0x5e 0x90 0xfd 0x43 0x95 0xf3 0x5e (r).
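The same encryption can be sketched in a few lines of Python. The helper name repeating_xor is an assumption of mine, not something from the challenge text; the plaintext and key are the ones used above.

```python
from itertools import cycle


def repeating_xor(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, cycling through the key bytes."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


key = bytes([0xF2, 0x9C, 0x31])
ciphertext = repeating_xor(b"foobargoo", key)
print(" ".join(f"0x{b:02x}" for b in ciphertext))

# XOR is its own inverse, so applying the same key again restores the text.
assert repeating_xor(ciphertext, key) == b"foobargoo"
```

Decryption is the very same call, which is why recovering the key is all an attacker needs.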
Now the question is: how do we decrypt r to get back y when we don’t know what x is?
This is easily solvable when we know the length of x (KEYSIZE). We split r into KEYSIZE buckets, where bucket i holds every byte that was XOR’d with the i-th byte of the key. We then solve each bucket independently by trying every possible key byte from 0 to 255 and picking the one that gives the most English-looking result.
This works because XOR is its own inverse: A = B ^ C means B = A ^ C.
0x94 0xf3 0x5e 0x90 0xfd 0x43 0x95 0xf3 0x5e
Bucket 1: 0x94 0x90 0x95
Bucket 2: 0xf3 0xfd 0xf3
Bucket 3: 0x5e 0x43 0x5e
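Here is a rough sketch of the bucket approach. The scoring rule (count ASCII letters and spaces) is a deliberately crude assumption of mine; anything that ranks English-looking bytes higher would do. I use a longer sample sentence for the brute force, since buckets of only three bytes leave many candidate keys tied.

```python
def split_into_buckets(ciphertext: bytes, keysize: int) -> list[bytes]:
    """Bucket i holds every byte that was XOR'd with key byte i."""
    return [ciphertext[i::keysize] for i in range(keysize)]


def english_score(candidate: bytes) -> int:
    """Hypothetical scoring rule: count ASCII letters and spaces."""
    return sum(
        b == 0x20 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A
        for b in candidate
    )


def solve_bucket(bucket: bytes) -> int:
    """Try all 256 key bytes and keep the one with the best score."""
    return max(
        range(256),
        key=lambda k: english_score(bytes(b ^ k for b in bucket)),
    )


# Every third byte ends up in the same bucket when KEYSIZE is 3.
print(split_into_buckets(bytes(range(9)), 3))

# A single-byte key is recovered from a reasonably long English sample.
sample = b"the quick brown fox jumps over the lazy dog"
secret = 0x5A
bucket = bytes(b ^ secret for b in sample)
print(hex(solve_bucket(bucket)))  # 0x5a
```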
All well and good. But how do we solve it when we don’t know the length of x?
The first step is to discover the length of x, because once we have it we can solve the problem with the approach above.
That’s where we use the Hamming distance.
To figure out the KEYSIZE, we iterate over the possible values (let’s say 2 to 10). For each one we split the result r into groups of KEYSIZE bytes and measure the Hamming distance between groups. The KEYSIZE with the smallest normalized distance is most likely the right one.
For our example the correct key size was 3. Let’s see what happens when we try KEYSIZE 2, 3 and 4, to see how the Hamming distance helps find the correct KEYSIZE.
KEYSIZE = 2:
Bucket 1: 0x94 0xf3
Bucket 2: 0x5e 0x90
Bucket 3: 0xfd 0x43
Bucket 4: 0x95 0xf3
Bucket 5: 0x5e
The hamming distance between Bucket 1 and Bucket 2 is: 8
Normalized distance: 8/2 = 4
KEYSIZE = 3:
Bucket 1: 0x94 0xf3 0x5e
Bucket 2: 0x90 0xfd 0x43
Bucket 3: 0x95 0xf3 0x5e
The hamming distance between Bucket 1 and Bucket 2 is: 8
Normalized distance: 8/3 = 2.67
KEYSIZE = 4:
Bucket 1: 0x94 0xf3 0x5e 0x90
Bucket 2: 0xfd 0x43 0x95 0xf3
The hamming distance between Bucket 1 and Bucket 2 is: 16
Normalized distance: 16/4 = 4
We normalize because a larger KEYSIZE compares more bytes and so naturally produces a larger distance. Dividing by KEYSIZE gives the Hamming distance per byte, which can be compared across key sizes.
We can also take more than two buckets and average the results (we’ll still have to normalize against the KEYSIZE). This will give similar results.
So this clearly shows how using the hamming distance lets us discover the KEYSIZE.
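The comparison above is easy to reproduce. This sketch rebuilds the ciphertext from the example plaintext and key, then scores each candidate KEYSIZE using only the first two groups, as in the walkthrough. The function names are my own.

```python
from itertools import cycle


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def normalized_distance(ciphertext: bytes, keysize: int) -> float:
    """Distance between the first two KEYSIZE-sized groups, per byte."""
    first = ciphertext[:keysize]
    second = ciphertext[keysize:2 * keysize]
    return hamming_distance(first, second) / keysize


key = bytes([0xF2, 0x9C, 0x31])
ciphertext = bytes(p ^ k for p, k in zip(b"foobargoo", cycle(key)))

for keysize in (2, 3, 4):
    print(keysize, round(normalized_distance(ciphertext, keysize), 2))

# The candidate with the smallest normalized distance wins.
best = min((2, 3, 4), key=lambda size: normalized_distance(ciphertext, size))
print("best guess:", best)
```

On a nine-byte message this is a toy, of course. With real ciphertext you would average over several pairs of groups before trusting the ranking.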
The way this works is that English text in ASCII falls within a narrow range of values, and hence has a lower average Hamming distance between two letters than between two random bytes from 0 to 255.
Let’s calculate the average Hamming distance over all letters a-z and A-Z:
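package main

import (
	"fmt"
	"math/bits"
)

// HammingDistance is not shown in the original post; this is an assumed
// implementation that counts the differing bits of two bytes.
func HammingDistance(a, b byte) int {
	return bits.OnesCount8(a ^ b)
}

func main() {
	hdist := 0
	count := 0
	// Walk 65..90 (A-Z), then jump to 97. Note that the upper bound of 127
	// also sweeps in the five codes after 'z' (123..127).
	for i := 65; i <= 127; i++ {
		for j := 65; j <= 127; j++ {
			hdist = hdist + HammingDistance(byte(i), byte(j))
			count += 1
			if j == 90 {
				j = 96 // the loop's j++ moves this on to 97 ('a')
			}
		}
		if i == 90 {
			i = 96
		}
	}
	fmt.Println(float64(hdist) / float64(count))
}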
Gives the output: 2.99
Whereas running an average of hamming distance over all 256 values:
func main() {
	hdist := 0
	count := 0
	// Stop at 255: byte(256) wraps around to 0 and would count it twice.
	for i := 0; i < 256; i++ {
		for j := 0; j < 256; j++ {
			hdist = hdist + HammingDistance(byte(i), byte(j))
			count += 1
		}
	}
	fmt.Println(float64(hdist) / float64(count))
}
Gives the output: 4
So let’s say we have a string MJHNPY XOR’d with the three-byte key K K1 K2. This means that our result is:
(M ^ K) (J ^ K1) (H ^ K2) (N ^ K) (P ^ K1) (Y ^ K2)
If we guess the KEYSIZE to be 2, we’d be doing
(M ^ K) ^ (H ^ K2)
That is, M and H would be compared with each other. This would give an estimated value of about 4, the random-byte average, because K and K2 are random values that do not cancel.
Whereas if we picked KEYSIZE to be 3, we’d be doing
(M ^ K) ^ (N ^ K)
(M ^ N)
Here M and N are both XOR’d with K. The K gets eliminated because X ^ Y ^ Y = X.
This would give an estimated value of 2.99 because here we only have M and N, and both fall under English ASCII.
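The cancellation is easy to check exhaustively. A tiny sketch, with popcount as my own helper name:

```python
def popcount(value: int) -> int:
    """Number of set bits in value."""
    return bin(value).count("1")


m, n = ord("M"), ord("N")

# Whatever the shared key byte is, it drops out of the comparison.
assert all((m ^ k) ^ (n ^ k) == m ^ n for k in range(256))

# M = 0x4d and N = 0x4e differ in their two lowest bits.
print(popcount(m ^ n))  # 2
```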
Where did I learn this?
https://cryptopals.com/sets/1/challenges/6
Challenge 23 is one of those challenges. It seems easy to implement at first, but when you get to actual details, it’s not so easy. Especially for those of us who aren’t used to bitwise operations on a regular basis.
The challenge is to write an “untemper” function that takes an MT19937 output and transforms it back into the corresponding element of the MT19937 state array.
This is the “temper” code which we need to reverse:
y := m.data[m.index]
y1 := y ^ ((y >> _U) & _D)
y2 := y1 ^ ((y1 << _S) & _B)
y3 := y2 ^ ((y2 << _T) & _C)
o := y3 ^ (y3 >> _L)
So given o we should be able to obtain y by working backwards through the intermediate states (y3, y2, y1).
Let’s take this one step at a time and break it down into obtaining each of those intermediate states.
o is obtained from y3 and _L (18) as follows.
o := y3 ^ (y3 >> _L)
Easier to follow visually as shown below
We can see that the first _L bits of y3 can be obtained by just taking the first _L bits of o. The remaining 32 - _L bits of y3 can be obtained by XORing o >> _L with o. This is because A ^ B = C implies A = B ^ C.
So we’ve got a way to obtain y3 from o and _L.
y3 := o ^ (o >> _L)
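A quick way to convince yourself is to run the step forwards and backwards. This is a Python sketch rather than the Go used in the post; the function names are mine, and MASK32 keeps values inside 32 bits since Python integers are unbounded.

```python
L = 18
MASK32 = 0xFFFFFFFF


def temper_l(y3: int) -> int:
    """Forward step: o = y3 ^ (y3 >> L)."""
    return (y3 ^ (y3 >> L)) & MASK32


def untemper_l(o: int) -> int:
    """Reverse step: the same expression, because 2 * L > 32."""
    return (o ^ (o >> L)) & MASK32


for value in (0, 1, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF):
    assert untemper_l(temper_l(value)) == value
```

It works in one pass because shifting by 18 twice pushes every bit out of a 32-bit word, so nothing is left to correct.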
Now that we have y3 we can move forward and obtain y2 from _T (15) and _C (0xEFC60000). Given the equation:
y3 := y2 ^ ((y2 << _T) & _C)
Which looks like
The last _T bits of y3 are the same as those of y2. This is because A & 0 = 0 and A ^ 0 = A, which is visible from the diagram.
But this isn’t as simple as the previous step, is it? Yes and no.
No, because we only have 15 bits of untangled data (the last 15). Left shifting them by 15 only gives us the next 15 bits, for a total of 30 bits out of 32.
Yes, because of the shape of _C. The top two bits of y3 are masked by the first two bits of _C (11) and depend on bits 15 and 16 of y2 (counting from 0 at the least significant bit). Bits 15 and 16 of _C are 0, so those two bits of y2 pass into y3 unchanged, which means y3 already holds everything needed to recover the top two bits.
So a single step is enough:
y2 := y3 ^ ((y3 << _T) & _C)
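The claim that one pass is enough can be checked directly. In this Python sketch (names are mine), the assertion on _C confirms that the two bits feeding the top of the word are never modified by the forward step.

```python
T = 15
C = 0xEFC60000
MASK32 = 0xFFFFFFFF


def temper_t(y2: int) -> int:
    """Forward step: y3 = y2 ^ ((y2 << T) & C)."""
    return (y2 ^ ((y2 << T) & C)) & MASK32


def untemper_t(y3: int) -> int:
    """Reverse step: a single application of the same expression."""
    return (y3 ^ ((y3 << T) & C)) & MASK32


# Bits 15 and 16 of C are zero, so bits 15 and 16 of y2 survive into y3.
assert (C >> 15) & 0b11 == 0

for value in (0, 1, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF):
    assert untemper_t(temper_t(value)) == value
```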
We’ve obtained y3 and y2; now let’s retrieve y1 from y2, given _S (7) and _B (0x9D2C5680).
y2 := y1 ^ ((y1 << _S) & _B)
Again, visualizing
This is similar to the previous block, but there’s one big problem: we only have 7 bits of actual data. Going by the previous block, this means that with these 7 bits we can generate only the next 7 bits of data.
Since we’re only generating 7 bits at a time, we have to mask _B so that we only use the parts of _B which actually affect the calculation of the next 7 bits.
Let me explain visually:
Therefore doing:
mask := 0x7f
b := _B & uint(mask << 7)
tmp1 := y2 ^ ((y2 << _S) & b)
would give us the following result, wherein the last 14 bits are of the original y1.
Now if you see the pattern, we can use this as the intermediate value and again do:
b := _B & uint(mask << 14)
tmp2 := tmp1 ^ ((tmp1 << _S) & b)
Which would now give us the next 7 bits as shown below:
Doing this two more times, we arrive at the final value:
Therefore the final result is just a loop with 4 iterations, where each iteration masks _B and repeats the previous step. So the code for obtaining y1 from y2 is as follows.
y1 := y2
mask := 0x7f
for i := 0; i < 4; i++ {
	b := _B & uint32(mask << uint32(7 * (uint32(i) + 1)))
	// Plain assignment: "y1 :=" here would declare a new y1 on every pass
	y1 = y1 ^ ((y1 << _S) & b)
}
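Here is the same loop as a Python sketch with a round-trip check. The names temper_s, untemper_s and window are mine; each pass repairs the next 7 bits using the 7 bits below them, which are already correct.

```python
S = 7
B = 0x9D2C5680
MASK32 = 0xFFFFFFFF


def temper_s(y1: int) -> int:
    """Forward step: y2 = y1 ^ ((y1 << S) & B)."""
    return (y1 ^ ((y1 << S) & B)) & MASK32


def untemper_s(y2: int) -> int:
    """Reverse step: recover 7 more bits of y1 on every pass."""
    y1 = y2
    for i in range(4):
        # Only the 7-bit window being repaired in this pass may change.
        window = (0x7F << (S * (i + 1))) & MASK32
        y1 ^= (y1 << S) & B & window
    return y1 & MASK32


for value in (0, 1, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF):
    assert untemper_s(temper_s(value)) == value
```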
y is the final value we need to obtain. With y1, y2 and y3 in hand, we use _U (11) and _D (0xFFFFFFFF). Let’s look at the equation that derives y1 from y, which is the one we need to reverse.
y1 := y ^ ((y >> _U) & _D)
Seems similar to the previous step, but anyone with a quick eye can see that _D is all 1s so potentially we can ignore it and shorten the equation to
y1 := y ^ (y >> _U)
Now lets visualize this
This is pretty similar to how we solved o <-> y3, but the only problem is that _U < 32 - _U. So we can use the same idea as in the previous block, i.e. loop until we get the answer. Since _U * 3 > 32, we need 3 loops.
Therefore, the solution is:
y := y1
for i := 0; i < 3; i++ {
	y = y ^ (y >> _U)
}
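To see why the loop count matters, this Python sketch takes the number of rounds as a parameter (my own addition for illustration) and shows one input where two rounds fall short and three succeed.

```python
U = 11
MASK32 = 0xFFFFFFFF


def temper_u(y: int) -> int:
    """Forward step: y1 = y ^ (y >> U). _D is all ones, so it is dropped."""
    return (y ^ (y >> U)) & MASK32


def untemper_u(y1: int, rounds: int = 3) -> int:
    """Reverse step: apply the same XOR-shift `rounds` times."""
    y = y1
    for _ in range(rounds):
        y ^= y >> U
    return y


value = 0x80000000
tempered = temper_u(value)
print(hex(untemper_u(tempered, rounds=2)))  # 0x80100200, not yet correct
print(hex(untemper_u(tempered, rounds=3)))  # 0x80000000
```

Three rounds is the right number for this particular shift of 11; I would not assume that repeating the step n times inverts an arbitrary shift.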
Putting all of this together we can derive y from o as follows:
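// Undo o = y3 ^ (y3 >> _L)
y := o ^ (o >> _L)
// Undo y3 = y2 ^ ((y2 << _T) & _C)
y = y ^ ((y << _T) & _C)
// Undo y2 = y1 ^ ((y1 << _S) & _B), 7 bits at a time
mask := 0x7f
for i := 0; i < 4; i++ {
	b := _B & uint32(mask<<uint32(7*(uint32(i)+1)))
	y = y ^ ((y << _S) & b)
}
// Undo y1 = y ^ (y >> _U)
for i := 0; i < 3; i++ {
	y = y ^ (y >> _U)
}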
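As a sanity check, the whole thing can be run end to end. This Python sketch ports the temper code and the combined untemper code and asserts that they cancel out. The constants are the ones quoted in the post; the function names are mine.

```python
U, D = 11, 0xFFFFFFFF
S, B = 7, 0x9D2C5680
T, C = 15, 0xEFC60000
L = 18
MASK32 = 0xFFFFFFFF


def temper(y: int) -> int:
    """Forward tempering, mirroring the Go snippet being reversed."""
    y ^= (y >> U) & D
    y ^= (y << S) & B
    y ^= (y << T) & C
    y ^= y >> L
    return y & MASK32


def untemper(o: int) -> int:
    """Undo each tempering step in reverse order."""
    y = o ^ (o >> L)
    y ^= (y << T) & C
    for i in range(4):
        y ^= (y << S) & B & ((0x7F << (S * (i + 1))) & MASK32)
    for _ in range(3):
        y ^= y >> U
    return y & MASK32


for value in (0, 1, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF):
    assert untemper(temper(value)) == value
```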
Code: https://github.com/KarthikNayak/DeepRock
I joined Recurse Center with a goal of learning more about deep learning. One of the projects I wanted to work on was:
Given a sequence of musical notes (X), can you train a Deep Neural Network to learn improvisation over those notes and come up with its own music (Y).
This post is a step in that direction. First we shall take it one level easier and try to train a model which can learn the intricacies of a lead guitar and is able to generate its own composition from the training received.
We shall use this dataset of midi files as our source for this training.
MIDI is short for Musical Instrument Digital Interface. Let’s think of it as a way to store instrument data, wherein each song contains one or more instruments (AKA tracks). Each track contains one or more musical notes. Each note is an aggregation of:
MIDI also contains a lot more data and also metadata related to the said data. But we don’t really care about all that for now.
An important step of all Deep learning projects is ensuring that we parse the data and have good clean data for the model to learn on.
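So our plan here is to:
- Parse the MIDI files and keep only the tracks which have guitars in them.
- Split the notes into sequences of length sequence_length + 1.
- Keep only the sequences which have more than unique_factor number of unique notes in them.
- X will be data[0:sequence_length] and Y will be data[sequence_length].
- Onehot encode Y. Normalize and Standardize X.
Code for the same: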
import glob
import pickle

import numpy as np
from music21 import chord, converter, instrument, note
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# sequence_length, unique_factor and data_percent are module-level settings
# defined elsewhere in the project.


# Parse the MIDIs to get only tracks with Guitars in them
def populate_guitar_track():
    guitar_parts = []
    for file in glob.glob("midi/**/*.mid", recursive=True):
        try:
            score = converter.parse(file)
            guitar = instrument.ElectricGuitar
            for part in instrument.partitionByInstrument(score):
                if isinstance(part.getInstrument(), guitar):
                    print(f"Has Guitar: {file}")
                    guitar_parts.append(file)
        except Exception:
            # Skip files that cannot be parsed
            continue


# Generator to go through Guitar MIDIs and yield the tracks
def get_tracks():
    with open('object/data/guitar_midi_files', 'rb') as f:
        guitar_parts = pickle.load(f)
    for file in guitar_parts:
        print(f"In file: {file}")
        song = converter.parse(file)
        for part in instrument.partitionByInstrument(song):
            if isinstance(part.getInstrument(), instrument.ElectricGuitar):
                yield part


# Get all notes from a given track
def get_notes(seq_len=1, reset=False):
    data = []
    print(f"Parsing notes with reset set to {reset}")
    if not reset:
        with open('object/data/notes', 'rb') as f:
            print("Returning notes from pickle")
            return pickle.load(f)
    for track in get_tracks():
        tmp = []
        notes = track.recurse()
        for n in notes:
            if isinstance(n, note.Note):
                tmp.append(str(n.pitch))
            elif isinstance(n, chord.Chord):
                tmp.append(' '.join(str(x.pitch) for x in n))
        # Trim each track to a whole number of sequences
        tmp = tmp[:int(len(tmp)/seq_len)*seq_len]
        data.extend(tmp)
    print("Done parsing notes")
    return data


# Check if the number of unique notes conforms to our minimum requirement
def check_data(data):
    if len(np.unique(data)) > unique_factor:
        return True
    return False


# Parse notes and create the training data
def create_training_data(reset=False):
    X = []
    Y = []
    data = get_notes(sequence_length, reset)
    idx = int(len(data) * data_percent/100)
    enc = OrdinalEncoder()
    enc.fit(np.array(data).reshape(-1, 1))
    print(f"Creating data from notes of size: {len(data)}")
    for i in range(0, idx - sequence_length):
        if check_data(data[i:i+sequence_length]):
            X.append(enc.transform(np.reshape(
                data[i:i+sequence_length], (-1, 1))))
            Y.append(enc.transform(np.reshape(
                data[i+sequence_length], (-1, 1))))
    X = np.array(X)
    Y = np.array(Y)
    mean = X.mean()
    std = X.std()
    X = (X - mean) / std
    onehot = OneHotEncoder(sparse=False)
    Y = onehot.fit_transform(Y.reshape(-1, 1))
This is the model we’ll be using. Some notes here:
- We use the Adam optimizer.
- The loss is Categorical Cross Entropy. This is because our Y values are Onehot Encoded.
- We train with learning rates of 0.01, 0.001 and 0.0001.
- We run 200 epochs for each given learning rate.
Code for the same:
import pickle

import keras
from keras.layers import CuDNNLSTM, Dense, Dropout
from keras.models import Sequential


def create_network(input_shape, op_len, lr):
    model = Sequential()
    model.add(CuDNNLSTM(256, input_shape=input_shape, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(CuDNNLSTM(256))
    model.add(Dense(128))
    model.add(Dropout(0.3))
    # One output per distinct note, turned into probabilities by softmax
    model.add(Dense(op_len, activation="softmax"))
    opt = keras.optimizers.Adam(lr=lr, beta_1=0.9, beta_2=0.999, decay=0.01)
    model.compile(loss='categorical_crossentropy',
                  optimizer=opt, metrics=['accuracy'])
    return model


def train():
    # get_xy() is a project helper (not shown here) that loads X and Y
    x, y = get_xy()
    onehot = None
    with open('object/data/onehot', 'rb') as file:
        onehot = pickle.load(file)
    # Train a fresh model for each learning rate
    for lr in [0.01, 0.001, 0.0001]:
        print(f"Training with lr: {lr}")
        model = create_network(x.shape[1:], onehot.categories_[0].shape[0], lr)
        history = model.fit(x, y, epochs=200, batch_size=64)
The loss against the various learning rates is shown below
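Once the model is trained, generating music is simple.
- Take a sequence of notes of length sequence_length as seed.
- Use the model to predict the next note Y.
- Build the next input as prevNotes[1:sequence_length] + Y and repeat.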
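The sliding window in those steps can be sketched without any model at all. In this illustration predict_next is a hypothetical stand-in for whatever wraps the trained network (encoding the window, calling the model, decoding the most likely note); it is not a function from the project.

```python
from typing import Callable, Sequence


def generate(predict_next: Callable[[Sequence[str]], str],
             seed: Sequence[str], length: int) -> list[str]:
    """Slide a fixed-size window over the notes, adding one prediction at a time."""
    window = list(seed)
    generated = list(seed)
    for _ in range(length):
        next_note = predict_next(window)
        generated.append(next_note)
        # Drop the oldest note and append the new one: prevNotes[1:] + Y
        window = window[1:] + [next_note]
    return generated


# Stand-in predictor (hypothetical): it just repeats the oldest note in the window.
def echo_oldest(window: Sequence[str]) -> str:
    return window[0]


print(generate(echo_oldest, ["E4", "G4", "A4"], 4))
```

The output starts with the seed itself, followed by the generated notes.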
Enough talk, where’s the music at?
Note: Music generated above contains the seed given to the generator too.
When I started off with this project, my expectation was to finish within 1 week. I underestimated by a mile. It took exactly two weeks to wrap up, mostly because I didn’t expect the data I had gotten to need so much clean up.
I’ve come to realize that with any deep learning project, a good amount of time is spent in the data collection and aggregation stage. This is due to one of the following reasons:
unique_factor variable.
I always overlooked this whenever I went through tutorials for deep learning, but I cannot stress enough how important it is. Hyperparameter tuning is really, really important.
Having the right hyperparameters can be the deciding factor between a successful model and a model that doesn’t work at all.
I was stuck for a day with a model that predicted the same output irrespective of the data being fed to it. I couldn’t figure out why this was happening. As a last resort I tried changing the learning rate, and this simple change got me a working model. Surprising.
I was writing a blog post about the different activation functions and how they can impact your training. I wanted to make an interactive post so was looking at running a simple model on the browser. That’s when I found Tensorflow JS.
Kudos to the people behind this. This was a great way to bring the awesomeness of Tensorflow to the browser. When using the library you can either:
Let’s try and do both. We’ll work with the make_moons dataset from sklearn.
Parameters we’ll be using:
The data we’ll be training on:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import optimizers
from datetime import datetime
start_time = datetime.now()
model = Sequential()
model.add(Dense(4, activation='tanh', input_dim=X.shape[0]))
model.add(Dense(1, activation='sigmoid'))
sgd = optimizers.SGD(lr=1)
model.compile(optimizer=sgd,
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Full-batch training: the batch size equals the number of samples
model.fit(X.T, Y.T, epochs=1000, batch_size=X.shape[1], verbose=0)
end_time = datetime.now()
print(end_time - start_time)
0:00:02.631679
That took 2.63s
Save the model to a form which can be used by TensorflowJS
import tensorflowjs as tfjs
tfjs.converters.save_keras_model(model, "/tmp/")
Load the model in TensorflowJS via Javascript
import * as tf from '@tensorflow/tfjs';
const model = await tf.loadModel("/tensorflowjs/model.json");
Finally, let’s see how our model predicts
Code used to train the model:
const model = tf.sequential()
model.add(tf.layers.dense({units:4, inputShape:[2], activation:'tanh'}))
model.add(tf.layers.dense({units:1, activation:"sigmoid"}))
var sgd = tf.train.sgd(1)
model.compile({optimizer:sgd, loss:'binaryCrossentropy', metrics:['accuracy']})
train_data().then(function(){
    console.log('Training is Complete');
    // Do predictions and so on...
})
async function train_data(){
    const res = await model.fit(X, Y, {epochs:1000, batchSize:200, verbose: 1})
}
Why do we use an async function? That’s because model.fit() returns a JavaScript promise. Hence we await it inside an async function and continue once training has finished.
If you wait long enough, you’ll see the model finish training and predict much like our earlier method. The only difference is that this time it took a lot longer. On my PC it takes around 22s, which is roughly 8 times as long as the previous method.
It’s amazing that we can even train models in the browser. But at the same time, we can see how years of optimization on the Python side of Tensorflow clearly give it an edge over its JS counterpart.
My boot sector consisted of two files: boot_main.asm and boot_print.asm.
; Boot sector offset
[org 0x7c00]
mov bx, HELLO
call print
call print_nl
; Infinite loop, this hangs the OS
jmp $
; Include the print code
%include "boot_print.asm"
HELLO:
db 'HI', 0
; Junk padding
times 510 - ($ - $$) db 0
; Bytes 511 and 512 hold data to indicate if bootable
dw 0xaa55
This is the main file, which sets up the boot sector and prints the string stored at HELLO.
Now we can assemble the code and run it using qemu as shown:
nasm -f bin boot_main.asm -o boot.bin; and qemu-system-x86_64 boot.bin
Which gives the much expected result:
NASM Success Hi
But what if we mess around with the code a little bit? Coming from C, I’m used to having my includes at the beginning, so I tried moving the include to the top:
 [org 0x7c00]
+%include "boot_print.asm"
+
 mov bx, HELLO
 call print
@@ -7,8 +9,6 @@ call print_nl
 jmp $
-%include "boot_print.asm"
-
 HELLO:
 db 'HI', 0
But running this, we are greeted with gibberish:
NASM Failure Hi
Interesting. But why is this happening? Let’s look at the hexdumps to see if we can find any difference.
Hexdump of modified main.asm
0000000 8a60 3c07 7400 b409 cd0e 8310 01c3 f1eb
0000010 c361 0eb4 0ab0 10cd 0db0 10cd f2eb 29bb
0000020 e87c ffdc ebe8 ebff 48fe 0049 0000 0000
0000030 0000 0000 0000 0000 0000 0000 0000 0000
*
00001f0 0000 0000 0000 0000 0000 0000 0000 aa55
0000200
Hexdump of original main.asm
0000000 29bb e87c 0005 14e8 eb00 60fe 078a 003c
0000010 0974 0eb4 10cd c383 eb01 61f1 b4c3 b00e
0000020 cd0a b010 cd0d eb10 48f2 0049 0000 0000
0000030 0000 0000 0000 0000 0000 0000 0000 0000
*
00001f0 0000 0000 0000 0000 0000 0000 0000 aa55
0000200
Why? The answer is simple. %include pastes the contents of boot_print.asm in place, and the BIOS starts executing at the first byte of the boot sector. With the include at the top, the print routine becomes the first code to run, before bx has been set to HELLO, so it prints gibberish from whatever address bx happens to hold.
Labels in NASM only name addresses. Code runs in the order it is laid out, whether or not it is ever called.
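The two hexdumps show this directly once you account for hexdump printing 16-bit little-endian words by default. A small Python sketch (the helper name is mine) that turns the first few words of each dump back into bytes:

```python
def words_to_bytes(dump_line: str) -> bytes:
    """Undo hexdump's default grouping into 16-bit little-endian words."""
    out = bytearray()
    for word in dump_line.split():
        value = int(word, 16)
        out += bytes([value & 0xFF, value >> 8])
    return bytes(out)


original = words_to_bytes("29bb e87c 0005 14e8")
modified = words_to_bytes("8a60 3c07 7400 b409")

print(original[:3].hex(" "))  # bb 29 7c
print(modified[:3].hex(" "))  # 60 8a 07
```

In the original image the first instruction is bb 29 7c, which is mov bx, 0x7c29, the address of HELLO. In the modified image the sector starts with 60, which is pusha, presumably the first instruction of the print routine, so the routine runs before bx is ever loaded.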