Commit 9ad8e532 authored by Alejandro Riera

Neural Network Model with 1 Hidden Layer using autograd

parent 91e2b02c
# Notes on how I proceeded:
* writing unit tests to check that my logic yields the same results as Coursera's implementation
* when implementing 1 FC layer I had a big headache
* For fast iteration I fixed the following hyperparameters: hidden_units=10, epochs=2000, learning_rate=0.01
* Best accuracies in test would range from 66% to 72%
* my first mistake was initializing the weights and biases to zeros
* this showed up as the loss plateauing very early, and hence very poor
accuracy in both the training and the test set
* this works for logistic regression but
* doesn't work for a NN because of what is called the symmetry problem. Essentially,
all your hidden units end up calculating the same function (they stay symmetric); a quick demonstration is sketched below
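A minimal sketch (my own, with made-up shapes rather than the real dataset) of the symmetry problem with zero initialization: every hidden unit receives exactly the same gradient, so after each update they all still compute the same function.
```
import torch

X = torch.rand(4, 8, dtype=torch.float64)                   # 4 features, 8 examples
Y = (torch.rand(1, 8, dtype=torch.float64) > 0.5).double()  # binary labels

w1 = torch.zeros((4, 3), requires_grad=True, dtype=torch.float64)
b1 = torch.zeros((3, 1), requires_grad=True, dtype=torch.float64)
w2 = torch.zeros((3, 1), requires_grad=True, dtype=torch.float64)
b2 = torch.zeros((1, 1), requires_grad=True, dtype=torch.float64)

A1 = w1.transpose(0, 1).mm(X).add(b1).clamp(min=0)          # ReLU hidden layer
A2 = torch.sigmoid(w2.transpose(0, 1).mm(A1).add(b2))
loss = -(Y.mul(A2.log()) + (1 - Y).mul((1 - A2).log())).mean()
loss.backward()

# Every column of w1.grad (one column per hidden unit) is identical, here all zeros,
# so the hidden units can never differentiate and the loss plateaus almost immediately.
print(w1.grad)
```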
* my attempt to fix it was to initialize the weights and biases to random values with `torch.rand`
* this didn't work either
* z1, A1, and z2 had values >> 1000
* A2 = sigmoid(z2) was all ones
* I was using the Log Loss (Cross Entropy Loss) function and it was yielding NaN (a minimal reproduction is sketched below)
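A quick reproduction (mine, not from the original notes) of why the loss becomes NaN once the pre-activations blow up: in float64, `sigmoid(1000)` saturates to exactly 1, so the `log(1 - A2)` term is `-inf` and `0 * (-inf)` evaluates to NaN.
```
import torch

z2 = torch.tensor([1000.0], dtype=torch.float64)  # huge pre-activation, as observed above
A2 = torch.sigmoid(z2)                            # saturates to exactly 1.0
Y = torch.tensor([1.0], dtype=torch.float64)

loss = Y.mul(A2.log()) + (1 - Y).mul((1 - A2).log())
print(A2, loss)  # tensor([1.]) tensor([nan]) because 0 * log(0) = 0 * (-inf) = nan
```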
* hypothesis: are the weights too big?
* initialize randomly but divide by a power of 10, e.g., `torch.rand() * 0.01`
* I tried different values, 0.1, 0.01, 0.001 and 0.0001
* Factors smaller than 0.1 would return a sensible value for the loss
* But, for some reason, the gradients (`self.w1.grad`) returned None
* I can't come up with an explanation for this, so I'll move forward and try something else
(a sketch of one likely cause is below)
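One likely cause, sketched here as a guess (the exact init line is not shown in these notes): if the scaling was written as `torch.rand(..., requires_grad=True) * 0.01`, the stored tensor is the result of the multiplication, i.e. a non-leaf tensor, and autograd only populates `.grad` on leaf tensors.
```
import torch

x = torch.rand(3, 2, dtype=torch.float64)

# Non-leaf: the `* 0.01` creates a new tensor downstream of the leaf,
# so after backward() its .grad stays None (PyTorch also emits a UserWarning).
w_bad = torch.rand((2, 1), requires_grad=True, dtype=torch.float64) * 0.01
x.mm(w_bad).sum().backward()
print(w_bad.is_leaf, w_bad.grad)            # False None

# Leaf: scale first, then mark the tensor as requiring gradients.
w_ok = (torch.rand((2, 1), dtype=torch.float64) * 0.01).requires_grad_()
x.mm(w_ok).sum().backward()
print(w_ok.is_leaf, w_ok.grad is not None)  # True True
```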
* instead of using my own initialization, I would use one of the available ones:
```
self.w1 = torch.zeros((X.shape[0], self.hidden_units), requires_grad=True, dtype=torch.float64)
torch.nn.init.normal_(self.w1, mean=0, std=0.01)
```
* Default values (mean=0, std=1) had the NaN problem
* Playing with values I found that mean=0, std=0.01 fixed it
* mean=0, std=0.1 still gave NaN
* mean=0, std=0.01 yielded the best results
* other initializations: Xavier Normal and Xavier Uniform
```
torch.nn.init.xavier_normal_(self.w1)
torch.nn.init.xavier_normal_(self.w1, gain=torch.nn.init.calculate_gain('relu'))
torch.nn.init.xavier_uniform_(self.w1)
torch.nn.init.xavier_uniform_(self.w1, gain=torch.nn.init.calculate_gain('relu'))
```
* both Normal and Uniform showed the same behaviour
* worked out of the box with default params
* if gain is not calculated: 66% test accuracy
* if gain is calculated: 72% test accuracy (the gain values in question are shown below)
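For reference, a quick check (mine) of what those gains actually are in `torch.nn.init`:
```
import torch

print(torch.nn.init.calculate_gain('relu'))     # sqrt(2) ~= 1.4142
print(torch.nn.init.calculate_gain('sigmoid'))  # 1.0
```
So in this setup "with gain" only really changes the ReLU-side tensors (w1, b1), widening their initialization by about a factor of 1.41; the sigmoid gain is the same as the default of 1.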
* Best Scores achieved:
  * Heavyweight
    * train accuracy: 100.0
    * test accuracy: 76.0
    * xavier_normal_ with gain
    * hidden_units = 1000
    * epochs=10000
    * learning_rate=0.01
    * ~ 40 minutes training
  * Lightweight
    This config is very fast to train (~30 seconds) but fluctuates in performance depending (I guess)
    on the initial randomization. Test accuracy goes from 70% up to 80%, most often 72% or 76%.
    * xavier_normal_ with gain
    * hidden_units = 10
    * epochs=1000
    * learning_rate=0.01
    * ~ 30 seconds training
# Ideas for next steps
* Implement my own ReLU and Sigmoid functions with Autograd (see the sketch after this list)
https://github.com/jcjohnson/pytorch-examples#pytorch-defining-new-autograd-functions
* Generalising nn1hl to accept L hidden layers
* Extracting the logic of the optimizer away from the model, taking inspiration from `torch.optim.Optimizer`
https://pytorch.org/docs/stable/optim.html
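As a starting point for the first idea, a minimal sketch (mine, following the pattern in the linked pytorch-examples repo) of ReLU written as a custom `torch.autograd.Function`:
```
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, z):
        # Remember the input so backward() knows where the ReLU was active.
        ctx.save_for_backward(z)
        return z.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (z,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[z < 0] = 0  # gradient is zero where the input was negative
        return grad_input

# Usage inside train()/predict(): A1 = MyReLU.apply(z1) instead of z1.clamp(min=0)
```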
"""
My attempt at reproducing Coursera's logistic regresion example with autograd
"""
import numpy as np
import torch


class NN1HiddenLayerModel():
    def __init__(self):
        self.w1 = None
        self.b1 = None
        self.w2 = None
        self.b2 = None
        self.hidden_units = 10

    def train(self, X, Y, epochs=1000, learning_rate=0.5):
        self.w1 = torch.zeros((X.shape[0], self.hidden_units), requires_grad=True, dtype=torch.float64)
        self.b1 = torch.zeros((self.hidden_units, 1), requires_grad=True, dtype=torch.double)
        self.w2 = torch.zeros((self.hidden_units, 1), requires_grad=True, dtype=torch.float64)
        self.b2 = torch.zeros((1, 1), requires_grad=True, dtype=torch.double)
        # torch.nn.init.normal_(self.w1, mean=0, std=0.01)
        # torch.nn.init.normal_(self.b1, mean=0, std=0.01)
        # torch.nn.init.normal_(self.w2, mean=0, std=0.01)
        # torch.nn.init.normal_(self.b2, mean=0, std=0.01)
        torch.nn.init.xavier_normal_(self.w1, gain=torch.nn.init.calculate_gain('relu'))
        torch.nn.init.xavier_normal_(self.b1, gain=torch.nn.init.calculate_gain('relu'))
        torch.nn.init.xavier_normal_(self.w2, gain=torch.nn.init.calculate_gain('sigmoid'))
        torch.nn.init.xavier_normal_(self.b2, gain=torch.nn.init.calculate_gain('sigmoid'))
        # torch.nn.init.xavier_uniform_(self.w1, gain=torch.nn.init.calculate_gain('relu'))
        # torch.nn.init.xavier_uniform_(self.b1, gain=torch.nn.init.calculate_gain('relu'))
        # torch.nn.init.xavier_uniform_(self.w2, gain=torch.nn.init.calculate_gain('sigmoid'))
        # torch.nn.init.xavier_uniform_(self.b2, gain=torch.nn.init.calculate_gain('sigmoid'))
        m = X.shape[1]
        for i in range(epochs):
            # Forward pass
            z1 = self.w1.transpose(0, 1).mm(X).add(self.b1)
            A1 = z1.clamp(min=0)  # ReLU
            z2 = self.w2.transpose(0, 1).mm(A1).add(self.b2)
            A2 = torch.sigmoid(z2)
            # Cross-entropy (log) loss averaged over the m examples
            loss = Y.mul(A2.log()) + (1 - Y).mul((1 - A2).log())
            loss = loss.sum() / (-m)
            loss.backward()
            # import pdb; pdb.set_trace()
            with torch.no_grad():
                # Gradient descent step
                self.w1 -= learning_rate * self.w1.grad
                self.b1 -= learning_rate * self.b1.grad
                self.w2 -= learning_rate * self.w2.grad
                self.b2 -= learning_rate * self.b2.grad
                # Manually zero the gradients after running the backward pass
                self.w1.grad.zero_()
                self.b1.grad.zero_()
                self.w2.grad.zero_()
                self.b2.grad.zero_()
            if i % 100 == 0:
                # import pdb; pdb.set_trace()
                print("Loss after iteration %i: %f" % (i, loss))

    def predict(self, X):
        with torch.no_grad():
            z1 = self.w1.transpose(0, 1).mm(X).add(self.b1)
            A1 = z1.clamp(min=0)  # ReLU
            z2 = self.w2.transpose(0, 1).mm(A1).add(self.b2)
            Y_pred = torch.sigmoid(z2)
            # Threshold the sigmoid output at 0.5 to get hard 0/1 predictions
            Y_pred[Y_pred < 0.5] = 0
            Y_pred[Y_pred >= 0.5] = 1
            return Y_pred

    def benchmark(self, X, Y):
        Y_pred = self.predict(X)
        accuracy = np.mean(np.abs(Y_pred.numpy() - Y.numpy()))
        accuracy = 100 - 100 * accuracy
        return accuracy


if __name__ == "__main__":
    from coursera01w02.lr_utils import load_dataset

    train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
    # Flatten the images into (num_pixels, num_examples) and scale to [0, 1]
    train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
    test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
    train_set_x = train_set_x_flatten / 255.
    test_set_x = test_set_x_flatten / 255.
    train_set_x = torch.from_numpy(train_set_x).type(torch.float64)
    train_set_y = torch.from_numpy(train_set_y).type(torch.float64)
    test_set_x = torch.from_numpy(test_set_x).type(torch.float64)
    test_set_y = torch.from_numpy(test_set_y).type(torch.float64)
    # d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
    nn1hl = NN1HiddenLayerModel()
    nn1hl.train(train_set_x, train_set_y, epochs=1000, learning_rate=0.01)
    train_accuracy = nn1hl.benchmark(train_set_x, train_set_y)
    test_accuracy = nn1hl.benchmark(test_set_x, test_set_y)
    print(f"train accuracy: {train_accuracy}")
    print(f"test accuracy: {test_accuracy}")