Commit 9ad8e532 by Alejandro Riera

### Neural Network Model with 1 Hidden Layer using autograd

parent 91e2b02c
README.md 0 → 100644
# Notes on how I proceeded:

* I wrote unit tests to check that my logic yields the same results as Coursera's implementation
* Implementing 1 FC layer gave me a big headache
  * For fast iteration I fixed the following hyperparameters: hidden_units=10, epochs=2000, learning_rate=0.01
  * Best accuracies on the test set ranged from 66% to 72%
  * My first mistake was initializing the weights and biases to zeros
    * this showed up as the loss plateauing very early, and hence very poor accuracy on both the training and the test set
    * zero initialization works for logistic regression, but
    * it doesn't work for a NN because of what is called the symmetry problem. Essentially all your hidden units end up calculating the same function (being symmetric)
  * My attempt to fix it was to initialize weights and biases to random values with `torch.rand`
    * this didn't work either
    * z1, A1, and z2 had values >> 1000
    * A2 = sigmoid(z2) was all ones
    * I was using the Log Loss (Cross Entropy Loss) function and it was yielding NaN
  * Hypothesis: are the weights too big?
    * initialize randomly but scale down by a power of 10, e.g., `torch.rand() * 0.01`
    * I tried different scales: 0.1, 0.01, 0.001 and 0.0001
    * Scales smaller than 0.1 returned a sensible value for the loss
    * But, for some reason, the gradients (`self.w1.grad`) returned None
    * I couldn't come up with an explanation for this, so I moved on and tried something else
  * Instead of using my own initialization, I used one of the available ones:
    ```
    self.w1 = torch.zeros((X.shape[0], self.hidden_units), requires_grad=True, dtype=torch.float64)
    torch.nn.init.normal_(self.w1, mean=0, std=0.01)
    ```
    * Default values (mean=0, std=1) had the NaN problem
    * Playing with the values I found that mean=0, std=0.01 fixed it
    * mean=0, std=0.1 still gave NaN
    * mean=0, std=0.01 yielded the best results
  * Other initializations: Xavier Normal and Xavier Uniform
    ```
    torch.nn.init.xavier_normal_(self.w1)
    torch.nn.init.xavier_normal_(self.w1, gain=torch.nn.init.calculate_gain('relu'))
    torch.nn.init.xavier_uniform_(self.w1)
    torch.nn.init.xavier_uniform_(self.w1, gain=torch.nn.init.calculate_gain('relu'))
    ```
    * both Normal and Uniform showed the same behaviour
    * both worked out of the box with default params
    * if the gain is not calculated: 66% test accuracy
    * if the gain is calculated: 72% test accuracy
* Best scores achieved:
  * Heavy loading (train accuracy: 100.0, test accuracy: 76.0)
    * xavier_normal_ with gain
    * hidden_units = 1000
    * epochs=10000
    * learning_rate=0.01
    * ~ 40 minutes training
  * Lightweight: this config is very fast to train (~30 seconds) but its performance fluctuates depending (I guess) on the initial randomization. Results go from 70% accuracy on test up to 80%, many times being 72% or 76%
    * xavier_normal_ with gain
    * hidden_units = 10
    * epochs=1000
    * learning_rate=0.01
    * ~ 30 seconds training

# Ideas for next steps

* Implement my own ReLU and Sigmoid functions with Autograd https://github.com/jcjohnson/pytorch-examples#pytorch-defining-new-autograd-functions
* Generalise nn1hl to accept L hidden layers
* Extract the logic of the optimizer away from the model. Get inspired by `torch.optim.Optimizer` https://pytorch.org/docs/stable/optim.html
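
A possible explanation for the `None` gradients seen with the scaled `torch.rand` initialization in the notes above. This is only a sketch under an assumption: the exact initialization code is not part of this commit, so the line building `w_scaled` is a hypothetical reconstruction. If the tensor was created with `requires_grad=True` and then multiplied by the scale, the result is the output of an operation (a non-leaf tensor), and autograd only populates `.grad` on leaf tensors by default. The names `w_scaled`, `w_leaf` and `x` are made up for illustration.

```python
import torch

# Hypothetical reconstruction (assumption): scaling AFTER enabling gradient
# tracking. The product is a new tensor produced by an operation, so it is
# a non-leaf tensor in the autograd graph.
w_scaled = torch.rand(3, 2, requires_grad=True, dtype=torch.float64) * 0.01

# One way to keep a leaf tensor: scale first, then enable gradient tracking.
w_leaf = (torch.rand(3, 2, dtype=torch.float64) * 0.01).requires_grad_()

# Toy forward pass that uses both tensors, followed by a backward pass.
x = torch.ones(2, 1, dtype=torch.float64)
loss = w_scaled.mm(x).sum() + w_leaf.mm(x).sum()
loss.backward()

# Non-leaf: .grad is not retained (PyTorch also emits a warning on access).
print(w_scaled.is_leaf, w_scaled.grad)
# Leaf: .grad is populated by backward().
print(w_leaf.is_leaf, w_leaf.grad)
```

If this was indeed the cause, it would also explain why `torch.nn.init.normal_` worked: the `torch.nn.init` functions modify the existing leaf tensor in place instead of creating a new one.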
nn1hl.py 0 → 100644
"""
My attempt at reproducing Coursera's logistic regression example with autograd
"""
import numpy as np
import torch


class NN1HiddenLayerModel():
    def __init__(self):
        self.w1 = None
        self.b1 = None
        self.w2 = None
        self.b2 = None
        self.hidden_units = 10

    def train(self, X, Y, epochs=1000, learning_rate=0.5):
        # X has shape (n_features, m); Y has shape (1, m)
        self.w1 = torch.zeros((X.shape[0], self.hidden_units), requires_grad=True, dtype=torch.float64)
        self.b1 = torch.zeros((self.hidden_units, 1), requires_grad=True, dtype=torch.float64)
        self.w2 = torch.zeros((self.hidden_units, 1), requires_grad=True, dtype=torch.float64)
        self.b2 = torch.zeros((1, 1), requires_grad=True, dtype=torch.float64)

        # torch.nn.init.normal_(self.w1, mean=0, std=0.01)
        # torch.nn.init.normal_(self.b1, mean=0, std=0.01)
        # torch.nn.init.normal_(self.w2, mean=0, std=0.01)
        # torch.nn.init.normal_(self.b2, mean=0, std=0.01)

        # In-place initialization keeps the parameters as leaf tensors
        torch.nn.init.xavier_normal_(self.w1, gain=torch.nn.init.calculate_gain('relu'))
        torch.nn.init.xavier_normal_(self.b1, gain=torch.nn.init.calculate_gain('relu'))
        torch.nn.init.xavier_normal_(self.w2, gain=torch.nn.init.calculate_gain('sigmoid'))
        torch.nn.init.xavier_normal_(self.b2, gain=torch.nn.init.calculate_gain('sigmoid'))

        # torch.nn.init.xavier_uniform_(self.w1, gain=torch.nn.init.calculate_gain('relu'))
        # torch.nn.init.xavier_uniform_(self.b1, gain=torch.nn.init.calculate_gain('relu'))
        # torch.nn.init.xavier_uniform_(self.w2, gain=torch.nn.init.calculate_gain('sigmoid'))
        # torch.nn.init.xavier_uniform_(self.b2, gain=torch.nn.init.calculate_gain('sigmoid'))

        m = X.shape[1]  # number of training examples

        for i in range(epochs):
            # Forward pass
            z1 = self.w1.transpose(0, 1).mm(X).add(self.b1)
            A1 = z1.clamp(min=0)  # ReLU
            z2 = self.w2.transpose(0, 1).mm(A1).add(self.b2)
            A2 = torch.sigmoid(z2)

            # Log loss (binary cross entropy), averaged over the m examples
            loss = Y.mul(A2.log()) + (1 - Y).mul((1 - A2).log())
            loss = loss.sum() / (-m)

            loss.backward()
            # import pdb; pdb.set_trace()

            with torch.no_grad():
                # Plain gradient descent step
                self.w1 -= learning_rate * self.w1.grad
                self.b1 -= learning_rate * self.b1.grad
                self.w2 -= learning_rate * self.w2.grad
                self.b2 -= learning_rate * self.b2.grad

                # Manually zero the gradients after running the backward pass
                self.w1.grad.zero_()
                self.b1.grad.zero_()
                self.w2.grad.zero_()
                self.b2.grad.zero_()

            if i % 100 == 0:
                # import pdb; pdb.set_trace()
                print("Loss after iteration %i: %f" % (i, loss.item()))

    def predict(self, X):
        with torch.no_grad():
            z1 = self.w1.transpose(0, 1).mm(X).add(self.b1)
            A1 = z1.clamp(min=0)  # ReLU
            z2 = self.w2.transpose(0, 1).mm(A1).add(self.b2)
            Y_pred = torch.sigmoid(z2)
            # Threshold the probabilities at 0.5 to get class labels
            Y_pred[Y_pred < 0.5] = 0
            Y_pred[Y_pred >= 0.5] = 1
        return Y_pred

    def benchmark(self, X, Y):
        Y_pred = self.predict(X)
        # Labels are 0/1, so the mean absolute difference is the error rate
        accuracy = np.mean(np.abs(Y_pred.numpy() - Y.numpy()))
        accuracy = 100 - 100 * accuracy
        return accuracy


if __name__ == "__main__":
    from coursera01w02.lr_utils import load_dataset

    train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

    # Flatten each image into a column vector and scale pixel values to [0, 1]
    train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
    test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
    train_set_x = train_set_x_flatten / 255.
    test_set_x = test_set_x_flatten / 255.

    train_set_x = torch.from_numpy(train_set_x).type(torch.float64)
    train_set_y = torch.from_numpy(train_set_y).type(torch.float64)
    test_set_x = torch.from_numpy(test_set_x).type(torch.float64)
    test_set_y = torch.from_numpy(test_set_y).type(torch.float64)

    # d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
    nn1hl = NN1HiddenLayerModel()
    nn1hl.train(train_set_x, train_set_y, epochs=1000, learning_rate=0.01)

    train_accuracy = nn1hl.benchmark(train_set_x, train_set_y)
    test_accuracy = nn1hl.benchmark(test_set_x, test_set_y)
    print(f"train accuracy: {train_accuracy}")
    print(f"test accuracy: {test_accuracy}")
