Hey Jordan, looking at problem 2.4, how do you want us to implement the neural network? Should we use one of these:
Method #1
from tensorflow import keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))  # output layer (assuming 10 classes, e.g. MNIST)
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=10, verbose=1, validation_data=(x_test, y_test))
Method #2
import numpy as np  # X and one-hot Y are assumed to be loaded already

alpha = 0.01  # learning rate
theta_1 = np.random.normal(0, .1, size=(2, 3)); b1 = np.zeros((1, 3))  # init weights and biases
theta_2 = np.random.normal(0, .1, size=(3, 2)); b2 = np.zeros((1, 2))
J = []
for i in range(10000):
    # forward pass
    l1 = relu(np.dot(X, theta_1) + b1)         # hidden layer: relu(X @ theta_1 + b1)
    y_hat = softmax(np.dot(l1, theta_2) + b2)  # output layer: softmax(l1 @ theta_2 + b2)
    cost = -np.sum(Y * np.log(y_hat))          # categorical cross-entropy (matches the softmax output)
    J.append(cost)                             # store cost
    # backward pass
    dJ_dZ2 = d_softmax(y_hat, Y)               # softmax + cross-entropy gradient: y_hat - Y
    dJ_dtheta2 = np.dot(l1.T, dJ_dZ2)
    dJ_db2 = np.sum(dJ_dZ2, axis=0, keepdims=True)
    dJ_dZ1 = np.dot(dJ_dZ2, theta_2.T) * d_relu(l1)
    dJ_dtheta1 = np.dot(X.T, dJ_dZ1)
    dJ_db1 = np.sum(dJ_dZ1, axis=0, keepdims=True)
    # gradient-descent updates
    theta_2 -= alpha * dJ_dtheta2
    b2 -= alpha * dJ_db2
    theta_1 -= alpha * dJ_dtheta1
    b1 -= alpha * dJ_db1
    if J[-1] == 0 or J[-1] > 10:               # stop if the cost bottoms out or blows up
        break
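In case it helps, here is a minimal sketch of the helper functions Method #2 assumes (relu, d_relu, softmax, and d_softmax are my own definitions; in particular I'm taking d_softmax to return the combined softmax-plus-cross-entropy gradient y_hat - Y):

def relu(z):
    return np.maximum(0, z)                       # elementwise max(0, z)

def d_relu(a):
    return (a > 0).astype(float)                  # 1 where the unit fired, 0 elsewhere

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift each row for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def d_softmax(y_hat, Y):
    return y_hat - Y                              # dJ/dZ2 for softmax + cross-entropy

With those in place the loop should run end to end on any X of shape (n, 2) with one-hot Y of shape (n, 2), given the (2, 3) and (3, 2) weight shapes above.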
The issue with Method #1 is that it doesn't let us write out the explicit learning-rate update you wanted us to use, so I'm assuming you mean Method #2, but I wanted to confirm with you first. Please let me know.