
Neural Networks

Problem: Linear Classifiers aren’t that powerful


  • Image Features: Color Histogram (discards spatial information)
  • Image Features: Histogram of Oriented Gradients (HoG)


  • Image Features: Bag of Words (Data-Driven!)


  • Shows how often each visual word appears in the image

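A minimal sketch of the bag-of-visual-words feature, assuming a codebook of visual words has already been learned (e.g. by running k-means on patch descriptors); the names codebook and patches are illustrative:

import numpy as np

def bow_histogram(patches, codebook):
    # patches: (P, D) patch descriptors from one image
    # codebook: (K, D) visual words (cluster centers)
    # Assign each patch to its nearest visual word
    d2 = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    # Count how often each visual word appears in the image
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()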

Neural Networks

  • 2-layer Neural Network

\(f = W_2 \max(0, W_1 x)\), where \(W_2 \in R^{C\times H}\), \(W_1 \in R^{H\times D}\), \(x \in R^D\)

  • Also called a fully-connected network or multi-layer perceptron (MLP)


  • 3-layer Neural Network

\(f = W_3 \max(0, W_2 \max(0, W_1 x))\), where \(W_3 \in R^{C\times H_2}\), \(W_2 \in R^{H_2\times H_1}\), \(W_1 \in R^{H_1\times D}\), \(x \in R^D\)

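The formulas above map directly to code; a minimal forward-pass sketch (random weights for illustration, biases omitted as in the formulas):

import numpy as np
from numpy.random import randn

D, H1, H2, C = 1000, 100, 100, 10
x = randn(D)

# 2-layer network: f = W2 max(0, W1 x)
W1, W2 = randn(H1, D), randn(C, H1)
f2 = W2.dot(np.maximum(0, W1.dot(x)))                          # shape (C,)

# 3-layer network: f = W3 max(0, W2 max(0, W1 x))
V1, V2, V3 = randn(H1, D), randn(H2, H1), randn(C, H2)
f3 = V3.dot(np.maximum(0, V2.dot(np.maximum(0, V1.dot(x)))))   # shape (C,)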

Activation Functions

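For reference, a few standard activation functions in numpy (sigmoid and ReLU appear elsewhere in these notes; tanh and Leaky ReLU are other common choices presumably shown in the figure):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes to (-1, 1)

def relu(x):
    return np.maximum(0, x)           # max(0, x)

def leaky_relu(x):
    return np.maximum(0.01 * x, x)    # small slope instead of 0 for x < 0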

Remember the two-headed horse? (Now we can recognize it!)


import numpy as np
from numpy.random import randn

# Initialize weights and data
N, Din, H, Dout = 64, 1000, 100, 10
x, y = randn(N, Din), randn(N, Dout)
w1, w2 = randn(Din, H), randn(H, Dout)

for t in range(10000):
    # Forward pass: sigmoid hidden layer, L2 loss
    h = 1.0 / (1.0 + np.exp(-x.dot(w1)))
    y_pred = h.dot(w2)
    loss = np.square(y_pred - y).sum()

    # Backward pass: compute gradients
    dy_pred = 2.0 * (y_pred - y)
    dw2 = h.T.dot(dy_pred)
    dh = dy_pred.dot(w2.T)
    dw1 = x.T.dot(dh * h * (1 - h))

    # SGD step
    w1 -= 1e-4 * dw1
    w2 -= 1e-4 * dw2


Space Warping

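The figures show this geometrically; here is a minimal numeric sketch (an illustrative example, not from the slides): XOR-style points are not linearly separable in the input space, but after the feature transform \(h = \max(0, Wx)\) with a suitable \(W\) they become linearly separable:

import numpy as np

A = np.array([[1, 1], [-1, -1]])   # class A
B = np.array([[1, -1], [-1, 1]])   # class B; no line separates A from B
W = np.array([[1, 1], [-1, -1]])

hA = np.maximum(0, A.dot(W.T))     # -> [[2, 0], [0, 2]]
hB = np.maximum(0, B.dot(W.T))     # -> [[0, 0], [0, 0]]

# In feature space the line h1 + h2 = 1 separates the two classes:
print(hA.sum(axis=1) > 1)          # [ True  True]
print(hB.sum(axis=1) > 1)          # [False False]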

Universal Approximation

  • A neural network with one hidden layer can approximate any function \(f: R^N \to R^M\) with arbitrary precision

  • Output is a sum of shifted, scaled ReLUs


  • With 4K hidden units we can build a sum of K bumps (see the sketch after this list)

  • Reality check: Networks don’t really learn bumps!
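
A minimal sketch of the bump construction from the list above: four shifted, scaled ReLU hidden units produce one bump that rises from \(s_1\) to \(s_2\), stays flat at height \(t\) until \(s_3\), and falls back to zero at \(s_4\) (the break points and height below are arbitrary illustrative values):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def bump(x, s1, s2, s3, s4, t):
    # Sum of 4 shifted, scaled ReLUs = one bump of height t on [s1, s4]
    m1 = t / (s2 - s1)   # slope of the rising edge
    m2 = t / (s4 - s3)   # slope of the falling edge
    return (m1 * relu(x - s1) - m1 * relu(x - s2)
            - m2 * relu(x - s3) + m2 * relu(x - s4))

x = np.linspace(-1, 3, 9)
print(bump(x, 0.0, 0.5, 1.5, 2.0, t=1.0))  # zero outside [0, 2], flat top at 1

Summing K such bumps uses 4K hidden units, matching the count above.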

Universal approximation tells us: Neural nets can represent any function

Universal approximation DOES NOT tell us:

  • Whether we can actually learn any function with SGD

  • How much data we need to learn a function

Remember: kNN is also a universal approximator!

Convex Functions

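For reference, the definition behind the figure: a function \(f: R^N \to R\) is convex if for all \(x_1, x_2\) and all \(\lambda \in [0, 1]\),

\(f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)\)

i.e. the function lies below the secant line between any two of its points. Convex objectives are easy to optimize; linear classifiers have convex losses, while neural network losses are generally not convex.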

Backpropagation

  • For the full walkthrough, please refer to the PPT


  • A nice property of the sigmoid

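Assuming the property in question is the usual one: the sigmoid's local gradient can be computed from its output alone,

\(\sigma(x) = \frac{1}{1+e^{-x}}, \qquad \frac{d\sigma}{dx} = \sigma(x)(1-\sigma(x))\)

which is why the training loop above writes the gradient as dh * h * (1 - h) without recomputing the input.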


For the input that is actually the max, the local gradient is 1; for the other input it is 0.
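
In code, the max gate therefore routes the upstream gradient to whichever input achieved the max; a minimal sketch:

def max_backward(x, y, dz):
    # Upstream gradient dz flows only to the input that achieved the max
    dx = dz if x > y else 0.0
    dy = dz if y >= x else 0.0
    return dx, dy

print(max_backward(2.0, 5.0, 9.0))  # (0.0, 9.0)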

  • Code: see the flat backprop sketch below

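A sketch of what such "flat" backprop code looks like, assuming the running example is \(f(w,x)=\sigma(w_0 x_0+w_1 x_1+w_2)\) (the exact example lives in the slides; the variable names here are illustrative). Every line of the forward pass gets a matching gradient line in reverse order:

import numpy as np

def f(w0, x0, w1, x1, w2):
    # Forward pass: one named intermediate per primitive operation
    s0 = w0 * x0
    s1 = w1 * x1
    s2 = s0 + s1
    s3 = s2 + w2
    L = 1.0 / (1.0 + np.exp(-s3))

    # Backward pass: walk the forward lines in reverse
    grad_L = 1.0
    grad_s3 = grad_L * L * (1 - L)   # sigmoid local gradient
    grad_w2 = grad_s3                # add gate: copies the gradient
    grad_s2 = grad_s3
    grad_s0 = grad_s2
    grad_s1 = grad_s2
    grad_w1 = grad_s1 * x1           # multiply gate: swaps the inputs
    grad_x1 = grad_s1 * w1
    grad_w0 = grad_s0 * x0
    grad_x0 = grad_s0 * w0
    return L, (grad_w0, grad_w1, grad_w2)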

  • Backprop Implementation: Modular API

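The modular API wraps each operator in an object with paired forward and backward methods; a minimal sketch of a multiply gate in that style (plain Python here; frameworks such as PyTorch implement the same pattern with torch.autograd.Function):

class Multiply:
    def forward(self, x, y):
        # Cache the inputs: they are needed for the backward pass
        self.x, self.y = x, y
        return x * y

    def backward(self, grad_z):
        # Local gradients dz/dx = y and dz/dy = x, times upstream gradient
        grad_x = self.y * grad_z
        grad_y = self.x * grad_z
        return grad_x, grad_y

gate = Multiply()
z = gate.forward(3.0, -4.0)   # -12.0
print(gate.backward(2.0))     # (-8.0, 6.0)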

Backprop with Vectors


  • Example one

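Assuming the example is elementwise ReLU: the Jacobian \(\frac{dy}{dx}\) is diagonal (1 where \(x > 0\), 0 elsewhere), so rather than building the \(D \times D\) matrix we apply it implicitly with a mask (the numbers below are illustrative):

import numpy as np

x  = np.array([1.0, -2.0, 3.0, -1.0])
y  = np.maximum(0, x)                    # forward: elementwise ReLU
dy = np.array([4.0, -1.0, 5.0, 9.0])    # upstream gradient dL/dy

# Implicit Jacobian-vector product: a mask instead of a 4x4 diagonal matrix
dx = np.where(x > 0, dy, 0.0)            # -> [4., 0., 5., 0.]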

Backprop with Matrices (or Tensors)


  • Explicitly forming the Jacobian matrices is prohibitively expensive; work with them implicitly!


  • Find the pattern: work out how a single input element affects each output element


Writing out one element of \(y = xw\):

\(y_{1,1}=x_{1,1}w_{1,1}+x_{1,2}w_{2,1}+x_{1,3}w_{3,1}\)

\(\frac{dy_{1,1}}{dx_{1,1}}=w_{1,1}\)

  • Similarly \(\frac{dy_{1,2}}{dx_{1,1}}=w_{1,2}\)

\(\frac{dy_{2,1}}{dx_{1,1}}=0\)

  • Thus


\(\frac{dL}{dx_{1,1}}=\frac{dy}{dx_{1,1}}\cdot\frac{dL}{dy}=w_{1,:}\cdot\frac{dL}{dy_{1,:}}=3\cdot2+2\cdot3+1\cdot(-3)+(-1)\cdot9=0\)


\(\frac{dL}{dx_{2,3}}=\frac{dy}{dx_{2,3}}\cdot\frac{dL}{dy}=w_{3,:}\cdot\frac{dL}{dy_{2,:}}=3\cdot(-8)+2\cdot1+1\cdot4+(-2)\cdot6=-30\)
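
Stacking these per-element results gives the general matrix formulas: for \(y = xw\) with upstream gradient \(\frac{dL}{dy}\),

\(\frac{dL}{dx} = \frac{dL}{dy} w^T, \qquad \frac{dL}{dw} = x^T \frac{dL}{dy}\)

These are the only products whose shapes work out, which is a handy sanity check. A quick numeric verification of the worked example's pattern:

import numpy as np
from numpy.random import randn

N, D, M = 2, 3, 4
x, w = randn(N, D), randn(D, M)
dy = randn(N, M)                  # upstream gradient dL/dy

dx = dy.dot(w.T)                  # (N, M) x (M, D) -> (N, D)
dw = x.T.dot(dy)                  # (D, N) x (N, M) -> (D, M)

# Spot-check one element: dL/dx_{2,3} = w_{3,:} . dL/dy_{2,:} (0-indexed below)
assert np.isclose(dx[1, 2], w[2, :].dot(dy[1, :]))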

Summary


Higher-Order Derivatives

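The standard trick in this area, presumably what the figures show: a Hessian-vector product can be computed without ever forming the Hessian, because \(\frac{dL}{dx}\cdot v\) is itself a scalar function of \(x\), so backpropagating through the backward pass gives \(Hv = \frac{\partial}{\partial x}\left(\frac{dL}{dx}\cdot v\right)\). A minimal PyTorch sketch with a toy loss:

import torch

x = torch.randn(3, requires_grad=True)
v = torch.randn(3)
L = (x ** 2).sum()                 # toy loss; its Hessian is 2I

# create_graph=True keeps the backward pass itself differentiable
(g,) = torch.autograd.grad(L, x, create_graph=True)   # g = dL/dx = 2x
(Hv,) = torch.autograd.grad(g.dot(v), x)              # d/dx (g . v) = Hv
print(Hv, 2 * v)                   # both print 2v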

