Neural Networks: Perceptron


Learning outcomes

  • Describe the biological principles that inspired neural networks
  • Draw the diagram of the McCulloch and Pitts neuron
  • Distinguish between generative and discriminative learning models

Perceptron

  • A probabilistic model for information storage and organization in the brain.
  • The simplest neural network in machine learning, and one of the most popular supervised methods.
  • A collection of McCulloch and Pitts neurons together with a set of inputs and weights.

Demo at https://cs.stanford.edu/people/karpathy/convnetjs/

  • The model starts somewhere, and it slowly adapts to the shape of the data points as training proceeds.

Our brain & neurons

  • Neural networks are models for classification and regression that were originally inspired by the brain
  • The carbon footprint of machine learning is enormous.

History of neural networks

  • 1943 – McCulloch and Pitts designed a neuron model
  • 1949 – Donald Hebb published on the behaviour of neurons: connections between neurons that fire together become stronger
  • 1958 – Frank Rosenblatt invented the perceptron
  • 1960 – Widrow and Hoff published Adaptive Switching Circuits
  • 2009 – deep learning, etc.

Firing

  • The main idea behind neural networks is that neurons are connected through dendrites. They accumulate charge, and when the charge rises above a threshold, the neuron fires and sends a spike.
  • A neuron does not send small continuous signals; it accumulates charge and then fires.
  • Artificial neurons try to implement this behaviour.

Hebbian learning – when two neurons fire at the same time, the connection between them may become stronger.

  • A rule that specifies how much the weight of the connection between two units should be increased or decreased, in proportion to the product of their activations – sketched in code below.
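
A minimal sketch of this rule in Python. The function name hebbian_update and the learning rate eta are my additions; the notes only state the proportionality.

```python
import numpy as np

def hebbian_update(w, pre, post, eta=0.1):
    """Hebb's rule: change each weight in proportion to the product of
    the activations (pre, post) of the two units it connects.
    eta is an assumed learning rate, not given in the notes."""
    return w + eta * pre * post

# Two units that fire together strengthen their connection.
w = np.array([0.5])
w = hebbian_update(w, pre=np.array([1.0]), post=1.0)
print(w)  # [0.6]
```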

McCulloch and Pitts Neuron

  • Input - vector of values and vector of weights
  • The inputs (x1, x2, …, xm) are multiplied by the weights (w1, w2, …, wm), and the whole thing goes into the sum: h = x1w1 + x2w2 + … + xmwm.
  • If
    • h > threshold (theta), it outputs 1;
    • otherwise, it outputs 0.
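
A minimal sketch of this neuron in Python. The function name mcp_neuron and the use of NumPy are my choices, not from the notes.

```python
import numpy as np

def mcp_neuron(x, w, theta):
    """McCulloch and Pitts neuron: a weighted sum followed by a hard
    threshold. Outputs 1 if h > theta, otherwise 0."""
    h = np.dot(x, w)  # h = x1*w1 + x2*w2 + ... + xm*wm
    return 1 if h > theta else 0
```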

E.g. – output?

  • 1 × 0.5 + 2 × (−1) + 1 × 2 = 0.5, with threshold theta = 1.
  • 0.5 is smaller than the threshold of 1, so the neuron outputs 0.
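
Reusing the mcp_neuron sketch from above on this example:

```python
# Worked example: inputs (1, 2, 1), weights (0.5, -1, 2), threshold 1.
h = 1 * 0.5 + 2 * (-1) + 1 * 2                       # = 0.5
print(mcp_neuron([1, 2, 1], [0.5, -1, 2], theta=1))  # 0, since 0.5 < 1
```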

Model critique – How does the artificial neuron that we saw differ from an actual neuron?

  1. We do not know whether real neurons implement a sum – the sum is simply what is most used in practice.
  2. Real neurons send trains of spikes (with the frequency encoding information), whereas the model sends a single value down the axon.
  3. Our model is synchronous – a layer computes step by step – whereas actual neurons have no clock, so they compute different things at different times.
  • The model is biologically inspired, but in reality it is an approximation, regardless of whether this is really how neurons work.

(neural networks are) A function approximator

  • Given the inputs, we get an output, and we can try to make the input–output mapping match any function we want as closely as possible.
  • We use a neural network to classify the input, i.e. to learn a boundary function between the classes.
  • Regression and classification are two sides of the same coin.
    • Regression – learning a function whose output is not just a 0 or 1.
    • Classification – is this an airplane or a truck? The output is 0 or 1. Between the two classes there is always a boundary, so classification can be seen as regression on this boundary surface.
      • We are trying to find a function that represents the space.
  • Neural networks can be used both as generative and as discriminative models – but here we only do discriminative ones.

Generative and discriminative model

  • Generative – models that learn the distribution behind the data points, so you can use them to generate more data points.
    • E.g. regression over the parameters of a certain distribution.
  • Discriminative – the model learns the boundaries, so it can tell on which side a point lies, but it cannot generate more data points.

Perceptron

  • A perceptron is a model with a number of independent neurons
  • There is one output neuron per class
    • Each neuron recognizes one class, and you classify an input as the class whose neuron has the highest output.
  • In a perceptron, a vector of inputs is fed into each neuron – the multiplication by the weights and the sum happen, then the threshold – giving one output per neuron.
    • If exactly one neuron outputs 1 and the rest output 0, the classifier is sure.
    • If more than one neuron outputs a 1, the classifier is confused.
  • If the perceptron is wrong, its weights need to be changed – but by how much?
  • With, say, 5 weights per neuron and 5 neurons, there are 25 parameters. We have committed to a specified number of parameters that is not going to change: this is a parametric method.
  • The decision boundary can only be a straight line.
  • Learning means changing the weights of the neurons (the parameters) – the forward pass is sketched below.
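
A minimal sketch of this forward pass, assuming a shared threshold theta and NumPy; the function name perceptron_forward and the random example data are my additions.

```python
import numpy as np

def perceptron_forward(x, W, theta=0.0):
    """A layer of independent McCulloch-Pitts neurons: row i of W holds
    the weights of the output neuron for class i. Returns a 0/1 vector
    with one entry per class."""
    h = W @ x                       # the weighted sum for every neuron at once
    return (h > theta).astype(int)

# Hypothetical example: 5 inputs and 5 classes -> 5 weights per neuron,
# 25 parameters in total, as in the note above.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))
x = rng.normal(size=5)
print(perceptron_forward(x, W))  # e.g. [0 1 0 0 1] -> confused if several 1s
```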

The amount of data available has become massive thanks to the internet, and computation has grown to the point where we can simulate these models. The speedup happened because graphics-card manufacturers started to design graphics processors for vector calculations; this could be leveraged for neural networks, and now there are libraries for it.

Training the perceptron

  • Learning happens through optimisation.
  • We want to train these machines, and training means changing the weights.
  • We need to decide how we define how good we are and what we want to get better at; once we have defined that mathematically, we need to find an algorithm that optimises this objective function.
    • We define an error function, then an optimisation algorithm finds the parameters that obtain the minimum error.
  • Objective function
    • If, for every point, we take the difference between the output of the perceptron and the desired class and sum this up, we have the number of mistakes. The number of mistakes is something we can compute – see the sketch after this list.
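
The notes define the error (the number of mistakes) but stop short of the update rule; the standard perceptron learning rule, w ← w + η(t − y)x, is one common choice. Below is a minimal single-neuron sketch; the AND data, the learning rate eta, and the function names are my additions.

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=10):
    """Standard perceptron learning rule (a common choice; the notes only
    define the error). For each point, move the weights by
    eta * (target - output) * input; correct points leave w unchanged."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = int(np.dot(w, x) > 0)    # threshold at 0, bias folded into x
            w += eta * (target - y) * x
    return w

def n_mistakes(w, X, t):
    """The error function from the notes: the number of mistakes."""
    y = (X @ w > 0).astype(int)
    return int(np.sum(y != t))

# Toy linearly separable data (the AND function) with a bias input of 1.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0, 0, 0, 1])
w = train_perceptron(X, t)
print(n_mistakes(w, X, t))  # 0 once training has converged
```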

Conclusion

  • Local methods stop when the gradient is close to zero, which means that they are close to a stationary point.
  • There is no guarantee that such a point is the global minimum. Local methods will, in general, converge to a local minimum of the objective function.
  • A local minimum is a point such that all the points around it have a higher value of the objective function.
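
A minimal sketch illustrating this behaviour with one-dimensional gradient descent on a toy function that has two minima; the function, step size, and tolerance are my choices.

```python
def gradient_descent(grad, x0, eta=0.01, tol=1e-6, max_iter=100_000):
    """A local method: follow the negative gradient and stop when the
    gradient is close to zero, i.e. near a stationary point."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:  # close to a stationary point
            break
        x -= eta * g
    return x

# f(x) = x**4 - 3*x**2 + x has two local minima; which one we reach
# depends on where we start, and only one of them is the global minimum.
def grad_f(x):
    return 4 * x**3 - 6 * x + 1

print(gradient_descent(grad_f, x0=2.0))   # approx  1.13 (a local minimum)
print(gradient_descent(grad_f, x0=-2.0))  # approx -1.30 (the global minimum)
```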
