Nearest Neighbour Methods

3. Nearest Neighbour Methods

Learning outcomes

  • Describe the difference between parametric and non-parametric methods
  • Apply k-nearest neighbour to a small dataset

Parametric methods

  • Has a fixed number of parameters, which are changed through learning
  • The role of learning is to adapt the parameters so that the function they represent classifies the data points better
  • There is a specified probability distribution (fixed structure) that you ‘assume’ your data follows.
  • Importantly, the choice of model fixes the shape of the decision boundary – if you decide your decision boundary should be, say, a circle, you then fit the model so it adapts to the data as well as possible
    • Once a linear model is chosen, the decision boundary will never be anything other than a straight line
  • The main idea is that you commit to a certain structure and learn the parameters

  • You use the data points to train, and can then throw them away
  • E.g. linear regression – the parameters are the weight coefficients (see the sketch below).
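
A minimal sketch of the parametric idea, assuming NumPy (the toy data is made up for illustration): fitting a line commits to the structure y = w0 + w1·x, and once the two weights are learned, predictions no longer need the stored training points.

```python
import numpy as np

# Fit y = w0 + w1*x by least squares: commit to a linear structure,
# learn a fixed number of parameters, then the data can be discarded.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])          # made-up toy data

X = np.column_stack([np.ones_like(x), x])   # design matrix: [1, x]
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # the two weights are the only parameters

print(w)      # learned weights (roughly [1.1, 1.9] for this data)
print(X @ w)  # predictions come from w alone, not from the stored points
```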

Non-parametric methods

  • Uses a flexible number of parameters, and the number of parameters often grows with the size of the training set.
  • It focuses on the data rather than a particular structure.
  • The data points themselves become part of the parameters
  • E.g. K-nearest neighbour

K-nearest neighbour

  • You have to store every point, and when you get a new point, you ask: which points in the dataset are closest to the new point I’m trying to classify? Say the 3 closest points are found – each one gets a vote, so two triangles and one circle makes the new point a triangle (see the sketch below).
    • E.g. in the book – when you’re in a club and don’t know the moves to the dance, you look at the people close to you to figure out what to do.
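
A minimal sketch of this voting scheme in Python (the name knn_classify and the toy points are made up for illustration):

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of (point, label) pairs."""
    neighbours = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two triangles and one circle among the 3 nearest -> "triangle"
train = [((0, 0), "triangle"), ((1, 0), "triangle"),
         ((0, 1), "circle"), ((5, 5), "circle")]
print(knn_classify(train, (0.4, 0.4), k=3))   # -> triangle
```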

KD trees

  • In 2D, finding the nearest neighbour is very simple. But in 3D and above points become sparse – no points are very close to each other – so finding the closest point becomes computationally expensive. Hence KD trees!
    • If I had to compare every point to every other it would be O(n²)
  • Build a binary tree by choosing one dimension at a time to split in two, at the median of the points’ coordinates along that dimension.
  • The query point is then compared against points in this hierarchical structure rather than against every point (see the sketch below).
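
A minimal sketch of the median-split construction and the pruned nearest-neighbour search (the helpers build and nearest and the dictionary node layout are made up for illustration; in practice a library structure such as scipy.spatial.KDTree does the same job):

```python
import math

def build(points, depth=0):
    """Recursively build a KD tree: cycle through the dimensions,
    splitting the remaining points at the median along that axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid],                    # median point becomes the node
            "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    """Walk the tree, pruning any branch that cannot hold a closer point."""
    if node is None:
        return best
    if best is None or math.dist(query, node["point"]) < math.dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if abs(diff) < math.dist(query, best):           # far side could still hold a closer point
        best = nearest(far, query, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(nearest(tree, (9, 2)))    # -> (8, 1), found without checking every point
```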

Effect of K

  • K=1 – decision boundaries that are less smooth
    • When k is too small, the classifier is sensitive to noise and labelling errors
  • As k increases, the decision boundary gets smoother
    • But eventually the accuracy drops, as points that are too far away get to vote (see the sketch below).
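
Both effects can be seen with the same voting classifier as above, on a made-up dataset where one mislabelled point sits right next to the query:

```python
from collections import Counter
import math

def knn_classify(train, query, k):
    """Same voting classifier as above: majority label among the k nearest."""
    neighbours = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Made-up data: a cluster of triangles near the origin, with one
# mislabelled "circle" sitting right next to the query point.
train = [((0.0, 0.0), "circle"),    # noise / labelling error
         ((0.5, 0.0), "triangle"),
         ((0.0, 0.5), "triangle"),
         ((0.5, 0.5), "triangle"),
         ((5.0, 5.0), "circle"),
         ((5.5, 5.0), "circle"),
         ((5.0, 5.5), "circle")]

print(knn_classify(train, (0.1, 0.1), k=1))  # "circle": k=1 follows the noisy point
print(knn_classify(train, (0.1, 0.1), k=3))  # "triangle": a larger k smooths out the noise
print(knn_classify(train, (0.1, 0.1), k=7))  # "circle": too large – faraway points now vote
```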
