# Types of Machine Learning algorithms

Based on the Coursera Machine Learning course [1].

## Supervised Learning

### Regression problem

We choose a model class $y = f(x; W)$: a function $f$ that uses numerical parameters $W$ to map each input vector $x$ to a predicted output $y$.

Univariate linear regression:

• Prediction: $h_\theta(x) = \theta_0 + \theta_1 x$
• Cost function: $J(\theta_0,\theta_1) = \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$
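To make the two formulas concrete, here is a minimal plain-Python sketch (the function names are illustrative, not from the course). It follows the course's convention of scaling the sum by $\frac{1}{2m}$, where $m$ is the number of training examples; the scaling does not change which parameters minimize $J$.

```python
def predict(theta0, theta1, x):
    """Univariate hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost, scaled by 1/(2m) as in the course."""
    m = len(xs)
    return sum((predict(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # toy data generated by y = 2x
print(cost(0.0, 2.0, xs, ys))  # perfect fit, so J = 0.0
print(cost(0.0, 1.0, xs, ys))  # errors -1, -2, -3, so J = (1 + 4 + 9) / 6
```

Training means searching for the $(\theta_0, \theta_1)$ pair with the lowest cost; here $(0, 2)$ reaches the minimum of 0.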

Multivariate linear regression:

• Prediction: $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$, with $x_0 = 1$
• Cost function: $J(\theta) = \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$, where $\theta = (\theta_0, \theta_1, \dots, \theta_n)$
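To show how $J(\theta)$ can actually be minimized, the sketch below uses batch gradient descent, the method the course pairs with this cost (it is not described in these notes). It is a plain-Python sketch under assumed values: the learning rate `alpha`, the iteration count and the toy data are illustrative choices, and the cost includes the course's $\frac{1}{2m}$ factor.

```python
def add_bias(X):
    """Prepend x0 = 1 to every example so theta0 acts as the intercept."""
    return [[1.0] + list(row) for row in X]


def h(theta, x):
    """Multivariate hypothesis h_theta(x) = theta^T x (x already contains x0 = 1)."""
    return sum(t * xi for t, xi in zip(theta, x))


def cost(theta, X, y):
    """Squared-error cost J(theta), scaled by 1/(2m)."""
    m = len(X)
    return sum((h(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent: theta_j -= alpha/m * sum_i (h(x_i) - y_i) * x_ij."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [h(theta, x) - yi for x, yi in zip(X, y)]
        # All partial derivatives are computed before any theta_j changes (simultaneous update).
        grads = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta


# Toy data generated by y = 1 + 2*x1 + 3*x2 (no noise).
X = add_bias([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]])
y = [1, 3, 4, 6, 8]
theta = gradient_descent(X, y)
print([round(t, 3) for t in theta])  # close to [1.0, 2.0, 3.0]
```

If `alpha` is too large the updates overshoot and $J$ grows instead of shrinking, so in practice it is worth printing `cost(theta, X, y)` every few hundred iterations.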

### Classification problems

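$y \in \{0, 1\}$ for binary classification; multiclass problems allow more values. The classifier effectively learns a decision boundary: on one side of it (e.g. inside a closed curve) we predict $y=1$, and on the other side (outside) we predict $y=0$.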
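A circular boundary like this can come from logistic regression with polynomial features. In this sketch the feature map and the parameter values are hand-picked for illustration (nothing is trained): $\theta^T x = 1 - x_1^2 - x_2^2$, so the boundary $\theta^T x = 0$ is the unit circle, and because the sigmoid satisfies $g(z) \ge 0.5$ exactly when $z \ge 0$, points inside the circle get $y = 1$.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(x1, x2):
    """Hypothetical polynomial feature map [1, x1, x2, x1^2, x2^2]."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2]


# Hand-picked (not learned) parameters: theta^T x = 1 - x1^2 - x2^2.
THETA = [1.0, 0.0, 0.0, -1.0, -1.0]


def predict(x1, x2):
    """Predict y = 1 when h_theta(x) >= 0.5, i.e. when the point is on or inside the unit circle."""
    z = sum(t * f for t, f in zip(THETA, features(x1, x2)))
    return 1 if sigmoid(z) >= 0.5 else 0


print(predict(0.2, 0.3))  # inside the circle: 1
print(predict(1.5, 0.0))  # outside the circle: 0
```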

Logistic regression with 2 outputs (binary classification):

• Prediction: $h_\theta(x)=g(\theta^T x)=\frac{1}{1+e^{-\theta^T x}}$, where $g$ is the sigmoid (logistic) function
• Cost: $J(\theta) = -\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$, where $\lambda$ is the regularization parameter ($\theta_0$ is not regularized)
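A direct translation of the hypothesis and the regularized cost, as a plain-Python sketch; `X`, `y` and `lam` ($\lambda$) are illustrative names, and each row of `X` is assumed to already contain the bias feature $x_0 = 1$.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x); x is assumed to start with x0 = 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta); theta[0] is not regularized."""
    m = len(X)
    data_term = 0.0
    for x, yi in zip(X, y):
        p = h(theta, x)
        data_term -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    reg_term = lam / (2 * m) * sum(t ** 2 for t in theta[1:])
    return data_term / m + reg_term


X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]  # toy data, bias column first
y = [1, 0, 1]
print(cost([0.0, 0.0], X, y, lam=1.0))  # every h = 0.5, so J = log(2)
```

With $\theta = 0$ every prediction is 0.5, so $J = \log 2$ for any data, which makes a handy sanity check. If a prediction rounds to exactly 0 or 1 in floating point, `math.log` raises an error; practical implementations clip predictions slightly away from those values.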

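Logistic regression with more than 2 classes (one-vs-all algorithm): train a separate binary classifier $h_\theta^{(k)}(x)$ for each class $k$, treating class $k$ as positive and all other classes as negative, then predict the class whose classifier outputs the highest probability.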
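One-vs-all trains one binary classifier per class and predicts the class whose classifier is most confident. The sketch below shows the relabelling used to train each classifier and the prediction step; the parameter vectors in `THETAS` are hypothetical, hand-picked values for one feature plus the bias $x_0 = 1$, not trained ones.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def one_vs_all_labels(y, k):
    """Relabel a multiclass target for classifier k: 1 for class k, 0 for every other class."""
    return [1 if yi == k else 0 for yi in y]


def predict_one_vs_all(thetas, x):
    """Pick the class k whose classifier h_theta^(k)(x) outputs the highest probability."""
    probs = [sigmoid(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas]
    return max(range(len(probs)), key=lambda k: probs[k])


# Hypothetical, hand-picked [theta0, theta1] for 3 classes.
THETAS = [[1.0, -2.0], [0.0, 0.0], [-3.0, 2.0]]
for x1 in (0.0, 1.0, 3.0):
    print(x1, predict_one_vs_all(THETAS, [1.0, x1]))  # predicts classes 0, 1, 2
```

Each `THETAS[k]` would in practice be fitted by ordinary binary logistic regression on `one_vs_all_labels(y, k)`.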

If we have to distinguish between many similar classes (e.g. dog breeds), the problem is called fine-grained classification.

### Support vector machines

SVMs are learning algorithms for binary classification. An SVM treats examples as points in space and finds the boundary that separates the two classes with as wide a margin as possible. New examples are mapped into the same space and assigned to one of the 2 categories depending on which side of the boundary they fall.
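Once an SVM is trained, classifying a new example only requires checking which side of the separating hyperplane $w \cdot x + b = 0$ it falls on. The sketch below assumes a linear SVM with hypothetical, already-trained parameters `w` and `b`; finding the maximum-margin `w` and `b` requires solving an optimization problem and is usually left to a library.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; its sign says which side of the boundary x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Map the side of the hyperplane to one of the two classes, labelled +1 and -1."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_boundary(w, b, x):
    """Euclidean distance from x to the hyperplane w . x + b = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi ** 2 for wi in w))


# Hypothetical parameters of a trained linear SVM: the boundary is x1 + x2 = 3.
w, b = [1.0, 1.0], -3.0
print(classify(w, b, [3.0, 2.0]))  # 1
print(classify(w, b, [0.5, 1.0]))  # -1
```

The margin the SVM maximizes is the `distance_to_boundary` of the training points closest to the hyperplane.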

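An SVM can use kernels: pick a set of landmarks $l^{(1)}, \dots, l^{(k)}$ in the input space and, for each example $x$, use its similarity to each landmark $l$ (a function of the distance between them) as a feature in the hypothesis. There are multiple kernels for computing this similarity; a common one is the Gaussian kernel, $f = \exp\left(-\frac{\|x - l\|^2}{2\sigma^2}\right)$.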
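A plain-Python sketch of the Gaussian kernel $f = \exp\left(-\frac{\|x - l\|^2}{2\sigma^2}\right)$ and of the feature vector built from it. The landmarks and $\sigma$ here are hypothetical values; in the course, the landmarks are placed at the training examples themselves.

```python
import math


def gaussian_kernel(x, landmark, sigma):
    """Similarity exp(-||x - l||^2 / (2 sigma^2)): 1 at the landmark, tending to 0 far away."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def kernel_features(x, landmarks, sigma):
    """Feature vector [f0, f1, ..., fk] with f0 = 1 and one similarity per landmark."""
    return [1.0] + [gaussian_kernel(x, l, sigma) for l in landmarks]


landmarks = [[0.0, 0.0], [3.0, 3.0]]  # hypothetical landmarks
f = kernel_features([0.0, 0.0], landmarks, sigma=1.0)
print(f)  # [1.0, 1.0, exp(-9)]: on the first landmark, far from the second
```

A smaller $\sigma$ makes the similarity fall off faster with distance, so each feature reacts only to examples very close to its landmark.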

## Unsupervised Learning

### K-means clustering

#### Random initialization

To initialize the centroids, we pick K random points from the dataset and place the centroids $\mu_1, \dots, \mu_K$ at their positions. Because K-means can converge to a poor local optimum, especially for small K, we run it several times with different random initializations and keep the clustering with the lowest cost J.
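A minimal plain-Python sketch of K-means with this random initialization and restarts. The assignment and centroid-update steps are the standard algorithm; `n_runs`, the seed and the toy data are illustrative assumptions, and the cost $J$ is computed as the average squared distance from each point to its assigned centroid (the distortion).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(X, centroids):
    """Cluster assignment step: index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k])) for x in X]


def kmeans(X, K, rng, max_iters=100):
    """One K-means run from a random initialization; returns (centroids, labels, J)."""
    # Random initialization: K distinct training examples become the centroids mu_1..mu_K.
    centroids = [list(p) for p in rng.sample(X, K)]
    for _ in range(max_iters):
        labels = assign(X, centroids)
        # Move-centroid step: each centroid becomes the mean of the points assigned to it.
        new_centroids = []
        for k in range(K):
            members = [x for x, c in zip(X, labels) if c == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[k])  # empty cluster: leave its centroid in place
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    labels = assign(X, centroids)
    distortion = sum(sq_dist(x, centroids[c]) for x, c in zip(X, labels)) / len(X)
    return centroids, labels, distortion


def kmeans_restarts(X, K, n_runs=50, seed=0):
    """Run K-means n_runs times and keep the run with the lowest distortion J."""
    rng = random.Random(seed)
    return min((kmeans(X, K, rng) for _ in range(n_runs)), key=lambda run: run[2])


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels, J = kmeans_restarts(X, K=2)
print(labels, J)
```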

#### Choosing K

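We can use the elbow method: plot the cost J (the distortion) against K and pick the K at the "elbow", where J stops dropping sharply and the curve flattens out.

Alternatively, choose K based on the downstream use of the clusters, e.g. if the clusters will become T-shirt sizes S, M and L, then K = 3.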
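The elbow is usually judged by eye, but it can be approximated numerically. The sketch below is one simple heuristic (an assumption, not the course's method): it picks the K where the curve bends most sharply, measured by the discrete second difference of J. The J values are made up for illustration.

```python
def elbow_k(costs):
    """Return the K at the sharpest bend of the J-versus-K curve.

    costs[i] is the distortion J for K = i + 1 (ideally the best of several runs).
    The bend at K is the drop into K minus the drop out of K; at least 3 values are
    needed, otherwise None is returned.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(costs) - 1):
        bend = (costs[i - 1] - costs[i]) - (costs[i] - costs[i + 1])
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Hypothetical J values for K = 1..5 (not from real data).
print(elbow_k([100.0, 40.0, 30.0, 25.0, 22.0]))  # 2
```

When the curve has no clear elbow, the heuristic still returns some K, which is a good reason to fall back on the downstream use of the clusters.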

## Reinforcement learning

Here we have an agent and an environment that the agent interacts with. After each action, the environment gives the agent a reward (a score) that reflects how well the action serves our goal, and the agent's goal is to maximize the total reward it collects over time.
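The agent/environment/score loop can be illustrated with one of the simplest reinforcement learning settings: a multi-armed bandit played by an epsilon-greedy agent. This is a swapped-in example, not something from these notes; the environment, its reward means and the `epsilon` value are all hypothetical. The agent usually picks the action with the best estimated reward and occasionally explores a random one.

```python
import random


class BanditEnvironment:
    """Hypothetical environment: each action (arm) pays a noisy reward around a hidden mean."""

    def __init__(self, means, rng):
        self.means = means
        self.rng = rng

    def step(self, action):
        return self.rng.gauss(self.means[action], 1.0)


def run_epsilon_greedy(env, n_actions, steps=2000, epsilon=0.1, seed=0):
    """Interact with env for `steps` rounds; return reward estimates and total reward."""
    rng = random.Random(seed)
    estimates = [0.0] * n_actions  # agent's running estimate of each action's reward
    counts = [0] * n_actions
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = env.step(action)  # the environment scores the action
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]  # incremental mean
        total_reward += reward
    return estimates, total_reward


env = BanditEnvironment([1.0, 5.0, 2.0], random.Random(1))
estimates, total = run_epsilon_greedy(env, 3)
print([round(e, 2) for e in estimates], round(total, 1))
```

Exploration is what lets the agent discover that action 1 pays best even though it starts out exploiting action 0; with `epsilon = 0` it could stay stuck on the first action it tried.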

1. https://www.coursera.org/learn/machine-learning