Machine Learning Cheat Sheet

NumPy is a library for efficient array computations, modeled after MATLAB. Arrays differ from plain Python lists in the way they are stored and handled. Array elements stay together in memory, so they can be accessed quickly. NumPy also supports fast sub-indexing, e.g., a[0, :, 2] gives you all array elements whose first index is 0 and third index is 2.

Furthermore, NumPy provides vectorized mathematical functions. When you call, e.g., numpy.sin(a), the sine function is applied to every element of array a. This is done in compiled C code, so it runs much faster than a Python for loop, and even faster than a list comprehension.
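As a rough illustration of that vectorization point, here is a minimal sketch comparing numpy.sin applied to a whole array against an equivalent Python-level loop (the array contents are arbitrary):

```python
import math
import numpy as np

a = np.linspace(0, np.pi, 1_000_000)  # a million evenly spaced sample points

# Vectorized: the sine is computed for every element in compiled C code
fast = np.sin(a)

# Equivalent pure-Python list comprehension, element by element (much slower)
slow = [math.sin(x) for x in a]
```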

SciPy provides a large menu of libraries for scientific computation, such as integration, interpolation, signal processing, linear algebra, statistics, etc. It is built upon the infrastructure of Numpy.
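As one tiny example of what that looks like in practice (numerical integration is just one corner of SciPy):

```python
import numpy as np
from scipy import integrate

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2
value, abs_error = integrate.quad(np.sin, 0, np.pi)
print(value)  # ~2.0
```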

I should also mention the matplotlib library for graphing and visualization. It, too, is built on Numpy and designed to work well with it. These three are well-established Python libraries for scientific use.

The Pandas library is good for analyzing tabular data. You can use it for exploratory data analysis, statistics, and visualization.

Scikit-learn is a collection of advanced machine-learning algorithms for Python. It also is built upon Numpy and SciPy.

So, when do you use each?

If you want to understand the data you have, use Pandas.

If you want to use this data to train a machine-learning algorithm, use scikit-learn.

If you are doing some other scientific or engineering calculation, use SciPy.

If you want to plot data, use matplotlib.

All of these use Numpy, so you will need to set up your data structures using Numpy.

Generally speaking, I prefer using Python’s PyData stack (pandas, scikit-learn, numpy, scipy, etc.) when I am dealing with smaller datasets that fit on a single machine and can be analysed by in-memory processing on that machine. I would prefer Apache Spark and similar tools when the data is truly large volume (several gigabytes to petabytes). Spark has numerous machine learning algorithms that are suitable for a distributed setting, but Python’s ML and data analysis packages in my view have more polish, are easier to use, and are better integrated. Furthermore, Apache Spark is sorely lacking in the visualization department, even considering Zeppelin, which is at best good for simple dashboards.

As for the other question of what Pandas can do that Numpy can’t:

Pandas is primarily a table manipulation library that plays nicely with visualization libraries (seaborn, matplotlib) and also with matrix algebra libraries (numpy). It is centred around the idea of dataframes, which are tabular data abstractions used by R programmers and are implemented in a similar way in Pandas. Pandas is extremely handy for quickly loading files from disk, summarising, describing and analyzing the data, and for fast plots. Pandas is fantastic because it is reasonably fast and integrates really nicely with other libraries and with Jupyter notebooks. You can essentially get started with analyzing a dataset in just a few lines of code, if you know your way around Pandas. Pandas also works with Scikit-Learn in that you can pass Pandas dataframes to machine learning constructors within Scikit-Learn.
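To make that concrete, here is a minimal sketch of that workflow; the file name and column names (data.csv, feature_a, feature_b, target) are hypothetical placeholders, not anything prescribed above:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load a CSV and get a quick feel for the data
df = pd.read_csv("data.csv")   # hypothetical file
print(df.head())               # first few rows
print(df.describe())           # summary statistics
df.hist()                      # quick plots (rendered via matplotlib)

# Pass pandas objects straight into a scikit-learn estimator
X = df[["feature_a", "feature_b"]]  # hypothetical feature columns
y = df["target"]                    # hypothetical target column
model = LinearRegression().fit(X, y)
```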

Numpy is a library that’s fantastic at doing various linear algebra operations, such as dot and cross products, matrix multiplication and inverses, and so on. It doesn’t have the same ability to manipulate files and visualize data as Pandas does. Numpy is great for doing lots of matrix calculations, and doing them fast. You can also invoke C and Fortran code from Python using Numpy, which makes it handy for speedy calculations.

The catch of course is that Pandas uses Numpy to build the dataframe objects, and Pandas series objects can be converted to and from Numpy arrays too. So, Pandas plays well with Numpy.
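A small sketch of both points, using arbitrary example values:

```python
import numpy as np
import pandas as pd

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([5.0, 6.0])

print(np.dot(A, v))                    # matrix-vector product
print(A @ A)                           # matrix multiplication
print(np.linalg.inv(A))                # matrix inverse
print(np.cross([1, 0, 0], [0, 1, 0]))  # cross product

# Pandas <-> NumPy round trip
s = pd.Series(v)
arr = s.to_numpy()   # Series -> ndarray
s2 = pd.Series(arr)  # ndarray -> Series
```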

Deep learning playlist: https://www.youtube.com/playlist?list...

Prerequisites for this series:

1: Python tutorials (first 16 videos): https://www.youtube.com/playlist?list...

2: Pandas tutorials (first 8 videos): https://www.youtube.com/playlist?list...

3: Machine learning playlist (first 16 videos): https://www.youtube.com/playlist?list...

neural networks

how to determine number of layers and neurons in each layer

learning rate

momentum

regularization

weights

loss function

activation function

confusion matrix

precision

accuracy

recall

f1 score

k fold cross validation

overfitting

underfitting

machine learning

what is machine learning

linear regression single variable

best fit line

which line has minimum error (sum of squared differences over all data points)

y = mx + c (m = slope, c = intercept)

anaconda — sklearn

use linear regression (from sklearn import linear_model)

fitting the data means training the model on the training data

learn and predict

model calculated values for slope m and intercept c

best fit line
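A minimal sketch of those steps; the area/price numbers are made up purely for illustration:

```python
import numpy as np
from sklearn import linear_model

# Made-up data: house area (sq ft) vs price
area = np.array([[1000], [1500], [2000], [2500]])  # 2D: samples x features
price = np.array([200000, 290000, 410000, 500000])

reg = linear_model.LinearRegression()
reg.fit(area, price)  # fitting = training the model

print(reg.predict([[1800]]))  # predicted price for an 1800 sq ft home
print(reg.coef_)              # slope m found by the model
print(reg.intercept_)         # intercept c found by the model
```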

linear regression multiple variable (multivariate regression)

price depends on bedrooms, age, and area

one value is missing; the variables are in a linear relationship

a feature is nothing but an independent variable

y = m1·x1 + m2·x2 + m3·x3 + c

fill missing data

create a linear regression object
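Putting those steps together, a rough sketch with a small hypothetical dataset (one bedrooms value deliberately missing):

```python
import pandas as pd
from sklearn import linear_model

# Hypothetical data: price depends on area, bedrooms and age
df = pd.DataFrame({
    "area":     [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3,    4,    None, 3,    5],
    "age":      [20,   15,   18,   30,   8],
    "price":    [550000, 565000, 610000, 595000, 760000],
})

# Fill the missing value, e.g. with the median number of bedrooms
df["bedrooms"] = df["bedrooms"].fillna(df["bedrooms"].median())

# Create a linear regression object and fit y = m1·x1 + m2·x2 + m3·x3 + c
reg = linear_model.LinearRegression()
reg.fit(df[["area", "bedrooms", "age"]], df["price"])

print(reg.coef_)       # m1, m2, m3
print(reg.intercept_)  # c
```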

gradient descent and cost function

you have input and output; you find the prediction function

finding the best fit line: take the error between each actual data point and the predicted value, square the errors, sum them, and divide by the number of data points n; this gives the mean squared error

mse = (1/n) Σ (actual y − predicted (mx + c))²

mean square error = cost function
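In code, the same cost for a candidate line (made-up points, with m and c chosen slightly off on purpose so the error is non-zero):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])  # these points lie exactly on y = 2x + 3
m, c = 1.8, 2.5                  # a slightly wrong candidate line

predicted = m * x + c
mse = np.mean((y - predicted) ** 2)  # (1/n) * sum of squared errors
print(mse)
```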

gradient descent finds the best fit line in an efficient way, with fewer iterations

cost plotted against m and c

for every value of m and c there will be a cost

start with some m and c, say 0 and 0

adjust m and c to reduce the cost

take mini steps and check the error

move toward the minimum

the error is minimum at one point

use the m and c at that point

how do we take these baby steps?

with fixed-size steps I might miss the global minimum

as I approach the minimum, the step sizes are reduced

at each point calculate slope

learning rate

calculus helps

3Blue1Brown channel for calculus

slope = change in y/change in x

slope at a particular point

small change in y/small change in x

as the change in x shrinks to 0 (and the change in y with it), we converge to the derivative

derivative

dy/dx

partial derivative

the function depends on 2 variables

partial derivative with respect to x, treating y as a constant

the partial derivative of y with respect to x is 0

derivative

partial derivatives with respect to m and c

the partial derivative gives you a direction (a slope)

to take a step in that direction you use the learning rate

use Python to implement this (see the sketch below)
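A minimal gradient descent sketch for the single-variable case described above; the data is made up to lie exactly on y = 2x + 3, so m and c should head towards 2 and 3:

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.01, iterations=10_000):
    """Fit y = m*x + c by gradient descent on the mean squared error."""
    m, c = 0.0, 0.0  # start with m = 0 and c = 0
    n = len(x)
    for _ in range(iterations):
        predicted = m * x + c
        cost = np.mean((y - predicted) ** 2)  # MSE cost for the current m, c
        # Partial derivatives of the cost with respect to m and c
        dm = -(2 / n) * np.sum(x * (y - predicted))
        dc = -(2 / n) * np.sum(y - predicted)
        # Step downhill, scaled by the learning rate
        m -= learning_rate * dm
        c -= learning_rate * dc
    return m, c, cost

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 7, 9, 11, 13], dtype=float)
m, c, cost = gradient_descent(x, y)
print(m, c, cost)  # roughly m ≈ 2, c ≈ 3, cost ≈ 0
```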

Activation function

The purpose of the activation function is to introduce non-linearity into the network

in turn, this allows you to model a response variable (aka target variable, class label, or score) that varies non-linearly with its explanatory variables

non-linear means that the output cannot be reproduced from a linear combination of the inputs (which is not the same as output that renders to a straight line — the word for this is affine).

another way to think of it: without a non-linear activation function in the network, a NN, no matter how many layers it had, would behave just like a single-layer perceptron, because summing these layers would give you just another linear function (see definition just above).

  • Why would we want to eliminate linearity?

If the data we wish to model is non-linear then we need to account for that in our model.

One sentence answer: without non-linearity, a network, no matter how many layers it had, would behave just like a single perceptron (because linear functions added together just give you a linear function).

rectified linear activation functions are extremely successful, and if our goal is just to model/approximate functions, eliminating non-linearity at all steps isn’t necessarily the right answer. With enough linear pieces, you can approximate almost any non-linear function to a high degree of accuracy. I found this a good explanation of why rectified linear units work:

Rectified linear activation functions are non-linear. I’m not sure what your comment has to do with the answer.
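For reference, a small sketch of two common activation functions; the sigmoid is smooth while ReLU is piecewise linear, but both are non-linear overall:

```python
import numpy as np

def sigmoid(z):
    """Smooth squashing of any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: 0 for negative inputs, identity for positive ones."""
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))
print(relu(z))  # the kink at 0 is what breaks linearity
```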

What is Regularization?

In general, regularization means to make things regular or acceptable. This is exactly why we use it for applied machine learning. In the context of machine learning, regularization is the process which regularizes or shrinks the coefficients towards zero. In simple words, regularization discourages learning a more complex or flexible model, to prevent overfitting.

How Does Regularization Work?

The basic idea is to penalize complex models, i.e., to add a complexity term that gives a bigger loss for complex models. To understand it, let’s consider a simple relation for linear regression. Mathematically, it is stated as below:
Y ≈ W_0 + W_1·X_1 + W_2·X_2 + ⋯ + W_P·X_P

Where Y is the learned relation i.e. the value to be predicted.
X_1, X_2, …, X_P are the features deciding the value of Y.
W_1, W_2, …, W_P are the weights attached to the features X_1, X_2, …, X_P respectively.
W_0 represents the bias.

Now, in order to fit a model that accurately predicts the value of Y, we require a loss function and optimized parameters i.e. bias and weights.

The loss function generally used for linear regression is called the residual sum of squares (RSS). According to the above stated linear regression relation, it can be given as:
RSS = Σ_{j=1}^{m} ( Y_j − W_0 − Σ_{i=1}^{P} W_i·X_{ji} )²

We can also think of RSS as the linear regression objective without regularization.

Now, the model will learn by means of this loss function. Based on our training data, it will adjust the weights (coefficients). If our dataset is noisy, the model will face overfitting problems and the estimated coefficients won’t generalize to unseen data.

This is where regularization comes into action. It regularizes these learned estimates towards zero by penalizing the magnitude of coefficients.

But how does it assign a penalty to the coefficients? Let’s explore.
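The section above stops short of naming the penalties, but the two standard choices are the L2 penalty (ridge regression, which adds alpha·ΣW_i² to the RSS) and the L1 penalty (lasso, which adds alpha·Σ|W_i| and can drive some coefficients to exactly zero). A rough sketch with synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic, slightly noisy regression data
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)   # plain RSS objective, no regularization
ridge = Ridge(alpha=1.0).fit(X, y)   # RSS + alpha * sum(W_i^2)
lasso = Lasso(alpha=1.0).fit(X, y)   # RSS + alpha * sum(|W_i|)

print(ols.coef_)
print(ridge.coef_)  # coefficients shrunk towards zero
print(lasso.coef_)  # some coefficients may be exactly zero
```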

Activation function list

loss function list
