# Hard parts of Simple Linear Regression

### Simple Linear Regression

Regression means predicting an output for a new input, based on what was learned from previous data.

Basically, Regression Analysis allows us to discover whether there’s a relationship
between one or more independent variables and a dependent variable (the target). For
example, in a Simple Linear Regression we want to know whether there’s a
relationship between x and y. This is very useful in forecasting (e.g. where the
trend is going) and time series modelling (e.g. temperature levels by year and
whether they show a warming trend).

Here we’ll be dealing with one independent variable and one dependent variable. Later
on we’ll deal with multiple variables and show how they can be used to
predict the target (similar to what we discussed about predicting something based
on several features/attributes).
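In symbols, the one-variable case is commonly written as the equation below. The names $b_0$ (intercept), $b_1$ (slope), and $\varepsilon$ (the error the line does not explain) are notation assumed here for illustration; they do not appear in the text above.

$$
y = b_0 + b_1 x + \varepsilon
$$

Here $x$ is the independent variable and $y$ is the dependent variable (the target). Fitting the model means choosing $b_0$ and $b_1$ from the data.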

For now, let’s look at an example of Simple Linear Regression in which we
analyze salary data (Salary_Data.csv). Here’s the dataset (comma-separated
values; the columns are years of experience and salary):

```
YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
```

Here’s the Python code for fitting Simple Linear Regression to the Training Set:

```
# Importing the libraries
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the Test set results
y_pred = regressor.predict(X_test)
# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
# Visualising the Test set results
plt.scatter(X_test, y_test, color = 'red')
# The fitted line is the same whichever X it is drawn over, so X_train is reused here
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
```

The overall goal here is to create a model that predicts Salary from
Years of Experience. First, we build the model using the Training Set (two-thirds of
the dataset, because test_size is 1/3). Fitting finds the line that is as close as
possible to most of the data points.

After the line is created, we apply that same line to the Test Set (the
remaining one-third of the dataset).
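To make "as close as possible" concrete: scikit-learn's LinearRegression fits by ordinary least squares, meaning it picks the line with the smallest sum of squared vertical distances to the points. The sketch below computes that line by hand for the 18 rows listed earlier, using only the standard library. The helper names `fit_line` and `sum_squared_errors` are made up for illustration, and the sketch fits all rows rather than a training split.

```python
# The 18 rows listed earlier: years of experience and salary.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2,
         3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3]
salary = [39343.0, 46205.0, 37731.0, 43525.0, 39891.0, 56642.0,
          60150.0, 54445.0, 64445.0, 57189.0, 63218.0, 55794.0,
          56957.0, 57081.0, 61111.0, 67938.0, 66029.0, 83088.0]


def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


intercept, slope = fit_line(years, salary)
best = sum_squared_errors(years, salary, intercept, slope)
# Any other line, e.g. one with a slightly steeper slope, has a larger error.
nudged = sum_squared_errors(years, salary, intercept, slope + 500)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print("fitted line beats the nudged line:", best < nudged)
```

The slope can be read as the estimated change in salary per additional year of experience.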

Notice that the line performs well on both the Training Set and the Test Set.
As a result, there’s a good chance that the line (our model) will also perform
well on new data.

Let’s recap what happened. First, we imported the necessary
libraries (pandas for processing data, matplotlib for data visualization). Next,
we imported the dataset and assigned Years of Experience to X (the independent
variable) and Salary to y (the target). We then split the dataset into a Training
Set (2⁄3) and a Test Set (1⁄3).
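The idea behind the split can be sketched in a few lines: shuffle the rows with a fixed seed, then cut off a fraction for testing. This is only an illustration of the concept, not scikit-learn's actual implementation. The helper `split_rows` is hypothetical, and its shuffle will not reproduce the rows that train_test_split selects for random_state = 0.

```python
import random


def split_rows(rows, test_fraction, seed):
    """Shuffle a copy of rows, then cut it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# First six rows of the salary dataset: (years of experience, salary).
rows = [(1.1, 39343.0), (1.3, 46205.0), (1.5, 37731.0),
        (2.0, 43525.0), (2.2, 39891.0), (2.9, 56642.0)]

train, test = split_rows(rows, test_fraction=1/3, seed=0)
print(len(train), len(test))  # 4 2
```

Shuffling before cutting matters here because the file is sorted by years of experience; cutting without shuffling would leave only the most experienced people in the Test Set.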

Then, we applied the Linear Regression model and fitted a line (with the help of
scikit-learn, which is a free machine learning library for the Python
programming language). This is accomplished through the following lines of code:

```
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
```

After learning from the Training Set (X_train and y_train), we then apply that
regressor to the Test Set (X_test) and compare the results using data
visualization (matplotlib).
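The plots give a visual comparison; a numeric one is also possible. The sketch below is an addition to the original script rather than part of it. It inlines the 18 rows listed earlier so that it runs without the CSV file (an assumption for convenience) and reports two common scores from sklearn.metrics on the Test Set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# The 18 rows listed earlier, inlined so the sketch does not need the CSV.
years = np.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2,
                  3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3])
salary = np.array([39343.0, 46205.0, 37731.0, 43525.0, 39891.0, 56642.0,
                   60150.0, 54445.0, 64445.0, 57189.0, 63218.0, 55794.0,
                   56957.0, 57081.0, 61111.0, 67938.0, 66029.0, 83088.0])

X = years.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = salary

# Same split settings as the main script.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# R^2 is at most 1.0; values closer to 1.0 mean the line explains more
# of the variation in the held-out salaries.
print("R^2 on the Test Set:", r2_score(y_test, y_pred))
# Average size of the prediction error, in the same units as Salary.
print("Mean absolute error:", mean_absolute_error(y_test, y_pred))
# The fitted line itself: Salary is roughly intercept + slope * years.
print("Slope:", regressor.coef_[0], "Intercept:", regressor.intercept_)
```

With so few rows in the Test Set, these scores can shift noticeably with a different random_state, so they are best read as a rough check.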

It’s a straightforward approach. Our model learns from the Training Set, and
we then apply it to the Test Set to see whether the model is good enough. This is
the essential workflow behind Simple Linear Regression.