ML | Supervised Learning Regression Model

Chandima Jayamina
4 min read · Aug 16, 2024

Regression is a fundamental concept in machine learning, particularly in supervised learning. It involves predicting a continuous output based on input data. Unlike classification, where the goal is to categorize inputs into discrete labels, regression focuses on estimating numerical values.

1. Linear regression with a single feature

Let's try to predict the price of a piece of land. The dataset has the area in one column and the price for that area in another, as shown below.

So let's create a machine learning model that estimates the price for a given area. 🤓

Step 1: Load essential libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
  • pandas: Used to work with DataFrames, for example loading CSV data into a DataFrame. It is one of the most popular libraries for loading data from many kinds of sources.
  • NumPy: Used for numerical calculations.
  • matplotlib: Used for visual representations.

Step 2: Load data

df = pd.read_csv('./datafiles/homeprices.csv')
df.head()

Here we read the CSV file and load the data into a DataFrame. df.head() returns the first few records (five by default) so we can get an idea of the data.
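If the CSV file is not at hand, the loading step can be tried on an in-memory string instead. This is a minimal sketch, assuming a file with area and price columns. The values and the name sample_df are made up for illustration and are not the article's dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for homeprices.csv (values are invented)
csv_text = """area,price
1000,200000
1500,290000
2000,410000
2500,500000
3000,590000
3500,710000
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head())   # first five rows by default
print(sample_df.shape)    # (rows, columns)
print(sample_df.dtypes)   # column types inferred by pandas
```

Checking shape and dtypes right after loading is a cheap way to catch a wrong delimiter or a numeric column that was read as text.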

Step 3: Understand the data through visualization

%matplotlib inline
plt.scatter(df.area, df.price, color='red', marker='*')
plt.xlabel('Area(sqr.ft)')
plt.ylabel('Price($)')

Here we plot area against price, so we can see how the data points are positioned in the graph, as shown below.

Step 4: Create the machine learning model

from sklearn import linear_model
linear_reg = linear_model.LinearRegression()
linear_reg.fit(df[['area']], df['price'])

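Here we use the scikit-learn library to create the linear model and train it with the fit function. scikit-learn expects the features (independent variables) as a 2D array of shape (n_samples, n_features) and the label (dependent variable) as a 1D array, which is why area is selected with double brackets (df[['area']]) and price with single brackets (df['price']).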
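The difference between the 2D features and the 1D label can be checked directly through the shape attribute. A small sketch, using invented values and a hypothetical sample_df in place of the real dataset:

```python
import pandas as pd
from sklearn import linear_model

# Hypothetical data (values invented for illustration)
sample_df = pd.DataFrame({
    'area': [1000, 1500, 2000, 2500, 3000],
    'price': [200000, 290000, 410000, 500000, 590000],
})

X = sample_df[['area']]   # double brackets -> DataFrame, 2D
y = sample_df['price']    # single brackets -> Series, 1D

print(X.shape)  # (5, 1)
print(y.shape)  # (5,)

model = linear_model.LinearRegression()
model.fit(X, y)

# Passing a 1D Series as the features is rejected with a ValueError
rejected = False
try:
    linear_model.LinearRegression().fit(sample_df['area'], y)
except ValueError as err:
    rejected = True
    print('1D features rejected:', type(err).__name__)
```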

Step 5: Predict using the model

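# Pass a DataFrame with the same column name used in training (avoids a feature-name warning)
linear_reg.predict(pd.DataFrame({'area': [3200]}))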

Here we use the trained model to predict the price for an area of 3200 sq. ft.

According to the scikit-learn documentation, LinearRegression fits the line that minimizes the residual sum of squares between the observed values in the dataset and the values predicted by the line. Once the best-fitting line is found, it is used to predict new values. The line has the form Y = Mx + B, where M is the coefficient (slope) and B is the intercept. We can find the M and B of our model with:

print(f'Coefficient: {linear_reg.coef_[0]}, Intercept: {linear_reg.intercept_}')
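To see that the model's prediction is just Y = Mx + B, the value can be recomputed by hand from coef_ and intercept_. The sketch below uses invented data and a hypothetical sample_df. As an extra cross-check it compares the result with NumPy's own least-squares line fit, np.polyfit.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Hypothetical data (values invented for illustration)
sample_df = pd.DataFrame({
    'area': [1000, 1500, 2000, 2500, 3000],
    'price': [200000, 290000, 410000, 500000, 590000],
})

model = linear_model.LinearRegression()
model.fit(sample_df[['area']], sample_df['price'])

m = model.coef_[0]      # M: slope of the fitted line
b = model.intercept_    # B: value of the line at area = 0

area = 3200
manual = m * area + b
predicted = model.predict(pd.DataFrame({'area': [area]}))[0]
print(manual, predicted)  # both values agree up to floating-point error

# Cross-check: a degree-1 polynomial fit returns [slope, intercept]
m_np, b_np = np.polyfit(sample_df['area'], sample_df['price'], deg=1)
print(m_np, b_np)
```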

This is what our regression line looks like. Linear regression is one of the simplest supervised learning techniques.
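A plot like this can be produced by drawing the model's predictions over the scatter plot. This is a sketch with invented values and a hypothetical sample_df, not the exact code behind the article's figure.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model

# Hypothetical data (values invented for illustration)
sample_df = pd.DataFrame({
    'area': [1000, 1500, 2000, 2500, 3000],
    'price': [200000, 290000, 410000, 500000, 590000],
})

model = linear_model.LinearRegression()
model.fit(sample_df[['area']], sample_df['price'])

fig, ax = plt.subplots()
# Original data points
ax.scatter(sample_df['area'], sample_df['price'], color='red', marker='*', label='data')
# Fitted line: the model's prediction at each observed area
ax.plot(sample_df['area'], model.predict(sample_df[['area']]), color='blue', label='fitted line')
ax.set_xlabel('Area (sq. ft)')
ax.set_ylabel('Price ($)')
ax.legend()
```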

The code and the dataset are available at:

2. Linear regression with multiple features

Here we fit a regression model using multiple independent variables (features). Let's look at a sample dataset.

In this dataset, area, bedrooms, and age are the independent variables (features), and price is the dependent variable. The dataset contains some null values, which we need to handle before passing the data to the model for training. This step is part of data preprocessing, which I will discuss in more detail in another blog.

The regression equation should look similar to this. Let's create the model.
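With three features, the straight-line form Y = Mx + B generalizes to one coefficient per feature plus a single intercept. Written out for this dataset's columns (the symbols m_1, m_2, m_3, and b are notation assumed here for illustration):

```latex
\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b
```

After training, m_1, m_2, and m_3 correspond to the entries of coef_ (in the order of the feature columns) and b corresponds to intercept_.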

For data preprocessing, I use the median to replace the null values.

import math

# Fill null values in bedrooms with the median, rounded down to a whole number
df.bedrooms = df.bedrooms.fillna(math.floor(df.bedrooms.median()))
df.head()
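The effect of the median fill is easy to check on a small table. In this sketch the values and the name sample_df are invented. They are chosen so that the median is 3.5, which shows why rounding down with math.floor is useful for a count such as bedrooms.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data with one missing bedrooms value (values invented)
sample_df = pd.DataFrame({
    'area': [2600, 3000, 3200, 3600, 4000],
    'bedrooms': [3.0, 4.0, np.nan, 3.0, 5.0],
    'age': [20, 15, 18, 30, 8],
    'price': [550000, 565000, 610000, 595000, 760000],
})

# Count missing values per column before filling
missing_before = sample_df.isnull().sum()
print(missing_before)

# median() skips NaN by default: median of [3, 3, 4, 5] is 3.5, floored to 3
median_bedrooms = math.floor(sample_df['bedrooms'].median())
sample_df['bedrooms'] = sample_df['bedrooms'].fillna(median_bedrooms)

print(sample_df)
```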

Model creation is the same as in the previous section.

from sklearn import linear_model
linear_reg = linear_model.LinearRegression()
linear_reg.fit(df[['area','bedrooms', 'age']], df['price'])
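Once a multi-feature model is trained, it exposes one coefficient per feature, and a prediction is the intercept plus the weighted sum of the feature values. A sketch under the assumption of a fully cleaned dataset; the values, sample_df, and new_home are invented for illustration.

```python
import pandas as pd
from sklearn import linear_model

# Hypothetical cleaned data (values invented, no nulls)
sample_df = pd.DataFrame({
    'area': [2600, 3000, 3200, 3600, 4000, 4100],
    'bedrooms': [3, 4, 3, 3, 5, 6],
    'age': [20, 15, 18, 30, 8, 8],
    'price': [550000, 565000, 610000, 595000, 760000, 810000],
})

features = ['area', 'bedrooms', 'age']
model = linear_model.LinearRegression()
model.fit(sample_df[features], sample_df['price'])

# One coefficient per feature, in the same order as the columns
for name, coef in zip(features, model.coef_):
    print(f'{name}: {coef:.2f}')
print(f'intercept: {model.intercept_:.2f}')

# Predict for a new home; column names and order match the training data
new_home = pd.DataFrame({'area': [3000], 'bedrooms': [3], 'age': [40]})
predicted = model.predict(new_home)[0]

# Same value by hand: intercept + sum of coefficient * feature value
manual = model.intercept_ + sum(c * v for c, v in zip(model.coef_, new_home.iloc[0]))
print(predicted, manual)
```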

The code and the dataset are available at:

https://github.com/ChandimaJayamina/ML/blob/main/1.1%20Linear%20regression%20on%20Multiple%20features.ipynb

If you have a simple project that could benefit from a machine learning model, you can use these models. If you need more accurate price predictions in real projects, you may need more powerful models such as neural networks, which I will discuss in later ML blogs. If you are a beginner to ML, I hope these blogs are helpful. More exciting blogs are coming soon, so keep in touch. 😈

References : https://scikit-learn.org/stable/modules/linear_model.html, https://www.youtube.com/@codebasics

Written by Chandima Jayamina

Aspiring Data Scientist with a DevOps background. Exploring machine learning, data analysis, and predictive modeling. Sharing my journey in tech.
