Week 2 project: Comparing linear regression and random forest regression models for Airbnb booking price prediction.

Introduction

In statistics, **linear regression** is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as the dependent and independent variables, respectively). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.

Linear regression can be used if the goal is:

- To reduce error in prediction or forecasting on smaller data sets
- To have a simple model that is straightforward to interpret
- To explain variation in the response variable that can be attributed to variation in the explanatory variables
- To quantify the strength of the relationship between the response and the explanatory variables
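
To make these goals concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The data are made up for illustration (they are not taken from the Airbnb dataset), and the function name `fit_simple_linear_regression` is my own. The slope gives the interpretable effect of the explanatory variable, and R² is the share of variation in the response that the line explains.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation / total variation).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up illustrative data: bedrooms vs. nightly price.
bedrooms = [1, 2, 3, 4, 5]
price = [55, 75, 115, 135, 170]

slope, intercept, r_squared = fit_simple_linear_regression(bedrooms, price)
print(slope, intercept)      # 29.0 23.0
print(round(r_squared, 3))   # 0.989
```

Here each extra bedroom adds about 29 to the predicted price, and the line explains roughly 99% of the variation in these toy prices. For multiple explanatory variables, a library implementation such as scikit-learn's `LinearRegression` would normally be used instead of hand-written formulas.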

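**Random forest regression** Random forest is an ensemble algorithm that builds many decision trees, each trained on a random sample of the data and considering only a random subset of the variables at each split. When the data set is large and/or there are many variables, a single tree struggles to take every variable into account reliably, so the forest combines the outputs of many different trees. For classification it can also give the probability that a data point belongs to a certain group; for regression it predicts the average of the values predicted by the individual trees.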

Random forest regression can be used when the goal is:

- To capture complex non-linear relationships
- To provide feature importance scores
- To capture intricate patterns
- To provide more stable and robust predictions when dealing with larger data sets
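
The sketch below illustrates the idea behind random forest regression from scratch, under some simplifying assumptions: made-up listings with two features (bedrooms and distance to the centre in km), shallow trees, and squared error as the split criterion. All names (`fit_forest`, `predict_forest`, `max_features`, `max_depth`) are illustrative choices of mine, not the code used in the project. Each tree is grown on a bootstrap sample, each split looks at a random subset of the features, and the forest prediction is the average over the trees.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (error of predicting the mean)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, feature_ids):
    """Return the (feature, threshold) with the lowest child error, or None."""
    best = None
    for f in feature_ids:
        # Skipping the smallest value keeps both children non-empty.
        for threshold in sorted({row[f] for row in rows})[1:]:
            left = [t for row, t in zip(rows, targets) if row[f] < threshold]
            right = [t for row, t in zip(rows, targets) if row[f] >= threshold]
            error = sse(left) + sse(right)
            if best is None or error < best[0]:
                best = (error, f, threshold)
    return None if best is None else (best[1], best[2])


def build_tree(rows, targets, rng, max_features, max_depth, depth=0):
    """Grow a regression tree. Leaves are floats; inner nodes are tuples."""
    split = None
    if depth < max_depth:
        # Each split only considers a random subset of the features.
        feature_ids = rng.sample(range(len(rows[0])), max_features)
        split = best_split(rows, targets, feature_ids)
    if split is None:
        return sum(targets) / len(targets)  # leaf: mean target of its rows
    f, threshold = split
    children = []
    for keep_left in (True, False):
        idx = [i for i, row in enumerate(rows) if (row[f] < threshold) == keep_left]
        children.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                   rng, max_features, max_depth, depth + 1))
    return (f, threshold, children[0], children[1])


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node


def fit_forest(rows, targets, n_trees=50, max_features=1, max_depth=3, seed=0):
    """Train n_trees trees, each on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                 rng, max_features, max_depth))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Made-up illustrative listings: (bedrooms, distance to centre in km) -> price.
rows = [(1, 8.0), (1, 2.0), (2, 6.0), (2, 1.0), (3, 7.0), (3, 1.5), (4, 5.0), (4, 0.5)]
prices = [40, 75, 70, 130, 95, 180, 130, 260]

forest = fit_forest(rows, prices, n_trees=50, max_features=1, seed=42)
estimate = predict_forest(forest, (3, 1.0))
print(round(estimate, 1))
```

Because every leaf stores an average of training prices, the forest can only predict values inside the range it has seen. For real work, a library implementation such as scikit-learn's `RandomForestRegressor`, which also exposes feature importance scores, is the usual choice.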

To choose between them, I tested the two models on the same dataset. Based on the output, random forest regression was the better fit, since it had the lower mean squared error (MSE).
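
The selection step can be sketched as follows. The prices and predictions here are invented placeholders, not the results of the project; they only show how MSE is computed and how the model with the lower value is picked, assuming both models are scored on the same listings.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Placeholder numbers for illustration only (NOT the project's results).
actual_prices = [100, 150, 200, 250]
predictions = {
    "linear_regression": [120, 140, 210, 220],
    "random_forest": [105, 155, 190, 245],
}

# Score every model on the same listings, then keep the lowest MSE.
scores = {name: mean_squared_error(actual_prices, preds)
          for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)

print(scores)      # {'linear_regression': 375.0, 'random_forest': 43.75}
print(best_model)  # random_forest
```

Since MSE squares each error, a few large misses are penalised more heavily than many small ones.
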
Here's a link to the project: Airbnbs Price Prediction.