Introduction
In statistics, **linear regression** is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as the dependent and independent variables, respectively). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.
Linear regression can be used if the goal is:
- To reduce error in prediction or forecasting on smaller data sets
- To keep the model simple and straightforward to interpret
- To explain variation in the response variable that can be attributed to variation in the explanatory variables
- To quantify the strength of the relationship between the response and the explanatory variables
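As a rough illustration of the last two goals, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. The `sizes` and `prices` values are made up for the example and are not taken from the project; the helper names are hypothetical as well.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for a single explanatory variable."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and co-deviation of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variation in y that the fitted line explains."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up data: listing size (rooms) against nightly price
sizes = [1, 2, 3, 4, 5]
prices = [52, 61, 68, 81, 88]

slope, intercept = fit_simple_linear_regression(sizes, prices)
r2 = r_squared(sizes, prices, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
# prints: slope=9.20 intercept=42.40 R^2=0.991
```

The slope is the easy-to-interpret part (price change per extra room in this toy data), and R² is one common way to express how much of the variation in the response the explanatory variable accounts for.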
**Random forest regression**
Random forest is an ensemble algorithm that builds many decision trees, each of which splits the data points into groups with similar values. When the data set is large and/or there are many variables, a single tree cannot take every variable into account, so each tree is trained on a random sample of the data and considers a random subset of the variables. For classification, the forest can also give the chance that a data point belongs in a certain group; for regression, it averages the predictions of its trees.
Random forest regression can be used when the goal is:
- To capture complex non-linear relationships
- To provide feature importance scores
- To capture intricate patterns
- To provide more stable and robust predictions when dealing with larger data sets
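To make the feature importance point concrete, here is a small sketch that assumes scikit-learn's `RandomForestRegressor` (the post does not say which library was used). The data is synthetic: the target depends non-linearly on the first two columns, and the third column is pure noise.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration, not the project's data)
rng = np.random.default_rng(0)
n_samples = 500
X = rng.uniform(-3, 3, size=(n_samples, 3))

# Target depends non-linearly on columns 0 and 1; column 2 is unrelated noise
y = 3 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, size=n_samples)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances, one per column, summing to 1
importances = forest.feature_importances_
for name, score in zip(["feature_0", "feature_1", "noise"], importances):
    print(f"{name}: {score:.3f}")
```

In this setup the noise column should receive a much smaller score than the two informative columns, which is the kind of signal these scores are meant to give.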
To make the decision, I tested the two models on the same dataset. Judging from the output, random forest regression was the better fit, since it had the lower mean squared error (MSE).
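A comparison of this kind could look roughly like the sketch below. It assumes scikit-learn and uses synthetic, deliberately non-linear data, so the numbers it prints say nothing about the actual Airbnb results; it only shows the shape of the procedure (same split, same metric, lower MSE wins).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (an assumption for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(600, 2))
y = 20 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 1.0, size=600)

# Both models see exactly the same train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(
        n_estimators=200, random_state=42
    ),
}

# Fit each model and record its mean squared error on the held-out data
mse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse_by_model[name] = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: MSE = {mse_by_model[name]:.2f}")

# The model with the lowest test MSE is the better fit by this criterion
best_model = min(mse_by_model, key=mse_by_model.get)
print("Best model:", best_model)
```

Scoring on a held-out test set rather than on the training data matters here, because a random forest can fit its training data very closely and would otherwise look better than it is.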
Here's a link to the project: Airbnbs Price Prediction.