Selecting the Best Model
We have discussed the implementation of several regression models, and their performance varies with the model used for the predictions. One way to select the best-performing regression model is to compare R squared (the coefficient of determination) on the test set: the closer the score is to 1, the better the model's predictions match the held-out data.

We built and trained different models on the same data and assessed their performance. The procedure is the same for every model. Import the libraries, load the data, and preprocess it wherever required (label encoding, one-hot encoding, etc.). Check for missing values and apply feature scaling if the model needs it. Initialize the model, train it, compute y_pred for the test set, and then evaluate it against y_test with the following code:

    from sklearn.metrics import r2_score
    r2_score(y_test, y_pred)

-------------------------------------------------------------

Random Forest Regression

Importing the libraries

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

Importing the dataset

    dataset = pd.read_csv('Data.csv')
    # All columns except the last are features; the last column is the target
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, -1].values

Splitting the dataset into the Training set and Test set

    from sklearn.model_selection import train_test_split
    # Hold out 20% of the rows for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Training the Random Forest Regression model on the Training set

    from sklearn.ensemble import RandomForestRegressor
    regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
    regressor.fit(X_train, y_train)

Predicting the Test set results

    y_pred = regressor.predict(X_test)
    np.set_printoptions(precision=2)
    # Print predictions (left column) next to the actual values (right column)
    print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

Evaluating the Model Performance

    from sklearn.metrics import r2_score
    r2_score(y_test, y_pred)
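The same evaluation can be repeated for several models in one loop, which makes the comparison easier to read. The following is a minimal sketch, not the exact code used above. It assumes a synthetic dataset from make_regression as a stand-in for Data.csv, and the three candidate models and their display names are illustrative choices rather than a list taken from this tutorial.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data stands in for Data.csv so the sketch runs on its own.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)

# Same split settings as in the Random Forest example
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumption: an illustrative set of candidate models; swap in your own.
candidates = {
    "Multiple Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(n_estimators=10, random_state=0),
}

# Train each model on the training set and score it on the same test set
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Report the models from highest to lowest R squared
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R^2 = {score:.3f}")

best_model = max(scores, key=scores.get)
print("Best model on this split:", best_model)
```

Because every model is scored on the same test split, the R squared values are directly comparable. The ranking can change with a different split or dataset, so treat the result as specific to the data at hand.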