 ## Selecting the Best Model

We have discussed the implementation of several models, and performance varies depending on which model is used for the predictions. There is a way to select the best-performing regression model: the R² score. We built and trained the different models on the same data and assessed their performance. For each model the workflow is the same: import the libraries, load the data, preprocess where required (`LabelEncoder`, `OneHotEncoder`, etc.), check for missing values, apply feature scaling if needed, initialize and train the model, obtain `y_pred` and `y_test`, and then evaluate with the following code:

```python
from sklearn.metrics import r2_score

r2_score(y_test, y_pred)
```

### Random Forest Regression

Importing the libraries:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

Importing the dataset:

```python
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
```

Splitting the dataset into the Training set and Test set:

```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```

Training the Random Forest Regression model on the training set:

```python
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(X_train, y_train)
```

Predicting the Test set results:

```python
y_pred = regressor.predict(X_test)
np.set_printoptions(precision=2)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))
```

Evaluating the model performance:

```python
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)
```

## Artificial Neural Network

## Decision Tree
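Decision trees support both problem types through two scikit-learn estimators. A minimal sketch, assuming a synthetic one-feature dataset (the data and the threshold below are illustrative, not from the original text):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))  # synthetic single feature

# Regression: fit a continuous target (y = 2x)
y_reg = 2.0 * X[:, 0]
reg = DecisionTreeRegressor(random_state=0).fit(X, y_reg)

# Classification: fit a binary label (1 when x > 5)
y_clf = (X[:, 0] > 5).astype(int)
clf = DecisionTreeClassifier(random_state=0).fit(X, y_clf)

print(reg.predict([[3.0]]), clf.predict([[3.0]]))
```

The same tree-growing procedure underlies both estimators; the regressor minimizes squared error within each leaf, while the classifier minimizes an impurity measure such as Gini.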

Decision trees can be used for both regression and classification tasks.

## Cleaning Data

Data cleaning is a first and foremost task in data science. Data collected from web scraping is seldom free from noise, so before analyzing it we have to get rid of the unnecessary records. First, concatenate the individual CSV files into a single DataFrame and save it:

```python
import pandas as pd
import os

files = [file for file in os.listdir('path')]

all_data = pd.DataFrame()
for file in files:
    df = pd.read_csv("path" + file)
    all_data = pd.concat([all_data, df])

all_data.head()
all_data.to_csv("all_data", index=False)
```

Before deriving new columns, inspect and drop the noisy rows: rows that are entirely `NaN`, and stray repeats of the header row (where the `Order Date` column literally contains the text "Order Date"). Only then can the month be extracted and safely cast to an integer:

```python
# Inspect rows that contain any NaN values
nan_df = all_data[all_data.isna().any(axis=1)]

# Drop rows where every column is NaN
all_data = all_data.dropna(how='all')

# Remove repeated header rows, identified by "Or" at the start of 'Order Date'
all_data = all_data[all_data['Order Date'].str[0:2] != "Or"]

# The month is the first two characters of the order date
all_data["Month"] = all_data["Order Date"].str[0:2].astype('int32')
all_data.head()
```
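To see these cleaning steps end to end, here is a small self-contained illustration on a toy frame; the column names mirror the sales data assumed above, and the values are invented:

```python
import pandas as pd

# Toy data: a valid row, a repeated header row, an all-empty row, a valid row
all_data = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "Order Date", None, "12/01/19 14:30"],
    "Sales": [23.90, None, None, 11.95],
})

all_data = all_data.dropna(how="all")                         # drops the all-NaN row
all_data = all_data[all_data["Order Date"].str[0:2] != "Or"]  # drops the header repeat
all_data["Month"] = all_data["Order Date"].str[0:2].astype("int32")

print(all_data["Month"].tolist())  # prints [4, 12]
```

Note that the cast to `int32` only succeeds because the `"Order Date"` header rows and the empty rows are removed first; run in the original order, `astype` would fail on the string `"Or"` and on `NaN`.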