Selecting the Best Model

We have discussed how to implement several regression models, and their performance varies depending on the model used for the predictions. One way to select the best-performing regression model is to compare R squared (the coefficient of determination) across models on the same test set: the closer the score is to 1, the better the model fits the test data.

We built and trained different models on the same data and assessed their performance. The steps are the same for every model: import the libraries, load the data, preprocess it wherever required (LabelEncoder, OneHotEncoder, etc.), check for missing values, and apply feature scaling if required. Then initialize the model, train it, compute y_pred for the test set, and compare it with y_test using the following code:

    from sklearn.metrics import r2_score
    r2_score(y_test, y_pred)

-------------------------------------------------------------

Random Forest Regression

Importing the libraries

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

Importing the dataset

    dataset = pd.read_csv('Data.csv')
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, -1].values

Splitting the dataset into the Training set and Test set

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Training the Random Forest Regression model on the Training set

    from sklearn.ensemble import RandomForestRegressor
    regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
    regressor.fit(X_train, y_train)

Predicting the Test set results

    y_pred = regressor.predict(X_test)
    np.set_printoptions(precision=2)
    # Print predictions and actual values side by side
    print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

Evaluating the Model Performance

    from sklearn.metrics import r2_score
    r2_score(y_test, y_pred)
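To make the comparison concrete, here is a minimal sketch of what the R squared score measures and how it can be used to pick a model. It is written in plain Python rather than with scikit-learn, and the helper name r2, the model names, and all numbers are hypothetical values chosen only for illustration.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot would be zero).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot


# Hypothetical test targets and predictions from three already-trained models.
y_test = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "linear": [2.5, 5.5, 6.5, 9.5],
    "tree": [3.0, 4.0, 8.0, 9.0],
    "forest": [3.0, 5.0, 7.0, 8.5],
}

# Score every model on the same test set and keep the highest R squared.
scores = {name: r2(y_test, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
print("best model:", best)
```

In practice each entry in predictions would come from model.predict(X_test), and r2_score from sklearn.metrics would replace the hand-written helper.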

Artificial Neural Network

Importing the libraries, processing the data, and training the model involve the following steps:

Part 1 - Data Preprocessing
  Importing the dataset
  Encoding categorical data
  Label Encoding the "Gender" column
  One Hot Encoding the "Geography" column
  Splitting the dataset into the Training set and Test set
  Feature Scaling
Part 2 - Building the ANN
  Initializing the ANN
  Adding the input layer and the first hidden layer
  Adding the second hidden layer
  Adding the output layer
Part 3 - Training the ANN
  Compiling the ANN
  Training the ANN on the Training set
Part 4 - Making the predictions and evaluating the model
  Predicting the Test set results
  Predicting the result of a single observation
  Making the confusion matrix

Now we take up an example with a Churn_Modelling.csv file uploaded to our folder:

Importing the libraries

    import numpy as np
    import pandas as pd
    import tensorflow as tf

Part 1 - Data Preprocessing

Importing the dataset

    dataset = pd.read_csv('Churn_Modelling.csv')
    X = dataset.iloc[:, 3:-1].values
    y = dataset.iloc[:, -1].values

Encoding categorical data

Label Encoding the "Gender" column

    from sklearn.preprocessing import LabelEncoder
    le = LabelEncoder()
    X[:, 2] = le.fit_transform(X[:, 2])

One Hot Encoding the "Geography" column

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder
    ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
    X = np.array(ct.fit_transform(X))

Splitting the dataset into the Training set and Test set

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Feature Scaling

    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)  # fit the scaler on the training set only
    X_test = sc.transform(X_test)        # reuse the same scaling for the test set

Part 2 - Building the ANN

Initializing the ANN

    ann = tf.keras.models.Sequential()

Adding the input layer and the first hidden layer

    ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

Adding the second hidden layer

    ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

Adding the output layer

    ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

Part 3 - Training the ANN

Compiling the ANN

    ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

Training the ANN on the Training set

    ann.fit(X_train, y_train, batch_size = 32, epochs = 100)

Part 4 - Making the predictions and evaluating the model

Predicting the Test set results

    y_pred = ann.predict(X_test)
    y_pred = (y_pred > 0.5)  # turn predicted probabilities into True/False labels
    print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

Predicting the result of a single observation

    # The observation is passed as a 2D array and scaled with the same scaler as the training data
    print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)

Making the confusion matrix

    from sklearn.metrics import confusion_matrix, accuracy_score
    cm = confusion_matrix(y_test, y_pred)
    print(cm)
    accuracy_score(y_test, y_pred)
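The last evaluation step can be illustrated without any libraries. The sketch below thresholds a few sigmoid outputs at 0.5 and counts the four outcomes by hand, using the same [[TN, FP], [FN, TP]] layout that confusion_matrix returns for 0/1 labels. The function names confusion_counts and accuracy, and the sample probabilities and labels, are made up for illustration and are not taken from the churn dataset.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels encoded as 0 and 1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1      # churner correctly flagged
        elif t == 0 and p == 0:
            tn += 1      # non-churner correctly left alone
        elif t == 0 and p == 1:
            fp += 1      # false alarm
        else:
            fn += 1      # missed churner
    return [[tn, fp], [fn, tp]]


def accuracy(cm):
    """Share of correct predictions: (TN + TP) / total."""
    (tn, fp), (fn, tp) = cm
    return (tn + tp) / (tn + fp + fn + tp)


# Hypothetical sigmoid outputs and true labels for six customers.
probabilities = [0.10, 0.80, 0.40, 0.65, 0.30, 0.55]
y_true = [0, 1, 1, 1, 0, 0]

# Same 0.5 threshold as in the ANN example above.
y_pred = [int(p > 0.5) for p in probabilities]

cm = confusion_counts(y_true, y_pred)
print(cm)
print(round(accuracy(cm), 4))
```

The diagonal of the matrix holds the correct predictions, so accuracy is the diagonal sum divided by the number of observations.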

Cleaning Data

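Data cleaning is the first and foremost task in data science. Data collected by web scraping is seldom free from noise, so before we start analyzing it we have to get rid of the unnecessary data. In the example below, empty rows and repeated header rows are removed before the month is converted to an integer, because neither a missing value nor the text "Or" can be converted.

    import os
    import pandas as pd

    # List every file in the data folder ('path' is a placeholder)
    files = [file for file in os.listdir('path')]

    # Merge all files into a single DataFrame
    all_data = pd.DataFrame()
    for file in files:
        df = pd.read_csv(os.path.join('path', file))
        all_data = pd.concat([all_data, df])
    all_data.head()

    # Save the merged data without the index column
    all_data.to_csv("all_data", index=False)

    # Inspect the rows that contain missing values, then drop rows that are entirely empty
    nan_df = all_data[all_data.isna().any(axis=1)]
    all_data = all_data.dropna(how='all')

    # Remove repeated header rows, whose "Order Date" value starts with "Or"
    all_data = all_data[all_data['Order Date'].str[0:2] != "Or"]

    # Extract the month from the first two characters of "Order Date"
    all_data["Month"] = all_data["Order Date"].str[0:2]
    all_data["Month"] = all_data["Month"].astype('int32')
    all_data.head()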
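The cleaning steps can be tried on a tiny in-memory table before running them on real files. The following is a sketch under stated assumptions: the rows in raw are invented sample data, the "Order ID" column is assumed, and the function name clean_orders is hypothetical. Only the "Order Date" column and the month-in-the-first-two-characters convention come from the example above.

```python
import pandas as pd

# Invented sample standing in for the merged CSV files: one empty row
# and one repeated header row sit between two real orders.
raw = pd.DataFrame({
    "Order ID": ["1001", None, "Order ID", "1002"],
    "Order Date": ["04/19/19 08:46", None, "Order Date", "12/07/19 22:30"],
})


def clean_orders(df):
    """Drop empty rows and repeated header rows, then add an integer Month column."""
    df = df.dropna(how="all")                           # remove rows where every value is missing
    df = df[df["Order Date"].str[0:2] != "Or"].copy()   # remove rows that repeat the header
    df["Month"] = df["Order Date"].str[0:2].astype("int32")
    return df


cleaned = clean_orders(raw)
print(cleaned)
```

Calling .copy() after filtering avoids pandas warning about assigning a new column to a slice of the original DataFrame.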

Deep learning course material

Deep Learning A-Z™: Practice Datasets

Section 1. Deep Learning A-Z (Folder Structure. Updated 20171021)
Google Colab file with instructions
Updated code

Part 1: Artificial Neural Networks (ANN)
Datasets & Templates: Artificial-Neural-Networks
Additional Reading:
Yann LeCun et al., 1998, Efficient BackProp
Xavier Glorot et al., 2011, Deep sparse rectifier neural networks
CrossValidated, 2015, A list of cost functions used in neural networks, alongside applications
Andrew Trask, 2015, A Neural Network in 13 lines of Python (Part 2 – Gradient Descent)
Michael Nielsen, 2015, Neural Networks and Deep Learning

Part 2: Convolutional Neural Networks (CNN)
Datasets & Templates: Convolutional-Neural-Networks
Additional Reading:
Yann LeCun et al., 1998, Gradient-Based Learning Applied to Document Recognition
Jianxin Wu, 2017, Introduction to Convolutional Neural Networks
C.-C. Jay Kuo, 2016, Understanding Convolutional Neural Networks with A Mathematical Model
Kaiming He et al., 2015, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Dominik Scherer et al., 2010, Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition
Adit Deshpande, 2016, The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3)
Rob DiPietro, 2016, A Friendly Introduction to Cross-Entropy Loss
Peter Roelants, 2016, How to implement a neural network Intermezzo 2

Part 3: Recurrent Neural Networks (RNN)
Datasets & Templates: Recurrent-Neural-Networks, Homework-Challenge
Additional Reading:
Oscar Sharp & Benjamin, 2016, Sunspring
Sepp (Josef) Hochreiter, 1991, Untersuchungen zu dynamischen neuronalen Netzen
Yoshua Bengio, 1994, Learning Long-Term Dependencies with Gradient Descent is Difficult
Razvan Pascanu, 2013, On the difficulty of training recurrent neural networks
Sepp Hochreiter & Jurgen Schmidhuber, 1997, Long Short-Term Memory
Christopher Olah, 2015, Understanding LSTM Networks
Shi Yan, 2016, Understanding LSTM and its diagrams
Andrej Karpathy, 2015, The Unreasonable Effectiveness of Recurrent Neural Networks
Andrej Karpathy, 2015, Visualizing and Understanding Recurrent Networks
Klaus Greff, 2015, LSTM: A Search Space Odyssey
Xavier Glorot, 2011, Deep sparse rectifier neural networks

Part 4: Self Organizing Maps (SOM)
Datasets & Templates: Self-Organizing-Maps, Mega-Case-Study
Additional Reading:
Teuvo Kohonen, 1990, The Self-Organizing Map
Mat Buckland, 2004?, Kohonen's Self Organizing Feature Maps
Nadieh Bremer, 2003, SOM – Creating hexagonal heatmaps with D3.js

Part 5: Boltzmann Machines (BM)
Datasets & Templates: Boltzmann-Machines
Additional Reading:
Yann LeCun, 2006, A Tutorial on Energy-Based Learning
Jaco Van Dormael, 2009, Mr. Nobody
Geoffrey Hinton, 2006, A fast learning algorithm for deep belief nets
Oliver Woodford, 2012?, Notes on Contrastive Divergence
Yoshua Bengio, 2006, Greedy Layer-Wise Training of Deep Networks
Geoffrey Hinton, 1995, The wake-sleep algorithm for unsupervised neural networks
Ruslan Salakhutdinov, 2009?, Deep Boltzmann Machines

Part 6: AutoEncoders (AE)
Datasets & Templates: AutoEncoders
Additional Reading:
Malte Skarupke, 2016, Neural Networks Are Impressively Good At Compression
Francois Chollet, 2016, Building Autoencoders in Keras
Chris McCormick, 2014, Deep Learning Tutorial - Sparse Autoencoder
Eric Wilkinson, 2014, Deep Learning: Sparse Autoencoders
Alireza Makhzani, 2014, k-Sparse Autoencoders
Pascal Vincent, 2008, Extracting and Composing Robust Features with Denoising Autoencoders
Salah Rifai, 2011, Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
Pascal Vincent, 2010, Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
Geoffrey Hinton, 2006, Reducing the Dimensionality of Data with Neural Networks

Annex: Get the Machine Learning Basics
Datasets & Templates: Get-Machine-Learning-Basics

-------------------------------------------------------

Data Science For Business by Foster Provost & Tom Fawcett
Data Science From Scratch With Python by Joel Grus
Machine Learning A-Z Course
