ML Resources and Notes: Multiple Regression

First we identify the different steps, then we write the code for each step, for clarity:

----------------------------------
For Multiple Linear Regression:
Importing the libraries
Importing the dataset
Encoding categorical data
Splitting the dataset into the Training set and Test set
Training the Multiple Linear Regression model on the Training set
Predicting the Test set results
----------------------------------

Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset

dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

Encoding categorical data

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Training the Multiple Linear Regression model on the Training set

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

Predicting the Test set results

y_pred = regressor.predict(X_test)
np.set_printoptions(precision=2)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))

Making a prediction for a single observation: 1, 0, 0, 160000, 130000, 300000

print(regressor.predict([[1, 0, 0, 160000, 130000, 300000]]))

Getting the final linear regression equation:

print(regressor.coef_)
print(regressor.intercept_)
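To see the fitted equation explicitly, the coefficients can be paired with the columns in the order produced by the ColumnTransformer. A minimal sketch, assuming the usual 50_Startups.csv layout (three one-hot State dummies followed by the numeric columns; the feature names below are my assumption, not printed by sklearn):

# A minimal sketch: print the fitted equation term by term.
# feature_names is an assumption based on the column order after one-hot encoding.
feature_names = ['State_1', 'State_2', 'State_3',
                 'R&D Spend', 'Administration', 'Marketing Spend']
terms = ' + '.join(f'{c:.2f}*{name}' for c, name in zip(regressor.coef_, feature_names))
print(f'Profit = {regressor.intercept_:.2f} + {terms}')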

ML Resources and Notes: Simple Regression

1: Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

2: Importing the dataset

dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

3: Splitting the data into Training and Test sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

4: Training the model on the Training set

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

5: Predicting the Test set results

y_pred = regressor.predict(X_test)

6: Visualising the Training set data and the fitted line

plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Training data Age-Salary vs prediction')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.show()

7: Predicting for a single value a of the independent variable

y_a = regressor.predict([[a]])
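The same plot can be repeated for the test set to check how the fitted line generalises. A minimal sketch reusing the objects above; the regression line is still drawn from the training data, since the model is one straight line:

plt.scatter(X_test, y_test, color='red')                      # actual test observations
plt.plot(X_train, regressor.predict(X_train), color='blue')   # same fitted line
plt.title('Test data Age-Salary vs prediction')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.show()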

ML Resources and Notes: Data Preprocessing

https://sdsclub.com/machine-learning-a-z-tips-and-resources

ML workflow:
1] Data Preprocessing: collecting data, data cleaning, splitting the data into test and train sets
2] Modelling: build, train, predict
3] Evaluating: efficiency calculation and verdict

In splitting the data into Train and Test sets, the 80:20 rule is followed: 80% of the data is used for training and 20% is set aside for testing. The model is then tested against the test data, for which the values to be predicted are already known, thereby evaluating the performance of the model.

Feature scaling: the features are normalized or standardized so that the impact of each feature is proportional and no single feature dominates merely because of its scale.

Normalization: X' = (X - X_min) / (X_max - X_min)
Standardization: X' = (X - μ) / σ, i.e. the difference from the mean divided by the standard deviation.

Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(X)
print(y)

Taking care of missing data

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
print(X)

Encoding the categorical independent variable

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
print(X)

Encoding the dependent variable

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
print(y)

Splitting the data into Training and Test sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print(X_train)
print(y_train)
print(X_test)
print(y_test)

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)   # use the scaler fitted on the training set; do not refit on test data
print(X_train)
print(X_test)
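The two scaling formulas above correspond directly to scikit-learn's MinMaxScaler and StandardScaler. A minimal sketch on a toy column (the sample values are made up for illustration):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

col = np.array([[10.0], [20.0], [40.0], [50.0]])  # made-up sample values

# Normalization: X' = (X - X_min) / (X_max - X_min), rescales to [0, 1]
print(MinMaxScaler().fit_transform(col).ravel())

# Standardization: X' = (X - mean) / std, rescales to zero mean, unit variance
print(StandardScaler().fit_transform(col).ravel())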

Pandas on the go

Let dogs be a pandas DataFrame.

Inspecting:

dogs.head()        # first few rows
dogs.info()        # column dtypes and missing values
dogs.shape         # (rows, columns)
dogs.describe()    # summary statistics
dogs.values        # data as a NumPy array
dogs.columns       # column labels
dogs.index         # row labels

Sorting:

dogs.sort_values("weight_kgs")
dogs.sort_values("weight_kgs", ascending=False)
dogs.sort_values(["weight_kgs", "height_cm"])
dogs.sort_values(["weight_kgs", "height_cm"], ascending=[True, False])

Subsetting columns:

dogs["name"]
dogs[["breed", "height_cm"]]
cols_to_subset = ["breed", "height_cm"]
dogs[cols_to_subset]

Subsetting rows:

dogs["height_cm"] > 50              # boolean Series
dogs[dogs["height_cm"] > 50]        # rows where the condition holds
dogs[dogs["breed"] == "Labrador"]
dogs[dogs["date_of_birth"] < "2015-01-01"]

is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "brown"
dogs[is_lab & is_brown]
dogs[(dogs["breed"] == "Labrador") & (dogs["color"] == "brown")]   # same filter, inline

is_black_or_brown = dogs["color"].isin(["Black", "Brown"])
dogs[is_black_or_brown]

A worked example:

# Create indiv_per_10k col as homeless individuals per 10k state pop
homelessness["indiv_per_10k"] = 10000 * homelessness["individuals"] / homelessness["state_pop"]

# Subset rows for indiv_per_10k greater than 20
high_homelessness = homelessness[homelessness["indiv_per_10k"] > 20]

# Sort high_homelessness by descending indiv_per_10k
high_homelessness_srt = high_homelessness.sort_values("indiv_per_10k", ascending=False)

# From high_homelessness_srt, select the state and indiv_per_10k cols
result = high_homelessness_srt[["state", "indiv_per_10k"]]

# See the result
print(result)

Iterating through a DataFrame (brics) to print each row:

for index, row in brics.iterrows():
    country = row['country']
    population = row['population']
    print(f"The population of {country} is {population} million!")
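As a side note, iterrows is convenient for printing but slow on large frames; the same output can usually be built with vectorized string operations. A minimal sketch, assuming brics has the same country and population columns:

# Vectorized alternative to the iterrows loop above
messages = ("The population of " + brics["country"]
            + " is " + brics["population"].astype(str) + " million!")
print("\n".join(messages))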

Resignation of an ML Pioneer from Google

The resignation of a pioneer in his field came as a surprise, and as a note of caution against the uncontrolled development of AI. Many have argued against the alarm, holding that the generative intelligence of ChatGPT or any of its equivalents is far from a level at which it could pose a serious threat to mankind, and that it has many stages to cross before it comes anywhere near human intelligence. But the alarm that has been raised cannot simply be overlooked either. If an artificial intelligence took control of any system, it learns very fast, and in no time it might exceed the human capacity to control it. The most intriguing part of ML is that even the coders do not have a complete idea of how a model learns once it is deployed. As a member of the general public, I do not think ML models will challenge the human brain in the next decade. But keep in mind that I also do not understand the complexity of the code of this server and how, once deployed, it serves people worldwide; so we must go by the learned people in this field. Our problem is that even the experts are divided over this issue.
