Did the neural network outperform the random forest classifier? If it did not, what do you suggest we do to improve the neural network?

26 Aug Did the neural network outperform the random forest classifier? If it did not, what do you suggest we do to improve the neural network?

Posted at 04:09h in Computer Science by

For Project 3, refer to the Google Apps dataset that we explored in previous videos. Below is a summary of how to set up the data for this project.

import numpy as np
import os
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

#read in the file and make a copy of the dataset
apps = pd.read_csv("http://www.jpstats.org/data/googleplaystore.csv")
dat = apps.copy()

#separate features from labels
y = dat["Installs"]
X = dat.drop("Installs", axis=1)

classnames, indices = np.unique(y, return_inverse=True)
y = indices

#first split into train and test sets
from sklearn.model_selection import train_test_split

X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=34, stratify=y)

#now split the train_full into train and validation
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.2, random_state=34,
    stratify=y_train_full)

#separate numeric from categorical features
X_num = X.select_dtypes(include=[np.number]) 
X_cat = X.select_dtypes(exclude=[np.number])

#build pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

num_attribs = list(X_num)

cat_classes = np.unique(dat["Category"])
type_classes = np.unique(dat["Type"])
cont_classes = np.unique(dat["Content Rating"])
gen_classes = np.unique(dat["Genres"])

num_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy="median")),
        ('std_scaler', StandardScaler()),
    ])

full_pipeline = ColumnTransformer([
        ("num", num_pipeline, num_attribs),
        ("cat1", OneHotEncoder(categories=[cat_classes]), ["Category"]),
        ("cat2", OneHotEncoder(categories=[type_classes]), ["Type"]),
        ("cat3", OneHotEncoder(categories=[cont_classes]), ["Content Rating"]),
        ("cat4", OneHotEncoder(categories=[gen_classes]), ["Genres"])
    ])

X_train_prep = full_pipeline.fit_transform(X_train)
X_val_prep = full_pipeline.transform(X_val)
X_test_prep = full_pipeline.transform(X_test)
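
The two-stage split in the setup takes 20% of the rows for testing and then 20% of the remainder for validation, which works out to roughly 64% train, 16% validation, and 20% test. The sketch below checks that arithmetic on hypothetical stand-in data (1000 rows, 4 balanced classes); the names X_demo and y_demo are assumptions and are not part of the Google Apps dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 rows, 4 balanced classes.
X_demo = np.arange(2000).reshape(1000, 2)
y_demo = np.arange(1000) % 4

# Same two-stage split as the setup: 20% test, then 20% of the rest for validation.
X_full, X_te, y_full, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=34, stratify=y_demo)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_full, y_full, test_size=0.2, random_state=34, stratify=y_full)

split_sizes = (len(X_tr), len(X_va), len(X_te))
print(split_sizes)  # (640, 160, 200)
```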

In a Jupyter notebook, fit a random forest classifier to this data. Use grid search to fine-tune the model, but tune only the n_estimators and max_leaf_nodes parameters. You decide which values to try for each of these in the grid search. Wherever a random state can be set, set it to 34.
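
One way the grid search could be set up is sketched below. It is a minimal illustration, not a reference solution: the synthetic X_demo and y_demo stand in for X_train_prep and y_train so the block runs on its own, and the candidate values in param_grid and the choice of cv=3 are assumptions chosen only to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in for X_train_prep / y_train.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3,
    random_state=34)

# Only the two hyperparameters named in the assignment are tuned.
# These candidate values are illustrative, not recommended settings.
param_grid = {
    "n_estimators": [10, 30],
    "max_leaf_nodes": [8, 16],
}

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=34),
    param_grid,
    cv=3,                # assumed number of cross-validation folds
    scoring="accuracy",
)
grid_search.fit(X_demo, y_demo)

# The best combination found and its mean cross-validated accuracy.
print(grid_search.best_params_, grid_search.best_score_)
```

With the real data, the fit call would take X_train_prep and y_train instead of the stand-in arrays.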

Determine the best model and fit it to the training data (X_train_prep and y_train; do not include the validation set here). Then find the accuracy of the model on the test set. Recall that we got 15.7% accuracy on the test data when we used a Decision Tree classifier. Is the random forest any better? Explain why it is or is not.
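
The comparison asked for here can be sketched as follows. The data is a synthetic stand-in rather than the Google Apps data, and the forest's hyperparameter values are placeholders for whatever the grid search selects, so the printed accuracies say nothing about the real result.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data, split into train and test portions.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_classes=3,
    random_state=34)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=34, stratify=y_demo)

# Placeholder hyperparameters; use the grid search winner in practice.
forest = RandomForestClassifier(
    n_estimators=30, max_leaf_nodes=16, random_state=34)
tree = DecisionTreeClassifier(random_state=34)

# Fit each model on the training portion only, then score on the test portion.
test_accuracy = {}
for name, model in [("random forest", forest), ("decision tree", tree)]:
    model.fit(X_tr, y_tr)
    test_accuracy[name] = accuracy_score(y_te, model.predict(X_te))

print(test_accuracy)
```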

You will now fit an MLP classification neural network to this data. The number of hidden layers and the number of neurons per layer will each be drawn at random. For the number of hidden layers, use the following code:

from numpy.random import randint, seed
#number of hidden layers to use in NN
seed(birthdate)
randint(2, 6)

Where it says birthdate, I want you to put your birthdate in the format mmdd. So if your birthday is October 4, then put seed(1004). If your birthday is before October, then do not put the leading zero, because Python 3 does not accept integer literals with a leading zero such as 0804. For example, if your birthday is Aug 4, then put seed(804).

For the number of neurons in each layer, use the following code:

#number of neurons for each hidden layer
seed(birthdate)
randint(3, 8)*100

Again, put your birthdate (mmdd) where it says birthdate.
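
The two snippets above can be combined as in the sketch below. The value 1004 is only the October 4 example from the text, used here as a hypothetical birthdate. Note that the upper bound of randint is exclusive, so the draws fall in 2 to 5 hidden layers and 300 to 700 neurons.

```python
from numpy.random import randint, seed

birthdate = 1004  # hypothetical: October 4, the example used in the text

# Number of hidden layers: randint(2, 6) returns 2, 3, 4, or 5.
seed(birthdate)
n_hidden = randint(2, 6)

# Neurons per hidden layer: randint(3, 8) * 100 gives 300 through 700.
seed(birthdate)
n_neurons = randint(3, 8) * 100

# Re-seeding with the same value reproduces the same draw.
seed(birthdate)
assert randint(2, 6) == n_hidden

print(n_hidden, n_neurons)
```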

When you set up the sequential model, there is no need to start with a flatten layer since you are not dealing with images. So the first hidden layer in your network will look like:

keras.layers.Dense([the number of nodes from the above code], activation="relu", input_shape =X_train_prep.shape[1:] )

Use relu for the activation for all hidden layers. For the output layer, just use softmax with the number of nodes equal to 20 (since there are 20 categories in the labels).
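
A minimal sketch of how such a model might be assembled is shown below. The helper names hidden_layer_sizes and build_mlp are hypothetical, and the loss and optimizer are assumptions, since the assignment text does not name them; sparse categorical cross-entropy is used here because the labels are integer class indices.

```python
def hidden_layer_sizes(n_hidden, n_neurons):
    """Return one width per hidden layer; every hidden layer has the same width."""
    return [n_neurons] * n_hidden


def build_mlp(n_features, n_hidden, n_neurons, n_classes=20):
    """Build and compile the MLP described above (a sketch, not a reference solution)."""
    # Imported here so hidden_layer_sizes can be used without TensorFlow installed.
    from tensorflow import keras

    sizes = hidden_layer_sizes(n_hidden, n_neurons)
    model = keras.models.Sequential()
    # The first hidden layer declares the input shape; no Flatten layer is needed.
    model.add(keras.layers.Dense(sizes[0], activation="relu",
                                 input_shape=(n_features,)))
    # Remaining hidden layers all use relu.
    for width in sizes[1:]:
        model.add(keras.layers.Dense(width, activation="relu"))
    # One softmax output node per label category.
    model.add(keras.layers.Dense(n_classes, activation="softmax"))
    # Assumed settings: the assignment does not specify a loss or optimizer.
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="sgd", metrics=["accuracy"])
    return model
```

With the arrays from the setup, a call such as build_mlp(X_train_prep.shape[1], n_hidden, n_neurons) would return a compiled model ready for training.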

Run 200 epochs. After you finish training the model, plot the training and validation loss and accuracy for each epoch. Comment on this plot by stating if you think more epochs could help or if you think we could have stopped at a lower number of epochs.
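
The plot and the "could we have stopped earlier" question can be approached as in the sketch below. It assumes the per-epoch metrics are available as a dictionary with the keys loss, val_loss, accuracy, and val_accuracy (the layout of history.history when a Keras model is trained with an accuracy metric and validation data). The numbers in fake_history are invented purely so the block runs on its own.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_history(history_dict):
    """Plot training vs. validation loss and accuracy side by side."""
    epochs = np.arange(1, len(history_dict["loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(epochs, history_dict["loss"], label="training")
    ax_loss.plot(epochs, history_dict["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()
    ax_acc.plot(epochs, history_dict["accuracy"], label="training")
    ax_acc.plot(epochs, history_dict["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()
    return fig


def best_epoch(history_dict):
    """Return the 1-based epoch at which validation loss is lowest."""
    return int(np.argmin(history_dict["val_loss"])) + 1


# Invented numbers standing in for history.history.
fake_history = {
    "loss": [2.9, 2.5, 2.2, 2.0, 1.8],
    "val_loss": [2.8, 2.6, 2.5, 2.6, 2.7],
    "accuracy": [0.10, 0.15, 0.20, 0.24, 0.28],
    "val_accuracy": [0.11, 0.14, 0.16, 0.15, 0.15],
}

fig = plot_history(fake_history)
print(best_epoch(fake_history))  # 3
```

If validation loss bottoms out well before the final epoch while training loss keeps falling, that pattern suggests training could have stopped earlier; if both curves are still improving at the end, more epochs might help.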

Now get the accuracy for the test set.
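
In Keras, model.evaluate on the prepared test arrays reports the loss and the compiled metrics. The same accuracy can also be computed by hand from the softmax outputs, as sketched below with small invented arrays standing in for the model's predictions and the test labels.

```python
import numpy as np

# Invented softmax outputs for 4 samples and 3 classes
# (a stand-in for the model's predicted probabilities on the test set).
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
y_true = np.array([0, 1, 2, 1])  # invented integer class labels

# The predicted class is the index of the largest probability in each row.
y_pred = np.argmax(proba, axis=1)

# Accuracy is the fraction of predictions that match the true labels.
test_accuracy = float(np.mean(y_pred == y_true))
print(test_accuracy)  # 0.75
```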

Did the neural network outperform the random forest classifier? If it did not, what do you suggest we do to improve the neural network?

Show all your code and output in a Jupyter notebook. Also, comment (using markdown) on what you are doing in each step.

In the Jupyter notebook, put Project 3 in a heading at the top. Underneath that, put your first and last name as a subheading (use three #’s for the subheading). For the Random Forest part, put “Random Forest” as another heading in a markdown cell. For the MLP part, put “MLP” as a heading in a markdown cell.

Name your file [last name]_Project3.ipynb (for example Patrick_Project3.ipynb), and submit it to this assignment.

