ayushsood2/PySpark-HDFS-MachineLearning

PySpark-HDFS-MachineLearning

Random Forest Classification using PySpark and HDFS. Predicting term deposit subscription for Portuguese Bank.
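As a rough illustration of the workflow described above, the following is a minimal sketch of reading a CSV file from HDFS and fitting a random forest with PySpark. It is not the notebook's actual code: the helper names `hdfs_uri` and `train_random_forest`, the 80/20 split, the number of trees, and every host, port, path, and column name passed in are assumptions made for the example.

```python
def hdfs_uri(host, port, path):
    """Build an hdfs:// URI. Host, port and path are placeholders for your cluster."""
    return f"hdfs://{host}:{port}/{path.lstrip('/')}"


def train_random_forest(csv_uri, label_col, categorical_cols, numeric_cols,
                        sep=",", num_trees=100):
    """Fit a random forest on a CSV file and return (model, test-set AUC).

    Hypothetical helper: column names and hyperparameters are supplied by the
    caller and are not taken from the notebook.
    """
    # Imported here so the module can be loaded without a Spark installation.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import StringIndexer, VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bank-term-deposit-rf").getOrCreate()
    df = spark.read.csv(csv_uri, header=True, inferSchema=True, sep=sep)

    # Encode each string column as a numeric index.
    indexers = [
        StringIndexer(inputCol=c, outputCol=c + "_idx", handleInvalid="keep")
        for c in categorical_cols
    ]
    label_indexer = StringIndexer(inputCol=label_col, outputCol="label")

    # Collect all feature columns into the single vector column Spark ML expects.
    assembler = VectorAssembler(
        inputCols=[c + "_idx" for c in categorical_cols] + list(numeric_cols),
        outputCol="features",
    )
    forest = RandomForestClassifier(
        labelCol="label", featuresCol="features", numTrees=num_trees, seed=42
    )
    pipeline = Pipeline(stages=indexers + [label_indexer, assembler, forest])

    train, test = df.randomSplit([0.8, 0.2], seed=42)
    model = pipeline.fit(train)
    evaluator = BinaryClassificationEvaluator(
        labelCol="label", metricName="areaUnderROC"
    )
    return model, evaluator.evaluate(model.transform(test))
```

A call might look like `train_random_forest(hdfs_uri("localhost", 9000, "/data/bank.csv"), "y", ["job", "marital"], ["age", "balance"])`, where all of those values are illustrative and should be replaced with your own cluster address, file path, and column names.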

Instructions:

Download the IPython notebook and run it on your machine.

Prerequisites for running the notebook:

  1. HDFS needs to be set up on your machine to read and write the data. We used Windows 10 for this purpose, but any OS can be used. You can follow the tutorial for setting up a single-node HDFS cluster here

  2. Spark needs to be installed from the official Apache Spark link

  3. The PySpark shell needs to be configured to run inside Jupyter Notebook. To set it up on your machine, follow these instructions

  4. Apart from these, the usual Python libraries need to be installed with pip. The math module ships with the Python standard library, so it needs no separate installation.

pip install jupyter notebook

pip install pandas

pip install numpy

pip install pyspark

pip install matplotlib

pip install seaborn
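Before opening the notebook, a quick check along these lines can confirm that the libraries are importable. This is a minimal sketch, not part of the repository: the helper names `missing_modules` and `unset_variables` are made up for the example, and the environment variable names are the ones Spark and Hadoop setups conventionally use, which may differ from your installation.

```python
import importlib.util
import os

# Import names of the libraries listed above ("notebook" is the Jupyter Notebook package).
REQUIRED_MODULES = ["notebook", "pandas", "numpy", "math", "pyspark", "matplotlib", "seaborn"]

# Assumed variable names; adjust to match your own Spark/Hadoop setup.
EXPECTED_VARIABLES = ["JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"]


def missing_modules(names):
    """Return the module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def unset_variables(names, environ=os.environ):
    """Return the environment variable names that are unset or empty."""
    return [name for name in names if not environ.get(name)]


print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
print("Unset variables:", unset_variables(EXPECTED_VARIABLES) or "none")
```

Any module reported as missing can be installed with the corresponding pip command. An unset variable is only a hint that the Spark or HDFS setup may be incomplete, not proof that it is.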

Let us know if you need help with the Spark or HDFS cluster setup, or if you run into any issues running the notebook.
