Simple Spark Application

A simple Spark application that counts the occurrence of each word in a corpus and then counts the occurrence of each character in the most popular words. Includes the same program implemented in Java and Scala.

To make a jar:

mvn package

To run from a gateway node in a CDH5 cluster:

source /etc/spark/conf/spark-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera

# system jars:
CLASSPATH=/etc/hadoop/conf
CLASSPATH=$CLASSPATH:$HADOOP_HOME/*:$HADOOP_HOME/lib/*
CLASSPATH=$CLASSPATH:$HADOOP_HOME/../hadoop-mapreduce/*:$HADOOP_HOME/../hadoop-mapreduce/lib/*
CLASSPATH=$CLASSPATH:$HADOOP_HOME/../hadoop-yarn/*:$HADOOP_HOME/../hadoop-yarn/lib/*
CLASSPATH=$CLASSPATH:$HADOOP_HOME/../hadoop-hdfs/*:$HADOOP_HOME/../hadoop-hdfs/lib/*
CLASSPATH=$CLASSPATH:$SPARK_HOME/assembly/lib/*

# app jar:
CLASSPATH=$CLASSPATH:target/sparkwordcount-0.0.1-SNAPSHOT.jar

$JAVA_HOME/bin/java -cp $CLASSPATH -Dspark.master=local com.cloudera.sparkwordcount.SparkWordCount <input file> 2

This will run the application in a single local process. If the cluster is running a Spark standalone cluster manager, you can replace "-Dspark.master=local" with "-Dspark.master=spark://<master host>:<master port>".

If the cluster is running YARN, you can replace "-Dspark.master=local" with "-Dspark.master=yarn-client".

Future releases will include a "spark-submit" script that abstracts away the pain of building the classpath and specifying the cluster manager.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
src/main		src/main
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Simple Spark Application

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

manishamde/simplesparkapp

Folders and files

Latest commit

History

Repository files navigation

Simple Spark Application

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages