Running Apache Spark on Windows

Running hadoop on windows is not trivial, however running Apache Spark on Windows proved not too difficult. I came across couple of blogs and stackoverflow discussion which made this possible. Putting down my notes below which are outcome of these reference material.

  1. Download http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-without-hadoop.tgz ( http://spark.apache.org/downloads.html )
  2. Download Hadoop distribution for Windows from http://www.barik.net/archive/2015/01/19/172716/
  3. Create hadoop_env.cmd  in {HADOOP_INSTALL_DIR}/conf directory.
    SET JAVA_HOME=C:\Progra~1\Java\jdk1.7.0_80
    
  4. In a new command window run hadoop-env.cmd followed by  {HADOOP_INSTALL_DIR}/bin/hadoop classpath
    The output of this command is used to initialize SPARK_DIST_CLASSPATH in spark-env.cmd (You may need to create this file.)
  5. Create spark-env.cmd in {SPARK_INSTALL_DIR}/conf
     #spark-env.cmd content
     SET HADOOP_HOME=C:\amit\hadoop\hadoop-2.6.0
     SET HADOOP_CONF_DIR=%HADOOP_HOME%\conf
     set SPARK_DIST_CLASSPATH=<Output of hadoop classpath>
     SET JAVA_HOME=C:\Progra~1\Java\jdk1.7.0_80
    
  6. Now run the examples or spark shell from {SPARK_INSTALL_DIR}/bin directory. Please note that you may have to run spark-env.cmd explicitly prior running the examples or spark-shell.

References :

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: