Big + Far Math Challenge @ ICC

April 22, 2017


Recently I participated in, and won first prize at, the Big + Far Math Challenge hosted by ICC. The challenge description can be found here –

Participating was an exciting learning experience for me. I explored different technical areas while gathering data and preparing visualizations from it.

I have shared the source code and a static version of the visualization on GitHub. The dynamic version was hosted on Apache Solr running on my local desktop.

You can visit the project page @ from which you can navigate to the visualizations that I came up with.

I am also sharing the presentation given to the judges as part of the assessment, in case you are looking for more details.

– Amit

Running Apache Spark on Windows

July 10, 2016

Running Hadoop on Windows is not trivial; however, running Apache Spark on Windows proved not too difficult. I came across a couple of blogs and a Stack Overflow discussion that made this possible. My notes below are the outcome of that reference material.

  1. Download Apache Spark ( )
  2. Download Hadoop distribution for Windows from
  3. Create hadoop-env.cmd in the {HADOOP_INSTALL_DIR}/conf directory with the following content:
    SET JAVA_HOME=C:\Progra~1\Java\jdk1.7.0_80
  4. In a new command window, run hadoop-env.cmd followed by {HADOOP_INSTALL_DIR}/bin/hadoop classpath
    The output of this command is used to initialize SPARK_DIST_CLASSPATH in spark-env.cmd (you may need to create this file).
  5. Create spark-env.cmd in {SPARK_INSTALL_DIR}/conf
     REM spark-env.cmd content
     SET HADOOP_HOME=C:\amit\hadoop\hadoop-2.6.0
     SET SPARK_DIST_CLASSPATH=<Output of hadoop classpath>
     SET JAVA_HOME=C:\Progra~1\Java\jdk1.7.0_80
  6. Now run the examples or spark-shell from the {SPARK_INSTALL_DIR}/bin directory. Note that you may have to run spark-env.cmd explicitly before running the examples or spark-shell.
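
Steps 4 and 5 can also be scripted instead of copying the classpath by hand. The following is a minimal Python sketch, not part of the original notes. It assumes that hadoop.cmd lives under {HADOOP_INSTALL_DIR}\bin and that JAVA_HOME is already set in the environment when it runs. The function names (build_spark_env, read_hadoop_classpath, write_spark_env) are hypothetical helpers introduced only for illustration.

```python
import subprocess
from pathlib import Path


def build_spark_env(hadoop_home: str, java_home: str, hadoop_classpath: str) -> str:
    """Return the text of spark-env.cmd for the given settings (step 5)."""
    lines = [
        "REM spark-env.cmd content (generated by a script)",
        f"SET HADOOP_HOME={hadoop_home}",
        # Surrounding whitespace/newlines from the command output are dropped.
        f"SET SPARK_DIST_CLASSPATH={hadoop_classpath.strip()}",
        f"SET JAVA_HOME={java_home}",
    ]
    return "\n".join(lines) + "\n"


def read_hadoop_classpath(hadoop_home: str) -> str:
    """Run `hadoop classpath` (step 4) and return its output.

    Assumption: hadoop.cmd exists in the bin directory of hadoop_home
    and JAVA_HOME is already set for the current process.
    """
    hadoop_cmd = str(Path(hadoop_home) / "bin" / "hadoop.cmd")
    result = subprocess.run(
        [hadoop_cmd, "classpath"],
        capture_output=True,
        text=True,
        check=True,  # raise if the hadoop command fails
    )
    return result.stdout.strip()


def write_spark_env(spark_home: str, content: str) -> Path:
    """Write the generated content to conf/spark-env.cmd under spark_home."""
    target = Path(spark_home) / "conf" / "spark-env.cmd"
    target.write_text(content)
    return target
```

A typical use would be to call read_hadoop_classpath with the Hadoop install directory, pass the result to build_spark_env together with the HADOOP_HOME and JAVA_HOME values shown in step 5, and hand the returned text to write_spark_env along with the Spark install directory. Keeping build_spark_env free of file and process access makes it easy to check the generated text before anything is written.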

References:

Big Data For Social Good Challenge

March 16, 2015


This winter I participated in the Big Data For Social Good Challenge, which I stumbled upon while searching for something.

This challenge was about using IBM Bluemix’s “Analytics For Hadoop” service to process a data set of at least 500 MB in size.

This was a wonderful opportunity to get some hands-on experience with IBM Bluemix (IBM is giving extended trial access if you are a participant). Apart from this, I was also keen to build a data visualization app on my own.

I selected CitiBike data for one year (2013-2014). Initially I did not have a clue about what insights I could gather from the dataset, but as soon as I ran some Apache Pig scripts and started looking at the output, I could see more and more use cases around the dataset. I could not address all the use cases I thought of, as I soon came under deadline pressure: I still had to finish the video demonstration and write up the project.

Overall it was a very enriching experience as I did so many things for the very first time.

Some of them are listed below:

  • IBM BigSheets and Big SQL
  • Using the Chart.js library
  • Using Google Maps JavaScript APIs – It was remarkably simpler than I expected. I much appreciate these APIs from Google.
  • Creating the custom map icon – I never realized it would be this difficult
  • HTML5/CSS challenges when putting up the UI
  • Last but not least, GitHub’s easy way to publish your work online.

Now that the challenge is in the public voting and judging phase, I would appreciate it if you could take a look at

and provide your feedback and vote if you like it.

Introduction to Apache Pig

September 28, 2014


I created this presentation as an introduction to Apache Pig. I hope you find it useful for understanding the basics of Apache Pig.

Introduction to Apache Pig