Spark Cluster on Google Compute Engine

We were up an running with R on GCE in close to an hour. Looking forward to giving Spark a go soon as well. More compute power on tap creates possibilities.

Ido Green

gce+sparkWhat is Spark and Why?

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop. In the past, I’ve wrote an intro on how to install Spark on GCE and since then, I wanted to do a follow up on the topic but with more real world example of installing a cluster. Luckily to me, a reader of the blog did the work! So after I got his approval, I wanted to share with you his script.

View original post 183 more words

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s