Documentation

5-minute quick start guide

In this tutorial, you'll learn how to set up a very simple Spark application that reads data from and writes data to Cassandra. Before you start, you need basic knowledge of Apache Cassandra and Apache Spark; refer to the Cassandra documentation and the Spark documentation.

Prerequisites

Install and launch a Cassandra cluster and a Spark cluster.

Configure a new Scala project with the following dependencies:

Apache Spark and its dependencies
Apache Cassandra thrift and clientutil libraries matching the version of Cassandra
DataStax Cassandra driver for your Cassandra version

This driver does not depend on the Cassandra server code.

For a detailed dependency list, see project/CassandraSparkBuild.scala
For dependency versions, see project/Versions.scala

Add the spark-cassandra-connector jar and its dependency jars to the following classpaths:

the classpath of your project
the classpath of every Spark cluster node

Make sure the Connector version matches your Spark version (e.g. Spark 1.2.x with Connector 1.2.x). In sbt, the connector dependency is declared as follows, where Version is the Connector version:

"com.datastax.spark" %% "spark-cassandra-connector" % Version
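
For orientation, a minimal build.sbt for the project classpath might look like the sketch below. The project name and every version number in it are illustrative assumptions, not values taken from this guide; check project/Versions.scala for the versions that actually belong together.

```scala
// build.sbt (sketch): project name and all version numbers are assumptions
name := "cassandra-quick-start"

scalaVersion := "2.10.4"

// Keep Spark and the Connector on the same major.minor line
val sparkVersion = "1.2.1"
val connectorVersion = "1.2.0"

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"                % sparkVersion,
  "com.datastax.spark" %% "spark-cassandra-connector" % connectorVersion
)
```

This only covers the project classpath. The connector jar and its dependencies still have to be made available on every Spark cluster node.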

Building

See Building And Artifacts

Preparing example Cassandra schema

Create a simple keyspace and table in Cassandra. Run the following statements in cqlsh:

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
CREATE TABLE test.kv(key text PRIMARY KEY, value int);

Then insert some example data:

INSERT INTO test.kv(key, value) VALUES ('key1', 1);
INSERT INTO test.kv(key, value) VALUES ('key2', 2);

Now you're ready to write your first Spark program using Cassandra.

Setting up `SparkContext`

Before creating the SparkContext, set the spark.cassandra.connection.host property to the address of one of the Cassandra nodes:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf(true)
   .set("spark.cassandra.connection.host", "127.0.0.1")

Create a SparkContext. Replace 127.0.0.1 with the actual address of your Spark Master (or use "local" to run in local mode):

val sc = new SparkContext("spark://127.0.0.1:7077", "test", conf)

Enable Cassandra-specific functions on the SparkContext, RDD, and DataFrame:

import com.datastax.spark.connector._

Loading and analyzing data from Cassandra

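Use the sc.cassandraTable method to view the test.kv table as a Spark RDD:

val rdd = sc.cassandraTable("test", "kv")
// Number of rows in the table
println(rdd.count)
// First row, as a CassandraRow
println(rdd.first)
// Sum of the "value" column over all rows
println(rdd.map(_.getInt("value")).sum)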
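
Rows can also be read as typed values instead of generic CassandraRow objects. The following is a minimal sketch, assuming local mode, a Cassandra node at 127.0.0.1, and the test.kv table created above; the object name TypedReadSketch and the describe helper are hypothetical names introduced for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object TypedReadSketch {
  // Hypothetical helper: formats one (key, value) pair for printing
  def describe(pair: (String, Int)): String = s"${pair._1} -> ${pair._2}"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(true)
      .setMaster("local[2]")             // assumption: run in local mode
      .setAppName("typed-read-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    try {
      // Selecting the columns explicitly fixes the order of the tuple fields
      val pairs = sc.cassandraTable[(String, Int)]("test", "kv").select("key", "value")
      pairs.collect().map(describe).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Reading into tuples avoids the per-column getInt and getString calls when the column types are known in advance.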

Saving data from RDD to Cassandra

Add two more rows to the table:

val collection = sc.parallelize(Seq(("key3", 3), ("key4", 4)))
collection.saveToCassandra("test", "kv", SomeColumns("key", "value"))
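
The same call also works for an RDD of case class instances whose field names match the column names. The sketch below assumes local mode and a Cassandra node at 127.0.0.1; the KV case class, the SaveSketch object, and the rows key5 and key6 are hypothetical and only serve as illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Field names match the column names of test.kv
case class KV(key: String, value: Int)

object SaveSketch {
  // Hypothetical example rows, not part of the data inserted earlier
  val newRows: Seq[KV] = Seq(KV("key5", 5), KV("key6", 6))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(true)
      .setMaster("local[2]")             // assumption: run in local mode
      .setAppName("save-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    try {
      // Write the objects, mapping fields to the listed columns
      sc.parallelize(newRows).saveToCassandra("test", "kv", SomeColumns("key", "value"))
      // Read the table back as KV objects to confirm the write
      sc.cassandraTable[KV]("test", "kv").collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because key is the primary key, saving a row whose key already exists overwrites the stored value rather than adding a duplicate.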

Next - Connecting to Cassandra
