HadoopCon_2015_SparkSQL

A Spark SQL tutorial from HadoopCon 2015. The material is an IPython/Jupyter notebook written in Python.

Requirements

This training material requires Spark 1.4.1

Introduction

In this tutorial, you will learn how to initialize Spark SQL with SQLContext (or HiveContext), manipulate DataFrames, import data, define user-defined functions, and use cache().

Slides

Slides for this tutorial: Spark SQL

Python API

Spark 1.4.1:

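# `sc` is the SparkContext provided by the pyspark shell or notebook.
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

# Alternatively, use HiveContext, a superset of SQLContext that adds Hive support.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)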

Reference

For more details, see the Spark SQL and DataFrame Guide and the Apache Spark website.

wlsherica/HadoopCon_2015_SparkSQL
