shrechak/spark-google-spreadsheets

 
 

Spark Google Spreadsheets

Google Spreadsheets datasource for SparkSQL and DataFrames

Notice

Version 0.4.0 breaks compatibility with previous versions. You must use a **spreadsheetId** to identify which spreadsheet is to be accessed or altered. Older versions used the spreadsheet name instead.

If you don't know your spreadsheetId, please read the Introduction to the Google Sheets API v4.
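
As a rough illustration, the spreadsheetId is the path segment that follows `/spreadsheets/d/` in a spreadsheet's browser URL. The sketch below pulls it out with a regular expression. The `SpreadsheetIds` object and its `fromUrl` method are hypothetical helpers written for this example; they are not part of this library, and the URL in the usage note is a made-up placeholder.

```scala
object SpreadsheetIds {
  // Assumption: the ID is made of letters, digits, '-' and '_' and
  // directly follows "/spreadsheets/d/" in the URL.
  private val IdInUrl = "/spreadsheets/d/([a-zA-Z0-9_-]+)".r

  /** Returns the spreadsheetId embedded in a Sheets URL, or None if absent. */
  def fromUrl(url: String): Option[String] =
    IdInUrl.findFirstMatchIn(url).map(_.group(1))
}
```

For example, `SpreadsheetIds.fromUrl("https://docs.google.com/spreadsheets/d/abc123_XYZ-9/edit#gid=0")` yields `Some("abc123_XYZ-9")`, and a URL without that segment yields `None`.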

Requirements

This library has different versions for Spark 1.6+ and 2.0+:

Latest compatible versions

This library    Spark Version
0.5.x           2.0.x
0.4.x           1.6.x

Linking

Using SBT:

libraryDependencies += "com.github.potix2" %% "spark-google-spreadsheets" % "0.5.0"

Using Maven:

<dependency>
  <groupId>com.github.potix2</groupId>
  <artifactId>spark-google-spreadsheets_2.11</artifactId>
  <version>0.5.0</version>
</dependency>

SQL API

CREATE TABLE cars
USING com.github.potix2.spark.google.spreadsheets
OPTIONS (
    path "<spreadsheetId>/worksheet1",
    serviceAccountId "[email protected]",
    credentialPath "/path/to/credential.p12"
)

Scala API

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Creates a DataFrame from a specified worksheet
val df = sqlContext.read.
    format("com.github.potix2.spark.google.spreadsheets").
    option("serviceAccountId", "[email protected]").
    option("credentialPath", "/path/to/credential.p12").
    load("<spreadsheetId>/worksheet1")

// Saves a DataFrame to a new worksheet
df.write.
    format("com.github.potix2.spark.google.spreadsheets").
    option("serviceAccountId", "[email protected]").
    option("credentialPath", "/path/to/credential.p12").
    save("<spreadsheetId>/newWorksheet")
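
Both `load` and `save` above take a path of the form `<spreadsheetId>/<worksheet name>`. A minimal sketch of building that string safely is shown below, assuming the spreadsheetId itself never contains a slash. `SheetPath` and `build` are hypothetical names used only for this illustration; the library does not provide them.

```scala
object SheetPath {
  /** Joins a spreadsheetId and a worksheet name into "<spreadsheetId>/<worksheet>". */
  def build(spreadsheetId: String, worksheet: String): String = {
    // Reject inputs that would produce an ambiguous or empty path.
    require(spreadsheetId.nonEmpty, "spreadsheetId must not be empty")
    require(!spreadsheetId.contains("/"), "spreadsheetId must not contain '/'")
    require(worksheet.nonEmpty, "worksheet name must not be empty")
    s"$spreadsheetId/$worksheet"
  }
}
```

The result can then be passed to the calls shown above, for instance `load(SheetPath.build(id, "worksheet1"))`, where `id` is a value you supply.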

License

Copyright 2016, Katsunori Kanda

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
