JdbcDataSource

Description

The JdbcDataSource framework is a utility framework that helps with configuring a JDBC connection and reading its data into a DataFrame.

The framework is composed of two classes:

  • JdbcDataSource, which is created from a JdbcDataSourceConfig instance and provides one main function:
    override def read(implicit spark: SparkSession): Try[DataFrame]
  • JdbcDataSourceConfig, which holds the necessary configuration parameters

Sample code

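import org.apache.spark.sql.SparkSession
import org.tupol.spark.io._

// Both values are placeholders to be supplied by the application
implicit val sparkSession: SparkSession = ???
val sourceConfiguration: JdbcDataSourceConfig = ???
// read returns a Try[DataFrame]
val dataframe = JdbcDataSource(sourceConfiguration).read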
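Because read returns a Try[DataFrame] rather than a plain DataFrame, the caller decides what to do when the connection or the query fails. The following is a minimal sketch of that handling using only the Scala standard library; the summarize helper and the stand-in values are hypothetical and are not part of the framework.

```scala
import scala.util.{Failure, Success, Try}

object ReadResultExample {

  // Hypothetical helper: works for any Try[T], including a Try[DataFrame].
  def summarize[T](result: Try[T])(describe: T => String): String =
    result match {
      case Success(value) => s"read succeeded: ${describe(value)}"
      case Failure(error) => s"read failed: ${error.getMessage}"
    }

  def main(args: Array[String]): Unit = {
    // Stand-ins for the outcome of JdbcDataSource(sourceConfiguration).read
    val ok: Try[Seq[Int]] = Success(Seq(1, 2, 3))
    val ko: Try[Seq[Int]] = Failure(new IllegalStateException("connection refused"))

    println(summarize(ok)(rows => s"${rows.size} rows"))
    println(summarize(ko)(rows => s"${rows.size} rows"))
  }
}
```

With a real Try[DataFrame], the describe function could, for example, report the column names of the schema instead of a row count.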

Alternatively, importing org.tupol.spark.io.implicits._ makes an implicit decorator available that adds a source function to the SparkSession.

Sample code

import org.apache.spark.sql.SparkSession
import org.tupol.spark.io._
import org.tupol.spark.io.implicits._

implicit val spark: SparkSession = ???
val sourceConfiguration: JdbcDataSourceConfig = ???
val dataframe = spark.source(sourceConfiguration).read

Configuration Parameters

  • url Required
    • the JDBC-friendly URL pointing to the source database
  • table Required
    • the source table
  • user Optional
    • the database connection user
  • password Optional
    • the database connection password
  • driver Optional
    • the JDBC driver class
  • schema.path Optional
    • the local path or the class path of the JSON Apache Spark schema that should be enforced on the input data
    • this schema can be obtained from a DataFrame by calling the prettyJson function on its schema (dataframe.schema.prettyJson)
    • if this parameter is set, the schema is loaded from the given file; otherwise, the schema parameter is tried
  • schema Optional
    • the JSON Apache Spark schema that should be enforced on the input data
    • this schema can be obtained from a DataFrame by calling the prettyJson function on its schema (dataframe.schema.prettyJson)
    • due to its complex structure, this parameter cannot be passed as a command line argument; it can only be passed through the application.conf file
  • options Optional
    • due to its complex structure, this parameter cannot be passed as a command line argument; it can only be passed through the application.conf file
    • for more details about the available options, please check the References section
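
To make the parameter list more concrete, here is a rough sketch of how these keys might look in an application.conf file, parsed with the Typesafe Config library for illustration. All values are hypothetical, fetchsize is just one example of a Spark JDBC reader option, and the path under which the framework expects this block in a real application.conf is not specified here, so the keys are shown at the root as an assumption.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object JdbcSourceConfigExample {

  // Hypothetical values; the body of this string is what could go into application.conf.
  val sample: String =
    """
      |url = "jdbc:postgresql://localhost:5432/sales"
      |table = "public.orders"
      |user = "reader"
      |password = "change-me"
      |driver = "org.postgresql.Driver"
      |options {
      |  fetchsize = "1000"
      |}
      |""".stripMargin

  // Parse the HOCON text into a Config object
  val config: Config = ConfigFactory.parseString(sample)

  def main(args: Array[String]): Unit = {
    // Required parameters
    println(config.getString("url"))
    println(config.getString("table"))
    // Nested structure, which is why it cannot be passed on the command line
    println(config.getString("options.fetchsize"))
  }
}
```

The schema parameter is left out of this sketch; like options, it is a nested structure and would be written as a JSON object in the same file.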

References