Home

Welcome to the azure-documentdb-spark wiki!

This wiki contains the following resources for your reference:

Azure DocumentDB Spark Connector User Guide
Query Test Guide
Aggregations Examples
GraphFrames Example

Introduction

This project provides a client library that allows Azure DocumentDB to act as an input source or output sink for Spark jobs. Fast connectivity between Apache Spark and Azure DocumentDB helps you solve fast-moving data science problems where data can be quickly persisted to and retrieved from Azure DocumentDB. With the Spark to DocumentDB connector, you can more easily address scenarios including (but not limited to) blazing fast IoT scenarios, updateable columns when performing analytics, push-down predicate filtering, and advanced analytics and data science over fast-changing data in a geo-replicated managed document store with guaranteed SLAs for consistency, availability, low latency, and throughput.

This package is highly experimental and is provided as a technical preview only.

Common Scenarios

Common scenarios for using Apache Spark and DocumentDB together include:

Distributed Aggregations and Analytics
Push-down Predicate Filtering
Blazing Fast IoT Scenarios
Updateable Columns

Below are more details on these scenarios. If you're ready to use azure-documentdb-spark, refer to the Azure DocumentDB Spark Connector User Guide.
