GitBook: No commit message
navinrathore authored and gitbook-bot committed Sep 20, 2023
1 parent a1f3d15 commit 0126de9
Showing 3 changed files with 36 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/SUMMARY.md
@@ -38,12 +38,12 @@
* [Linking across datasets](setup/link.md)
* [Data Sources and Sinks](dataSourcesAndSinks/connectors.md)
* [Zingg Pipes](dataSourcesAndSinks/pipes.md)
- * [Databricks](dataSourcesAndSinks/databricks.md)
+ * [Databricks](connectors/databricks.md)
* [Snowflake](dataSourcesAndSinks/snowflake.md)
* [JDBC](dataSourcesAndSinks/jdbc.md)
* [Postgres](connectors/jdbc/postgres.md)
* [MySQL](connectors/jdbc/mysql.md)
- * [AWS S3](dataSourcesAndSinks/amazonS3.md)
+ * [AWS S3](connectors/amazons3.md)
* [Cassandra](dataSourcesAndSinks/cassandra.md)
* [MongoDB](dataSourcesAndSinks/mongodb.md)
* [Neo4j](dataSourcesAndSinks/neo4j.md)
25 changes: 25 additions & 0 deletions docs/connectors/amazons3.md
@@ -0,0 +1,25 @@
# S3

1. Create an S3 bucket, e.g. `zingg28032023`, and a folder inside it, e.g. `zingg`.

2. Create an AWS access key and export it via environment variables (ensure that the user owning these keys has read/write access to the bucket above):

export AWS_ACCESS_KEY_ID=<access key id>
export AWS_SECRET_ACCESS_KEY=<access key>

(If MFA is enabled, the `AWS_SESSION_TOKEN` environment variable is also needed.)
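Taken together, the exports above might look like the following sketch (all key values here are placeholders; substitute your own IAM user's keys):

```shell
# Placeholder credentials -- substitute the keys of an IAM user
# that has read/write access to the bucket created in step 1.
export AWS_ACCESS_KEY_ID="AKIAXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Needed only when MFA / temporary (STS) credentials are in use:
export AWS_SESSION_TOKEN="xxxxxxxxxxxxxxxx"
```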

3. Download `hadoop-aws-3.1.0.jar` and `aws-java-sdk-bundle-1.11.271.jar` from Maven.

4. Add the above jars in `zingg.conf`:
spark.jars=/<location>/hadoop-aws-3.1.0.jar,/<location>/aws-java-sdk-bundle-1.11.271.jar
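As an alternative to the environment variables in step 2, the S3A credentials can also be placed in `zingg.conf` via Spark's Hadoop property pass-through. This is a sketch assuming the standard `hadoop-aws` S3A configuration keys; the jar locations shown are illustrative:

```properties
# Illustrative zingg.conf fragment -- adjust jar paths to your setup
spark.jars=/opt/jars/hadoop-aws-3.1.0.jar,/opt/jars/aws-java-sdk-bundle-1.11.271.jar
# S3A credentials via Spark's Hadoop configuration (alternative to env vars)
spark.hadoop.fs.s3a.access.key=<access key id>
spark.hadoop.fs.s3a.secret.key=<access key>
# Needed only for temporary (STS/MFA) credentials:
# spark.hadoop.fs.s3a.session.token=<session token>
```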

5. Run using:

./scripts/zingg.sh --phase findTrainingData --properties-file config/zingg.conf --conf examples/febrl/config.json --zinggDir s3a://zingg28032023/zingg
./scripts/zingg.sh --phase label --properties-file config/zingg.conf --conf examples/febrl/config.json --zinggDir s3a://zingg28032023/zingg
./scripts/zingg.sh --phase train --properties-file config/zingg.conf --conf examples/febrl/config.json --zinggDir s3a://zingg28032023/zingg
./scripts/zingg.sh --phase match --properties-file config/zingg.conf --conf examples/febrl/config.json --zinggDir s3a://zingg28032023/zingg

6. The models etc. get saved in
Amazon S3 > Buckets > zingg28032023 > zingg > 100
9 changes: 9 additions & 0 deletions docs/connectors/databricks.md
@@ -0,0 +1,9 @@
---
title: Databricks
parent: Data Sources and Sinks
nav_order: 2
---

# Databricks

Something coming soon.
