GitBook: No commit message
navinrathore authored and gitbook-bot committed Sep 20, 2023
1 parent a1f3d15 commit 0126de9
Showing 3 changed files with 36 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/SUMMARY.md
@@ -38,12 +38,12 @@
* [Linking across datasets](setup/link.md)
* [Data Sources and Sinks](dataSourcesAndSinks/connectors.md)
* [Zingg Pipes](dataSourcesAndSinks/pipes.md)
- * [Databricks](dataSourcesAndSinks/databricks.md)
+ * [Databricks](connectors/databricks.md)
* [Snowflake](dataSourcesAndSinks/snowflake.md)
* [JDBC](dataSourcesAndSinks/jdbc.md)
* [Postgres](connectors/jdbc/postgres.md)
* [MySQL](connectors/jdbc/mysql.md)
- * [AWS S3](dataSourcesAndSinks/amazonS3.md)
+ * [AWS S3](connectors/amazons3.md)
* [Cassandra](dataSourcesAndSinks/cassandra.md)
* [MongoDB](dataSourcesAndSinks/mongodb.md)
* [Neo4j](dataSourcesAndSinks/neo4j.md)
25 changes: 25 additions & 0 deletions docs/connectors/amazons3.md
@@ -0,0 +1,25 @@
# S3

1. Create an S3 bucket, e.g. `zingg28032023`, and a folder inside it, e.g. `zingg`.

2. Create an AWS access key and export it via environment variables (ensure that the user owning these keys has read/write access to the bucket above):

export AWS_ACCESS_KEY_ID=<access key id>
export AWS_SECRET_ACCESS_KEY=<access key>

(If MFA is enabled, the `AWS_SESSION_TOKEN` environment variable is also needed.)
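Taken together, the exports above might look like the following sketch (all key values here are placeholders; substitute your own IAM user's keys):

```shell
# Placeholder credentials -- substitute the keys of an IAM user
# that has read/write access to the bucket created in step 1.
export AWS_ACCESS_KEY_ID="AKIAXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Needed only when MFA / temporary (STS) credentials are in use:
export AWS_SESSION_TOKEN="xxxxxxxxxxxxxxxx"
```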

3. Download `hadoop-aws-3.1.0.jar` and `aws-java-sdk-bundle-1.11.271.jar` from Maven.

4. Add the above jars in `zingg.conf`:
spark.jars=/<location>/hadoop-aws-3.1.0.jar,/<location>/aws-java-sdk-bundle-1.11.271.jar
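As an alternative to the environment variables in step 2, the S3A credentials can also be placed in `zingg.conf` via Spark's Hadoop property pass-through. This is a sketch assuming the standard `hadoop-aws` S3A configuration keys; the jar locations shown are illustrative:

```properties
# Illustrative zingg.conf fragment -- adjust jar paths to your setup
spark.jars=/opt/jars/hadoop-aws-3.1.0.jar,/opt/jars/aws-java-sdk-bundle-1.11.271.jar
# S3A credentials via Spark's Hadoop configuration (alternative to env vars)
spark.hadoop.fs.s3a.access.key=<access key id>
spark.hadoop.fs.s3a.secret.key=<access key>
# Needed only for temporary (STS/MFA) credentials:
# spark.hadoop.fs.s3a.session.token=<session token>
```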

5. Run using:

./scripts/zingg.sh --phase findTrainingData --properties-file config/zingg.conf --conf examples/febrl/config.json --zinggDir s3a://zingg28032023/zingg
./scripts/zingg.sh --phase label --properties-file config/zingg.conf --conf examples/febrl/config.json --zinggDir s3a://zingg28032023/zingg
./scripts/zingg.sh --phase train --properties-file config/zingg.conf --conf examples/febrl/config.json --zinggDir s3a://zingg28032023/zingg
./scripts/zingg.sh --phase match --properties-file config/zingg.conf --conf examples/febrl/config.json --zinggDir s3a://zingg28032023/zingg

6. The models etc. get saved in
Amazon S3 > Buckets > zingg28032023 > zingg > 100
9 changes: 9 additions & 0 deletions docs/connectors/databricks.md
@@ -0,0 +1,9 @@
---
title: Databricks
parent: Data Sources and Sinks
nav_order: 2
---

# Databricks

Something coming soon.
