From 116ed8753652990392e6ef9350fafb1e954c31f7 Mon Sep 17 00:00:00 2001
From: gnanaprakash-ravi
Date: Mon, 9 Oct 2023 10:48:58 +0530
Subject: [PATCH 1/3] fix docs configuration

---
 docs/stepbystep/installation/docker/shared-locations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/stepbystep/installation/docker/shared-locations.md b/docs/stepbystep/installation/docker/shared-locations.md
index 884686b35..e580b84de 100644
--- a/docs/stepbystep/installation/docker/shared-locations.md
+++ b/docs/stepbystep/installation/docker/shared-locations.md
@@ -12,4 +12,4 @@ The **zinggDir** location where model information is stored may use a shared loc
 zingg.sh --phase label --conf config.json --zinggDir /location
 ```
 
-Similarly, the output and data dir [configurations](../../../setup/configuration.md) inside config.json can be made using a shared location. Please ensure that the running user has access permissions for this location.
+Similarly, the output and data dir [configurations](../../../stepbystep/configuration) inside config.json can be made using a shared location. Please ensure that the running user has access permissions for this location.
From 4fbe7c247c056c21ba382b41c554f4529ab46693 Mon Sep 17 00:00:00 2001
From: gnanaprakash-ravi
Date: Mon, 9 Oct 2023 12:44:37 +0530
Subject: [PATCH 2/3] fix broken links

---
 docs/accuracy/definingOwn.md | 8 ++++----
 docs/running/databricks.md | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/accuracy/definingOwn.md b/docs/accuracy/definingOwn.md
index a4e686111..5395eb1d9 100644
--- a/docs/accuracy/definingOwn.md
+++ b/docs/accuracy/definingOwn.md
@@ -5,7 +5,7 @@ description: To add blocking functions and how they work
 
 # Defining Own Functions
 
-You can add your own [blocking functions](https://github.com/zinggAI/zingg/tree/main/core/src/main/java/zingg/hash) which will be evaluated by Zingg to build the [blocking tree.](../zModels.md)
+You can add your own [blocking functions](https://github.com/zinggAI/zingg/tree/main/common/core/src/main/java/zingg/common/core/hash) which will be evaluated by Zingg to build the [blocking tree.](../zModels.md)
 
 The blocking tree works on the matched records provided by the user as part of the training. At every node, it selects the hash function and the field on which it should be applied so that there is the least elimination of the matching pairs. Say we have data like this:
 
@@ -49,8 +49,8 @@ Pair 1 is getting eliminated above, hence last1char is not a good function.
 
 So, first1char(firstname) will be chosen. This brings near similar records together - in a way, clusters them to break the cartesian join.
 
-These business-specific blocking functions go into [Hash Functions](https://github.com/zinggAI/zingg/tree/main/core/src/main/java/zingg/hash) and must be added to [HashFunctionRegistry](../../core/src/main/java/zingg/hash/HashFunctionRegistry.java) and [hash functions config](../../core/src/main/resources/hashFunctions.json).
+These business-specific blocking functions go into [Hash Functions](https://github.com/zinggAI/zingg/tree/main/common/core/src/main/java/zingg/common/core/hash) and must be added to [HashFunctionRegistry](https://github.com/zinggAI/zingg/blob/main/common/core/src/main/java/zingg/common/core/hash/HashFunctionRegistry.java) and [hash functions config](https://github.com/zinggAI/zingg/blob/main/common/core/src/main/resources/hashFunctions.json).
 
-Also, for similarity, you can define your own measures. Each dataType has predefined features, for example, [String](../../core/src/main/java/zingg/feature/StringFeature.java) fuzzy type is configured for Affine and Jaro.
+Also, for similarity, you can define your own measures. Each dataType has predefined features, for example, [String](https://github.com/zinggAI/zingg/blob/main/common/core/src/main/java/zingg/common/core/feature/StringFeature.java) fuzzy type is configured for Affine and Jaro.
 
-You can define your own [comparisons](https://github.com/zinggAI/zingg/tree/main/core/src/main/java/zingg/similarity/function) and use them.
+You can define your own [comparisons](https://github.com/zinggAI/zingg/tree/main/common/core/src/main/java/zingg/common/core/similarity/function) and use them.
diff --git a/docs/running/databricks.md b/docs/running/databricks.md
index 57b718745..a773795c0 100644
--- a/docs/running/databricks.md
+++ b/docs/running/databricks.md
@@ -3,7 +3,7 @@ title: Running on Databricks
 parent: Running Zingg on Cloud
 nav_order: 6
 ---
-There are several ways to run Zingg on Databricks. All [file formats and data sources and sinks](../dataSourcesAndSinks) are supported within Databricks.
+There are several ways to run Zingg on Databricks. All [file formats and data sources and sinks](https://github.com/zinggAI/zingg/tree/main/docs/dataSourcesAndSinks) are supported within Databricks.
 
 # Running directly within Databricks using the Databricks notebook interface
 This uses the Zingg Python API and an [example notebook is available here](https://github.com/zinggAI/zingg/blob/main/examples/databricks/FebrlExample.ipynb)
From a8ad0e494c862b03db75aff6d22de601c808015c Mon Sep 17 00:00:00 2001
From: gnanaprakash-ravi
Date: Tue, 10 Oct 2023 13:24:17 +0530
Subject: [PATCH 3/3] relative path for links

---
 docs/accuracy/definingOwn.md | 4 ++--
 docs/running/databricks.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/accuracy/definingOwn.md b/docs/accuracy/definingOwn.md
index 5395eb1d9..dccc195ef 100644
--- a/docs/accuracy/definingOwn.md
+++ b/docs/accuracy/definingOwn.md
@@ -49,8 +49,8 @@ Pair 1 is getting eliminated above, hence last1char is not a good function.
 
 So, first1char(firstname) will be chosen. This brings near similar records together - in a way, clusters them to break the cartesian join.
 
-These business-specific blocking functions go into [Hash Functions](https://github.com/zinggAI/zingg/tree/main/common/core/src/main/java/zingg/common/core/hash) and must be added to [HashFunctionRegistry](https://github.com/zinggAI/zingg/blob/main/common/core/src/main/java/zingg/common/core/hash/HashFunctionRegistry.java) and [hash functions config](https://github.com/zinggAI/zingg/blob/main/common/core/src/main/resources/hashFunctions.json).
+These business-specific blocking functions go into [Hash Functions](https://github.com/zinggAI/zingg/tree/main/common/core/src/main/java/zingg/common/core/hash) and must be added to [HashFunctionRegistry](../../common/core/src/main/java/zingg/common/core/hash/HashFunctionRegistry.java) and [hash functions config](../../common/core/src/main/resources/hashFunctions.json).
 
-Also, for similarity, you can define your own measures. Each dataType has predefined features, for example, [String](https://github.com/zinggAI/zingg/blob/main/common/core/src/main/java/zingg/common/core/feature/StringFeature.java) fuzzy type is configured for Affine and Jaro.
+Also, for similarity, you can define your own measures. Each dataType has predefined features, for example, [String](../../common/core/src/main/java/zingg/common/core/feature/StringFeature.java) fuzzy type is configured for Affine and Jaro.
 
 You can define your own [comparisons](https://github.com/zinggAI/zingg/tree/main/common/core/src/main/java/zingg/common/core/similarity/function) and use them.
diff --git a/docs/running/databricks.md b/docs/running/databricks.md
index a773795c0..57b718745 100644
--- a/docs/running/databricks.md
+++ b/docs/running/databricks.md
@@ -3,7 +3,7 @@ title: Running on Databricks
 parent: Running Zingg on Cloud
 nav_order: 6
 ---
-There are several ways to run Zingg on Databricks. All [file formats and data sources and sinks](https://github.com/zinggAI/zingg/tree/main/docs/dataSourcesAndSinks) are supported within Databricks.
+There are several ways to run Zingg on Databricks. All [file formats and data sources and sinks](../dataSourcesAndSinks) are supported within Databricks.
 
 # Running directly within Databricks using the Databricks notebook interface
 This uses the Zingg Python API and an [example notebook is available here](https://github.com/zinggAI/zingg/blob/main/examples/databricks/FebrlExample.ipynb)