Add Table of content

mrpowers-io · Dec 7, 2024 · c316b48
1 parent 02d5799
Showing 1 changed file with 63 additions and 46 deletions.
diff --git a/README.md b/README.md
@@ -11,6 +11,21 @@ Use [chispa](https://github.com/MrPowers/chispa) for PySpark applications.
 Read [Testing Spark Applications](https://leanpub.com/testing-spark) for a full explanation on the best way to test
 Spark code!  Good test suites yield higher quality codebases that are easy to refactor.
 
+## Table of Contents
+- [Install](#install)
+- [Examples](#simple-examples)
+- [Why is this library fast?](#why-is-this-library-fast)
+- [Usage](#usage)
+  - [Local Testing SparkSession](#local-sparksession-for-test)
+  - [DatasetComparer / DataFrameComparer](#datasetcomparer--dataframecomparer)
+    - [Unordered DataFrame comparison](#unordered-dataframe-equality-comparisons)
+    - [DataFrame comparison ignoring the nullable flag](#equality-comparisons-ignoring-the-nullable-flag)
+    - [Approximate DataFrame comparison](#approximate-dataframe-equality)
+  - [ColumnComparer](#column-equality)
+  - [SchemaComparer](#schema-equality)
+- [Testing tips](#testing-tips)
+
+
 ## Install
 
 Fetch the JAR file from Maven.
@@ -149,6 +164,7 @@ slower.
 
 ## Usage
 
+### Local SparkSession for test
 The spark-fast-tests project doesn't provide a SparkSession object in your test suite, so you'll need to make one
 yourself.
 
@@ -176,6 +192,7 @@ big DataFrames in your test suite.
 Make sure to only use the `SparkSessionTestWrapper` trait in your test suite. You don't want to use test specific
 configuration (like one shuffle partition) when running production code.
 
+### DatasetComparer / DataFrameComparer
 The `DatasetComparer` trait defines the `assertSmallDatasetEquality` method. Extend your spec file with the
 `SparkSessionTestWrapper` trait to create DataFrames and the `DatasetComparer` trait to make DataFrame comparisons.
 
@@ -221,50 +238,7 @@ assertLargeDatasetEquality(actualDF, expectedDF)
 `assertSmallDatasetEquality` is faster for test suites that run on your local machine.  `assertLargeDatasetEquality`
 should only be used for DataFrames that are split across nodes in a cluster.
 
-### Column Equality
-
-The `assertColumnEquality` method can be used to assess the equality of two columns in a DataFrame.
-
-Suppose you have the following DataFrame with two columns that are not equal.
-
-```
-+-------+-------------+
-|   name|expected_name|
-+-------+-------------+
-|   phil|         phil|
-| rashid|       rashid|
-|matthew|        mateo|
-|   sami|         sami|
-|     li|         feng|
-|   null|         null|
-+-------+-------------+
-```
-
-The following code will throw a `ColumnMismatch` error message:
-
-```scala
-assertColumnEquality(df, "name", "expected_name")
-```
-
-<p>
-    <img src="./images/assertColumnEquality_error_message.png" alt="Description" width="500", height="200">
-</p>
-
-Mix in the `ColumnComparer` trait to your test class to access the `assertColumnEquality` method:
-
-```scala
-import com.github.mrpowers.spark.fast.tests.ColumnComparer
-
-object MySpecialClassTest
-  extends TestSuite
-    with ColumnComparer
-    with SparkSessionTestWrapper {
-
-  // your tests
-}
-```
-
-### Unordered DataFrame equality comparisons
+#### Unordered DataFrame equality comparisons
 
 Suppose you have the following `actualDF`:
 
@@ -297,7 +271,7 @@ performing the comparison.
 
 `assertSmallDataFrameEquality(sourceDF, expectedDF, orderedComparison = false)` will not throw an error.
 
-### Equality comparisons ignoring the nullable flag
+#### Equality comparisons ignoring the nullable flag
 
 You might also want to make equality comparisons that ignore the nullable flags for the DataFrame columns.
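
Conceptually, ignoring the nullable flag means comparing the two schemas after erasing that flag. The plain-Scala sketch below illustrates the idea. It is not how spark-fast-tests implements it. `SchemaSketch`, `Field`, and `schemaDiff` are hypothetical names, and each column is modelled as a simple name/type/nullable triple rather than Spark's `StructField`.

```scala
// Hypothetical sketch, not spark-fast-tests code: a column is modelled as a
// name/type/nullable triple instead of Spark's StructField.
object SchemaSketch {
  final case class Field(name: String, dataType: String, nullable: Boolean)

  // Lists every difference between the two schemas; an empty result means "equal".
  def schemaDiff(actual: Seq[Field], expected: Seq[Field], ignoreNullable: Boolean = false): Seq[String] = {
    // When ignoreNullable is set, force every flag to the same value so it cannot cause a mismatch.
    def normalize(f: Field): Field = if (ignoreNullable) f.copy(nullable = true) else f

    val countDiff =
      if (actual.length != expected.length) Seq(s"column count differs: ${actual.length} vs ${expected.length}")
      else Seq.empty[String]

    // Compare the columns position by position (zip stops at the shorter schema).
    val fieldDiffs = actual.zip(expected).zipWithIndex.collect {
      case ((a, e), i) if normalize(a) != normalize(e) => s"column $i: $a vs $e"
    }

    countDiff ++ fieldDiffs
  }
}
```

Consider a schema whose `num` column is non-nullable, compared against one where `num` is nullable. With the default settings the sketch reports one difference. With `ignoreNullable = true` it reports none.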
 
@@ -326,7 +300,7 @@ val expectedDF = spark.createDF(
 assertSmallDatasetEquality(sourceDF, expectedDF, ignoreNullable = true)
 ```
 
-### Approximate DataFrame Equality
+#### Approximate DataFrame Equality
 
 The `assertApproximateDataFrameEquality` function is useful for DataFrames that contain `DoubleType` columns. The
 precision threshold must be set when using the `assertApproximateDataFrameEquality` function.
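
The sketch below illustrates the idea of a precision threshold in plain Scala. Two rows match when every pair of `Double` values differs by at most the threshold and all other values are equal. This is a hypothetical illustration, not how `assertApproximateDataFrameEquality` is implemented. Rows are modelled as `Seq[Any]`, and `ApproxRowSketch` and `rowsApproxEqual` are names invented for the example.

```scala
// Hypothetical sketch, not spark-fast-tests code: rows are modelled as Seq[Any].
object ApproxRowSketch {
  // Two values match if both are null, both are Doubles within `precision`,
  // or they are otherwise equal.
  def approxEqual(a: Any, b: Any, precision: Double): Boolean = (a, b) match {
    case (null, null)           => true
    case (null, _) | (_, null)  => false
    case (x: Double, y: Double) => math.abs(x - y) <= precision
    case _                      => a == b
  }

  // Rows are compared in order; every row and every value must match.
  def rowsApproxEqual(actual: Seq[Seq[Any]], expected: Seq[Seq[Any]], precision: Double): Boolean =
    actual.length == expected.length &&
      actual.zip(expected).forall { case (r1, r2) =>
        r1.length == r2.length &&
          r1.zip(r2).forall { case (v1, v2) => approxEqual(v1, v2, precision) }
      }
}
```

With a threshold of `0.01`, the sketch treats `1.2` and `1.205` as matching values and `1.2` and `1.25` as different ones. Non-`Double` values such as strings must still be exactly equal.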
@@ -355,6 +329,49 @@ val expectedDF = spark.createDF(
 assertApproximateDataFrameEquality(sourceDF, expectedDF, 0.01)
 ```
 
+### Column Equality
+
+The `assertColumnEquality` method can be used to assess the equality of two columns in a DataFrame.
+
+Suppose you have the following DataFrame with two columns that are not equal.
+
+```
++-------+-------------+
+|   name|expected_name|
++-------+-------------+
+|   phil|         phil|
+| rashid|       rashid|
+|matthew|        mateo|
+|   sami|         sami|
+|     li|         feng|
+|   null|         null|
++-------+-------------+
+```
+
+The following code throws a `ColumnMismatch` error:
+
+```scala
+assertColumnEquality(df, "name", "expected_name")
+```
+
+<p>
+    <img src="./images/assertColumnEquality_error_message.png" alt="assertColumnEquality error message" width="500" height="200">
+</p>
+
+Mix in the `ColumnComparer` trait to your test class to access the `assertColumnEquality` method:
+
+```scala
+import com.github.mrpowers.spark.fast.tests.ColumnComparer
+
+object MySpecialClassTest
+  extends TestSuite
+    with ColumnComparer
+    with SparkSessionTestWrapper {
+
+  // your tests
+}
+```
+
 ### Schema Equality
 
 The `SchemaComparer` provides the `assertSchemaEqual` API, which is useful for comparing DataFrame schemas