We are also experiencing this bug: https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201505.mbox/%[email protected]%3E
This is our attempt to create a reproducible test case so that it can be fixed.
- Use Maven to build the model
- Run GenerateDataOne (I used sbt run)
- Run GenerateDataTwo (I used sbt run)
- Run ErrorExample (I used sbt run)
The resulting output should show the error condition: records with dissimilar ids that are incorrectly joined together.
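
To make the error condition easier to spot, here is a minimal sketch of a check that flags joined pairs whose ids differ. The names `RecordOne`, `RecordTwo`, and `JoinCheck` are hypothetical and are not taken from this repository; the real model classes and the output of ErrorExample may be shaped differently. It uses plain Scala collections so that it runs without a Spark dependency.

```scala
// Hypothetical record shapes (assumption): the real model classes may differ.
final case class RecordOne(id: String, payload: String)
final case class RecordTwo(id: String, payload: String)

object JoinCheck {
  // Returns every joined pair whose two sides carry different ids.
  // A correct join on id should make this result empty.
  def mismatched(joined: Seq[(RecordOne, RecordTwo)]): Seq[(RecordOne, RecordTwo)] =
    joined.filter { case (one, two) => one.id != two.id }
}
```

Assuming the joined result is available as an RDD of pairs, the same predicate could be passed to `filter` on that RDD, and a non-empty result would indicate the bug.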