Description
What did you find confusing? Please describe.
The Amazon EMR documentation states explicitly which .jar files are built in. For example, the optimized connector for Amazon Redshift is built in. The relationship between the EMR and SageMaker implementations of PySpark is not mentioned anywhere, in particular with regard to connectors and Redshift.
Describe how documentation can be improved
State explicitly which additional .jar files and connectors are built into the SageMaker PySpark implementations.
Additional context
Redshift is a popular data source for ML, and it would be very convenient if:
- the connector from EMR were built in
- this were stated explicitly in the documentation
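
Until the documentation covers this, one way to check what a given environment ships with is to look at the jar files on disk. The snippet below is only a minimal sketch: it assumes the environment sets `SPARK_HOME` and keeps its bundled jars under `$SPARK_HOME/jars` (the usual layout of a Spark distribution, not something confirmed for SageMaker), and `find_bundled_jars` is a hypothetical helper name, not part of any AWS or Spark API.

```python
import os
from pathlib import Path


def find_bundled_jars(keyword, spark_home=None):
    """Return sorted names of .jar files in <spark_home>/jars whose name contains keyword."""
    # Assumption: bundled jars live in the jars/ directory under SPARK_HOME.
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set; pass spark_home explicitly")

    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        # No jars directory at the assumed location, so nothing to report.
        return []

    # Case-insensitive match on the file name only; this does not inspect jar contents.
    needle = keyword.lower()
    return sorted(p.name for p in jars_dir.glob("*.jar") if needle in p.name.lower())
```

Calling `find_bundled_jars("redshift")` from a notebook or processing script would list any bundled jar whose file name mentions Redshift. An empty list only means no file name matched at the assumed location, not that a connector is definitely absent.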