You can use Python extension modules and libraries with your AWS Glue ETL scripts as long as they are written in pure Python. C libraries such as pandas are not supported at the present time, nor are extensions written in other languages.
-- AWS
Deprecated, please migrate to v3/v4
AWS Glue development environment based on the svajiraya/aws-glue-libs fix.
- Announcement of the released binaries ('19)
- Python Shell Supported Library
- Python Shell version running
- Glue lib reference
- Glue Dynamic frames
- Glue script samples
- Known Issues for AWS Glue
- packaged with: Debian 10, OpenJDK 8, Spark 2.4, Maven 3.6, Python 3.6, pip 20, pytest, glue lib, boto3
- additionally: AWS CLI, CDK, Poetry
- Samples:
  - glue: /opt/samples/glue
  - cdk: /opt/samples/cdk
  - cloudformation: /opt/samples/cloudformation
- glue:
# install docker and configure aliases
curl -sSL https://raw.githubusercontent.com/webysther/aws-glue-docker/master/start.sh | sh
# to use pandas
glue
# or pyspark
glue-spark
# here you are inside docker
# Glue PySpark (REPL)
pyspark
# Glue PySpark
# /app is your current folder
glue-spark sparksubmit /app/spark_script.py
# Test
glue pytest
# aliases inside docker (backwards compatibility)
gluesparksubmit == sparksubmit
gluepyspark == pyspark
gluepytest == pytest
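As a rough illustration of what `glue pytest` could pick up from the current folder, here is a minimal sketch of a test file. The file name `test_sample.py` and the function `normalize_record` are hypothetical and not part of this image or of the Glue library. The code is pure Python, so it does not depend on Spark or on any C extension.

```python
# test_sample.py (hypothetical file name; pytest collects files named test_*.py)


def normalize_record(record):
    """Lower-case and trim the keys, and trim whitespace from string values."""
    return {
        key.strip().lower(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def test_normalize_record():
    # Hypothetical input row, as it might arrive from a raw source
    raw = {" Name ": "  Alice ", "AGE": 30}
    assert normalize_record(raw) == {"name": "Alice", "age": 30}
```

Assuming the file sits in the folder mounted as /app, running `glue pytest` from that folder should collect and run `test_normalize_record`.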
MIT License. Please see License File for more information.