Note: Azure ML has an updated method of consuming Delta files: https://learn.microsoft.com/en-us/azure/databricks/mlflow/tracking-ex-delta
With the recent announcement from Databricks releasing Delta Lake for standalone compute, we can now easily integrate an AML compute instance/cluster with the Delta file format generated and saved from Spark:
https://delta.io/news/delta-lake-1-0-0-released/
1) Use Databricks to import the notebook: Databricks_Delta_Load.ipynb
The Databricks notebook demonstrates how to load the sample safe_driver data and save it as a Spark DataFrame in the Delta file format. It then registers/creates a datastore, uploads the Delta files, and creates a file dataset referencing that datastore (see the sketch below).
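A minimal sketch of these steps, assuming the classic azureml-core (SDK v1) Datastore/Dataset APIs; the source file, DBFS paths, and datastore/dataset names are illustrative placeholders, not the exact values used in Databricks_Delta_Load.ipynb:

# Runs inside Databricks; `spark` is the session provided by the cluster.
from azureml.core import Workspace, Dataset
from azureml.core.datastore import Datastore

# Load the sample safe_driver data and write it out in Delta format
df = spark.read.csv("dbfs:/FileStore/safe_driver.csv", header=True, inferSchema=True)
df.write.format("delta").mode("overwrite").save("dbfs:/tmp/delta_driver")

# Connect to the AML workspace (subscription/resource group values are placeholders)
ws = Workspace.get(name="<workspace>", subscription_id="<subscription-id>", resource_group="<resource-group>")

# Register a blob datastore and upload the Delta files (parquet data files + _delta_log)
datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="delta_datastore",
    container_name="<container>",
    account_name="<storage-account>",
    account_key="<account-key>",
)
datastore.upload(src_dir="/dbfs/tmp/delta_driver", target_path="delta_driver", overwrite=True)

# Create and register a FileDataset referencing the uploaded Delta folder
dataset = Dataset.File.from_files(path=(datastore, "delta_driver/**"))
dataset.register(workspace=ws, name="delta_driver_dataset", create_new_version=True)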
2) Verify that the Datastore and Dataset have been registered in AML; you should see a list of parquet files plus the JSON transaction log (_delta_log) for the Delta table:
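One way to check this from code rather than the portal, assuming the dataset name from the sketch above:

from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="delta_driver_dataset")

# Should list the .parquet data files and the _delta_log/*.json transaction log
for path in dataset.to_path():
    print(path)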
3) Use an AML Notebook to import the notebook Delta_AML_Read_Demo.ipynb and run it on an AML compute instance
This notebook shows how to install the deltalake package and then use the AML Datastore and Dataset to download the Delta table and convert it to a pandas DataFrame:
from deltalake import DeltaTable

# Open the Delta table that was downloaded to the compute instance
dt = DeltaTable("/mnt/batch/tasks/shared/LS_root/mounts/clusters/deltademocpu/code/delta_driver/")
# Read the Delta table into a PyArrow Table
table = dt.to_pyarrow_table()
# Convert back to pandas
df_pandas = table.to_pandas()
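For completeness, a sketch of the download step that precedes the read above, assuming the dataset name used earlier (install the package first with: pip install deltalake):

from azureml.core import Workspace, Dataset
from deltalake import DeltaTable

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="delta_driver_dataset")

# Download the Delta files (parquet + _delta_log) to a local folder on the compute instance
local_path = "./delta_driver"
dataset.download(target_path=local_path, overwrite=True)

# DeltaTable must point at the folder that contains _delta_log; adjust the path if the
# download preserves extra folder levels from the datastore
dt = DeltaTable(local_path)
df_pandas = dt.to_pyarrow_table().to_pandas()
print(df_pandas.head())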