Note: Azure ML has an updated method of consuming Delta files: https://learn.microsoft.com/en-us/azure/databricks/mlflow/tracking-ex-delta

This repo shows a basic demo of how to use the Delta file format from AML.

With Databricks' announcement of Delta Lake support for standalone compute, we can now easily integrate an AML compute instance/cluster with the Delta file format generated/saved from Spark:

https://delta.io/news/delta-lake-1-0-0-released/


1) Use Databricks to import the notebook: Databricks_Delta_Load.ipynb

The Databricks notebook demonstrates how to load the sample safe_driver data and save it as a Spark DataFrame in Delta file format. It then registers/creates a datastore, uploads the Delta files, and creates a file dataset referencing that datastore, as sketched below.
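A minimal sketch of those steps, assuming the azureml-core SDK is available on the Databricks cluster (the dataset name safe_driver_delta and the paths are illustrative, not the notebook's exact values):

from azureml.core import Workspace, Dataset

# df is the Spark DataFrame holding the safe_driver data loaded earlier in the notebook.
# Spark writes to DBFS; the same files are visible locally under /dbfs via the FUSE mount.
df.write.format("delta").mode("overwrite").save("dbfs:/tmp/delta_driver")

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Upload the Delta files (Parquet parts plus the _delta_log directory) to the datastore
datastore.upload(src_dir="/dbfs/tmp/delta_driver", target_path="delta_driver", overwrite=True)

# Create a FileDataset referencing the uploaded files and register it
file_ds = Dataset.File.from_files(path=(datastore, "delta_driver/**"))
file_ds.register(workspace=ws, name="safe_driver_delta", create_new_version=True)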

2) Verify that the Datastore and Dataset have been registered in AML: in the Studio UI you should see a list of Parquet files and the JSON transaction log file for Delta.

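You can also check the registered dataset from code; a quick sketch (using the same hypothetical dataset name as in step 1):

from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="safe_driver_delta")

# Expect the Parquet part files plus the _delta_log/*.json transaction log
for path in dataset.to_path():
    print(path)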

3) Use an AML Notebook to import the notebook Delta_AML_Read_Demo.ipynb and run it on an AML compute instance.

This notebook shows how to install the deltalake package and then use the AML Datastore and Dataset to download the Delta table and convert it to a pandas DataFrame.
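On the compute instance the package is typically installed with pip (exact version pinning may vary):

pip install deltalake

The core read-and-convert logic: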

from deltalake import DeltaTable

# Path where the Delta table files live on the compute instance
dt = DeltaTable("/mnt/batch/tasks/shared/LS_root/mounts/clusters/deltademocpu/code/delta_driver/")

# Read the Delta table into a PyArrow table
table = dt.to_pyarrow_table()

# Convert back to pandas
df_pandas = table.to_pandas()
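The mount path above is specific to the demo compute instance. A more portable sketch downloads the registered files first (again assuming the hypothetical dataset name safe_driver_delta):

from azureml.core import Workspace, Dataset
from deltalake import DeltaTable

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="safe_driver_delta")

# Download the Parquet files and _delta_log locally, then read as a Delta table
dataset.download(target_path="./delta_driver", overwrite=True)

dt = DeltaTable("./delta_driver")
df_pandas = dt.to_pyarrow_table().to_pandas()
print(df_pandas.head())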
