Merge pull request #183 from treeverse/samples-use-python-wrapper-v1.0
Changed many samples to use new Python wrapper
kesarwam authored Feb 20, 2024
2 parents 6744f9b + 0f99d0b commit 2955aaa
Showing 19 changed files with 2,125 additions and 7,132 deletions.
7 changes: 4 additions & 3 deletions 00_notebooks/00_index.ipynb
@@ -9,7 +9,7 @@
"\n",
"All of these notebooks can be run using the provided `docker-compose.yml` (unless otherwise specified). \n",
"\n",
-"See \"_Standalone demos_\" below for those, including those for Airflow and Dagster, that run standalone."
+"See \"_Standalone demos_\" below for those, including those for Airflow, Dagster and Prefect, that run standalone."
]
},
{
@@ -30,14 +30,14 @@
"* [**Data Lineage with lakeFS**](./data-lineage.ipynb) \n",
"* [**Integration of lakeFS with Delta Lake and Apache Spark**](./delta-lake.ipynb) \n",
"* [**Integration of lakeFS with Delta Lake and Python**](./delta-lake-python.ipynb)\n",
-"* [**Displaying diff between Delta Tables**](./delta-diff.ipynb)<br/>_See also the [accompanying blog](https://lakefs.io/blog/lakefs-supports-delta-lake-diff/)_\n",
+"* [**Displaying diff between Delta Tables**](./delta-diff.ipynb)\n",
"* [**Only allow specific file formats in data lake**](hooks-webhooks-demo.ipynb) (with lakeFS webhooks)\n",
"* [**Prevent unintended schema change**](hooks-schema-validation.ipynb) (with lakeFS Lua hooks)\n",
"* [**Avoid leaking PII data**](hooks-schema-and-pii-validation.ipynb) (shows how to use multiple Lua hooks)\n",
"* [**Import into a lakeFS repository from multiple paths**](./import-multiple-buckets.ipynb) \n",
"* [**ML Experimentation/Reproducibility 01 (Dogs)**](./ml-reproducibility.ipynb)\n",
"* [**ML Experimentation 02 (Wine Quality)**](./ml-experimentation-wine-quality-prediction.ipynb)</br>_See also the [accompanying blog](https://lakefs.io/blog/building-an-ml-experimentation-platform-for-easy-reproducibility-using-lakefs/)_\n",
-"* [**RBAC demo**](./rbac-demo.ipynb) </br> _lakeFS Cloud only_ \n",
+"* [**RBAC demo**](./rbac-demo.ipynb) ([lakeFS Cloud](https://lakefs.cloud/register) only)\n",
"* [**Version Control of multi-buckets pipelines**](./version-control-of-multi-buckets-pipelines.ipynb) \n",
"* [**Reprocess and Backfill Data with new ETL logic**](./reprocess-backfill-data.ipynb) \n",
"* **lakeFS and Apache Iceberg**\n",
@@ -46,6 +46,7 @@
" * [What happens if you use Iceberg without the lakeFS support](./iceberg-lakefs-default.ipynb)\n",
"* **Using R with lakeFS**\n",
" * [Basic usage](./R.ipynb)\n",
+" * [Basic usage with weather data](./R-weather.ipynb)\n",
" * [NYC Film permits example](./R-nyc.ipynb)\n",
" * [Rough notes around R client](./R-client.ipynb)"
]
24 changes: 24 additions & 0 deletions 00_notebooks/assets/lakefs_demo.py
@@ -0,0 +1,24 @@
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
import os

def print_diff(diff):
    results = map(
        lambda n: [n.path, n.path_type, n.size_bytes, n.type],
        diff)

    from tabulate import tabulate
    print(tabulate(
        results,
        headers=['Path', 'Path Type', 'Size(Bytes)', 'Type']))

def print_commit(log):
    from datetime import datetime
    from pprint import pprint

    print('Message:', log.message)
    print('ID:', log.id)
    print('Committer:', log.committer)
    print('Creation Date:', datetime.utcfromtimestamp(log.creation_date).strftime('%Y-%m-%d %H:%M:%S'))
    print('Parents:', log.parents)
    print('Metadata:')
    pprint(log.metadata)
