Xorbits Example Notebooks

This repository provides a collection of examples for Xorbits.

Example 1: Explore the NYC taxi dataset using Xorbits

This example shows how to use Xorbits for an initial exploration of the NYC taxi dataset and gives you a sense of how easy Xorbits is to use.

To run this example on your favorite platform:

| Platform | Link |
| --- | --- |
| Colab | https://colab.research.google.com/github/xprobe-inc/examples/blob/main/nyc-taxi/nyc-taxi.ipynb |
| Kaggle | https://www.kaggle.com/code/cornmonster/notebooka9814fb1ba |

Example 2: Text Deduplication using Xorbits over OSCAR Corpus

This example demonstrates how to use Xorbits to perform text deduplication on the OSCAR Corpus. The OSCAR Corpus is a massive multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.

| Platform | Link |
| --- | --- |
| Colab | https://colab.research.google.com/github/xprobe-inc/examples/blob/main/text-dedup/text-dedup.ipynb |
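
To give a feel for what text deduplication involves, here is a minimal, standard-library-only sketch. It is not the notebook's code and does not use the Xorbits API: it assumes a simple approach (exact matching after normalization, then word-shingle Jaccard similarity), and the function names, the shingle size `k=3`, and the `0.7` threshold are all hypothetical choices made for illustration.

```python
import hashlib
import re
from typing import List, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def shingles(text: str, k: int = 3) -> Set[str]:
    """Return the set of k-word shingles of the normalized text."""
    words = normalize(text).split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: List[str], threshold: float = 0.7) -> List[str]:
    """Keep the first occurrence of each document, dropping exact and near duplicates.

    Assumption: a pairwise comparison is fine for a toy input. Large corpora
    usually need an approximate scheme instead of comparing every pair.
    """
    kept: List[str] = []
    kept_shingles: List[Set[str]] = []
    seen_hashes: Set[str] = set()
    for doc in docs:
        digest = hashlib.sha1(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        doc_shingles = shingles(doc)
        if any(jaccard(doc_shingles, other) >= threshold for other in kept_shingles):
            continue  # near duplicate of a document we already kept
        seen_hashes.add(digest)
        kept.append(doc)
        kept_shingles.append(doc_shingles)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog",
    "the quick  brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy dog today",
    "Xorbits scales pandas workloads across machines",
]
unique_docs = deduplicate(docs)
print(unique_docs)
```

With this toy input, the second document is dropped as an exact duplicate after normalization and the third as a near duplicate of the first, leaving two documents.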

Example 3: Get the license with the most stars using Xorbits dataset over the bigcode/the-stack Hugging Face dataset

This example demonstrates how to use Xorbits to find the license with the most stars in the bigcode/the-stack dataset. The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. The dataset was created as part of the BigCode Project, an open scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs).

| Platform | Link |
| --- | --- |
| Colab | https://colab.research.google.com/github/xprobe-inc/examples/blob/main/most-stars-license/most-stars-license.ipynb |
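
The underlying computation is a group-by aggregation. The sketch below illustrates it with the standard library only. It is not the notebook's code: the field names `licenses` and `stars` are hypothetical stand-ins for the dataset's columns, the rows are made-up sample data, and "most stars" is assumed here to mean the largest summed star count per license.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def license_with_most_stars(rows: Iterable[dict]) -> Tuple[str, int]:
    """Sum star counts per license and return the (license, total) with the highest total.

    Assumption: each row has a list of licenses under "licenses" and an
    optional integer under "stars" (hypothetical field names).
    """
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        stars = row.get("stars") or 0  # treat a missing or None count as 0
        for license_name in row.get("licenses", []):
            totals[license_name] += stars
    if not totals:
        raise ValueError("no license information in the input rows")
    return max(totals.items(), key=lambda item: item[1])


# Made-up sample rows, for illustration only.
rows = [
    {"licenses": ["MIT"], "stars": 120},
    {"licenses": ["Apache-2.0"], "stars": 300},
    {"licenses": ["MIT"], "stars": 250},
    {"licenses": ["BSD-3-Clause"], "stars": None},
]
top_license, top_stars = license_with_most_stars(rows)
print(top_license, top_stars)
```

On a dataset of The Stack's size the same group-and-aggregate step has to run out of core or across machines, which is the part the notebook delegates to Xorbits.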

Example 4: Data visualization using Xorbits with Plotly and Dash

This example demonstrates how to visualize data using Xorbits together with Plotly and Dash.

You can run this example locally:

| Platform | Link |
| --- | --- |
| Local | https://github.com/xorbitsai/examples/tree/main/nba-data |