Installing Deep Lake and accessing your first Deep Lake Dataset.
Deep Lake can be installed through pip. By default, Deep Lake does not install dependencies for audio, video, google-cloud, and other features. Details on all installation options are available here.
! pip install deeplake
Let's load MNIST, the hello world dataset of machine learning.
First, instantiate a Dataset by pointing to its storage location. Datasets hosted on Activeloop Platform are typically identified by the namespace of the organization followed by the dataset name: activeloop/mnist-train.
import deeplake
dataset_path = 'hub://activeloop/mnist-train'
ds = deeplake.load(dataset_path) # Returns a Deep Lake Dataset but does not download data locally
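The path used above follows the pattern hub://<organization>/<dataset name>. As a small illustration, the helper below builds such a path from its two parts. Note that `hub_path` is a hypothetical convenience function written for this guide; it is not part of the Deep Lake API.

```python
def hub_path(org: str, dataset: str) -> str:
    """Build an Activeloop-hosted dataset path of the form hub://<org>/<dataset>.

    Hypothetical helper for illustration; Deep Lake does not ship this function.
    """
    if not org or not dataset:
        raise ValueError("both org and dataset must be non-empty")
    if "/" in org or "/" in dataset:
        raise ValueError("org and dataset must not contain '/'")
    return f"hub://{org}/{dataset}"


dataset_path = hub_path("activeloop", "mnist-train")
print(dataset_path)  # hub://activeloop/mnist-train
```

The resulting string can be passed to deeplake.load in the same way as the hard-coded path.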
Data is not immediately read into memory because Deep Lake operates lazily. You can fetch data by calling the .numpy() method, which reads data into a NumPy array.
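In general, lazy operation means that creating a handle to the data is cheap and the actual read is deferred until it is requested. The toy sketch below illustrates only that idea. `LazyValue` and `fetch` are hypothetical names, and the code does not reflect how Deep Lake is implemented internally.

```python
class LazyValue:
    """Toy stand-in for a lazily loaded sample (not a Deep Lake class)."""

    def __init__(self, loader):
        self._loader = loader  # callable that performs the expensive read
        self.loads = 0         # how many times data was actually read

    def fetch(self):
        # The read happens here, only when the caller asks for the data
        self.loads += 1
        return self._loader()


sample = LazyValue(lambda: [0] * 784)  # creating the handle reads nothing
loads_before = sample.loads            # still 0: no data has been read yet
pixels = sample.fetch()                # data is read on demand
print(loads_before, sample.loads, len(pixels))  # 0 1 784
```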
# Indexing
img = ds.images[0].numpy() # Fetch the 1st image and return a NumPy array
label = ds.labels[0].numpy(aslist=True) # Fetch the 1st label and store it as a list
text_labels = ds.labels[0].data()['text'] # Fetch the 1st label and return it as text
# Slicing
imgs = ds.images[0:100].numpy() # Fetch 100 images and return a NumPy array
# The call above raises an exception if
# the images are not all the same size
labels = ds.labels[0:100].numpy(aslist=True) # Fetch 100 labels and store
# them as a list of NumPy arrays
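The reason aslist=True matters for slices is that a single NumPy array must be rectangular. The sketch below uses plain NumPy, not Deep Lake, to show the underlying constraint. The array shapes are made up for illustration, and the exception Deep Lake raises in this situation may differ from NumPy's ValueError.

```python
import numpy as np

# Two stand-in "images" with different shapes (made-up sizes for illustration)
samples = [
    np.zeros((28, 28), dtype=np.uint8),
    np.zeros((32, 30), dtype=np.uint8),
]

try:
    batch = np.stack(samples)  # requires identical shapes, so this fails
    stacked = True
except ValueError:
    stacked = False

as_list = list(samples)  # a list keeps each array's own shape
shapes = [a.shape for a in as_list]
print(stacked, shapes)  # False [(28, 28), (32, 30)]
```

When sample sizes may vary, returning a list of arrays avoids the stacking step altogether.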
Congratulations, you've got Deep Lake working on your local machine! 🤓