Merge branch 'master' of github.com:FluxML/FastAI.jl
lorenzoh committed May 13, 2022
2 parents bfa9bf8 + 975d5f9 commit 48e27c4
Showing 46 changed files with 1,730 additions and 397 deletions.
8 changes: 5 additions & 3 deletions Project.toml
@@ -5,7 +5,6 @@ version = "0.4.3"

[deps]
Animations = "27a7e980-b3e6-11e9-2bcd-0b925532e340"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
ColorVectorSpace = "c3611d14-8923-5661-9e6a-0046d554d3a4"
@@ -14,14 +13,15 @@ DataAugmentation = "88a5189c-e7ff-4f85-ac6b-e6158070f02e"
DataDeps = "124859b0-ceae-595e-8997-d05f6a7a8dfe"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DataLoaders = "2e981812-ef13-4a9c-bfa0-ab13047b12a9"
FeatureRegistries = "c6aefb4f-3ac3-4095-8805-528476b02c02"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
FilePathsBase = "48062228-2e41-5def-b9a4-89aafe57970f"
FixedPointNumbers = "53c48c17-4a7d-5ca2-90c5-79b7896eea93"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
FluxTraining = "7bf95e4d-ca32-48da-9824-f0dc5310474f"
Glob = "c27321d9-0574-5035-807b-f59d2c89b15c"
ImageInTerminal = "d8c32880-2388-543b-8c61-d9f865259254"
ImageIO = "82e4d734-157c-48bb-816b-45c225c6df19"
ImageInTerminal = "d8c32880-2388-543b-8c61-d9f865259254"
IndirectArrays = "9b13fd28-a010-5f03-acff-a1bbcff69959"
InlineTest = "bd334432-b1e7-49c7-a2dc-dd9149e4ebd6"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
@@ -32,6 +32,7 @@ MosaicViews = "e94cdb99-869f-56ef-bcf0-1ae2bcbe0389"
Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
Requires = "ae029012-a4dd-5104-9daa-d747884805df"
Setfield = "efcf1570-3423-57d1-acb7-fd33fddbac46"
@@ -53,14 +54,15 @@ DataAugmentation = "0.2.4"
DataDeps = "0.7"
DataFrames = "1"
DataLoaders = "0.1"
FeatureRegistries = "0.1"
FileIO = "1.7"
FilePathsBase = "0.9"
FixedPointNumbers = "0.8"
Flux = "0.12, 0.13"
FluxTraining = "0.2, 0.3"
Glob = "1"
ImageInTerminal = "0.4"
ImageIO = "0.6"
ImageInTerminal = "0.4"
IndirectArrays = "0.5, 1"
InlineTest = "0.2"
JLD2 = "0.4"
2 changes: 1 addition & 1 deletion README.md
@@ -17,7 +17,7 @@ As an example, here is how to train an image classification model:

```julia
using FastAI
data, blocks = loaddataset("imagenette2-160", (Image, Label))
data, blocks = load(datarecipes()["imagenette2-160"])
task = ImageClassificationSingle(blocks)
learner = tasklearner(task, data, callbacks=[ToGPU()])
fitonecycle!(learner, 10)
63 changes: 0 additions & 63 deletions docs/api.md

This file was deleted.

8 changes: 1 addition & 7 deletions docs/background/blocksencodings.md
@@ -101,25 +101,22 @@ task = BlockTask(

Now `encode` expects a sample and just runs the encodings over that, giving us an encoded input `x` and an encoded target `y`.

{cell=main}
```julia
data = loadfolderdata(joinpath(datasetpath("dogscats"), "train"), filterfn=isimagefile, loadfn=(loadfile, parentname))
data = loadfolderdata(joinpath(load(datasets()["dogscats"]), "train"), filterfn=isimagefile, loadfn=(loadfile, parentname))
sample = getobs(data, 1)
x, y = encodesample(task, Training(), sample)
summary(x), summary(y)
```

This is equivalent to:

{cell=main}
```julia
x, y = encode(task.encodings, Training(), FastAI.getblocks(task).sample, sample)
summary(x), summary(y)
```

Image segmentation looks almost the same except we use a `Mask` block as target. We're also using `OneHot` here, because it also has an `encode` task for `Mask`s. For this task, `ProjectiveTransforms` will be applied to both the `Image` and the `Mask`, using the same random state for cropping and augmentation.

{cell=main}
```julia
task = BlockTask(
(Image{2}(), Mask{2}(1:10)),
@@ -133,19 +130,16 @@ task = BlockTask(

The easiest way to understand how encodings are applied to each block is to use [`describetask`](#) and [`describeencodings`](#) which print a table of how each encoding is applied successively to each block. Rows where a block is **bolded** indicate that the data was transformed by that encoding.

{cell=main}
```julia
describetask(task)
```

The above tables make it clear what happens during training ("encoding a sample") and inference (encoding an input and "decoding an output"). The more general form [`describeencodings`](#) takes in encodings and blocks directly and can be useful for building an understanding of how encodings apply to some blocks.

{cell=main}
```julia
FastAI.describeencodings(task.encodings, (Image{2}(),))
```

{cell=main}
```julia
FastAI.describeencodings((OneHot(),), (Label(1:10), Mask{2}(1:10), Image{2}()))
```
14 changes: 8 additions & 6 deletions docs/background/datapipelines.md
@@ -26,8 +26,8 @@ using DataLoaders: batchviewcollated
using FastAI
using FastAI.Datasets

data = loadtaskdata(datasetpath("imagenette2-320"), ImageClassification)
task = ImageClassification(Datasets.getclassesclassification("imagenette2-320"), (224, 224))
data, blocks = load(datarecipes()["imagenette2-320"])
task = ImageClassificationSingle(blocks, size=(224, 224))

# maps data processing over `data`
taskdata = taskdataset(data, task, Training())
@@ -68,7 +68,8 @@ using FastAI
using FastAI.Datasets
using FluxTraining: step!

data = loaddataset("imagenette2-320", (Image, Label))

data, blocks = load(datarecipes()["imagenette2-320"])
task = ImageClassificationSingle(blocks)
learner = tasklearner(task, data)

@@ -130,13 +131,14 @@ If the data loading is still slowing down training, you'll probably have to spee
For many computer vision tasks, you will resize and crop images to a specific size during training for GPU performance reasons. If the images themselves are large, loading them from disk can itself take some time. If your dataset consists of 1920x1080 resolution images but you're resizing them to 256x256 during training, you're wasting a lot of time loading the large images. *Presizing* means saving resized versions of each image to disk once, and then loading these smaller versions during training. We can see the performance difference using ImageNette since it comes in 3 sizes: original, 320px and 160px.

```julia
data_orig, _ = loaddataset("imagenette2", (Image, Label))

data_orig = load(datarecipes()["imagenette2"])
@time eachobsparallel(data_orig, buffered = false)

data_320px, _ = loaddataset("imagenette2-320", (Image, Label))
data_320px = load(datarecipes()["imagenette2-320"])
@time eachobsparallel(data_320px, buffered = false)

data_160px, _ = loaddataset("imagenette2-160", (Image, Label))
data_160px = load(datarecipes()["imagenette2-160"])
@time eachobsparallel(data_160px, buffered = false)
```
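
The timing comparison above relies on pre-existing resized variants of the dataset. As a rough sketch of how you might presize your own dataset (the function name and paths are illustrative, not part of FastAI.jl; `isimagefile` is exported by FastAI.jl), using Images.jl and FileIO:

```julia
using Images, FileIO

# Save a smaller copy of every image once, preserving the folder layout,
# so training can load the resized copies instead of the originals.
function presize(srcdir, dstdir; side = 256)
    for (root, _, files) in walkdir(srcdir)
        for file in filter(isimagefile, files)
            img = load(joinpath(root, file))
            # Resize so the shorter side equals `side`, keeping the aspect ratio
            ratio = side / minimum(size(img))
            small = imresize(img; ratio)
            dst = joinpath(dstdir, relpath(root, srcdir), file)
            mkpath(dirname(dst))
            save(dst, small)
        end
    end
end
```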

19 changes: 11 additions & 8 deletions docs/data_containers.md
@@ -13,8 +13,7 @@ ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
{cell=main, output=false}
```julia
using FastAI
import FastAI: Image
data, _ = loaddataset("imagenette2-160", (Image, Label))
data, _ = load(findfirst(datarecipes(datasetid="imagenette2-160")))
```

A data container is any type that holds observations of data and allows us to load them with `getobs` and query the number of observations with `nobs`. In this case, each observation is a tuple of an image and the corresponding class; after all, we want to use it for image classification.
@@ -31,15 +30,15 @@ image
nobs(data)
```
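
To make the data container contract concrete, here is a minimal sketch of a custom container. The type name and label-from-folder convention are made up for illustration; in practice the two methods would extend the generic `nobs`/`getobs` functions that FastAI.jl's data-loading dependencies define, rather than being standalone functions:

```julia
# A hypothetical container whose observations are (image path, label) pairs,
# with the label taken from the parent folder name.
struct FolderImages
    paths::Vector{String}
end

# Number of observations in the container
nobs(data::FolderImages) = length(data.paths)

# Load the i-th observation
getobs(data::FolderImages, i::Int) = (data.paths[i], basename(dirname(data.paths[i])))
```

Any type implementing these two methods should work with the rest of the data pipeline.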

[`loaddataset`](#) makes it easy to a load a data container that is compatible with some block types, but to get a better feel for what it does, let's look under the hood by creating the same data container using some mid-level APIs.
`load(`[`datarecipes`](#)`()[id])` makes it easy to load a data container that is compatible with some block types, but to get a better feel for what it does, let's look under the hood by creating the same data container using some mid-level APIs.

## Creating data containers from files

Before we recreate the data container, [`datasetpath`](#) downloads a dataset and returns the path to the extracted files.
Before we recreate the data container, we'll download the dataset and get the path where the files are saved to:

{cell=main}
```julia
dir = datasetpath("imagenette2-160")
dir = load(datasets()["imagenette2-160"])
```

Now we'll start with [`FileDataset`](#) which creates a data container (here a `Vector`) of files given a path. We'll use the path of the downloaded dataset:
@@ -127,12 +126,16 @@ Using this official split, it will be easier to compare the performance of your

## Dataset recipes

We saw above how different image classification datasets can be loaded with the same logic as long as they are in a common format. To encapsulate the logic for loading common dataset formats, FastAI.jl has `DatasetRecipe`s. When we used [`finddatasets`](#) in the [discovery tutorial](discovery.md), it returned pairs of a dataset name and a `DatasetRecipe`. For example, `"imagenette2-160"` has an associated [`ImageFolders`](#) recipe and we can load it using [`loadrecipe`] and the path to the downloaded dataset:
We saw above how different image classification datasets can be loaded with the same logic as long as they are in a common format. To encapsulate the logic for loading common dataset formats, FastAI.jl has [`DatasetRecipe`](#)s. When we used [`datarecipes`](#) in the [discovery tutorial](discovery.md), it showed us such recipes that allow loading a dataset for a specific task. For example, `"imagenette2-160"` has an associated [`ImageFolders`](#) recipe which we can load by getting the entry and calling `load` on it:

{cell=main}
```julia
name, recipe = finddatasets(blocks=(Image, Label), name="imagenette2-160")[1]
data, blocks = loadrecipe(recipe, datasetpath(name))
entry = datarecipes()["imagenette2-160"]
```

{cell=main}
```julia
data, blocks = load(entry)
```

These recipes also take care of loading the data block information for the dataset. Read the [discovery tutorial](discovery.md) to find out more about that.
20 changes: 10 additions & 10 deletions docs/discovery.md
@@ -6,16 +6,16 @@ For finding both, we can make use of `Block`s. A `Block` represents a kind of da

## Finding a dataset

To find a dataset with compatible samples, we can pass the types of these blocks to [`finddatasets`](#) which will return a list of dataset names and recipes to load them in a suitable way.
To find a dataset with compatible samples, we can pass the types of these blocks as a filter to [`datarecipes`](#), which will show us only dataset recipes for loading those blocks.

{cell=main}
```julia
using FastAI
import FastAI: Image
finddatasets(blocks=(Image, Mask))
datarecipes(blocks=(Image, Mask))
```

We can see that the `"camvid_tiny"` dataset can be loaded so that each sample is a pair of an image and a segmentation mask. Let's use [`loaddataset`](#) to load a [data container](data_containers.md) and concrete blocks.
We can see that the `"camvid_tiny"` dataset can be loaded so that each sample is a pair of an image and a segmentation mask. Let's use a data recipe to load a [data container](data_containers.md) and concrete blocks.

{cell=main, result=false, output=false style="display:none;"}
```julia
@@ -24,7 +24,7 @@ ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

{cell=main, output=false}
```julia
data, blocks = loaddataset("camvid_tiny", (Image, Mask))
data, blocks = load(findfirst(datarecipes(id="camvid_tiny", blocks=(Image, Mask))))
```

As with every data container, we can load a sample using `getobs` which gives us a tuple of an image and a segmentation mask.
@@ -35,7 +35,7 @@ image, mask = sample = getobs(data, 1)
size.(sample), eltype.(sample)
```

`loaddataset` also returned `blocks` which are the concrete `Block` instances for the dataset. We passed in _types_ of blocks (`(Image, Mask)`) and get back _instances_ since the specifics of some blocks depend on the dataset. For example, the returned target block carries the labels for every class that a pixel can belong to.
Loading the dataset recipe also returned `blocks`, which are the concrete [`Block`](#) instances for the dataset. We passed in _types_ of blocks (`(Image, Mask)`) and got back _instances_, since the specifics of some blocks depend on the dataset. For example, the returned target block carries the labels for every class that a pixel can belong to.

{cell=main}
```julia
@@ -55,8 +55,8 @@ checkblock((inputblock, targetblock), (image, mask))
In short, if you have a learning task in mind and want to load a dataset for that task, then

1. define the types of input and target block, e.g. `blocktypes = (Image, Label)`,
2. use [`finddatasets`](#)`(blocks=blocktypes)` to find compatbile datasets; and
3. run [`loaddataset`](#)`(datasetname, blocktypes)` to load a data container and the concrete blocks
2. use [`datarecipes`](#)`(blocks=blocktypes)` to find compatible dataset recipes; and
3. run `load(`[`datarecipes`](#)`()[id])` to load a data container and the concrete blocks
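
Put together, the steps above can be sketched as follows (the block types are just an example; the `findfirst` usage follows the pattern used elsewhere in these docs):

```julia
using FastAI

# 1. Block types describing input and target
blocktypes = (Image, Label)

# 2. Find dataset recipes compatible with those block types
recipes = datarecipes(blocks = blocktypes)

# 3. Load the first match: a data container and concrete block instances
data, blocks = load(findfirst(recipes))
```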

### Exercises

@@ -66,14 +66,14 @@ In short, if you have a learning task in mind and want to load a dataset for tha

## Finding a learning task

Armed with a dataset, we can go to the next step: creating a learning task. Since we already have blocks defined, this amounts to defining the encodings that are applied to the data before it is used in training. Here, FastAI.jl already defines some convenient constructors for learning tasks and you can find them with [`findlearningtasks`](#). Here we can pass in either block types as above or the block instances we got from `loaddataset`.
Armed with a dataset, we can go to the next step: creating a learning task. Since we already have blocks defined, this amounts to defining the encodings that are applied to the data before it is used in training. Here, FastAI.jl already defines some convenient constructors for learning tasks and you can find them with [`learningtasks`](#). Here we can pass in either block types as above or the block instances:

{cell=main}
```julia
findlearningtasks(blocks)
learningtasks(blocks=blocks)
```

Looks like we can use the [`ImageSegmentation`](#) function to create a learning task for our learning task. Every function returned can be called with `blocks` and, optionally, some keyword arguments for customization.
Looks like we can use the [`ImageSegmentation`](#) function to create a learning task. Every function returned can be called with `blocks` and, optionally, some keyword arguments for customization.

{cell=main}
```julia
4 changes: 2 additions & 2 deletions docs/fastai_api_comparison.md
@@ -6,7 +6,7 @@ FastAI.jl is in many ways similar to the original Python [fastai](docs.fast.ai),

FastAI.jl's own data block API makes it possible to derive every part of a high-level interface with a unified API across tasks: it suffices to create a learning task, and based on the blocks and encodings specified, the proper model builder, loss function, and visualizations are derived (see below). For a high-level API, a complete `Learner` can be constructed using [`tasklearner`](#) without much boilerplate. There are some helper functions for creating these learning tasks, for example [`ImageClassificationSingle`](#) and [`ImageSegmentation`](#).
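
For instance, a complete training setup might be sketched like this (the dataset ID, callback choice and epoch count are illustrative):

```julia
using FastAI

# Load a data container and concrete blocks for image classification
data, blocks = load(datarecipes()["imagenette2-160"])
task = ImageClassificationSingle(blocks)

# `tasklearner` derives the model, loss function and data iterators
# from the task's blocks; callbacks are optional extras.
learner = tasklearner(task, data; callbacks = [ToGPU()])
fitonecycle!(learner, 5)
```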

FastAI.jl additionally has a unified API for registering and discovering functionality across applications also based on the data block abstraction. `finddatasets` and `loaddataset` let you quickly load common datasets matching some data modality and `findlearningtask` lets you find learning task helpers for common tasks. See [the discovery tutorial](discovery.md) for more info.
FastAI.jl additionally has a unified API, also based on the data block abstraction, for registering and discovering functionality across applications. [`datasets`](#) and [`datarecipes`](#) let you quickly load common datasets matching some data modality, and [`learningtasks`](#) lets you find learning task helpers for common tasks. See [the discovery tutorial](discovery.md) for more info.

### Vision

@@ -122,7 +122,7 @@ Metrics are handled by the [`Metrics`](#) callback which takes in reducing metri

### fastai.data.external

FastAI.jl makes all the same datasets available in `fastai.data.external` available. See `FastAI.Datasets.DATASETS` for a list of all datasets and use [`datasetpath`](#)`(name)` to download and extract a dataset.
FastAI.jl makes available all the same datasets as `fastai.data.external`. See [`datasets`](#) for a list of all datasets that can be downloaded.

### funcs_kwargs and DataLoader, fastai.data.core

2 changes: 1 addition & 1 deletion docs/howto/augmentvision.md
@@ -15,7 +15,7 @@ using FastAI
import FastAI: Image
import CairoMakie; CairoMakie.activate!(type="png")

data, blocks = loaddataset("imagenette2-160", (Image, Label))
data, blocks = load(datarecipes()["imagenette2-160"])
task = BlockTask(
blocks,
(
46 changes: 46 additions & 0 deletions docs/howto/findfunctionality.md
@@ -0,0 +1,46 @@
# How to find functionality

For some kinds of functionality, FastAI.jl provides feature registries that allow you to search for and use features. The following registries currently exist:

- [`datasets`](#) to download and unpack datasets,
- [`datarecipes`](#) to load datasets into [data containers](/documents/docs/data_containers.md) that are compatible with a learning task; and
- [`learningtasks`](#) to find learning tasks that are compatible with a dataset

To load functionality:

1. Get an entry using its ID
{cell}
```julia
using FastAI
entry = datasets()["mnist_var_size_tiny"]
```
2. And load it
{cell}
```julia
load(entry)
```


## Datasets

{cell}
```julia
using FastAI
datasets()
```

## Data recipes

{cell}
```julia
using FastAI
datarecipes()
```

## Learning tasks

{cell}
```julia
using FastAI
learningtasks()
```

2 comments on commit 48e27c4

@lorenzoh

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/60177

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

```
git tag -a v0.4.3 -m "<description of version>" 48e27c4c226ec8ffe9d670911e1127ccf0ab9f16
git push origin v0.4.3
```
