Merge branch 'master' of github.com:FluxML/FastAI.jl
lorenzoh committed May 13, 2022
2 parents bfa9bf8 + 975d5f9 commit 48e27c4
Showing 46 changed files with 1,730 additions and 397 deletions.
8 changes: 5 additions & 3 deletions Project.toml
@@ -5,7 +5,6 @@ version = "0.4.3"

[deps]
Animations = "27a7e980-b3e6-11e9-2bcd-0b925532e340"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
ColorVectorSpace = "c3611d14-8923-5661-9e6a-0046d554d3a4"
@@ -14,14 +13,15 @@ DataAugmentation = "88a5189c-e7ff-4f85-ac6b-e6158070f02e"
DataDeps = "124859b0-ceae-595e-8997-d05f6a7a8dfe"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DataLoaders = "2e981812-ef13-4a9c-bfa0-ab13047b12a9"
FeatureRegistries = "c6aefb4f-3ac3-4095-8805-528476b02c02"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
FilePathsBase = "48062228-2e41-5def-b9a4-89aafe57970f"
FixedPointNumbers = "53c48c17-4a7d-5ca2-90c5-79b7896eea93"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
FluxTraining = "7bf95e4d-ca32-48da-9824-f0dc5310474f"
Glob = "c27321d9-0574-5035-807b-f59d2c89b15c"
ImageInTerminal = "d8c32880-2388-543b-8c61-d9f865259254"
ImageIO = "82e4d734-157c-48bb-816b-45c225c6df19"
ImageInTerminal = "d8c32880-2388-543b-8c61-d9f865259254"
IndirectArrays = "9b13fd28-a010-5f03-acff-a1bbcff69959"
InlineTest = "bd334432-b1e7-49c7-a2dc-dd9149e4ebd6"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
@@ -32,6 +32,7 @@ MosaicViews = "e94cdb99-869f-56ef-bcf0-1ae2bcbe0389"
Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
Requires = "ae029012-a4dd-5104-9daa-d747884805df"
Setfield = "efcf1570-3423-57d1-acb7-fd33fddbac46"
@@ -53,14 +54,15 @@ DataAugmentation = "0.2.4"
DataDeps = "0.7"
DataFrames = "1"
DataLoaders = "0.1"
FeatureRegistries = "0.1"
FileIO = "1.7"
FilePathsBase = "0.9"
FixedPointNumbers = "0.8"
Flux = "0.12, 0.13"
FluxTraining = "0.2, 0.3"
Glob = "1"
ImageInTerminal = "0.4"
ImageIO = "0.6"
ImageInTerminal = "0.4"
IndirectArrays = "0.5, 1"
InlineTest = "0.2"
JLD2 = "0.4"
2 changes: 1 addition & 1 deletion README.md
@@ -17,7 +17,7 @@ As an example, here is how to train an image classification model:

```julia
using FastAI
data, blocks = loaddataset("imagenette2-160", (Image, Label))
data, blocks = load(datarecipes()["imagenette2-160"])
task = ImageClassificationSingle(blocks)
learner = tasklearner(task, data, callbacks=[ToGPU()])
fitonecycle!(learner, 10)
63 changes: 0 additions & 63 deletions docs/api.md

This file was deleted.

8 changes: 1 addition & 7 deletions docs/background/blocksencodings.md
@@ -101,25 +101,22 @@ task = BlockTask(

Now `encode` expects a sample and just runs the encodings over that, giving us an encoded input `x` and an encoded target `y`.

{cell=main}
```julia
data = loadfolderdata(joinpath(datasetpath("dogscats"), "train"), filterfn=isimagefile, loadfn=(loadfile, parentname))
data = loadfolderdata(joinpath(load(datasets()["dogscats"]), "train"), filterfn=isimagefile, loadfn=(loadfile, parentname))
sample = getobs(data, 1)
x, y = encodesample(task, Training(), sample)
summary(x), summary(y)
```

This is equivalent to:

{cell=main}
```julia
x, y = encode(task.encodings, Training(), FastAI.getblocks(task).sample, sample)
summary(x), summary(y)
```

Image segmentation looks almost the same except we use a `Mask` block as target. We're also using `OneHot` here, because it also has an `encode` task for `Mask`s. For this task, `ProjectiveTransforms` will be applied to both the `Image` and the `Mask`, using the same random state for cropping and augmentation.

{cell=main}
```julia
task = BlockTask(
(Image{2}(), Mask{2}(1:10)),
@@ -133,19 +130,16 @@ task = BlockTask(

The easiest way to understand how encodings are applied to each block is to use [`describetask`](#) and [`describeencodings`](#) which print a table of how each encoding is applied successively to each block. Rows where a block is **bolded** indicate that the data was transformed by that encoding.

{cell=main}
```julia
describetask(task)
```

The above tables make it clear what happens during training ("encoding a sample") and inference (encoding an input and "decoding an output"). The more general form [`describeencodings`](#) takes in encodings and blocks directly and can be useful for building an understanding of how encodings apply to some blocks.

{cell=main}
```julia
FastAI.describeencodings(task.encodings, (Image{2}(),))
```

{cell=main}
```julia
FastAI.describeencodings((OneHot(),), (Label(1:10), Mask{2}(1:10), Image{2}()))
```
14 changes: 8 additions & 6 deletions docs/background/datapipelines.md
@@ -26,8 +26,8 @@ using DataLoaders: batchviewcollated
using FastAI
using FastAI.Datasets

data = loadtaskdata(datasetpath("imagenette2-320"), ImageClassification)
task = ImageClassification(Datasets.getclassesclassification("imagenette2-320"), (224, 224))
data, blocks = load(datarecipes()["imagenette2-320"])
task = ImageClassificationSingle(blocks, size=(224, 224))

# maps data processing over `data`
taskdata = taskdataset(data, task, Training())
@@ -68,7 +68,8 @@ using FastAI
using FastAI.Datasets
using FluxTraining: step!

data = loaddataset("imagenette2-320", (Image, Label))

data, blocks = load(datarecipes()["imagenette2-320"])
task = ImageClassificationSingle(blocks)
learner = tasklearner(task, data)

@@ -130,13 +131,14 @@ If the data loading is still slowing down training, you'll probably have to spee
For many computer vision tasks, you will resize and crop images to a specific size during training for GPU performance reasons. If the images themselves are large, loading them from disk can itself take some time. If your dataset consists of 1920x1080 resolution images but you're resizing them to 256x256 during training, you're wasting a lot of time loading the large images. *Presizing* means saving resized versions of each image to disk once, and then loading these smaller versions during training. We can see the performance difference using ImageNette since it comes in 3 sizes: original, 320px and 160px.

```julia
data_orig, _ = loaddataset("imagenette2", (Image, Label))

data_orig = load(datarecipes()["imagenette2"])
@time eachobsparallel(data_orig, buffered = false)

data_320px, _ = loaddataset("imagenette2-320", (Image, Label))
data_320px = load(datarecipes()["imagenette2-320"])
@time eachobsparallel(data_320px, buffered = false)

data_160px, _ = loaddataset("imagenette2-160", (Image, Label))
data_160px = load(datarecipes()["imagenette2-160"])
@time eachobsparallel(data_160px, buffered = false)
```
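
The timing comparison above relies on pre-existing resized variants of the dataset. As a rough sketch of how you might presize your own dataset (the function name and paths are illustrative, not part of FastAI.jl; `isimagefile` is exported by FastAI.jl), using Images.jl and FileIO:

```julia
using Images, FileIO

# Save a smaller copy of every image once, preserving the folder layout,
# so training can load the resized copies instead of the originals.
function presize(srcdir, dstdir; side = 256)
    for (root, _, files) in walkdir(srcdir)
        for file in filter(isimagefile, files)
            img = load(joinpath(root, file))
            # Resize so the shorter side equals `side`, keeping the aspect ratio
            ratio = side / minimum(size(img))
            small = imresize(img; ratio)
            dst = joinpath(dstdir, relpath(root, srcdir), file)
            mkpath(dirname(dst))
            save(dst, small)
        end
    end
end
```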

19 changes: 11 additions & 8 deletions docs/data_containers.md
@@ -13,8 +13,7 @@ ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
{cell=main, output=false}
```julia
using FastAI
import FastAI: Image
data, _ = loaddataset("imagenette2-160", (Image, Label))
data, _ = load(findfirst(datarecipes(datasetid="imagenette2-160")))
```

A data container is any type that holds observations of data and allows us to load them with `getobs` and query the number of observations with `nobs`. In this case, each observation is a tuple of an image and the corresponding class; after all, we want to use it for image classification.
@@ -31,15 +30,15 @@ image
nobs(data)
```
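
To make the data container contract concrete, here is a minimal sketch of a custom container. The type name and label-from-folder convention are made up for illustration; in practice the two methods would extend the generic `nobs`/`getobs` functions that FastAI.jl's data-loading dependencies define, rather than being standalone functions:

```julia
# A hypothetical container whose observations are (image path, label) pairs,
# with the label taken from the parent folder name.
struct FolderImages
    paths::Vector{String}
end

# Number of observations in the container
nobs(data::FolderImages) = length(data.paths)

# Load the i-th observation
getobs(data::FolderImages, i::Int) = (data.paths[i], basename(dirname(data.paths[i])))
```

Any type implementing these two methods should work with the rest of the data pipeline.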

[`loaddataset`](#) makes it easy to a load a data container that is compatible with some block types, but to get a better feel for what it does, let's look under the hood by creating the same data container using some mid-level APIs.
`load(`[`datarecipes`](#)`()[id])` makes it easy to load a data container that is compatible with some block types, but to get a better feel for what it does, let's look under the hood by creating the same data container using some mid-level APIs.

## Creating data containers from files

Before we recreate the data container, [`datasetpath`](#) downloads a dataset and returns the path to the extracted files.
Before we recreate the data container, we'll download the dataset and get the path where the files are saved to:

{cell=main}
```julia
dir = datasetpath("imagenette2-160")
dir = load(datasets()["imagenette2-160"])
```

Now we'll start with [`FileDataset`](#) which creates a data container (here a `Vector`) of files given a path. We'll use the path of the downloaded dataset:
@@ -127,12 +126,16 @@ Using this official split, it will be easier to compare the performance of your

## Dataset recipes

We saw above how different image classification datasets can be loaded with the same logic as long as they are in a common format. To encapsulate the logic for loading common dataset formats, FastAI.jl has `DatasetRecipe`s. When we used [`finddatasets`](#) in the [discovery tutorial](discovery.md), it returned pairs of a dataset name and a `DatasetRecipe`. For example, `"imagenette2-160"` has an associated [`ImageFolders`](#) recipe and we can load it using [`loadrecipe`] and the path to the downloaded dataset:
We saw above how different image classification datasets can be loaded with the same logic as long as they are in a common format. To encapsulate the logic for loading common dataset formats, FastAI.jl has [`DatasetRecipe`](#)s. When we used [`datarecipes`](#) in the [discovery tutorial](discovery.md), it showed us such recipes that allow loading a dataset for a specific task. For example, `"imagenette2-160"` has an associated [`ImageFolders`](#) recipe which we can load by getting the entry and calling `load` on it:

{cell=main}
```julia
name, recipe = finddatasets(blocks=(Image, Label), name="imagenette2-160")[1]
data, blocks = loadrecipe(recipe, datasetpath(name))
entry = datarecipes()["imagenette2-160"]
```

{cell=main}
```julia
data, blocks = load(entry)
```

These recipes also take care of loading the data block information for the dataset. Read the [discovery tutorial](discovery.md) to find out more about that.
20 changes: 10 additions & 10 deletions docs/discovery.md
@@ -6,16 +6,16 @@ For finding both, we can make use of `Block`s. A `Block` represents a kind of da

## Finding a dataset

To find a dataset with compatible samples, we can pass the types of these blocks to [`finddatasets`](#) which will return a list of dataset names and recipes to load them in a suitable way.
To find a dataset with compatible samples, we can pass the types of these blocks as a filter to [`datarecipes`](#), which will show us only dataset recipes for loading those blocks.

{cell=main}
```julia
using FastAI
import FastAI: Image
finddatasets(blocks=(Image, Mask))
datarecipes(blocks=(Image, Mask))
```

We can see that the `"camvid_tiny"` dataset can be loaded so that each sample is a pair of an image and a segmentation mask. Let's use [`loaddataset`](#) to load a [data container](data_containers.md) and concrete blocks.
We can see that the `"camvid_tiny"` dataset can be loaded so that each sample is a pair of an image and a segmentation mask. Let's use a data recipe to load a [data container](data_containers.md) and concrete blocks.

{cell=main, result=false, output=false style="display:none;"}
```julia
@@ -24,7 +24,7 @@ ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

{cell=main, output=false}
```julia
data, blocks = loaddataset("camvid_tiny", (Image, Mask))
data, blocks = load(findfirst(datarecipes(id="camvid_tiny", blocks=(Image, Mask))))
```

As with every data container, we can load a sample using `getobs` which gives us a tuple of an image and a segmentation mask.
@@ -35,7 +35,7 @@ image, mask = sample = getobs(data, 1)
size.(sample), eltype.(sample)
```

`loaddataset` also returned `blocks` which are the concrete `Block` instances for the dataset. We passed in _types_ of blocks (`(Image, Mask)`) and get back _instances_ since the specifics of some blocks depend on the dataset. For example, the returned target block carries the labels for every class that a pixel can belong to.
Loading the dataset recipe also returned `blocks`, which are the concrete [`Block`](#) instances for the dataset. We passed in _types_ of blocks (`(Image, Mask)`) and got back _instances_, since the specifics of some blocks depend on the dataset. For example, the returned target block carries the labels for every class that a pixel can belong to.

{cell=main}
```julia
@@ -55,8 +55,8 @@ checkblock((inputblock, targetblock), (image, mask))
In short, if you have a learning task in mind and want to load a dataset for that task, then

1. define the types of input and target block, e.g. `blocktypes = (Image, Label)`,
2. use [`finddatasets`](#)`(blocks=blocktypes)` to find compatbile datasets; and
3. run [`loaddataset`](#)`(datasetname, blocktypes)` to load a data container and the concrete blocks
2. use [`datarecipes`](#)`(blocks=blocktypes)` to find compatible dataset recipes; and
3. run `load(`[`datarecipes`](#)`()[id])` to load a data container and the concrete blocks
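
Put together, the steps above can be sketched as follows (the block types are just an example; the `findfirst` usage follows the pattern used elsewhere in these docs):

```julia
using FastAI

# 1. Block types describing input and target
blocktypes = (Image, Label)

# 2. Find dataset recipes compatible with those block types
recipes = datarecipes(blocks = blocktypes)

# 3. Load the first match: a data container and concrete block instances
data, blocks = load(findfirst(recipes))
```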

### Exercises

@@ -66,14 +66,14 @@ In short, if you have a learning task in mind and want to load a dataset for tha

## Finding a learning task

Armed with a dataset, we can go to the next step: creating a learning task. Since we already have blocks defined, this amounts to defining the encodings that are applied to the data before it is used in training. Here, FastAI.jl already defines some convenient constructors for learning tasks and you can find them with [`findlearningtasks`](#). Here we can pass in either block types as above or the block instances we got from `loaddataset`.
Armed with a dataset, we can go to the next step: creating a learning task. Since we already have blocks defined, this amounts to defining the encodings that are applied to the data before it is used in training. Here, FastAI.jl already defines some convenient constructors for learning tasks and you can find them with [`learningtasks`](#). Here we can pass in either block types as above or the block instances:

{cell=main}
```julia
findlearningtasks(blocks)
learningtasks(blocks=blocks)
```

Looks like we can use the [`ImageSegmentation`](#) function to create a learning task for our learning task. Every function returned can be called with `blocks` and, optionally, some keyword arguments for customization.
Looks like we can use the [`ImageSegmentation`](#) function to create a learning task. Every function returned can be called with `blocks` and, optionally, some keyword arguments for customization.

{cell=main}
```julia
4 changes: 2 additions & 2 deletions docs/fastai_api_comparison.md
@@ -6,7 +6,7 @@ FastAI.jl is in many ways similar to the original Python [fastai](docs.fast.ai),

FastAI.jl's own data block API makes it possible to derive every part of a high-level interface with a unified API across tasks: it suffices to create a learning task, and based on the blocks and encodings specified, the proper model builder, loss function, and visualizations are derived (see below). For a high-level API, a complete `Learner` can be constructed using [`tasklearner`](#) without much boilerplate. There are some helper functions for creating these learning tasks, for example [`ImageClassificationSingle`](#) and [`ImageSegmentation`](#).
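
For instance, a complete training setup might be sketched like this (the dataset ID, callback choice and epoch count are illustrative):

```julia
using FastAI

# Load a data container and concrete blocks for image classification
data, blocks = load(datarecipes()["imagenette2-160"])
task = ImageClassificationSingle(blocks)

# `tasklearner` derives the model, loss function and data iterators
# from the task's blocks; callbacks are optional extras.
learner = tasklearner(task, data; callbacks = [ToGPU()])
fitonecycle!(learner, 5)
```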

FastAI.jl additionally has a unified API for registering and discovering functionality across applications also based on the data block abstraction. `finddatasets` and `loaddataset` let you quickly load common datasets matching some data modality and `findlearningtask` lets you find learning task helpers for common tasks. See [the discovery tutorial](discovery.md) for more info.
FastAI.jl additionally has a unified API, also based on the data block abstraction, for registering and discovering functionality across applications. [`datasets`](#) and [`datarecipes`](#) let you quickly load common datasets matching some data modality, and [`learningtasks`](#) lets you find learning task helpers for common tasks. See [the discovery tutorial](discovery.md) for more info.

### Vision

@@ -122,7 +122,7 @@ Metrics are handled by the [`Metrics`](#) callback which takes in reducing metri

### fastai.data.external

FastAI.jl makes all the same datasets available in `fastai.data.external` available. See `FastAI.Datasets.DATASETS` for a list of all datasets and use [`datasetpath`](#)`(name)` to download and extract a dataset.
FastAI.jl makes available all the same datasets as `fastai.data.external`. See [`datasets`](#) for a list of all datasets that can be downloaded.

### funcs_kwargs and DataLoader, fastai.data.core

2 changes: 1 addition & 1 deletion docs/howto/augmentvision.md
@@ -15,7 +15,7 @@ using FastAI
import FastAI: Image
import CairoMakie; CairoMakie.activate!(type="png")

data, blocks = loaddataset("imagenette2-160", (Image, Label))
data, blocks = load(datarecipes()["imagenette2-160"])
task = BlockTask(
blocks,
(
46 changes: 46 additions & 0 deletions docs/howto/findfunctionality.md
@@ -0,0 +1,46 @@
# How to find functionality

For some kinds of functionality, FastAI.jl provides feature registries that allow you to search for and use features. The following registries currently exist:

- [`datasets`](#) to download and unpack datasets,
- [`datarecipes`](#) to load datasets into [data containers](/documents/docs/data_containers.md) that are compatible with a learning task; and
- [`learningtasks`](#) to find learning tasks that are compatible with a dataset

To load functionality:

1. Get an entry using its ID
{cell}
```julia
using FastAI
entry = datasets()["mnist_var_size_tiny"]
```
2. And load it
{cell}
```julia
load(entry)
```


## Datasets

{cell}
```julia
using FastAI
datasets()
```

## Data recipes

{cell}
```julia
using FastAI
datarecipes()
```

## Learning tasks

{cell}
```julia
using FastAI
learningtasks()
```

2 comments on commit 48e27c4

@lorenzoh

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/60177

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

```
git tag -a v0.4.3 -m "<description of version>" 48e27c4c226ec8ffe9d670911e1127ccf0ab9f16
git push origin v0.4.3
```
