Add ImageNet #146

adrhill · 2022-06-23T13:39:04Z

Draft PR to add the ImageNet 2012 Classification Dataset (ILSVRC 2012-2017) as a ManualDataDep.
Closes #100.

Since ImageNet is very large (>150 GB) and requires signing up and accepting the terms of access, it can only be added manually. The ManualDataDep instruction message for ImageNet includes the following:

- instructions on creating symlinks to existing ImageNet datasets (e.g. for use on shared compute clusters)
- instructions on downloading and unpacking ImageNet from scratch, based on the PyTorch guide, as linked by @CarloLucibello

When unpacked "PyTorch-style", the ImageNet dataset is assumed to look as follows: ImageNet -> split-folder -> WordNet ID folder -> class samples as jpg-files, e.g.:

ImageNet
├── train
├── val
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   └── ...
│   ├── n01443537
│   └── ...
├── test
└── devkit
    ├── data
    │   ├── meta.mat
    │   └── ...
    └── ...
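A minimal sketch of how this layout can be traversed, using only Base Julia. The helper names `parse_imagenet_path` and `list_images` are hypothetical and not part of this PR; they only illustrate that the split and the WordNet ID can be read off the folder names.

```julia
# Hypothetical helper: split a path such as
# "ImageNet/val/n01440764/ILSVRC2012_val_00000293.JPEG" into its parts.
function parse_imagenet_path(path::AbstractString)
    wnid_dir = dirname(path)
    return (split = basename(dirname(wnid_dir)),  # e.g. "val"
            wnid = basename(wnid_dir),            # e.g. "n01440764"
            file = basename(path))
end

# Hypothetical helper: collect all JPEG files below `root/subset`,
# sorted so that the resulting order is reproducible.
function list_images(root::AbstractString, subset::AbstractString)
    paths = String[]
    for (dir, _, files) in walkdir(joinpath(root, subset))
        for f in files
            endswith(f, ".JPEG") && push!(paths, joinpath(dir, f))
        end
    end
    return sort!(paths)
end
```

Only the file paths are collected here; no image is decoded.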

Current limitations

Since ImageNet is too large to precompute all preprocessed images and keep them in memory, the dataset precomputes a list of all file paths instead. ~~Calling Base.getindex(d::ImageNet, i) loads the image via ImageMagick.jl and preprocesses it when required. This adds dependencies on ImageMagick and Images.jl via LazyModules.~~

This also means that the ImageNet struct currently doesn't contain features (which might be a requirement for SupervisedDatasets?)

adapted from MetalHead 0.5.3's utils.jl

codecov-commenter · 2022-06-23T15:01:16Z

Codecov Report

Merging #146 (09d5be4) into master (86dabc4) will decrease coverage by 1.33%.
The diff coverage is 6.75%.

@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   48.56%   47.23%   -1.33%     
==========================================
  Files          44       47       +3     
  Lines        2261     2335      +74     
==========================================
+ Hits         1098     1103       +5     
- Misses       1163     1232      +69

Impacted Files	Coverage Δ
src/datasets/vision/imagenet_reader/preprocess.jl	`0.00% <0.00%> (ø)`
.../datasets/vision/imagenet_reader/ImageNetReader.jl	`5.00% <5.00%> (ø)`
src/datasets/vision/imagenet.jl	`7.31% <7.31%> (ø)`
src/MLDatasets.jl	`100.00% <100.00%> (ø)`

lorenzoh · 2022-06-23T15:28:54Z

Will be good to have ImageNet support!

I'm wondering if there may be a simpler implementation for this, though. It seems the dataset has the same format as the (derived) ImageNette and ImageWoof datasets. The way those are loaded in FastAI.jl combines the MLUtils.jl primitives, and those could be used to load ImageNet as follows:

using MLDatasets, MLUtils, FileIO

function ImageNet(dir)
    files = FileDataset(identity, dir, "*.JPEG").paths
    return mapobs((FileIO.load, loadlabel), files)
end
# get the class ID (WordNet ID) from the file path, i.e. the name of the parent folder.
# could add a lookup here to convert the ID to the human-readable name
loadlabel(file::String) = basename(dirname(file))


data  = ImageNet(IMAGENET_DIR)

# only training set
data  = ImageNet(joinpath(IMAGENET_DIR, "train"))

I'd also suggest using FileIO.jl for loading images, which will use the faster JpegTurbo.jl to load the images.

If more control over the image loading is desired, like converting to a color type upon reading or loading an image at a smaller size (much faster if it'll be downsized during training anyway), one could also use JpegTurbo.jl directly:

using JpegTurbo, ImageCore  # ImageCore provides RGB, Gray and N0f8

function ImageNet(dir; C = RGB{N0f8}, preferred_size = nothing)
    files = FileDataset(identity, dir, "*.JPEG").paths
    return mapobs((f -> JpegTurbo.jpeg_decode(C, f; preferred_size), loadlabel), files)
end

# load as grayscale and smaller image size
data = ImageNet(IMAGENET_DIR; C = Gray{N0f8}, preferred_size = (224, 224))

adrhill · 2022-06-23T18:11:30Z

Thanks a lot, loading smaller images with JpegTurbo is indeed much faster!
I've also added a lookup-table wnid_to_label to the metadata. Once you know the label, you can access class names and descriptions by indexing the corresponding metadata entries, e.g. metadata["class_names"][label].
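A minimal sketch of such a lookup in Base Julia. The two WordNet IDs and their class names are real ImageNet classes, but the way the table is built (sorting the IDs and numbering them from 1) is an assumption for illustration, not necessarily how the PR constructs its metadata.

```julia
# WordNet IDs as discovered from the folder names, in arbitrary order.
wnids = ["n01443537", "n01440764"]
class_names = Dict("n01440764" => "tench", "n01443537" => "goldfish")

# Assumption: labels are assigned by sorting the WordNet IDs.
sorted_wnids = sort(wnids)
wnid_to_label = Dict(w => i for (i, w) in enumerate(sorted_wnids))

metadata = Dict(
    "wnid_to_label" => wnid_to_label,
    "class_names" => [class_names[w] for w in sorted_wnids],
)

# From folder name to label, then from label to class name.
label = metadata["wnid_to_label"]["n01443537"]
name = metadata["class_names"][label]
```

With this ordering, `label` is 2 and `name` is "goldfish".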

adrhill · 2022-06-23T18:28:42Z

JpegTurbo's preferred_size keyword already returns images pretty close to the desired 224x224 size. At the cost of losing a couple of pixels, we could skip the second resizing in resize_smallest_dimension, which allocates, and instead directly apply center_crop, which is just a view.
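To illustrate why a center crop can be allocation-free, here is a minimal sketch in Base Julia. The function `center_crop_view` below is a hypothetical stand-in written for this example, not the PR's implementation, and it assumes the image is at least as large as the crop.

```julia
# Hypothetical helper: return a centered `h × w` window of `img` as a view,
# so that no pixels are copied.
function center_crop_view(img::AbstractMatrix, h::Integer, w::Integer)
    H, W = size(img)
    (H >= h && W >= w) ||
        throw(ArgumentError("image $(size(img)) is smaller than crop ($h, $w)"))
    top = (H - h) ÷ 2 + 1
    left = (W - w) ÷ 2 + 1
    return @view img[top:(top + h - 1), left:(left + w - 1)]
end

img = collect(reshape(1:30, 5, 6))
crop = center_crop_view(img, 3, 4)  # shares memory with `img`
```

Because `crop` is a `SubArray`, writing to it also changes `img`; call `copy(crop)` if an independent array is needed.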

adrhill · 2022-06-23T22:28:57Z

I've done some local benchmarks:
Current commit cac14d2 with JpegTurbo loading smaller images:

julia> using MLDatasets

julia> dataset = ImageNet(Float32, :val);

julia> @benchmark dataset[1:16]
BenchmarkTools.Trial: 44 samples with 1 evaluation.
 Range (min … max):  104.413 ms … 143.052 ms  ┊ GC (min … max):  7.28% … 18.57%
 Time  (median):     113.164 ms               ┊ GC (median):    10.80%
 Time  (mean ± σ):   115.515 ms ±   9.030 ms  ┊ GC (mean ± σ):  10.46% ±  3.68%

    ▃          █                                                 
  ▇▄█▄▁▁▄▄▇▇▇▄▄█▄▇▄▄▇▁▄▁▄▁▁▄▁▄▄▁▄▁▁▄▄▁▁▁▄▁▁▁▁▁▄▁▄▁▁▁▄▁▁▁▁▁▁▁▁▁▄ ▁
  104 ms           Histogram: frequency by time          143 ms <

 Memory estimate: 131.78 MiB, allocs estimate: 2050.

Without resize_smallest_dimension, using only center_crop:

julia> @benchmark dataset[1:16]
BenchmarkTools.Trial: 57 samples with 1 evaluation.
 Range (min … max):  80.594 ms … 103.226 ms  ┊ GC (min … max):  7.43% … 19.03%
 Time  (median):     86.954 ms               ┊ GC (median):     8.95%
 Time  (mean ± σ):   88.287 ms ±   5.683 ms  ┊ GC (mean ± σ):  10.90% ±  3.57%

    ▄ ▄ ▁▁ █▄  ▁▄   ▁    ▁  ▁▁ ▁   ▄   ▁ ▁   ▁           ▁      
  ▆▆█▆█▁██▁██▆▁██▆▁▆█▁▁▆▁█▁▆██▆█▁▁▁█▆▁▆█▁█▁▁▁█▁▁▁▆▁▁▁▁▁▁▁█▁▁▁▆ ▁
  80.6 ms         Histogram: frequency by time          101 ms <

 Memory estimate: 115.96 MiB, allocs estimate: 1826.

Additionally using StackedViews.jl for batching:

julia> @benchmark dataset[1:16]
BenchmarkTools.Trial: 95 samples with 1 evaluation.
 Range (min … max):  47.971 ms … 73.503 ms  ┊ GC (min … max): 0.00% … 8.68%
 Time  (median):     51.116 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   52.903 ms ±  4.922 ms  ┊ GC (mean ± σ):  4.69% ± 5.81%

  ▂ ▂▄█                                                        
  █▇█████▃▅▃▅▅▆▃█▇▇▅▅▅▁▅▁▁▃▃▆▆▁▁▁▁▁▅▃▃▁▃▁▁▁▁▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁▁▃ ▁
  48 ms           Histogram: frequency by time          70 ms <

 Memory estimate: 38.43 MiB, allocs estimate: 1499.

johnnychen94

I'm against @lazy import ImageCore (and @lazy import ImageShow) because it's very likely to hit the world-age issue if not used carefully. I mean, if this is a safe solution I'll be the first one to refactor the JuliaImages ecosystem this way. But since @CarloLucibello is the actual maintainer of this package, I'll leave this decision to him.

johnnychen94 · 2022-06-24T01:29:39Z

src/datasets/vision/imagenet_reader/ImageNetReader.jl

+
+# Load image from ImageNetFile path and preprocess it to normalized 224x224x3 Array{Tx,3}
+function readimage(Tx::Type{<:Real}, file::AbstractString)
+    im = JpegTurbo.jpeg_decode(ImageCore.RGB{Tx}, file; preferred_size=IMGSIZE)


I'm not sure if all ImageNet images meet the requirement, but note that the actual decoded result size size(im) might not be preferred_size.

I'm actually running into warnings with images smaller than preferred_size:

┌ Warning: Failed to infer appropriate scale ratio, use `scale_ratio=2` instead.
│   actual_size = (127, 100)
│   preferred_size = (224, 224)
└ @ JpegTurbo ~/.julia/packages/JpegTurbo/b5MSG/src/decode.jl:165

Do you have experience with this, @lorenzoh?

The reason for this is that JpegTurbo.jl (or libjpeg-turbo) only supports a very limited range of scale_ratio values: they are $M/8$ where $M \in \{1, 2, \dots, 16\}$. Thus the maximal possible scale_ratio is 2. This is exactly why size(img) == preferred_size may not hold in practice.

The supported scale_ratio values permit a faster decoding algorithm (by scaling the coefficients instead of the actual images); this is why we can observe the performance boost here.
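The arithmetic behind this can be sketched in Base Julia. The selection rule below (smallest ratio that still covers the preferred size) and the rounding up of scaled dimensions are assumptions made for illustration; `pick_scale_ratio` and `scaled_size` are hypothetical names and do not reproduce JpegTurbo.jl's internals.

```julia
# Scale ratios M/8 for M = 1, ..., 16, as described above.
const SCALE_RATIOS = [M // 8 for M in 1:16]

# Assumption: a scaled dimension is rounded up, i.e. ceil(dim * ratio).
scaled_size(sz::NTuple{2,Int}, r::Rational) = map(d -> ceil(Int, d * r), sz)

# Hypothetical selection rule: the smallest ratio whose output still covers
# `preferred` in both dimensions, or `nothing` if even ratio 2 is too small.
function pick_scale_ratio(actual::NTuple{2,Int}, preferred::NTuple{2,Int})
    for r in SCALE_RATIOS
        all(scaled_size(actual, r) .>= preferred) && return r
    end
    return nothing
end

pick_scale_ratio((500, 375), (224, 224))  # 5//8, giving a (313, 235) image
pick_scale_ratio((127, 100), (224, 224))  # nothing: 2 * 100 < 224
```

The second call mirrors the warning above: a 127×100 image cannot reach 224×224 even at the maximal ratio of 2, so an explicit resize is still needed afterwards.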

The perhaps safest (I think) solution is to add an imresize after it:

img = @suppress_err JpegTurbo.jpeg_decode(file; preferred_size=(224, 224))
if size(img) != (224, 224)
    img = imresize(img, (224, 224))
end

The @suppress_err macro is a handy tool from https://github.com/JuliaIO/Suppressor.jl to disable this warning message.

I don't plan to make this imresize happen automatically in JpegTurbo.jl because it would otherwise break people's expectation on "keyword preferred_size can make decoding faster"

adrhill · 2022-06-30T14:16:17Z

Thanks for the review @Dsantra92!
I'm slightly busy due to the JuliaCon submission deadline on Monday, but I'll get back to this PR as soon as possible.

adrhill · 2022-06-30T21:00:25Z

The order of the classes in the metadata also still has to be fixed as it doesn't match https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt.

adrhill · 2022-08-03T16:04:55Z

Sorry for stalling this.

I guess the issue with this PR boils down to whether preprocessing functions belong to MLDatasets or to packages exporting pre-trained models. This question has already been raised in FluxML/Metalhead.jl#117.

Since images in ImageNet have different dimensions, providing an ImageNet data loader with matching preprocessing functions would be somewhat useless, as it would not be able to load batches of data. And as discussed here in the context of JpegTurbo.jpeg_decode, a lot of performance would be left on the table if we loaded full-size images just to immediately resize them.

I took a look at how other Deep Learning frameworks deal with this and both torchvision and Keras Applications export preprocessing functions with their pre-trained models. MLDatasets' FileDataset pattern would work well if pre-trained model libraries exported a corresponding loadfn.
One of the issues mentioned in FluxML/Metalhead.jl#117 is import latency for extra dependencies such as DataAugmentation.jl. Maybe LazyModules.jl could help circumvent this problem.

RomeoV · 2023-02-02T01:59:21Z

Hey everyone, this looks awesome. Is anyone still working on this? Otherwise I would suggest trying to merge this, even if it's not "perfect" with regards to extra dependencies or open questions about transformations.

adrhill · 2023-02-02T12:24:18Z

I'm still interested in working on this.

To get this merged, we could make the preprocess and inverse_preprocess functions part of the ImageNet struct and provide the current functions as defaults.

Edit: inverse_preprocess is now a field of the ImageNet struct, preprocess is the loadfn of the internal FileDataset.

CarloLucibello · 2023-02-02T17:52:51Z

this needs a rebase, otherwise looks mostly good

ToucheSir · 2023-02-02T19:04:04Z

src/datasets/vision/imagenet_reader/preprocess.jl

+const PYTORCH_MEAN = [0.485f0, 0.456f0, 0.406f0]
+const PYTORCH_STD = [0.229f0, 0.224f0, 0.225f0]
+
+normalize_pytorch(x) = (x .- PYTORCH_MEAN) ./ PYTORCH_STD
+inv_normalize_pytorch(x) = x .* PYTORCH_STD .+ PYTORCH_MEAN


I would drop the pytorch prefix/suffix and use something else. The comments can stay. If PyTorch isn't the only library that does this preprocessing, then it makes sense to represent that with more general names. If different libraries are providing different preprocessing functionality for ImageNet (or not providing any), then I'd argue there is no canonical default set of ImageNet transformations and this code (aside from maybe the descriptive stats) shouldn't be in MLDatasets.

Good point. Since this is just an internal function used by default_preprocess, I would suggest either _normalize or default_normalize. The appeal of using these coefficients as defaults is that they should work out of the box with pre-trained vision models from Metalhead.jl.
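A minimal sketch of what the renamed helpers could look like, assuming images are stored as height × width × 3 arrays (the diff does not state the layout). The names `DEFAULT_MEAN`, `DEFAULT_STD`, `default_normalize` and `inv_default_normalize` are illustrative only; the constants are the ones from the diff above.

```julia
# Channel statistics reshaped to 1 × 1 × 3 so that they broadcast along the
# third (channel) dimension of a height × width × 3 array.
const DEFAULT_MEAN = reshape([0.485f0, 0.456f0, 0.406f0], 1, 1, 3)
const DEFAULT_STD = reshape([0.229f0, 0.224f0, 0.225f0], 1, 1, 3)

default_normalize(x) = (x .- DEFAULT_MEAN) ./ DEFAULT_STD
inv_default_normalize(x) = x .* DEFAULT_STD .+ DEFAULT_MEAN

x = rand(Float32, 4, 5, 3)   # stand-in for a decoded image
y = default_normalize(x)
x_back = inv_default_normalize(y)
```

The inverse recovers the input up to floating-point error, which is what a function like convert2image would rely on.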

Wait, so do other libraries provide this functionality in their ImageNet dataset APIs? I checked https://www.tensorflow.org/datasets/catalog/imagenet2012 and it has no mention of preprocessing, so is PyTorch the only library that does this? If so, I would vote to remove the preprocessing functions as mentioned above.

If I am not wrong, these normalization values depend on the model you are using. Also, none of the existing vision datasets have preprocessing functions. These functions are ideally handled by data preprocessing libraries/modules.

The norm values should not be model-specific. They're derived directly from the data before any model is involved.

In the pytorch case, notice however that although the transformations are stored in the "model weights", the mean and std is the same across models (see e.g. the mobilenet model).

In a similar spirit, I would definitely defend the decision of shipping the set of transformations (cropping, interpolation, linear transformation, etc) as part of the dataset. However I agree with the very first point that the name transformation_pytorch isn't really precise, although I think it is fair to link to the corresponding transformations for tensorflow, pytorch, and/or the timm library in a related comment.

PyTorch also lumps code for pretrained models, data augmentations and datasets into one library, I don't think we need to follow their every example :)

> In a similar spirit, I would definitely defend the decision of shipping the set of transformations (cropping, interpolation, linear transformation, etc) as part of the dataset.

This is precisely why I asked about what other libraries are doing. If nobody else is shipping the same set of transformations, then they can hardly be considered canonical for ImageNet. That doesn't mean we should never ship helpers to create common augmentation pipelines, but that it is better served by packages which have access to efficient augmentation libraries (e.g. Augmentor, DataAugmentation) and not by some unoptimized implementation which is simultaneously more general (because it's applicable to other datasets) and less general (because many papers using ImageNet do not use these augmentations) than the dataset it's been attached to.

Let's just apply the channelview and permute transformation by default here,
and make the (permuted) mean and std values be part of the type

I have also taken a look at Keras' ImageNet utilities. While these normalization constants are used in many places throughout torchvision and PyTorch, it looks like TensorFlow and Keras do indeed use their own constants.

I agree with @ToucheSir's sentiment

> If nobody else is shipping the same set of transformations, then they can hardly be considered canonical for ImageNet.

However, this point can be drawn even further, as nothing about ImageNet is truly canonical.
To give some examples (some of which have previously been discussed):

There is no canonical reason why Images have to be loaded in 224 x 224 format.

There is no canonical reason to apply the resizing algorithm JpegTurbo.jl uses when calling jpeg_decode with a preferred_size.

There is no canonical way of sorting class labels. Some sort by WordNet ID (e.g. PyTorch), others don't.

Getting this merged

To make this dataloader as "unopinionated" as possible, we could just make it a very thin wrapper around FileDataset which just loads metadata. This would require the user to pass a loadfn which handles the transformation from file path to array. Class ordering could be handled using a sort_by_wnid=true keyword argument, and all new dependencies introduced in this PR could be removed (ImageCore, JpegTurbo and StackViews).
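A rough sketch of such a thin wrapper, written in Base Julia without FileDataset so that it is self-contained. The type name `ImageNetSketch`, its fields, and the behavior of `sort_by_wnid = false` (classes numbered in order of first appearance) are assumptions for illustration, not the API proposed in this PR.

```julia
# Hypothetical thin wrapper: stores paths and labels, decodes nothing itself.
struct ImageNetSketch{F}
    loadfn::F                # user-supplied: file path -> array
    paths::Vector{String}
    targets::Vector{Int}
    wnids::Vector{String}    # wnids[label] is the WordNet ID of class `label`
end

function ImageNetSketch(loadfn, paths::Vector{String}; sort_by_wnid::Bool = true)
    # The WordNet ID is the name of the folder containing each file.
    path_wnids = String[basename(dirname(p)) for p in paths]
    wnids = unique(path_wnids)           # order of first appearance
    sort_by_wnid && sort!(wnids)
    label_of = Dict(w => i for (i, w) in enumerate(wnids))
    targets = Int[label_of[w] for w in path_wnids]
    return ImageNetSketch(loadfn, paths, targets, wnids)
end

Base.length(d::ImageNetSketch) = length(d.paths)

# Loading happens only when an observation is requested.
Base.getindex(d::ImageNetSketch, i::Integer) = (d.loadfn(d.paths[i]), d.targets[i])
```

Any function from path to array can be passed as `loadfn`, for example a closure around JpegTurbo.jpeg_decode with a preferred_size, which keeps the image dependencies out of the dataset package.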

Future work

However, I do strongly feel like some package in the wider Julia ML / Deep Learning ecosystem should export loadfns that are usable with Metalhead's PyTorch models out of the box. @lorenzoh previously proposed adding such functionality to DataAugmentation.jl in FluxML/Metalhead.jl#117.
Once this functionality is available somewhere, ImageNet's docstring in MLDatasets should be updated to showcase this common use-case.

Until this functionality exists, I would suggest adding a "Home" => "Tutorials" => "ImageNet" page to the MLDatasets docs which implements the current load function.

> it looks like TensorFlow and Keras do indeed use their own constants.

Nice find. I was not expecting that mode == "torch" conditional.

> However, this point can be drawn even further, as nothing about ImageNet is truly canonical. To give some examples (some of which have previously been discussed):
>
> 1. There is no canonical reason why Images have to be loaded in 224 x 224 format.
> 2. There is no canonical reason to apply the resizing algorithm JpegTurbo.jl uses when calling `jpeg_decode` with a `preferred_size`.
> 3. There is no canonical way of sorting class labels. Some sort by WordNet ID (e.g. PyTorch), others don't.

The difference here is that all three of those points can have a decent fallback without depending on external packages. Another argument is that more people will rely on these defaults than won't. I'm not sure augmentations pass that threshold.
I'm not saying users shouldn't be able to pass in a transformation function, but identity or some such seems a more defensible default. Indeed, the torchvision ImageNet class does not do any additional transforms by default, so we'd be deviating from every other library if we stuck with this default centre crop.

adrhill · 2024-03-19T18:42:00Z

In case someone is still interested in using this, I've opened an unregistered repository containing this PR:
https://github.com/adrhill/ImageNetDataset.jl

The most notable difference is that ImageNetDataset.jl contains some custom preprocessing pipelines that support convert2image and work out of the box with Metalhead.jl.

adrhill added 6 commits June 23, 2022 15:15

Add Images and ImageMagick deps using LazyModules

7852e48

Add Image preprocessing script

20be9a6

adapted from MetalHead 0.5.3's utils.jl

Add ImageNet dataset

59afe92

Rename ImageNetReader file to match struct name

1f4dfaf

Formatting fixes

dfdeaa5

Remove lowpass on image before resizing

d2ded7e

Use FileDataset and replace ImageMagick with JpegTurbo

4809296

Add missing reference URL to comment

cac14d2

adrhill marked this pull request as ready for review June 23, 2022 18:22

adrhill added 3 commits June 24, 2022 00:03

Remove use of imresize

1e850ba

Replace Images dependency by ImageCore

2aa0170

Use StackViews.jl for batching

06ad214

johnnychen94 reviewed Jun 24, 2022

adrhill added 3 commits June 24, 2022 15:52

Load ImageCore and StackViews non-lazily

3097302

Bake Tx into FileDataset's loadfn

02e966d

Fix indexing bug in center_crop_view

9fb811c

Dsantra92 reviewed Jun 29, 2022

CarloLucibello reviewed Jul 2, 2022

adrhill added 2 commits July 8, 2022 18:09

Move installation guide into separate markdown file

4e0e8d4

Include feedback from code review

0daca90

adrhill added 3 commits February 2, 2023 17:01

Support custom preprocessing functions

09feb3d

Sort classes by WordNet ID

df14fea

Update docstring

c92ae00

Merge branch 'master' into ah/imagenet

8637ebe

CarloLucibello reviewed Feb 2, 2023

ToucheSir reviewed Feb 2, 2023

Update docstrings

944bd83

CarloLucibello reviewed Feb 3, 2023

CarloLucibello reviewed Feb 3, 2023

adrhill added 4 commits February 7, 2023 13:06

Remove StackViews dependency

fe38d43

Remove normalization constants

6af86c6

Add more metadata

95b13d9

Add img_size argument

09d5be4

CarloLucibello reviewed Feb 9, 2023

CarloLucibello reviewed Feb 9, 2023

Format to SciML code style, matching JuliaML#205

ae1929d

ToucheSir mentioned this pull request Nov 14, 2024

Add Imagenette dataset #245

Open
