# release_notes_v21.12.01
This version is only available through the PyPI package (https://pypi.org/project/cucim/21.12.1/).

cuCIM now supports loading the entire image with multiple threads. It also supports batch loading of images.

If the `device` parameter of the `read_region()` method is `"cuda"`, it loads the relevant portion of the image file (compressed tile data) into GPU memory using cuFile (GDS, GPUDirect Storage), then decompresses that data using nvJPEG's Batched Image Decoding API.

The current implementation is not efficient, and performance is poor compared to the CPU implementation. However, we plan to improve it over the next versions.
The following parameters have been added to the `read_region()` method:

- `num_workers`: number of workers (threads) to use for loading the image (default: `1`)
- `batch_size`: number of images to load at once (default: `1`)
- `drop_last`: whether to drop the last batch when the number of images is not divisible by the batch size (default: `False`)
- `prefetch_factor`: number of samples loaded in advance by each worker (default: `2`)
- `shuffle`: whether to shuffle the input locations (default: `False`)
- `seed`: seed value for random value generation (default: `0`)
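As an illustration of how `batch_size` and `drop_last` interact, the number of batches produced can be sketched in plain Python (this is an illustrative sketch, not cuCIM's internal code):

```python
def num_batches(num_locations, batch_size, drop_last=False):
    """Number of batches produced for a given number of locations."""
    full, remainder = divmod(num_locations, batch_size)
    # drop_last=True discards the final partial batch
    return full if (drop_last or remainder == 0) else full + 1

print(num_batches(8, 4))                   # 2 full batches
print(num_batches(10, 4))                  # 2 full + 1 partial = 3
print(num_batches(10, 4, drop_last=True))  # partial batch dropped -> 2
```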
## Loading the entire image using multiple threads

```python
from cucim import CuImage

img = CuImage("input.tif")
# read the whole image at level 1 using 8 workers
region = img.read_region(level=1, num_workers=8)
```
## Loading batched images using multiple threads

You can feed the locations of the regions as a list/tuple of locations or as a NumPy array of locations (e.g., `((<x for loc 1>, <y for loc 1>), (<x for loc 2>, <y for loc 2>))`). Each element in a location should be of int type (int64), and the dimension of each location should be equal to the dimension of the size.

In fact, you can feed any iterator of locations (the dimensions of the input don't matter; each item in the iterator is flattened once if the item is itself an iterable). For example, you can feed the following iterators:

- `[0, 0, 100, 0]` or `(0, 0, 100, 0)` would be interpreted as a list of `(0, 0)` and `(100, 0)`.
- `((sx, sy) for sy in range(0, height, patch_size) for sx in range(0, width, patch_size))` would iterate over the locations of the patches.
- `[(0, 100), (0, 200)]` would be interpreted as a list of `(0, 100)` and `(0, 200)`.
- NumPy arrays such as `np.array(((0, 100), (0, 200)))` or `np.array((0, 100, 0, 200))` are also accepted, and using a NumPy array object is faster than using a Python list/tuple.
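The interpretation rule above can be sketched in plain Python (an illustrative re-implementation, not cuCIM's actual code; a location dimension of 2 is assumed):

```python
from collections.abc import Iterable

def normalize_locations(locations, ndim=2):
    """Flatten an iterator of locations once and group the values into
    ndim-sized tuples, mimicking the interpretation described above."""
    flat = []
    for item in locations:
        if isinstance(item, Iterable):
            # the item is itself an iterable: flatten it once
            flat.extend(int(v) for v in item)
        else:
            flat.append(int(item))
    # group the flat coordinate list into (x, y) pairs
    return [tuple(flat[i:i + ndim]) for i in range(0, len(flat), ndim)]

print(normalize_locations([0, 0, 100, 0]))        # [(0, 0), (100, 0)]
print(normalize_locations([(0, 100), (0, 200)]))  # [(0, 100), (0, 200)]
```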
```python
import numpy as np
from cucim import CuImage

cache = CuImage.cache("per_process", memory_capacity=1024)
img = CuImage("image.tif")

locations = [[0, 0], [100, 0], [200, 0], [300, 0],
             [0, 200], [100, 200], [200, 200], [300, 200]]
# locations = np.array(locations)

region = img.read_region(locations, (224, 224), batch_size=4, num_workers=8)

for batch in region:
    arr = np.asarray(batch)  # convert the batch to a NumPy array
    print(arr.shape)
    for item in arr:
        print(item.shape)
# (4, 224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (4, 224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
```
## Loading images using nvJPEG and cuFile (GDS, GPUDirect Storage)

If `"cuda"` is specified for the `device` parameter of the `read_region()` method, it uses nvJPEG with GPUDirect Storage to load images. Use CuPy instead of NumPy in this case; the image cache (`CuImage.cache`) is not used.
```python
import cupy as cp
from cucim import CuImage

img = CuImage("image.tif")

locations = [[0, 0], [100, 0], [200, 0], [300, 0],
             [0, 200], [100, 200], [200, 200], [300, 200]]
# locations = np.array(locations)

region = img.read_region(locations, (224, 224), batch_size=4, device="cuda")

for batch in region:
    arr = cp.asarray(batch)  # convert the batch to a CuPy array on the GPU
    print(arr.shape)
    for item in arr:
        print(item.shape)
# (4, 224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (4, 224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
```
We compared performance against Tifffile for loading the entire image.

Test environment:

- OS: Ubuntu 18.04
- CPU: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz, 12 processors
- Memory: 64GB (G-Skill DDR4 2133 16GB x 4)
- Storage
  - SATA SSD: Samsung SSD 850 EVO 1TB

Benchmark method:

- Use the `read_region()` APIs to read the entire image (.svs/.tiff) at the largest resolution level.
- Performed on the following images, each using a different compression method:
  - JPEG2000 YCbCr: TUPAC-TR-467.svs, 55MB, 19920x26420, tile size 240x240
  - JPEG: image.tif (256x256 multi-resolution/tiled TIF conversion of TUPAC-TR-467.svs), 238MB, 19920x26420, tile size 256x256
  - JPEG2000 RGB: CMU-1-JP2K-33005.svs, 126MB, 46000x32893, tile size 240x240
  - JPEG: 0005f7aaab2800f6170c399693a96917.tiff in the Prostate cANcer graDe Assessment (PANDA) Challenge, 46MB, 27648x29440, tile size 512x512
  - JPEG: 000920ad0b612851f8e01bcc880d9b3d.tiff in the Prostate cANcer graDe Assessment (PANDA) Challenge, 14MB, 15360x13312, tile size 512x512
  - JPEG: 001d865e65ef5d2579c190a0e0350d8f.tiff in the Prostate cANcer graDe Assessment (PANDA) Challenge, 71MB, 28672x34560, tile size 512x512
- Use the same number of workers (threads) for both cuCIM and Tifffile.
  - Tifffile uses half of the available processors by default (6 in the test system).
  - Tested with 6 and 12 threads.
- Use the average time of 5 samples.
- Test code is available here.
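The "average time of 5 samples" methodology can be reproduced with a simple harness like the following (a hypothetical sketch; the lambda is a placeholder for the actual benchmark body, e.g. reading the whole image with `read_region()`):

```python
import statistics
import time

def benchmark(fn, samples=5):
    """Return the mean elapsed wall-clock time of fn over `samples` runs."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times)

# placeholder workload standing in for the actual image-loading call
elapsed = benchmark(lambda: sum(range(100_000)))
print(f"{elapsed:.6f} s")
```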
Average load time in seconds (lower is better):

| Image | cuCIM (6 threads) | Tifffile (6 threads) | cuCIM (12 threads) | Tifffile (12 threads) |
|---|---|---|---|---|
| JPEG2000 YCbCr: TUPAC-TR-467.svs (55MB, 19920x26420, tile size 240x240) | 2.769 | 7.459 | 2.147 | 6.143 |
| JPEG: image.tif (256x256 multi-resolution/tiled TIF conversion of TUPAC-TR-467.svs, 238MB, 19920x26420, tile size 256x256) | 0.695 | 1.025 | 0.535 | 1.569 |
| JPEG2000 RGB: CMU-1-JP2K-33005.svs (126MB, 46000x32893, tile size 240x240) | 9.236 | 27.937 | 7.414 | 22.465 |
| JPEG: 0005f7aaab2800f6170c399693a96917.tiff (46MB, 27648x29440, tile size 512x512) | 0.797 | 0.926 | 0.637 | 0.951 |
| JPEG: 000920ad0b612851f8e01bcc880d9b3d.tiff (14MB, 15360x13312, tile size 512x512) | 0.226 | 0.256 | 0.184 | 0.272 |
| JPEG: 001d865e65ef5d2579c190a0e0350d8f.tiff (71MB, 28672x34560, tile size 512x512) | 0.993 | 1.131 | 0.804 | 1.147 |
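The relative speedups follow directly from the reported timings; for instance, for the two JPEG2000 images (pure arithmetic on the 12-thread numbers above):

```python
# average load times in seconds with 12 threads, taken from the results above
timings = {
    "JPEG2000 YCbCr (TUPAC-TR-467.svs)": (2.147, 6.143),     # (cuCIM, Tifffile)
    "JPEG2000 RGB (CMU-1-JP2K-33005.svs)": (7.414, 22.465),  # (cuCIM, Tifffile)
}
for name, (cucim_t, tifffile_t) in timings.items():
    print(f"{name}: {tifffile_t / cucim_t:.2f}x speedup")
# JPEG2000 YCbCr (TUPAC-TR-467.svs): 2.86x speedup
# JPEG2000 RGB (CMU-1-JP2K-33005.svs): 3.03x speedup
```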