In this dataset, you are provided with a large number of small pathology images to classify. Each file is named with its image id. The train_labels.csv file provides the ground-truth labels for the images in the train folder, and you are predicting the labels for the images in the test folder. A positive label indicates that the central 32x32 px region of a patch contains at least one pixel of tumor tissue; tumor tissue in the outer region of the patch does not influence the label. This outer region is provided to support fully convolutional models that do not use zero-padding, so that they behave consistently when applied to a whole-slide image.
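To make the labeling rule concrete, here is a minimal Python sketch. It assumes the 96x96 px patch size used in the Kaggle competition and a hypothetical per-pixel tumor mask: the function name `center_label` and the mask itself are illustrative only, since the competition ships labels, not masks. For a 96x96 patch, the central 32x32 window covers rows and columns 32 through 63.

```python
from typing import List


def center_label(mask: List[List[int]], center: int = 32) -> int:
    """Return 1 if any tumor pixel (value 1) lies inside the central
    center x center window of a square mask, else 0.

    `mask` is a hypothetical binary tumor mask; it is not part of the
    Kaggle data, which only provides the resulting labels.
    """
    size = len(mask)
    if any(len(row) != size for row in mask):
        raise ValueError("mask must be square")
    if center > size or (size - center) % 2:
        raise ValueError("center window must fit symmetrically in the patch")
    start = (size - center) // 2  # 32 for a 96x96 patch
    stop = start + center         # 64 (exclusive) for a 96x96 patch
    # Pixels outside [start, stop) never affect the label.
    return int(any(mask[r][c] for r in range(start, stop) for c in range(start, stop)))


if __name__ == "__main__":
    patch = [[0] * 96 for _ in range(96)]
    patch[5][5] = 1              # tumor only in the outer border
    print(center_label(patch))   # 0
    patch[40][50] = 1            # tumor inside the central window
    print(center_label(patch))   # 1
```

The outer 32 px border is still fed to the model as context; it just never changes the target.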
The original PCam dataset contains duplicate images due to its probabilistic sampling; however, the version presented on Kaggle does not contain duplicates. Apart from that, the Kaggle version keeps the same data and splits as the PCam benchmark. The data is available on Kaggle (https://www.kaggle.com/c/histopathologic-cancer-detection).
I will cover the following recipes:
- Exploring the dataset
- Creating a custom dataset
- Splitting the dataset
- Transforming the data
- Creating dataloaders
- Building the classification model
- Defining the loss function
- Defining the optimizer
- Training and evaluation of the model
- Deploying the model
- Model inference on test data
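As a preview of the "Splitting the dataset" step, here is a minimal, standard-library-only sketch of a stratified train/validation split of train_labels.csv. It assumes the file has an `id,label` header; the helper names `load_labels` and `stratified_split`, the 10% default validation fraction, and the file path are hypothetical choices for illustration, not necessarily how the recipe itself does it. Stratifying by label keeps the positive/negative ratio roughly the same in both parts.

```python
import csv
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]


def load_labels(lines: Iterable[str]) -> List[Row]:
    """Parse train_labels.csv content (assumed `id,label` header) into (id, label) pairs."""
    return [(row["id"], int(row["label"])) for row in csv.DictReader(lines)]


def stratified_split(rows: List[Row], val_fraction: float = 0.1,
                     seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle each label group separately and move val_fraction of it to validation."""
    by_label: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[Row] = []
    val: List[Row] = []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


if __name__ == "__main__":
    # With the real file (path is an assumption):
    # with open("train_labels.csv", newline="") as f:
    #     rows = load_labels(f)
    sample = ["id,label\n"] + [f"img{i},{int(i % 4 == 0)}\n" for i in range(100)]
    rows = load_labels(sample)  # 25 positives, 75 negatives
    train, val = stratified_split(rows, val_fraction=0.2)
    print(len(train), len(val))                 # 80 20
    print(sum(label for _, label in val))       # 5
```

A plain random split would usually give a similar ratio on a dataset this large, but stratifying makes the validation class balance deterministic rather than approximate.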