TODO List

Write unit tests.
Achieve 100% test coverage.
Python Learning: design a preprocessing function that takes a tf.Tensor. This function should take a batch, preprocess it (possibly using many cores), fetch the results, and save them to disk. Useful links: TensorFlow guide to data performance, TensorFlow tutorial on image classification, TensorFlow tutorial on loading images, TensorFlow guide to building input pipelines.
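A framework-agnostic sketch of the batch-to-disk idea, using only the standard library. It is not the tf.Tensor version the item asks for: `preprocess_batch_to_disk` and its parameters are hypothetical names, the batch is any iterable, and results are pickled to a single file. With TensorFlow, the more natural route is probably `tf.data.Dataset.map(..., num_parallel_calls=tf.data.AUTOTUNE)`, which the data performance guide discusses.

```python
import pickle
from concurrent.futures import Executor, ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Optional, TypeVar, Union

T = TypeVar("T")
R = TypeVar("R")


def preprocess_batch_to_disk(
    batch: Iterable[T],
    preprocess: Callable[[T], R],
    path: Union[str, Path],
    executor: Optional[Executor] = None,
) -> List[R]:
    """Preprocess every element of `batch` in parallel and pickle the results.

    Hypothetical helper: pass a ProcessPoolExecutor to use several cores for
    CPU-bound pure-Python work; threads suffice when `preprocess` releases the
    GIL (as many NumPy and TensorFlow operations do).
    """
    owns_executor = executor is None
    if executor is None:
        executor = ThreadPoolExecutor()
    try:
        # executor.map returns results in the same order as the inputs
        results = list(executor.map(preprocess, batch))
    finally:
        if owns_executor:
            executor.shutdown()

    with Path(path).open("wb") as file:
        pickle.dump(results, file)
    return results
```

Passing the executor in keeps the choice between threads and processes with the caller; a process pool additionally requires `preprocess` to be picklable (a module-level function).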
bl.utils could be split into several utility submodules.
Use type annotations where applicable.
Document code.
Allow different batch sizes for different models.
Why do more_itertools.filter_except and more_itertools.map_except need to do exceptions = tuple(exceptions)?
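A plausible explanation, not verified against the more_itertools source: an `except` clause accepts only an exception class or a tuple of classes (a list raises TypeError at matching time), while star-args such as `*exceptions` (the documented signatures are `filter_except(validator, iterable, *exceptions)` and `map_except(function, iterable, *exceptions)`) already arrive as a tuple. If so, the conversion would be redundant. The hypothetical helpers below check both facts.

```python
from typing import Any, Tuple, Type


def collect(*exceptions: Type[BaseException]) -> Tuple[Type[BaseException], ...]:
    # The interpreter always packs star-args into a tuple
    return exceptions


def catches(spec: Any, error: BaseException) -> bool:
    """Return True if `except spec:` handles `error`, False if `error` escapes it."""
    try:
        try:
            raise error
        except spec:  # spec must be an exception class or a tuple of classes
            return True
    except type(error):
        return False


print(type(collect(ValueError, KeyError)))  # <class 'tuple'>
print(catches((ValueError, KeyError), KeyError("k")))  # True
try:
    catches([ValueError, KeyError], KeyError("k"))
except TypeError as exc:
    print("a list is rejected:", exc)
```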
Finish step detection analysis.
Implement a function wrapper that transforms the arguments before forwarding them to the wrapped function. For instance:

```python
import operator

lower_eq = transform(operator.eq, keyfunc=lambda x: x.lower())
assert "Hi" != "hi"
assert lower_eq("Hi", "hi")
```
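A minimal sketch of such a wrapper, assuming `transform` (the name used in the example) applies `keyfunc` to every positional and keyword argument before calling the wrapped function. Applying it to keyword arguments too is a design choice; a variant could accept separate key functions per argument.

```python
import functools
import operator
from typing import Any, Callable, TypeVar

R = TypeVar("R")


def transform(func: Callable[..., R], keyfunc: Callable[[Any], Any]) -> Callable[..., R]:
    """Return a wrapper that maps `keyfunc` over all arguments, then calls `func`."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> R:
        new_args = [keyfunc(arg) for arg in args]
        new_kwargs = {name: keyfunc(value) for name, value in kwargs.items()}
        return func(*new_args, **new_kwargs)

    return wrapper


lower_eq = transform(operator.eq, keyfunc=str.lower)
assert "Hi" != "hi"
assert lower_eq("Hi", "hi")
```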

Am I normalizing images correctly? Make sure I am!
Write READMEs for each subpackage.
Include licenses in each module.
Make the cv2 calls path-like compliant (accept os.PathLike objects such as pathlib.Path, not only str).
Take a look at the relationship between bubble or droplet formation rate and camera acquisition speed.
[No. Take a look at the sentinel package or PEP 661] Implement a typing helper Sentinel that accepts either a sentinel value (for instance, _sentinel) or a value of another type. It would be equivalent to typing.Optional, but with any other sentinel in place of None. See typing.Literal in Python 3.8.
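For reference, a pattern that type checkers already understand without a new helper is the single-member Enum from PEP 484 (section on singleton types in unions): the enum class acts as the sentinel's type, so `Union[T, _Empty]` behaves like `Optional[T]` with a custom sentinel. The names `_Empty`, `_sentinel` and `describe` are illustrative.

```python
import enum
from typing import Final, Union


class _Empty(enum.Enum):
    """Single-member enum: its only value is the sentinel."""

    token = 0


_sentinel: Final = _Empty.token


def describe(value: Union[int, None, _Empty] = _sentinel) -> str:
    # Identity checks against the enum member let type checkers narrow the Union
    if value is _sentinel:
        return "not given"
    if value is None:
        return "explicitly None"
    return f"got {value}"


print(describe())  # not given
print(describe(None))  # explicitly None
print(describe(3))  # got 3
```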
Create my own models and test Kramer's. Some steps are:
- Learn where to put Dropout layers. This paper is awesome.
- Always make the number of dense units a multiple of 8. There is a TensorFlow reference for this; find it.
- Check if image sizes should be multiples of 8 as well.
- Implement droplet/bubble tracking. See what André Provensi texted me.
- Can the wet/dry areas ratio be of use to the nets?
- Think of cool names for the nets.
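For the multiple-of-8 items, a tiny hypothetical helper that rounds a layer width or image side up to the next multiple. The usual motivation given for the rule is GPU Tensor Core efficiency under mixed precision, which is worth confirming against the TensorFlow reference the list mentions.

```python
def round_up_to_multiple(value: int, multiple: int = 8) -> int:
    """Smallest multiple of `multiple` that is >= `value` (positive integers only)."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    # Integer ceiling division: -(-a // b) == ceil(a / b)
    return -(-value // multiple) * multiple


print(round_up_to_multiple(100))  # 104
print(round_up_to_multiple(64))  # 64
print(round_up_to_multiple(250))  # 256
```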
Read this. Am I evaluating models correctly?
Include the distribution strategy (e.g. MirroredStrategy) as part of a model's description?
Implement callbacks for reporting the history and timestamps of a model's training. This would be useful to compare the training of models, especially execution speed (e.g. CPUs versus GPUs, or uniform versus mixed precision).
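A sketch of such a callback, assuming the Keras hook names `on_train_begin`, `on_epoch_begin` and `on_epoch_end`, which are real `tf.keras.callbacks.Callback` methods receiving a `logs` dict. To keep the sketch runnable without TensorFlow it subclasses `object`; in the project it would subclass `tf.keras.callbacks.Callback` and be passed through `model.fit(..., callbacks=[...])`. `TimestampHistory` is a hypothetical name.

```python
import time
from typing import Any, Dict, List, Optional


class TimestampHistory:  # with Keras: class TimestampHistory(tf.keras.callbacks.Callback)
    """Record wall-clock timestamps, durations and metric logs for every epoch."""

    def __init__(self) -> None:
        super().__init__()  # needed when subclassing the Keras Callback
        self.epochs: List[Dict[str, Any]] = []
        self._start_time = 0.0
        self._start_perf = 0.0

    def on_train_begin(self, logs: Optional[Dict[str, Any]] = None) -> None:
        self.epochs = []  # each fit() call starts a fresh history

    def on_epoch_begin(self, epoch: int, logs: Optional[Dict[str, Any]] = None) -> None:
        self._start_time = time.time()  # Unix timestamp, comparable across runs
        self._start_perf = time.perf_counter()  # monotonic clock for durations

    def on_epoch_end(self, epoch: int, logs: Optional[Dict[str, Any]] = None) -> None:
        self.epochs.append(
            {
                "epoch": epoch,
                "start": self._start_time,
                "end": time.time(),
                "duration": time.perf_counter() - self._start_perf,
                "logs": dict(logs or {}),  # e.g. loss and validation metrics
            }
        )
```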
Check out Netron for visualizing NN models.
Choose a reasonably performing network and train two versions of it: with and without mixed precision. Measure training time and final validation loss. Both versions should be trained under the same conditions (i.e. using GPUs and MirroredStrategy), with mixed precision being the only difference between the two nets.
Organize datasets and publish them on Kaggle?
Use narrower visualization windows?
Take a look at this, on how to use TensorBoard, and at TensorFlow's guide.
Include depth? See

Elboushaki, A., Hannane, R., Afdel, K., Koutti, L., 2020. MultiD-CNN: A multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in RGB-D image sequences. Expert Systems with Applications. doi:10.1016/j.eswa.2019.112829

They have two inputs: an RGB image and a depth map, which maps each pixel of the image to its relative distance from the camera. In a 2D experiment, including a depth map would be very important, since it would allow the model to tell closer bubbles (which look bigger) from more distant ones (which look smaller).

Use object detection.
Use transfer learning from one case to another.
Implement a way to measure the training time.
Implement a warm-up: the first epoch of training (after compiling or restoring) should be discarded so that TensorFlow's warm-up overhead is not included in the training time measurement.
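Given per-epoch durations (for instance, recorded by a timing callback), discarding the warm-up reduces to skipping the first epoch(s) of each run. A minimal sketch: `training_time`, `mean_epoch_time` and `warmup_epochs` are hypothetical names, it assumes every compile or restore starts a new list of durations, and the durations in the usage lines are made-up numbers.

```python
from typing import Sequence


def training_time(epoch_durations: Sequence[float], warmup_epochs: int = 1) -> float:
    """Total time in seconds, ignoring the first `warmup_epochs` epochs.

    The first epoch after compiling or restoring a model includes one-off
    costs (e.g. tracing tf.function graphs), which would inflate the measurement.
    """
    if warmup_epochs < 0:
        raise ValueError("warmup_epochs must be non-negative")
    measured = epoch_durations[warmup_epochs:]
    if not measured:
        raise ValueError("no epochs left after discarding the warm-up")
    return float(sum(measured))


def mean_epoch_time(epoch_durations: Sequence[float], warmup_epochs: int = 1) -> float:
    """Average epoch time after the warm-up, comparable across runs of any length."""
    total = training_time(epoch_durations, warmup_epochs)
    return total / (len(epoch_durations) - warmup_epochs)


durations = [12.0, 3.0, 3.5, 2.5]  # illustrative seconds per epoch
print(training_time(durations))  # 9.0
print(mean_epoch_time(durations))  # 3.0
```

The mean is what makes CPU/GPU or precision comparisons fair when runs stop after different numbers of epochs.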
Optimize the choice of activation functions.
For functions with many parameters, and especially for parameters that set key names, how about grouping them into a specialized dataclasses.dataclass? For instance, instead of:

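```python
from pathlib import Path
from typing import List, Optional

import pandas as pd


class CSVDataset:
    def __init__(
        self,
        path: Path,
        features_columns: Optional[List[str]] = None,
        target_column: str = "target",
    ) -> None:
        if features_columns is None:
            features_columns = ["image_path"]

        # read_csv selects columns with `usecols` (it has no `columns` argument)
        X = pd.read_csv(path, usecols=features_columns + [target_column])
        self.y = X.pop(target_column)
        self.X = X
```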

we could write:

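```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

import pandas as pd


# `kw_only` (Python 3.10+) forces keyword arguments; dataclass has no `kwargs` flag
@dataclass(frozen=True, kw_only=True)
class CSVDatasetColumns:
    features_columns: List[str] = field(default_factory=lambda: ["image_path"])
    target_column: str = "target"


class CSVDataset:
    def __init__(
        self, path: Path, csv_columns: CSVDatasetColumns = CSVDatasetColumns()
    ) -> None:
        # The shared default instance is frozen, so it cannot be reassigned
        X = pd.read_csv(
            path, usecols=csv_columns.features_columns + [csv_columns.target_column]
        )
        self.y = X.pop(csv_columns.target_column)
        self.X = X
```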

This is slightly more verbose, but it isolates the parameter logic. It also keeps string constants out of the function signature, delegating that responsibility to a helper class.

Implement integrated gradients.
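Integrated gradients attribute a prediction F(x) to each input feature by integrating the gradient along the straight path from a baseline x' to x: IG_i(x) = (x_i - x'_i) · ∫₀¹ ∂F/∂x_i(x' + α(x - x')) dα. A framework-free sketch using a midpoint Riemann sum; `grad_fn` is a hypothetical stand-in for whatever computes the model gradient (in TensorFlow, typically tf.GradientTape), and the toy model exists only to check the result.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def integrated_gradients(
    grad_fn: Callable[[Vector], Vector],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> Vector:
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn(point) must return dF/dx evaluated at `point` (same length as x).
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g
    # Average gradient along the path, scaled by the input difference
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def toy_grad(p: Vector) -> Vector:
    # Analytic gradient of the toy model F(x) = x0**2 + 3 * x1
    return [2.0 * p[0], 3.0]


attributions = integrated_gradients(toy_grad, x=[2.0, 1.0], baseline=[0.0, 0.0])
# Completeness: the attributions should sum to F(x) - F(baseline) = 7
print(sum(attributions))
```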
Perform the following studies:
- Influence of batch size
- Learning curve (metric versus dataset size)
- Visualization window size
- Direct versus indirect visualization
- How do random contrast (and other augmentations) affect image variance, and what does this mean for machine learning?
- Train on one set, evaluate on another
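For the random-contrast study, one analytical starting point: tf.image.adjust_contrast is documented to map each value x of a channel to (x - mean) * contrast_factor + mean, and tf.image.random_contrast applies the same operation with a randomly picked factor. Before any clipping, the channel mean is therefore unchanged and its variance is multiplied by contrast_factor². A pure-Python check on one channel; `adjust_contrast` here is a hypothetical re-implementation, not the TensorFlow function, and clipping to a valid pixel range would break the identity.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def adjust_contrast(channel: Sequence[float], factor: float) -> List[float]:
    """(x - mean) * factor + mean for every value of a single channel (no clipping)."""
    mean = fmean(channel)
    return [(x - mean) * factor + mean for x in channel]


pixels = [0.1, 0.4, 0.35, 0.8, 0.6]
factor = 1.5
adjusted = adjust_contrast(pixels, factor)

# Mean is preserved; variance scales by factor**2 (up to floating-point error)
print(fmean(adjusted), fmean(pixels))
print(pvariance(adjusted) / pvariance(pixels), factor**2)
```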
[No. It is not useful enough.] Release Pack as a standalone package, including functional programming features:

```python
def double(arg):
    return 2 * arg


def is_greater_than(threshold):
    return lambda arg: arg > threshold


p = Pack("abc", x=3, y=2)
res = (
    p  # sends p
    | double  # duplicates all values: Pack('aa', 'bb', 'cc', x=6, y=4)
    | (
        str.upper,
        is_greater_than(5),
    )  # applies str.upper to args, is_greater_than(5) to kwargs values
)
print(res)  # prints Pack('AA', 'BB', 'CC', x=True, y=False)
```

and think of other things.
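To make the snippet above executable, a deliberately minimal, hypothetical Pack: positional values plus keyword values, where `pack | f` maps f over everything and `pack | (f, g)` maps f over the positional values and g over the keyword values. The project's real Pack likely differs; this only reproduces the behaviour described in the comments.

```python
from typing import Any, Callable, Iterable, Tuple, Union

Func = Callable[[Any], Any]


class Pack:
    """Hypothetical minimal Pack: positional values plus keyword values."""

    def __init__(self, args: Iterable[Any] = (), /, **kwargs: Any) -> None:
        self.args = tuple(args)
        self.kwargs = dict(kwargs)

    def __or__(self, func: Union[Func, Tuple[Func, Func]]) -> "Pack":
        # A (f, g) pair maps f over positional values and g over keyword values
        if isinstance(func, tuple):
            args_func, kwargs_func = func
        else:
            args_func = kwargs_func = func
        return Pack(
            [args_func(arg) for arg in self.args],
            **{key: kwargs_func(value) for key, value in self.kwargs.items()},
        )

    def __repr__(self) -> str:
        parts = [repr(arg) for arg in self.args]
        parts += [f"{key}={value!r}" for key, value in self.kwargs.items()]
        return f"Pack({', '.join(parts)})"


def double(arg):
    return 2 * arg


def is_greater_than(threshold):
    return lambda arg: arg > threshold


res = Pack("abc", x=3, y=2) | double | (str.upper, is_greater_than(5))
print(res)  # Pack('AA', 'BB', 'CC', x=True, y=False)
```

The positional-only `args` parameter keeps a keyword value named `args` from clashing with the constructor.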

Study RNNs. Perhaps a network could be fed 3 consecutive images (for instance) to give an output.
Take a look at this: Python library for heat transfer.
Take a look at fastai's fastcore.
Take a look at BubCNN.
Take a look at this: use consecutive images for each output.
Use tf.keras.layers.TimeDistributed to handle temporal data!
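tf.keras.layers.TimeDistributed wrapping a Conv2D, and tf.keras.layers.ConvLSTM2D with the default channels_last format, both take 5D inputs shaped (samples, time, rows, cols, channels). A NumPy sketch that turns a video shaped (frames, rows, cols, channels) into overlapping windows of consecutive frames, one window per output; `consecutive_windows` is a hypothetical helper, and the window length of 3 only mirrors the RNN item above.

```python
import numpy as np


def consecutive_windows(frames: np.ndarray, window: int = 3) -> np.ndarray:
    """Stack overlapping runs of `window` consecutive frames.

    frames: array shaped (num_frames, rows, cols, channels).
    Returns an array shaped (num_frames - window + 1, window, rows, cols, channels).
    """
    if not 1 <= window <= len(frames):
        raise ValueError("window must be between 1 and the number of frames")
    starts = range(len(frames) - window + 1)
    return np.stack([frames[start : start + window] for start in starts])


video = np.arange(10 * 4 * 4 * 1, dtype=np.float32).reshape(10, 4, 4, 1)
windows = consecutive_windows(video, window=3)
print(windows.shape)  # (8, 3, 4, 4, 1)
```

Which target to attach to each window (for instance, the one of its last frame) is left open.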
Rescale images before feeding them to the network?
Use Evidential Deep Learning?
Check separable convolutions
Improve Pack, making it more like Parameters.
This idea looks amazing, maybe use it?
Use LocallyConnected layers?
Check this post to get some ideas: https://pub.towardsai.net/state-of-the-art-models-in-every-machine-learning-field-2021-c7cf074da8b2
try model optimization.
try pretrained models from TensorFlow Hub.
try transfer learning.
try to fix the RSquare metric
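As a reference for checking the RSquare metric: the coefficient of determination is R² = 1 - SS_res / SS_tot, with SS_res = Σ(y - ŷ)² and SS_tot = Σ(y - ȳ)². A plain-Python version (hypothetical name `r_squared`) to compare a fixed metric against on small arrays. A streaming metric has to accumulate these sums over all batches; averaging per-batch R² values would generally give a different number.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))  # 1.0
print(r_squared([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]))  # 0.0 (predicting the mean)
```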
implement structural similarity. Check the papers (1), (2), (3), (4). There is also a tutorial in PythonMachineLearning.
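For reference, SSIM between two signals x and y is ((2 μx μy + C1)(2 σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (K1 L)², C2 = (K2 L)², L the dynamic range and, conventionally, K1 = 0.01 and K2 = 0.03. The sketch below (hypothetical name `global_ssim`) evaluates this over a whole flattened image as a single window. Standard SSIM instead averages the index over local Gaussian windows, as tf.image.ssim does, so this is only a way to understand the formula.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(
    x: Sequence[float],
    y: Sequence[float],
    data_range: float = 1.0,
    k1: float = 0.01,
    k2: float = 0.03,
) -> float:
    """SSIM computed over whole flattened images (one window, no Gaussian weights)."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal size")
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast and structure term
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) ** 2 for a in x])
    var_y = fmean([(b - my) ** 2 for b in y])
    cov_xy = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    numerator = (2 * mx * my + c1) * (2 * cov_xy + c2)
    denominator = (mx**2 + my**2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


image = [0.1, 0.5, 0.9, 0.3]
print(global_ssim(image, image))  # identical images give (approximately) 1
print(global_ssim(image, [1 - v for v in image]))  # inverted structure scores far lower
```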
check Sample Correlation Coefficient.
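The sample (Pearson) correlation coefficient is r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). Python 3.10+ ships it as statistics.correlation; below is an explicit version (hypothetical name `sample_correlation`) for older interpreters or for cross-checking a custom Keras metric.

```python
import math
from typing import Sequence


def sample_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


print(sample_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```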
use the ONNX format, together with an ONNX simplifier and optimizer?
take a look at the beautiful ConvLSTM2D for timeseries of images!
take a look at ndindex
take a look at Probabilistic Layers Regression
make LeakyReLU trainable?
quantize models?
generate a test case in which the temperature is estimated from the boiling curve and the models have to predict that temperature
in condensation, investigate center cropping versus random cropping of the ROI
test autoML with center and random cropping
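For the cropping comparison, the two strategies differ only in where the crop window is placed. A NumPy sketch for an image shaped (rows, cols, ...); `center_crop` and `random_crop` are hypothetical helpers (in TensorFlow, tf.image.random_crop covers the random case), and it assumes the ROI has already been extracted.

```python
from typing import Optional

import numpy as np


def center_crop(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Crop a height x width window from the center of `image`."""
    rows, cols = image.shape[:2]
    if height > rows or width > cols:
        raise ValueError("crop is larger than the image")
    top = (rows - height) // 2
    left = (cols - width) // 2
    return image[top : top + height, left : left + width]


def random_crop(
    image: np.ndarray, height: int, width: int, rng: Optional[np.random.Generator] = None
) -> np.ndarray:
    """Crop a height x width window at a uniformly random valid position."""
    rows, cols = image.shape[:2]
    if height > rows or width > cols:
        raise ValueError("crop is larger than the image")
    rng = np.random.default_rng() if rng is None else rng
    top = int(rng.integers(0, rows - height + 1))  # upper bound is exclusive
    left = int(rng.integers(0, cols - width + 1))
    return image[top : top + height, left : left + width]
```

Center cropping yields the same view every time, while random cropping acts as augmentation, so the comparison mixes both effects.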
generate some videos like Hobold's showing boiling together with an error bar, the nominal heat flux and the predicted flux. This can be done for the four surfaces simultaneously
investigate error as a function of the heat flux
try combining models: a first model classifies the boiling regime into low/mid/high heat fluxes. Then the prediction is dispatched to a model specialized in that heat flux range.
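A framework-agnostic sketch of the two-stage idea, with hypothetical names: `classifier` returns a regime label for a sample, and `regressors` maps each label to the model specialized in that heat flux range. With Keras models, each callable would wrap something like `model.predict` on a single-sample batch; the toy classifier and regressors below are placeholders, not real models.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

X = TypeVar("X")


class RegimeDispatcher(Generic[X]):
    """Classify the boiling regime first, then delegate to a specialized regressor."""

    def __init__(
        self,
        classifier: Callable[[X], Hashable],
        regressors: Dict[Hashable, Callable[[X], float]],
    ) -> None:
        self.classifier = classifier
        self.regressors = regressors

    def __call__(self, sample: X) -> float:
        regime = self.classifier(sample)
        try:
            regressor = self.regressors[regime]
        except KeyError:
            raise ValueError(f"no regressor registered for regime {regime!r}") from None
        return regressor(sample)


def toy_classifier(feature: float) -> str:
    if feature < 1.0:
        return "low"
    if feature < 2.0:
        return "mid"
    return "high"


predict = RegimeDispatcher(
    toy_classifier,
    {"low": lambda f: 10 * f, "mid": lambda f: 20 * f, "high": lambda f: 30 * f},
)
print(predict(0.5), predict(1.5), predict(2.5))  # 5.0 30.0 75.0
```

A softer variant could weight each regressor's output by the classifier's class probabilities, which would avoid jumps at the range boundaries.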
