- Write unit tests.
- Achieve 100% test coverage.
- Python Learning: design a preprocessing function that takes a `tf.Tensor`. This function should take a batch, preprocess it (possibly using many cores), and then fetch the results. The results should then be saved to disk. Useful links: TensorFlow guide to data performance, TensorFlow tutorial on image classification, TensorFlow tutorial on loading images, TensorFlow guide to building input pipelines.
- `bl.utils` could be split into many utility submodules.
- Use type annotations where applicable.
- Document code.
- Allow different batch sizes for different models.
- Why do `more_itertools.filter_except` and `more_itertools.map_except` need to do `exceptions = tuple(exceptions)`?
- Finish step detection analysis.
- Implement a function wrapper that transforms the arguments before forwarding. For instance:
```python
import operator

lower_eq = transform(operator.eq, keyfunc=lambda x: x.lower())
assert "Hi" != "hi"
assert lower_eq("Hi", "hi")
```
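A minimal sketch of such a wrapper, assuming `keyfunc` should be applied to every positional argument and every keyword-argument value before calling the wrapped function (`transform` is the hypothetical name from the example above):

```python
import functools
import operator
from typing import Any, Callable, TypeVar

R = TypeVar("R")


def transform(func: Callable[..., R], keyfunc: Callable[[Any], Any]) -> Callable[..., R]:
    """Return a wrapper that applies ``keyfunc`` to all arguments of ``func``."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> R:
        new_args = [keyfunc(arg) for arg in args]
        new_kwargs = {name: keyfunc(value) for name, value in kwargs.items()}
        return func(*new_args, **new_kwargs)

    return wrapper


lower_eq = transform(operator.eq, keyfunc=lambda x: x.lower())
assert "Hi" != "hi"
assert lower_eq("Hi", "hi")
```

Applying the same `keyfunc` to every argument keeps the wrapper generic; a per-argument transformation would need a different signature.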
- Am I normalizing images correctly? Make sure I am!
- Write READMEs for each subpackage.
- Include licenses in each module.
- Make `cv2` path-like compliant.
- Take a look at the relationship between bubble or droplet formation rate and camera acquisition speed.
- [No. Take a look at the `sentinel` package or PEP 661.] Implement a typing helper `Sentinel` which expects a sentinel value called, for instance, `_sentinel`, or another type. It would be equivalent to `typing.Optional`, but using any other sentinel instead of `None`. See `typing.Literal` in Python 3.8.
- Create my own models and test Kramer's. Some steps are:
- Learn where to put Dropout layers. This paper is awesome.
- Always make the number of dense units a multiple of 8. There is a TensorFlow reference for this; find it.
- Check if image sizes should be multiples of 8 as well.
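A tiny helper for the multiple-of-8 items above, which rounds a proposed layer width or image side up to the next multiple (the name `round_up_to_multiple` is hypothetical):

```python
def round_up_to_multiple(value: int, multiple: int = 8) -> int:
    """Round a non-negative ``value`` up to the nearest multiple of ``multiple``."""
    if value < 0 or multiple <= 0:
        raise ValueError("value must be non-negative and multiple positive")
    # Ceiling division via negated floor division, then scale back up
    return -(-value // multiple) * multiple


print(round_up_to_multiple(100))  # 104
print(round_up_to_multiple(64))  # 64
```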
- Implement droplet/bubble tracking. See what André Provensi texted me.
- Can the wet/dry area ratio be of use to the nets?
- Think of cool names for the nets.
- Read this. Am I evaluating models correctly?
- Include `strategy` as part of a model's description?
- Implement callbacks for reporting the history and timestamps of a model's training. This would be useful for comparing the training of models, especially execution speed (to allow comparison between CPUs and GPUs, or between uniform and mixed precision).
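A sketch of such a callback, assuming per-epoch granularity is enough. It uses the Keras callback hook names (`on_epoch_begin`, `on_epoch_end`), but is written as a plain class so it runs without TensorFlow; the class name `TimedHistory` is hypothetical.

```python
import time
from typing import Any, Dict, List, Optional


class TimedHistory:
    """Record per-epoch timestamps, durations and metrics.

    With TensorFlow installed, this would subclass tf.keras.callbacks.Callback
    (calling super().__init__()) and be passed to model.fit(callbacks=[...]).
    """

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []
        self._epoch_start: Optional[float] = None

    def on_epoch_begin(self, epoch: int, logs: Optional[Dict[str, float]] = None) -> None:
        self._epoch_start = time.perf_counter()  # monotonic clock for durations

    def on_epoch_end(self, epoch: int, logs: Optional[Dict[str, float]] = None) -> None:
        if self._epoch_start is None:
            raise RuntimeError("on_epoch_end called before on_epoch_begin")
        self.records.append(
            {
                "epoch": epoch,
                "timestamp": time.time(),  # wall-clock time when the epoch ended
                "duration": time.perf_counter() - self._epoch_start,  # seconds
                **(logs or {}),  # metrics such as loss and val_loss
            }
        )


# Simulated training loop standing in for model.fit
history = TimedHistory()
for epoch in range(3):
    history.on_epoch_begin(epoch)
    history.on_epoch_end(epoch, logs={"loss": 1.0 / (epoch + 1)})
print([record["epoch"] for record in history.records])  # [0, 1, 2]
```

Storing both a wall-clock timestamp and a `perf_counter` duration keeps runs comparable across machines (duration) while still allowing them to be placed on a timeline (timestamp).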
- See Netron for visualizing neural networks.
- Choose a reasonably performing network and train two versions of it: with and without mixed precision. Measure training time and final validation loss. Training should always be performed under the same conditions (i.e. using GPUs and `MirroredStrategy`), with mixed precision being the only difference between the two nets.
- Organize datasets and publish them on Kaggle?
- Use narrower visualization windows?
- Take a look at this, on how to use TensorBoard, and at TensorFlow's guide.
- Include depth? See
Elboushaki, A., Hannane, R., Afdel, K., Koutti, L., 2020. MultiD-CNN: A multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in RGB-D image sequences. Expert Systems with Applications. doi:10.1016/j.eswa.2019.112829
They use two inputs: an RGB image and a depth map, which maps each pixel of the image to its relative distance from the camera. In a 2D experiment, including a depth map could be very important, since it would allow the model to distinguish between closer bubbles (which look bigger) and more distant bubbles (which look smaller).
- Use object detection.
- Use transfer learning from one case to another.
- Implement a way to measure the training time.
- Implement a warm-up: the first epoch of training (after compiling or restoring) should be discarded to avoid including TF warmup in the training time measurement.
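A minimal sketch of the warm-up rule, assuming per-epoch durations are already being recorded (for instance by a timing callback); the function name `training_time` is hypothetical:

```python
from typing import Sequence


def training_time(epoch_durations: Sequence[float], warmup_epochs: int = 1) -> float:
    """Sum of epoch durations, discarding the first ``warmup_epochs`` epochs."""
    if warmup_epochs < 0:
        raise ValueError("warmup_epochs must be non-negative")
    if len(epoch_durations) <= warmup_epochs:
        raise ValueError("need at least one epoch after the warm-up")
    return sum(epoch_durations[warmup_epochs:])


# Made-up durations in seconds; the first epoch also pays for graph tracing
durations = [12.5, 4.0, 4.25, 3.75]
print(training_time(durations))  # 12.0
```

When compared runs train for different numbers of epochs, the mean epoch time after the warm-up may be a fairer number than the total.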
- Optimize the choice of activation functions.
- For many parameters, and above all for setting key names, how about creating a specialized `dataclasses.dataclass`? For instance, instead of:
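```python
from pathlib import Path
from typing import List, Optional

import pandas as pd


class CSVDataset:
    def __init__(
        self,
        path: Path,
        features_columns: Optional[List[str]] = None,
        target_column: str = "target",
    ) -> None:
        if features_columns is None:
            features_columns = ["image_path"]
        # read only the requested columns (read_csv selects them via usecols)
        X = pd.read_csv(path, usecols=features_columns + [target_column])
        self.y = X.pop(target_column)
        self.X = X
```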
we could write:
```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

import pandas as pd


@dataclass(frozen=True, kw_only=True)  # kw_only requires Python 3.10+
class CSVDatasetColumns:
    features_columns: List[str] = field(default_factory=lambda: ["image_path"])
    target_column: str = "target"


class CSVDataset:
    def __init__(
        self, path: Path, csv_columns: CSVDatasetColumns = CSVDatasetColumns()
    ) -> None:
        X = pd.read_csv(
            path, usecols=csv_columns.features_columns + [csv_columns.target_column]
        )
        self.y = X.pop(csv_columns.target_column)
        self.X = X
```
It is a little more verbose, but it isolates the parameter logic. It also avoids using string constants directly in the function signature, delegating that responsibility to a helper class.
- Implement integrated gradients.
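Integrated gradients attributes a prediction F(x) to each input feature by integrating the gradient along the straight path from a baseline x' to x: IG_i(x) = (x_i - x'_i) · ∫₀¹ ∂F(x' + α(x - x'))/∂x_i dα. The sketch below approximates the integral with a midpoint Riemann sum and takes the gradient as a plain callable; in practice the gradient would come from `tf.GradientTape`, and the toy model and analytic gradient here are only for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def integrated_gradients(
    grad_f: Callable[[List[float]], List[float]],
    x: Vector,
    baseline: Vector,
    steps: int = 50,
) -> List[float]:
    """Approximate integrated gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, grad_f(point))]
    # Average gradient along the path, scaled by the input-baseline difference
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Toy model F(x) = x0**2 + 3*x1 and its analytic gradient
def f(v: Vector) -> float:
    return v[0] ** 2 + 3 * v[1]


def grad_f(v: List[float]) -> List[float]:
    return [2 * v[0], 3.0]


attributions = integrated_gradients(grad_f, x=[2.0, 1.0], baseline=[0.0, 0.0])
print(attributions)  # approximately [4.0, 3.0]
```

A useful sanity check is the completeness property: the attributions should sum to approximately F(x) - F(x'), here 7.0.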
- Perform the following studies:
  - Influence of batch size
  - Learning curve (metric versus dataset size)
  - Visualization window size
  - Direct versus indirect visualization
  - How random contrast (and other augmentations) affects image variance, and what this means for machine learning
  - Train on one set, evaluate on another
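For the random-contrast study, one piece of arithmetic that may help: if contrast is adjusted as `(x - mean) * factor + mean` per channel (the formula documented for `tf.image.adjust_contrast`), the mean is unchanged and the variance is multiplied by `factor ** 2`. A quick check in plain Python on a single made-up channel:

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def adjust_contrast(pixels: Sequence[float], factor: float) -> List[float]:
    """Scale each pixel's deviation from the channel mean by ``factor``."""
    mean = fmean(pixels)
    return [(p - mean) * factor + mean for p in pixels]


pixels = [0.3, 0.4, 0.5, 0.8]
adjusted = adjust_contrast(pixels, factor=1.5)
print(pvariance(adjusted) / pvariance(pixels))  # approximately 2.25 (= 1.5 ** 2)
```

If the augmented image is clipped back to the valid pixel range afterwards, the variance grows by less than `factor ** 2`, so the relationship only holds before clipping.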
- [No. It is not useful enough.] Release `Pack` as a standalone package, including functional programming functionality:
```python
def double(arg):
    return 2 * arg


def is_greater_than(threshold):
    return lambda arg: arg > threshold


p = Pack("a", "b", "c", x=3, y=2)
res = (
    p  # sends p
    | double  # doubles all values: Pack('aa', 'bb', 'cc', x=6, y=4)
    | (
        str.upper,
        is_greater_than(5),
    )  # applies str.upper to args, is_greater_than(5) to kwargs values
)
print(res)  # prints Pack('AA', 'BB', 'CC', x=True, y=False)
```

Think of other features as well.
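To pin down the pipe semantics, here is a hypothetical minimal `Pack` (not the existing class) that only implements the behavior in the example, assuming a single callable on the right of `|` is applied to everything and a 2-tuple means `(function for args, function for kwargs values)`:

```python
from typing import Any, Callable, Dict, Tuple, Union

Func = Callable[[Any], Any]


class Pack:
    """Minimal sketch of a Pack supporting ``pack | func`` pipelines."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.args: Tuple[Any, ...] = args
        self.kwargs: Dict[str, Any] = kwargs

    def __or__(self, funcs: Union[Func, Tuple[Func, Func]]) -> "Pack":
        # A single callable maps everything; a pair maps args and kwargs separately
        if isinstance(funcs, tuple):
            args_func, kwargs_func = funcs
        else:
            args_func = kwargs_func = funcs
        return Pack(
            *(args_func(arg) for arg in self.args),
            **{key: kwargs_func(value) for key, value in self.kwargs.items()},
        )

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Pack):
            return NotImplemented
        return self.args == other.args and self.kwargs == other.kwargs

    def __repr__(self) -> str:
        parts = [repr(arg) for arg in self.args]
        parts += [f"{key}={value!r}" for key, value in self.kwargs.items()]
        return f"Pack({', '.join(parts)})"


def double(arg: Any) -> Any:
    return 2 * arg


def is_greater_than(threshold: float) -> Func:
    return lambda arg: arg > threshold


res = Pack("a", "b", "c", x=3, y=2) | double | (str.upper, is_greater_than(5))
print(res)  # Pack('AA', 'BB', 'CC', x=True, y=False)
```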
- Study RNNs. Perhaps a network could be fed 3 consecutive images (for instance) to give an output.
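One way to feed a network consecutive images is to group the frame sequence into overlapping windows, assuming each window is paired with the target of, for example, its last frame. The windows could then be stacked into a tensor of shape (batch, time, height, width, channels), the layout used by `tf.keras.layers.TimeDistributed` and by `ConvLSTM2D` with channels-last data. A plain-Python sketch of the windowing (the name `sliding_windows` is hypothetical):

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Iterable[T], size: int = 3) -> Iterator[Tuple[T, ...]]:
    """Yield overlapping windows of ``size`` consecutive frames."""
    if size < 1:
        raise ValueError("size must be at least 1")
    window: Deque[T] = deque(maxlen=size)
    for frame in frames:
        window.append(frame)  # the oldest frame is dropped automatically
        if len(window) == size:
            yield tuple(window)


print(list(sliding_windows(range(5))))  # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
```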
- Take a look at this: Python library for heat transfer.
- Take a look at `fastai`'s `fastcore`.
- Take a look at `BubCNN`.
- Take a look at this: use consecutive images for each output.
- Use `tf.keras.layers.TimeDistributed` to handle temporal data!
- Rescale images before feeding them to the network?
- Use Evidential Deep Learning?
- Check separable convolutions
- Improve `Pack`, making it more like `Parameters`.
- This idea looks amazing; maybe use it?
- Use LocallyConnected layers?
- Check this post to get some ideas: https://pub.towardsai.net/state-of-the-art-models-in-every-machine-learning-field-2021-c7cf074da8b2
- try model optimization.
- try pretrained models from TensorFlow Hub.
- try transfer learning.
- try to fix the RSquare metric
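As a reference to check a fixed metric against, a plain-Python coefficient of determination, R² = 1 - SS_res / SS_tot. It does not reproduce any particular library's RSquare (multi-output handling, sample weights). One thing worth checking: R² averaged over batches is not the same as R² over the whole set, because SS_tot depends on the mean of all targets, so a streaming metric has to accumulate sums instead.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```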
- implement structural similarity. Check the papers (1), (2), (3), (4). There is also a tutorial in PythonMachineLearning.
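SSIM compares two signals through their means, variances and covariance: SSIM(x, y) = ((2·μx·μy + C1)(2·σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (K1·L)², C2 = (K2·L)², L the dynamic range, and K1 = 0.01, K2 = 0.03 as in the original SSIM paper. The standard metric evaluates this over local windows and averages the results (`tf.image.ssim` implements the windowed version); the sketch below is a simplified single-window version over whole flattened images, using population statistics:

```python
from statistics import fmean
from typing import Sequence


def global_ssim(
    x: Sequence[float],
    y: Sequence[float],
    data_range: float = 1.0,
    k1: float = 0.01,
    k2: float = 0.03,
) -> float:
    """SSIM computed over a single window covering the whole (flattened) image."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


img = [0.1, 0.4, 0.5, 0.9]
print(round(global_ssim(img, img), 6))  # 1.0
```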
- check the sample correlation coefficient.
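The sample (Pearson) correlation coefficient is r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²); since Python 3.10 the standard library also offers `statistics.correlation`. A direct sketch:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample (Pearson) correlation coefficient between two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # 1.0
```

Note that r² between targets and predictions is in general not the same as R²; the two coincide for an ordinary least-squares linear fit with an intercept, so they can disagree for a network's predictions.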
- use ONNX format, simplifier and optimizer?
- take a look at the beautiful `ConvLSTM2D` for time series of images!
- take a look at `ndindex`
- take a look at Probabilistic Layers Regression
- make LeakyReLU trainable? (Keras provides this as `tf.keras.layers.PReLU`, which learns the negative slope.)
- quantize models?
- generate a test case in which the temperature is estimated from the boiling curve, and that temperature is what the models have to predict
- in condensation, investigate center cropping versus random cropping of the ROI
- test autoML with center and random cropping
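For the center-versus-random cropping comparison, a sketch that only computes crop boxes, so the same logic can be applied to any image array. TensorFlow's own `tf.image.central_crop` (which takes a fraction) and `tf.image.random_crop` (which takes a size) are the library equivalents; the function names here are hypothetical, and a seeded `random.Random` keeps the random crops reproducible between runs.

```python
import random
from typing import Optional, Tuple

Shape = Tuple[int, int]  # (height, width)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def _check_fits(image_shape: Shape, crop_shape: Shape) -> None:
    if crop_shape[0] > image_shape[0] or crop_shape[1] > image_shape[1]:
        raise ValueError("crop is larger than the image")


def center_crop_box(image_shape: Shape, crop_shape: Shape) -> Box:
    """Box of size ``crop_shape`` centered in the image (rounded towards top-left)."""
    _check_fits(image_shape, crop_shape)
    top = (image_shape[0] - crop_shape[0]) // 2
    left = (image_shape[1] - crop_shape[1]) // 2
    return top, left, top + crop_shape[0], left + crop_shape[1]


def random_crop_box(
    image_shape: Shape, crop_shape: Shape, rng: Optional[random.Random] = None
) -> Box:
    """Box of size ``crop_shape`` at a uniformly random position inside the image."""
    _check_fits(image_shape, crop_shape)
    rng = rng if rng is not None else random.Random()
    top = rng.randint(0, image_shape[0] - crop_shape[0])  # randint includes both ends
    left = rng.randint(0, image_shape[1] - crop_shape[1])
    return top, left, top + crop_shape[0], left + crop_shape[1]


print(center_crop_box((100, 200), (50, 50)))  # (25, 75, 75, 125)
print(random_crop_box((100, 200), (50, 50), rng=random.Random(0)))
```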
- generate some videos like Hobold's showing boiling together with an error bar, the nominal heat flux and the predicted flux. This can be done for the four surfaces simultaneously
- investigate error as a function of the heat flux
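A sketch of the error-versus-heat-flux analysis, assuming errors are grouped into fixed-width heat-flux bins (in whatever unit the dataset uses) and summarized by their mean absolute error; the function name and the toy numbers are hypothetical:

```python
from collections import defaultdict
from statistics import fmean
from typing import DefaultDict, Dict, List, Sequence


def mean_abs_error_by_bin(
    heat_flux: Sequence[float],
    y_true: Sequence[float],
    y_pred: Sequence[float],
    bin_width: float,
) -> Dict[float, float]:
    """Mean absolute error per heat-flux bin, keyed by the bin's lower edge."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    errors: DefaultDict[float, List[float]] = defaultdict(list)
    for flux, target, pred in zip(heat_flux, y_true, y_pred):
        lower_edge = (flux // bin_width) * bin_width
        errors[lower_edge].append(abs(target - pred))
    return {edge: fmean(errs) for edge, errs in sorted(errors.items())}


# Toy example where the target is the heat flux itself
flux = [10.0, 12.0, 25.0, 27.0]
predicted = [11.0, 12.0, 22.0, 29.0]
print(mean_abs_error_by_bin(flux, flux, predicted, bin_width=10.0))
# {10.0: 0.5, 20.0: 2.5}
```

Since absolute errors tend to look larger at high fluxes, it may also be worth computing the relative error per bin.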
- try assembling models: a first model classifies the boiling regime in low/mid/high heat fluxes. After that, prediction is dispatched to a model specialized in that heat flux range.
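A minimal sketch of the two-stage idea, assuming the regime classifier returns a label such as `"low"`, `"mid"` or `"high"` and there is one specialized regressor per label; `RegimeDispatcher` and the toy models are hypothetical stand-ins for trained networks.

```python
from typing import Callable, Dict, Generic, TypeVar

Sample = TypeVar("Sample")


class RegimeDispatcher(Generic[Sample]):
    """Two-stage predictor: classify the regime, then call that regime's regressor."""

    def __init__(
        self,
        classifier: Callable[[Sample], str],
        regressors: Dict[str, Callable[[Sample], float]],
    ) -> None:
        self.classifier = classifier
        self.regressors = regressors

    def __call__(self, sample: Sample) -> float:
        regime = self.classifier(sample)
        if regime not in self.regressors:
            raise ValueError(f"no regressor registered for regime {regime!r}")
        return self.regressors[regime](sample)


# Toy stand-ins for trained models: the "sample" is a single float here
def toy_classifier(x: float) -> str:
    return "low" if x < 0.33 else "mid" if x < 0.66 else "high"


predict = RegimeDispatcher(
    toy_classifier,
    {"low": lambda x: 100 * x, "mid": lambda x: 120 * x, "high": lambda x: 150 * x},
)
print(predict(0.5))  # 60.0
```

A misclassification near a regime boundary sends the sample to the wrong specialist, so the classifier's errors at the boundaries are worth inspecting separately.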