Removes cross-links given shortness of document
fmaguire committed Dec 18, 2020
1 parent 5b8d5ae commit dd5c42a
Showing 6 changed files with 8 additions and 8 deletions.
content/03.when-to-use-dl.md (2 changes: 1 addition & 1 deletion)
@@ -34,7 +34,7 @@ In general, these include problems that feature hidden patterns across the data,
Problems in computer vision and natural language processing often exhibit these very features, which helps explain why these areas were some of the first to experience significant breakthroughs during the recent deep learning revolution [@doi:10.1145/3065386].
As long as large amounts of accurate and labeled data are available, applications to areas of biology with related data characteristics, such as genetic medicine [@doi:10.3389/fpsyt.2018.00290], radiology [@doi:10.1007/s11604-018-0726-3], microscopy [@doi:10.1364/OPTICA.4.001437], and pharmacovigilance [@doi:10.1093/jamia/ocw180], are similarly likely to benefit from deep learning techniques.
For example, Ferreira et al. used deep learning to recognize individual birds from images [@doi:10.1111/2041-210X.13436] despite this problem being very difficult historically.
- By combining automatic data collection using RFID tags with data augmentation and transfer learning (explained in [Tip 5](#architecture)), the authors were able to use deep learning to achieve 90% accuracy across several species.
+ By combining automatic data collection using RFID tags with data augmentation and transfer learning, the authors were able to use deep learning to achieve 90% accuracy across several species.
Another research area where deep learning excels is generative modeling, where new samples are created based on the training data [@doi:10.1109/ISACV.2018.8354080].
One other area of machine learning that has been revolutionized by deep learning is reinforcement learning, which is concerned with training agents to interact with an environment [@arXiv:1709.06560].
Overall, initial evaluation as to whether similar problems (including analogous ones in other domains) have been solved successfully using deep learning can inform researchers about the potential for deep learning to address their needs.
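
To make the combination of data augmentation and transfer learning concrete, the sketch below fine-tunes an ImageNet-pretrained network on a hypothetical folder of labeled bird photographs using PyTorch and torchvision; the dataset path, class count, and training settings are placeholders, and this is not the pipeline used by Ferreira et al.

```python
# Hypothetical sketch: data augmentation + transfer learning with PyTorch/torchvision.
# The dataset directory, number of classes, and hyperparameters are placeholders.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 10  # e.g. number of individual birds to recognize

# Data augmentation: random crops and flips effectively enlarge the training set.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_data = datasets.ImageFolder("birds/train", transform=train_transforms)
loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

# Transfer learning: start from ImageNet weights and replace only the output layer.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # recent torchvision API
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```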
content/05.dl-complexities.md (4 changes: 2 additions & 2 deletions)
@@ -11,10 +11,10 @@ One specific reproducibility pitfall that is often missed in applying deep learn
That is, the CUDA/CuDNN architectures that facilitate the parallelized computing that power state-of-the-art DL often use algorithms by default that produce different outcomes from iteration to iteration.
Therefore, achieving reproducibility in this context requires explicitly specifying the use of deterministic algorithms (which are typically available within deep learning libraries), which is distinct from the setting of random seeds that typically achieve reproducibility by controlling pseudorandom deterministic procedures such as shuffling and initialization [@url:https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#reproducibility].
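
For illustration, a minimal sketch of this distinction, assuming PyTorch (the exact calls differ between libraries and versions):

```python
# Sketch, assuming PyTorch: random seeds control pseudorandom procedures such as
# shuffling and initialization, while deterministic algorithms must be requested separately.
import os
import random

import numpy as np
import torch

def set_reproducible(seed: int = 42) -> None:
    # Seeds: make shuffling, weight initialization, and dropout masks repeatable.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    # Deterministic algorithms: ask CUDA/cuDNN to avoid nondeterministic kernels,
    # which seeding alone does not guarantee.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some CUDA releases
    torch.use_deterministic_algorithms(True)

set_reproducible(42)
```

Note that requesting determinism can slow training, and some operations have no deterministic implementation at all, in which case the library will raise an error rather than silently fall back.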

- Similar to the suggestions in [Tip 2](#baselines) about starting with simpler models, try to start with a relatively small network and then increase the size and complexity as needed.
+ Similar to the suggestions above about starting with simpler models, try to start with a relatively small network and then increase the size and complexity as needed.
This can help prevent practitioners from wasting significant time and resources on running highly complex models that feature numerous unresolved problems.
Again, beware of the choices made implicitly (that is, by default settings) by deep learning libraries (for example, selection of optimization algorithm), as these seemingly trivial specifics can have significant effects on model performance.
For example, adaptive methods often lead to faster convergence during training but may lead to worse generalization performance on independent datasets [@url:https://papers.nips.cc/paper/7003-the-marginal-value-of-adaptive-gradient-methods-in-machine-learning].
- These nuanced elements are easy to overlook, but it is critical to consider them carefully and to evaluate their potential impact (see [Tip 6](#hyperparameters)).
+ These nuanced elements are easy to overlook, but it is critical to consider them carefully and to evaluate their potential impact.

In short, use smaller and simpler networks to enable faster prototyping and follow general software development best practices to maximize reproducibility.
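
A minimal sketch of this advice, assuming PyTorch; the layer sizes and learning rates are illustrative placeholders:

```python
# Sketch of "start small and be explicit about defaults", assuming PyTorch.
import torch.nn as nn
import torch.optim as optim

# A deliberately small starting model: one hidden layer, few units.
model = nn.Sequential(
    nn.Linear(100, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

# Choose the optimizer explicitly rather than relying on a library default.
# Adaptive methods (e.g. Adam) often converge faster, but plain SGD with
# momentum may generalize better on independent data.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optimizer = optim.Adam(model.parameters(), lr=1e-3)  # the faster-converging alternative
```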
content/06.know-your-problem.md (2 changes: 1 addition & 1 deletion)
@@ -30,6 +30,6 @@ For example, the existence in a dataset of samples collected from the same indiv
Another potential issue is the existence of systematic biases, which can be induced by confounding variables and can lead to artifacts or so-called "batch effects."
Consequently, models may learn to rely on the correlations that these systematic biases underpin, even though they are irrelevant to the scientific context of the study.
This can lead to misguided predictions and misleading conclusions [@doi:10.1038/nrg2825].
- As described in [Tip 1](#concepts), unsupervised learning and other exploratory analyses can help identify such biases in these datasets before applying a deep learning model.
+ Unsupervised learning and other exploratory analyses can help identify such biases in these datasets before applying a deep learning model.

Overall, practitioners should thoroughly study their data and understand its context and peculiarities _before_ moving on to performing deep learning.
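
As a hedged example of such an exploratory analysis, the sketch below projects a hypothetical samples-by-features matrix onto its first two principal components and colors points by collection batch, using scikit-learn; if samples cluster by batch rather than by biology, a batch effect is likely present.

```python
# Sketch of an exploratory check for batch effects using PCA (scikit-learn).
# `X` (samples x features) and `batch` (collection batch per sample) are
# hypothetical stand-ins for your own data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))        # e.g. expression measurements
batch = rng.integers(0, 2, size=200)   # e.g. which sequencing run each sample came from

coords = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# If samples separate by batch rather than by biology in the leading
# components, a systematic bias is likely confounding the dataset.
for b in np.unique(batch):
    plt.scatter(coords[batch == b, 0], coords[batch == b, 1], label=f"batch {b}")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()
```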
content/07.architecture-and-representation.md (4 changes: 2 additions & 2 deletions)
@@ -6,15 +6,15 @@ This is further complicated by the fact that many recommendations do not general
Therefore, unfortunately, choosing how to represent data and design an architecture is closer to an art than a science.
That said, there are some general principles that are useful to follow when experimenting.

- First and foremost, use your knowledge of the available data and your question (see [Tip 4](#know-your-problem)) to inform your data representation and architectural design choices.
+ First and foremost, use your knowledge of the available data and your question to inform your data representation and architectural design choices.
For example, if the dataset is an array of measurements with no natural ordering of inputs (such as gene expression data), multilayer perceptrons (MLPs) may be effective.
These are the most basic type of neural network, and they are able to learn complex non-linear relationships across the input data despite their relative simplicity.
Similarly, if the dataset is comprised of images, convolutional neural networks (CNNs) are a good choice because they emphasize local structures and adjacency within the data.
CNNs may also be a good choice for learning on sequences, as recent empirical evidence suggests that they can outperform canonical sequence learning techniques such as recurrent neural networks (RNNs) and the closely related long short-term memory (LSTM) networks [@arxiv:1803.01271].
Accessible high-level overviews of these different neural network architectures are provided in [@doi:10.1016/j.ymeth.2020.06.016] and [@doi:10.1038/nature14539].
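
For illustration, a sketch of the two choices discussed above, assuming PyTorch; all dimensions are placeholders:

```python
# Sketch of matching architecture to data representation, assuming PyTorch:
# an MLP for unordered measurements vs. a 1D CNN for ordered sequences.
import torch.nn as nn

# Unordered inputs (e.g. a vector of 2000 gene expression values): an MLP,
# which treats every input position symmetrically.
mlp = nn.Sequential(
    nn.Linear(2000, 128),
    nn.ReLU(),
    nn.Linear(128, 2),
)

# Ordered inputs (e.g. a one-hot encoded DNA sequence, 4 channels x length):
# a 1D CNN, whose convolutions exploit local structure and adjacency.
cnn = nn.Sequential(
    nn.Conv1d(in_channels=4, out_channels=32, kernel_size=8),
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 2),
)
```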

Deep learning models typically benefit from increasing the amount of labeled data with which to train on.
- Large amounts of data help to avoid overfitting (see [Tip 7](#overfitting)), and increase the likelihood of achieving top performance on a given task.
+ Large amounts of data help to avoid overfitting, and increase the likelihood of achieving top performance on a given task.
If there is not enough data available to train a well-performing model, consider using transfer learning.
In transfer learning, a model whose weights were generated by training on another dataset is used as the starting point for training [@tag:Yosinski2014].
Transfer learning is most useful when the pre-training and target datasets are of similar nature [@tag:Yosinski2014].
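
A possible sketch of transfer learning when the target dataset is small, assuming PyTorch/torchvision: keep the pretrained weights fixed and train only a new output layer (the class count here is hypothetical).

```python
# Sketch of transfer learning with a frozen backbone, assuming PyTorch/torchvision.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # weights trained on ImageNet
for param in model.parameters():
    param.requires_grad = False                     # keep the pretrained representation fixed
model.fc = nn.Linear(model.fc.in_features, 5)       # new head for 5 hypothetical classes
# Only `model.fc` now has trainable parameters; pass those to the optimizer.
```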
content/08.hyperparameters.md (2 changes: 1 addition & 1 deletion)
@@ -9,7 +9,7 @@ Moreover, additional hyperparameters are introduced by common techniques that fa
These include parameter norm penalties (typically in the form of $L^2$ regularization), dropout [@tag:srivastava-dropout], and batch normalization [@tag:ioffe-batchnorm], which can reduce the effect of the so-called vanishing or exploding gradient problem when working with deep neural networks.
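
To make these hyperparameters concrete, here is a hedged sketch, assuming PyTorch, with placeholder values; each combination would be compared on a validation set rather than the test set:

```python
# Sketch of the regularization-related hyperparameters named above, assuming PyTorch:
# an L2 penalty (weight decay), dropout, and batch normalization, plus a toy grid
# over two of them. All values are illustrative placeholders.
import torch.nn as nn
import torch.optim as optim

def build_model(dropout_rate: float) -> nn.Module:
    return nn.Sequential(
        nn.Linear(100, 64),
        nn.BatchNorm1d(64),        # batch normalization
        nn.ReLU(),
        nn.Dropout(dropout_rate),  # dropout
        nn.Linear(64, 2),
    )

# Each (dropout, weight decay) combination is a distinct hyperparameter setting.
for dropout_rate in (0.2, 0.5):
    for weight_decay in (1e-5, 1e-3):  # L2 regularization strength
        model = build_model(dropout_rate)
        optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=weight_decay)
        # ... train `model` and record validation performance here ...
```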

This wide array of potential parameters can make it difficult to evaluate the extent to which neural network methods are well suited to solving a task, as it can be unclear to practitioners whether previous successful applications were the result of interactions between unique data attributes and specific hyperparameter settings.
- Similar to the Continental Breakfast Included effect discussed in [Tip 2](#baselines), a lack of clarity on how extensive arrays of hyperparameters were tested and/or chosen can affect method developers as they attempt to compare techniques.
+ Similar to the Continental Breakfast Included effect discussed above, a lack of clarity on how extensive arrays of hyperparameters were tested and/or chosen can affect method developers as they attempt to compare techniques.
This effect also has implications for those seeking to use existing deep learning methods, as performance estimates from deep neural networks are often provided after tuning.
The implication of this effect on users of deep neural networks is that attaining performance numbers that match those reported in publications is likely to require significant effort towards temporally expensive hyperparameter optimization.

content/11.interpretation.md (2 changes: 1 addition & 1 deletion)
@@ -13,6 +13,6 @@ Nonetheless, the data supported this rule, as pneumonia patients with a history
The neural network had, therefore, also learned to make predictions according to this rule despite the fact that it has nothing to do with causality or mechanism.
Therefore, it would have been disastrous to guide treatment decisions according to the predictions of the neural network, even though the neural network had high predictive accuracy.

- To trust deep learning models, we must combine knowledge of the training data ([Tip 4](#know-your-problem)) with inspection of the model ([Tip 8](#blackbox)).
+ To trust deep learning models, we must combine knowledge of the training data with inspection of the model.
To move beyond fitting predictive models and towards the building of an understanding that can inform scientific deduction, we suggest working to disentangle a model's internal logic by comparing data domains where models succeed to those in which they fail.
By doing so, we can avoid overinterpreting models and view them for what they are: complex statistical models trained on high dimensional data.
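
One simple way to start such a comparison is to stratify evaluation by a known data domain; in this sketch the labels, predictions, and subgroups are hypothetical placeholders used purely for illustration.

```python
# Sketch of comparing model performance across data domains (e.g. patient subgroups).
# `y_true`, `y_pred`, and `subgroup` are hypothetical arrays aligned by sample.
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
subgroup = np.array(["asthma", "no_asthma", "asthma", "no_asthma",
                     "asthma", "no_asthma", "no_asthma", "asthma"])

# A model that looks accurate overall may still fail badly, or succeed for the
# wrong reasons, within a specific subgroup.
print("overall:", accuracy_score(y_true, y_pred))
for group in np.unique(subgroup):
    mask = subgroup == group
    print(group, accuracy_score(y_true[mask], y_pred[mask]))
```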
