diff --git a/.gitignore b/.gitignore index b68a587..f8744b0 100644 --- a/.gitignore +++ b/.gitignore @@ -112,3 +112,4 @@ talk_figures results images* models +doc/referee* diff --git a/README.md b/README.md index 1b13bd9..d730d93 100644 --- a/README.md +++ b/README.md @@ -3,15 +3,15 @@ ![](https://img.shields.io/badge/ADS-2020arXiv200100018W-blue.svg) ![](https://img.shields.io/badge/arXiv-2001.00018-orange.svg) -![](doc/results-example.jpg) +![](doc/gradcam-example.png) ## Connecting Optical Morphology, Environment, and HI Mass Fraction for Low-Redshift Galaxies Using Deep Learning A galaxy's cold gas content can determine its current and future star formation properties. Most of that cold gas in present-day galaxies is in the form of neutral atomic hydrogen (HI), which radiates weakly through a 21-cm emission line. Since it is so difficult to observe this signal, many different heuristics have been developed in order to estimate the HI mass fraction (equivalent to a galaxy's HI mass normalized by its stellar mass). This work aims to improve existing approaches by leveraging all optical imaging information. -We use deep convolutional neural networks to process SDSS *gri* images (spanning 224 x 224 pixels, or roughly 100" x 100") of optical counterparts to HI detections in low-redshift Universe (*z* < 0.05). By using [data augmentation](https://ui.adsabs.harvard.edu/abs/2015MNRAS.450.1441D/abstract), a [one-cycle learning rate schedule](https://arxiv.org/abs/1803.09820), the [Rectified Adam](https://arxiv.org/abs/1908.03265) + [LookAhead](https://arxiv.org/abs/1907.08610) optimizer, and [resnet-34](https://arxiv.org/abs/1512.03385) architecture ([+ bag of tricks](https://arxiv.org/abs/1812.01187) + [Mish activation function](https://arxiv.org/abs/1908.08681)), **we can predict mass fractions to within 0.25 dex RMSE for the SDSS x ALFALFA data set**. 
+We use deep convolutional neural networks to encode SDSS *gri* images (spanning 224 x 224 pixels, or roughly 100" x 100") of optical counterparts to HI detections in the low-redshift Universe. We can predict HI mass fractions to within 0.23 dex RMSE for the SDSS x ALFALFA data set using imaging alone. When the CNN is also used for pattern recognition, the combined result outperforms all other machine learning regression methods (e.g., as low as 0.20 dex scatter for an independent ALFALFA data set). -Results can be found in our paper: https://arxiv.org/abs/2001.00018 +Results can be found in the paper: https://arxiv.org/abs/2001.00018 ## Usage @@ -21,24 +21,17 @@ git clone https://github.com/jwuphysics/HI-convnets.git cd HI-convnets ``` -Results can be replicated by evaluating the Jupyter notebooks in `notebook`, and/or by running the code in `src/train_alfalfa.py` and `src/train_xGASS.py`. - -Many of the notebooks can be run on [Google Colab](colab.research.google.com) or via Google Compute Engine; these are named accordingly. They can also be viewed online, e.g., using the [Jupyter `nbviewer`](https://nbviewer.jupyter.org/github/jwuphysics/HI-convnets/blob/master/notebook/COLAB%20-%20Visualizing%20galaxy%20features%20related%20to%20gas%20mass%20fraction.ipynb). Shown below is an example of running Grad-CAM on a trained convnet. An input galaxy (left) is fed forward through the convnet, and the algorithm highlights gas-poor (center) and gas-rich (right) features with overall confidence given by the *p*-values listed above each image. - -![](doc/gradcam-example.jpg) +Note that the most recent results are found in `notebook/updates`, while previous results can be found in `notebook`. ## Dependencies -Pytorch `>=1.0` and Fastai `>=1.0` are required to run this code. They can be installed together using the Anaconda command +PyTorch and Fastai `>=2.0` are required to run this code.
At the time of this writing, the [`fastai2` library](https://github.com/fastai/fastai2/) is undergoing large changes, and will eventually supersede the [`fastai` repository](https://github.com/fastai/fastai). -``` -conda install -c pytorch -c fastai fastai -``` ## Data -All data were queried from the [SDSS DR14 image cutout service](http://skyserver.sdss.org/dr14/en/help/docs/api.aspx#imgcutout) using a download script similar to the one in our [metallicity prediction deep convnet](https://github.com/jwuphysics/galaxy-cnns). See, for example, `src/get_sdss_cutouts.py`. Positions were taken from the ALFALFA [α.40 catalogs](http://egg.astro.cornell.edu/alfalfa/data/) ([Haynes et al. 2011](https://ui.adsabs.harvard.edu/abs/2011AJ....142..170H/abstract)) and [xGASS catalogs](http://xgass.icrar.org/data.html). +Imaging data were queried from the [SDSS DR14 image cutout service](http://skyserver.sdss.org/dr14/en/help/docs/api.aspx#imgcutout) using a download script similar to the one in our [metallicity prediction deep convnet](https://github.com/jwuphysics/galaxy-cnns). See, for example, `src/get_sdss_cutouts.py`. Galaxy positions for the training data set were taken from the [ALFALFA 40% catalogs](http://egg.astro.cornell.edu/alfalfa/data/) ([Haynes et al. 2011](https://ui.adsabs.harvard.edu/abs/2011AJ....142..170H/abstract)) crossmatched to the SDSS Main Galaxy Sample. The ALFALFA 100%, [xGASS](http://xgass.icrar.org/data.html), and [NIBLES](https://ui.adsabs.harvard.edu/abs/2016A%26A...595A.118V/abstract) catalogs were used as test data sets. ## Citation @@ -62,8 +55,8 @@ archivePrefix = {arXiv}, ## Contact -If you have any questions or comments, please reach out via [email](mailto:jfwu@jhu.edu)! +If you have any questions or comments, please reach out via [email](mailto:jwuphysics@gmail.com)! 
## Acknowledgments -This work began during the [MIAPP Programme on Galaxy Evolution](http://www.munich-iapp.de/programmes-topical-workshops/2019/galaxy-evolution/daily-schedule/) and was inspired by conversations with [Mike Jones (IAA)](http://amiga.iaa.es/p/321-Michael-G-Jones.htm) and [Luke Leisman (Valpariso)](https://www.valpo.edu/physics-astronomy/about/faculty-and-staff/lukas-leisman/). Conversations with @jegpeek were also super helpful. Some of this work was also done at the Interstellar Institute meeting, [SO-STAR](https://interstellarinstitute.org/programs/so-star/presentation.html). The Fastai [course](https://course.fast.ai/) and [software](https://github.com/fastai/fastai) developed by Jeremy Howard et al. have been immensely useful for this work. Likewise, the [Grad-CAM implementation](https://github.com/anhquan0412/animation-classification/blob/master/gradcam.py) by @anhquan0412, and [combined RAdam + LookAhead optimizer (aka Ranger)](https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer) by @lessw2020 are used in this work. +This work began during the [MIAPP Programme on Galaxy Evolution](http://www.munich-iapp.de/programmes-topical-workshops/2019/galaxy-evolution/daily-schedule/) and was inspired by conversations with [Mike Jones (IAA)](http://amiga.iaa.es/p/321-Michael-G-Jones.htm) and [Luke Leisman (Valparaiso)](https://www.valpo.edu/physics-astronomy/about/faculty-and-staff/lukas-leisman/). Conversations with @jegpeek were super helpful. The anonymous ApJ referee also provided lots of useful comments and feedback. Some of this work was also done at the Interstellar Institute meeting, [SO-STAR](https://interstellarinstitute.org/programs/so-star/presentation.html). The Fastai [course](https://course.fast.ai/) and [software](https://github.com/fastai/fastai) developed by Jeremy Howard et al. have been immensely useful for this work. 
Likewise, the [Grad-CAM implementation](https://github.com/anhquan0412/animation-classification/blob/master/gradcam.py) by @anhquan0412 and the [combined RAdam + LookAhead optimizer (aka Ranger)](https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer) by @lessw2020 are used in this work. diff --git a/doc/gradcam-example.jpg b/doc/gradcam-example.jpg deleted file mode 100644 index 57b6ef9..0000000 Binary files a/doc/gradcam-example.jpg and /dev/null differ diff --git a/doc/gradcam-example.png b/doc/gradcam-example.png new file mode 100644 index 0000000..f75efe5 Binary files /dev/null and b/doc/gradcam-example.png differ diff --git a/doc/results-example.jpg b/doc/results-example.jpg deleted file mode 100644 index 2655bd7..0000000 Binary files a/doc/results-example.jpg and /dev/null differ
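The updated README reports accuracy as an RMSE in dex on the HI mass fraction, defined as a galaxy's HI mass normalized by its stellar mass. The following is a minimal sketch, assuming the metric is the root-mean-square difference between predicted and observed log10 mass fractions. The function names `log_gas_fraction` and `rmse_dex` are hypothetical and are not taken from the repository's code.

```python
import math
from typing import Sequence


def log_gas_fraction(m_hi: float, m_star: float) -> float:
    """Return log10(M_HI / M_*), the HI mass fraction in dex (hypothetical helper)."""
    # math.log10 raises ValueError if either mass is non-positive.
    return math.log10(m_hi / m_star)


def rmse_dex(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root-mean-square error between two sequences of log10 values, i.e. in dex."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("pred and true must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(pred, true)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    observed = [log_gas_fraction(1e9, 1e10), log_gas_fraction(3e9, 1e10)]
    predicted = [-0.9, -0.6]
    print(f"RMSE = {rmse_dex(predicted, observed):.3f} dex")
```

Because the error is measured in log space, an RMSE of 0.23 dex corresponds to a typical multiplicative error of about 10^0.23 ≈ 1.7 in the mass fraction itself.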