Course: Part-Time Data Science
Instructor: Amber Yandow
- Author: Fennec C. Nightingale
Predict & map whether or not a star is likely to have an exoplanet, and make recommendations for astronomers to speed up the rate of exoplanet discovery. We focus on recall so we are unlikely to incorrectly miss any planets.
CSV & XML tree data on stars, their planets, their parent systems, and their physical characteristics.
- Our exoplanet data is from the Open Exoplanet Catalogue & includes: star names, magnitudes, radii, distances, right ascension/declination, and spectral class.
- Our additional star data is from the HYG dataset & includes: ID numbers, names, magnitudes, luminosity, x, y, z coordinates for the stars, spectral class, and some details about each star's orbit.
I used Python in Jupyter Notebook to perform the OSEMN process and logistic regression to create our model and exoplanet predictions.
We obtained our star data from the links above over at Kaggle. If you want to get started on your own classification project like this, fork this repo.
After importing all of our data, we checked it for null values, outliers, duplicates, and any other errors there might be in our dataset. We checked each column and decided what data we needed to keep or discard, what we might need to fill, or any other alterations we could make to fix up our data before we start modeling. This turned out to discard too much of our initial exoplanet dataset on its own, so I also lined up the ID numbers with the HYG dataset so I could randomly sample stars around which we have not found planets.
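The scrub-and-sample step above can be sketched roughly as below. The column names (`id`, `has_planet`) and the cleaning rules are illustrative assumptions, not the notebook's exact schema:

```python
import pandas as pd

def scrub(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning sketch: drop exact duplicates and rows with nulls."""
    return df.drop_duplicates().dropna()

def add_negatives(exo: pd.DataFrame, hyg: pd.DataFrame,
                  n: int, seed: int = 42) -> pd.DataFrame:
    """Label catalogue stars as positives, then randomly sample n HYG stars
    with no known planet (matched on a shared 'id' column) as negatives."""
    exo = exo.assign(has_planet=1)
    negatives = (hyg[~hyg["id"].isin(exo["id"])]
                 .sample(n, random_state=seed)
                 .assign(has_planet=0))
    return pd.concat([exo, negatives], ignore_index=True)
```

Sampling the negatives from the full HYG catalogue keeps the two classes comparable while avoiding the heavy row loss that strict null-dropping caused on the exoplanet data alone.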
We check out our data to see how our values are distributed, whether there is any strong correlation, or whether there is anything we missed in our scrubbing. Some of the categories we wanted to include had very high correlations, but our cutoff was 0.6, and there was no way to fix the multicollinearity through strategies like multiplication, so those categories were dropped.
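The correlation-based pruning can be sketched as follows; the 0.6 cutoff is the one named above, while the function name and the choice of which column in a correlated pair to drop are assumptions:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, cutoff: float = 0.6) -> pd.DataFrame:
    """Drop one column from every pair whose absolute Pearson
    correlation exceeds the cutoff (keeps the first-seen column)."""
    corr = df.corr().abs()
    # Look only at the upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > cutoff).any()]
    return df.drop(columns=to_drop)
```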
We use the scikit-learn logistic regression module to get our best fit in this project.
To work with some of our data in this model, we also have to get dummies for our categorical variables. After an initial model including all of our variables, we used a GridSearchCV to go back through and refine our model, trying to make our predictions stronger. After modeling, we check all available evaluation metrics & compare.
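A minimal sketch of that dummies-plus-grid-search pipeline is below. The target name, feature columns, and parameter grid are placeholders, not the project's exact settings; scoring on recall reflects the stated goal of not missing planets:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def fit_model(df: pd.DataFrame, target: str = "has_planet") -> GridSearchCV:
    """One-hot encode categoricals, then grid-search a logistic
    regression's regularization strength, optimizing recall."""
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True)
    y = df[target]
    grid = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1, 10]},  # illustrative grid
        scoring="recall",
        cv=3,
    )
    grid.fit(X, y)
    return grid
```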
Here we take a deep dive into figuring out what our evaluation metrics are saying about our models & plot how our best features compare.
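Comparing the metrics side by side can be done with a small helper like this (the function name is an assumption; the metrics are the standard scikit-learn ones):

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

def report(y_true, y_pred) -> dict:
    """Collect the headline classification metrics in one place;
    recall comes first since missing a real planet is the costliest error."""
    return {
        "recall": recall_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```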
- We were able to predict whether or not a star would have an exoplanet, based on basic information about the stars themselves, with a high degree of recall, precision & accuracy.
- Currently our biggest predictors are factors that affect how well we see stars, such as their absolute magnitude, luminosity index & distance.
- Use Kepler-labelled time series data to train deep learning algorithms to detect exoplanets based on light fluctuations in observed stars.
- Write something that is able to parse and accurately separate stellar types (as well as predict missing values) to test predictions made against more random data.
- Use additional data from the Open Exoplanet Catalogue to predict features of planets around stars & predicted stars.
- When more data is available, expand the predictor to include multi-planetary predictions.
See the full analysis in the Jupyter Notebooks or review our Presentation. For additional info, contact me here: Fennec C. Nightingale.
├── .ipynb_checkpoints
├── .virtual_documents
├── __pycache__
├── Scrubbed.csv
├── Images
│   ├── hist.png
│   ├── MilkyWay.png
│   ├── outerarmmid.png
│   ├── outerarmmiin.png
│   ├── outerarmout.png
│   ├── outerarmouter.png
│   ├── planetviolin.png
│   ├── poscoef.png
│   ├── negcoef.png
│   └── ROC.png
├── PDF
│   ├── Obtain & Scrub.pdf
│   ├── Modeling.pdf
│   └── Presentation.pdf
├── Obtain & Scrub.ipynb
└── Exoplanet Regression.ipynb