[Interactive Data Science] - Creating Nontechnical Jupyter Notebooks For Exploratory Data Analysis and Machine Learning Modeling of Cancer Data

What is exploratory data analysis?

Exploratory data analysis (EDA) is the analysis of datasets to understand their main characteristics, often through visualization. EDA techniques help identify trends, surface problems in the data, and suggest hypotheses. In Python, libraries such as pandas and seaborn support this analysis. EDA is usually done before any machine learning is applied, so that we understand the data before modeling it. Common graphical techniques include box plots, histograms, and scatter plots.
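As a small illustration of what a box plot summarizes, the sketch below computes the quartiles and flags points outside the usual 1.5 × IQR fences. It uses only the Python standard library so it runs anywhere; in a notebook, pandas and seaborn would typically do this work. The function name `box_plot_summary` and the sample values are assumptions made for illustration and are not taken from this project.

```python
import statistics


def box_plot_summary(values):
    """Return the numbers a box plot draws, plus points outside the 1.5 * IQR fences."""
    data = sorted(values)
    # "inclusive" treats the data as the full population and interpolates linearly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }


# Hypothetical measurements, invented for illustration only.
tumor_sizes_mm = [12, 15, 14, 13, 16, 15, 14, 48]
summary = box_plot_summary(tumor_sizes_mm)
print(summary)
```

Here the value 48 is reported as an outlier, which is the kind of data problem EDA is meant to surface before any modeling starts.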

What is an autoencoder?

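An autoencoder is a neural network that compresses data. Autoencoders can reduce the dimensionality of data because they learn to ignore noise in it, and they can detect anomalies in data. They consist of an encoder, which compresses the input, and a decoder, which reconstructs the input from the compressed representation.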
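To make the idea concrete, here is a minimal sketch of a linear autoencoder written in plain Python. It squeezes 2-dimensional points through a 1-dimensional code and learns to reconstruct them with batch gradient descent. This is a toy under stated assumptions, not the model used in this project: the function names, the starting weights, the learning rate, and the data (points on the line y = 2x) are all chosen for illustration.

```python
def train_linear_autoencoder(points, epochs=500, lr=0.05):
    """Train a 2 -> 1 -> 2 linear autoencoder; return encoder, decoder, loss history."""
    w = [0.1, 0.1]  # encoder weights: code = w[0]*x0 + w[1]*x1
    v = [0.1, 0.1]  # decoder weights: reconstruction = (v[0]*code, v[1]*code)
    n = len(points)
    history = []
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        loss = 0.0
        for x0, x1 in points:
            code = w[0] * x0 + w[1] * x1
            err0 = v[0] * code - x0
            err1 = v[1] * code - x1
            loss += (err0 ** 2 + err1 ** 2) / n
            # Gradients of the mean squared reconstruction error.
            grad_v[0] += 2 * err0 * code / n
            grad_v[1] += 2 * err1 * code / n
            d_code = 2 * (err0 * v[0] + err1 * v[1])
            grad_w[0] += d_code * x0 / n
            grad_w[1] += d_code * x1 / n
        history.append(loss)
        w = [w[i] - lr * grad_w[i] for i in range(2)]
        v = [v[i] - lr * grad_v[i] for i in range(2)]
    return w, v, history


def reconstruction_error(point, w, v):
    """Squared distance between a point and its reconstruction."""
    code = w[0] * point[0] + w[1] * point[1]
    return (v[0] * code - point[0]) ** 2 + (v[1] * code - point[1]) ** 2


# Toy training data: every point lies on the line y = 2x.
points = [(x, 2 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, v, history = train_linear_autoencoder(points)

normal_error = reconstruction_error((0.5, 1.0), w, v)    # follows the pattern
anomaly_error = reconstruction_error((1.0, -2.0), w, v)  # breaks the pattern
print(f"loss: {history[0]:.3f} -> {history[-1]:.2e}")
print(f"on-pattern error: {normal_error:.2e}, off-pattern error: {anomaly_error:.2f}")
```

A point that follows the training pattern is reconstructed almost exactly, while a point that breaks it is reconstructed poorly. That gap in reconstruction error is the basis for using autoencoders to detect anomalies.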

What is Voila?

Voila turns Jupyter notebooks into standalone web applications. Each Voila app connects to a dedicated Jupyter kernel that can execute callbacks in response to interactive widget changes. This matters because our notebooks rely on these widgets (sliders, check boxes, etc.) to be accessible to people without a coding background.
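The sketch below shows the kind of widget callback that Voila keeps alive through the kernel. It assumes the ipywidgets package is installed in the notebook environment; the function names `describe_threshold` and `build_app` and the sample values are hypothetical and do not come from main.ipynb. The callback logic is kept in a plain function so it can be checked without Jupyter.

```python
def describe_threshold(values, threshold):
    """Callback logic: report how many values are at or above the slider threshold."""
    kept = [v for v in values if v >= threshold]
    return f"{len(kept)} of {len(values)} samples at or above {threshold}"


def build_app(values):
    """Wire a slider to the callback; call this from a notebook cell."""
    # Imported here so the function above can be used without Jupyter installed.
    import ipywidgets as widgets

    slider = widgets.IntSlider(value=10, min=0, max=50, step=5, description="Threshold")
    label = widgets.Label(value=describe_threshold(values, slider.value))

    def on_change(change):
        # The kernel runs this each time the user moves the slider.
        label.value = describe_threshold(values, change["new"])

    slider.observe(on_change, names="value")
    return widgets.VBox([slider, label])


# Hypothetical measurements, invented for illustration only.
sample_values = [12, 15, 14, 13, 16, 15, 14, 48]
print(describe_threshold(sample_values, 15))
```

In a notebook, ending a cell with `build_app(sample_values)` displays the slider and label. When the notebook is rendered with Voila, the reader sees only the widgets and never needs to touch the code.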

Here is a diagram explaining the execution model of Voila (credit):

What's the problem?

Nontechnical scientists and biologists often want to analyze large datasets and benefit from the power of data science. Jupyter Notebooks are a step toward making these workflows available to nontechnical people, but they still present problems for those who lack the background or confidence to use them. For example, notebooks depend on running code cells, which someone who does not understand the code may be unable to follow or modify. Through our solution, we can

Why do we want to solve it?

We believe this problem is important because facilitating collaboration and understanding between biologists and data scientists can lead to new breakthroughs and enrich the biologists' research. Data science and machine learning have made a mark on fields such as genomics and healthcare, and as the data that we deal with in these fields gets larger

How to use this app?

Clone this repository on Jupyter Hub.
Create a Conda environment from environment.yml.
Open main.ipynb and render it with Voila.

Dependencies

Packages defined in environment.yml
Voila to render Jupyter Notebooks as interactive apps.

Development

Want to contribute? Great!

Todos

Write MORE Tests
Add Night Mode

People

Here are all of the awesome people who contributed to this project:

George Zaki
Robin Kramer
Garrett Stevens
Yogesh Dhande
Amulya Shastry
Siqi Sun
Ruben Cuevas

License

MIT
