[Interactive Data Science] - Creating Nontechnical Jupyter Notebooks For Exploratory Data Analysis and Machine Learning Modeling of Cancer Data

What is exploratory data analysis?

Exploratory data analysis (EDA) is the analysis of datasets to understand their main characteristics, often through visualization. EDA techniques help identify trends, problems, and potential hypotheses. In Python, libraries such as pandas and seaborn support this analysis. EDA is commonly done before any machine learning is applied, because it reveals the basic facts about the data in question. Common graphical techniques include box plots, histograms, and scatter plots.
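As a rough illustration of what a box plot and a histogram summarize, the sketch below computes the same numbers by hand using only the Python standard library. The measurements and the names `tumor_size_mm`, `five_number_summary`, `iqr_outliers`, and `histogram_counts` are made up for this example and do not come from the project; in a real notebook, pandas and seaborn would typically do this work.

```python
import statistics
from collections import Counter

# Hypothetical measurements in millimetres; invented values for illustration only.
tumor_size_mm = [12.0, 15.5, 14.2, 18.9, 13.1, 16.4, 41.0, 15.0, 17.3, 14.8]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def iqr_outliers(values):
    """Flag points more than 1.5 * IQR beyond the quartiles (a common box-plot rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

def histogram_counts(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

summary = five_number_summary(tumor_size_mm)
outliers = iqr_outliers(tumor_size_mm)
bins = histogram_counts(tumor_size_mm, bin_width=5)

print("five-number summary:", summary)
print("possible outliers:", outliers)
print("histogram bins:", bins)
```

Even on ten values, the summary surfaces one measurement that sits far from the rest, which is the kind of problem EDA is meant to catch before modeling.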

What is an autoencoder?

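An autoencoder is a neural network that compresses data. Autoencoders can reduce the dimensions of data because they learn to ignore noise in it, and they can also detect anomalies. They consist of an encoder, which compresses the input into a smaller representation, and a decoder, which reconstructs the input from that representation.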
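To make the compression idea concrete, here is a minimal sketch of a toy linear autoencoder written in plain Python. It squeezes 2-D points into a single number and expands them back, then uses the reconstruction error to spot a point that does not fit. The data, weights, learning rate, and function names are all assumptions for illustration; this is not the model used in this project, which would normally be built with a deep learning library.

```python
# Toy linear autoencoder: 2 inputs -> 1-number code -> 2 outputs.
# Invented 2-D points lying close to the line y = 2x.
points = [(-1.0, -2.1), (-0.5, -0.9), (0.0, 0.1), (0.5, 1.0), (1.0, 1.9)]

enc = [0.1, 0.1]  # encoder weights (compress 2 numbers into 1)
dec = [0.1, 0.1]  # decoder weights (expand 1 number back into 2)

def encode(p):
    return enc[0] * p[0] + enc[1] * p[1]

def decode(z):
    return (dec[0] * z, dec[1] * z)

def reconstruction_error(p):
    """Squared distance between a point and its reconstruction."""
    r = decode(encode(p))
    return (r[0] - p[0]) ** 2 + (r[1] - p[1]) ** 2

def mean_error(data):
    return sum(reconstruction_error(p) for p in data) / len(data)

initial_loss = mean_error(points)

learning_rate = 0.05
for _ in range(500):
    grad_enc = [0.0, 0.0]
    grad_dec = [0.0, 0.0]
    for p in points:
        z = encode(p)
        r = decode(z)
        err = (r[0] - p[0], r[1] - p[1])
        # Chain rule: how the squared error changes with the code z.
        dz = 2 * (err[0] * dec[0] + err[1] * dec[1])
        for i in range(2):
            grad_dec[i] += 2 * err[i] * z / len(points)
            grad_enc[i] += dz * p[i] / len(points)
    # Plain gradient descent step on the mean squared reconstruction error.
    for i in range(2):
        enc[i] -= learning_rate * grad_enc[i]
        dec[i] -= learning_rate * grad_dec[i]

final_loss = mean_error(points)
normal_errors = [reconstruction_error(p) for p in points]
anomaly_error = reconstruction_error((1.0, -2.0))  # a point far from y = 2x

print("loss before / after training:", initial_loss, final_loss)
print("largest error on training points:", max(normal_errors))
print("error on the off-pattern point:", anomaly_error)
```

The off-pattern point reconstructs poorly because the single-number code can only describe positions along the direction the training data follows, which is the intuition behind using autoencoders for anomaly detection.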

What is Voila?

Voila turns Jupyter notebooks into standalone web applications. Each app connects to a dedicated Jupyter kernel that executes callbacks when interactive widgets change. This matters because our notebooks rely on widgets (sliders, check boxes, etc.) to be accessible to people without a coding background.
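A minimal sketch of the widget-and-callback pattern might look like the following. The gene names, values, and the helpers `genes_above`, `show_genes`, and `build_app` are hypothetical and are not taken from main.ipynb; the widget wiring assumes the ipywidgets package is installed, and it is kept inside a function so the callback logic can be tried without it.

```python
# Invented expression values for illustration only.
expression = {"GENE_A": 2.4, "GENE_B": 7.9, "GENE_C": 5.1, "GENE_D": 9.3}

def genes_above(threshold):
    """Callback logic: names of genes whose value exceeds the threshold."""
    return sorted(name for name, value in expression.items() if value > threshold)

def show_genes(threshold):
    """Whatever is printed here appears under the slider in the rendered app."""
    print(f"Genes above {threshold}: {genes_above(threshold)}")

def build_app():
    """Wire the callback to a slider; assumes ipywidgets is installed."""
    import ipywidgets as widgets

    slider = widgets.FloatSlider(
        value=5.0, min=0.0, max=10.0, step=0.5, description="Threshold"
    )
    # The kernel re-runs show_genes each time the slider value changes.
    return widgets.interactive(show_genes, threshold=slider)

# In a notebook cell, make `build_app()` the last expression to display the slider.
```

When a notebook like this is rendered with Voila, the user sees only the slider and the printed result, not the code cells.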

Here is a diagram explaining the execution model of Voila (credit):

Image of Voila

What's the problem?

Nontechnical scientists and biologists often want to analyze large datasets and benefit from the power of data science. Jupyter Notebooks are a step toward making these workflows accessible to nontechnical people, but they still present problems for those who lack the background or confidence to use them. For example, notebooks depend on running code cells that someone unfamiliar with the code may not understand or be able to modify. Through our solution, we can

Why do we want to solve it?

We believe this problem is important because facilitating collaboration and understanding between biologists and data scientists can lead to new breakthroughs and enrich the biologists' research. Data science and machine learning have made a mark on fields such as genomics and healthcare, and as the data that we deal with in these fields gets larger

How to use this app?

  • Clone this repository on JupyterHub.
  • Create a Conda environment using environment.yml.
  • Open main.ipynb and render it with Voila.

Dependencies

  1. Packages defined in environment.yml
  2. Voila to render Jupyter Notebooks as interactive apps.

Development

Want to contribute? Great!

Todos

  • Write MORE Tests
  • Add Night Mode

People

Here are all of the awesome people who contributed to this project:

  • George Zaki
  • Robin Kramer
  • Garrett Stevens
  • Yogesh Dhande
  • Amulya Shastry
  • Siqi Sun
  • Ruben Cuevas

License

MIT