Progress bar implementation (ydataai#345)
* Progress bar implementation

- Feature as requested in ydataai#224
- Test for ydataai#282
- Many thanks @marco-cardoso for your initial implementation ydataai#225
- Display no progress bar for disabled modules (e.g. individual correlations).
- Update requirements, notebooks, docs, examples, linting

* Decouple notebooks and notebook tests. One test hangs on issue in nbval:
computationalmodelling/nbval#136

* Disable missing plots in minimal mode

* Create additional demo with Chicago employees data

* Compartmentalize column sorting in describe module
sbrugman authored Feb 2, 2020
1 parent a25b9db commit 8dca684
Showing 38 changed files with 138,073 additions and 42,064 deletions.
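
The last bullet of the commit message moves column sorting out of `ProfileReport`; the removed `sort_column_names` method is visible in the `docs/index.html` hunks below. As a rough, pandas-free sketch of that ordering rule (the helper name `sort_names` is hypothetical and not part of `pandas-profiling`; it works on a plain list of column names rather than on a DataFrame):

```python
from typing import List, Optional


def sort_names(names: List[str], sort: Optional[str]) -> List[str]:
    """Hypothetical helper mirroring the removed sort_column_names logic."""
    if sort in ("asc", "ascending"):
        # Case-insensitive ascending order, as in the removed method.
        return sorted(names, key=str.casefold)
    if sort in ("desc", "descending"):
        # The removed method reverses the ascending result.
        return list(reversed(sorted(names, key=str.casefold)))
    if sort is None or sort == "None":
        # No sorting requested: keep the original column order.
        return list(names)
    raise ValueError('"sort" should be "ascending", "descending" or None.')


print(sort_names(["b", "A", "c"], "ascending"))
print(sort_names(["b", "A", "c"], "desc"))
```

Sorting on `str.casefold` keeps names such as `A` and `b` in alphabetical rather than code-point order. Where the sorting lives after this commit is not shown in this part of the diff.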
7 changes: 1 addition & 6 deletions .travis.yml
@@ -19,11 +19,6 @@ env:
- TEST=examples
- TEST=lint

jobs:
exclude:
- python: "3.5"
env: TEST=examples

install:
- pip install --upgrade pip six
- pip install -r requirements.txt
@@ -33,7 +28,7 @@ install:
script:
- if [ $TEST == 'unit' ]; then pytest --cov=. tests/unit/; fi
- if [ $TEST == 'issue' ]; then pytest --cov=. tests/issues/; fi
- if [ $TEST == 'examples' ]; then pytest --cov=. --nbval --sanitize-with tests/sanitize-notebook.cfg examples/; fi
- if [ $TEST == 'examples' ]; then pytest --cov=. --nbval tests/notebooks/; fi
- if [ $TEST == 'console' ]; then pandas_profiling -h; fi
- if [ $TEST == 'lint' ]; then pytest --black -m black src/; flake8 . --select=E9,F63,F7,F82 --show-source --statistics; fi

5 changes: 3 additions & 2 deletions Makefile
@@ -4,8 +4,9 @@ docs:
rmdir docs/pandas_profiling

test:
pytest --nbval --cov=./ --black --sanitize-with tests/sanitize-notebook.cfg tests/unit/
pytest --nbval --cov=./ --black --sanitize-with tests/sanitize-notebook.cfg tests/issues/
pytest --black tests/unit/
pytest --black tests/issues/
pytest --nbval tests/notebooks/
flake8 . --select=E9,F63,F7,F82 --show-source --statistics

install:
1 change: 1 addition & 0 deletions README.md
@@ -179,6 +179,7 @@ A set of options is available in order to adapt the report generated.

* `title` (`str`): Title for the report ('Pandas Profiling Report' by default).
* `pool_size` (`int`): Number of workers in thread pool. When set to zero, it is set to the number of CPUs available (0 by default).
* `progress_bar` (`bool`): If True, `pandas-profiling` will display a progress bar.

More settings can be found in the [default configuration file](https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_default.yaml), [minimal configuration file](https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_minimal.yaml) and [dark themed configuration file](https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_dark.yaml).

110 changes: 32 additions & 78 deletions docs/index.html
@@ -26,7 +26,7 @@ <h1 id="pandas-profiling">Pandas Profiling</h1>
<p><a href="https://travis-ci.com/pandas-profiling/pandas-profiling"><img alt="Build Status" src="https://travis-ci.com/pandas-profiling/pandas-profiling.svg?branch=master"></a>
<a href="https://codecov.io/gh/pandas-profiling/pandas-profiling"><img alt="Code Coverage" src="https://codecov.io/gh/pandas-profiling/pandas-profiling/branch/master/graph/badge.svg?token=gMptB4YUnF"></a>
<a href="https://github.com/pandas-profiling/pandas-profiling/releases"><img alt="Release Version" src="https://img.shields.io/github/release/pandas-profiling/pandas-profiling.svg"></a>
<a href="https://pypi.org/project/pandas-profiling/"><img alt="Python Version" src="https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue.svg"></a>
<a href="https://pypi.org/project/pandas-profiling/"><img alt="Python Version" src="https://img.shields.io/pypi/pyversions/pandas-profiling"></a>
<a href="https://github.com/python/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a></p>
<p>Generates profile reports from a pandas <code>DataFrame</code>.
The pandas <code>df.describe()</code> function is great but a little basic for serious exploratory data analysis.
@@ -41,6 +41,7 @@ <h1 id="pandas-profiling">Pandas Profiling</h1>
<li><strong>Histogram</strong></li>
<li><strong>Correlations</strong> highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices</li>
<li><strong>Missing values</strong> matrix, count, heatmap and dendrogram of missing values</li>
<li><strong>Text analysis</strong> learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.</li>
</ul>
<h2 id="announcements">Announcements</h2>
<p>With your help, we got approved for <a href="https://github.com/sponsors/sbrugman">GitHub Sponsors</a>!
@@ -71,6 +72,7 @@ <h2 id="examples">Examples</h2>
<li><a href="http://pandas-profiling.github.io/pandas-profiling/examples/vektis/vektis_report.html">Vektis</a> (Vektis Dutch Healthcare data)</li>
<li><a href="http://pandas-profiling.github.io/pandas-profiling/examples/website_inaccessibility/website_inaccessibility_report.html">Website Inaccessibility</a> (demonstrates the URL type)</li>
<li><a href="http://pandas-profiling.github.io/pandas-profiling/examples/colors/colors_report.html">Colors</a> (a simple colors dataset)</li>
<li><a href="http://pandas-profiling.github.io/pandas-profiling/examples/russian_vocabulary/russian_vocabulary.html">Russian Vocabulary</a> (demonstrates text analysis)</li>
</ul>
<h2 id="installation">Installation</h2>
<h3 id="using-pip">Using pip</h3>
@@ -108,7 +110,7 @@ <h3 id="getting-started">Getting started</h3>
)
</code></pre>
<p>To generate the report, run:</p>
<pre><code class="python">profile = ProfileReport(df, title='Pandas Profiling Report', style={'full_width':True})
<pre><code class="python">profile = ProfileReport(df, title='Pandas Profiling Report', html={'style':{'full_width':True}})
</code></pre>
<h4 id="jupyter-notebook">Jupyter Notebook</h4>
<p>We recommend generating reports interactively by using the Jupyter notebook.
@@ -150,6 +152,7 @@ <h3 id="advanced-usage">Advanced usage</h3>
<ul>
<li><code>title</code> (<code>str</code>): Title for the report ('Pandas Profiling Report' by default).</li>
<li><code>pool_size</code> (<code>int</code>): Number of workers in thread pool. When set to zero, it is set to the number of CPUs available (0 by default).</li>
<li><code>progress_bar</code> (<code>bool</code>): If True, <code>pandas-profiling</code> will display a progress bar.</li>
</ul>
<p>More settings can be found in the <a href="https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_default.yaml">default configuration file</a>, <a href="https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_minimal.yaml">minimal configuration file</a> and <a href="https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_dark.yaml">dark themed configuration file</a>.</p>
<p><strong>Example</strong></p>
@@ -261,13 +264,16 @@ <h2 id="dependencies">Dependencies</h2>

import pandas as pd
import numpy as np
from tqdm.auto import tqdm

from pandas_profiling.model.messages import MessageType
from pandas_profiling.version import __version__
from pandas_profiling.utils.dataframe import clean_column_names, rename_index
from pandas_profiling.utils.dataframe import rename_index
from pandas_profiling.utils.paths import get_config_default, get_config_minimal
from pandas_profiling.config import config
from pandas_profiling.controller import pandas_decorator
from pandas_profiling.model.describe import describe as describe_df
from pandas_profiling.model.messages import MessageType
from pandas_profiling.report import get_report_structure


@@ -305,12 +311,8 @@ <h2 id="dependencies">Dependencies</h2>
# Rename reserved column names
df = rename_index(df)

# Remove spaces and colons from column names
df = clean_column_names(df)

# Sort names according to config (asc, desc, no sort)
df = self.sort_column_names(df)
config[&#34;column_order&#34;] = df.columns.tolist()
# Ensure that columns are strings
df.columns = df.columns.astype(&#34;str&#34;)

# Get dataset statistics
description_set = describe_df(df)
@@ -319,26 +321,17 @@ <h2 id="dependencies">Dependencies</h2>
self.sample = self.get_sample(df)
self.title = config[&#34;title&#34;].get(str)
self.description_set = description_set

self.date_end = datetime.utcnow()
self.report = get_report_structure(
self.date_start, self.date_end, self.sample, description_set
)

def sort_column_names(self, df):
sort = config[&#34;sort&#34;].get(str)
if sys.version_info[1] &lt;= 5 and sort != &#34;None&#34;:
warnings.warn(&#34;Sorting is supported from Python 3.6+&#34;)
disable_progress_bar = not config[&#34;progress_bar&#34;].get(bool)

if sort in [&#34;asc&#34;, &#34;ascending&#34;]:
df = df.reindex(sorted(df.columns, key=lambda s: s.casefold()), axis=1)
elif sort in [&#34;desc&#34;, &#34;descending&#34;]:
df = df.reindex(
reversed(sorted(df.columns, key=lambda s: s.casefold())), axis=1
with tqdm(
total=1, desc=&#34;build report structure&#34;, disable=disable_progress_bar
) as pbar:
self.report = get_report_structure(
self.date_start, self.date_end, self.sample, description_set
)
elif sort != &#34;None&#34;:
raise ValueError(&#39;&#34;sort&#34; should be &#34;ascending&#34;, &#34;descending&#34; or None.&#39;)
return df
pbar.update(1)

def get_sample(self, df: pd.DataFrame) -&gt; dict:
sample = {}
@@ -360,7 +353,7 @@ <h2 id="dependencies">Dependencies</h2>
&#34;&#34;&#34;
return self.description_set

def get_rejected_variables() -&gt; list:
def get_rejected_variables(self) -&gt; list:
return [
message.column_name
for message in self.description_set[&#34;messages&#34;]
@@ -592,12 +585,8 @@ <h2 class="section-title" id="header-classes">Classes</h2>
# Rename reserved column names
df = rename_index(df)

# Remove spaces and colons from column names
df = clean_column_names(df)

# Sort names according to config (asc, desc, no sort)
df = self.sort_column_names(df)
config[&#34;column_order&#34;] = df.columns.tolist()
# Ensure that columns are strings
df.columns = df.columns.astype(&#34;str&#34;)

# Get dataset statistics
description_set = describe_df(df)
@@ -606,26 +595,17 @@ <h2 class="section-title" id="header-classes">Classes</h2>
self.sample = self.get_sample(df)
self.title = config[&#34;title&#34;].get(str)
self.description_set = description_set

self.date_end = datetime.utcnow()
self.report = get_report_structure(
self.date_start, self.date_end, self.sample, description_set
)

def sort_column_names(self, df):
sort = config[&#34;sort&#34;].get(str)
if sys.version_info[1] &lt;= 5 and sort != &#34;None&#34;:
warnings.warn(&#34;Sorting is supported from Python 3.6+&#34;)
disable_progress_bar = not config[&#34;progress_bar&#34;].get(bool)

if sort in [&#34;asc&#34;, &#34;ascending&#34;]:
df = df.reindex(sorted(df.columns, key=lambda s: s.casefold()), axis=1)
elif sort in [&#34;desc&#34;, &#34;descending&#34;]:
df = df.reindex(
reversed(sorted(df.columns, key=lambda s: s.casefold())), axis=1
with tqdm(
total=1, desc=&#34;build report structure&#34;, disable=disable_progress_bar
) as pbar:
self.report = get_report_structure(
self.date_start, self.date_end, self.sample, description_set
)
elif sort != &#34;None&#34;:
raise ValueError(&#39;&#34;sort&#34; should be &#34;ascending&#34;, &#34;descending&#34; or None.&#39;)
return df
pbar.update(1)

def get_sample(self, df: pd.DataFrame) -&gt; dict:
sample = {}
@@ -647,7 +627,7 @@ <h2 class="section-title" id="header-classes">Classes</h2>
&#34;&#34;&#34;
return self.description_set

def get_rejected_variables() -&gt; list:
def get_rejected_variables(self) -&gt; list:
return [
message.column_name
for message in self.description_set[&#34;messages&#34;]
@@ -823,15 +803,15 @@ <h2 id="returns">Returns</h2>
</details>
</dd>
<dt id="pandas_profiling.ProfileReport.get_rejected_variables"><code class="name flex">
<span>def <span class="ident">get_rejected_variables</span></span>(<span>)</span>
<span>def <span class="ident">get_rejected_variables</span></span>(<span>self)</span>
</code></dt>
<dd>
<section class="desc"></section>
<details class="source">
<summary>
<span>Expand source code</span>
</summary>
<pre><code class="python">def get_rejected_variables() -&gt; list:
<pre><code class="python">def get_rejected_variables(self) -&gt; list:
return [
message.column_name
for message in self.description_set[&#34;messages&#34;]
@@ -861,31 +841,6 @@ <h2 id="returns">Returns</h2>
return sample</code></pre>
</details>
</dd>
<dt id="pandas_profiling.ProfileReport.sort_column_names"><code class="name flex">
<span>def <span class="ident">sort_column_names</span></span>(<span>self, df)</span>
</code></dt>
<dd>
<section class="desc"></section>
<details class="source">
<summary>
<span>Expand source code</span>
</summary>
<pre><code class="python">def sort_column_names(self, df):
sort = config[&#34;sort&#34;].get(str)
if sys.version_info[1] &lt;= 5 and sort != &#34;None&#34;:
warnings.warn(&#34;Sorting is supported from Python 3.6+&#34;)

if sort in [&#34;asc&#34;, &#34;ascending&#34;]:
df = df.reindex(sorted(df.columns, key=lambda s: s.casefold()), axis=1)
elif sort in [&#34;desc&#34;, &#34;descending&#34;]:
df = df.reindex(
reversed(sorted(df.columns, key=lambda s: s.casefold())), axis=1
)
elif sort != &#34;None&#34;:
raise ValueError(&#39;&#34;sort&#34; should be &#34;ascending&#34;, &#34;descending&#34; or None.&#39;)
return df</code></pre>
</details>
</dd>
<dt id="pandas_profiling.ProfileReport.to_app"><code class="name flex">
<span>def <span class="ident">to_app</span></span>(<span>self)</span>
</code></dt>
@@ -1157,7 +1112,6 @@ <h4><code><a title="pandas_profiling.ProfileReport" href="#pandas_profiling.Prof
<li><code><a title="pandas_profiling.ProfileReport.get_rejected_variables" href="#pandas_profiling.ProfileReport.get_rejected_variables">get_rejected_variables</a></code></li>
<li><code><a title="pandas_profiling.ProfileReport.get_sample" href="#pandas_profiling.ProfileReport.get_sample">get_sample</a></code></li>
<li><code><a title="pandas_profiling.ProfileReport.html" href="#pandas_profiling.ProfileReport.html">html</a></code></li>
<li><code><a title="pandas_profiling.ProfileReport.sort_column_names" href="#pandas_profiling.ProfileReport.sort_column_names">sort_column_names</a></code></li>
<li><code><a title="pandas_profiling.ProfileReport.to_app" href="#pandas_profiling.ProfileReport.to_app">to_app</a></code></li>
<li><code><a title="pandas_profiling.ProfileReport.to_file" href="#pandas_profiling.ProfileReport.to_file">to_file</a></code></li>
<li><code><a title="pandas_profiling.ProfileReport.to_html" href="#pandas_profiling.ProfileReport.to_html">to_html</a></code></li>
@@ -1177,4 +1131,4 @@ <h4><code><a title="pandas_profiling.ProfileReport" href="#pandas_profiling.Prof
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad()</script>
</body>
</html>
</html>