Book by: Claus O. Wilke
Source: dataviz
I recreated the visualizations from the book "Fundamentals of Data Visualization" by Claus O. Wilke, originally coded in R, using Python, and briefly summarized some important information.
Access the full book for more details.
Running the code requires the packages listed in requirements.sh.
Install Miniconda3 or Anaconda before running the following command:
chmod 755 requirements.sh
./requirements.sh
After installation, a virtual environment called dataviz is created and activated. The script requirements.sh also handles the reported issues in part 4.
All data required for the visualization practice come from dviz.supp by Claus O. Wilke.
For quicker loading into Python, I convert these rda files to tsv format using the script data/rda2tsv.py and save all data in the folder data/resources.
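The conversion step could look roughly like the sketch below. It is not the contents of data/rda2tsv.py: it assumes the third-party `pyreadr` package as the reader, and the function names `tsv_target` and `rda_to_tsv` are made up for illustration.

```python
from pathlib import Path


def tsv_target(object_name, out_dir="data/resources"):
    """Return the TSV path for one R object (hypothetical helper)."""
    return Path(out_dir) / f"{object_name}.tsv"


def rda_to_tsv(rda_path, out_dir="data/resources"):
    """Write every data frame stored in one .rda file as a TSV file."""
    # Assumption: pyreadr is used as the reader; rda2tsv.py may do this differently.
    # Imported lazily so tsv_target works even when pyreadr is not installed.
    import pyreadr

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    # read_r returns a dict-like mapping of R object name -> pandas DataFrame.
    for name, frame in pyreadr.read_r(str(rda_path)).items():
        target = tsv_target(name, out_dir)
        frame.to_csv(target, sep="\t", index=False)
        written.append(target)
    return written
```

A converted file can then be loaded with `pandas.read_csv(path, sep="\t")`.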
Data visualization in Python like in R’s ggplot2
ggplot:
- ggplot from ŷhat
- This package has not been updated since 2016. Installing it raises many issues related to pandas functions that were deprecated in older versions.
seaborn:
matplotlib:
jpython widgets:
holoviz: Github
├── 1.From_data_to_viz
│   └── 2.Mapping_data_onto_aesthetics.ipynb
├── 2.Principles_of_figure_design
├── 3.Miscellaneous_topics
├── README.md
├── data
│   └── 2_daily_temperature_NOAA.csv
├── requirements.txt
└── src
    └── utils.py
(1) Install rpy2 on macOS
Running pip install rpy2 on macOS produces this error:
ERROR: Failed building wheel for rpy2
Workaround: https://stackoverflow.com/a/52362473/11524628
env CC=/usr/local/Cellar/gcc/X.x.x/bin/gcc-X pip install rpy2
Here X.x.x is the latest version of gcc installed on macOS.
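To avoid looking up X.x.x by hand, the compiler path can be discovered with a small script. This is only a sketch: it assumes gcc was installed by Homebrew under /usr/local/Cellar/gcc, as in the command above, and the helper names `gcc_version` and `pick_gcc` are hypothetical.

```python
import glob
import re


def gcc_version(path):
    """Parse the Cellar version directory (e.g. '9.3.0_1') into a tuple of ints."""
    # Only accept the main compiler binary (gcc-9), not tools such as gcc-ar-9.
    match = re.search(r"/Cellar/gcc/([^/]+)/bin/gcc-\d+$", path)
    if match is None:
        return None
    return tuple(int(part) for part in re.findall(r"\d+", match.group(1)))


def pick_gcc(paths):
    """Return the candidate path with the highest version, or None if none match."""
    versioned = [(gcc_version(p), p) for p in paths]
    versioned = [(v, p) for v, p in versioned if v is not None]
    return max(versioned)[1] if versioned else None


if __name__ == "__main__":
    # Assumption: Homebrew's default prefix on Intel Macs.
    candidates = glob.glob("/usr/local/Cellar/gcc/*/bin/gcc-*")
    compiler = pick_gcc(candidates)
    if compiler is None:
        print("No Homebrew gcc found under /usr/local/Cellar/gcc")
    else:
        print(f"env CC={compiler} pip install rpy2")
```

The script only prints the install command, so it can be reviewed before it is run.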
(2) ggplot
All problems related to ggplot can be fixed by downgrading pandas:
pip install pandas==0.19.2
To keep using a newer version of pandas, apply the following workarounds.
System information:
python: 3.7
pandas: 1.0.3
ggplot: 0.11.5
With these versions, importing ggplot raises the following errors:
AttributeError: module 'pandas' has no attribute 'tslib'
Workaround: yhat/ggpy#662
AttributeError: 'DataFrame' object has no attribute 'sort'
Workaround: yhat/ggpy#612
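Both errors above come from names that newer pandas no longer provides: `pd.tslib` and `pandas.lib` are gone (the public name is `pd.Timestamp`), and `DataFrame.sort` was replaced by `sort_values`. The sketch below shows one way such a fix could be applied as a text rewrite of the ggplot source. It is an assumption about the kind of edit the linked issues describe, not a copy of them, and `patch_source` is a hypothetical helper.

```python
import re

# Removed pandas spellings mapped to names that exist in pandas 1.x.
REPLACEMENTS = [
    (re.compile(r"\bpd\.tslib\.Timestamp\b"), "pd.Timestamp"),
    (re.compile(r"\bfrom pandas\.lib import Timestamp\b"),
     "from pandas import Timestamp"),
]


def patch_source(text):
    """Return the rewritten source text and the number of replacements made."""
    total = 0
    for pattern, replacement in REPLACEMENTS:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total
```

The `DataFrame.sort` calls are left out on purpose: blindly rewriting `.sort(` would also change list sorting, so those calls are safer to change to `sort_values` by hand.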
AttributeError: module 'numpy' has no attribute 'ar'