Maximum of 60 characters, including spaces.
Total budget amount requested in USD, including indirect costs; this number should be between $100,000 USD and $400,000 USD total costs over a two-year period. Enter whole numbers only (no dollar signs, commas, or cents).
Provide a short summary of the work being proposed (maximum of 500 words).
Matplotlib is the foundational data visualization library for the Scientific Python Ecosystem, with over a million users, including researchers in biomedical imaging, microscopy, and genomics. Matplotlib is used by researchers across the entire scientific workflow, from initial data exploration and visualization, to evaluating the output of AI/ML models, to publishing finalized figures.
For the past 20 years Matplotlib has been maintained by a vibrant, primarily volunteer, community. However, we have grown too large and too widely used to continue on volunteer effort alone. For the past 42 months CZI EOSS support for developers has had a positive effect on the project by complementing and enabling, not replacing, volunteer work. We propose to continue this effort.
The primary component of the proposed work is the continued maintenance of the library and its community. Maintenance covers a wide range of tasks including triaging and fixing bugs, reviewing Pull Requests, tagging and building releases, keeping the continuous integration services running, and mentoring new contributors. These tasks are essential for the project's health; though each individually is small, they are frequently time critical and sometimes tedious. It is unfair and impractical to rely solely on volunteers to accomplish such tasks.
A major improvement enabled by supported developers has been our transition to a regular release schedule. Feature releases are now made on a regular schedule of two per year, typically with three bugfix releases between them. This regularity, roughly doubling our previous average rate, allows downstream projects and users to rapidly benefit from ongoing improvements in Matplotlib.
In addition to ongoing and routine maintenance, there are substantial but incremental enhancements to Matplotlib that require long blocks of dedicated effort to implement. Without supported developers, such projects can drag out for months to years or stall altogether. Examples include fixing long-standing rendering and performance issues, overhauling build systems to match the changing Python ecosystem, homogenizing and smoothing the API, and adding new user-facing functionality. Projects to be pursued with the funding requested here will be selected in consultation with downstream biomedical libraries.
Finally, supported developers improve the management of the project. We now have the time and bandwidth to make strategic decisions about the direction of the project to ensure the long-term health and viability of Matplotlib. An important part of project management is community management: fostering, diversifying, and growing our community. Supported developers are able to perform outreach: attending conferences, mentoring sprints, or teaching tutorials. We must ensure that our community is open and welcoming to everyone who wants to join, with opportunities to contribute in a spectrum of roles as their interests and skills develop.
We propose to continue full support (1 FTE) for Elliott Sales de Andrade and partial support (0.15 FTE) for Thomas Caswell. The effort will be split with approximately 0.7 FTE for maintenance, 0.25 FTE for medium-sized enhancements, and 0.2 FTE for community and project management.
Describe the expected value of the proposed work to the biomedical research community (maximum of 250 words).
Scientific Python libraries in biomedical and other fields rely on Matplotlib for visualization, either as a direct dependency or in their documentation and standard user workflows. These include general-purpose tools used by biomedical researchers, such as scikit-learn, networkx, pandas, xarray, and scikit-image, and biomedical-specific projects such as CellProfiler, scanpy, starfish, nipy, MNE-Python, and DeepLabCut. In total CZI has funded at least 32 proposals that depended on Matplotlib, including all of the projects listed here.
The proposed work will help ensure the health and continuing growth of Matplotlib as a foundational component.
Indicate the number of software projects involved in your proposal (up to five). Complete the table with the following information for each software project.
- Main code repository (e.g., GitHub URL), enter in format https://www.example.com. https://github.com/matplotlib/matplotlib
- Homepage URL (if none, re-enter the main code repository URL), enter in format https://www.example.com. https://matplotlib.org
Briefly describe the other software tools (either proprietary or open source) that the audience for this proposal primarily uses. How do the software project(s) in this proposal compare to these other tools in terms of user base size, usage, and maturity? How do existing tools and the project(s) in this proposal interact? (maximum of 250 words)
Matplotlib is the most widely used and de facto standard visualization library in Python (over 1M monthly users) and is a mature library (20+ years old) with over 1,500 individual code contributors. In addition to being directly used by scientists, it is a core dependency of libraries and applications that implement domain-specific visualizations. To aid users in discovering these extensions we maintain a lightly curated list of third-party extensions [1] and have been assigned a Trove classifier on PyPI [2] that allows downstream developers to self-identify as Matplotlib extensions.
Given the centrality of visualization to data analysis across all domains, no single tool can satisfy all needs. There is a range of tools not built on Matplotlib (see [3] for a long but not exhaustive list) that target use cases for which Matplotlib is not well suited. Outside of the Python ecosystem, a wide range of biomedical visualization libraries and applications exist in R or Java. Proprietary solutions such as MATLAB or Tableau may also be used in various scientific fields.
Matplotlib's ubiquity and maturity provide users with a stable and easily understood tool on which to build both bespoke and reproducible visualizations. Its availability in the Python ecosystem allows for direct integration with data processing and modelling tools in a familiar environment.
[1] https://matplotlib.org/mpl-third-party/
[2] https://pypi.org/search/?q=&o=&c=Framework+%3A%3A+Matplotlib
[3] https://pyviz.org/tools.html
Choose the two categories that best describe the software project(s) audience:
- Bioinformatics
- Single-cell biology
- Structural biology
- Clinical research
- Genomics
- Neuroscience
- Infectious disease
- Imaging
- Data management and workflows
- Machine learning and data analysis
- Visualization
- Have you ever received grant funding from CZI, the Wellcome Trust, or the Kavli Foundation? Select Yes or No. YES
- Please check the box(es) of the organization(s) from which you received funding. CZI
- Did you previously apply for funding under the CZI EOSS program? Select Yes or No. YES
- If yes, have you previously received funding under the CZI EOSS program? If yes, please provide your application ID in the format EOSS1-0000000001. EOSS-0000000100, EOSS3-0000000149, EOSS4-0000000187
- Deep Universal Probabilistic Programming - used in examples and tutorials
- SciPy: Fundamental Tools for Biomedical Research - used post-analysis
- scikit-learn - used to visualize model results and effectiveness
- Digital Biomarker Discovery Pipeline - used to visualize digital biomarkers (sleep tracking, blood sugar monitoring, etc.)
- DeepLabCut - visualization of pose estimation
- FastSurfer - a fast and accurate deep-learning based neuroimaging pipeline - visualization of predictions and confusion matrix
- bcbio-nextgen - Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis - used for validation plots
- Scanpy - Single-Cell Analysis in Python - used for visualizing results of clustering, marker genes, etc.
- phasorpy - analysis of fluorescence lifetime and hyperspectral images using the phasor approach - based on mpl
- nilearn - Machine learning for NeuroImaging in Python - plots glass brain, clustering results, object recognition, etc.
- QIIME 2 - next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed - used in various plugins for their needs
- NumPy - used for doc examples
- Pandas - high-performance, easy-to-use data structures - used for various plots
- scikit-image - Image processing in Python - used for doc examples
- Xarray - N-D labeled arrays and datasets in Python - used for plotting of datasets
- PyMC3 - Bayesian Modeling in Python - used for various statistical and in-algorithm plots
- ArviZ - Exploratory analysis of Bayesian models - used as default plotting backend
- MDAnalysis - An object-oriented toolkit to analyze molecular dynamics trajectories - used for examples and visualization
- NetworkX - Network Analysis in Python - used to show network graphs
- Nibabel - Access a multitude of neuroimaging data formats - used for orthographic slice viewer
- Orange Data Mining - used for clusters and regressions
- COMBINE lab - COMputational BIology and Network Evolution lab - used in various analyses
- pyOpenMS - mass spectrometry, specifically for the analysis of proteomics and metabolomics data - used for domain-specific plots
- MNE-Python - exploring, visualizing, and analyzing human neurophysiological data - used for signals, topographies, images, and other domain-specific plots
- MR sequence diagrams in Python
- A package for simulating polysome profiles from Ribo-Seq data
- MSA visualization python package for sequence analysis
- Use interactive matplotlib to label images for classification
- A tool for plotting CAFE5 gene family expansion/contraction result
- A complete processing pipeline for anatomical neuronal tracing
- Context specific and dynamic gene regulatory network reconstruction and analysis
- Integrative Genomics Viewer