Google Season of Docs
MDAnalysis is participating in Google Season of Docs 2019. For the "big picture" overview, see our GSoD blog post and a separate page for the other GSoD 2019 application materials. This page lists our project ideas for 2019.
If you want to work with us, please read the materials here and get in touch with us on the developer mailing list. We're happy to answer questions and develop a project idea with you. Oliver (@orbeckst) and Richard (@richardjgowers) will be the mentors.
See also our GSoD FAQ for commonly asked questions for working with us.
MDAnalysis is a Python library that provides an abstract and object-oriented interface to data from particle-based simulations (primarily molecular dynamics simulations), which are widely used for simulating diverse systems such as the interaction of drugs with biomolecules or new materials. MDAnalysis is widely used in the scientific community and is written by scientists for scientists. Feedback from our users indicates that they like using MDAnalysis but wished that the documentation were easier to read and had more examples. The docs for scikit-learn and the PyTorch tutorials are generally cited as excellent examples of documentation and, taking these as examples, we would like to improve our documentation to make it more accessible and more immediately useful for new users.
Our goal for GSoD is to make it easy for new users to analyze their data. We want to accomplish this goal by
- providing a quick introduction to the essentials of using MDAnalysis through (1) overview documentation and (2) tutorials that address common areas of interest of users;
- connecting the introductory docs to the API docs so that users can easily learn and explore by themselves.
From our two priority areas we propose four projects.
The effort column gives a rough estimate of what fraction of one GSoD term would be spent on the project. Technical writers can combine a lower-effort project (as an introductory project) with one of the high-effort projects. Projects with effort ranges are modular: we can work on different independent sub-components and thus tailor the effort.
project | name | effort | description | mentors |
---|---|---|---|---|
1 | User story based documentation | 75% | Create documentation (starting from existing MDAnalysis docs) addressing common, well-known use cases of the library. The structure should be at a higher level than the existing module-level default documentation and similar to the structure used for libraries such as scikit-learn and PyTorch. | @orbeckst @richardjgowers |
2 | Introduction to analyzing Molecular Dynamics trajectories with MDAnalysis | 25% | Create documentation centered around the 2016 SciPy talk by Beckstein (video, slides) with notebooks illustrating the fundamentals of molecular dynamics and how MDAnalysis facilitates analyzing such simulations. | @orbeckst @richardjgowers |
3 | Quick Start Guide | 25% | Create a guide for getting started with MDAnalysis within a Jupyter notebook in less than 10 minutes, including installation, data loading, and a sample real-world use case. Based on MDAnalysis Docs: Overview and the MDAnalysis Tutorial. | @orbeckst @richardjgowers |
4 | Beginner, Intermediate, and Advanced Tutorials | 50%–100% | Reorganize the existing documentation into Beginner, Intermediate, and Advanced material in which each level builds on the previous one. The tutorials should progress from basic trajectory analysis (beginner) to working with topology information (intermediate) to system building (advanced); see the selection of topics. The material should include code references on GitHub, static or live Jupyter notebooks, and illustrations to facilitate learning and understanding. | @orbeckst @richardjgowers |
We identified two broad areas for improvement from which the GSoD 2019 objectives are drawn. The work in these areas is more extensive than could conceivably be covered in a single GSoD, but we provide the information below as an additional resource that both technical writers and mentors can build on when necessary.
We want to restructure our docs for user-friendliness (issue #1175) and refactor them away from how the source code is organized toward how the user interacts with the code (started in PR #1827). We envision a split into three major blocks:
- introduction with examples (more like a tutorial) and explanation of the underlying principles and guiding concepts (see the 2016 MDAnalysis paper (doi:10.25080/majora-629e541a-00e), the SciPy 2016 talk, and the slides in the presentation scipy-MDAnalysis-Beckstein.pdf, which all outline the fundamentals)
- API docs (similar to the majority of the current docs at https://www.mdanalysis.org/docs/)
- developer docs (notes for developers, can be technical/arcane – e.g., some material from the wiki, details of the fundamental data structures, notes on file formats)
The current documentation is part of the code base and consists of:
- Python docstrings that are directly embedded in the code and associated with functions, classes, methods, attributes, and constants. Many modules also directly contain overviews and examples.
- Pages in the `doc/sphinx/source` directory, which contains documents that combine multiple modules or give more general overviews. The documentation is written in reStructuredText and automatically processed with Sphinx. As part of the continuous integration process, we test that these docs build correctly. Docs from the latest build are automatically and immediately published in HTML format as the "development docs" at https://www.mdanalysis.org/mdanalysis/.
We would like to maintain the ability to automatically build the docs and continue working in the Sphinx framework outlined above. A technical writer would be trained in our current development process, in which changes to the documentation are handled like other changes to the code base. This means that the writer would use git for version control and submit pull requests to the GitHub repository. As part of the standard review process, the mentors (and other developers and community members) would give rapid feedback on the writer's contributions. Once a PR is approved, it will be merged and the docs will be autogenerated and immediately available.
We have one "official" introductory tutorial and various other tutorials, but new users find it confusing which one to start with, and the official tutorial is too long. We need to provide a better "road map" for new users and clearly lay out tutorials for different levels, each with clear learning goals.
We need to split the current MDAnalysis Tutorial into multiple self-contained tutorials and sort them by level (introductory, intermediate, advanced). The tutorials can and should build on each other. There should be a top-level entry point that gives an overview of the tutorials. An initial outline would contain the following (not all content exists yet, especially at the intermediate and advanced levels):
- Introductory level
- Installation: installing MDAnalysis and testing trajectories (MDAnalysisTests for simple examples, MDAnalysisData for advanced examples)
- Basic trajectory analysis: loading data into a `Universe`, selecting atoms with `Universe.select_atoms()` as an `AtomGroup`, iterating through a trajectory, getting positions from `AtomGroup.positions`, and using NumPy operations to calculate observables of interest from the positions.
- Using analysis tools in `MDAnalysis.analysis`: performing common analysis tasks such as RMSD calculation and fitting, hydrogen bond analysis, or dihedral analysis using the common analysis classes.
- Working with AtomGroups: introduction to some often-used methods of `AtomGroup` and how to work with multiple AtomGroups; slicing and fancy indexing of `AtomGroup`.
- Writing trajectories: difference between "trajectory" and "single frame" file formats; standard code pattern for writing trajectories or single frames; writing single frames directly with `AtomGroup.write()`
- Intermediate level
- Selections (requires Basic trajectory analysis and Working with AtomGroups): in-depth tutorial on the selection language; dynamically updating selections
- Working with Groups (requires Working with AtomGroups): the "container" hierarchy (`Universe` > `Segment` > `Residue` > `Atom`) and the corresponding groups `SegmentGroup`, `ResidueGroup`, `AtomGroup`: commonalities and differences, aggregating methods. How to work with `fragments` or `molecules`.
- Writing selections: outputting selections for other codes
- Working with topology information: introduction to the topology system; how to work with bonds; identify bonded atoms; working with angles and dihedrals; selections by type
- Applying on-the-fly transformations: a unique capability of MDAnalysis is trajectory transformations, which change the trajectory while it is being read and so avoid generating the intermediate files that other analysis packages require. This tutorial would be based on the notebook on-the-fly-transformations.ipynb.
- In-memory trajectories: how to use the MemoryReader to speed up analysis or generate temporary reduced system trajectories for analysis (see, e.g., Workshop notebook trajectory_magic.ipynb)
- Visualization in notebooks with NGLView: how to use nglview with MDAnalysis (see Workshop notebook Visualisation_with_NGLView.ipynb and binder notebook nglview_drawframes.ipynb)
- Advanced level
- System building (requires Working with topology information): how to add atoms or bonds or create simple topologies from scratch; generating initial coordinates
- Extending file reading with your own code (requires System building): write a Reader for your own custom file format and dynamically add it to MDAnalysis
- Write your own analysis class: shows how to leverage the `MDAnalysis.analysis.base.AnalysisBase` class to create full-featured custom analysis tools.
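The last step of the Basic trajectory analysis tutorial (using NumPy on the positions) can be illustrated with a minimal sketch that computes an unweighted radius of gyration for every frame. To stay self-contained it does not import MDAnalysis: a hard-coded list of made-up (N, 3) arrays stands in for the frames one would get by iterating over `u.trajectory` and reading `AtomGroup.positions`. The helper `rgyr_unweighted` is hypothetical and only for illustration.

```python
import numpy as np


def rgyr_unweighted(positions: np.ndarray) -> float:
    """Unweighted radius of gyration of an (N, 3) coordinate array."""
    center = positions.mean(axis=0)                    # center of geometry
    sq_dist = ((positions - center) ** 2).sum(axis=1)  # squared distance of each atom from the center
    return float(np.sqrt(sq_dist.mean()))


# Made-up stand-in for a trajectory: three frames of the same four atoms,
# each frame an (N, 3) array like AtomGroup.positions (units assumed to be Å).
base = np.array([[1.0, 0.0, 0.0],
                 [-1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, -1.0, 0.0]])
frames = [base * scale for scale in (1.0, 2.0, 3.0)]

# One value per frame gives a time series, as a loop over u.trajectory would.
rgyr = np.array([rgyr_unweighted(xyz) for xyz in frames])
print(rgyr)  # [1. 2. 3.]
```

In an actual tutorial the list comprehension would become a `for ts in u.trajectory:` loop; MDAnalysis also ships its own `AtomGroup.radius_of_gyration()` method that could replace the helper.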
For these tutorials and other documents, we want to start adding example Jupyter notebooks (such as the first few example notebooks) to our Sphinx-based reStructuredText documentation via the nbsphinx extension.
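Enabling nbsphinx mostly amounts to registering the extension in the Sphinx configuration. A minimal sketch of the relevant part of a `conf.py`, assuming the notebooks are committed together with their outputs so the docs build does not have to run them; the other settings shown are assumptions, not MDAnalysis's actual configuration.

```python
# conf.py (excerpt): hypothetical sketch, not MDAnalysis's actual Sphinx configuration.
extensions = [
    "sphinx.ext.autodoc",  # build API pages from the docstrings in the code
    "nbsphinx",            # render Jupyter notebooks (.ipynb) as documentation pages
]

# Use the outputs stored in the notebooks instead of executing them during the build
# (nbsphinx also accepts "always" and "auto").
nbsphinx_execute = "never"

# Keep Jupyter checkpoint copies out of the build.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```

With this in place, a notebook can be listed in a `toctree` like any other reStructuredText page.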
We also want to include more diagrams, pictures, and graphs to make clearer what the relationships between different parts of the code are and what output might look like.
At the moment, the primary sources of information for users are
- the package documentation https://www.mdanalysis.org/docs
- the basic tutorial
- the most recent scientific article on MDAnalysis: R. J. Gowers et al. MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations. In S. Benthall and S. Rostrup, editors, Proceedings of the 15th Python in Science Conference, pages 98-105, Austin, TX, 2016. SciPy, doi:10.25080/majora-629e541a-00e.
- example Jupyter notebooks (also available as live binder notebooks)
- Workshop materials
- two videos from conference presentations