lint and automatic lint (#413)

Fix #344. To lint the code, this PR uses a popular tool, [pre-commit](https://pre-commit.com/). `.pre-commit-config.yaml` lists all the hooks used, including [black](https://github.com/psf/black) and some common fixers. pre-commit can install a git hook that runs on `git commit`, so everything committed is already formatted. In addition, a [CI service](https://pre-commit.ci/) fixes PRs automatically, which ensures that all new code is formatted as well. This PR touches almost every file and adds a large diff to the history, but it is a necessary step to start linting the code base.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
deepmodeling · Jan 25, 2023 · ffa52c5
1 parent 7b0f44b
commit ffa52c5

Showing 166 changed files with 6,957 additions and 4,954 deletions.
diff --git a/.github/ISSUE_TEMPLATE/request-for-help.md b/.github/ISSUE_TEMPLATE/request-for-help.md
@@ -13,7 +13,7 @@ Before asking questions, you can
 search the previous issues or discussions
 check the [README](https://github.com/deepmodeling/dpdata/#readme).
 
-Please **do not** post requests for help (e.g. with installing or using dpdata) here. 
+Please **do not** post requests for help (e.g. with installing or using dpdata) here.
 Instead go to [discussions](https://github.com/deepmodeling/dpdata/discussions).
 
 This issue tracker is for tracking dpdata development related issues only.

diff --git a/.github/workflows/pub-pypi.yml b/.github/workflows/pub-pypi.yml
@@ -36,4 +36,3 @@ jobs:
       uses: pypa/gh-action-pypi-publish@master
       with:
         password: ${{ secrets.PYPI_API_TOKEN }}
-
diff --git a/.github/workflows/test_import.yml b/.github/workflows/test_import.yml
@@ -15,4 +15,3 @@ jobs:
         architecture: 'x64'
     - run: python -m pip install .
     - run: python -c 'import dpdata'
-
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,25 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+repos:
+-   repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.4.0
+    hooks:
+    # there are many log files in tests
+    # TODO: seperate py files and log files
+    -   id: trailing-whitespace
+        exclude: "^tests/.*$"
+    -   id: end-of-file-fixer
+        exclude: "^tests/.*$"
+    -   id: check-yaml
+    -   id: check-json
+    -   id: check-added-large-files
+    -   id: check-merge-conflict
+    -   id: check-symlinks
+    -   id: check-toml
+# Python
+-   repo: https://github.com/psf/black
+    rev: 22.12.0
+    hooks:
+    -   id: black-jupyter
+ci:
+  autoupdate_branch: devel
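
Most of the one-line changes in the rest of this diff come from the `trailing-whitespace` and `end-of-file-fixer` hooks configured above. The following is a minimal Python sketch of their effect, not the hooks' actual implementation. The function names are hypothetical, line-ending handling is simplified to `\n`, and the exclusion check assumes that pre-commit applies `exclude` patterns with `re.search` on the repository-relative path.

```python
import re

# Pattern copied from the `exclude` entries in .pre-commit-config.yaml.
EXCLUDE_PATTERN = r"^tests/.*$"


def is_excluded(path: str) -> bool:
    """Return True if a repo-relative path would be skipped by the hook.

    Assumption: the pattern is applied with re.search.
    """
    return re.search(EXCLUDE_PATTERN, path) is not None


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line (simplified)."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def fix_end_of_file(text: str) -> str:
    """Make the text end with exactly one newline; only-newline text becomes empty."""
    stripped = text.rstrip("\n")
    return stripped + "\n" if stripped else ""


def lint_text(path: str, text: str) -> str:
    """Apply both simplified fixers unless the path is excluded."""
    if is_excluded(path):
        return text
    return fix_end_of_file(strip_trailing_whitespace(text))
```

Under these assumptions, `lint_text("README.md", "## replicate \n\n")` yields `"## replicate\n"`, which matches the kind of change seen in the README hunks, while any path under `tests/` is returned unchanged.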
diff --git a/README.md b/README.md
@@ -3,7 +3,7 @@ dpdata only works with python 3.7 or above.
 
 
 # Installation
-One can download the source code of dpdata by 
+One can download the source code of dpdata by
 ```bash
 git clone https://github.com/deepmodeling/dpdata.git dpdata
 ```
@@ -25,10 +25,10 @@ This section gives some examples on how dpdata works. Firstly one needs to impor
 ```python
 import dpdata
 ```
-The typicall workflow of `dpdata` is 
+The typicall workflow of `dpdata` is
 
 1. Load data from vasp or lammps or deepmd-kit data files.
-2. Manipulate data 
+2. Manipulate data
 3. Dump data to in a desired format
 
 
@@ -41,9 +41,9 @@ or let dpdata infer the format (`vasp/poscar`) of the file from the file name ex
 d_poscar = dpdata.System('my.POSCAR')
 ```
 The number of atoms, atom types, coordinates are loaded from the `POSCAR` and stored to a data `System` called `d_poscar`.
-A data `System` (a concept used by [deepmd-kit](https://github.com/deepmodeling/deepmd-kit)) contains frames that has the same number of atoms of the same type. The order of the atoms should be consistent among the frames in one `System`. 
+A data `System` (a concept used by [deepmd-kit](https://github.com/deepmodeling/deepmd-kit)) contains frames that has the same number of atoms of the same type. The order of the atoms should be consistent among the frames in one `System`.
 It is noted that `POSCAR` only contains one frame.
-If the multiple frames stored in, for example, a `OUTCAR` is wanted, 
+If the multiple frames stored in, for example, a `OUTCAR` is wanted,
 ```python
 d_outcar = dpdata.LabeledSystem('OUTCAR')
 ```
@@ -53,9 +53,9 @@ The `System` or `LabeledSystem` can be constructed from the following file forma
 
 | Software| format | multi frames | labeled | class	    | format key    |
 | ------- | :---   | :---:        | :---:   | :---          | :---          |
-| vasp	  | poscar | False        | False   | System	    | 'vasp/poscar' | 
-| vasp    | outcar | True         | True    | LabeledSystem | 'vasp/outcar' |	
-| vasp    | xml    | True         | True    | LabeledSystem | 'vasp/xml'    |	
+| vasp	  | poscar | False        | False   | System	    | 'vasp/poscar' |
+| vasp    | outcar | True         | True    | LabeledSystem | 'vasp/outcar' |
+| vasp    | xml    | True         | True    | LabeledSystem | 'vasp/xml'    |
 | lammps  | lmp    | False        | False   | System        | 'lammps/lmp'  |
 | lammps  | dump   | True         | False   | System        | 'lammps/dump' |
 | deepmd  | raw    | True         | False   | System	    | 'deepmd/raw'  |
@@ -89,7 +89,7 @@ The `System` or `LabeledSystem` can be constructed from the following file forma
 
 The Class `dpdata.MultiSystems`  can read data  from a dir which may contains many files of different systems, or from single xyz file which contains different systems.
 
-Use `dpdata.MultiSystems.from_dir` to read from a  directory, `dpdata.MultiSystems` will walk in the directory 
+Use `dpdata.MultiSystems.from_dir` to read from a  directory, `dpdata.MultiSystems` will walk in the directory
 Recursively  and  find all file with specific file_name. Supports all the file formats that `dpdata.LabeledSystem` supports.
 
 Use  `dpdata.MultiSystems.from_file` to read from single file. Single-file support is available for the `quip/gap/xyz` and `ase/structure` formats.
@@ -148,7 +148,7 @@ coords = d_outcar['coords']
 ```
 Available properties are (nframe: number of frames in the system, natoms: total number of atoms in the system)
 
-| key		|  type		| dimension		| are labels	| description 
+| key		|  type		| dimension		| are labels	| description
 | ---		| ---		| ---			| ---		| ---
 | 'atom_names'	| list of str	| ntypes		| False		| The name of each atom type
 | 'atom_numbs'	| list of int	| ntypes		| False		| The number of atoms of each atom type
@@ -186,7 +186,7 @@ dpdata.LabeledSystem('OUTCAR').sub_system([0,-1]).to('deepmd/raw', 'dpmd_raw')
 by which only the first and last frames are dumped to `dpmd_raw`.
 
 
-## replicate 
+## replicate
 dpdata will create a super cell of the current atom configuration.
 ```python
 dpdata.System('./POSCAR').replicate((1,2,3,) )
@@ -197,9 +197,9 @@ tuple(1,2,3) means don't copy atom configuration in x direction, make 2 copys in
 ## perturb
 By the following example, each frame of the original system (`dpdata.System('./POSCAR')`) is perturbed to generate three new frames. For each frame, the cell is perturbed by 5% and the atom positions are perturbed by 0.6 Angstrom. `atom_pert_style` indicates that the perturbation to the atom positions is subject to normal distribution. Other available options to `atom_pert_style` are`uniform` (uniform in a ball), and `const` (uniform on a sphere).
 ```python
-perturbed_system = dpdata.System('./POSCAR').perturb(pert_num=3, 
-    cell_pert_fraction=0.05, 
-    atom_pert_distance=0.6, 
+perturbed_system = dpdata.System('./POSCAR').perturb(pert_num=3,
+    cell_pert_fraction=0.05,
+    atom_pert_distance=0.6,
     atom_pert_style='normal')
 print(perturbed_system.data)
 ```
@@ -213,7 +213,7 @@ s.to_vasp_poscar('POSCAR.P42nmc.replace')
 ```
 
 # BondOrderSystem
-A new class `BondOrderSystem` which inherits from class `System` is introduced in dpdata. This new class contains information of chemical bonds and formal charges (stored in `BondOrderSystem.data['bonds']`, `BondOrderSystem.data['formal_charges']`). Now BondOrderSystem can only read from .mol/.sdf formats, because of its dependency on rdkit (which means rdkit must be installed if you want to use this function). Other formats, such as pdb, must be converted to .mol/.sdf format (maybe with software like open babel). 
+A new class `BondOrderSystem` which inherits from class `System` is introduced in dpdata. This new class contains information of chemical bonds and formal charges (stored in `BondOrderSystem.data['bonds']`, `BondOrderSystem.data['formal_charges']`). Now BondOrderSystem can only read from .mol/.sdf formats, because of its dependency on rdkit (which means rdkit must be installed if you want to use this function). Other formats, such as pdb, must be converted to .mol/.sdf format (maybe with software like open babel).
 ```python
 import dpdata
 system_1 = dpdata.BondOrderSystem("tests/bond_order/CH3OH.mol", fmt="mol") # read from .mol file
@@ -242,7 +242,7 @@ According to our test, our sanitization procedure can successfully read 4852 sma
 
 ```python
 import dpdata
-    
+
 for sdf_file in glob.glob("bond_order/refined-set-ligands/obabel/*sdf"):
     syst = dpdata.BondOrderSystem(sdf_file, sanitize_level='high', verbose=False)
 ```

diff --git a/docs/Makefile b/docs/Makefile
@@ -17,4 +17,4 @@ help:
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/conf.py b/docs/conf.py
@@ -16,19 +16,20 @@
 import sys
 import subprocess as sp
 from datetime import date
-sys.path.insert(0, os.path.abspath('..'))
+
+sys.path.insert(0, os.path.abspath(".."))
 
 
 # -- Project information -----------------------------------------------------
 
-project = 'dpdata'
-copyright = '2019-%d, DeepModeling ' % date.today().year
-author = 'Han Wang'
+project = "dpdata"
+copyright = "2019-%d, DeepModeling " % date.today().year
+author = "Han Wang"
 
 # The short X.Y version
-version = '0.0'
+version = "0.0"
 # The full version, including alpha/beta/rc tags
-release = '0.0.0-rc'
+release = "0.0.0-rc"
 
 
 # -- General configuration ---------------------------------------------------
@@ -41,27 +42,27 @@
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = [
-    'deepmodeling_sphinx',
-    'sphinx_rtd_theme',
-    'sphinx.ext.mathjax',
-    'sphinx.ext.viewcode',
-    'sphinx.ext.intersphinx',
-    'numpydoc',
-    'm2r2',
-    'sphinxarg.ext',
+    "deepmodeling_sphinx",
+    "sphinx_rtd_theme",
+    "sphinx.ext.mathjax",
+    "sphinx.ext.viewcode",
+    "sphinx.ext.intersphinx",
+    "numpydoc",
+    "m2r2",
+    "sphinxarg.ext",
 ]
 
 # Add any paths that contain templates here, relative to this directory.
-templates_path = ['_templates']
+templates_path = ["_templates"]
 
 # The suffix(es) of source filenames.
 # You can specify multiple suffix as a list of string:
 #
 # source_suffix = ['.rst', '.md']
-source_suffix = ['.rst', '.md']
+source_suffix = [".rst", ".md"]
 
 # The master toctree document.
-master_doc = 'index'
+master_doc = "index"
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
@@ -73,18 +74,18 @@
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 # This pattern also affects html_static_path and html_extra_path .
-exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
 
 # The name of the Pygments (syntax highlighting) style to use.
-pygments_style = 'sphinx'
+pygments_style = "sphinx"
 
 
 # -- Options for HTML output -------------------------------------------------
 
 # The theme to use for HTML and HTML Help pages.  See the documentation for
 # a list of builtin themes.
 #
-html_theme = 'sphinx_rtd_theme'
+html_theme = "sphinx_rtd_theme"
 
 # Theme options are theme-specific and customize the look and feel of a theme
 # further.  For a list of options available for each theme, see the
@@ -95,7 +96,7 @@
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-#html_static_path = ['_static']
+# html_static_path = ['_static']
 
 # Custom sidebar templates, must be a dictionary that maps document names
 # to template names.
@@ -111,7 +112,7 @@
 # -- Options for HTMLHelp output ---------------------------------------------
 
 # Output file base name for HTML help builder.
-htmlhelp_basename = 'dpdatadoc'
+htmlhelp_basename = "dpdatadoc"
 
 
 # -- Options for LaTeX output ------------------------------------------------
@@ -120,15 +121,12 @@
     # The paper size ('letterpaper' or 'a4paper').
     #
     # 'papersize': 'letterpaper',
-
     # The font size ('10pt', '11pt' or '12pt').
     #
     # 'pointsize': '10pt',
-
     # Additional stuff for the LaTeX preamble.
     #
     # 'preamble': '',
-
     # Latex figure (float) alignment
     #
     # 'figure_align': 'htbp',
@@ -138,19 +136,15 @@
 # (source start file, target name, title,
 #  author, documentclass [howto, manual, or own class]).
 latex_documents = [
-    (master_doc, 'dpdata.tex', 'dpdata Documentation',
-     'Han Wang', 'manual'),
+    (master_doc, "dpdata.tex", "dpdata Documentation", "Han Wang", "manual"),
 ]
 
 
 # -- Options for manual page output ------------------------------------------
 
 # One entry per manual page. List of tuples
 # (source start file, name, description, authors, manual section).
-man_pages = [
-    (master_doc, 'dpdata', 'dpdata Documentation',
-     [author], 1)
-]
+man_pages = [(master_doc, "dpdata", "dpdata Documentation", [author], 1)]
 
 
 # -- Options for Texinfo output ----------------------------------------------
@@ -159,26 +153,47 @@
 # (source start file, target name, title, author,
 #  dir menu entry, description, category)
 texinfo_documents = [
-    (master_doc, 'dpdata', 'dpdata Documentation',
-     author, 'dpdata', 'One line description of project.',
-     'Miscellaneous'),
+    (
+        master_doc,
+        "dpdata",
+        "dpdata Documentation",
+        author,
+        "dpdata",
+        "One line description of project.",
+        "Miscellaneous",
+    ),
 ]
 
 
 # -- Extension configuration -------------------------------------------------
 def run_apidoc(_):
     from sphinx.ext.apidoc import main
-    sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
+
+    sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
     cur_dir = os.path.abspath(os.path.dirname(__file__))
     module = os.path.join(cur_dir, "..", "dpdata")
-    main(['-M', '--tocfile', 'api', '-H', 'API documentation', '-o', os.path.join(cur_dir, "api"), module, '--force'])
+    main(
+        [
+            "-M",
+            "--tocfile",
+            "api",
+            "-H",
+            "API documentation",
+            "-o",
+            os.path.join(cur_dir, "api"),
+            module,
+            "--force",
+        ]
+    )
+
 
 def run_formats(_):
     sp.check_output([sys.executable, "make_format.py"])
 
+
 def setup(app):
-    app.connect('builder-inited', run_apidoc)
-    app.connect('builder-inited', run_formats)
+    app.connect("builder-inited", run_apidoc)
+    app.connect("builder-inited", run_formats)
 
 
 intersphinx_mapping = {

diff --git a/docs/credits.rst b/docs/credits.rst
@@ -1,4 +1,4 @@
 Authors
 =======
 
-.. git-shortlog-authors::
+.. git-shortlog-authors::
diff --git a/docs/formats.rst b/docs/formats.rst
@@ -6,4 +6,3 @@ dpdata supports the following formats:
 .. csv-table:: Supported Formats
    :file: formats.csv
    :header-rows: 1
-