update paper

Ziqi-Li · Mar 29, 2021 · e292670
1 parent 4384e45
Showing 7 changed files with 5 additions and 199 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/fastgwr/.DS_Store b/fastgwr/.DS_Store
diff --git a/fastgwr/tests/.DS_Store b/fastgwr/tests/.DS_Store
diff --git a/fastgwr/tests/test_gwr.py b/fastgwr/tests/test_gwr.py
diff --git a/fastgwr/tests/test_mgwr.py b/fastgwr/tests/test_mgwr.py
diff --git a/paper/figure maker.ipynb b/paper/figure maker.ipynb
diff --git a/paper/paper.md b/paper/paper.md
@@ -21,7 +21,7 @@ bibliography: paper.bib
 
 # Summary
 
-`fastgwr` is a command-line interface (CLI) tool for fast parallel fitting of Geographically Weighted Regression (GWR) models. The single-bandwidth GWR, as well as the multi-bandwidth Multiscale GWR (MGWR) model, are both available in the current version of the software. GWR models are typically computationally intensive in memory and time. To address these challenges, `fastgwr` uses Message Passing Interface (MPI, @gropp1999using) to implement the parallel algorithms developed in @li2019fast and @li2020computational. The program builds on top of the `mpi4py` package [@dalcin2008mpi] which provides bindings of the MPI with python to allow the algorithm to be executed on multiple processors across nodes. The goal of  `fastgwr` is to enable studies of spatial non-stationary processes using large-scale and fine-resolution geospatial datasets.
+`fastgwr` is a command-line interface (CLI) tool for fast and parallel fitting of Geographically Weighted Regression (GWR) models. The single-bandwidth GWR, as well as the multi-bandwidth Multiscale GWR (MGWR) model, are both available in the current version of the software. GWR models are typically computationally intensive in memory and time. To address these challenges, `fastgwr` uses Message Passing Interface (MPI, @gropp1999using) to implement the parallel algorithms developed in @li2019fast and @li2020computational. The program builds on top of the `mpi4py` package [@dalcin2008mpi] which provides bindings of the MPI with python to allow the algorithm to be executed on multiple processors across nodes. The goal of  `fastgwr` is to enable studies of spatial non-stationary processes using large-scale and fine-resolution geospatial datasets.
 
 # Statement of need
 
@@ -31,7 +31,7 @@ As geospatial data are increasingly available from different sources such as rem
 
 # State of the Field
 
-There are currently existing packages that allow users to fit GWR and MGWR models. The two most popular open-source options are `mgwr` in python [@oshan2019mgwr] and `GWmodel` in R [@gollini2013gwmodel], both of which provide friendly APIs and are actively maintained. `GWmodel` supports a wide array of geographically weighted models and analysis tools; however, the performance of `GWmodel` lags behind and is not suitable for large datasets. A comprehensive performance comparison between `GWmodel` and `fastgwr` can be found in @li2019fast and @li2020computational. As for `mgwr`, the parallelism of `fastgwr` has been built into `mgwr` by leveraging the `multiprocessing` package. For small and moderate sized problems, the performance between `mgwr` and `fastgwr` is comparable. Nevertheless, the major advantage of `fastgwr` is that the use of MPI-based parallelism allows the program to run in parallel across multiple computer nodes. In this way, `fastgwr` is the only option if the analyst wants to run the GWR program on a high-performance computing cluster, which empowers larger-scale analysis that is impossible for a single workstation. To demonstrate this, `fastgwr` was executed on the University of Arizona's [Ocelote](https://public.confluence.arizona.edu/display/UAHPC/Ocelote+Quick+Start) cluster using the [Zillow datasets](https://github.com/Ziqi-Li/FastGWR/tree/master/Zillow-test-dataset), and the scalability can be seen in \autoref{fig:example}. It is expected that the scalability will further increase with larger datasets because the data transfer will take a relatively smaller proportion of the total computation time. Additionally, the model fitting results of `fastgwr` have been validated against `mgwr` which can be found in the [notebooks](https://github.com/Ziqi-Li/FastGWR/tree/master/validation%20notebook) in the attached [GitHub repository](https://github.com/Ziqi-Li/FastGWR).
+There are currently existing packages that allow users to fit GWR and MGWR models. The two most popular open-source options are `mgwr` in python [@oshan2019mgwr] and `GWmodel` in R [@gollini2013gwmodel], both of which provide friendly APIs and are actively maintained. `GWmodel` supports a wide array of geographically weighted models and analysis tools; however, the performance of `GWmodel` lags behind and is not suitable for large datasets. A comprehensive performance comparison between `GWmodel` and `fastgwr` can be found in @li2019fast and @li2020computational. As for `mgwr`, the parallelism of `fastgwr` has been built into `mgwr` by leveraging the `multiprocessing` package. For small and moderate sized problems, the performance between `mgwr` and `fastgwr` is comparable. Nevertheless, the major advantage of `fastgwr` is that the use of MPI-based parallelism allows the program to run in parallel across multiple computer nodes. In this way, `fastgwr` is the only option if the analyst wants to run the GWR program on a high-performance computing cluster, and it empowers larger-scale analysis that is impossible for a single workstation. To demonstrate this, `fastgwr` was executed on the University of Arizona's [Ocelote](https://public.confluence.arizona.edu/display/UAHPC/Ocelote+Quick+Start) cluster using the [Zillow datasets](https://github.com/Ziqi-Li/FastGWR/tree/master/Zillow-test-dataset), and the scalability can be seen in \autoref{fig:example}. It is expected that the scalability will further increase with larger datasets because the data transfer among processes will take a relatively smaller proportion of the total computation time. Additionally, the model fitting results of `fastgwr` have been validated against `mgwr` which can be found in the [notebooks](https://github.com/Ziqi-Li/FastGWR/tree/master/validation%20notebook) in the attached [GitHub repository](https://github.com/Ziqi-Li/FastGWR).
 
 ![Scalability of `fastgwr`. The GWR model is fitted with 50,000 Zillow records and the MGWR model is fitted with 10,000 Zillow records. \label{fig:example}](scalability.png){ width=50%}