Commit

update paper
Ziqi authored and Ziqi committed Mar 29, 2021
1 parent 4384e45 commit e292670
Showing 7 changed files with 5 additions and 199 deletions.
Binary file modified .DS_Store
Binary file not shown.
Binary file modified fastgwr/.DS_Store
Binary file not shown.
Binary file removed fastgwr/tests/.DS_Store
Binary file not shown.
69 changes: 0 additions & 69 deletions fastgwr/tests/test_gwr.py

This file was deleted.

60 changes: 0 additions & 60 deletions fastgwr/tests/test_mgwr.py

This file was deleted.

71 changes: 3 additions & 68 deletions paper/figure maker.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions paper/paper.md
@@ -21,7 +21,7 @@ bibliography: paper.bib

# Summary

-`fastgwr` is a command-line interface (CLI) tool for fast parallel fitting of Geographically Weighted Regression (GWR) models. The single-bandwidth GWR, as well as the multi-bandwidth Multiscale GWR (MGWR) model, are both available in the current version of the software. GWR models are typically computationally intensive in memory and time. To address these challenges, `fastgwr` uses Message Passing Interface (MPI, @gropp1999using) to implement the parallel algorithms developed in @li2019fast and @li2020computational. The program builds on top of the `mpi4py` package [@dalcin2008mpi] which provides bindings of the MPI with python to allow the algorithm to be executed on multiple processors across nodes. The goal of `fastgwr` is to enable studies of spatial non-stationary processes using large-scale and fine-resolution geospatial datasets.
+`fastgwr` is a command-line interface (CLI) tool for fast and parallel fitting of Geographically Weighted Regression (GWR) models. The single-bandwidth GWR, as well as the multi-bandwidth Multiscale GWR (MGWR) model, are both available in the current version of the software. GWR models are typically computationally intensive in memory and time. To address these challenges, `fastgwr` uses Message Passing Interface (MPI, @gropp1999using) to implement the parallel algorithms developed in @li2019fast and @li2020computational. The program builds on top of the `mpi4py` package [@dalcin2008mpi] which provides bindings of the MPI with python to allow the algorithm to be executed on multiple processors across nodes. The goal of `fastgwr` is to enable studies of spatial non-stationary processes using large-scale and fine-resolution geospatial datasets.

# Statement of need

@@ -31,7 +31,7 @@ As geospatial data are increasingly available from different sources such as rem

# State of the Field

-There are currently existing packages that allow users to fit GWR and MGWR models. Two most popular open-source options are `mgwr` in python [@oshan2019mgwr] and `GWmodel` in R [@gollini2013gwmodel], both of which provide friendly APIs and are actively maintained. `GWmodel` supports a wide array of geographically weighted models and analysis tools; however, the performance of `GWmodel` is lagged behind and not suitable for large datasets. A comprehensive performance comparison between `GWmodel` and `fastgwr` can be found in @li2019fast and @li2020computational. As for `mgwr`, the parallelism of `fastgwr` has been built into `mgwr` by leveraging the `multiprocessing` package. For small and moderate sized problems, the performance between `mgwr` and `fastgwr` is comparable. Nevertheless, the major advantage of `fastgwr` is that the use of MPI-based parallelism allows the program to run in parallel across multiple computer nodes. In this way, `fastgwr` is the only option if the analyst wants to run the GWR program on a high performance computing cluster, which empowers larger-scale analysis that is impossible for a single workstation. To demostrate this, `fastgwr` was executed on the University of Arizona's [Ocelote](https://public.confluence.arizona.edu/display/UAHPC/Ocelote+Quick+Start) cluster using the [Zillow datasets](https://github.com/Ziqi-Li/FastGWR/tree/master/Zillow-test-dataset), and the scalability can be seen in \autoref{fig:example}. It is expected that the scalability will further increase with larger datasets because the data transfer will take a relative smaller proportion in the total computation time. Additionally, the model fitting results of `fastgwr` have been validated against `mgwr` which can be found in the [notebooks](https://github.com/Ziqi-Li/FastGWR/tree/master/validation%20notebook) in the attached [Gituhb repository](https://github.com/Ziqi-Li/FastGWR).
+There are currently existing packages that allow users to fit GWR and MGWR models. Two most popular open-source options are `mgwr` in python [@oshan2019mgwr] and `GWmodel` in R [@gollini2013gwmodel], both of which provide friendly APIs and are actively maintained. `GWmodel` supports a wide array of geographically weighted models and analysis tools; however, the performance of `GWmodel` is lagged behind and not suitable for large datasets. A comprehensive performance comparison between `GWmodel` and `fastgwr` can be found in @li2019fast and @li2020computational. As for `mgwr`, the parallelism of `fastgwr` has been built into `mgwr` by leveraging the `multiprocessing` package. For small and moderate sized problems, the performance between `mgwr` and `fastgwr` is comparable. Nevertheless, the major advantage of `fastgwr` is that the use of MPI-based parallelism allows the program to run in parallel across multiple computer nodes. In this way, `fastgwr` is the only option if the analyst wants to run the GWR program on a high performance computing cluster, and it empowers larger-scale analysis that is impossible for a single workstation. To demostrate this, `fastgwr` was executed on the University of Arizona's [Ocelote](https://public.confluence.arizona.edu/display/UAHPC/Ocelote+Quick+Start) cluster using the [Zillow datasets](https://github.com/Ziqi-Li/FastGWR/tree/master/Zillow-test-dataset), and the scalability can be seen in \autoref{fig:example}. It is expected that the scalability will further increase with larger datasets because the data transfer among processes will take a relative smaller proportion in the total computation time. Additionally, the model fitting results of `fastgwr` have been validated against `mgwr` which can be found in the [notebooks](https://github.com/Ziqi-Li/FastGWR/tree/master/validation%20notebook) in the attached [Gituhb repository](https://github.com/Ziqi-Li/FastGWR).

![Scalability of `fastgwr`. The GWR model is fitted with 50,000 Zillow records and the MGWR model is fitted with 10,000 Zillow records. \label{fig:example}](scalability.png){ width=50%}
