Inconsistent results using MatchIt (1:1 Nearest Neighbor, Mahalanobis Distance) #142

Open
NaijiaLiu opened this issue Feb 10, 2023 · 5 comments

@NaijiaLiu

I'm using matchit for 1:1 nearest neighbor matching with Mahalanobis distance, and I get inconsistent results across different R sessions. Please see the three runs below. Setting a seed does not resolve the issue. The estimates differ substantially, even changing sign, although every run uses the same Lalonde dataset.

library(MatchIt)
data(lalonde)
m.out2 <- matchit(treat ~ age + educ + race + married + 
                    nodegree + re74 + re75, data = lalonde,
                  method = "nearest", distance = "mahalanobis")
m.data <- match.data(m.out2)

sum(m.data$re78[m.data$treat==1])/sum(m.data$treat==1) - 
sum(m.data$re78[m.data$treat==0])/sum(m.data$treat==0)
#> [1] -520.2002

Created on 2023-02-09 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22)
#>  os       macOS Monterey 12.6
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-02-09
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
#>  cli           3.4.1   2022-09-23 [1] CRAN (R 4.2.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  evaluate      0.16    2022-08-09 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.3   2022-07-18 [1] CRAN (R 4.2.0)
#>  knitr         1.40    2022-08-24 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  MatchIt     * 4.5.0   2022-11-16 [1] CRAN (R 4.2.0)
#>  Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.16    2022-08-24 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.8   2022-07-11 [1] CRAN (R 4.2.0)
#>  stringr       1.4.1   2022-08-20 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.33    2022-09-12 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

library(MatchIt)
#> Warning: package 'MatchIt' was built under R version 4.1.1
data(lalonde)
m.out2 <- matchit(treat ~ age + educ + race + married + 
                    nodegree + re74 + re75, data = lalonde,
                  method = "nearest", distance = "mahalanobis")
 
m.data <- match.data(m.out2)

sum(m.data$re78[m.data$treat==1])/sum(m.data$treat==1) - 
sum(m.data$re78[m.data$treat==0])/sum(m.data$treat==0)
#> [1] 527.9392

Created on 2023-02-09 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                                      
#>  version  R version 4.1.0 Patched (2021-07-16 r80637)
#>  os       macOS Big Sur 11.5.2                       
#>  system   aarch64, darwin20                          
#>  ui       X11                                        
#>  language (EN)                                       
#>  collate  en_US.UTF-8                                
#>  ctype    en_US.UTF-8                                
#>  tz       America/New_York                           
#>  date     2023-02-09                                 
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.1.1)
#>  cli           3.4.1   2022-09-23 [1] CRAN (R 4.1.1)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.1)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.3   2022-07-18 [1] CRAN (R 4.1.1)
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.1)
#>  MatchIt     * 4.3.0   2021-09-13 [1] CRAN (R 4.1.1)
#>  Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.1.1)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.1.0)
#>  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.1.1)
#>  rmarkdown     2.17    2022-10-07 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi       1.7.8   2022-07-11 [1] CRAN (R 4.1.1)
#>  stringr       1.4.1   2022-08-20 [1] CRAN (R 4.1.1)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.1)
#>  xfun          0.33    2022-09-12 [1] CRAN (R 4.1.1)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library

library(MatchIt)
#> Loading required package: backports
data(lalonde)
m.out2 <- matchit(treat ~ age + educ + race + married + 
                    nodegree + re74 + re75, data = lalonde,
                  method = "nearest", distance = "mahalanobis")
m.data <- match.data(m.out2)

sum(m.data$re78[m.data$treat==1])/sum(m.data$treat==1) - 
  sum(m.data$re78[m.data$treat==0])/sum(m.data$treat==0)
#> [1] 516.6367

Created on 2023-02-10 by the reprex package (v2.0.1)

Session info
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux rodete
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/local/google/home/soichiroy/.r-google/rhome/lib/libR.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] MatchIt_4.5.0   backports_1.4.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.9        digest_0.6.29     gmotd_1.0         withr_2.5.0      
#>  [5] R.methodsS3_1.8.1 lifecycle_1.0.3   reprex_2.0.1      magrittr_2.0.3   
#>  [9] evaluate_0.15     rlang_1.0.6       cli_3.4.1         fs_1.5.2         
#> [13] R.utils_2.12.0    R.oo_1.24.0       vctrs_0.5.2       rmarkdown_2.18   
#> [17] styler_1.9.0      tools_4.1.3       glue_1.6.2        R.cache_0.15.0   
#> [21] purrr_1.0.1       xfun_0.34         yaml_2.3.5        fastmap_1.1.0    
#> [25] compiler_4.1.3    htmltools_0.5.4   knitr_1.42

@ngreifer
Collaborator

The discrepancy between the second and third runs can be attributed to a change in how the Mahalanobis distance was calculated in version 4.4.0 of MatchIt. Your second run uses MatchIt version 4.3.0, and your third run uses MatchIt version 4.5.0, so this discrepancy is expected. See the NEWS for information on this change.
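
As a minimal sketch of how to check which side of that change a given installation falls on, the snippet below compares a version string against 4.4.0. The helper name predates_change is hypothetical and not part of MatchIt; package_version(), packageVersion() and news() are base R functions, and the check of the local installation is skipped if MatchIt is not installed.

```r
# Hypothetical helper: TRUE if a MatchIt version is older than 4.4.0,
# the release named above as changing the Mahalanobis distance calculation.
predates_change <- function(version) {
  package_version(version) < "4.4.0"
}

predates_change("4.3.0")  # version reported in the second run
predates_change("4.5.0")  # version reported in the first and third runs

# Check the locally installed version, if MatchIt is available.
if (requireNamespace("MatchIt", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("MatchIt"))
  message(
    "Installed MatchIt ", installed,
    if (predates_change(installed)) " (before 4.4.0)" else " (4.4.0 or later)"
  )
  # utils::news(package = "MatchIt")  # browse the changelog interactively
}
```

Two sessions that fall on opposite sides of this check would not be expected to produce the same matches.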

I can't replicate the results in your first run even after installing the exact versions of R and the packages you mention in your session info, so I can't explain that error or fix it.
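
One way to narrow down where two runs diverge might be to compare which units ended up in each matched dataset. The sketch below assumes the two data frames come from match.data() and keep the row names of the original data; the function name compare_matched_units is hypothetical, and the toy data frames only stand in for real matched datasets.

```r
# Hypothetical helper: list the units that appear in only one of two
# matched datasets, identified by their row names.
compare_matched_units <- function(a, b) {
  ids_a <- rownames(a)
  ids_b <- rownames(b)
  list(
    only_in_a = setdiff(ids_a, ids_b),
    only_in_b = setdiff(ids_b, ids_a),
    shared    = length(intersect(ids_a, ids_b))
  )
}

# Toy stand-ins for two matched datasets (not real MatchIt output).
run_a <- data.frame(treat = c(1, 0, 0), row.names = c("u1", "u2", "u3"))
run_b <- data.frame(treat = c(0, 0, 1), row.names = c("u2", "u3", "u4"))

str(compare_matched_units(run_a, run_b))
```

If the treated units are identical and only the selected controls differ, that points at the distance calculation or the matching order rather than at the data.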

@NaijiaLiu
Author

Thanks for your reply! I was teaching with this function and a bunch of students in class did get the result in the first run (-520). I will double check their session info too.

@ngreifer
Collaborator

Wow, I'm so sorry that is happening in class! I am quite mystified. If you could send me the matchit object of a student that gets -520 as their result, that might be helpful.
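
A minimal sketch of how a student could export the fitted object for sharing, assuming it is stored as m.out2 as in the original post. The helper name save_for_report and the default file name are hypothetical; saveRDS(), readRDS() and sessionInfo() are base R.

```r
# Hypothetical helper: bundle a fitted object with the session details
# of the R session that produced it, and write both to one .rds file.
save_for_report <- function(fit, path = "matchit_report.rds") {
  saveRDS(list(fit = fit, session = utils::sessionInfo()), path)
  invisible(path)
}

# In the student's session, after running the code from the original post:
# save_for_report(m.out2)

# The recipient can then restore both pieces:
# report <- readRDS("matchit_report.rds")
# report$fit      # the matchit object
# report$session  # R and package versions of the originating session
```

Keeping the session details in the same file avoids mismatching an object with the wrong session info when several students send results.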

@NaijiaLiu
Author

Archive.zip

@NaijiaLiu
Author

Thanks a lot! I uploaded the matchit object and matched dataset.
