
Improving markovchain R package

Giorgio Alfredo Spedicato edited this page Mar 9, 2016 · 3 revisions

Background

This project aims to extend the functions and capabilities of the markovchain R package, giving statisticians who use R more functional tools for analysing stochastic processes related to Markov chains (MCs). Optimizing the existing package, in terms of both code performance and algorithms, is also part of the project.

Related work

Other R packages providing functions and methods to handle discrete time Markov chains (DTMCs) are, to my knowledge, DTMCPack and clickstream. Packages for related statistical models include msm and mstate (survival analysis on multistate models), HMM and depmixS4 (hidden Markov models), and mcmc (Markov chain Monte Carlo). Nevertheless, markovchain is by far the most used R package for modelling DTMCs, thanks to its intuitive S4 structure and the availability of methods for probabilistic analysis (structural analysis of DTMCs) and statistical inference (estimation and simulation). The markovchain package is growing in popularity in the R statistical community, and an initial optimization was funded during Google Summer of Code 2015. The selected student rewrote a large part of the code in Rcpp and greatly improved the existing software.

Details of your coding project

I expect the student to code new functions or improve the existing ones (using Rcpp/C++ when possible, R otherwise) to enhance the package's statistical functions. Supporting documentation (.Rd files and vignettes) has to be produced as well, integrated with the existing documentation where suitable. In particular, during the three months of the project the student will (in descending order of priority):

  1. Improve existing methods/functions for higher order Markov chains. In particular, I expect the student to develop methods to estimate higher order multivariate Markov chains (HOMMCs) as shown in this paper (http://hkumath.hku.hk/~imr/IMRPreprintSeries/2007/IMR2007-15.pdf). Suitable S4 methods to properly handle HOMMCs (print, plot, etc.) have to be written as well, improving on the existing embryonic approach for univariate higher order Markov chains.
  2. Improve existing methods/functions to estimate continuous time Markov chains. While good code already exists, improvements are needed to expand structural analysis and inference (http://www.mast.queensu.ca/~stat455/lecturenotes/set5.pdf).
  3. Improve general inference on DTMCs. In particular, the existing statistical tests for the Markov property and for the Markov order of the stochastic process should be revised and improved. Eventually, the R package should cover most (if not all) methods from Anderson and Goodman, Statistical Inference about Markov Chains.
  4. General assessment of the structural analysis of DTMCs. The student should review, document and, where necessary, optimize the existing code for structural analysis of DTMCs. The student could also suggest better algorithms for this analysis.
  5. Fine tuning of the existing code.
  6. One improvement suggested directly by the student.
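
As an illustration of the kind of order test mentioned in item 3, the sketch below computes a Pearson chi-square statistic for the hypothesis that a sequence is independent (order 0) against first-order Markov dependence, in the spirit of the tests discussed by Anderson and Goodman. It is not the package's code: the names `OrderTestResult` and `orderZeroVsOneTest` are hypothetical, states are assumed to be coded as integers 0..m-1, and the p-value is left out because it needs a chi-square distribution function (for example `pchisq` on the R side).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: Pearson statistic and its degrees of freedom.
struct OrderTestResult {
    double statistic;
    int dof;
};

// Pearson chi-square statistic for H0: the sequence is independent (order 0)
// against H1: first-order Markov dependence.
// Assumption: states are coded as integers 0..m-1.
OrderTestResult orderZeroVsOneTest(const std::vector<int>& seq, int m) {
    assert(m >= 2);
    std::vector<std::vector<double>> n(m, std::vector<double>(m, 0.0));
    std::vector<double> rowSum(m, 0.0), colSum(m, 0.0);
    double total = 0.0;

    // Count observed one-step transitions i -> j.
    for (std::size_t t = 0; t + 1 < seq.size(); ++t) {
        const int i = seq[t];
        const int j = seq[t + 1];
        assert(i >= 0 && i < m && j >= 0 && j < m);
        n[i][j] += 1.0;
        rowSum[i] += 1.0;
        colSum[j] += 1.0;
        total += 1.0;
    }

    // Compare observed counts with those expected under independence.
    double stat = 0.0;
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < m; ++j) {
            if (total <= 0.0) continue;
            const double expected = rowSum[i] * colSum[j] / total;
            if (expected > 0.0) {
                const double diff = n[i][j] - expected;
                stat += diff * diff / expected;
            }
        }
    }
    return {stat, (m - 1) * (m - 1)};
}
```

Under the null hypothesis the statistic is asymptotically chi-square distributed with (m-1)^2 degrees of freedom, so a strictly alternating two-state sequence such as 0,1,0,1,... yields a large value relative to its single degree of freedom.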

Expected impact

A reliable and broad R package for Markov processes will greatly support the statistical community, because Markov models are widely used.

Mentors

Giorgio A. Spedicato; academic mentor: Christophe Dutang.

Tests

The ideal student must know R and C++ and be willing to learn Rcpp and how to write R packages (see Hadley Wickham's free online books). Knowledge of GitHub, unit testing (R package testthat) and roxygen documentation (R package roxygen2) is also assumed. As for academic background, the student should have sound knowledge of inferential statistics and probability. When submitting the application, the chosen candidate will have satisfactorily:

  • Detailed previous coding experience and academic background, explaining why the mentors should choose his or her project.
  • Drafted a project idea that resembles and completes the details of the coding project.
  • Passed the following tests, after forking the GitHub project site (https://github.com/spedygiorgio/markovchain), where the current development version of the project is hosted.
    1. Rewrite the rmarkovchain function in Rcpp (together with the ancillary functions called from rmarkovchain, if not already in Rcpp), keeping the same front end (the final user shall see the same parameter names and types) but speeding up the function using C++. At the same time, use roxygen2 tags to rewrite the help page.
    2. Rewrite the function verifyMarkovProperty, checking and fixing (if needed) the algorithm that assesses whether a sequence satisfies the Markov property, and making the function return a list that contains the statistic and the p-value (like any htest class object).
    3. Propose a relevant improvement to the package capabilities/functionalities that he or she is willing to implement during the coding period.
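
To give a flavour of the C++ core that the first test asks for, here is a minimal sketch of simulating a DTMC trajectory from a row-stochastic transition matrix. It uses only the C++ standard library and omits the Rcpp glue and the R front end; the function name `simulateChain`, the integer state coding 0..m-1 and the explicit seed are assumptions made for illustration, not the interface or the implementation of rmarkovchain.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Simulate n steps of a discrete time Markov chain.
// Assumptions: P is a square, row-stochastic matrix, states are coded
// 0..m-1, and the returned path holds the n states visited after `start`.
std::vector<int> simulateChain(const std::vector<std::vector<double>>& P,
                               int start, std::size_t n, unsigned seed) {
    assert(!P.empty());
    assert(start >= 0 && static_cast<std::size_t>(start) < P.size());

    std::mt19937 gen(seed);

    // Build one discrete distribution per row of the transition matrix.
    std::vector<std::discrete_distribution<int>> rows;
    rows.reserve(P.size());
    for (const auto& r : P) {
        assert(r.size() == P.size());
        rows.emplace_back(r.begin(), r.end());
    }

    // Draw each next state from the row of the current state.
    std::vector<int> path;
    path.reserve(n);
    int state = start;
    for (std::size_t t = 0; t < n; ++t) {
        state = rows[state](gen);
        path.push_back(state);
    }
    return path;
}
```

With the deterministic matrix {{0, 1}, {1, 0}} and start state 0, the chain can only alternate, which makes a convenient unit test. In a real submission the same loop would sit behind an Rcpp export, with the R function keeping its existing arguments.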

Solutions of tests
