Merge pull request #176 from se-passau/dev

Version 3.6

Merged-by: Thomas Bock <[email protected]>
se-sic · Feb 21, 2020 · 91fc448
2 parents d02d523 + 75ae4a5

Showing 12 changed files with 620 additions and 128 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -11,36 +11,32 @@
 ## with this program; if not, write to the Free Software Foundation, Inc.,
 ## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 ##
-## Copyright 2017-2018 by Claus Hunsen <[email protected]>
+## Copyright 2017-2018,2020 by Claus Hunsen <[email protected]>
 ## All Rights Reserved.
 
+# TravisCI container
+os: linux
+dist: xenial
+warnings_are_errors: false
 
+# R environment, dependencies and information
 language: r
 r:
   - 3.3
   - 3.4
   - 3.5
-
-# TravisCI container
-sudo: required
-dist: trusty
-warnings_are_errors: false
-
-# # Branches
-# branches:
-#   only:
-#     - travis
-#     - claus-updates
-
-# R dependencies and information
+  - 3.6
 cache: packages
 repos:
   CRAN: https://cloud.r-project.org
 
-# installation
+# Installation
 install:
+    # package dependencies
     - sudo apt-get install libudunits2-dev
+    # package installation
     - Rscript install.R
 
+# Tests
 script:
     - Rscript tests.R
diff --git a/NEWS.md b/NEWS.md
@@ -1,5 +1,21 @@
 # coronet – Changelog
 
+## 3.6
+
+### Added
+- Add a parameter `editor.definition` to the function `add.vertex.attribute.artifact.editor.count`, which defines whether authors, committers, or both count as editors when computing the attribute values. (#92, ff1e147ba563b2d71f8228afd49492a315a5ad48)
+- Add the possibility to filter out patchstack mails from the mails of the `ProjectData`. The option can be toggled using the newly added configuration option `mails.filter.patchstack.mails`. (1608e28ca36610c58d2a5447d12ee2052c6eb976, a932c8cdaa6fe5149c798bc09d9e421ba679c48d)
+- Add a new file `util-plot-evaluation.R` containing functions to plot commit edit types per author and project. (PR #171, d4af515f859ce16ffaa0963d6d3d4086bcbb7377, aa542a215f59bc3ed869cfefbc5a25fa050b1fc9, 0a0a5903e7c609dfe805a3471749eb2241efafe2)
+
+### Changed/Improved
+
+- Add R version 3.6 to test suite (8b2a52d38475a59c55feb17bb54ed12b9252a937, #161)
+- Update `.travis.yml` to improve compatibility with Travis CI (41ce589b3b50fd581a10e6af33ac6b1bbea63bb8)
+
+### Fixed
+
+- Ensure sorting of commit-count and LOC-count data.frames to fix tests with R 3.3 (33d63fd50c4b29d45a9ca586c383650f7d29efd5)
+
 
 ## 3.5
 

diff --git a/README.md b/README.md
@@ -103,7 +103,7 @@ While `proximity` triggers a file/function-based commit analysis in `Codeface`,
 When using this network library, the user only needs to give the `artifact` parameter to the [`ProjectConf`](#projectconf) constructor, which automatically ensures that the correct tagging is selected.
 
 The configuration files `{project-name}_{tagging}.conf` are mandatory and contain some basic configuration regarding a performed `Codeface` analysis (e.g., project name, name of the corresponding repository, name of the mailing list, etc.).
-For further details on those files, please have a look at some [example files](https://github.com/siemens/codeface/tree/master/conf) files in the `Codeface` repository.
+For further details on those files, please have a look at some [example files](https://github.com/siemens/codeface/tree/master/conf) in the `Codeface` repository.
 
 All the `*.list` files listed above are output files of `codeface-extraction` and contain meta data of, e.g., commits or e-mails to the mailing list, etc., in CSV format.
 This network library lazily loads and processes these files when needed.
@@ -133,7 +133,7 @@ Alternatively, you can run `Rscript install.R` to install the packages.
 
 Please insert the project into yours by use of [git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules).
 Furthermore, the file `install.R` installs all needed R packages (see [below](#needed-r-packages)) into your R library.
-Although, the use of of [packrat](https://rstudio.github.io/packrat/) with your project is recommended.
+Although, the use of [packrat](https://rstudio.github.io/packrat/) with your project is recommended.
 
 This library is written in a way to not interfere with the loading order of your project's `R` packages (i.e., `library()` calls), so that the library does not lead to masked definitions.
 
@@ -415,6 +415,8 @@ Additionally, for more examples, the file `showcase.R` is worth a look.
     * Functionality for the identification of network motifs (subgraph patterns)
 - `util-plot.R`
     * Everything needed for plotting networks
+- `util-plot-evaluation.R`
+    * Plotting functions for data evaluation
 - `util-misc.R`
     * Helper functions and also legacy functions, both needed in the other files
 - `showcase.R`
@@ -521,6 +523,10 @@ There is no way to update the entries, except for the revision-based parameters.
 - `commits.filter.untracked.files`
     * Remove all information concerning untracked files from the commit data. This effect becomes clear when retrieving commits using `get.commits.filtered`, because then the result of which does not contain any commits that solely changed untracked files. Networks built on top of this `ProjectData` do also not contain any information about untracked files.
     * [*`TRUE`*, `FALSE`]
+- `mails.filter.patchstack.mails`
+    * Filter patchstack mails from the mail data. In a thread, a patchstack spans the first sequence of mails where each mail has been authored by the thread creator and has been sent within a short time window after the preceding mail. The mails spanned by a patchstack are called
+'patchstack mails'. For each patchstack, every patchstack mail except the first one is filtered when `mails.filter.patchstack.mails = TRUE`.
+    * [`TRUE`, *`FALSE`*]
 - `issues.only.comments`
     * Only use comments from the issue data on disk and no further events such as references and label changes
     * [*`TRUE`*, `FALSE`]
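
Reviewer note: the patchstack definition added to the README above can be illustrated with a small base-R sketch. This is not the library's implementation. The function name `flag.patchstack.mails`, the column names of the toy data.frame, and the 30-second window are assumptions made only for this example, since the README speaks of "a short time window" without giving a value.

```r
## Sketch only: mark the mails that a patchstack filter would drop.
## Assumed (hypothetical) columns: message.id, thread, author, date (POSIXct).
flag.patchstack.mails = function(mails, window.secs = 30) {
    ## process every thread in chronological order
    mails = mails[order(mails[["thread"]], mails[["date"]]), ]
    mails[["filtered"]] = FALSE

    for (thread.id in unique(mails[["thread"]])) {
        idx = which(mails[["thread"]] == thread.id)
        creator = mails[["author"]][idx[1]]
        in.stack = 1 # number of mails in the leading patchstack

        for (i in seq_along(idx)[-1]) {
            gap = as.numeric(difftime(mails[["date"]][idx[i]],
                                      mails[["date"]][idx[i - 1]], units = "secs"))
            ## the patchstack ends at the first foreign author or long pause
            if (mails[["author"]][idx[i]] != creator || gap > window.secs) break
            in.stack = i
        }

        ## every patchstack mail but the first one is marked as filtered
        if (in.stack > 1) {
            mails[["filtered"]][idx[2:in.stack]] = TRUE
        }
    }
    return(mails)
}

## toy data: thread t1 starts with three quick mails by its creator
mails = data.frame(
    message.id = c("m1", "m2", "m3", "m4", "m5"),
    thread = c("t1", "t1", "t1", "t1", "t2"),
    author = c("alice", "alice", "alice", "bob", "carol"),
    date = as.POSIXct("2020-01-01 10:00:00", tz = "UTC") + c(0, 5, 10, 600, 0),
    stringsAsFactors = FALSE
)

flagged = flag.patchstack.mails(mails)
kept = flagged[!flagged[["filtered"]], "message.id"]
```

In this toy data, `m2` and `m3` belong to the patchstack started by `m1`, so only `m1`, `m4`, and `m5` remain in `kept`.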

diff --git a/showcase.R b/showcase.R
@@ -17,6 +17,7 @@
 ## Copyright 2017 by Felix Prasse <[email protected]>
 ## Copyright 2017-2018 by Thomas Bock <[email protected]>
 ## Copyright 2018 by Jakob Kronawitter <[email protected]>
+## Copyright 2019 by Klara Schlueter <[email protected]>
 ## All Rights Reserved.
 
 
@@ -80,6 +81,13 @@ revisions.callgraph = proj.conf$get.value("revisions.callgraph")
 x.data = ProjectData$new(project.conf = proj.conf)
 x = NetworkBuilder$new(project.data = x.data, network.conf = net.conf)
 
+## * Evaluation plots ------------------------------------------------------
+
+# edit.types = plot.commit.edit.types.in.project(x.data)
+# edit.types.scaled = plot.commit.edit.types.in.project(x.data, TRUE)
+# editor.types = plot.commit.editor.types.by.author(x.data)
+# editor.types.scaled = plot.commit.editor.types.by.author(x.data, TRUE)
+
 ## * Data retrieval --------------------------------------------------------
 
 # x.data$get.commits()

diff --git a/tests/test-data.R b/tests/test-data.R
@@ -12,7 +12,8 @@
 ## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 ##
 ## Copyright 2018 by Christian Hechtl <[email protected]>
-## Copyright 2018 by Claus Hunsen <[email protected]>
+## Copyright 2018-2019 by Claus Hunsen <[email protected]>
+## Copyright 2019 by Jakob Kronawitter <[email protected]>
 ## All Rights Reserved.
 
 
@@ -34,6 +35,7 @@ test_that("Compare two ProjectData objects", {
 
     ##initialize a ProjectData object with the ProjectConf and clone it into another one
     proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
+    proj.conf$update.value("pasta", TRUE)
     proj.data.one = ProjectData$new(project.conf = proj.conf)
     proj.data.two = proj.data.one$clone()
 
@@ -43,19 +45,20 @@ test_that("Compare two ProjectData objects", {
     ## second object, as well, and test for equality.
 
     ##change the second data object
-    proj.data.one$get.commits()
+
+    proj.data.two$get.pasta()
 
     expect_false(proj.data.one$equals(proj.data.two), "Two not identical ProjectData objects.")
 
-    proj.data.two$get.commits()
+    proj.data.one$get.pasta()
 
     expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects.")
 
-    proj.data.two$get.pasta()
+    proj.data.one$get.commits()
 
     expect_false(proj.data.one$equals(proj.data.two), "Two not identical ProjectData objects.")
 
-    proj.data.one$get.pasta()
+    proj.data.two$get.commits()
 
     expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects.")
 
@@ -123,3 +126,56 @@ test_that("Compare two RangeData objects", {
     expect_false(proj.data.base$equals(range.data.four))
 
 })
+
+test_that("Filter patchstack mails", {
+
+    proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
+    proj.conf$update.value("mails.filter.patchstack.mails", TRUE)
+
+    ## create the project data
+    proj.data = ProjectData$new(proj.conf)
+
+    ## retrieve the mails while filtering patchstack mails
+    mails.filtered = proj.data$get.mails()
+
+    ## create new project with filtering disabled
+    proj.conf$update.value("mails.filter.patchstack.mails", FALSE)
+    proj.data = ProjectData$new(proj.conf)
+
+    ## retrieve the mails without filtering patchstack mails
+    mails.unfiltered = proj.data$get.mails()
+
+    ## get message ids
+    mails.filtered.mids = mails.filtered[["message.id"]]
+    mails.unfiltered.mids = mails.unfiltered[["message.id"]]
+
+    expect_equal(setdiff(mails.unfiltered.mids, mails.filtered.mids), c("<[email protected]>",
+                                                                        "<[email protected]>",
+                                                                        "<[email protected]>",
+                                                                        "<[email protected]>",
+                                                                        "<[email protected]>"))
+})
+
+test_that("Filter patchstack mails with PaStA enabled", {
+    proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
+    proj.conf$update.value("mails.filter.patchstack.mails", TRUE)
+    proj.conf$update.value("pasta", TRUE)
+
+    proj.data = ProjectData$new(proj.conf)
+
+    ## retrieve filtered PaStA data by calling 'get.pasta' which calls the filtering functionality internally
+    filtered.pasta = proj.data$get.pasta()
+
+    ## ensure that the remaining mails have not been touched
+    expect_true("<[email protected]>" %in% filtered.pasta[["message.id"]])
+    expect_true("<[email protected]>" %in% filtered.pasta[["message.id"]])
+    expect_true("<[email protected]>" %in% filtered.pasta[["message.id"]])
+    expect_equal(2, sum(filtered.pasta[["message.id"]] == "<[email protected]>"))
+
+    ## ensure that the three PaStA entries relating to the filtered patchstack mails have been merged to a single new
+    ## PaStA entry which has assigned the message ID of the first patchstack mail
+    expect_true("<[email protected]>" %in% filtered.pasta[["message.id"]])
+
+    ## ensure that there are no other entries than the ones that have been verified to exist above
+    expect_equal(6, nrow(filtered.pasta))
+})
diff --git a/tests/test-networks-covariates.R b/tests/test-networks-covariates.R
@@ -818,9 +818,7 @@ test_that("Test add.vertex.attribute.artifact.editor.count", {
 
     networks.and.data = get.network.covariates.test.networks("artifact")
 
-    expected.attributes = network.covariates.test.build.expected(list(1L), list(1L), list(3L, 1L))
-
-    expected.attributes = list(
+    expected.attributes.author = list(
         range = network.covariates.test.build.expected(
             c(1L), c(1L), c(3L, 1L)),
         cumulative = network.covariates.test.build.expected(
@@ -834,18 +832,58 @@ test_that("Test add.vertex.attribute.artifact.editor.count", {
         complete = network.covariates.test.build.expected(
             c(2L), c(2L), c(3L, 1L))
     )
+    expected.attributes.committer = list(
+        range = network.covariates.test.build.expected(
+            c(1L), c(1L), c(2L, 1L)),
+        cumulative = network.covariates.test.build.expected(
+            c(1L), c(1L), c(2L, 1L)),
+        all.ranges = network.covariates.test.build.expected(
+            c(1L), c(1L), c(2L, 1L)),
+        project.cumulative = network.covariates.test.build.expected(
+            c(1L), c(1L), c(2L, 1L)),
+        project.all.ranges = network.covariates.test.build.expected(
+            c(1L), c(1L), c(2L, 1L)),
+        complete = network.covariates.test.build.expected(
+            c(1L), c(1L), c(2L, 1L))
+    )
+    expected.attributes.both = list(
+        range = network.covariates.test.build.expected(
+            c(1L), c(2L), c(3L, 1L)),
+        cumulative = network.covariates.test.build.expected(
+            c(1L), c(2L), c(3L, 1L)),
+        all.ranges = network.covariates.test.build.expected(
+            c(2L), c(2L), c(3L, 1L)),
+        project.cumulative = network.covariates.test.build.expected(
+            c(1L), c(2L), c(3L, 1L)),
+        project.all.ranges = network.covariates.test.build.expected(
+            c(2L), c(2L), c(3L, 1L)),
+        complete = network.covariates.test.build.expected(
+            c(2L), c(2L), c(3L, 1L))
+    )
 
     ## Test
 
     lapply(AGGREGATION.LEVELS, function(level) {
-        networks.with.attr = add.vertex.attribute.artifact.editor.count(
+        networks.with.attr.author = add.vertex.attribute.artifact.editor.count(
             networks.and.data[["networks"]], networks.and.data[["project.data"]],
             aggregation.level = level
         )
+        networks.with.attr.committer = add.vertex.attribute.artifact.editor.count(
+            networks.and.data[["networks"]], networks.and.data[["project.data"]],
+            aggregation.level = level, editor.definition = "committer"
+        )
+        networks.with.attr.both = add.vertex.attribute.artifact.editor.count(
+            networks.and.data[["networks"]], networks.and.data[["project.data"]],
+            aggregation.level = level, editor.definition = c("author", "committer")
+        )
 
-        actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "editor.count")
+        actual.attributes.author = lapply(networks.with.attr.author, igraph::get.vertex.attribute, name = "editor.count")
+        actual.attributes.committer = lapply(networks.with.attr.committer, igraph::get.vertex.attribute, name = "editor.count")
+        actual.attributes.both = lapply(networks.with.attr.both, igraph::get.vertex.attribute, name = "editor.count")
 
-        expect_equal(expected.attributes[[level]], actual.attributes)
+        expect_equal(expected.attributes.author[[level]], actual.attributes.author)
+        expect_equal(expected.attributes.committer[[level]], actual.attributes.committer)
+        expect_equal(expected.attributes.both[[level]], actual.attributes.both)
     })
 })
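
Reviewer note: the three expectation sets above differ only in which name columns count as editors. A minimal base-R sketch of that idea follows. It is not the code behind `add.vertex.attribute.artifact.editor.count`; the helper `count.editors` and the toy commit data are hypothetical, and only the values `"author"` and `"committer"` for `editor.definition` are taken from the test.

```r
## Sketch only: count distinct editors per artifact.
## 'editor.definition' may be "author", "committer", or both.
count.editors = function(commits, editor.definition = "author") {
    editor.definition = match.arg(editor.definition, c("author", "committer"),
                                  several.ok = TRUE)
    name.columns = paste0(editor.definition, ".name")

    ## per artifact, pool the selected name columns and count unique names
    sapply(split(commits, commits[["artifact"]]), function(df) {
        length(unique(unlist(df[name.columns], use.names = FALSE)))
    })
}

## hypothetical toy data
commits = data.frame(
    artifact = c("A", "A", "B"),
    author.name = c("alice", "bob", "alice"),
    committer.name = c("carol", "carol", "alice"),
    stringsAsFactors = FALSE
)

editors.author = count.editors(commits)
editors.committer = count.editors(commits, editor.definition = "committer")
editors.both = count.editors(commits, editor.definition = c("author", "committer"))
```

For artifact `A`, the author-only count is 2, the committer-only count is 1, and the combined count is 3, because the combined definition takes the union of names rather than the sum of the two counts.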
 

diff --git a/util-conf.R b/util-conf.R
@@ -355,6 +355,12 @@ ProjectConf = R6::R6Class("ProjectConf", inherit = Conf,
                 allowed = c(TRUE, FALSE),
                 allowed.number = 1
             ),
+            mails.filter.patchstack.mails = list(
+                default = FALSE,
+                type = "logical",
+                allowed = c(TRUE, FALSE),
+                allowed.number = 1
+            ),
             synchronicity = list(
                 default = FALSE,
                 type = "logical",

diff --git a/util-core-peripheral.R b/util-core-peripheral.R
@@ -14,7 +14,7 @@
 ## Copyright 2017 by Mitchell Joblin <[email protected]>
 ## Copyright 2017 by Ferdinand Frank <[email protected]>
 ## Copyright 2017 by Sofie Kemper <[email protected]>
-## Copyright 2017-2019 by Claus Hunsen <[email protected]>
+## Copyright 2017-2020 by Claus Hunsen <[email protected]>
 ## Copyright 2017 by Felix Prasse <[email protected]>
 ## Copyright 2018-2019 by Christian Hechtl <[email protected]>
 ## Copyright 2018 by Klara Schlüter <[email protected]>
@@ -637,7 +637,7 @@ get.committer.not.author.commit.count = function(range.data) {
     res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `commits.df`
                        WHERE `committer.name` <> `author.name`
                        GROUP BY `committer.name`, `author.name`
-                       ORDER BY `freq` DESC")
+                       ORDER BY `freq` DESC, `author.name` ASC")
 
     logging::logdebug("get.committer.not.author.commit.count: finished.")
     return(res)
@@ -664,7 +664,7 @@ get.committer.and.author.commit.count = function(range.data) {
     res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `commits.df`
                        WHERE `committer.name` = `author.name`
                        GROUP BY `committer.name`, `author.name`
-                       ORDER BY `freq` DESC")
+                       ORDER BY `freq` DESC, `author.name` ASC")
 
     logging::logdebug("get.committer.and.author.commit.count: finished.")
     return(res)
@@ -699,7 +699,7 @@ get.committer.or.author.commit.count = function(range.data) {
 
     res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `ungrouped`
                        GROUP BY `name`
-                       ORDER BY `freq` DESC")
+                       ORDER BY `freq` DESC, `name` ASC")
 
     logging::logdebug("get.committer.or.author.commit.count: finished.")
     return(res)
@@ -725,7 +725,7 @@ get.committer.commit.count = function(range.data) {
 
     ## Execute a query to get the commit count per author
     res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `commits.df`
-                       GROUP BY `committer.name` ORDER BY `freq` DESC")
+                       GROUP BY `committer.name` ORDER BY `freq` DESC, `committer.name` ASC")
 
     logging::logdebug("get.committer.commit.count: finished.")
     return(res)
@@ -751,7 +751,7 @@ get.author.commit.count = function(proj.data) {
 
     ## Execute a query to get the commit count per author
     res = sqldf::sqldf("SELECT `author.name`, COUNT(*) AS `freq` FROM `commits.df`
-                       GROUP BY `author.name` ORDER BY `freq` DESC")
+                       GROUP BY `author.name` ORDER BY `freq` DESC, `author.name` ASC")
 
     logging::logdebug("get.author.commit.count: finished.")
     return(res)
@@ -813,7 +813,7 @@ get.author.loc.count = function(proj.data) {
     ## Execute a query to get the changed lines per author
     res = sqldf::sqldf("SELECT `author.name`, SUM(`added.lines`) + SUM(`deleted.lines`) AS `loc`
                         FROM `commits.df`
-                        GROUP BY `author.name` ORDER BY `loc` DESC")
+                        GROUP BY `author.name` ORDER BY `loc` DESC, `author.name` ASC")
 
     logging::logdebug("get.author.loc.count: finished.")
     return(res)
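
Reviewer note: the `ORDER BY` changes in this file add a secondary sort key. When a query orders only by `freq`, SQL leaves the relative order of rows with equal `freq` unspecified, so tests that compare whole data.frames can fail depending on the environment. The same tie-breaking idea can be sketched in base R without `sqldf`; the toy data below is made up for illustration.

```r
## Sketch only: deterministic ordering with a tie-breaker.
commit.counts = data.frame(
    author.name = c("Olaf", "Bjoern", "Karl", "Thomas"),
    freq = c(2L, 4L, 2L, 2L),
    stringsAsFactors = FALSE
)

## primary key: freq descending; secondary key: author.name ascending
## (method = "radix" sorts character keys as in the C locale)
sorted = commit.counts[order(-commit.counts[["freq"]],
                             commit.counts[["author.name"]],
                             method = "radix"), ]
```

Without the second key, the three authors with `freq = 2` could appear in any order; with it, the result is always Bjoern, Karl, Olaf, Thomas.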