diff --git a/README.md b/README.md index 9968f022..25a8f64e 100644 --- a/README.md +++ b/README.md @@ -8,13 +8,13 @@ The network library `codeface-extraction-r` can be used to construct analyzable ### Submodule Please include the project in yours by using [git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules). -Furthermore, the file `install.R` installs all needed R packages (see below) into your R library. +Furthermore, the file `install.R` installs all needed R packages (see [below](#needed-r-packages)) into your R library. However, the use of [packrat](https://rstudio.github.io/packrat/) with your project is recommended. This library is written in a way that does not interfere with the loading order of your project's `R` packages (i.e., `library()` calls), so that the library does not lead to masked definitions. To initialize the library in your project, you need to source all files of the library in your project using the following command: -``` +```R source("path/to/util-init.R", chdir = TRUE) ``` Not doing this may lead to unpredictable behavior, as we need to set some system and environment variables to ensure correct behavior of all functionality (e.g., parsing timestamps in the correct timezone and reading files from disk using the correct encoding). @@ -38,9 +38,9 @@ Not doing this may lead to unpredictable behavior, as we need ## How-To In this section, we give a short example of how to initialize all needed objects and build a bipartite network. -For more examples, please see the file `test.R`. +For more examples, please see the file `showcase.R`. -``` +```R CF.DATA = "/path/to/codeface-data" # path to codeface data CF.SELECTION.PROCESS = "threemonth" # releases, threemonth(, testing) @@ -57,97 +57,44 @@ net.conf = NetworkConf$new() ## update the values of the NetworkConf object to the specific needs net.conf$update.values(list(author.relation = AUTHOR.RELATION, - artifact.relation = ARTIFACT.RELATION)) + artifact.relation = ARTIFACT.RELATION, + simplify = TRUE)) ## get ranges information from project configuration -ranges = proj.conf$get.entry(entry.name = "ranges") +ranges = proj.conf$get.entry("ranges") ## create data object which actually holds and handles data -cf.data = ProjectData$new(proj.conf, net.conf) +data = ProjectData$new(proj.conf) + +## create network builder to construct networks from the given data object +netbuilder = NetworkBuilder$new(data, net.conf) ## create and get the bipartite network ## (construction configured by net.conf's "artifact.relation") -bpn = cf.data$get.bipartite.network() +bpn = netbuilder$get.bipartite.network() ## plot the retrieved network -plot.bipartite.network(bpn) +plot.network(bpn) + ``` There are two different classes of configuration objects in this library: -- the `ProjectConf` class, which determines all configuration parameters needed for the configured project (mainly data paths) and -- the `NetworkConf` class, which is used for all configuration parameters concerning data retrieval and network construction. +- the `ProjectConf` class which determines all configuration parameters needed for the configured project (mainly data paths) and +- the `NetworkConf` class which is used for all configuration parameters concerning data retrieval and network construction. You can find an overview of all the parameters in these classes below in this file. -For examples on how to use both classes and how to build networks with them, please look in the file `test.R`.
+For examples on how to use both classes and how to build networks with them, please look in the file `showcase.R`. ## Configuration Classes -### NetworkConf - -In this section, we give an overview on the parameters of the `NetworkConf` class and their meaning. - -All parameters can be retrieved with the method `NetworkConf$get.variable(...)`, by passing one parameter name as method parameter. -Updates to the parameters can be done by calling `NetworkConf$update.variables(...)` and passing a list of parameter names and their respective values. - -**Note**: Default values are shown in *italics*. - -- `author.relation` - * The relation among authors, encoded as edges in an author network - * **Note**: The author--artifact relation in bipartite and multi networks is configured by `artifact.relation`! - * possible values: [*`"mail"`*, `"cochange"`, `"issue"`] -- `author.directed` - * The (time-based) directedness of edges in an author network - * [`TRUE`, *`FALSE`*] -- `author.all.authors` - * Denotes whether all available authors (from all analyses and data sources) shall be added to the network as a basis - * **Note**: Depending on the chosen author relation, there may be isolates then - * [`TRUE`, *`FALSE`*] -- `author.only.committers` - * Remove all authors from an author network (including bipartite and multi networks) who are not present in an author network constructed with `artifact.relation` as relation, i.e., all authors that have no biparite relations in a bipartite/multi network are removed. - * [`TRUE`, *`FALSE`*] -- `artifact.relation` - * The relation among artifacts, encoded as edges in an artifact network - * **Note**: This relation configures also the author--artifact relation in bipartite and multi networks! - * possible values: [*`"cochange"`*, `"callgraph"`, `"mail"`, `"issue"`] -- `artifact.directed` - * The (time-based) directedness of edges in an artifact network - * **Note**: This parameter does not take effect for now, as the co-change relation is always undirected, while the call-graph relation is always directed. - * [`TRUE`, *`FALSE`*] -- `edge.attributes` - * The list of edge-attribute names and information - * a subset of the following as a single vector: - - timestamp information: *`"date"`* - - author information: `"author.name"`, `"author.email"` - - e-mail information: *`"message.id"`*, *`"thread"`*, `"subject"` - - commit information: *`"hash"`*, *`"file"`*, *`"artifact.type"`*, *`"artifact"`*, `"changed.files"`, `"added.lines"`, `"deleted.lines"`, `"diff.size"`, `"artifact.diff.size"`, `"synchronicity"` - - PaStA information: `"pasta"`, - - issue information: *`"issue.id"`*, *`"event.name"`*, `"issue.state"`, `"creation.date"`, `"closing.date"`, `"is.pull.request"` - * **Note**: `"date"` is always included as this information is needed for several parts of the library, e.g., time-based splitting. - * **Note**: For each type of network that can be built, only the applicable part of the given vector of names is respected. - * **Note**: For the edge attributes `"pasta"` and `"synchronicty"`, the network configuration's parameters `pasta` and `synchronicity` need to be set to `TRUE`, respectively (see below). 
-- `simplify` - * Perform edge contraction to retrieve a simplified network - * [`TRUE`, *`FALSE`*] -- `skip.threshold` - * The upper bound for total amount of edges to build for a subset of the data, i.e., not building any edges for the subset exceeding the limit - * any positive integer - * **Example**: The amount of `mail`-based directed edges in an author network for one thread with 100 authors is 5049. - A value of 5000 for `skip.threshold` would lead to the omission of this thread from the network. - -The classes `ProjectData` and `RangeData` hold instances of the `NetworkConf` class, just pass the object as parameter to the constructor. -You can also update the object at any time, but as soon as you do so, all -cached data of the data object are reset and have to be rebuilt. - -For more examples, please look in the file `test.R`. - -## ProjectConf +### ProjectConf In this section, we give an overview of the parameters of the `ProjectConf` class and their meaning. All parameters can be retrieved with the method `ProjectConf$get.entry(...)`, by passing one parameter name as the method's argument. There is no way to update the entries, except for the revision-based parameters. -### Basic Information +#### Basic Information - `project` * The project name from the Codeface analysis @@ -161,7 +108,7 @@ There is no way to update the entries, except for the revision-based parameters. - `mailinglists` * A list of the mailinglists of the project containing their name, type and source -### Artifact-Related Information +#### Artifact-Related Information - `artifact` * The artifact of the project used for all data retrievals @@ -175,9 +122,9 @@ There is no way to update the entries, except for the revision-based parameters. * The Codeface tagging parameter for the project, based on the `artifact` parameter * Either `"proximity"` or `"feature"` -### Revision-Related Information +#### Revision-Related Information -**Note**: This data is updated after performing a data-based splitting (i.e., by calling the functions `split.data.*`). +**Note**: This data is updated after performing a data-based splitting (i.e., by calling the functions `split.data.*(...)`). **Note**: These parameters can be updated using the method `ProjectConf$set.splitting.info()`, but you should *not* do that manually! - `revisions` @@ -192,7 +139,7 @@ There is no way to update the entries, except for the revision-based parameters. - `ranges.callgraph` * The revision ranges based on the list `revisions.callgraph` -### Data Paths +#### Data Paths - `datapath` * The data path to the Codeface results folder of this project @@ -203,9 +150,9 @@ There is no way to update the entries, except for the revision-based parameters. - `datapath.pasta` * The data path to the PaStA data -### Splitting Information +#### Splitting Information -**Note**: This data is added to the `ProjectConf` object only after performing a data-based splitting (by calling the functions `split.data.*`). +**Note**: This data is added to the `ProjectConf` object only after performing a data-based splitting (by calling the functions `split.data.*(...)`). **Note**: These parameters can be updated using the method `ProjectConf$set.splitting.info()`, but you should *not* do that manually! - `split.type` @@ -223,13 +170,13 @@ There is no way to update the entries, except for the revision-based parameters.
- `split.ranges` * The ranges constructed from `split.revisions` (either in a sliding-window manner or not, depending on `split.sliding.window`) -### Data-Retrieval-Related Parameters (Configurable!) +#### (Configurable) Data-Retrieval-Related Parameters **Note**: These parameters can be configured using the method `ProjectConf$update.values()`. - `artifact.filter.base` - Remove all artifact information regarding the base artifact - (`Base_Feature` or `File_Level` for features and functions, respectively, as artifacts) + (`"Base_Feature"` or `"File_Level"` for features and functions, respectively, as artifacts) - [*`TRUE`*, `FALSE`] - `synchronicity` * Read and add synchronicity data to commits and co-change-based networks @@ -244,29 +191,96 @@ There is no way to update the entries, except for the revision-based parameters. * [`TRUE`, *`FALSE`*] * **Note**: To include PaStA-based edge attributes, you need to add the `"pasta"` edge attribute to `edge.attributes`. +### NetworkConf + +In this section, we give an overview of the parameters of the `NetworkConf` class and their meaning. + +All parameters can be retrieved with the method `NetworkConf$get.variable(...)`, by passing one parameter name as the method's argument. +Updates to the parameters can be done by calling `NetworkConf$update.values(...)` and passing a list of parameter names and their respective values. + +**Note**: Default values are shown in *italics*. + +- `author.relation` + * The relation among authors, encoded as edges in an author network + * **Note**: The author--artifact relation in bipartite and multi networks is configured by `artifact.relation`! + * possible values: [*`"mail"`*, `"cochange"`, `"issue"`] +- `author.directed` + * The (time-based) directedness of edges in an author network + * [`TRUE`, *`FALSE`*] +- `author.all.authors` + * Denotes whether all available authors (from all analyses and data sources) shall be added to the network as a basis + * **Note**: Depending on the chosen author relation, the network may then contain isolates + * [`TRUE`, *`FALSE`*] +- `author.only.committers` + * Remove all authors from an author network (including bipartite and multi networks) who are not present in an author network constructed with `artifact.relation` as relation, i.e., all authors that have no bipartite relations in a bipartite/multi network are removed. + * [`TRUE`, *`FALSE`*] +- `artifact.relation` + * The relation among artifacts, encoded as edges in an artifact network + * **Note**: This relation also configures the author--artifact relation in bipartite and multi networks! + * possible values: [*`"cochange"`*, `"callgraph"`, `"mail"`, `"issue"`] +- `artifact.directed` + * The (time-based) directedness of edges in an artifact network + * **Note**: This parameter does not take effect for now, as the co-change relation is always undirected, while the call-graph relation is always directed.
+ * [`TRUE`, *`FALSE`*] +- `edge.attributes` + * The list of edge-attribute names and information + * a subset of the following as a single vector: + - timestamp information: *`"date"`* + - author information: `"author.name"`, `"author.email"` + - e-mail information: *`"message.id"`*, *`"thread"`*, `"subject"` + - commit information: *`"hash"`*, *`"file"`*, *`"artifact.type"`*, *`"artifact"`*, `"changed.files"`, `"added.lines"`, `"deleted.lines"`, `"diff.size"`, `"artifact.diff.size"`, `"synchronicity"` + - PaStA information: `"pasta"` + - issue information: *`"issue.id"`*, *`"event.name"`*, `"issue.state"`, `"creation.date"`, `"closing.date"`, `"is.pull.request"` + * **Note**: `"date"` is always included as this information is needed for several parts of the library, e.g., time-based splitting. + * **Note**: For each type of network that can be built, only the applicable part of the given vector of names is respected. + * **Note**: For the edge attributes `"pasta"` and `"synchronicity"`, the project configuration's parameters `pasta` and `synchronicity` need to be set to `TRUE`, respectively (see above). +- `simplify` + * Perform edge contraction to retrieve a simplified network + * [`TRUE`, *`FALSE`*] +- `skip.threshold` + * The upper bound for the total number of edges to build for a subset of the data, i.e., no edges are built for a subset exceeding this limit + * any positive integer + * **Example**: The number of `mail`-based directed edges in an author network for one thread with 100 authors is 5049. + A value of 5000 for `skip.threshold` (as it is smaller than 5049) would lead to the omission of this thread from the network. +- `unify.date.ranges` + * Cut the data sources to the latest start date and the earliest end date across all data sources + * **Note**: This parameter does not affect the original data object, but rather creates a clone. + * [`TRUE`, *`FALSE`*] + +The `NetworkBuilder` class holds an instance of the `NetworkConf` class; just pass the configuration object as a parameter to the constructor, together with a `ProjectData` or `RangeData` object. +You can also update the `NetworkConf` object at any time, but as soon as you do so, all cached data of the network builder are reset and have to be rebuilt. + +For more examples, please look in the file `showcase.R`.
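To illustrate how the two configuration classes interact, here is a minimal sketch that combines the calls shown in the *How-To* section above and in the test suite; the concrete parameter values are only examples:

```R
## project-level configuration (data paths etc.); constants as in the How-To section
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)

## read an entry from the project configuration (read-only access)
ranges = proj.conf$get.entry("ranges")

## network-level configuration: update several parameters at once by passing a list ...
net.conf = NetworkConf$new()
net.conf$update.values(list(author.relation = "cochange", simplify = TRUE))

## ... or update a single parameter
net.conf$update.value(entry = "unify.date.ranges", value = TRUE)

## the NetworkConf is passed to the NetworkBuilder together with the data object;
## any later update to net.conf resets the builder's cached networks
data = ProjectData$new(proj.conf)
netbuilder = NetworkBuilder$new(data, net.conf)
bpn = netbuilder$get.bipartite.network()
```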
+ ## File overview +- `util-init.R` + * Initialization file that can be used by other analysis projects (see Section [*Submodule*](#submodule)) - `util-conf.R` * The configuration classes of the project +- `util-read.R` + * Functionality to read data files from disk - `util-data.R` * All representations of the data classes -- `util-plot.R` - * Everything needed for plotting networks -- `util-misc.R` - * Helper functions and also legacy functions, both needed in the other files +- `util-networks.R` + * The `NetworkBuilder` class and all corresponding helper functions to construct networks - `util-split.R` * Splitting functionality for data objects and networks (time-based and activity-based, using arbitrary ranges) - `util-motifs.R` * Functionality for the identification of network motifs (subgraph patterns) - `util-bulk.R` * Collection functionality for the different network types (using Codeface revision ranges) +- `util-plot.R` + * Everything needed for plotting networks - `util-core-peripheral.R` * Author classification (core and peripheral) and related functions -- `util-init.R` - * Initialization file that can be used by other analysis projects (see Section *Submodule*) -- `test.R` - * Showcase file (see Section *How-To*) +- `util-networks-metrics.R` + * A set of network-metric functions +- `util-misc.R` + * Helper functions and also legacy functions, both needed in the other files +- `showcase.R` + * Showcase file (see also Section [*How-To*](#how-to)) - `tests.R` * Test suite (running all tests in `tests/` subfolder) diff --git a/test.R b/showcase.R similarity index 99% rename from test.R rename to showcase.R index 19ff793c..ea455e38 100644 --- a/test.R +++ b/showcase.R @@ -63,7 +63,7 @@ x = NetworkBuilder$new(project.data = x.data, network.conf = net.conf) ## * Data retrieval -------------------------------------------------------- -# x.data$get.commits.raw() +# x.data$get.commits() # x.data$get.synchronicity() # x.data$get.author2artifact() # x.data$get.commits.filtered() @@ -107,7 +107,7 @@ y = NetworkBuilder$new(project.data = y.data, network.conf = net.conf) ## * Data retrieval -------------------------------------------------------- -# y.data$get.commits.raw() +# y.data$get.commits() # y.data$get.synchronicity() # y.data$get.author2artifact() # y.data$get.commits.filtered() diff --git a/tests/codeface-data/results/testing/test_feature/feature/issues.list b/tests/codeface-data/results/testing/test_feature/feature/issues.list index 9c6a939a..898bec9a 100644 --- a/tests/codeface-data/results/testing/test_feature/feature/issues.list +++ b/tests/codeface-data/results/testing/test_feature/feature/issues.list @@ -1,36 +1,36 @@ -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";11;"Karl";"karl@example.org";"2013-04-21 23:52:09";"created" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";11;"Karl";"karl@example.org";"2013-05-05 23:28:57";"commented" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";1;"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"referenced" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";1;"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"merged" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";1;"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"closed" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";11;"Karl";"karl@example.org";"2013-06-01 22:37:03";"head_ref_deleted" -2;"CLOSED";"2013-04-21 23:52:09";"2014-05-25 20:02:08";"true";1342;"Thomas";"thomas@example.org";"2016-07-19
10:47:25";"referenced" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";15;"udo";"udo@example.org";"2016-04-17 02:07:37";"mentioned" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";15;"udo";"udo@example.org";"2016-04-17 02:07:37";"subscribed" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";1350;"Thomas";"thomas@example.org";"2016-07-14 02:03:14";"commented" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-14 17:42:52";"commented" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-15 08:37:57";"mentioned" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-15 08:37:57";"subscribed" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";1350;"Thomas";"thomas@example.org";"2016-07-15 08:37:57";"commented" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-27 22:25:25";"mentioned" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-27 22:25:25";"subscribed" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";1;"Olaf";"olaf@example.org";"2016-07-27 22:25:25";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1342;"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"mentioned" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1342;"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"subscribed" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 15:59:25";"created" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 16:03:23";"renamed" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 16:05:47";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-08-31 18:21:48";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1;"Olaf";"olaf@example.org";"2016-10-05 01:07:46";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-10-13 15:33:56";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-06 14:03:42";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1;"Olaf";"olaf@example.org";"2016-12-07 15:37:02";"merged" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1;"Olaf";"olaf@example.org";"2016-12-07 15:37:02";"closed" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1;"Olaf";"olaf@example.org";"2016-12-07 15:37:21";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-07 15:53:02";"commented" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-07 15:53:02";"created" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2017-02-20 22:25:41";"commented" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2017-03-02 17:30:10";"commented" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";1;"Max";"max@example.org";"2017-05-23 12:32:21";"merged" 
-57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";1;"Max";"max@example.org";"2017-05-23 12:32:21";"closed" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";1;"Max";"max@example.org";"2017-05-23 12:32:39";"commented" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Karl";"karl@example.org";"2013-04-21 23:52:09";"";"created" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Karl";"karl@example.org";"2013-05-05 23:28:57";"";"commented" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Karl";"karl@example.org";"2013-05-05 23:28:57";"";"referenced" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"";"merged" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"";"closed" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Karl";"karl@example.org";"2013-06-01 22:37:03";"";"head_ref_deleted" +2;"CLOSED";"2013-04-21 23:52:09";"2014-05-25 20:02:08";"true";"Thomas";"thomas@example.org";"2016-07-19 10:47:25";"";"referenced" +48;"OPEN";"2016-04-17 02:06:38";;"false";"udo";"udo@example.org";"2016-04-17 02:07:37";"Karl";"mentioned" +48;"OPEN";"2016-04-17 02:06:38";;"false";"udo";"udo@example.org";"2016-04-17 02:07:37";"Karl";"subscribed" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Thomas";"thomas@example.org";"2016-07-14 02:03:14";;"commented" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-14 17:42:52";"";"commented" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-15 08:37:57";"Thomas";"mentioned" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-15 08:37:57";"Thomas";"subscribed" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Thomas";"thomas@example.org";"2016-07-15 08:37:57";"";"commented" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-27 22:25:25";"udo";"mentioned" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-27 22:25:25";"udo";"subscribed" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Olaf";"olaf@example.org";"2016-07-27 22:25:25";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"Claus Hunsen";"mentioned" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"Claus Hunsen";"subscribed" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 15:59:25";"";"created" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 16:03:23";"";"renamed" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 16:05:47";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-08-31 18:21:48";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Olaf";"olaf@example.org";"2016-10-05 01:07:46";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-10-13 15:33:56";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-06 
14:03:42";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Olaf";"olaf@example.org";"2016-12-07 15:37:02";"";"merged" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Olaf";"olaf@example.org";"2016-12-07 15:37:02";"";"closed" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Olaf";"olaf@example.org";"2016-12-07 15:37:21";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-07 15:53:02";"";"commented" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-07 15:53:02";"";"created" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2017-02-20 22:25:41";"";"commented" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2017-03-02 17:30:10";"";"commented" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Max";"max@example.org";"2017-05-23 12:32:21";"";"merged" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Max";"max@example.org";"2017-05-23 12:32:21";"";"closed" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Max";"max@example.org";"2017-05-23 12:32:39";"";"commented" diff --git a/tests/codeface-data/results/testing/test_pasta/similar-mailbox b/tests/codeface-data/results/testing/test_pasta/similar-mailbox index 546e1bad..ffba64c4 100644 --- a/tests/codeface-data/results/testing/test_pasta/similar-mailbox +++ b/tests/codeface-data/results/testing/test_pasta/similar-mailbox @@ -2,4 +2,4 @@ => 5a5ec9675e98187e1e92561e1888aa6f04faa338 => 3a0ed78458b3976243db6829f63eba3eead26774 => 1143db502761379c2bfcecc2007fc34282e7ee61 - => 0a1a5c523d835459c42f33e863623138555e2526 + => 0a1a5c523d835459c42f33e863623138555e2526 72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0 diff --git a/tests/codeface-data/results/testing/test_proximity/proximity/issues.list b/tests/codeface-data/results/testing/test_proximity/proximity/issues.list index 9c6a939a..898bec9a 100644 --- a/tests/codeface-data/results/testing/test_proximity/proximity/issues.list +++ b/tests/codeface-data/results/testing/test_proximity/proximity/issues.list @@ -1,36 +1,36 @@ -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";11;"Karl";"karl@example.org";"2013-04-21 23:52:09";"created" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";11;"Karl";"karl@example.org";"2013-05-05 23:28:57";"commented" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";1;"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"referenced" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";1;"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"merged" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";1;"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"closed" -2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";11;"Karl";"karl@example.org";"2013-06-01 22:37:03";"head_ref_deleted" -2;"CLOSED";"2013-04-21 23:52:09";"2014-05-25 20:02:08";"true";1342;"Thomas";"thomas@example.org";"2016-07-19 10:47:25";"referenced" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";15;"udo";"udo@example.org";"2016-04-17 02:07:37";"mentioned" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";15;"udo";"udo@example.org";"2016-04-17 02:07:37";"subscribed" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";1350;"Thomas";"thomas@example.org";"2016-07-14 02:03:14";"commented" -48;"OPEN";"2016-04-17 
02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-14 17:42:52";"commented" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-15 08:37:57";"mentioned" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-15 08:37:57";"subscribed" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";1350;"Thomas";"thomas@example.org";"2016-07-15 08:37:57";"commented" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-27 22:25:25";"mentioned" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-27 22:25:25";"subscribed" -48;"OPEN";"2016-04-17 02:06:38";"null";"false";1;"Olaf";"olaf@example.org";"2016-07-27 22:25:25";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1342;"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"mentioned" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1342;"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"subscribed" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 15:59:25";"created" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 16:03:23";"renamed" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 16:05:47";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-08-31 18:21:48";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1;"Olaf";"olaf@example.org";"2016-10-05 01:07:46";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-10-13 15:33:56";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-06 14:03:42";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1;"Olaf";"olaf@example.org";"2016-12-07 15:37:02";"merged" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1;"Olaf";"olaf@example.org";"2016-12-07 15:37:02";"closed" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";1;"Olaf";"olaf@example.org";"2016-12-07 15:37:21";"commented" -51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-07 15:53:02";"commented" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-07 15:53:02";"created" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2017-02-20 22:25:41";"commented" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";13;"Claus Hunsen";"hunsen@fim.uni-passau.de";"2017-03-02 17:30:10";"commented" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";1;"Max";"max@example.org";"2017-05-23 12:32:21";"merged" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";1;"Max";"max@example.org";"2017-05-23 12:32:21";"closed" -57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";1;"Max";"max@example.org";"2017-05-23 12:32:39";"commented" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Karl";"karl@example.org";"2013-04-21 23:52:09";"";"created" +2;"CLOSED";"2013-04-21 
23:52:09";"2013-05-25 20:02:08";"true";"Karl";"karl@example.org";"2013-05-05 23:28:57";"";"commented" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Karl";"karl@example.org";"2013-05-05 23:28:57";"";"referenced" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"";"merged" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Olaf";"olaf@example.org";"2013-05-25 20:02:08";"";"closed" +2;"CLOSED";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"true";"Karl";"karl@example.org";"2013-06-01 22:37:03";"";"head_ref_deleted" +2;"CLOSED";"2013-04-21 23:52:09";"2014-05-25 20:02:08";"true";"Thomas";"thomas@example.org";"2016-07-19 10:47:25";"";"referenced" +48;"OPEN";"2016-04-17 02:06:38";;"false";"udo";"udo@example.org";"2016-04-17 02:07:37";"Karl";"mentioned" +48;"OPEN";"2016-04-17 02:06:38";;"false";"udo";"udo@example.org";"2016-04-17 02:07:37";"Karl";"subscribed" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Thomas";"thomas@example.org";"2016-07-14 02:03:14";;"commented" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-14 17:42:52";"";"commented" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-15 08:37:57";"Thomas";"mentioned" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-15 08:37:57";"Thomas";"subscribed" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Thomas";"thomas@example.org";"2016-07-15 08:37:57";"";"commented" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-27 22:25:25";"udo";"mentioned" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-27 22:25:25";"udo";"subscribed" +48;"OPEN";"2016-04-17 02:06:38";;"false";"Olaf";"olaf@example.org";"2016-07-27 22:25:25";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"Claus Hunsen";"mentioned" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"Claus Hunsen";"subscribed" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 15:59:25";"";"created" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 16:03:23";"";"renamed" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-07-12 16:05:47";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-08-31 18:21:48";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Olaf";"olaf@example.org";"2016-10-05 01:07:46";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-10-13 15:33:56";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-06 14:03:42";"";"commented" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Olaf";"olaf@example.org";"2016-12-07 15:37:02";"";"merged" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Olaf";"olaf@example.org";"2016-12-07 15:37:02";"";"closed" +51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Olaf";"olaf@example.org";"2016-12-07 15:37:21";"";"commented" 
+51;"CLOSED";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-07 15:53:02";"";"commented" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2016-12-07 15:53:02";"";"created" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2017-02-20 22:25:41";"";"commented" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Claus Hunsen";"hunsen@fim.uni-passau.de";"2017-03-02 17:30:10";"";"commented" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Max";"max@example.org";"2017-05-23 12:32:21";"";"merged" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Max";"max@example.org";"2017-05-23 12:32:21";"";"closed" +57;"CLOSED";"2016-12-07 15:53:02";"2017-05-23 12:32:21";"true";"Max";"max@example.org";"2017-05-23 12:32:39";"";"commented" diff --git a/tests/test-data-cut.R b/tests/test-data-cut.R new file mode 100644 index 00000000..3b3f461d --- /dev/null +++ b/tests/test-data-cut.R @@ -0,0 +1,64 @@ +## (c) Christian Hechtl, 2017 +## hechtl@fim.uni-passau.de + + +context("Cutting functionality on ProjectData side.") + +## +## Context +## + +CF.DATA = file.path(".", "codeface-data") +CF.SELECTION.PROCESS = "testing" +CASESTUDY = "test" +ARTIFACT = "feature" + +## use only when debugging this file independently +if (!dir.exists(CF.DATA)) CF.DATA = file.path(".", "tests", "codeface-data") + +test_that("Cut commit and mail data to same date range.", { + + ## configurations + + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) + data.sources = c("mails", "commits") + + ## construct objects + + x.data = ProjectData$new(proj.conf) + + commit.data.expected = data.frame(commit.id=sprintf("", c(32712,32712,32713,32713)), + date=as.POSIXct(c("2016-07-12 15:58:59","2016-07-12 15:58:59","2016-07-12 16:00:45", + "2016-07-12 16:00:45")), + author.name=c("Claus Hunsen","Claus Hunsen","Olaf","Olaf"), + author.email=c("hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","olaf@example.org", + "olaf@example.org"), + hash=c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0","72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", + "5a5ec9675e98187e1e92561e1888aa6f04faa338","5a5ec9675e98187e1e92561e1888aa6f04faa338"), + changed.files=as.integer(c(1,1,1,1)), + added.lines=as.integer(c(1,1,1,1)), + deleted.lines=as.integer(c(1,1,0,0)), + diff.size=as.integer(c(2,2,1,1)), + file=c("test.c","test.c","test.c","test.c"), + artifact=c("A","defined(A)","A","defined(A)"), + artifact.type=c("Feature","FeatureExpression","Feature","FeatureExpression"), + artifact.diff.size=as.integer(c(1,1,1,1))) + + mail.data.expected = data.frame(author.name=c("Thomas"), + author.email=c("thomas@example.org"), + message.id=c("<65a1sf31sagd684dfv31@mail.gmail.com>"), + date=as.POSIXct(c("2016-07-12 16:04:40")), + date.offset=as.integer(c(100)), + subject=c("Re: Fw: busybox 2 tab"), + thread=sprintf("", c(9))) + + commit.data = x.data$get.data.cut.to.same.date(data.sources = data.sources)$get.commits() + rownames(commit.data) = 1:nrow(commit.data) + + mail.data = x.data$get.data.cut.to.same.date(data.sources = data.sources)$get.mails() + rownames(mail.data) = 1:nrow(mail.data) + + expect_identical(commit.data, commit.data.expected, info = "Cut Raw commit data.") + expect_identical(mail.data, mail.data.expected, info = "Cut mail data.") + +}) diff --git a/tests/test-networks-cut.R b/tests/test-networks-cut.R new file mode 100644 index 
00000000..9d7985e1 --- /dev/null +++ b/tests/test-networks-cut.R @@ -0,0 +1,66 @@ +## (c) Christian Hechtl, 2017 +## hechtl@fim.uni-passau.de + + +context("Cutting functionality on NetworkBuilder side.") + +## +## Context +## + +CF.DATA = file.path(".", "codeface-data") +CF.SELECTION.PROCESS = "testing" +CASESTUDY = "test" +ARTIFACT = "feature" + +## use only when debugging this file independently +if (!dir.exists(CF.DATA)) CF.DATA = file.path(".", "tests", "codeface-data") + +test_that("Cut commit and mail data to same date range.", { + + ## configurations + + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) + net.conf = NetworkConf$new() + net.conf$update.value(entry = "unify.date.ranges", value = TRUE) + + ## construct objects + + x.data = ProjectData$new(proj.conf) + x = NetworkBuilder$new(x.data, net.conf) + + commit.data.expected = data.frame(commit.id=sprintf("", c(32712,32712,32713,32713)), + date=as.POSIXct(c("2016-07-12 15:58:59","2016-07-12 15:58:59","2016-07-12 16:00:45", + "2016-07-12 16:00:45")), + author.name=c("Claus Hunsen","Claus Hunsen","Olaf","Olaf"), + author.email=c("hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","olaf@example.org", + "olaf@example.org"), + hash=c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0","72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", + "5a5ec9675e98187e1e92561e1888aa6f04faa338","5a5ec9675e98187e1e92561e1888aa6f04faa338"), + changed.files=as.integer(c(1,1,1,1)), + added.lines=as.integer(c(1,1,1,1)), + deleted.lines=as.integer(c(1,1,0,0)), + diff.size=as.integer(c(2,2,1,1)), + file=c("test.c","test.c","test.c","test.c"), + artifact=c("A","defined(A)","A","defined(A)"), + artifact.type=c("Feature","FeatureExpression","Feature","FeatureExpression"), + artifact.diff.size=as.integer(c(1,1,1,1))) + + mail.data.expected = data.frame(author.name=c("Thomas"), + author.email=c("thomas@example.org"), + message.id=c("<65a1sf31sagd684dfv31@mail.gmail.com>"), + date=as.POSIXct(c("2016-07-12 16:04:40")), + date.offset=as.integer(c(100)), + subject=c("Re: Fw: busybox 2 tab"), + thread=sprintf("", c(9))) + + commit.data = x$get.project.data()$get.commits() + rownames(commit.data) = 1:nrow(commit.data) + + mail.data = x$get.project.data()$get.mails() + rownames(mail.data) = 1:nrow(mail.data) + + expect_identical(commit.data, commit.data.expected, info = "Cut Raw commit data.") + expect_identical(mail.data, mail.data.expected, info = "Cut mail data.") + +}) diff --git a/tests/test-read.R b/tests/test-read.R index 2d224797..6363c571 100644 --- a/tests/test-read.R +++ b/tests/test-read.R @@ -22,7 +22,7 @@ test_that("Read the raw commit data.", { proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) ## read the actual data - commit.data.read = read.commits.raw(proj.conf$get.value("datapath"), proj.conf$get.value("artifact")) + commit.data.read = read.commits(proj.conf$get.value("datapath"), proj.conf$get.value("artifact")) ## build the expected data.frame commit.data.expected = data.frame(commit.id=sprintf("", c(32712,32712,32713,32713,32710,32710,32714,32711,32711)), @@ -158,17 +158,20 @@ test_that("Read and parse the pasta data.", { ## build the expected data.frame pasta.data.expected = data.frame(message.id=c("","", "","", - "","",""), + "","","", + ""), commit.hash=c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0","5a5ec9675e98187e1e92561e1888aa6f04faa338", "3a0ed78458b3976243db6829f63eba3eead26774","1143db502761379c2bfcecc2007fc34282e7ee61", 
"1143db502761379c2bfcecc2007fc34282e7ee61","1143db502761379c2bfcecc2007fc34282e7ee61", - "0a1a5c523d835459c42f33e863623138555e2526")) + "0a1a5c523d835459c42f33e863623138555e2526", "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0")) ## check the results expect_identical(pasta.data.read, pasta.data.expected, info = "PaStA data.") }) test_that("Read and parse the issue data.", { + ## FIXME @Roger1995: update issues.list with a more recent content! + ## configuration object for the datapath proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) @@ -181,9 +184,10 @@ test_that("Read and parse the issue data.", { creation.date=as.POSIXct(rep(c("2013-04-21 23:52:09","2016-04-17 02:06:38","2016-07-12 15:59:25","2016-04-17 02:06:38","2013-04-21 23:52:09","2016-04-17 02:06:38","2016-07-12 15:59:25","2016-12-07 15:53:02"), c(6,2,5,5,1,3,8,6))), closing.date=as.POSIXct(rep(c("2013-05-25 20:02:08",NA,"2016-12-07 15:37:02",NA,"2014-05-25 20:02:08",NA,"2016-12-07 15:37:02","2017-05-23 12:32:21"), c(6,2,5,5,1,3,8,6))), is.pull.request=rep(c(TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,TRUE), c(6,2,5,5,1,3,8,6)), - author.name=c("Karl","Karl","Olaf","Olaf","Olaf","Karl","udo","udo","Thomas","Thomas","Claus Hunsen","Claus Hunsen","Claus Hunsen","Thomas","Claus Hunsen","Claus Hunsen","Claus Hunsen","Thomas","Thomas","Claus Hunsen","Claus Hunsen","Olaf","Claus Hunsen","Olaf","Claus Hunsen","Claus Hunsen","Olaf","Olaf","Olaf","Claus Hunsen","Claus Hunsen","Claus Hunsen","Claus Hunsen","Max","Max","Max"), - author.email=c("karl@example.org","karl@example.org","olaf@example.org","olaf@example.org","olaf@example.org","karl@example.org","udo@example.org","udo@example.org","thomas@example.org","thomas@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","thomas@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","thomas@example.org","thomas@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","olaf@example.org","hunsen@fim.uni-passau.de","olaf@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","olaf@example.org","olaf@example.org","olaf@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","max@example.org","max@example.org","max@example.org"), - date=as.POSIXct(c("2013-04-21 23:52:09","2013-05-05 23:28:57","2013-05-25 20:02:08","2013-05-25 20:02:08","2013-05-25 20:02:08","2013-06-01 22:37:03","2016-04-17 02:07:37","2016-04-17 02:07:37","2016-07-12 15:59:25","2016-07-12 15:59:25","2016-07-12 15:59:25","2016-07-12 16:03:23","2016-07-12 16:05:47","2016-07-14 02:03:14","2016-07-14 17:42:52","2016-07-15 08:37:57","2016-07-15 08:37:57","2016-07-15 08:37:57","2016-07-19 10:47:25","2016-07-27 22:25:25","2016-07-27 22:25:25","2016-07-27 22:25:25","2016-08-31 18:21:48","2016-10-05 01:07:46","2016-10-13 15:33:56","2016-12-06 14:03:42","2016-12-07 15:37:02","2016-12-07 15:37:02","2016-12-07 15:37:21","2016-12-07 15:53:02","2016-12-07 15:53:02","2017-02-20 22:25:41","2017-03-02 17:30:10","2017-05-23 12:32:21","2017-05-23 12:32:21","2017-05-23 12:32:39")), + author.name=c("Karl","Karl","Karl","Olaf","Olaf","Karl","udo","udo","Thomas","Thomas","Claus Hunsen","Claus Hunsen","Claus Hunsen","Thomas","Claus Hunsen","Claus Hunsen","Claus Hunsen","Thomas","Thomas","Claus Hunsen","Claus Hunsen","Olaf","Claus Hunsen","Olaf","Claus Hunsen","Claus Hunsen","Olaf","Olaf","Olaf","Claus Hunsen","Claus Hunsen","Claus Hunsen","Claus 
Hunsen","Max","Max","Max"), + author.email=c("karl@example.org","karl@example.org","karl@example.org","olaf@example.org","olaf@example.org","karl@example.org","udo@example.org","udo@example.org","thomas@example.org","thomas@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","thomas@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","thomas@example.org","thomas@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","olaf@example.org","hunsen@fim.uni-passau.de","olaf@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","olaf@example.org","olaf@example.org","olaf@example.org","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","hunsen@fim.uni-passau.de","max@example.org","max@example.org","max@example.org"), + date=as.POSIXct(c("2013-04-21 23:52:09","2013-05-05 23:28:57","2013-05-05 23:28:57","2013-05-25 20:02:08","2013-05-25 20:02:08","2013-06-01 22:37:03","2016-04-17 02:07:37","2016-04-17 02:07:37","2016-07-12 15:59:25","2016-07-12 15:59:25","2016-07-12 15:59:25","2016-07-12 16:03:23","2016-07-12 16:05:47","2016-07-14 02:03:14","2016-07-14 17:42:52","2016-07-15 08:37:57","2016-07-15 08:37:57","2016-07-15 08:37:57","2016-07-19 10:47:25","2016-07-27 22:25:25","2016-07-27 22:25:25","2016-07-27 22:25:25","2016-08-31 18:21:48","2016-10-05 01:07:46","2016-10-13 15:33:56","2016-12-06 14:03:42","2016-12-07 15:37:02","2016-12-07 15:37:02","2016-12-07 15:37:21","2016-12-07 15:53:02","2016-12-07 15:53:02","2017-02-20 22:25:41","2017-03-02 17:30:10","2017-05-23 12:32:21","2017-05-23 12:32:21","2017-05-23 12:32:39")), + ref.name=c(rep("", 6), rep("Karl", 2), rep("Claus Hunsen", 2), rep("", 5), rep("Thomas", 2), rep("", 2), rep("udo", 2), rep("", 15)), event.name=c("created","commented","referenced","merged","closed","head_ref_deleted","mentioned","subscribed","mentioned","subscribed","created","renamed","commented","commented","commented","mentioned","subscribed","commented","referenced","mentioned","subscribed","commented","commented","commented","commented","commented","merged","closed","commented","commented","created","commented","commented","merged","closed","commented")) ## calculate event IDs issue.data.expected[["event.id"]] = sapply( diff --git a/tests/test-split.R b/tests/test-split.R index 9f34aa0e..2f2ac965 100644 --- a/tests/test-split.R +++ b/tests/test-split.R @@ -46,7 +46,7 @@ test_that("Split a data object time-based (split.basis == 'commits').", { ## data object project.data = ProjectData$new(proj.conf) data = list( - commits.raw = project.data$get.commits.raw(), + commits = project.data$get.commits(), mails = project.data$get.mails(), issues = project.data$get.issues(), synchronicity = project.data$get.synchronicity(), @@ -68,10 +68,10 @@ test_that("Split a data object time-based (split.basis == 'commits').", { ## check data for all ranges expected.data = list( - commits.raw = list( - "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$commits.raw[1:4, ], + commits = list( + "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$commits[1:4, ], "2016-07-12 16:01:59-2016-07-12 16:04:59" = data.frame(), - "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$commits.raw[5:9, ] + "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$commits[5:9, ] ), mails = list( "2016-07-12 15:58:59-2016-07-12 16:01:59" = data.frame(), @@ -95,7 +95,7 @@ test_that("Split a data object time-based (split.basis == 'commits').", { ) ) results.data = list( - commits.raw = lapply(results, 
function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -119,7 +119,7 @@ test_that("Split a data object time-based (split.basis == 'mails').", { ## data object project.data = ProjectData$new(proj.conf) data = list( - commits.raw = project.data$get.commits.raw(), + commits = project.data$get.commits(), mails = project.data$get.mails(), issues = project.data$get.issues(), synchronicity = project.data$get.synchronicity(), @@ -142,11 +142,11 @@ test_that("Split a data object time-based (split.basis == 'mails').", { ## check data for all ranges expected.data = list( - commits.raw = list( + commits = list( "2004-10-09 18:38:13-2007-10-09 18:38:13" = data.frame(), "2007-10-09 18:38:13-2010-10-09 18:38:13" = data.frame(), "2010-10-09 18:38:13-2013-10-09 18:38:13" = data.frame(), - "2013-10-09 18:38:13-2016-07-12 16:05:38" = data$commits.raw[1:4, ] + "2013-10-09 18:38:13-2016-07-12 16:05:38" = data$commits[1:4, ] ), mails = list( "2004-10-09 18:38:13-2007-10-09 18:38:13" = data$mails[rownames(data$mails) %in% 1:2, ], @@ -174,7 +174,7 @@ test_that("Split a data object time-based (split.basis == 'mails').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -198,7 +198,7 @@ test_that("Split a data object time-based (split.basis == 'issues').", { ## data object project.data = ProjectData$new(proj.conf) data = list( - commits.raw = project.data$get.commits.raw(), + commits = project.data$get.commits(), mails = project.data$get.mails(), issues = project.data$get.issues(), synchronicity = project.data$get.synchronicity(), @@ -220,9 +220,9 @@ test_that("Split a data object time-based (split.basis == 'issues').", { ## check data for all ranges expected.data = list( - commits.raw = list( + commits = list( "2013-04-21 23:52:09-2015-04-21 23:52:09" = data.frame(), - "2015-04-21 23:52:09-2017-04-21 23:52:09" = data$commits.raw, + "2015-04-21 23:52:09-2017-04-21 23:52:09" = data$commits, "2017-04-21 23:52:09-2017-05-23 12:32:40" = data.frame() ), mails = list( @@ -247,7 +247,7 @@ test_that("Split a data object time-based (split.basis == 'issues').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -271,7 +271,7 @@ test_that("Split a data object time-based (bins == ... ).", { ## data object project.data = ProjectData$new(proj.conf) data = list( - commits.raw = project.data$get.commits.raw(), + commits = project.data$get.commits(), mails = project.data$get.mails(), issues = project.data$get.issues(), synchronicity = project.data$get.synchronicity(), @@ -291,8 +291,8 @@ test_that("Split a data object time-based (bins == ... 
).", { ## check data for all ranges expected.data = list( - commits.raw = list( - "2016-01-01 00:00:00-2016-12-31 23:59:59" = data$commits.raw + commits = list( + "2016-01-01 00:00:00-2016-12-31 23:59:59" = data$commits ), mails = list( "2016-01-01 00:00:00-2016-12-31 23:59:59" = data$mails[rownames(data$mails) %in% 13:17, ] @@ -308,7 +308,7 @@ test_that("Split a data object time-based (bins == ... ).", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -332,7 +332,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { ## data object project.data = ProjectData$new(proj.conf) data = list( - commits.raw = project.data$get.commits.raw(), + commits = project.data$get.commits(), mails = project.data$get.mails(), issues = project.data$get.issues(), synchronicity = project.data$get.synchronicity(), @@ -354,10 +354,10 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { ## check data for all ranges expected.data = list( - commits.raw = list( - "2016-07-12 15:58:59-2016-07-12 16:05:41" = data$commits.raw[1:4, ], - "2016-07-12 16:05:41-2016-07-12 16:06:32" = data$commits.raw[5:7, ], - "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$commits.raw[8:9, ] + commits = list( + "2016-07-12 15:58:59-2016-07-12 16:05:41" = data$commits[1:4, ], + "2016-07-12 16:05:41-2016-07-12 16:06:32" = data$commits[5:7, ], + "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$commits[8:9, ] ), mails = list( "2016-07-12 15:58:59-2016-07-12 16:05:41" = data$mails[rownames(data$mails) %in% 16:17, ], @@ -381,7 +381,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -394,7 +394,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { ## ## split data - results = split.data.activity.based(project.data, activity.amount = nrow(data$commits.raw) + 10, + results = split.data.activity.based(project.data, activity.amount = nrow(data$commits) + 10, activity.type = "commits", sliding.window = FALSE) ## check time ranges @@ -406,8 +406,8 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { ## check data for all ranges expected.data = list( - commits.raw = list( - "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$commits.raw + commits = list( + "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$commits ), mails = list( "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$mails[rownames(data$mails) %in% 16:17, ] @@ -423,7 +423,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) 
cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -449,9 +449,9 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { ## check data for all ranges expected.data = list( - commits.raw = list( - "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$commits.raw[1:6, ], - "2016-07-12 16:06:10-2016-07-12 16:06:33" = data$commits.raw[7:9, ] + commits = list( + "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$commits[1:6, ], + "2016-07-12 16:06:10-2016-07-12 16:06:33" = data$commits[7:9, ] ), mails = list( "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$mails[rownames(data$mails) %in% 16:17, ], @@ -471,7 +471,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -482,7 +482,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { ## too large number of windows expect_error( - split.data.activity.based(project.data, activity.type = "commits", number.windows = nrow(project.data$get.commits.raw()) + 10), + split.data.activity.based(project.data, activity.type = "commits", number.windows = nrow(project.data$get.commits()) + 10), info = "Error expected (number.windows) (1)." ) @@ -507,7 +507,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { ## data object project.data = ProjectData$new(proj.conf) data = list( - commits.raw = project.data$get.commits.raw(), + commits = project.data$get.commits(), mails = project.data$get.mails(), issues = project.data$get.issues(), synchronicity = project.data$get.synchronicity(), @@ -532,12 +532,12 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { ## check data for all ranges expected.data = list( - commits.raw = list( + commits = list( "2004-10-09 18:38:13-2010-07-12 11:05:35" = data.frame(), "2010-07-12 11:05:35-2010-07-12 12:05:41" = data.frame(), "2010-07-12 12:05:41-2010-07-12 12:05:44" = data.frame(), "2010-07-12 12:05:44-2016-07-12 15:58:40" = data.frame(), - "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$commits.raw[1:4, ], + "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$commits[1:4, ], "2016-07-12 16:05:37-2016-07-12 16:05:38" = data.frame() ), mails = list( @@ -574,7 +574,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -599,8 +599,8 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { ## check data for all ranges expected.data = list( - commits.raw = list( - "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$commits.raw[1:4, ] + commits = list( + "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$commits[1:4, ] ), mails = list( "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$mails @@ -616,7 +616,7 @@ test_that("Split a data object 
activity-based (activity.type = 'mails').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -642,9 +642,9 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { ## check data for all ranges expected.data = list( - commits.raw = list( + commits = list( "2004-10-09 18:38:13-2010-07-12 12:05:43" = data.frame(), - "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$commits.raw[1:4, ] + "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$commits[1:4, ] ), mails = list( "2004-10-09 18:38:13-2010-07-12 12:05:43" = data$mails[rownames(data$mails) %in% 1:8, ], @@ -664,7 +664,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -699,7 +699,7 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ## data object project.data = ProjectData$new(proj.conf) data = list( - commits.raw = project.data$get.commits.raw(), + commits = project.data$get.commits(), mails = project.data$get.mails(), issues = project.data$get.issues(), synchronicity = project.data$get.synchronicity(), @@ -722,9 +722,9 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ## check data for all ranges expected.data = list( - commits.raw = list( - "2013-04-21 23:52:09-2016-07-12 16:05:47" = data$commits.raw[1:6, ], - "2016-07-12 16:05:47-2016-08-31 18:21:48" = data$commits.raw[7:9, ], + commits = list( + "2013-04-21 23:52:09-2016-07-12 16:05:47" = data$commits[1:6, ], + "2016-07-12 16:05:47-2016-08-31 18:21:48" = data$commits[7:9, ], "2016-08-31 18:21:48-2017-02-20 22:25:41" = data.frame(), "2017-02-20 22:25:41-2017-05-23 12:32:40" = data.frame() ), @@ -754,7 +754,7 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -779,8 +779,8 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ## check data for all ranges expected.data = list( - commits.raw = list( - "2013-04-21 23:52:09-2017-05-23 12:32:40" = data$commits.raw + commits = list( + "2013-04-21 23:52:09-2017-05-23 12:32:40" = data$commits ), mails = list( "2013-04-21 23:52:09-2017-05-23 12:32:40" = data$mails[rownames(data$mails) %in% 14:17, ] @@ -796,7 +796,7 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, 
function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), @@ -822,8 +822,8 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ## check data for all ranges expected.data = list( - commits.raw = list( - "2013-04-21 23:52:09-2016-07-27 22:25:25" = data$commits.raw, + commits = list( + "2013-04-21 23:52:09-2016-07-27 22:25:25" = data$commits, "2016-07-27 22:25:25-2017-05-23 12:32:40" = data.frame() ), mails = list( @@ -844,7 +844,7 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ) ) results.data = list( - commits.raw = lapply(results, function(cf.data) cf.data$get.commits.raw()), + commits = lapply(results, function(cf.data) cf.data$get.commits()), mails = lapply(results, function(cf.data) cf.data$get.mails()), issues = lapply(results, function(cf.data) cf.data$get.issues()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()), diff --git a/util-conf.R b/util-conf.R index f7d8decd..9ad2283d 100644 --- a/util-conf.R +++ b/util-conf.R @@ -299,6 +299,326 @@ Conf = R6::R6Class("Conf", ) + +## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / +## ProjectConf ------------------------------------------------------------- + +ProjectConf = R6::R6Class("ProjectConf", inherit = Conf, + + ## * private ----------------------------------------------------------- + + private = list( + + ## * * project info ------------------------------------------------ + + data = NULL, # character + selection.process = NULL, # character + casestudy = NULL, # character + artifact = NULL, # character + + ## * * attributes --------------------------------------------------- + + attributes = list( + artifact.filter.base = list( + default = TRUE, + type = "logical", + allowed = c(TRUE, FALSE), + allowed.number = 1 + ), + synchronicity = list( + default = FALSE, + type = "logical", + allowed = c(TRUE, FALSE), + allowed.number = 1 + ), + synchronicity.time.window = list( + default = 5, + type = "numeric", + allowed = c(1, 5, 10, 15), + allowed.number = 1 + ), + pasta = list( + default = FALSE, + type = "logical", + allowed = c(TRUE, FALSE), + allowed.number = 1 + ) + ), + + ## * * revisions and ranges ---------------------------------------- + + #' Change the revision names to a consistent naming standard. + #' + #' @param ranges the list of ranges to be postprocessed + #' + #' @return the postprocessed ranges + postprocess.revision.list = function(ranges) { + # remove names, e.g., "version", from release cycle names + casestudy = private$casestudy + to.remove = c( + "version-", "v-","version_", "v_","version", "v", + paste0(casestudy, "-"), paste0(casestudy,"-"), + paste0(casestudy, "_"), paste0(casestudy,"_"), + casestudy, casestudy + ) + + # run gsub for all patterns + ranges = tolower(ranges) + for (string in to.remove) { + ranges = gsub(string, "", ranges) + } + + # return simplified list of ranges + return(ranges) + }, + + #' Change the revision names of callgraph data to a consistent naming standard. 
+ #' + #' @param r list of revisions to be postprocessed + #' + #' @return list of postprocessed revisions + postprocess.revision.list.for.callgraph.data = function(r) { + r = gsub("version-", "", r) # remove version prefix (SQLite) + r = gsub("OpenSSL_", "", r) # remove name prefix (OpenSSL) + r = gsub("\\.", "_", r) # replace dots by underscores + return(r) + }, + + ## * * path construction ------------------------------------------- + + subfolder.configurations = "configurations", + subfolder.results = "results", + + #' Construct and return the path to the configuration folder of Codeface. + #' + #' @param data the path to the codeface-data folder + #' @param selection.process the selection process of the current study ('threemonth', 'releases') + #' + #' @return the path to the configuration folder + get.configurations.folder = function(data, selection.process) { + return(file.path(data, private$subfolder.configurations, selection.process)) + + }, + + #' Construct and return the path to a Codeface configuration. + #' + #' @param data the path to the codeface-data folder + #' @param selection.process the selection process of the current study ('threemonth', 'releases') + #' @param casestudy the current casestudy + #' @param tagging the current tagging ('feature', 'proximity') + #' + #' @return the path to the configuration + construct.conf.path = function(data, selection.process, casestudy, tagging) { + ## construct the base name of the configuration + conf.basename = paste(casestudy, "_", tagging, ".conf", sep = "") + ## construct complete path + conf.file = file.path(private$get.configurations.folder(data, selection.process), conf.basename) + ## return path to config file + return(conf.file) + }, + + #' Construct and return the path to the results folder of Codeface. + #' + #' @param data the path to the codeface-data folder + #' @param selection.process the selection process of the current study ('threemonth', 'releases') + #' @param casestudy the current casestudy + #' @param suffix the suffix of the casestudy's results folder + #' @param subfolder an optional subfolder + #' + #' @return the path to the results folder + #' (i.e., "{data}/{selection.process}/{casestudy}_{suffix}[/{subfolder}]") + get.results.folder = function(data, selection.process, casestudy, suffix, subfolder = NULL) { + path = file.path(data, private$subfolder.results, selection.process, paste(casestudy, suffix, sep = "_")) + if (!is.null(subfolder)) { + path = file.path(path, subfolder) + } + return(path) + } + + ), + + ## * public ------------------------------------------------------------ + + public = list( + + #' Constructor of the class. 
+ #' + #' @param data the path to the codeface-data folder + #' @param selection.process the selection process of the current study ('threemonth', 'releases') + #' @param casestudy the current casestudy + #' @param artifact the artifact to study ('feature', 'function', 'file') + initialize = function(data, selection.process, casestudy, artifact = "feature") { + super$initialize() + + if (!missing(data) && is.character(data)) { + private$data <- data + } + if (!missing(selection.process) && is.character(selection.process)) { + private$selection.process <- selection.process + } + if (!missing(casestudy) && is.character(casestudy)) { + private$casestudy <- casestudy + } + if (!missing(artifact) && is.character(artifact)) { + private$artifact <- artifact + } + + logging::loginfo("Construct configuration: starting.") + + ## convert artifact to tagging + tagging = ARTIFACT.TO.TAGGING[[ artifact ]] + if (is.null(tagging)) { + logging::logerror("Artifact '%s' cannot be converted to a proper Codeface tagging! Stopping...", artifact) + stop("Stopped due to wrong configuration parameters!") + } + ## construct file name for configuration + conf.file = private$construct.conf.path(data, selection.process, casestudy, tagging) + + ## load case-study configuration from given file + logging::loginfo("Attempting to load configuration file: %s", conf.file) + conf = yaml::yaml.load_file(conf.file) + + ## store basic information + conf$selection.process = selection.process + conf$casestudy = casestudy + + ## store artifact in configuration + conf$artifact = artifact + conf$artifact.short = ARTIFACT.TO.ABBREVIATION[[ conf$artifact ]] + conf$artifact.codeface = ARTIFACT.CODEFACE[[ conf$artifact ]] + ## store path to actual Codeface data + conf$datapath = private$get.results.folder(data, selection.process, casestudy, tagging, subfolder = tagging) + ## store path to call graphs + conf$datapath.callgraph = private$get.results.folder(data, selection.process, casestudy, "callgraphs") + ## store path to synchronicity data + conf$datapath.synchronicity = private$get.results.folder(data, selection.process, casestudy, "synchronicity") + ## store path to pasta data + conf$datapath.pasta = private$get.results.folder(data, selection.process, casestudy, "pasta") + ## store path to issue data + conf$datapath.issues = private$get.results.folder(data, selection.process, casestudy, tagging, subfolder = tagging) + + ## READ REVISIONS META-DATA + + ## read revisions file + revisions.file = file.path(conf$datapath, "revisions.list") + revisions.df <- try(read.table(revisions.file, header = FALSE, sep = ";", strip.white = TRUE, + encoding = "UTF-8"), silent = TRUE) + ## break if the list of revisions is empty or any other error occurs + if (inherits(revisions.df, 'try-error')) { + logging::logerror("There are no revisions available for the current casestudy.") + logging::logerror("Attempted to load the following file: %s", revisions.file) + stop("Stopped due to missing revisions.") + } + ## convert columns accordingly + revisions.cols = c(revision = "as.character", date = "as.POSIXct") + for (i in 1:ncol(revisions.df)) { + revisions.df[i] = do.call(c, lapply(revisions.df[[i]], revisions.cols[i])) + colnames(revisions.df)[i] = names(revisions.cols)[i] + } + revisions = revisions.df[["revision"]] + revisions.dates = revisions.df[["date"]] + if (!is.null(revisions.dates)) names(revisions.dates) = revisions + conf[["revisions"]] = NULL + + ## change structure of values (i.e., insert 'default' sublists) + conf = lapply(conf, function(entry) { + 
return(list(value = entry, updatable = FALSE)) + }) + + ## SAVE FULL CONFIGURATION OBJECT + private$attributes = c(conf, private$attributes) + + ## construct and save revisions and ranges + ## (this has to be done after storing conf due to the needed access to the conf object) + self$set.revisions(revisions, revisions.dates) + + # ## logging + # self$print(allowed = TRUE) + + logging::loginfo("Construct configuration: finished.") + }, + + ## * * helper methods ---------------------------------------------- + + #' Get the corresponding callgraph revision for the given range. + #' + #' @param range the range for the callgraph revisions + #' + #' @return the callgraph revisions + get.callgraph.revision.from.range = function(range) { + idx = which(self$get.value("ranges") == range) + rev = self$get.value("revisions.callgraph")[idx + 1] + return(rev) + }, + + ## * * updating revisions and splitting information ---------------- + + #' Set the revisions and ranges for the study. + #' + #' @param revisions the revisions of the study + #' @param revisions.dates the revision dates of the study + #' @param sliding.window whether sliding window splitting is enabled or not + #' default: 'FALSE' + set.revisions = function(revisions, revisions.dates, sliding.window = FALSE) { + ## construct revisions for call-graph data + revisions.callgraph = private$postprocess.revision.list.for.callgraph.data(revisions) + + ## assemble revision data + rev.data = list( + revisions = revisions, + revisions.dates = revisions.dates, + revisions.callgraph = revisions.callgraph, + ranges = construct.ranges(revisions, sliding.window = sliding.window), + ranges.callgraph = construct.ranges(revisions.callgraph, sliding.window = sliding.window) + ) + ## change structure of values (i.e., insert 'default' sublists and set 'updatable' value) + rev.data = lapply(rev.data, function(entry) { + return(list(value = entry, updatable = FALSE)) + }) + + ## insert new values (update if needed) + for (name in names(rev.data)) { + private[["attributes"]][[name]] = rev.data[[name]] + } + }, + + #' Update the information on revisions and ranges regarding splitting. 
+ #' + #' @param type either "time-based" or "activity-based", depending on splitting function + #' @param length the string given to time-based splitting (e.g., "3 months") or the activity + #' amount given to activity-based splitting + #' @param basis the data used as basis for splitting (either "commits", "mails", or "issues") + #' @param sliding.window whether sliding window splitting is enabled or not [default: FALSE] + #' @param revisions the revisions of the study + #' @param revisions.dates the revision dates of the study + set.splitting.info = function(type, length, basis, sliding.window, revisions, revisions.dates) { + ## assemble splitting information + split.info = list( + ## basic splitting information + split.type = type, + split.length = length, + split.basis = basis, + split.sliding.window = sliding.window, + ## splitting information on ranges + split.revisions = revisions, + split.revisions.dates = revisions.dates, + split.ranges = construct.ranges(revisions, sliding.window = sliding.window) + + ) + ## change structure of values (i.e., insert 'default' sublists and set 'updatable' value) + split.info = lapply(split.info, function(entry) { + return(list(value = entry, updatable = FALSE)) + }) + + ## insert new values (update if needed) + for (name in names(split.info)) { + private[["attributes"]][[name]] = split.info[[name]] + } + } + + ) +) + + ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## NetworkConf ------------------------------------------------------------- @@ -385,6 +705,12 @@ NetworkConf = R6::R6Class("NetworkConf", inherit = Conf, type = "numeric", allowed = Inf, allowed.number = 1 + ), + unify.date.ranges = list( + default = FALSE, + type = "logical", + allowed = c(TRUE, FALSE), + allowed.number = 1 ) ) @@ -419,327 +745,6 @@ NetworkConf = R6::R6Class("NetworkConf", inherit = Conf, ) -## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / -## ProjectConf ------------------------------------------------------------- - -ProjectConf = R6::R6Class("ProjectConf", inherit = Conf, - - ## * private ----------------------------------------------------------- - - private = list( - - ## * * project info ------------------------------------------------ - - data = NULL, # character - selection.process = NULL, # character - casestudy = NULL, # character - artifact = NULL, # character - - ## * * attributes --------------------------------------------------- - - attributes = list( - artifact.filter.base = list( - default = TRUE, - type = "logical", - allowed = c(TRUE, FALSE), - allowed.number = 1 - ), - synchronicity = list( - default = FALSE, - type = "logical", - allowed = c(TRUE, FALSE), - allowed.number = 1 - ), - synchronicity.time.window = list( - default = 5, - type = "numeric", - allowed = c(1, 5, 10, 15), - allowed.number = 1 - ), - pasta = list( - default = FALSE, - type = "logical", - allowed = c(TRUE, FALSE), - allowed.number = 1 - ) - ), - - ## * * revisions and ranges ---------------------------------------- - - #' Change the revision names to a equal name standard. - #' - #' @param ranges the list of ranges to be postprocessed - #' - #' @return the postprocessed ranges - postprocess.revision.list = function(ranges) { - # remove names ,e.g. 
"version", from release cycle names - casestudy = private$casestudy - to.remove = c( - "version-", "v-","version_", "v_","version", "v", - paste0(casestudy, "-"), paste0(casestudy,"-"), - paste0(casestudy, "_"), paste0(casestudy,"_"), - casestudy, casestudy - ) - - # run gsub for all pattern - ranges = tolower(ranges) - for (string in to.remove) { - ranges = gsub(string, "", ranges) - } - - # return simplified list of ranges - return(ranges) - }, - - #' Change the revision names of callgraph data to a equal name standard. - #' - #' @param r list of revisions to be postprocessed - #' - #' @return list of postprocessed revisions - postprocess.revision.list.for.callgraph.data = function(r) { - r = gsub("version-", "", r) # remove version prefix (SQLite) - r = gsub("OpenSSL_", "", r) # remove name prefix (OpenSSL) - r = gsub("\\.", "_", r) # replace dots by underscores - return(r) - }, - - ## * * path construction ------------------------------------------- - - subfolder.configurations = "configurations", - subfolder.results = "results", - - #' Construct and return the path to the configuration folder of Codeface. - #' - #' @param data the path to the codeface-data folder - #' @param selection.process the selection process of the current study ('threemonth', 'releases') - #' - #' @return the path to the configuration folder - get.configurations.folder = function(data, selection.process) { - return(file.path(data, private$subfolder.configurations, selection.process)) - }, - - #' Construct and return the path to a Codeface configuration. - #' - #' @param data the path to the codeface-data folder - #' @param selection.process the selection process of the current study ('threemonth', 'releases') - #' @param casestudy the current casestudy - #' @param tagging the current tagging ('feature', 'proximity') - #' - #' @return the path to the configuration - construct.conf.path = function(data, selection.process, casestudy, tagging) { - ## construct the base name of the configuration - conf.basename = paste(casestudy, "_", tagging, ".conf", sep = "") - ## construct complete path - conf.file = file.path(private$get.configurations.folder(data, selection.process), conf.basename) - ## return path to config file - return(conf.file) - }, - - #' Construct and return the path to the results folder of Codeface. - #' - #' @param data the path to the codeface-data folder - #' @param selection.process the selection process of the current study ('threemonth', 'releases') - #' @param casestudy the current casestudy - #' @param suffix the suffix of the casestudy's results folder - #' @param subfolder an optional subfolder - #' - #' @return the path to the results folder - #' (i.e., "{data}/{selection.process}/{casestudy}_{suffix}[/{subfolder}]") - get.results.folder = function(data, selection.process, casestudy, suffix, subfolder = NULL) { - path = file.path(data, private$subfolder.results, selection.process, paste(casestudy, suffix, sep = "_")) - if (!is.null(subfolder)) { - path = file.path(path, subfolder) - } - return(path) - } - - ), - - ## * public ------------------------------------------------------------ - - public = list( - - #' Constructor of the class. 
- #' - #' @param data the path to the codeface-data folder - #' @param selection.process the selection process of the current study ('threemonth', 'releases') - #' @param casestudy the current casestudy - #' @param artifact the artifact to study ('feature','function','file') - initialize = function(data, selection.process, casestudy, artifact = "feature") { - super$initialize() - - if (!missing(data) && is.character(data)) { - private$data <- data - } - if (!missing(selection.process) && is.character(selection.process)) { - private$selection.process <- selection.process - } - if (!missing(casestudy) && is.character(casestudy)) { - private$casestudy <- casestudy - } - if (!missing(artifact) && is.character(artifact)) { - private$artifact <- artifact - } - - logging::loginfo("Construct configuration: starting.") - - ## convert artifact to tagging - tagging = ARTIFACT.TO.TAGGING[[ artifact ]] - if (is.null(tagging)) { - logging::logerror("Artifact '%s' cannot be converted to a proper Codeface tagging! Stopping...", artifact) - stop("Stopped due to wrong configuration parameters!") - } - - ## construct file name for configuration - conf.file = private$construct.conf.path(data, selection.process, casestudy, tagging) - - ## load case-study confuration from given file - logging::loginfo("Attempting to load configuration file: %s", conf.file) - conf = yaml::yaml.load_file(conf.file) - - ## store basic information - conf$selection.process = selection.process - conf$casestudy = casestudy - - ## store artifact in configuration - conf$artifact = artifact - conf$artifact.short = ARTIFACT.TO.ABBREVIATION[[ conf$artifact ]] - conf$artifact.codeface = ARTIFACT.CODEFACE[[ conf$artifact ]] - - ## store path to actual Codeface data - conf$datapath = private$get.results.folder(data, selection.process, casestudy, tagging, subfolder = tagging) - ## store path to call graphs - conf$datapath.callgraph = private$get.results.folder(data, selection.process, casestudy, "callgraphs") - ## store path to synchronicity data - conf$datapath.synchronicity = private$get.results.folder(data, selection.process, casestudy, "synchronicity") - ## store path to pasta data - conf$datapath.pasta = private$get.results.folder(data, selection.process, casestudy, "pasta") - ## store path to issue data - conf$datapath.issues = private$get.results.folder(data, selection.process, casestudy, tagging, subfolder = tagging) - - ## READ REVISIONS META-DATA - - ## read revisions file - revisions.file = file.path(conf$datapath, "revisions.list") - revisions.df <- try(read.table(revisions.file, header = FALSE, sep = ";", strip.white = TRUE, - encoding = "UTF-8"), silent = TRUE) - ## break if the list of revisions is empty or any other error occurs - if (inherits(revisions.df, 'try-error')) { - logging::logerror("There are no revisions available for the current casestudy.") - logging::logerror("Attempted to load following file: %s", revisions.file) - stop("Stopped due to missing revisions.") - } - ## convert columns accordingly - revisions.cols = c(revision = "as.character", date = "as.POSIXct") - for (i in 1:ncol(revisions.df)) { - revisions.df[i] = do.call(c, lapply(revisions.df[[i]], revisions.cols[i])) - colnames(revisions.df)[i] = names(revisions.cols)[i] - } - revisions = revisions.df[["revision"]] - revisions.dates = revisions.df[["date"]] - if (!is.null(revisions.dates)) names(revisions.dates) = revisions - conf[["revisions"]] = NULL - - ## change structure of values (i.e., insert 'default' sublists) - conf = lapply(conf, function(entry) { 
- return(list(value = entry, updatable = FALSE)) - }) - - ## SAVE FULL CONFIGURATION OBJECT - private$attributes = c(conf, private$attributes) - - ## construct and save revisions and ranges - ## (this has to be done after storing conf due to the needed access to the conf object) - self$set.revisions(revisions, revisions.dates) - - # ## logging - # self$print(allowed = TRUE) - - logging::loginfo("Construct configuration: finished.") - }, - - ## * * helper methods ---------------------------------------------- - - #' Get the corresponding callgraph revision for the given range. - #' - #' @param range the range for the callgraph revisions - #' - #' @return the callgraph revisions - get.callgraph.revision.from.range = function(range) { - idx = which(self$get.value("ranges") == range) - rev = self$get.value("revisions.callgraph")[idx + 1] - return(rev) - }, - - ## * * updating revisions and splitting information ---------------- - - #' Set the revisions and ranges for the study. - #' - #' @param revisions the revisions of the study - #' @param revisions.dates the revision dates of the study - #' @param sliding.window whether sliding window splitting is enabled or not - #' default: 'FALSE' - set.revisions = function(revisions, revisions.dates, sliding.window = FALSE) { - ## construct revisions for call-graph data - revisions.callgraph = private$postprocess.revision.list.for.callgraph.data(revisions) - - ## assemble revision data - rev.data = list( - revisions = revisions, - revisions.dates = revisions.dates, - revisions.callgraph = revisions.callgraph, - ranges = construct.ranges(revisions, sliding.window = sliding.window), - ranges.callgraph = construct.ranges(revisions.callgraph, sliding.window = sliding.window) - ) - ## change structure of values (i.e., insert 'default' sublists and set 'updatable' value) - rev.data = lapply(rev.data, function(entry) { - return(list(value = entry, updatable = FALSE)) - }) - - ## insert new values (update if needed) - for (name in names(rev.data)) { - private[["attributes"]][[name]] = rev.data[[name]] - } - }, - - #' Update the information on revisions and ranges regarding splitting. 
- #' - #' @param type either "time-based" or "activity-based", depending on splitting function - #' @param length the string given to time-based splitting (e.g., "3 months") or the activity - #' amount given to acitivity-based splitting - #' @param basis the data used as basis for splitting (either "commits", "mails", or "issues") - #' @param sliding.window whether sliding window splitting is enabled or not [default: FALSE] - #' @param revisions the revisions of the study - #' @param revisions.dates the revision dates of the study - set.splitting.info = function(type, length, basis, sliding.window, revisions, revisions.dates) { - ## assemble splitting information - split.info = list( - ## basic slpitting information - split.type = type, - split.length = length, - split.basis = basis, - split.sliding.window = sliding.window, - ## splitting information on ranges - split.revisions = revisions, - split.revisions.dates = revisions.dates, - split.ranges = construct.ranges(revisions, sliding.window = sliding.window) - - ) - ## change structure of values (i.e., insert 'default' sublists and set 'updatable' value) - split.info = lapply(split.info, function(entry) { - return(list(value = entry, updatable = FALSE)) - }) - - ## insert new values (update if needed) - for (name in names(split.info)) { - private[["attributes"]][[name]] = split.info[[name]] - } - } - - ) -) - - ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Helper functions -------------------------------------------------------- diff --git a/util-core-peripheral.R b/util-core-peripheral.R index 0e14435f..a66b0323 100644 --- a/util-core-peripheral.R +++ b/util-core-peripheral.R @@ -968,7 +968,7 @@ get.commit.data = function(range.data, columns = c("author.name", "author.email" logging::logdebug("get.commit.data: starting.") ## Get commit data - commits.df = range.data$get.commits.raw() + commits.df = range.data$get.commits() ## In case no commit data is available, return NA if(nrow(commits.df) == 0) { diff --git a/util-data.R b/util-data.R index 572fcc31..2c451900 100644 --- a/util-data.R +++ b/util-data.R @@ -16,6 +16,18 @@ requireNamespace("logging") # for logging requireNamespace("parallel") # for parallel computation +## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / +## Constants --------------------------------------------------------------- + +## mapping of relation to data source +RELATION.TO.DATASOURCE = list( + "cochange" = "commits", + "callgraph" = "commits", + "mail" = "mails", + "issue" = "issues" +) + + ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## ProjectData ------------------------------------------------------------- @@ -34,7 +46,7 @@ ProjectData = R6::R6Class("ProjectData", ## commits and commit data commits.filtered = NULL, # data.frame commits.filtered.empty = NULL, #data.frame - commits.raw = NULL, # data.frame + commits = NULL, # data.frame artifacts = NULL, # list synchronicity = NULL, # data.frame pasta = NULL, # data.frame @@ -42,8 +54,10 @@ ProjectData = R6::R6Class("ProjectData", mails = NULL, # data.frame ## authors authors = NULL, # list - ##issues + ## issues issues = NULL, #data.frame + ## timestamps of mail, issue and commit data + data.timestamps = NULL, #data.frame ## * * filtering commits ------------------------------------------- @@ -93,7 +107,7 @@ ProjectData = R6::R6Class("ProjectData", } ## get raw commit data - commit.data = self$get.commits.raw() + commit.data = self$get.commits() ## break if the list 
of commits is empty if (nrow(commit.data) == 0) { @@ -157,6 +171,57 @@ ProjectData = R6::R6Class("ProjectData", } } return(data) + }, + + ## * * timestamps ------------------------------------------- + + #' Call the getters of the specified data sources in order to + #' initialize the sources and extract the timestamps. + #' + #' @param data.sources the data sources to be prepared + prepare.timestamps = function(data.sources) { + for(source in data.sources) { + self[[ paste0("get.", source) ]]() + } + }, + + #' Extract the earliest and the latest date from the specified data source + #' and store it to the timestamps data.frame. + #' + #' @param source the specified data source + extract.timestamps = function(source) { + ## initialize data structure for timestamp + if(is.null(private$data.timestamps)) { + private$data.timestamps = data.frame(start = numeric(0), end = numeric(0)) + } + + ## collect minimum and maximum date for data source + ## 1) if we have data available + if (nrow(private[[source]]) > 0) { + source.date.min = min(private[[source]][, "date"]) + source.date.max = max(private[[source]][, "date"]) + } + ## NAs otherwise + else { + source.date.min = NA + source.date.max = NA + } + + ## remove old row if existing + private$data.timestamps = subset( + private$data.timestamps, + !(rownames(private$data.timestamps) == source) + ) + + ## store the data in the timestamp data set + private$data.timestamps = rbind( + private$data.timestamps, + data.frame( + start = source.date.min, + end = source.date.max, + row.names = source + ) + ) } ), @@ -192,12 +257,13 @@ ProjectData = R6::R6Class("ProjectData", reset.environment = function() { private$commits.filtered = NULL private$commits.filtered.empty = NULL - private$commits.raw = NULL + private$commits = NULL private$artifacts = NULL private$synchronicity = NULL private$mails = NULL private$authors = NULL private$pasta = NULL + private$data.timestamps = NULL }, ## * * configuration ----------------------------------------------- @@ -278,6 +344,9 @@ ProjectData = R6::R6Class("ProjectData", return(data.path) }, + #' Get the absolute path to the result folder for issue data. + #' + #' @return the path to the issue data get.data.path.issues = function() { data.path = private$project.conf$get.value("datapath.issues") return(data.path) @@ -319,27 +388,47 @@ ProjectData = R6::R6Class("ProjectData", #' If it doesn´t already exist call the read method first. #' #' @return the list of commits - get.commits.raw = function() { + get.commits = function() { logging::loginfo("Getting raw commit data.") ## if commits are not read already, do this - if (is.null(private$commits.raw)) { - private$commits.raw = read.commits.raw( + if (is.null(private$commits)) { + private$commits = read.commits( self$get.data.path(), private$project.conf$get.value("artifact") ) } + private$extract.timestamps(source = "commits") + + return(private$commits) + }, - return(private$commits.raw) + #' Get the complete list of commits. + #' If it doesn't already exist, call the read method first. + #' + #' Note: This is just a delegate for \code{ProjectData$get.commits()}. + #' + #' @return the list of commits + get.commits.raw = function() { + return(self$get.commits()) + }, #' Set the commit list of the project to a new one. 
#' #' @param data the new list of commits - set.commits.raw = function(data) { + set.commits = function(data) { logging::loginfo("Setting raw commit data.") if (is.null(data)) data = data.frame() - private$commits.raw = data + private$commits = data + }, + + #' Set the commit list of the project to a new one. + #' + #' Note: This is just a delegate for \code{ProjectData$set.commits(data)}. + #' + #' @param data the new list of commits + set.commits.raw = function(data) { + self$set.commits(data) }, #' Get the synchronicity data. @@ -409,6 +498,7 @@ ProjectData = R6::R6Class("ProjectData", private$mails = private$add.pasta.data(private$mails) } } + private$extract.timestamps(source = "mails") return(private$mails) }, @@ -456,6 +546,8 @@ ProjectData = R6::R6Class("ProjectData", if(is.null(private$issues)) { private$issues = read.issues(self$get.data.path.issues()) } + private$extract.timestamps(source = "issues") + + return(private$issues) }, @@ -519,6 +611,69 @@ ProjectData = R6::R6Class("ProjectData", } }, + ## * * data cutting ----------------------------------------- + + #' Get the timestamps (earliest and latest date) of the specified data sources. + #' If 'simple' is TRUE, return the overall latest start and earliest end date + #' in order to cut the specified data sources to the same date ranges. + #' + #' If there are no actual data available for a data source, the result indicates NA. + #' + #' @param data.sources the specified data sources + #' @param simple whether or not the timestamps get simplified + #' + #' @return a data.frame with the timestamps of each data source as columns "start" and "end", + #' with the data source as corresponding row name + get.data.timestamps = function(data.sources = c("mails", "commits", "issues"), simple = FALSE) { + ## check arguments + data.sources = match.arg(arg = data.sources, several.ok = TRUE) + + ## read all data sources and prepare list of timestamps + private$prepare.timestamps(data.sources = data.sources) + + ## get the needed subset of timestamp data + subset.timestamps = private$data.timestamps[data.sources, ] + + ## get the proper subset of timestamps for returning + if(simple) { + ## get minima and maxima across data sources (rows) + timestamps = data.frame( + start = max(subset.timestamps[, "start"], na.rm = TRUE), + end = min(subset.timestamps[, "end"], na.rm = TRUE) + ) + } else { + ## select the complete raw data + timestamps = subset.timestamps + } + + return(timestamps) + }, + + #' Cut the specified data sources to the same date range depending on the extracted + #' timestamps. + #' + #' @param data.sources the specified data sources + #' + #' @return a ProjectData object whose specified data sources are cut to the same date range + get.data.cut.to.same.date = function(data.sources = c("mails", "commits", "issues")) { + ## check arguments + data.sources = match.arg(arg = data.sources, several.ok = TRUE) + + ## get the timestamp data as vector + timestamps.df = self$get.data.timestamps(data.sources = data.sources, simple = TRUE) + timestamps = c(start = timestamps.df[, "start"], end = timestamps.df[, "end"]) + + ## check consistency + if(timestamps["start"] > timestamps["end"]) { + logging::logwarn("The data sources don't overlap. The result will be empty!") + } + + ## split data based on the timestamps and get the single result + result = split.data.time.based(self, bins = timestamps)[[1]] + + return(result) + }, + ## * * processed data ---------------------------------------------- #' Map the corresponding authors to each artifact and return the list. 
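The two getters added in the hunk above form the new data-cutting API. The following is a minimal usage sketch, not a definitive recipe: it assumes a configured `ProjectData` object named `data`, set up as in the How-To section of this README; the method names and semantics are taken from the hunk above.

```R
## inspect the earliest and latest date per data source
data$get.data.timestamps(data.sources = c("commits", "mails"), simple = FALSE)

## get the overlapping range across the sources,
## i.e., the latest start date and the earliest end date
data$get.data.timestamps(data.sources = c("commits", "mails"), simple = TRUE)

## obtain a new data object whose data sources are cut to that overlapping range
data.cut = data$get.data.cut.to.same.date(data.sources = c("commits", "mails"))
```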
@@ -593,7 +748,9 @@ ProjectData = R6::R6Class("ProjectData", return(mylist) }, - + #' Map the corresponding authors to each issue and return the list. + #' + #' @return the list of authors for each issue get.issue2author = function() { logging::loginfo("Getting issue--author data") @@ -602,6 +759,9 @@ ProjectData = R6::R6Class("ProjectData", return(mylist) }, + #' Map the corresponding issues to each author and return the list. + #' + #' @return the list of issues for each author get.author2issue = function() { logging::loginfo("Getting author--issue data") @@ -617,7 +777,7 @@ ProjectData = R6::R6Class("ProjectData", logging::loginfo("Getting author--commit data.") ## store the authors per artifact - mylist = get.key.to.value.from.df(self$get.commits.raw(), "author.name", "hash") + mylist = get.key.to.value.from.df(self$get.commits(), "author.name", "hash") mylist = parallel::mclapply(mylist, unique) return(mylist) @@ -646,7 +806,6 @@ ProjectData = R6::R6Class("ProjectData", return(mylist) } - ) ) diff --git a/util-init.R b/util-init.R index dcd270df..0ab03439 100644 --- a/util-init.R +++ b/util-init.R @@ -23,3 +23,4 @@ source("util-motifs.R") source("util-bulk.R") source("util-plot.R") source("util-core-peripheral.R") +source("util-networks-metrics.R") diff --git a/util-networks-metrics.R b/util-networks-metrics.R new file mode 100644 index 00000000..b6b4e248 --- /dev/null +++ b/util-networks-metrics.R @@ -0,0 +1,188 @@ +## (c) Thomas Bock, February 2015 +## bockthom@fim.uni-passau.de +## (c) Raphael Nömmer, 2017 +## noemmer@fim.uni-passau.de + + +## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / +## Libraries --------------------------------------------------------------- + +requireNamespace("igraph") + + +## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / +## Metric functions -------------------------------------------------------- + +#' Determine the maximum degree for the given network. +#' +#' @param network the network to be examined +#' @param mode the mode to be used for determining the degrees +#' +#' @return A dataframe containing the name of the vertex with the maximum degree and its degree. +metrics.hub.degree = function(network, mode = c("total", "in", "out")) { + mode = match.arg(mode) + degrees = igraph::degree(network, mode = c(mode)) + vertex = which.max(degrees) + df = data.frame("name" = names(vertex), "degree" = unname(degrees[vertex])) + return(df) +} + +#' Calculate the average degree of a network. +#' +#' @param network the network to be examined +#' @param mode the mode to be used for determining the degrees +#' +#' @return The average degree of the nodes in the network. +metrics.avg.degree = function(network, mode = c("total", "in", "out")) { + mode = match.arg(mode) + degrees = igraph::degree(network, mode = c(mode)) + avg = mean(degrees) + return(c(avg.degree = avg)) +} + +#' Calculate all node degrees for the given network. +#' +#' @param network the network to be examined +#' @param sort whether the resulting dataframe is to be sorted by the node degree +#' @param sort.decreasing if sorting is active, this says whether the dataframe is to be +#' sorted in descending or ascending order +#' +#' @return A dataframe containing the nodes and their respective degrees. 
+metrics.node.degrees = function(network, sort = TRUE, sort.decreasing = TRUE) { + if(sort) { + degrees = sort(igraph::degree(network, mode = "total"), decreasing = sort.decreasing) + } else { + degrees = igraph::degree(network, mode = "total") + } + return(data.frame("name" = names(degrees), "degree" = unname(degrees))) +} + +#' Calculate the density of the given network. +#' +#' @param network the network to be examined +#' +#' @return The density of the network. +metrics.density = function(network) { + density = igraph::graph.density(network) + return(c(density = density)) +} + +#' Calculate the average path length for the given network. +#' +#' @param network the network to be examined +#' @param directed whether to consider directed paths in directed networks +#' @param unconnected whether all nodes of the network are connected +#' +#' @return The average path length of the given network. +metrics.avg.pathlength = function(network, directed, unconnected) { + avg.pathlength = igraph::average.path.length(network, directed = directed, unconnected = unconnected) + return(c(avg.pathlength = avg.pathlength)) +} + +#' Calculate the clustering coefficient of the configured type for the given network. +#' +#' @param network the network to be examined +#' @param cc.type the type of clustering coefficient to be calculated +#' +#' @return The clustering coefficient of the network. +metrics.clustering.coeff = function(network, cc.type = c("global", "local", "barrat", "localaverage")) { + cc.type = match.arg(cc.type) + cc = igraph::transitivity(network, type = cc.type, vids = NULL) + return(c(clustering = cc)) +} + +#' Calculate the modularity metric for the given network. +#' +#' @param network the network to be examined +#' @param community.detection.algorithm the algorithm to be used for the detection of communities +#' which is required for the calculation of the modularity +#' +#' @return The modularity value for the given network. +metrics.modularity = function(network, community.detection.algorithm = igraph::cluster_walktrap) { + comm = community.detection.algorithm(network) + mod = igraph::modularity(network, igraph::membership(comm)) + return(c(modularity = mod)) +} + +#' This function determines whether a network can be considered a +#' small-world network based on a quantitative categorical decision. +#' +#' The procedure used in this function is based on the work "Network +#' 'Small-World-Ness': A Quantitative Method for Determining Canonical +#' Network Equivalence" by Mark D. Humphries and Kevin Gurney [1]. +#' [1] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0002051 +#' +#' The algorithm relies on the Erdős–Rényi random network with the same number +#' of nodes and edges as the given network. +#' +#' To check the result value \code{s.delta} for a binary (true/false) decision on smallworldness, +#' do this: \code{is.smallworld = s.delta > 1}. +#' +#' Important: The given network needs to be simplified for the calculation to work! +#' +#' @param network the simplified network to be examined +#' +#' @return The smallworldness value of the network. 
+metrics.smallworldness = function(network) { + # construct an Erdős–Rényi network with the same number of nodes and edges as the given network + h = igraph::erdos.renyi.game(n = igraph::vcount(network), + p.or.m = igraph::ecount(network), + type = "gnm", + directed = FALSE) + + # compute clustering coefficients + g.cc = igraph::transitivity(network, type = "global") + h.cc = igraph::transitivity(h, type = "global") + # compute average shortest-path length + g.l = igraph::average.path.length(network, unconnected = TRUE) + h.l = igraph::average.path.length(h, unconnected = TRUE) + + # binary decision + # intermediate variables + gamma = g.cc / h.cc + lambda = g.l / h.l + + # indicator s.delta + s.delta = gamma / lambda + + ## if s.delta > 1, then the network is a small-world network + # is.smallworld = s.delta > 1 + return (c(smallworldness = s.delta)) +} + +#' Determine the scale-freeness of a network using the power-law fitting method. +#' +#' @param network the network to be examined +#' +#' @return A dataframe containing the different values connected to scale-freeness. +metrics.scale.freeness = function(network) { + v.degree = sort(igraph::degree(network, mode = "total"), decreasing = TRUE) + + ## Power-law fitting + ## (by Mitchell Joblin, Siemens AG, 2012, 2013) + p.fit = igraph::power.law.fit(v.degree, implementation = "plfit") + param.names = c("alpha", "xmin", "KS.p") + res = list() + res[param.names] = p.fit[param.names] + + ## Check percent of vertices under power-law + res$num.power.law = length(which(v.degree >= res$xmin)) + res$percent.power.law = 100 * (res$num.power.law / length(v.degree)) + df = as.data.frame(res, row.names = "scale.freeness") + return(df) +} + +#' Calculate the hierarchy for a network. +#' +#' @param network the network to be examined +#' +#' @return A dataframe containing the logarithm of the node degree and the logarithm +#' of the local clustering coefficient for each node. +metrics.hierarchy = function(network) { + degrees = igraph::degree(network, mode = "total") + cluster.coeff = igraph::transitivity(network, type = "local", vids = NULL) + degrees.without.cluster.coeff = subset(degrees, !(is.nan(cluster.coeff) | cluster.coeff == 0)) + cluster.coeff = subset(cluster.coeff, !(is.nan(cluster.coeff) | cluster.coeff == 0)) + return(data.frame(log.deg = log(degrees.without.cluster.coeff), log.cc = log(cluster.coeff))) +} + diff --git a/util-networks.R b/util-networks.R index 6e69d0b7..2e9cfd60 100644 --- a/util-networks.R +++ b/util-networks.R @@ -60,6 +60,7 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", ## * * data and configuration -------------------------------------- proj.data = NULL, + proj.data.original = NULL, network.conf = NULL, ## * * network caching --------------------------------------------- @@ -72,6 +73,26 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", artifacts.network.mail = NULL, # igraph artifacts.network.issue = NULL, # igraph + ## * * data cutting --------------------------------------------- + + + #' Cut the data sources of the data object to the same date ranges. + cut.data.to.same.timestamps = function() { + cut.data = private$proj.data$get.data.cut.to.same.date(data.sources = private$get.data.sources()) + private$proj.data = cut.data + }, + + #' Determine which data sources should be cut depending on the artifact and author relation. 
+ #' + #' @return the data sources to be cut + get.data.sources = function() { + author.relation = private$network.conf$get.value("author.relation") + artifact.relation = private$network.conf$get.value("artifact.relation") + data.sources = unique(c(RELATION.TO.DATASOURCE[[author.relation]], + RELATION.TO.DATASOURCE[[artifact.relation]])) + return(data.sources) + }, + ## * * author networks --------------------------------------------- #' Get the co-change-based author relation as network. @@ -372,6 +393,7 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", #' @param network.conf the network configuration initialize = function(project.data, network.conf) { private$proj.data = project.data + private$proj.data.original = project.data if(!missing(network.conf) && "NetworkConf" %in% class(network.conf)) { private$network.conf = network.conf @@ -379,6 +401,10 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", if (class(self)[1] == "ProjectData") logging::loginfo("Initialized data object %s", self$get.class.name()) + + if(private$network.conf$get.value("unify.date.ranges")) { + private$cut.data.to.same.timestamps() + } }, ## * * resetting environment --------------------------------------- @@ -388,8 +414,13 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", reset.environment = function() { private$authors.network.mail = NULL private$authors.network.cochange = NULL + private$authors.network.issue = NULL private$artifacts.network.cochange = NULL private$artifacts.network.callgraph = NULL + private$proj.data = private$proj.data.original + if(private$network.conf$get.value("unify.date.ranges")) { + private$cut.data.to.same.timestamps() + } }, ## * * configuration ----------------------------------------------- @@ -426,6 +457,14 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", private$network.conf$update.value(entry, value) }, + #' Get the project data object of the NetworkBuilder. + #' This method is mainly used for testing purposes at the moment. + #' + #' @return the project data object of the NetworkBuilder + get.project.data = function() { + return(private$proj.data) + }, + #' Update the network configuration based on the given list #' of values and reset the environment afterwards #' @@ -543,7 +582,7 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", ## remove vertices that are not committers if wanted if (private$network.conf$get.value("author.only.committers")) { - committers = unique(private$proj.data$get.commits.raw()[["author.name"]]) + committers = unique(private$proj.data$get.commits()[["author.name"]]) authors = igraph::get.vertex.attribute(u, "name", igraph::V(u)[ type == TYPE.AUTHOR ]) authors.to.remove = setdiff(authors, committers) u = igraph::delete.vertices(u, authors.to.remove) diff --git a/util-read.R b/util-read.R index 28690b39..e1b392a6 100644 --- a/util-read.R +++ b/util-read.R @@ -21,11 +21,11 @@ requireNamespace("digest") # for sha1 hashing of IDs #' Read the commits from the 'commits.list' file. 
#' #' @param data.path the path to the commit list -#' @param artifact the artifact whichs commits are read +#' @param artifact the artifact whose commits are read #' #' @return the read commits -read.commits.raw = function(data.path, artifact) { - logging::logdebug("read.commits.raw: starting.") +read.commits = function(data.path, artifact) { + logging::logdebug("read.commits: starting.") file = file.path(data.path, "commits.list") @@ -91,10 +91,22 @@ read.commits.raw = function(data.path, artifact) { commit.data[["commit.id"]] = sprintf("", commit.data[["commit.id"]]) ## store the commit data - logging::logdebug("read.commits.raw: finished.") + logging::logdebug("read.commits: finished.") return(commit.data) } +#' Read the commits from the 'commits.list' file. +#' +#' @param data.path the path to the commit list +#' @param artifact the artifact whose commits are read +#' +#' Note: This is just a delegate for \code{read.commits(data.path, artifact)}. +#' +#' @return the read commits +read.commits.raw = function(data.path, artifact) { + return(read.commits(data.path = data.path, artifact = artifact)) +} + ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Synchronicity data ------------------------------------------------------ @@ -104,7 +116,7 @@ read.commits.raw = function(data.path, artifact) { #' where artifact and time.window are the given variables. #' #' @param data.path the path to the synchronicity data -#' @param artifact the artifact whichs synchronicity data get read +#' @param artifact the artifact whose synchronicity data get read #' @param time.window the time window of the data to be read #' #' @return the read synchronicity data @@ -229,15 +241,15 @@ read.authors = function(data.path) { stop("Stopped due to missing authors.") } + ## if there is no third column, we need to add e-mail-address dummy data (NAs) + if (ncol(authors.df) != 3) { + authors.df[3] = NA + } + ## set proper column names based on Codeface extraction: ## ## SELECT a.name AS authorName, a.email1, m.creationDate, m.subject, m.threadId - cols.names = c("author.id", "author.name") - ## if there is a third column, we have e-mail-address data available - if (ncol(authors.df) == 3) { - cols.names = c(cols.names, "author.email") - } - colnames(authors.df) = cols.names + colnames(authors.df) = c("author.id", "author.name", "author.email") ## store the ID--author mapping logging::logdebug("read.authors: finished.") @@ -249,7 +261,7 @@ read.authors = function(data.path) { ## PaStA data -------------------------------------------------------------- #' Read and parse the pasta data from the 'similar-mailbox' file. -#' The form in the file is : => commit.hash. +#' The form in the file is : ... => commit.hash commit.hash2 .... #' The parsed form is a data frame with message IDs as keys and commit hashes as values. 
#' #' @param data.path the path to the pasta data @@ -286,14 +298,18 @@ read.pasta = function(data.path) { # 1) split at arrow # 2) split keys - # 3) insert all key-value pairs by iteration (works also if there is only one key) + # 3) split values + # 4) pair every key with every value via the Cartesian product (works also if there is only one key or value) line.split = unlist(strsplit(line, SEPERATOR)) keys = line.split[1] - value = line.split[2] + values = line.split[2] keys.split = unlist(strsplit(keys, KEY.SEPERATOR)) + values.split = unlist(strsplit(values, KEY.SEPERATOR)) # Transform data to data.frame - df = data.frame(message.id = keys.split, commit.hash = value) + df = merge(keys.split, values.split) + colnames(df) = c("message.id", "commit.hash") return(df) }) result.df = plyr::rbind.fill(result.list) @@ -306,11 +322,10 @@ read.pasta = function(data.path) { ## Issue data -------------------------------------------------------------- #' Read and parse the issue data from the 'issues.list' file. -#' The parsed format is a data frame with message IDs as keys and commit hashes as values. #' -#' @param data.path the path to the pasta data +#' @param data.path the path to the issue data #' -#' @return the read and parsed pasta data +#' @return the read and parsed issue data read.issues = function(data.path) { logging::logdebug("read.issues: starting.") @@ -331,14 +346,11 @@ read.issues = function(data.path) { ## set proper column names colnames(issue.data) = c( "issue.id", "issue.state", "creation.date", "closing.date", "is.pull.request", # issue information - "author.id", "author.name", "author.email", # author information + "author.name", "author.email", # author information "date", # the date - "event.name" # the event describing the row's entry + "ref.name", "event.name" # the event describing the row's entry ) - ## remove unneeded columns from data - issue.data["author.id"] = NULL - ## set pattern for issue ID for better recognition issue.data[["issue.id"]] = sprintf("", issue.data[["issue.id"]]) @@ -348,7 +360,7 @@ read.issues = function(data.path) { ## convert dates and sort by 'date' column issue.data[["date"]] = as.POSIXct(issue.data[["date"]]) issue.data[["creation.date"]] = as.POSIXct(issue.data[["creation.date"]]) - issue.data[["closing.date"]][ issue.data[["closing.date"]] == "null" ] = NA + issue.data[["closing.date"]][ issue.data[["closing.date"]] == "" ] = NA issue.data[["closing.date"]] = as.POSIXct(issue.data[["closing.date"]]) issue.data = issue.data[order(issue.data[["date"]], decreasing = FALSE), ] # sort! 
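One remark on the `read.pasta` change above: `merge` on two objects without common columns computes the Cartesian product, so every message ID on a 'similar-mailbox' line is paired with every commit hash on that line. A small sketch with hypothetical toy values (not taken from a real file) illustrates this behavior:

```R
## toy stand-ins for the split key and value parts of one line
keys.split = c("<message-1>", "<message-2>")
values.split = c("hash-a", "hash-b")

## merge() without common columns yields the Cartesian product
df = merge(keys.split, values.split)
colnames(df) = c("message.id", "commit.hash")
## df now holds 4 rows: each message ID paired with each commit hash
```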
diff --git a/util-split.R b/util-split.R index 9ca3b677..76b644dd 100644 --- a/util-split.R +++ b/util-split.R @@ -40,7 +40,7 @@ split.data.time.based = function(project.data, time.period = "3 months", bins = split.basis = c("commits", "mails", "issues"), sliding.window = FALSE) { ## get actual raw data data = list( - commits = project.data$get.commits.raw(), + commits = project.data$get.commits(), mails = project.data$get.mails(), issues = project.data$get.issues() ) @@ -110,7 +110,7 @@ split.data.time.based = function(project.data, time.period = "3 months", bins = ## set data ## 1) commits - cf.range.data$set.commits.raw(df.list[["commits"]]) + cf.range.data$set.commits(df.list[["commits"]]) ## 2) mails cf.range.data$set.mails(df.list[["mails"]]) ## 3) issues @@ -203,7 +203,7 @@ split.data.activity.based = function(project.data, activity.type = c("commits", ## get actual raw data data = list( - commits = project.data$get.commits.raw(), + commits = project.data$get.commits(), mails = project.data$get.mails(), issues = project.data$get.issues() ) @@ -280,7 +280,7 @@ split.data.activity.based = function(project.data, activity.type = c("commits", ## clone the project data and update raw data to split it again project.data.clone = project.data$clone() - project.data.clone$set.commits.raw(data[["commits"]]) + project.data.clone$set.commits(data[["commits"]]) project.data.clone$set.mails(data[["mails"]]) ## split data for sliding windows
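Taken together, the changes in this diff can be exercised end to end. The sketch below is illustrative only: it assumes objects `data` (a `ProjectData`) and `net.conf` (a `NetworkConf`) set up as in the How-To section, while the `unify.date.ranges` parameter, the `NetworkBuilder` cutting behavior, and the `metrics.*` functions are the ones introduced above.

```R
## cut all data sources used for network construction
## to their overlapping date range
net.conf$update.values(list(unify.date.ranges = TRUE))

## the builder applies the cutting on construction
netbuilder = NetworkBuilder$new(data, net.conf)
bpn = netbuilder$get.bipartite.network()

## apply some of the new network metrics
metrics.density(bpn)                     # graph density
metrics.hub.degree(bpn, mode = "total")  # vertex with the maximum degree
```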