diff --git a/NEWS.md b/NEWS.md index 3d093756..dddf0ac9 100644 --- a/NEWS.md +++ b/NEWS.md @@ -6,12 +6,13 @@ ### Added -- Add commit-interaction data and add functions `read.commit.interactions` for reading, as well as `get.commit.interactions`, `set.commit.interactions` and utility functions for working with commit-interaction data (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, b4fd2a29c9b5fd561b1106c6febb54a32b0085ab, fd0aa05f824b93545ae8e05833b95b3bd9809286, bca35760eb0aac86c04923f2d534b2d8cece204e) as well as tests for these features (PR #252, eeba7e29932bc973513c963fb9e716e9230d570f, 8bb39f4df39b49dfaff8f19feb6db5e5fbd81fac, 54b6f655248720436af116fe72521f9cb0348429, 7a5497aaf9114017d1b3b9b68b6cccd7ca8ac114, 7b8585f87675795822c07230192d6454de31dcc7, ef725407bf8818c8fff96ea6f343338b7162cbe0) +- Add commit-interaction data and add functions `read.commit.interactions` for reading, as well as `get.commit.interactions`, `set.commit.interactions` and utility functions for working with commit-interaction data (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, b4fd2a29c9b5fd561b1106c6febb54a32b0085ab, fd0aa05f824b93545ae8e05833b95b3bd9809286, bca35760eb0aac86c04923f2d534b2d8cece204e, PR #263, 849123a8b7d898fbb1343745ecffc1f6000c9367, 3fb7437b68950303916b62984fa449732c70353e, 170bc66eb779d7cf2ab504db7c3f4ec483103838) as well as tests for these features (PR #252, eeba7e29932bc973513c963fb9e716e9230d570f, 8bb39f4df39b49dfaff8f19feb6db5e5fbd81fac, 54b6f655248720436af116fe72521f9cb0348429, 7a5497aaf9114017d1b3b9b68b6cccd7ca8ac114, 7b8585f87675795822c07230192d6454de31dcc7, ef725407bf8818c8fff96ea6f343338b7162cbe0) - Add commit-interaction networks that can be created with `create.author.network` and `create.artifact.network` if the `artifact.relation` and `author.relation` is configured to be `commit.interaction` (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, 329d97ec3de36a9e1bcadc0c7a53c1d92e8b481c) as well as tests for these features (PR #252, 
07e7ed744209b0251217fa8f7f35d9b9875face2, 7068cfa10d993dcae3f5e3f76f8cafa99fa8b350) - Add helper function for prefixing function names with file names in `util-read.R` (PR #252, f8ea987b138173cf0509c7910e0572d8ee1b3f1f) - Add line-based code coverage reports into CI pipeline. Coverage reports are generated by `coverage.R` (PR #262, 10cac49d005e87c3964cc61711e7f5acef749626, b3b9f4ac7a9911bd00293c68fac88e0f9033bdfb, c815d18dc6266d620a7a145493417b87ac08679e, e8093525fdaf46e54f2f7fcc6358ca7892e795e5, 32d04823e2007c63d2a43ce59bea3057327c19a7) - Add the possibility to split data time-based by multiple data sources (PR #261, 1088395f46b84028c8d7c463ca86b5dc38500c26, e1f79fc9e40cd6f41c946be42db364b2101cfe10, 0bb187fec0fd801d7634bf8d5180525770f6ab0b, 371a97ac6ebf3de4fe9360dea79d62e2ed3ef585) - Add tests for uncovered functionality in `util-misc.R` and `util-networks.R` (PR #264, ff30f3238b1bf2539280d0d055a5d925c197c271, af80551d0615a49b86e45ff596bd75941ee88f91) +- Add commit network as a new type of network. It uses commits as vertices and connects them either via cochange or commit interactions. 
This includes adding new config parameters and the function `add.vertex.attribute.commit.network` for adding vertex attributes to a commit network (PR #263, ab73271781e8e9a0715f784936df4b371d64c338, cd9a930fcb54ff465c2a5a7c43cfe82ac15c134d) ### Changed/Improved @@ -19,6 +20,7 @@ - Replace deprecated `igraph` functions by their preferred alternatives (PR #264, 0df9d5bf6bafbb5d440f4c47db4ec901cf11f037) - Deprecate support for R version 3.6 (PR #264, c8e6f45111e487fadbe7f0a13c7595eb23f3af6e, fb3f5474259d4a88f4ff545691cca9d1ccde90e3) - Explicitly add R version 4.4 to the CI test pipeline (c8e6f45111e487fadbe7f0a13c7595eb23f3af6e) +- Refactor function `construct.edge.list.from.key.value.list` to be more readable (PR #263, 05c3bc09cb1d396fd59c34a88030cdca58fd04dd) ### Fixed diff --git a/README.md b/README.md index 86b2671c..dc2cba45 100644 --- a/README.md +++ b/README.md @@ -234,6 +234,11 @@ There are four types of networks that can be built using this library: author ne * The vertices in an artifact network denote any kind of artifact, e.g., source-code artifact (such as features or files) or communication artifact (such as mail threads or issues). All artifact-type vertices are uniquely identifiable by their name. There are only unipartite edges among artifacts in this type of network. * The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `artifact.relation`. The relation also describes which kinds of artifacts are represented as vertices in the network. (For example, if "mail" is selected as `artifact.relation`, only mail-thread vertices are included in the network.) +- Commit networks + * The vertices in a commit network denote any commits in the data. All vertices + are uniquely identifiable by the hash of the commit. There are only unipartite edges among commits in this type of network. 
+ * The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `commit.relation`. The relation also describes the type of data used for network construction (`cochange` uses commit data, `commit.interaction` uses commit-interaction data). + - Bipartite networks * The vertices in a bipartite network denote both authors and artifacts. There are only bipartite edges from authors to artifacts in this type of network. * The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `artifact.relation`. @@ -249,6 +254,7 @@ Relations determine which information is used to construct edges among the verti - `cochange` * For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who change the same source-code artifact are connected with an edge. * For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), source-code artifacts that are concurrently changed in the same commit are connected with an edge. + * For commit networks (configured via `commit.relation` in the [`NetworkConf`](#networkconf)), commits are connected if they change the same artifact. * For bipartite networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), authors get linked to all source-code artifacts they have changed in their respective commits. - `mail` @@ -269,6 +275,7 @@ Relations determine which information is used to construct edges among the verti - `commit.interaction` * For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who contribute to interacting commits are connected with an edge. * For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), artifacts are connected when there is an interaction between two commits that occur in the artifacts. 
+ * For commit networks (configured via `commit.relation` in the [`NetworkConf`](#networkconf)), commits are connected when they interact in the commit-interaction data. * This relation does not apply for bipartite networks. #### Edge-construction algorithms for author networks @@ -623,7 +630,7 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(. - `author.relation` * The relation(s) among authors, encoded as edges in an author network * **Note**: The author--artifact relation in bipartite and multi networks is configured by `artifact.relation`! - * possible values: [*`"mail"`*, `"cochange"`, `"issue"`] + * possible values: [*`"mail"`*, `"cochange"`, `"issue"`, `"commit.interaction"`] - `author.directed` * The directedness of edges in an author network * [`TRUE`, *`FALSE`*] @@ -642,11 +649,17 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(. - `artifact.relation` * The relation(s) among artifacts, encoded as edges in an artifact network * **Note**: Additionally, this relation configures also the author--artifact relation in bipartite and multi networks! - * possible values: [*`"cochange"`*, `"callgraph"`, `"mail"`, `"issue"`] + * possible values: [*`"cochange"`*, `"callgraph"`, `"mail"`, `"issue"`, `"commit.interaction"`] - `artifact.directed` * The directedness of edges in an artifact network * **Note**: This parameter does only affect the `issue` relation, as the `cochange` relation is always undirected, while the `callgraph` relation is always directed. For the `mail`, we currently do not have data available to exhibit edge information. 
* [`TRUE`, *`FALSE`*] +- `commit.relation` + * The relation(s) among commits, encoded as edges in a commit network + * possible values: [*`"cochange"`*, `"commit.interaction"`] +- `commit.directed` + * The directedness of edges in a commit network + * [`TRUE`, *`FALSE`*] - `edge.attributes` * The list of edge-attribute names and information * a subset of the following as a single vector: diff --git a/showcase.R b/showcase.R index 74da2497..4cb95d4a 100644 --- a/showcase.R +++ b/showcase.R @@ -24,6 +24,7 @@ ## Copyright 2021 by Niklas Schneider ## Copyright 2022 by Jonathan Baumann ## Copyright 2024 by Maximilian Löffler +## Copyright 2024 by Leo Sendelbach ## All Rights Reserved. @@ -65,6 +66,7 @@ ARTIFACT = "feature" # function, feature, file, featureexpression (only relevant AUTHOR.RELATION = "mail" # mail, cochange, issue ARTIFACT.RELATION = "cochange" # cochange, callgraph, mail, issue +COMMIT.RELATION = "commit.interaction" # commit.interaction, cochange ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / @@ -73,13 +75,16 @@ ARTIFACT.RELATION = "cochange" # cochange, callgraph, mail, issue ## initialize project configuration proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT) proj.conf$update.value("commits.filter.base.artifact", TRUE) +proj.conf$update.value("commit.interactions", TRUE) ## specify that custom event timestamps should be read from 'custom-events.list' proj.conf$update.value("custom.event.timestamps.file", "custom-events.list") proj.conf$print() ## initialize network configuration net.conf = NetworkConf$new() -net.conf$update.values(updated.values = list(author.relation = AUTHOR.RELATION, artifact.relation = ARTIFACT.RELATION)) +net.conf$update.values(updated.values = list(author.relation = AUTHOR.RELATION, + artifact.relation = ARTIFACT.RELATION, + commit.relation = COMMIT.RELATION)) net.conf$print() ## get ranges @@ -141,6 +146,7 @@ x$get.author.network() x$update.network.conf(updated.values 
= list(author.directed = FALSE)) x$get.author.network() x$get.artifact.network() +x$get.commit.network() x$reset.environment() x$get.networks() x$update.network.conf(updated.values = list(author.only.committers = FALSE, author.directed = FALSE)) @@ -201,6 +207,7 @@ y$update.network.conf(updated.values = list(edge.attributes = c("date"))) y$get.author.network() y$update.network.conf(updated.values = list(edge.attributes = c("hash"))) y$get.artifact.network() +y$get.commit.network() y$get.networks() y$update.network.conf(updated.values = list(author.only.committers = FALSE, author.directed = TRUE)) h = y$get.bipartite.network() @@ -232,6 +239,8 @@ sample.pull.requests = add.vertex.attribute.author.issue.count(my.networks, x.da ## add vertex attributes for the project-level network x.net.as.list = list("1970-01-01 00:00:00-2030-01-01 00:00:00" = x$get.author.network()) sample.entire = add.vertex.attribute.author.commit.count(x.net.as.list, x.data, aggregation.level = "complete") +## add vertex attributes to commit network. 
Default value 'NO_AUTHOR' is used if vertex is not in commit data +add.vertex.attribute.commit.network(x$get.commit.network(), x.data, attr.name = "author.name", default.value = "NO_AUTHOR") ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / diff --git a/tests/test-data.R b/tests/test-data.R index 88ce0e42..c983946d 100644 --- a/tests/test-data.R +++ b/tests/test-data.R @@ -564,15 +564,15 @@ test_that("Compare two ProjectData Objects with commit.interactions", { proj.data.two$set.commits(create.empty.commits.list()) ## create empty data frame of correct size - commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8)) + commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 9)) ## assure that the correct type is used - for(i in seq_len(8)) { + for(i in seq_len(9)) { commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]]) } ## set everything except for authors as expected colnames(commit.interactions.data.expected) = c("commit.hash", "base.hash", "func", "file", - "base.func", "base.file", "base.author", - "interacting.author") + "base.func", "base.file","artifact.type", + "base.author", "interacting.author") commit.interactions.data.expected[["commit.hash"]] = c("0a1a5c523d835459c42f33e863623138555e2526", "418d1dc4929ad1df251d2aeb833dd45757b04a6f", @@ -588,6 +588,8 @@ test_that("Compare two ProjectData Objects with commit.interactions", { commit.interactions.data.expected[["base.func"]] = c("test2.c::test2", "test2.c::test2", "test3.c::test_function", "test2.c::test2") commit.interactions.data.expected[["base.file"]] = c("test2.c", "test2.c", "test3.c", "test2.c") + commit.interactions.data.expected[["artifact.type"]] = c("CommitInteraction", "CommitInteraction", + "CommitInteraction", "CommitInteraction") expect_equal(proj.data.two$get.commit.interactions(), commit.interactions.data.expected) diff --git a/tests/test-networks-artifact.R b/tests/test-networks-artifact.R 
index 432840fc..1d847b54 100644 --- a/tests/test-networks-artifact.R +++ b/tests/test-networks-artifact.R @@ -252,6 +252,7 @@ patrick::with_parameters_test_that("Network construction with commit-interaction "test3.c::test_function", "test2.c::test2"), base.author = c("Olaf", "Thomas", "Karl", "Thomas"), interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"), + artifact.type = c("File", "File", "File", "File"), weight = c(1, 1, 1, 1), type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction") @@ -301,6 +302,7 @@ patrick::with_parameters_test_that("Network construction with commit-interaction base.file = c("test2.c", "test2.c", "test3.c", "test2.c"), base.author = c("Olaf", "Thomas", "Karl", "Thomas"), interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"), + artifact.type = c("Function", "Function", "Function", "Function"), weight = c(1, 1, 1, 1), type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction") diff --git a/tests/test-networks-author.R b/tests/test-networks-author.R index 2910ba51..d343a0c5 100644 --- a/tests/test-networks-author.R +++ b/tests/test-networks-author.R @@ -720,6 +720,7 @@ patrick::with_parameters_test_that("Network construction with commit-interaction base.func = c("test2.c::test2", "test2.c::test2", "test3.c::test_function", "test2.c::test2"), base.file = c("test2.c", "test2.c", "test3.c", "test2.c"), + artifact.type = c("CommitInteraction", "CommitInteraction", "CommitInteraction", "CommitInteraction"), weight = c(1, 1, 1, 1), type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction") diff --git a/tests/test-networks-commit.R b/tests/test-networks-commit.R new file 
mode 100644 index 00000000..7de34eed --- /dev/null +++ b/tests/test-networks-commit.R @@ -0,0 +1,338 @@ +## This file is part of coronet, which is free software: you +## can redistribute it and/or modify it under the terms of the GNU General +## Public License as published by the Free Software Foundation, version 2. +## +## This program is distributed in the hope that it will be useful, +## but WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +## GNU General Public License for more details. +## +## You should have received a copy of the GNU General Public License along +## with this program; if not, write to the Free Software Foundation, Inc., +## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. +## +## Copyright 2024 by Leo Sendelbach + +## All Rights Reserved. + + +context("Network-building functionality.") + +## +## Context +## + +CF.DATA = file.path(".", "codeface-data") +CF.SELECTION.PROCESS = "testing" +CASESTUDY = "test" + +## use only when debugging this file independently +if (!dir.exists(CF.DATA)) CF.DATA = file.path(".", "tests", "codeface-data") + + +## +## Tests for author.all.authors and author.only.committers +## + + + +patrick::with_parameters_test_that("Network construction with commit-interactions as relation", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file") + proj.conf$update.value("commit.interactions", TRUE) + proj.conf$update.value("commit.interactions.filter.global", FALSE) + proj.data = ProjectData$new(project.conf = proj.conf) + + net.conf = NetworkConf$new() + net.conf$update.values(updated.values = list(commit.relation = "commit.interaction", + commit.directed = test.directed)) + + network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf) + network.built = network.builder$get.commit.network() + ## build the expected network + vertices = data.frame( 
+ name = c("3a0ed78458b3976243db6829f63eba3eead26774", + "0a1a5c523d835459c42f33e863623138555e2526", + "1143db502761379c2bfcecc2007fc34282e7ee61", + "418d1dc4929ad1df251d2aeb833dd45757b04a6f", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + "d01921773fae4bed8186b0aa411d6a2f7a6626e6"), + kind = TYPE.COMMIT, + type = TYPE.COMMIT + ) + edges = data.frame( + base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774", + "0a1a5c523d835459c42f33e863623138555e2526", + "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526"), + hash = c("0a1a5c523d835459c42f33e863623138555e2526", + "418d1dc4929ad1df251d2aeb833dd45757b04a6f", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + "d01921773fae4bed8186b0aa411d6a2f7a6626e6"), + func = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"), + interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"), + file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"), + base.author = c("Olaf", "Thomas", "Karl", "Thomas"), + base.func = c("test2.c::test2", "test2.c::test2", + "test3.c::test_function", "test2.c::test2"), + base.file = c("test2.c", "test2.c", "test3.c", "test2.c"), + artifact.type = c("CommitInteraction", "CommitInteraction", "CommitInteraction", "CommitInteraction"), + weight = c(1, 1, 1, 1), + type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), + relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction") + ) + network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices) + expect_true(igraph::identical_graphs(network.built, network)) + + network.new.attr = add.vertex.attribute.commit.network(network.built, proj.data, "deleted.lines", "NO_DATA") + expect_identical(igraph::V(network.new.attr)$deleted.lines, c("0", "0","0", "NO_DATA", "0", "NO_DATA")) +}, patrick::cases( + "directed: FALSE" = list(test.directed = FALSE), + "directed: TRUE" = list(test.directed = TRUE) +)) + 
+patrick::with_parameters_test_that("Network construction with cochange as relation, file as artifact", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file") + proj.data = ProjectData$new(project.conf = proj.conf) + + net.conf = NetworkConf$new() + net.conf$update.values(updated.values = list(commit.relation = "cochange", + commit.directed = test.directed)) + + network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf) + network.built = network.builder$get.commit.network() + ## build the expected network + vertices = data.frame( + name = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + "3a0ed78458b3976243db6829f63eba3eead26774", + "0a1a5c523d835459c42f33e863623138555e2526", + "1143db502761379c2bfcecc2007fc34282e7ee61"), + date = get.date.from.string(c("2016-07-12 15:58:59", + "2016-07-12 16:00:45", + "2016-07-12 16:05:41", + "2016-07-12 16:06:32", + "2016-07-12 16:06:10")), + kind = TYPE.COMMIT, + type = TYPE.COMMIT + ) + edges = data.frame( + from = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "3a0ed78458b3976243db6829f63eba3eead26774"), + to = c("5a5ec9675e98187e1e92561e1888aa6f04faa338", "0a1a5c523d835459c42f33e863623138555e2526"), + date = get.date.from.string(c("2016-07-12 16:00:45", "2016-07-12 16:06:32")), + artifact.type = c("File", "File"), + artifact = c("test.c", "test2.c"), + weight = c(1, 1), + type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), + relation = c("cochange", "cochange") + ) + + if (test.directed) { + edges <- edges[, c(2, 1, 3, 4, 5, 6, 7, 8), ] + } + network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices) + + expect_true(igraph::identical_graphs(network.built, network)) +}, patrick::cases( + "directed: FALSE" = list(test.directed = FALSE), + "directed: TRUE" = list(test.directed = TRUE) +)) + +patrick::with_parameters_test_that("Network construction with cochange as 
relation, function as artifact", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "function") + proj.conf$update.value("commits.filter.base.artifact", FALSE) + proj.data = ProjectData$new(project.conf = proj.conf) + + net.conf = NetworkConf$new() + net.conf$update.values(updated.values = list(commit.relation = "cochange", + commit.directed = test.directed)) + + network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf) + network.built = network.builder$get.commit.network() + ## build the expected network + vertices = data.frame( + name = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + "3a0ed78458b3976243db6829f63eba3eead26774", + "0a1a5c523d835459c42f33e863623138555e2526", + "1143db502761379c2bfcecc2007fc34282e7ee61"), + date = get.date.from.string(c("2016-07-12 15:58:59", + "2016-07-12 16:00:45", + "2016-07-12 16:05:41", + "2016-07-12 16:06:32", + "2016-07-12 16:06:10")), + kind = TYPE.COMMIT, + type = TYPE.COMMIT + ) + edges = data.frame( + from = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", "3a0ed78458b3976243db6829f63eba3eead26774"), + to = c("5a5ec9675e98187e1e92561e1888aa6f04faa338", "3a0ed78458b3976243db6829f63eba3eead26774", + "3a0ed78458b3976243db6829f63eba3eead26774", "0a1a5c523d835459c42f33e863623138555e2526", + "0a1a5c523d835459c42f33e863623138555e2526", "0a1a5c523d835459c42f33e863623138555e2526"), + date = get.date.from.string(c("2016-07-12 16:00:45", "2016-07-12 16:05:41", "2016-07-12 16:05:41", + "2016-07-12 16:06:32", "2016-07-12 16:06:32", "2016-07-12 16:06:32")), + artifact.type = c("Function", "Function", "Function", "Function", "Function", "Function"), + artifact = c("File_Level", "File_Level", "File_Level", "File_Level", 
"File_Level", "File_Level"), + weight = c(1, 1, 1, 1, 1, 1), + type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, + TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), + relation = c("cochange", "cochange", "cochange", "cochange", "cochange", "cochange") + ) + + if (test.directed) { + edges <- edges[, c(2, 1, 3, 4, 5, 6, 7, 8), ] + } + network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices) + + expect_true(igraph::identical_graphs(network.built, network)) +}, patrick::cases( + "directed: FALSE" = list(test.directed = FALSE), + "directed: TRUE" = list(test.directed = TRUE) +)) + +patrick::with_parameters_test_that("Network construction with cochange as relation, feature as artifact", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "feature") + proj.conf$update.value("commits.filter.base.artifact", FALSE) + proj.data = ProjectData$new(project.conf = proj.conf) + + net.conf = NetworkConf$new() + net.conf$update.values(updated.values = list(commit.relation = "cochange", + commit.directed = test.directed)) + + network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf) + network.built = network.builder$get.commit.network() + ## build the expected network + vertices = data.frame( + name = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + "3a0ed78458b3976243db6829f63eba3eead26774", + "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526"), + date = get.date.from.string(c("2016-07-12 15:58:59", + "2016-07-12 16:00:45", + "2016-07-12 16:05:41", + "2016-07-12 16:06:10", + "2016-07-12 16:06:32")), + kind = TYPE.COMMIT, + type = TYPE.COMMIT + ) + edges = data.frame( + from = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "3a0ed78458b3976243db6829f63eba3eead26774", + "3a0ed78458b3976243db6829f63eba3eead26774", "1143db502761379c2bfcecc2007fc34282e7ee61"), + to 
= c("5a5ec9675e98187e1e92561e1888aa6f04faa338", "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526", "0a1a5c523d835459c42f33e863623138555e2526"), + date = get.date.from.string(c("2016-07-12 16:00:45", "2016-07-12 16:06:10", "2016-07-12 16:06:32", "2016-07-12 16:06:32")), + artifact.type = c("Feature", "Feature", "Feature", "Feature"), + artifact = c("A", "Base_Feature", "Base_Feature", "Base_Feature"), + weight = c(1, 1, 1, 1), + type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), + relation = c("cochange", "cochange", "cochange", "cochange") + ) + + if (test.directed) { + edges <- edges[, c(2, 1, 3, 4, 5, 6, 7, 8), ] + } + network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices) + + expect_true(igraph::identical_graphs(network.built, network)) +}, patrick::cases( + "directed: FALSE" = list(test.directed = FALSE), + "directed: TRUE" = list(test.directed = TRUE) +)) + +test_that("Adding vertex attributes to a commit network", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "feature") + proj.conf$update.value("commits.filter.base.artifact", FALSE) + proj.data = ProjectData$new(project.conf = proj.conf) + + net.conf = NetworkConf$new() + net.conf$update.values(updated.values = list(commit.relation = "cochange", + commit.directed = FALSE)) + + network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf) + network.built = network.builder$get.commit.network() + network.new.attr = add.vertex.attribute.commit.network(network.built, proj.data, "author.name", "NO_AUTHOR") + ## build the expected network + vertices = data.frame( + name = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + "3a0ed78458b3976243db6829f63eba3eead26774", + "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526"), + date = 
get.date.from.string(c("2016-07-12 15:58:59", + "2016-07-12 16:00:45", + "2016-07-12 16:05:41", + "2016-07-12 16:06:10", + "2016-07-12 16:06:32")), + kind = TYPE.COMMIT, + type = TYPE.COMMIT, + author.name = c("Björn", + "Olaf", + "Olaf", + "Karl", + "Thomas") + ) + edges = data.frame( + from = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "3a0ed78458b3976243db6829f63eba3eead26774", + "3a0ed78458b3976243db6829f63eba3eead26774", "1143db502761379c2bfcecc2007fc34282e7ee61"), + to = c("5a5ec9675e98187e1e92561e1888aa6f04faa338", "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526", "0a1a5c523d835459c42f33e863623138555e2526"), + date = get.date.from.string(c("2016-07-12 16:00:45", "2016-07-12 16:06:10", "2016-07-12 16:06:32", "2016-07-12 16:06:32")), + artifact.type = c("Feature", "Feature", "Feature", "Feature"), + artifact = c("A", "Base_Feature", "Base_Feature", "Base_Feature"), + weight = c(1, 1, 1, 1), + type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), + relation = c("cochange", "cochange", "cochange", "cochange") + ) + + network = igraph::graph.data.frame(edges, directed = FALSE, vertices = vertices) + + expect_true(igraph::identical_graphs(network.new.attr, network)) + + network.new.attr = add.vertex.attribute.commit.network(network.new.attr, proj.data, "commit.id", "NO_ID") + + ## build the expected network + vertices = data.frame( + name = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + "3a0ed78458b3976243db6829f63eba3eead26774", + "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526"), + date = get.date.from.string(c("2016-07-12 15:58:59", + "2016-07-12 16:00:45", + "2016-07-12 16:05:41", + "2016-07-12 16:06:10", + "2016-07-12 16:06:32")), + kind = TYPE.COMMIT, + type = TYPE.COMMIT, + author.name = c("Björn", + "Olaf", + "Olaf", + "Karl", + "Thomas"), + commit.id = c("", "", + "", "", "") + ) + edges = data.frame( 
+ from = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "3a0ed78458b3976243db6829f63eba3eead26774", + "3a0ed78458b3976243db6829f63eba3eead26774", "1143db502761379c2bfcecc2007fc34282e7ee61"), + to = c("5a5ec9675e98187e1e92561e1888aa6f04faa338", "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526", "0a1a5c523d835459c42f33e863623138555e2526"), + date = get.date.from.string(c("2016-07-12 16:00:45", "2016-07-12 16:06:10", "2016-07-12 16:06:32", "2016-07-12 16:06:32")), + artifact.type = c("Feature", "Feature", "Feature", "Feature"), + artifact = c("A", "Base_Feature", "Base_Feature", "Base_Feature"), + weight = c(1, 1, 1, 1), + type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), + relation = c("cochange", "cochange", "cochange", "cochange") + ) + + network.two = igraph::graph.data.frame(edges, directed = FALSE, vertices = vertices) + + expect_true(igraph::identical_graphs(network.new.attr, network.two)) +}) \ No newline at end of file diff --git a/tests/test-read.R b/tests/test-read.R index c617e091..f01d16c1 100644 --- a/tests/test-read.R +++ b/tests/test-read.R @@ -505,15 +505,15 @@ test_that("Read the commit-interactions data.", { commit.interactions.data.read = read.commit.interactions(proj.conf$get.value("datapath")) ## build the expected data.frame - commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8)) + commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 9)) ## assure that the correct type is used - for(i in seq_len(8)) { + for(i in seq_len(ncol(commit.interactions.data.expected))) { commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]]) } ## set everything except for authors as expected colnames(commit.interactions.data.expected) = c("func", "commit.hash", "file", "base.hash", "base.func", "base.file", "base.author", - "interacting.author") + "interacting.author", "artifact.type") 
commit.interactions.data.expected[["commit.hash"]] = c("5a5ec9675e98187e1e92561e1888aa6f04faa338", "0a1a5c523d835459c42f33e863623138555e2526", @@ -529,6 +529,8 @@ test_that("Read the commit-interactions data.", { commit.interactions.data.expected[["base.func"]] = c("test3.c::test_function", "test2.c::test2", "test2.c::test2", "test2.c::test2") commit.interactions.data.expected[["base.file"]] = c("test3.c", "test2.c", "test2.c", "test2.c") + commit.interactions.data.expected[["artifact.type"]] = c("CommitInteraction", "CommitInteraction", + "CommitInteraction", "CommitInteraction") ## check the results expect_identical(commit.interactions.data.read, commit.interactions.data.expected, info = "commit interaction data.") @@ -543,11 +545,11 @@ test_that("Read the empty commit-interactions data.", { commit.interactions.data.read = read.commit.interactions("./codeface-data/results/testing/ test_empty_proximity/proximity") ## build the expected data.frame - commit.interactions.data.expected = data.frame(matrix(nrow = 0, ncol = 8)) + commit.interactions.data.expected = data.frame(matrix(nrow = 0, ncol = 9)) colnames(commit.interactions.data.expected) = c("func", "commit.hash", "file", "base.hash", "base.func", "base.file", - "base.author", "interacting.author") - for(i in seq_len(8)) { + "base.author", "interacting.author", "artifact.type") + for(i in seq_len(ncol(commit.interactions.data.expected))) { commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]]) } ## check the results diff --git a/util-conf.R b/util-conf.R index d1b8c0c8..85aec34a 100644 --- a/util-conf.R +++ b/util-conf.R @@ -63,6 +63,8 @@ ARTIFACT.CODEFACE = list( "file" = "File" ) +ARTIFACT.COMMIT.INTERACTION = "CommitInteraction" + ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Conf -------------------------------------------------------------------- @@ -837,6 +839,18 @@ NetworkConf = R6::R6Class("NetworkConf", inherit = Conf, allowed = 
c(TRUE, FALSE), allowed.number = 1 ), + commit.relation = list( + default = "cochange", + type = "character", + allowed = c("cochange", "commit.interaction"), + allowed.number = Inf + ), + commit.directed = list( + default = FALSE, + type = "logical", + allowed = c(TRUE, FALSE), + allowed.number = 1 + ), edges.for.base.artifacts = list( default = TRUE, type = "logical", diff --git a/util-data.R b/util-data.R index 988146a5..90c01ca4 100644 --- a/util-data.R +++ b/util-data.R @@ -416,7 +416,12 @@ ProjectData = R6::R6Class("ProjectData", #' This method should be called whenever the field \code{commit.interactions} is changed. update.commit.interactions = function() { if (self$is.data.source.cached("commit.interactions")) { - if (!self$is.data.source.cached("commits.unfiltered")) { + ## check if caller was 'set.commits'. If so, or if commits are already filtered, + ## do not get the commits again. + stacktrace = get.stacktrace(sys.calls()) + caller = get.second.last.element(stacktrace) + if (!self$is.data.source.cached("commits.unfiltered") && + (is.na(caller) || paste(caller, collapse = " ") != "self$set.commits(commit.data)")) { self$get.commits() } @@ -2143,6 +2148,32 @@ ProjectData = R6::R6Class("ProjectData", return(mylist) }, + #' Group the commits of the given \code{data.source} by the given \code{group.column}. + #' For each group, the column \code{"hash"} is duplicated and prepended to each + #' group's data as first column (see below for details). + #' + #' Example: To obtain the commits that changed the same source-code artifact, + #' call \code{group.commits.by.data.column("commits", "artifact")}. + #' + #' @param data.source The specified data source. One of \code{"commits"}, + #' \code{"mails"}, and \code{"issues"}. 
[default: "commits"] + #' @param group.column The column to group the commits of the given \code{data.source} by + #' [default: "artifact"] + #' + #' @return a list mapping each distinct item in \code{group.column} to all corresponding + #' data items from \code{data.source}, with the column \code{"hash"} duplicated + #' as first column (with name \code{"data.vertices"}) + #' + #' @seealso ProjectData$group.data.by.column + group.commits.by.data.column = function(group.column = "artifact") { + logging::loginfo("Grouping commits by data column.") + + ## store the commits per group that is determined by 'group.column' + mylist = self$group.data.by.column("commits", group.column, "hash") + + return(mylist) + }, + #' Group the authors of the given \code{data.source} by the given \code{group.column}. #' For each group, the column \code{"author.name"} is duplicated and prepended to each #' group's data as first column (see below for details). diff --git a/util-networks-covariates.R b/util-networks-covariates.R index 95a3021a..700b5e9f 100644 --- a/util-networks-covariates.R +++ b/util-networks-covariates.R @@ -22,6 +22,7 @@ ## Copyright 2022 by Niklas Schneider ## Copyright 2022 by Jonathan Baumann ## Copyright 2024 by Maximilian Löffler +## Copyright 2024 by Leo Sendelbach ## All Rights Reserved. ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / @@ -140,6 +141,41 @@ add.vertex.attribute = function(net.to.range.list, attr.name, default.value, com return(nets.with.attr) } +#' Utility function to add a vertex attribute from commit data to a commit network. +#' Attribute name should be a column name of the commit data dataframe. +#' Default column names can be seen in 'COMMITS.LIST.COLUMNS' in 'util-read.R', +#' though more might be possible. 
+#' +#' @param network the commit network +#' @param project.data the project data from which to extract the values +#' @param attr.name the name of the attribute +#' @param default.value the default value that is used if the current hash +#' is not contained in the commit data at all +#' +#' @return a network with new vertex attribute +add.vertex.attribute.commit.network = function(network, project.data, + attr.name, default.value) { + # get the commit data and extract the required data + commit.data = project.data$get.commits() + hashes = commit.data[["hash"]] + attribute = commit.data[[attr.name]] + attribute.values = c() + for (hash in igraph::V(network)$name) { + # for each vertex, find the position in the data frame + hash.index = match(hash, hashes, nomatch = NA) + + value = c() + # extract the correct value from the data or use the default value + if (!is.na(hash.index)) { + value = attribute[[hash.index]] + } else { + value = default.value + } + attribute.values = c(attribute.values, value) + } + net.with.attr = igraph::set.vertex.attribute(network, attr.name, value = attribute.values) +} + ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Author network functions ------------------------------------------------ diff --git a/util-networks.R b/util-networks.R index a9b19e11..da1b1da6 100644 --- a/util-networks.R +++ b/util-networks.R @@ -44,6 +44,7 @@ requireNamespace("lubridate") # for date conversion ## vertex types TYPE.AUTHOR = "Author" TYPE.ARTIFACT = "Artifact" +TYPE.COMMIT = "Commit" ## edge types TYPE.EDGES.INTRA = "Unipartite" @@ -122,6 +123,8 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", artifacts.network.callgraph = NULL, # igraph artifacts.network.mail = NULL, # igraph artifacts.network.issue = NULL, # igraph + commits.network.commit.interaction = NULL, #igraph + commits.network.cochange = NULL, #igraph ## * * relation-to-vertex-kind mapping ----------------------------- @@ -245,6 +248,9 @@ NetworkBuilder = 
R6::R6Class("NetworkBuilder", colnames(edges)[1] = "to" colnames(edges)[2] = "from" colnames(edges)[4] = "hash" + if (nrow(edges) > 0) { + edges[["artifact.type"]] = ARTIFACT.COMMIT.INTERACTION + } author.net.data = list(vertices = vertices, edges = edges) ## construct the network author.net = construct.network.from.edge.list( @@ -352,7 +358,7 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", network.conf = private$network.conf, directed = FALSE, respect.temporal.order = TRUE, - artifact.edges = TRUE + network.type = "artifact" ) ## construct network from obtained data @@ -398,6 +404,9 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", edges = edges[, c("file", "base.file", "func", "commit.hash", "base.hash", "base.func", "base.author", "interacting.author")] + if (nrow(edges) > 0) { + edges[["artifact.type"]] = ARTIFACT.CODEFACE[[proj.conf.artifact]] + } colnames(edges)[colnames(edges) == "commit.hash"] = "hash" } else if (proj.conf.artifact == "function") { ## change the vertices to the functions from the commit-interaction data @@ -407,6 +416,9 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", edges = edges[, c("func", "base.func", "commit.hash", "file", "base.hash", "base.file", "base.author", "interacting.author")] + if (nrow(edges) > 0) { + edges[["artifact.type"]] = ARTIFACT.CODEFACE[[proj.conf.artifact]] + } colnames(edges)[colnames(edges) == "commit.hash"] = "hash" } else { ## If neither 'function' nor 'file' was configured, send a warning @@ -679,6 +691,92 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", return(artifacts.net) }, + #' Build and get the commit network with commit-interactions as the relation. 
+ #' + #' @return the commit-interaction commit network + get.commit.network.commit.interaction = function() { + + logging::logdebug("get.commit.network.commit.interaction: starting.") + + ## do not compute anything more than once + if (!is.null(private$commits.network.commit.interaction)) { + logging::logdebug("get.commit.network.commit.interaction: finished. (already existing)") + return(private$commits.network.commit.interaction) + } + + ## get the hashes that appear in the commit-interaction data as the vertices of the network + vertices = unique(c(private$proj.data$get.commit.interactions()[["base.hash"]], + private$proj.data$get.commit.interactions()[["commit.hash"]])) + vertices = data.frame(name = vertices) + + ## get the commit-interaction data as the edge data of the network + edges = private$proj.data$get.commit.interactions() + ## set the commits as the 'to' and 'from' of the network and order the dataframe + edges = edges[, c("base.hash", "commit.hash", "func", "interacting.author", + "file", "base.author", "base.func", "base.file")] + if (nrow(edges) > 0) { + edges[["artifact.type"]] = ARTIFACT.COMMIT.INTERACTION + } + colnames(edges)[1] = "to" + colnames(edges)[2] = "from" + commit.net.data = list(vertices = vertices, edges = edges) + ## construct the network + commit.net = construct.network.from.edge.list( + commit.net.data[["vertices"]], + commit.net.data[["edges"]], + network.conf = private$network.conf, + directed = private$network.conf$get.value("commit.directed"), + available.edge.attributes = private$proj.data$ + get.data.columns.for.data.source("commit.interactions") + ) + + private$commits.network.commit.interaction = commit.net + logging::logdebug("get.commit.network.commit.interaction: finished.") + + return(commit.net) + }, + + #' Get the cochange-based commit network, + #' If it does not already exist build it first. 
+ #' + #' @return the commit network with cochange relation + get.commit.network.cochange = function() { + + logging::logdebug("get.commit.network.cochange: starting.") + + ## do not compute anything more than once + if (!is.null(private$commits.network.cochange)) { + logging::logdebug("get.commit.network.cochange: finished. (already existing)") + return(private$commits.network.cochange) + } + + ## construct edge list based on commit--artifact data + commit.net.data.raw = private$proj.data$group.commits.by.data.column("artifact") + + commit.net.data = construct.edge.list.from.key.value.list( + commit.net.data.raw, + network.conf = private$network.conf, + directed = private$network.conf$get.value("commit.directed"), + respect.temporal.order = TRUE, + network.type = "commit" + ) + + ## construct network from obtained data + commit.net = construct.network.from.edge.list( + commit.net.data[["vertices"]], + commit.net.data[["edges"]], + network.conf = private$network.conf, + directed = private$network.conf$get.value("commit.directed"), + available.edge.attributes = private$proj.data$get.data.columns.for.data.source("commits") + ) + + ## store network + private$commits.network.cochange = commit.net + logging::logdebug("get.commit.network.cochange: finished.") + + return(commit.net) + }, + ## * * bipartite relations ------------------------------------------ #' Get the key-value data for the bipartite relations, @@ -753,6 +851,8 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", private$artifacts.network.cochange = NULL private$artifacts.network.issue = NULL private$artifacts.network.mail = NULL + private$commits.network.commit.interaction = NULL + private$commits.network.cochange = NULL private$proj.data = private$proj.data.original if (private$network.conf$get.value("unify.date.ranges")) { private$cut.data.to.same.timestamps() @@ -929,6 +1029,48 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", return(net) }, + #' Get the generic commit network. 
+ #' + #' @return the generic commit network + get.commit.network = function() { + logging::loginfo("Constructing commit network.") + + ## construct network + relations = private$network.conf$get.value("commit.relation") + networks = lapply(relations, function(relation) { + network = switch( + relation, + cochange = private$get.commit.network.cochange(), + commit.interaction = private$get.commit.network.commit.interaction(), + stop(sprintf("The commit relation '%s' does not exist.", relation)) + ) + + ## set edge attributes on all edges + igraph::E(network)$type = TYPE.EDGES.INTRA + igraph::E(network)$relation = relation + + return(network) + }) + net = merge.networks(networks) + + ## set vertex and edge attributes for identification + igraph::V(net)$kind = TYPE.COMMIT + igraph::V(net)$type = TYPE.COMMIT + + ## simplify network if wanted + if (private$network.conf$get.value("simplify")) { + net = simplify.network(net, simplify.multiple.relations = + private$network.conf$get.value("simplify.multiple.relations")) + } + + ## add range attribute for later analysis (if available) + if ("RangeData" %in% class(private$proj.data)) { + attr(net, "range") = private$proj.data$get.range() + } + + return(net) + }, + #' Get the (real) bipartite network. #' #' @return the bipartite network @@ -1050,12 +1192,15 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", authors.net = self$get.author.network() ## artifact relation artifacts.net = self$get.artifact.network() + ## commit relation + commit.net = self$get.commit.network() return(list( "authors.to.artifacts" = authors.to.artifacts, "bipartite.net" = bipartite.net, "authors.net" = authors.net, - "artifacts.net" = artifacts.net + "artifacts.net" = artifacts.net, + "commits.net" = commit.net )) }, @@ -1185,14 +1330,17 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", #' i.e., whether to only add edges from the later event to the previous one. #' If \code{NA} is passed, the default value is taken. 
#' [default: directed] -#' @param artifact.edges whether the key value data represents edges in an artifact network based -#' on the cochange relation -#' [default: FALSE] +#' @param network.type the type of network for which the key value data is to be used as edges +#' (one out of "author", "artifact", or "commit")[default: "author"] #' #' @return a list of two data.frames named 'vertices' and 'edges' (compatible with return value #' of \code{igraph::as.data.frame}) construct.edge.list.from.key.value.list = function(list, network.conf, directed = FALSE, - respect.temporal.order = directed, artifact.edges = FALSE) { + respect.temporal.order = directed, + network.type = c("author", "artifact", "commit")) { + + network.type = match.arg.or.default(network.type, default = "author", several.ok = FALSE) + logging::loginfo("Create edges.") logging::logdebug("construct.edge.list.from.key.value.list: starting.") @@ -1214,7 +1362,7 @@ construct.edge.list.from.key.value.list = function(list, network.conf, directed ## replace it with the \code{author.name} attribute as artifacts cannot cause ## edges in artifact networks, authors can edge.attributes = network.conf$get.value("edge.attributes") - if (artifact.edges) { + if (network.type == "artifact") { artifact.index = match("artifact", edge.attributes, nomatch = NA) if (!is.na(artifact.index)) { edge.attributes = edge.attributes[-artifact.index] @@ -1222,138 +1370,212 @@ construct.edge.list.from.key.value.list = function(list, network.conf, directed } } + ## if edges in a commit network contain 'hash' or 'file' attributes, remove them + ## as they belong to commits, which are the vertices in commit networks + if (network.type == "commit") { + cols.which = which(edge.attributes %in% c("hash", "file")) + edge.attributes = edge.attributes[-cols.which] + } + if (respect.temporal.order) { ## for all subsets (sets), connect all items in there with the previous ones - edge.list.data = parallel::mclapply(list, function(set) { - 
number.edges = sum(seq_len(nrow(set)) - 1) - logging::logdebug("[%s/%s] Constructing edges for %s '%s': starting (%s edges to construct).", - match(attr(set, "group.name"), keys), keys.number, - attr(set, "group.type"), attr(set, "group.name"), number.edges) - - ## Skip artifacts with many, many edges - if (number.edges > network.conf$get.value("skip.threshold")) { - logging::logwarn("Skipping edges for %s '%s' due to amount (> %s).", - attr(set, "group.type"), attr(set, "group.name"), network.conf$get.value("skip.threshold")) - return(NULL) - } + edge.list.data = parallel::mclapply(list, construct.edges.temporal.order, network.conf, + edge.attributes, keys, keys.number, network.type) - ## queue of already processed artifacts - edge.list.set = data.frame() - vertices.processed.set = c() + edge.list = plyr::rbind.fill(edge.list.data) + vertices.processed = unlist(parallel::mclapply(edge.list.data, function(data) { + return(attr(data, "vertices.processed")) + })) - ## connect the current item to all previous ones - for (item.no in seq_len(nrow(set))) { - item = set[item.no, ] + } else { - ## get vertex data - item.vertex = item[["data.vertices"]] + ## for all items in the sublists, construct the cartesian product + edge.list.data = parallel::mclapply(list, construct.edges.no.temporal.order, network.conf, + edge.attributes, keys, keys.number) - ## get edge attributes - cols.which = edge.attributes %in% colnames(item) - item.edge.attrs = item[ , edge.attributes[cols.which], drop = FALSE] + edge.list = plyr::rbind.fill(edge.list.data) + vertices.processed = unlist(parallel::mclapply(edge.list.data, function(data) { + return(attr(data, "vertices.processed")) + })) - ## construct edges - combinations = expand.grid(item.vertex, vertices.processed.set, stringsAsFactors = FALSE) - if (nrow(combinations) > 0 & nrow(item.edge.attrs) == 1) { - combinations = cbind(combinations, item.edge.attrs, row.names = NULL) # add edge attributes - } - edge.list.set = rbind(edge.list.set, 
combinations) # add to edge list + } - ## mark current item as processed - vertices.processed.set = c(vertices.processed.set, item.vertex) - } + logging::logdebug("construct.edge.list.from.key.value.list: finished.") + + if (network.type == "commit") { + vertices.dates.processed = unlist(parallel::mclapply(edge.list.data, function(data) { + return (attr(data, "vertices.dates.processed")) + })) + return(list( + vertices = data.frame( + name = unique(vertices.processed), + date = get.date.from.string(unique(vertices.dates.processed)) + ), + edges = edge.list + )) + } else { + return(list( + vertices = data.frame( + name = unique(vertices.processed) + ), + edges = edge.list + )) + } +} + +#' Constructs edge list from the given key value list respecting temporal order. +#' Helper method which is called by 'construct.edge.list.from.key.value.list'. +#' +#' @param set the given key value list +#' @param network.conf the network configuration +#' @param edge.attributes the attributes that should be on the edges of the network +#' @param keys the keys of the key value list +#' @param keys.number the amount of keys in the key value list +#' @param network.type the type of network that should be created +#' +#' @return the data for the edge list +construct.edges.temporal.order = function(set, network.conf, edge.attributes, keys, keys.number, network.type) { + number.edges = sum(seq_len(nrow(set)) - 1) + logging::logdebug("[%s/%s] Constructing edges for %s '%s': starting (%s edges to construct).", + match(attr(set, "group.name"), keys), keys.number, + attr(set, "group.type"), attr(set, "group.name"), number.edges) + + ## Skip artifacts with many, many edges + if (number.edges > network.conf$get.value("skip.threshold")) { + logging::logwarn("Skipping edges for %s '%s' due to amount (> %s).", + attr(set, "group.type"), attr(set, "group.name"), network.conf$get.value("skip.threshold")) + return(NULL) + } - ## store set of processed vertices - attr(edge.list.set, 
"vertices.processed") = vertices.processed.set + if (network.type == "commit") { + set = set[order(set[["date"]]), ] + } - logging::logdebug("Constructing edges for %s '%s': finished.", attr(set, "group.type"), attr(set, "group.name")) + ## queue of already processed artifacts + edge.list.set = data.frame() + vertices.processed.set = c() - return(edge.list.set) - }) + ## connect the current item to all previous ones + for (item.no in seq_len(nrow(set))) { + item = set[item.no, ] - edge.list = plyr::rbind.fill(edge.list.data) - vertices.processed = unlist( parallel::mclapply(edge.list.data, function(data) attr(data, "vertices.processed")) ) + ## get vertex data + item.vertex = item[["data.vertices"]] + if (network.type == "commit") { + item.vertex = data.frame(commit = item.vertex, date = get.date.string(item[["date"]])) + } - } else { + ## get edge attributes + cols.which = edge.attributes %in% colnames(item) + item.edge.attrs = item[ , edge.attributes[cols.which], drop = FALSE] + + ## construct edges + combinations = c() + if (network.type == "commit") { + combinations = expand.grid(item.vertex[["commit"]], + vertices.processed.set[["commit"]], stringsAsFactors = FALSE) + } else { + combinations = expand.grid(item.vertex, vertices.processed.set, stringsAsFactors = FALSE) + } - ## for all items in the sublists, construct the cartesian product - edge.list.data = parallel::mclapply(list, function(set) { - number.edges = sum(table(set[["data.vertices"]]) * (dim(table(set[["data.vertices"]])) - 1)) - logging::logdebug("[%s/%s] Constructing edges for %s '%s': starting (%s edges to construct).", - match(attr(set, "group.name"), keys), keys.number, - attr(set, "group.type"), attr(set, "group.name"), number.edges) - - ## Skip artifacts with many, many edges - if (number.edges > network.conf$get.value("skip.threshold")) { - logging::logwarn("Skipping edges for %s '%s' due to amount (> %s).", - attr(set, "group.type"), attr(set, "group.name"), 
network.conf$get.value("skip.threshold")) - return(NULL) - } + if (nrow(combinations) > 0 && nrow(item.edge.attrs) == 1) { + combinations = cbind(combinations, item.edge.attrs, row.names = NULL) # add edge attributes + } + edge.list.set = rbind(edge.list.set, combinations) # add to edge list - ## get vertex data - vertices = unique(set[["data.vertices"]]) + ## mark current item as processed + if (network.type == "commit") { + vertices.processed.set = rbind(vertices.processed.set, item.vertex) + } else { + vertices.processed.set = c(vertices.processed.set, item.vertex) + } + } - ## break if there is no author - if (length(vertices) < 1) { - return(NULL) - } + ## store set of processed vertices + if (network.type == "commit") { + attr(edge.list.set, "vertices.processed") = vertices.processed.set[["commit"]] + attr(edge.list.set, "vertices.dates.processed") = vertices.processed.set[["date"]] + } else { + attr(edge.list.set, "vertices.processed") = vertices.processed.set + } - ## if there is only one author, just create the vertex, but no edges - if (length(vertices) == 1) { - edges = data.frame() - attr(edges, "vertices.processed") = vertices # store set of processed vertices - return(edges) - } + logging::logdebug("Constructing edges for %s '%s': finished.", attr(set, "group.type"), attr(set, "group.name")) - ## get combinations - combinations = combn(vertices, 2) # all unique pairs of authors + return(edge.list.set) +} - ## construct edge list - edges = apply(combinations, 2, function(comb) { +#' Constructs edge list from the given key value list not respecting temporal order. +#' Helper method which is called by 'construct.edge.list.from.key.value.list'. 
+#' +#' @param set the given key value list +#' @param network.conf the network configuration +#' @param edge.attributes the attributes that should be on the edges of the network +#' @param keys the keys of the key value list +#' @param keys.number the amount of keys in the key value list +#' +#' @return the data for the edge list +construct.edges.no.temporal.order = function(set, network.conf, edge.attributes, keys, keys.number) { + number.edges = sum(table(set[["data.vertices"]]) * (dim(table(set[["data.vertices"]])) - 1)) + logging::logdebug("[%s/%s] Constructing edges for %s '%s': starting (%s edges to construct).", + match(attr(set, "group.name"), keys), keys.number, + attr(set, "group.type"), attr(set, "group.name"), number.edges) + + ## Skip artifacts with many, many edges + if (number.edges > network.conf$get.value("skip.threshold")) { + logging::logwarn("Skipping edges for %s '%s' due to amount (> %s).", + attr(set, "group.type"), attr(set, "group.name"), network.conf$get.value("skip.threshold")) + return(NULL) + } - ## iterate over each of the two data vertices of the current combination to determine the edges - ## for which it is the sender of the edge and use the second one as the receiver of the edge - edges.by.comb.item = lapply(comb, function(comb.item) { - ## basic edge data - edge = data.frame(from = comb.item, to = comb[comb != comb.item]) + ## get vertex data + vertices = unique(set[["data.vertices"]]) - ## get edge attibutes - edge.attrs = set[set[["data.vertices"]] %in% comb.item, ] # get data for current combination item - cols.which = edge.attributes %in% colnames(edge.attrs) - edge.attrs = edge.attrs[ , edge.attributes[cols.which], drop = FALSE] + ## break if there is no author + if (length(vertices) < 1) { + return(NULL) + } - # add edge attributes to edge list - edgelist = cbind(edge, edge.attrs) - return(edgelist) - }) + ## if there is only one author, just create the vertex, but no edges + if (length(vertices) == 1) { + edges = 
data.frame() + attr(edges, "vertices.processed") = vertices # store set of processed vertices + return(edges) + } - ## union the edge lists for the combination items - edges.union = plyr::rbind.fill(edges.by.comb.item) - return(edges.union) + ## get combinations + combinations = combn(vertices, 2) # all unique pairs of authors - }) - edges = plyr::rbind.fill(edges) + ## construct edge list + edges = apply(combinations, 2, function(comb) { + + ## iterate over each of the two data vertices of the current combination to determine the edges + ## for which it is the sender of the edge and use the second one as the receiver of the edge + edges.by.comb.item = lapply(comb, function(comb.item) { + ## basic edge data + edge = data.frame(from = comb.item, to = comb[comb != comb.item]) - ## store set of processed vertices - attr(edges, "vertices.processed") = vertices + ## get edge attributes + edge.attrs = set[set[["data.vertices"]] %in% comb.item, ] # get data for current combination item + cols.which = edge.attributes %in% colnames(edge.attrs) + edge.attrs = edge.attrs[ , edge.attributes[cols.which], drop = FALSE] - return(edges) + # add edge attributes to edge list + edgelist = cbind(edge, edge.attrs) + return(edgelist) }) - edge.list = plyr::rbind.fill(edge.list.data) - vertices.processed = unlist( parallel::mclapply(edge.list.data, function(data) attr(data, "vertices.processed")) ) + ## union the edge lists for the combination items + edges.union = plyr::rbind.fill(edges.by.comb.item) + return(edges.union) - } + }) + edges = plyr::rbind.fill(edges) - logging::logdebug("construct.edge.list.from.key.value.list: finished.") + ## store set of processed vertices + attr(edges, "vertices.processed") = vertices - return(list( - vertices = data.frame( - name = unique(vertices.processed) - ), - edges = edge.list - )) + return(edges) } #' Construct a network from the given lists of vertices and edges. 
diff --git a/util-read.R b/util-read.R index f4fe7025..06c082e5 100644 --- a/util-read.R +++ b/util-read.R @@ -863,14 +863,15 @@ create.empty.pasta.list = function() { COMMIT.INTERACTION.LIST.COLUMNS = c( "func", "commit.hash", "file", "base.hash", "base.func", "base.file", - "base.author", "interacting.author" + "base.author", "interacting.author", + "artifact.type" ) ## declare the datatype for each column in the constant 'COMMIT.INTERACTION.LIST.COLUMNS' COMMIT.INTERACTION.LIST.DATA.TYPES = c( "character", "character", "character", "character", "character", "character", - "character", "character" + "character", "character", "character" ) COMMIT.INTERACTION.GLOBAL.FILE.FUNCTION.NAME = "GLOBAL" @@ -952,6 +953,7 @@ read.commit.interactions = function(data.path = NULL) { ## Author data will be merged from commit data in \code{update.commit.interactions}. interactions["base.author"] = NA_character_ interactions["interacting.author"] = NA_character_ + interactions["artifact.type"] = ARTIFACT.COMMIT.INTERACTION return(interactions) })))
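
Reviewer note: the cochange commit network built by `get.commit.network.cochange` groups commits by artifact, orders each group by date, and connects every commit to all earlier commits in the same group. The following is a minimal, standalone sketch of that idea in base R only. It is not the implementation from this patch: the function name `sketch.cochange.edges` and the toy data are hypothetical, edge attributes are reduced to the artifact name, and each commit is assumed to appear at most once per artifact.

```r
## Simplified sketch of temporal-order cochange edges between commits.
## 'sketch.cochange.edges' is a hypothetical helper, not part of the patched code base.
sketch.cochange.edges = function(commits) {
    ## group the commits by the artifact they changed
    groups = split(commits, commits[["artifact"]])

    edge.list = lapply(groups, function(set) {
        ## order by date so that each edge goes from the later commit to an earlier one
        set = set[order(set[["date"]]), ]

        edges = data.frame(from = character(0), to = character(0),
                           artifact = character(0), stringsAsFactors = FALSE)
        processed = character(0)

        for (i in seq_len(nrow(set))) {
            hash = set[["hash"]][i]
            ## connect the current commit to all previously processed commits
            if (length(processed) > 0) {
                edges = rbind(edges, data.frame(from = hash, to = processed,
                                                artifact = set[["artifact"]][i],
                                                stringsAsFactors = FALSE))
            }
            ## mark the current commit as processed
            processed = c(processed, hash)
        }
        return(edges)
    })

    edges = do.call(rbind, edge.list)
    rownames(edges) = NULL
    return(edges)
}

## toy data (made up for illustration)
commits = data.frame(
    hash = c("c1", "c2", "c3", "c4"),
    date = as.POSIXct(c("2016-07-12 16:06:10", "2016-07-12 15:58:59",
                        "2016-07-12 16:06:32", "2016-07-12 16:00:45"), tz = "UTC"),
    artifact = c("Base_Feature", "A", "Base_Feature", "A"),
    stringsAsFactors = FALSE
)

edges = sketch.cochange.edges(commits)
print(edges)
```

With this toy input the sketch yields two edges, `c4 -> c2` for artifact `A` and `c3 -> c1` for artifact `Base_Feature`. The patched code additionally tracks vertex dates, honors `skip.threshold`, and takes edge attributes from the network configuration.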