Merge pull request se-sic#263 from Leo-Send/commitnetwork
Add Commit Networks

Reviewed-by: Thomas Bock <[email protected]>
Reviewed-by: Christian Hechtl <[email protected]>
hechtlC authored Aug 28, 2024
2 parents 74ebe0b + 5842073 commit 55dc0cc
Showing 13 changed files with 798 additions and 124 deletions.
4 changes: 3 additions & 1 deletion NEWS.md
@@ -6,19 +6,21 @@

### Added

- Add commit-interaction data and add functions `read.commit.interactions` for reading, as well as `get.commit.interactions`, `set.commit.interactions` and utility functions for working with commit-interaction data (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, b4fd2a29c9b5fd561b1106c6febb54a32b0085ab, fd0aa05f824b93545ae8e05833b95b3bd9809286, bca35760eb0aac86c04923f2d534b2d8cece204e) as well as tests for these features (PR #252, eeba7e29932bc973513c963fb9e716e9230d570f, 8bb39f4df39b49dfaff8f19feb6db5e5fbd81fac, 54b6f655248720436af116fe72521f9cb0348429, 7a5497aaf9114017d1b3b9b68b6cccd7ca8ac114, 7b8585f87675795822c07230192d6454de31dcc7, ef725407bf8818c8fff96ea6f343338b7162cbe0)
- Add commit-interaction data and add functions `read.commit.interactions` for reading, as well as `get.commit.interactions`, `set.commit.interactions` and utility functions for working with commit-interaction data (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, b4fd2a29c9b5fd561b1106c6febb54a32b0085ab, fd0aa05f824b93545ae8e05833b95b3bd9809286, bca35760eb0aac86c04923f2d534b2d8cece204e, PR #263, 849123a8b7d898fbb1343745ecffc1f6000c9367, 3fb7437b68950303916b62984fa449732c70353e, 170bc66eb779d7cf2ab504db7c3f4ec483103838) as well as tests for these features (PR #252, eeba7e29932bc973513c963fb9e716e9230d570f, 8bb39f4df39b49dfaff8f19feb6db5e5fbd81fac, 54b6f655248720436af116fe72521f9cb0348429, 7a5497aaf9114017d1b3b9b68b6cccd7ca8ac114, 7b8585f87675795822c07230192d6454de31dcc7, ef725407bf8818c8fff96ea6f343338b7162cbe0)
- Add commit-interaction networks that can be created with `create.author.network` and `create.artifact.network` if the `artifact.relation` and `author.relation` is configured to be `commit.interaction` (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, 329d97ec3de36a9e1bcadc0c7a53c1d92e8b481c) as well as tests for these features (PR #252, 07e7ed744209b0251217fa8f7f35d9b9875face2, 7068cfa10d993dcae3f5e3f76f8cafa99fa8b350)
- Add helper function for prefixing function names with file names in `util-read.R` (PR #252, f8ea987b138173cf0509c7910e0572d8ee1b3f1f)
- Add line-based code coverage reports into CI pipeline. Coverage reports are generated by `coverage.R` (PR #262, 10cac49d005e87c3964cc61711e7f5acef749626, b3b9f4ac7a9911bd00293c68fac88e0f9033bdfb, c815d18dc6266d620a7a145493417b87ac08679e, e8093525fdaf46e54f2f7fcc6358ca7892e795e5, 32d04823e2007c63d2a43ce59bea3057327c19a7)
- Add the possibility to split data time-based by multiple data sources (PR #261, 1088395f46b84028c8d7c463ca86b5dc38500c26, e1f79fc9e40cd6f41c946be42db364b2101cfe10, 0bb187fec0fd801d7634bf8d5180525770f6ab0b, 371a97ac6ebf3de4fe9360dea79d62e2ed3ef585)
- Add tests for uncovered functionality in `util-misc.R` and `util-networks.R` (PR #264, ff30f3238b1bf2539280d0d055a5d925c197c271, af80551d0615a49b86e45ff596bd75941ee88f91)
- Add commit network as a new type of network. It uses commits as vertices and connects them either via cochange or commit interactions. This includes adding new config parameters and the function `add.vertex.attribute.commit.network` for adding vertex attributes to a commit network (PR #263, ab73271781e8e9a0715f784936df4b371d64c338, ab73271781e8e9a0715f784936df4b371d64c338, cd9a930fcb54ff465c2a5a7c43cfe82ac15c134d)

### Changed/Improved

- Change the default value for the `issues.from.source` configuration parameter. Instead of reading JIRA and GitHub issues together, which was the previous default, the new default value causes only GitHub issue data to be read. To restore the previous default behavior and read data from both issue sources, this now needs to be manually configured when needed. (PR #264, 5ff83c364f6bfc1e6ff95e9c5f1087e031c48a5d, 8c8080cb9caf115f19d9f145ad6e6c108b131a67, 8bcbc81db521877908d2e5c2989082ed672f2a3b)
- Replace deprecated `igraph` functions by their preferred alternatives (PR #264, 0df9d5bf6bafbb5d440f4c47db4ec901cf11f037)
- Deprecate support for R version 3.6 (PR #264, c8e6f45111e487fadbe7f0a13c7595eb23f3af6e, fb3f5474259d4a88f4ff545691cca9d1ccde90e3)
- Explicitly add R version 4.4 to the CI test pipeline (c8e6f45111e487fadbe7f0a13c7595eb23f3af6e)
- Refactor function `construct.edge.list.from.key.value.list` to be more readable (PR #263, 05c3bc09cb1d396fd59c34a88030cdca58fd04dd)

### Fixed

17 changes: 15 additions & 2 deletions README.md
@@ -234,6 +234,11 @@ There are four types of networks that can be built using this library: author ne
* The vertices in an artifact network denote any kind of artifact, e.g., source-code artifact (such as features or files) or communication artifact (such as mail threads or issues). All artifact-type vertices are uniquely identifiable by their name. There are only unipartite edges among artifacts in this type of network.
* The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `artifact.relation`. The relation also describes which kinds of artifacts are represented as vertices in the network. (For example, if "mail" is selected as `artifact.relation`, only mail-thread vertices are included in the network.)

- Commit networks
* The vertices in a commit network denote the commits in the data. All vertices are uniquely identifiable by the hash of the commit. There are only unipartite edges among commits in this type of network.
* The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `commit.relation`. The relation also describes the type of data used for network construction (`cochange` uses commit data, `commit.interaction` uses commit-interaction data).

- Bipartite networks
* The vertices in a bipartite network denote both authors and artifacts. There are only bipartite edges from authors to artifacts in this type of network.
* The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `artifact.relation`.
@@ -249,6 +254,7 @@ Relations determine which information is used to construct edges among the verti
- `cochange`
* For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who change the same source-code artifact are connected with an edge.
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), source-code artifacts that are concurrently changed in the same commit are connected with an edge.
* For commit networks (configured via `commit.relation` in the [`NetworkConf`](#networkconf)), commits are connected if they change the same artifact.
* For bipartite networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), authors get linked to all source-code artifacts they have changed in their respective commits.

- `mail`
@@ -269,6 +275,7 @@ Relations determine which information is used to construct edges among the verti
- `commit.interaction`
* For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who contribute to interacting commits are connected with an edge.
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), artifacts are connected when there is an interaction between two commits that occur in the artifacts.
* For commit networks (configured via `commit.relation` in the [`NetworkConf`](#networkconf)), commits are connected when they interact in the commit-interaction data.
* This relation does not apply to bipartite networks.
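
The two commit-network relations described above can be illustrated with a minimal base-R sketch. This is not the library's implementation: the helper names `cochange.edges` and `interaction.edges` and the toy data are hypothetical, only the column names `commit.hash` and `base.hash` are taken from the commit-interaction data, and the edge direction (interacting commit to base commit) is an assumption.

```r
## Hypothetical toy data: which commit changed which artifact
commits = data.frame(
    hash = c("c1", "c2", "c3", "c3"),
    artifact = c("A", "A", "B", "A"),
    stringsAsFactors = FALSE
)

## Hypothetical toy commit-interaction data (column names as in the test data)
interactions = data.frame(
    commit.hash = c("c2", "c3"),
    base.hash = c("c1", "c2"),
    stringsAsFactors = FALSE
)

## 'cochange' (sketch): connect each pair of distinct commits that changed the same artifact;
## this sketch emits one edge per shared artifact
cochange.edges = function(commit.data) {
    pairs = lapply(split(commit.data[["hash"]], commit.data[["artifact"]]), function(hashes) {
        hashes = sort(unique(hashes))
        if (length(hashes) < 2) {
            return(NULL)
        }
        combined = t(combn(hashes, 2))
        data.frame(from = combined[, 1], to = combined[, 2], stringsAsFactors = FALSE)
    })
    edges = do.call(rbind, pairs)
    if (is.null(edges)) {
        edges = data.frame(from = character(0), to = character(0), stringsAsFactors = FALSE)
    }
    rownames(edges) = NULL
    return(edges)
}

## 'commit.interaction' (sketch): one edge per row of the interaction data;
## the direction from interacting commit to base commit is an assumption
interaction.edges = function(interaction.data) {
    data.frame(from = interaction.data[["commit.hash"]],
               to = interaction.data[["base.hash"]],
               stringsAsFactors = FALSE)
}

print(cochange.edges(commits))
print(interaction.edges(interactions))
```

In both cases the vertices are commit hashes; only the data source for the edges differs, which is why `commit.relation` also determines which data must be available.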

#### Edge-construction algorithms for author networks
@@ -623,7 +630,7 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
- `author.relation`
* The relation(s) among authors, encoded as edges in an author network
* **Note**: The author--artifact relation in bipartite and multi networks is configured by `artifact.relation`!
* possible values: [*`"mail"`*, `"cochange"`, `"issue"`]
* possible values: [*`"mail"`*, `"cochange"`, `"issue"`, `"commit.interaction"`]
- `author.directed`
* The directedness of edges in an author network
* [`TRUE`, *`FALSE`*]
@@ -642,11 +649,17 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
- `artifact.relation`
* The relation(s) among artifacts, encoded as edges in an artifact network
* **Note**: Additionally, this relation configures also the author--artifact relation in bipartite and multi networks!
* possible values: [*`"cochange"`*, `"callgraph"`, `"mail"`, `"issue"`]
* possible values: [*`"cochange"`*, `"callgraph"`, `"mail"`, `"issue"`, `"commit.interaction"`]
- `artifact.directed`
* The directedness of edges in an artifact network
* **Note**: This parameter only affects the `issue` relation, as the `cochange` relation is always undirected, while the `callgraph` relation is always directed. For the `mail` relation, we currently do not have data available to exhibit edge information.
* [`TRUE`, *`FALSE`*]
- `commit.relation`
* The relation(s) among commits, encoded as edges in a commit network
* possible values: [*`"cochange"`*, `"commit.interaction"`]
- `commit.directed`
* The directedness of edges in a commit network
* [`TRUE`, *`FALSE`*]
- `edge.attributes`
* The list of edge-attribute names and information
* a subset of the following as a single vector:
11 changes: 10 additions & 1 deletion showcase.R
@@ -24,6 +24,7 @@
## Copyright 2021 by Niklas Schneider <[email protected]>
## Copyright 2022 by Jonathan Baumann <[email protected]>
## Copyright 2024 by Maximilian Löffler <[email protected]>
## Copyright 2024 by Leo Sendelbach <[email protected]>
## All Rights Reserved.


@@ -65,6 +66,7 @@ ARTIFACT = "feature" # function, feature, file, featureexpression (only relevant

AUTHOR.RELATION = "mail" # mail, cochange, issue
ARTIFACT.RELATION = "cochange" # cochange, callgraph, mail, issue
COMMIT.RELATION = "commit.interaction" # commit.interaction, cochange


## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
@@ -73,13 +75,16 @@ ARTIFACT.RELATION = "cochange" # cochange, callgraph, mail, issue
## initialize project configuration
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("commits.filter.base.artifact", TRUE)
proj.conf$update.value("commit.interactions", TRUE)
## specify that custom event timestamps should be read from 'custom-events.list'
proj.conf$update.value("custom.event.timestamps.file", "custom-events.list")
proj.conf$print()

## initialize network configuration
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = AUTHOR.RELATION, artifact.relation = ARTIFACT.RELATION))
net.conf$update.values(updated.values = list(author.relation = AUTHOR.RELATION,
artifact.relation = ARTIFACT.RELATION,
commit.relation = COMMIT.RELATION))
net.conf$print()

## get ranges
@@ -141,6 +146,7 @@ x$get.author.network()
x$update.network.conf(updated.values = list(author.directed = FALSE))
x$get.author.network()
x$get.artifact.network()
x$get.commit.network()
x$reset.environment()
x$get.networks()
x$update.network.conf(updated.values = list(author.only.committers = FALSE, author.directed = FALSE))
@@ -201,6 +207,7 @@ y$update.network.conf(updated.values = list(edge.attributes = c("date")))
y$get.author.network()
y$update.network.conf(updated.values = list(edge.attributes = c("hash")))
y$get.artifact.network()
y$get.commit.network()
y$get.networks()
y$update.network.conf(updated.values = list(author.only.committers = FALSE, author.directed = TRUE))
h = y$get.bipartite.network()
@@ -232,6 +239,8 @@ sample.pull.requests = add.vertex.attribute.author.issue.count(my.networks, x.da
## add vertex attributes for the project-level network
x.net.as.list = list("1970-01-01 00:00:00-2030-01-01 00:00:00" = x$get.author.network())
sample.entire = add.vertex.attribute.author.commit.count(x.net.as.list, x.data, aggregation.level = "complete")
## add vertex attributes to commit network. Default value 'NO_AUTHOR' is used if vertex is not in commit data
add.vertex.attribute.commit.network(x$get.commit.network(), x.data, attr.name = "author.name", default.value = "NO_AUTHOR")
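
The default-value behavior mentioned in the comment above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code of `add.vertex.attribute.commit.network`: the helper `lookup.attribute` and the toy data are hypothetical, and the sketch treats an `NA` attribute value the same as a missing commit.

```r
## Hypothetical toy commit data
commit.data = data.frame(
    hash = c("c1", "c2"),
    author.name = c("Olaf", "Thomas"),
    stringsAsFactors = FALSE
)

## Hypothetical vertex names of a commit network; "c3" does not occur in the commit data
vertex.hashes = c("c1", "c3", "c2")

## Look up an attribute for every vertex and fall back to a default value
## if the vertex' hash is not found in the data
lookup.attribute = function(vertex.names, data, attr.name, default.value) {
    values = data[[attr.name]][match(vertex.names, data[["hash"]])]
    values[is.na(values)] = default.value
    return(values)
}

lookup.attribute(vertex.hashes, commit.data, "author.name", "NO_AUTHOR")
## returns: "Olaf" "NO_AUTHOR" "Thomas"
```

A default is needed because a commit network built from commit-interaction data can contain hashes that do not appear in the commit data.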


## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
10 changes: 6 additions & 4 deletions tests/test-data.R
@@ -564,15 +564,15 @@ test_that("Compare two ProjectData Objects with commit.interactions", {
proj.data.two$set.commits(create.empty.commits.list())

## create empty data frame of correct size
commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8))
commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 9))
## assure that the correct type is used
for(i in seq_len(8)) {
for(i in seq_len(9)) {
commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]])
}
## set everything except for authors as expected
colnames(commit.interactions.data.expected) = c("commit.hash", "base.hash", "func", "file",
"base.func", "base.file", "base.author",
"interacting.author")
"base.func", "base.file", "artifact.type",
"base.author", "interacting.author")
commit.interactions.data.expected[["commit.hash"]] =
c("0a1a5c523d835459c42f33e863623138555e2526",
"418d1dc4929ad1df251d2aeb833dd45757b04a6f",
@@ -588,6 +588,8 @@ test_that("Compare two ProjectData Objects with commit.interactions", {
commit.interactions.data.expected[["base.func"]] = c("test2.c::test2", "test2.c::test2",
"test3.c::test_function", "test2.c::test2")
commit.interactions.data.expected[["base.file"]] = c("test2.c", "test2.c", "test3.c", "test2.c")
commit.interactions.data.expected[["artifact.type"]] = c("CommitInteraction", "CommitInteraction",
"CommitInteraction", "CommitInteraction")

expect_equal(proj.data.two$get.commit.interactions(), commit.interactions.data.expected)

2 changes: 2 additions & 0 deletions tests/test-networks-artifact.R
@@ -252,6 +252,7 @@ patrick::with_parameters_test_that("Network construction with commit-interaction
"test3.c::test_function", "test2.c::test2"),
base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
artifact.type = c("File", "File", "File", "File"),
weight = c(1, 1, 1, 1),
type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
@@ -301,6 +302,7 @@ patrick::with_parameters_test_that("Network construction with commit-interaction
base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
artifact.type = c("Function", "Function", "Function", "Function"),
weight = c(1, 1, 1, 1),
type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
1 change: 1 addition & 0 deletions tests/test-networks-author.R
@@ -720,6 +720,7 @@ patrick::with_parameters_test_that("Network construction with commit-interaction
base.func = c("test2.c::test2", "test2.c::test2",
"test3.c::test_function", "test2.c::test2"),
base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
artifact.type = c("CommitInteraction", "CommitInteraction", "CommitInteraction", "CommitInteraction"),
weight = c(1, 1, 1, 1),
type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")