Data cutting and network metrics #78
Signed-off-by: Raphael Nömmer <[email protected]>
Add functionality to cut data sources to the same date ranges. Add a parameter in NetworkConf for that purpose. Add cutting functionalities in the NetworkBuilder. Fixes #38. Signed-off-by: Christian Hechtl <[email protected]>
Signed-off-by: Christian Hechtl <[email protected]>
The cutting is replaced by the already existing splitting mechanism. Signed-off-by: Christian Hechtl <[email protected]>
The timestamps are now extracted when the issue getter is called. A warning message is printed when the data sources don't overlap. Add a project-data getter in the NetworkBuilder for testing purposes. Signed-off-by: Christian Hechtl <[email protected]>
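Cutting all data sources to the same date range amounts to intersecting their individual ranges: the common range starts at the latest first timestamp and ends at the earliest last timestamp, and if that start lies after that end, the sources do not overlap. A minimal base-R sketch, assuming each data source is reduced to a vector of POSIXct timestamps; the function `get.common.date.range` is hypothetical and not part of the library's API:

```r
## Hypothetical helper (not the library's API): intersect the date ranges
## of several data sources, given as a named list of POSIXct vectors.
get.common.date.range = function(timestamps) {
    ## first and last timestamp of each data source
    starts = do.call(c, lapply(timestamps, min))
    ends = do.call(c, lapply(timestamps, max))

    ## the common range begins at the latest start and ends at the earliest end
    start = max(starts)
    end = min(ends)

    if (start > end) {
        warning("The data sources do not overlap; no common date range exists.")
        return(NULL)
    }
    return(list(start = start, end = end))
}

## usage with two made-up data sources
source.timestamps = list(
    commits = as.POSIXct(c("2016-07-12 15:00:00", "2016-07-20 10:00:00"), tz = "UTC"),
    mails = as.POSIXct(c("2016-07-01 08:00:00", "2016-07-15 12:00:00"), tz = "UTC")
)
common.range = get.common.date.range(source.timestamps)
```

Each data source would then be subset to `[common.range$start, common.range$end]`; per the commits above, the library reuses its existing splitting mechanism for that step.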
Add checking of mail data in the cutting tests. Signed-off-by: Christian Hechtl <[email protected]>
Fix "author.email". Introduce "ref.name" column in tests (WIP and temporary fix). Add TODO item for fixing the test file "issues.list". Signed-off-by: Claus Hunsen <[email protected]> Signed-off-by: Christian Hechtl <[email protected]>
To avoid a merge conflict in PR #78, we fix the indentation of a statement in the file 'util-read.R'. Signed-off-by: Claus Hunsen <[email protected]>
Merging the changes from se-passau/dev to avoid merge conflicts in #78. Signed-off-by: Claus Hunsen <[email protected]>
We now always have e-mail-address data available for authors, independent of the real data source containing those. If there is no data available, we add NAs. This is a follow-up for issue #69 and PR #71. Signed-off-by: Claus Hunsen <[email protected]>
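The guarantee described here (an e-mail column always exists and is filled with NAs when no source provides it) can be sketched in base R as follows, assuming author data are held in a plain data frame; the helper `ensure.author.email` is hypothetical and only illustrates the padding idea:

```r
## Hypothetical helper: guarantee an "author.email" column in author data.
ensure.author.email = function(authors) {
    if (!("author.email" %in% colnames(authors))) {
        ## no e-mail data available for this source: fill the column with NAs
        authors[["author.email"]] = rep(NA_character_, nrow(authors))
    }
    return(authors)
}

## usage: author data from a source without e-mail addresses
authors = data.frame(author.name = c("Alice", "Bob"), stringsAsFactors = FALSE)
authors = ensure.author.email(authors)
```

Downstream code can then rely on the column being present and only has to handle NA values, instead of checking which data source the authors came from.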
For easier internal use of data-source names (i.e., "commits", "issues", and "mails"), the data item containing the commit data is now called "commits" -- and not "commits.raw" anymore. All corresponding methods and method calls are renamed accordingly. This change will make it easier to handle data sources by their specific name, e.g., when performing a parameterizable subset 'proj.data[[data.source.name]]'. Note: The methods 'ProjectData$get.commits.raw()' and 'ProjectData$set.commits.raw()' are still there for compatibility reasons. They are now mere delegates to the new methods. Signed-off-by: Claus Hunsen <[email protected]>
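Naming the data items exactly like the data sources lets code address every source uniformly by name. A minimal sketch with a plain named list standing in for the project data; `get.commits()` and `get.commits.raw()` here are hypothetical free functions that only mirror the method names mentioned above, not the actual 'ProjectData' methods:

```r
## Hypothetical stand-in for the project data: one entry per data source,
## named exactly like the data sources ("commits", "issues", "mails").
proj.data = list(
    commits = data.frame(hash = c("a1", "b2", "c3"), stringsAsFactors = FALSE),
    issues = data.frame(issue.id = c(1, 2)),
    mails = data.frame(message.id = "m1", stringsAsFactors = FALSE)
)

## accessor for the renamed data item
get.commits = function(data) data[["commits"]]
## compatibility delegate: the old name simply forwards to the new accessor
get.commits.raw = function(data) get.commits(data)

## parameterizable subset: handle every data source by its name
counts = sapply(c("commits", "issues", "mails"), function(data.source.name) {
    nrow(proj.data[[data.source.name]])
})
```

With the old name "commits.raw", the loop above would need a special case for the commit data; with matching names, a single `[[data.source.name]]` lookup covers all sources.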
The attribute
After the landing of commit a803425e6bdb54c1654fb9de1f9375499e3aa829, the general code for the data cutting mechanism is streamlined: 1) The 'ProjectData$data.timestamps' attribute is now transposed -- to map data sources per line to their respective timestamps in the columns. This way is more intuitive and better for later access. 2) All related methods are adapted accordingly. 3) The amount of inline documentation is increased significantly. Signed-off-by: Claus Hunsen <[email protected]> Reviewed-by: Thomas Bock <[email protected]>
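To illustrate why the transposed layout is convenient, here is a small base-R sketch. The table contents are made up, and the column names `start` and `end` are assumptions; the data frame only mimics the described layout and is not the actual 'ProjectData$data.timestamps' object:

```r
## Hypothetical timestamp table in the transposed layout:
## one row per data source, that source's first and last timestamp in the columns.
data.timestamps = data.frame(
    start = as.POSIXct(c("2016-07-12 15:00:00", "2016-07-01 08:00:00"), tz = "UTC"),
    end = as.POSIXct(c("2016-07-20 10:00:00", "2016-07-15 12:00:00"), tz = "UTC"),
    row.names = c("commits", "mails")
)

## one data source's range is a plain row lookup by its name
commits.range = data.timestamps["commits", ]

## all start timestamps are available at once as a single column
all.starts = data.timestamps[["start"]]
```

Adding a further data source then means adding a row, while the columns and every access pattern stay unchanged.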
This patch only applies some more readable code formatting to the networks-metrics module. Signed-off-by: Claus Hunsen <[email protected]>
In this patch, we fix two minor bugs in the network metrics 'node.degree' and 'modularity'. In the former, the unsorted result was not assigned correctly to the return value. In the latter, the single modularity value does not need a name (which was also an undefined variable). Props to @ecklbarb for reporting these two mistakes. Signed-off-by: Claus Hunsen <[email protected]>
Fix mistakes and add 'unify.date.ranges' documentation. Signed-off-by: Claus Hunsen <[email protected]>
Now, in the README file and the configuration module, we have 'ProjectConf' documentation first and 'NetworkConf' documentation second. Signed-off-by: Claus Hunsen <[email protected]>
This is just a code movement of the method 'get.pasta.items' for better structure in the 'ProjectData' class. Signed-off-by: Claus Hunsen <[email protected]>
In the metrics module, single-value returns are now named vectors. The name is the metric's name as used in the respective function definition. Additionally, the column names for the scale-freeness metric are changed to not include 'res.' at the beginning of each column name. Signed-off-by: Claus Hunsen <[email protected]>
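A minimal sketch of the named single-value convention, using a hypothetical metric function `metrics.density` that computes the density of a simple undirected graph from its vertex and edge counts; it is not one of the library's metric functions, which presumably operate on network objects instead:

```r
## Hypothetical single-value metric: density of a simple undirected graph
## with 'vertex.count' vertices and 'edge.count' edges, i.e., 2m / (n (n - 1)).
metrics.density = function(vertex.count, edge.count) {
    value = (2 * edge.count) / (vertex.count * (vertex.count - 1))
    ## name the single value after the metric, so the result is self-describing
    return(c(density = value))
}

## usage: 4 vertices, 3 edges
result = metrics.density(vertex.count = 4, edge.count = 3)
```

Because each result carries its metric's name, several single-value results can be concatenated with `c()` without losing track of which value belongs to which metric.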
Basically, the PR looked good right away, but I improved it nevertheless by pushing some convenient commits myself. For me, this is ready to merge.
README.md
@@ -124,19 +124,22 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
- issue information: *`"issue.id"`*, *`"event.name"`*, `"issue.state"`, `"creation.date"`, `"closing.date"`, `"is.pull.request"`
* **Note**: `"date"` is always included as this information is needed for several parts of the library, e.g., time-based splitting.
* **Note**: For each type of network that can be built, only the applicable part of the given vector of names is respected.
* **Note**: For the edge attributes `"pasta"` and `"synchronicty"`, the network configuration's parameters `pasta` and `synchronicity` need to be set to `TRUE`, respectively (see below).
* **Note**: For the edge attributes `"pasta"` and `"synchronicty"`, the project configuration's parameters `pasta` and `synchronicity` need to be set to `TRUE`, respectively (see below).
@clhunsen Could you please fix the typo in "synchronicty" here?
The PR looks good so far, except for the README.
There are several issues to fix in the README:
- The files `util-read.R` and `util-networks-metrics.R` are missing in the section "File overview".
- The "How to" section is not up to date anymore, as the `NetworkBuilder` should appear there, I guess.
- In the section "Configuration Classes", the subsectioning is wrong: "Project Conf" should be on the same level as "Network Conf" and, therefore, start with `###`.
- Add missing files and their respective descriptions.
- Update how-to section's code snippet.
- Fix some typos.
- Fix the indentation of the 'ProjectConf' sections.
- Add proper links to intra-document sections.
- Add syntax highlighting for multi-line code snippets.

Props to @bockthom for mentioning some of the points in PR #78. Signed-off-by: Claus Hunsen <[email protected]>
Fix typos in the roxygen documentation of some of the reading functions. Remove wrong documentation for the issue-reading function. Props to @bockthom for pointing this out. Signed-off-by: Claus Hunsen <[email protected]>
For better comprehensibility, the showcase file is renamed to 'showcase.R', as the name 'test.R' was misleading regarding the tests. Signed-off-by: Claus Hunsen <[email protected]>
In this PR, we aim to introduce data cutting (see issue #38) and a bunch
of network metrics (see issue #73). See the respective issues for more
details.