Data cutting and network metrics #78

Merged
merged 47 commits into from
Dec 13, 2017

@hechtlC (Contributor) commented Dec 11, 2017

In this PR, we aim to introduce data cutting (see issue #38) and a bunch
of network metrics (see issue #73). See the respective issues for more
details.

Raphael-N and others added 30 commits August 23, 2017 20:00
Signed-Off-By: Raphael Nömmer <[email protected]>
Add functionality to cut data sources to the same date ranges
Add parameter in NetworkConf for that purpose
Add cutting functionalities in the NetworkBuilder

fixes #38

Signed-off-by: Christian Hechtl <[email protected]>
Signed-Off-By: Raphael Nömmer <[email protected]>
The cutting is replaced by the already existing splitting mechanism

Signed-off-by: Christian Hechtl <[email protected]>
The timestamps are now extracted when the issue getter is called
A warning message is printed when the data sources don't overlap
Add project data getter in the NetworkBuilder for testing reasons

Signed-off-by: Christian Hechtl <[email protected]>
Add checking of mail data in the cutting tests.

Signed-off-by: Christian Hechtl <[email protected]>
Signed-off-by: Christian Hechtl <[email protected]>
Signed-off-by: Raphael Nömmer <[email protected]>
Fix "author.email". Introduce "ref.name" column in tests (WIP and
temporary fix).

Add TODO item for fixing the test file "issues.list".

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Christian Hechtl <[email protected]>
Signed-off-by: Raphael Nömmer <[email protected]>
Signed-off-by: Raphael Nömmer <[email protected]>
Signed-off-by: Raphael Nömmer <[email protected]>
Signed-off-by: Raphael Nömmer <[email protected]>
Signed-off-by: Raphael Nömmer <[email protected]>
@clhunsen clhunsen added this to the v3.0 milestone Dec 11, 2017
To avoid a merge conflict in PR #78, we fix the indentation of a
statement in the file 'util-read.R'.

Signed-off-by: Claus Hunsen <[email protected]>
Merging the changes from se-passau/dev to avoid merge conflicts in #78.

Signed-off-by: Claus Hunsen <[email protected]>
We now always have e-mail-address data available for authors,
independent of the real data source containing those. If there is no
data available, we add NAs.

This is a follow-up for issue #69 and PR #71.

Signed-off-by: Claus Hunsen <[email protected]>
@clhunsen (Collaborator) commented Dec 11, 2017

I had no choice but to merge the changes from se-passau/dev to resolve the merge conflicts effectively. Next time, we should aim for a rebase.

I still need to review the changes!

I also added a commit (7bfbe84) to enhance issue #69 and PR #71.

For easier internal use of data-source names (i.e., "commits", "issues",
and "mails"), the data item containing the commit data is now called
"commits" -- and not "commits.raw" anymore. All corresponding methods
and method calls are renamed accordingly.

This change will make it easier to handle data sources by their specific
name, e.g., when performing a parameterizable subset
'proj.data[[data.source.name]]'.

Note: The methods 'ProjectData$get.commits.raw()' and
'ProjectData$set.commits.raw()' are still there for compatibility
reasons. They are now mere delegates to the new methods.

Signed-off-by: Claus Hunsen <[email protected]>
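
The renaming described in this commit message can be illustrated with a minimal sketch. It is written in Python purely as a stand-in for the library's R classes; the class layout and the method names `get_commits`, `set_commits`, `get_commits_raw`, `set_commits_raw`, and `get` are hypothetical translations, not the project's actual code. It shows the old accessors kept as thin delegates and the data items addressed by their data-source name.

```python
class ProjectData:
    """Hypothetical stand-in for the R class discussed in the commit message."""

    def __init__(self):
        # Data items are keyed by their data-source name.
        self._data = {"commits": [], "issues": [], "mails": []}

    # New accessors, named after the data source.
    def get_commits(self):
        return self._data["commits"]

    def set_commits(self, commits):
        self._data["commits"] = list(commits)

    # Old accessors, kept only for compatibility: mere delegates.
    def get_commits_raw(self):
        return self.get_commits()

    def set_commits_raw(self, commits):
        self.set_commits(commits)

    # Parameterizable access by data-source name.
    def get(self, data_source_name):
        return self._data[data_source_name]


project = ProjectData()
project.set_commits_raw(["commit-1", "commit-2"])  # old name still works
for name in ("commits", "issues", "mails"):
    print(name, len(project.get(name)))
```

Because the old methods only forward to the new ones, callers of either name see the same data.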
@ecklbarb (Contributor) commented

The attribute 'name' does not exist in the function metrics.modularity() (line 104).

After the landing of commit a803425e6bdb54c1654fb9de1f9375499e3aa829,
the general code for the data cutting mechanism is streamlined:
1) The 'ProjectData$data.timestamps' attribute is now transposed: each
row is a data source and the columns hold its respective timestamps.
This layout is more intuitive and easier to access later.
2) All related methods are adapted accordingly.
3) The amount of inline documentation is increased significantly.

Signed-off-by: Claus Hunsen <[email protected]>
Reviewed-by: Thomas Bock <[email protected]>
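
The data cutting idea from this PR can be sketched as follows. This is a Python illustration, not the library's R implementation: it assumes that each data source maps to a (first, last) timestamp pair, mirroring the transposed layout described above, and that the common range runs from the latest start to the earliest end. The function names `common_date_range` and `cut_to_range` and the sample dates are hypothetical.

```python
from datetime import datetime
import warnings


def common_date_range(timestamps):
    """Return the (start, end) range covered by every data source.

    `timestamps` maps a data-source name to a (first, last) pair.
    Returns None and emits a warning if the sources do not overlap.
    """
    start = max(first for first, _ in timestamps.values())
    end = min(last for _, last in timestamps.values())
    if start > end:
        warnings.warn("The data sources do not overlap in time.")
        return None
    return start, end


def cut_to_range(records, date_range):
    """Keep only the records whose 'date' lies inside the given range."""
    start, end = date_range
    return [record for record in records if start <= record["date"] <= end]


# One row per data source, timestamps in the columns (assumed sample data).
timestamps = {
    "commits": (datetime(2016, 1, 1), datetime(2016, 12, 31)),
    "mails": (datetime(2016, 3, 1), datetime(2017, 2, 28)),
    "issues": (datetime(2016, 2, 1), datetime(2016, 11, 30)),
}
common = common_date_range(timestamps)

commits = [
    {"id": "c1", "date": datetime(2016, 1, 15)},
    {"id": "c2", "date": datetime(2016, 6, 1)},
]
cut_commits = cut_to_range(commits, common)
```

In this sample, only the second commit falls inside the range shared by all three sources, so it is the only one kept.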
This patch only applies some more readable code formatting to the
networks-metrics module.

Signed-off-by: Claus Hunsen <[email protected]>
In this patch, we fix two minor bugs in the network metrics
'node.degree' and 'modularity'. In the first, the unsorted result was
not assigned correctly to the return value. In the latter, the single
modularity value does not need a name (which was also an undefined
variable).

Props to @ecklbarb for reporting these two mistakes.

Signed-off-by: Claus Hunsen <[email protected]>
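
The kind of mistake fixed in 'node.degree' can be illustrated with a small sketch. It is a Python stand-in that computes degrees from a plain edge list; the function name `node_degree` and its `sort` parameter are assumptions for illustration and do not reproduce the library's R signature. The point is that the return value is assigned on both the sorted and the unsorted path.

```python
from collections import Counter


def node_degree(edges, sort=True):
    """Count the degree of each node in an undirected edge list.

    Returns a dict from node to degree. With sort=True the entries are
    ordered by decreasing degree; otherwise they keep first-seen order.
    """
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1

    # Assign the result for the unsorted case first, so it is never lost.
    result = dict(degrees)
    if sort:
        result = dict(
            sorted(result.items(), key=lambda item: item[1], reverse=True)
        )
    return result


edges = [("d", "a"), ("a", "b"), ("a", "c")]
print(node_degree(edges))
print(node_degree(edges, sort=False))
```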
Fix mistakes and add 'unify.date.ranges' documentation.

Signed-off-by: Claus Hunsen <[email protected]>
Now, in the README file and the configuration module, we have
'ProjectConf' documentation first and 'NetworkConf'
documentation second.

Signed-off-by: Claus Hunsen <[email protected]>
This is just a code movement of the method 'get.pasta.items' for better
structure in the 'ProjectData' class.

Signed-off-by: Claus Hunsen <[email protected]>
In the metrics module, single-value returns are now named vectors. The
name is the metrics name as used in the respective function definition.

Additionally, the column names for the scale-freeness metric are changed
so that they no longer start with 'res.'.

Signed-off-by: Claus Hunsen <[email protected]>
@clhunsen (Collaborator) previously approved these changes Dec 13, 2017

Basically, the PR looked good right away, but I improved it nevertheless by pushing a few convenience commits myself. From my side, this is ready to merge.

README.md Outdated
@@ -124,19 +124,22 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
- issue information: *`"issue.id"`*, *`"event.name"`*, `"issue.state"`, `"creation.date"`, `"closing.date"`, `"is.pull.request"`
* **Note**: `"date"` is always included as this information is needed for several parts of the library, e.g., time-based splitting.
* **Note**: For each type of network that can be built, only the applicable part of the given vector of names is respected.
-* **Note**: For the edge attributes `"pasta"` and `"synchronicty"`, the network configuration's parameters `pasta` and `synchronicity` need to be set to `TRUE`, respectively (see below).
+* **Note**: For the edge attributes `"pasta"` and `"synchronicty"`, the project configuration's parameters `pasta` and `synchronicity` need to be set to `TRUE`, respectively (see below).
@clhunsen Could you please fix the typo in "synchronicty" here?

@bockthom (Collaborator) left a comment

The PR looks good so far, apart from the README.

There are several issues to fix in the README:

  • The files util-read.R and util-networks-metrics.R are missing in the section "File overview".
  • The "How to" section is out of date; I guess the NetworkBuilder should appear there.
  • In the section "Configuration Classes" the subsectioning is wrong: "Project Conf" should be on the same level as "Network Conf" and, therefore, start with ###.

- Add missing files and their respective descriptions.
- Update how-to section's code snippet.
- Fix some typos.
- Fix the indentation of the 'ProjectConf' sections.
- Add proper links to intra-document sections.
- Add syntax highlighting for multi-line code snippets.

Props to @bockthom for mentioning some of the points in PR #78.

Signed-off-by: Claus Hunsen <[email protected]>
Fix typos in the roxygen documentation of some of the reading functions.
Remove wrong documentation for the issue-reading function.

Props to @bockthom for pointing this out.

Signed-off-by: Claus Hunsen <[email protected]>
For better comprehensibility, the showcase file is renamed to
'showcase.R', as the name 'test.R' was misleading regarding the tests.

Signed-off-by: Claus Hunsen <[email protected]>
@bockthom bockthom merged commit 86f5c3a into se-sic:dev Dec 13, 2017
@bockthom bockthom mentioned this pull request Dec 13, 2017
5 participants