Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tensor representation and EDCPTD centrality calculation #173

Merged
merged 5 commits into from
Nov 8, 2020

Conversation

fehnkera
Copy link
Contributor

@fehnkera fehnkera commented Oct 21, 2019

Functions that enable to represent OSS projects as forth-order tensors and to calculate EDCPTD centrality on them are added to a temporary file 'util-tensor.R'. The functions are not final. Feedback and discussions are welcome.

Signed-off-by: Anselm Fehnker [email protected]

Prerequisites

  • I adhere to the coding conventions (described here) in my code.
  • I have updated the copyright headers of the files I have modified.
  • I have written appropriate commit messages, i.e., I have recorded the goal, the need, the needed changes, and the location of my code modifications for each commit. This includes also, e.g., referencing to relevant issues.
  • I have put signed-off tags in all commits.
  • I have updated the changelog file NEWS.md appropriately.
  • The pull request is opened against the branch dev.

Description

Changelog

Added

  • Add a new file util-tensor.R containing the class FourthOrderTensor to create (author x relation x author x relation) tensors from a list of networks (with each network having a different relation) and its corresponding utility function get.author.networks.for.multiple.relations (PR Add tensor representation and EDCPTD centrality calculation #173, c136b1f, e4ee0dc, 051a5f0)
  • Add function calculate.EDCPTD.centrality for calculating the EDCPTD centrality for a fourth-order tensor in the above described form (c136b1f, e4ee0dc, 051a5f0)
  • Add new file util-networks-misc.R which contains miscellaneous functions for processing network data and creating and converting various kinds of adjacency matrices: get.author.names.from.networks, get.author.names.from.data, get.expanded.adjacency, get.expanded.adjacency.matrices, get.expanded.adjacency.matrices.cumulated, convert.adjacency.matrix.list.to.array (051a5f0)

Changed/Improved

  • Adjust the function get.authors.by.data.source: Rename its single parameter to data.sources and change the function so that it can extract the authors for multiple data sources at once. The default value of the parameter is a vector containing all the available data sources (commits, mails, issues) (051a5f0)

util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Show resolved Hide resolved
Copy link
Collaborator

@clhunsen clhunsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fehnkera!

Thank you very much for your effort to bring this functionality to coronet! 😀👍 I had a quick look at your changes/additions and identified some issues that need to be solved. I tried my best to give you some guidance in my comments to solve them appropriately. 🙂

There are mainly the following things to do:

  • Adaptation of the code to the coding conventions,
  • addition of license and copyright headers,
  • potential movement of functions to other files (open question), and
  • addition of information to the file README.md.

As always, please do not hesitate to ask if you have any questions. If you do not have any time left, please tell us (we will likely adopt this PR then).

util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-init.R Show resolved Hide resolved
install.R Show resolved Hide resolved
@clhunsen clhunsen changed the base branch from master to dev November 5, 2019 13:58
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this WIP pull request @fehnkera.

In addition to the comments of @clhunsen, I just want to add that everything which is taken from the dev-network-growth repository needs to be adjusted to the coding conventions in coronet (as they are different or, more clearly, not enforced in the dev-network-growth repo). In addition, you should keep track of the original authors and comments. If you have questions regarding that or need help with that, please contact me via e-mail -- when transferring those functions to coronet, we also need to add some more documentation which may be somewhere in old commit messages of dev-network-growth, to make sure to add everything which is necessary to understand those functions. However, I would suggest to deal with that (the functions to take from the other repository) after the rest of this PR is fixed.

Please also see my individual comments.

util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated
}

# function that builds a forth-order tensor containing the same information as the single-layer developer networks
build.tensor.from.adjacency = function(networks, active.developers){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clhunsen I did not have time yet to have a closer look at the code, but keep in mind that we have lots of cases where we do not derive the active developers from the networks! In the networks we can have developers which are no longer active but are just contained in the network for a recent edge (depending on the kind of data/network splitting). In such cases, it often happens that the active users need to passed separately, independent from the network. As I did not have a closer look at the code here, I don't know whether specifying active developers is necessary in this context, but in general it can make sense.

util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Show resolved Hide resolved
showcase.R Show resolved Hide resolved
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fehnkera for continuing your work on this pull request.

The new structure looks way better than before. However, there are still some issues that need to be fixed, please have a look at my individual comments.

As already stated earlier, you need to adjust the code to the coding conventions. And also the functions taken from another repository need to be attributed correctly and we should also move them to another file, together with their companion functions.

After fixing those issues, I will have a detailed look at the pull request again.

(And in the end, you need to rebase on the current dev branch, but it would be fine to do this after the next review cycle. And also some commit messages need to be adjusted to our conventions -- therefore, you also need further rebases...)

util-tensor.R Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fehnkera Thanks for your changes -- I had a quick look at them and still found lots of lines which violate our contribution guide. Please see my individual review comments below to fix them. I highly suggest to rebase your last commit to fix all the following issues. (If you don't remember interactive rebasing or ammending your last commit, please just let me know, then I can add more details on how to do that.)

In addition to the coding convention issues, there are still some minor parts missing from what we have mentioned in earlier review comments. Please also rebase them into the last commit, thanks!

showcase.R Outdated Show resolved Hide resolved
showcase.R Outdated Show resolved Hide resolved
showcase.R Outdated Show resolved Hide resolved
util-init.R Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
util-tensor.R Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @fehnkera. This pull request now goes into the right direction 😄 I think we are already close to finishing this PR, but there are still some issues. I hope that this is the last review of this PR, as most things really look good -- but let's see how you address my comments.

In general, there are four remaining issues:

  1. Minor issues (documentation, function calls, etc.): I spent some time on looking at all the outdated review comments and checked for each of them, which one you already had addressed and which one still is not addressed (yes, there were a lot of comments by @clhunsen and me in this PR, so I spent more time on that than expected...). As it was even time-consuming for me to find the unaddressed comments, I marked all the outdated but not yet addressed comments as "resolved" and added them as new comments to this review -- that makes it easier for you (and also for me) to see what still needs to be changed 😉 So, please see the individual comments below.

  2. There is one problem in the implementation I have spotted right now: Your implementation of get.expanded.adjacency operates on a single network only, whereas Angelika's version operated on a list of networks. Hence, the function get.expanded.adjacency.cumulated does not work at the moment. However, this is easy to fix via introducing an additional function -- I have provided a description for a fix already in my comments below.

  3. Commit messages: After fixing all the things mentioned in my comments, please adjust the commit messages (via interactive rebasing):
    The commit "Changes process of calculating EDCPTD centrality" needs to be renamed to "Change process of calculating EDCPTD centrality" (Change instead of Changes).
    And in the latest commit, your Signed-off tag does not contain your name - please replace "fehnkera" by your name there. Thanks!

  4. Last but not least: After you have fixed all the problems within this pull request, we need to make this pull request ready to be merged. Therefore, it needs to have the correct base (that it uses not the correct base can you see at Claus's merge commits showing up in your PR). Your branch "fehnkera:tensor" still has the wrong base: It is based on master, not on dev. Therefore, please checkout the current dev branch and rebase your changes of this PR onto it, resulting in only having your commits in this PR. After that, please don't forget to update the changelog file (it is important to do that after your last rebase as commit hashes change during rebasing.) And then, your PR should be ready to merge 😉 But before that, I will check if issues 1, 2, and 3 are addressed correctly - step 4 is really the final step after everything else has been fixed.

If you have any questions regarding my comments or regarding rebasing onto dev branch, just don't hesitate to ask. The end of this PR already is in sight, only few steps missing 😄

util-networks-misc.R Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-tensor.R Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
util-networks-misc.R Show resolved Hide resolved
util-tensor.R Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
@fehnkera
Copy link
Contributor Author

Okay, I think I have completed the four remaining issues now @bockthom. Is there anything else that needs to be done in this PR?

util-networks-misc.R Outdated Show resolved Hide resolved
util-networks-misc.R Outdated Show resolved Hide resolved
@bockthom
Copy link
Collaborator

Okay, I think I have completed the four remaining issues now @bockthom. Is there anything else that needs to be done in this PR?

There are some minor issues in the one function getting the names, but when they are fixed, I think we are done with this PR, if @clhunsen agrees. However, after rebasing your last commit for the last time, you need to add your adjustments to the changelog file NEWS.md which should be the very last step before merging.

If you are still willing to work on this PR you could write tests for it. Otherwise (if you already run out of working hours), we are done with this PR as soon as the last bugs are fixed and the changelog file is adjusted in a reasonable way 😉

Apply the Review from @bockthom and @clhunsen on the previous changes.
This includes compliance of coding conventions, update of copyright
headers and improvement of documentation. Move the functions for
'get.author.names.from.networks' and 'get.expanded.adjacency' to new
file 'util-networks-misc.R'. Also add two functions
'get.author.names.from.data' and
'convert.adjacency.matrix.list.to.array' from the 'dev-network-growth'
project to the new file.

Signed-off-by: Anselm Fehnker <[email protected]>
bockthom
bockthom previously approved these changes Oct 16, 2020
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, we need to adjust the changelog. I have provided a draft for the changelog, please check if this is correct:

  • Add a new line ## Unversioned above version 3.6.
  • Add a new section ### Added below ## Unversioned
  • Add the following bullet points:
- Add a new file `util-tensor.R` containing the class `FourthOrderTensor` to create (author x relation x author x relation) tensors from a list of networks (with each network having a different relation) and its corresponding utility function `get.author.networks.for.multiple.relations` (PR #173, c136b1f6127d73c25f08ae2f317246747aa9ea2b, e4ee0dc926b22ff75d5fd801c1f131bcff4c22eb, 051a5f0287022f97e2367ed0e9591b9df9dbdb3d)
- Add function `calculate.EDCPTD.centrality` for calculating the EDCPTD centrality for a fourth-order tensor in the above described form (c136b1f6127d73c25f08ae2f317246747aa9ea2b, e4ee0dc926b22ff75d5fd801c1f131bcff4c22eb, 051a5f0287022f97e2367ed0e9591b9df9dbdb3d)
- Add new file `util-networks-misc.R` which contains miscellaneous functions for processing network data and creating and converting various kinds of adjacency matrices: `get.author.names.from.networks`, `get.author.names.from.data`, `get.expanded.adjacency`, `get.expanded.adjacency.matrices`, `get.expanded.adjacency.matrices.cumulated`, `convert.adjacency.matrix.list.to.array` (051a5f0287022f97e2367ed0e9591b9df9dbdb3d)

  • Add a new section ### Changed/Improved
  • Add the following bullet points in the Changed/Improved section:
- Adjust the function `get.authors.by.data.source`: Rename its single parameter to `data.sources` and change the function so that it can extract the authors for multiple data sources at once. The default value of the parameter is a vector containing all the available data sources (commits, mails, issues) (051a5f0287022f97e2367ed0e9591b9df9dbdb3d)

Please check if this is correct and whether I have forgotten anything.

@bockthom bockthom changed the title WIP: Add tensor representation and EDCPTD centrality calculation Add tensor representation and EDCPTD centrality calculation Oct 16, 2020
@fehnkera
Copy link
Contributor Author

Thank you very much for this good draft @bockthom ! I checked it and everything seems to be correct. I hope I have implemented it in the wanted way.

Overall I have to say thank you to all of you @bockthom @clhunsen @ecklbarb ! Thank you for your patience and your really good help. I have learned a lot by working with you.

And again I am very sorry that this PR took so much time!

@clhunsen
Copy link
Collaborator

Overall I have to say thank you to all of you @bockthom @clhunsen @ecklbarb ! Thank you for your patience and your really good help. I have learned a lot by working with you.

It was a pleasure working with you, @fehnkera! Good luck and all the best for everything that is ahead of you! Take care, Anselm. 🥲:hugs:

Thank you very much for this good draft @bockthom ! I checked it and everything seems to be correct. I hope I have implemented it in the wanted way.

That is your part, @bockthom! Merge if everything is okay for you. 🙂👍

@bockthom
Copy link
Collaborator

That is your part, @bockthom! Merge if everything is okay for you. slightly_smiling_face+1

Sorry for the late reply, I had vacation for the last weeks. I just had a look at this PR right now. Looks good, except for one minor issue:

@fehnkera The signed-off tag is missing in your last commit 55ad8d8. Could you please add the signed-off tag to your commit message? Thanks in advance. This will be the final change, I will merge this PR right afterwards 😄

Overall I have to say thank you to all of you @bockthom @clhunsen @ecklbarb ! Thank you for your patience and your really good help. I have learned a lot by working with you.
Thank you very much for this good draft @bockthom ! I checked it and everything seems to be correct. I hope I have implemented it in the wanted way.

Also thanks to you @fehnkera for all of your work and, especially, also for your endurance (that's not to be taken for granted). Working on this last PR took longer than expected (for all of us, I'll guess) and also getting familiar with all the details again (as this PR was opened one year ago) was tedious for everyone. However, finally, we have added lots of useful functionality to coronet and this will definitively improve the future use of coronet. So, I am glad that we will finally merge this PR soon. I hope you enjoyed the work on coronet (even though there were lots of reviews and change requests) and also hope that your work effort also comes along with good working experiences for you. I and also the whole @se-passau/codeface-admins team wish you all the best for your future. 🎈

@fehnkera
Copy link
Contributor Author

Sorry for the late reply, I had vacation for the last weeks. I just had a look at this PR right now. Looks good, except for one minor issue:

No problem. I hope you had a nice vacation 😄

@fehnkera The signed-off tag is missing in your last commit 55ad8d8. Could you please add the signed-off tag to your commit message? Thanks in advance. This will be the final change, I will merge this PR right afterwards smile

Oh yes, sorry. I will add this right away!

Also thanks to you @fehnkera for all of your work and, especially, also for your endurance (that's not to be taken for granted). Working on this last PR took longer than expected (for all of us, I'll guess) and also getting familiar with all the details again (as this PR was opened one year ago) was tedious for everyone. However, finally, we have added lots of useful functionality to coronet and this will definitively improve the future use of coronet. So, I am glad that we will finally merge this PR soon. I hope you enjoyed the work on coronet (even though there were lots of reviews and change requests) and also hope that your work effort also comes along with good working experiences for you. I and also the whole @se-passau/codeface-admins team wish you all the best for your future. 🎈

Yes, i have definitely enjoyed the work and gained very good working experiences. Thank you!

The changelog file 'News.md' is adjusted. The section '## Unversioned'
is added above the latest release version. The new features added in
this Pull-Request are described in the subscetions '### Added' and '###
Changed/Improved'.

Signed-off-by: Signed-off-by: Anselm Fehnker <[email protected]>
@fehnkera
Copy link
Contributor Author

Is there any reason why the checks have failed now? All I did was changing the commit message of the last commit 😅

@bockthom
Copy link
Collaborator

Is there any reason why the checks have failed now? All I did was changing the commit message of the last commit sweat_smile

I just checked the travis builds a few minutes ago and found out that this is a general problem with travis using certain R versions, as it also fails for the current master and dev branches. However, the builds are successful for R version 3.6 but not for previous R versions, which is weird. The reason why the checks fail is that the log output of the test runs is too large for travis, so it just terminates the build runs before they are finished.

Thanks for asking, but this is definitively not related to your changes, but rather to our build configuration. So no worries, everything is fine with your PR 😄 I will try to deal with the build problems as soon as I have time to do that. This is not part of this PR.

@bockthom
Copy link
Collaborator

bockthom commented Nov 8, 2020

As the build errors of the Travis CI check are not related to this pull request (as already stated above and later on partly discussed in #161), I will merge this PR right away (despite having failing builds here) -- the Travis build problems are going to be fixed very soon in another PR, I already have implemented some fixes for these problems.
So, after more than one year of working on integrating the new enhancements of @fehnkera, I am glad to finally incorporate them into coronet now 🎉

@bockthom bockthom merged commit 5e3baa8 into se-sic:dev Nov 8, 2020
@bockthom bockthom mentioned this pull request Dec 2, 2020
bockthom added a commit to bockthom/coronet that referenced this pull request Dec 14, 2020
In PR se-sic#173, we have missed a violation of our coding conventions.
So, let's fix this violation now.

Signed-off-by: Thomas Bock <[email protected]>
bockthom added a commit to bockthom/coronet that referenced this pull request Dec 14, 2020
In PR se-sic#173, we have missed a violation of our coding conventions.
So, let's fix this violation now.

Signed-off-by: Thomas Bock <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants