CRAN Task View: NetworkAnalysis #61
Comments
Thanks for the proposal Fabio @FATelarico & Co! I appreciate that you have compiled this list of packages along with the corresponding description. This is a useful overview of a collection of packages on block modeling. However, I think that this is too specialized and not substantial enough for a standalone task view. I'm also cc'ing Bettina @bettinagruen as the principal maintainer of the "Cluster" task view to see what she thinks and whether this could become a section in the "Cluster" task view. However, overall my feeling is that this would better fit a "Network Analysis" task view which we currently do not have. Also I'm cc'ing Søren @hojsgaard in case he has any thoughts or recommendations. |
I agree that this would be best suited for a Network Analysis task view. The Cluster task view already contains a number of sections. One could thus rather easily add a section on block modeling. However, the description of the packages would need to be more detailed, covering each package separately, to be in line with how other packages are described in the task view. Presumably none of the packages would then also qualify as core for this more general task view. |
Following your suggestions, the proposed task view was updated to address the entire field of network analysis. |
Thank you @FATelarico ! I checked your new proposal and to me, the core packages are well identified and relevant. However,
|
Fabio, thanks for all the work and the quick revision! This is a useful start for a network analysis task view. In addition to Nathalie's comments a few additional thoughts:
|
Thanks to the editors for their comments.
|
Thanks! I liked the new version. However, I am not convinced by your section on "Bio-Chemical Networks": In your answer above, you explain that network inference is out-of-the-scope of your TV (and I could agree with that) but you are citing three packages for network inference that are far from being the most known and used (citing A few additional minor remarks:
|
I agree that there is good progress here. However, I feel that the inclusion/exclusion criteria are not as clear yet as they should be. Getting contributions/feedback from someone with more ergm expertise would probably be good. And the separation with graphical models has also some room for improvement. Hence, I'm pinging Søren @hojsgaard again: Could you please have a look at the proposal? And I suggest that we wait until you have feedback/suggestions from two more potential co-maintainers who can increase the diversity among the maintainer team. |
Pavel Krivitsky from Statnet here. Thank you, @FATelarico, for inviting me. I'll read through the discussion and the draft in detail later, but I want to flag a few items as a matter of first impression.
|
Pavel @krivit, thank you for your inputs, this is very much appreciated! I think it would be great if you could change the team of co-maintainers of the task view, in order to bring a new perspective to the team based on your expertise. For a short introduction to the idea of CRAN task views and the corresponding file format, see Documentation.md. If you are interested in more details and some background information, see doi:10.48550/arXiv.2305.17573. Some quick feedback regarding the points you raised:
Thanks & best wishes! |
@FATelarico , if I want to make edits, should I use PRs or push to the repository directly?
Thanks!
This is more about functionality rather than endorsement. The short of it is that
I don't think there is any harm in doing both.
A good phrasing might be challenging to come up with, but I suppose we can play around with it and see what happens. |
Re: In this case, my feeling is that the scope of the package belongs rather clearly to "Epidemiology" and thus I would avoid the duplication. Feel free to iterate, if I'm missing something here (e.g., if If you feel that the "Epidemiology" task view should have a dedicated section on disease networks, I would encourage you to raise this with the Epidemiology task view maintainers. |
@krivit pushing to the main branch is okay; I keep a local copy, indexed by version, that is downloaded after every commit.
I agree. As mentioned in the current draft, |
A new draft is online, I apologise for the delay.
The choice was quite arbitrary because none of us is directly involved in this field, but several colleagues highlighted these packages as the 'most relevant'. Incidentally,
I started by adding them either under
Taking also into account @zeileis's arguments, I added only a brief mention of
The fact that ERGM is about modeling, simulation, and everything in between makes it difficult to slap a label on it or even put it on par with other approaches. But if you feel there is a satisfactory way to do so, the result would be incredibly useful for new users! Thanks everyone for the feedback and active involvement! |
For previous release's changelog see: cran-task-views/ctv#61 (comment)
Dear all, I am not entirely sure I understand the proposal. Regarding the GraphicalModels task view my approach has been very pragmatic: Package authors contact me to have their package on the task view and I usually add it. If a few packages appear in more than one task view, then I do not see that as a problem. Another topic: Perhaps it could be an idea to agree on how packages are described on the task views? I generally copy the package description unless it is too lengthy. Maybe there are other practices? Best |
Søren, thanks for your feedback. Regarding your comments:
|
Apologies for the silence; down with COVID at the moment.
@FATelarico, I don't think I have push access. I just tried it on a test branch.
In my experience, the line between graphical models and network analysis is that in graphical models (and neural network models, for that matter), the graph is a prespecified component of the model specification that does not depend on the data; whereas in network analysis the graph is the object being observed and summarised or modelled. |
In response to @krivit: If you have a database / dataset and do a model search for a graphical model (as e.g. the gRim package can do) then I do believe the graph is not specified beforehand? So this is perhaps not the best way ahead for discriminating between graphical models and network analysis. A more general comment: In graphical models (at least traditionally), the focus is on some kind of (conditional) independence restriction, which is a probabilistic statement. A missing edge represents a conditional independence restriction. That is the classical connection between a graph and a probabilistic model. In larger models with many variables, the graphs become less interesting as visual objects. It is hard to make sense of a graph with 1000 variables :) So in the distinction between graphical models and network analysis, one view is that it comes down to what is being analyzed? What is the key component in network analysis? Is that conditional independence? Is it another well defined mathematical / statistical concept? Perhaps, I can say it more directly: I am uncertain what network analysis really is... In response to @zeileis: You are right that small overlaps between task views are desirable but also that overlaps are unavoidable. Would it be feasible to have a package "belonging" primarily to one specific task view and then one can refer to that from any other task view? In addition to standardizing the description of packages, one thing that perhaps could be nice is to be able to automatically generate an "update history" for each package, just to give people an idea about how actively a package is maintained. |
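For concreteness, the classical connection between a graph and a probabilistic model described in the comment above can be sketched as follows. This is the standard pairwise Markov property; the notation (a graph $G = (V, E)$ and a precision matrix $K = \Sigma^{-1}$) is introduced here for illustration and does not come from the thread.

```latex
\[
  \{i,j\} \notin E \;\Longrightarrow\; X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}},
\]
and, for a multivariate Gaussian $X \sim N(\mu, \Sigma)$ with precision matrix $K = \Sigma^{-1}$,
\[
  X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}} \iff K_{ij} = 0 .
\]
```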
I am aware of this type of problem, but I didn't want to get too far into the weeds; the main distinction is that the graph is not the object of observation or analysis. I would classify this problem as a model selection problem for graphical models, rather than a network analysis problem. However, if one then, as you say, tries to understand the properties of this graph, say by visualising it or by detecting groups of variables with similar structural roles in the graph, then it becomes a network analysis problem. The tools one would use would often be agnostic to whether the graph represents friendships between people or conditional dependence between variables. I know there are also other intermediate cases. For example, Frank and Strauss (1986) "Markov Graphs" specified a probability model for network structure by constructing a conditional dependence graph (i.e., a graphical model) for edge variables and then using the Hammersley-Clifford theorem to derive the form for the probability of a given graph under the model. This approach and its extensions have been used ever since to infer social forces affecting the structure of the network. |
I think the issue in understanding the connection (and difference) between NetworkAnalysis and GraphicalModels is that the former offers tools that are not limited to 'describe statistical models as graphs/networks'. Rather, as @krivit pointed out (rightly, in my humble opinion), network analysis allows one to deal with networks representing a/some connection/s between a/some defined set/s of entities. If the entities happen to be variables and the connection between them is conditional dependence (with independence being implied by lack of ties), then you get a graphical model. Obviously, Markovian graphs lie in somewhat of a gray area, but since they are covered in the GraphicalModels CTV, we are not dealing with them.
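To illustrate the remark that such tools are agnostic to what the edges mean, here is a minimal Python sketch. It is not taken from any package discussed in this thread; the function name `degree_centrality` and both toy edge lists are hypothetical. The same routine is applied to a friendship graph and to a conditional dependence graph of identical shape.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected simple graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Hypothetical data: ties are friendships between people.
friendships = [("Ann", "Bob"), ("Bob", "Cat"), ("Bob", "Dan")]
# Hypothetical data: ties are conditional dependencies between variables.
dependence = [("x1", "x2"), ("x2", "x3"), ("x2", "x4")]

print(degree_centrality(friendships))
print(degree_centrality(dependence))
```

Both calls yield the same set of centrality values, since the routine only sees the structure of the graph and not the meaning of its ties.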
Wish you a speedy recovery!
It should be fixed now. Let me know |
Thanks for the clarifications @FATelarico @krivit @hojsgaard, I think this is very useful and something to build upon. I suggest that you review the description of the scope of the NetworkAnalysis view to make it sharper with respect to this distinction. Also add a cross-reference to the GraphicalModels task view. When the NetworkAnalysis task view is published, its scope description should be adapted correspondingly. Similarly, the first section ("Representation, manipulation and display of graphs") should be streamlined (with yet another cross-reference) once NetworkAnalysis is available. |
Thank you for the interesting discussion. Like @zeileis, I think that, for readers, it is important that the distinction is clearly made at the beginning of both task views with cross references: that will help them identify which TV they have to read to answer their specific question. Also, I share the view that GraphicalModels includes packages dealing with graphs as a way to represent some kind of conditional dependency structure between variables (nodes). For me, Markovian graphs and Bayesian networks belong more in this task view than in NetworkAnalysis, for instance, but I agree that the distinction is not easy and clear to make (at least, that is where I would search for information on this topic). However, I am under the impression that coordination between the two TVs is necessary (maybe if you could find someone to be a maintainer of both TVs, that would help). Finally, just a minor comment: |
Renamed a section to Biology and (Bio)-Chemistry Networks in agreement with cran-task-views/ctv#61 (comment)
Improving the inclusion/exclusion criterion in agreement with - cran-task-views/ctv#61 (comment) - cran-task-views/ctv#61 (comment) - cran-task-views/ctv#61 (comment) - cran-task-views/ctv#61 (comment)
Thank you for your continued feedback. With the last two edits I improved on the following aspects:
Regarding coordination, I agree that it may be useful. Perhaps @hojsgaard could join our team if he feels okay with it. |
@FATelarico : Thanks! you might also want to correct "Notably, the underlying data mining approach has been used beyond biochemistry."? (I was referring to this sentence actually.) Regarding the rest, I think that it heads in the right direction. I also think that we should wait until the team of maintainers is completely set. |
…ges (rem, relevent, dnr, RSiena, lolog). references cran-task-views/ctv#61
A last minor note perhaps: I am under the impression that the way you cite functions is not always the same. I saw:
Also, I'm just thinking of it now, but one of the most used network tools in the field of molecular biology is Cytoscape, embedded into |
Thanks, Fabio @FATelarico, looks good to me. I think we can proceed now with CRAN publication. Do you agree Roger @rsbivand, Dirk @eddelbuettel, Julia @jpiaskowski, and Nathalie @tuxette? (Thumbs up is sufficient for endorsement.) |
Mostly OK. Ecosystems |
Fabio, great, thanks for this. We also have enough endorsements to proceed to the CRAN publication. You can transfer the repository to me and I can transfer it to the
and then
to "zeileis". |
Thank you, done. |
Great, thanks. I have now done the following:
With the official release I suggest to wait until the second week of January so that more people see it when it is being announced. In the meantime you could maybe improve the references a bit more. In the main text I omitted all |
Happy New Year! Coming back to this with fresh eyes, some thoughts and questions:
I'll take a crack at making some of these changes in a branch. |
Also, "Model-based clustering" doesn't describe many of the blockmodelling techniques, so it needs to be moved to a more specific location. |
One more thing: if we are going to have an extensive bibliography, we should probably have some bibliography management mechanism. @zeileis , does |
An extensive bibliography is usually not intended for task views. For example, the individual packages typically have CITATIONs that include JSS and R Journal papers etc. Hence we felt that it was most important that readers of the task views find the package and then from there they can discover further materials about the package themselves. Therefore we usually recommend to include only key references in the task view. This helps to keep the maintenance workload more manageable. |
That's a bit of a maintainability dilemma, then. @FATelarico , how did you generate these references? If we have an extensive list like this, we should add ERGM stuff as well. |
I have the references saved in Zotero and used regex to place all the Regarding the other changes, while not being game-changing, they make sense to me. Yet, it may be tough to implement them before release. |
Here's a draft: https://github.com/cran-task-views/NetworkAnalysis/tree/section-reorg . TOC not yet updated. |
Regarding bibliography, I ended up piggybacking on R's citation printing machinery to implement a rudimentary citation engine: It's in a branch: https://github.com/cran-task-views/NetworkAnalysis/tree/process-cite . If we want author-year, it shouldn't be hard to implement. I've designed the code to minimally interfere with @FATelarico , if we use this approach, can you please add your Zotero entries to the bib file? |
I appreciate that you are both putting considerable effort into the references for this task view. However, this increases the burden for you as maintainers as well as all future maintainers and contributors. Also, you deviate from the standard workflow for task views, and having both the source Rmd and target md file checked in increases the risk of the files going out of sync. My feeling is that it is not worth doing this and I would rather limit the bibliography to key references in the field. |
Seconded. For better, or worse, we do have a standard look, feel, workflow, repo layout, ... for at least the submitted task views. As this is meant to be a collection of some coherence beyond merely being R-related, it may be better if you followed the existing setup without inventing new parts. |
@zeileis , @eddelbuettel, I understand completely. Although I think that a full bibliography linking packages to papers would be beneficial (and I think I've already front-loaded the maintenance effort), I appreciate the consistency consideration. However, the bibliography in its current form is both long and incomplete. If we can't have a proper bibliography, then I think we should either cut it altogether or have strict rules about what's allowed in and what's not. A possible rule is that we don't allow primary sources, only textbooks, handbooks, and review papers. @FATelarico , thoughts? I know I am asking you to cut a lot of your work, but my bibliography engine is getting cut as well. |
Thanks @krivit for bringing this up and the editing team for the ongoing, last-minute support. Given how the conversation evolved, I would argue that as of now there are not too many references. My criterion was, rather banally, the following:
I suppose entries for the non-core packages could be removed altogether. But the editing team did not raise any complaint about the length of the bibliography or its composition. Hence, it may very well be that we've already hit the sweet spot between completeness and manageability (or somewhere nearby). @zeileis I will clean the |
A couple of comments:
|
As @zeileis points out, if we don't go all in on the citations (which it doesn't sound like we are, much as I may wish it were otherwise), the place for this information is in the package description and documentation. We can describe what the package tries to accomplish in the CTV, and let the package itself speak to the method. By the way, have you had a look at my reorg branch? If there are no objections, I'll update the TOC and merge it in. |
I have no objection to the re-structuring. I intended to edit the version in your branch by removing the |
I'd rather we merge first. I need to make some miscellaneous edits as well, so let's minimise the risk of edit conflicts. Is the TOC made manually, or is there some machinery for regenerating it? |
Here's the tool for the ToC: https://bitdowntoc.derlin.ch/ |
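For context, a ToC generator of this kind essentially scans the Markdown headings and emits a nested list of links. Below is a rough Python sketch of that idea, assuming ATX-style `#` headings and a simplified GitHub-like anchor rule (lowercase, punctuation dropped, whitespace turned into hyphens). It is not bitdowntoc's actual implementation, and the names `slugify`, `make_toc` and the sample document are hypothetical.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this listing intact


def slugify(title):
    """Approximate a GitHub-style anchor (simplified; real rules have more cases)."""
    slug = title.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # drop punctuation such as parentheses
    return re.sub(r"\s", "-", slug)       # each whitespace character becomes a hyphen


def make_toc(markdown, max_level=3):
    """Build a nested bullet-list ToC from ATX headings, skipping fenced code."""
    toc, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # a '#' inside a code block is a comment, not a heading
        match = re.match(r"^(#{1,6})\s+(.+?)\s*#*\s*$", line)
        if match and len(match.group(1)) <= max_level:
            level, title = len(match.group(1)), match.group(2)
            indent = "  " * (level - 1)
            toc.append(f"{indent}- [{title}](#{slugify(title)})")
    return "\n".join(toc)


# Hypothetical sample document, loosely modelled on section titles from this thread.
doc = "\n".join([
    "# Network Analysis",
    "## Core packages",
    FENCE + "r",
    "# not a heading",
    FENCE,
    "## Biology and (Bio)-Chemistry Networks",
])
print(make_toc(doc))
```

Because the output is derived purely from the headings, regenerating it after a section reorganisation is mechanical, which is the main argument for using a tool rather than editing the ToC by hand.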
I'm also happy for this to be merged as it is currently, and then we can concentrate on maintenance issues and other improvements moving forward. |
Scope
The proposed CRAN Task View contains a list of packages that can be used for dealing with networks (also known as relational data and graphs).
Packages
Core packages include:
intergraph
igraph
statnet
sna
network
The other packages:
graph
BoolNet
egor
ionet
networkDynamic
tidygraph
centiserve
birankr
goldfish
amen
ergm
ergm.count
ergm.ego
ergm.multi
ergm.rank
ergmgp
ergmito
biergm
dnr
bootnet
localboot
dyads
fastnet
multinets
nda
baycn
BayesianNetwork
bgms
bnma
econetwork
AnimalHabitatNetwork
aniSNA
assocInd
ATNr
BIEN
bipartite
cassandRa
bibliometrix
bibliometrixData
biblionetwork
Diderot
c3net
Ac3net
ahnr
BASiNET
bionetdata
Cascade
evolqg
NetworkToolbox
qgraph
HospitalNetwork
geonetwork
chessboard
epanet2toolkit
intensitynet
epinet
hybridModels
netdiffuseR
FinNet
ITNr
modnets
multinet
visNetwork
networkD3
bipartiteD3
diagram
ndtv
neatmaps
ggnetwork
ggraph
ggsom
graphlayouts
cencrne
linkcomm
concoR
blockmodeling
BlockmodelingGUI
kmBlock
dBlockmodeling
signnet
blockmodels
sbm
dynsbm
MLVSBM
StochBlock
GREMLINS
Overlap
The only overlap could be with thematic TaskViews (e.g., epanet2toolkit is also in the Hydrology TaskView), but this is inherent to a method-oriented CTV as opposed to a substantive one. In general, there does not appear to be substantial overlap with existing CRAN task views.
Maintainers