CRAN Task View: NetworkAnalysis #61

Open
FATelarico opened this issue Apr 14, 2024 · 63 comments
Comments

@FATelarico

FATelarico commented Apr 14, 2024

Scope

The proposed CRAN Task View contains a list of packages that can be used for dealing with networks (also known as relational data and graphs).

Packages

Core packages include:

intergraph
igraph
statnet
sna
network

The other packages:

graph
BoolNet
egor
ionet
networkDynamic
tidygraph
centiserve
birankr
goldfish
amen
ergm
ergm.count
ergm.ego
ergm.multi
ergm.rank
ergmgp
ergmito
biergm
dnr
bootnet
localboot
dyads
fastnet
multinets
nda
baycn
BayesianNetwork
bgms
bnma
econetwork
AnimalHabitatNetwork
aniSNA
assocInd
ATNr
BIEN
bipartite
cassandRa
bibliometrix
bibliometrixData
biblionetwork
Diderot
c3net
Ac3net
ahnr
BASiNET
bionetdata
Cascade
evolqg
NetworkToolbox
qgraph
HospitalNetwork
geonetwork
chessboard
epanet2toolkit
intensitynet
epinet
hybridModels
netdiffuseR
FinNet
ITNr
modnets
multinet
visNetwork
networkD3
bipartiteD3
diagram
ndtv
neatmaps
ggnetwork
ggraph
ggsom
graphlayouts
cencrne
linkcomm
concoR
blockmodeling
BlockmodelingGUI
kmBlock
dBlockmodeling
signnet
blockmodels
sbm
dynsbm
MLVSBM
StochBlock
GREMLINS

Overlap

The only possible overlap is with thematic task views (e.g., epanet2toolkit is also in the Hydrology task view), but this is inherent to a method-oriented CTV as opposed to a substantive one.

In general, there does not appear to be substantial overlap with existing CRAN task views.

Maintainers

  • Main maintainer: Fabio Ashtar Telarico (@FATelarico)
  • Co-maintainers:...
@zeileis
Contributor

zeileis commented Apr 14, 2024

Thanks for the proposal Fabio @FATelarico & Co! I appreciate that you have compiled this list of packages along with the corresponding description. This is a useful overview of a collection of packages on block modeling. However, I think that this is too specialized and not substantial enough for a standalone task view.

I'm also cc'ing Bettina @bettinagruen as the principal maintainer of the "Cluster" task view to see what she thinks and whether this could become a section in the "Cluster" task view.

However, overall my feeling is that this would better fit a "Network Analysis" task view which we currently do not have. Also I'm cc'ing Søren @hojsgaard in case he has any thoughts or recommendations.

@bettinagruen

I agree that this would be best suited for a Network Analysis task view.

The Cluster task view already contains a number of sections. One could thus rather easily add a section on block modeling. However, the description of the packages would need to be more detailed, covering each package separately, to be in line with how other packages are described in the task view. Presumably none of the packages would then also qualify as core for this more general task view.

@FATelarico FATelarico changed the title from "CRAN Task View: Blockmodeling" to "CRAN Task View: NetworkAnalysis" on Apr 15, 2024
@FATelarico
Author

Following your suggestions, the proposed task view was updated to address the entire field of network analysis.

@tuxette
Contributor

tuxette commented Apr 15, 2024

Thank you @FATelarico ! I checked your new proposal and to me, the core packages are well identified and relevant. However,

  • The two sections that describe the different functions of some of the packages are not (because task views are mainly meant to describe the packages and not their features/functions in such a precise way: the user guide is made for this).
  • Overall, the current proposal is organized by purpose, with much redundancy among packages across the different topics (because you describe the functions more than the packages): it would also be good to limit this redundancy.
  • Sometimes you forgot to use the macro r pkg( to cite a package and sometimes the references do not seem to be cited in the Reference section.
  • I think that the package blockmodels could be added to the Block model section.
  • Similarly to Bayesian network inference, other packages perform network inference with other types of models, like huge or glasso (among many) for Gaussian Graphical Models and GENIE3 (Bioconductor) for inference with random forests (RF).
  • Some regression methods also use networks as input and could be added to the TV as well (e.g., genlasso).

@zeileis
Contributor

zeileis commented Apr 15, 2024

Fabio, thanks for all the work and the quick revision! This is a useful start for a network analysis task view. In addition to Nathalie's comments a few additional thoughts:

  • The co-maintainers are still the ones from your blockmodeling proposal but it would be good to bring in a couple of persons with more expertise in statnet/sna/ergm.
  • The inclusion/exclusion criteria should be worked out better.
  • The overlap with "Graphical Models" (maintained by Søren @hojsgaard) needs to be addressed in the inclusion/exclusion criteria, especially with regard to graph/network infrastructure (basic computations, manipulations, visualizations) and with regard to Bayesian networks.

@FATelarico
Author

Thanks to the editors for their comments.

@tuxette

  • I am aware that usually functions are not described in task views and those sections have now been removed. I had originally thought they could be useful because many people who begin doing network analysis are often perplexed about whether to use igraph or statnet/sna and only realise they picked up the wrong package for their needs after they are already shoulders deep in it.
  • All mentioned packages should be tagged with the correct macro now (the problems were mostly in the last section Clustering-Others);
  • After double-checking, blockmodels seems to be already included;
  • As the inclusion/exclusion criteria were refined (see below), this set of packages became less relevant as they would better fit in the GraphicalModels CTV;
  • Similarly, packages offering methods to run regression over graphs representing variables may rather belong to the GraphicalModels CTV than here.

@zeileis

  • Emails were sent out to other possible co-maintainers from the ergm development team
  • The inclusion/exclusion criterion was refined
  • The distinction between network analysis and graphical modeling was introduced; references to that CTV are provided in the section on Network Modeling as well as in the introduction.

@tuxette
Contributor

tuxette commented Apr 18, 2024

Thanks! I liked the new version. However, I am not convinced by your section on "Bio-Chemical Networks": in your answer above, you explain that network inference is out of the scope of your TV (and I could agree with that), but you are citing three packages for network inference that are far from being the best known and most used (citing c3net and not WGCNA, among others, seems to be a highly biased choice). Also, I don't get the relation between the TV topic and evolqg.

A few additional minor remarks:

  • In Section "Bio-Chemical Networks", you have an empty bullet point line.
  • The titles of your sections are not all capitalized similarly.
  • In Section "Bio-Chemical Networks", the reference "Simoes and Emmert-Streib" is not formatted properly.
  • You have a typo in the title with the word "Psychology" in it.
  • In Section "Social and Economic networks", sna is not cited with the proper macro.
  • I think that "Extension for ggplot2" should be "Extensions for ggplot2".
  • The package https://cran.r-project.org/web/packages/greed/index.html could also be worth citing.

@zeileis
Contributor

zeileis commented Apr 18, 2024

I agree that there is good progress here. However, I feel that the inclusion/exclusion criteria are not as clear yet as they should be. Getting contributions/feedback from someone with more ergm expertise would probably be good. And the separation with graphical models has also some room for improvement.

Hence, I'm pinging Søren @hojsgaard again: Could you please have a look at the proposal?

And I suggest that we wait until you have feedback/suggestions from two more potential co-maintainers who can increase the diversity among the maintainer team.

@krivit

krivit commented Apr 23, 2024

Pavel Krivitsky from Statnet here. Thank you, @FATelarico, for inviting me. I'll read through the discussion and the draft in detail later, but I want to flag a few items as a matter of first impression.

  1. statnet is a metapackage: all it does is pull in the most popular packages from the project. The actual functionality is in the packages it depends on. It's been a while since I've looked at what's in igraph, but loosely, igraph ≈ network + sna.
  2. There are a number of dynamic network packages that aren't listed (tergm, relevent, btergm, tsna, just off the top of my head). We may want a dynamic network section.
  3. There is the EpiModel suite of packages that builds on Statnet's for epidemic modelling.
  4. It may make sense to split packages based on the kinds of questions they answer. E.g., clustering tells you which nodes belong to each group, whereas ERGMs tell you about the "big picture" social forces.

@zeileis
Contributor

zeileis commented Apr 24, 2024

Pavel @krivit, thank you for your inputs, this is very much appreciated! I think it would be great if you could join the team of co-maintainers of the task view, in order to bring a new perspective to the team based on your expertise.

For a short introduction to the idea of CRAN task views and the corresponding file format, see Documentation.md. If you are interested in more details and some background information, see doi:10.48550/arXiv.2305.17573.

Some quick feedback regarding the points you raised:

  1. For the task view it would be good to list both statnet and the constituting packages and briefly explain what they do. Similarly, both igraph and network + sna should be listed and explained. The main purpose of task views is to provide an overview - and not to endorse/recommend the best packages for a given task.
  2. Sounds like a good idea to me.
  3. EpiModel is listed in the Epidemiology task view. So for the topic of "disease networks" etc. I would simply link to that task view.
  4. Sounds like a good idea to me.

Thanks & best wishes!

@krivit

krivit commented Apr 24, 2024

Pavel @krivit, thank you for your inputs, this is very much appreciated! I think it would be great if you could join the team of co-maintainers of the task view, in order to bring a new perspective to the team based on your expertise.

@FATelarico , if I want to make edits, should I use PRs or push to the repository directly?

For a short introduction to the idea of CRAN task views and the corresponding file format, see Documentation.md. If you are interested in more details and some background information, see doi:10.48550/arXiv.2305.17573.

Thanks!

Some quick feedback regarding the points you raised:

1. For the task view it would be good to list both `statnet` and the constituting packages and briefly explain what they do. Similarly, both `igraph` and `network` + `sna` should be listed and explained. The main purpose of task views is to provide an overview - and not to endorse/recommend the best packages for a given task.

This is more about functionality rather than endorsement. The short of it is that network contains tools for managing the data structure, and sna contains EDA tools for networks, which can use both network objects and edgelists, as well as some inferential tools (e.g., QAP and MRQAP). igraph, from what I understand, contains both the data structure management tools and the EDA tools.
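As a rough illustration of that split (data structure management on one side, exploratory analysis on the other), here is a minimal sketch. It is written in Python only to keep it self-contained; the function names `edgelist_to_adjacency` and `degree_centrality` are made up for this example and are not part of network, sna, or igraph.

```python
from collections import defaultdict


def edgelist_to_adjacency(edges):
    """Data-structure step: turn an undirected edgelist into an adjacency map."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


def degree_centrality(adjacency):
    """EDA step: count the neighbours of every node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


# A toy undirected network given as an edgelist (hypothetical data).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
adjacency = edgelist_to_adjacency(edges)
degrees = degree_centrality(adjacency)
```

The first function only manages the representation; the second only summarises it. A one-stop package would ship both, while a split design keeps them in separate packages.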

3. `EpiModel` is listed in the [Epidemiology](https://CRAN.R-project.org/view=Epidemiology) task view. So for the topic of "disease networks" etc. I would simply link to that task view.

I don't think there is any harm in doing both.

4. Sounds like a good idea to me.

A good phrasing might be challenging to come up with, but I suppose we can play around with it and see what happens.

@zeileis
Contributor

zeileis commented Apr 24, 2024

Re: EpiModel. We try to avoid overlap, if possible, in order to keep the task views more focused and more manageable (both for readers and for maintainers).

In this case, my feeling is that the scope of the package belongs rather clearly to "Epidemiology" and thus I would avoid the duplication. Feel free to iterate, if I'm missing something here (e.g., if EpiModel contains algorithms that will often be used in other network analyses, beyond infectious disease modeling).

If you feel that the "Epidemiology" task view should have a dedicated section on disease networks, I would encourage you to raise this with the Epidemiology task view maintainers.

@FATelarico
Author

FATelarico commented Apr 26, 2024

@FATelarico , if I want to make edits, should I use PRs or push to the repository directly?

@krivit pushing to the main branch is okay, I have a local copy indexed by version being downloaded after every commit.

This is more about functionality rather than endorsement. The short of it is that network contains tools for managing the data structure, and sna contains EDA tools for networks, which can use both network objects and edgelists, as well as some inferential tools (e.g., QAP and MRQAP). igraph, from what I understand, contains both the data structure management tools and the EDA tools.

I agree. As mentioned in the current draft, igraph is more of a one-stop shop for data-managing tasks, basic modeling, and clustering. It provides more or less the same data-centered features as network plus some of sna's inferential tools. But many people do not actually need most of what sna has to offer and igraph has so many specialised add-ons/reverse-dependencies that many people prefer/have to use that. Any suggestion on how to elucidate this point further in the text will be welcome!

@FATelarico
Author

FATelarico commented Apr 26, 2024

A new draft is online, I apologise for the delay.


@tuxette

citing c3net and not WGCNA, among others, seems to be a highly biased choice

The choice was quite arbitrary because none of us is directly involved in this field, but several colleagues highlighted these packages as the 'most relevant'. Incidentally, evolqg should not have been included, as pointed out. After some reading in specialised journals, I edited the list of packages in this section. Namely, besides removing a few packages, BioNAR and WGCNA were added.

A few additional minor remarks:

  • In Section "Bio-Chemical Networks", you have an empty bullet point line.
  • The titles of your sections are not all capitalized similarly.
  • In Section "Bio-Chemical Networks", the reference "Simoes and Emmert-Streib" is not formatted properly.
  • You have a typo in the title with the word "Psychology" in it.
  • In Section "Social and Economic networks", sna is not cited with the proper macro.
  • I think that "Extension for ggplot2" should be "Extensions for ggplot2".
  • The package https://cran.r-project.org/web/packages/greed/index.html could also be worth citing.

@krivit

There are a number of dynamic network packages that aren't listed (tergm, relevent, btergm, tsna, just off the top of my head). We may want a dynamic network section.

I started by adding them either under ergm or in the most relevant sections. Feel free to move these and other dynamic-network packages to a separate section if you think there is enough material for it.

There is the EpiModel suite of packages that builds on Statnet's for epidemic modelling.

Taking also into account @zeileis's arguments, I added only a brief mention of EpiModel (because it is officially part of statnet) and linked the relevant CTV.

It may make sense to split packages based on the kinds of questions they answer. E.g., clustering tells you which nodes belong to each group, whereas ERGMs tell you about the "big picture" social forces.

The fact that ERGM is about modeling, simulation, and everything in between makes it difficult to slap a label on it or even put it on par with other approaches. But if you feel there is a satisfactory way to do so, the result would be incredibly useful for new users!


Thanks everyone for the feedback and active involvement!

FATelarico added a commit to FATelarico/ctv-network-submitted that referenced this issue Apr 26, 2024
FATelarico added a commit to FATelarico/ctv-network-submitted that referenced this issue Apr 26, 2024
For previous release's changelog see: cran-task-views/ctv#61 (comment)
@hojsgaard

Dear all,

I am not entirely sure I understand the proposal.

Regarding the GraphicalModels task view my approach has been very pragmatic: Package authors contact me to have their package on the task view and I usually add it. If a few packages appear in more than one task view, then I do not see that as a problem.

Another topic: Perhaps it could be an idea to agree on how packages are described on the task views? I generally copy the package description unless it is too lengthy. Maybe there are other practices?

Best
Søren

@zeileis
Contributor

zeileis commented Apr 29, 2024

Søren, thanks for your feedback. Regarding your comments:

  • Connection between NetworkAnalysis and GraphicalModels: Both the GraphicalModels task view, maintained by you, and the newly proposed NetworkAnalysis task views describe models that can be represented by graphs/networks. Hence, the question is whether both task views have a clear profile, have enough value added, and can be cross-referenced where appropriate. Do you think this is the case here? Do you have any recommendations for how to deal with it?
  • Overlap in general: With increasing number of task views and increasing number of packages per task view, it becomes more important that task views have a sharp profile so that it is clear what should go in and what should stay out. While it is not necessary or desirable to avoid overlap completely, we should still try to not have too much overlap. First, less overlap means less duplication of efforts for the maintainers. Second, less overlap but with cross-references between task views means that users will ideally be pointed to one place with useful documentation for them.
  • Package descriptions: I agree that the package title/description is a useful starting point. However, you can probably often improve the description within the task view if you embed it into the context of the appropriate section. In any case, there probably is no "one size fits all" approach here which is why we put this at the maintainers' discretion.

@krivit

krivit commented Apr 30, 2024

Apologies for the silence; down with COVID at the moment.

@krivit pushing to the main branch is okay, I have a local copy indexed by version being downloaded after every commit.

@FATelarico, I don't think I have push access. I just tried it on a test branch.

Connection between NetworkAnalysis and GraphicalModels: Both the GraphicalModels task view, maintained by you, and the newly proposed NetworkAnalysis task views describe models that can be represented by graphs/networks. Hence, the question is whether both task views have a clear profile, have enough value added, and can be cross-referenced where appropriate. Do you think this is the case here? Do you have any recommendations for how to deal with it?

In my experience, the line between graphical models and network analysis is that in graphical models (and neural network models, for that matter), the graph is a prespecified component of the model specification that does not depend on the data; whereas in network analysis the graph is the object being observed and summarised or modelled.

@hojsgaard

In response to @krivit:

If you have a database / dataset and do a model search for a graphical model (as e.g. the gRim package can do) then I do believe the graph is not specified beforehand? So this is perhaps not the best way ahead for discriminating between graphical models and network analysis.

A more general comment: In graphical models (at least traditionally), focus is on some kind of (conditional) independence restriction which is a probabilistic statement. A missing edge represents a conditional independence restriction. That is the classical connection between a graph and a probabilistic model. In larger models with many variables, the graphs become less interesting as visual objects. It is hard to make sense of a graph with 1000 variables :)
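For the Gaussian case, the classical connection between a missing edge and a conditional independence restriction can be written out explicitly. This is the standard textbook statement, added here only as an illustration; the symbols $K$ and $V$ are notation introduced for this note, and $\Sigma$ is assumed to be positive definite.

```latex
X = (X_1, \dots, X_p)^\top \sim N_p(\mu, \Sigma), \qquad K = \Sigma^{-1}, \qquad V = \{1, \dots, p\}

K_{ij} = 0 \iff X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i, j\}} \qquad (i \neq j)
```

In words: the edge between variables $i$ and $j$ is absent from the graph exactly when the corresponding off-diagonal entry of the precision matrix is zero.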

So in the distinction between graphical models and network analysis, one view is that it comes down to what is being analyzed? What is the key component in network analysis? Is that conditional independence? Is it another well defined mathematical / statistical concept? Perhaps, I can say it more directly: I am uncertain what network analysis really is...

In response to @zeileis:

You are right that small overlaps between task views are desirable but also that overlaps are unavoidable. Would it be feasible to have a package "belonging" primarily to one specific task view and then one can refer to that from any other task view?

In addition to standardizing the description of packages, one thing that perhaps could be nice is to be able to automatically generate an "update history" for each package, just to give people an idea about how actively a package is maintained.

@krivit

krivit commented Apr 30, 2024

@hojsgaard

If you have a database / dataset and do a model search for a graphical model (as e.g. the gRim package can do) then I do believe the graph is not specified beforehand? So this is perhaps not the best way ahead for discriminating between graphical models and network analysis.

I am aware of this type of problem, but I didn't want to get too far into the weeds; the main distinction is that the graph is not the object of observation or analysis. I would classify this problem as a model selection problem for graphical models, rather than a network analysis problem.

However, if one then, as you say, tries to understand the properties of this graph, say by visualising it or by detecting groups of variables with similar structural roles in the graph, then it becomes a network analysis problem. The tools one would use would often be agnostic to whether the graph represents friendships between people or conditional dependence between variables.
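To make the "agnostic tools" point concrete, here is a minimal sketch, in Python only so that it is self-contained. It uses a deliberately crude notion of "similar structural role" (nodes with exactly the same neighbour set); the helper `group_by_neighbourhood` and both toy graphs are hypothetical and do not reproduce what sna, igraph, or any blockmodelling package actually does.

```python
from collections import defaultdict


def group_by_neighbourhood(adjacency):
    """Group nodes that have exactly the same set of neighbours."""
    groups = defaultdict(list)
    for node, neighbours in adjacency.items():
        groups[frozenset(neighbours)].append(node)
    # Sort members and groups so the result is deterministic.
    return sorted(sorted(members) for members in groups.values())


# A friendship network between people (toy data).
friendships = {
    "ann": {"cat", "dan"},
    "bob": {"cat", "dan"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "bob"},
}

# A conditional dependence graph between variables (toy data).
dependencies = {
    "x1": {"x3", "x4"},
    "x2": {"x3", "x4"},
    "x3": {"x1", "x2"},
    "x4": {"x1", "x2"},
}

friend_groups = group_by_neighbourhood(friendships)
variable_groups = group_by_neighbourhood(dependencies)
```

The same function runs unchanged on both inputs: it only sees nodes and ties, not what they stand for.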

I know there are also other intermediate cases. For example, Frank and Strauss (1986) "Markov Graphs" specified a probability model for network structure by constructing a conditional dependence graph (i.e., a graphical model) for edge variables and then using Hammersley-Clifford Theorem to derive the form for the probability of a given graph under the model. This approach and its extensions were then used to infer social forces affecting the structure of the network ever since.

@FATelarico
Author

@zeileis: Connection between NetworkAnalysis and GraphicalModels: Both the GraphicalModels task view, maintained by you, and the newly proposed NetworkAnalysis task views describe models that can be represented by graphs/networks. Hence, the question is whether both task views have a clear profile, have enough value added, and can be cross-referenced where appropriate. Do you think this is the case here? Do you have any recommendations for how to deal with it?

I think the issue in understanding the connection (and difference) between NetworkAnalysis and GraphicalModels is that the former offers tools that are not limited to 'describe statistical models as graphs/networks'. Rather, as @krivit pointed out (rightly, in my humble opinion), network analysis allows one to deal with networks representing one or more connections between one or more defined sets of entities. If the entities happen to be variables and the connection between them is conditional dependence (with independence being implied by lack of ties), then you get a graphical model. Obviously, Markovian graphs lie in somewhat of a gray area, but since they are covered in the GraphicalModels CTV, we are not dealing with them.


@krivit : Apologies for the silence; down with COVID at the moment.

Wish you a speedy recovery!

@krivit : I don't think I have push access

It should be fixed now. Let me know

@zeileis
Contributor

zeileis commented May 1, 2024

Thanks for the clarifications @FATelarico @krivit @hojsgaard, I think this is very useful and something to build upon.

I suggest that you review the description of the scope of the NetworkAnalysis view to make it sharper with respect to this distinction. Also add a cross-reference to the GraphicalModels task view.

When the NetworkAnalysis task view is published, its scope description should be adapted correspondingly. Similarly, the first section ("Representation, manipulation and display of graphs") should be streamlined (with yet another cross-reference) once NetworkAnalysis is available.

@tuxette
Contributor

tuxette commented May 4, 2024

Thank you for the interesting discussion. As @zeileis I think that, for readers, it is important that the distinction is clearly made at the beginning of both task views with cross references: that will help them identify which TV they have to read to answer their specific question.

Also, I share the view that GraphicalModels includes packages dealing with graphs as a way to represent some kind of conditional dependency structure between variables (nodes). For me, Markovian graphs and Bayesian networks are more in this task view than in NetworkAnalysis for instance but I agree that the distinction is not easy and clear to make (at least, that is where I would search for information on this topic). However, I am under the impression that a coordination between the two TV is necessary (maybe if you could find someone to be a maintainer of both TV, that would help).

Finally, just a minor comment: WGCNA deals with gene networks (co-expression networks actually), which is not really "biochemistry" (pure biology instead, even though, I agree that, in the end, everything is mostly chemistry, that is not how most people would think of it).

FATelarico added a commit to FATelarico/ctv-network-submitted that referenced this issue May 8, 2024
Renamed a section to Biology and (Bio)-Chemistry Networks in agreement to cran-task-views/ctv#61 (comment)
FATelarico added a commit to FATelarico/ctv-network-submitted that referenced this issue May 8, 2024
@FATelarico
Author

@zeileis @tuxette
I suggest that you review the description of the scope of the NetworkAnalysis view to make it sharper with respect to this distinction. Also add a cross-reference to the GraphicalModels task view. I think that, for readers, it is important that the distinction is clearly made at the beginning of both task views with cross references: that will help them identify which TV they have to read to answer their specific question.

Thank you for your continued feedback. With the last two edits I improved on the following aspects:

Regarding coordination, I agree that it may be useful. Perhaps @hojsgaard could join our team if he feels okay with it.

@tuxette
Contributor

tuxette commented May 13, 2024

@FATelarico : Thanks! you might also want to correct "Notably, the underlying data mining approach has been used beyond biochemistry."? (I was referring to this sentence actually.)

Regarding the rest, I think that it heads in the right direction. I also think that we should wait until the team of maintainers is completely set.

krivit added a commit to FATelarico/ctv-network-submitted that referenced this issue May 14, 2024
@tuxette
Contributor

tuxette commented Dec 8, 2024

A last minor note perhaps: I am under the impression that the way you cite functions is not always the same. I saw:

  • pkg::function (the most common)
  • pkg::function()
  • function (in the General Section at the beginning)

Also, I'm just thinking of it now, but one of the most used network tools in the field of molecular biology is Cytoscape, embedded into r bioc("Rcy3"). It might be worth citing it.

FATelarico added a commit to FATelarico/ctv-network-submitted that referenced this issue Dec 16, 2024
@FATelarico
Author

Thank you @tuxette and @zeileis.

With commit 61fef42, I took care of including your comments and corrections.

In detail:

  • Used `r doi` for the two references pointing to arXiv.
  • Homogenised the references to functions within packages as `pkg::foo`.
  • Added a mention of `r bioc("Rcy3")`.

@zeileis
Contributor

zeileis commented Dec 16, 2024

Thanks, Fabio @FATelarico, looks good to me. I think we can proceed now with CRAN publication. Do you agree Roger @rsbivand, Dirk @eddelbuettel, Julia @jpiaskowski, and Nathalie @tuxette? (Thumbs up is sufficient for endorsement.)

@rsbivand
Contributor

Mostly OK. Ecosystems manynet Comprehensiveness is missing a leading space, I think, so uplisting the point. Spatial networks: maybe add sfnetworks building on tidygraph? Same section: double back tick at end of geonetwork entry.

FATelarico added a commit to FATelarico/ctv-network-submitted that referenced this issue Dec 17, 2024
@FATelarico
Author

FATelarico commented Dec 17, 2024

Thank you @zeileis for the quick turnaround and @rsbivand for your comments.

All the edits were implemented with commit 1ba845d

@zeileis
Contributor

zeileis commented Dec 19, 2024

Fabio, great, thanks for this. We also have enough endorsements to proceed to the CRAN publication. You can transfer the repository to me and I can transfer it to the cran-task-views org and then release it. For the transfer go to

Settings > Collaborators > Public Repository > Manage

and then

Danger Zone > Transfer ownership

to "zeileis".

@FATelarico
Author

Fabio, great, thanks for this. We also have enough endorsements to proceed to the CRAN publication. You can transfer the repository to me and I can transfer it to the cran-task-views org and then release it. For the transfer go to

Settings > Collaborators > Public Repository > Manage

and then

Danger Zone > Transfer ownership

to "zeileis".

Thank you, done.

@zeileis
Contributor

zeileis commented Dec 30, 2024

Great, thanks. I have now done the following:

With the official release I suggest to wait until the second week of January so that more people see it when it is being announced.

In the meantime you could maybe improve the references a bit more. In the main text I omitted all doi(...) code chunks which are already provided in the references at the end. However, there are still a few references with doi(...) in the main text that do not appear in the listing at the end. I suggest to add these.

@krivit

krivit commented Jan 5, 2025

Happy New Year!

Coming back to this with fresh eyes, some thoughts and questions:

  • I think we should rename "Network Construction" to "Relational Data Management"; this is what those packages are about.
  • We've talked about the category hierarchy, and given what we've learned, I think we might want to revisit this. In particular, I think we should reorganise as follows:
    • "Ecosystems and data" (?)
      • "Ecosystems"
      • "Data management" (formerly "Network construction")
    • "Exploratory data analysis" (new)
      • "Visualisation"
      • "Centrality"
    • "Group detection" (new)
      • "Community detection"
      • "Stochastic blockmodels"
      • "Generalised blockmodelling"
      • "Other"
  • The modelling section might need to be cleaned up a bit.
  • Do we have criteria for inclusion in references? For example, JoSS papers for R packages?

I'll take a crack at making some of these changes in a branch.

@krivit

krivit commented Jan 5, 2025

Also, "Model-based clustering" doesn't describe many of the blockmodelling techniques, so it needs to be moved to a more specific location.

@krivit

krivit commented Jan 5, 2025

One more thing: if we are going to have an extensive bibliography, we should probably have some bibliography management mechanism. @zeileis, does ctv support R Markdown-style @key citations?

@zeileis
Contributor

zeileis commented Jan 5, 2025

An extensive bibliography is usually not intended for task views. For example, the individual packages typically have CITATION files that include JSS and R Journal papers etc. Hence we felt that it was most important that readers of the task views find the package; from there they can discover further materials about the package themselves.

Therefore we usually recommend including only key references in the task view. This keeps the maintenance workload manageable.

@krivit

krivit commented Jan 6, 2025

That's a bit of a maintainability dilemma, then. @FATelarico, how did you generate these references? If we have an extensive list like this, we should add ERGM stuff as well.

@FATelarico
Author

> That's a bit of a maintainability dilemma, then. @FATelarico, how did you generate these references? If we have an extensive list like this, we should add ERGM stuff as well.

I have the references saved in Zotero and used regex to place all the doi() calls, etc. There aren't that many entries as of now. And, considering that most ctvs have short bibliographies as well, I felt that it would be relatively easy to manage them manually.

Regarding the other changes, while not being game-changing, they make sense to me. Yet, it may be tough to implement them before release.
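For readers wondering what the regex step mentioned above might look like, here is a minimal sketch. It assumes the exported reference text contains DOIs either as `https://doi.org/...` URLs or with a `doi:` prefix; the function name `wrap_dois`, the pattern, and the sample reference are illustrative assumptions, not the actual expressions that were used.

```r
# Hypothetical helper: wrap DOIs found in exported reference text in the
# inline doi() markup used by task views.
wrap_dois <- function(x) {
  pattern <- paste0(
    "(https?://(dx\\.)?doi\\.org/|doi: ?)",  # assumed prefixes before the DOI
    "(10\\.[0-9]{4,9}/\\S+?)",               # the DOI itself (group 3), matched lazily
    "(?=[.,;]?(\\s|$))"                      # stop before trailing punctuation or space
  )
  gsub(pattern, '`r doi("\\3")`', x, perl = TRUE, ignore.case = TRUE)
}

# Made-up reference line for illustration.
wrap_dois("Doe, J. (2020). An example title. https://doi.org/10.1000/xyz123.")
```

The lazy match plus lookahead keeps a sentence-final period out of the DOI; unusual DOIs that end in punctuation would still need a manual check.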

@krivit

krivit commented Jan 6, 2025

> Regarding the other changes, while not being game-changing, they make sense to me. Yet, it may be tough to implement them before release.

Here's a draft: https://github.com/cran-task-views/NetworkAnalysis/tree/section-reorg . TOC not yet updated.

@krivit

krivit commented Jan 7, 2025

Regarding bibliography, I ended up piggybacking on R's citation printing machinery to implement a rudimentary citation engine: `r cite(KEY)` will add a numeric citation, which links to the entry in the bibliography that it creates. The entries are in the order of first citation.

It's in a branch: https://github.com/cran-task-views/NetworkAnalysis/tree/process-cite .

If we want author-year, it shouldn't be hard to implement.

I've designed the code to minimally interfere with ctv: it's in an Rmd file that compiles to an md file with citations, passing ctv-specific functions such as doi() through.

@FATelarico, if we use this approach, can you please add your Zotero entries to the bib file?
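To make the idea concrete, a numeric citation mechanism of the kind described above could be sketched as follows. This is not the code in the process-cite branch: `make_citer`, the anchor naming scheme, and the two bibliography entries are hypothetical, and the sketch relies only on `bibentry()` and its `format()` method from base R's utils package.

```r
# Placeholder bibliography; keys and entries are invented for illustration.
bib <- c(
  bibentry("Book", key = "handbook", title = "A Placeholder Handbook",
           author = person("Ann", "Author"), year = "2020",
           publisher = "Example Press"),
  bibentry("Article", key = "review", title = "A Placeholder Review",
           author = person("Bob", "Writer"), year = "2021",
           journal = "Example Journal", volume = "1", pages = "1--10")
)

# Hypothetical constructor: returns cite() and bibliography() sharing state.
make_citer <- function(bib) {
  keys <- unlist(bib$key)
  cited <- character(0)  # keys in order of first citation

  cite <- function(key) {
    stopifnot(key %in% keys)
    if (!key %in% cited) cited <<- c(cited, key)
    # Numeric citation that links to the matching bibliography anchor.
    sprintf("[[%d](#ref-%s)]", match(key, cited), key)
  }

  bibliography <- function() {
    entries <- bib[match(cited, keys)]
    text <- format(entries, style = "text")  # R's own citation formatting
    sprintf('%d. <a id="ref-%s"></a>%s', seq_along(cited), cited, text)
  }

  list(cite = cite, bibliography = bibliography)
}

citer <- make_citer(bib)
citer$cite("review")    # first key cited gets number 1
citer$cite("handbook")  # second key cited gets number 2
cat(citer$bibliography(), sep = "\n")
```

Only cited entries appear in the output, numbered by first use, so an uncited entry in the bib file costs nothing. An author-year variant would only need a different `sprintf()` in `cite()`.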

@zeileis
Contributor

zeileis commented Jan 7, 2025

I appreciate that you are both putting considerable effort into the references for this task view. However, this increases the burden for you as maintainers as well as for all future maintainers and contributors. You would also deviate from the standard workflow for task views, and having both the source Rmd and the target md file checked in increases the risk of the files going out of sync. My feeling is that it is not worth doing this, and I would rather limit the bibliography to key references in the field.

@eddelbuettel
Contributor

Seconded. For better, or worse, we do have a standard look, feel, workflow, repo layout, ... for at least the submitted task views. As this is meant to be a collection of some coherence beyond merely being R-related, it may be better if you followed the existing setup without inventing new parts.

@krivit

krivit commented Jan 7, 2025

@zeileis, @eddelbuettel, I understand completely. Although I think that a full bibliography linking packages to papers would be beneficial (and I think I've already front-loaded the maintenance effort), I appreciate the consistency consideration.

However, the bibliography in its current form is both long and incomplete. If we can't have a proper bibliography, then I think we should either cut it altogether or have strict rules about what's allowed in and what's not. A possible rule is that we don't allow primary sources, only textbooks, handbooks, and review papers.

@FATelarico, thoughts? I know I am asking you to cut a lot of your work, but my bibliography engine is getting cut as well.

@FATelarico
Author

FATelarico commented Jan 7, 2025

Thanks @krivit for bringing this up and the editing team for the ongoing, last-minute support. Given how the conversation evolved, I would argue that as of now there are not too many references.

My criterion was, rather banally, the following:

> a citation is added if all or almost all the package does is implement a method described in a single academic work that cannot be introduced satisfactorily and concisely in the ctv.

I suppose entries for the non-core packages could be removed altogether. But the editing team did not raise any complaint about the length of the bibliography or its composition. Hence, it may very well be that we've already hit the sweet spot between completeness and manageability (or somewhere nearby).

@zeileis I will clean the doi() calls and link your previous comments in the commit for reference.

@zeileis
Contributor

zeileis commented Jan 7, 2025

A couple of comments:

  • Linking packages to papers: This is definitely very useful but I think that task views are not the right place for this. The DESCRIPTION and CITATION of a package would be the canonical place - along with the documentation (Rd files and vignettes etc.). These are both prominently linked from a package's web page so it is typically sufficient to link from the task view to the package page without duplicating the references.
  • Criterion for inclusion: References that help to set the scope of the task view (e.g., prominent textbooks, survey papers, etc.) and/or overview publications describing groups/families of packages (e.g., books on "XY in R", tutorial papers, etc.).
  • Too many references: I agree with Pavel that the current list is both long and incomplete. And usually such lists become even more incomplete over time. That's why I also think it would benefit from streamlining it to the key references.

@krivit

krivit commented Jan 7, 2025

@FATelarico,

> a citation is added if all or almost all the package does is implement a method described in a single academic work that cannot be introduced satisfactorily and concisely in the ctv.

As @zeileis points out, if we don't go all in on the citations (which it doesn't sound like we are, much as I may wish it were otherwise), the place for this information is in the package description and documentation. We can describe what the package tries to accomplish in the CTV, and let the package itself speak to the method.

By the way, have you had a look at my reorg branch? If there are no objections, I'll update the TOC and merge it in.

@FATelarico
Author

FATelarico commented Jan 7, 2025

> @FATelarico,
>
> > a citation is added if all or almost all the package does is implement a method described in a single academic work that cannot be introduced satisfactorily and concisely in the ctv.
>
> As @zeileis points out, if we don't go all in on the citations (which it doesn't sound like we are, much as I may wish it were otherwise), the place for this information is in the package description and documentation. We can describe what the package tries to accomplish in the CTV, and let the package itself speak to the method.
>
> By the way, have you had a look at my reorg branch? If there are no objections, I'll update the TOC and merge it in.

I have no objection to the restructuring. I intended to edit the version in your branch by removing the doi() calls and reducing the number of references before merging the edits into the main branch 😃

@krivit

krivit commented Jan 7, 2025

> I have no objection to the restructuring. I intended to edit the version in your branch by removing the doi() calls and reducing the number of references before merging the edits into the main branch 😃

I'd rather we merge first. I need to make some miscellaneous edits as well, so let's minimise the risk of edit conflicts. Is the TOC made manually, or is there some machinery for regenerating it?

@FATelarico
Author

FATelarico commented Jan 7, 2025

> I'd rather we merge first. I need to make some miscellaneous edits as well, so let's minimise the risk of edit conflicts. Is the TOC made manually, or is there some machinery for regenerating it?

Here's the tool for the ToC: https://bitdowntoc.derlin.ch/

@jhollway

> I'd rather we merge first. I need to make some miscellaneous edits as well, so let's minimise the risk of edit conflicts. Is the TOC made manually, or is there some machinery for regenerating it?

I'm also happy for this to be merged as it is currently, and then we can concentrate on maintenance issues and other improvements moving forward.


9 participants