diff --git a/docs/bibliography.rst b/docs/bibliography.rst index 87df6b5..81443bd 100644 --- a/docs/bibliography.rst +++ b/docs/bibliography.rst @@ -2,7 +2,7 @@ Bibliography ************ -``CDlib`` was developed for research purposes. Here you can find the complete list of papers that contributed to the algorithms and methods it exposes. +``CDlib`` was developed for research purposes. Here you can find a(n almost) complete list of papers that contributed to the algorithms and methods it exposes. ---------- Algorithms @@ -107,14 +107,101 @@ Evaluation measures Researches using CDlib ---------------------- -So far it has been used to support the following research activities: +So far CDlib has been referenced in the following research works: -- Hubert, M. Master Thesis. (2020) `Crawling and Analysing code review networks on industry and open source data `_ -- Pister, A., Buono, P., Fekete, J. D., Plaisant, C., & Valdivia, P. (2020). `Integrating Prior Knowledge in Mixed Initiative Social Network Clustering `_. arXiv preprint arXiv:2005.02972. -- Mohammadmosaferi, K. K., & Naderi, H. (2020). `Evolution of communities in dynamic social networks: An efficient map-based approach. `_ Expert Systems with Applications, 147, 113221. -- Cazabet, Remy, Souaad Boudebza, and Giulio Rossetti. "Evaluating community detection algorithms for progressively evolving graphs." arXiv preprint arXiv:2007.08635 (2020). -- Citraro, Salvatore, and Giulio Rossetti. "Identifying and exploiting homogeneous communities in labeled networks." Applied Network Science 5.1 (2020): 1-20. -- Citraro, Salvatore, and Giulio Rossetti. "Eva: Attribute-Aware Network Segmentation." International Conference on Complex Networks and Their Applications. Springer, Cham, 2019. -- Rossetti, Giulio. "ANGEL: efficient, and effective, node-centric community discovery in static and dynamic networks." Applied Network Science 5.1 (2020): 1-23. -- Jaiswal, Rajesh, and Sheela Ramanna. 
"Detecting Overlapping Communities Using Distributed Neighbourhood Threshold in Social Networks." International Joint Conference on Rough Sets. Springer, Cham, 2020. -- Rossetti, Giulio. "Exorcising the Demon: Angel, Efficient Node-Centric Community Discovery." International Conference on Complex Networks and Their Applications. Springer, Cham, 2019. +- Rezaei, M., Faramarzpour, M., Shobeiri, P., Seyedmirzaei, H., Sarasyabi, M. S., & Dabiri, S. (2023). A systematic review, meta-analysis, and network analysis of diagnostic microRNAs in glaucoma. European Journal of Medical Research, 28(1), 137. +- Bharadwaj, A. G., & Starly, B. (2022). Knowledge graph construction for product designs from large CAD model repositories. Advanced Engineering Informatics, 53, 101680. +- Sieranoja, S., & Fränti, P. (2022). Adapting k-means for graph clustering. Knowledge and Information Systems, 64(1), 115-142. +- Roghani, H., & Bouyer, A. (2022). A fast local balanced label diffusion algorithm for community detection in social networks. IEEE Transactions on Knowledge and Data Engineering. +- Peng, J., Zhou, Y., & Wang, K. (2021). Multiplex gene and phenotype network to characterize shared genetic pathways of epilepsy and autism. Scientific reports, 11(1), 952. +- Citraro, S., & Rossetti, G. (2020). Identifying and exploiting homogeneous communities in labeled networks. Applied Network Science, 5(1), 55. +- Gomes Ferreira, C. H., Murai, F., Silva, A. P., Trevisan, M., Vassio, L., Drago, I., ... & Almeida, J. M. (2022). On network backbone extraction for modeling online collective behavior. Plos one, 17(9), e0274218. +- Yao, X., Wang, D., Yu, T., Luan, C., & Fu, J. (2023). A machining feature recognition approach based on hierarchical neural network for multi-feature point cloud models. Journal of Intelligent Manufacturing, 34(6), 2599-2610. +- Hottenrott, H., Rose, M. E., & Lawson, C. (2021). The rise of multiple institutional affiliations in academia. 
Journal of the Association for Information Science and Technology, 72(8), 1039-1058. +- Vilela, J., Asif, M., Marques, A. R., Santos, J. X., Rasga, C., Vicente, A., & Martiniano, H. (2023). Biomedical knowledge graph embeddings for personalized medicine: Predicting disease‐gene associations. Expert Systems, 40(5), e13181 +- Frąszczak, D. (2023). Detecting rumor outbreaks in online social networks. Social Network Analysis and Mining, 13(1), 91. +- Pister, A., Buono, P., Fekete, J. D., Plaisant, C., & Valdivia, P. (2020). Integrating prior knowledge in mixed-initiative social network clustering. IEEE Transactions on Visualization and Computer Graphics, 27(2), 1775-1785. +- Mohammadmosaferi, K. K., & Naderi, H. (2020). Evolution of communities in dynamic social networks: An efficient map-based approach. Expert Systems with Applications, 147, 113221 +- Amira, A., Derhab, A., Hadjar, S., Merazka, M., Alam, M. G. R., & Hassan, M. M. (2023). Detection and Analysis of Fake News Users’ Communities in Social Media. IEEE Transactions on Computational Social Systems. +- Yassin, A., Haidar, A., Cherifi, H., Seba, H., & Togni, O. (2023). An evaluation tool for backbone extraction techniques in weighted complex networks. Scientific Reports, 13(1), 17000. +- Sobolevsky, S., & Belyi, A. (2022). Graph neural network inspired algorithm for unsupervised network community detection. Applied Network Science, 7(1), 63. +- Oestreich, Marie, et al. "hCoCena: horizontal integration and analysis of transcriptomics datasets." Bioinformatics 38.20 (2022): 4727-4734. +- Rustamaji, H. C., Kusuma, W. A., Nurdiati, S., & Batubara, I. (2024). Community detection with greedy modularity disassembly strategy. Scientific Reports, 14(1), 4694. +- Aref, S., Mostajabdaveh, M., & Chheda, H. (2023, June). Heuristic modularity maximization algorithms for community detection rarely return an optimal partition or anything similar. In International Conference on Computational Science (pp. 612-626). 
Cham: Springer Nature Switzerland. +- Galan-Vasquez, E., & Perez-Rueda, E. (2021). A landscape for drug-target interactions based on network analysis. Plos one, 16(3), e0247018. +- Groza, V., Udrescu, M., Bozdog, A., & Udrescu, L. (2021). Drug repurposing using modularity clustering in drug-drug similarity networks based on drug–gene interactions. Pharmaceutics, 13(12), 2117. +- Zafarmand, M., Talebirad, Y., Austin, E., Largeron, C., & Zaïane, O. R. (2023). Fast local community discovery relying on the strength of links. Social Network Analysis and Mining, 13(1), 112 +- Cazabet, R., Boudebza, S., & Rossetti, G. (2020). Evaluating community detection algorithms for progressively evolving graphs. Journal of Complex Networks, 8(6), cnaa027. +- Rani, S., & Kumar, M. (2022). Ranking community detection algorithms for complex social networks using multilayer network design approach. International Journal of Web Information Systems, 18(5/6), 310-341. +- Tariq, R., Lavangnananda, K., Bouvry, P., & Mongkolnam, P. (2023). Partitioning Graph Clustering With User-Specified Density. IEEE Access, 11, 122273-122294. +- Pavel, A., Federico, A., Del Giudice, G., Serra, A., & Greco, D. (2021). Volta: adVanced mOLecular neTwork analysis. Bioinformatics, 37(23), 4587-4588. +- Krishna, V., Vasiliauskaite, V., & Antulov-Fantulin, N. (2022). Question routing via activity-weighted modularity-enhanced factorization. Social Network Analysis and Mining, 12(1), 155. +- Sahu, S., & Rani, T. S. (2022). A neighbour-similarity based community discovery algorithm. Expert Systems with Applications, 206, 117822. +- Aref, S., Chheda, H., & Mostajabdaveh, M. (2022). The Bayan algorithm: detecting communities in networks through exact and approximate optimization of modularity. arXiv preprint arXiv:2209.04562. +- Leventidis, A., Di Rocco, L., Gatterbauer, W., Miller, R. J., & Riedewald, M. (2023). DomainNet: Homograph Detection and Understanding in Data Lake Disambiguation. 
ACM Transactions on Database Systems, 48(3), 1-40. +- Rossetti, G. (2020). ANGEL: efficient, and effective, node-centric community discovery in static and dynamic networks. Applied Network Science, 5(1), 26. +- Citraro, S., & Rossetti, G. (2021). X-Mark: A benchmark for node-attributed community discovery algorithms. Social Network Analysis and Mining, 11(1), 99 +- Kumar, M., Mishra, S., Singh, S. S., & Biswas, B. (2024). Community-enhanced Link Prediction in Dynamic Networks. ACM Transactions on the Web, 18(2), 1-32. +- Shrestha, A., Mielke, K., Nguyen, T. A., & Giabbanelli, P. J. (2022, December). Automatically explaining a model: Using deep neural networks to generate text from causal maps. In 2022 Winter Simulation Conference (WSC) (pp. 2629-2640). IEEE. +- Ye, Q., Xu, R., Li, D., Kang, Y., Deng, Y., Zhu, F., ... & Hou, T. (2023). Integrating multi-modal deep learning on knowledge graph for the discovery of synergistic drug combinations against infectious diseases. Cell Reports Physical Science, 4(8). +- Peixoto, A. R., de Almeida, A., António, N., Batista, F., Ribeiro, R., & Cardoso, E. (2023). Unlocking the power of Twitter communities for startups. Applied Network Science, 8(1), 66. +- Hottenrott, H., & Lawson, C. (2022). What is behind multiple institutional affiliations in academia?. Science and public policy, 49(3), 382-402. +- Sarmiento, H., Bravo-Marquez, F., Graells-Garrido, E., & Poblete, B. (2022, May). Identifying and Characterizing New Expressions of Community Framing during Polarization. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 16, pp. 841-851). +- Mouronte-López, M. L., & Subirán, M. (2022). Modeling the interaction networks about the climate change on twitter: A characterization of its network structure. Complexity, 2022. +- Akbaritabar, A. (2021). A quantitative view of the structure of institutional scientific collaborations using the example of Berlin. Quantitative Science Studies, 2(2), 753-777. 
+- Das, S., Devarapalli, R. K., & Biswas, A. (2024). Leveraging cascading information for community detection in social networks. Information Sciences, 120696. +- Xiao, J., Wang, Y. J., & Xu, X. K. (2021). Fuzzy community detection based on elite symbiotic organisms search and node neighborhood information. IEEE Transactions on Fuzzy Systems, 30(7), 2500-2514. +- Al-Debagy, O., & Martinek, P. (2022). Dependencies-based microservices decomposition method. International Journal of Computers and Applications, 44(9), 814-821. +- Frąszczak, D. (2022). RPaSDT—rumor propagation and source detection Toolkit. SoftwareX, 17, 100988. +- Aref, S., & Mostajabdaveh, M. (2024). Analyzing modularity maximization in approximation, heuristic, and graph neural network algorithms for community detection. Journal of Computational Science, 78, 102283. +- Mohammadmosaferi, K. K., & Naderi, H. (2021). AFIF: Automatically Finding Important Features in community evolution prediction for dynamic social networks. Computer Communications, 176, 66-80 +- Monterde, B., Rojano, E., Córdoba-Caballero, J., Seoane, P., Perkins, J. R., Medina, M. Á., & Ranea, J. A. (2023). Integrating differential expression, co-expression and gene network analysis for the identification of common genes associated with tumor angiogenesis deregulation. Journal of Biomedical Informatics, 144, 104421 +- Böhle, T., Kuehn, C., & Thalhammer, M. (2022). On the reliable and efficient numerical integration of the Kuramoto model and related dynamical systems on graphs. International Journal of Computer Mathematics, 99(1), 31-57. +- Xiao, J., Zou, Y. C., & Xu, X. K. (2023). A Metaheuristic-Based Modularity Optimization Algorithm Driven by Edge Directionality for Directed Networks. IEEE Transactions on Network Science and Engineering. +- Vilela, J., Martiniano, H., Marques, A. R., Santos, J. X., Asif, M., Rasga, C., ... & Vicente, A. M. (2023). 
Identification of Neurotransmission and Synaptic Biological Processes Disrupted in Autism Spectrum Disorder Using Interaction Networks and Community Detection Analysis. Biomedicines, 11(11), 2971. +- Zhu, W., Sun, Y., Fang, R., & Xu, B. (2023). A Low-Memory Community Detection Algorithm with Hybrid Sparse Structure and Structural Information for Large-scale Networks. IEEE Transactions on Parallel and Distributed Systems. +- Das, S., & Biswas, A. (2024). TSInc: Tie strength based incremental community detection using information cascades. International Journal of Information Technology, 1-11. +- Rossetti, G. (2020). Exorcising the demon: angel, efficient node-centric community discovery. In Complex Networks and Their Applications VIII: Volume 1 Proceedings of the Eighth International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2019 8 (pp. 152-163). Springer International Publishing. +- Frąszczak, D., & Frąszczak, E. (2024). NetCenLib: A comprehensive python library for network centrality analysis and evaluation. SoftwareX, 26, 101699 +- Adams, C., Bozhidarova, M., Chen, J., Gao, A., Liu, Z., Priniski, J. H., ... & Brantingham, P. J. (2022, December). Knowledge graphs of the QAnon Twitter network. In 2022 IEEE International Conference on Big Data (Big Data) (pp. 2903-2912). IEEE. +- Böhle, T., Thalhammer, M., & Kuehn, C. (2022). Community integration algorithms (CIAs) for dynamical systems on networks. Journal of Computational Physics, 469, 111524. +- Soh Tsin Howe, J. (2021). Simulating subject communities in case law citation networks. Frontiers in Physics, 9, 665563. +- Goodbrake, C., Beers, D., Thompson, T. B., Harrington, H. A., & Goriely, A. (2024). Brain chains as topological signatures for Alzheimer’s disease. Journal of Applied and Computational Topology, 1-42. +- Citraro, S., & Rossetti, G. (2020). Eva: Attribute-aware network segmentation. 
In Complex Networks and Their Applications VIII: Volume 1 Proceedings of the Eighth International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2019 8 (pp. 141-151). Springer International Publishing. +- Park, Y. J., & Li, D. (2024). Lower Ricci Curvature for Efficient Community Detection. arXiv preprint arXiv:2401.10124. +- Ghoroghchian, N., Dasarathy, G., & Draper, S. (2021, March). Graph community detection from coarse measurements: Recovery conditions for the coarsened weighted stochastic block model. In International Conference on Artificial Intelligence and Statistics (pp. 3619-3627). PMLR. +- Sun, P. G., Wu, X., Quan, Y., & Miao, Q. (2022). Rearranging 'indivisible' blocks for community detection. IEEE Transactions on Knowledge and Data Engineering. +- Kumar, P., & Dohare, R. (2022). An interaction-based method for detecting overlapping community structure in real-world networks. International Journal of Data Science and Analytics, 14(1), 27-44. +- Vera, J., & Palma, W. (2021). The community structure of word co-occurrence networks: Experiments with languages from the Americas. Europhysics Letters, 134(5), 58002 +- Rutkowski, E., Sargant, J., Houghten, S., & Brown, J. A. (2021, June). Evaluation of communities from exploratory evolutionary compression of weighted graphs. In 2021 IEEE Congress on Evolutionary Computation (CEC) (pp. 434-441). IEEE +- Jaguzović, M., Grbić, M., Ðukanović, M., & Matić, D. (2022, March). Identification of protein complexes by overlapping community detection algorithms: A comparative study. In 2022 21st International Symposium INFOTEH-JAHORINA (INFOTEH) (pp. 1-6). IEEE +- Gurukar, S., Venkatakrishnan, S. B., Ravindran, B., & Parthasarathy, S. (2023, November). PolicyClusterGCN: Identifying Efficient Clusters for Training Graph Convolutional Networks. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining (pp. 245-252). +- Prokop, P., Dráždilová, P., & Platoš, J. 
(2023, November). Hierarchical Overlapping Community Detection for Weighted Networks. In International Conference on Complex Networks and Their Applications (pp. 159-171). Cham: Springer Nature Switzerland. +- Hiel, S., Nicolaers, L., Vázquez, C. O., Mitrović, S., Baesens, B., & De Weerdt, J. (2022, November). Evaluation of Joint Modeling Techniques for Node Embedding and Community Detection on Graphs. In 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 403-410). IEEE. +- Das, S., & Biswas, A. (2021, June). Community detection in social networks using local topology and information exchange. In 2021 International Conference on Intelligent Technologies (CONIT) (pp. 1-7). IEEE +- Hassine, M. B., Jabbour, S., Kmimech, M., Raddaoui, B., & Graiet, M. (2023, August). A Non-overlapping Community Detection Approach Based on α-Structural Similarity. In International Conference on Big Data Analytics and Knowledge Discovery (pp. 197-211). Cham: Springer Nature Switzerland. +- Jeong, H., Kim, Y., Jung, Y. S., Kang, D. R., & Cho, Y. R. (2021). Entropy-based graph clustering of PPI networks for predicting overlapping functional modules of proteins. Entropy, 23(10), 1271. +- Ghoroghchian, N., Anguluri, R., Dasarathy, G., & Draper, S. C. (2022). Controllability of Coarsely Measured Networked Linear Dynamical Systems (Extended Version). arXiv preprint arXiv:2206.10569. +- Cruz, F., Monteiro, P. T., & Teixeira, A. S. (2023, March). Community Structure in Transcriptional Regulatory Networks of Yeast Species. In International Workshop on Complex Networks (pp. 38-49). Cham: Springer Nature Switzerland. +- Sreevalsan-Nair, J., & Jakher, A. (2022). CAP-DSDN: Node Co-association Prediction in Communities in Dynamic Sparse Directed Networks and a Case Study of Migration Flow. In KDIR (pp. 63-74). +- Szufel, P. (2024, April). Towards Graph Clustering for Distributed Computing Environments. 
In International Workshop on Algorithms and Models for the Web-Graph (pp. 146-158). Cham: Springer Nature Switzerland. +- Jaiswal, R., & Ramanna, S. (2021). Detecting overlapping communities using ensemble-based distributed neighbourhood threshold method in social networks. Intelligent Decision Technologies, 15(2), 251-267. +- Salter-Duke, M. (2024). Tangled webs: A practical investigation of graph tangles (Doctoral dissertation, Open Access Te Herenga Waka-Victoria University of Wellington). +- Bharadwaj, A. G. (2023). Driving Reasoning Systems for Product Design and Flexible Robotic Manipulation Using 3D Design-Based Knowledge Graphs. North Carolina State University. +- Lei, G., Sheng, Y., Shaozi, L., & Qingshou, W. (2022). Hierarchical community‐discovery algorithm combining core nodes and three‐order structure model. Concurrency and Computation: Practice and Experience, 34(4), e6669. +- Stav, G. B. (2023). Network Analysis of the 3D Genome (Master's thesis). +- Gibbs, C. P. (2023). Causality and clustering in complex settings (Doctoral dissertation, Colorado State University). +- Das, S., & Biswas, A. (2022, December). Towards Direct Comparison of Community Structures in Social Networks. In 2022 IEEE 1st International Conference on Data, Decision and Systems (ICDDS) (pp. 1-6). IEEE. +- Krathaus, A. (2023). Impacts of Social and Transportation Networks on Social Activity-Travel Participation: An Exploratory Analysis Using Location-Based Social Network Data (Master's thesis, State University of New York at Buffalo) +- Svete, A., & Hostnik, J. (2020). It is not just about the melody: how Europe votes for its favorite songs. arXiv preprint arXiv:2002.06609. +- Rutkowski, E. (2022). Weighted Graph Compression using Genetic Algorithms. +- Zhou, X., Pan, Y., & Qin, J. (2022, April). Intelligent Control of Shield Tunneling from the Perspective of Complex Network. In International Conference on Green Building, Civil Engineering and Smart City (pp. 1226-1233). 
Singapore: Springer Nature Singapore. +- Rózemberczki, B. (2021). Graph mining on static, multiplex and attributed networks. +- Chatzi, I. (2024). Σύγκριση μεθόδων εντοπισμού κοινοτήτων για ανίχνευση botnets. +- Oostenbach, R. Fairness-Aware Analysis of Community Detection. +- AL-DYANI, W. Z. A. AN ENHANCED BINARY BAT AND MARKOV CLUSTERING ALGORITHMS TO IMPROVE EVENT DETECTION FOR HETEROGENEOUS NEWS TEXT DOCUMENTS. +- Campos, G. A., Ribeiro, J. M., Vieira, V. F., & Xavier, C. R. (2023, August). Estudo do impacto da seleção de sementes baseada em centralidade e em informações de comunidades sobrepostas. In Anais do XII Brazilian Workshop on Social Network Analysis and Mining (pp. 163-174). SBC. +- Akbaritabar, A. (2021). Quantitative View of the Structure of Institutional Scientific Collaborations Using the Examples of Halle, Jena and Leipzig. arXiv preprint arXiv:2101.05784. +- Barros, J. S. A. Facultad de Ingeniería Carrera de Ingeniería de Sistemas (Doctoral dissertation, Universidad de Cuenca). +- HUBERT, M. CRAWLING AND ANALYSING CODE REVIEW NETWORKS ON INDUSTRY AND OPEN SOURCE DATA. +- Svete, A., Hostnik, J., & Šubelj, L. (2020). Ne gre le za melodijo: kako Evropa glasuje za svoje najljubše skladbe. Uporabna Informatika, 28. diff --git a/docs/index.rst b/docs/index.rst index 55a09c0..251c464 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -7,12 +7,12 @@ CDlib - Community Detection Library =================================== -``CDlib`` is a Python software package that allows to extract, compare and evaluate communities from complex networks. +``CDlib`` is a Python software package that allows extracting, comparing, and evaluating communities from complex networks. -The library provides a standardized input/output for several existing Community Detection algorithms. -The implementations of all CD algorithms are inherited from existing projects, each one of them acknowledged in the dedicated method reference page. 
+The library provides a standardized input/output for several Community Detection algorithms. +The implementations of all CD algorithms are inherited from existing projects, each acknowledged in the dedicated method reference page. -If you would like to test ``CDlib`` functionalities without installing it on your machine consider using the preconfigured Jupyter Hub instances offered by the H2020 `SoBigData++`_ research project. +If you want to test ``CDlib`` functionalities without installing it on your machine, consider using the preconfigured Jupyter Hub instances offered by the EU-funded `SoBigData`_ research infrastructure. If you use ``CDlib`` in your research please cite the following paper: @@ -36,6 +36,7 @@ CDlib Dev Team `Letizia Milli`_ Community Models Integration `Rémy Cazabet`_ Visualization `Salvatore Citraro`_ Community Models Integration +`Andrea Failla`_ Community Models Integration ======================= ============================ @@ -50,10 +51,11 @@ CDlib Dev Team bibliography.rst -.. _`Giulio Rossetti`: http://www.about.giuliorossetti.net +.. _`Giulio Rossetti`: http://giuliorossetti.github.io .. _`Letizia Milli`: https://github.com/letiziam .. _`Salvatore Citraro`: https://github.com/dsalvaz .. _`Rémy Cazabet`: http://cazabetremy.fr +.. _`Andrea Failla`: http://andreafailla.github.io .. _`Source`: https://github.com/GiulioRossetti/CDlib .. _`Distribution`: https://pypi.python.org/pypi/CDlib -.. _`SoBigData++`: https://sobigdata.d4science.org/group/sobigdata-gateway/explore?siteId=20371853 \ No newline at end of file +.. _`SoBigData`: https://sobigdata.d4science.org/group/sobigdata-gateway/explore?siteId=20371853 \ No newline at end of file diff --git a/docs/installing.rst b/docs/installing.rst index 4b2680f..830497a 100644 --- a/docs/installing.rst +++ b/docs/installing.rst @@ -4,16 +4,16 @@ Installing CDlib ``CDlib`` *requires* python>=3.8. 
-To install the latest version of our library just download (or clone) the current project, open a terminal and run the following commands: +To install the latest version of our library, download (or clone) the current project, open a terminal, and run the following commands: .. code-block:: python pip install -r requirements.txt - pip install -r requirements_optional.txt # (Optional) this might not work in Windows systems due to C-based dependencies. + pip install -r requirements_optional.txt # (Optional) This might not work in Windows systems due to C-based dependencies. pip install . -Alternatively use pip +Alternatively, use pip .. code-block:: python @@ -46,7 +46,7 @@ Optional Dependencies PyPi package ^^^^^^^^^^^^ -To simplify the installation process, the default installation does not include optional dependencies (e.g., ``graph-tool``). If you need them, you can install them manually or run the following command: +The default installation does not include optional dependencies (e.g., ``graph-tool``) to simplify the installation process. If you need them, you can install them manually or run the following command: .. code-block:: python @@ -70,34 +70,34 @@ This option will install all optional dependencies accessible with the flag C an Advanced ^^^^^^^^ -Due to some strict requirements, the installation of a subset of optional dependencies is left outside the previous procedures. +Due to strict requirements, installing a subset of optional dependencies is left outside the previous procedures. ---------- graph-tool ---------- ``CDlib`` integrates the support for SBM models offered by ``graph-tool``. -To install it refer to the official `documentation `_ and install the conda-forge version of the package (or the deb version if in a *nix system). +To install it, refer to the official `documentation `_ and install the conda-forge version of the package (or the deb version if in a *nix system). 
------ ASLPAw ------ -Since its 2.1.0 release ``ASLPAw`` relies on ``gmpy2`` whose installation through pip is not easy to automatize due to some C dependencies. -To address such issue test the following recipe: +Since its 2.1.0 release, ``ASLPAw`` relies on ``gmpy2``, whose installation through pip is difficult to automate due to some C dependencies. +To address such an issue, test the following recipe: .. code-block:: python conda install gmpy2 pip install shuffle_graph>=2.1.0 similarity-index-of-label-graph>=2.0.1 ASLPAw>=2.1.0 -In case ASLPAw installation fails, please refer to the official ``gmpy2`` `repository `_. +If ASLPAw installation fails, please refer to the official ``gmpy2`` `repository `_. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Optional Dependencies (Conda package) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -``CDlib`` relies on a few packages not available through conda: to install them please use pip. +``CDlib`` relies on a few packages unavailable through conda: to install them, please use pip. .. code-block:: python @@ -109,4 +109,3 @@ Optional Dependencies (Conda package) In case ASLPAw installation fails, please refer to the official ``gmpy2`` repository `repository `_. - diff --git a/docs/overview.rst b/docs/overview.rst index 0db6602..9be996e 100644 --- a/docs/overview.rst +++ b/docs/overview.rst @@ -2,7 +2,7 @@ Overview ******** -``cdlib`` is a powerful Python package that allows for the extraction, comparison, and evaluation of communities from complex networks. +``cdlib`` is a powerful Python package that allows for extracting, comparing, and evaluating communities from complex networks. The potential audience for ``cdlib`` includes mathematicians, physicists, biologists, computer scientists, and social scientists. @@ -24,8 +24,12 @@ We welcome contributions from the community. 
EU H2020 -------- -``CDlib`` is a result of an European H2020 project: +``CDlib`` is a result of a stream of European H2020 projects: - SoBigData_ “Social Mining & Big Data Ecosystem”: under the scheme “INFRAIA-1-2014-2015: Research Infrastructures”, grant agreement #654024. +- "SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics" (http://www.sobigdata.eu); +- "SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics" – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021; +- FAIR: Future Artificial Intelligence Research. EU NextGenerationEU programme under the funding schemes PNRR-PE-AI. -.. _SoBigData: http://www.sobigdata.eu + +.. _SoBigData: http://www.sobigdata.eu \ No newline at end of file diff --git a/docs/reference/algorithms.rst b/docs/reference/algorithms.rst index cf57ad1..3d5a6d6 100644 --- a/docs/reference/algorithms.rst +++ b/docs/reference/algorithms.rst @@ -9,7 +9,7 @@ To maintain the library organization as clean and resilient to changes as possib 1. Algorithms designed for static networks, and 2. Algorithms designed for dynamic networks. -Moreover, within each category, ``CDlib`` groups together approaches sharing the same set of high-level characteristics. +Moreover, within each category, ``CDlib`` groups together approaches sharing the same high-level characteristics. In particular, static algorithms are organized into: @@ -42,7 +42,7 @@ Ensemble Methods ``CDlib`` implements basilar ensemble facilities to simplify the design of complex analytical pipelines requiring the instantiation of several community discovery algorithms. -Learn how to (i) pool multiple algorithms on the same network, (ii) perform fitness-driven methods' parameter grid search, and (iii) combine the two in few lines of code. +Learn how to (i) pool multiple algorithms on the same network, (ii) perform fitness-driven methods' parameter grid search, and (iii) combine the two in a few lines of code. .. 
toctree:: @@ -54,7 +54,7 @@ Learn how to (i) pool multiple algorithms on the same network, (ii) perform fitn Summary ------- -If you need a summary on the available algorithms and their properties (accepted graph types, community characteristics, computational complexity) refer to: +If you need a summary of the available algorithms and their properties (accepted graph types, community characteristics, computational complexity), refer to: .. toctree:: :maxdepth: 1 diff --git a/docs/reference/benchmark.rst b/docs/reference/benchmark.rst index 61d8e5f..7739e10 100644 --- a/docs/reference/benchmark.rst +++ b/docs/reference/benchmark.rst @@ -2,7 +2,7 @@ Synthetic Benchmarks ******************** -Evaluating Community Detection algorithms on ground truth communities can be tricky when the annotation is based on external semantic information, not on topological ones. +Evaluating Community Detection algorithms on ground truth communities can be tricky when the annotation is based on external semantic information, not topological ones. For this reason, ``cdlib`` integrates synthetic network generators with planted community structures. @@ -42,7 +42,7 @@ Benchmarks for node-attributed static networks. Dynamic Networks with Community Ground Truth ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Time evolving network topologies with planted community life-cycles. +Time-evolving network topologies with planted community life cycles. All generators return a tuple: (``dynetx.DynGraph``, ``cdlib.TemporalClustering``) .. autosummary:: diff --git a/docs/reference/cd_algorithms/algorithms.rst b/docs/reference/cd_algorithms/algorithms.rst index 9ef8c54..720202c 100644 --- a/docs/reference/cd_algorithms/algorithms.rst +++ b/docs/reference/cd_algorithms/algorithms.rst @@ -2,17 +2,17 @@ Algorithms' Table ================= -In the following table you can find an up-to-date list of the Community Detection algorithms made available within ``cdlib``. 
+The following table shows an up-to-date list of the Community Detection algorithms made available within ``cdlib``. Algorithms are listed in alphabetical order along with: - a few additional information on the graph typologies they handle, and - the main expected characteristics of the clustering they produce, -- (when available) the theoretical computational complexity as estimated by their authors. +- (when available) the theoretical computational complexity estimated by their authors. -All algorithms are assumed - apart few, reported, exceptions - to work on undirected and unweighted graphs. +Apart from a few reported exceptions, all algorithms are assumed to work on undirected and unweighted graphs. -**Complexity notation.** When discussing the time complexity the following notation is assumed: +**Complexity notation.** When discussing the time complexity, the following notation is assumed: - *n*: number of nodes - *m*: number of edges diff --git a/docs/reference/cd_algorithms/edge_clustering.rst b/docs/reference/cd_algorithms/edge_clustering.rst index 8709511..e2833a7 100644 --- a/docs/reference/cd_algorithms/edge_clustering.rst +++ b/docs/reference/cd_algorithms/edge_clustering.rst @@ -2,7 +2,7 @@ Edge Clustering =============== -Algorithms falling in this category generates communities composed by edges. +Algorithms falling in this category generate communities composed of edges. They return as result a ``EdgeClustering`` object instance. .. note:: diff --git a/docs/reference/cd_algorithms/node_clustering.rst b/docs/reference/cd_algorithms/node_clustering.rst index d666dd9..306260d 100644 --- a/docs/reference/cd_algorithms/node_clustering.rst +++ b/docs/reference/cd_algorithms/node_clustering.rst @@ -6,8 +6,8 @@ Static Community Discovery Node Clustering --------------- -Algorithms falling in this category generate communities composed by nodes. -The communities can represent neat, *crisp*, partition as well as *overlapping* or even *fuzzy* ones. 
+Algorithms falling in this category generate communities composed of nodes. +The communities can represent neat, *crisp*, partitions as well as *overlapping* or even *fuzzy* ones. .. note:: The following lists are aligned to CD methods available in the *GitHub main branch* of `CDlib`_. @@ -21,8 +21,8 @@ The communities can represent neat, *crisp*, partition as well as *overlapping* Crisp Communities ^^^^^^^^^^^^^^^^^ -A clustering is said to be a *partition* if each node belongs to one and only one community. -Methods in this subclass return as result a ``NodeClustering`` object instance. +A clustering is considered a *partition* if each node belongs to one and only one community. +Methods in this subclass return a ``NodeClustering`` object instance. .. autosummary:: @@ -74,7 +74,7 @@ Overlapping Communities ^^^^^^^^^^^^^^^^^^^^^^^ A clustering is said to be *overlapping* if any generic node can be assigned to more than one community. -Methods in this subclass return as result a ``NodeClustering`` object instance. +Methods in this subclass return a ``NodeClustering`` object instance. .. autosummary:: :toctree: algs/ @@ -113,8 +113,8 @@ Methods in this subclass return as result a ``NodeClustering`` object instance. Fuzzy Communities ^^^^^^^^^^^^^^^^^ -A clustering is said to be a *fuzzy* if each node can belongs (with a different degree of likelihood) to more than one community. -Methods in this subclass return as result a ``FuzzyNodeClustering`` object instance. +A clustering is *fuzzy* if each node can belong (with a different degree of likelihood) to more than one community. +Methods in this subclass return a ``FuzzyNodeClustering`` object instance. .. autosummary:: :toctree: algs/ @@ -127,7 +127,7 @@ Methods in this subclass return as result a ``FuzzyNodeClustering`` object insta Node Attribute ^^^^^^^^^^^^^^ -Methods in this subclass return as result a ``AttrNodeClustering`` object instance.
+Methods in this subclass return an ``AttrNodeClustering`` object instance. .. autosummary:: :toctree: algs/ @@ -140,7 +140,7 @@ Methods in this subclass return as result a ``AttrNodeClustering`` object instan Bipartite Graph Communities ^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Methods in this subclass return as result a ``BiNodeClustering`` object instance. +Methods in this subclass return a ``BiNodeClustering`` object instance. .. autosummary:: :toctree: algs/ @@ -156,7 +156,7 @@ Methods in this subclass return as result a ``BiNodeClustering`` object instance Antichain Communities ^^^^^^^^^^^^^^^^^^^^^ -Methods in this subclass are designed to extract communities from Directed Acyclic Graphs (DAG) and return as result a ``NodeClustering`` object instance. +Methods in this subclass are designed to extract communities from Directed Acyclic Graphs (DAG) and return a ``NodeClustering`` object instance. .. autosummary:: :toctree: algs/ @@ -168,8 +168,8 @@ Methods in this subclass are designed to extract communities from Directed Acycl Edge Clustering --------------- -Algorithms falling in this category generates communities composed by edges. -They return as result a ``EdgeClustering`` object instance. +Algorithms falling in this category generate communities composed of edges. +They return an ``EdgeClustering`` object instance. .. autosummary:: :toctree: algs/ diff --git a/docs/reference/cd_algorithms/temporal_clustering.rst b/docs/reference/cd_algorithms/temporal_clustering.rst index 72b8384..f6a5f89 100644 --- a/docs/reference/cd_algorithms/temporal_clustering.rst +++ b/docs/reference/cd_algorithms/temporal_clustering.rst @@ -2,7 +2,7 @@ Dynamic Community Discovery =========================== -Algorithms falling in this category generates communities that evolve as time goes by. +Algorithms falling in this category generate communities that evolve as time goes by. ..
automodule:: cdlib.algorithms @@ -11,16 +11,16 @@ Algorithms falling in this category generates communities that evolve as time go Instant Optimal ^^^^^^^^^^^^^^^ -This first class of approaches is derived directly from the application of static community discovery methods to the dynamic case. -A succession of steps is used to model network evolution, and for each of them is identified an optimal partition. +This first class of approaches is derived directly from applying static community discovery methods to the dynamic case. +A succession of steps is used to model network evolution, and an optimal partition is identified for each. Dynamic communities are defined from these optimal partitions by specifying relations that connect topologies found in different, possibly consecutive, instants. -``cdlib`` implements a templating approach to transform every static community discovery algorithm in a dynamic one following a standard *Two-Stage* approach: +``cdlib`` implements a templating approach to transform every static community discovery algorithm into a dynamic one following a standard *Two-Stage* approach: - Identify: detect static communities on each step of evolution; -- Match: align the communities found at step t with the ones found at step t − 1, for each step. +- Match: align the communities found at step t with those found at step t − 1, for each step. -Here's an example of a two-step built on top of Louvain partitions of a dynamic snapshot-sequence graph (where each snapshot is an LFR synthetic graph). +Here is an example of a two-step built on top of Louvain partitions of a dynamic snapshot-sequence graph (where each snapshot is an LFR synthetic graph). .. 
code-block:: python @@ -34,28 +34,27 @@ Here's an example of a two-step built on top of Louvain partitions of a dynamic coms = algorithms.louvain(g) # here any CDlib algorithm can be applied tc.add_clustering(coms, t) -For what concerns the second stage (snapshots' node clustering matching) it is possible to parametrize the set similarity function as follows (example made with a standard Jaccard similarity): +As for the second stage (snapshots' node clustering matching), it is possible to parametrize the set similarity function as follows (example made with a standard Jaccard similarity): .. code-block:: python jaccard = lambda x, y: len(set(x) & set(y)) / len(set(x) | set(y)) matches = tc.community_matching(jaccard, two_sided=True) -For all details on the available methods to extract and manipulate dynamic communities please refer to the ``TemporalClustering`` documentation. +For all details on the available methods to extract and manipulate dynamic communities, please refer to the ``TemporalClustering`` documentation. ^^^^^^^^^^^^^^^^^^ Temporal Trade-Off ^^^^^^^^^^^^^^^^^^ Algorithms belonging to the Temporal Trade-off class process iteratively the evolution of the network. -Moreover, unlike Instant optimal approaches, they take into account the network and the communities found in the previous step – or n-previous steps – to identify communities in the current one. +Moreover, unlike Instant optimal approaches, they consider the network and the communities found in the previous step – or n-previous steps – to identify communities in the current one. Dynamic Community Discovery algorithms falling into this category can be described by an iterative process: - Initialization: find communities for the initial state of the network; -- Update: for each incoming step, find communities at step t using graph at t and past information. +- Update: for each incoming step, find communities at step t using the graph at t and past information. ..
autosummary:: :toctree: algs/ tiles - diff --git a/docs/reference/classes.rst b/docs/reference/classes.rst index f5b736f..5220316 100644 --- a/docs/reference/classes.rst +++ b/docs/reference/classes.rst @@ -2,20 +2,20 @@ Community Objects ***************** -``cdlib`` aims at standardizing the representation of network communities. -To fulfill such a goal, several Clustering classes are introduced, each one capturing specific community characteristics. -All classes inherit from a same interface, thus sharing some common functionalities. +``cdlib`` aims to standardize the representation of network communities. +To fulfill such a goal, several Clustering classes are introduced, each capturing specific community characteristics. +All classes inherit from the same interface, thus sharing some common functionalities. -In particular ``cdlib`` algorithms can output the following Clustering types: +In particular, ``cdlib`` algorithms can output the following Clustering types: - **NodeClustering**: Node communities (either crisp partitions or overlapping groups); -- **FuzzyNodeClustering**: Overlapping node communities with explicit node to community belonging score; -- **BiNodeClustering**: Clustering of a Bipartite graphs (with the explicit representation of class homogeneous communities); +- **FuzzyNodeClustering**: Overlapping node communities with explicit node-to-community belonging score; +- **BiNodeClustering**: Clustering of Bipartite graphs (with the explicit representation of class homogeneous communities); - **AttrNodeClustering**: Clustering of feature-rich (node-attributed) graphs; - **EdgeClustering**: Edge communities; - **TemporalClustering**: Clustering of Temporal Networks; -For a complete overview of the methods exposed by ``cdlib`` clustering objects refer to the following documentation. +Refer to the following documentation for a complete overview of the methods exposed by ``cdlib`` clustering objects. .. 
toctree:: :maxdepth: 1 @@ -29,14 +29,14 @@ For a complete overview of the methods exposed by ``cdlib`` clustering objects r ------------------------------------------------ -Using Clustering objects with your own algorithm +Using Clustering objects with your algorithm ------------------------------------------------ - I have a clustering obtained by an algorithm not included in ``CDlib``. Can I load it in a Clustering object to leverage the evaluation and visualization facilities of your library? + I have a clustering obtained by an algorithm not included in ``CDlib``. Can I load it in a Clustering object to leverage your library's evaluation and visualization facilities? -Yes you can. +Yes, you can. -Just transform your clustering in a list of lists (we represent each community as a list of node ids) and then create a NodeClustering (or any other Clustering) object from it. +Just transform your clustering into a list of lists (we represent each community as a list of node IDs) and then create a NodeClustering (or any other Clustering) object. .. code-block:: python @@ -45,6 +45,5 @@ Just transform your clustering in a list of lists (we represent each community a communities = [[1,2,3], [4,5,6], [7,8,9,10,11]] coms = NodeClustering(communities, graph=None, method_name="your_method") -Of course, to compute some evaluation scores/plot community-networks you'll also have to pass the original graph (as igraph/networkx object) while building the NodeClustering instance. - +Of course, to compute some evaluation scores/plot community networks, you will also have to pass the original graph (as igraph/networkx object) while building the NodeClustering instance. 
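A note on the ``classes.rst`` hunk above: the list-of-lists format it describes often has to be built from a node-to-label mapping, which is what many external algorithms return. The following is a minimal sketch under that assumption; the ``membership`` dictionary and the helper name ``membership_to_communities`` are hypothetical and not part of ``CDlib``. It uses only the standard library.

```python
from collections import defaultdict


def membership_to_communities(membership):
    """Group node IDs by community label into a list of lists.

    `membership` is assumed to map each node ID to exactly one label
    (a crisp partition); labels must be mutually comparable so that
    the output order is deterministic.
    """
    groups = defaultdict(list)
    for node, label in membership.items():
        groups[label].append(node)
    # Sort communities by label and nodes within each community
    return [sorted(nodes) for _, nodes in sorted(groups.items())]


# Hypothetical output of an external algorithm: node ID -> community label
membership = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
communities = membership_to_communities(membership)
```

The resulting ``communities`` list has the shape the documented snippet passes as the first argument of ``NodeClustering``.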
diff --git a/docs/reference/datasets.rst b/docs/reference/datasets.rst index 0548f7c..e0ba4da 100644 --- a/docs/reference/datasets.rst +++ b/docs/reference/datasets.rst @@ -2,8 +2,7 @@ Network Datasets With Annotated Communities ******************************************* -``cdlib`` allows to retrieve existing datasets, along with their ground truth partitions (if available), from an ad-hoc remote `repository`_. - +``cdlib`` allows retrieving existing datasets and their ground truth partitions (if available) from an ad-hoc remote `repository`_. .. automodule:: cdlib.datasets diff --git a/docs/reference/ensemble.rst b/docs/reference/ensemble.rst index 201f72e..8567d3d 100644 --- a/docs/reference/ensemble.rst +++ b/docs/reference/ensemble.rst @@ -2,7 +2,7 @@ Ensemble Methods ================ -Methods to automate the execution of multiple instances of community detection algorithm(s). +Methods to automate the execution of multiple instances of one or more community detection algorithms. .. automodule:: cdlib.ensemble @@ -11,7 +11,7 @@ Methods to automate the execution of multiple instances of community detection a Configuration Objects --------------------- -Ranges can be specified to automate the execution of a same method while varying (part of) its inputs. +Ranges can be specified to automate the execution of the same method while varying (part of) its inputs. ``Parameter`` allows to specify ranges for numeric parameters, while ``BoolParamter`` for boolean ones. @@ -28,7 +28,7 @@ Multiple Instantiation Two scenarios often arise when applying community discovery algorithms to a graph: -1. the need to compare the results obtained by a give algorithm while varying its parameters +1. the need to compare the results obtained by a given algorithm while varying its parameters 2. the need to compare the multiple algorithms ``cdlib`` allows to do so by leveraging, respectively, ``grid_execution`` and ``pool``.
@@ -44,13 +44,13 @@ Two scenarios often arise when applying community discovery algorithms to a grap Optimal Configuration Search ---------------------------- -In some scenarios it could be helpful delegate to the library the selection of the method parameters to obtain a partition that optimize a given quality function. +In some scenarios, it could be helpful to delegate to the library the selection of the method parameters to obtain a partition that optimizes a given quality function. ``cdlib`` allows to do so using the methods ``grid_search`` and ``random_search``. -Finally, ``pool_grid_filter`` generalizes such approach allowing to obtain the optimal partitions from a pool of different algorithms. +Finally, ``pool_grid_filter`` generalizes such an approach, allowing one to obtain the optimal partitions from a pool of different algorithms. .. autosummary:: :toctree: generated/ grid_search random_search - pool_grid_filter + pool_grid_filter \ No newline at end of file diff --git a/docs/reference/evaluation.rst b/docs/reference/evaluation.rst index 17cb318..3fe7f23 100644 --- a/docs/reference/evaluation.rst +++ b/docs/reference/evaluation.rst @@ -6,11 +6,11 @@ The evaluation of Community Discovery algorithms is not an easy task. ``cdlib`` implements two families of evaluation strategies: - *Internal* evaluation through fitness scores; -- *External* evaluation through partitions comparison. +- *External* evaluation through partition comparison. -Moreover, ``cdlib`` integrates both standard *synthetic network benchmarks* and *real networks with annotated ground truths*, thus allowing for testing identified communities against ground-truths. +Moreover, ``cdlib`` integrates both standard *synthetic network benchmarks* and *real networks with annotated ground truths*, thus allowing for testing identified communities against ground truths. -Finally, ``cdlib`` also provides a way to *rank* clustering results generated by a set of algorithms over a given input graph. 
+Finally, ``cdlib`` also provides a way to *rank* clustering results generated by a set of algorithms over a given input graph. .. note:: @@ -20,7 +20,7 @@ Finally, ``cdlib`` also provides a way to *rank* clustering results generated by Internal Evaluation: Fitness scores ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Fitness functions allows to summarize the characteristics of a computed set of communities. ``cdlib`` implements the following quality scores: +Fitness functions summarize the characteristics of a computed set of communities. ``cdlib`` implements the following quality scores: .. automodule:: cdlib.evaluation @@ -50,7 +50,7 @@ Fitness functions allows to summarize the characteristics of a computed set of c purity -Among the fitness function a well-defined family of measures is the Modularity-based one: +Among the fitness functions, a well-defined family of measures is the Modularity-based one: .. autosummary:: :toctree: eval/ @@ -74,7 +74,7 @@ Some measures will return an instance of ``FitnessResult`` that takes together m External Evaluation: Partition Comparisons ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -It is often useful to compare different graph partition to assess their resemblance. +It is often useful to compare different graph partitions to assess their resemblance. ``cdlib`` implements the following partition comparisons scores: .. autosummary:: @@ -107,7 +107,7 @@ It is often useful to compare different graph partition to assess their resembla -Some measures will return an instance of ``MatchingResult`` that takes together mean and standard deviation values of the computed index. +Some measures will return an instance of ``MatchingResult`` that groups the mean and standard deviation values of the computed index. ..
autosummary:: :toctree: eval/ @@ -119,9 +119,9 @@ Some measures will return an instance of ``MatchingResult`` that takes together Synthetic Benchmarks ^^^^^^^^^^^^^^^^^^^^ -External evaluation scores can be fruitfully used to compare alternative clusterings of the same network, but also to asses to what extent an identified node clustering matches a known *ground truth* partition. +External evaluation scores can be fruitfully used to compare alternative clusterings of the same network and to assess to what extent an identified node clustering matches a known *ground truth* partition. -To facilitate such standard evaluation task, ``cdlib`` exposes a set of standard synthetic network generators providing topological community ground truth annotations. +To facilitate such a standard evaluation task, ``cdlib`` exposes a set of standard synthetic network generators providing topological community ground truth annotations. In particular, ``cdlib`` make available benchmarks for: @@ -129,7 +129,7 @@ In particular, ``cdlib`` make available benchmarks for: - *dynamic* community discovery; - *feature-rich* (i.e., node-attributed) community discovery. -All details can be found in the dedicated page. +All details can be found on the dedicated page. .. toctree:: :maxdepth: 1 @@ -141,9 +141,9 @@ All details can be found in the dedicated page. Networks With Annotated Communities ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Although evaluating a topological partition against an annotated "semantic" one is not among the safest path to follow [Peel17]_, ``cdlib`` natively integrates well-known medium-size network datasets with ground-truth communities. +Although evaluating a topological partition against an annotated "semantic" one is not among the safest paths to follow [Peel17]_, ``cdlib`` natively integrates well-known medium-size network datasets with ground-truth communities. 
-Due to the non-negligible sizes of such datasets, we designed a simple API to gather them from a dedicated remote repository transparently. +Due to the non-negligible sizes of such datasets, we designed a simple API to gather them transparently from a dedicated remote repository. All details on remote datasets can be found on the dedicated page. @@ -159,7 +159,7 @@ Ranking Algorithms Once a set of alternative clusterings have been extracted from a given network, is there a way to select the *best* one given a set of target fitness functions? -``cdlib`` exposes a few standard techniques to address such an issue: all details can be found in the dedicated documentation page. +``cdlib`` exposes a few standard techniques to address such an issue: all details can be found on the dedicated documentation page. .. toctree:: :maxdepth: 1 @@ -168,4 +168,4 @@ Once a set of alternative clusterings have been extracted from a given network, .. _`cdlib`: https://github.com/GiulioRossetti/cdlib -.. [Peel17] Peel, Leto, Daniel B. Larremore, and Aaron Clauset. "The ground truth about metadata and community detection in networks." Science advances 3.5 (2017): e1602548. \ No newline at end of file +.. [Peel17] Peel, Leto, Daniel B. Larremore, and Aaron Clauset. "The ground truth about metadata and community detection in networks." Science Advances 3.5 (2017): e1602548. \ No newline at end of file diff --git a/docs/reference/readwrite.rst b/docs/reference/readwrite.rst index 2db531b..fb2463b 100644 --- a/docs/reference/readwrite.rst +++ b/docs/reference/readwrite.rst @@ -9,7 +9,7 @@ CSV format ---------- The easiest way to save the result of a community discovery algorithm is to organize it in a .csv file. -The following methods allows to read/write communities to/from csv. +The following methods allow you to read/write communities to/from CSV. .. automodule:: cdlib.readwrite @@ -20,13 +20,13 @@ The following methods allows to read/write communities to/from csv. 
read_community_csv write_community_csv -.. note:: CSV formatting allows only to save/retrieve NodeClustering object loosing most of the metadata present in the CD computation result - e.g., algorithm name, parameters, coverage... +.. note:: CSV formatting only allows saving/retrieving NodeClustering objects, losing most of the metadata present in the CD computation result - e.g., algorithm name, parameters, coverage... ----------- JSON format ----------- -JSON format allows to store/load community discovery algorithm results in a more comprehensive way. +JSON format allows storing/loading community discovery algorithm results in a more comprehensive way. .. autosummary:: :toctree: generated/ @@ -34,4 +34,4 @@ JSON format allows to store/load community discovery algorithm results in a more read_community_json write_community_json -.. note:: JSON formatting allows only to save/retrieve all kind of Clustering object maintaining all their metadata - except for the graph object instance. \ No newline at end of file +.. note:: JSON formatting allows saving/retrieving all kinds of Clustering objects while maintaining all their metadata - except for the graph object instance. \ No newline at end of file diff --git a/docs/reference/reference.rst b/docs/reference/reference.rst index fd8571c..1f2889e 100644 --- a/docs/reference/reference.rst +++ b/docs/reference/reference.rst @@ -2,7 +2,7 @@ Reference ********* -``cdlib`` composes of several modules, each one fulfilling a different task related to community detection. +``cdlib`` comprises several modules, each fulfilling a different task related to community detection. .. toctree:: diff --git a/docs/reference/utils.rst b/docs/reference/utils.rst index df3c802..166e65a 100644 --- a/docs/reference/utils.rst +++ b/docs/reference/utils.rst @@ -23,11 +23,11 @@ Transform ``igraph`` to/from ``networkx`` objects. Identifier mapping ^^^^^^^^^^^^^^^^^^ -Remapping of graph nodes.
It is often a good idea - to limit the memory usage - to use progressive integers as node labels. -``cdlib`` automatically - and transparently - makes the conversion for the user, however, this step can be costly: for such reason the library also exposes facilities to directly pre/post process the network/community data. +Remapping of graph nodes. To limit memory usage, it is often a good idea to use progressive integers as node labels. +``cdlib`` automatically - and transparently - makes the conversion for the user; however, this step can be costly: for this reason, the library also exposes facilities to directly pre/post process the network/community data. .. autosummary:: :toctree: generated/ nx_node_integer_mapping - remap_node_communities + remap_node_communities \ No newline at end of file diff --git a/docs/reference/validation.rst b/docs/reference/validation.rst index ef052ec..9a6c407 100644 --- a/docs/reference/validation.rst +++ b/docs/reference/validation.rst @@ -2,15 +2,15 @@ Ranking Algorithms ****************** -Let's assume that you ran a set **X** of community discovery algorithms on a given graph **G** and that, for each of the obtained clustering, you computed a set **Y** of fitness scores. +Let us assume that you ran a set **X** of community discovery algorithms on a given graph **G** and that you computed a set **Y** of fitness scores for each of the obtained clusterings. - Is there a way to rank the obtained clusterings by their quality as expressed by **Y**? - Is it possible to validate the statistical significance of the obtained ranking? - Can we do the same while comparing different clustering (e.g., using NMI, NF1, ARI, AMI...)? -Don't worry, ``cdlib`` got you covered! +Do not worry, ``cdlib`` got you covered! -(Yes, we are aware that Community Detection is an ill-posed problem for which `No Free-Lunch`_ can be expected... however, we're not aiming at a general ranking here!)
+(Yes, we know Community Detection is an ill-posed problem for which `No Free-Lunch`_ can be expected... however, we are not aiming at a general ranking here!) ------------------------- Ranking by Fitness Scores diff --git a/docs/reference/viz.rst b/docs/reference/viz.rst index d0f5ebe..ed01665 100644 --- a/docs/reference/viz.rst +++ b/docs/reference/viz.rst @@ -2,8 +2,8 @@ Visual Analytics **************** -At the end of the analytical process is it often useful to visualize the obtained results. -``cdlib`` provides a few built-in facilities to ease such task. +At the end of the analytical process, it is often useful to visualize the obtained results. +``cdlib`` provides a few built-in facilities to ease such tasks. ^^^^^^^^^^^^^^^^^^^^^ Network Visualization @@ -28,7 +28,7 @@ Visualizing a graph is always a good idea (if its size is reasonable). Analytics plots ^^^^^^^^^^^^^^^ -Community evaluation outputs can be easily used to generate a visual representation of the main partition characteristics. +Community evaluation outputs can be easily used to represent the main partition characteristics visually. .. autosummary:: :toctree: generated/ @@ -36,4 +36,4 @@ Community evaluation outputs can be easily used to generate a visual representat plot_sim_matrix plot_com_stat plot_com_properties_relation - plot_scoring + plot_scoring \ No newline at end of file diff --git a/docs/tutorial.rst b/docs/tutorial.rst index fbb7e1d..a80a969 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -2,28 +2,28 @@ Quick Start *********** -``CDlib`` is a python library that allows to extract, compare and evaluate network partitions. +``CDlib`` is a Python library that allows network partition extraction, comparison, and evaluation. We designed it to be agnostic w.r.t. the data structure used to represent the network to be clustered: all the algorithms it implements accept interchangeably igraph/networkx objects. -Of course, such a choice comes with advantages as well as drawbacks. 
Here's the main ones you have to be aware of: +Of course, such a choice comes with advantages as well as drawbacks. Here are the main ones you have to be aware of: **Advantages** - Easy integration of existing/novel (python implementation of) CD algorithms; - Standardization of input and output; -- Zero-configuration user interface (e.g., you don't have to reshape your data!) +- Zero-configuration user interface (e.g., you do not have to reshape your data!) **Drawbacks** -- Algorithms performances are not comparable (execution time, scalability... they all depends on how each algorithm was originally implemented); -- Memory (in)efficiency: depending by the type of structure each individual algorithm requires memory consumption could be high; +- Algorithm performances are not comparable (execution time, scalability... they all depend on how each algorithm was originally implemented); +- Memory (in)efficiency: depending on the type of structure each algorithm requires, memory consumption could be high; - Hidden transformation times: usually not a bottleneck, moving from a graph representation to another can take "some" time (usually linear in the graph size) -Most importantly: remember that i) each algorithm will be able to handle graphs up to a given size, and that ii) that maximum size that may vary greatly across the exposed algorithms. +Most importantly, remember that i) each algorithm will be able to handle graphs up to a given size, and ii) that maximum size may vary greatly across the exposed algorithms. -------- Tutorial -------- -Extracting communities using ``CDlib`` is easy as this: +Extracting communities using ``CDlib`` is as easy as this: ..
code-block:: python @@ -32,9 +32,9 @@ Extracting communities using ``CDlib`` is easy as this: G = nx.karate_club_graph() coms = algorithms.louvain(G, weight='weight', resolution=1., randomize=False) -Of course, you can choose among all the algorithms available (taking care of specifying the correct parameters): in any case, you'll get as a result a Clustering object (or a more specific subclass). +Of course, you can choose among all the algorithms available (taking care of specifying the correct parameters). In any case, you will get a Clustering object (or a more specific subclass) as a result. -Clustering objects expose a set of methods to perform evaluation and comparisons. For instance, to get the partition modularity just write +Clustering objects expose a set of methods to perform evaluation and comparisons. For instance, to get the partition modularity, write: .. code-block:: python @@ -47,18 +47,18 @@ or, equivalently from cdlib import evaluation mod = evaluation.newman_girvan_modularity(g,communities) -Moreover, you can also visualize networks and communities, plot indicators and similarity matrices... just take a look to the module reference to get a few examples. +Moreover, you can also visualize networks and communities, plot indicators and similarity matrices... take a look at the module reference to get a few examples. -I know, plain tutorials are overrated: if you want to explore ``CDlib`` functionalities, please start playing around with our interactive `Google Colab Notebook `_! +I know, plain tutorials are overrated: if you want to explore ``CDlib`` functionalities, please start playing around with our interactive `Google Colab Notebook `_! --- FAQ --- -**Q1.** I developed a novel Community Discovery algorithm/evaluation/visual analytics method and I would like to see it integrated in ``CDlib``. What should I do? +**Q1.** I developed a novel Community Discovery algorithm/evaluation/visual analytics method and would like to see it integrated into ``CDlib``.
What should I do? -**A1.** That's great! Just open an issue on the project `GitHub `_ briefly describing the method (provide a link to the paper where it has been firstly introduced) and links to a python implementation (if available). We'll came back to you as soon as possible to discuss the next steps. +**A1.** That is great! Just open an issue on the project `GitHub `_ briefly describing the method (provide a link to the paper where it was first introduced) and links to a Python implementation (if available). We will get back to you as soon as possible to discuss the next steps. **Q2.** Can you add method XXX to your library? -**A2.** It depends. Do you have a link to a python implementation/are you willing to help us in implementing it? If so, that's perfect. If not, well... everything is possible but it is likely that it will require some time. +**A2.** It depends. Do you have a link to a Python implementation, or are you willing to help us implement it? If so, that is perfect. If not, everything is possible, but it will likely require some time. \ No newline at end of file