diff --git a/pages/blog/getting-started-with-rispy-a-python-library-for-ris-files/index.ipynb b/pages/blog/getting-started-with-rispy-a-python-library-for-ris-files/index.ipynb new file mode 100644 index 00000000..f7164ea9 --- /dev/null +++ b/pages/blog/getting-started-with-rispy-a-python-library-for-ris-files/index.ipynb @@ -0,0 +1,627 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2c1a4561-c545-4656-99fb-f9b0f359c6f4", + "metadata": {}, + "source": [ + "# Blog Post: Getting Started with rispy - A Python Library for RIS Files\n", + "\n", + "## Introduction\n", + "\n", + "In the world of academic research and literature management, dealing with bibliographic data is a common task. One of the formats used for exporting and exchanging such data is RIS (named after Research Information Systems, the company that created it), a standardized tag format supported by many digital libraries and reference managers. Managing RIS files can be a daunting task, but Python libraries like rispy make the process much more manageable. In this blog post, we'll explore how to use rispy to handle RIS files effectively.\n", + "\n", + "Note: The pronunciation is rispee, like \"crispy\", but without the c.\n", + "\n", + "## What is rispy?\n", + "\n", + "*rispy* is a Python library designed to parse and handle bibliographic data in RIS format. Developed by MrTango, it simplifies reading and writing RIS files, making it a handy tool for researchers and developers working in academic environments. It is open-source and available on GitHub: [rispy on GitHub](https://github.com/MrTango/rispy).\n", + "\n", + "## Installation\n", + "\n", + "Before we dive into using rispy, the first step is to install the library. This can be done with pip, Python's package installer. 
In a Jupyter notebook, run the cell below; in a terminal or command prompt, run the same command without the leading `!`:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "598b0d8d-03ba-4eee-ba97-a781955d8d02", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q rispy" + ] + }, + { + "cell_type": "markdown", + "id": "5eceabc0-cbd9-4725-a8cd-4887f5b2c60b", + "metadata": {}, + "source": [ + "## Downloading samples\n", + "\n", + "To run the commands in this blog post, let's download sample RIS files from three different sources: EMBASE, Rayyan, and Web of Science." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "5905af2f-e40d-4e32-950a-71100cb13ed4", + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2024-01-31 23:57:04-- https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_embase.ris\n", + "Resolving gist.github.com (gist.github.com)... 140.82.113.3\n", + "Connecting to gist.github.com (gist.github.com)|140.82.113.3|:443... connected.\n", + "HTTP request sent, awaiting response... 301 Moved Permanently\n", + "Location: https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_embase.ris [following]\n", + "--2024-01-31 23:57:04-- https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_embase.ris\n", + "Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...\n", + "Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.110.133|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 30718 (30K) [text/plain]\n", + "Saving to: ‘/tmp/embase.ris’\n", + "\n", + "/tmp/embase.ris 100%[===================>] 30.00K --.-KB/s in 0.003s \n", + "\n", + "2024-01-31 23:57:05 (11.3 MB/s) - ‘/tmp/embase.ris’ saved [30718/30718]\n", + "\n" + ] + } + ], + "source": [ + "!wget https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_embase.ris -O /tmp/embase.ris" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "1740c641-d94b-4b41-87ac-b804a0a2b811", + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2024-01-31 23:57:07-- https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_rayyan.ris\n", + "Resolving gist.github.com (gist.github.com)... 140.82.113.3\n", + "Connecting to gist.github.com (gist.github.com)|140.82.113.3|:443... connected.\n", + "HTTP request sent, awaiting response... 301 Moved Permanently\n", + "Location: https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_rayyan.ris [following]\n", + "--2024-01-31 23:57:08-- https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_rayyan.ris\n", + "Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...\n", + "Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.111.133|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 10817 (11K) [text/plain]\n", + "Saving to: ‘/tmp/rayyan.ris’\n", + "\n", + "/tmp/rayyan.ris 100%[===================>] 10.56K --.-KB/s in 0.001s \n", + "\n", + "2024-01-31 23:57:08 (8.81 MB/s) - ‘/tmp/rayyan.ris’ saved [10817/10817]\n", + "\n" + ] + } + ], + "source": [ + "!wget https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_rayyan.ris -O /tmp/rayyan.ris" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "d2849f85-d9f0-4b0c-8e72-c6a66d3a3570", + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2024-01-31 23:57:10-- https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_wos.ris\n", + "Resolving gist.github.com (gist.github.com)... 140.82.113.3\n", + "Connecting to gist.github.com (gist.github.com)|140.82.113.3|:443... connected.\n", + "HTTP request sent, awaiting response... 301 Moved Permanently\n", + "Location: https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_wos.ris [following]\n", + "--2024-01-31 23:57:11-- https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_wos.ris\n", + "Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...\n", + "Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.108.133|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 9526 (9.3K) [text/plain]\n", + "Saving to: ‘/tmp/wos.ris’\n", + "\n", + "/tmp/wos.ris 100%[===================>] 9.30K --.-KB/s in 0.001s \n", + "\n", + "2024-01-31 23:57:11 (15.5 MB/s) - ‘/tmp/wos.ris’ saved [9526/9526]\n", + "\n" + ] + } + ], + "source": [ + "!wget https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_wos.ris -O /tmp/wos.ris" + ] + }, + { + "cell_type": "markdown", + "id": "c7d78793-ed63-4eba-98dc-f7052e7924ca", + "metadata": {}, + "source": [ + "## Using rispy to Read RIS Files\n", + "\n", + "Once you have rispy installed and the sample RIS files downloaded, it's time to start coding.\n", + "Here are some examples of how to read a RIS file." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "ac70f2d6-7ae3-446f-a154-cc74adb36280", + "metadata": {}, + "outputs": [], + "source": [ + "import rispy" + ] + }, + { + "cell_type": "markdown", + "id": "cd022745-618a-4485-8b91-0e9e710201a1", + "metadata": {}, + "source": [ + "### Read from EMBASE RIS" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "50e5d182-c9cf-4149-96a3-0b6657315e76", + "metadata": {}, + "outputs": [], + "source": [ + "# Path to your downloaded RIS file\n", + "embase_path = '/tmp/embase.ris'\n", + "\n", + "# Read the RIS file (explicit UTF-8 so accented names decode the same on every OS)\n", + "with open(embase_path, 'r', encoding='utf-8') as bibliography_file:\n", + "    embase_data = rispy.load(bibliography_file)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "0d7ea423-0a0b-4b5d-a052-528da99bd475", + "metadata": {}, + "outputs": [], + "source": [ + "# Import pprint explicitly: in IPython a bare `pprint` runs the %pprint magic, which only toggles pretty printing\n", + "from pprint import pprint\n", + "\n", + "pprint(list(embase_data[0].keys()))" + ] + }, + { + "cell_type": "markdown", + "id": "e008a3da-193d-4b3d-9648-ad651ba95b64", + "metadata": {}, + "source": [ + "Let's check that the parsed records match the sample data:" + ] + }, + { + "cell_type": "code", + 
"execution_count": 8, + "id": "e5a3e284-d0fb-4c7f-b46a-568bd6657734", + "metadata": {}, + "outputs": [], + "source": [ + "assert len(embase_data) == 2\n", + "assert embase_data[0][\"primary_title\"] == \"Author Correction: Federated learning enables big data for rare cancer boundary detection (Nature Communications, (2022), 13, 1, (7346), 10.1038/s41467-022-33407-5)\"\n", + "assert embase_data[0][\"doi\"] == \"10.1038/s41467-023-36188-7\"\n", + "assert embase_data[0][\"notes_abstract\"] == \"In this article the author name Carmen Balaña was incorrectly written as Carmen Balaña Quintero. The original article has been corrected.\"" + ] + }, + { + "cell_type": "markdown", + "id": "8d641618-c55b-4a16-a4e4-6e84853cec40", + "metadata": {}, + "source": [ + "One attribute that stands out is `unknown_tag`, where rispy collects the tags that are not in its default tag mapping; here they are probably EMBASE-specific fields. Let's check that:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c49c7ee0-9ed9-4f9b-aefd-263dfc89fab9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['U2', 'U4', 'LK'])" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "embase_data[0][\"unknown_tag\"].keys()" + ] + }, + { + "cell_type": "markdown", + "id": "eac00932-8a33-4990-a8f1-5646db4df7e6", + "metadata": {}, + "source": [ + "### Read from Rayyan RIS" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "018b64aa-0666-4651-9bff-cde62a4ef7f3", + "metadata": {}, + "outputs": [], + "source": [ + "# Path to your downloaded RIS file\n", + "rayyan_path = '/tmp/rayyan.ris'\n", + "\n", + "# Read the RIS file\n", + "with open(rayyan_path, 'r', encoding='utf-8') as bibliography_file:\n", + "    rayyan_data = rispy.load(bibliography_file)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "9b28e54b-4c6a-4964-b58c-12e60f7d8d5b", + "metadata": {}, + "outputs": [], + "source": [ + "from pprint import pprint\n", + "\n", + "pprint(list(rayyan_data[0].keys()))" + ] + },
+ { + "cell_type": "code", + "execution_count": 12, + "id": "545686c7-f697-4eba-b45f-4b5c82622294", + "metadata": {}, + "outputs": [], + "source": [ + "assert len(rayyan_data) == 3\n", + "assert rayyan_data[0][\"title\"] == \"Organic fertilization and alternative products in the control of powdery mildew\"\n", + "assert rayyan_data[0][\"doi\"] == \"10.1590/2447-536x.v26i1.2109\"\n", + "assert rayyan_data[0][\"abstract\"] == \"Abstract Rose is a plant of high nutritional requirement, susceptible to powdery mildew disease caused by fungus Oidium leucoconium, which causes leaf fall and losses in flower production. The objective of this study was to evaluate powdery mildew severity in rose cultivar ‘Grand Gala’ in response to organic fertilization and the application of alternative products to disease control. The first experiment was set in a factorial arrangement, with 5 alternative products: spraying with water as a control (PA), lime sulfur (CS), neem oil (ON), mixture of sodium bicarbonate and canola oil (BC) and coffee pyroligneous acid (APC) and 2 organic fertilizers: chicken manure (EA) and biofertilizer based on banana stalk (B). Disease severity was assessed at 0, 15, 30 and 45 days after the treatments. In the second experiment, asymptomatic leaves or with different powdery mildew severity levels were sprayed only once with the same alternative products mentioned above. Severity was assessed at 0, 7 and 14 days. The organic fertilizations did not influence the reduction in powdery mildew severity in rose. At 45 days, APC yielded a greater reduction in disease severity (81.6%), followed by treatments based on BC, ON and CS. Greater reduction in disease severity in experiment 2 occurred in the treatments of BC and CS, followed by APC. 
Therefore, it is possible to conclude that APC and the BC have the potential to control rose powdery mildew in an organic cultivation system.\"" + ] + }, + { + "cell_type": "markdown", + "id": "bb08bc8c-b5a5-4f06-b580-94ca999d8ef2", + "metadata": {}, + "source": [ + "### Read from Web of Science RIS" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "98484fe4-f250-4999-9f06-4f8c01a3cc42", + "metadata": {}, + "outputs": [], + "source": [ + "# Path to your downloaded RIS file\n", + "wos_path = '/tmp/wos.ris'\n", + "\n", + "# Read the RIS file\n", + "with open(wos_path, 'r', encoding='utf-8') as bibliography_file:\n", + "    wos_data = rispy.load(bibliography_file)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "0b68d0c4-74b3-4088-937d-bab8d975aa10", + "metadata": {}, + "outputs": [], + "source": [ + "from pprint import pprint\n", + "\n", + "pprint(list(wos_data[0].keys()))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "909aa7f3-08d9-4345-92bf-031e9da940ea", + "metadata": {}, + "outputs": [], + "source": [ + "assert len(wos_data) == 3\n", + "assert wos_data[0][\"title\"] == \"A Survey on Computer Vision for Assistive Medical Diagnosis From Faces\"\n", + "assert wos_data[0][\"doi\"] == \"10.1109/JBHI.2017.2754861\"\n", + "assert wos_data[0][\"abstract\"] == \"Automatic medical diagnosis is an emerging center of interest in computer vision as it provides unobtrusive objective information on a patient's condition. The face, as a mirror of health status, can reveal symptomatic indications of specific diseases. Thus, the detection of facial abnormalities or atypical features is at up most importance when it comes to medical diagnostics. This survey aims to give an overview of the recent developments in medical diagnostics from facial images based on computer vision methods. 
Various approaches have been considered to assess facial symptoms and to eventually provide further help to the practitioners. However, the developed tools are still seldom used in clinical practice, since their reliability is still a concern due to the lack of clinical validation of the methodologies and their inadequate applicability. Nonetheless, efforts are being made to provide robust solutions suitable for healthcare environments, by dealing with practical issues such as real-time assessment or patients positioning. This survey provides an updated collection of the most relevant and innovative solutions in facial images analysis. The findings show that with the help of computer vision methods, over 30 medical conditions can be preliminarily diagnosed from the automatic detection of some of their symptoms. Furthermore, future perspectives, such as the need for interdisciplinary collaboration and collecting publicly available databases, are highlighted.\"" + ] + }, + { + "cell_type": "markdown", + "id": "796c9808-1e8b-4d40-a680-0b1ec1eb4ada", + "metadata": {}, + "source": [ + "Again, the `unknown_tag` attribute appears; this time it probably holds WoS-specific fields. Let's check that:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "a663bbc3-13e8-4a01-b86c-a1c2d3e55fed", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['PU', 'PI', 'PA', 'J9', 'JI', 'WE'])" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wos_data[0][\"unknown_tag\"].keys()" + ] + }, + { + "cell_type": "markdown", + "id": "0923c38b-231c-462d-bd37-8a7fb6840599", + "metadata": {}, + "source": [ + "### Consistency of fields across the sources\n", + "\n", + "Different sources store the same information under different fields, so pay attention to fields such as title and abstract.\n", + "\n", + "Let's first collect all the fields used across the sources." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "31bbe38b-c1ed-4640-be9b-3db17dcaa51f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'urls', 'first_authors', 'notes', 'custom3', 'date', 'alternate_title1', 'accession_number', 'type_of_work', 'publication_year', 'journal_name', 'custom7', 'custom5', 'language', 'number', 'primary_title', 'type_of_reference', 'issn', 'name_of_database', 'start_page', 'alternate_title3', 'note', 'year', 'doi', 'title', 'end_page', 'file_attachments2', 'abstract', 'access_date', 'keywords', 'volume', 'secondary_title', 'unknown_tag', 'author_address', 'custom6', 'notes_abstract', 'authors'}" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fields = set()\n", + "\n", + "for dataset in [embase_data, rayyan_data, wos_data]:\n", + " for row in dataset:\n", + " fields |= set(row.keys())\n", + "\n", + "fields" + ] + }, + { + "cell_type": "markdown", + "id": "114907cd-bf91-4033-ae50-16c63040b756", + "metadata": {}, + "source": [ + "Now, let's list all the title-related fields:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "66c5f0cc-05cf-45eb-ac96-03f4f2b24c41", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['alternate_title1', 'primary_title', 'alternate_title3', 'title', 'secondary_title']" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[field for field in fields if \"title\" in field]" + ] + }, + { + "cell_type": "markdown", + "id": "4667ba2c-70bb-4d76-9e73-bfccbd731b41", + "metadata": {}, + "source": [ + "Let's also list the abstract-related fields:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "e05217a2-1a28-49c7-9a79-c52d4c527f35", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['abstract', 'notes_abstract']" + ] + }, + "execution_count": 19, + "metadata": {}, + 
"output_type": "execute_result" + } + ], + "source": [ + "[field for field in fields if \"abstract\" in field]" + ] + }, + { + "cell_type": "markdown", + "id": "3a18f846-15c8-4efe-9d1a-7be5687490a1", + "metadata": {}, + "source": [ + "So, if you plan to use rispy inside a pipeline, be careful to read each value from every field that may hold it (for example, `title` or `primary_title`, and `abstract` or `notes_abstract`)." + ] + }, + { + "cell_type": "markdown", + "id": "c14b3571-42ad-4092-a8a1-d85c29f5f98e", + "metadata": {}, + "source": [ + "## Writing RIS Files with rispy\n", + "\n", + "In addition to reading RIS files, rispy also provides functionality for writing them. This is particularly useful when you create bibliographic data programmatically and want to export it in the RIS format, which is widely accepted in academic and research circles. Let's look at how to use rispy for writing RIS files.\n", + "\n", + "For this example, we will reuse the Rayyan data loaded above." + ] + }, + { + "cell_type": "markdown", + "id": "e37b6952-6d06-4249-b378-04af6a2a2bb8", + "metadata": {}, + "source": [ + "### Writing to a RIS File\n", + "\n", + "Once your data has the same structure as the output of `rispy.load` (a list of dictionaries, one per reference), you can write it to a RIS file with `rispy.dump`. Here’s how you can do it:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "93c579d8-c84c-4a75-a042-c1c3f2f4b391", + "metadata": {}, + "outputs": [], + "source": [ + "# Path for the new RIS file\n", + "output_file_path = '/tmp/output.ris'\n", + "\n", + "# Write the data to a RIS file\n", + "with open(output_file_path, 'w', encoding='utf-8') as output_file:\n", + "    rispy.dump(rayyan_data, output_file)" + ] + }, + { + "cell_type": "markdown", + "id": "aeaa2e4f-b11d-4d22-a0c4-cace243d8b22", + "metadata": {}, + "source": [ + "Now, let's check the first 10 lines of the newly created file." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "63c8904e-a32c-439d-bdb2-94f81c48ecfb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.\n", + "TY - JOUR\n", + "AN - rayyan-536206336\n", + "TI - Organic fertilization and alternative products in the control of powdery mildew\n", + "Y1 - 2020\n", + "Y2 - 3\n", + "T2 - Ornamental Horticulture\n", + "SN - 2447-536X\n", + "J2 - Ornamental Horticulture\n", + "VL - 26\n" + ] + } + ], + "source": [ + "!head -n 10 /tmp/output.ris" + ] + }, + { + "cell_type": "markdown", + "id": "0cb289df-5bf7-46a6-821c-f207fe5aeacc", + "metadata": {}, + "source": [ + "Writing RIS files with rispy is straightforward and efficient. Whether you are generating bibliographic data from another source or manually creating your entries for research purposes, rispy's writing functionality makes the process seamless." + ] + }, + { + "cell_type": "markdown", + "id": "5c10a314-a323-4540-9fe6-a3b0ff1fef3a", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "By mastering both reading and writing RIS files with rispy, you can automate and streamline your bibliographic management tasks, saving time and reducing the potential for manual errors. This makes rispy a valuable tool in any researcher's toolkit.\n", + "\n", + "**rispy** is a powerful and easy-to-use library for anyone dealing with RIS files in Python. Its simplicity in installation and usage makes it an excellent choice for managing bibliographic data. Whether you're a researcher, academic, or developer working with bibliographic information, rispy is definitely worth exploring."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/pages/blog/getting-started-with-rispy-a-python-library-for-ris-files/index.md b/pages/blog/getting-started-with-rispy-a-python-library-for-ris-files/index.md new file mode 100644 index 00000000..7e466c20 --- /dev/null +++ b/pages/blog/getting-started-with-rispy-a-python-library-for-ris-files/index.md @@ -0,0 +1,438 @@ +# Blog Post: Getting Started with rispy - A Python Library for RIS Files + +## Introduction + +In the world of academic research and literature management, dealing with bibliographic data is a common task. One of the formats used for exporting and exchanging such data is RIS (named after Research Information Systems, the company that created it), a standardized tag format supported by many digital libraries and reference managers. Managing RIS files can be a daunting task, but Python libraries like rispy make the process much more manageable. In this blog post, we'll explore how to use rispy to handle RIS files effectively. + +Note: The pronunciation is rispee, like "crispy", but without the c. + +## What is rispy? + +*rispy* is a Python library designed to parse and handle bibliographic data in RIS format. Developed by MrTango, it simplifies reading and writing RIS files, making it a handy tool for researchers and developers working in academic environments. It is open-source and available on GitHub: [rispy on GitHub](https://github.com/MrTango/rispy). + +## Installation + +Before we dive into using rispy, the first step is to install the library. 
This can be done with pip, Python's package installer. In a Jupyter notebook, run the cell below; in a terminal or command prompt, run the same command without the leading `!`: + + +```python +!pip install -q rispy +``` + +## Downloading samples + +To run the commands in this blog post, let's download sample RIS files from three different sources: EMBASE, Rayyan, and Web of Science. + + +```python +!wget https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_embase.ris -O /tmp/embase.ris +``` + +
+

+ OUTPUT + +

+
+  
+--2024-01-31 23:57:04--  https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_embase.ris
+Resolving gist.github.com (gist.github.com)... 140.82.113.3
+Connecting to gist.github.com (gist.github.com)|140.82.113.3|:443... connected.
+HTTP request sent, awaiting response... 301 Moved Permanently
+Location: https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_embase.ris [following]
+--2024-01-31 23:57:04--  https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_embase.ris
+Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...
+Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.110.133|:443... connected.
+HTTP request sent, awaiting response... 200 OK
+Length: 30718 (30K) [text/plain]
+Saving to: ‘/tmp/embase.ris’
+
+/tmp/embase.ris     100%[===================>]  30.00K  --.-KB/s    in 0.003s
+
+2024-01-31 23:57:05 (11.3 MB/s) - ‘/tmp/embase.ris’ saved [30718/30718]
+
+
+
+
+
+ + +```python +!wget https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_rayyan.ris -O /tmp/rayyan.ris +``` + +
+

+ OUTPUT + +

+
+  
+--2024-01-31 23:57:07--  https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_rayyan.ris
+Resolving gist.github.com (gist.github.com)... 140.82.113.3
+Connecting to gist.github.com (gist.github.com)|140.82.113.3|:443... connected.
+HTTP request sent, awaiting response... 301 Moved Permanently
+Location: https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_rayyan.ris [following]
+--2024-01-31 23:57:08--  https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_rayyan.ris
+Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
+Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.111.133|:443... connected.
+HTTP request sent, awaiting response... 200 OK
+Length: 10817 (11K) [text/plain]
+Saving to: ‘/tmp/rayyan.ris’
+
+/tmp/rayyan.ris     100%[===================>]  10.56K  --.-KB/s    in 0.001s
+
+2024-01-31 23:57:08 (8.81 MB/s) - ‘/tmp/rayyan.ris’ saved [10817/10817]
+
+
+
+
+
+ + +```python +!wget https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_wos.ris -O /tmp/wos.ris +``` + +
+

+ OUTPUT + +

+
+  
+--2024-01-31 23:57:10--  https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_wos.ris
+Resolving gist.github.com (gist.github.com)... 140.82.113.3
+Connecting to gist.github.com (gist.github.com)|140.82.113.3|:443... connected.
+HTTP request sent, awaiting response... 301 Moved Permanently
+Location: https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_wos.ris [following]
+--2024-01-31 23:57:11--  https://gist.githubusercontent.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/6f8bd30362c737565c6530732f1edeb882c8e879/sample_wos.ris
+Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
+Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.108.133|:443... connected.
+HTTP request sent, awaiting response... 200 OK
+Length: 9526 (9.3K) [text/plain]
+Saving to: ‘/tmp/wos.ris’
+
+/tmp/wos.ris        100%[===================>]   9.30K  --.-KB/s    in 0.001s
+
+2024-01-31 23:57:11 (15.5 MB/s) - ‘/tmp/wos.ris’ saved [9526/9526]
+
+
+
+
+
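If you are running this code outside Jupyter, or `wget` is not available on your system, the same three files can also be fetched with Python's standard library. This is a minimal sketch rather than part of the original notebook: `download_samples` and `SAMPLES` are hypothetical names, and the base URL is copied from the `wget` commands above.

```python
import urllib.request
from pathlib import Path

# Base URL taken from the wget commands above (the gist revision is pinned by its hash).
GIST_RAW_URL = (
    "https://gist.github.com/xmnlab/edbca465fc6151e87ac5aaa6b0b8837d/raw/"
    "6f8bd30362c737565c6530732f1edeb882c8e879"
)

# Hypothetical mapping: local file name -> file name inside the gist.
SAMPLES = {
    "embase.ris": "sample_embase.ris",
    "rayyan.ris": "sample_rayyan.ris",
    "wos.ris": "sample_wos.ris",
}


def download_samples(base_url, samples, dest_dir):
    """Fetch every sample into dest_dir and return the local paths."""
    target = Path(dest_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for local_name, remote_name in samples.items():
        local_path = target / local_name
        # urlretrieve follows HTTP redirects (gist.github.com -> gist.githubusercontent.com).
        urllib.request.urlretrieve(f"{base_url}/{remote_name}", local_path)
        saved.append(local_path)
    return saved
```

Calling `download_samples(GIST_RAW_URL, SAMPLES, "/tmp")` should leave `/tmp/embase.ris`, `/tmp/rayyan.ris` and `/tmp/wos.ris` where the rest of this post expects them.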
+ ## Using rispy to Read RIS Files + +Once you have rispy installed and the sample RIS files downloaded, it's time to start coding. +Here are some examples of how to read a RIS file. + + +```python +import rispy +``` + +### Read from EMBASE RIS + + +```python +# Path to your downloaded RIS file +embase_path = '/tmp/embase.ris' + +# Read the RIS file (explicit UTF-8 so accented names decode the same on every OS) +with open(embase_path, 'r', encoding='utf-8') as bibliography_file: +    embase_data = rispy.load(bibliography_file) +``` + + +```python +# Import pprint explicitly: in IPython a bare `pprint` runs the %pprint magic, which only toggles pretty printing +from pprint import pprint + +pprint(list(embase_data[0].keys())) +``` + +
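The keys printed by rispy are descriptive names for the two-letter RIS tags. To make that translation concrete, here is a deliberately tiny parser that applies the same idea to a hand-written record. It is only a sketch, not rispy's implementation: `TAG_TO_KEY` is an assumed subset of rispy's default mapping (the full mapping is documented in rispy's README), and the sketch ignores multi-line values, reference-type handling, and encoding quirks that a real parser has to deal with.

```python
import re

# Assumed subset of rispy's default tag -> key mapping (not the full table).
TAG_TO_KEY = {
    "TY": "type_of_reference",
    "TI": "title",
    "T1": "primary_title",
    "AU": "authors",
    "AB": "abstract",
    "N2": "notes_abstract",
    "DO": "doi",
    "PY": "year",
}
# Tags that may repeat within one record, so their values are kept in a list.
LIST_TAGS = {"AU"}

# RIS convention: two-character tag, two spaces, a hyphen, a space, then the value.
LINE_RE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Turn RIS text into a list of dicts, one per record (toy version).

    A record without a closing ER line is dropped in this sketch.
    """
    records, current = [], {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            # Blank lines, numbering such as "1." and continuation lines are
            # skipped, so multi-line values get truncated here.
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of the current record
            records.append(current)
            current = {}
        elif tag in TAG_TO_KEY:
            key = TAG_TO_KEY[tag]
            if tag in LIST_TAGS:
                current.setdefault(key, []).append(value)
            else:
                current[key] = value
        else:
            # Unmapped tags are grouped by tag, similar to the `unknown_tag` dict shown below.
            current.setdefault("unknown_tag", {}).setdefault(tag, []).append(value)
    return records


sample = """1.
TY  - JOUR
T1  - A hypothetical article
AU  - Doe, Jane
AU  - Roe, Richard
DO  - 10.1000/example
U2  - L123456
ER  -
"""
print(parse_ris(sample))
```

The output is one dict per record, keyed by descriptive names, with the unmapped `U2` tag set aside. This mirrors the shape of what `rispy.load` returned for the EMBASE file.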
+ Let's check that the parsed records match the sample data: + + +```python +assert len(embase_data) == 2 +assert embase_data[0]["primary_title"] == "Author Correction: Federated learning enables big data for rare cancer boundary detection (Nature Communications, (2022), 13, 1, (7346), 10.1038/s41467-022-33407-5)" +assert embase_data[0]["doi"] == "10.1038/s41467-023-36188-7" +assert embase_data[0]["notes_abstract"] == "In this article the author name Carmen Balaña was incorrectly written as Carmen Balaña Quintero. The original article has been corrected." +``` + +One attribute that stands out is `unknown_tag`, where rispy collects the tags that are not in its default tag mapping; here they are probably EMBASE-specific fields. Let's check that: + + +```python +embase_data[0]["unknown_tag"].keys() +``` + + + + +
+

+ OUTPUT + +

+
+  
+dict_keys(['U2', 'U4', 'LK'])
+
+
+
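The check above only looks at the first EMBASE record. To see whether the same unmapped tags appear in every record of a file, a small helper can count them. `count_unknown_tags` is a hypothetical name, and the only assumption about rispy's output is what is visible above: each record is a dict, and unmapped tags live in an optional `unknown_tag` dict keyed by tag.

```python
from collections import Counter


def count_unknown_tags(records):
    """Count in how many records each unmapped tag appears."""
    counts = Counter()
    for record in records:
        # Records without unmapped tags may have no "unknown_tag" entry at all.
        counts.update(record.get("unknown_tag", {}).keys())
    return counts


# Hand-made records shaped like rispy's output, for illustration only.
example = [
    {"primary_title": "A", "unknown_tag": {"U2": ["x"], "LK": ["y"]}},
    {"primary_title": "B", "unknown_tag": {"LK": ["z"]}},
    {"primary_title": "C"},
]
print(count_unknown_tags(example).most_common())  # [('LK', 2), ('U2', 1)]
```

On the real data, `count_unknown_tags(embase_data)` would show whether `U2`, `U4` and `LK` appear in both EMBASE records or only in the first one.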
+ + + ### Read from Rayyan RIS + + +```python +# Path to your downloaded RIS file +rayyan_path = '/tmp/rayyan.ris' + +# Read the RIS file +with open(rayyan_path, 'r', encoding='utf-8') as bibliography_file: +    rayyan_data = rispy.load(bibliography_file) +``` + + +```python +from pprint import pprint + +pprint(list(rayyan_data[0].keys())) +``` + +
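With two sources loaded, their keys already differ. The EMBASE assertions above read `primary_title` and `notes_abstract`, while the Rayyan assertions below read `title` and `abstract`. Set operations make the difference explicit, so you do not have to compare two printed lists by eye. `compare_keys` is a hypothetical helper; it assumes only that each dataset is a list of dicts, as returned by `rispy.load`.

```python
def compare_keys(first, second):
    """Split the keys used by two lists of records into shared and exclusive sets."""
    keys_first = set().union(*(record.keys() for record in first))
    keys_second = set().union(*(record.keys() for record in second))
    return {
        "shared": keys_first & keys_second,
        "only_first": keys_first - keys_second,
        "only_second": keys_second - keys_first,
    }


# Toy records mimicking the EMBASE/Rayyan difference described above.
embase_like = [{"primary_title": "A", "notes_abstract": "text", "doi": "10.1/a"}]
rayyan_like = [{"title": "B", "abstract": "text", "doi": "10.1/b"}]
result = compare_keys(embase_like, rayyan_like)
print(sorted(result["only_first"]))  # ['notes_abstract', 'primary_title']
```

With the real data, `compare_keys(embase_data, rayyan_data)` lists every key that only one of the two exports uses.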
```python
assert len(rayyan_data) == 3
assert rayyan_data[0]["title"] == "Organic fertilization and alternative products in the control of powdery mildew"
assert rayyan_data[0]["doi"] == "10.1590/2447-536x.v26i1.2109"
assert rayyan_data[0]["abstract"] == "Abstract Rose is a plant of high nutritional requirement, susceptible to powdery mildew disease caused by fungus Oidium leucoconium, which causes leaf fall and losses in flower production. The objective of this study was to evaluate powdery mildew severity in rose cultivar ‘Grand Gala’ in response to organic fertilization and the application of alternative products to disease control. The first experiment was set in a factorial arrangement, with 5 alternative products: spraying with water as a control (PA), lime sulfur (CS), neem oil (ON), mixture of sodium bicarbonate and canola oil (BC) and coffee pyroligneous acid (APC) and 2 organic fertilizers: chicken manure (EA) and biofertilizer based on banana stalk (B). Disease severity was assessed at 0, 15, 30 and 45 days after the treatments. In the second experiment, asymptomatic leaves or with different powdery mildew severity levels were sprayed only once with the same alternative products mentioned above. Severity was assessed at 0, 7 and 14 days. The organic fertilizations did not influence the reduction in powdery mildew severity in rose. At 45 days, APC yielded a greater reduction in disease severity (81.6%), followed by treatments based on BC, ON and CS. Greater reduction in disease severity in experiment 2 occurred in the treatments of BC and CS, followed by APC. Therefore, it is possible to conclude that APC and the BC have the potential to control rose powdery mildew in an organic cultivation system."
```

### Read from Web of Science RIS

```python
# Path to your downloaded RIS file
wos_path = '/tmp/wos.ris'

# Read the RIS file (explicit UTF-8 so decoding does not depend on the OS default)
with open(wos_path, 'r', encoding='utf-8') as bibliography_file:
    wos_data = rispy.load(bibliography_file)
```

```python
from pprint import pprint

pprint(list(wos_data[0].keys()))
```
```python
assert len(wos_data) == 3
assert wos_data[0]["title"] == "A Survey on Computer Vision for Assistive Medical Diagnosis From Faces"
assert wos_data[0]["doi"] == "10.1109/JBHI.2017.2754861"
assert wos_data[0]["abstract"] == "Automatic medical diagnosis is an emerging center of interest in computer vision as it provides unobtrusive objective information on a patient's condition. The face, as a mirror of health status, can reveal symptomatic indications of specific diseases. Thus, the detection of facial abnormalities or atypical features is at up most importance when it comes to medical diagnostics. This survey aims to give an overview of the recent developments in medical diagnostics from facial images based on computer vision methods. Various approaches have been considered to assess facial symptoms and to eventually provide further help to the practitioners. However, the developed tools are still seldom used in clinical practice, since their reliability is still a concern due to the lack of clinical validation of the methodologies and their inadequate applicability. Nonetheless, efforts are being made to provide robust solutions suitable for healthcare environments, by dealing with practical issues such as real-time assessment or patients positioning. This survey provides an updated collection of the most relevant and innovative solutions in facial images analysis. The findings show that with the help of computer vision methods, over 30 medical conditions can be preliminarily diagnosed from the automatic detection of some of their symptoms. Furthermore, future perspectives, such as the need for interdisciplinary collaboration and collecting publicly available databases, are highlighted."
```

Again, the `unknown_tag` key appears. rispy collects every tag it cannot map to a named field under this key, so these are probably WoS-specific fields. Let's check which tags it holds:

```python
wos_data[0]["unknown_tag"].keys()
```
```
dict_keys(['PU', 'PI', 'PA', 'J9', 'JI', 'WE'])
```
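Because these WoS-specific values sit in a nested dictionary instead of top-level keys, a small accessor keeps downstream code tidy. The sketch below is hypothetical: the helper `get_unknown_tag` and the sample record are made up for illustration, and it assumes that each tag under `unknown_tag` maps to a list of string values, which is how rispy appears to store unknown tags (check this against your rispy version).

```python
from typing import Any, Optional


def get_unknown_tag(record: dict[str, Any], tag: str) -> Optional[str]:
    """Return the first value stored for `tag` under `unknown_tag`, or None.

    Hypothetical helper. Assumption: `unknown_tag` maps each RIS tag
    to a list of strings.
    """
    values = record.get("unknown_tag", {}).get(tag, [])
    return values[0] if values else None


# Hypothetical record shaped like rispy output (values are placeholders)
sample_record = {
    "title": "Example title",
    "unknown_tag": {"PU": ["Example Publisher"], "J9": ["EXAMPLE J"]},
}

print(get_unknown_tag(sample_record, "PU"))  # Example Publisher
print(get_unknown_tag(sample_record, "WE"))  # None
```

Returning `None` for a missing tag keeps the helper safe for records from EMBASE or Rayyan, which may not carry these tags at all. With the real data, `get_unknown_tag(wos_data[0], "PU")` would return the first publisher entry, if that record has one.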
### Consistency of fields across the sources

Different sources export the same information under different RIS tags, and rispy maps each tag to its own key. That makes fields such as title and abstract worth a closer look.

First, let's collect every field that appears in any record across the three sources:

```python
# Union of the keys of every record from every source
fields = set()

for dataset in [embase_data, rayyan_data, wos_data]:
    for row in dataset:
        fields |= set(row.keys())

fields
```
```
{'urls', 'first_authors', 'notes', 'custom3', 'date', 'alternate_title1', 'accession_number', 'type_of_work', 'publication_year', 'journal_name', 'custom7', 'custom5', 'language', 'number', 'primary_title', 'type_of_reference', 'issn', 'name_of_database', 'start_page', 'alternate_title3', 'note', 'year', 'doi', 'title', 'end_page', 'file_attachments2', 'abstract', 'access_date', 'keywords', 'volume', 'secondary_title', 'unknown_tag', 'author_address', 'custom6', 'notes_abstract', 'authors'}
```
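The union above hides which source contributes which field. A minimal sketch that maps every field to the set of sources using it makes the gaps visible. The function name `field_sources` and the sample records are hypothetical; with the real data you could pass `{"embase": embase_data, "rayyan": rayyan_data, "wos": wos_data}`.

```python
from collections import defaultdict


def field_sources(datasets: dict[str, list[dict]]) -> dict[str, set[str]]:
    """Map each field name to the set of sources whose records contain it."""
    coverage: dict[str, set[str]] = defaultdict(set)
    for source, records in datasets.items():
        for record in records:
            for field in record:
                coverage[field].add(source)
    return dict(coverage)


# Hypothetical sample records, shaped like rispy output (values are placeholders)
datasets = {
    "source_a": [{"primary_title": "Title A", "notes_abstract": "...", "doi": "..."}],
    "source_b": [{"title": "Title B", "abstract": "...", "doi": "..."}],
}

coverage = field_sources(datasets)

# Fields that only some of the sources provide
partial = sorted(f for f, srcs in coverage.items() if len(srcs) < len(datasets))
print(partial)  # ['abstract', 'notes_abstract', 'primary_title', 'title']
```

Fields that show up in `partial` are the ones a pipeline cannot read from a single key for every source.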
Next, let's list every field whose name mentions the title:

```python
[field for field in fields if "title" in field]
```
```
['alternate_title1', 'primary_title', 'alternate_title3', 'title', 'secondary_title']
```
Let's also check the fields related to the abstract:

```python
[field for field in fields if "abstract" in field]
```
```
['abstract', 'notes_abstract']
```
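Given the title and abstract variants listed above, one way to cope is to define, for each concept, an ordered list of candidate keys and take the first non-empty value. The candidate lists come from the field names found above; their order, the helper names `first_present` and `normalize`, and the sample records are assumptions for illustration. Keys such as `secondary_title` are left out on purpose, since they usually hold the journal name rather than the article title.

```python
from typing import Any, Optional

# Candidate keys per concept, in order of preference (assumed ordering)
CANDIDATE_KEYS = {
    "title": ["title", "primary_title"],
    "abstract": ["abstract", "notes_abstract"],
}


def first_present(record: dict[str, Any], keys: list[str]) -> Optional[Any]:
    """Return the first non-empty value among `keys`, or None if none is set."""
    for key in keys:
        value = record.get(key)
        if value:
            return value
    return None


def normalize(record: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Build a record with exactly one canonical key per concept."""
    return {concept: first_present(record, keys) for concept, keys in CANDIDATE_KEYS.items()}


# Hypothetical records from two differently exported sources
records = [
    {"primary_title": "Title A", "notes_abstract": "Abstract A"},
    {"title": "Title B", "abstract": "Abstract B", "secondary_title": "Some Journal"},
]

for rec in records:
    print(normalize(rec))
# {'title': 'Title A', 'abstract': 'Abstract A'}
# {'title': 'Title B', 'abstract': 'Abstract B'}
```

With the real data, `[normalize(r) for r in embase_data + rayyan_data + wos_data]` would give every record the same two keys, whichever source it came from.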
So, when using rispy inside a pipeline, don't assume one key per concept: depending on the source, the article title can be stored under `title` or `primary_title`, and the abstract under `abstract` or `notes_abstract`. The pipeline should check every candidate key for the field it needs.

## Writing RIS Files with rispy

Besides reading RIS files, rispy can also write them. This is useful when you build or transform bibliographic data programmatically and need to export it in RIS, a format that reference managers widely support.

For this example, we'll write back the Rayyan data loaded earlier.

### Writing to a RIS File

`rispy.dump` takes a list of dictionaries (one per reference, using the same keys that `rispy.load` returns) and a writable file object:

```python
# Path for the new RIS file
output_file_path = '/tmp/output.ris'

# Write the data to a RIS file (explicit UTF-8 to preserve non-ASCII characters)
with open(output_file_path, 'w', encoding='utf-8') as output_file:
    rispy.dump(rayyan_data, output_file)
```

Now, let's inspect the first 10 lines of the new file:

```python
!head -n 10 /tmp/output.ris
```
```
1.
TY  - JOUR
AN  - rayyan-536206336
TI  - Organic fertilization and alternative products in the control of powdery mildew
Y1  - 2020
Y2  - 3
T2  - Ornamental Horticulture
SN  - 2447-536X
J2  - Ornamental Horticulture
VL  - 26
```
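Each tag line in this output follows the same layout: a two-character tag, two spaces, a hyphen, a space, and the value. The `1.` line is not a tag line; it appears to be a per-reference counter added by rispy's writer. To make the layout concrete, here is a minimal, hypothetical parser for single lines (`parse_ris_line` is not part of rispy and ignores details such as multi-line values); its examples are lines taken from the output above.

```python
import re
from typing import Optional

# Tag line layout: two-character tag, two spaces, hyphen, space, value
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - (.*)$")


def parse_ris_line(line: str) -> Optional[tuple[str, str]]:
    """Split a RIS tag line into (tag, value); return None for any other line."""
    match = RIS_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return match.group(1), match.group(2)


lines = [
    "1.",
    "TY  - JOUR",
    "AN  - rayyan-536206336",
    "SN  - 2447-536X",
]

for line in lines:
    print(repr(line), "->", parse_ris_line(line))
```

For real files, rely on `rispy.load`, which also handles repeated tags and list-type fields; the sketch only shows why the two spaces before the hyphen matter.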
Writing RIS files with rispy takes only a few lines of code, whether the references come from another source or are created by hand for your research.

## Conclusion

With both reading and writing covered, rispy lets you automate bibliographic management tasks in Python, saving time and reducing manual errors. As the examples above show, the main thing to watch is that different sources (EMBASE, Rayyan, Web of Science) can store the same information under different keys, so normalize fields such as title and abstract before relying on them.

**rispy** is a simple, easy-to-install library for anyone dealing with RIS files in Python. Whether you're a researcher, academic, or developer working with bibliographic information, rispy is worth exploring.