
miniproject: viral epidemics and country

Ambreen H edited this page Jul 8, 2020 · 38 revisions

What countries do viral epidemics occur in?

owner:

Ambreen H

collaborators:

Pooja Paareek

miniproject summary

proposed activities

  1. Use the communal corpus of 50 articles on viral epidemics. (FINISHED)
  2. Scrutinize the corpus to separate true positives from false positives, i.e. whether each article is really about viral epidemics. (STARTED)
  3. Refine and rerun the query to create a corpus of 950 articles. This shall be the dataset for further meta-analysis. (STARTED)
     • Create the country dictionary using amidict. (FINISHED)
  4. Use ami search to get information about the countries where such epidemics are most likely to occur. (STARTED)
  5. Test sectioning on epidemic50noCov/ to extract only those sections where information about countries is most likely to be present. Annotation with dictionaries to create ami dataTables shall also be done. (STARTED)
  6. For ML techniques, the dataset shall be split into training, validation and test sets. (NOT STARTED)
  7. Use relevant machine learning techniques to classify papers by whether they relate to viral epidemics and by the countries where the epidemics were reported. This shall primarily be done using Python. (NOT STARTED)
  8. Validate the model using the accuracy obtained on the test data. (NOT STARTED)
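
The split and validation steps planned above (activities 6 and 8) could look roughly like the following sketch. It is only an illustration using the Python standard library: the 70/15/15 proportions, the fixed seed, and the function names `split_dataset` and `accuracy` are assumptions, not decisions taken in this miniproject.

```python
import random


def split_dataset(items, train=0.7, validation=0.15, seed=42):
    """Shuffle items and split them into training, validation and test lists.

    The fractions and the seed are illustrative assumptions; whatever is
    left after the training and validation shares becomes the test set.
    """
    if train <= 0 or validation < 0 or train + validation >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(len(shuffled) * train)
    n_validation = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validation],
        shuffled[n_train + n_validation:],
    )


def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels) or len(true_labels) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Placeholder identifiers standing in for the 950 articles.
    articles = [f"article_{i}" for i in range(950)]
    train_set, validation_set, test_set = split_dataset(articles)
    print(len(train_set), len(validation_set), len(test_set))
```

Fixing the seed keeps the three sets identical between runs, which makes accuracy figures comparable while the model is being tuned.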

outcomes

  1. Spreadsheets and graphs showing the countries where viral epidemics were reported and their respective frequencies.
  2. An ML model for data classification with acceptable accuracy.
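
As a rough illustration of the first outcome, the sketch below counts in how many articles each country name appears and writes the result to a CSV spreadsheet. It is a plain-Python stand-in, not the ami search workflow itself; the function names, the column headers and the sample sentences are hypothetical.

```python
import csv
import re
from collections import Counter


def count_country_mentions(texts, countries):
    """Count the number of texts in which each country name occurs.

    Matching is case-insensitive and on whole words, so "Niger" does not
    match inside "Nigeria".
    """
    patterns = {
        country: re.compile(r"\b" + re.escape(country) + r"\b", re.IGNORECASE)
        for country in countries
    }
    counts = Counter()
    for text in texts:
        for country, pattern in patterns.items():
            if pattern.search(text):
                counts[country] += 1
    return counts


def write_frequency_csv(counts, path):
    """Write country frequencies to a CSV file, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["country", "articles"])  # assumed column headers
        for country, n_articles in counts.most_common():
            writer.writerow([country, n_articles])


if __name__ == "__main__":
    # Made-up sentences, used only to exercise the functions.
    sample_texts = [
        "An outbreak was reported in Guinea and Liberia.",
        "Cases were confirmed in Brazil.",
        "Surveillance data from brazil and India were compared.",
    ]
    frequencies = count_country_mentions(
        sample_texts, ["Guinea", "Liberia", "Brazil", "India"]
    )
    write_frequency_csv(frequencies, "country_frequencies.csv")
```

The resulting CSV can be opened directly as a spreadsheet or passed to a plotting library to produce the frequency graphs.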

corpora

  1. Initially the communal corpus of 50 articles on viral epidemics
  2. Later a new corpus consisting of 950 papers shall be created using the country dictionary.

dictionaries

  • country dictionary

software

  1. ami for the creation of corpus, use of dictionaries, sectioning
  2. ami/SPARQL for the creation of dictionaries
  3. Python and relevant libraries (Keras, TensorFlow, NLP, etc) for ML and data visualization (NumPy, Matplotlib, Seaborn, ggplot, etc)

constraints

Time would be a major constraint since this must be completed within a maximum period of 6 weeks.



Update 1:

06/07/2020

  1. Updated ami
  2. Created the country dictionary using the following command:

amidict -v --dictionary country --directory country --input country.txt create --informat list --outformats xml,html --wikilinks wikipedia, wikidata

  3. Further details on dictionary creation: https://github.com/petermr/openVirus/blob/master/dictionaries/country/country_dict.md

  4. Link to the created dictionary: https://github.com/petermr/openVirus/blob/master/dictionaries/test/country_new.xml

  5. Started the creation of a spreadsheet of true and false positives for classification, using both the communal corpus and a Europe PMC search.

  6. Tested ami section on a corpus of 50 articles. Details: https://github.com/petermr/openVirus/wiki/ami:section

  7. Tested ami search on a corpus of 50 articles. Details: https://github.com/petermr/openVirus/wiki/ami-search
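
Once the spreadsheet of true and false positives exists, the precision of the query (the share of retrieved articles that really are about viral epidemics) can be computed from it. The sketch below assumes the spreadsheet is exported as CSV with a `label` column holding `TP` or `FP`; that layout, the function name and the sample rows are hypothetical.

```python
import csv
import io


def query_precision(rows, label_column="label"):
    """Return TP / (TP + FP) for rows labelled 'TP' or 'FP'.

    Rows with any other label (for example, not yet annotated) are ignored.
    """
    true_positives = 0
    false_positives = 0
    for row in rows:
        label = row[label_column].strip().upper()
        if label == "TP":
            true_positives += 1
        elif label == "FP":
            false_positives += 1
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no rows labelled TP or FP")
    return true_positives / total


if __name__ == "__main__":
    # Placeholder rows; a real export would be opened with open(path, newline="").
    sample = io.StringIO(
        "article,label\n"
        "article_1,TP\n"
        "article_2,FP\n"
        "article_3,TP\n"
        "article_4,TP\n"
    )
    print(query_precision(csv.DictReader(sample)))
```

A low precision would suggest the query needs refining before it is rerun to build the larger corpus.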



Update 2:

08/07/2020

CORPUS_950

Created a communal corpus and pushed it to GitHub. This was accomplished by downloading and installing Visual Studio Code: https://code.visualstudio.com/download

Next steps (for reference):

  1. Install git on your system.

  2. Clone the repository using git clone https://github.com/petermr/openVirus.git

  3. Note the location of the cloned repository and add your folders there.

  4. Open VS Code and open the folder where you cloned the repository. (Check in the settings that git is enabled.)

  5. Go to the Source Control section and click the git icon.

  6. Enter a commit message and commit the changes.

  7. Add the remote repository (the GitHub repo).

  8. Push the committed changes to the GitHub repo.

  9. Check the changes on the GitHub repo.

For troubleshooting, check the FAQ.

Pushed the corpus of 950 papers to GitHub: https://github.com/petermr/openVirus/tree/master/miniproject/corpus_950_papers

ami

Updated ami by:

  1. Navigating to the ami3 folder on command prompt using cd path/to/ami3
  2. Running the commands: git pull and mvn clean install -DskipTests

Dictionary

Converted the SPARQL query results (in XML format) into a dictionary using the following command:

amidict -p country_project -v --dictionary country --directory=target/dictionary --input=country_wikidata.xml create --informat=wikisparqlxml 

Reference Dictionary: country_converted
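
Before running the conversion, it can help to check what the SPARQL results file actually contains. The sketch below reads the standard W3C SPARQL Query Results XML format with Python's built-in XML parser. It does not reproduce what amidict does internally. The variable names `country` and `countryLabel` in the sample are assumptions about the query, and `read_sparql_results` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SPARQL Query Results XML Format.
SPARQL_NS = {"s": "http://www.w3.org/2005/sparql-results#"}


def read_sparql_results(xml_text):
    """Return one dict per result row, mapping variable name to its value."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("s:results/s:result", SPARQL_NS):
        row = {}
        for binding in result.findall("s:binding", SPARQL_NS):
            if len(binding) == 0:
                continue  # skip an empty binding
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = (value.text or "").strip()
        rows.append(row)
    return rows


# Minimal sample in the standard format; the variable names are assumed.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="country"/>
    <variable name="countryLabel"/>
  </head>
  <results>
    <result>
      <binding name="country"><uri>http://www.wikidata.org/entity/Q668</uri></binding>
      <binding name="countryLabel"><literal xml:lang="en">India</literal></binding>
    </result>
  </results>
</sparql>"""

if __name__ == "__main__":
    for row in read_sparql_results(SAMPLE):
        print(row)
```

For a real file such as country_wikidata.xml, the text can be read with `open(path, encoding="utf-8").read()` and passed to the same function, provided the file has no encoding declaration that conflicts with parsing from a string.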

Blocker:

None at the moment



Initial Summary

Submitter: Pooja Paareek

The project is about viral epidemics with respect to different countries. Its purpose is to bring all the essential data together in one place using the country dictionary, so that it is easy for everyone to understand.

Initial work:

To get started, all the necessary software has to be installed. What I have done so far is listed below:

  1. Installed getpapers:

Installed getpapers using the information provided here: https://github.com/ContentMine/getpapers/blob/master/README.md

  • went to the download page and downloaded nvm-setup-zip
  • ran nvm-setup-zip and the installer included in it
  • installed Node from the command prompt by running these commands one after another: nvm install 7, then nvm use 7.10.1
  • tested the installation with node --version
  • ran the command npm install --global getpapers, and getpapers was installed successfully
  2. Installed ami:
  3. Installed git:
  • downloaded git
  • launched git bash

[further work in progress with the help of mentor Ambreen H]
