miniproject: viral epidemics and country
Ambreen H
Pooja Paareek
proposed activities
- Use the communal corpus of 50 articles on viral epidemics. `FINISHED`
- Check the corpus manually to separate true-positive from false-positive articles, i.e. whether each article really is about viral epidemics. `STARTED`
- Refine and rerun the query to create a corpus of 950 articles. This shall be the dataset for further meta-analysis. `STARTED`
- Create the country dictionary using `amidict`. `FINISHED`
- Use `ami search` to find the countries where such epidemics are most likely to occur. `STARTED`
- Test sectioning on `epidemic50noCov/` to extract only the sections most likely to contain information about countries. Also annotate with dictionaries to create ami dataTables. `STARTED`
- For the ML work, split the dataset into training, validation and test sets. `NOT STARTED`
- Use suitable machine learning techniques to classify the papers by whether they are about viral epidemics and by the countries where the epidemics were reported. This shall be done primarily in Python. `NOT STARTED`
- Validate the model by measuring its accuracy on the test data. `NOT STARTED`
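The data split and the accuracy check in the last activities can be sketched in plain Python. This is a minimal sketch, not the project's pipeline: the 70/15/15 proportions, the fixed seed and the `PMC0`-style IDs are illustrative assumptions, and the classifier itself (Keras/TensorFlow) is left out, so only the split and the accuracy calculation are shown.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T], train_pct: int = 70, val_pct: int = 15, seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle reproducibly and split into training, validation and test lists.

    Integer percentages avoid floating-point rounding surprises; whatever is
    left after the training and validation shares becomes the test set.
    """
    if train_pct <= 0 or val_pct <= 0 or train_pct + val_pct >= 100:
        raise ValueError("need train_pct > 0, val_pct > 0 and train_pct + val_pct < 100")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: global random state is untouched
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that equal the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical IDs standing in for the 950-paper corpus.
    ids = [f"PMC{i}" for i in range(950)]
    train, val, test = split_dataset(ids)
    print(len(train), len(val), len(test))  # 665 142 143
    # Hypothetical labels: 1 = about a viral epidemic, 0 = not.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Fixing the seed keeps the split reproducible, so the validation and test sets stay the same when the model is retrained; the test set is best scored once, after tuning on the validation set.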
outcomes
- Spreadsheets and graphs of the countries where viral epidemics were reported, with their frequencies.
- An ML model that classifies the data with acceptable accuracy.
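The country frequencies behind those spreadsheets and graphs can be approximated in a few lines of Python. `ami search` produces its own frequency tables from the dictionaries, so this is only a hypothetical post-processing sketch: it assumes article texts are available as strings keyed by ID, counts each country at most once per article, and uses plain whole-word matching. The output file name `country_counts.csv` is also an assumption.

```python
import csv
import re
from collections import Counter
from typing import Dict, Iterable


def count_countries(texts: Dict[str, str], countries: Iterable[str]) -> Counter:
    """Count the number of articles that mention each country at least once.

    Uses case-sensitive whole-word matching: a simplification that misses
    adjectives ("Chinese") and alternative names, and counts "Guinea" inside
    "Papua New Guinea" as a mention of Guinea.
    """
    patterns = {c: re.compile(r"\b" + re.escape(c) + r"\b") for c in countries}
    counts: Counter = Counter()
    for text in texts.values():
        for country, pattern in patterns.items():
            if pattern.search(text):
                counts[country] += 1  # at most once per article
    return counts


def write_counts_csv(counts: Counter, path: str) -> None:
    """Write country,articles rows, most frequent first, ready for a spreadsheet or plot."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["country", "articles"])
        for country, n in counts.most_common():
            writer.writerow([country, n])


if __name__ == "__main__":
    # Hypothetical article texts keyed by ID; real texts would come from the corpus.
    texts = {
        "PMC1": "An outbreak of Ebola in Guinea spread to Liberia.",
        "PMC2": "Zika cases were reported in Brazil.",
        "PMC3": "Surveillance in Brazil and Guinea continued.",
    }
    counts = count_countries(texts, ["Brazil", "Guinea", "Liberia", "India"])
    print(counts.most_common())
    write_counts_csv(counts, "country_counts.csv")
```

Counting articles rather than raw mentions stops one long paper that repeats a country many times from dominating the graph.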
corpus
- Initially, the communal corpus of 50 articles on viral epidemics.
- Later, a new corpus of 950 papers shall be created using the country dictionary.
dictionaries
- country dictionary
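The country dictionary is built by `amidict` from a plain list of country names (`country.txt`, read with `--informat list`). A minimal sketch for cleaning such a list before running `amidict`, assuming the list format is one term per line; the sample names are made up, and running the script overwrites `country.txt` in the current directory.

```python
from typing import Iterable, List


def build_term_list(names: Iterable[str]) -> List[str]:
    """Normalise whitespace, drop blanks and duplicates, and sort the terms."""
    seen = set()
    terms = []
    for name in names:
        term = " ".join(name.split())  # collapse leading, trailing and repeated spaces
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return sorted(terms)


def write_term_list(terms: Iterable[str], path: str) -> None:
    """Write one term per line (the layout assumed for amidict --informat list)."""
    with open(path, "w", encoding="utf-8") as fh:
        for term in terms:
            fh.write(term + "\n")


if __name__ == "__main__":
    # Made-up raw input with a duplicate, a blank entry and stray spaces.
    raw = ["India", "  Brazil ", "India", "", "United  Kingdom"]
    terms = build_term_list(raw)
    print(terms)  # ['Brazil', 'India', 'United Kingdom']
    write_term_list(terms, "country.txt")
```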
tools
- `ami` for corpus creation, use of dictionaries and sectioning
- `ami`/SPARQL for dictionary creation
- Python and relevant libraries for ML (Keras, TensorFlow, NLP libraries, etc.) and for data handling and visualization (NumPy, Matplotlib, Seaborn, ggplot, etc.)
constraints
Time is a major constraint, since the project must be completed within a maximum of 6 weeks.
06/07/2020
- Updated ami
- Created the country dictionary with the following command:
  `amidict -v --dictionary country --directory country --input country.txt create --informat list --outformats xml,html --wikilinks wikipedia, wikidata`
- Further details on dictionary creation: https://github.com/petermr/openVirus/blob/master/dictionaries/country/country_dict.md
- Link to the created dictionary: https://github.com/petermr/openVirus/blob/master/dictionaries/test/country_new.xml
- Started a spreadsheet of true and false positives for classification, using both the communal corpus and Europe PMC searches.
- Tested `ami section` on a corpus of 50 articles. Details: https://github.com/petermr/openVirus/wiki/ami:section
- Tested `ami search` on a corpus of 50 articles. Details: https://github.com/petermr/openVirus/wiki/ami-search
- Created a corpus of 950 papers: https://github.com/petermr/openVirus/tree/master/miniproject/corpus_950_papers
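The true/false-positive spreadsheet gives a direct measure of how well the search query works: precision is the share of retrieved articles that really are about viral epidemics, TP / (TP + FP). A minimal sketch, assuming the spreadsheet is exported to CSV with hypothetical columns `article_id` and `is_viral_epidemic` (values `yes`/`no`):

```python
import csv
import io
from typing import Dict, TextIO


def tally_labels(fh: TextIO) -> Dict[str, int]:
    """Count true and false positives in a labelled CSV.

    Assumes hypothetical columns article_id and is_viral_epidemic (yes/no).
    """
    tally = {"true_positive": 0, "false_positive": 0}
    for row in csv.DictReader(fh):
        label = row["is_viral_epidemic"].strip().lower()
        if label == "yes":
            tally["true_positive"] += 1
        elif label == "no":
            tally["false_positive"] += 1
        else:
            raise ValueError(f"unexpected label {label!r} for {row['article_id']}")
    return tally


def precision(tally: Dict[str, int]) -> float:
    """Share of retrieved articles that are really about viral epidemics."""
    total = tally["true_positive"] + tally["false_positive"]
    return tally["true_positive"] / total if total else 0.0


if __name__ == "__main__":
    # Inline sample; in practice open the exported spreadsheet instead.
    sample = io.StringIO(
        "article_id,is_viral_epidemic\n"
        "PMC1,yes\n"
        "PMC2,no\n"
        "PMC3,yes\n"
        "PMC4,yes\n"
    )
    tally = tally_labels(sample)
    print(tally, precision(tally))  # {'true_positive': 3, 'false_positive': 1} 0.75
```

Recall cannot be computed from this spreadsheet alone, because it lists only retrieved articles; relevant articles that the query missed would have to be found by other means.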
08/07/2020
- Downloaded and installed Visual Studio Code: https://code.visualstudio.com/download
- Next Steps:
1: Install git on your system.
2: Clone the repository using `git clone https://github.com/petermr/openVirus.git`.
3: Note the location of the cloned repository and add your folders there.
4: In VS Code, open the folder containing the cloned repository (check that Git is enabled in the settings).
5: Open the Source Control view by clicking the Git icon.
6: Enter a commit message and commit the changes.
7: Add the remote GitHub repository if needed (a cloned repository already has it as `origin`).
8: Push the committed changes to the GitHub repo.
9: Check the changes on the GitHub repo.
For troubleshooting, check the FAQ.
Submitter: Pooja Paareek
The project studies viral epidemics with respect to the countries where they occur. Its purpose is to bring the essential data together in one place, organized with the country dictionary, so that it is easy to understand.
To get started, all the necessary software has to be installed. What I have done so far is listed below:
- Installed `getpapers`, following https://github.com/ContentMine/getpapers/blob/master/README.md:
  - went to the download page and downloaded `nvm-setup.zip`
  - extracted `nvm-setup.zip` and ran the included installer
  - installed Node.js from the command prompt by running, one after the other, `nvm install 7` and `nvm use 7.10.1`
  - tested the installation with `node --version`
  - ran `npm install --global getpapers`; getpapers installed successfully
- Installed `ami`, following https://github.com/petermr/openVirus/wiki/INSTALLING-ami3:
  - installed Java
  - tested the Java installation with `java -version`, which reported version 1.8
  - installed the JDK
  - set the path as per the instructions
- Installed git:
  - downloaded git
  - launched Git Bash
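A quick way to confirm that the tools installed above are on the `PATH` is to ask each one for its version, as done by hand with `node --version` and `java -version`. A minimal sketch using only the Python standard library; the tool list is an assumption to adjust, the `--version` flag for `getpapers` is assumed rather than taken from these notes, and `ami` is left out because its version flag is not given here.

```python
import shutil
import subprocess
from typing import Dict, Optional

# Tools from the installation notes and the flag assumed to print each version.
# java writes its version to stderr; the getpapers flag is an assumption.
TOOLS: Dict[str, str] = {
    "node": "--version",
    "npm": "--version",
    "getpapers": "--version",
    "java": "-version",
    "git": "--version",
}


def check_tool(name: str, flag: str) -> Optional[str]:
    """Return the first line of the tool's version output, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    try:
        result = subprocess.run([path, flag], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"found at {path}, but failed to run: {exc}"
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else f"found at {path} (no version output)"


def check_all(tools: Dict[str, str] = TOOLS) -> Dict[str, Optional[str]]:
    """Check every tool and map its name to a version line, or None if missing."""
    return {name: check_tool(name, flag) for name, flag in tools.items()}


if __name__ == "__main__":
    for name, version in check_all().items():
        print(f"{name:10} {version or 'NOT FOUND on PATH'}")
```

Checking stderr as well as stdout matters because `java -version` prints to stderr, so a stdout-only check would wrongly report Java as silent.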
[further work in progress with the help of mentor Ambreen H]