OAQA Biomedical Question Answering (BioASQ) System

The OAQA Biomedical Question Answering (BioASQ) System aims to identify relevant documents, concepts and passages (snippets) and automatically generate exact answer texts to arbitrary biomedical questions (factoid, list, yes/no). It was the best-performing system in the factoid and list categories of the BioASQ QA Challenges two years in a row, in 2015 and 2016 (see official results).

System description papers have the most details about the design and implementation of the architecture and the algorithms:

  • Zi Yang, Niloy Gupta, Xiangyu Sun, Di Xu, Chi Zhang, and Eric Nyberg. Learning to Answer Biomedical Factoid & List Questions: OAQA at BioASQ 3B. In Proceedings of CLEF 2015 Evaluation Labs and Workshop, 2015. [pdf]
  • Zi Yang, Yue Zhou, and Eric Nyberg. Learning to Answer Biomedical Questions: OAQA at BioASQ 4B. In Proceedings of Workshop on Biomedical Natural Language Processing, 2016. [pdf]

Please contact Zi Yang if you have any questions or comments.

Overview

This system uses the ECD/CSE framework (an extension to the Apache UIMA framework that supports formal, declarative YAML-based descriptors for the space of system and component configurations to be explored during system optimization), the BaseQA type system, and various natural language processing and information retrieval algorithms and tools.

The system employs a three-layered design for both Java source code and YAML descriptors:

Layer Description
baseqa Domain independent QA components, and the basic input/output definition of a QA pipeline, intermediate data objects, QA evaluation components, and data processing components. [source] [descriptor]
bioqa Biomedical resources that can be used in any biomedical QA task (outside the context of BioASQ). [source] [descriptor]
bioasq BioASQ-specific components, e.g. GoPubMed services. [source] [descriptor]

Each layer contains packages for each processing step, e.g. preprocess, question analysis, abstract query generation, document retrieval and reranking, concept retrieval and reranking, passage retrieval, answer type prediction, evidence gathering, answer generation and ranking. Please refer to the architecture diagrams in the system description papers.

Workflow Description
Phase A Document, concept, and snippet retrieval
Phase B (factoid & list) Exact answer generation for factoid and list questions
Phase B (yes/no) Answer prediction for yes/no questions

We define the following workflow descriptors (i.e. entry points) under bioasq for preprocessing, training, evaluating, and testing Phase A (retrieval tasks) and Phase B (factoid, list and yes/no answer generation).

Descriptor Description
preprocess-kb-cache Cache the requests and responses of concept and concept search services
preprocess-answer-type-gslabel Label gold-standard answer types
phase-a-train-concept-document Train document and concept reranking models
phase-a-train-snippet Train snippet reranking models
phase-a-evaluate, phase-a-test Evaluate (using development subset) and test (using test set) retrieval performance
phase-b-train-answer-type Train answer type prediction model for factoid and list questions
phase-b-train-answer-score Train answer scoring model for factoid and list questions
phase-b-train-answer-collective-score Train answer collective scoring model for list questions
phase-b-train-yesno Train yes/no prediction model
phase-b-evaluate-factoid-list, phase-b-test-factoid-list Evaluate (using development subset) and test (using test set) factoid and list QA
phase-b-evaluate-yesno, phase-b-test-yesno Evaluate (using development subset) and test (using test set) yes/no QA

A workflow descriptor can be executed by the ECDDriver, which is configured as the main class in the Maven exec goal. It can therefore be run from the command line with the config specified as the dotted path to the descriptor (path.to.the.descriptor).

The system also depends on other types of resources, including dictionaries, pretrained machine learning models, and service related properties.

Change Notes

  • Update Lucene from version 5.5.1 to 6.2.1, which changes the default similarity.
  • Update skr-webapi from version 0.0.4 to 0.0.6, due to an upstream API update to version 2.3.
  • Update uts-api from version 0.0.2 to 0.0.3, due to an upstream API update.
  • Update the TmTool URL to HTTPS (https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#RESTfulAPIs).
  • Bug fixes, including improved stability to avoid ConcurrentModificationException in LuceneDocumentScorer and ShapeDistanceCollectiveAnswerScorer, a possible DuplicateKey in LuceneInMemorySentenceRetrievalExecutor, and retrying when the UTS service fails to obtain a service ticket.

Setting Up the System

Prerequisites

This system needs to access external structured and unstructured resources for question answering, as well as files for evaluating the system. Due to licensing issues, you may have to obtain these resources or credentials on your own. If you are a member of CMU OAQA, please read the internal resource preparation instruction instead.

  • Pre-prerequisites. Java 8, Maven 3, Python 2.

  • (Recommended) UMLS license/account. The system needs to access the online UMLS services (UTS and MetaMap), which require a UMLS license/account (username, password, email). You can request them from https://uts.nlm.nih.gov//license.html. Otherwise, you need to remove all the *-uts-* and *-metamap-* steps from the descriptors, which will substantially hurt performance.

    If you want to increase the system's throughput, you may consider downloading and installing local instances of the UMLS and MetaMap services. Currently, we only have the Web services integrated.

  • (Recommended) Medline corpus and Lucene index. The system can use a local Medline index or the GoPubMed Web API for searching PubMed. However, we recommend a local index because the reranking component may send up to hundreds of search requests per question, so processing a single question through a Web API can take a very long time.

    1. Download .xml.gz or .xml files from https://www.nlm.nih.gov/databases/download/pubmed_medline.html.

    2. (Optional) Check out the medline-indexer project.

    3. Create a Lucene index using the StandardAnalyzer. The index should contain three mandatory fields: pmid, abstractText, and articleTitle. We include an example Java class, MedlineCitationIndexer.java, that indexes .xml.gz or .xml files inside a directory.

    4. Create a sqlite database that has a pmid2abstract table with two fields, pmid and abstract, which is used to fix the section label errors in the provided development set. We include an example Java class, MedlineAbstractStoreBuilder.java, that builds the sqlite file.
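
    The pmid2abstract store from step 4 can be sketched in a few lines. This is a minimal Python 3 sketch under stated assumptions, not the bundled MedlineAbstractStoreBuilder.java: the function names and the column types are hypothetical, and only the table name and the two field names come from the description above.

```python
import sqlite3


def build_abstract_store(db_path, records):
    """Create a pmid2abstract table and fill it from (pmid, abstract) pairs."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pmid2abstract ("
            "pmid TEXT PRIMARY KEY, abstract TEXT)"  # column types are assumed
        )
        conn.executemany(
            "INSERT OR REPLACE INTO pmid2abstract (pmid, abstract) VALUES (?, ?)",
            records,
        )
    return conn


def lookup_abstract(conn, pmid):
    """Return the stored abstract for a PMID, or None if it is missing."""
    row = conn.execute(
        "SELECT abstract FROM pmid2abstract WHERE pmid = ?", (pmid,)
    ).fetchone()
    return row[0] if row else None
```

    Passing a file path as db_path produces the sqlite file that the fixer script later receives as path_to_pmid2abstract_db, provided the schema it expects matches the assumed one.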

  • Biomedical ontology dumps and Lucene index. You can skip this step if you don't need relevant concept retrieval, but please also remove the concept-retrieval and concept-rerank steps from the descriptors if you do so. If you prefer using a local biomedical ontology index (recommended) to the official GoPubMed services, you need to obtain the ontology dumps and create a Lucene index.

    1. Download the ontology dumps from all the sources according to the official resources guideline document.

    2. (Optional) Check out the biomedical-concept-indexer project.

    3. Create a Lucene index. The index should contain four mandatory fields: id, name, definition, and source. Ontologies from different sources need to be adapted into a single schema that records the source and the id of each concept in its original ontology. The definition and name fields are intended to be used for retrieval. We include an example Java class, BiomedicalConceptIndexer.java, that indexes multiple ontologies.

  • BioASQ development and test files. You will need the test files for *-test-* workflows and the development files for *-evaluate-* and *-train-* workflows. However, the official development file has various errors. We created a Python script, bioasq-dev-fixer.py, to fix these errors, including update_year, fix_go_url, normalize_yesno_answer, listify_ideal_answer, listify_exact_answer, split_parenthesis_answer, fix_section_label, etc.

    1. Obtain the test set and development set (containing the gold-standard answers) from the BioASQ website.

    2. Install the Python editdistance package.

    3. Use the provided script to fix the formatting errors in the development file.

      python bioasq-dev-fixer.py path_to_orig_4b_dev_set path_to_pmid2abstract_db 4b-dev.json.auto.fulltext
      
    4. The resulting file should have an MD5 checksum of 8751b3a962eafb5c2aa8f09d5998fcd4.
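
    The checksum in step 4 can be verified without external tools. The following is a minimal Python 3 sketch; the function names are hypothetical, and the expected digest is the value quoted above.

```python
import hashlib

EXPECTED_MD5 = "8751b3a962eafb5c2aa8f09d5998fcd4"  # value quoted in step 4


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_dev_file(path):
    """True if the fixed development file matches the documented checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

    For example, is_expected_dev_file("4b-dev.json.auto.fulltext") should return True when the fixer was run on the same 4b development set and abstract database as the authors used; a mismatch may only mean the inputs differ.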

  • (Optional) PubMed Central corpus and document service. Because the PubMed Central full text has not been used in the evaluation since BioASQ 2016, it is not integrated into the predefined workflow descriptors. If you plan to use the PubMed Central corpus for passage retrieval (see below), you also need to download the PMC corpus and set up a document server.

    1. Download the PMC open access subset: https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/

    2. Use the BioASQTasks.jar (provided in the official preparation package prior to 2015) to convert the xml files to a single JSON Array file.

      java -jar BioASQTasks.jar
      
    3. Create a directory pmc and split the JSON Array file into individual JSON documents, each containing a single document and named by its PMID, and put them into the pmc directory.

    4. Set up an HTTP document server with the resource root being the directory that contains the pmc directory. Make sure you can access each document using the URL: http://HOST:PORT/pmc/DOC_ID.
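
    Steps 3 and 4 can be sketched as follows. This is a minimal Python 3 sketch under stated assumptions: the function name is hypothetical, and the id_key default ("pmid") is a guess at the field that holds the document identifier in the converted JSON, so check it against your own file before use.

```python
import json
import os


def split_json_array(array_path, out_dir, id_key="pmid"):
    """Write each element of a JSON array file to out_dir/<id>.

    id_key is an assumption: check which field of the converted documents
    actually holds the identifier used in the document URL.
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(array_path, encoding="utf-8") as handle:
        documents = json.load(handle)  # loads the whole array into memory
    written = []
    for document in documents:
        doc_id = str(document[id_key])
        # No file extension, so the file can be served as /pmc/DOC_ID.
        with open(os.path.join(out_dir, doc_id), "w", encoding="utf-8") as out:
            json.dump(document, out)
        written.append(doc_id)
    return written
```

    Because the whole array is loaded at once, this sketch may need a streaming parser for the full corpus. One simple way to serve the result for testing is to run python3 -m http.server PORT from the directory that contains pmc; any static file server with the same resource root should work equally well.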

Install

  1. Clone the project into a local directory.

  2. Put the test json files into the input directory, and rename them to dryrun-a.json, dryrun-b.json, 1b-1-a.json, ..., 4b-5-b.json. (Read the collection-reader.file parameter value in each descriptor to understand what the system will look for.) If you use a customized input directory and/or json file names, please change the collection-reader.file parameter in the workflow descriptor.

  3. Create the result directory under the project folder, which is used for the system's final output. If you use a customized output directory, you can change the following descriptors

  4. Create the persistence directory and download the oaqa-cse.db3 file into the persistence folder. As this project uses the CSE framework, the sqlite database persists the experiment metadata, the intermediate data objects (optionally) and the evaluation results for debugging and reporting purposes. If you use a customized persistence directory and/or file name, you can create your own persistence-provider descriptor and update the persistence-provider parameters wherever used.

  5. Create concept-search-cache, metamap-cache, synonym-cache, and tmtool-cache directories under the src/main/resources/ directory. If you don't need the cache, you can replace the *-cached descriptors with the non-cached versions (direct access). If you use customized cache directories, you need to update the db-file parameter in the *-cached descriptors, including

    (Checkpoint) At this point, the project structure should look like this unless you have customized it.

    |-- bioasq/
    |   |-- input/
    |   |   |-- 1b-1-a.json
    |   |   |-- .
    |   |   |-- .
    |   |   |-- .
    |   |   |-- 4b-5-b.json
    |   |   |-- 4b-dev.json.auto.fulltext
    |   |   |-- dryrun-a.json
    |   |   |-- dryrun-b.json
    |   |   |-- one-question.json
    |   |-- persistence/
    |   |   |-- oaqa-cse.db3
    |   |-- result/
    |   |-- src/
    |   |   |-- main/
    |   |   |   |-- java/
    |   |   |   |-- resources/
    |   |   |   |   |-- baseqa/
    |   |   |   |   |-- bioasq/
    |   |   |   |   |-- bioqa/
    |   |   |   |   |-- concept-search-cache/
    |   |   |   |   |-- dictionaries/
    |   |   |   |   |-- metamap-cache/
    |   |   |   |   |-- models/
    |   |   |   |   |-- synonym-cache/
    |   |   |   |   |-- tmtool-cache/
    |   |   |   |-- script/
    
  6. Update the index parameter in the lucene-bioconcept descriptors with the path to the Lucene index for the biomedical ontology. Also, you need to change other parameters if you use customized field names. Remove the .template suffix from the file names, including

  7. Update the index parameter in the lucene-medline descriptors with the path to the Lucene Medline index. Also, you need to change other parameters if you use customized field names. Remove the .template suffix from the file names, including

  8. Update the version, username, password, and email parameters in the uts and metamap related providers, and remove the .template suffix from the file names, including

    Note that the version parameter takes a string value, which means you have to add single or double quotes around the MetaMap version number, e.g. '1516'. Otherwise, YAML would interpret 1516 as an integer.

  9. Install the dependencies and compile the resources via Maven:

    mvn clean compile
    

    When you see BUILD SUCCESS, the installation is done.
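
    The directory setup from steps 2 to 5 can be scripted before running Maven. This is a minimal Python 3 sketch; the function name is hypothetical, and the directory names are the defaults listed above, so adjust them if you customized the corresponding descriptor parameters.

```python
import os

# Default directory names from steps 2-5 (relative to the project root).
REQUIRED_DIRS = [
    "input",
    "result",
    "persistence",
    "src/main/resources/concept-search-cache",
    "src/main/resources/metamap-cache",
    "src/main/resources/synonym-cache",
    "src/main/resources/tmtool-cache",
]


def prepare_layout(project_root):
    """Create missing directories and report whether oaqa-cse.db3 is in place."""
    for relative in REQUIRED_DIRS:
        os.makedirs(os.path.join(project_root, relative), exist_ok=True)
    db_file = os.path.join(project_root, "persistence", "oaqa-cse.db3")
    return os.path.isfile(db_file)
```

    A False return value only means that oaqa-cse.db3 still has to be downloaded into the persistence folder; the sketch does not fetch it.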

Test on BioASQ Test Set

  1. (Optional, Recommended) Execute the preprocess-kb-cache workflow if you haven't done so yet:

    mvn exec:exec -Dconfig=bioasq.preprocess-kb-cache
    

    At the end of the execution, you should see mapdb files generated in the *-cache directories. This step could be extremely slow (> 10 hours) depending on the workload on the UTS/MetaMap servers.

  2. Execute any *-test-* workflow descriptor to test the pipeline:

    mvn exec:exec -Dconfig=bioasq.phase-a-test
    mvn exec:exec -Dconfig=bioasq.phase-b-test-factoid-list
    mvn exec:exec -Dconfig=bioasq.phase-b-test-yesno
    
  3. You should see the output in the result directory at the end of each execution.

Evaluate on BioASQ Test Set

The common evaluation metrics are defined in the BaseQA project's eval package. The system extends the evaluation metrics for the BioASQ task in the eval package. All the *-evaluate-* descriptors add additional post-processing steps to generate the evaluation results automatically.

  1. Put the 4b-dev.json.auto.fulltext file under the input directory if you haven't done so yet. If you use a customized directory and/or file name, you need to change the resources/bioasq/gs/bioasq-qa-decorator.yaml descriptor content accordingly.

  2. (Optional, Recommended) Execute the preprocess-kb-cache workflow if you haven't done so yet. At the end of the execution, you should see mapdb files generated in the *-cache directories.

  3. Execute any *-evaluate-* workflow descriptor to evaluate the pipeline.

    mvn exec:exec -Dconfig=bioasq.phase-a-evaluate
    mvn exec:exec -Dconfig=bioasq.phase-b-evaluate-factoid-list
    mvn exec:exec -Dconfig=bioasq.phase-b-evaluate-yesno
    
  4. You should see the evaluation results at the end of the execution in the console.

    For example,

    Experiment: 8f5876cc-7dcf-41c2-9da3-7fe841ae92d9:1
    traceId,Answer/Answer/YESNO_COUNT,Answer/Answer/YESNO_MEAN_ACCURACY,Answer/Answer/YESNO_MEAN_NEG_ACCURACY,Answer/Answer/YESNO_MEAN_POS_ACCURACY
    1|QuestionParser[inherit:bioqa.question.parse.clearnlp-bioinformatics#parser-provider:inherit: bioqa.providers.parser.clearnlp-bioinformatics]>2|QuestionConceptRecognizer[inherit:bioqa.question.concept.metamap-cached#concept-provider:inherit: bioqa.providers.kb.metamap-cached]>3|QuestionConceptRecognizer[inherit:bioqa.question.concept.tmtool-cached#concept-provider:inherit: bioqa.providers.kb.tmtool-cached]>4|QuestionConceptRecognizer[inherit:bioqa.question.concept.lingpipe-genia#concept-provider:inherit: bioqa.providers.kb.lingpipe-genia]>5|PassageToViewCopier[inherit:baseqa.evidence.passage-to-view#view-name-prefix:ptv]>6|PassageParser[inherit:bioqa.evidence.parse.clearnlp-bioinformatics#parser-provider:inherit: bioqa.providers.parser.clearnlp-bioinformatics#view-name-prefix:ptv]>7|PassageConceptRecognizer[inherit:bioqa.evidence.concept.metamap-cached#allowed-concept-types:/dictionaries/allowed-umls-types.txt#concept-provider:inherit: bioqa.providers.kb.metamap-cached#view-name-prefix:ptv]>8|PassageConceptRecognizer[inherit:bioqa.evidence.concept.tmtool-cached#concept-provider:inherit: bioqa.providers.kb.tmtool-cached#view-name-prefix:ptv]>9|PassageConceptRecognizer[inherit:bioqa.evidence.concept.lingpipe-genia#concept-provider:inherit: bioqa.providers.kb.lingpipe-genia#view-name-prefix:ptv]>10|PassageConceptRecognizer[inherit:baseqa.evidence.concept.frequent-phrase#concept-provider:inherit: baseqa.providers.kb.frequent-phrase#view-name-prefix:ptv]>11|ConceptSearcher[inherit:bioqa.evidence.concept.search-uts-cached#concept-search-provider:inherit: bioqa.providers.kb.concept-search-uts-cached#synonym-expansion-provider:inherit: bioqa.providers.kb.synonym-uts-cached]>12|ConceptMerger[inherit:baseqa.evidence.concept.merge#include-default-view:true#view-name-prefix:ptv#use-name:true]>13|YesNoAnswerPredictor[inherit:bioqa.answer.yesno.liblinear-predict#classifier:inherit: bioqa.answer.yesno.liblinear#feature-file:result/answer-yesno-predict-liblinear.tsv#scorers:- 
inherit: baseqa.answer.yesno.scorers.concept-overlap - inherit: bioqa.answer.yesno.scorers.token-overlap - inherit: baseqa.answer.yesno.scorers.expected-answer-overlap - inherit: baseqa.answer.yesno.scorers.sentiment - inherit: baseqa.answer.yesno.scorers.negation - inherit: bioqa.answer.yesno.scorers.alternate-answer ],28.0000,0.5714,0.3333,0.6842
    1|QuestionParser[inherit:bioqa.question.parse.clearnlp-bioinformatics#parser-provider:inherit: bioqa.providers.parser.clearnlp-bioinformatics]>2|QuestionConceptRecognizer[inherit:bioqa.question.concept.metamap-cached#concept-provider:inherit: bioqa.providers.kb.metamap-cached]>3|QuestionConceptRecognizer[inherit:bioqa.question.concept.tmtool-cached#concept-provider:inherit: bioqa.providers.kb.tmtool-cached]>4|QuestionConceptRecognizer[inherit:bioqa.question.concept.lingpipe-genia#concept-provider:inherit: bioqa.providers.kb.lingpipe-genia]>5|PassageToViewCopier[inherit:baseqa.evidence.passage-to-view#view-name-prefix:ptv]>6|PassageParser[inherit:bioqa.evidence.parse.clearnlp-bioinformatics#parser-provider:inherit: bioqa.providers.parser.clearnlp-bioinformatics#view-name-prefix:ptv]>7|PassageConceptRecognizer[inherit:bioqa.evidence.concept.metamap-cached#allowed-concept-types:/dictionaries/allowed-umls-types.txt#concept-provider:inherit: bioqa.providers.kb.metamap-cached#view-name-prefix:ptv]>8|PassageConceptRecognizer[inherit:bioqa.evidence.concept.tmtool-cached#concept-provider:inherit: bioqa.providers.kb.tmtool-cached#view-name-prefix:ptv]>9|PassageConceptRecognizer[inherit:bioqa.evidence.concept.lingpipe-genia#concept-provider:inherit: bioqa.providers.kb.lingpipe-genia#view-name-prefix:ptv]>10|PassageConceptRecognizer[inherit:baseqa.evidence.concept.frequent-phrase#concept-provider:inherit: baseqa.providers.kb.frequent-phrase#view-name-prefix:ptv]>11|ConceptSearcher[inherit:bioqa.evidence.concept.search-uts-cached#concept-search-provider:inherit: bioqa.providers.kb.concept-search-uts-cached#synonym-expansion-provider:inherit: bioqa.providers.kb.synonym-uts-cached]>12|ConceptMerger[inherit:baseqa.evidence.concept.merge#include-default-view:true#view-name-prefix:ptv#use-name:true]>13|AllYesYesNoAnswerPredictor[inherit:baseqa.answer.yesno.all-yes],28.0000,0.6786,0.0000,1.0000
    1|QuestionParser[inherit:bioqa.question.parse.clearnlp-bioinformatics#parser-provider:inherit: bioqa.providers.parser.clearnlp-bioinformatics]>2|QuestionConceptRecognizer[inherit:bioqa.question.concept.metamap-cached#concept-provider:inherit: bioqa.providers.kb.metamap-cached]>3|QuestionConceptRecognizer[inherit:bioqa.question.concept.tmtool-cached#concept-provider:inherit: bioqa.providers.kb.tmtool-cached]>4|QuestionConceptRecognizer[inherit:bioqa.question.concept.lingpipe-genia#concept-provider:inherit: bioqa.providers.kb.lingpipe-genia]>5|PassageToViewCopier[inherit:baseqa.evidence.passage-to-view#view-name-prefix:ptv]>6|PassageParser[inherit:bioqa.evidence.parse.clearnlp-bioinformatics#parser-provider:inherit: bioqa.providers.parser.clearnlp-bioinformatics#view-name-prefix:ptv]>7|PassageConceptRecognizer[inherit:bioqa.evidence.concept.metamap-cached#allowed-concept-types:/dictionaries/allowed-umls-types.txt#concept-provider:inherit: bioqa.providers.kb.metamap-cached#view-name-prefix:ptv]>8|PassageConceptRecognizer[inherit:bioqa.evidence.concept.tmtool-cached#concept-provider:inherit: bioqa.providers.kb.tmtool-cached#view-name-prefix:ptv]>9|PassageConceptRecognizer[inherit:bioqa.evidence.concept.lingpipe-genia#concept-provider:inherit: bioqa.providers.kb.lingpipe-genia#view-name-prefix:ptv]>10|PassageConceptRecognizer[inherit:baseqa.evidence.concept.frequent-phrase#concept-provider:inherit: baseqa.providers.kb.frequent-phrase#view-name-prefix:ptv]>11|ConceptSearcher[inherit:bioqa.evidence.concept.search-uts-cached#concept-search-provider:inherit: bioqa.providers.kb.concept-search-uts-cached#synonym-expansion-provider:inherit: bioqa.providers.kb.synonym-uts-cached]>12|ConceptMerger[inherit:baseqa.evidence.concept.merge#include-default-view:true#view-name-prefix:ptv#use-name:true]>13|YesNoAnswerPredictor[inherit:bioqa.answer.yesno.weka-logistic-predict#classifier:inherit: 
bioqa.answer.yesno.weka-logistic#feature-file:result/answer-yesno-predict-weka-logistic.tsv#scorers:- inherit: baseqa.answer.yesno.scorers.concept-overlap - inherit: bioqa.answer.yesno.scorers.token-overlap - inherit: baseqa.answer.yesno.scorers.expected-answer-overlap - inherit: baseqa.answer.yesno.scorers.sentiment - inherit: baseqa.answer.yesno.scorers.negation - inherit: bioqa.answer.yesno.scorers.alternate-answer ],28.0000,0.6429,0.2222,0.8421
    1|QuestionParser[inherit:bioqa.question.parse.clearnlp-bioinformatics#parser-provider:inherit: bioqa.providers.parser.clearnlp-bioinformatics]>2|QuestionConceptRecognizer[inherit:bioqa.question.concept.metamap-cached#concept-provider:inherit: bioqa.providers.kb.metamap-cached]>3|QuestionConceptRecognizer[inherit:bioqa.question.concept.tmtool-cached#concept-provider:inherit: bioqa.providers.kb.tmtool-cached]>4|QuestionConceptRecognizer[inherit:bioqa.question.concept.lingpipe-genia#concept-provider:inherit: bioqa.providers.kb.lingpipe-genia]>5|PassageToViewCopier[inherit:baseqa.evidence.passage-to-view#view-name-prefix:ptv]>6|PassageParser[inherit:bioqa.evidence.parse.clearnlp-bioinformatics#parser-provider:inherit: bioqa.providers.parser.clearnlp-bioinformatics#view-name-prefix:ptv]>7|PassageConceptRecognizer[inherit:bioqa.evidence.concept.metamap-cached#allowed-concept-types:/dictionaries/allowed-umls-types.txt#concept-provider:inherit: bioqa.providers.kb.metamap-cached#view-name-prefix:ptv]>8|PassageConceptRecognizer[inherit:bioqa.evidence.concept.tmtool-cached#concept-provider:inherit: bioqa.providers.kb.tmtool-cached#view-name-prefix:ptv]>9|PassageConceptRecognizer[inherit:bioqa.evidence.concept.lingpipe-genia#concept-provider:inherit: bioqa.providers.kb.lingpipe-genia#view-name-prefix:ptv]>10|PassageConceptRecognizer[inherit:baseqa.evidence.concept.frequent-phrase#concept-provider:inherit: baseqa.providers.kb.frequent-phrase#view-name-prefix:ptv]>11|ConceptSearcher[inherit:bioqa.evidence.concept.search-uts-cached#concept-search-provider:inherit: bioqa.providers.kb.concept-search-uts-cached#synonym-expansion-provider:inherit: bioqa.providers.kb.synonym-uts-cached]>12|ConceptMerger[inherit:baseqa.evidence.concept.merge#include-default-view:true#view-name-prefix:ptv#use-name:true]>13|YesNoAnswerPredictor[inherit:bioqa.answer.yesno.weka-cvr-predict#classifier:inherit: bioqa.answer.yesno.weka-cvr#feature-file:result/answer-yesno-predict-weka-cvr.tsv#scorers:- 
inherit: baseqa.answer.yesno.scorers.concept-overlap - inherit: bioqa.answer.yesno.scorers.token-overlap - inherit: baseqa.answer.yesno.scorers.expected-answer-overlap - inherit: baseqa.answer.yesno.scorers.sentiment - inherit: baseqa.answer.yesno.scorers.negation - inherit: bioqa.answer.yesno.scorers.alternate-answer ],28.0000,0.6071,0.6667,0.5789
    

    For better visualization, you can split the lines into cells using the comma separators, like this:

    traceId COUNT ACCURACY NEG_ACCURACY POS_ACCURACY
    `...>13|YesNoAnswerPredictor[inherit:bioqa.answer.yesno.liblinear-predict#classifier:inherit: bioqa.answer.yesno.liblinear#feature-file:result/answer-yesno-predict-liblinear.tsv#scorers:- inherit: baseqa.answer.yesno.scorers.concept-overlap - inherit: bioqa.answer.yesno.scorers.token-overlap - inherit: baseqa.answer.yesno.scorers.expected-answer-overlap - inherit: baseqa.answer.yesno.scorers.sentiment - inherit: baseqa.answer.yesno.scorers.negation - inherit: bioqa.answer.yesno.scorers.alternate-answer ]` 28.0000 0.5714 0.3333 0.6842
    `...>13|AllYesYesNoAnswerPredictor[inherit:baseqa.answer.yesno.all-yes]` 28.0000 0.6786 0.0000 1.0000
    `...>13|YesNoAnswerPredictor[inherit:bioqa.answer.yesno.weka-logistic-predict#classifier:inherit: bioqa.answer.yesno.weka-logistic#feature-file:result/answer-yesno-predict-weka-logistic.tsv#scorers:- inherit: baseqa.answer.yesno.scorers.concept-overlap - inherit: bioqa.answer.yesno.scorers.token-overlap - inherit: baseqa.answer.yesno.scorers.expected-answer-overlap - inherit: baseqa.answer.yesno.scorers.sentiment - inherit: baseqa.answer.yesno.scorers.negation - inherit: bioqa.answer.yesno.scorers.alternate-answer ]` 28.0000 0.6429 0.2222 0.8421
    `...>13|YesNoAnswerPredictor[inherit:bioqa.answer.yesno.weka-cvr-predict#classifier:inherit: bioqa.answer.yesno.weka-cvr#feature-file:result/answer-yesno-predict-weka-cvr.tsv#scorers:- inherit: baseqa.answer.yesno.scorers.concept-overlap - inherit: bioqa.answer.yesno.scorers.token-overlap - inherit: baseqa.answer.yesno.scorers.expected-answer-overlap - inherit: baseqa.answer.yesno.scorers.sentiment - inherit: baseqa.answer.yesno.scorers.negation - inherit: bioqa.answer.yesno.scorers.alternate-answer ]` 28.0000 0.6071 0.6667 0.5789
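
    The comma split described above can also be done programmatically. This is a minimal Python 3 sketch that assumes each result row is a single line ending in the four yes/no metric columns shown in the header; the function name and the returned keys are hypothetical, and other workflows may report a different number of columns.

```python
def parse_result_line(line):
    """Split one evaluation row into a short trace label and its metrics.

    Splits from the right, so commas inside the trace (if any) do not
    shift the metric columns.
    """
    trace, *metrics = line.strip().rsplit(",", 4)
    last_step = trace.split(">")[-1]  # keep only the final pipeline step
    count, accuracy, neg_accuracy, pos_accuracy = (float(m) for m in metrics)
    return {
        "step": last_step,
        "count": count,
        "accuracy": accuracy,
        "neg_accuracy": neg_accuracy,
        "pos_accuracy": pos_accuracy,
    }
```

    Keeping only the last step of the trace mirrors the shortened `...>13|...` labels in the table, since the earlier steps are identical across the compared configurations in this example.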

Retrain the Models

The system includes pretrained models that were trained using the predefined *-train-* descriptors (i.e. on the 4b-dev set minus the 3b-5 test set). If you plan to retrain the models, you can follow these steps. Please be aware that the models are saved under resources/models and loaded directly from the classpath. You should therefore recompile the project using mvn clean compile between training runs to copy the newly generated models into the target directory, so that the next training run can use the models from the previous one.

  1. Put the 4b-dev.json.auto.fulltext file under the directory input if you haven't done so. If you use a customized directory and/or gold-standard file, you need to change the resources/bioasq/gs/bioasq-qa-decorator.yaml descriptor content accordingly.

  2. (Optional, Recommended) Execute the preprocess-kb-cache workflow if you haven't done so yet. At the end of the execution, you should see mapdb files generated in the *-cache directories.

  3. Execute the preprocess-answer-type-gslabel workflow if you haven't done so yet. At the end of the execution, you should see the 4b-dev-gslabel-tmtool.json and 4b-dev-gslabel-uts.json files generated in the resources/models/bioqa/answer_type directory.

    mvn clean compile exec:exec -Dconfig=bioasq.preprocess-answer-type-gslabel
    

    This step could take about 30 minutes.

  4. Training Phase A requires execution of phase-a-train-concept-document before phase-a-train-snippet.

    mvn clean compile exec:exec -Dconfig=bioasq.phase-a-train-concept-document
    mvn clean compile exec:exec -Dconfig=bioasq.phase-a-train-snippet
    

    Executing phase-a-train-concept-document could take 3-4 hours, and executing phase-a-train-snippet could take 80 minutes.

    Training Phase B factoid and list QA requires execution of phase-b-train-answer-type first, then phase-b-train-answer-score, and finally phase-b-train-answer-collective-score.

    mvn clean compile exec:exec -Dconfig=bioasq.phase-b-train-answer-type
    mvn clean compile exec:exec -Dconfig=bioasq.phase-b-train-answer-score
    mvn clean compile exec:exec -Dconfig=bioasq.phase-b-train-answer-collective-score
    

    Executing phase-b-train-answer-type or phase-b-train-answer-score could take 30 minutes each. Executing phase-b-train-answer-collective-score could take 10 minutes.

    Training Phase B yes/no QA requires execution of phase-b-train-yesno.

    mvn clean compile exec:exec -Dconfig=bioasq.phase-b-train-yesno
    

    Executing phase-b-train-yesno could take about 10 minutes.

  5. You should see cross-validation results at the end of each training.
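
    The training order above can be captured in a small driver script. This is a minimal Python 3 sketch with hypothetical function names; the order within each chain follows the steps above, while the relative order of the Phase A, Phase B factoid/list, and yes/no chains is simply the order in which they are listed and is an assumption.

```python
import subprocess

# Workflow names are taken from the steps above; each command recompiles
# first so that newly trained models are copied into the target directory.
TRAINING_ORDER = [
    "preprocess-answer-type-gslabel",
    "phase-a-train-concept-document",
    "phase-a-train-snippet",
    "phase-b-train-answer-type",
    "phase-b-train-answer-score",
    "phase-b-train-answer-collective-score",
    "phase-b-train-yesno",
]


def training_commands():
    """Build one Maven command per training workflow, in the listed order."""
    return [
        ["mvn", "clean", "compile", "exec:exec", "-Dconfig=bioasq." + name]
        for name in TRAINING_ORDER
    ]


def run_training(dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    for command in training_commands():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

    Calling run_training() only prints the commands; run_training(dry_run=False) executes them from the current directory, which should be the project root, and may take several hours in total given the timings quoted above.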

Test on Arbitrary Biomedical Questions

You can use your own biomedical questions to test the system in either Phase A or Phase B, similar to testing on the BioASQ test set.

For Phase A

  1. You can refer to the input/one-question.json file, and update the question.

    {
      "questions": [
        {
          "body": "What is the role of MMP-1 in breast cancer?",
          "type": "factoid",
          "id": "0"
        }
      ]
    }
  2. You need to change the collection-reader.file parameter to input/one-question.json in the phase-a-test descriptor to test Phase A.
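
    If you generate question files from a script, the structure shown in step 1 can be written as follows. This is a minimal Python 3 sketch; the function name and its defaults are hypothetical, and only the JSON field names come from the example above.

```python
import json


def write_question_file(path, body, question_type="factoid", question_id="0"):
    """Write a single-question input file shaped like input/one-question.json."""
    payload = {
        "questions": [
            {"body": body, "type": question_type, "id": question_id}
        ]
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return payload
```

    For example, write_question_file("input/one-question.json", "What is the role of MMP-1 in breast cancer?") reproduces the file content shown in step 1, up to whitespace.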

For Phase B

  1. You need to manually add relevant snippets to the input/one-question.json file, similar to the Phase B test file (i.e. *b-*-b.json).

  2. You need to change the collection-reader.file parameter to input/one-question.json in the phase-b-test-factoid-list descriptor to test Phase B.

We are working on testing an end-to-end QA system that combines Phase A and Phase B workflows. You may also creatively combine the steps from both descriptors on your own.

(Advanced, Optional) Use the PubMed Central Content

Because the PubMed Central full text has not been used in the evaluation since BioASQ 2016, it is not integrated into the predefined workflow descriptors. However, you can still use it for relevant passage retrieval.

  1. Make sure you have the PubMed Central full text and document server.

  2. Update the url-format parameter in the resources/bioasq/passage/pmc-content.yaml.template with the PubMed Central document server URL, and remove the .template suffix from the file name.

  3. Add the pmc-content step after the document-retrieval/document-rerank step, but before the passage-retrieval step, in the descriptor.

(Advanced, Optional) Use a Local or Proxy GoPubMed Server

The official GoPubMed server is sometimes slow. If you use a local or proxy GoPubMed server instead of the official server specified in the properties folder, and you plan to use the GoPubMed components (which are not used in the predefined workflow descriptors), you can change the conf parameter in the gopubmed related descriptors, including

Component Development

The system is far from perfect, and it needs tuning and component development. In addition to the system description papers, you may also read the UIMA and OAQA Tutorial to get familiar with the UIMA/ECD/CSE frameworks used by this system.

Acknowledgement

We thank Ying Li, Xing Yang, Venus So, James Cai and the other team members at Roche Innovation Center New York for their support of OAQA and biomedical question answering research and development.

License

This project is licensed under the Apache License, Version 2.0; see the LICENSE.txt file for details. However, please note that some third-party dependencies may be licensed differently.