Review Paper 1

Shivam Marathe edited this page Aug 18, 2020 · 4 revisions

Research and reviews in question answering system, 2013

Sanjay K Dwivedi, Vaishali Singh


Introduction

Most information retrieval systems leave it to users to extract the useful information themselves. In their quest for an accurate answer, users are presented only with a ranked list of relevant documents.

Challenges

  • One challenge for existing QA systems is to understand natural language questions correctly and deduce their precise meaning, so that exact responses can be retrieved.
  • A proper validation process is required to confirm that the answer deduced by the system is correct.

Three Stages

  • Question Analysis
    1. Parsing
    2. Question Classification
    3. Query Formulation
  • Document Analysis
    1. Extract candidate documents.
    2. Identify answers
  • Answer Analysis
    1. Extract candidate answers
    2. Rank the best one.
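
The three stages above can be sketched as a toy pipeline. This is only an illustration under simplifying assumptions, not the method of any system surveyed in the paper: the stopword list, the `ANSWER_TYPES` mapping, the function names, and the demo documents are all hypothetical, and "parsing" is reduced to plain tokenization.

```python
import re
from collections import Counter

# Hypothetical stopword list and question-word-to-answer-type mapping.
STOPWORDS = {"the", "a", "an", "of", "is", "was", "in", "by",
             "who", "what", "when", "where", "did", "does"}
ANSWER_TYPES = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze_question(question):
    """Stage 1: parse, classify the question, and formulate a keyword query."""
    tokens = tokenize(question)
    answer_type = ANSWER_TYPES.get(tokens[0], "OTHER") if tokens else "OTHER"
    query = [t for t in tokens if t not in STOPWORDS]
    return answer_type, query


def analyze_documents(query, documents, top_k=2):
    """Stage 2: keep the documents that share the most terms with the query."""
    scored = []
    for doc in documents:
        doc_terms = set(tokenize(doc))
        score = sum(1 for term in query if term in doc_terms)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def analyze_answers(answer_type, query, documents):
    """Stage 3: extract candidate answers and rank them by frequency."""
    if answer_type == "DATE":
        pattern = r"\b\d{4}\b"  # four-digit years only
    else:
        pattern = r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b"  # capitalized phrases
    candidates = Counter()
    for doc in documents:
        for match in re.findall(pattern, doc):
            if match.lower() not in query:
                candidates[match] += 1
    return [candidate for candidate, _ in candidates.most_common()]


def answer_question(question, documents):
    """Run the three stages in order and return ranked candidate answers."""
    answer_type, query = analyze_question(question)
    retrieved = analyze_documents(query, documents)
    return analyze_answers(answer_type, query, retrieved)


DEMO_DOCUMENTS = [
    "Alexander Graham Bell invented the telephone in 1876.",
    "The telephone was patented in 1876 by Bell.",
    "Paris is the capital of France.",
]

print(answer_question("When was the telephone invented?", DEMO_DOCUMENTS))
# prints ['1876']
```

A real system would replace each stage with something far richer (a syntactic parser, a trained question classifier, a proper retrieval engine), but the flow of data between the stages stays the same.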

Knowledge areas needed to build a generic QA system:

  1. Artificial Intelligence
  2. Natural language processing
  3. Statistical Analysis
  4. Pattern matching
  5. Information Retrieval
  6. Information Extraction (similar to point 5)

Approaches by various systems:

  1. Linguistic Approach
  2. Statistical Approach
  3. Pattern Matching Approach

Linguistic Approach (LA)

This approach represents knowledge using production rules (similar to TOC), logic, frames, templates, ontologies, and semantic networks, which are analysed during QA pair analysis. Tokenization, POS tagging, and parsing are some of the techniques used in LA. Queries are formulated precisely, so they can only be answered from structured databases. Building structured knowledge bases is a time-consuming process, so this approach is used for long-term information needs in a particular domain.

Disadvantages:

  1. Less portable, since different domains need different grammar and mapping rules
  2. Time-consuming

Statistical Approach

This approach is useful for online text repositories and web data. It copes with large amounts of data and the heterogeneity present in it, and it can formulate queries in natural language form. It requires a sufficient amount of data for precise statistical learning. Statistical methods that have been used successfully for question type classification include SVMs, Bayesian classifiers, and maximum entropy models. Statistical techniques used for the answer-finding task in QA include N-gram mining, sentence similarity models, and Okapi similarity.

Disadvantages:

  1. Fails to identify linguistic features of combinations of words and phrases
  2. Treats each term independently
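
As an illustration of the Okapi similarity mentioned above, here is a minimal sketch of Okapi BM25 scoring. The paper does not specify a variant, so this assumes the common form with a non-negative IDF (the `+ 1.0` inside the logarithm) and the conventional defaults `k1=1.5` and `b=0.75`; the function name and the demo documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        # Each query term contributes on its own; no term interactions.
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


DEMO_DOCS = [
    "bell invented the telephone in 1876".split(),
    "the telephone was patented in 1876".split(),
    "paris is the capital of france".split(),
]
demo_scores = bm25_scores(["telephone", "invented"], DEMO_DOCS)
print(demo_scores)
```

The first document scores highest because it matches both query terms, the second matches only one, and the third scores zero. The inner loop sums one contribution per query term, which also shows the second disadvantage listed above: each term is treated independently.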

Pattern Matching Approach

This approach uses the expressive power of text patterns. It is considered a simple approach and is well suited to small and medium-sized websites. The pattern matching approach (PMA) is of two types:

  1. Surface Pattern Based
    • Patterns are either hand-crafted or learned automatically from examples.
    • The answers are extracted using statistical techniques or data mining measures.
    • The answers obtained are not in formatted form.
  2. Template Based
    • A template is preformatted for questions, with entity slots that are filled dynamically.
    • A structured query is used to extract the answer from a database.
    • The answers obtained are in formatted form.
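
The two types can be contrasted with a small sketch. Everything here is hypothetical and chosen only for illustration: the hand-crafted birth-year pattern, the "capital of" question template, and the `countries` table in an in-memory SQLite database.

```python
import re
import sqlite3

# Surface pattern: a hand-crafted text pattern applied to free text.
BIRTH_PATTERN = re.compile(
    r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in (?P<year>\d{4})"
)


def surface_birth_year(name, text):
    """Return the birth year found next to `name` in raw text, if any."""
    for match in BIRTH_PATTERN.finditer(text):
        if match.group("name") == name:
            return match.group("year")
    return None


# Template: a preformatted question with one entity slot (the country).
CAPITAL_TEMPLATE = re.compile(
    r"^what is the capital of (?P<country>[a-z ]+?)\??$", re.IGNORECASE
)


def template_capital(question, connection):
    """Fill the slot from the question and run a structured query."""
    match = CAPITAL_TEMPLATE.match(question.strip())
    if match is None:
        return None
    row = connection.execute(
        "SELECT capital FROM countries WHERE lower(name) = lower(?)",
        (match.group("country"),),
    ).fetchone()
    return row[0] if row else None


def build_demo_database():
    """Create a tiny in-memory database for the template example."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE countries (name TEXT, capital TEXT)")
    connection.executemany(
        "INSERT INTO countries VALUES (?, ?)",
        [("France", "Paris"), ("Japan", "Tokyo")],
    )
    return connection


DEMO_TEXT = "Marie Curie was born in 1867. Alan Turing was born in 1912."
print(surface_birth_year("Alan Turing", DEMO_TEXT))  # prints 1912

demo_db = build_demo_database()
print(template_capital("What is the capital of France?", demo_db))  # prints Paris
```

The surface pattern pulls its answer straight out of unstructured text, while the template maps the question onto a database query, so its answer comes back as a structured field.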

Future Scope:

Existing systems use one of these approaches or a combination of two of them; hybridising all three could drive innovation in the field of QA systems. The goals of a QA system are faster response, greater relevancy, and higher precision and recall.
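
For reference, precision and recall can be computed as in the following sketch, which uses the standard set-based definitions. The function name and the example answer sets are hypothetical.

```python
def precision_recall(returned, relevant):
    """Compute set-based precision and recall for one question."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    # Precision: the fraction of returned answers that are relevant.
    precision = true_positives / len(returned) if returned else 0.0
    # Recall: the fraction of relevant answers that were returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(precision, recall)
```

Here two of the four returned answers are relevant (precision 0.5), and two of the three relevant answers were returned (recall about 0.67).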