Description

This project is a concurrent information-retrieval application based on the Standard Boolean Model. It processes queries in parallel over the Cranfield collection of text documents, using atomic memory transactions implemented in C++.
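As a rough illustration of parallel query processing, the sketch below hands out query indices to a pool of worker threads. It is not the project's code: it swaps the atomic memory transactions for a plain `std::atomic` counter, and the names `run_queries` and `process` are hypothetical. Each worker writes only to its own result slot, so no further synchronization is needed.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: evaluates process(i) for every query index i in
// [0, query_count) using worker_count threads, and returns one score per query.
// Assumption: process is safe to call from several threads at once.
std::vector<int> run_queries(std::size_t query_count, unsigned worker_count,
                             const std::function<int(std::size_t)>& process) {
    std::vector<int> results(query_count);
    std::atomic<std::size_t> next{0};  // index of the next unclaimed query
    if (worker_count == 0) worker_count = 1;

    auto worker = [&] {
        for (;;) {
            // Atomically claim one query; stop when none are left.
            const std::size_t i = next.fetch_add(1);
            if (i >= query_count) break;
            results[i] = process(i);  // each index is written by exactly one thread
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < worker_count; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return results;
}
```

On toolchains that need it, link with the platform's thread library (for example `-pthread` with GCC or Clang).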

Standard Boolean Model

Based on Boolean logic and classical set theory, the Boolean Model represents both documents and queries as sets of terms. Retrieval is therefore based on whether or not a document contains the query terms.

For example, given a set of documents Doc_i and a query Q:

Doc₁ -> {word₁, word₂, word₃}
Doc₂ -> {word₂, word₃}
Doc₃ -> {word₃}
Q -> {word₁, word₂, word₃}

The Boolean model would evaluate the documents as follows:

Doc₁ -> score = 3 (contains 3 terms)
Doc₂ -> score = 2 (contains 2 terms)
Doc₃ -> score = 1 (contains 1 term)
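The scoring in this example can be sketched in a few lines of C++. This is a minimal illustration, not the project's code: the alias `TermSet` and the function `boolean_score` are hypothetical names, and the score is simply the number of query terms found in the document.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical representation: a document or a query is a set of terms.
using TermSet = std::set<std::string>;

// Counts how many of the query's terms also occur in the document.
std::size_t boolean_score(const TermSet& doc, const TermSet& query) {
    std::size_t score = 0;
    for (const auto& term : query) {
        if (doc.count(term) != 0) {
            ++score;
        }
    }
    return score;
}
```

Calling `boolean_score` on each of the three documents above with the query Q reproduces the scores 3, 2, and 1.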

Cranfield Collection

The Cranfield test collection includes 1400 abstracts of aeronautical journal articles, a set of 225 queries, and exhaustive relevance judgments for all (query, document) pairs.

Pre-Processing

The Cranfield collection is originally stored in two files:

cran.all.1400, which contains the 1400 abstracts of aeronautical journal articles
cran.qry, which contains the 225 queries

To facilitate parallel processing, the documents and queries are split into separate text files: 1400 for the documents and 225 for the queries.

In addition to splitting, the SnowballAnalyzer and StopAnalyzer classes of Apache Lucene are used for stemming and stop-word removal.

