Milestone #1: Data Preparation

Group T01G15

Name                 GitHub
Bárbara Rodrigues    @babs3
Rubén Monteiro       @FafnirNithhoggr
Sofia Merino Costa   @sophie-mc-dev
Tiago Ribeiro        @TiagoMRib

Milestone Description

The first milestone consists of preparing and characterizing the datasets selected for the project. These datasets are the foundation of the project, so the goal of this first task is to prepare and explore them. The work involved depends heavily on the datasets themselves, some of which may require extraction steps such as crawling or scraping.

The following actions are required in M1:

  • search repositories for datasets;
  • select convenient data subsets;
  • assess the authority of the data source and the quality of the data;
  • perform exploratory data analysis;
  • prepare and document a data processing pipeline;
  • characterize the datasets, identifying and describing some of their properties (e.g., number of documents, document characteristics, term metrics);
  • identify the conceptual model for the data domain;
  • define and characterize the documents in the final collection;
  • identify and characterize follow-up information needs for the project.
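The dataset characterization step (number of documents, document characteristics, term metrics) could be sketched as below. This is a minimal illustration, not the group's pipeline: it assumes the scraped articles are already held in memory as a mapping from page title to plain text, and the tokenizer, function names and returned keys are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical tokenizer: lowercase ASCII letter runs only.
# Digits and punctuation are dropped; accented letters split a token.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> List[str]:
    """Split a document into lowercase alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def characterize(docs: Dict[str, str], top_k: int = 10) -> dict:
    """Compute simple collection statistics for a {title: text} mapping."""
    term_freq = Counter()  # total occurrences of each term across the collection
    doc_freq = Counter()   # number of documents in which each term appears
    lengths = []           # token count per document
    for text in docs.values():
        tokens = tokenize(text)
        lengths.append(len(tokens))
        term_freq.update(tokens)
        doc_freq.update(set(tokens))  # count each term at most once per document
    n_docs = len(docs)
    return {
        "n_documents": n_docs,
        "total_tokens": sum(lengths),
        "avg_tokens_per_doc": sum(lengths) / n_docs if n_docs else 0.0,
        "min_tokens": min(lengths, default=0),
        "max_tokens": max(lengths, default=0),
        "vocabulary_size": len(term_freq),
        "top_terms_by_frequency": term_freq.most_common(top_k),
        "top_terms_by_document_frequency": doc_freq.most_common(top_k),
    }
```

Reporting both term frequency and document frequency is deliberate: a term repeated many times inside one long article can rank high on the first list while appearing in very few documents, which matters when choosing indexing and weighting schemes later.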

Project Details

  • Topic: Plants
  • Dataset: scraped from Wikipedia
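As one possible way to collect the plant articles, the sketch below queries the MediaWiki Action API (`prop=extracts`, provided by the TextExtracts extension) for plain-text page content instead of scraping rendered HTML. This is an assumed alternative for illustration, not necessarily how the dataset was built; the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

API_URL = "https://en.wikipedia.org/w/api.php"
# Wikipedia asks API clients to identify themselves; this value is a placeholder.
USER_AGENT = "plants-dataset-sketch/0.1 (hypothetical contact address)"


def build_extract_url(title: str) -> str:
    """Build an Action API query for the plain-text extract of a single page."""
    params = {
        "action": "query",
        "prop": "extracts",     # provided by the TextExtracts extension
        "explaintext": "1",     # plain text instead of limited HTML
        "redirects": "1",       # resolve redirects to the target article
        "titles": title,
        "format": "json",
        "formatversion": "2",   # "pages" is a list instead of a dict keyed by page id
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_extract(response: dict) -> Optional[Dict[str, str]]:
    """Return {"title", "text"} for the first page, or None if it does not exist."""
    pages = response.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    page = pages[0]
    return {"title": page["title"], "text": page.get("extract", "")}


def fetch_article(title: str) -> Optional[Dict[str, str]]:
    """Download one article (performs a live network request)."""
    request = urllib.request.Request(
        build_extract_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return parse_extract(json.load(resp))
```

Keeping URL construction and response parsing separate from the network call lets those two steps be tested offline, while `fetch_article("Quercus robur")` performs the actual download.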