README-meetups-corpus.md

File metadata and controls

58 lines (48 loc) · 1.61 KB
component-id: meetups-corpus
name: MEETUPS Corpus
description: This repository contains the corpus of people in the music scene in Europe
type: Corpus
release-date: 20/07/2022
release-number: v1.0
project: polifonia-project
work-package: WP4
pilot: MEETUPS
licence: Apache-2.0
related-components: generated-by meetups-corpus-collection

MEETUPS Corpus collection

Collecting Wikipedia pages of people in the music scene in Europe

Details of dataset

SPARQL queries retrieve authors' names and dbo:wikiPageID values from the DBpedia SPARQL endpoint (https://dbpedia.org/sparql).

Query filters:

Categories: <http://dbpedia.org/resource/Category:Music_people>
            <http://dbpedia.org/resource/Category:People
Location:
            sparqlQueryResults/query.sparql
Query results:
            sparqlQueryResults/Q<1>_sparql.csv
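
As a rough illustration of the query step, the sketch below builds a SPARQL query for the direct members of one category and parses a CSV response, using only the Python standard library. It is not the query stored in sparqlQueryResults/query.sparql. The query text, the use of dct:subject for category membership, the English-label filter, and the helper names build_query, fetch_csv and parse_csv are all assumptions made for this example.

```python
import csv
import io
import urllib.parse
import urllib.request
from typing import Dict, List

ENDPOINT = "https://dbpedia.org/sparql"


def build_query(category_uri: str, limit: int = 100) -> str:
    """Build an illustrative query: name and dbo:wikiPageID of category members."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?person ?name ?pageID WHERE {{
  ?person dct:subject <{category_uri}> ;
          rdfs:label ?name ;
          dbo:wikiPageID ?pageID .
  FILTER (lang(?name) = "en")
}} LIMIT {limit}
""".strip()


def fetch_csv(query: str) -> str:
    """Send the query to the endpoint and return the raw CSV text."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={"Accept": "text/csv"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Turn CSV text with a header row into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))
```

Calling parse_csv(fetch_csv(build_query("http://dbpedia.org/resource/Category:Music_people"))) would return one dictionary per matching person. This version only matches pages filed directly under the given category; covering subcategories would need an additional traversal step in the query.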

Dataset:

Location:
            dataset/
Format:
            Text files .txt
Name convention:
            <Author_wikiPageID>.txt
Total biographies collected:
            33,309 authors' Wikipedia pages
Summary of total biographies collected:
            sparqlQueryResults/TOTAL_download_biography.csv
MEETUPS pilot sample: 1,002 biographies
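
The naming convention makes it possible to look up a biography by its page ID. The sketch below shows one way to index the dataset/ folder, assuming the wikiPageID is the numeric part at the end of the file name (either the whole stem or the part after the last underscore). That reading of the convention, and the function name index_biographies, are assumptions made for this example.

```python
import re
from pathlib import Path
from typing import Dict


def index_biographies(dataset_dir: str = "dataset") -> Dict[int, Path]:
    """Map each wikiPageID to its .txt file, skipping names without a numeric ID."""
    index: Dict[int, Path] = {}
    for path in Path(dataset_dir).glob("*.txt"):
        # Take the text after the last underscore, or the whole stem if there is none.
        candidate = path.stem.rsplit("_", 1)[-1]
        if re.fullmatch(r"[0-9]+", candidate):
            index[int(candidate)] = path
    return index
```

With the index built, index[page_id].read_text(encoding="utf-8") would load one biography, where page_id is a value taken from the query results.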

To select random biographies, use sampleBiographies.py.
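
The script itself is not reproduced here. As a minimal sketch of what random selection over the dataset could look like, the function below draws a fixed-size sample of .txt files without replacement. The function name sample_biographies, its parameters, and the optional seed are hypothetical and may differ from what sampleBiographies.py actually does.

```python
import random
from pathlib import Path
from typing import List, Optional


def sample_biographies(dataset_dir: str, k: int, seed: Optional[int] = None) -> List[Path]:
    """Return k distinct biography files chosen at random from dataset_dir."""
    # Sort first so that a given seed yields the same sample on every machine.
    files = sorted(Path(dataset_dir).glob("*.txt"))
    if k > len(files):
        raise ValueError(f"requested {k} files but only {len(files)} are available")
    rng = random.Random(seed)
    return rng.sample(files, k)
```

Passing a seed makes the selection repeatable, which helps when a sample needs to be regenerated later.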