Resource Source

Carlos Badenes edited this page Mar 31, 2016 · 1 revision

A source is a repository that may contain documents. It can be static or dynamic.

  • static source: a repository that will not change over time. Once it has been processed, no new information will ever become available from it.
  • dynamic source: a repository that may receive new documents in the future. Sources of this type are continuously polled by the hoarder module.
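The distinction between the two kinds of source can be sketched in a few lines of Python. This is only an illustration under assumed names: `SourceKind`, `Source`, `needs_harvest` and `harvest_round` are hypothetical and are not taken from the hoarder module itself.

```python
from dataclasses import dataclass
from enum import Enum


class SourceKind(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


@dataclass
class Source:
    # Hypothetical model of a source; field names are assumptions.
    url: str
    kind: SourceKind
    processed: bool = False

    def needs_harvest(self) -> bool:
        # A dynamic source is always eligible for another poll;
        # a static source is eligible only until its first pass.
        return self.kind is SourceKind.DYNAMIC or not self.processed


def harvest_round(sources):
    """Return the URLs to visit in this round and mark them as processed."""
    visited = []
    for source in sources:
        if source.needs_harvest():
            visited.append(source.url)
            source.processed = True
    return visited
```

Calling `harvest_round` twice on the same list would visit every source the first time and only the dynamic ones the second time.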

source

Range

A source contains zero or more documents.

Domain

One or more sources are contained in a document.

Examples

  • A single file (static repository):
 http://world.std.com/~rjs/indinf56.pdf
  • A closed time-based expression for a digital repository (static repository):
http://www.worldsciencepublisher.org/journals/index.php/AASS/oai?from=2012-01-01T00:00:00Z&until=2013-01-01T00:00:00Z
  • A digital publisher (dynamic repository):
http://oa.upm.es/perl/oai2
  • An RSS feed (dynamic repository):
http://rss.slashdot.org/Slashdot/slashdot
  • A remote directory (dynamic repository):
//192.168.5.125/Public
  • A web page (dynamic repository):
https://en.wikipedia.org/wiki/Artificial_intelligence
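The examples above treat a repository URL with a closed time window as static and the same kind of endpoint without one as dynamic. A rough way to detect that window, assuming it is always expressed with both a `from` and an `until` query parameter as in the example, might look like this. The helper name `has_closed_time_window` is hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def has_closed_time_window(url: str) -> bool:
    """Return True when the URL carries both 'from' and 'until' query parameters."""
    params = parse_qs(urlparse(url).query)
    return "from" in params and "until" in params
```

This check only looks at the shape of the URL. It does not verify that the `until` date lies in the past, which a real classifier would probably also need to consider.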

Currently Supported

  • Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH): http://www.openarchives.org/
  • Really Simple Syndication (RSS): http://www.rssboard.org/rss-specification
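As an illustration of what harvesting an OAI-PMH source involves, the sketch below builds a `ListRecords` request URL. The `verb`, `metadataPrefix`, `from` and `until` arguments and the `oai_dc` prefix come from the OAI-PMH protocol; the function name `list_records_url` and its signature are assumptions made for this example and do not describe how the hoarder module issues its requests.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # The date bounds are optional; leaving out 'until' keeps the window open.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return base_url + "?" + urlencode(params)
```

For instance, `list_records_url("http://oa.upm.es/perl/oai2", from_date="2012-01-01T00:00:00Z")` would produce a request for every record added or changed since the start of 2012.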

Future Integrations

  • CIFS/SMB:
smb://[email protected]/sharename
  • FTP/FTPS:
ftp://[username@]hostname[:port]/directoryname[?options]
  • Dropbox:
https://www.dropbox.com/developers
  • Websites:
http://papers.nips.cc/
  • Elsevier API:
http://dev.elsevier.com/
  • Figshare API:
http://api.figshare.com/docs/intro.html
  • arXiv API:
http://arxiv.org/help/api/index
  • DBLP Corpora:
http://dblp.uni-trier.de/faq/How+can+I+download+the+whole+dblp+dataset