CommonActionForum/liqen-scraper

 
 

Liqen Scrapper 2

Find news articles and extract the relevant information from them.

This project uses

  1. Google Custom Search to search media websites.
  2. Scraping techniques to extract the content of an article.
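
As a rough illustration of the first step: the Google Custom Search JSON API is queried over HTTPS with an API key (`key`), a search engine ID (`cx`) and a query (`q`). The helper `buildSearchUrl` below is hypothetical and not part of this package; it is only a sketch of how such a request URL might be assembled, and the package may build its requests differently.

```javascript
// Hypothetical helper, not part of liqen-scrapper: assembles a request URL
// for the Google Custom Search JSON API.
function buildSearchUrl (term, { apiKey, cx }) {
  const url = new URL('https://www.googleapis.com/customsearch/v1')
  url.searchParams.set('key', apiKey) // Google API key
  url.searchParams.set('cx', cx) // Custom Search engine ID
  url.searchParams.set('q', term) // search term, URL-encoded automatically
  return url.toString()
}

console.log(buildSearchUrl('climate change', {
  apiKey: 'MY_GOOGLE_API_KEY',
  cx: 'MY_CX'
}))
```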

Usage

This package includes two functions that can be used together or separately:

  • googleSearch(term, options) => Promise<Object> to perform a Google Search
  • downloadArticle(uri) => Promise<Object> to parse an article

Examples

Using only googleSearch

const { googleSearch } = require('liqen-scrapper')

const options = {
  apiKey: 'MY_GOOGLE_API_KEY',
  cx: 'MY_CX'
}

googleSearch('climate change', options)
  .then(result => result.items)
  .then(items => items.forEach(item => {
    console.log(item.title)
    console.log(item.link)
  }))

Using only downloadArticle

const { downloadArticle } = require('liqen-scrapper')

downloadArticle('http://cultura.elpais.com/cultura/2017/02/08/actualidad/1486573775_868895.html')
  .then(article => {
    console.log(article.metadata.title)
    console.log(article.body.html.slice(0, 80))
  })

Using both functions together

const { googleSearch, downloadArticle } = require('liqen-scrapper')
const options = {
  apiKey: 'MY_GOOGLE_API_KEY',
  cx: 'MY_CX'
}

googleSearch('climate change', options)
  .then(result => result.items.map(item => item.link))
  // Start one download per link and wait for all of them to finish
  .then(links => Promise.all(links.map(link => downloadArticle(link))))
  .then(articles => articles.map(article => article.body.html))
  .then(bodies => {
    bodies.forEach(body => {
      console.log(body.slice(0, 80))
    })
  })
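
When many links are downloaded at once, a single failed download makes `Promise.all` reject and the other results are lost. One possible way around this is `Promise.allSettled`, which waits for every download and reports each outcome. The helper `downloadAllSettled` below is hypothetical and not part of this package; it assumes only that `download` takes a link and returns a Promise, as `downloadArticle` does.

```javascript
// Hypothetical helper, not part of liqen-scrapper.
// `download` is any function that takes a link and returns a Promise,
// for example downloadArticle.
function downloadAllSettled (links, download) {
  return Promise.allSettled(links.map(link => download(link)))
    .then(results => {
      const articles = []
      const failures = []
      results.forEach((result, i) => {
        if (result.status === 'fulfilled') {
          articles.push(result.value)
        } else {
          // Keep the link next to the error so it can be retried or logged
          failures.push({ link: links[i], reason: result.reason })
        }
      })
      return { articles, failures }
    })
}
```

It could be called as `downloadAllSettled(links, downloadArticle)`, and the resulting `articles` and `failures` arrays handled separately.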

Docs

See the /docs directory for further documentation.

About

A tool to collect news about environmental issues.
