This gem crawls and parses the list of lobbyists that the German Bundestag publishes as a PDF. Our goal is to provide a simple, easy-to-maintain parser.
Add this line to your application's Gemfile:
gem 'lobbyliste'
And then execute:
$ bundle
Or install it yourself as:
$ gem install lobbyliste
NOTE: This gem requires Java to be installed. We use PDFBox for PDF extraction, as it currently seems to be the best alternative.
require 'lobbyliste'
list = Lobbyliste.fetch_and_parse
organisation = list.organisations.first
organisation.name #=> 1219. Deutsche Stiftung für interreligiösen und interkulturellen Dialog e. V.
organisation.people.map {|person| person.name} #=> ["Claudius Groß", "Markus Hoymann", "Thomas M. Schimmel"]
organisation.tags #=> ["Kultur", "Religion"]
organisation.abbreviations #=> []
address = organisation.address
puts address.full_address
# 1219. Deutsche Stiftung für interreligiösen und interkulturellen Dialog e. V.
# Hinter der katholischen Kirche 3
# 10117 Berlin
# Deutschland
# Tel: +4930 51057773
# Fax: +4930 51057785
# Email: [email protected]
# http://www.1219.eu
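The accessors shown above can be combined with ordinary Ruby collection methods. The following is a minimal sketch of filtering organisations by tag. It assumes that `tags` returns an array of strings, as in the example output; the `Organisation` struct, the `organisations_tagged` helper, and the sample names are hypothetical stand-ins so the snippet runs without fetching the PDF.

```ruby
# Hypothetical stand-in for the gem's organisation objects. Only the two
# accessors shown in the usage example (name, tags) are modelled here.
Organisation = Struct.new(:name, :tags)

# Returns the organisations whose tags include the given tag.
# Assumes each organisation responds to #tags with an array of strings.
def organisations_tagged(organisations, tag)
  organisations.select { |organisation| organisation.tags.include?(tag) }
end

# Made-up sample data, not taken from the real list.
sample = [
  Organisation.new("Example Association A", ["Kultur", "Religion"]),
  Organisation.new("Example Association B", ["Energie"])
]

puts organisations_tagged(sample, "Kultur").map(&:name)
```

With the gem itself, the same helper should work on `Lobbyliste.fetch_and_parse.organisations`, provided the objects behave as in the example above.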
You can also use this gem on the command line. It will dump the complete list as JSON.
For example, to create a gzipped JSON file, run:
$ lobbyliste | gzip > lobbyliste.json.gz
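To read such a file back into Ruby, the standard library's `zlib` and `json` are enough. This is a minimal sketch; the `load_gzipped_json` helper name is hypothetical, and the snippet makes no assumption about the structure of the dumped JSON beyond it being valid JSON.

```ruby
require 'json'
require 'zlib'

# Reads a gzip-compressed JSON file and returns the parsed Ruby object.
def load_gzipped_json(path)
  Zlib::GzipReader.open(path) { |gz| JSON.parse(gz.read) }
end

# Only runs if the dump from the command above is present.
if File.exist?('lobbyliste.json.gz')
  data = load_gzipped_json('lobbyliste.json.gz')
  puts data.class
end
```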
- Sebastian Vollnhals (@yetzt) - for his excellent Node-based scraper for the lobbyliste (https://github.com/yetzt/scraper-lobbyliste), from which many lines were reused.
- The Apache PDFBox project - for making working with PDFs a lot easier. The PDFBox jar file is included in the ext folder (along with its license information).
Bug reports and pull requests are welcome on GitHub at https://github.com/FHG-IMW/lobbyliste. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
The gem is available as open source under the terms of the MIT License.