Merge pull request #90 from niryariv/social_poster
Social poster to Facebook and Twitter
niryariv committed Jan 10, 2015
2 parents 874cb28 + 0a7322a commit 1fab235
Showing 22 changed files with 309 additions and 108 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -15,7 +15,7 @@ install:
- pip install Flask==$FLASK

before_script:
- python tools/create_db.py --force -m all
- python scripts/create_db.py --force -m all
- mkdir filecache
- chmod -R 0777 filecache
- python scrape.py -g 30649
28 changes: 23 additions & 5 deletions DEPLOYMENT.md
@@ -6,7 +6,7 @@ you are deploying)
To deploy a server/database for a new municipality, follow these steps:
1. Make sure the GeoJSON map file with the name of the municipality has
been added to the [map repository](http://github.com/niryariv/israel_gushim)
2. Run `fab create_server:holon,"חולון"`. This will add the new gush ids to the tools/gushim.py file, create & configure the new Heroku app / MongoDB, and finally run the scraper to get all municipality's plans.
2. Run `fab create_server:holon,"חולון"`. This will add the new gush ids to the lib/gushim.py file, create & configure the new Heroku app / MongoDB, and finally run the scraper to get all municipality's plans.
3. When the task finishes running, a browser window (or tab) will be open with
the new app's scheduler dashboard. Add a new scheduled task with the
command: `python scrape.py -g all ; python worker.py`. Do not change dyno settings.
@@ -23,6 +23,21 @@ To deploy a new municipality, run: `fab create_client:holon,"חולון"` after
To change client configuration, you can edit `munis.js` manually later on, according to the [Municipality
Index File syntax](http://github.com/niryariv/opentaba-client/blob/master/DEPLOYMENT.md#municipality-index-file).

##Automatic Facebook and Twitter Posting
The server can post a plan's content to a Facebook page and a Twitter feed every time a plan is created or updated, using a running instance of [opentaba-poster](https://github.com/florpor/opentaba-poster).
To enable this feature, environment variables with credentials such as access tokens and consumer keys need to be set on the server.
You can enable Facebook only, Twitter only, or both.

###Environment Variables
####Poster
To enable social posting, the server must be configured to work with an instance of [opentaba-poster](https://github.com/florpor/opentaba-poster).
To do that, make sure the server is defined as a poster on the opentaba-poster app, then set two environment variables:
`POSTER_SERVICE_URL` must be set to the URL of the opentaba-poster app, and `POSTER_ID` to the server's assigned id, e.g.:
```
heroku config:set POSTER_SERVICE_URL="http://poster.service.com/" --app opentaba-server-holon
heroku config:set POSTER_ID="holon_id" --app opentaba-server-holon
```
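
Facebook and Twitter credentials follow the same `heroku config:set` pattern. The variable names below are purely illustrative assumptions (the actual names are not shown in this excerpt and depend on your opentaba-poster setup):
```
# hypothetical variable names, for illustration only
heroku config:set FACEBOOK_ACCESS_TOKEN="<token>" --app opentaba-server-holon
heroku config:set TWITTER_CONSUMER_KEY="<key>" --app opentaba-server-holon
heroku config:set TWITTER_CONSUMER_SECRET="<secret>" --app opentaba-server-holon
```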

##All Fabric Tasks
###Server
+ `fab create_server:muni_name, "display_name"`
@@ -37,9 +52,9 @@ To change client configuration, you can edit `munis.js` manually later on, accor
ignore_errors is set to false by default because if this task fails it most
likely means the app does not exist to begin with.

+ `fab update_gushim_server:muni_name` Update the [tools/gushim.py](tools/gushim.py) file with the
+ `fab update_gushim_server:muni_name` Update the [lib/gushim.py](lib/gushim.py) file with the
gushim of a new municipality or the updated ones of an existing municipality.
This task downloads the gush map file from [israel_gushim](http://github.com/niryariv/israel_gushim), parses its data, and if there are new gushim it updates the [tools/gushim.py](tools/gushim.py) file and the
This task downloads the gush map file from [israel_gushim](http://github.com/niryariv/israel_gushim), parses its data, and if there are new gushim it updates the [lib/gushim.py](lib/gushim.py) file and the
[Tests/functional_tests/test_return_json.py](Tests/functional_tests/test_return_json.py) file (with the new number of gushim), then commits and pushes on the master branch. Note that this task does not deploy
anywhere, and the new gushim data will not exist on active servers until you
deploy changes to them.
@@ -50,10 +65,10 @@ To change client configuration, you can edit `munis.js` manually later on, accor
+ `fab deploy_server_all` Find servers by looking at your `heroku list` and filtering
out the ones that don't match our server name pattern. Run deploy_server task
on each of the discovered servers.
+ `fab create_db:muni_name` Run the [tools/create_db.py](tools/create_db.py) script on the given
+ `fab create_db:muni_name` Run the [scripts/create_db.py](scripts/create_db.py) script on the given
municipality's heroku app. Will only create db for the given municipality's
gushim.
+ `fab update_db:muni_name` Run the [tools/update_db.py](tools/update_db.py) script on the given
+ `fab update_db:muni_name` Run the [scripts/update_db.py](scripts/update_db.py) script on the given
municipality's heroku app. Will only update db for the given municipality's
gushim.
+ `fab scrape:muni_name,<show_output=False|True>` Run the [scrape.py](scrape.py) script on the
@@ -66,6 +81,9 @@ To change client configuration, you can edit `munis.js` manually later on, accor
+ `fab refresh_db:muni_name` Update the DB with new gushim via update_db and run scrape tasks
+ `fab refresh_db_all` Find servers by looking at your `heroku list` and filtering
by our naming pattern. Run the refresh_db task on each one discovered.
+ `fab sync_poster:muni_name,min_date` Run the [scripts/sync_poster.py](scripts/sync_poster.py) script on the given
municipality's heroku app. min_date is the minimum date of plans to post,
and should be of the format: 1/1/2015.
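For example, a usage sketch posting all of Holon's plans published since the start of 2015 (the app name follows the same pattern as above):
```
fab sync_poster:holon,1/1/2015
```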

###Client
+ `fab create_client:muni_name,"display_name"` For client creation, all we need
2 changes: 1 addition & 1 deletion Tests/functional_tests/test_return_json.py
@@ -97,7 +97,7 @@ def test_api_get_plan():
eq_(response.mimetype, 'application/json')

# I don't know the correct number, since it changes with each update, but it should be more than this
assert_true(len(j) >= 19)
assert_true(len(j) >= 17)


def test_api_wakeup():
2 changes: 1 addition & 1 deletion Tests/unit_test/test_scrape.py
@@ -3,7 +3,7 @@
from app import app
from nose.tools import eq_, assert_true
from nose import with_setup
from tools.scrapelib import scrape_gush
from lib.scrapelib import scrape_gush
import os

testapp = app.test_client()
97 changes: 14 additions & 83 deletions app.py
@@ -12,79 +12,14 @@
from flask import Flask
from flask import abort, make_response, request

from tools.conn import *
from tools.gushim import GUSHIM
from tools.cache import cached, _setup_cache
from lib.conn import *
from lib.cache import cached, _setup_cache
import lib.helpers as helpers

app = Flask(__name__)
app.debug = RUNNING_LOCAL # if we're local, keep debug on


#### Helpers ####

def _get_plans(count=1000, query={}):
return list(db.plans.find(query, limit=count).sort(
[("year", pymongo.DESCENDING), ("month", pymongo.DESCENDING), ("day", pymongo.DESCENDING)]))


def _get_gushim(query={}, fields=None):
return list(db.gushim.find(query, fields=fields))


def _create_response_json(data):
"""
Convert dictionary to JSON. json_util.default adds automatic mongoDB result support
"""
r = make_response(json.dumps(data, ensure_ascii=False, default=json_util.default))
r.headers['Access-Control-Allow-Origin'] = "*"
r.headers['Content-Type'] = "application/json; charset=utf-8"
return r


def _create_response_atom_feed(request, plans, feed_title=''):
"""
Create an atom feed of plans fetched from the DB based on an optional query
"""
feed = AtomFeed(feed_title, feed_url=request.url, url=request.url_root)

for p in plans:
url = p['details_link']

# special emphasizing for some statuses
if p['status'] in [u'פרסום ההפקדה', u'פרסום בעיתונות להפקדה']:
status = u'»»%s««' % p['status']
else:
status = p['status']

content = p['essence'] + ' [' + status + ', ' + '%02d/%02d/%04d' % (p['day'], p['month'], p['year']) + \
', ' + p['number'] + ']'
title = p['location_string']
# 'not title' is not supposed to happen anymore because every plan currently has a location
if not title:
title = p['number']

if p['mavat_code'] == '':
links = [{'href' : 'http://www.mavat.moin.gov.il/MavatPS/Forms/SV3.aspx?tid=4&tnumb=' + p['number'], 'rel': 'related', 'title': u'מבא"ת'}]
else:
links = [{'href': '%splan/%s/mavat' % (request.url_root, p['plan_id']), 'rel': 'related', 'title': u'מבא"ת'}]

feed.add(
title=title,
content=content,
content_type='html',
author="OpenTABA.info",
# id=url + '&status=' + p['status'],
# ^^ it seems like the &tblView= value keeps changing in the URL, which causes the ID to change and dlvr.it to republish items.
id="%s-%s" % (title, p['status']),
# this is a unique ID (not real URL) so adding status to ensure uniqueness in TBA stages
url=url,
links=links,
updated=datetime.date(p['year'], p['month'], p['day'])
)

return feed


#### Cache Helper ####

@app.before_first_request
@@ -105,18 +40,14 @@ def get_gushim():
get gush_id metadata
"""
detailed = request.args.get('detailed', '') == 'true'
gushim = _get_gushim(fields={'gush_id': True, 'last_checked_at': True, '_id': False})
gushim = helpers._get_gushim(fields={'gush_id': True, 'last_checked_at': True, '_id': False})
if detailed:
# Flatten list of gushim into a dict
g_flat = dict((g['gush_id'], {"gush_id": g['gush_id'],
"last_checked_at": g['last_checked_at'],
"plan_stats": {}}) for g in gushim)
# Get plan statistics from DB
stats = db.plans.aggregate([
{"$unwind" : "$gushim" },
{"$project": {"gush_id": "$gushim", "status": "$status", "_id": 0}},
{"$group": {"_id": {"gush_id": "$gush_id", "status": "$status"}, "count": {"$sum": 1}}}
])
stats = helpers._get_plan_statistics()

# Merge stats into gushim dict
for g in stats['result']:
@@ -132,7 +63,7 @@ def get_gushim():
# De-flatten our dict
gushim = g_flat.values()

return _create_response_json(gushim)
return helpers._create_response_json(gushim)


@app.route('/gush/<gush_id>.json')
@@ -141,10 +72,10 @@ def get_gush(gush_id):
"""
get gush_id metadata
"""
gush = _get_gushim(query={"gush_id": gush_id})
gush = helpers._get_gushim(query={"gush_id": gush_id})
if gush is None or len(gush) == 0:
abort(404)
return _create_response_json(gush[0])
return helpers._create_response_json(gush[0])


@app.route('/gush/<gushim>/plans.json')
@@ -160,7 +91,7 @@ def get_plans(gushim):
else:
gushim_query = {'gushim': gushim[0]}

return _create_response_json(_get_plans(query=gushim_query))
return helpers._create_response_json(helpers._get_plans(query=gushim_query))


@app.route('/recent.json')
@@ -169,7 +100,7 @@ def get_recent_plans():
"""
Get the 10 most recent plans to show on the site's home page
"""
return _create_response_json(_get_plans(count=10))
return helpers._create_response_json(helpers._get_plans(count=10))


@app.route('/plans.atom')
@@ -180,7 +111,7 @@ def atom_feed():
else:
title = u'תב"ע פתוחה'

return _create_response_atom_feed(request, _get_plans(count=20), feed_title=title).get_response()
return helpers._create_response_atom_feed(request, helpers._get_plans(count=20), feed_title=title).get_response()


@app.route('/gush/<gushim>/plans.atom')
@@ -196,7 +127,7 @@ def atom_feed_gush(gushim):
else:
gushim_query = {'gushim': gushim[0]}

return _create_response_atom_feed(request, _get_plans(query=gushim_query), feed_title=u'תב״ע פתוחה - גוש %s' % ', '.join(gushim)).get_response()
return helpers._create_response_atom_feed(request, helpers._get_plans(query=gushim_query), feed_title=u'תב״ע פתוחה - גוש %s' % ', '.join(gushim)).get_response()


@app.route('/plans/search/<path:plan_name>')
@@ -205,7 +136,7 @@ def find_plan(plan_name):
"""
Find plans that contain the search query and return a json array of their plan and gush ids
"""
return _create_response_json(_get_plans(count=3, query={'number': {'$regex': '.*%s.*' % plan_name}}))
return helpers._create_response_json(helpers._get_plans(count=3, query={'number': {'$regex': '.*%s.*' % plan_name}}))


@app.route('/plan/<plan_id>/mavat')
@@ -246,7 +177,7 @@ def wakeup():
wake up Heroku dyno from idle. perhaps can be removed if >1 dynos are running.
used as endpoint for a "wakeup" request when the client inits
"""
return _create_response_json({'morning': 'good'})
return helpers._create_response_json({'morning': 'good'})


#### MAIN ####
2 changes: 1 addition & 1 deletion fabfile.py
@@ -9,7 +9,7 @@
from scripts.client_fabfile import create_client

from scripts.server_fabfile import create_server, delete_server, update_gushim_server, deploy_server, deploy_server_all, create_db
from scripts.server_fabfile import update_db, scrape, renew_db, renew_db_all, refresh_db, refresh_db_all
from scripts.server_fabfile import update_db, scrape, renew_db, renew_db_all, refresh_db, refresh_db_all, sync_poster


@task
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
116 changes: 116 additions & 0 deletions lib/helpers.py
@@ -0,0 +1,116 @@
# -*- coding: utf-8 -*-

"""
Helpers for our web and worker (scraper) instances
"""

from werkzeug.contrib.atom import AtomFeed
from flask import make_response
import json
from bson import json_util
import datetime
import pymongo

from conn import db


def _get_plans(count=1000, query={}):
return list(db.plans.find(query, limit=count).sort(
[("year", pymongo.DESCENDING), ("month", pymongo.DESCENDING), ("day", pymongo.DESCENDING)]))


def _get_gushim(query={}, fields=None):
return list(db.gushim.find(query, fields=fields))


def _get_plan_statistics():
return db.plans.aggregate([
{"$unwind" : "$gushim" },
{"$project": {"gush_id": "$gushim", "status": "$status", "_id": 0}},
{"$group": {"_id": {"gush_id": "$gush_id", "status": "$status"}, "count": {"$sum": 1}}}
])
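# Note on the return shape (an assumption, consistent with app.py's get_gushim,
# which iterates stats['result']): with pymongo 2.x, aggregate() returns a dict
# of the form {'result': [...], 'ok': 1.0}, where each result document looks like
# {'_id': {'gush_id': '30649', 'status': u'...'}, 'count': 2} - one entry per
# (gush_id, status) pair with the number of matching plans.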


def _create_response_json(data):
"""
Convert dictionary to JSON. json_util.default adds automatic mongoDB result support
"""
r = make_response(json.dumps(data, ensure_ascii=False, default=json_util.default))
r.headers['Access-Control-Allow-Origin'] = "*"
r.headers['Content-Type'] = "application/json; charset=utf-8"
return r


def _create_response_atom_feed(request, plans, feed_title=''):
"""
Create an atom feed of plans fetched from the DB based on an optional query
"""
feed = AtomFeed(feed_title, feed_url=request.url, url=request.url_root)

for p in plans:
formatted = _format_plan(p, request.url_root)

feed.add(
title=formatted['title'],
content=formatted['content'],
content_type='html',
author="OpenTABA.info",
# id=url + '&status=' + p['status'],
# ^^ it seems like the &tblView= value keeps changing in the URL, which causes the ID to change and dlvr.it to republish items.
id="%s-%s" % (formatted['title'], p['status']),
# this is a unique ID (not real URL) so adding status to ensure uniqueness in TBA stages
url=formatted['url'],
links=formatted['links'],
updated=formatted['last_update']
)

return feed


def _format_plan(plan, server_root=None):
"""
Take a plan and format it for atom feed and social networks
"""
formatted_plan = {}

formatted_plan['url'] = plan['details_link']

# special emphasizing for some statuses
if plan['status'] in [u'פרסום ההפקדה', u'פרסום בעיתונות להפקדה']:
formatted_plan['status'] = u'»»%s««' % plan['status']
else:
formatted_plan['status'] = plan['status']

# the plan's content
formatted_plan['content'] = plan['essence'] + ' [' + formatted_plan['status'] + ', ' + \
'%02d/%02d/%04d' % (plan['day'], plan['month'], plan['year']) + ', ' + plan['number'] + ']'

# the title
formatted_plan['title'] = plan['location_string']
# 'not title' is not supposed to happen anymore because every plan currently has a location
if not formatted_plan['title']:
formatted_plan['title'] = plan['number']

# mavat link - if we have a code and the base url for this server (currently only from the atom feed) we can give a direct link
# (through our server). otherwise link to the search page with parameters
if plan['mavat_code'] == '' or server_root is None:
formatted_plan['links'] = [{'href' : 'http://www.mavat.moin.gov.il/MavatPS/Forms/SV3.aspx?tid=4&tnumb=' + plan['number'], 'rel': 'related', 'title': u'מבא"ת'}]
else:
formatted_plan['links'] = [{'href': '%splan/%s/mavat' % (server_root, plan['plan_id']), 'rel': 'related', 'title': u'מבא"ת'}]

# plan last update
formatted_plan['last_update'] = datetime.date(plan['year'], plan['month'], plan['day'])

return formatted_plan


"""
A small class to enable json-serializing of datetime.date objects
To use it: json.dumps(json_object, cls=helpers.DateTimeEncoder)
"""
class DateTimeEncoder(json.JSONEncoder):
def default(self, obj):
if hasattr(obj, 'isoformat'):
return obj.isoformat()
else:
return json.JSONEncoder.default(self, obj)
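
A quick usage sketch for the encoder (a minimal example, assuming the module is importable as lib.helpers):
```
import json
import datetime

import lib.helpers as helpers

# datetime.date exposes isoformat(), so it serializes as an ISO-8601 string
print(json.dumps({'last_update': datetime.date(2015, 1, 10)}, cls=helpers.DateTimeEncoder))
# -> {"last_update": "2015-01-10"}
```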
File renamed without changes.
File renamed without changes.
