Merge pull request #90 from niryariv/social_poster
Social poster to Facebook and Twitter
niryariv committed Jan 10, 2015
2 parents 874cb28 + 0a7322a commit 1fab235
Showing 22 changed files with 309 additions and 108 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -15,7 +15,7 @@ install:
- pip install Flask==$FLASK

before_script:
- python tools/create_db.py --force -m all
- python scripts/create_db.py --force -m all
- mkdir filecache
- chmod -R 0777 filecache
- python scrape.py -g 30649
28 changes: 23 additions & 5 deletions DEPLOYMENT.md
@@ -6,7 +6,7 @@ you are deploying)
To deploy a server/database for a new municipality, follow these steps:
1. Make sure the GeoJSON map file with the name of the municipality has
been added to the [map repository](http://github.com/niryariv/israel_gushim)
2. Run `fab create_server:holon,"חולון"`. This will add the new gush ids to the tools/gushim.py file, create & configure the new Heroku app / MongoDB, and finally run the scraper to get all municipality's plans.
2. Run `fab create_server:holon,"חולון"`. This will add the new gush ids to the lib/gushim.py file, create & configure the new Heroku app / MongoDB, and finally run the scraper to get all municipality's plans.
3. When the task finishes running, a browser window (or tab) will be open with
the new app's scheduler dashboard. Add a new scheduled task with the
command: `python scrape.py -g all ; python worker.py`. Do not change dyno settings.
@@ -23,6 +23,21 @@ To deploy a new municipality, run: `fab create_client:holon,"חולון"` after
To change client configuration, you can edit `munis.js` manually later on, according to the [Municipality
Index File syntax](http://github.com/niryariv/opentaba-client/blob/master/DEPLOYMENT.md#municipality-index-file).

##Automatic Facebook and Twitter Posting
The server can post a plan's content to a Facebook page and a Twitter feed every time a plan is created or updated, using a running instance of [opentaba-poster](https://github.com/florpor/opentaba-poster).
To enable this feature, environment variables with credentials such as access tokens and consumer keys need to be set on the server.
You can enable Facebook only, Twitter only, or both.

###Environment Variables
####Poster
To enable social posting, the server must be configured to work with an instance of [opentaba-poster](https://github.com/florpor/opentaba-poster).
To do that, make sure the server is defined as a poster on the opentaba-poster app, then set two environment variables:
`POSTER_SERVICE_URL` must be set to the URL of the opentaba-poster app, and `POSTER_ID` to the server's assigned id, e.g.:
```
heroku config:set POSTER_SERVICE_URL="http://poster.service.com/" --app opentaba-server-holon
heroku config:set POSTER_ID="holon_id" --app opentaba-server-holon
```
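
Facebook and Twitter credentials follow the same `heroku config:set` pattern. The variable names below are purely illustrative assumptions (the actual names are not shown in this excerpt and depend on your opentaba-poster setup):
```
# hypothetical variable names, for illustration only
heroku config:set FACEBOOK_ACCESS_TOKEN="<token>" --app opentaba-server-holon
heroku config:set TWITTER_CONSUMER_KEY="<key>" --app opentaba-server-holon
heroku config:set TWITTER_CONSUMER_SECRET="<secret>" --app opentaba-server-holon
```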

##All Fabric Tasks
###Server
+ `fab create_server:muni_name, "display_name"`
@@ -37,9 +52,9 @@ To change client configuration, you can edit `munis.js` manually later on, accor
ignore_errors is set to false by default because if this task fails it most
likely means the app does not exist to begin with.

+ `fab update_gushim_server:muni_name` Update the [tools/gushim.py](tools/gushim.py) file with the
+ `fab update_gushim_server:muni_name` Update the [lib/gushim.py](lib/gushim.py) file with the
gushim of a new municipality or the updated ones of an existing municipality.
This task downloads the gush map file from [israel_gushim](http://github.com/niryariv/israel_gushim), parses its data, and if there are new gushim it updates the [tools/gushim.py](tools/gushim.py) file and the
This task downloads the gush map file from [israel_gushim](http://github.com/niryariv/israel_gushim), parses its data, and if there are new gushim it updates the [lib/gushim.py](lib/gushim.py) file and the
[Tests/functional_tests/test_return_json.py](Tests/functional_tests/test_return_json.py) file (with the new number of gushim), then commits and pushes on the master branch. Note that this task does not deploy
anywhere, and the new gushim data will not exist on active servers until you
deploy changes to them.
@@ -50,10 +65,10 @@ To change client configuration, you can edit `munis.js` manually later on, accor
+ `fab deploy_server_all` Find servers by looking at your `heroku list` and filtering
out the ones that don't match our server name pattern. Run deploy_server task
on each of the discovered servers.
+ `fab create_db:muni_name` Run the [tools/create_db.py](tools/create_db.py) script on the given
+ `fab create_db:muni_name` Run the [scripts/create_db.py](scripts/create_db.py) script on the given
municipality's heroku app. Will only create db for the given municipality's
gushim.
+ `fab update_db:muni_name` Run the [tools/update_db.py](tools/update_db.py) script on the given
+ `fab update_db:muni_name` Run the [scripts/update_db.py](scripts/update_db.py) script on the given
municipality's heroku app. Will only update db for the given municipality's
gushim.
+ `fab scrape:muni_name,<show_output=False|True>` Run the [scrape.py](scrape.py) script on the
@@ -66,6 +81,9 @@ To change client configuration, you can edit `munis.js` manually later on, accor
+ `fab refresh_db:muni_name` Update the DB with new gushim via update_db and run scrape tasks
+ `fab refresh_db_all` Find servers by looking at your `heroku list` and filtering
by our naming pattern. Run the refresh_db task on each one discovered.
+ `fab sync_poster:muni_name,min_date` Run the [scripts/sync_poster.py](scripts/sync_poster.py) script on the given
municipality's heroku app. min_date is the minimum date of plans to post,
and should be of the format: 1/1/2015.
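For example, a usage sketch posting all of Holon's plans published since the start of 2015 (the app name follows the same pattern as above):
```
fab sync_poster:holon,1/1/2015
```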

###Client
+ `fab create_client:muni_name,"display_name"` For client creation, all we need
2 changes: 1 addition & 1 deletion Tests/functional_tests/test_return_json.py
@@ -97,7 +97,7 @@ def test_api_get_plan():
eq_(response.mimetype, 'application/json')

# I don't know the correct number, since it changes with each update, but it should be more than this
assert_true(len(j) >= 19)
assert_true(len(j) >= 17)


def test_api_wakeup():
2 changes: 1 addition & 1 deletion Tests/unit_test/test_scrape.py
@@ -3,7 +3,7 @@
from app import app
from nose.tools import eq_, assert_true
from nose import with_setup
from tools.scrapelib import scrape_gush
from lib.scrapelib import scrape_gush
import os

testapp = app.test_client()
97 changes: 14 additions & 83 deletions app.py
@@ -12,79 +12,14 @@
from flask import Flask
from flask import abort, make_response, request

from tools.conn import *
from tools.gushim import GUSHIM
from tools.cache import cached, _setup_cache
from lib.conn import *
from lib.cache import cached, _setup_cache
import lib.helpers as helpers

app = Flask(__name__)
app.debug = RUNNING_LOCAL # if we're local, keep debug on


#### Helpers ####

def _get_plans(count=1000, query={}):
return list(db.plans.find(query, limit=count).sort(
[("year", pymongo.DESCENDING), ("month", pymongo.DESCENDING), ("day", pymongo.DESCENDING)]))


def _get_gushim(query={}, fields=None):
return list(db.gushim.find(query, fields=fields))


def _create_response_json(data):
"""
Convert dictionary to JSON. json_util.default adds automatic mongoDB result support
"""
r = make_response(json.dumps(data, ensure_ascii=False, default=json_util.default))
r.headers['Access-Control-Allow-Origin'] = "*"
r.headers['Content-Type'] = "application/json; charset=utf-8"
return r


def _create_response_atom_feed(request, plans, feed_title=''):
"""
Create an atom feed of plans fetched from the DB based on an optional query
"""
feed = AtomFeed(feed_title, feed_url=request.url, url=request.url_root)

for p in plans:
url = p['details_link']

# special emphasizing for some statuses
if p['status'] in [u'פרסום ההפקדה', u'פרסום בעיתונות להפקדה']:
status = u'»»%s««' % p['status']
else:
status = p['status']

content = p['essence'] + ' [' + status + ', ' + '%02d/%02d/%04d' % (p['day'], p['month'], p['year']) + \
', ' + p['number'] + ']'
title = p['location_string']
# 'not title' is not supposed to happen anymore because every plan currently has a location
if not title:
title = p['number']

if p['mavat_code'] == '':
links = [{'href' : 'http://www.mavat.moin.gov.il/MavatPS/Forms/SV3.aspx?tid=4&tnumb=' + p['number'], 'rel': 'related', 'title': u'מבא"ת'}]
else:
links = [{'href': '%splan/%s/mavat' % (request.url_root, p['plan_id']), 'rel': 'related', 'title': u'מבא"ת'}]

feed.add(
title=title,
content=content,
content_type='html',
author="OpenTABA.info",
# id=url + '&status=' + p['status'],
# ^^ it seems like the &tblView= value keeps changing in the URL, which causes the ID to change and dlvr.it to republish items.
id="%s-%s" % (title, p['status']),
# this is a unique ID (not real URL) so adding status to ensure uniqueness in TBA stages
url=url,
links=links,
updated=datetime.date(p['year'], p['month'], p['day'])
)

return feed


#### Cache Helper ####

@app.before_first_request
@@ -105,18 +40,14 @@ def get_gushim():
get gush_id metadata
"""
detailed = request.args.get('detailed', '') == 'true'
gushim = _get_gushim(fields={'gush_id': True, 'last_checked_at': True, '_id': False})
gushim = helpers._get_gushim(fields={'gush_id': True, 'last_checked_at': True, '_id': False})
if detailed:
# Flatten list of gushim into a dict
g_flat = dict((g['gush_id'], {"gush_id": g['gush_id'],
"last_checked_at": g['last_checked_at'],
"plan_stats": {}}) for g in gushim)
# Get plan statistics from DB
stats = db.plans.aggregate([
{"$unwind" : "$gushim" },
{"$project": {"gush_id": "$gushim", "status": "$status", "_id": 0}},
{"$group": {"_id": {"gush_id": "$gush_id", "status": "$status"}, "count": {"$sum": 1}}}
])
stats = helpers._get_plan_statistics()

# Merge stats into gushim dict
for g in stats['result']:
@@ -132,7 +63,7 @@ def get_gushim():
# De-flatten our dict
gushim = g_flat.values()

return _create_response_json(gushim)
return helpers._create_response_json(gushim)


@app.route('/gush/<gush_id>.json')
@@ -141,10 +72,10 @@ def get_gush(gush_id):
"""
get gush_id metadata
"""
gush = _get_gushim(query={"gush_id": gush_id})
gush = helpers._get_gushim(query={"gush_id": gush_id})
if gush is None or len(gush) == 0:
abort(404)
return _create_response_json(gush[0])
return helpers._create_response_json(gush[0])


@app.route('/gush/<gushim>/plans.json')
@@ -160,7 +91,7 @@ def get_plans(gushim):
else:
gushim_query = {'gushim': gushim[0]}

return _create_response_json(_get_plans(query=gushim_query))
return helpers._create_response_json(helpers._get_plans(query=gushim_query))


@app.route('/recent.json')
@@ -169,7 +100,7 @@ def get_recent_plans():
"""
Get the 10 most recent plans to show on the site's home page
"""
return _create_response_json(_get_plans(count=10))
return helpers._create_response_json(helpers._get_plans(count=10))


@app.route('/plans.atom')
@@ -180,7 +111,7 @@ def atom_feed():
else:
title = u'תב"ע פתוחה'

return _create_response_atom_feed(request, _get_plans(count=20), feed_title=title).get_response()
return helpers._create_response_atom_feed(request, helpers._get_plans(count=20), feed_title=title).get_response()


@app.route('/gush/<gushim>/plans.atom')
@@ -196,7 +127,7 @@ def atom_feed_gush(gushim):
else:
gushim_query = {'gushim': gushim[0]}

return _create_response_atom_feed(request, _get_plans(query=gushim_query), feed_title=u'תב״ע פתוחה - גוש %s' % ', '.join(gushim)).get_response()
return helpers._create_response_atom_feed(request, helpers._get_plans(query=gushim_query), feed_title=u'תב״ע פתוחה - גוש %s' % ', '.join(gushim)).get_response()


@app.route('/plans/search/<path:plan_name>')
@@ -205,7 +136,7 @@ def find_plan(plan_name):
"""
Find plans that contain the search query and return a json array of their plan and gush ids
"""
return _create_response_json(_get_plans(count=3, query={'number': {'$regex': '.*%s.*' % plan_name}}))
return helpers._create_response_json(helpers._get_plans(count=3, query={'number': {'$regex': '.*%s.*' % plan_name}}))


@app.route('/plan/<plan_id>/mavat')
@@ -246,7 +177,7 @@ def wakeup():
wake up Heroku dyno from idle. perhaps can be removed if >1 dynos are running.
used as endpoint for a "wakeup" request when the client inits
"""
return _create_response_json({'morning': 'good'})
return helpers._create_response_json({'morning': 'good'})


#### MAIN ####
2 changes: 1 addition & 1 deletion fabfile.py
@@ -9,7 +9,7 @@
from scripts.client_fabfile import create_client

from scripts.server_fabfile import create_server, delete_server, update_gushim_server, deploy_server, deploy_server_all, create_db
from scripts.server_fabfile import update_db, scrape, renew_db, renew_db_all, refresh_db, refresh_db_all
from scripts.server_fabfile import update_db, scrape, renew_db, renew_db_all, refresh_db, refresh_db_all, sync_poster


@task
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
116 changes: 116 additions & 0 deletions lib/helpers.py
@@ -0,0 +1,116 @@
# -*- coding: utf-8 -*-

"""
Helpers for our web and worker (scraper) instances
"""

from werkzeug.contrib.atom import AtomFeed
from flask import make_response
import json
from bson import json_util
import datetime
import pymongo

from conn import db


def _get_plans(count=1000, query={}):
return list(db.plans.find(query, limit=count).sort(
[("year", pymongo.DESCENDING), ("month", pymongo.DESCENDING), ("day", pymongo.DESCENDING)]))


def _get_gushim(query={}, fields=None):
return list(db.gushim.find(query, fields=fields))


def _get_plan_statistics():
return db.plans.aggregate([
{"$unwind" : "$gushim" },
{"$project": {"gush_id": "$gushim", "status": "$status", "_id": 0}},
{"$group": {"_id": {"gush_id": "$gush_id", "status": "$status"}, "count": {"$sum": 1}}}
])
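# Note on the return shape (an assumption, consistent with app.py's get_gushim,
# which iterates stats['result']): with pymongo 2.x, aggregate() returns a dict
# of the form {'result': [...], 'ok': 1.0}, where each result document looks like
# {'_id': {'gush_id': '30649', 'status': u'...'}, 'count': 2} - one entry per
# (gush_id, status) pair with the number of matching plans.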


def _create_response_json(data):
"""
Convert dictionary to JSON. json_util.default adds automatic mongoDB result support
"""
r = make_response(json.dumps(data, ensure_ascii=False, default=json_util.default))
r.headers['Access-Control-Allow-Origin'] = "*"
r.headers['Content-Type'] = "application/json; charset=utf-8"
return r


def _create_response_atom_feed(request, plans, feed_title=''):
"""
Create an atom feed of plans fetched from the DB based on an optional query
"""
feed = AtomFeed(feed_title, feed_url=request.url, url=request.url_root)

for p in plans:
formatted = _format_plan(p, request.url_root)

feed.add(
title=formatted['title'],
content=formatted['content'],
content_type='html',
author="OpenTABA.info",
# id=url + '&status=' + p['status'],
# ^^ it seems like the &tblView= value keeps changing in the URL, which causes the ID to change and dlvr.it to republish items.
id="%s-%s" % (formatted['title'], p['status']),
# this is a unique ID (not real URL) so adding status to ensure uniqueness in TBA stages
url=formatted['url'],
links=formatted['links'],
updated=formatted['last_update']
)

return feed


def _format_plan(plan, server_root=None):
"""
Take a plan and format it for atom feed and social networks
"""
formatted_plan = {}

formatted_plan['url'] = plan['details_link']

# special emphasizing for some statuses
if plan['status'] in [u'פרסום ההפקדה', u'פרסום בעיתונות להפקדה']:
formatted_plan['status'] = u'»»%s««' % plan['status']
else:
formatted_plan['status'] = plan['status']

# the plan's content
formatted_plan['content'] = plan['essence'] + ' [' + formatted_plan['status'] + ', ' + \
'%02d/%02d/%04d' % (plan['day'], plan['month'], plan['year']) + ', ' + plan['number'] + ']'

# the title
formatted_plan['title'] = plan['location_string']
# 'not title' is not supposed to happen anymore because every plan currently has a location
if not formatted_plan['title']:
formatted_plan['title'] = plan['number']

# mavat link - if we have a code and the base url for this server (currently only from the atom feed) we can give a direct link
# (through our server). otherwise link to the search page with parameters
if plan['mavat_code'] == '' or server_root is None:
formatted_plan['links'] = [{'href' : 'http://www.mavat.moin.gov.il/MavatPS/Forms/SV3.aspx?tid=4&tnumb=' + plan['number'], 'rel': 'related', 'title': u'מבא"ת'}]
else:
formatted_plan['links'] = [{'href': '%splan/%s/mavat' % (server_root, plan['plan_id']), 'rel': 'related', 'title': u'מבא"ת'}]

# plan last update
formatted_plan['last_update'] = datetime.date(plan['year'], plan['month'], plan['day'])

return formatted_plan


"""
A small class to enable json-serializing of datetime.date objects
To use it: json.dumps(json_object, cls=helpers.DateTimeEncoder)
"""
class DateTimeEncoder(json.JSONEncoder):
def default(self, obj):
if hasattr(obj, 'isoformat'):
return obj.isoformat()
else:
return json.JSONEncoder.default(self, obj)
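
A quick usage sketch for the encoder (a minimal example, assuming the module is importable as lib.helpers):
```
import json
import datetime

import lib.helpers as helpers

# datetime.date exposes isoformat(), so it serializes as an ISO-8601 string
print(json.dumps({'last_update': datetime.date(2015, 1, 10)}, cls=helpers.DateTimeEncoder))
# -> {"last_update": "2015-01-10"}
```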
File renamed without changes.
File renamed without changes.
