You'll just need Docker and docker-compose (and Node if you want to use the easy npm scripts).
Start it up with `npm start`.
Then you can get to the admin interface at https://localhost:8443/admin and log in with the dev credentials, which are `fraserdev` and `dev`.
- DATABASE_PASSWORD
- SECRET_KEY
- ADMIN_PASSWORD
- RABBITMQ_DEFAULT_PASS
- DOCKER_USERNAME
- DOCKER_PASSWORD
- GEOLOCATION_API_KEY
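A quick way to check these before starting anything is a small script like the one below. It's only a sketch: the names are copied from the list above, but treating every one of them as mandatory is an assumption, and `missing_vars` is a made-up helper rather than something in the repo.

```python
import os

# Names copied from the list above; whether every one is mandatory is an assumption.
REQUIRED_VARS = [
    "DATABASE_PASSWORD",
    "SECRET_KEY",
    "ADMIN_PASSWORD",
    "RABBITMQ_DEFAULT_PASS",
    "DOCKER_USERNAME",
    "DOCKER_PASSWORD",
    "GEOLOCATION_API_KEY",
]


def missing_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

Calling `missing_vars()` with no arguments checks the current process environment and returns an empty list when everything is set.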
Start it up with `npm start`.
Once Postgres is up, run `npm run migrate` to apply migrations.
Scripts in `./_ops` are for managing the live deployment. These are bash scripts, so you'll need to run them in a bash shell. There's also a package.json for standardizing how scripts are run across projects.
I use Docker Hub for my docker images. To build and push, use `npm run docker-push`.
To deploy it, run `npm run deploy` from the root directory.
To provision a new server:
- Run `DP_SERVER=[whatever] ./_ops/provision.sh`. This upgrades the distribution and adds a user called fraser with sudo permissions
- SSH in to verify you can, and then disable SSH for the root user by editing `/etc/ssh/sshd_config` and changing `PermitRootLogin yes` to `PermitRootLogin no`, then restart SSH with `service ssh restart`
Before you deploy you'll need to point the DNS at it since it needs to be on the domain for letsencrypt to work.
Then it's ready to be deployed with the deploy script.
For performance all properties on models which are computationally expensive to get, or require database joins, are cached permanently. This includes stuff like price history, averages, and metadata around the number of practices in each PHO.
There is a custom command to clear caches which you can run by jumping in the container and doing this:
python manage.py clearcache
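To make the idea concrete, here's a minimal sketch of "cache forever, clear by hand". It is not the project's code: `cached_forever`, `clear_cache`, the in-memory dict and the stand-in `Practice` class are all made up for illustration, and the real app presumably goes through Django's cache rather than a module-level dict.

```python
import functools

# Module-level dict standing in for a persistent cache backend (assumption).
_CACHE = {}


def cached_forever(func):
    """Cache a method's result per (class name, pk, method name) with no expiry."""
    @functools.wraps(func)
    def wrapper(self):
        key = (type(self).__name__, self.pk, func.__name__)
        if key not in _CACHE:
            _CACHE[key] = func(self)
        return _CACHE[key]
    return wrapper


def clear_cache():
    """Drop every cached value, roughly what a clearcache command would do."""
    _CACHE.clear()


class Practice:  # hypothetical stand-in, not the project's model
    def __init__(self, pk, prices):
        self.pk = pk
        self.prices = prices
        self.computations = 0  # counts how often the expensive path runs

    @cached_forever
    def average_price(self):
        self.computations += 1
        return sum(self.prices) / len(self.prices)
```

The catch with a permanent cache is the same as in the real app: values only change after the cache is cleared, so anything that edits the underlying data needs to be followed by a clear.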
To trigger a backup on the remote of the live data, run `npm run backup-live`.
To get the latest backup you just made so you can use it locally, run `npm run backup-get`.
To restore the latest backup on the remote, put a backup file in `~/docker-services/doctorpricer/restore` then run `npm run restore-live`.
To back up, run `npm run backup-dev` and it'll back up to `./backups` in a file called backup.
To restore, run `npm run restore-dev` and it'll restore `./backups/backup`. You'll need to run `npm start` after to get the dev user back.
- Navigate to https://localhost:8443/admin and log in with your credentials. In dev this is `fraserdev` and `dev`.
- Put data.json into `_manual`
- Create a PHO with the module name `_manual`
- Run `scrape` on `_manual`, this will create PHOs too
- Run `submit` on the new PHOs
Each scraper needs a file called scraper.py in the folder named after the PHO.
- Start by importing the global module: `from scrapers import common as scrapers`
- Define a function called `scrape(name)`
- Instantiate a scraper object in the scrape function like this: `scraper = scrapers.Scraper(name)`
- Use `scrapers.openAndSoup(url)` to open a url and turn it into something parseable
- When you've found a practice use `scraper.newPractice(name, url, PHO, restriction)` and add more details like this: `scraper.practice['phone'] = '5555'`
- When you've completed the practice use `scraper.finishPractice()` to finish with it
- When you've completed all practices `return scraper.finish()`
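Put together, the steps above look roughly like this. So that it runs outside the repo, the sketch uses a tiny `StubScraper` in place of `scrapers.Scraper`; only the method names come from the steps above, while the stub's internals, the return value of `finish()` and the hard-coded row are assumptions.

```python
class StubScraper:
    """Stand-in for scrapers.Scraper; the internals here are guesses."""

    def __init__(self, name):
        self.name = name
        self.practice = None
        self.practices = []

    def newPractice(self, name, url, pho, restriction):
        self.practice = {"name": name, "url": url, "pho": pho, "restriction": restriction}

    def finishPractice(self):
        self.practices.append(self.practice)
        self.practice = None

    def finish(self):
        return self.practices


def scrape(name):
    # In the project this would be scrapers.Scraper(name).
    scraper = StubScraper(name)
    # In a real scraper these rows would be parsed from scrapers.openAndSoup(url).
    rows = [("Example Medical Centre", "https://example.com/emc", "5555")]
    for practice_name, url, phone in rows:
        scraper.newPractice(practice_name, url, name, "")
        scraper.practice["phone"] = phone
        scraper.practice["prices"] = [{"age": 0, "price": 0}, {"age": 13, "price": 20}]
        scraper.finishPractice()
    return scraper.finish()
```

The important part is the order: one `newPractice` / `finishPractice` pair per practice, with extra details set on `scraper.practice` in between, and a single `finish()` at the end.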
The pricing object should be under `scraper.practice['prices']` and should look a bit like this:
[
{"age": 0, "price": 0},
{"age": 13, "price": 20},
]
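One way to read that list is as age brackets, where each entry applies from its `age` upward until the next entry takes over. That reading is an assumption, and `price_for_age` below is a hypothetical helper to show it, not a function from the project.

```python
def price_for_age(prices, age):
    """Return the price of the bracket with the highest starting age not above `age`."""
    eligible = [bracket for bracket in prices if bracket["age"] <= age]
    if not eligible:
        return None  # no bracket starts at or below this age
    return max(eligible, key=lambda bracket: bracket["age"])["price"]


prices = [
    {"age": 0, "price": 0},
    {"age": 13, "price": 20},
]
```

Under that reading a 5-year-old falls in the first bracket and anyone 13 or over falls in the second.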
If you don't provide a latitude and longitude, it will geolocate the practice based on the address. If you don't supply an address, it will try to geolocate based on the name.
If you don't provide a phone number it will default to "None supplied".
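The fallback order can be sketched like this. Both functions are hypothetical and only mirror the rules described above; the real geolocation call is not shown here.

```python
def geolocation_source(practice):
    """Say what would be used to place the practice: coordinates, address or name."""
    if practice.get("lat") is not None and practice.get("lng") is not None:
        return ("coordinates", (practice["lat"], practice["lng"]))
    if practice.get("address"):
        return ("address", practice["address"])
    return ("name", practice["name"])


def phone_or_default(practice):
    """Fall back to the placeholder when no phone number was scraped."""
    return practice.get("phone") or "None supplied"
```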
Other helpers: `scraper.addError()`, `scraper.addWarning()`, `scraper.setLatLng([0, 0])`
I made a docker image for testing scrapers in Docker so we don't have to mess up our user environment.
To do this, run `npm run test [scraper]`. It'll run the scraper and write the output to scrapers/data.json for your perusal.
It turns out that practices will change their names quite a lot online for no good reason, meaning we end up with lots of duplicated practices.
To combat this I made a cleaning method which can be accessed from the homepage of the scrapers UI and does this:
- Search for practices with addresses similar to other addresses (ie. 39 Something Road, Suburb, Auckland and 39 Something Road, Auckland)
- Search for practices within 10m of each other
- Disable the duplicates following this algorithm:
  - If one is newer, keep that and disable the others
  - If they're all the initial date (ie. haven't been touched since we added creation dates) disable the ones with the smallest IDs (these are presumably older)
We don't delete anything because then we'd lose price history, so we just disable them which means they won't show up anywhere.
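Here's a rough sketch of that cleaning logic. It is an illustration under assumptions, not the method behind the scrapers UI: the `Practice` dataclass and its field names are made up, distance uses the haversine formula, and "similar address" is reduced to "same street part and same last part", which covers the example above but is surely cruder than the real check.

```python
import math
from dataclasses import dataclass
from datetime import date

EARTH_RADIUS_M = 6371000


@dataclass
class Practice:  # hypothetical stand-in, field names are assumptions
    id: int
    address: str
    lat: float
    lng: float
    updated: date
    disabled: bool = False


def distance_m(a, b):
    """Great-circle (haversine) distance between two practices in metres."""
    lat1, lat2 = math.radians(a.lat), math.radians(b.lat)
    dlat = lat2 - lat1
    dlng = math.radians(b.lng - a.lng)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def similar_address(a, b):
    """Assumed heuristic: same first part (street) and same last part (city)."""
    parts_a = [part.strip().lower() for part in a.address.split(",")]
    parts_b = [part.strip().lower() for part in b.address.split(",")]
    return parts_a[0] == parts_b[0] and parts_a[-1] == parts_b[-1]


def is_duplicate(a, b, max_distance_m=10):
    return similar_address(a, b) or distance_m(a, b) <= max_distance_m


def disable_duplicates(group):
    """Keep the newest practice (largest ID on a date tie) and disable the rest."""
    keeper = max(group, key=lambda practice: (practice.updated, practice.id))
    for practice in group:
        practice.disabled = practice is not keeper
    return keeper
```

Sorting on `(updated, id)` covers both rules at once: a newer date wins outright, and when every date is the same the largest ID is kept, so the smallest IDs are the ones disabled.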
If you get:
OperationalError: could not access file "$libdir/postgis-X.X
on a new deploy, that means the postgis image has updated its version of PostGIS. To fix it, run the built-in update-postgis script:
docker exec doctorpricer-postgres_1 update-postgis.s
If this gives you some annoying error about there not being a root role then you'll have to run its commands manually.
Get in the db with:
psql --user=postgres --dbname="postgres"
Then run the ALTER commands from the script: https://github.com/appropriate/docker-postgis/blob/master/update-postgis.sh
If you get some bullshit about the migration PKEY being wrong then I guess the migrations table got messed up somehow. Try reindexing it after logging into psql:
REINDEX TABLE django_migrations;
If you have an issue with old data needing to be altered for a schema change, you can get into a db shell with
python manage.py dbshell
Then run a query like:
UPDATE dp_server_prices SET csc = False;
If the backup can't back up because of permissions, make sure the directory outside Docker is owned by the same UID as the user inside the container. For Django this is www-data.
sudo chown www-data:www-data backups
If you can't restore a backup because of index not existing errors... Just edit the .psql file and replace `DROP INDEX` with `DROP INDEX IF EXISTS` (this is weird, idk why it does that)
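If hand-editing gets old, the same replacement can be scripted. This is a sketch with a made-up helper name; it only touches `DROP INDEX` statements that don't already have `IF EXISTS`, so running it twice is harmless.

```python
import re

# Match DROP INDEX unless it is already followed by IF EXISTS.
_DROP_INDEX = re.compile(r"\bDROP INDEX\b(?!\s+IF\s+EXISTS\b)")


def make_drop_index_safe(sql):
    """Rewrite every bare DROP INDEX in a dump to DROP INDEX IF EXISTS."""
    return _DROP_INDEX.sub("DROP INDEX IF EXISTS", sql)
```

To use it, read the backup file into a string, pass it through `make_drop_index_safe`, and write the result back before restoring.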
Each Scraper needs a PHO object.
- name
- module
- website
- region
Scrapers make these when they scrape. Each is associated with a PHO.
- name
- address
- pho
- url
- lat
- lng
- phone
- restriction
- place_id
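As a rough picture of how the two objects above relate, here they are as plain dataclasses. The field names come from the lists above; the types, the defaults and the use of dataclasses instead of the project's actual Django models are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PHO:
    name: str
    module: str  # name of the scraper folder, e.g. "_manual"
    website: str
    region: str


@dataclass
class Practice:
    name: str
    address: str
    pho: PHO  # each practice is associated with one PHO
    url: str
    lat: Optional[float] = None
    lng: Optional[float] = None
    phone: str = "None supplied"
    restriction: str = ""
    place_id: Optional[str] = None
```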
A log is made each time a scraper is run. These are displayed under each PHO/scraper at data.doctorpricer.co.nz.
Each practice has prices associated with it. These are separate objects so that they are easy to query and update in the database.