forked from splurge/splurge

Scholars Portal Library Usage-Based Recommendation Generation Engine

dileshni/splurge


SPLURGE: Scholars Portal Library Usage-Based Recommendation Generation Engine

Amazon.ca has a "customers who bought this item also bought" feature that recommends things to you that you might be interested in. LibraryThing has it too: the recommendations for What's Bred in the Bone by Robertson Davies include books by Margaret Laurence, Carol Shields, Michael Ondaatje, Peter Ackroyd, John Fowles, and David Lodge, as well as other Davies works.

Library catalogues don't have any such feature, but they should. And libraries are sitting on the circulation and usage data that makes it possible. (BiblioCommons does have a Similar Titles feature, but it's a closed commercial product aimed at public libraries, and anyway the titles are added by hand.)

SPLURGE will collect usage data from OCUL members and build a recommendation engine that can be integrated into any member's catalogue. The code will be made available under the GNU General Public License and the data will be made available under an open data license.

Installation

Here is everything necessary to get SPLURGE running on your own computer.

If you wish to skip creating your database and web server, you can use the manage.splurge.ipy.sh tool. Be sure to run git clone --recursive so that the ipython submodule is included.

Using the manage script (tested on *buntu):

. ./set_password.sh
./manage.splurge.ipy.sh
app.install()
https://splurge.localhost/splurge_service/

Another option is to manually download the packages and configure the services.

(As of 1 October 2012: Incomplete.)

Required packages

You will need to install some software packages to use SPLURGE. (Ubuntu commands in brackets.)

  • Git for version control (sudo apt-get install git)
  • PostgreSQL (sudo apt-get install postgresql)
  • PostgreSQL development libraries (sudo apt-get install libpq-dev)
  • Python (sudo apt-get install python)
  • Python development libraries (sudo apt-get install python-dev)
  • pip (sudo apt-get install python-pip)
  • psycopg2, Python module for talking to PostgreSQL (sudo pip install psycopg2)
  • flask, a simple Python framework for web applications (sudo pip install flask)

Setting up PostgreSQL

First, install it. For example, on Ubuntu (see the Ubuntu documentation on PostgreSQL for full details, or your own operating system's documentation if you use something else):

$ sudo apt-get install postgresql

Set a password:

$ sudo -u postgres psql postgres
postgres=# \password postgres

At the shell, create the splurge_user account and the splurge database:

$ sudo -u postgres createuser
Enter name of role to add: splurge_user
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
$ sudo -u postgres createdb splurge
$ sudo -u postgres psql
psql (9.1.5)
Type "help" for help.

postgres=# alter user splurge_user with encrypted password 'splurge';
ALTER ROLE
postgres=# grant all privileges on database splurge to splurge_user;
GRANT

Note that the password is set to "splurge". For your local testing, this is fine. In production this should be different.

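Now set up the database. The schema file is part of the SPLURGE repository, so run this from the top of your checkout (see Download SPLURGE below):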

$ psql -d splurge -U splurge_user -W < app/db/schema_dump.sql

If you get an error like this:

psql: FATAL:  Peer authentication failed for user "splurge_user"

then follow the instructions at Get Postgres working on Ubuntu or Linux Mint.

If you ever need to reset the splurge database back to zero and start over, run the psql schema-loading command above again.

Later, if you want to dump out the database, run

$ pg_dump -U splurge_user splurge

Set the SPLURGE_DB_PASSWORD environment variable

The password for splurge_user needs to be set in the SPLURGE_DB_PASSWORD environment variable before running anything more. (Keeping the password out of the source files makes it easier to share code.) Before going on, run this (if you use bash) but, of course, use whatever password you set:

$ export SPLURGE_DB_PASSWORD=splurge

You could add this line to a login file such as .bashrc.

Test that it was set properly by running

$ echo $SPLURGE_DB_PASSWORD
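
For illustration, here is a minimal sketch of how a Python script could pick the password up from the environment and fail early if it is missing. The function name splurge_dsn is hypothetical, and how tool.py actually reads the variable is not shown here. The sketch assumes the password contains no spaces or quotes, since a libpq connection string would need those escaped.

```python
import os


def splurge_dsn(dbname="splurge", user="splurge_user", host=None):
    """Build a libpq-style connection string using SPLURGE_DB_PASSWORD.

    Hypothetical helper: the names and defaults mirror the setup steps
    in this README, not necessarily what the SPLURGE code does.
    """
    password = os.environ.get("SPLURGE_DB_PASSWORD")
    if not password:
        # Fail with a clear message instead of a confusing database error.
        raise RuntimeError(
            "SPLURGE_DB_PASSWORD is not set; run "
            "'export SPLURGE_DB_PASSWORD=...' first"
        )
    parts = [
        "dbname={0}".format(dbname),
        "user={0}".format(user),
        "password={0}".format(password),
    ]
    if host:
        # Naming a host makes the client connect over TCP.
        parts.append("host={0}".format(host))
    return " ".join(parts)
```

A string of this form can be passed to psycopg2.connect(), for example psycopg2.connect(splurge_dsn()).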

Download SPLURGE

Download SPLURGE from https://github.com/splurge/splurge.git:

$ git clone https://github.com/splurge/splurge.git
$ cd splurge

Loading in data

This assumes that you're a developer and have downloaded all of the data files from Scholars Portal into the app/splurge/data/ directory.

TODO: Add test data to the repo so this works out of the box for non-developers.

Run this:

$ cd app/splurge
$ ./tool.py --update_database

This will take a while.

Test it:

$ ./tool.py --test

Running the web service

$ ./tool.py --little_server

Then go to http://localhost:3000/static/index.html and try it out.

The service is running at http://localhost:3000/splurge_service/

Test ISBNs

  • 0321643720
  • 9780321643728
  • 0273713248
  • 9780273713241
  • 0763766321
  • 9780763766320
  • 0176501657
  • 9780176501655
  • 9780538733410
  • 9781412974882
  • 0773502424
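
The first eight entries above look like four ISBN-10/ISBN-13 pairs for the same books. As a sketch of how the two forms relate, the following uses the standard check-digit rules to validate an ISBN-10 and convert it to ISBN-13. The function names are illustrative and are not taken from the SPLURGE code.

```python
def isbn10_is_valid(isbn10):
    """Check an ISBN-10: weights 10 down to 1, sum divisible by 11."""
    if len(isbn10) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn10):
        if char in "Xx" and position == 9:
            value = 10  # 'X' stands for 10, only allowed as check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn10_to_isbn13(isbn10):
    """Prefix 978, drop the old check digit, and compute the new one."""
    if not isbn10_is_valid(isbn10):
        raise ValueError("not a valid ISBN-10: %r" % (isbn10,))
    body = "978" + isbn10[:9]
    # ISBN-13 weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


if __name__ == "__main__":
    for isbn in ("0321643720", "0273713248", "0763766321", "0176501657"):
        print(isbn + " -> " + isbn10_to_isbn13(isbn))
```

Each of the four ISBN-10s converts to the ISBN-13 listed directly after it, so a lookup by either form should refer to the same edition.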

TO DO

  • Put this under a GPLv2 license (with "or later") (discuss)
  • Figure out how best to handle new data uploads, and automate that process so that when new files are uploaded they are automatically loaded.
  • Use xISBN and thingISBN: given book X, look up other manifestations of the same work, then look for and dedupe recommendations for all of them. Instead of offering recommendations based on one edition of a work, it would offer them based on the work.
  • Go beyond ISBNs into other standard numbers, such as LCCN and OCLCnums
  • Go beyond standard numbers!
  • Use for collection development purposes: give collection librarians a way of looking up what's recommended for a given book and seeing if it's in their collection. (Talk to collection librarians about what exactly they'd want.)
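
As a rough sketch of the work-level idea and the collection development idea in the list above: given the ISBNs of every edition of a work (which in practice would come from a service such as xISBN or thingISBN), merge the per-edition recommendations and drop duplicates, then report which recommended items a library does not hold. Both function names and the sample identifiers are hypothetical, and nothing here reflects existing SPLURGE code.

```python
def recommendations_for_work(editions, recs_by_isbn):
    """Merge recommendations across all editions of one work.

    editions: ISBNs of every known manifestation of the work.
    recs_by_isbn: dict mapping an ISBN to a list of recommended ISBNs.
    Returns recommendations in first-seen order, without duplicates
    and without the work's own editions.
    """
    own = set(editions)
    seen = set()
    merged = []
    for isbn in editions:
        for rec in recs_by_isbn.get(isbn, []):
            if rec in own or rec in seen:
                continue
            seen.add(rec)
            merged.append(rec)
    return merged


def not_held(recommended, holdings):
    """Return the recommended ISBNs that are missing from a collection."""
    held = set(holdings)
    return [isbn for isbn in recommended if isbn not in held]


if __name__ == "__main__":
    # Placeholder identifiers, invented for this example.
    recs = {
        "work-hardcover": ["rec-a", "rec-b", "work-paperback"],
        "work-paperback": ["rec-b", "rec-c"],
    }
    merged = recommendations_for_work(["work-hardcover", "work-paperback"], recs)
    print(merged)
    print(not_held(merged, ["rec-b"]))
```

This only dedupes exact ISBN matches. Collapsing the recommended items themselves to the work level would need the same edition lookup applied to each recommendation.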

Background

We plan to implement in Ontario something close to the JISC project called MOSAIC (Making Our Shared Activity Information Count). The documents there describe what they did, and our plan is based on that.

The JISC MOSAIC wiki has code and data examples.

The JISC project grew out of work done by Dave Pattern (Library Systems Manager) and others at the University of Huddersfield. They made usage data available under an Open Data Commons License.

Updated 13 Feb: The SALT Recommender API is doing what we want to do, and JISC's planned SALT 2 project takes a consortial approach like the one OCUL would take.

Pattern's Sliding Down the Long Tail describes the logic we'll need to follow.

Tim Spalding implemented a similar feature at LibraryThing. When asked on Twitter how it worked, he said "The best code is just statistics" and "Given random distribution how many of book X would you expect? How many did you find?".

In conversation, both Pattern and Spalding mentioned the Harry Potter effect: some books are so popular with everyone that they need to be damped down. Everyone reading Freud or Ferlinghetti, Feynman or Foucault, is probably also reading J.K. Rowling, but that doesn't mean Harry Potter and the Goblet of Fire should be recommended to people looking at Totem and Taboo or Madness and Civilization.
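
One way to read Spalding's "how many would you expect, how many did you find" is as a ratio of observed to expected co-loans. The sketch below is an illustration of that reading only, not Spalding's or Pattern's actual method. The function name and the loan data are invented for the example. If loans were random, the expected number of patrons borrowing both books is (patrons who borrowed the target) × (patrons who borrowed the other book) / (all patrons).

```python
from collections import Counter


def score_related(target, loans_by_patron):
    """Rank books borrowed alongside `target` by observed / expected co-loans."""
    total = len(loans_by_patron)
    borrowed = Counter()  # number of patrons who borrowed each book
    together = Counter()  # number who borrowed the book and the target
    for books in loans_by_patron:
        books = set(books)
        borrowed.update(books)
        if target in books:
            together.update(books - {target})
    scores = []
    for book, found in together.items():
        # Co-loans expected if borrowing the two books were independent.
        expected = borrowed[target] * borrowed[book] / float(total)
        scores.append((book, found / expected))
    # Highest ratio first; ties broken alphabetically.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Invented loan histories, one set of titles per patron.
loans = [
    {"Totem and Taboo", "Madness and Civilization", "Harry Potter"},
    {"Totem and Taboo", "Madness and Civilization", "Harry Potter"},
    {"Totem and Taboo", "Harry Potter"},
    {"Cooking", "Harry Potter"},
    {"Cooking", "Harry Potter"},
    {"Gardening", "Harry Potter"},
]

if __name__ == "__main__":
    for title, ratio in score_related("Totem and Taboo", loans):
        print("%s: %.1f" % (title, ratio))
```

With this data, a raw co-loan count would put Harry Potter first (3 patrons against 2 for Madness and Civilization). The ratio reverses that: Madness and Civilization scores 2.0, while Harry Potter scores 1.0, meaning it turns up no more often than chance predicts because everyone borrowed it. That is one simple way to damp the Harry Potter effect.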

Related reading
