
An open source Literature Based Discovery system, using Groovy & Grails, including a re-implementation of the Arrowsmith algorithm(s)



fogbeam/Valmont-F


Overview

Valmont/F - A re-implementation of the classic Arrowsmith Literature Based Discovery system, using Groovy and Grails.

This is a learning platform for anyone interested in Literature Based Discovery (LBD) who wants to play and experiment. Our first goal is to re-implement the classic algorithm(s) that Don Swanson developed, and then start looking for new and interesting variations and additions.

As such, one of our primary goals here is to make this as modular and pluggable as possible, in order to facilitate experimenting with novel approaches to Literature Based Discovery.

This is very much a Work In Progress at the moment, featuring fairly naive implementations of the "procedure one" and "procedure two" approaches from Swanson's original A-B-C discovery model.
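To make the two procedures concrete, here is a minimal Python sketch over an in-memory list of article titles. It is not Valmont/F's Groovy code: the function names (`terms`, `literature`, `procedure_one`, `procedure_two`), the tiny stopword set, and the toy corpus (invented titles loosely modelled on Swanson's well-known fish oil / Raynaud's example) are all hypothetical. Procedure one (open discovery) starts from an A term, collects the B terms that co-occur with it, and looks for C terms that co-occur with those B terms but never with A. Procedure two (closed discovery) takes both A and C and returns the B terms their literatures share.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set used only for this sketch.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def terms(title: str) -> set[str]:
    """Lower-cased word tokens of a title, with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS}


def literature(corpus: list[str], term: str) -> list[str]:
    """Titles containing `term`; a stand-in for querying a real archive."""
    return [t for t in corpus if term in terms(t)]


def procedure_one(corpus: list[str], a_term: str) -> Counter[str]:
    """Open discovery: score C terms linked to A only through B terms."""
    a_lit = literature(corpus, a_term)
    a_vocab = set().union(*map(terms, a_lit))  # A itself plus every B term
    b_terms = a_vocab - {a_term}
    c_scores: Counter[str] = Counter()
    for b in b_terms:
        for title in literature(corpus, b):
            words = terms(title)
            if a_term in words:
                continue  # title already belongs to the A literature
            # Count each (B term, title) link supporting a candidate C term.
            c_scores.update(words - a_vocab)
    return c_scores


def procedure_two(corpus: list[str], a_term: str, c_term: str) -> set[str]:
    """Closed discovery: candidate B terms shared by the A and C literatures."""
    a_vocab = set().union(*map(terms, literature(corpus, a_term)))
    c_vocab = set().union(*map(terms, literature(corpus, c_term)))
    return (a_vocab & c_vocab) - {a_term, c_term}


corpus = [
    "Raynaud disease and blood viscosity",
    "Raynaud disease with platelet aggregation",
    "Fish oil reduces blood viscosity",
    "Fish oil inhibits platelet aggregation",
]
print(sorted(procedure_two(corpus, "raynaud", "fish")))
# ['aggregation', 'blood', 'platelet', 'viscosity']
print(procedure_one(corpus, "raynaud")["fish"])
# 4
```

Scoring C terms by the number of B-term links that support them is just a simple heuristic chosen for this sketch; it is not necessarily how Valmont/F ranks its results.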

About

When I say I am called Valmont/F, the name will convey no impression to the reader, one way or another. My occupation is that of an open source Literature Based Discovery system on GitHub. If you ask anyone who Valmont was, she will likely point you to http://www.gutenberg.org/files/19369/19369-h/19369-h.htm. If you ask her why I am named Valmont/F, she will surely say that I am named after the world-famous detective, Eugène Valmont.

Deployment

There are two main ways to deploy Valmont/F at the moment. The first, and easiest, is to use our public Docker image(s), located at https://hub.docker.com/r/fogbeam/valmont-f

A simple "docker pull fogbeam/valmont-f:latest" followed by "docker run -d -p 8080:8080 fogbeam/valmont-f:latest" should yield a running Valmont/F instance. The container exposes port 8080; change the host side of the -p argument (the first number) to map it to whatever port makes sense on your Docker host. The webapp runs on the root context.

The second way is to clone this Git repo, install Java and Grails (if you don't already have them), and then run ./run_valmont.sh from the root of the cloned repo directory. Take this approach if you want to hack on the code yourself. Running Valmont/F this way requires Grails 3.3.6.

Reference Material

License

Original code provided by Fogbeam Labs is licensed under the Apache License v2. Data files and supporting libraries may be under separate licenses. See LICENSE file for more details.

TODO

  • Create a Docker image and push to Docker Hub - DONE
  • add more terms to the clinical-stopwords list
  • add a "domain selector" to toggle what archive is queried and what stopword list(s) are employed
  • better tokenization of abstracts and titles, so that, for example, 'Start' and 'Start.' are not treated as different tokens and each generated as a 'b term'
  • use NLP, deep learning, etc. to do deeper semantic analysis of article text to find more meaningful connections than simple co-occurrence of words
  • improve code structure to create reusable components that simplify implementing new algorithms and approaches
  • add input validation to existing controllers
  • figure out a UI experience for "drilling down" further into the results we currently return, especially for "Procedure One"
  • support more complex relationships, especially "multi-hop" ones that involve more than two concepts
  • Add visualizations to help navigate / explore results. Maybe use dot / graphviz
  • Add caching to reduce the need for downloading documents all the time
  • Add ability to filter the initial query(ies) by date range
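Two of the items above, better tokenization and selectable stopword lists, can be illustrated together. The Python sketch below is an assumption, not the project's code: the stopword entries are hypothetical examples (the project's actual clinical-stopwords list is not reproduced), and `tokenize` is an illustrative name. Lower-casing and matching only runs of letters and digits means 'Start' and 'Start.' both become `start`; passing stopword lists as separate arguments is one possible shape for a "domain selector" that chooses which lists apply.

```python
from __future__ import annotations

import re

# Hypothetical example entries; the project's actual clinical-stopwords list
# is not reproduced here.
GENERAL_STOPWORDS = frozenset({"a", "an", "and", "in", "of", "the", "to", "with"})
CLINICAL_STOPWORDS = frozenset({"case", "patient", "patients", "report", "study"})

# A token is a run of letters/digits, optionally joined by internal hyphens or
# apostrophes, so trailing punctuation such as the "." in "Start." is dropped.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def tokenize(text: str, *stoplists: frozenset[str]) -> list[str]:
    """Lower-case, strip punctuation, and drop stopwords and bare numbers."""
    stop = frozenset().union(*stoplists)
    return [
        tok
        for tok in TOKEN_RE.findall(text.lower())
        if tok not in stop and not tok.isdigit()
    ]


title = "Start. The start of fish-oil therapy in 12 patients"
print(tokenize(title, GENERAL_STOPWORDS, CLINICAL_STOPWORDS))
# ['start', 'start', 'fish-oil', 'therapy']
```

Dropping bare numbers is a design choice made for this sketch; dosages or years might be worth keeping in some domains.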

Resources and Stuff for Future Experimentation