The History Atlas is a web application that stores scholarly citations correlating historical people with a time and place, and presents an interactive map searchable by time period, geographic area, or person.
The project was my primary focus during my batch at the Recurse Center, a three-month self-directed retreat for programmers in NYC, from April to June 2021.
I've been building the History Atlas with a CQRS/Event-Sourcing architecture in mind. The goal is to have a flexible, extensible base infrastructure that will be easy to maintain and scale, and will allow for significant changes in the frontend data requirements without needing to make changes to the primary database.
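For readers unfamiliar with the pattern, the core idea is that commands express intent and produce immutable events, which are the only things ever written to the canonical store; all other state is derived from them. A minimal sketch in Python (the names and fields here are illustrative, not the actual message contracts):

```python
# Illustrative sketch of the command -> event flow; these are not the
# History Atlas's real message shapes.
from dataclasses import dataclass
from datetime import datetime, timezone
import uuid

@dataclass(frozen=True)
class AddCitationCommand:
    """A request to change state: validated, but never stored directly."""
    text: str
    person: str

@dataclass(frozen=True)
class CitationAdded:
    """An immutable fact, appended to the event store and never mutated."""
    citation_id: str
    text: str
    person: str
    timestamp: str

def handle(command: AddCitationCommand) -> CitationAdded:
    # Validation would happen here; on success, emit the event.
    return CitationAdded(
        citation_id=str(uuid.uuid4()),
        text=command.text,
        person=command.person,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```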
The project is built as a series of microservices which communicate asynchronously via RabbitMQ. Details on message contracts between services are found in each application's documentation.
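Under the hood this is standard AMQP. The services use the shared PyBroker library, but as a rough illustration of the transport layer (with made-up exchange and routing-key names), publishing an event with the pika library looks like this:

```python
# Illustrative only: the exchange, routing key, and message shape are
# invented for this example; the real contracts live in each service's docs.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="events", exchange_type="topic")
channel.basic_publish(
    exchange="events",
    routing_key="citation.added",
    body=json.dumps({"type": "CITATION_ADDED", "payload": {"text": "..."}}).encode("utf-8"),
)
connection.close()
```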
- Client: React app
- API: Apollo GraphQL service
- Read Model: Service for querying application state
- Write Model: Service for making mutations to application state
- Event Store: Canonical database for the application
- History Service: Provides an event stream of the canonical database for building local state (see the replay sketch after this list)
- PyBroker: Base library for Python AMQP brokers across the application
- NLP Service: Natural language processing service providing custom named entity recognition (NER; see the spaCy sketch after this list)
- Geo Service: Database of place names and their geographic coordinates
- tha-config: Basic config library for working with environmental variables across Python services
- Tackle: Integration testing library built to help ensure message contracts between services
- Builder: Series of scripts to generate and manage data in development:
  - builddata: A command-line tool to create annotations of real data sources
  - build_geo_src: Sources geo data from geonames.org and saves it to disk for use by other scripts
  - build_fake_data: A script to generate a large amount of fake (but realistically interconnected) data
  - mock_client: A tool to programmatically publish data from JSON files to the GraphQL API endpoint
- User Accounts Service: Manages user information
- Email Service: Provides email services across the application (in progress)
- Logging Service: Tracks query/response times and application traffic patterns (in progress)
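The Read Model and History Service work together around one idea: any read-side state can be rebuilt at any time by replaying the event stream from the beginning. Here is a minimal sketch of such a projection, with invented event shapes rather than the real contracts:

```python
# Illustrative projection: fold the event stream into a queryable index.
# The event shapes here are invented for the example.
from collections import defaultdict

def project(events):
    """Rebuild a person -> citations index by replaying events in order."""
    citations_by_person = defaultdict(list)
    for event in events:
        if event["type"] == "CITATION_ADDED":
            payload = event["payload"]
            citations_by_person[payload["person"]].append(payload["text"])
    return citations_by_person

# Replaying the same stream always yields the same state, which is what
# makes read-side services disposable and rebuildable.
stream = [
    {"type": "CITATION_ADDED",
     "payload": {"person": "Ada Lovelace", "text": "..."}},
]
state = project(stream)
```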
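The NLP Service trains a custom NER model, but for orientation, entity extraction with a stock spaCy model looks like this (the service's own model and labels will differ):

```python
# Stock spaCy NER for illustration; the NLP service uses its own model.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Ada Lovelace met Charles Babbage in London in 1833.")
for ent in doc.ents:
    # Prints spans like: "Ada Lovelace PERSON", "London GPE", "1833 DATE"
    print(ent.text, ent.label_)
```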
To run the project locally:
- Install and set up Docker
- Fork/clone the repo and navigate to the project root directory
- Build the project with `bash build.sh`
- Run the project with `docker-compose up`
- Generate fake data (or create your own real data) and publish it to the application using the scripts in the builder directory
- Stop the project with Ctrl-C in the same terminal, or with `docker-compose down` in a separate tab
The RabbitMQ admin console is available at localhost:15672.
- username: guest
- password: guest

The GraphQL API is available at localhost:4000.
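As a quick smoke test that the API is up, you can POST a standard GraphQL introspection query to the endpoint; introspection is valid against any GraphQL server, so no knowledge of the schema is needed:

```python
# Smoke test for the GraphQL endpoint using standard introspection.
# Assumes the stack is running locally on the default port; if this
# 404s, try http://localhost:4000/graphql instead.
import requests

query = "query { __schema { queryType { name } } }"
response = requests.post("http://localhost:4000", json={"query": query})
print(response.json())  # e.g. {"data": {"__schema": {"queryType": {"name": "Query"}}}}
```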
On the first build, the Geo service will build its database, which involves pulling data from geonames.org and constructing a graph database structure locally. The source for this build can be adjusted in the docker-compose file under the Geo service. The default source is cities15000.zip, which takes ~10-15 minutes to build fully. The alternate source (cities500.zip) will be used in production, but it takes ~3.5 hours to build, so use it at your own risk.
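The relevant setting is an environment value on the Geo service's entry in the compose file. The excerpt below is only a sketch of the shape, with a hypothetical variable name, so check docker-compose.yml for the real one:

```yaml
# Hypothetical excerpt: the service keys and variable name are illustrative.
# cities15000.zip and cities500.zip are real geonames.org dump files.
geo:
  build: ./geo
  environment:
    GEO_SOURCE: "cities15000.zip"   # swap for cities500.zip in production
```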
Tests can be run with `bash test.sh`.
Ideas, suggestions, and contributions are all welcome! If you would like to be involved, the issues page is a great place to start, but feel free to reach out directly as well.
At the time of writing, there are still lingering issues with building some libraries on Apple silicon (notably numpy, a dependency of spaCy in the NLP app). If you are developing on an ARM64 machine, rename the default Dockerfile in the NLP directory to something else, and rename Dockerfile_arm64 to Dockerfile. This alternate Dockerfile uses Conda as a workaround, but it has caused problems on Linux machines, so it is no longer the default build.