Flowmap.query on DigitalOcean: Step-by-step guide

Ilya Boyandin edited this page Aug 8, 2019 · 6 revisions

Create a droplet

Create a 2GB RAM Ubuntu 18.04 droplet on DigitalOcean (costs $10 per month). If your database is small, you can later scale it down to a 1GB RAM droplet, which costs $5 per month. However, 1GB is not enough to build the app from source with Node.js, so start with the larger size.

Connect to the droplet

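On your DigitalOcean projects page you should see the IP address of your new droplet. Connect to it via SSH from your terminal, replacing YOUR_DROPLET_IP with that address (new Ubuntu droplets use the root user by default):

ssh root@YOUR_DROPLET_IP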

Alternatively, you can use "Access console" directly on the projects page.

Install and start ClickHouse

sudo apt-add-repository "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/"
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 # optional
sudo apt-get update
sudo apt-get install clickhouse-client clickhouse-server
sudo service clickhouse-server start

You will need to set a password for ClickHouse during this process. The commands below refer to it as YOUR_CLICKHOUSE_PASSWORD.

Download NYC CitiBike data (part of it)

Download the data for 2018:

for month in `seq 1 12`; do wget -P citibike-trips/ \
`printf "https://s3.amazonaws.com/tripdata/2018%02d-citibike-tripdata.csv.zip" $month`; done
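Before unzipping, it can help to confirm that all twelve archives arrived. The following is a minimal sketch, assuming the files keep the names from the download URLs; `missing_months` is a hypothetical helper name, not part of any tool used in this guide.

```shell
# List the months of 2018 whose archive is absent from the given directory.
# The file-name pattern mirrors the download URLs used above.
missing_months() {
  dir=$1
  for month in $(seq 1 12); do
    file=$(printf '%s/2018%02d-citibike-tripdata.csv.zip' "$dir" "$month")
    [ -f "$file" ] || printf '%02d\n' "$month"
  done
}

# Prints one line per missing month; prints nothing when all 12 are present.
missing_months citibike-trips
```

If a month is listed, rerun the download for that month before continuing.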

Unzip it:

sudo apt-get install unzip
unzip "citibike-trips/*.zip" -d citibike-trips/

Ingest CitiBike data

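# trip_duration and bike_id are UInt32: trips can last longer than 65535 seconds,
# and bike IDs are larger than 255, so UInt16 and UInt8 would overflow.
# The engine clause is the current-syntax equivalent of
# MergeTree(start_date, (start_date, start_time), 8192).
clickhouse-client --password YOUR_CLICKHOUSE_PASSWORD --query="
CREATE TABLE nyc_citibike_trips (
start_date Date,
trip_duration UInt32,
start_time DateTime,
stop_time DateTime,
start_station_id String,
end_station_id String,
bike_id UInt32,
user_type Enum8('Subscriber'=1,'Customer'=2),
birth_year UInt16,
gender Enum8('0'=0,'1'=1,'2'=2)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(start_date)
ORDER BY (start_date, start_time)
SETTINGS index_granularity = 8192;"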


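# awk keeps the ten fields that match the table columns and trims the quoted
# timestamps to second precision; sed drops the header line and the last line of each file.
for csvfile in citibike-trips/*.csv; do
awk -F, -v OFS=',' '{print substr($2,2,10),$1,substr($2,2,19),substr($3,2,19),$4,$8,$12,$13,$14,$15}' "$csvfile" | \
sed '1d;$d' | \
clickhouse-client --password YOUR_CLICKHOUSE_PASSWORD --query="INSERT INTO nyc_citibike_trips FORMAT CSV"
done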
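To see what the awk step produces, it can be run on a single row without touching ClickHouse. This is a sketch under an assumption: the sample row is illustrative and follows the 15-column layout the 2018 files appear to use (quoted timestamps with fractional seconds in columns 2 and 3). `transform_row` is a hypothetical helper that selects the ten fields matching the table columns.

```shell
# Illustrative row in the assumed 15-column layout of the 2018 CSV files.
sample='970,"2018-01-01 13:50:57.4340","2018-01-01 14:07:08.1860",72,"W 52 St & 11 Ave",40.76727216,-73.99392888,505,"6 Ave & W 33 St",40.74901271,-73.98848395,31956,"Subscriber",1992,1'

# substr($2,2,10) skips the opening quote and keeps the date (YYYY-MM-DD);
# substr($2,2,19) and substr($3,2,19) keep the timestamps without fractional seconds.
transform_row() {
  awk -F, -v OFS=',' '{print substr($2,2,10),$1,substr($2,2,19),substr($3,2,19),$4,$8,$12,$13,$14,$15}'
}

printf '%s\n' "$sample" | transform_row
# prints: 2018-01-01,970,2018-01-01 13:50:57,2018-01-01 14:07:08,72,505,31956,"Subscriber",1992,1
```

Because awk splits on every comma, a station name that itself contains a comma would shift the later fields; checking a few rows this way before ingesting is a cheap safeguard.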

Open connection to ClickHouse from outside (optional)

Do this if you want to be able to connect to ClickHouse from the outside.

sudo vi /etc/clickhouse-server/config.xml

Find and uncomment the line <listen_host>::</listen_host>, then restart ClickHouse:

sudo service clickhouse-server restart
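A quick way to confirm the edit took effect is to look for an uncommented listen_host line in the config file. The sketch below is deliberately simple and assumes the commented-out form sits on a single line starting with `<!--`; `listen_host_enabled` is a hypothetical helper name.

```shell
# Succeeds if the file has a line that starts (after optional whitespace)
# with <listen_host>::</listen_host>, i.e. the setting is not commented out.
listen_host_enabled() {
  grep -Eq '^[[:space:]]*<listen_host>::</listen_host>' "$1"
}

config=/etc/clickhouse-server/config.xml
if [ -f "$config" ] && listen_host_enabled "$config"; then
  echo "listen_host :: is enabled in $config"
else
  echo "listen_host :: is not enabled in $config"
fi
```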

Install Node.js

sudo apt-get install build-essential libssl-dev
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.34.0/install.sh | bash
source ~/.bashrc
nvm install v10.16.0

Clone flowmap.query repo

git clone https://github.com/teralytics/flowmap.query.git
cd flowmap.query

Install dependencies and build flowmap.query

npm install
echo CLICKHOUSE_URL="http://localhost:8123?enable_http_compression=1&password=YOUR_CLICKHOUSE_PASSWORD" > .env

You need to provide a Mapbox access token for the base map to work. If you don't have one yet, sign up for a Mapbox account to get one.

echo REACT_APP_MapboxAccessToken=pk… > client/.env
cd client && npm install && npm run build && cd ..

Install PM2 (a process manager for Node.js apps)

PM2 will automatically restart flowmap.query if it crashes or when the server reboots.

npm install pm2@latest -g
pm2 startup systemd

Start flowmap.query

env $(cat .env) NODE_ENV=production pm2 start backend/server.js
# Save the process list so that PM2 restores flowmap.query after a reboot
pm2 save
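The `env $(cat .env)` prefix in the start command passes each NAME=value line of `.env` to the process as an environment variable. The mechanism can be tried in isolation with a throwaway file; `GREETING` and `demo_env` below are made-up names for illustration only.

```shell
# Throwaway file standing in for .env; GREETING is a made-up variable.
demo_env=$(mktemp)
echo 'GREETING=hello' > "$demo_env"

# The shell splits the file contents into words, and env sets each
# NAME=value word in the environment of the command that follows.
greeting=$(env $(cat "$demo_env") sh -c 'printf %s "$GREETING"')
echo "$greeting"

rm -f "$demo_env"
```

This relies on word splitting, so it only works while the values in the file contain no spaces, which holds for the CLICKHOUSE_URL line written earlier.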