This repository has been archived by the owner on Apr 10, 2024. It is now read-only.
Commit 5a4b61c (0 parents)

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 654 changed files with 221,716 additions and 0 deletions.
@@ -0,0 +1,10 @@

```
debug
bin/moresql
tmp
resources
bin
moresql.json
moresql-*
moresql
dist/
site/
```
@@ -0,0 +1,5 @@

```yaml
language: go

go:
- 1.6
- 1.7
```
@@ -0,0 +1,82 @@

```makefile
# Borrowed from:
# https://github.com/silven/go-example/blob/master/Makefile
# https://vic.demuzere.be/articles/golang-makefile-crosscompile/
# https://ariejan.net/2015/10/03/a-makefile-for-golang-cli-tools/
# https://marmelab.com/blog/2016/02/29/auto-documented-makefile.html

SOURCEDIR=.
SOURCES := $(shell find $(SOURCEDIR) -maxdepth 1 -name '*.go' | grep -v main.go | grep -v _test.go)
FILES = $(SOURCES)
BINARY = moresql
MAIN = cmds/moresql/main.go
DATE_COMPILED = $(shell date -u +"%Y-%m-%dT%H:%M:%SZ")
LDFLAGS_BASE = "-X main.version='$(shell git describe --abbrev=0 --tags --always)' -X main.BuildDate='$(DATE_COMPILED)' -X main.GitRef='$(shell git describe --tags --dirty --always)' -X main.GitSHA='$(shell git rev-parse --short HEAD)'"
LDFLAGS = -ldflags $(LDFLAGS_BASE)
# Symlink into GOPATH
GITHUB_USERNAME=zph
BUILD_DIR=${GOPATH}/src/github.com/${GITHUB_USERNAME}/${BINARY}
CURRENT_DIR=$(shell pwd)
BUILD_DIR_LINK=$(shell readlink ${BUILD_DIR})
GOARCH = amd64
.DEFAULT_GOAL := help

# Build the project
all: clean fmt test_full linux build docs

$(BINARY): $(FILES) $(MAIN) ## Build binary for current system architecture
	go build $(LDFLAGS) -o bin/$(BINARY) $(MAIN)

build: $(BINARY)

heroku: build ## Used by heroku build process

flags:
	@echo "$(LDFLAGS_BASE)"

test: ## Run tests
	go test -v

test_full: ## Test with race and coverage
	go test -v -race -cover

linux:
	GOOS=linux GOARCH=${GOARCH} go build $(LDFLAGS) -o bin/$(BINARY)-linux-${GOARCH} $(MAIN)

# darwin:
# cd ${BUILD_DIR}; \
# GOOS=darwin GOARCH=${GOARCH} go build ${LDFLAGS} -o bin/${BINARY}-darwin-${GOARCH} . ; \
# cd - >/dev/null

fmt: ## Go fmt the code
	cd ${BUILD_DIR}; \
	go fmt $$(go list ./... | grep -v /vendor/) ; \
	cd - >/dev/null

clean: ## Clean out the generated binaries
	-rm -f bin/${BINARY}-*
	-rm -f bin/${BINARY}

docs: clean ## Regenerate README.md from template
	@./bin/update-readme
	@echo "If changes occurred in README.md that you want in mkdocs run:"
	@echo "cp -f README.md docs/README.md"

docs-deploy:
	@git diff-index --quiet HEAD -- || (echo "Only allowed with clean working directory" && exit 1)
	@mkdocs gh-deploy

# Allows building whether in GOPATH or not
# link:
# BUILD_DIR=${BUILD_DIR}; \
# BUILD_DIR_LINK=${BUILD_DIR_LINK}; \
# CURRENT_DIR=${CURRENT_DIR}; \
# if [ "$${BUILD_DIR_LINK}" != "$${CURRENT_DIR}" ]; then \
# echo "Fixing symlinks for build"; \
# rm -f $${BUILD_DIR}; \
# ln -s $${CURRENT_DIR} $${BUILD_DIR}; \
# fi

help: ## prints help
	@ cat $(MAKEFILE_LIST) | grep -e "^[a-zA-Z_\-]*: *.*## *" | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}'

.PHONY: all build heroku flags test test_full link linux darwin fmt clean docs docs-deploy help
```
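`LDFLAGS_BASE` injects build metadata through `go build -ldflags "-X ..."`. For `-X` to take effect, `package main` has to declare matching package-level string variables (not constants). A minimal sketch of what `cmds/moresql/main.go` might declare, assuming only the variable names that appear in `LDFLAGS_BASE`; the default values and the `versionString` helper are hypothetical:

```go
package main

import "fmt"

// Overwritten at link time by the Makefile's
// -X main.version=... -X main.BuildDate=... -X main.GitRef=... -X main.GitSHA=...
// They must be package-level string variables for -X to work.
var (
	version   = "dev"
	BuildDate = "unknown"
	GitRef    = "unknown"
	GitSHA    = "unknown"
)

// versionString is a hypothetical helper that formats the injected metadata.
func versionString() string {
	return fmt.Sprintf("moresql %s (ref %s, sha %s, built %s)", version, GitRef, GitSHA, BuildDate)
}

func main() {
	fmt.Println(versionString())
}
```

Running `make flags` prints the exact `-X` string that the build targets pass to the linker.
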
@@ -0,0 +1,230 @@

# MoreSQL

[](NOTE: README.md is a generated file; changes belong in docs/README.template.md. Update with make docs)
[](https://travis-ci.org/zph/moresql)
[](https://godoc.org/github.com/zph/moresql)

# WARNING: ALPHA project

Anything and everything can change at this point, including git history.

*This is a young project that implements a barebones subset of MoSQL's features. Assess its capability and stability before relying on it in production.*

This warning will be removed, and git history will stabilize, after the official release and announcement.

## Introduction

MoreSQL streams changes occurring in a Mongo database into a Postgres database. It tails the oplog and generates the appropriate actions against Postgres. MoreSQL can also perform full synchronizations using `UPSERT`s, which, unlike plain `INSERT`s, can be executed against tables that already contain data.

MoreSQL gives you a chance to use more SQL and less Mongo query language.

# Usage

## Basic Use

### Tail

`./moresql -tail -config-file=moresql.json`

Tail is the primary run mode for MoreSQL. When tailing, MoreSQL watches the oplog for novelty and translates each INSERT/UPDATE/DELETE into its SQL equivalent, which is then executed against Postgres.

Tail makes a best-effort attempt at this and does not use checkpoint markers to track its position in the oplog. Checkpointing may be introduced in later releases. Alternatively, MoreSQL could be split into a producer (oplog tail) that puts records onto a stream (Kinesis/Kafka/etc.) and a consumer that reads from the stream; doing so would avoid re-implementing checkpoints in MoreSQL.

Given that `tail` mode executes `UPSERT`s instead of separate `INSERT`/`UPDATE` statements, we expect MoreSQL to be roughly eventually consistent. We chose to prioritize speed of execution (multiple workers) at the cost of some consistency, which keeps latency low under larger workloads.

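The translation step can be pictured as a dispatch on each oplog entry's `op` field. MongoDB's oplog marks inserts with `"i"`, updates with `"u"`, and deletes with `"d"`; commands (`"c"`) and no-ops (`"n"`) also appear. The sketch below is a simplified illustration, not MoreSQL's actual code: the `oplogEntry` type and `classify` function are hypothetical, inserts and updates both map to an upsert, and deletes honor the `-allow-deletes` setting.

```go
package main

import "fmt"

// oplogEntry is a hypothetical, simplified view of one oplog document.
type oplogEntry struct {
	Op        string // "i" insert, "u" update, "d" delete, "c" command, "n" no-op
	Namespace string // "db.collection"
}

// action is what the Postgres side should do for an entry.
type action string

const (
	actionUpsert action = "upsert"
	actionDelete action = "delete"
	actionSkip   action = "skip"
)

// classify maps an oplog entry to a SQL-side action. Inserts and updates both
// become upserts, so re-applying an entry does not fail on a duplicate key.
func classify(e oplogEntry, allowDeletes bool) action {
	switch e.Op {
	case "i", "u":
		return actionUpsert
	case "d":
		if allowDeletes {
			return actionDelete
		}
		return actionSkip
	default:
		// Commands and no-ops have no row-level SQL equivalent in this sketch.
		return actionSkip
	}
}

func main() {
	entries := []oplogEntry{{"i", "app.users"}, {"u", "app.users"}, {"d", "app.users"}, {"n", ""}}
	for _, e := range entries {
		fmt.Println(e.Op, classify(e, true))
	}
}
```
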
### Full Sync

`./moresql -full-sync -config-file=moresql.json`

Full sync is useful when first setting up a MoreSQL installation, to port the existing Mongo data to Postgres. We recommend starting a tailing instance first. Once that's running, run a full sync in a separate process. This should bring Mongo and Postgres into identical states.

Given the nature of streaming replicated data from Mongo to Postgres, we recommend running a full sync at intervals to offset losses that may have occurred during network issues, system downtime, etc.

### Documentation

https://zph.github.io/moresql/

[](https://godoc.org/github.com/zph/moresql)

## QuickStart

### Introduction

* Create the metadata table
* Set up `moresql.json`
* Set up any recipient tables in Postgres
* Validate with `./moresql -validate`
* Deploy the binary to the server
* Configure environment variables
* Run `./moresql -tail` to start transmitting novelty
* Run `./moresql -full-sync` to populate the database

### Table Setup

```sql
-- Execute the following SQL to set up the table in Postgres. Replace $USERNAME with the moresql user.
-- Create the moresql_metadata table for checkpoint persistence
CREATE TABLE public.moresql_metadata
(
    app_name TEXT NOT NULL,
    last_epoch INT NOT NULL,
    processed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() NOT NULL
);
-- Set up the mandatory unique index
CREATE UNIQUE INDEX moresql_metadata_app_name_uindex ON public.moresql_metadata (app_name);

-- Grant permissions to this user; replace $USERNAME with moresql's user.
-- INSERT is needed so the first checkpoint row for an app_name can be created.
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE public.moresql_metadata TO $USERNAME;

COMMENT ON COLUMN public.moresql_metadata.app_name IS 'Name of application. Used for circumstances where multiple apps stream to same PG instance.';
COMMENT ON COLUMN public.moresql_metadata.last_epoch IS 'Most recent epoch processed from Mongo';
COMMENT ON COLUMN public.moresql_metadata.processed_at IS 'Timestamp for when the last epoch was processed at';
COMMENT ON TABLE public.moresql_metadata IS 'Stores checkpoint data for MoreSQL (mongo->pg) streaming';
```

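The unique index on `app_name` is what allows a checkpoint to be written as a single upsert: Postgres only accepts `ON CONFLICT (app_name)` when a unique index or constraint covers that column. Note that such an upsert needs the INSERT privilege on the table in addition to UPDATE. A minimal sketch using Go's `database/sql`, assuming a Postgres driver is registered elsewhere and the epoch fits the `INT` column; `saveCheckpoint` is a hypothetical helper, not MoreSQL's own function:

```go
package main

import (
	"database/sql"
	"fmt"
)

// checkpointSQL writes one row per app_name. ON CONFLICT (app_name) is only
// accepted because of the moresql_metadata_app_name_uindex unique index.
const checkpointSQL = `INSERT INTO public.moresql_metadata (app_name, last_epoch, processed_at)
VALUES ($1, $2, NOW())
ON CONFLICT (app_name)
DO UPDATE SET last_epoch = EXCLUDED.last_epoch, processed_at = EXCLUDED.processed_at`

// saveCheckpoint is a hypothetical helper. db must be opened with a Postgres
// driver (for example github.com/lib/pq), which is not imported here.
func saveCheckpoint(db *sql.DB, appName string, epoch int32) error {
	if _, err := db.Exec(checkpointSQL, appName, epoch); err != nil {
		return fmt.Errorf("saving checkpoint for %q: %v", appName, err)
	}
	return nil
}

func main() {
	fmt.Println(checkpointSQL)
}
```
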
### Building Binary

Compile the binary using `make build`.

### Command-line Arguments / Usage

Execute `./moresql --help`:

```
./bin/moresql
Repo https://github.com/zph/moresql
Usage of ./bin/moresql:
  -allow-deletes
        Allow deletes to propagate from Mongo -> PG (default true)
  -app-name string
        AppName used in Checkpoint table (default "moresql")
  -checkpoint
        Store and restore from checkpoints in PG table: moresql_metadata
  -config-file string
        Configuration file to use (default "moresql.json")
  -create-table-sql
        Print out the necessary SQL for creating metadata table required for checkpointing
  -enable-monitor
        Run expvarmon endpoint
  -error-reporting string
        Error reporting tool to use (currently only supporting Rollbar)
  -full-sync
        Run full sync for each db.collection in config
  -memprofile string
        Profile memory usage. Supply filename for output of memory usage
  -mongo-url MONGO_URL
        MONGO_URL aka connection string
  -postgres-url POSTGRES_URL
        POSTGRES_URL aka connection string
  -replay-duration duration
        Last x to replay ie '1s', '5m', etc as parsed by Time.ParseDuration. Will be subtracted from time.Now()
  -ssl-cert string
        SSL PEM cert for Mongodb
  -tail
        Tail mongodb for each db.collection in config
  -validate
        Validate the postgres table structures and exit
```

### Validation of Configuration + Postgres Schema

`./moresql -validate`

This reports any mismatches between the Postgres schema and the fields and tables set up in the configuration.

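Conceptually, this kind of validation compares the fields declared in the configuration with the columns that actually exist in Postgres (for example, as listed in `information_schema.columns`). A minimal sketch of that comparison, with the hypothetical `missingColumns` helper taking both lists as plain slices instead of querying the database:

```go
package main

import (
	"fmt"
	"sort"
)

// missingColumns returns the configured column names that do not exist
// in the Postgres table, sorted for stable output.
func missingColumns(configured, actual []string) []string {
	present := make(map[string]bool, len(actual))
	for _, c := range actual {
		present[c] = true
	}
	var missing []string
	for _, c := range configured {
		if !present[c] {
			missing = append(missing, c)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	configured := []string{"_id", "email", "created_at"}
	actual := []string{"_id", "email"}
	for _, c := range missingColumns(configured, actual) {
		fmt.Printf("column %q is configured but missing in Postgres\n", c)
	}
}
```
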
# Requirements, Stability and Versioning

MoreSQL is built with Go 1.6, 1.7, and master in mind. Broken tests on these versions indicate a bug.

MoreSQL requires Postgres 9.5+ because it uses UPSERTs. Using UPSERTs simplifies the internal logic, but it also depends on a UNIQUE index existing on each `_id` column in Postgres. See `moresql -validate` for advice.

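To see why the unique index on `_id` matters: a Postgres 9.5+ upsert names a conflict target, and `ON CONFLICT (_id)` is rejected unless a unique index or constraint covers `_id`. The sketch below builds such a statement for an unqualified table name. It is an illustration with hypothetical names, not the SQL MoreSQL generates verbatim, and it assumes table and column names come from trusted configuration (identifiers are quoted, values are passed as `$n` parameters).

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent double-quotes a Postgres identifier, escaping embedded quotes.
func quoteIdent(s string) string {
	return `"` + strings.Replace(s, `"`, `""`, -1) + `"`
}

// buildUpsert returns an INSERT ... ON CONFLICT (_id) statement with one $n
// placeholder per column. columns must include "_id"; table is unqualified.
func buildUpsert(table string, columns []string) string {
	quoted := make([]string, len(columns))
	params := make([]string, len(columns))
	var updates []string
	for i, c := range columns {
		quoted[i] = quoteIdent(c)
		params[i] = fmt.Sprintf("$%d", i+1)
		if c != "_id" {
			updates = append(updates, fmt.Sprintf("%s = EXCLUDED.%s", quoted[i], quoted[i]))
		}
	}
	action := "DO NOTHING"
	if len(updates) > 0 {
		action = "DO UPDATE SET " + strings.Join(updates, ", ")
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s) ON CONFLICT (%s) %s",
		quoteIdent(table), strings.Join(quoted, ", "), strings.Join(params, ", "),
		quoteIdent("_id"), action)
}

func main() {
	fmt.Println(buildUpsert("users", []string{"_id", "email", "name"}))
}
```

Because every write is an upsert, replaying the same document twice (during tailing or a full sync) leaves the row in the same state instead of raising a duplicate-key error.
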
# Miscellanea

### Error Reporting

Error reporting is available through Rollbar; PRs for other services are welcome. We currently use Rollus, which reports errors synchronously. If this becomes a performance bottleneck, please open an issue or a PR.

Enable it in two steps. First, export the following environment variables:

```
export ERROR_REPORTING_TOKEN=asdfasdfasdf
export APP_ENV=[production, development, or staging]
```

Then, when running the application, pass the following flag to enable reporting:

`./moresql -tail -error-reporting "rollbar"`

If these steps are not followed, errors are reported solely via logging.

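The two steps above amount to a simple gate: Rollbar reporting is active only when `-error-reporting "rollbar"` is passed and both environment variables are set; otherwise errors go only to the log. A hypothetical sketch of that decision (the `reportingConfig` type and `resolveReporting` function are assumptions, and no Rollus/Rollbar client is called); the environment lookup is injected so it can be tested without touching the real environment:

```go
package main

import (
	"fmt"
	"os"
)

// reportingConfig is a hypothetical summary of the error-reporting setup.
type reportingConfig struct {
	Enabled bool
	Token   string
	Env     string
}

// resolveReporting mirrors the two documented steps: both environment
// variables must be set and -error-reporting must be "rollbar".
// Anything else falls back to logging only.
func resolveReporting(flagValue string, getenv func(string) string) reportingConfig {
	token := getenv("ERROR_REPORTING_TOKEN")
	env := getenv("APP_ENV")
	if flagValue != "rollbar" || token == "" || env == "" {
		return reportingConfig{}
	}
	return reportingConfig{Enabled: true, Token: token, Env: env}
}

func main() {
	cfg := resolveReporting("rollbar", os.Getenv)
	if cfg.Enabled {
		fmt.Println("reporting errors to Rollbar for", cfg.Env)
	} else {
		fmt.Println("errors will be reported via logging only")
	}
}
```
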
### Environment Variables Used in MoreSQL

```
MONGO_URL
POSTGRES_URL
ERROR_REPORTING_TOKEN
APP_ENV
DYNO
LOG_LEVEL
```

### Mongo types

MoreSQL converts a few Mongo types into Postgres-friendly types.

Objects and arrays do not behave properly when inserted into Postgres as-is, so they are automatically converted into their JSON representation before insertion.

As of writing, any BsonID/ObjectId field should be declared with the `id` type in `Fields.Mongo.Type` so that it is converted. In the future, we may assume that all fields ending in `_id` are ID-based fields that require conversion.

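A minimal sketch of the conversions described above, using only the standard library: nested objects and arrays are marshaled to JSON text, and values declared with type `id` are rendered as the ObjectId's 24-character hex string. The `objectID` type stands in for the Mongo driver's ObjectId and, like `toPostgres`, is hypothetical; MoreSQL's real conversion may differ.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// objectID stands in for a BSON ObjectId (12 raw bytes).
type objectID [12]byte

// toPostgres converts a decoded Mongo value into something a Postgres driver
// can bind. mongoType is the configured Fields.Mongo.Type for the field.
func toPostgres(v interface{}, mongoType string) (interface{}, error) {
	if mongoType == "id" {
		if oid, ok := v.(objectID); ok {
			return hex.EncodeToString(oid[:]), nil
		}
	}
	switch v.(type) {
	case map[string]interface{}, []interface{}:
		// Objects and arrays are stored as their JSON text.
		b, err := json.Marshal(v)
		if err != nil {
			return nil, err
		}
		return string(b), nil
	default:
		return v, nil
	}
}

func main() {
	nested, _ := toPostgres(map[string]interface{}{"city": "Oslo", "tags": []interface{}{"a", "b"}}, "object")
	fmt.Println(nested)
	id, _ := toPostgres(objectID{0x58, 0xae}, "id")
	fmt.Println(id)
}
```
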
## Converting from MoSQL

Run the `./bin/convert_config_from_mosql_to_moresql` script in a folder containing `collections.yml`:

```
ruby ./bin/convert_config_from_mosql_to_moresql collections.yml
```

The generated file, `moresql.json`, will be in place and ready for use.

## Unsupported Features

These MoSQL features are not implemented in MoreSQL. PRs welcome.

* Dot notation for nested structures
* `extra_props` field for spare data
* Automatic creation of tables/columns

## Performance

When benchmarking MoreSQL by replaying existing events from the oplog, we observed the following performance with this configuration:

* 5 workers per collection
* 500 generic workers
* A Heroku 1X dyno

```
~ $ ./moresql -tail -replay-duration "5000m" | grep "Rate of"
{"level":"info","msg":"Rate of insert per min: 532","time":"2017-02-23T01:49:31Z"}
{"level":"info","msg":"Rate of update per min: 44089","time":"2017-02-23T01:49:31Z"}
{"level":"info","msg":"Rate of delete per min: 1","time":"2017-02-23T01:49:31Z"}
{"level":"info","msg":"Rate of read per min: 91209","time":"2017-02-23T01:49:31Z"}
{"level":"info","msg":"Rate of skipped per min: 46587","time":"2017-02-23T01:49:31Z"}
```

Our top observed throughput so far is approximately 700 updates/sec and 1500 reads/sec. Please submit PRs with further numbers obtained using a similar command.

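The per-second figures are the per-minute log rates divided by 60 (44089 / 60 ≈ 735 updates/sec, 91209 / 60 ≈ 1520 reads/sec). To compare your own runs the same way, a small Go filter can read the JSON log lines and print both rates. It assumes the `"Rate of <kind> per min: <n>"` message format shown above; the program is a hypothetical helper, not part of MoreSQL.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// logLine matches the JSON log lines printed during `moresql -tail`.
type logLine struct {
	Msg string `json:"msg"`
}

// parseRate extracts the operation kind and per-minute count from a
// "Rate of <kind> per min: <n>" message.
func parseRate(msg string) (kind string, perMin float64, ok bool) {
	n, err := fmt.Sscanf(msg, "Rate of %s per min: %g", &kind, &perMin)
	return kind, perMin, err == nil && n == 2
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		var l logLine
		if json.Unmarshal(sc.Bytes(), &l) != nil {
			continue // not a JSON log line
		}
		if kind, perMin, ok := parseRate(l.Msg); ok {
			fmt.Printf("%-8s %8.0f/min %8.1f/sec\n", kind, perMin, perMin/60)
		}
	}
}
```

For example: `./moresql -tail -replay-duration "5000m" | grep "Rate of" | go run rates.go`, where `rates.go` is a hypothetical file holding the program.
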
We expect the following bottlenecks: connection count in Postgres, Postgres connection limits in MoreSQL (for safety), network bandwidth, and worker availability.

At this level of throughput, MoreSQL uses ~90MB of RAM. At a low, near-idle throughput of 10-20 req/sec, it consumes ~30MB of RAM.

In another benchmark, updating 28k documents simultaneously, we observed a mean lag of ~500ms between a document being updated in Mongo and arriving in Postgres, with 95% of updates arriving within 1194ms.

See full [performance information](https://zph.github.io/moresql/performance/).

For a general discussion of UPSERT performance in Postgres: https://mark.zealey.org/2016/01/08/how-we-tweaked-postgres-upsert-performance-to-be-2-3-faster-than-mongodb

# Credit and Prior Art

* [MoSQL](https://github.com/stripe/mosql) - the project we used for three years at work and then retired in favor of MoreSQL. Thanks, Stripe!
* [GTM](https://github.com/rwynn/gtm) - the Go library that builds on mgo to wrap the tailing and oplog interface in a pleasant API. rwynn was a great help in improving GTM's performance with varying levels of consistency guarantees.