Another way to create a cheap document-based database with an easy ORM to handle your dataset!
First of all:
- Nothing is for free, but it can be cheaper.
- I'm not responsible for your AWS costs strategy; use `s3db.js` at your own risk.
- Please, do not use in production!
Let's go!
You might know AWS's S3 product for its high availability and its cheap pricing rules. I'll show you another clever and fun way to use S3.
AWS allows you to define metadata on every single file you upload into your bucket. This attribute is limited to 2 KB and must use UTF-8 encoding. Since this encoding varies the byte width of each symbol, you can fit roughly 500 to 2,000 chars of metadata storage. Follow the docs at AWS S3 User Guide: Using metadata.
There is another subset of management data called tags, used globally as [key, value] params. You can assign up to 10 tags per object, where each key may be at most 128 Unicode chars long and each value up to 256 chars. With those key-value pairs we can use roughly 2.5 KB more of data; Unicode will allow you to use up to 2,500 more chars. Follow the official docs at AWS User Guide: Object Tagging.
With all this set, each object can carry up to 4.5 KB of free data storage (2 KB of metadata plus 2.5 KB of tags).
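A minimal sketch of the underlying trick, using the aws-sdk directly rather than s3db.js (bucket name, key layout, and fields here are illustrative):

```javascript
import { S3 } from "aws-sdk";

const s3 = new S3();

// The record lives entirely in the user-defined metadata of a zero-byte
// object, so the body itself costs nothing to store.
await s3
  .putObject({
    Bucket: "my-bucket",
    Key: "databases/mydatabase/leads/id=00001",
    Body: "", // 0 bytes uploaded
    Metadata: { name: "My Name", email: "my.email@example.com" },
  })
  .promise();

// Reading it back is a single request; the data comes back in Metadata.
const { Metadata } = await s3
  .headObject({ Bucket: "my-bucket", Key: "databases/mydatabase/leads/id=00001" })
  .promise();
```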
Check the cost simulation section below for a deep cost dive!
Let's give it a try! :)
You may check the snippets below or go straight to the Examples section!
npm i s3db.js
# or
yarn add s3db.js
The S3db client uses connection string params.
import { S3db } from "s3db.js";
const {
AWS_BUCKET,
AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY,
} = process.env
const s3db = new S3db({
uri: `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/databases/mydatabase`
});
s3db
  .connect()
  .then(() => console.log('connected!'));
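Or, equivalently, with async/await:

```javascript
await s3db.connect();
console.log("connected!");
```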
If you use the dotenv package, load it before importing s3db.js:
import * as dotenv from "dotenv";
dotenv.config();
import { S3db } from "s3db.js";
- This ORM implementation simulates a document repository. Because `s3db.js` uses `aws-sdk`'s S3 API, every request is a GET/PUT on a `key=value` resource, so the best-case scenario is to access data like a document store, by ID.
- For better use of the cache and listing, the best ID format is sequential ids with leading zeros (e.g.: 00001, 00002, 00003), due to S3's internal key-sorting method. But you will need to manage this incremental ID on your own, as in the sketch below.
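A hypothetical helper for such ids (not part of s3db.js) could look like this:

```javascript
// Hypothetical helper: zero-padded sequential ids sort correctly under
// S3's lexicographic key ordering, unlike plain integers (1, 10, 2, ...).
const nextId = (lastId, width = 5) =>
  String(Number(lastId) + 1).padStart(width, "0");

nextId("00001"); // "00002"
nextId("00099"); // "00100"
```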
Your `s3db.js` client can be initiated with these options:
| option | optional | description | type | default |
|---|---|---|---|---|
| cache | true | Persist searched data to reduce repeated requests | boolean | undefined |
| parallelism | true | Number of simultaneous tasks | number | 10 |
| passphrase | true | Your encryption secret | string | undefined |
| ttl | true | (Coming soon) TTL for your cache duration, in seconds | number | 86400 |
| uri | false | A URL as your S3 connection string | string | undefined |
Config example:
const {
AWS_BUCKET = "my-bucket",
AWS_ACCESS_KEY_ID = "secret",
AWS_SECRET_ACCESS_KEY = "secret",
AWS_BUCKET_PREFIX = "databases/test-" + Date.now(),
} = process.env;
const uri = `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/${AWS_BUCKET_PREFIX}`;
const options = {
uri,
parallelism: 25,
passphrase: fs.readFileSync("./cert.pem"),
};
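The options object is passed straight to the constructor, just like the plain uri form shown earlier:

```javascript
const s3db = new S3db(options);
await s3db.connect();
```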
The connect() method must always be invoked before any operation takes place. It interacts with the AWS S3 API and checks the items below:
- With the current credentials:
  - Check if the client has access to the S3 bucket.
  - Check if the client has access to the bucket's life-cycle policies.
- With the defined database:
  - Check if a database already exists at this connection string.
    - If a database is found, download its metadata and load each `Resource` definition.
    - Else, generate an empty `metadata` file at this prefix and mark it as a new database created from scratch.
`s3db.js` will generate a file `/s3db.json` at the pre-defined prefix with this structure:
{
// file version
"version": "1",
// previously defined resources
"resources": {
// definition example
"leads": {
"name": "leads",
// resource options
"options": {},
// resource defined schema
"schema": {
"name": "string",
"token": "secret"
},
// rules to simplify metadata usage
"mapper": {
"name": "0",
"token": "1"
}
}
}
}
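The `mapper` exists to squeeze the most out of the 2 KB metadata budget: before upload, attribute names are swapped for the short keys it defines. A rough illustration (the actual wire format may differ):

```javascript
// The document as you write it:
const doc = { name: "My Name", token: "my-secret" };

// Roughly what lands in the S3 object's metadata, using the mapper above
// ({ name: "0", token: "1" }); short keys save precious metadata bytes,
// and "secret" attributes are stored encrypted:
const metadata = { "0": "My Name", "1": "<encrypted my-secret>" };
```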
Resources are definitions of data collections.
// resource
const attributes = {
utm: {
source: "string|optional",
medium: "string|optional",
campaign: "string|optional",
term: "string|optional",
},
lead: {
fullName: "string",
mobileNumber: "string",
personalEmail: "email",
},
};
const resource = await s3db.createResource({
name: "leads",
attributes,
});
Resources' names cannot prefix each other, like `leads` and `leads-copy`! S3's API lists keys using prefix notation, so every time you list `leads`, all keys of `leads-copy` will appear as well, as the sketch below shows.
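A quick illustration of the collision, using the `S3Client` helper described further below (the key layout shown is just for illustration):

```javascript
// Listing by the shorter prefix also matches the other resource's keys.
const keys = await client.getAllKeys({ prefix: "leads" });
// [
//   "leads/id=00001",
//   "leads-copy/id=00001", // unwanted match!
// ]
```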
`s3db.js` uses the fastest-validator package to define and validate your resource. A few examples:
const attributes = {
// few simple examples
name: "string|min:4|max:64|trim",
email: "email|nullable",
mobile: "string|optional",
count: "number|integer|positive",
currency: "currency|symbol:R$",
createdAt: "date",
website: "url",
id: "uuid",
ids: "array|items:uuid|unique",
// s3db defines a custom type "secret" that is encrypted
token: "secret",
// nested data works as well
geo: {
lat: "number",
long: "number",
city: "string",
},
// may have multiple definitions.
address_number: ["string", "number"],
};
You may just use the reference:
const Leads = s3db.resource("leads");
As the resource definition must be stored within a JSON file, the best way to keep your definitions intact is to use the string-based shorthand definitions. By design, the resource definition will strip all functions from attributes to avoid `eval()` calls.
The `fastest-validator` starts with the params below:
// fastest-validator params
{
useNewCustomCheckerFunction: true,
defaults: {
object: {
strict: "remove",
},
},
}
Consider `resource` as:
const resource = s3db.resource("leads");
// data
const insertedData = await resource.insert({
id: "[email protected]", // if not defined a id will be generated!
utm: {
source: "abc",
},
lead: {
fullName: "My Complex Name",
personalEmail: "[email protected]",
mobileNumber: "+5511234567890",
},
invalidAttr: "this attribute will disappear",
});
// {
// id: "[email protected]",
// utm: {
// source: "abc",
// },
// lead: {
// fullName: "My Complex Name",
// personalEmail: "[email protected]",
// mobileNumber: "+5511234567890",
// },
// }
If no id attribute is defined, `s3db.js` will use nanoid to generate a random unique id!
const obj = await resource.get("[email protected]");
// {
// id: "[email protected]",
// utm: {
// source: "abc",
// },
// lead: {
// fullName: "My Complex Name",
// personalEmail: "[email protected]",
// mobileNumber: "+5511234567890",
// },
// }
const obj = await resource.update("[email protected]", {
lead: {
fullName: "My New Name",
mobileNumber: "+5511999999999",
},
});
// {
// id: "[email protected]",
// utm: {
// source: "abc",
// },
// lead: {
// fullName: "My New Name",
// personalEmail: "[email protected]",
// mobileNumber: "+5511999999999",
// },
// }
await resource.delete(id);
await resource.count();
// 101
You may bulk insert data with a friendly method that receives a list of objects.
const objects = new Array(100).fill(0).map((v, k) => ({
id: `bulk-${k}@mymail.com`,
lead: {
fullName: "My Test Name",
personalEmail: `bulk-${k}@mymail.com`,
mobileNumber: "+55 11 1234567890",
},
}));
await resource.insertMany(objects);
Keep in mind that one request needs to be sent for each object created. There is an option to change the number of simultaneous connections your client will handle.
const s3db = new S3db({
parallelism: 100, // default = 10
});
This method uses supercharge/promise-pool to organize the parallel promises, roughly as sketched below.
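A sketch of what that looks like internally; this is an assumption based on the package's public API, not s3db.js's actual source:

```javascript
import { PromisePool } from "@supercharge/promise-pool";

// Run the inserts with at most `parallelism` requests in flight at once.
const { results, errors } = await PromisePool.withConcurrency(parallelism)
  .for(objects)
  .process((obj) => resource.insert(obj));
```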
await resource.getMany(["id1", "id2", "id3"]);
// [
// obj1,
// obj2,
// obj3,
// ]
const data = await resource.getAll();
// [
// obj1,
// obj2,
// ...
// ]
await resource.deleteMany(["id1", "id2", "id3"]);
await resource.deleteAll();
const ids = await resource.listIds();
// [
// 'id1',
// 'id2',
// 'id3',
// ]
As the metadata of each id must be requested to return its attributes, a better way to handle a huge amount of data is to use streams.
const readableStream = await resource.readable();
readableStream.on("id", (id) => console.log("id =", id));
readableStream.on("data", (lead) => console.log("lead.id =", lead.id));
readableStream.on("end", console.log("end"));
const writableStream = await resource.writable();
writableStream.write({
  lead: {
    fullName: "My Test Name",
    personalEmail: "bulk-1@mymail.com",
    mobileNumber: "+55 11 1234567890",
  },
});
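Since both are regular Node streams, they can be piped together. A hedged sketch (assuming object-mode streams) of copying every record of one resource into another:

```javascript
// Assumption: readable() and writable() return object-mode Node streams.
const source = s3db.resource("leads");
const target = s3db.resource("archived-leads"); // names must not prefix each other!

const reader = await source.readable();
const writer = await target.writable();

reader.pipe(writer);
```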
`s3db.js` has a proxied S3 client named `S3Client`. It brings a few handy and less verbose functions to deal with AWS S3's API.
import { S3Client } from "s3db.js";
const client = new S3Client({ connectionString });
Each method has a 🔗 link to the official aws-sdk docs.
getObject 🔗
const { Body, Metadata } = await client.getObject({
key: `my-prefixed-file.csv`,
});
// AWS.Response
putObject 🔗
const response = await client.putObject({
key: `my-prefixed-file.csv`,
contentType: "text/csv",
metadata: { a: "1", b: "2", c: "3" },
body: "a;b;c\n1;2;3\n4;5;6",
});
// AWS.Response
headObject 🔗
const { Metadata } = await client.headObject({
key: `my-prefixed-file.csv`,
});
// AWS.Response
deleteObject 🔗
const response = await client.deleteObject({
key: `my-prefixed-file.csv`,
});
// AWS.Response
deleteObjects 🔗
const response = await client.deleteObjects({
keys: [`my-prefixed-file.csv`, `my-other-prefixed-file.csv`],
});
// AWS.Response
listObjects 🔗
const response = await client.listObjects({
prefix: `my-subdir`,
});
// AWS.Response
Custom-made method to make it easier to count keys within a listObjects loop.
const count = await client.count({
prefix: `my-subdir`,
});
// 10
Custom-made method to make it easier to return all keys in a subpath within a listObjects loop.
All returned keys have their full path replaced with the current "scope" path.
const keys = await client.getAllKeys({
prefix: `my-subdir`,
});
// [
// key1,
// key2,
// ...
// ]
The 3 main classes `S3db`, `Resource`, and `S3Client` are extensions of JavaScript's `EventEmitter`.
| S3Database | S3Client | S3Resource | S3Resource Readable Stream |
|---|---|---|---|
| error | error | error | error |
| connected | request | insert | id |
| | response | get | data |
| | getObject | update | |
| | putObject | delete | |
| | headObject | count | |
| | deleteObject | insertMany | |
| | deleteObjects | deleteAll | |
| | listObjects | listIds | |
| | count | getMany | |
| | getAllKeys | getAll | |
s3db.on("error", (error) => console.error(error));
s3db.on("connected", () => {});
Using this reference for the events:
const client = s3db.client;
client.on("error", (error) => console.error(error));
Emitted when a request is generated to AWS.
client.on("request", (action, params) => {});
Emitted when a response is received from AWS.
client.on("response", (action, params, response) => {});
client.on("getObject", (options, response) => {});
client.on("putObject", (options, response) => {});
client.on("headObject", (options, response) => {});
client.on("deleteObject", (options, response) => {});
client.on("deleteObjects", (options, response) => {});
client.on("listObjects", (options, response) => {});
client.on("count", (options, response) => {});
client.on("getAllKeys", (options, response) => {});
Using this reference for the events:
const resource = s3db.resource("leads");
resource.on("error", (err) => console.error(err));
resource.on("insert", (data) => {});
resource.on("get", (data) => {});
resource.on("update", (attrs, data) => {});
resource.on("delete", (id) => {});
resource.on("count", (count) => {});
resource.on("insertMany", (count) => {});
resource.on("getMany", (count) => {});
resource.on("getAll", (count) => {});
resource.on("deleteAll", (count) => {});
resource.on("listIds", (count) => {});
Anatomy of a plugin:
const MyPlugin = {
setup(s3db: S3db) {},
start() {},
};
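For example, a minimal (purely illustrative) plugin that logs every request the proxied client makes, using only the events documented above:

```javascript
// Illustrative plugin: not shipped with s3db.js.
const RequestLoggerPlugin = {
  setup(s3db) {
    // keep a reference to the proxied S3Client
    this.client = s3db.client;
  },
  start() {
    // "request" is emitted whenever a request to AWS is generated
    this.client.on("request", (action, params) =>
      console.log(`[s3] ${action}`, params)
    );
  },
};
```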
We have an example of a costs simulator plugin here!
S3's pricing deep dive:
- Data volume [1 GB x 0.023 USD]: relates to the total volume of storage used, but in this implementation we only upload 0-byte files.
- GET requests [1,000 GET requests in a month x 0.0000004 USD per request = 0.0004 USD]: every read request.
- PUT requests [1,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 0.005 USD]: every write request.
- Data transfer [Internet: 1 GB x 0.09 USD per GB = 0.09 USD]: every byte sent out to the internet.
Check the pricing details yourself at https://aws.amazon.com/s3/pricing/ and https://calculator.aws/#/addService/S3.
Let's try to simulate a big project where you have a database with a few tables:
- pageviews: 100,000,000 lines of 100 bytes each
- leads: 1,000,000 lines of 200 bytes each
const Fakerator = require("fakerator");
const fake = Fakerator("pt-BR");
const pageview = {
  ip: fake.internet.ip(),
  domain: fake.internet.url(),
  path: fake.internet.url(),
  query: `?q=${fake.lorem.word()}`,
};
const lead = {
name: fake.names.name(),
mobile: fake.phone.number(),
email: fake.internet.email(),
country: "Brazil",
city: fake.address.city(),
state: fake.address.countryCode(),
address: fake.address.street(),
};
If you write the whole database of:
- pageviews:
- 100,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 500.00 USD (S3 Standard PUT requests cost)
- leads:
- 1,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 5.00 USD (S3 Standard PUT requests cost)
It will cost 505.00 USD, once.
If you want to read the whole database:
- pageviews:
- 100,000,000 GET requests in a month x 0.0000004 USD per request = 40.00 USD (S3 Standard GET requests cost)
- (100,000,000 × 100 bytes) ÷ (1024 × 1000 × 1000) ≈ 10 GB; Internet: 10 GB × 0.09 USD per GB = 0.90 USD
- leads:
- 1,000,000 GET requests in a month x 0.0000004 USD per request = 0.40 USD (S3 Standard GET requests cost)
- (1,000,000 × 200 bytes) ÷ (1024 × 1000 × 1000) ≈ 0.19 GB; Internet: 1 GB × 0.09 USD per GB = 0.09 USD
It will cost 41.39 USD, once.
Let's save some JWT tokens following RFC 7519.
await s3db.createResource({
  name: "tokens",
  attributes: {
    iss: 'url|max:256',
    sub: 'string',
    aud: 'string',
    exp: 'number',
    email: 'email',
    name: 'string',
    scope: 'string',
    email_verified: 'boolean',
  },
})

async function generateToken () {
  const token = createTokenLib(...)
  await resource.insert({
    id: token.jti || md5(token),
    ...token,
  })
  return token
}

async function validateToken (token) {
  const id = token.jti || md5(token)
  if (!validateTokenSignature(token, ...)) {
    await resource.delete(id)
    throw new Error('invalid-token')
  }
  return resource.get(id)
}
The tasks board can be found at this link!
Feel free to interact; PRs are welcome! :)