Safedelivr has the following core components:
- User: A GitHub OAuth user
- Batch: An email batch
- Logs: A status log for each recipient
- Stats: A stats counter
- Worker: A queue consumer
User is a basic entity of the Safedelivr system. It has the following key properties:
User_Id | Auth_Token | Api_Key |
---|---|---|
Time UUID | GitHub token (varchar) | Unique API key |
There isn't much to explain about the User entity, as it is self-explanatory.
A Batch is a fundamental entity defining an email task. Whenever the API is hit, the system generates a Batch_Id corresponding to the email delivery job.
A Batch has the following key parameters:
Batch_Id | User_Id | Subject | Options |
---|---|---|---|
Time UUID | The owner of this batch | Subject Line of Email | A map of options associated to this job |
The Options property of the Batch is a map with the following keys:
Key | Value |
---|---|
to | A comma separated list of recipients |
from | Standard SMTP from email address field |
body | HTML string |
reply_to | Standard SMTP reply_to email address field |
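For reference, a Batch and its Options map might be modeled roughly like this (a sketch; the struct and field names are illustrative, not taken from the codebase):

```go
package model

// Batch models a single email delivery job; names here are
// illustrative, not the project's actual types.
type Batch struct {
	BatchID string            // Time UUID generated when the API is hit
	UserID  string            // owner of this batch
	Subject string            // subject line of the email
	Options map[string]string // to, from, body, reply_to
}

// Example Options map for a batch.
var exampleOptions = map[string]string{
	"to":       "alice@example.com,bob@example.com", // comma separated recipients
	"from":     "noreply@example.com",
	"body":     "<h1>Hello</h1>",
	"reply_to": "support@example.com",
}
```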
Whenever a client sends a batch creation request, the system acknowledges it after performing the necessary checks, then saves the job in the Cassandra DB and publishes it on the AMQP server with the `batch.batch-uuid` routing key.
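As a rough sketch, assuming the streadway/amqp client and an exchange simply named `email` (neither is confirmed by this document), the publish step could look like:

```go
package main

import (
	"log"

	"github.com/streadway/amqp"
)

// publishBatch publishes the new batch's ID on the routing key
// "batch.<batch-uuid>" so a generic consumer can pick it up.
// Exchange name and payload format are assumptions for illustration.
func publishBatch(ch *amqp.Channel, batchID string, payload []byte) error {
	return ch.Publish(
		"email",          // exchange (assumed name)
		"batch."+batchID, // routing key: batch.batch-uuid
		false,            // mandatory
		false,            // immediate
		amqp.Publishing{
			ContentType: "application/json",
			Body:        payload,
		},
	)
}

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	if err := publishBatch(ch, "00000000-0000-0000-0000-000000000000", []byte(`{}`)); err != nil {
		log.Fatal(err)
	}
}
```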
A typical flow of batch creation is depicted below.
A Log is a bidirectional entity: it corresponds to an individual event for a given email address that has been dispatched by a provider. Logs are generated whenever the system receives an event for an email address via webhooks from the email provider. Providers send different types of events; the generic ones are the following:
Event Name | Meaning |
---|---|
processed | Provider has dispatched the email. |
sent/delivered | Email has been successfully delivered |
dropped | Mail has been dropped/bounced/hardfailed |
delayed | Mail service provider will retry sending the mail later |
From the above list, our system narrows the events down to `success`, `failure`, and `queued`.
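A minimal sketch of this narrowing, assuming the generic event names listed above (the constants and function are illustrative; the queued and failed conditions are detailed next):

```go
package worker

// Internal states tracked by Safedelivr.
const (
	StateQueued  = "queued"
	StateSuccess = "success"
	StateFailure = "failure"
)

// stateForEvent maps a generic provider webhook event to one of the
// three internal states. "processed" and "delayed" both keep the log
// in the queued state (dispatched / waiting for a retry).
func stateForEvent(event string) string {
	switch event {
	case "sent", "delivered":
		return StateSuccess
	case "dropped":
		return StateFailure
	case "processed", "delayed":
		return StateQueued
	default:
		return StateQueued
	}
}
```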
An email/log is said to be in the queued state if:
- It has been dispatched from our end and no status update has been received yet.
- It failed with one or more providers and has been put back in the queue for a retry with another provider.
An email/log is said to be in the failed state if:
- The provider sends a hardbounce/dropped/bounced event for that email.
- We have exhausted all providers while retrying and have no options left to retry with.
The success state is self-explanatory. The main parameters of a Log doc are:
Log_Id | Batch_Id | User_Id | Email | State | Status |
---|---|---|---|---|---|
Time UUID | Associated Batch_Id | Associated User_Id | Recipient email | queued/failed/success | A map that holds the status of retries |
The Status field is a boolean map that tells the system which providers have already been tried. It looks like this:
```json
{
  "sendgrid": true,
  "mailgun": false
}
```
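For illustration, choosing the next provider to retry with from this map might look like the following sketch (function and variable names are illustrative, not the actual implementation):

```go
package main

import "fmt"

// nextProvider returns a provider that has not yet been tried for this
// log (status[provider] == false), or ok=false when every known
// provider has already been attempted.
func nextProvider(status map[string]bool, providers []string) (next string, ok bool) {
	for _, p := range providers {
		if !status[p] {
			return p, true
		}
	}
	return "", false
}

func main() {
	status := map[string]bool{"sendgrid": true, "mailgun": false}
	if p, ok := nextProvider(status, []string{"sendgrid", "mailgun"}); ok {
		fmt.Println("retry with", p) // retry with mailgun
	}
}
```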
The Stats doc is a meta doc that keeps track of daily counts of successful, failed, and queued emails in the system. The doc structure is pretty basic:
User_Id | Date | success | failure | queued |
---|---|---|---|---|
User Id | Date of the stat | counter | counter | counter |
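As a sketch, and assuming the counters live in a Cassandra counter table accessed with gocql (the table and column names below are guesses based on the table above), a daily increment could look like:

```go
package stats

import (
	"time"

	"github.com/gocql/gocql"
)

// incrSuccess bumps the daily success counter for a user. The table
// and column names are assumptions for illustration only.
func incrSuccess(session *gocql.Session, userID gocql.UUID) error {
	day := time.Now().UTC().Format("2006-01-02")
	return session.Query(
		`UPDATE stats SET success = success + 1 WHERE user_id = ? AND date = ?`,
		userID, day,
	).Exec()
}
```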
The overall doc structure of the complete system is shown below for reference.
Consumers are RabbitMQ consumers listening on different queues. The queues used by the system are categorized as:
- Generic Batch Processing consumer
- Individual Exclusive service provider based consumers
- Individual Exclusive Log retry consumers
Generic consumers are associated with the batch processing part; they keep processing newly enqueued batches that need to be dispatched. Generic consumers listen for batches on the routing key `batch.#`, where `#` corresponds to the batch's UUID.
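A rough sketch of such a consumer, assuming the streadway/amqp client; the exchange and queue names are purely illustrative:

```go
package consumer

import "github.com/streadway/amqp"

// consumeBatches binds a queue to the "batch.#" routing key on a topic
// exchange and hands every new batch message to handle. Exchange and
// queue names are assumptions for illustration.
func consumeBatches(ch *amqp.Channel, handle func(amqp.Delivery)) error {
	q, err := ch.QueueDeclare("batch-processing", true, false, false, false, nil)
	if err != nil {
		return err
	}
	if err := ch.QueueBind(q.Name, "batch.#", "email", false, nil); err != nil {
		return err
	}
	msgs, err := ch.Consume(q.Name, "", false, false, false, false, nil)
	if err != nil {
		return err
	}
	for d := range msgs {
		handle(d) // dispatch the batch through a provider; ack/nack inside
	}
	return nil
}
```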
Whenever a generic consumer fails to dispatch the mail batch request, it enqueues the packet into the exclusive consumer queue of a provider other than itself, appending `.retry.<retry_count>` to the routing key.
The appended `retry` count helps the exclusive queues decide when to stop retrying.
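For illustration, building the retry routing key and reading back the retry count could look like this sketch (helper names are hypothetical):

```go
package consumer

import (
	"fmt"
	"strconv"
	"strings"
)

// retryKey builds a routing key such as "mg.<batch-uuid>.retry.2" for
// the next provider's exclusive queue; the layout follows the routing
// key table below.
func retryKey(providerPrefix, batchID string, retryCount int) string {
	return fmt.Sprintf("%s.%s.retry.%d", providerPrefix, batchID, retryCount)
}

// parseRetryCount extracts the appended retry counter from a routing
// key, returning 0 when no ".retry.N" suffix is present.
func parseRetryCount(routingKey string) int {
	parts := strings.Split(routingKey, ".")
	if len(parts) >= 2 && parts[len(parts)-2] == "retry" {
		if n, err := strconv.Atoi(parts[len(parts)-1]); err == nil {
			return n
		}
	}
	return 0
}
```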
These are service provider specific consumers; they try to dispatch an email through the provider they are associated with, e.g. Sendgrid or Mailgun. They follow the same retry method in case of failure, i.e. push into another service provider's queue.
These are similar to the individual batch consumers and consume packets with the `log.provider_namespace.log-uuid` signature. Whenever our system receives a delayed/failure response via webhooks from a service provider and we know we can retry sending mail to that particular recipient with another mail provider, the system pushes that log ID to another service provider's namespaced consumers.
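As a sketch, the webhook side of this re-queueing could look roughly like the following (the exchange name and helper are assumptions):

```go
package webhook

import "github.com/streadway/amqp"

// requeueLog pushes a failed/delayed log to another provider's
// log consumer using the "log.<provider>.<log-uuid>" routing key.
// The exchange name is an assumption for illustration.
func requeueLog(ch *amqp.Channel, provider, logID string) error {
	return ch.Publish(
		"email",                   // exchange (assumed)
		"log."+provider+"."+logID, // e.g. log.mg.<log-uuid>
		false, false,
		amqp.Publishing{ContentType: "text/plain", Body: []byte(logID)},
	)
}
```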
Currently, the routing key signatures in the system are the following:
Routing Key | Associated consumer |
---|---|
batch.00000000-0000-0000-0000-000000000000 | Round Robin based (sendgrid/mailgun) |
mg.00000000-0000-0000-0000-000000000000 | Mailgun |
sg.00000000-0000-0000-0000-000000000000 | Sendgrid |
mg.00000000-0000-0000-0000-000000000000.retry.#num | Mailgun |
sg.00000000-0000-0000-0000-000000000000.retry.#num | Sendgrid |
log.mg.00000000-0000-0000-0000-000000000000 | Mailgun |
log.sg.00000000-0000-0000-0000-000000000000 | Sendgrid |
log.sg.00000000-0000-0000-0000-000000000000.retry.#num | Sendgrid |
log.mg.00000000-0000-0000-0000-000000000000.retry.#num | Mailgun |
Whenever the system encounters a failure, it is bound to retry until any one of the following conditions is met:
- Successfully dispatched through one of the providers
- Ran out of service providers
- Ran out of retries
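These stop conditions could be expressed roughly as follows (a sketch; the names and the max-retry limit are illustrative):

```go
package worker

// shouldStop reports whether the system should give up retrying,
// mirroring the three conditions above. maxRetries is illustrative.
func shouldStop(delivered bool, status map[string]bool, retryCount, maxRetries int) bool {
	if delivered {
		return true // successfully dispatched through one of the providers
	}
	allTried := true
	for _, tried := range status {
		if !tried {
			allTried = false
			break
		}
	}
	if allTried {
		return true // ran out of service providers
	}
	return retryCount >= maxRetries // ran out of retries
}
```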
The lifecycle of a Batch in fail-safe mode is shown below.
The lifecycle of a Log in fail-safe mode:
Adding a new email provider is quite easy as long as the provider has a webhook feedback mechanism. To add a new email provider, you will need to do the following:
- Add a Channel and append it to `rabbit.EmailProvideres`.
- Implement a consumer. Consumers are executed generically; have a look at the Sendgrid one to get an idea.
- Add a webhook controller for the same.
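A hypothetical shape for such an addition is sketched below; the interface and function names are illustrative, and only `rabbit.EmailProvideres` comes from the steps above:

```go
package providers

import "github.com/streadway/amqp"

// EmailSender is a hypothetical shape for a provider integration; the
// real project wires a Channel into rabbit.EmailProvideres instead.
type EmailSender interface {
	// Send dispatches one batch/log payload through the provider's API.
	Send(payload []byte) error
}

// consume is a generic consumer loop a new provider could reuse,
// modeled on the description of the Sendgrid consumer above.
func consume(msgs <-chan amqp.Delivery, sender EmailSender) {
	for d := range msgs {
		if err := sender.Send(d.Body); err != nil {
			d.Nack(false, false) // let the retry path pick it up
			continue
		}
		d.Ack(false)
	}
}
```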
- The current architecture doesn't keep track of consumer-level failures; we could add consumer-level failure states so that a consumer can be taken offline if it is failing repeatedly.
- Instead of immediately pushing a retry when a provider fails, we could use the Dead Letter Exchange mechanism to add a delay between retries; optionally we could make use of Go context timeouts.
- Adding a cron-like consumer for delayed or unacknowledged entities would enhance the whole system's robustness.
- Allow user-defined webhooks to make it a completely service-driven framework.
MIT LICENSED