Note: none of what is below is set in stone by any means. Please pick this apart, add any ideas, or send me other things we should look into!
Still in draft
Data Types
Data Types are key to how Singer works and are what enable taps and targets to be completely isolated from one another. We have a lot of taps that write to one format, and targets that read from that one format and translate it to the target's native type.
Data Types are mapped from a tap to JSON Schema, and a target maps from JSON Schema to its native type. For example, Postgres maps data to the data types listed at https://www.postgresql.org/docs/current/datatype.html. Listed below is a mapping of each Postgres type to its related JSON Schema.
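To make the two-step mapping concrete, here is a minimal sketch in Python. The table contents and the names `POSTGRES_TO_JSONSCHEMA`, `JSONSCHEMA_TO_POSTGRES`, `to_jsonschema`, and `to_postgres` are all hypothetical and chosen for illustration; they are not the mapping any existing tap or target actually uses.

```python
# Hypothetical illustration only; not the actual mapping of any tap or target.

# Tap side: Postgres column type -> JSON Schema fragment.
POSTGRES_TO_JSONSCHEMA = {
    "smallint": {"type": "integer"},
    "integer": {"type": "integer"},
    "bigint": {"type": "integer"},
    "boolean": {"type": "boolean"},
    "text": {"type": "string"},
    "timestamp with time zone": {"type": "string", "format": "date-time"},
}

# Target side: (JSON Schema type, format) -> Postgres column type.
JSONSCHEMA_TO_POSTGRES = {
    ("integer", None): "bigint",
    ("boolean", None): "boolean",
    ("string", None): "text",
    ("string", "date-time"): "timestamp with time zone",
}


def to_jsonschema(postgres_type: str) -> dict:
    """Tap side: describe a Postgres column as a JSON Schema fragment."""
    return dict(POSTGRES_TO_JSONSCHEMA[postgres_type])


def to_postgres(schema: dict) -> str:
    """Target side: pick a native column type for a JSON Schema fragment."""
    key = (schema["type"], schema.get("format"))
    return JSONSCHEMA_TO_POSTGRES[key]


# A smallint column passes through JSON Schema and comes back as bigint,
# because the intermediate format carries no width information.
round_tripped = to_postgres(to_jsonschema("smallint"))
```

With these assumed tables, a round trip widens `smallint` to `bigint`, which is the kind of behavior a README mapping table would make visible.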
Why take the time to write this up?
Admittedly this is a selfish move, as I currently move very slowly when dealing with types in target systems. I believe this is because we don't have a good best practice to follow, but it could just be my own incompetence! My hope is that we can build a standard from this writeup and push it into the SDK docs as a best practice for dealing with types. My experience so far with types and targets is that I have to manually test each mapping I care about to be sure things are doing what I want. I think the README, along with tests in the target itself, can explain thoroughly what will happen with each data type, making data type issues much clearer across the Singer ecosystem.
What about normal "SaaS" taps
A SaaS tap will almost certainly have a subset of the type issues that a SQL tap/target has, mainly because SaaS apps are pretty much all backed by a database with the same types, so you'll get some subset of those types in pretty much all cases.
What about customization? I want a different mapping than one that exists
Everyone has different use cases for their target. Some folks just need the data added to the target (this is the base case that we should always optimize for, as getting data is far better than failing and having no data). Others need more attention paid to saving space or optimizing for speed, and thus need smaller data types. Using bigint for every integer may therefore not be appropriate.
We should offer a mapping capability. I propose a configuration option called data_type_mappings, which defaults to the mappings we offer but can be changed. I think the mapping would look something like:
"data_type_mappings":
- "
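As one way to picture "defaults that can be changed", here is a minimal Python sketch. The shape of `data_type_mappings` used here (a dict from JSON Schema type to native type) is my own assumption and does not reflect the truncated example above; `DEFAULT_DATA_TYPE_MAPPINGS` and `resolve_data_type_mappings` are hypothetical names, and the default values are placeholders.

```python
# Hypothetical defaults; a real target would define and document its own.
DEFAULT_DATA_TYPE_MAPPINGS = {
    "integer": "bigint",
    "number": "double precision",
    "boolean": "boolean",
    "string": "text",
}


def resolve_data_type_mappings(config: dict) -> dict:
    """Return the defaults with any user overrides from config applied on top."""
    overrides = config.get("data_type_mappings") or {}
    # Later keys win, so user-supplied entries replace the defaults.
    return {**DEFAULT_DATA_TYPE_MAPPINGS, **overrides}


# A user who cares about space asks for a smaller integer column.
mappings = resolve_data_type_mappings(
    {"data_type_mappings": {"integer": "integer"}}
)
```

Types the user does not mention keep their defaults, so the base case of "just get the data in" still works with an empty config.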
What about custom types
SQL taps can have custom types, like those created with https://www.postgresql.org/docs/current/sql-createtype.html. I am not going to address this directly yet, but I think the closest thing to support for this could be started in the customization section.
Prior Art / What's missing
These are not in order; the numbering is just so there is something to reference later. I'm not aware of many prior writeups regarding types in the Singer ecosystem. I'm sure they exist, but after some searching I haven't found much. If someone could point me to some, that would be great!
There is a singer.decimal format for Decimal strings. I haven't made the dive here, but the general idea is that we can make a custom format like "PERSON_ID": {"format": "singer.decimal", "type": ["string"]}
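A rough Python sketch of how a target might act on such a format follows. This is only my reading of the general idea, not how any existing tap or target handles `singer.decimal`; `is_singer_decimal` and `parse_value` are hypothetical helper names.

```python
from decimal import Decimal


def is_singer_decimal(schema: dict) -> bool:
    """True if the property is a string carrying the singer.decimal format."""
    types = schema.get("type", [])
    if isinstance(types, str):
        types = [types]
    return "string" in types and schema.get("format") == "singer.decimal"


def parse_value(schema: dict, value):
    """Convert decimal strings to Decimal; leave every other value untouched."""
    if value is not None and is_singer_decimal(schema):
        return Decimal(value)
    return value


schema = {"format": "singer.decimal", "type": ["string"]}
parsed = parse_value(schema, "12345678901234567890.123456789")
```

Sending the number as a string and rebuilding it with `Decimal` keeps every digit, which a JSON float would not guarantee.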
README update that lists each postgres and jsonschema type and how
target-postgres maps between them.
Based on prior discussion, this is all that's desired right now in terms
of #13 and #106, although this doesn't fully address either.
transferwise has their own methods.
We should create a section in the README, or maybe a test, for each of these that has a mapping.
{"type": "integer"}
{"type": "boolean"}
With every type from https://www.postgresql.org/docs/current/datatype.html listed.
We should also have a corresponding test for every supported type. We should also give a writeup regarding how anyOf is handled.
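A table-driven test is one way this could look. The sketch below is in Python, and the anyOf policy it encodes (use the single non-null type if there is exactly one, otherwise fall back to string) is purely an assumption for illustration, as are the names `resolve_anyof`, `CASES`, and `run_cases`.

```python
def resolve_anyof(schema: dict) -> tuple:
    """Return (json_type, nullable) for a schema that may use anyOf."""
    branches = schema.get("anyOf", [schema])
    types = []
    for branch in branches:
        t = branch.get("type", [])
        types.extend([t] if isinstance(t, str) else t)
    nullable = "null" in types
    distinct = sorted({t for t in types if t != "null"})
    if len(distinct) == 1:
        return distinct[0], nullable
    # Assumed fallback: conflicting or missing types are stored as strings.
    return "string", nullable


# One row per supported schema shape; the same table could feed the README.
CASES = [
    ({"type": "integer"}, ("integer", False)),
    ({"type": ["boolean", "null"]}, ("boolean", True)),
    ({"anyOf": [{"type": "integer"}, {"type": "null"}]}, ("integer", True)),
    ({"anyOf": [{"type": "integer"}, {"type": "string"}]}, ("string", False)),
]


def run_cases() -> int:
    """Check every case and return how many were verified."""
    for schema, expected in CASES:
        assert resolve_anyof(schema) == expected, schema
    return len(CASES)
```

Keeping the cases in one list means each documented type has exactly one place where its expected behavior is written down and checked.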