Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Types - Document and Test #106

Open
visch opened this issue Feb 15, 2023 · 1 comment
Open

Data Types - Document and Test #106

visch opened this issue Feb 15, 2023 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@visch
Copy link
Member

visch commented Feb 15, 2023

Note none of what is below is set in stone by any means, please pick this apart and add any ideas or send me other things we should look into!

Still in draft

Data Types

Data Types are a key to how Singer works and are what enables us to have taps and targets be completely isolated from one another. We have a lot of taps that write to one format and targets that read from that one format and translate that format to the targets native type.

Data Types are mapped from a tap to JSONSchema , and a target maps from JSONSchema to its native type. Example of this is postgres maps data to the data types here https://www.postgresql.org/docs/current/datatype.html. Listed below is a mapping of each postgres type to its related json schema.

Why take the time to write this up?

Admittedly this is a selfish move as I move very slowly currently when dealing with types in target systems. I believe this is because we don't have a good best practice to follow, but it could just be my own incompetence! My hope here is that we can build a standard from this writeup that we push into the SDK docs as a good best practice while dealing with types. My experience so far with types and dealing with targets is that I have to manually test each mapping I care about to be sure things are doing what I want, I think we can make the README along with tests in the target itself explain thoroughly what will happen with each data type making data type issues become much more clear across the singer ecosystem.

What about normal "SaaS" taps

A SaaS tap is going to almost certainly have a subset of the type issues that a sql tap/target has, mainly because SaaS apps are pretty much all written with a database backend anyways that have the same types so you'll get some subset of those types in pretty much all cases.

What about customization? I want a different mapping than one that exists

Everyone has different use cases for their target. Some folks just need the data added to the target (This is the base case that we should always optimize for as getting data is infinitely better than failing and having no data), some folks need more attention paid to saving Space, or optimizing for speed and thus need smaller data types. Therefore using bigint for every integer may not be appropriate.

We should offer a mapping capability. I offer a configuration option called data_type_mappings which defaults to the mappings we offer but can be changed. I think the mapping would look something like

"data_type_mappings": 
  - "

What about custom types

SQL taps can have custom types like https://www.postgresql.org/docs/current/sql-createtype.html . I am not going to address this directly yet, but I think the closest thing to support for this could be started in the customization section.

Prior Art / What's missing

These are not in order, just numbering for something to reference later. I'm not sure of much prior writeups regarding types in the singer ecosystem, I'm sure they exist but after some searching I haven't found much. If someone could point me to some that would be great!

  1. There is a singer.decimal format for Decimal strings. I havne't made the dive here but the general idea is that we can make a custom format like "PERSON_ID": {"format": "singer.decimal", "type": ["string"]}
  2. https://github.com/singer-io/singer-python has its own methods for dealing with types
  3. transferwise has their own methods

We should create a section in the Readme or maybe a test for each of these that has a mapping.

Postgres Type JSONSchema Notes
bigint {"type": "integer"} All integers are mapped to bigint
bigserial N/A Not supported
bit N/A Not supported
bit varying N/A Not supported
boolean { "type": "boolean" }
box Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell
bigserial Content Cell Content Cell

With every type from https://www.postgresql.org/docs/current/datatype.html listed.

We should also have a corresponding test for every supported Type. We should also give a writeup regarding how anyOf is handled.

@tayloramurphy tayloramurphy added the documentation Improvements or additions to documentation label Feb 24, 2023
@visch
Copy link
Member Author

visch commented Jun 15, 2023

Related to #13

visch pushed a commit that referenced this issue Nov 28, 2023
README update that lists each postgres and jsonschema type and how
target-postgres maps between them.

Based on prior discussion, this is all that's desired right now in-terms
of #13 and #106, although this doesn't full address either.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

2 participants