
Feature: add ibis meta data routers #603

Merged (4 commits, Jun 12, 2024)
Conversation

@onlyjackfrost (Collaborator) commented Jun 7, 2024

add ibis meta data routers

POST /v2/ibis/{datasource}/metadata/tables

Request body

for postgres

{
    "connectionInfo": {
        "host": "localhost",
        "port": "6432",
        "user": "postgres",
        "database": "postgres",
        "password": "postgres"
    }
}

for bigquery

{
    "connectionInfo": {
        "project_id": "wrenai",
        "dataset_id": "wrenai.ecommerce",
        "credentials": "your base64 encoded JSON string of your credential file"
    }
}

Response

  • name: unique table name (may include the schema name, depending on whether tables are listed across schemas)
  • columns
    • name: column name in the datasource
    • type: column data type
    • notNull: boolean, true if the column is non-nullable
    • description: column description (comment), if any
    • properties: column properties, if any
  • description: table description, if any
  • properties
    • schema: schema name used to build tableReference
    • catalog: catalog name used to build tableReference
    • table: table name used to build tableReference
  • primaryKey: the column name bound by the primary key constraint
[
    {
        "name": "public.nation",
        "columns": [
            {
                "name": "n_nationkey",
                "type": "INTEGER",
                "notNull": true,
                "description": "",
                "properties": {}
            }
        ],
        "description": "",
        "properties": {
            "schema": "public",
            "catalog": "postgres",
            "table": "nation"
        },
        "primaryKey": ""
    }
]
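The request above can be exercised with a short client sketch. This is a minimal example using only the standard library; the base URL, port, and a running ibis server are assumptions, and only the URL path and payload shape come from the PR description:

```python
import json
import urllib.request

def build_tables_request(base_url, datasource, connection_info):
    """Build the URL and JSON body for the tables metadata endpoint."""
    url = f"{base_url}/v2/ibis/{datasource}/metadata/tables"
    body = json.dumps({"connectionInfo": connection_info}).encode()
    return url, body

def list_tables(base_url, datasource, connection_info):
    """POST the payload and return the parsed table list (needs a running server)."""
    url, body = build_tables_request(base_url, datasource, connection_info)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Postgres payload from the example above; host/port values are illustrative.
pg_info = {
    "host": "localhost",
    "port": "6432",
    "user": "postgres",
    "database": "postgres",
    "password": "postgres",
}
url, body = build_tables_request("http://localhost:8000", "postgres", pg_info)
```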

POST /v2/ibis/{datasource}/metadata/constraints

Request body

Same as above

Response

  • constraints
    • constraintName: unique constraint name, formatted as {constraint_table}_{constraint_column}_{constrainted_table}_{constrainted_column}
    • constraintType: "FOREIGN KEY"
    • constraintTable
    • constraintColumn
    • constraintedTable
    • constraintedColumn
[
    {
        "constraintName": "composite_pk_y_composite_fk_z",
        "constraintType": "FOREIGN KEY",
        "constraintTable": "composite_pk",
        "constraintColumn": "y",
        "constraintedTable": "composite_fk",
        "constraintedColumn": "z"
    }
]

@goldmedal (Contributor):
Could you add some tests for them? It would make it more stable. Furthermore, could you add a description for the new API?

@grieve54706 (Contributor):
Please rebase onto the main branch and use Ruff to format the code. Follow #601.

Comment on lines 14 to 30
class Metadata(StrEnum):
    postgres = auto()
    bigquery = auto()

    def get_table_list(self, connection_info):
        if self == Metadata.postgres:
            return self.get_postgres_table_list_sql(connection_info)
        if self == Metadata.bigquery:
            return self.get_bigquery_table_list_sql(connection_info)
        raise NotImplementedError(f"Unsupported data source: {self}")

    def get_constraints(self, connection_info):
        if self == Metadata.postgres:
            return self.get_postgres_table_constraints(connection_info)
        if self == Metadata.bigquery:
            return self.get_bigquery_table_constraints(connection_info)
        raise NotImplementedError(f"Unsupported data source: {self}")
Contributor:

If every method needs to check the data source type, the class should be split by data source, e.g. PostgresMetadata(Metadata) and BigQueryMetadata(Metadata).
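A minimal sketch of that split, assuming an abstract base with one subclass per data source; the class and method names follow the diff, while the SQL strings and the `metadata_for` dispatch helper are placeholders for illustration:

```python
from abc import ABC, abstractmethod

class Metadata(ABC):
    @abstractmethod
    def get_table_list(self, connection_info): ...

    @abstractmethod
    def get_constraints(self, connection_info): ...

class PostgresMetadata(Metadata):
    def get_table_list(self, connection_info):
        return "SELECT ... -- postgres table list"

    def get_constraints(self, connection_info):
        return "SELECT ... -- postgres constraints"

class BigQueryMetadata(Metadata):
    def get_table_list(self, connection_info):
        return "SELECT ... -- bigquery table list"

    def get_constraints(self, connection_info):
        return "SELECT ... -- bigquery constraints"

# Dispatch once at the edge instead of inside every method.
METADATA = {"postgres": PostgresMetadata, "bigquery": BigQueryMetadata}

def metadata_for(datasource):
    try:
        return METADATA[datasource]()
    except KeyError:
        raise NotImplementedError(f"Unsupported data source: {datasource}")
```

With this shape, adding a data source means adding a subclass and one registry entry rather than editing every method.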

Comment on lines 53 to 73
res = to_json(
    DataSource.postgres.get_connection(connection_info)
    .sql(sql, dialect="trino")
    .to_pandas()
)

# transform the result to a list of dictionaries
response = [
    (
        lambda x: {
            "table_catalog": x[0],
            "table_schema": x[1],
            "table_name": x[2],
            "column_name": x[3],
            "data_type": transform_postgres_column_type(x[4]),
            "is_nullable": x[5],
            "ordinal_position": x[6],
        }
    )(row)
    for row in res["data"]
]
Contributor:

You can use df.to_json(orient="records") to get the data keyed by column names, like

[
    {
        "col 1": "a",
        "col 2": "b"
    },
    {
        "col 1": "c",
        "col 2": "d"
    }
]

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html
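A runnable illustration of the records orientation on a small frame matching the example above:

```python
import json
import pandas as pd

# Two rows, two columns, as in the reviewer's example.
df = pd.DataFrame([["a", "b"], ["c", "d"]], columns=["col 1", "col 2"])

# orient="records" emits one JSON object per row, keyed by column name.
records = json.loads(df.to_json(orient="records"))
# records == [{"col 1": "a", "col 2": "b"}, {"col 1": "c", "col 2": "d"}]
```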

Comment on lines 81 to 92
table = {
    "name": schema_table,
    "description": "",
    "columns": [],
    "properties": {
        "schema": row["table_schema"],
        "catalog": row["table_catalog"],
        "table": row["table_name"],
    },
    "primaryKey": "",
}
unique_tables[schema_table] = CompactTable(**table)
Contributor:

How about constructing the class directly?

unique_tables[schema_table] = CompactTable(
    name=schema_table,
    properties=CompactTableProperties(
        schema=row["table_schema"],
        catalog=row["table_catalog"],
        table=row["table_name"],
    )
)

CompactColumn too.

table: Optional[str] # only table name without schema or catalog


class CompactTable(BaseModel):
Contributor:

You can just name it Table in metadata_dto.py. Let's drop the redundant Compact prefix.

Comment on lines 104 to 105
compact_tables: list[CompactTable] = list(unique_tables.values())
return compact_tables
Contributor:

You can just return list(unique_tables.values()).

def transform_postgres_column_type(data_type):
    # lower case the data_type
    data_type = data_type.lower()
    print(f"=== data_type: {data_type}")
Contributor:

Don't use print

Comment on lines 329 to 355
switcher = {
    "text": WrenEngineColumnType.TEXT,
    "char": WrenEngineColumnType.CHAR,
    "character": WrenEngineColumnType.CHAR,
    "bpchar": WrenEngineColumnType.CHAR,
    "name": WrenEngineColumnType.CHAR,
    "character varying": WrenEngineColumnType.VARCHAR,
    "bigint": WrenEngineColumnType.BIGINT,
    "int": WrenEngineColumnType.INTEGER,
    "integer": WrenEngineColumnType.INTEGER,
    "smallint": WrenEngineColumnType.SMALLINT,
    "real": WrenEngineColumnType.REAL,
    "double precision": WrenEngineColumnType.DOUBLE,
    "numeric": WrenEngineColumnType.DECIMAL,
    "decimal": WrenEngineColumnType.DECIMAL,
    "boolean": WrenEngineColumnType.BOOLEAN,
    "timestamp": WrenEngineColumnType.TIMESTAMP,
    "timestamp without time zone": WrenEngineColumnType.TIMESTAMP,
    "timestamp with time zone": WrenEngineColumnType.TIMESTAMPTZ,
    "date": WrenEngineColumnType.DATE,
    "interval": WrenEngineColumnType.INTERVAL,
    "json": WrenEngineColumnType.JSON,
    "bytea": WrenEngineColumnType.BYTEA,
    "uuid": WrenEngineColumnType.UUID,
    "inet": WrenEngineColumnType.INET,
    "oid": WrenEngineColumnType.OID,
}
Contributor:

Use enum mapping.

class ColumnType(Enum):
    TEXT = ("TEXT", "text")

    def __init__(self, wtype, ptype):
        # store under private names so the properties below don't recurse
        self._wtype = wtype
        self._ptype = ptype

    @property
    def wtype(self):
        return self._wtype

    @property
    def ptype(self):
        return self._ptype
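Extending that idea, here is a self-contained sketch with a lookup helper that could replace the switcher dict. The member set is abbreviated, and the `from_ptype` name and the TEXT fallback are assumptions for illustration:

```python
from enum import Enum

class ColumnType(Enum):
    # (wren type, postgres type) pairs; abbreviated member set
    TEXT = ("TEXT", "text")
    INTEGER = ("INTEGER", "integer")
    BOOLEAN = ("BOOLEAN", "boolean")

    def __init__(self, wtype, ptype):
        # private names so the properties below do not recurse
        self._wtype = wtype
        self._ptype = ptype

    @property
    def wtype(self):
        return self._wtype

    @property
    def ptype(self):
        return self._ptype

    @classmethod
    def from_ptype(cls, ptype):
        # fall back to TEXT for unmapped types (an assumption in this sketch)
        return next((m for m in cls if m.ptype == ptype), cls.TEXT)
```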

Comment on lines 13 to 17
connection_info: Union[
    PostgresConnectionUrl | PostgresConnectionInfo,
    BigQueryConnectionInfo,
    SnowflakeConnectionInfo,
] = Field(alias="connectionInfo")
Contributor:

You could just use connection_info: ConnectionInfo = Field(alias="connectionInfo")

Comment on lines 20 to 77
class WrenEngineColumnType(Enum):
    # Boolean Types
    BOOLEAN = "BOOLEAN"

    # Numeric Types
    TINYINT = "TINYINT"

    INT2 = "INT2"
    SMALLINT = "SMALLINT"  # alias for INT2

    INT4 = "INT4"
    INTEGER = "INTEGER"  # alias for INT4

    INT8 = "INT8"
    BIGINT = "BIGINT"  # alias for INT8

    NUMERIC = "NUMERIC"
    DECIMAL = "DECIMAL"

    # Floating-Point Types
    FLOAT4 = "FLOAT4"
    REAL = "REAL"  # alias for FLOAT4

    FLOAT8 = "FLOAT8"
    DOUBLE = "DOUBLE"  # alias for FLOAT8

    # Character Types
    VARCHAR = "VARCHAR"
    CHAR = "CHAR"
    BPCHAR = "BPCHAR"  # BPCHAR is a fixed-length, blank-padded string
    TEXT = "TEXT"  # alias for VARCHAR
    STRING = "STRING"  # alias for VARCHAR
    NAME = "NAME"  # alias for VARCHAR

    # Date/Time Types
    TIMESTAMP = "TIMESTAMP"
    TIMESTAMPTZ = "TIMESTAMP WITH TIME ZONE"
    DATE = "DATE"
    INTERVAL = "INTERVAL"

    # JSON Types
    JSON = "JSON"

    # Object identifiers (OIDs) are used internally by PostgreSQL as primary keys for various system tables.
    # https://www.postgresql.org/docs/current/datatype-oid.html
    OID = "OID"

    # Binary Data Types
    BYTEA = "BYTEA"

    # UUID Type
    UUID = "UUID"

    # Network Address Types
    INET = "INET"

    # Unknown Type
    UNKNOWN = "UNKNOWN"
Contributor:

Please remove the extra blank lines.

@log_dto
def get_bigquery_constraints(dto: MetadataDTO) -> dict:
    table_list = Metadata.bigquery.get_constraints(dto.connection_info)
    return {"constraints": table_list}
Contributor:

It doesn't carry any other data, so how about returning the list directly as a JSON array?
By the way, please add an empty line at the end of the file.

@grieve54706 (Contributor):

I see the codebase uses POST for these APIs, but your PR description doesn't match it.

Comment on lines +224 to +325
def test_metadata_list_tables(self):
    connection_info = self.get_connection_info()
    response = client.post(
        url="/v2/ibis/bigquery/metadata/tables",
        json={"connectionInfo": connection_info},
    )
    assert response.status_code == 200

def test_metadata_list_constraints(self):
    connection_info = self.get_connection_info()
    response = client.post(
        url="/v2/ibis/bigquery/metadata/constraints",
        json={"connectionInfo": connection_info},
    )
    assert response.status_code == 200
Contributor:

Please assert the contents.

Collaborator (Author):

I think we do not need to assert the content here, because Pydantic will raise errors if the response data structure is incorrect, and we do not care about the actual data inside that structure.

@grieve54706 (Contributor) commented Jun 12, 2024:

No, FastAPI only checks that the response type is a dict. We should check the content, e.g. the data count, and that fields like name and columns are present. The test is there to make sure the result follows our code design.
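For example, a content assertion along these lines; the field names come from the response spec in the PR description, while the helper name and the sample payload are hypothetical:

```python
def assert_table_shape(tables):
    """Check the response follows the documented table metadata shape."""
    assert isinstance(tables, list) and len(tables) > 0
    first = tables[0]
    for field in ("name", "columns", "properties", "primaryKey"):
        assert field in first, f"missing table field: {field}"
    for column in first["columns"]:
        for field in ("name", "type", "notNull"):
            assert field in column, f"missing column field: {field}"

# Sample payload shaped like the PR description's response example.
sample = [
    {
        "name": "public.nation",
        "columns": [{"name": "n_nationkey", "type": "INTEGER", "notNull": True}],
        "properties": {"schema": "public", "catalog": "postgres", "table": "nation"},
        "primaryKey": "",
    }
]
assert_table_shape(sample)
```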

@onlyjackfrost (Collaborator, Author) commented Jun 12, 2024:

I'll update the test case and modify the data source's schema & constraints in another PR

@grieve54706 grieve54706 merged commit a325b49 into main Jun 12, 2024
1 check passed
@grieve54706 grieve54706 deleted the feature/ibis-metadata branch June 12, 2024 06:52