Feature: add ibis meta data routers #603
Conversation
Could you add some tests for them? It would make the feature more stable. Furthermore, could you add a description for the new API?
Please rebase onto the main branch and use Ruff to format the code. Follow #601.
ibis-server/app/model/metadata.py
Outdated
class Metadata(StrEnum):
    postgres = auto()
    bigquery = auto()

    def get_table_list(self, connection_info):
        if self == Metadata.postgres:
            return self.get_postgres_table_list_sql(connection_info)
        if self == Metadata.bigquery:
            return self.get_bigquery_table_list_sql(connection_info)
        raise NotImplementedError(f"Unsupported data source: {self}")

    def get_constraints(self, connection_info):
        if self == Metadata.postgres:
            return self.get_postgres_table_constraints(connection_info)
        if self == Metadata.bigquery:
            return self.get_bigquery_table_constraints(connection_info)
        raise NotImplementedError(f"Unsupported data source: {self}")
If every method needs to check the data source type, it would be better to split the class by data source, e.g. PostgresMetadata(Metadata) and BigQueryMetadata(Metadata).
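A minimal sketch of that split (method names are taken from the snippet above; turning the base class into an ABC instead of a StrEnum is an assumption):

from abc import ABC, abstractmethod


class Metadata(ABC):
    # each data source implements its own metadata queries
    def __init__(self, connection_info):
        self.connection_info = connection_info

    @abstractmethod
    def get_table_list(self): ...

    @abstractmethod
    def get_constraints(self): ...


class PostgresMetadata(Metadata):
    def get_table_list(self):
        # build and run the Postgres information_schema query here
        ...

    def get_constraints(self):
        ...


class BigQueryMetadata(Metadata):
    def get_table_list(self):
        # build and run the BigQuery INFORMATION_SCHEMA query here
        ...

    def get_constraints(self):
        ...

The router can then pick the subclass from the data source name, so no method needs an if/elif chain over data sources.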
ibis-server/app/model/metadata.py
Outdated
res = to_json(
    DataSource.postgres.get_connection(connection_info)
    .sql(sql, dialect="trino")
    .to_pandas()
)

# transform the result to a list of dictionaries
response = [
    (
        lambda x: {
            "table_catalog": x[0],
            "table_schema": x[1],
            "table_name": x[2],
            "column_name": x[3],
            "data_type": transform_postgres_column_type(x[4]),
            "is_nullable": x[5],
            "ordinal_position": x[6],
        }
    )(row)
    for row in res["data"]
]
You can use df.to_json(orient="records") to get the rows keyed by column name, like
[
{
"col 1": "a",
"col 2": "b"
},
{
"col 1": "c",
"col 2": "d"
}
]
Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html
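A small sketch of that approach (the column names are placeholders):

import json

import pandas as pd

df = pd.DataFrame({"col 1": ["a", "c"], "col 2": ["b", "d"]})

# orient="records" serializes each row as a dict keyed by column name,
# so the positional lambda over res["data"] is no longer needed
records = json.loads(df.to_json(orient="records"))
# [{"col 1": "a", "col 2": "b"}, {"col 1": "c", "col 2": "d"}]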
ibis-server/app/model/metadata.py
Outdated
table = {
    "name": schema_table,
    "description": "",
    "columns": [],
    "properties": {
        "schema": row["table_schema"],
        "catalog": row["table_catalog"],
        "table": row["table_name"],
    },
    "primaryKey": "",
}
unique_tables[schema_table] = CompactTable(**table)
How about constructing the model directly with keyword arguments?

unique_tables[schema_table] = CompactTable(
    name=schema_table,
    properties=CompactTableProperties(
        schema=row["table_schema"],
        catalog=row["table_catalog"],
        table=row["table_name"],
    ),
)

Same for CompactColumn.
table: Optional[str]  # only table name without schema or catalog


class CompactTable(BaseModel):
You can just name it Table in metadata_dto.py. Let's drop the redundant Compact prefix.
ibis-server/app/model/metadata.py
Outdated
compact_tables: list[CompactTable] = list(unique_tables.values())
return compact_tables
You can just return list(unique_tables.values()) directly.
ibis-server/app/model/metadata.py
Outdated
def transform_postgres_column_type(data_type):
    # lower case the data_type
    data_type = data_type.lower()
    print(f"=== data_type: {data_type}")
Don't use print
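If the debug output is still useful, the standard logging module is the usual replacement (a sketch, not the author's code):

import logging

logger = logging.getLogger(__name__)


def transform_postgres_column_type(data_type):
    data_type = data_type.lower()
    # debug-level logging can be silenced in production, unlike print
    logger.debug("data_type: %s", data_type)
    ...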
ibis-server/app/model/metadata.py
Outdated
switcher = {
    "text": WrenEngineColumnType.TEXT,
    "char": WrenEngineColumnType.CHAR,
    "character": WrenEngineColumnType.CHAR,
    "bpchar": WrenEngineColumnType.CHAR,
    "name": WrenEngineColumnType.CHAR,
    "character varying": WrenEngineColumnType.VARCHAR,
    "bigint": WrenEngineColumnType.BIGINT,
    "int": WrenEngineColumnType.INTEGER,
    "integer": WrenEngineColumnType.INTEGER,
    "smallint": WrenEngineColumnType.SMALLINT,
    "real": WrenEngineColumnType.REAL,
    "double precision": WrenEngineColumnType.DOUBLE,
    "numeric": WrenEngineColumnType.DECIMAL,
    "decimal": WrenEngineColumnType.DECIMAL,
    "boolean": WrenEngineColumnType.BOOLEAN,
    "timestamp": WrenEngineColumnType.TIMESTAMP,
    "timestamp without time zone": WrenEngineColumnType.TIMESTAMP,
    "timestamp with time zone": WrenEngineColumnType.TIMESTAMPTZ,
    "date": WrenEngineColumnType.DATE,
    "interval": WrenEngineColumnType.INTERVAL,
    "json": WrenEngineColumnType.JSON,
    "bytea": WrenEngineColumnType.BYTEA,
    "uuid": WrenEngineColumnType.UUID,
    "inet": WrenEngineColumnType.INET,
    "oid": WrenEngineColumnType.OID,
}
Use enum mapping.
class ColumnType(Enum):
    TEXT = ("TEXT", "text")

    def __init__(self, wtype, ptype):
        self._wtype = wtype
        self._ptype = ptype

    @property
    def wtype(self):
        return self._wtype

    @property
    def ptype(self):
        return self._ptype
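Assuming the ColumnType enum above, the switcher dict could then become a lookup over the members (the UNKNOWN fallback is an assumption):

def transform_postgres_column_type(data_type: str) -> str:
    # match on the Postgres type name stored in each enum member
    for member in ColumnType:
        if member.ptype == data_type.lower():
            return member.wtype
    return "UNKNOWN"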
connection_info: Union[
    PostgresConnectionUrl | PostgresConnectionInfo,
    BigQueryConnectionInfo,
    SnowflakeConnectionInfo,
] = Field(alias="connectionInfo")
You could just use connection_info: ConnectionInfo = Field(alias="connectionInfo")
class WrenEngineColumnType(Enum):
    # Boolean Types
    BOOLEAN = "BOOLEAN"

    # Numeric Types
    TINYINT = "TINYINT"

    INT2 = "INT2"
    SMALLINT = "SMALLINT"  # alias for INT2

    INT4 = "INT4"
    INTEGER = "INTEGER"  # alias for INT4

    INT8 = "INT8"
    BIGINT = "BIGINT"  # alias for INT8

    NUMERIC = "NUMERIC"
    DECIMAL = "DECIMAL"

    # Floating-Point Types
    FLOAT4 = "FLOAT4"
    REAL = "REAL"  # alias for FLOAT4

    FLOAT8 = "FLOAT8"
    DOUBLE = "DOUBLE"  # alias for FLOAT8

    # Character Types
    VARCHAR = "VARCHAR"
    CHAR = "CHAR"
    BPCHAR = "BPCHAR"  # BPCHAR is fixed-length blank padded string
    TEXT = "TEXT"  # alias for VARCHAR
    STRING = "STRING"  # alias for VARCHAR
    NAME = "NAME"  # alias for VARCHAR

    # Date/Time Types
    TIMESTAMP = "TIMESTAMP"
    TIMESTAMPTZ = "TIMESTAMP WITH TIME ZONE"
    DATE = "DATE"
    INTERVAL = "INTERVAL"

    # JSON Types
    JSON = "JSON"

    # Object identifiers (OIDs) are used internally by PostgreSQL as primary keys for various system tables.
    # https://www.postgresql.org/docs/current/datatype-oid.html
    OID = "OID"

    # Binary Data Types
    BYTEA = "BYTEA"

    # UUID Type
    UUID = "UUID"

    # Network Address Types
    INET = "INET"

    # Unknown Type
    UNKNOWN = "UNKNOWN"
Please remove the extra blank lines.
@log_dto
def get_bigquery_constraints(dto: MetadataDTO) -> dict:
    table_list = Metadata.bigquery.get_constraints(dto.connection_info)
    return {"constraints": table_list}
It doesn't carry any other data. How about just returning the list, so the response is a plain JSON array?
BTW, please add an empty line at the end of the file.
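A minimal sketch of that change (the decorator and names are taken from the snippet above):

@log_dto
def get_bigquery_constraints(dto: MetadataDTO) -> list:
    # return the list itself so the endpoint responds with a plain JSON array
    return Metadata.bigquery.get_constraints(dto.connection_info)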
I saw you use POST for the API in the codebase, but your PR description doesn't match that.
Force-pushed from 5695865 to 53c41d8
def test_metadata_list_tables(self):
    connection_info = self.get_connection_info()
    response = client.post(
        url="/v2/ibis/bigquery/metadata/tables",
        json={"connectionInfo": connection_info},
    )
    assert response.status_code == 200

def test_metadata_list_constraints(self):
    connection_info = self.get_connection_info()
    response = client.post(
        url="/v2/ibis/bigquery/metadata/constraints",
        json={"connectionInfo": connection_info},
    )
    assert response.status_code == 200
Please assert the contents.
I think we do not need to assert the content here, because Pydantic will raise errors if the response data structure is incorrect, and we do not care about the actual data inside that structure.
No, FastAPI only checks that the response is a dict. We should check the content, e.g. the row count, or that fields like name and column are present. The test is to make sure the result follows our code design.
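A minimal sketch of such a content assertion (the name and columns field names are assumptions based on the table DTO in this PR):

response = client.post(
    url="/v2/ibis/bigquery/metadata/tables",
    json={"connectionInfo": connection_info},
)
assert response.status_code == 200
tables = response.json()
assert len(tables) > 0
# check that the fields the DTO is supposed to expose are present
assert "name" in tables[0]
assert "columns" in tables[0]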
I'll update the test case and modify the data source's schema & constraints in another PR
Force-pushed from 53c41d8 to 3f86231
add ibis meta data routers
POST /v2/ibis/{datasource}/metadata/tables
Request body
for postgres
for bigquery
Response
POST /v2/ibis/{datasource}/metadata/constraints
Request body
Same as above
Response
Constraint names follow the pattern {constraint_table}_{constraint_column}_{constrainted_table}_{constrainted_column}.
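A hedged usage sketch of the two endpoints (the import path and the empty connection dict are assumptions; fill in data-source-specific credentials):

from fastapi.testclient import TestClient

from app.main import app  # assumed application entry point

client = TestClient(app)
connection_info = {}  # data-source-specific fields, e.g. Postgres or BigQuery credentials

tables = client.post(
    "/v2/ibis/postgres/metadata/tables",
    json={"connectionInfo": connection_info},
)
constraints = client.post(
    "/v2/ibis/postgres/metadata/constraints",
    json={"connectionInfo": connection_info},
)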