This repository has been archived by the owner on Jun 29, 2021. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 4
ANALYZE STAR SCHEMA
ali1rathore edited this page Sep 10, 2018
·
1 revision
analyzeStarSchema : ANALYZE STAR SCHEMA ON qualifiedId
opt(
AS
repsep(
(tableInfo | joinInfo | columnEqInfo),
opt(",")
)
)
tableInfo : TABLE qualifiedCol
repsep(
keyColumns | functionalDep | levelHierarchy | parentChildHierarchy,
opt(",")
)
joinInfo : opt(starRelationType) opt(INDEX)
JOIN OF qualifiedId WITH qualifiedId ON repsep(joinEqualityCond, AND)
joinEqualityCond : qualifiedCol "=" qualifiedCol
columnEqInfo : COLUMNS qualifiedCol "=" qualifiedCol
functionalDep : opt(COLUMNS) qualifiedCols DETERMINES opt(COLUMNS) qualifiedCols
levelHierarchy : LEVEL ~ HIERARCHY qualifiedCols
parentChildHierarchy : PARENT CHILD HIERARCHY qualifiedCol opt(",") qualifiedCol
keyColumns : KEY ~ opt(COLUMNS) qualifiedCols
qualifiedCols : repsep(qualifiedCol, ",")
qualifiedCol : ident opt("." ident) ~ opt("." ident)
Use analyze command to profile the columns in the Tables involved in the Star Schema.
This command computes the distinct_count, count, percentMissing and percentUnique
values for each column.
Note: The
analyze star schema
statement is only supported for tables that do not already have a star schema defined.
###Checks
- Validates table and column existence
- All Table Keys are specified or can be inferred
- Any join with the
Fact Table
must be on the Dimension Tablekey
columns - Any join condition must have at least 1-side that involves key columns
- A Dimension Table can only 1 parent-child hierarchy
-- tpch star schema
analyze star schema on lineitem_small
as many_to_one join of lineitem_small with orders on l_orderkey = o_orderkey
many_to_one join of lineitem_small with partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey
many_to_one join of partsupp with part on ps_partkey = p_partkey
many_to_one join of partsupp with supplier on ps_suppkey = s_suppkey
many_to_one join of orders with customer on o_custkey = c_custkey
many_to_one join of customer with custnation on c_nationkey = cn_nationkey
many_to_one join of custnation with custregion on cn_regionkey = cr_regionkey
many_to_one join of supplier with suppnation on s_nationkey = sn_nationkey
many_to_one join of suppnation with suppregion on sn_regionkey = sr_regionkey
;
+-------------------------------+----+--------------+-------+--------------+--------------------+
|ColumnName |Type|Distinct Count|Count |PercentMissing|PercentUnique |
+-------------------------------+----+--------------+-------+--------------+--------------------+
|default.orders.o_orderkey |INT |1500000 |1500000|0.0 |100.0 |
|default.partsupp.ps_suppkey |INT |10000 |800000 |0.0 |1.25 |
|default.partsupp.ps_partkey |INT |200000 |800000 |0.0 |25.0 |
|default.part.p_partkey |INT |200000 |200000 |0.0 |100.0 |
|default.partsupp.ps_partkey |INT |200000 |800000 |0.0 |25.0 |
|default.supplier.s_suppkey |INT |10000 |10000 |0.0 |100.0 |
|default.partsupp.ps_suppkey |INT |10000 |800000 |0.0 |1.25 |
|default.customer.c_custkey |INT |150000 |150000 |0.0 |100.0 |
|default.orders.o_custkey |INT |99996 |1500000|0.0 |6.6664 |
|default.custnation.cn_nationkey|INT |25 |25 |0.0 |100.0 |
|default.customer.c_nationkey |INT |25 |150000 |0.0 |0.016666666666666666|
|default.custregion.cr_regionkey|INT |5 |5 |0.0 |100.0 |
|default.custnation.cn_regionkey|INT |5 |25 |0.0 |20.0 |
|default.suppnation.sn_nationkey|INT |25 |25 |0.0 |100.0 |
|default.supplier.s_nationkey |INT |25 |10000 |0.0 |0.25 |
|default.suppregion.sr_regionkey|INT |5 |5 |0.0 |100.0 |
|default.suppnation.sn_regionkey|INT |5 |25 |0.0 |20.0 |
+-------------------------------+----+--------------+-------+--------------+--------------------+