Doc: new docs demands from slack #217
A few ideas here based on experience:
Again, those are very high-level ideas for optimizing "append-only, truncate-only, mostly read-only" OLAP workloads. If you are thinking of OLTP or any HTAP workloads, the recommendations above will not work. I agree with @tuhaihe that we need extended documentation with general guidance as well as real-world cases to demonstrate the tradeoffs behind the guidance above. These are great questions; please keep asking. Thanks.
Here is a deck I share with my clients who are building a data warehouse (the deck is generic and part of a larger demo I do). It shows generic code I built and provide for customers, but it also shows my conceptual best practices. You can see I lean towards a Kimball-based model (star or snowflake) because most of the time I'm pushing my customers towards a self-service BI model, and I feel dimensional modeling is the easiest for business users to understand.

Physical design for me generally leads towards dimensions being append-optimized (AO) columnar compressed and facts being AO row compressed, usually partitioned on one of the primary dates. I just started working with a new data modeling tool called DbSchema (https://dbschema.com/) and gave them code for Greenplum that they recently incorporated into the product, so it's my tool of choice.

One warning, though: a LOT depends on the BI tool being used. You need to make sure the tool pushes down the aggregation and only pulls back the columns the user is "slicing and dicing" on, as shown by the Tableau examples in the deck (look at the queries from the DB side). Other tools don't generate good queries and want to just extract everything (horrible), in which case columnar wouldn't work well (honestly, nothing works well with those tools; you might as well just feed them flat files).
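The physical design described above (AO columnar compressed dimensions, AO row compressed facts partitioned on a primary date) might look roughly like this in Greenplum/Cloudberry DDL. This is a minimal sketch, not code from the deck: the table and column names are hypothetical, and the storage options assume Greenplum 6+/Cloudberry syntax.

```sql
-- Dimension table: append-optimized, column-oriented, compressed.
-- (Hypothetical names; compresstype/level are illustrative choices.)
CREATE TABLE dim_customer (
    customer_key   bigint,
    customer_name  text,
    region         text
)
WITH (appendoptimized = true, orientation = column, compresstype = zstd)
DISTRIBUTED BY (customer_key);

-- Fact table: append-optimized, row-oriented, compressed,
-- range-partitioned on one of the primary dates.
CREATE TABLE fact_sales (
    sale_date     date,
    customer_key  bigint,
    amount        numeric
)
WITH (appendoptimized = true, orientation = row, compresstype = zstd)
DISTRIBUTED BY (customer_key)
PARTITION BY RANGE (sale_date)
(
    START (date '2024-01-01') INCLUSIVE
    END   (date '2025-01-01') EXCLUSIVE
    EVERY (INTERVAL '1 month')
);
```

The rough intuition: analytical queries typically aggregate a few fact columns over many rows, so date partitioning prunes scans, while small dimensions compress well column-wise and are read in full for lookups.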
A colleague of mine asked the following questions:
My questions concern a Cloudberry database used as a data warehouse (OLAP applications, reporting/BI):
• any specific optimization options for analytical queries / aggregation functions
• best practices in the data model design (star/snowflake-schema; normalized vs. denormalized; redundancy issues)
• best practices in the configuration
• any thoughts on massive deletes
• any thoughts on doing outer joins of 10+ tables
• any specific recommendations regarding partitioning, indexing, views/materialized views, ...
Slack Message