From 26db173a55dcd271d6e545eea2b90b2b61fd95c1 Mon Sep 17 00:00:00 2001
From: Alan Francis
Date: Tue, 6 Aug 2024 17:39:36 +0530
Subject: [PATCH] doc: add SQL db design checklist to support doc

---
 lib/sql/sql-db-design-checklist.md          | 172 ------------
 .../docs/dcp/SQL/sql-db-design-checklist.md | 249 ++++++++++++++++++
 2 files changed, 249 insertions(+), 172 deletions(-)
 delete mode 100644 lib/sql/sql-db-design-checklist.md
 create mode 100644 support/docs-astro/src/content/docs/dcp/SQL/sql-db-design-checklist.md

diff --git a/lib/sql/sql-db-design-checklist.md b/lib/sql/sql-db-design-checklist.md
deleted file mode 100644
index 5e1c1206..00000000
--- a/lib/sql/sql-db-design-checklist.md
+++ /dev/null
@@ -1,172 +0,0 @@
-# Database Design and SQL Best Practices Checklist
-
-## Database Design
-
-### Always use proper data types
-- **Checklist/Recommendation**: Use data types based on the nature of data.
-- **Remarks**: Example: Using `varchar(20)` to store date time values instead of `DATETIME` datatype will lead to errors during date time-related calculations and there is also a possible case of storing invalid data.
-
-### Avoid spaces in object names
-- **Checklist/Recommendation**: Using spaces makes the code more confusing, inconsistent with other queries and objects, and arguably harder to work with.
-- **Remarks**: A better way is to use underscores instead of spaces.
-
-### Use CHAR data type to store fixed length data only
-- **Checklist/Recommendation**: Using `char(100)` instead of `varchar(100)` will consume more space if the length of data is less than 100 characters.
-
-### Key Columns Indexing
-- **Checklist/Recommendation**: Ensure indexing columns which are used in JOIN clauses so that queries return faster.
-
-### Don't create index on each column
-- **Checklist/Recommendation**: It’s not a good idea to put index on each column of a table. This will badly affect DML (Insert/Update) performance. Analyze SQL queries predicates and wisely create indexes.
-
-### Always have a clustered (Primary) key on a table
-- **Checklist/Recommendation**: Clustered Key (Primary) helps in storing table data in the same order as clustered key. This improves scan efficiency in queries by skipping data that does not match filtering predicates.
-
-### Avoid multiple columns in a single table
-- **Checklist/Recommendation**: Though there is no such limitation on keeping many columns in a table, it is always a good idea to divide into different tables based on design.
-
-### Don't store calculated fields in a table
-- **Checklist/Recommendation**: Example - Store DOB instead of age of a person.
-
-### Avoid Over-Normalization Practice
-- **Checklist/Recommendation**: While normalization is important, avoid over-normalizing to the point where queries become complex and slow due to excessive joins.
-
-### Understand your Data and Domain
-- **Checklist/Recommendation**: Thoroughly understand the data and its domain before designing the database schema to ensure relevant tables, columns, and relationships are defined.
-
-### Consider Use Cases
-- **Checklist/Recommendation**: Design the database schema based on anticipated use cases to ensure that it supports the required queries and operations efficiently.
-
-## SQL Best Practices / Tuning
-
-### Use SELECT * only if needed
-- **Checklist/Recommendation**: Always explicitly type out the column names which are actually needed. This will improve response time particularly if you send the result to front-end applications.
-
-### Use ORDER BY only if needed
-- **Checklist/Recommendation**: `ORDER BY` & `DISTINCT` require the database engine to sort the result before it is sent to the client. Doing this in SQL may slow down response time in a multi-user environment.
-
-### Don't use Functions over indexed columns
-- **Checklist/Recommendation**: Using functions over indexed columns defeats the purpose of the index. Suppose you want to get data where the first two characters of customer code is AK, do not write:
-  ```sql
-  SELECT columns FROM table WHERE left(customer_code, 2) = 'AK'
-  SELECT columns FROM table WHERE customer_code like ‘AK%’
-  which will make use of index which results in faster response time.
-
-- **LIMIT 1 When Getting a Unique Row**
-  - **Checklist/Recommendation**: This reduces execution time because the database engine will stop scanning for records after it finds the first matching record, instead of going through the whole table or index.
-
-- **Create views to simplify / abstract complex SQL**
-  - **Checklist/Recommendation**: Views help to both simplify complex schemas and to implement security. One way that views contribute to security is to hide auditing fields from developers.
-
-- **Analyze Query Structure / Artifacts**
-  - Review the SQL query to understand its purpose and logic.
-  - Identify unnecessary or redundant parts of the query.
-
-- **Define Appropriate Indexes**
-  - Ensure that the tables involved in the query have appropriate indexes based on the WHERE clause.
-  - Avoid over-indexing, as it can negatively impact insert/update/delete performance.
-  - Use composite indexes when appropriate for multiple columns frequently used together in queries.
-  - Group by columns can be added in the INCLUDE part of the index.
-
-- **Avoid SELECT \***
-  - Specify only the columns you need in the SELECT statement instead of retrieving all columns.
-  - Retrieving unnecessary columns increases I/O and network traffic.
-
-- **Use JOINs Wisely**
-  - Choose the appropriate type of JOIN (INNER, LEFT, RIGHT, etc.) based on your data relationships.
-  - Use JOIN conditions that can take advantage of indexes for better performance.
-  - Avoid JOIN conditions that use functions (e.g., LTRIM, RTRIM, UPPER, LOWER).
-
-- **Limit Data Returned**
-  Use the LIMIT or FETCH FIRST clauses to restrict the number of rows returned, especially for large result sets.
-
-- **Use WHERE Clause Efficiently**
-  - Restrict the number of rows retrieved using the WHERE clause to avoid unnecessary data processing.
-  - Avoid using functions on indexed columns in the WHERE clause, as it can prevent index usage.
-
-- **Avoid Subqueries Whenever Possible**
-  Replace subqueries with JOINs or temporary tables when feasible, as subqueries can be performance bottlenecks.
-
-- **Use Proper Data Types**
-  Choose the most appropriate data types for your columns to save storage space and improve query performance.
-
-- **Tune & Optimize Grouping and Aggregation**
-  - Use GROUP BY and aggregate functions efficiently to avoid unnecessary data processing.
-  - Consider creating summary tables for frequently used aggregations.
-
-- **Update Database Statistics**
-  - Regularly update the statistics of the database to ensure the query optimizer makes accurate decisions.
-  - Use ANALYZE for Postgres or Auto Analyze feature.
-
-- **Avoid Using DISTINCT**
-  Use GROUP BY instead of DISTINCT for better performance when aggregating data.
-
-- **Consider Caching Mechanism**
-  Implement caching mechanisms to store frequently accessed query results and reduce database load.
-
-- **Save & Monitor Execution Plans**
-  Use database tools to analyze query execution plans and identify bottlenecks or inefficient operations.
-
-- **Use Bind Variables**
-  Use bind variables (parameterized queries) to allow the database to reuse query execution plans.
-
-- **Regularly Perform Query Reviews**
-  Periodically review and analyze the performance of your frequently executed queries to identify optimization opportunities.
-
-- **Consider Denormalization for Reporting Queries**
-  In some cases, denormalizing data (combining tables for performance reasons) can improve query performance, but it comes with trade-offs in terms of data integrity and maintenance.
-
-- **Use Database Indexing and Table Partitioning**
-  Utilize advanced database features like indexing and partitioning for very large tables to improve query performance.
-
-- **Monitor and Tune Regularly**
-  Database performance can change over time due to data growth or other factors, so make SQL tuning an ongoing process.
-
-- **Backup and Test Changes**
-  Before making significant changes to your queries, ensure you have backups and test changes in a controlled environment to avoid disruptions.
-
-- **Analyze Query Execution Plan**
-  - Understand the query execution plan using tools like EXPLAIN in SQL databases.
-  - Identify performance bottlenecks, such as full table scans, inefficient joins, or excessive sorting.
-
-- **Index Optimization**
-  - Ensure that appropriate indexes are present on columns used in WHERE, JOIN, and ORDER BY clauses.
-  - Avoid over-indexing, as too many indexes can slow down write operations.
-
-- **Use Proper Joins**
-  - Choose the correct join types (INNER, LEFT, RIGHT, FULL) based on the relationships between tables.
-  - Use proper join conditions to avoid Cartesian products.
-
-- **Avoid Using Functions in WHERE Clauses**
-  Applying functions to columns in WHERE clauses can prevent the use of indexes. Consider rewriting queries to avoid this.
-
-- **Avoid Long-running Updates/Inserts**
-  Long-running transactions can cause locks and contention. Keep transactions as short as possible.
-
-- **Avoid Data Type Conversion**
-  Minimize data type conversions in your queries, as they can impact performance.
-
-- **Batch Processing**
-  Use batch processing for bulk data operations instead of processing individual records.
-
-- **Use Connection Pooling**
-  Implement connection pooling to efficiently manage database connections.
-
-- **Hardware and Infrastructure**
-  Ensure that the underlying hardware and infrastructure meet the database's performance requirements.
-
-- **Monitor and Optimize Temp Tables and Disk Usage**
-  Keep an eye on temporary table usage and disk space utilization, as excessive disk I/O can impact performance.
-
-- **Avoid Using ORDER BY with Large Result Sets**
-  If possible, avoid using ORDER BY with large result sets, as it requires sorting, which can be resource-intensive.
-
-- **Use Stored Procedures for Frequently Executed Logic**
-  Wrap frequently executed queries in stored procedures to promote code reusability and reduce query parsing overhead.
-
-- **Use UNION Instead of UNION ALL with Caution**
-  Use UNION when you want to eliminate duplicate rows, but if duplicates are not an issue, use UNION ALL for better performance.
-
-- **Database Parameter Tuning**
-  Adjust database configuration parameters (such as memory allocation, cache size, and parallelism) based on workload characteristics.
-
diff --git a/support/docs-astro/src/content/docs/dcp/SQL/sql-db-design-checklist.md b/support/docs-astro/src/content/docs/dcp/SQL/sql-db-design-checklist.md
new file mode 100644
index 00000000..96601a6e
--- /dev/null
+++ b/support/docs-astro/src/content/docs/dcp/SQL/sql-db-design-checklist.md
@@ -0,0 +1,249 @@
+---
+title: SQL DB Design Checklist
+---
+
+### Always use proper data types
+
+- **Checklist/Recommendation**: Use data types based on the nature of data.
+- **Remarks**: Example: Using `varchar(20)` to store date time values instead of
+  `DATETIME` datatype will lead to errors during date time-related calculations
+  and there is also a possible case of storing invalid data.
+
+### Avoid spaces in object names
+
+- **Checklist/Recommendation**: Using spaces makes the code more confusing,
+  inconsistent with other queries and objects, and arguably harder to work with.
+- **Remarks**: A better way is to use underscores instead of spaces.
+
+### Use CHAR data type to store fixed length data only
+
+- **Checklist/Recommendation**: Using `char(100)` instead of `varchar(100)` will
+  consume more space if the length of data is less than 100 characters.
+
+### Key Columns Indexing
+
+- **Checklist/Recommendation**: Index the columns that are used in JOIN clauses
+  so that queries return faster.
+
+### Don't create index on each column
+
+- **Checklist/Recommendation**: Putting an index on every column of a table is
+  not a good idea, because it badly affects DML (Insert/Update) performance.
+  Analyze the predicates of your SQL queries and create indexes selectively.
+
+### Always have a clustered (Primary) key on a table
+
+- **Checklist/Recommendation**: A clustered (primary) key stores table data in
+  the same order as the key. This improves scan efficiency in queries by
+  skipping data that does not match filtering predicates.
+
+### Avoid too many columns in a single table
+
+- **Checklist/Recommendation**: Although nothing prevents keeping many columns
+  in one table, it is usually better to split the data into separate tables
+  based on the design.
+
+### Don't store calculated fields in a table
+
+- **Checklist/Recommendation**: Example: store a person's DOB instead of age.
+
+### Avoid Over-Normalization Practice
+
+- **Checklist/Recommendation**: While normalization is important, avoid
+  over-normalizing to the point where queries become complex and slow due to
+  excessive joins.
+
+### Understand your Data and Domain
+
+- **Checklist/Recommendation**: Thoroughly understand the data and its domain
+  before designing the database schema to ensure relevant tables, columns, and
+  relationships are defined.
+
+### Consider Use Cases
+
+- **Checklist/Recommendation**: Design the database schema based on anticipated
+  use cases to ensure that it supports the required queries and operations
+  efficiently.
+
+## SQL Best Practices / Tuning
+
+### Use SELECT * only if needed
+
+- **Checklist/Recommendation**: Always explicitly type out the column names
+  which are actually needed. This will improve response time particularly if you
+  send the result to front-end applications.
+
+### Use ORDER BY only if needed
+
+- **Checklist/Recommendation**: `ORDER BY` & `DISTINCT` require the database
+  engine to sort the result before it is sent to the client. Doing this in SQL
+  may slow down response time in a multi-user environment.
+
+### Don't use Functions over indexed columns
+
+- **Checklist/Recommendation**: Using functions over indexed columns defeats the
+  purpose of the index. To match customer codes that start with AK:
+  ```sql
+  -- Avoid: the function call on customer_code prevents use of the index.
+  SELECT customer_code FROM customers WHERE left(customer_code, 2) = 'AK';
+  -- Prefer: the prefix match can use the index, giving a faster response.
+  SELECT customer_code FROM customers WHERE customer_code LIKE 'AK%';
+  ```
+
+- **LIMIT 1 When Getting a Unique Row**
+  - **Checklist/Recommendation**: This reduces execution time because the
+    database engine will stop scanning for records after it finds the first
+    matching record, instead of going through the whole table or index.
+
+- **Create views to simplify / abstract complex SQL**
+  - **Checklist/Recommendation**: Views help to both simplify complex schemas
+    and to implement security. One way that views contribute to security is to
+    hide auditing fields from developers.
+
+- **Analyze Query Structure / Artifacts**
+  - Review the SQL query to understand its purpose and logic.
+  - Identify unnecessary or redundant parts of the query.
+
+- **Define Appropriate Indexes**
+  - Ensure that the tables involved in the query have appropriate indexes based
+    on the WHERE clause.
+  - Avoid over-indexing, as it can negatively impact insert/update/delete
+    performance.
+  - Use composite indexes when appropriate for multiple columns frequently used
+    together in queries.
+  - Group by columns can be added in the INCLUDE part of the index.
+
+- **Avoid SELECT \***
+  - Specify only the columns you need in the SELECT statement instead of
+    retrieving all columns.
+  - Retrieving unnecessary columns increases I/O and network traffic.
+
+- **Use JOINs Wisely**
+  - Choose the appropriate type of JOIN (INNER, LEFT, RIGHT, etc.) based on your
+    data relationships.
+  - Use JOIN conditions that can take advantage of indexes for better
+    performance.
+  - Avoid JOIN conditions that use functions (e.g., LTRIM, RTRIM, UPPER, LOWER).
+
+- **Limit Data Returned**\
+  Use the LIMIT or FETCH FIRST clauses to restrict the number of rows returned,
+  especially for large result sets.
+
+- **Use WHERE Clause Efficiently**
+  - Restrict the number of rows retrieved using the WHERE clause to avoid
+    unnecessary data processing.
+  - Avoid using functions on indexed columns in the WHERE clause, as it can
+    prevent index usage.
+
+- **Avoid Subqueries Whenever Possible**\
+  Replace subqueries with JOINs or temporary tables when feasible, as subqueries
+  can be performance bottlenecks.
+
+- **Use Proper Data Types**\
+  Choose the most appropriate data types for your columns to save storage space
+  and improve query performance.
+
+- **Tune & Optimize Grouping and Aggregation**
+  - Use GROUP BY and aggregate functions efficiently to avoid unnecessary data
+    processing.
+  - Consider creating summary tables for frequently used aggregations.
+
+- **Update Database Statistics**
+  - Regularly update the statistics of the database to ensure the query
+    optimizer makes accurate decisions.
+  - Use ANALYZE for Postgres or Auto Analyze feature.
+
+- **Avoid Using DISTINCT**\
+  Use GROUP BY instead of DISTINCT for better performance when aggregating data.
+
+- **Consider Caching Mechanism**\
+  Implement caching mechanisms to store frequently accessed query results and
+  reduce database load.
+
+- **Save & Monitor Execution Plans**\
+  Use database tools to analyze query execution plans and identify bottlenecks
+  or inefficient operations.
+
+- **Use Bind Variables**\
+  Use bind variables (parameterized queries) to allow the database to reuse
+  query execution plans.
+
+- **Regularly Perform Query Reviews**\
+  Periodically review and analyze the performance of your frequently executed
+  queries to identify optimization opportunities.
+
+- **Consider Denormalization for Reporting Queries**\
+  In some cases, denormalizing data (combining tables for performance reasons)
+  can improve query performance, but it comes with trade-offs in terms of data
+  integrity and maintenance.
+
+- **Use Database Indexing and Table Partitioning**\
+  Utilize advanced database features like indexing and partitioning for very
+  large tables to improve query performance.
+
+- **Monitor and Tune Regularly**\
+  Database performance can change over time due to data growth or other factors,
+  so make SQL tuning an ongoing process.
+
+- **Backup and Test Changes**\
+  Before making significant changes to your queries, ensure you have backups and
+  test changes in a controlled environment to avoid disruptions.
+
+- **Analyze Query Execution Plan**
+  - Understand the query execution plan using tools like EXPLAIN in SQL
+    databases.
+  - Identify performance bottlenecks, such as full table scans, inefficient
+    joins, or excessive sorting.
+
+- **Index Optimization**
+  - Ensure that appropriate indexes are present on columns used in WHERE, JOIN,
+    and ORDER BY clauses.
+  - Avoid over-indexing, as too many indexes can slow down write operations.
+
+- **Use Proper Joins**
+  - Choose the correct join types (INNER, LEFT, RIGHT, FULL) based on the
+    relationships between tables.
+  - Use proper join conditions to avoid Cartesian products.
+
+- **Avoid Using Functions in WHERE Clauses**\
+  Applying functions to columns in WHERE clauses can prevent the use of indexes.
+  Consider rewriting queries to avoid this.
+
+- **Avoid Long-running Updates/Inserts**\
+  Long-running transactions can cause locks and contention. Keep transactions as
+  short as possible.
+
+- **Avoid Data Type Conversion**\
+  Minimize data type conversions in your queries, as they can impact
+  performance.
+
+- **Batch Processing**\
+  Use batch processing for bulk data operations instead of processing individual
+  records.
+
+- **Use Connection Pooling**\
+  Implement connection pooling to efficiently manage database connections.
+
+- **Hardware and Infrastructure**\
+  Ensure that the underlying hardware and infrastructure meet the database's
+  performance requirements.
+
+- **Monitor and Optimize Temp Tables and Disk Usage**\
+  Keep an eye on temporary table usage and disk space utilization, as excessive
+  disk I/O can impact performance.
+
+- **Avoid Using ORDER BY with Large Result Sets**\
+  If possible, avoid using ORDER BY with large result sets, as it requires
+  sorting, which can be resource-intensive.
+
+- **Use Stored Procedures for Frequently Executed Logic**\
+  Wrap frequently executed queries in stored procedures to promote code
+  reusability and reduce query parsing overhead.
+
+- **Use UNION Instead of UNION ALL with Caution**\
+  Use UNION when you want to eliminate duplicate rows, but if duplicates are not
+  an issue, use UNION ALL for better performance.
+
+- **Database Parameter Tuning**\
+  Adjust database configuration parameters (such as memory allocation, cache
+  size, and parallelism) based on workload characteristics.
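
The checklist item on functions over indexed columns can be checked against a real query planner. The sketch below is an illustration only and is not part of the patch. It assumes SQLite through Python's built-in `sqlite3` module; the `customers` table, its sample rows, the `idx_customers_code` index, and the `plan` helper are hypothetical names. SQLite has no `left()` function, so `substr()` stands in for it, and the column is declared `COLLATE NOCASE` because SQLite's default case-insensitive `LIKE` can only use an index with that collation.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical schema. NOCASE lets SQLite's default LIKE use the index.
conn.execute(
    "CREATE TABLE customers (customer_code TEXT COLLATE NOCASE, name TEXT)"
)
conn.execute("CREATE INDEX idx_customers_code ON customers (customer_code)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("AK001", "Asha"), ("AK002", "Kiran"), ("BX100", "Bo")],
)

# Function applied to the indexed column: the index cannot narrow the search.
with_function = (
    "SELECT name FROM customers WHERE substr(customer_code, 1, 2) = 'AK'"
)
# Prefix match on the bare column: the planner can turn it into an index range.
with_prefix = "SELECT name FROM customers WHERE customer_code LIKE 'AK%'"

function_plan = plan(conn, with_function)
prefix_plan = plan(conn, with_prefix)
function_rows = sorted(conn.execute(with_function).fetchall())
prefix_rows = sorted(conn.execute(with_prefix).fetchall())

print("function:", function_plan)
print("prefix:  ", prefix_plan)
print("same rows:", function_rows == prefix_rows)
```

On typical SQLite builds the first plan is reported as a full SCAN and the second as an index SEARCH, while both queries return the same rows. Other engines have their own rules for when a prefix `LIKE` can use an index, so it is worth checking the plan on the target database rather than relying on this sketch.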