From 5c7ac57d30be451f9028d331342196a01b902469 Mon Sep 17 00:00:00 2001 From: Nikolas Date: Fri, 15 Dec 2023 15:37:13 +0100 Subject: [PATCH] fix broken /docs/ links (#463) * fix broken /docs/ links * fix remaining links --- content/01-intro/01-what-are-databases.mdx | 20 +- .../01-intro/02-comparing-database-types.mdx | 2 +- content/01-intro/03-intro-to-schemas.mdx | 142 +- content/01-intro/04-database-glossary.mdx | 1549 ++++++++++++++--- .../02-datamodeling/01-intro-dont-panic.mdx | 10 +- .../02-know-your-problem-space.mdx | 4 +- .../04-tables-tuples-types.mdx | 4 +- .../05-correctness-constraints.mdx | 4 +- .../02-datamodeling/06-making-connections.mdx | 4 +- .../02-datamodeling/08-functional-units.mdx | 10 +- content/02-datamodeling/12-in-vivo.mdx | 4 +- .../01-relational-vs-document-databases.mdx | 86 +- .../02-relational/01-what-is-an-orm.mdx | 24 +- ...-comparing-sql-query-builders-and-orms.mdx | 14 +- .../03-what-are-database-migrations.mdx | 2 +- .../02-relational/04-migration-strategies.mdx | 4 +- .../05-expand-and-contract-pattern.mdx | 4 +- .../03-document/01-what-are-document-dbs.mdx | 48 +- .../01-benefits-of-postgresql.mdx | 2 +- .../02-getting-to-know-postgresql.mdx | 4 +- .../03-5-ways-to-host-postgresql.mdx | 2 +- ...setting-up-a-local-postgresql-database.mdx | 2 +- .../05-setting-up-postgresql-on-rds.mdx | 2 +- .../06-connecting-to-postgresql-databases.mdx | 2 +- ...create-and-delete-databases-and-tables.mdx | 2 +- .../09-introduction-to-data-types.mdx | 329 ++-- .../10-column-and-table-constraints.mdx | 14 +- content/04-postgresql/11-date-types.mdx | 73 +- .../01-inserting-and-deleting-data.mdx | 52 +- .../02-updating-existing-data.mdx | 29 +- .../03-insert-on-conflict.mdx | 36 +- .../05-using-transactions.mdx | 112 +- .../01-basic-select.mdx | 54 +- .../02-filtering-data.mdx | 131 +- .../03-joining-tables.mdx | 102 +- .../04-optimizing-postgresql.mdx | 4 +- .../14-short-guides/01-quoting-rules.mdx | 50 +- .../14-short-guides/03-connection-uris.mdx | 88 +- .../14-short-guides/04-exporting-schemas.mdx | 4 +- .../07-introduction-to-data-types.mdx | 225 ++- .../08-column-and-table-constraints.mdx | 117 +- .../01-inserting-and-deleting-data.mdx | 35 +- .../02-updating-existing-data.mdx | 25 +- ...-importing-and-exporting-data-in-mysql.mdx | 26 +- .../01-basic-select.mdx | 40 +- .../02-filtering-data.mdx | 121 +- .../03-joining-tables.mdx | 100 +- .../04-identifying-slow-queries.mdx | 2 +- .../05-optimizing-slow-queries.mdx | 2 +- .../11-tools/01-mysql-config-editor.mdx | 71 +- .../02-validate-configuration.mdx | 27 +- .../12-short-guides/03-exporting-schemas.mdx | 40 +- ...ting-and-deleting-databases-and-tables.mdx | 2 +- .../04-inserting-and-deleting-data.mdx | 21 +- content/06-sqlite/05-basic-select.mdx | 30 +- content/06-sqlite/06-update-data.mdx | 29 +- content/06-sqlite/07-exporting-schemas.mdx | 20 +- ...setting-up-a-local-sql-server-database.mdx | 10 +- content/08-mongodb/01-what-is-mongodb.mdx | 47 +- ...02-setting-up-a-local-mongodb-database.mdx | 4 +- .../08-mongodb/03-connecting-to-mongodb.mdx | 49 +- content/08-mongodb/04-mongodb-atlas-setup.mdx | 6 +- ...ngodb-user-accounts-and-authentication.mdx | 112 +- .../06-authorization-and-privileges.mdx | 976 +++++------ .../07-creating-dbs-and-collections.mdx | 64 +- content/08-mongodb/08-managing-documents.mdx | 196 ++- content/08-mongodb/09-querying-documents.mdx | 90 +- content/08-mongodb/10-mongodb-datatypes.mdx | 154 +- content/08-mongodb/11-mongodb-indexes.mdx | 6 +- .../08-mongodb/12-mongodb-transactions.mdx | 4 
+- content/08-mongodb/13-connection-uris.mdx | 79 +- content/08-mongodb/14-working-with-dates.mdx | 276 +-- content/08-mongodb/15-mongodb-encryption.mdx | 4 +- .../08-mongodb/16-mongodb-database-tools.mdx | 75 +- content/08-mongodb/17-mongodb-sorting.mdx | 366 ++-- .../18-mongodb-aggregation-framework.mdx | 86 +- ...-query-builders-and-database-libraries.mdx | 10 +- ...ype-safety-in-the-top-8-typescript-orms.md | 242 ++- .../03-connection-pooling.mdx | 92 +- .../01-database-troubleshooting.mdx | 4 +- ...how-to-spot-bottlenecks-in-performance.mdx | 2 +- ...lopment-databases-between-team-members.mdx | 4 +- .../01-database-replication-introduction.mdx | 2 +- .../08-testing-in-production.mdx | 74 +- .../09-backup-considerations.mdx | 104 +- .../10-intro-to-full-text-search.mdx | 58 +- .../02-serverless-comparison.mdx | 8 +- .../03-serverless-challenges.mdx | 59 +- ...04-traditional-vs-serverless-databases.mdx | 2 +- 89 files changed, 4244 insertions(+), 2983 deletions(-) diff --git a/content/01-intro/01-what-are-databases.mdx b/content/01-intro/01-what-are-databases.mdx index 14ec7283..55328ea7 100644 --- a/content/01-intro/01-what-are-databases.mdx +++ b/content/01-intro/01-what-are-databases.mdx @@ -1,6 +1,6 @@ --- title: 'What are databases?' -metaTitle: "What Are Databases? Definition, Usage, Examples and Types" +metaTitle: 'What Are Databases? Definition, Usage, Examples and Types' metaDescription: 'Learn the fundamentals of databases to gain better insight into what Prisma abstracts and how to think of the database layer generally.' authors: ['justinellingwood'] --- @@ -11,11 +11,11 @@ Databases are essential components for many modern applications and tools. As a In this article, we'll go over: -* what databases are -* how they are used by people and applications to keep track of various kinds of data -* what features databases offer -* what types of guarantees they make -* how they compare to other methods of data storage +- what databases are +- how they are used by people and applications to keep track of various kinds of data +- what features databases offer +- what types of guarantees they make +- how they compare to other methods of data storage Finally, we'll discuss how applications rely on databases for storing and retrieving data to enable complex functionality. @@ -260,9 +260,9 @@ Fortunately, there are many different database options designed to fulfil the re -[Prisma](https://www.prisma.io/docs/concepts/overview/what-is-prisma) is one way to make it easy to work with databases from your application. You can learn more about what Prisma offers in our [Why Prisma? page](https://www.prisma.io/docs/concepts/overview/why-prisma). +[Prisma](https://www.prisma.io/docs/orm/overview/introduction/what-is-prisma) is one way to make it easy to work with databases from your application. You can learn more about what Prisma offers in our [Why Prisma? page](https://www.prisma.io/docs/orm/overview/introduction/why-prisma). -[Prisma database connectors](https://www.prisma.io/docs/concepts/database-connectors) allow you to connect Prisma to many different types of databases. Check out our docs to learn more. +[Prisma database connectors](https://www.prisma.io/docs/orm/overview/databases) allow you to connect Prisma to many different types of databases. Check out our docs to learn more. @@ -270,7 +270,7 @@ Fortunately, there are many different database options designed to fulfil the re
What are persistent data structures? -Databases store data either on disk or in-memory. On disk storage is generally said to be *persistent*, meaning that the data is reliably saved for later, even if the database application or the computer itself restarts. +Databases store data either on disk or in-memory. On disk storage is generally said to be _persistent_, meaning that the data is reliably saved for later, even if the database application or the computer itself restarts.
@@ -288,7 +288,7 @@ A [database abstraction layer](/intro/database-glossary#database-abstraction-lay
What is database management? -Database management refers to the actions taken to work with and control data to meet necessary conditions throughout the data lifecycle. +Database management refers to the actions taken to work with and control data to meet necessary conditions throughout the data lifecycle. Some database management tasks include performance monitoring and tuning, storage and capacity planning, backup and recovery data, data archiving, data partitioning, replication, and more. diff --git a/content/01-intro/02-comparing-database-types.mdx b/content/01-intro/02-comparing-database-types.mdx index ac2e971c..e936d206 100644 --- a/content/01-intro/02-comparing-database-types.mdx +++ b/content/01-intro/02-comparing-database-types.mdx @@ -539,6 +539,6 @@ Many times, using a mixture of different database types is the best approach for -You can use Prisma work more easily with databases from within your application code. Check out our [database connectors page](https://www.prisma.io/docs/concepts/database-connectors) to see all of the databases Prisma supports. +You can use Prisma work more easily with databases from within your application code. Check out our [database connectors page](https://www.prisma.io/docs/orm/overview/databases) to see all of the databases Prisma supports. diff --git a/content/01-intro/03-intro-to-schemas.mdx b/content/01-intro/03-intro-to-schemas.mdx index d52c0fb8..7abfa77a 100644 --- a/content/01-intro/03-intro-to-schemas.mdx +++ b/content/01-intro/03-intro-to-schemas.mdx @@ -1,202 +1,202 @@ --- title: 'Introduction to database schemas' -metaTitle: "What is a Database Schema? Introduction with Examples" -metaDescription: "Schemas are the way that you configure your database to represent your data within the system. Here we define what they are with a few examples." +metaTitle: 'What is a Database Schema? Introduction with Examples' +metaDescription: 'Schemas are the way that you configure your database to represent your data within the system. Here we define what they are with a few examples.' authors: ['justinellingwood'] --- ## Introduction -One of the primary advantages of databases over other, more simple data storage options is their ability to store information in an orderly, easily queryable structure. These features are derived from the fact that databases implement *schemas* to describe the data they store. +One of the primary advantages of databases over other, more simple data storage options is their ability to store information in an orderly, easily queryable structure. These features are derived from the fact that databases implement _schemas_ to describe the data they store. -A [database schema](/intro/database-glossary#schema) serves as a blueprint for the shape and format of data within a database. For relational databases, this includes describing categories of data and their connections through tables, primary keys, data types, indexes, and other objects. With NoSQL schemas, this often involves organizing data according to the most important anticipated query patterns. +A [database schema](/intro/database-glossary#schema) serves as a blueprint for the shape and format of data within a database. For relational databases, this includes describing categories of data and their connections through tables, primary keys, data types, indexes, and other objects. With NoSQL schemas, this often involves organizing data according to the most important anticipated query patterns. 
-In either case, understanding the value of your database's schema and how best to design and optimize it for your needs is crucial. This guide will focus on what database schemas are, the different types of schema you might encounter, why they're important, and what to keep in mind when designing your own schemas. +In either case, understanding the value of your database's schema and how best to design and optimize it for your needs is crucial. This guide will focus on what database schemas are, the different types of schema you might encounter, why they're important, and what to keep in mind when designing your own schemas. ## Why are database schemas important? Database schemas are important for many reasons. -Your data will almost always include some regularity to it, regardless of its source or application. Some data is highly *regular*, meaning that it all can be described by the same patterns. Some data is much more *irregular*, but even so, its *metadata*, contextual data about the data itself, will often still be regular. +Your data will almost always include some regularity to it, regardless of its source or application. Some data is highly _regular_, meaning that it all can be described by the same patterns. Some data is much more _irregular_, but even so, its _metadata_, contextual data about the data itself, will often still be regular. -Database schemas tell the database what your data is and how to work with it. Database schemas help the database engine understand these patterns which allows it to enforce constraints on the data, respond with the right information when queried, and manipulate it in ways that users request. +Database schemas tell the database what your data is and how to work with it. Database schemas help the database engine understand these patterns which allows it to enforce constraints on the data, respond with the right information when queried, and manipulate it in ways that users request. -Good schemas tend to reduce implicit information in favor of making it visible to the system and its users. Schemas in relational databases can reduce information redundancy, ensure data consistency, and provide the scaffolding and structures needed to access and join related data. Within non-relational contexts, good schemas enable high performance and scalability by aligning the storage format with the access patterns that are essential to your application. +Good schemas tend to reduce implicit information in favor of making it visible to the system and its users. Schemas in relational databases can reduce information redundancy, ensure data consistency, and provide the scaffolding and structures needed to access and join related data. Within non-relational contexts, good schemas enable high performance and scalability by aligning the storage format with the access patterns that are essential to your application. ## Defining physical vs logical schemas -Before we further, we should introduce a few definitions. Two terms that are potentially confusing are *physical schema* and *logical schema*. These two terms can convey different meanings depending on the *context* in which they are used. +Before we further, we should introduce a few definitions. Two terms that are potentially confusing are _physical schema_ and _logical schema_. These two terms can convey different meanings depending on the _context_ in which they are used. -For the purpose of this article, we are mainly talking about logical and physical schemas when *designing database schemas*. 
+For the purpose of this article, we are mainly talking about logical and physical schemas when _designing database schemas_. ### When designing database schemas -When talking about designing database schemas, a **logical schema** is a general design for organizing data into different categories, defining properties of the data, and determining the best structure for database items. This general document has no implementation details and is therefore platform-agnostic. It can be taken as a blueprint and implemented in a variety of database systems. +When talking about designing database schemas, a **logical schema** is a general design for organizing data into different categories, defining properties of the data, and determining the best structure for database items. This general document has no implementation details and is therefore platform-agnostic. It can be taken as a blueprint and implemented in a variety of database systems. -In this same context, a **physical schema** is recognized as being the next step in the design process where implementation-specific details are worked out. The names of different entities, constraints, keys, indexes, and other items are identified and mapped onto the logical schema. This provides a specific plan for implementation using a given database platform. +In this same context, a **physical schema** is recognized as being the next step in the design process where implementation-specific details are worked out. The names of different entities, constraints, keys, indexes, and other items are identified and mapped onto the logical schema. This provides a specific plan for implementation using a given database platform. -In this context, logical and physical schemas are different stages of a design process. The goal of the process is to iteratively develop an implementation plan from a set of requirements by first laying out the abstract qualities of the data and then later mapping that organization to the tool set and language of a database system you want to use. +In this context, logical and physical schemas are different stages of a design process. The goal of the process is to iteratively develop an implementation plan from a set of requirements by first laying out the abstract qualities of the data and then later mapping that organization to the tool set and language of a database system you want to use. ### When discussing database architecture The other context where physical and logical schema are sometimes seen in regards to databases is in the physical and virtual architecture of the actual database software. -In this context, the **logical schema** refers to the visible database entities that users interact with. This means objects like tables, keys, views, and indexes are abstractions that users create and manipulate using the database software. The layout of these items within the system are part of the logical schema that the database presents. +In this context, the **logical schema** refers to the visible database entities that users interact with. This means objects like tables, keys, views, and indexes are abstractions that users create and manipulate using the database software. The layout of these items within the system are part of the logical schema that the database presents. -In this same context, the **physical schema** refers to the way that the database software handles the data, files, and storage when interacting with the filesystem. 
For example, the physical schema of the database architecture can determine whether the system stores a separate file for each database or each table and determines how those can be partitioned across multiple servers. +In this same context, the **physical schema** refers to the way that the database software handles the data, files, and storage when interacting with the filesystem. For example, the physical schema of the database architecture can determine whether the system stores a separate file for each database or each table and determines how those can be partitioned across multiple servers. ## Static vs dynamic schemas Another important categorization that can help clarify the differences between schema in relational and non-relational databases is the difference between static and dynamic schemas. -**Static schemas** are the type of schemas generally associated with relational databases. They are defined ahead of time as a definition of the shape that data must follow to be accepted by the system. The database system has the ability to enforce these patterns when using static schema because static schema is an assertion of the desired state that the database system can validate input against. +**Static schemas** are the type of schemas generally associated with relational databases. They are defined ahead of time as a definition of the shape that data must follow to be accepted by the system. The database system has the ability to enforce these patterns when using static schema because static schema is an assertion of the desired state that the database system can validate input against. -In contrast, **dynamic schemas** are much more prevalent in non-relational contexts. Dynamic schemas are less rigid and might lack *any* preconceived organizational structure. Instead, dynamic schemas *emerge* based on the qualities of the data that is entered into the system. While many non-relational databases can store information with an arbitrary internal structure, regular patterns tend to emerge with most real world use cases. +In contrast, **dynamic schemas** are much more prevalent in non-relational contexts. Dynamic schemas are less rigid and might lack _any_ preconceived organizational structure. Instead, dynamic schemas _emerge_ based on the qualities of the data that is entered into the system. While many non-relational databases can store information with an arbitrary internal structure, regular patterns tend to emerge with most real world use cases. -Because dynamic schemas are emergent structures, the database system cannot use them as a conformance tool. However, they are still incredibly important to understand and develop around as a user. Understanding what your data will look like in a general sense and how your applications will need to interact with it will help you choose structures that fulfill your requirements, perform well, and avoid unnecessary inconsistency. +Because dynamic schemas are emergent structures, the database system cannot use them as a conformance tool. However, they are still incredibly important to understand and develop around as a user. Understanding what your data will look like in a general sense and how your applications will need to interact with it will help you choose structures that fulfill your requirements, perform well, and avoid unnecessary inconsistency. ## Designing database schemas -Now that you understand some of the different types of database schemas, how do you go about designing one for your project? 
Designing effective schemas takes thought and practice, as well as a thorough understanding of the problem domain and the systems that will use the data. +Now that you understand some of the different types of database schemas, how do you go about designing one for your project? Designing effective schemas takes thought and practice, as well as a thorough understanding of the problem domain and the systems that will use the data. -The design process looks quite different depending on the type of database you are designing the schema for. Specifically, the design process for static schemas differs from that of dynamic schemas. Practically speaking, these end up aligning to differences between designing for relational databases (static) and non-relational databases (dynamic). +The design process looks quite different depending on the type of database you are designing the schema for. Specifically, the design process for static schemas differs from that of dynamic schemas. Practically speaking, these end up aligning to differences between designing for relational databases (static) and non-relational databases (dynamic). ### General tips -Although there are differences between schema design for relational and non-relational databases, there are some *general* tips that are applicable with any schema development. Since many of these are important to the beginning of the design process, it makes sense to discuss these first. +Although there are differences between schema design for relational and non-relational databases, there are some _general_ tips that are applicable with any schema development. Since many of these are important to the beginning of the design process, it makes sense to discuss these first. #### Learn about your data -One of the first steps in designing schemas should always be to learn about your data and domain. It is impossible to develop a good database design without understanding the information it will manage and context in which it will be used. +One of the first steps in designing schemas should always be to learn about your data and domain. It is impossible to develop a good database design without understanding the information it will manage and context in which it will be used. While you will likely not know all of the features of your data in the beginning, learning as much as you can about the data that your system is expected to manage is essential for design. Some questions you should try to answer include: -* Broadly speaking, what will the data be? -* Which attributes are important to record? -* How large will your total dataset be? -* How rapidly will the system accumulate new data? -* Will your data be highly regular? +- Broadly speaking, what will the data be? +- Which attributes are important to record? +- How large will your total dataset be? +- How rapidly will the system accumulate new data? +- Will your data be highly regular? #### Understand usage patterns -Similarly, designing a database schema without understanding user requirements is as problematic as it is with other software design. If you are not an expert in the domain in which the data will be used, you need to consult someone who is to guide you on the requirements. +Similarly, designing a database schema without understanding user requirements is as problematic as it is with other software design. If you are not an expert in the domain in which the data will be used, you need to consult someone who is to guide you on the requirements. 
You should ask yourself questions like: -* Are the most common queries predictable? -* How many concurrent users or clients will there be? -* How much data will be touched by typical operations and queries? -* Will the majority of requests be read queries or write queries? -* What data will be queried together regularly? -* Do most operations target individual records or aggregate many records? +- Are the most common queries predictable? +- How many concurrent users or clients will there be? +- How much data will be touched by typical operations and queries? +- Will the majority of requests be read queries or write queries? +- What data will be queried together regularly? +- Do most operations target individual records or aggregate many records? #### Develop a naming convention While it might not seem important, designing a naming convention and following it rigorously will help during both development and regular usage. -Naming and styling conventions help minimize the amount of mental work you need to perform when naming new entities. Similarly, conventions allow users to safely assume a pattern when accessing different items within your schemas. Some database systems or types of databases already have popular naming conventions, which you can follow to avoid surprises and avoid the need to develop your own standards. +Naming and styling conventions help minimize the amount of mental work you need to perform when naming new entities. Similarly, conventions allow users to safely assume a pattern when accessing different items within your schemas. Some database systems or types of databases already have popular naming conventions, which you can follow to avoid surprises and avoid the need to develop your own standards. Some style and naming conventions you might want to consider: -* How should you use upper and lowercase lettering for systems that are case-sensitive? -* When should items use the plural of a word versus the singular? -* Should multi-word names separate words with underscores, dashes, or other delimiters? -* Should full names always be used or are abbreviations permissible in some cases? +- How should you use upper and lowercase lettering for systems that are case-sensitive? +- When should items use the plural of a word versus the singular? +- Should multi-word names separate words with underscores, dashes, or other delimiters? +- Should full names always be used or are abbreviations permissible in some cases? ### Designing schemas for relational databases -Relational databases are often considered flexible, general purpose solutions. Their ability to process ad-hoc queries allows the same database to serve different applications and use cases. Because of this, when designing schemas for relational databases, your end goal is usually to represent your data in a way that promotes flexibility while minimizing the opportunity for data inconsistencies to enter the system. +Relational databases are often considered flexible, general purpose solutions. Their ability to process ad-hoc queries allows the same database to serve different applications and use cases. Because of this, when designing schemas for relational databases, your end goal is usually to represent your data in a way that promotes flexibility while minimizing the opportunity for data inconsistencies to enter the system. #### Developing a logical schema Relational schema designs often start with a logical schema, as discussed in a [previous section](#when-designing-database-schemas). 
-You map out the data items you want to manage, their relationships, and any attributes important to consider without regard to implementation details or performance criteria. This step is important because it collects all of your data items in one place and allows you to sort through the way they relate to one another on an abstract level. +You map out the data items you want to manage, their relationships, and any attributes important to consider without regard to implementation details or performance criteria. This step is important because it collects all of your data items in one place and allows you to sort through the way they relate to one another on an abstract level. -You can begin sketching out tables that represent specific data items and their attributes. This mapping process is often best represented by [entity-relationship (or ER) models](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model). *ER models* are diagrams that visually represent data objects by defining item types and their attributes and then connecting these to map out relationships and dependencies. +You can begin sketching out tables that represent specific data items and their attributes. This mapping process is often best represented by [entity-relationship (or ER) models](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model). _ER models_ are diagrams that visually represent data objects by defining item types and their attributes and then connecting these to map out relationships and dependencies. -ER models are frequently used in early stage schema designs because they are very good at helping you figure out what distinct entities you have, what attributes must be managed, which entities are related to one another, and the specific nature of their relationship. Using ER model diagrams to represent your logical schema gives you a solid plan for *what* you want your database design to be without commenting on implementation-specific details. +ER models are frequently used in early stage schema designs because they are very good at helping you figure out what distinct entities you have, what attributes must be managed, which entities are related to one another, and the specific nature of their relationship. Using ER model diagrams to represent your logical schema gives you a solid plan for _what_ you want your database design to be without commenting on implementation-specific details. #### Developing a physical schema -Once you have a logical schema, your next step is to figure out specific implementation details by creating a physical schema (as discussed in a [previous section](#when-designing-database-schemas)). The physical schema will determine exactly how you want to commit your plan using the database structures and features available to you. +Once you have a logical schema, your next step is to figure out specific implementation details by creating a physical schema (as discussed in a [previous section](#when-designing-database-schemas)). The physical schema will determine exactly how you want to commit your plan using the database structures and features available to you. -The first step is often to go through each of your database entities and determine your primary key field. The [primary key](/intro/database-glossary#primary-key) is used to uniquely identify each record within a table as well to bind records together from different tables. 
When a relationship exists between two entities in the logical schema, you will have to connect the two tables in the physical schema by referencing the primary key in one table as a foreign key in the other. The direction of this relationship will impact the performance and ease in which you can join different entities together when using your database. +The first step is often to go through each of your database entities and determine your primary key field. The [primary key](/intro/database-glossary#primary-key) is used to uniquely identify each record within a table as well to bind records together from different tables. When a relationship exists between two entities in the logical schema, you will have to connect the two tables in the physical schema by referencing the primary key in one table as a foreign key in the other. The direction of this relationship will impact the performance and ease in which you can join different entities together when using your database. -Another consideration you will want to think through during this stage are the predicted query patterns. Certain tables and fields within these tables will be accessed much more frequently than others. These "hot spots" are good candidates for database indexes. [Database indexes](/intro/database-glossary#index) significantly speed up retrieval of commonly accessed items at the cost of worse performance during data updates. Determining which columns to index initially will help you balance these concerns and define the most critical places for indexes in your system. +Another consideration you will want to think through during this stage are the predicted query patterns. Certain tables and fields within these tables will be accessed much more frequently than others. These "hot spots" are good candidates for database indexes. [Database indexes](/intro/database-glossary#index) significantly speed up retrieval of commonly accessed items at the cost of worse performance during data updates. Determining which columns to index initially will help you balance these concerns and define the most critical places for indexes in your system. #### Normalizing your data structures -During this process, you might find that it's easier to extract certain elements from logical entities into their own independent tables. For instance, you may wish to extract shipping address from a customer so that multiple shipping addresses can be associated with a single customer and so that product orders can reference a specific address. These changes can be thought of as part of a process is called [normalization](https://en.wikipedia.org/wiki/Database_normalization). +During this process, you might find that it's easier to extract certain elements from logical entities into their own independent tables. For instance, you may wish to extract shipping address from a customer so that multiple shipping addresses can be associated with a single customer and so that product orders can reference a specific address. These changes can be thought of as part of a process is called [normalization](https://en.wikipedia.org/wiki/Database_normalization). -[**Database normalization**](/intro/database-glossary#normalization) is a process that ensures that your database represents each piece of data once and doesn't allow updates that would result in inconsistencies. 
Normalization is a huge topic that, for the most part, is outside of the scope of this guide, but you part of the physical schema design process involves figuring out the level of normalization to seek and transforming data entities as necessary to achieve that goal. +[**Database normalization**](/intro/database-glossary#normalization) is a process that ensures that your database represents each piece of data once and doesn't allow updates that would result in inconsistencies. Normalization is a huge topic that, for the most part, is outside of the scope of this guide, but you part of the physical schema design process involves figuring out the level of normalization to seek and transforming data entities as necessary to achieve that goal. ### Designing schemas for non-relational and NoSQL databases -The design process for non-relational databases often looks quite different. A large part of this difference stems from the fact that often, non-relational databases are chosen to allow for high performance on a limited number of predefined queries. +The design process for non-relational databases often looks quite different. A large part of this difference stems from the fact that often, non-relational databases are chosen to allow for high performance on a limited number of predefined queries. #### Determining your primary queries -Non-relational database schemas are often designed in tandem with the application that will use them. The schema reflects the specific needs of the application and, in a sense, is a custom structure designed to fit the mold developed by the application. +Non-relational database schemas are often designed in tandem with the application that will use them. The schema reflects the specific needs of the application and, in a sense, is a custom structure designed to fit the mold developed by the application. -Because of this close relationship, it is important to determine what queries your database must be optimized to respond to. The first step is figuring out what queries your database will need to run. Since you don't have a data structure yet, these will be pseudo queries, but understanding what data your application will need to perform certain operations is your first objective. +Because of this close relationship, it is important to determine what queries your database must be optimized to respond to. The first step is figuring out what queries your database will need to run. Since you don't have a data structure yet, these will be pseudo queries, but understanding what data your application will need to perform certain operations is your first objective. -Once you have a good idea of what queries your application will need to perform, you need to select the most important ones to focus on. These are the queries that your application performs often and cannot afford to wait for. +Once you have a good idea of what queries your application will need to perform, you need to select the most important ones to focus on. These are the queries that your application performs often and cannot afford to wait for. -Defining which of your queries are the most important tells you the exact access pattern your data structure needs to optimize around. The way that the database system stores and represents data will have a huge impact on its ability to quickly retrieve and manipulate data items. +Defining which of your queries are the most important tells you the exact access pattern your data structure needs to optimize around. 
The way that the database system stores and represents data will have a huge impact on its ability to quickly retrieve and manipulate data items. #### Design your initial schema around your primary queries Now that you know your most essential access patterns, you can start to develop a schema to match these queries. -Your first step in this process will be to determine the exact information required to be returned by each query. Then, map out what it would look like to store all of the information to respond to a query in a single entity. +Your first step in this process will be to determine the exact information required to be returned by each query. Then, map out what it would look like to store all of the information to respond to a query in a single entity. For instance, if your application will be querying your database to retrieve user profile information, your starting point should likely be to assume that all of the users profile information can be stored in a single place. -#### Combine and deduplicate data entities where possible +#### Combine and deduplicate data entities where possible -After you've determined the attributes that are needed and mapped out what it would look like to store all items related to each query in a single entity, check for overlaps. The idea is to consolidate data entities where possible to reduce the number of separate items your system will maintain. The greater number of distinct entity types you maintain, the greater chance for inconsistency and update performance problems to arise. +After you've determined the attributes that are needed and mapped out what it would look like to store all items related to each query in a single entity, check for overlaps. The idea is to consolidate data entities where possible to reduce the number of separate items your system will maintain. The greater number of distinct entity types you maintain, the greater chance for inconsistency and update performance problems to arise. -Some of these overlaps will be fairly obvious. Cases where one query returns a subset of the attributes that another query does can be safely collapsed into a single entity. +Some of these overlaps will be fairly obvious. Cases where one query returns a subset of the attributes that another query does can be safely collapsed into a single entity. -Other times, it may be more difficult to determine how to map the information for your queries. Non-relational databases are often not great at coalescing data from multiple entities in a single query, something relational databases excel at through joins. So when certain attributes or entities are present in multiple queries, you may have to make a choice in how best to represent that data. +Other times, it may be more difficult to determine how to map the information for your queries. Non-relational databases are often not great at coalescing data from multiple entities in a single query, something relational databases excel at through joins. So when certain attributes or entities are present in multiple queries, you may have to make a choice in how best to represent that data. #### Determine where your application can fill in the gaps -For some queries, your application may need to do part of the work of assembling data instead of relying on the database to respond with all of the relevant information in a single query. 
For example, if you need to handle customer information and their associated orders, it might make sense to store orders in a different category and reference them by ID in your customer objects. +For some queries, your application may need to do part of the work of assembling data instead of relying on the database to respond with all of the relevant information in a single query. For example, if you need to handle customer information and their associated orders, it might make sense to store orders in a different category and reference them by ID in your customer objects. -Some database systems cannot easily join this information by following the references between your objects. Instead, your application may need to query the customer first and then make additional queries for each of the related orders using the order IDs you discovered. +Some database systems cannot easily join this information by following the references between your objects. Instead, your application may need to query the customer first and then make additional queries for each of the related orders using the order IDs you discovered. -Performing these operations in your application code can help work around the limitations of some non-relational databases. This is often a better option than attempting to maintain a great deal of information within a single entry or attempting to duplicate data many times for many different types of database objects. Those options could result in very poor performance and data consistency. +Performing these operations in your application code can help work around the limitations of some non-relational databases. This is often a better option than attempting to maintain a great deal of information within a single entry or attempting to duplicate data many times for many different types of database objects. Those options could result in very poor performance and data consistency. -That being said, it will be important to test and tune your application code and database schemas once they are both online to ensure that you are not trading good application performance for fast database operations. A good rule of thumb is to use the database's capabilities whenever possible since they are highly optimized for information retrieval and manipulation and to supplement within your application as required. +That being said, it will be important to test and tune your application code and database schemas once they are both online to ensure that you are not trading good application performance for fast database operations. A good rule of thumb is to use the database's capabilities whenever possible since they are highly optimized for information retrieval and manipulation and to supplement within your application as required. #### Determine appropriate partition keys -For highly scalable, non-relational databases, users often have to determine a partition or sharding key. These keys will be used to split datasets among various servers to improve performance and responsiveness. +For highly scalable, non-relational databases, users often have to determine a partition or sharding key. These keys will be used to split datasets among various servers to improve performance and responsiveness. -Finding the right partition keys is highly dependent on your data and your workloads. Some general rules, however, can help guide you. +Finding the right partition keys is highly dependent on your data and your workloads. Some general rules, however, can help guide you. 
-It is best to try to choose a partition key that has a fairly regular distribution of keys. For instance, if you need to distribute customer data, their birth month would typically lead to a decent distribution. In contrast, if you are selling winter clothing, sign up month would not be a good partition key since your products' seasonality would likely affect the distribution of keys. Applying a hashing algorithm to your candidate data can also sometimes help to distribute your key space more evenly. +It is best to try to choose a partition key that has a fairly regular distribution of keys. For instance, if you need to distribute customer data, their birth month would typically lead to a decent distribution. In contrast, if you are selling winter clothing, sign up month would not be a good partition key since your products' seasonality would likely affect the distribution of keys. Applying a hashing algorithm to your candidate data can also sometimes help to distribute your key space more evenly. -Another consideration is whether your workloads are read or write heavy. If you have a read-heavy application, you likely want to choose a partition key that will allow you to write as much related data to a single server as possible. This will help you avoid having to read from many servers each time you need to retrieve related data. +Another consideration is whether your workloads are read or write heavy. If you have a read-heavy application, you likely want to choose a partition key that will allow you to write as much related data to a single server as possible. This will help you avoid having to read from many servers each time you need to retrieve related data. -On the other hand, if you have write-heavy workloads, it is often preferable to spread the writes over as many servers as possible. If each request ends up writing data to the same server, you will not gain much performance for write-intensive operations. +On the other hand, if you have write-heavy workloads, it is often preferable to spread the writes over as many servers as possible. If each request ends up writing data to the same server, you will not gain much performance for write-intensive operations. ## Wrapping up Designing effective database schemas takes patience, practice, and often a lot of trial and error. -To start, you have to try to develop a good idea of what your data will look like, how your applications will use it, and what usability and data integrity requirements are required. Afterwards, your goal is to develop a schema that reflects your data's specific features and facilitates the type of use cases you anticipate. +To start, you have to try to develop a good idea of what your data will look like, how your applications will use it, and what usability and data integrity requirements are required. Afterwards, your goal is to develop a schema that reflects your data's specific features and facilitates the type of use cases you anticipate. -Schema design, like any other type of design, is an iterative process. Expect to change your design as your understanding of the problem space deepens and as real world performance data becomes available. While you may have to evolve your schema over time, starting off with a solid foundation will both aid you in this process and reduce the likelihood of dramatic, disruptive schema changes in the future. +Schema design, like any other type of design, is an iterative process. 
Expect to change your design as your understanding of the problem space deepens and as real world performance data becomes available. While you may have to evolve your schema over time, starting off with a solid foundation will both aid you in this process and reduce the likelihood of dramatic, disruptive schema changes in the future. -Prisma defines the characteristics of its data in the [data model](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model) section of the [Prisma schema](https://www.prisma.io/docs/concepts/components/prisma-schema) file. Check out the linked documentation to learn more about how these concepts apply to Prisma. +Prisma defines the characteristics of its data in the [data model](https://www.prisma.io/docs/orm/prisma-schema/data-model) section of the [Prisma schema](https://www.prisma.io/docs/orm/prisma-schema/overview) file. Check out the linked documentation to learn more about how these concepts apply to Prisma. -You might also want to take a look at the [Prisma schema API reference page](https://www.prisma.io/docs/reference/api-reference/prisma-schema-reference) to get an overview of how to use the various features. +You might also want to take a look at the [Prisma schema API reference page](https://www.prisma.io/docs/orm/reference/prisma-schema-reference) to get an overview of how to use the various features. diff --git a/content/01-intro/04-database-glossary.mdx b/content/01-intro/04-database-glossary.mdx index a71cd89f..77d2b20e 100644 --- a/content/01-intro/04-database-glossary.mdx +++ b/content/01-intro/04-database-glossary.mdx @@ -1,740 +1,1843 @@ --- title: 'Glossary of common database terminology' metaTitle: "Database glossary | Prisma's Data Guide" -metaDescription: "Database terminology can be difficult to understand. This glossary was designed to help you learn important terminology by providing definitions and context in one place." +metaDescription: 'Database terminology can be difficult to understand. This glossary was designed to help you learn important terminology by providing definitions and context in one place.' authors: ['justinellingwood'] --- ## Introduction -When dealing with databases, there is a lot of terminology that you must learn in order to understand the technology, how best to use it, and how it relates to other parts of your environment. This glossary aims to collect common terminology used in the database community and provide definitions and context to help you grow your knowledge. +When dealing with databases, there is a lot of terminology that you must learn in order to understand the technology, how best to use it, and how it relates to other parts of your environment. This glossary aims to collect common terminology used in the database community and provide definitions and context to help you grow your knowledge. -This glossary is a work in progress and a living document. We intend to update it to add new topics and refine the existing entries as time goes on. We have a backlog of terms we hope to add in the near future, but if you have anything you'd like us to talk about, please open a [GitHub issue](https://github.com/prisma/dataguide/issues/new?title=Glossary%20suggestion%3A%20) to add your suggestions. +This glossary is a work in progress and a living document. We intend to update it to add new topics and refine the existing entries as time goes on. 
We have a backlog of terms we hope to add in the near future, but if you have anything you'd like us to talk about, please open a [GitHub issue](https://github.com/prisma/dataguide/issues/new?title=Glossary%20suggestion%3A%20) to add your suggestions. ## Terminology
-1NF, or first normal form, describes a type of database normalization where each table column only has a single value. A column that has a nested table as a value or multiple values is not in 1NF. + 1NF, or first normal form, describes a type of database normalization where + each table column only has a single value. A column that has a nested table as + a value or multiple values is not in 1NF. -2NF, or second normal form, describes a type of database normalization that: 1) satisfies the requirements of 1NF, 2) has no values that are tied directly to a subset of a candidate key. In other words, a relation is in 2NF if it is in 1NF and all of the non-candidate values are dependent on the composite key in whole, not just a portion of the candidate key. For example, a `book` table that has a candidate key composed of `title` and `author` cannot be in 2NF if it also includes a `dob` field describing the author's date of birth. That column value is dependent only on the value of `author` and could lead to inconsistencies if the values get out of sync. + 2NF, or second normal form, describes a type of database normalization that: + 1) satisfies the requirements of 1NF, 2) has no values that are tied directly + to a subset of a candidate key. In other words, a relation is in 2NF if it is + in 1NF and all of the non-candidate values are dependent on the composite key + in whole, not just a portion of the candidate key. For example, a `book` table + that has a candidate key composed of `title` and `author` cannot be in 2NF if + it also includes a `dob` field describing the author's date of birth. That + column value is dependent only on the value of `author` and could lead to + inconsistencies if the values get out of sync. -3NF, or third normal form, describes a type of database normalization that: 1) satisfies the requirements of 2NF, 2) each non-key attribute is not transitively dependent on a key attribute. For example, if a `user` table has a `user_id` column as a primary key, a `user_city` column, and a `user_state` column, it would not be in 3NF because `user_state` is transitively dependent on `user_id` through `user_city` (the city and state should be extracted to their own table and referenced together). + 3NF, or third normal form, describes a type of database normalization that: 1) + satisfies the requirements of 2NF, 2) each non-key attribute is not + transitively dependent on a key attribute. For example, if a `user` table has + a `user_id` column as a primary key, a `user_city` column, and a `user_state` + column, it would not be in 3NF because `user_state` is transitively dependent + on `user_id` through `user_city` (the city and state should be extracted to + their own table and referenced together). -4NF, or fourth normal form, describes a type of database normalization that: 1) satisfies the requirements of BCNF, 2) for every non-trivial multivalued dependency, the determining attribute in the dependency is either a candidate key or a superset of it. In other words, if a field has multiple dependent fields that are independent from one another, it can lead to redundancies that violate 4NF rules. + 4NF, or fourth normal form, describes a type of database normalization that: + 1) satisfies the requirements of BCNF, 2) for every non-trivial multivalued + dependency, the determining attribute in the dependency is either a candidate + key or a superset of it. 
In other words, if a field has multiple dependent + fields that are independent from one another, it can lead to redundancies that + violate 4NF rules. -ACID — an acronym created from the words atomicity, consistency, isolation, and durability — describes a set of characteristics that database transactions are meant to provide. Atomicity guarantees that all operations in a transaction will complete successfully or will be rolled back. Consistency, often considered a property maintained by the application rather than the database, is often achieved through transactions to make sure that all related values are updated at once. Transaction isolation aims to allow simultaneous transactions to execute independently. Durability means that transactions are meant to be stored on non-volatile storage when committed. + ACID — an acronym created from the words atomicity, consistency, isolation, + and durability — describes a set of characteristics that database transactions + are meant to provide. Atomicity guarantees that all operations in a + transaction will complete successfully or will be rolled back. Consistency, + often considered a property maintained by the application rather than the + database, is often achieved through transactions to make sure that all related + values are updated at once. Transaction isolation aims to allow simultaneous + transactions to execute independently. Durability means that transactions are + meant to be stored on non-volatile storage when committed. -An access control list, often shorted to ACL, is a security policy list that dictates which actions each user or process can perform on which resources. There are many different types of ACLs, but they each describe the permissions and access patterns that are allowed by a system. + An access control list, often shorted to ACL, is a security policy list that + dictates which actions each user or process can perform on which resources. + There are many different types of ACLs, but they each describe the permissions + and access patterns that are allowed by a system. -An active record ORM is an object-relational mapper that functions by trying to represent each table in a database as a class in the application. Each record in the table is represented as an instance of the class. Database entries are added and managed by interacting with these representations in the application. + An active record ORM is an object-relational mapper that functions by trying + to represent each table in a database as a class in the application. Each + record in the table is represented as an instance of the class. Database + entries are added and managed by interacting with these representations in the + application. -Anti-caching is a strategy that can be used when data is not found in the faster in-memory cache and must be retrieved from slower, persistent storage. The technique involves aborting the transaction and kicking off an asynchronous operation to fetch the data from the slower medium to memory. The transaction can be retried later and the information will be ready to served from memory. + Anti-caching is a strategy that can be used when data is not found in the + faster in-memory cache and must be retrieved from slower, persistent storage. + The technique involves aborting the transaction and kicking off an + asynchronous operation to fetch the data from the slower medium to memory. The + transaction can be retried later and the information will be ready to served + from memory. 
-Atomicity is a quality mainly associated with database transactions that means that the operations encapsulated in the transaction are handled in an all-or-nothing fashion. This prevents partial updates from occurring where some operations were performed before an error condition arose, leading to inconsistent data. In the case of transactions, either all of the operations are committed or every operation is rolled back to leave the database in the same state that it was in when the transaction began. + Atomicity is a quality mainly associated with database transactions that means + that the operations encapsulated in the transaction are handled in an + all-or-nothing fashion. This prevents partial updates from occurring where + some operations were performed before an error condition arose, leading to + inconsistent data. In the case of transactions, either all of the operations + are committed or every operation is rolled back to leave the database in the + same state that it was in when the transaction began. -Attributes are characteristics that describe a certain entity in a database. In the ER (entity-relationship) model, attributes are any additional properties that are not relationships that add information about an entity. + Attributes are characteristics that describe a certain entity in a database. + In the ER (entity-relationship) model, attributes are any additional + properties that are not relationships that add information about an entity. -Authentication is an action that validates an identity. In computing and databases, authentication is mainly used as a way to prove that the person or process requesting access has the credentials to validate that they can operate with a specific identity. In practical terms, this might include providing an identity (like a username) and associated authentication material (such as a password, a certificate or key file, or a secret generated by a hardware device belonging to the person associated with the identity). Authentication is used in conjunction with authorization to determine if a user has permission to perform actions on a system. + Authentication is an action that validates an identity. In computing and + databases, authentication is mainly used as a way to prove that the person or + process requesting access has the credentials to validate that they can + operate with a specific identity. In practical terms, this might include + providing an identity (like a username) and associated authentication material + (such as a password, a certificate or key file, or a secret generated by a + hardware device belonging to the person associated with the identity). + Authentication is used in conjunction with authorization to determine if a + user has permission to perform actions on a system. -Authorization is an action that determines if a certain user or process should be allowed to perform a certain action. Authorization involves checking the requested action against a set of guidelines that describe who should be allowed perform what actions. Authorization usually relies on a trusted authentication process to take place before the request in order to confirm the subject's identity. + Authorization is an action that determines if a certain user or process should + be allowed to perform a certain action. Authorization involves checking the + requested action against a set of guidelines that describe who should be + allowed perform what actions. 
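To sketch how authorization is commonly expressed in a relational database, the example below grants a role read-only access to a single table. The `reporting` role and `orders` table are hypothetical, and the syntax is PostgreSQL-style.

```sql
-- The role can authenticate, but is only authorized to read from orders
CREATE ROLE reporting WITH LOGIN PASSWORD 'change-me';

GRANT SELECT ON orders TO reporting;

-- Any INSERT, UPDATE, or DELETE attempted by reporting is rejected
-- because no such privilege has been granted.
```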
Authorization usually relies on a trusted + authentication process to take place before the request in order to confirm + the subject's identity. -Availability is a property that describes the degree to which a system is running and capable of performing work. In terms of computing systems like databases, for a single machine, availability is synonymous with the uptime of the application on that computer. For distributed systems, availability is subject to rules that dictate in what capacity the system is allowed to continue functioning if a subset of the system is unavailable. + Availability is a property that describes the degree to which a system is + running and capable of performing work. In terms of computing systems like + databases, for a single machine, availability is synonymous with the uptime of + the application on that computer. For distributed systems, availability is + subject to rules that dictate in what capacity the system is allowed to + continue functioning if a subset of the system is unavailable. -BASE — an acronym created from the words Basically Available, Soft-state, and Eventually consistent — describes a set of characteristics of some NoSQL databases. It is offered as an description for certain databases that do not conform to the properties described by ACID-compliance (atomicity, consistency, isolation, and durability). BASE databases choose to remain available at the expense of strict data consistency in cases of network partitions. The soft-state component refers to the fact that the state of the system can be in flux as the different members negotiate the most correct values in the system. Eventually consistent is another related statement indicating that the system will eventually achieve consistency given enough time and assuming new inconsistencies aren't introduced during that time. + BASE — an acronym created from the words Basically Available, Soft-state, and + Eventually consistent — describes a set of characteristics of some NoSQL + databases. It is offered as a description for certain databases that do not + conform to the properties described by ACID-compliance (atomicity, + consistency, isolation, and durability). BASE databases choose to remain + available at the expense of strict data consistency in cases of network + partitions. The soft-state component refers to the fact that the state of the + system can be in flux as the different members negotiate the most correct + values in the system. Eventually consistent is another related statement + indicating that the system will eventually achieve consistency given enough + time and assuming new inconsistencies aren't introduced during that time. -BCNF, or Boyce-Codd normal form, describes a type of database normalization that: 1) satisfies the requirements of 3NF, 2) where the determining attribute in each dependency (the attribute that dictates another attribute's value) is either a _superset_ of the dependent attribute, is a candidate key, or is a superset of a candidate key. + BCNF, or Boyce-Codd normal form, describes a type of database normalization + that: 1) satisfies the requirements of 3NF, 2) where the determining attribute + in each dependency (the attribute that dictates another attribute's value) is + either a _superset_ of the dependent attribute, is a candidate key, or is a + superset of a candidate key. -Blue-green deployments are a technique for deploying software updates with little to no downtime by managing active traffic between two identical sets of infrastructure.
New releases can be deployed to the inactive infrastructure group and tested independently. To release the new version, a traffic routing mechanism is switched to direct traffic from the current infrastructure to the infrastructure with the new version. The previously-active infrastructure now functions as the target for the next updates. This strategy is helpful in that the routing mechanism can easily switch back and forth to roll backwards or forwards depending on the success of a deployment. + Blue-green deployments are a technique for deploying software updates with + little to no downtime by managing active traffic between two identical sets of + infrastructure. New releases can be deployed to the inactive infrastructure + group and tested independently. To release the new version, a traffic routing + mechanism is switched to direct traffic from the current infrastructure to the + infrastructure with the new version. The previously-active infrastructure now + functions as the target for the next updates. This strategy is helpful in that + the routing mechanism can easily switch back and forth to roll backwards or + forwards depending on the success of a deployment. -In computing, a bottleneck occurs when the performance or capacity of a system is limited by contention around a single component. In databases, this can be related to the hardware that the database runs on or the network environment that is available. Application usage patterns can also affect which resource is most under contention. To solve bottlenecks, you must first identify the resource limiting your system's performance and then either add additional capacity or take measures to reduce the rate of usage. + In computing, a bottleneck occurs when the performance or capacity of a system + is limited by contention around a single component. In databases, this can be + related to the hardware that the database runs on or the network environment + that is available. Application usage patterns can also affect which resource + is most under contention. To solve bottlenecks, you must first identify the + resource limiting your system's performance and then either add additional + capacity or take measures to reduce the rate of usage. -CAP theorem is a statement about distributed databases that states that any system can only provide at most two out of the following three qualities: consistency, availability, and partition tolerance. Generally, it is agreed that partition tolerance must be a feature of any distributed system (as the only way to avoid all network partitions is to have a non-distributed system). Therefore, each distributed system must make a decision as to whether they want to prioritize data consistency (by not accepting new changes in the case of a partition) or system availability (by sacrificing some consistency for the sake of still being able to introduce new changes during the partition). + CAP theorem is a statement about distributed databases that states that any + system can only provide at most two out of the following three qualities: + consistency, availability, and partition tolerance. Generally, it is agreed + that partition tolerance must be a feature of any distributed system (as the + only way to avoid all network partitions is to have a non-distributed system). 
+ Therefore, each distributed system must make a decision as to whether they + want to prioritize data consistency (by not accepting new changes in the case + of a partition) or system availability (by sacrificing some consistency for + the sake of still being able to introduce new changes during the partition). -CRUD — an acronym standing for Create, Read, Update, and Delete — describes the basic operations that one uses to operate on stored data. In SQL, the components of CRUD broadly correspond to the operations `INSERT`, `SELECT`, `UPDATE`, and `DELETE`, but many other operations facilitate more granular actions. More generally, CRUD is also often discussed in the context of user interfaces and APIs as a description of the types of actions that a system may permit. + CRUD — an acronym standing for Create, Read, Update, and Delete — describes + the basic operations that one uses to operate on stored data. In SQL, the + components of CRUD broadly correspond to the operations `INSERT`, `SELECT`, + `UPDATE`, and `DELETE`, but many other operations facilitate more granular + actions. More generally, CRUD is also often discussed in the context of user + interfaces and APIs as a description of the types of actions that a system may + permit. -A cache is a component of a system designed to allow faster retrieval for high value or frequently requested pieces of data. In general, caches function by storing a useful fraction of data on media that is either higher performance or closer to the client than the general use persistent media focused on long term, non-volatile storage. In general, caches tend to be higher performance but tend to have more limited capacity and be more expensive. + A cache is a component of a system designed to allow faster retrieval for high + value or frequently requested pieces of data. In general, caches function by + storing a useful fraction of data on media that is either higher performance + or closer to the client than the general use persistent media focused on long + term, non-volatile storage. In general, caches tend to be higher performance + but tend to have more limited capacity and be more expensive. -Cache-aside is a caching architecture that positions the cache outside of the regular path between application and database. In this arrangement, the application will fetch data from the cache if it is available there. If the data is not in the cache, the application will issue a separate query to the original data source to fetch the data and then write that data to the cache for subsequent queries. The minimal crossover between the cache and backing data source allows this architecture to be resilient against unavailable caches. Cache-aside is well-suited for read-heavy workloads. + Cache-aside is a caching architecture that positions the cache outside of the + regular path between application and database. In this arrangement, the + application will fetch data from the cache if it is available there. If the + data is not in the cache, the application will issue a separate query to the + original data source to fetch the data and then write that data to the cache + for subsequent queries. The minimal crossover between the cache and backing + data source allows this architecture to be resilient against unavailable + caches. Cache-aside is well-suited for read-heavy workloads. -Cache invalidation is the process of targeting and removing specific items from a cache. 
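To tie the CRUD entry above back to the SQL statements it mentions, here is a minimal sketch against a hypothetical `customers` table.

```sql
-- Create
INSERT INTO customers (name, email) VALUES ('Sam', 'sam@example.com');

-- Read
SELECT name, email FROM customers WHERE customer_id = 1;

-- Update
UPDATE customers SET email = 'sam@example.org' WHERE customer_id = 1;

-- Delete
DELETE FROM customers WHERE customer_id = 1;
```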
Most often, this is performed as part of a routine when updating records so that the data in the cache does not serve stale data to clients. + Cache invalidation is the process of targeting and removing specific items + from a cache. Most often, this is performed as part of a routine when updating + records so that the data in the cache does not serve stale data to clients. -A canary release describes a release strategy where new versions of software are deployed to a small subset of servers to test new changes in an environment with limited impact. The deployment and resulting behavior of the test group are observed and the team can then decide if they want to roll back the changes or continue to deploy the changes to a wider range of hosts. Canary releases are a way of testing in production while limiting the number of clients impacted by any problems. + A canary release describes a release strategy where new versions of software + are deployed to a small subset of servers to test new changes in an + environment with limited impact. The deployment and resulting behavior of the + test group are observed and the team can then decide if they want to roll back + the changes or continue to deploy the changes to a wider range of hosts. + Canary releases are a way of testing in production while limiting the number + of clients impacted by any problems. -A candidate key in a relational database is the term for a minimal superkey. In other words, a candidate key is any column or combination of columns that can be used to uniquely identify each record in a relation without including columns that do not help in specificity. In a `cars` table, a unique `car_id` column would be a candidate key as well as a combination of the `make`, `model`, and `year` columns (assuming that's specific enough to eliminate any duplicates). However, `car_id` and `make` would not be a candidate key since in this instance, `make` does nothing to narrow down the uniqueness of each row. + A candidate key in a relational database is the term for a minimal superkey. + In other words, a candidate key is any column or combination of columns that + can be used to uniquely identify each record in a relation without including + columns that do not help in specificity. In a `cars` table, a unique `car_id` + column would be a candidate key as well as a combination of the `make`, + `model`, and `year` columns (assuming that's specific enough to eliminate any + duplicates). However, `car_id` and `make` would not be a candidate key since + in this instance, `make` does nothing to narrow down the uniqueness of each + row. -In relational databases, cascade is an option for how to handle deletes or updates for records that have related entries in other tables. Cascade means that the operation (delete or update) should be applied to the child, dependent rows as well. This helps you avoid orphaned rows in the case of deletes and out of sync values in the case of updates. + In relational databases, cascade is an option for how to handle deletes or + updates for records that have related entries in other tables. Cascade means + that the operation (delete or update) should be applied to the child, + dependent rows as well. This helps you avoid orphaned rows in the case of + deletes and out of sync values in the case of updates. -Apache Cassandra is a distributed, wide-column NoSQL database focused on operating on and managing large volumes of data. Cassandra scales incredibly well and each node in the cluster can accept reads or writes. 
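As a small sketch of the cascade behavior described above, the hypothetical schema below removes a customer's orders automatically when the customer row is deleted.

```sql
CREATE TABLE customers (
    customer_id integer PRIMARY KEY,
    name        text NOT NULL
);

CREATE TABLE orders (
    order_id    integer PRIMARY KEY,
    customer_id integer REFERENCES customers (customer_id) ON DELETE CASCADE
);

-- Deleting the parent row also deletes its dependent orders,
-- which avoids leaving orphaned rows behind.
DELETE FROM customers WHERE customer_id = 42;
```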
Data is stored in rows that are uniquely identifiable and partitioned based on partition key. Each partition key returns a row of data with both column names and values defined internally, meaning each row in the same column family may contain different columns. + Apache Cassandra is a distributed, wide-column NoSQL database focused on + operating on and managing large volumes of data. Cassandra scales incredibly + well and each node in the cluster can accept reads or writes. Data is stored + in rows that are uniquely identifiable and partitioned based on partition key. + Each partition key returns a row of data with both column names and values + defined internally, meaning each row in the same column family may contain + different columns. -A check constraint is perhaps the most flexible table or column constraint that can be added to a relational database. It is defined as a boolean condition that must be met for the proposed data to be accepted by the system. Because of the nature of the condition is fairly open-ended, check constraints can be used to model many different types of requirements to ensure that the data coming into the system conforms to expectations. + A check constraint is perhaps the most flexible table or column constraint + that can be added to a relational database. It is defined as a boolean + condition that must be met for the proposed data to be accepted by the system. + Because the nature of the condition is fairly open-ended, check constraints + can be used to model many different types of requirements to ensure that the + data coming into the system conforms to expectations. -In computing, a cluster is a group of computers all dedicated to helping with a shared task. Database clusters are used to increase the capacity, availability, and performance of certain types of actions compared to database deployed on a single computer. There are many different topologies, technologies, and trade-offs that different clustered systems employ to achieve different levels of performance or fault tolerance. Because of the diversity of different implementations, it can be difficult to generalize specific characteristics that apply to all clustered database systems. + In computing, a cluster is a group of computers all dedicated to helping with + a shared task. Database clusters are used to increase the capacity, + availability, and performance of certain types of actions compared to a database + deployed on a single computer. There are many different topologies, + technologies, and trade-offs that different clustered systems employ to + achieve different levels of performance or fault tolerance. Because of the + diversity of different implementations, it can be difficult to generalize + specific characteristics that apply to all clustered database systems. -Collation in databases refers to the ordering and comparison characteristics of different character systems. Most databases allow you to assign collation settings, which impact how text in the system are sorted, displayed, and compared against one another. Collation is often defined using a set of labels that describe the character set, language context, and different options about sensitivity or insensitivity to capitalization, accents, and other character modifiers. + Collation in databases refers to the ordering and comparison characteristics + of different character systems.
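To make the check constraint entry above more concrete, here is a minimal sketch with a hypothetical `products` table; the conditions shown are just examples of the kinds of boolean rules a check constraint can enforce.

```sql
CREATE TABLE products (
    product_id integer PRIMARY KEY,
    price      numeric NOT NULL,
    sale_price numeric,
    -- Boolean conditions evaluated on every INSERT and UPDATE
    CHECK (price > 0),
    CHECK (sale_price IS NULL OR sale_price < price)
);

-- Rejected by the second check constraint (sale price exceeds price):
-- INSERT INTO products (product_id, price, sale_price) VALUES (1, 10.00, 12.00);
```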
Most databases allow you to assign collation + settings, which impact how text in the system are sorted, displayed, and + compared against one another. Collation is often defined using a set of labels + that describe the character set, language context, and different options about + sensitivity or insensitivity to capitalization, accents, and other character + modifiers. -In document databases, collections are containers that are used to store groups of documents together. The collections may have semantic meaning assigned by the application and database designers, but otherwise are simply a way to partition different sets of documents from one another in the system. Different collections can be assigned different properties and actions can be performed targeting specific collections of documents. + In document databases, collections are containers that are used to store + groups of documents together. The collections may have semantic meaning + assigned by the application and database designers, but otherwise are simply a + way to partition different sets of documents from one another in the system. + Different collections can be assigned different properties and actions can be + performed targeting specific collections of documents. -Columns are a component of table-oriented databases that label and potentially define the type of each value stored in that column. In traditional relational databases, the properties of a series of columns are one of the primary ways of defining the properties of the table in general. Each row added to the table must provide values that conform to the requirements associated with the table's columns. In non-relational databases, columns can have many different properties. Generally, however, they are used to label and define the characteristics for values that records choose to store in that column. + Columns are a component of table-oriented databases that label and potentially + define the type of each value stored in that column. In traditional relational + databases, the properties of a series of columns are one of the primary ways + of defining the properties of the table in general. Each row added to the + table must provide values that conform to the requirements associated with the + table's columns. In non-relational databases, columns can have many different + properties. Generally, however, they are used to label and define the + characteristics for values that records choose to store in that column. -A column database or column-oriented database is a table-oriented database similar to a traditional relational database that stores data in the background by column instead of by record. This means that the data associated with a single column are stored together rather than grouping all of the data associated with a single record. This can provide different performance characteristics depending on usage patterns, but generally doesn't affect how the user interacts with the data in the table on a daily basis. Although often confused in the literature, column databases are not to be confused with wide column databases or column family databases. + A column database or column-oriented database is a table-oriented database + similar to a traditional relational database that stores data in the + background by column instead of by record. This means that the data associated + with a single column are stored together rather than grouping all of the data + associated with a single record. 
This can provide different performance + characteristics depending on usage patterns, but generally doesn't affect how + the user interacts with the data in the table on a daily basis. Although often + confused in the literature, column databases are not to be confused with wide + column databases or column family databases. -A column family is a database object that stores groups of key-value pairs where each key is a row identifier and each value is a group of column names and values. All together, a column family constructs something that is akin to a table in relational databases. However, each row can define its own columns, meaning that rows are of varying lengths and do not have to match each other in the columns represented or the data types stored. - - - -Command query responsibility segregation is a application design pattern that allows you to separate operations based on their impact on the underlying database. In general, this usually means providing different mechanisms for queries that read data versus queries that change data. Separating these two contexts allows you to make infrastructure and system changes to scale each use-case independently, increasing performance. + A column family is a database object that stores groups of key-value pairs + where each key is a row identifier and each value is a group of column names + and values. All together, a column family constructs something that is akin to + a table in relational databases. However, each row can define its own columns, + meaning that rows are of varying lengths and do not have to match each other + in the columns represented or the data types stored. + + + + Command query responsibility segregation is an application design pattern that + allows you to separate operations based on their impact on the underlying + database. In general, this usually means providing different mechanisms for + queries that read data versus queries that change data. Separating these two + contexts allows you to make infrastructure and system changes to scale each + use-case independently, increasing performance. -In the context of databases, committing data is the process whereby you execute and durably store a set of proposed actions. Many databases are configured to automatically commit each statement as it is received by the system, but transactions, for example, are one mechanism through which you can control the commit behavior of the database by grouping multiple statements together and committing them as a group. Committing in database is the action that is actually responsible for performing a permanent action on the system. + In the context of databases, committing data is the process whereby you + execute and durably store a set of proposed actions. Many databases are + configured to automatically commit each statement as it is received by the + system, but transactions, for example, are one mechanism through which you can + control the commit behavior of the database by grouping multiple statements + together and committing them as a group. Committing in a database is the action + that is actually responsible for performing a permanent action on the system. -In relational databases, a composite key is a key composed of two or more columns that can be used to uniquely identify any record in a table. For example, if we have a `shirts` table that only stores a single record for each combination of size and color could have a composite key defined by a combination of the `color` and `size` columns.
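A minimal sketch of the composite key example above, assuming the hypothetical `shirts` table stores exactly one row per color and size combination.

```sql
CREATE TABLE shirts (
    color    text NOT NULL,
    size     text NOT NULL,
    quantity integer NOT NULL DEFAULT 0,
    -- The color and size columns together uniquely identify each row
    PRIMARY KEY (color, size)
);

INSERT INTO shirts (color, size, quantity) VALUES ('blue', 'M', 10);

-- Rejected: the ('blue', 'M') combination already exists
-- INSERT INTO shirts (color, size, quantity) VALUES ('blue', 'M', 5);
```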
+ In relational databases, a composite key is a key composed of two or more + columns that can be used to uniquely identify any record in a table. For + example, if we have a `shirts` table that only stores a single record for each + combination of size and color could have a composite key defined by a + combination of the `color` and `size` columns. -Concurrency is the ability of a system to work on multiple tasks at once without affecting the overall result. Concurrency allows systems to execute operations in parallel, increasing the relative performance of the group of tasks. + Concurrency is the ability of a system to work on multiple tasks at once + without affecting the overall result. Concurrency allows systems to execute + operations in parallel, increasing the relative performance of the group of + tasks. -Connection pooling is a strategy used to improve performance and avoid connection exhaustion by managing the connections between an application and database. It does this by maintaining a pool of connections to the database. By keeping the connections open and reusing them for multiple queries, the application can forgo the overhead of having to establish a connection each time and the database's connection limits can be managed by pooling component. + Connection pooling is a strategy used to improve performance and avoid + connection exhaustion by managing the connections between an application and + database. It does this by maintaining a pool of connections to the database. + By keeping the connections open and reusing them for multiple queries, the + application can forgo the overhead of having to establish a connection each + time and the database's connection limits can be managed by pooling component. -Consistency is a property of data systems that means that the individual data entities do not conflict and continue to model the information they intend to even as changes are introduced. Each piece of data and change must be validated to ensure that it conforms to the rules imposed on the data structures and care must be taken to balance out any changes that should impact other data (like debiting and crediting different accounts at the same time). + Consistency is a property of data systems that means that the individual data + entities do not conflict and continue to model the information they intend to + even as changes are introduced. Each piece of data and change must be + validated to ensure that it conforms to the rules imposed on the data + structures and care must be taken to balance out any changes that should + impact other data (like debiting and crediting different accounts at the same + time). -A constraint is a limitation imposed on a specific column or table that impacts the range of values accepted by the system. Constraints are used to define rules that the database system can enforce to ensure that values conform to requirements. + A constraint is a limitation imposed on a specific column or table that + impacts the range of values accepted by the system. Constraints are used to + define rules that the database system can enforce to ensure that values + conform to requirements. -A database cursor is a way for clients to iterate over records and query results in a controlled, precise manner. Cursors are primarily used to page through results that match a query one-by-one by iteratively returning the next row for processing. This can help you operate on an unknown number of records by accessing the results as a queue. 
Care must be taken when using cursors as they take up resources on the database system, can result in locking, and often result in many more network round trips than would be required otherwise. + A database cursor is a way for clients to iterate over records and query + results in a controlled, precise manner. Cursors are primarily used to page + through results that match a query one-by-one by iteratively returning the + next row for processing. This can help you operate on an unknown number of + records by accessing the results as a queue. Care must be taken when using + cursors as they take up resources on the database system, can result in + locking, and often result in many more network round trips than would be + required otherwise. -Dark launching is a deployment and release strategy that helps organizations test new changes in production contexts without affecting the user experience. Dark launching involves releasing new code in parallel to the original functionality. Requests and actions are then mirrored and run against both the old code and the new code. While the system's behavior from the user's perspective is only affected by the original code, the new code can be tested with real data to validate functionality and catch performance and functional problems. When properly vetted, the application can be altered to use the new code path exclusively. + Dark launching is a deployment and release strategy that helps organizations + test new changes in production contexts without affecting the user experience. + Dark launching involves releasing new code in parallel to the original + functionality. Requests and actions are then mirrored and run against both the + old code and the new code. While the system's behavior from the user's + perspective is only affected by the original code, the new code can be tested + with real data to validate functionality and catch performance and functional + problems. When properly vetted, the application can be altered to use the new + code path exclusively. -In the broadest sense, data are facts or pieces of information. They are measurements or values that contain information about something. In some contexts, data is defined as distinct from information in that information is analyzed or processed data while data consists only of raw values. Practically speaking, however, these terms are often used as synonyms and typically encapsulate any fact along with the relevant context necessary to interpret or contextualize it. Data is an essential component of almost all communication and activity and it can gain meaning and value as it is collected, analyzed, and contextualized. - - - -A data definition language, or DDL, is a set of commands or actions that are used to define database structures and objects. They are a key component to relational and other databases and are expressed as a subset of the available commands available to manage data in languages like SQL. Data definition language is the portion of the language dedicated to describing, creating, and modifying structures and the frameworks that will hold data. + In the broadest sense, data are facts or pieces of information. They are + measurements or values that contain information about something. In some + contexts, data is defined as distinct from information in that information is + analyzed or processed data while data consists only of raw values. 
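As a rough sketch of the database cursor entry above, the following PostgreSQL-style example pages through a hypothetical `events` table in batches instead of fetching the whole result set at once. Cursor syntax and capabilities vary between database systems.

```sql
BEGIN;

-- Declare a cursor over the query instead of materializing all rows
DECLARE recent_events CURSOR FOR
    SELECT event_id, payload FROM events ORDER BY created_at DESC;

-- Fetch and process the results in manageable batches
FETCH 100 FROM recent_events;
FETCH 100 FROM recent_events;

CLOSE recent_events;
COMMIT;
```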
Practically + speaking, however, these terms are often used as synonyms and typically + encapsulate any fact along with the relevant context necessary to interpret or + contextualize it. Data is an essential component of almost all communication + and activity and it can gain meaning and value as it is collected, analyzed, + and contextualized. + + + + A data definition language, or DDL, is a set of commands or actions that are + used to define database structures and objects. They are a key component to + relational and other databases and are expressed as a subset of the available + commands available to manage data in languages like SQL. Data definition + language is the portion of the language dedicated to describing, creating, and + modifying structures and the frameworks that will hold data. -Data independence is a term used to describe the separation of database clients or applications from the underlying structure responsible for representing and storing the data. Data independence is achieved if the database is able to abstract the structure in a way that allows user applications to continue running even if additional attributes are added to a relation (logical independence) or if the details of the storage medium changes (physical independence), for instance. + Data independence is a term used to describe the separation of database + clients or applications from the underlying structure responsible for + representing and storing the data. Data independence is achieved if the + database is able to abstract the structure in a way that allows user + applications to continue running even if additional attributes are added to a + relation (logical independence) or if the details of the storage medium + changes (physical independence), for instance. -A data mapper ORM, or just simply a data mapper, is an application component that acts as a go between to translate between database representations and the data structures present in applications. Data mappers allow your application logic and database data representations to remain independent. The data mapper manages and translates data between these two mediums so that each representation is independent and can be structured intelligently. + A data mapper ORM, or just simply a data mapper, is an application component + that acts as a go between to translate between database representations and + the data structures present in applications. Data mappers allow your + application logic and database data representations to remain independent. The + data mapper manages and translates data between these two mediums so that each + representation is independent and can be structured intelligently. -A data type is a category or attribute that expresses a constraint on valid values. For example, an integer type specifies that only whole numbers are appropriate and expected for a variable or field. Data types allow you to specify expectations and requirements about your data during when defining a field or container. The programming language or application can then validate that introduced data meets the necessary criteria. Data types are also help determine the available operations that can be performed on a piece of data. + A data type is a category or attribute that expresses a constraint on valid + values. For example, an integer type specifies that only whole numbers are + appropriate and expected for a variable or field. 
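To ground the data definition language entry above, here is a short sketch of typical DDL statements; the `inventory` table is hypothetical.

```sql
-- DDL defines and modifies structures rather than the data inside them
CREATE TABLE inventory (
    item_id  serial PRIMARY KEY,
    name     text NOT NULL,
    quantity integer NOT NULL DEFAULT 0
);

ALTER TABLE inventory ADD COLUMN location text;

DROP TABLE inventory;
```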
Data types allow you to + specify expectations and requirements about your data when defining a + field or container. The programming language or application can then validate + that introduced data meets the necessary criteria. Data types also help + determine the available operations that can be performed on a piece of data. -A database is a structure used to organize, structure, and store data. Databases are often managed by a database management system which provides an interface to manipulate and interact with the database and the data it manages. Databases can be highly structured or allow more flexible data storage patterns and can store many different types of data in a way that allows for querying, recalling, and combining data at the time of retrieval. + A database is a structure used to organize, structure, and store data. + Databases are often managed by a database management system which provides an + interface to manipulate and interact with the database and the data it + manages. Databases can be highly structured or allow more flexible data + storage patterns and can store many different types of data in a way that + allows for querying, recalling, and combining data at the time of retrieval. -A database abstraction layer is a programming interface that attempts to abstract differences between underlying database technologies to provide a unified experience or interface to the application layer. Database abstraction layers are often helpful for developers because they help to normalize the implementation differences between various offerings and can stay stable even as the underlying technology evolves. However, there are some challenges as well, such as leaking abstractions, masking implementation-specific features or optimizations from the user, and creating a dependency that can be difficult to dislodge. + A database abstraction layer is a programming interface that attempts to + abstract differences between underlying database technologies to provide a + unified experience or interface to the application layer. Database abstraction + layers are often helpful for developers because they help to normalize the + implementation differences between various offerings and can stay stable even + as the underlying technology evolves. However, there are some challenges as + well, such as leaking abstractions, masking implementation-specific features + or optimizations from the user, and creating a dependency that can be + difficult to dislodge. -A database administrator, or DBA, is a role responsible for configuring, managing, and optimizing database systems and the related ecosystem of software and hardware. Some responsibilities they may be involved with include architecture planning, configuration, schema and change management, migrations, replication and load balancing, sharding, security considerations, managing backup strategies, and more. Database administrators are typically expected to have expertise in database design and theory and be able to help organizations make decisions about database technology selection and implementation. In many modern organizations, the responsibilities traditionally held by DBAs are now distributed between various members of the development and operations teams or have been offloaded to external providers to simplify some of the infrastructure management portions of the job.
+ A database administrator, or DBA, is a role responsible for configuring, + managing, and optimizing database systems and the related ecosystem of + software and hardware. Some responsibilities they may be involved with include + architecture planning, configuration, schema and change management, + migrations, replication and load balancing, sharding, security considerations, + managing backup strategies, and more. Database administrators are typically + expected to have expertise in database design and theory and be able to help + organizations make decisions about database technology selection and + implementation. In many modern organizations, the responsibilities + traditionally held by DBAs are now distributed between various members of the + development and operations teams or have been offloaded to external providers + to simplify some of the infrastructure management portions of the job. -A database engine is the piece of a database management system responsible for defining how data is stored and retrieved, as well as the actions supported for interacting with the system and data. Some database management systems support multiple database engines that offer different features and designs, while other systems only support a single database engine that has been designed to align with the goals of the software. - - - -A database management system, often called a DBMS or even just a "database", is an application responsible for organizing and managing data. DBMSs can follow many different paradigms and prioritize certain goals. Generally, at the very least, they are responsible for persisting data, organizing and categorizing data, and ingesting, manipulating, and querying data. Most often, DBMSs offer a client / server model where the server is responsible for controlling and managing the data while clients, libraries, or APIs can be used to interact with the server to add or query data, change data structures, or manage other aspects of the system. + A database engine is the piece of a database management system responsible for + defining how data is stored and retrieved, as well as the actions supported + for interacting with the system and data. Some database management systems + support multiple database engines that offer different features and designs, + while other systems only support a single database engine that has been + designed to align with the goals of the software. + + + + A database management system, often called a DBMS or even just a "database", + is an application responsible for organizing and managing data. DBMSs can + follow many different paradigms and prioritize certain goals. Generally, at + the very least, they are responsible for persisting data, organizing and + categorizing data, and ingesting, manipulating, and querying data. Most often, + DBMSs offer a client / server model where the server is responsible for + controlling and managing the data while clients, libraries, or APIs can be + used to interact with the server to add or query data, change data structures, + or manage other aspects of the system. -A database model is the overall strategy used by a database management system for storing, organizing, and providing access to data. There are many different database models available, but the relational model, which uses highly structured tables to store data in a specific format, is perhaps the most common type. Other types of databases include document databases, wide-column databases, hierarchical databases, key-value stores, and more. 
Some database systems are designed to be "multi-model", meaning they support databases with different types of models running within the same system. + A database model is the overall strategy used by a database management system + for storing, organizing, and providing access to data. There are many + different database models available, but the relational model, which uses + highly structured tables to store data in a specific format, is perhaps the + most common type. Other types of databases include document databases, + wide-column databases, hierarchical databases, key-value stores, and more. + Some database systems are designed to be "multi-model", meaning they support + databases with different types of models running within the same system. -A database proxy is a software component responsible for managing connections between database clients and database servers. Database proxies are used for a number of reasons including organizing access to a limited number of connections, allowing transparent scaling of the database layer, and redirecting traffic for deployments and similar scenarios. Database proxies are usually designed to be transparent for applications, meaning that the applications can connect to the proxy as if they were connecting directly to the backend database. + A database proxy is a software component responsible for managing connections + between database clients and database servers. Database proxies are used for a + number of reasons including organizing access to a limited number of + connections, allowing transparent scaling of the database layer, and + redirecting traffic for deployments and similar scenarios. Database proxies + are usually designed to be transparent for applications, meaning that the + applications can connect to the proxy as if they were connecting directly to + the backend database. -A dataset, sometimes spelled data set, is a single collection of data. Typically, this represents a chunk of related data applicable to a certain task, application, or area of concern. Typically, datasets are a combination of the data itself as well as the structure and context necessary to interpret it. They often consist of a combination of quantitative and qualitative values that can act as the raw data for further analysis and interpretation. + A dataset, sometimes spelled data set, is a single collection of data. + Typically, this represents a chunk of related data applicable to a certain + task, application, or area of concern. Typically, datasets are a combination + of the data itself as well as the structure and context necessary to interpret + it. They often consist of a combination of quantitative and qualitative values + that can act as the raw data for further analysis and interpretation. -Denormalization is a process where the data and structure within a database is "denormalized" or taken out of a normalized state. This can happen accidentally if a data structure that is intended to be normalized is ill defined or mismanaged. However, it is often also performed intentionally in certain scenarios. Denormalization tends to allow faster access to data by storing values redundantly in different places. The drawback of this is that write performance suffers and there is a possibility that data can get out of sync since multiple locations are used to represent the same data. + Denormalization is a process where the data and structure within a database is + "denormalized" or taken out of a normalized state. 
This can happen + accidentally if a data structure that is intended to be normalized is ill + defined or mismanaged. However, it is often also performed intentionally in + certain scenarios. Denormalization tends to allow faster access to data by + storing values redundantly in different places. The drawback of this is that + write performance suffers and there is a possibility that data can get out of + sync since multiple locations are used to represent the same data. -A dirty read is a specific type of anomaly that can occur where one transaction can read data that hasn't been committed by another transaction. If the second transaction is rolled back instead of committed, the first transaction will be using a value that doesn't reflect the actual state of the database. Dirty reads are possible at certain isolation levels for transactions and represent a risk that can lead to inconsistency when manipulating data in parallel. + A dirty read is a specific type of anomaly that can occur where one + transaction can read data that hasn't been committed by another transaction. + If the second transaction is rolled back instead of committed, the first + transaction will be using a value that doesn't reflect the actual state of the + database. Dirty reads are possible at certain isolation levels for + transactions and represent a risk that can lead to inconsistency when + manipulating data in parallel. -A distributed database is a database system that spans multiple physical systems. Data is spread across a number of machines for the sake of performance or availability. While distributed systems can help scale a database to handle more load, they also represent a significant increase in complexity that can lead to consistency and partition challenges as well as certain negative performance impacts like an increase in data writes in some cases. + A distributed database is a database system that spans multiple physical + systems. Data is spread across a number of machines for the sake of + performance or availability. While distributed systems can help scale a + database to handle more load, they also represent a significant increase in + complexity that can lead to consistency and partition challenges as well as + certain negative performance impacts like an increase in data writes in some + cases. -In the context of document databases, a document is considered a container for information representing a single record or object containing related descriptive data. Documents can have a flexible structure that does not have to match the other documents on the system and can often be nested. Documents are typically represented in a data serialization format like JSON or YAML that can organize the document with labels and metadata. + In the context of document databases, a document is considered a container for + information representing a single record or object containing related + descriptive data. Documents can have a flexible structure that does not have + to match the other documents on the system and can often be nested. Documents + are typically represented in a data serialization format like JSON or YAML + that can organize the document with labels and metadata. -A document database is a database model that represents items in individual objects called documents. While documents can be grouped together for organization, they don't have to share the same structure and can be designed to uniquely capture the data required to describe the item in question. 
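Related to the dirty read entry above: most relational databases let you choose a stricter transaction isolation level so that uncommitted changes from other sessions are never visible. Defaults and available levels differ by system; the sketch below uses PostgreSQL-style syntax and a hypothetical `accounts` table.

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- All reads in this transaction see a consistent snapshot, so values
-- written but not yet committed by other transactions are never returned.
SELECT balance FROM accounts WHERE account_id = 1;

COMMIT;
```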
Document databases typically don't support robust join operations to link different documents together, but are often praised for their flexibility and quick time-to-productivity due to their flexibility and ease in representing programmatic data structures. + A document database is a database model that represents items in individual + objects called documents. While documents can be grouped together for + organization, they don't have to share the same structure and can be designed + to uniquely capture the data required to describe the item in question. + Document databases typically don't support robust join operations to link + different documents together, but are often praised for their flexibility and + quick time-to-productivity due to their flexibility and ease in representing + programmatic data structures. -Durability is a quality of data that signifies that it has been captured on persistent storage that will survive in the event of a program crash. Typically, this means flushing the data to a non-volatile storage medium like a hard drive that doesn't require electricity to maintain state. + Durability is a quality of data that signifies that it has been captured on + persistent storage that will survive in the event of a program crash. + Typically, this means flushing the data to a non-volatile storage medium like + a hard drive that doesn't require electricity to maintain state. -Encoding is a system that translates between a character system that can represents the components used in written language and a digital representation that the computer can store and operate on. Different encoding systems have been developed with a wide variety of character ranges. Some are targeted at specific languages or language families (like ASCII) while others attempt to provide representation for much larger character sets appropriate for different many languages (like the UTF unicode varieties). + Encoding is a system that translates between a character system that can + represents the components used in written language and a digital + representation that the computer can store and operate on. Different encoding + systems have been developed with a wide variety of character ranges. Some are + targeted at specific languages or language families (like ASCII) while others + attempt to provide representation for much larger character sets appropriate + for different many languages (like the UTF unicode varieties). -Encrypted transport is any type of communication process that encrypts its messages prior to sending them to the recipient. Transport encryption is necessary to ensure privacy (prevent others from seeing sensitive information) as well as avoid tampering (making manipulation of the data obvious). Many different encrypted transport systems can used when deploying databases, including TLS/SSL encryption, VPNs, and private networks. + Encrypted transport is any type of communication process that encrypts its + messages prior to sending them to the recipient. Transport encryption is + necessary to ensure privacy (prevent others from seeing sensitive information) + as well as avoid tampering (making manipulation of the data obvious). Many + different encrypted transport systems can used when deploying databases, + including TLS/SSL encryption, VPNs, and private networks. -Ephemerality is a characteristic that indicates that a piece of data or circumstance is not permanent. In many ways, it is the opposite of durability. 
In databases, certain items, like data you wish to persist, should not be ephemeral. However, other components, like a secret key used to encrypt a connection between a database and client, can benefit from being ephemeral by preventing key leakage from effecting future or past sessions. + Ephemerality is a characteristic that indicates that a piece of data or + circumstance is not permanent. In many ways, it is the opposite of durability. + In databases, certain items, like data you wish to persist, should not be + ephemeral. However, other components, like a secret key used to encrypt a + connection between a database and client, can benefit from being ephemeral by + preventing key leakage from effecting future or past sessions. -Ephemeral storage, also sometimes called volatile or non-durable storage, is any storage medium that persists for a short time, often associated with certain conditions. For instance, in applications, data being stored in memory will only survive while the process is running. Similarly, data stored to a temporary directory is only available until the system reboots. Often, ephemeral storage is useful for temporary data or as a holding area before data can be stored on a more permanent medium. + Ephemeral storage, also sometimes called volatile or non-durable storage, is + any storage medium that persists for a short time, often associated with + certain conditions. For instance, in applications, data being stored in memory + will only survive while the process is running. Similarly, data stored to a + temporary directory is only available until the system reboots. Often, + ephemeral storage is useful for temporary data or as a holding area before + data can be stored on a more permanent medium. -Eventual consistency is a description of a consistency / availability strategy implemented by certain distributed computing or database systems. The CAP theorem of distributed systems states that systems must choose whether prioritize availability or data consistency in the face of a network partition. Eventual consistent systems make the choice to favor availability by continuing to serve requests even if the server's peers are not available to confirm operations. Eventually, when the partition is resolved, a consistency routine will run to decide on the most correct state of any inconsistent data, but there will be a time where the data on different servers are not in agreement. + Eventual consistency is a description of a consistency / availability strategy + implemented by certain distributed computing or database systems. The CAP + theorem of distributed systems states that systems must choose whether + prioritize availability or data consistency in the face of a network + partition. Eventual consistent systems make the choice to favor availability + by continuing to serve requests even if the server's peers are not available + to confirm operations. Eventually, when the partition is resolved, a + consistency routine will run to decide on the most correct state of any + inconsistent data, but there will be a time where the data on different + servers are not in agreement. -In the context of caches, eviction is a process where a piece of data is removed from a cache. This can happen because the current value has been invalidated by an operation or it can occur automatically as a result of policies designed to remove the data that is the oldest or least used. + In the context of caches, eviction is a process where a piece of data is + removed from a cache. 
This can happen because the current value has been + invalidated by an operation or it can occur automatically as a result of + policies designed to remove the data that is the oldest or least used. - -The expand and contract pattern is a strategy for introducing new changes to a database schema without affecting existing applications. It works by introducing changes in carefully controlled stages by first adding new or changed structures alongside existing structures and then expanding the application logic to use both structures simultaneously. Eventually, after testing, the application can stop writing to original structure and it can be removed. + + The expand and contract pattern is a strategy for introducing new changes to a + database schema without affecting existing applications. It works by + introducing changes in carefully controlled stages by first adding new or + changed structures alongside existing structures and then expanding the + application logic to use both structures simultaneously. Eventually, after + testing, the application can stop writing to original structure and it can be + removed. -Extract, transform, and load, often abbreviated as ETL, is a process of copying and processing data from a data source to a managed system. First the data is extracted from its current system to make it accessible to the destination system. Next, the data is manipulated and modified to match the requirements and format of the new system. Finally, the reconstructed data is loaded into the new system. + Extract, transform, and load, often abbreviated as ETL, is a process of + copying and processing data from a data source to a managed system. First the + data is extracted from its current system to make it accessible to the + destination system. Next, the data is manipulated and modified to match the + requirements and format of the new system. Finally, the reconstructed data is + loaded into the new system. -A feature flag, or a feature toggle, is a programming strategy that involves gating functionality behind an external switch or control. The switch is typically first set to indicate that the feature should not be active. When the organization is ready, they can activate the switch and the program will start using its new functionality. This allows new features to be deployed without immediately activating them. It decouples the deployment of new software from the release of the software, offering greater control over how a change is introduced and for greater testing in a production environment. + A feature flag, or a feature toggle, is a programming strategy that involves + gating functionality behind an external switch or control. The switch is + typically first set to indicate that the feature should not be active. When + the organization is ready, they can activate the switch and the program will + start using its new functionality. This allows new features to be deployed + without immediately activating them. It decouples the deployment of new + software from the release of the software, offering greater control over how a + change is introduced and for greater testing in a production environment. -A database column, or field, is a container for a specific type of data in a database table. Database fields in relational databases are regular, in the sense that each row in the table will contain the same number of fields with the same characteristics. 
The values that database fields can contain can be controlled by the data type assigned to the field as well as constraints that further limit the valid values. + A database column, or field, is a container for a specific type of data in a + database table. Database fields in relational databases are regular, in the + sense that each row in the table will contain the same number of fields with + the same characteristics. The values that database fields can contain can be + controlled by the data type assigned to the field as well as constraints that + further limit the valid values. -A flat-file database is a database or database-like structure stored in a file. These define the structure and the data the database contains in a unified format. Many examples of flat-file databases, like CSV (comma-separated values) files are written in plain text, but binary formats exist too. One difference between flat-file databases and more complex types is that the storage format itself often is responsible for describing the relationships between data instead of the database system. + A flat-file database is a database or database-like structure stored in a + file. These define the structure and the data the database contains in a + unified format. Many examples of flat-file databases, like CSV + (comma-separated values) files, are written in plain text, but binary formats + exist too. One difference between flat-file databases and more complex types + is that the storage format itself often is responsible for describing the + relationships between data instead of the database system. -A foreign key is a designated column or group of columns in a relational database that is used to maintain data integrity between two tables. A foreign key in one table refers to a candidate key, typically the primary key, in another table. Since a candidate key is referenced, each row in the database will be unique and the two tables can be linked together row for row. The values are of these designated columns is expected to remain identical across the two tables. The foreign key constraint allows the database system to enforce this requirement by not allowing the values to be out of sync. + A foreign key is a designated column or group of columns in a relational + database that is used to maintain data integrity between two tables. A foreign + key in one table refers to a candidate key, typically the primary key, in + another table. Since a candidate key is referenced, each row in the database + will be unique and the two tables can be linked together row for row. The + values of these designated columns are expected to remain identical across + the two tables. The foreign key constraint allows the database system to + enforce this requirement by not allowing the values to be out of sync. -Full-text search describes a family of techniques and functionality that allow you to search the complete text of documents within a database system. This is in direct opposition to search functionality that relies only on metadata, partial text sources, and other incomplete assessments.
Full-text search + relies on asynchronous indexing using natural language-aware parsers to + analyze and categorize text within documents. -A graph database is a NoSQL database that uses a graph structure to store and define relationships between pieces of data. Graph databases are constructed using nodes, which represent entities and can contain properties or attributes. Nodes are connected to one another using edges, which are responsible not only for linking nodes, but also defining the nature of the relationship. For example, a node might describe a person with a property of "teacher". It might be connected to a class node with an edge called that specifies "teaches" but may be connected to another person node with an edge that specifies "married to". + A graph database is a NoSQL database that uses a graph structure to store and + define relationships between pieces of data. Graph databases are constructed + using nodes, which represent entities and can contain properties or + attributes. Nodes are connected to one another using edges, which are + responsible not only for linking nodes, but also defining the nature of the + relationship. For example, a node might describe a person with a property of + "teacher". It might be connected to a class node with an edge that + specifies "teaches" but may be connected to another person node with an edge + that specifies "married to". -GraphQL is a language that can be used to query and manipulate data, commonly used for building APIs. Clients are able to specify the exact data required and the server crafts a response following the provided structure. GraphQL's strengths are its ability to return data using custom structures, stitch together data from various back ends, and answer complex queries in a single API call. + GraphQL is a language that can be used to query and manipulate data, commonly + used for building APIs. Clients are able to specify the exact data required + and the server crafts a response following the provided structure. GraphQL's + strengths are its ability to return data using custom structures, stitch + together data from various back ends, and answer complex queries in a single + API call. -HTAP databases, or hybrid transactional/analytical databases, are a category of database that seeks to offer the advantages of both fast, reliable transactional processing and the ability to process heavy, complex analytical workloads concurrently on the same machine. Rather than analyzing data after the fact, these database offerings attempt to allow real time analysis that can impact the way decisions are made rapidly. + HTAP databases, or hybrid transactional/analytical databases, are a category + of database that seeks to offer the advantages of both fast, reliable + transactional processing and the ability to process heavy, complex analytical + workloads concurrently on the same machine. Rather than analyzing data after + the fact, these database offerings attempt to allow real time analysis that + can impact the way decisions are made rapidly. -A hierarchical database is a database model that organizes itself into a tree-like structure. Each new record is attached to a single parent record. As records are added to the database, a tree-like structure emerges as records fan out more and more from the root record. The links between records can be traversed to get to other records. Examples of systems that use a hierarchical model include LDAP (Lightweight Directory Access Protocol) and DNS (Domain Name System).
+ A hierarchical database is a database model that organizes itself into a + tree-like structure. Each new record is attached to a single parent record. As + records are added to the database, a tree-like structure emerges as records + fan out more and more from the root record. The links between records can be + traversed to get to other records. Examples of systems that use a hierarchical + model include LDAP (Lightweight Directory Access Protocol) and DNS (Domain + Name System). -Horizontal scaling, also known as scaling out, is a scaling strategy that involves increasing the number of units that can perform a given task. This often means increasing the number of computers in a worker pool that can respond to requests. Scaling out has many advantages including cost, flexibility, and the level of traffic that can be handled, but may add complexity in terms of coordination and complexity, especially when data is involved. + Horizontal scaling, also known as scaling out, is a scaling strategy that + involves increasing the number of units that can perform a given task. This + often means increasing the number of computers in a worker pool that can + respond to requests. Scaling out has many advantages including cost, + flexibility, and the level of traffic that can be handled, but may add + complexity in terms of coordination and complexity, especially when data is + involved. -A hot backup is a backup of a database system while it is actively in use. They are often preferable, if possible, because they do not require the database system to be taken offline to perform the operation. Hot backups are not always possible as they can require locking certain parts of the database or can reduce the IOPS (Input / Output Operations per Second) available for normal database tasks. + A hot backup is a backup of a database system while it is actively in use. + They are often preferable, if possible, because they do not require the + database system to be taken offline to perform the operation. Hot backups are + not always possible as they can require locking certain parts of the database + or can reduce the IOPS (Input / Output Operations per Second) available for + normal database tasks. -An in-memory database is a database system where the entire data set can all be loaded into and processed in the computers memory. This processing model offers huge performance benefits as all of the data is already in main memory and there is no delay retrieving data from slower storage. Care must be taken when using in-memory databases to have a strategy for persisting the data or repopulating the in-memory information when the machines are restarted. + An in-memory database is a database system where the entire data set can all + be loaded into and processed in the computers memory. This processing model + offers huge performance benefits as all of the data is already in main memory + and there is no delay retrieving data from slower storage. Care must be taken + when using in-memory databases to have a strategy for persisting the data or + repopulating the in-memory information when the machines are restarted. -A database index is a structure that is created to allow for faster record finding within a table. An index allows the database system to look up data efficiently by keeping a separate structure for the values of specific columns. Queries that target the indexed columns can identify applicable rows in the table quickly by using a more efficient lookup strategy than checking each row line by line. 
Indexed columns improve read operations but do add overhead to write operations since both the table and the index must be updated. It is important to balance these two considerations when designing table indexes. + A database index is a structure that is created to allow for faster record + finding within a table. An index allows the database system to look up data + efficiently by keeping a separate structure for the values of specific + columns. Queries that target the indexed columns can identify applicable rows + in the table quickly by using a more efficient lookup strategy than checking + each row line by line. Indexed columns improve read operations but do add + overhead to write operations since both the table and the index must be + updated. It is important to balance these two considerations when designing + table indexes. -Ingesting data is the act of importing new data into a data system. This can be a one-off data loading operation or a continuous consumption of data being generated by other system. Data ingestion is a common stage of populating and updating analytic databases and big data stores as they often involve consolidating data from various sources. + Ingesting data is the act of importing new data into a data system. This can + be a one-off data loading operation or a continuous consumption of data being + generated by other system. Data ingestion is a common stage of populating and + updating analytic databases and big data stores as they often involve + consolidating data from various sources. -An inner join is a type of relational database operation that joins two tables by only returning rows where the joining column values exist in both tables. With an inner join, there must be a match on the join columns in both tables. There are no rows using `NULL` values to pad out rows missing from one table or the other. + An inner join is a type of relational database operation that joins two tables + by only returning rows where the joining column values exist in both tables. + With an inner join, there must be a match on the join columns in both tables. + There are no rows using `NULL` values to pad out rows missing from one table + or the other. -Interactive transactions are a database transaction feature that allows clients to manually specify transaction operations in an ad-hoc manner. Rather than a transaction being a wrapper around a group of queries that can all be executed sequentially with no pause, interactive transactions allow developers to briefly pause their database operations to execute other logic before continuing with the transaction processing. This gives flexibility in transaction processing but can lead to unwanted transaction running times if not carefully managed. + Interactive transactions are a database transaction feature that allows + clients to manually specify transaction operations in an ad-hoc manner. Rather + than a transaction being a wrapper around a group of queries that can all be + executed sequentially with no pause, interactive transactions allow developers + to briefly pause their database operations to execute other logic before + continuing with the transaction processing. This gives flexibility in + transaction processing but can lead to unwanted transaction running times if + not carefully managed. -In the context of databases, isolation is a property that describes how data and operations are visible within and between transactions. 
The level of isolation can be set by the database administrator or the query author to define the trade-offs between isolation levels and performance. Isolation is one of the key guarantees described by the ACID acronym. + In the context of databases, isolation is a property that describes how data + and operations are visible within and between transactions. The level of + isolation can be set by the database administrator or the query author to + define the trade-offs between isolation levels and performance. Isolation is + one of the key guarantees described by the ACID acronym. -Isolation levels describe different types of trade-offs between isolation and performance that databases can make when processing transactions. Isolation levels determine what types of data leaking can occur between transactions or what data anomalies can occur. In general, greater levels of isolation provide more guarantees at the expense of slower processing. + Isolation levels describe different types of trade-offs between isolation and + performance that databases can make when processing transactions. Isolation + levels determine what types of data leaking can occur between transactions or + what data anomalies can occur. In general, greater levels of isolation provide + more guarantees at the expense of slower processing. -In relational databases, a join is an operation that connects two tables based on a shared "join" column or columns. The values within the join columns must be unique within each table. The join operation matches rows based on the join column values to create an extended virtual row composed of the columns from each table. Different types of joins are available based on what the user wants to do with rows that do not have a matching counterpart in the other table. + In relational databases, a join is an operation that connects two tables based + on a shared "join" column or columns. The values within the join columns must + be unique within each table. The join operation matches rows based on the join + column values to create an extended virtual row composed of the columns from + each table. Different types of joins are available based on what the user + wants to do with rows that do not have a matching counterpart in the other + table. -In the context of databases, a key is any attribute, column, or group of attributes or columns that can be used to uniquely identify individual rows. Some pieces of data can be used as a key because of their natural uniqueness (a natural key) while other data sets may need to generate a key to identify each record (a surrogate key). Each table or data collection can have multiple keys that uniquely identify a row (called candidate keys), but typically, there is a main key (called the primary key) designated as the main way to access rows. + In the context of databases, a key is any attribute, column, or group of + attributes or columns that can be used to uniquely identify individual rows. + Some pieces of data can be used as a key because of their natural uniqueness + (a natural key) while other data sets may need to generate a key to identify + each record (a surrogate key). Each table or data collection can have multiple + keys that uniquely identify a row (called candidate keys), but typically, + there is a main key (called the primary key) designated as the main way to + access rows. -A key-value database, or key-value store, is a database model that allows users to store and retrieve data with an arbitrary structure using keys. 
The key is used to identify and access the record, which can consist of a single value or a structure of more complex data. Each record in a key-value database can define its own structure, so there is not a unified table structure as there is in relational databases. Key-value databases are useful because they are extremely flexible and use a model that feels familiar to many object-oriented developers. + A key-value database, or key-value store, is a database model that allows + users to store and retrieve data with an arbitrary structure using keys. The + key is used to identify and access the record, which can consist of a single + value or a structure of more complex data. Each record in a key-value database + can define its own structure, so there is not a unified table structure as + there is in relational databases. Key-value databases are useful because they + are extremely flexible and use a model that feels familiar to many + object-oriented developers. -A left join is a join operation for relational databases where all of the rows of the first table specified are returned, regardless of whether a matching row in the second table is found. Join operations construct virtual rows by matching records that have identical values in specified comparison columns from each table. The results for a left join will contain the rows from both tables where the column values matched and will additionally contain all of the unmatched rows from the first, or left, table. For these rows, the columns associated with the second, or right, table will be padded with `NULL` values to indicate that no matching row was found. + A left join is a join operation for relational databases where all of the rows + of the first table specified are returned, regardless of whether a matching + row in the second table is found. Join operations construct virtual rows by + matching records that have identical values in specified comparison columns + from each table. The results for a left join will contain the rows from both + tables where the column values matched and will additionally contain all of + the unmatched rows from the first, or left, table. For these rows, the columns + associated with the second, or right, table will be padded with `NULL` values + to indicate that no matching row was found. -Lexemes are language-level units of meaning that are relevant in natural language processing and full-text search contexts. Typically, when text is indexed, it is broken down into individual tokens which are then analyzed as lexemes using language-level resources like dictionaries, thesauruses, and other word lists to understand how to process them further. + Lexemes are language-level units of meaning that are relevant in natural + language processing and full-text search contexts. Typically, when text is + indexed, it is broken down into individual tokens which are then analyzed as + lexemes using language-level resources like dictionaries, thesauruses, and + other word lists to understand how to process them further. -In databases and computing in general, a locale specifies the region, language, country, and other pieces of contextual data that should be used when performing operations and rendering results. In databases, locale settings can affect things like column orderings, comparisons between values, spelling, currency identifiers, date and time formatting, and more. 
Defining the correct locale at the database server level or requesting the locale you need during a database session are essential for ensuring that the operations are performed will yield the expected results. + In databases and computing in general, a locale specifies the region, + language, country, and other pieces of contextual data that should be used + when performing operations and rendering results. In databases, locale + settings can affect things like column orderings, comparisons between values, + spelling, currency identifiers, date and time formatting, and more. Defining + the correct locale at the database server level or requesting the locale you + need during a database session are essential for ensuring that the operations + are performed will yield the expected results. -In databases, a lock is a technique used to prevent modification of a database record or table in order to maintain consistency during certain operations. Locks can prevent any access to the locked resource or prevent only certain operations from being performed. They can be issued for a specific record or for an entire table. Because locks prevent concurrent operations from accessing the locked data, it is possible for locked data to impact performance and lead to resource contention. + In databases, a lock is a technique used to prevent modification of a database + record or table in order to maintain consistency during certain operations. + Locks can prevent any access to the locked resource or prevent only certain + operations from being performed. They can be issued for a specific record or + for an entire table. Because locks prevent concurrent operations from + accessing the locked data, it is possible for locked data to impact + performance and lead to resource contention. -MariaDB is an open-source relational database system developed with the goal of providing a drop-in replacement for MySQL after Oracle's acquisition left some within the community uncertain about the future direction of the project. Since its initial fork, each project has added features that widen the gap between the two database systems. + MariaDB is an open-source relational database system developed with the goal + of providing a drop-in replacement for MySQL after Oracle's acquisition left + some within the community uncertain about the future direction of the project. + Since its initial fork, each project has added features that widen the gap + between the two database systems. -The microservices architecture is an application and service design that affects the development, deployment, and operation of the components. The microservices approach decomposes an application's functionality and implements each responsibility as a discrete service. Rather than internal function calls, the service communicates over the network using clearly defined interfaces. Microservices are often used to help speed up development as each component can be coded and iterated on independently. It also helps with scalability as each service can be scaled as needed, often with the help of service orchestration software. + The microservices architecture is an application and service design that + affects the development, deployment, and operation of the components. The + microservices approach decomposes an application's functionality and + implements each responsibility as a discrete service. Rather than internal + function calls, the service communicates over the network using clearly + defined interfaces. 
Microservices are often used to help speed up development + as each component can be coded and iterated on independently. It also helps + with scalability as each service can be scaled as needed, often with the help + of service orchestration software. -Database or schema migrations are processes used to transform a database structure to a new design. This involves operations to modify the existing schema of a database or table as well as transforming any existing data to fit the new structure. Database migrations are often built upon one another and stored as an ordered list in version control so that the current database structure can be built from any previous version by sequentially applying the migration files. Often, developers must make decisions about how best to modify existing data to fit the new structure which might include columns that did not previously exist or changes to data that are difficult to easily reverse. + Database or schema migrations are processes used to transform a database + structure to a new design. This involves operations to modify the existing + schema of a database or table as well as transforming any existing data to fit + the new structure. Database migrations are often built upon one another and + stored as an ordered list in version control so that the current database + structure can be built from any previous version by sequentially applying the + migration files. Often, developers must make decisions about how best to + modify existing data to fit the new structure which might include columns that + did not previously exist or changes to data that are difficult to easily + reverse. -MongoDB is the most popular document-oriented NoSQL database system in use today. It stores data using JSON-like structures that can be specified at the time of data storage. Each document can have its own structure with as much or as little complexity as required. MongoDB provides a non-SQL methods and commands to manage and query data programmatically or interactively. MongoDB is known for its fast performance, scalability, and for enabling a rapid development pace. + MongoDB is the most popular document-oriented NoSQL database system in use + today. It stores data using JSON-like structures that can be specified at the + time of data storage. Each document can have its own structure with as much or + as little complexity as required. MongoDB provides a non-SQL methods and + commands to manage and query data programmatically or interactively. MongoDB + is known for its fast performance, scalability, and for enabling a rapid + development pace. -Monolithic architecture is a term used to refer to a traditional application. In monoliths, although different pieces may be broken down internally for ease of development, once built, the application is a single item that has many different functions and responsibilities. Monoliths can interface with the external world in any number of ways, but the communication and coordination of different functionality within the program happens internally. Monolithic architecture is sometimes considered to be easier to implement, but does suffer from inflexibility with scaling and availability as the entire application must be scaled up and down as a single unit. - - - -Multiversion concurrency control, or MVCC, is a strategy for allowing concurrent access to data within database systems as an alternative to row and table locking. 
MVCC works by taking "snapshots" that represent a consistent data state for each user accessing a set of data. The goal of MVCC is to offer a system where read queries never block write queries and where write queries never block read queries. Each client will be able to read and use the data as if they were the only user while the database system tracks multiple versions of the data being read and updated by each user. Locking or the normal transaction rollback and conflict management strategies are used to resolve disputes caused by updating the same data. + Monolithic architecture is a term used to refer to a traditional application. + In monoliths, although different pieces may be broken down internally for ease + of development, once built, the application is a single item that has many + different functions and responsibilities. Monoliths can interface with the + external world in any number of ways, but the communication and coordination + of different functionality within the program happens internally. Monolithic + architecture is sometimes considered to be easier to implement, but does + suffer from inflexibility with scaling and availability as the entire + application must be scaled up and down as a single unit. + + + + Multiversion concurrency control, or MVCC, is a strategy for allowing + concurrent access to data within database systems as an alternative to row and + table locking. MVCC works by taking "snapshots" that represent a consistent + data state for each user accessing a set of data. The goal of MVCC is to offer + a system where read queries never block write queries and where write queries + never block read queries. Each client will be able to read and use the data as + if they were the only user while the database system tracks multiple versions + of the data being read and updated by each user. Locking or the normal + transaction rollback and conflict management strategies are used to resolve + disputes caused by updating the same data. -MySQL is one of the most popular relational database systems available today. Initially released in 1995 and acquired by Oracle in 2010, MySQL has a long history as powerful and easy to use relational system. It offers a wide array of storage engines and boasts very wide community support. It is used in many popular open-source and commercial projects and for many years was considered a key piece of software for many internet services. + MySQL is one of the most popular relational database systems available today. + Initially released in 1995 and acquired by Oracle in 2010, MySQL has a long + history as powerful and easy to use relational system. It offers a wide array + of storage engines and boasts very wide community support. It is used in many + popular open-source and commercial projects and for many years was considered + a key piece of software for many internet services. -Neo4j is a high performance graph-oriented database system. It offers ACID-compliant transactions with a graph data structure and uses the Cypher querying language to manage and query stored data. Neo4j allows developers to scale graph-oriented data workloads easily and offers clients in many different languages. + Neo4j is a high performance graph-oriented database system. It offers + ACID-compliant transactions with a graph data structure and uses the Cypher + querying language to manage and query stored data. Neo4j allows developers to + scale graph-oriented data workloads easily and offers clients in many + different languages. 
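To make the snapshot behavior described in the multiversion concurrency control entry above more concrete, here is a minimal sketch assuming a PostgreSQL-style MVCC implementation; the `accounts` table and its values are hypothetical, and the isolation-level syntax varies between database systems.

```sql
-- Session A: open a transaction that reads from a stable snapshot.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;  -- e.g. returns 100

-- Session B, concurrently: the writer commits without being blocked by the reader.
BEGIN;
UPDATE accounts SET balance = 50 WHERE id = 1;
COMMIT;

-- Session A: a repeated read still returns the snapshot value (100), not Session B's commit.
SELECT balance FROM accounts WHERE id = 1;
COMMIT;
```

Under a purely lock-based approach, Session B's `UPDATE` would typically have to wait on Session A's read, which is exactly the blocking behavior MVCC is designed to avoid.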
-A network database is an early database model that conceived of data objects that could have more complex relationships than that of hierarchical databases. Instead of limiting a node's relationships to a single parent and zero or more children, a network database allowed you to represent nodes with multiple parents. This allowed you to represent more complex structures, but generally, the model was superseded by the introduction of relational databases. + A network database is an early database model that conceived of data objects + that could have more complex relationships than that of hierarchical + databases. Instead of limiting a node's relationships to a single parent and + zero or more children, a network database allowed you to represent nodes with + multiple parents. This allowed you to represent more complex structures, but + generally, the model was superseded by the introduction of relational + databases. -NewSQL is a descriptor for a category of more recent relational database offerings that attempt to bridge the gap between the structure and well-ordered guarantees of a relational database system and the high performance and scalability associated with NoSQL databases. While NewSQL is a fairly loose categorization, it is generally used to refer to databases that allow SQL or SQL-like querying, transaction guarantees, and flexible scaling and distributed processing. + NewSQL is a descriptor for a category of more recent relational database + offerings that attempt to bridge the gap between the structure and + well-ordered guarantees of a relational database system and the high + performance and scalability associated with NoSQL databases. While NewSQL is a + fairly loose categorization, it is generally used to refer to databases that + allow SQL or SQL-like querying, transaction guarantees, and flexible scaling + and distributed processing. -NoSQL databases, also sometimes called non-relational or not only SQL databases, are a broad category that covers any type of database systems that deviates from the common relational database model. While non-relational databases have long been available, the category generally is used to refer to newer generations of databases using alternative models like key-value, document-oriented, graph-oriented, and column family stores. They generally are used to manage data that is not suited for the relational model with a heavy focus on flexibility and scalability. + NoSQL databases, also sometimes called non-relational or not only SQL + databases, are a broad category that covers any type of database systems that + deviates from the common relational database model. While non-relational + databases have long been available, the category generally is used to refer to + newer generations of databases using alternative models like key-value, + document-oriented, graph-oriented, and column family stores. They generally + are used to manage data that is not suited for the relational model with a + heavy focus on flexibility and scalability. -In databases, a node often refers to a single instance of a database. The term node is often used when talking about the infrastructure architecture of distributed databases where multiple servers may be involved in processing a request. + In databases, a node often refers to a single instance of a database. The term + node is often used when talking about the infrastructure architecture of + distributed databases where multiple servers may be involved in processing a + request. 
-A nonrepeatable read is a type of unwanted consistency problem that can occur at certain transaction isolation levels. A nonrepeatable read occurs when a repeated read operations within a transaction can return different data based based on commits outside of the transaction. This breach of isolation is one of the types of behavior that some transaction isolation levels are designed to prevent. + A nonrepeatable read is a type of unwanted consistency problem that can occur + at certain transaction isolation levels. A nonrepeatable read occurs when + repeated read operations within a transaction can return different data + based on commits outside of the transaction. This breach of isolation is one + of the types of behavior that some transaction isolation levels are designed + to prevent. -Database normalization is a process of structuring a database to remove data redundancy and eliminate opportunities for inconsistencies to be introduced. Normalization is often discussed in terms of "normal forms" where each form adds additional checks and guarantees over the previous forms. In practice, data normalization is often a trade-off between data integrity guarantees and performance, so structures often are not put into the highest level of normalization possible. + Database normalization is a process of structuring a database to remove data + redundancy and eliminate opportunities for inconsistencies to be introduced. + Normalization is often discussed in terms of "normal forms" where each form + adds additional checks and guarantees over the previous forms. In practice, + data normalization is often a trade-off between data integrity guarantees and + performance, so structures often are not put into the highest level of + normalization possible. -An OLAP database, or Online Analytic Processing database, is a database system primarily designed to be used for analytics and insight generation. Databases used for OLAP do not require the same type of performance characteristics as those involved in real-time transaction processing (OLTP databases). Instead, they usually are designed for ingesting and working on large data sets, executing complex and long-running queries, and generating reports, graphs, and insights to help make business decisions. + An OLAP database, or Online Analytic Processing database, is a database system + primarily designed to be used for analytics and insight generation. Databases + used for OLAP do not require the same type of performance characteristics as + those involved in real-time transaction processing (OLTP databases). Instead, + they usually are designed for ingesting and working on large data sets, + executing complex and long-running queries, and generating reports, graphs, + and insights to help make business decisions. -An OLTP database, or Online Transaction Processing database, is a database system primarily designed to facilitate fast, near real time database tasks. Typically, OLTP databases are used with applications where multiple clients may be accessing the data at a single time and where quick response times are required. OLTP databases are optimized for reliability and processing speed. + An OLTP database, or Online Transaction Processing database, is a database + system primarily designed to facilitate fast, near real time database tasks. + Typically, OLTP databases are used with applications where multiple clients + may be accessing the data at a single time and where quick response times are + required.
OLTP databases are optimized for reliability and processing speed. -An ORM, or Object Relational Mapper, is a database tool designed to translate between the relational model used by many databases and the object-oriented data model used in client applications. The tool offers a way represent database objects in code and to transform programming objects into a format appropriate for storing in a database. While ORMs can be helpful tools, they are usually not a perfect abstraction and can lead to issues where the different models conflict on how to represent data. - - - -Object relational impedance mismatch is a term used for the general tension that exists between the relational model of data used by many databases and the object-oriented view of data used in many applications. The impedance mismatch refers to the differences between the two models that makes faithful translation between the representations difficult or impossible. It is a broad term used to refer to many different types of problems that can occur within the space including problems representing inheritance, encapsulation, type differences, different consistency guarantees, and more. - - - -Optimistic concurrency control, sometimes referred to as OCC, is a strategy used by database systems to handle conflicting concurrent operations. Optimistic concurrency control assumes that concurrent transactions will likely not interfere with each other and allows them to proceed. If a conflict occurs when a transaction attempts to commit, it will be rolled back at that time. OCC is an attractive policy if you think that most transactions within your workloads will not be in conflict with one another. Only transactions that do in fact have a conflict will suffer a performance penalty (they'll be rolled back and will have to be restarted) while all non-conflicting transactions can execute without waiting to see if a conflict will arise. + An ORM, or Object Relational Mapper, is a database tool designed to translate + between the relational model used by many databases and the object-oriented + data model used in client applications. The tool offers a way represent + database objects in code and to transform programming objects into a format + appropriate for storing in a database. While ORMs can be helpful tools, they + are usually not a perfect abstraction and can lead to issues where the + different models conflict on how to represent data. + + + + Object relational impedance mismatch is a term used for the general tension + that exists between the relational model of data used by many databases and + the object-oriented view of data used in many applications. The impedance + mismatch refers to the differences between the two models that makes faithful + translation between the representations difficult or impossible. It is a broad + term used to refer to many different types of problems that can occur within + the space including problems representing inheritance, encapsulation, type + differences, different consistency guarantees, and more. + + + + Optimistic concurrency control, sometimes referred to as OCC, is a strategy + used by database systems to handle conflicting concurrent operations. + Optimistic concurrency control assumes that concurrent transactions will + likely not interfere with each other and allows them to proceed. If a conflict + occurs when a transaction attempts to commit, it will be rolled back at that + time. 
OCC is an attractive policy if you think that most transactions within + your workloads will not be in conflict with one another. Only transactions + that do in fact have a conflict will suffer a performance penalty (they'll be + rolled back and will have to be restarted) while all non-conflicting + transactions can execute without waiting to see if a conflict will arise. -An outer join is a type of relational database operation that joins two tables by returning all rows from each component table, even where there is not a matching record in the companion table. Join operations construct virtual rows by matching records that have identical values in specified comparison columns from each table. The results for an outer join will contain the rows from both tables where the column values matched and will additionally contain all of the unmatched rows from each table. For these rows, the columns without a match in the other table will be padded with `NULL` values to indicate that no matching row was found. + An outer join is a type of relational database operation that joins two tables + by returning all rows from each component table, even where there is not a + matching record in the companion table. Join operations construct virtual rows + by matching records that have identical values in specified comparison columns + from each table. The results for an outer join will contain the rows from both + tables where the column values matched and will additionally contain all of + the unmatched rows from each table. For these rows, the columns without a + match in the other table will be padded with `NULL` values to indicate that no + matching row was found. -A parameterized query, also known as a prepared statement, is a database query that has been to take user input as parameters instead of by concatenating strings with user input. Parameterized queries allow you to specify the query, including the unknown inputs ahead of time and then later provide the values that should be substituted within the statement. This prevents SQL injection vulnerabilities where carefully crafted inputs can be used to make the database system misinterpret a query by viewing values as executable SQL code. + A parameterized query, also known as a prepared statement, is a database query + that has been to take user input as parameters instead of by concatenating + strings with user input. Parameterized queries allow you to specify the query, + including the unknown inputs ahead of time and then later provide the values + that should be substituted within the statement. This prevents SQL injection + vulnerabilities where carefully crafted inputs can be used to make the + database system misinterpret a query by viewing values as executable SQL code. -Persistence is a quality of data that indicates that the state will outlive the process that created it. Persistence is a key part of most database systems and allows the data to be loaded once again after the database process or the server itself is restarted. Applications and databases can have various levels of persistence that guard against different types of failure conditions like single system persistence, remote persistence, and cluster persistence. + Persistence is a quality of data that indicates that the state will outlive + the process that created it. Persistence is a key part of most database + systems and allows the data to be loaded once again after the database process + or the server itself is restarted. 
Applications and databases can have various + levels of persistence that guard against different types of failure conditions + like single system persistence, remote persistence, and cluster persistence. -Persistent storage refers to any storage medium that is able to maintain data after the system loses power or is disconnected. Persistent storage is required to maintain a more permanent repository of data. Often, persistent storage is slower than ephemeral storage like in-memory data, so database systems use a variety of processes to shuttle data between the two storage systems as needed to take advantage of and balance the disadvantages of both types. - - - -Pessimistic concurrency control, or PCC, is a strategy used by database systems to handle conflicting concurrent operations. In contrast to optimistic concurrency control, pessimistic concurrency control short circuits transactions as soon as the possibility of a conflict arises. This strategy is useful if frequent conflicts occur because it ensures that the system does not waste time executing transactions that will be unable to commit due to conflict. Instead, it enforces a more serialized execution approach when conflicts might occur, which is slower, but avoids non-productive processing. + Persistent storage refers to any storage medium that is able to maintain data + after the system loses power or is disconnected. Persistent storage is + required to maintain a more permanent repository of data. Often, persistent + storage is slower than ephemeral storage like in-memory data, so database + systems use a variety of processes to shuttle data between the two storage + systems as needed to take advantage of and balance the disadvantages of both + types. + + + + Pessimistic concurrency control, or PCC, is a strategy used by database + systems to handle conflicting concurrent operations. In contrast to optimistic + concurrency control, pessimistic concurrency control short circuits + transactions as soon as the possibility of a conflict arises. This strategy is + useful if frequent conflicts occur because it ensures that the system does not + waste time executing transactions that will be unable to commit due to + conflict. Instead, it enforces a more serialized execution approach when + conflicts might occur, which is slower, but avoids non-productive processing. -A phantom read is a type of isolation anomaly that can occur within a transaction under certain types of isolation levels. A phantom read occurs when different rows are returned for a `SELECT` operation during a transaction due to changes made outside of the transaction. For example if you try to `SELECT` all records in a table, the first time it could return 8 rows, but if another transaction commits an additional row, a repeat of the original query would show a different result. + A phantom read is a type of isolation anomaly that can occur within a + transaction under certain types of isolation levels. A phantom read occurs + when different rows are returned for a `SELECT` operation during a transaction + due to changes made outside of the transaction. For example if you try to + `SELECT` all records in a table, the first time it could return 8 rows, but if + another transaction commits an additional row, a repeat of the original query + would show a different result. -PostgreSQL is a popular, high performance relational database system known for its compliance to various SQL standards. 
PostgreSQL focuses on providing a single, flexible database engine instead of offering multiple engines for different use cases. It is highly extensible and has a great range of community additions and client applications. + PostgreSQL is a popular, high performance relational database system known for + its compliance to various SQL standards. PostgreSQL focuses on providing a + single, flexible database engine instead of offering multiple engines for + different use cases. It is highly extensible and has a great range of + community additions and client applications. -In the context of search performance, precision is a measure of how relevant the retrieved results are to the given query. Specifically, search precision is defined as the ratio between the number of relevant results out of all of the results that were returned. A query with a high level of precision does not retrieve many items that are not applicable to the query. + In the context of search performance, precision is a measure of how relevant + the retrieved results are to the given query. Specifically, search precision + is defined as the ratio between the number of relevant results out of all of + the results that were returned. A query with a high level of precision does + not retrieve many items that are not applicable to the query. -A primary key is a type of database key that is designated as the main way to uniquely address a database row. While other keys may be able to pull individual rows, the primary key is specifically marked for this purpose with the system enforcing uniqueness and not `NULL` consistency checks. A primary key can be a natural key (a key that is naturally unique across records) or a surrogate key (a key added specifically to serve as a primary key) and can be formed from a single or multiple columns. + A primary key is a type of database key that is designated as the main way to + uniquely address a database row. While other keys may be able to pull + individual rows, the primary key is specifically marked for this purpose with + the system enforcing uniqueness and not `NULL` consistency checks. A primary + key can be a natural key (a key that is naturally unique across records) or a + surrogate key (a key added specifically to serve as a primary key) and can be + formed from a single or multiple columns. -In databases, a query is a formatted command used to make a request to a database management system using a query language. The database system processes the query to understand what actions to take and what data to return to the client. Often, queries are used to request data matching specific patterns, insert new data into an existing structure, or modify and save changes to existing records. In addition to targeting data items, queries can often manipulate items like table structures and the server settings, making them the general administrative interface for the system. SQL, or Structured Query Language, is the most common database querying language used with relational databases. + In databases, a query is a formatted command used to make a request to a + database management system using a query language. The database system + processes the query to understand what actions to take and what data to return + to the client. Often, queries are used to request data matching specific + patterns, insert new data into an existing structure, or modify and save + changes to existing records. 
In addition to targeting data items, queries can + often manipulate items like table structures and the server settings, making + them the general administrative interface for the system. SQL, or Structured + Query Language, is the most common database querying language used with + relational databases. -A query builder is a database abstraction used in application development to make programming against databases easier. Similar to an ORM, a query builder provides an interface for working with a database system from within the application. However, instead of attempting to map application objects to database records directly, query builders focus on providing native functions and methods that translate closely to the database operations. This allows you to build queries programmatically in a safer and more flexible way than working with SQL (or other database language) strings directly. + A query builder is a database abstraction used in application development to + make programming against databases easier. Similar to an ORM, a query builder + provides an interface for working with a database system from within the + application. However, instead of attempting to map application objects to + database records directly, query builders focus on providing native functions + and methods that translate closely to the database operations. This allows you + to build queries programmatically in a safer and more flexible way than + working with SQL (or other database language) strings directly. -A query language is a type of programming language that specializes in searching for, retrieving, and manipulating data in databases. SQL, or Structured Query Language, is the most common querying language in the world and is used primarily to manage data within relational database systems. Query language operations can be categorized based on the focus and target of the procedure into Data Definition Language (DDL) when they are used to define data structures, Data Control Language (DCL) when they are used for system management tasks, and Data Manipulation Language (DML) when they are used to modify data. + A query language is a type of programming language that specializes in + searching for, retrieving, and manipulating data in databases. SQL, or + Structured Query Language, is the most common querying language in the world + and is used primarily to manage data within relational database systems. Query + language operations can be categorized based on the focus and target of the + procedure into Data Definition Language (DDL) when they are used to define + data structures, Data Control Language (DCL) when they are used for system + management tasks, and Data Manipulation Language (DML) when they are used to + modify data. -A query planner is an internal component of a database system that is responsible for translating a client provided query into steps that can be used to actually search the database and construct the desired response. Well designed query planners can consider multiple potential solutions and select the option that will give the most optimized results. Sometimes, query planners do not select the best solution and database administrators must tweak the selection criteria manually. + A query planner is an internal component of a database system that is + responsible for translating a client provided query into steps that can be + used to actually search the database and construct the desired response. 
Well + designed query planners can consider multiple potential solutions and select + the option that will give the most optimized results. Sometimes, query + planners do not select the best solution and database administrators must + tweak the selection criteria manually. -The Raft consensus algorithm is an algorithm designed to coordinate information sharing, management responsibilities, and fault recovery across a cluster of nodes. The algorithm provides a method to ensure that each member agrees on data operations and includes mechanisms for leader election in cases of network partitions or node outages. It is generally considered a simpler algorithm to implement than alternatives like Paxos. - - - -The read committed isolation level is a transaction isolation level for relational database systems that offers a minimal amount of isolation guarantees. At the read committed level, transactions are guaranteed to be free of dirty reads, a phenomena where transactions can read data from other transactions that have not been committed yet. Nonrepeatable reads, phantom reads, and serialization anomalies are still possible at this isolation level. + The Raft consensus algorithm is an algorithm designed to coordinate + information sharing, management responsibilities, and fault recovery across a + cluster of nodes. The algorithm provides a method to ensure that each member + agrees on data operations and includes mechanisms for leader election in cases + of network partitions or node outages. It is generally considered a simpler + algorithm to implement than alternatives like{' '} + Paxos. + + + + The read committed isolation level is a transaction isolation level for + relational database systems that offers a minimal amount of isolation + guarantees. At the read committed level, transactions are guaranteed to be + free of dirty reads, a phenomenon where transactions can read data from other + transactions that have not been committed yet. Nonrepeatable reads, phantom + reads, and serialization anomalies are still possible at this isolation level. -A read operation is generally defined as any operation that retrieves data without modification. Read operations should generally behave as if the underlying data were immutable. They may modify the retrieved data to change its format, filter it, or make other modifications, but the underlying data stored in the database system is not changed. + A read operation is generally defined as any operation that retrieves data + without modification. Read operations should generally behave as if the + underlying data were immutable. They may modify the retrieved data to change + its format, filter it, or make other modifications, but the underlying data + stored in the database system is not changed. -Read-through caching is a caching strategy where the cache is deployed in the path to the backing data source. The application sends all read queries directly to the cache. If the cache contains the requested item, it is returned immediately. It the cache request misses, the cache fetches the data from the backing database in order to return the items to the client and add it to the cache for future queries. In this architecture, the application continues to send all write queries directly to the backing database. - - - -The read uncommitted isolation level is a transaction isolation level for relational database systems that fundamentally offers no isolation. 
Transactions performed using the read uncommitted isolation level can suffer from dirty reads, nonrepeatable reads, phantom reads, and serialization anomalies. Generally speaking, the read uncommitted level is not very useful as it does not fulfill most users' expectations for isolation. + Read-through caching is a caching strategy where the cache is deployed in the + path to the backing data source. The application sends all read queries + directly to the cache. If the cache contains the requested item, it is + returned immediately. If the cache request misses, the cache fetches the data + from the backing database in order to return the item to the client and add + it to the cache for future queries. In this architecture, the application + continues to send all write queries directly to the backing database. + + + + The read uncommitted isolation level is a transaction isolation level for + relational database systems that fundamentally offers no isolation. + Transactions performed using the read uncommitted isolation level can suffer + from dirty reads, nonrepeatable reads, phantom reads, and serialization + anomalies. Generally speaking, the read uncommitted level is not very useful + as it does not fulfill most users' expectations for isolation. -In the context of search performance, recall is a measure of how many of the relevant items a query was able to retrieve. Recall is specifically defined as the ratio of the number of relevant results returned by a query compared to the total number of relevant entries in the dataset. A query with high recall retrieves a large number of the items that would be potentially relevant to a search query. + In the context of search performance, recall is a measure of how many of the + relevant items a query was able to retrieve. Recall is specifically defined as + the ratio of the number of relevant results returned by a query compared to + the total number of relevant entries in the dataset. A query with high recall + retrieves a large number of the items that would be potentially relevant to a + search query. -In databases, a record is a group of data usually representing a single entity. In relational databases, a record is synonymous with a row in a table. Each record may have multiple pieces of data or attributes associated with it (these would be fields in a relational table). + In databases, a record is a group of data usually representing a single + entity. In relational databases, a record is synonymous with a row in a table. + Each record may have multiple pieces of data or attributes associated with it + (these would be fields in a relational table). -Redis is a popular high performance key-value store that is frequently deployed as a cache, message queue, or configuration store. Redis is primarily an in-memory database but can optionally persist data to nonvolatile storage. It features a wide variety of types, flexible deployment options, and high scalability. + Redis is a popular high performance key-value store that is frequently + deployed as a cache, message queue, or configuration store. Redis is primarily + an in-memory database but can optionally persist data to nonvolatile storage. + It features a wide variety of types, flexible deployment options, and high + scalability. -A relational database is a database model that organizes data items according to predefined data structures known as tables. A table defines various columns with specific constraints and types and each record is added as a row in the table. 
The use of highly regular data structures provides relational database systems with many ways to combine the data held within various tables to answer individual queries. Relational databases take their name from algebraic relations which describes different operations that can be used to manipulate regular data. In most cases, relational databases use the SQL (Structured Query Language) to interact with the database system as it allows users to express complex queries in an ad-hoc manner. - - - -A relational database management system, also known as an RDBMS, is database software that manages relational databases. In practice, the term RDBMS is often used interchangeably with relational database, though technically speaking, an RDBMS manages one or more relational database. - - - -The repeatable read isolation level is a transaction isolation level for relational database systems that offers better isolation than read committed level, but not as much isolation of the serializable level. At the repeatable read isolation level, dirty reads and nonrepeatable reads are both prevented. However, phantom reads and serialization anomalies can still occur. This means that while reads of individual records are guaranteed to remain stable, range queries (like `SELECT` statements that return multiple rows) can change as a result of commits outside of the transaction. + A relational database is a database model that organizes data items according + to predefined data structures known as tables. A table defines various columns + with specific constraints and types and each record is added as a row in the + table. The use of highly regular data structures provides relational database + systems with many ways to combine the data held within various tables to + answer individual queries. Relational databases take their name from algebraic + relations, which describe different operations that can be used to manipulate + regular data. In most cases, relational databases use SQL (Structured + Query Language) to interact with the database system as it allows users to + express complex queries in an ad-hoc manner. + + + + A relational database management system, also known as an RDBMS, is database + software that manages relational databases. In practice, the term RDBMS is + often used interchangeably with relational database, though technically + speaking, an RDBMS manages one or more relational databases. + + + + The repeatable read isolation level is a transaction isolation level for + relational database systems that offers better isolation than the read committed + level, but not as much isolation as the serializable level. At the repeatable + read isolation level, dirty reads and nonrepeatable reads are both prevented. + However, phantom reads and serialization anomalies can still occur. This means + that while reads of individual records are guaranteed to remain stable, range + queries (like `SELECT` statements that return multiple rows) can change as a + result of commits outside of the transaction. -Replication is a process of continually copying and updating data from one system to another system. In databases, this typically involves a server sharing a log of changes that other servers can read and apply to their own copies of the data. This allows changes to propagate between various servers without requiring each server to approve operations at the time of execution. 
Many types of replication exists that differ in terms of method of sharing, the architecture of which systems copy data from where, and what policies are in place to control the replication process. Replication is an important feature in many systems for maintaining data availability, distributing load, and providing copies of data for offline procedures like backups. + Replication is a process of continually copying and updating data from one + system to another system. In databases, this typically involves a server + sharing a log of changes that other servers can read and apply to their own + copies of the data. This allows changes to propagate between various servers + without requiring each server to approve operations at the time of execution. + Many types of replication exist that differ in terms of method of sharing, + the architecture of which systems copy data from where, and what policies are + in place to control the replication process. Replication is an important + feature in many systems for maintaining data availability, distributing load, + and providing copies of data for offline procedures like backups. -A right join is a join operation for relational databases where all of the rows of the second table specified are returned, regardless of whether a matching row in the first table is found. Join operations construct virtual rows by matching records that have identical values in specified comparison columns from each table. The results for a right join will contain the rows from both tables where the column values matched and will additionally contain all of the unmatched rows from the second, or right, table. For these rows, the columns associated with the first, or left, table will be padded with `NULL` values to indicate that no matching row was found. - - - -Role-based access control, also known as RBAC, is a security strategy that restricts the operations permitted to a user based on their assigned roles. Permissions on object and privileges to execute actions are assigned to roles, labels that make managing access easier. To grant the capabilities associated with a role to a user, the user can be made a member of the role. + A right join is a join operation for relational databases where all of the + rows of the second table specified are returned, regardless of whether a + matching row in the first table is found. Join operations construct virtual + rows by matching records that have identical values in specified comparison + columns from each table. The results for a right join will contain the rows + from both tables where the column values matched and will additionally contain + all of the unmatched rows from the second, or right, table. For these rows, + the columns associated with the first, or left, table will be padded with + `NULL` values to indicate that no matching row was found. + + + + Role-based access control, also known as RBAC, is a security strategy that + restricts the operations permitted to a user based on their assigned roles. + Permissions on objects and privileges to execute actions are assigned to roles, + labels that make managing access easier. To grant the capabilities associated + with a role to a user, the user can be made a member of the role. 
Users can be + made a member of multiple roles to gain a union of the permissions each role + provides. Roles are helpful as a way of standardizing the privileges required + for various roles and making it simple to add or remove access to users. -In relational databases, a row is a representation of a single record within a database table. Rows in these databases have a predefined structure in the form of a collection of columns that specify the data type and any constraints on the range of acceptable values. Each row in a relational table has the same columns or fields, leading to a very regular data structure. + In relational databases, a row is a representation of a single record within a + database table. Rows in these databases have a predefined structure in the + form of a collection of columns that specify the data type and any constraints + on the range of acceptable values. Each row in a relational table has the same + columns or fields, leading to a very regular data structure. -Serial scanning is a search technique that involves analyzing each potential item against the query at the time of the search. This is in opposition to index-based searching where items are accounted for and organized ahead of time to allow for faster query response. + Serial scanning is a search technique that involves analyzing each potential + item against the query at the time of the search. This is in opposition to + index-based searching where items are accounted for and organized ahead of + time to allow for faster query response. -SQL, or Structured Query Language, is the most common database querying language in use today. It is primarily used to work with relational data and allows users to create queries to select, filter, define, and manipulate the data within relational databases. While SQL is a common standard, implementation details differ widely, making it less software agnostic than hoped. + SQL, or Structured Query Language, is the most common database querying + language in use today. It is primarily used to work with relational data and + allows users to create queries to select, filter, define, and manipulate the + data within relational databases. While SQL is a common standard, + implementation details differ widely, making it less software agnostic than + hoped. -SQL injection is a type of attack that can be performed against vulnerable SQL-backed applications. It works by carefully crafting inputs that can be used to make the database system misinterpret a query by treating submitted values as executable SQL code. SQL injection is primarily caused by developers attempting to combine unsanitized user input with a query string using string concatenation. It can be prevented using prepared statements, also called parameterized queries, where the query with placeholders is submitted to the database separately from the substitute values so that the boundaries of the query values are unambiguous. + SQL injection is a type of attack that can be performed against vulnerable + SQL-backed applications. It works by carefully crafting inputs that can be + used to make the database system misinterpret a query by treating submitted + values as executable SQL code. SQL injection is primarily caused by developers + attempting to combine unsanitized user input with a query string using string + concatenation. 
It can be prevented using prepared statements, also called + parameterized queries, where the query with placeholders is submitted to the + database separately from the substitute values so that the boundaries of the + query values are unambiguous. -SQLite is a relational management database system written as a C language library. Since it is implemented as a library, it does not conform to the traditional client / server separation model and instead relies on the library or client program to perform both roles to write to local files. It is extremely functional for its size and is especially suitable for embedded use. SQLite has bindings in many different languages and it is deployed widely in applications as an internal storage system. + SQLite is a relational database management system written as a C language + library. Since it is implemented as a library, it does not conform to the + traditional client/server separation model and instead relies on the library + or client program to perform both roles to write to local files. It is + extremely functional for its size and is especially suitable for embedded use. + SQLite has bindings in many different languages and it is deployed widely in + applications as an internal storage system. -Sanitizing input, also known as input validation, is a process used to render user-provided values safe for further processing. It is used to guard against malicious input that can cause an application or the database to misinterpret data values as valid application or query code. Inputs can be sanitized in a number of different ways like limiting the list of valid characters, removing characters that have special meaning for the systems in use, and escaping values. Generally speaking, instead of sanitizing input, it is considered much safer to use prepared statements. + Sanitizing input, also known as input validation, is a process used to render + user-provided values safe for further processing. It is used to guard against + malicious input that can cause an application or the database to misinterpret + data values as valid application or query code. Inputs can be sanitized in a + number of different ways like limiting the list of valid characters, removing + characters that have special meaning for the systems in use, and escaping + values. Generally speaking, instead of sanitizing input, it is considered much + safer to use prepared statements. -Scaling is the process of expanding the resources allocated to your application or workload to allow for better performance or to handle more concurrent activity. Scaling strategies generally fall into two categories: scaling out (also called horizontal scaling) and scaling up (also known as vertical scaling). Horizontal scaling involves adding additional workers to a pool that can handle the incoming work. This often means adding additional servers that can all perform the same operations, thus distributing the load. Scaling up involves adding additional resources like processors, RAM, or storage to the server already handling requests. Scaling allows you to handle more concurrent operations but it can potentially increase the complexity of your application architecture. + Scaling is the process of expanding the resources allocated to your + application or workload to allow for better performance or to handle more + concurrent activity. Scaling strategies generally fall into two categories: + scaling out (also called horizontal scaling) and scaling up (also known as + vertical scaling). 
Horizontal scaling involves adding additional workers to a + pool that can handle the incoming work. This often means adding additional + servers that can all perform the same operations, thus distributing the load. + Scaling up involves adding additional resources like processors, RAM, or + storage to the server already handling requests. Scaling allows you to handle + more concurrent operations but it can potentially increase the complexity of + your application architecture. -A database schema is a structure describing how data should be organized within a database system. It defines the format of each table, field, index, relation, function, and any other structures held within the database. The schema is the definition that tells the database system what the object looks like and what data is and is not allowed to be associated with the object. In PostgreSQL, the database schema has a slightly different connotation in that it is implemented as a child of a database object that acts as a namespace for other database objects. - - - -The serializable isolation level is a transaction isolation level for relational database systems that offers the strictest isolation guarantees. At the serializable level, dirty reads, nonrepeatable reads, phantom reads, and serialization anomalies are all prevented. The database system does this by aborting any transactions where conflicts may occur, which ensures that concurrent transactions can be applied as if they were serially applied. Serializable isolation provides strong isolation, but it can suffer from significant performance problems due to the fact that conflicting transactions may be aborted and have to be resubmitted. + A database schema is a structure describing how data should be organized + within a database system. It defines the format of each table, field, index, + relation, function, and any other structures held within the database. The + schema is the definition that tells the database system what the object looks + like and what data is and is not allowed to be associated with the object. In + PostgreSQL, the database schema has a slightly different connotation in that + it is implemented as a child of a database object that acts as a namespace for + other database objects. + + + + The serializable isolation level is a transaction isolation level for + relational database systems that offers the strictest isolation guarantees. At + the serializable level, dirty reads, nonrepeatable reads, phantom reads, and + serialization anomalies are all prevented. The database system does this by + aborting any transactions where conflicts may occur, which ensures that + concurrent transactions can be applied as if they were serially applied. + Serializable isolation provides strong isolation, but it can suffer from + significant performance problems due to the fact that conflicting transactions + may be aborted and have to be resubmitted. -A serialization anomaly is a problem that can occur with concurrent transactions where the order that concurrent transactions are committed can impact the resulting data. Serialization anomalies occur because operations in different transactions can be making calculations based on data that other transactions may be updating. To prevent serialization anomalies, transactions must use the serializable isolation level, which prevents these conditions by rolling back one of the conflicting transactions. 
+ A serialization anomaly is a problem that can occur with concurrent + transactions where the order that concurrent transactions are committed can + impact the resulting data. Serialization anomalies occur because operations in + different transactions can be making calculations based on data that other + transactions may be updating. To prevent serialization anomalies, transactions + must use the serializable isolation level, which prevents these conditions by + rolling back one of the conflicting transactions. -A database shard is a segment of records stored by a database object that is separated out and managed by a different database node for performance reasons. For example, a database table with 9 million records could be divided into three separate shards, each managing 3 million records. The data is typically divided according to a "shard key" which is a key that determines which shard a record should be managed by. Each shard manages its subset of records and a coordinating component is required to direct client queries to the appropriate shard by referring to the shard key. Sharding can help some types of performance in very large datasets but it often requires making trade-offs that might degrade other types of performance (for instance, on operations that need to coordinate between multiple shards). + A database shard is a segment of records stored by a database object that is + separated out and managed by a different database node for performance + reasons. For example, a database table with 9 million records could be divided + into three separate shards, each managing 3 million records. The data is + typically divided according to a "shard key" which is a key that determines + which shard a record should be managed by. Each shard manages its subset of + records and a coordinating component is required to direct client queries to + the appropriate shard by referring to the shard key. Sharding can help some + types of performance in very large datasets but it often requires making + trade-offs that might degrade other types of performance (for instance, on + operations that need to coordinate between multiple shards). -When working with data storage, stale data refers to any data that does not accurately reflect the most recent state of the data. This is often a concern primarily when caching, as pieces of data might potentially be preserved and used long after they been invalidated by changes. + When working with data storage, stale data refers to any data that does not + accurately reflect the most recent state of the data. This is often a concern + primarily when caching, as pieces of data might potentially be preserved and + used long after they have been invalidated by changes. -A standard column family is a type of column family database object that stores data by defining row keys that are associated with key value pairs akin to columns. + A standard column family is a type of column family database object that + stores data by defining row keys that are associated with key value pairs akin + to columns. 
Each row can define and use its own columns, so the resulting + dataset is not regular as with relational database tables. However, the row + keys combined with column labels and values still somewhat resemble a table. + Standard column families offer good performance for key-based data retrieval + as they are able to store all of the information associated with a key in the + same place and can modify the data structure for that key easily. -Stemming is a technique used in full-text search indexing where words with the same stem are collapsed into a single entry. This increases the number of relevant results considered at the expense of a slight decrease in precision. For instance, the words "cook", "cooked", and "cooks" might occupy a single entry where a search for any of the terms would return results for the whole entry. + Stemming is a technique used in full-text search indexing where words with the + same stem are collapsed into a single entry. This increases the number of + relevant results considered at the expense of a slight decrease in precision. + For instance, the words "cook", "cooked", and "cooks" might occupy a single + entry where a search for any of the terms would return results for the whole + entry. -In full-text search contexts, stop words are a list of words that are considered inapplicable to search queries. These are typically the most common words in a language that lack much meaning on their own or are ambiguous to the point of irrelevancy. Some examples in English are words like "the", "it", and "a". + In full-text search contexts, stop words are a list of words that are + considered inapplicable to search queries. These are typically the most common + words in a language that lack much meaning on their own or are ambiguous to + the point of irrelevancy. Some examples in English are words like "the", "it", + and "a". -A storage engine is the underlying component in database management systems that is responsible for inserting, removing, querying, and updating data within the database. Many database features, like the ability to execute transactions, are actually properties of the underlying storage engine. Some database systems, like MySQL, have many different storage engines that can be used according to the requirements of your use case. Other systems, like PostgreSQL, focus on providing a single storage engine that is useful in all typical scenarios. + A storage engine is the underlying component in database management systems + that is responsible for inserting, removing, querying, and updating data + within the database. Many database features, like the ability to execute + transactions, are actually properties of the underlying storage engine. Some + database systems, like MySQL, have many different storage engines that can be + used according to the requirements of your use case. Other systems, like + PostgreSQL, focus on providing a single storage engine that is useful in all + typical scenarios. -A stored procedure is a way to define a set of operations within the database that clients can easily run. 
Because they are stored within the database, they + can sometimes offer performance improvements and avoid network latency. Stored + procedures differ from user-defined functions in that they must be explicitly + invoked with a special statement rather than incorporated into other queries + and cannot be used in all of the same scenarios. -A super column family is a type of column family database object that stores data by defining row keys that are associated with column families. Each row can contain multiple column families as a way of segmenting data further than in standard column families. + A super column family is a type of column family database object that stores + data by defining row keys that are associated with column families. Each row + can contain multiple column families as a way of segmenting data further than + in standard column families. -A superkey is any set of attributes within the relational database model that can be used to uniquely identify a record. All other key types (primary keys, candidate keys, composite keys, etc.) are examples of super keys. A trivial superkey contains all available attributes, while a candidate key is any superkey that cannot be simplified by removing additional columns. + A superkey is any set of attributes within the relational database model that + can be used to uniquely identify a record. All other key types (primary keys, + candidate keys, composite keys, etc.) are examples of superkeys. A trivial + superkey contains all available attributes, while a candidate key is any + superkey that cannot be simplified by removing additional columns. -In relational databases, a table is a database structure that defines different attributes in the form of columns and stores records with the associated column values in the form of rows. The constraints and data types defined by a table's columns as well as additional table-level requirements describe the type of data that can be stored within the table. Since tables are a regular data structure, the database system understands the shape of the data contained within, which can help make query performance more predictable in some cases. + In relational databases, a table is a database structure that defines + different attributes in the form of columns and stores records with the + associated column values in the form of rows. The constraints and data types + defined by a table's columns as well as additional table-level requirements + describe the type of data that can be stored within the table. Since tables + are a regular data structure, the database system understands the shape of the + data contained within, which can help make query performance more predictable + in some cases. -A table alias is name given at query time for an existing or calculated table or table-like database object. Table aliases can be useful if the original name is long or ambiguous or if the table is generated by the query itself and requires a label to refer back to it in other parts of the query or for display. + A table alias is a name given at query time for an existing or calculated table + or table-like database object. Table aliases can be useful if the original + name is long or ambiguous or if the table is generated by the query itself and + requires a label to refer back to it in other parts of the query or for + display. 
The first layer is comprised of + one or more web servers that respond to client requests, serve static content, + and generate requests to the subsequent layers. The second layer is handled by + application servers and is responsible for generating dynamic content by + executing code to generate responses for the front end. The third layer is + handled by the database system and is responsible for responding to requests + from the middle layer for custom values used to generate content. -In natural language processing and full-text search, a token is a discrete word that is recognized to the system and can be categorized according to different features. A token might be stored with information including its relative position in a piece of text, it's type (number, word, phrase, etc.), as well as any additional metadata that might be useful. + In natural language processing and full-text search, a token is a discrete + word that is recognized by the system and can be categorized according to + different features. A token might be stored with information including its + relative position in a piece of text, its type (number, word, phrase, etc.), + as well as any additional metadata that might be useful. -A database transaction is a set of operations combined into a single unit that can be executed by a database system atomically. Transactions ensure that all of the operations within them are successfully completed or that they are all rolled back to return to the starting state. This helps preserve data integrity and allows for isolation between different unrelated actions that clients may make within a database. The guarantees provided by database transactions are summarized by the ACID (atomicity, consistency, isolation, and durability) properties. + A database transaction is a set of operations combined into a single unit that + can be executed by a database system atomically. Transactions ensure that all + of the operations within them are successfully completed or that they are all + rolled back to return to the starting state. This helps preserve data + integrity and allows for isolation between different unrelated actions that + clients may make within a database. The guarantees provided by database + transactions are summarized by the ACID (atomicity, consistency, isolation, + and durability) properties. -Two-phase commit is an algorithm used to implement transactions in distributed systems. Two-phase commits work by separating the commit process into two general stages. In the first stage, a potential change is communicated by the server that received it to a coordinating component. The coordinator requests a vote from all of the involved servers on whether to commit or not. If the vote succeeds, the second stage begins where the transaction is actually committed by each individual member. 
The algorithm allows distributed systems + to maintain a consistent dataset at the expense of the overhead associated + with coordinating the voting procedure. -Two-phase locking, sometimes abbreviated as 2PL, is a strategy for concurrency control to ensure that transactions are serializable. The two phases refer to actions that expand the number of locks held by the transaction and the actions that trigger a release of locks. Two phase locking works by using exclusive and shared locks to coordinate read and write operations. A transaction that needs to read data can request a shared read lock that allows other transactions to read the same data but blocks write operations. Because this is a shared lock, each successive read operation can simultaneously request a read lock and the data will remain unmodifiable until they are all released. A transaction that needs to modify data requests an exclusive write lock which prevents other write locks and any read locks from being issued. + Two-phase locking, sometimes abbreviated as 2PL, is a strategy for concurrency + control to ensure that transactions are serializable. The two phases refer to + actions that expand the number of locks held by the transaction and the + actions that trigger a release of locks. Two-phase locking works by using + exclusive and shared locks to coordinate read and write operations. A + transaction that needs to read data can request a shared read lock that allows + other transactions to read the same data but blocks write operations. Because + this is a shared lock, each successive read operation can simultaneously + request a read lock and the data will remain unmodifiable until they are all + released. A transaction that needs to modify data requests an exclusive write + lock which prevents other write locks and any read locks from being issued. -An upsert is a database operation that either updates an existing entry or inserts a new entry when no current entry is found. Upsert operations consists of a querying component that is used to search for matching records to update and a mutation component that specifies which values should be updated. Often, additional values need to be provided for other fields to handle the case where a new record must be created. + An upsert is a database operation that either updates an existing entry or + inserts a new entry when no current entry is found. Upsert operations consist + of a querying component that is used to search for matching records to update + and a mutation component that specifies which values should be updated. Often, + additional values need to be provided for other fields to handle the case + where a new record must be created. -When talking about databases, a value is any piece of data that the database system stores within its data structures. 
With additional context like the name of the field where the value is stored, meaning can be assigned to the value beyond what is intrinsically there. The specific storage structure like the column or table may define requirements about what types of values it stores. + When talking about databases, a value is any piece of data that the database + system stores within its data structures. With additional context like the + name of the field where the value is stored, meaning can be assigned to the + value beyond what is intrinsically there. The specific storage structure like + the column or table may define requirements about what types of values it + stores. -Vertical scaling, also known as scaling up, is a scaling strategy that involves allocating additional resources like CPUs, RAM, or storage to a server or component in order to increase its performance or load capacity. Scaling up is typically the simplest strategy for scaling workloads as it does not increase the architectural complexity of the current deployment. While vertical scaling can work well in many scenarios, some disadvantages include reliance on a single point of failure and limitations on the amount of resources that can reasonably be managed by a single machine. + Vertical scaling, also known as scaling up, is a scaling strategy that + involves allocating additional resources like CPUs, RAM, or storage to a + server or component in order to increase its performance or load capacity. + Scaling up is typically the simplest strategy for scaling workloads as it does + not increase the architectural complexity of the current deployment. While + vertical scaling can work well in many scenarios, some disadvantages include + reliance on a single point of failure and limitations on the amount of + resources that can reasonably be managed by a single machine. -In graph databases, vertices are entities that can hold properties and be connected to other vertices through edges. Vertices are similar to a record or a document in other database systems as they have a label or name to indicate the type of object they represent and they have attributes that provide specific additional information to differentiate a specific vertex from others of its type. Vertices are connected to other vertices through edges that define a relationship between them. For instance, an "author" vertex can be connected to a "book" vertex with a "written by" edge. + In graph databases, vertices are entities that can hold properties and be + connected to other vertices through edges. Vertices are similar to a record or + a document in other database systems as they have a label or name to indicate + the type of object they represent and they have attributes that provide + specific additional information to differentiate a specific vertex from others + of its type. Vertices are connected to other vertices through edges that + define a relationship between them. For instance, an "author" vertex can be + connected to a "book" vertex with a "written by" edge. -In relational databases, a view is a table-like representation of a stored query. Views can be used as tables in many contexts, but instead of being part of the underlying data structure, they are derived from the results of their query. Views are useful for constructing more complex representations of data than exists in the underlying schema. 
For example, a view might join a few tables and display only a few relevant columns, which can help make the data more useable even if a different structure is preferable for storage due to consistency or performance reasons. + In relational databases, a view is a table-like representation of a stored + query. Views can be used as tables in many contexts, but instead of being part + of the underlying data structure, they are derived from the results of their + query. Views are useful for constructing more complex representations of data + than exists in the underlying schema. For example, a view might join a few + tables and display only a few relevant columns, which can help make the data + more usable even if a different structure is preferable for storage due to + consistency or performance reasons. -Volatile storage is any type of storage that is dependent on continual power to persist data. For example, data stored in RAM is typically considered to be volatile because it will be lost and unrecoverable in the event of a power outage. + Volatile storage is any type of storage that is dependent on continual power + to persist data. For example, data stored in RAM is typically considered to be + volatile because it will be lost and unrecoverable in the event of a power + outage. -A wide-column store is a type of NoSQL database that organizes its data into rows and columns using standard and super column families. A row key is used to retrieve all of the associated columns and super columns. Each row can contain entirely different columns as the column definitions and values are stored within the row structure itself. + A wide-column store is a type of NoSQL database that organizes its data into + rows and columns using standard and super column families. A row key is used + to retrieve all of the associated columns and super columns. Each row can + contain entirely different columns as the column definitions and values are + stored within the row structure itself. -Write-ahead logging, or WAL, is an approach to data revision management that increases the resiliency of systems data corruption during crashes and failures. Without a technique like WAL, corruption can occur if the system crashes when a change to a database is only partially completed. In this case, the data will be in neither the initial nor the intended state. With write ahead logging, the system records its intentions to a durable write ahead log before executing operations. This way, the database can recover a known-good state of the data by reviewing the log during recovery and redoing any operations that did not complete correctly initially. + Write-ahead logging, or WAL, is an approach to data revision management that + increases the resiliency of systems against data corruption during crashes and + failures. Without a technique like WAL, corruption can occur if the system + crashes when a change to a database is only partially completed. In this case, + the data will be in neither the initial nor the intended state. With write + ahead logging, the system records its intentions to a durable write ahead log + before executing operations. This way, the database can recover a known-good + state of the data by reviewing the log during recovery and redoing any + operations that did not complete correctly initially. 
Assigning a heavy weight to a specific type of + information will cause a query engine to assign greater significance to that + category compared to other categories when compiling a list of relevant + results. -Write-around caching is a caching pattern where write queries are sent to the backing database directly rather than written to the cache first. Because any items in the cache related to the update will be now be stale, this method requires a way to invalidate the cache results for those items for subsequent reads. This technique is almost always combined with a policy for cache reads to control read behavior. This approach is best for data that is read infrequently once written or updated. + Write-around caching is a caching pattern where write queries are sent to the + backing database directly rather than written to the cache first. Because any + items in the cache related to the update will now be stale, this method + requires a way to invalidate the cache results for those items for subsequent + reads. This technique is almost always combined with a policy for cache reads + to control read behavior. This approach is best for data that is read + infrequently once written or updated. -Write-back caching is a caching method where write queries are sent to the cache instead of the backing database. The cache then periodically bundles the write operations and sends them to the backing database for persistence. This is a modification of the write-through caching approach to reduce strain caused by high throughput write operations at the cost of less durability in the event of a crash. This ensures that all recently written data is immediately available to applications without additional operations, but can result in data loss if the cache crashes before it's able to persist writes to the database. + Write-back caching is a caching method where write queries are sent to the + cache instead of the backing database. The cache then periodically bundles the + write operations and sends them to the backing database for persistence. This + is a modification of the write-through caching approach to reduce strain + caused by high throughput write operations at the cost of less durability in + the event of a crash. This ensures that all recently written data is + immediately available to applications without additional operations, but can + result in data loss if the cache crashes before it's able to persist writes to + the database. -In the context of databases, a write operation is any database action that modifies the stored data. This includes inserting new records, deleting records, and updating existing records to new values. + In the context of databases, a write operation is any database action that + modifies the stored data. This includes inserting new records, deleting + records, and updating existing records to new values. -Write-through caching is a caching pattern where the application writes changes directly to the cache instead of the backing database. The cache then immediately forwards the new data to the backing database for persistence. 
This strategy minimizes the risk of data loss in the event of a cache crash while ensuring that read operations have access to all new data. In high write scenarios, it may make sense to transition to write-back caching to prevent straining the backing database. + Write-through caching is a caching pattern where the application writes + changes directly to the cache instead of the backing database. The cache then + immediately forwards the new data to the backing database for persistence. + This strategy minimizes the risk of data loss in the event of a cache crash + while ensuring that read operations have access to all new data. In high write + scenarios, it may make sense to transition to write-back caching to prevent + straining the backing database.
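As a rough, illustrative aside to the write operation and upsert entries above, the following SQL sketch shows a plain insert followed by a PostgreSQL-style upsert. The `inventory` table, its columns, and the values are hypothetical and only serve to make the terminology concrete.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE inventory (
    item_name text PRIMARY KEY,
    quantity  integer NOT NULL
);

-- A plain write operation: insert a new row.
INSERT INTO inventory (item_name, quantity) VALUES ('widget', 10);

-- An upsert: insert the row, or update the existing row when the
-- primary key already exists (PostgreSQL's ON CONFLICT clause).
INSERT INTO inventory (item_name, quantity)
VALUES ('widget', 5)
ON CONFLICT (item_name)
DO UPDATE SET quantity = inventory.quantity + EXCLUDED.quantity;
```

Similarly, the isolation-level entries above can be made concrete with a minimal transaction sketch, reusing the same hypothetical table. Statements inside a repeatable read transaction read from a stable snapshot, so repeating the same single-row `SELECT` returns the same value even if other sessions commit changes in the meantime.

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

SELECT quantity FROM inventory WHERE item_name = 'widget';
-- ... other work in the same transaction ...
SELECT quantity FROM inventory WHERE item_name = 'widget';  -- matches the first read

COMMIT;
```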
-If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) with your databases, you can visit our [database feature matrix](https://www.prisma.io/docs/reference/database-reference/database-features) to check the support for various database features within Prisma. +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) with your databases, you can visit our [database feature matrix](https://www.prisma.io/docs/orm/reference/database-features) to check the support for various database features within Prisma. diff --git a/content/02-datamodeling/01-intro-dont-panic.mdx b/content/02-datamodeling/01-intro-dont-panic.mdx index 0ab30259..068ad6a4 100644 --- a/content/02-datamodeling/01-intro-dont-panic.mdx +++ b/content/02-datamodeling/01-intro-dont-panic.mdx @@ -1,22 +1,22 @@ --- title: "Intro (don't panic)" -metaTitle: "Data modeling | A Quick Introduction" +metaTitle: 'Data modeling | A Quick Introduction' metaDescription: "Intro (Don't Panic)" authors: ['dianfay'] --- - ## Introduction + +## Introduction If you're reading this, you are more than likely being pressed into service as a database architect. This can befall you any number of ways: you might be a developer or analyst tasked with your first (or fortieth) improvement or patch job on an existing data model, or you could be staring down the blank canvas of an empty database like a rookie matador. Two things are certain: first, information needs to be stored and retrieved, as efficiently and conveniently as possible; and second, you're the one who needs to make it work. This guide will help you get to grips with modeling information and producing durable and maintainable database schema designs. We'll concentrate on relational databases for the most part, so you should come into this with a basic grasp of storing and retrieving data with SQL. Ideally, you'll have a database of your own to experiment in; examples will be given for PostgreSQL, a free and open-source database management system. So: data modeling. Like everything else in computing, it's math once you get right down to it. However, its day-to-day practice is almost entirely abstracted to the level of structuring and managing information as it flows through various systems. We'll touch on some of the mathy fundamentals of sets and predicates later on, but the database designer must solve problems of legibility and maintainability as much as of raw mathematical efficiency. As Heinz Klein and Kalle Lyytinen put it thirty years ago, _Towards a New Understanding of Data Modeling_
in Software Development and Reality Construction
"the appropriate metaphors for data modelling are not fact gathering and modelling, but negotiation and lawmaking"
. - This is intended eventually to be a complete crash course in (relational, although not ignoring others) designing data models. For now, we're publishing parts as they're written, and concentrating first on situating databases and data modeling problems in an organization and systems design context, as well as covering some of the less-prominent areas of database functionality. -If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/concepts/overview/what-is-prisma/data-modeling). +If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/orm/overview/introduction/data-modeling). -You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model) section of the Prisma documentation. +You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/orm/prisma-schema/data-model) section of the Prisma documentation. diff --git a/content/02-datamodeling/02-know-your-problem-space.mdx b/content/02-datamodeling/02-know-your-problem-space.mdx index 98ac0574..9735f4c7 100644 --- a/content/02-datamodeling/02-know-your-problem-space.mdx +++ b/content/02-datamodeling/02-know-your-problem-space.mdx @@ -81,8 +81,8 @@ The present is more important than the future. If your model is not useful now, -If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/concepts/overview/what-is-prisma/data-modeling). +If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/orm/overview/introduction/data-modeling). -You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model) section of the Prisma documentation. +You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/orm/prisma-schema/data-model) section of the Prisma documentation. diff --git a/content/02-datamodeling/04-tables-tuples-types.mdx b/content/02-datamodeling/04-tables-tuples-types.mdx index 76a05b58..5804e7ea 100644 --- a/content/02-datamodeling/04-tables-tuples-types.mdx +++ b/content/02-datamodeling/04-tables-tuples-types.mdx @@ -85,8 +85,8 @@ Coming up in [_Correctness and Constraints_](correctness-constraints), we'll cov -If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/concepts/overview/what-is-prisma/data-modeling). +If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/orm/overview/introduction/data-modeling). -You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model) section of the Prisma documentation. +You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/orm/prisma-schema/data-model) section of the Prisma documentation. 
diff --git a/content/02-datamodeling/05-correctness-constraints.mdx b/content/02-datamodeling/05-correctness-constraints.mdx index 165944e5..efad84e3 100644 --- a/content/02-datamodeling/05-correctness-constraints.mdx +++ b/content/02-datamodeling/05-correctness-constraints.mdx @@ -110,9 +110,9 @@ But there are also ways in which the very structure of a table can lead to incon -If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/concepts/overview/what-is-prisma/data-modeling). +If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/orm/overview/introduction/data-modeling). -You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model) section of the Prisma documentation. +You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/orm/prisma-schema/data-model) section of the Prisma documentation. diff --git a/content/02-datamodeling/06-making-connections.mdx b/content/02-datamodeling/06-making-connections.mdx index c9c6e506..ec287ec5 100644 --- a/content/02-datamodeling/06-making-connections.mdx +++ b/content/02-datamodeling/06-making-connections.mdx @@ -116,8 +116,8 @@ We'll come back to organizing tables in databases and in schemas within database -If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/concepts/overview/what-is-prisma/data-modeling). +If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/orm/overview/introduction/data-modeling). -You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model) section of the Prisma documentation. +You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/orm/prisma-schema/data-model) section of the Prisma documentation. diff --git a/content/02-datamodeling/08-functional-units.mdx b/content/02-datamodeling/08-functional-units.mdx index ba2787af..e31612b9 100644 --- a/content/02-datamodeling/08-functional-units.mdx +++ b/content/02-datamodeling/08-functional-units.mdx @@ -13,7 +13,7 @@ But databases are, by definition, very good at processing and manipulating infor -Find out [how to use functions with Prisma](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model#using-functions) in the Prisma documentation. +Find out [how to use functions with Prisma](https://www.prisma.io/docs/orm/prisma-schema/data-model/models#using-functions) in the Prisma documentation. @@ -93,7 +93,7 @@ CREATE TABLE lots ( -If you're using Prisma, our documentation covers the equivalent method of [defining default values for your fields](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model#defining-a-default-value). Prisma Client also supports [aggregation](https://www.prisma.io/docs/concepts/components/prisma-client/aggregation-grouping-summarizing), which allows you to perform count, average, and similar operations on data without separate storage. 
+If you're using Prisma, our documentation covers the equivalent method of [defining default values for your fields](https://www.prisma.io/docs/orm/prisma-schema/data-model/models#defining-a-default-value). Prisma Client also supports [aggregation](https://www.prisma.io/docs/orm/prisma-client/queries/aggregation-grouping-summarizing), which allows you to perform count, average, and similar operations on data without separate storage. @@ -179,7 +179,7 @@ The trigger-action-trigger-action call stack can become arbitrarily long, althou -In Prisma Client, you can achieve similar results with TypeScript at the client level instead of SQL by using [middleware](https://www.prisma.io/docs/concepts/components/prisma-client/middleware). Middleware allows you to perform an action before and after every query (e.g., turn a `delete` query into a "soft" delete which toggles a record's visibility instead). +In Prisma Client, you can achieve similar results with TypeScript at the client level instead of SQL by using [middleware](https://www.prisma.io/docs/orm/prisma-client/client-extensions/middleware). Middleware allows you to perform an action before and after every query (e.g., turn a `delete` query into a "soft" delete which toggles a record's visibility instead). @@ -191,9 +191,9 @@ Programmability is an unjustly overlooked feature of relational databases. There -If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/concepts/overview/what-is-prisma/data-modeling). +If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/orm/overview/introduction/data-modeling). -You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model) section of the Prisma documentation. +You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/orm/prisma-schema/data-model) section of the Prisma documentation. diff --git a/content/02-datamodeling/12-in-vivo.mdx b/content/02-datamodeling/12-in-vivo.mdx index 6e4640ce..f05e4550 100644 --- a/content/02-datamodeling/12-in-vivo.mdx +++ b/content/02-datamodeling/12-in-vivo.mdx @@ -99,8 +99,8 @@ The idea of "federated" architectures derives from the dual form of government w -If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/concepts/overview/what-is-prisma/data-modeling). +If you want to learn more about what data modeling means in the context of Prisma, visit our conceptual page on [data modeling](https://www.prisma.io/docs/orm/overview/introduction/data-modeling). -You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model) section of the Prisma documentation. +You can also learn about the specific data model component of a Prisma schema in the [data model](https://www.prisma.io/docs/orm/prisma-schema/data-model) section of the Prisma documentation. 
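To make the middleware idea mentioned above concrete, here is a minimal sketch of a "soft delete" middleware (the `Post` model and its boolean `deleted` field are assumptions invented for the example, not something defined in this guide):

```ts
import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

// Sketch: intercept delete queries on a hypothetical `Post` model and
// turn them into "soft" deletes by flipping a `deleted` flag instead.
prisma.$use(async (params, next) => {
  if (params.model === 'Post' && params.action === 'delete') {
    params.action = 'update'
    params.args['data'] = { deleted: true }
  }
  return next(params)
})
```

The same pattern can be extended to batch operations by rewriting `deleteMany` into `updateMany` in the same way.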
diff --git a/content/03-types/01-relational-vs-document-databases.mdx b/content/03-types/01-relational-vs-document-databases.mdx index 1abd9417..854e91ca 100644 --- a/content/03-types/01-relational-vs-document-databases.mdx +++ b/content/03-types/01-relational-vs-document-databases.mdx @@ -1,6 +1,6 @@ --- title: 'Comparing relational and document databases' -metaTitle: "Relational databases vs document databases" +metaTitle: 'Relational databases vs document databases' metaDescription: 'In this article, we examine the differences between relational databases and document databases.' metaImage: '/social/docs-social.png' authors: ['justinellingwood'] @@ -8,109 +8,109 @@ authors: ['justinellingwood'] ## Introduction -Over time, databases have been designed to accommodate many different usage patterns, organizational hierarchies, and consistency constraints. Two of the most enduring designs are [relational databases](/intro/database-glossary#relational-database) and [document databases](/intro/database-glossary#document-database). +Over time, databases have been designed to accommodate many different usage patterns, organizational hierarchies, and consistency constraints. Two of the most enduring designs are [relational databases](/intro/database-glossary#relational-database) and [document databases](/intro/database-glossary#document-database). -In this guide, we'll take a look at these two database models to get a better idea of their relative strengths and weaknesses and evaluate what scenarios they're best suited for. We'll take a look at how they handle data structure, querying, and the upshot of using either within your projects. +In this guide, we'll take a look at these two database models to get a better idea of their relative strengths and weaknesses and evaluate what scenarios they're best suited for. We'll take a look at how they handle data structure, querying, and the upshot of using either within your projects. ## How relational databases and document databases organize data -The core way that relational databases differ from document databases is the actual database model they employ. The way that data is stored and organized within the management system has wide reaching implications on the types of operations they permit, which access patterns are most performant, and how straightforward it is to integrate with application logic. +The core way that relational databases differ from document databases is the actual database model they employ. The way that data is stored and organized within the management system has wide reaching implications on the types of operations they permit, which access patterns are most performant, and how straightforward it is to integrate with application logic. ### How relational databases structure data The data structure of relational databases is defined primarily through a hierarchy of related mechanisms: databases, tables, and columns. -Databases themselves act as a container for various tables and properties within the system. The database layer allows administrators to apply policy and attributes to sets of related data globally. They also serve as namespaces to limit the potential for naming collision for tables and other child elements. +Databases themselves act as a container for various tables and properties within the system. The database layer allows administrators to apply policy and attributes to sets of related data globally. They also serve as namespaces to limit the potential for naming collision for tables and other child elements. 
-Within these databases, the data's structure is defined by [tables](/intro/database-glossary#table). Tables are made by declaring the names and properties of a set of [columns](/intro/database-glossary#column) and configuring table-wide attributes. +Within these databases, the data's structure is defined by [tables](/intro/database-glossary#table). Tables are made by declaring the names and properties of a set of [columns](/intro/database-glossary#column) and configuring table-wide attributes. -Each column specifies a [data type](/intro/database-glossary#data-type) which controls the shape of data that may be stored within it. [Constraints](/intro/database-glossary#constraint) can also be declared on columns and tables which allow you to impose additional requirements on what constitutes valid data for the field. The data itself is stored as [rows](/intro/database-glossary#row) within the table. Each record stores a single value for each of the table's columns. +Each column specifies a [data type](/intro/database-glossary#data-type) which controls the shape of data that may be stored within it. [Constraints](/intro/database-glossary#constraint) can also be declared on columns and tables which allow you to impose additional requirements on what constitutes valid data for the field. The data itself is stored as [rows](/intro/database-glossary#row) within the table. Each record stores a single value for each of the table's columns. -To summarize, [relational database management systems (RDMSs)](/intro/database-glossary#relational-database-management-system) group related tables and settings into databases. Each table defines a specific structure for the records it can hold by setting up a series of columns that have a data type and other properties. Records are added to tables as rows, with each row recording a value for each of the table's columns. +To summarize, [relational database management systems (RDMSs)](/intro/database-glossary#relational-database-management-system) group related tables and settings into databases. Each table defines a specific structure for the records it can hold by setting up a series of columns that have a data type and other properties. Records are added to tables as rows, with each row recording a value for each of the table's columns. ### How document databases structure data -Document database management systems also structure their data through a hierarchy of related components, but with a different set of paradigms. Document databases typically use a system of databases, collections, and documents. +Document database management systems also structure their data through a hierarchy of related components, but with a different set of paradigms. Document databases typically use a system of databases, collections, and documents. -As with relational databases, document database systems use an overarching "database" abstraction to encapsulate related data to allow for global policy and namespacing. The database layer serves as a container to define wide-ranging properties, allow for cohesive access control, and to scope actions to the relevant context. +As with relational databases, document database systems use an overarching "database" abstraction to encapsulate related data to allow for global policy and namespacing. The database layer serves as a container to define wide-ranging properties, allow for cohesive access control, and to scope actions to the relevant context. 
-Within the database, a grouping called [collections](/intro/database-glossary#collection) are used to bundle together individual documents. Collections are more loosely defined than tables (their relational counterpart) and mainly serve as a container and additional scoping mechanism. +Within the database, groupings called [collections](/intro/database-glossary#collection) are used to bundle together individual documents. Collections are more loosely defined than tables (their relational counterpart) and mainly serve as a container and additional scoping mechanism. -Unlike relational databases, collections themselves do not define the fields or properties of the documents that they can store. Instead, the structure of each document is defined implicitly by the fields and data that it declares. Because the structure being a property of individual documents rather than a collection, the shape of the documents within a collection can vary widely. Conforming to an intelligible structure is left as a responsibility of the database user rather than a property that the database system enforces by itself. +Unlike relational databases, collections themselves do not define the fields or properties of the documents that they can store. Instead, the structure of each document is defined implicitly by the fields and data that it declares. Because the structure is a property of individual documents rather than of the collection, the shape of the documents within a collection can vary widely. Conforming to an intelligible structure is left as a responsibility of the database user rather than a property that the database system enforces by itself. -## The interplay between structure and flexibility +## The interplay between structure and flexibility -Given the different data management philosophies that these two designs employ, it might not be surprising that there are some major differences in the amount of structure and flexibility afforded to users. In general, relational database systems tend to prioritize structure, predictability, and consistency while document databases prefer flexibility, responsiveness, and adaptiveness. +Given the different data management philosophies that these two designs employ, it might not be surprising that there are some major differences in the amount of structure and flexibility afforded to users. In general, relational database systems tend to prioritize structure, predictability, and consistency while document databases prefer flexibility, responsiveness, and adaptiveness. ### The case for structural rigidity -By using tables, relational databases configure the shape of their data ahead of time. Each record that is stored within a table must conform to the structure that the table implements without exception. +By using tables, relational databases configure the shape of their data ahead of time. Each record that is stored within a table must conform to the structure that the table implements without exception. -To change the structure of the data, the table structure itself must be altered and any existing records will need to be updated to match the new structure. This system makes structural changes relatively expensive as each piece of data already entered into the table requires an update. This can mean updating every record in the table, rebuilding indexes, and having to make decisions about the best way to backfill values that weren't recorded at their initial entry.
+To change the structure of the data, the table structure itself must be altered and any existing records will need to be updated to match the new structure. This system makes structural changes relatively expensive as each piece of data already entered into the table requires an update. This can mean updating every record in the table, rebuilding indexes, and having to make decisions about the best way to backfill values that weren't recorded at their initial entry. -This cost makes it wise to think carefully about your data structure ahead of time, which can be intimidating. However, it is important to keep in mind that this method provides a good deal of reassurance and safety in terms of data integrity. +This cost makes it wise to think carefully about your data structure ahead of time, which can be intimidating. However, it is important to keep in mind that this method provides a good deal of reassurance and safety in terms of data integrity. -The data stored in a table will always be homogeneous to the extent required by the table definition. It is a mechanism of enforcement that can help you maintain well-ordered data that can be reasoned about without introspecting the individual properties of each data item. +The data stored in a table will always be homogeneous to the extent required by the table definition. It is a mechanism of enforcement that can help you maintain well-ordered data that can be reasoned about without introspecting the individual properties of each data item. -The table structure offers guarantees about your data that isn't possible without a consensus around what the data should look like. This can help you avoid entire classes of problems in terms of data consistency and coherence, especially over time as the application logic that interfaces with the database evolves. +The table structure offers guarantees about your data that aren't possible without a consensus around what the data should look like. This can help you avoid entire classes of problems in terms of data consistency and coherence, especially over time as the application logic that interfaces with the database evolves. ### The case for flexibility -On the other side of things, document databases define the structure of each individual document separately. The structure is a characteristic that the document itself defines rather than an *external* structure the record must conform to. +On the other side of things, document databases define the structure of each individual document separately. The structure is a characteristic that the document itself defines rather than an _external_ structure the record must conform to. -This gives you a great deal of flexibility in a number of different areas. You can change the data you want to record for individual records on the fly, delay or skip backfilling previously saved documents, or store documents with very different structure within the same collection without requiring to set missing values to NULL in every other document. +This gives you a great deal of flexibility in a number of different areas. You can change the data you want to record for individual records on the fly, delay or skip backfilling previously saved documents, or store documents with very different structure within the same collection without having to set missing values to NULL in every other document. -As you develop, your database structure can evolve easily alongside your application logic.
This makes changes less burdensome as there is less of a required synchronization and migration process associated with each structural change. The database system will allow any new document structure you want to apply to exist alongside all previous structures. If you need to adjust existing records, you can backfill data or modify the structure out-of-band as a separate process. +As you develop, your database structure can evolve easily alongside your application logic. This makes changes less burdensome as there is less of a required synchronization and migration process associated with each structural change. The database system will allow any new document structure you want to apply to exist alongside all previous structures. If you need to adjust existing records, you can backfill data or modify the structure out-of-band as a separate process. -The flexibility provided by the document model encourages the iteration and evolution of your storage logic. It is important to keep in mind, however, that the software itself is unlikely to be able to provide you with as many guarantees about your data as you make changes. If there is no agreed upon standard for the shape of the data collections hold, then it is up to you as a developer to enforce consistency and to modify documents where appropriate to keep your data in a well-understood state. +The flexibility provided by the document model encourages the iteration and evolution of your storage logic. It is important to keep in mind, however, that the software itself is unlikely to be able to provide you with as many guarantees about your data as you make changes. If there is no agreed upon standard for the shape of the data collections hold, then it is up to you as a developer to enforce consistency and to modify documents where appropriate to keep your data in a well-understood state. ## Normalization, joins, and data cohabitation -One of the most useful features of the relational database model is the concept of [joins](/intro/database-glossary#join). Joins are queries that allow you to stitch together data held in different tables to allow you to work with them as a single unit. While it might make sense from an efficiency, logical, or consistency standpoint to store certain pieces of data in distinct tables, often you'll want to retrieve data or otherwise operate across those boundaries. +One of the most useful features of the relational database model is the concept of [joins](/intro/database-glossary#join). Joins are queries that allow you to stitch together data held in different tables to allow you to work with them as a single unit. While it might make sense from an efficiency, logical, or consistency standpoint to store certain pieces of data in distinct tables, often you'll want to retrieve data or otherwise operate across those boundaries. -[Normalization](/intro/database-glossary#normalization) is a table design philosophy that encourages you to organize data in a way that guarantees consistency of data across table boundaries. Normalization is in some ways a requirement for joins while also being one of the primary reason why joins are necessary within the relational model. +[Normalization](/intro/database-glossary#normalization) is a table design philosophy that encourages you to organize data in a way that guarantees consistency of data across table boundaries. Normalization is in some ways a requirement for joins while also being one of the primary reason why joins are necessary within the relational model. 
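As a brief sketch of what combining normalized data looks like from application code, the following Prisma Client query fetches a record together with its related rows (the `Customer` and `Order` models and the one-to-many relation between them are assumptions made for this example, not models defined in the article):

```ts
import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

async function main() {
  // Normalization keeps customers and their orders in separate tables;
  // a relation query combines them again, playing the role that a
  // JOIN plays when writing raw SQL.
  const customerWithOrders = await prisma.customer.findUnique({
    where: { id: 1 },
    include: { orders: true },
  })
  console.log(customerWithOrders)
}

main().finally(() => prisma.$disconnect())
```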
-You can think of normalization as a strategy for maintaining data consistency by logically separating data according to how reusable its discrete parts are. For instance, a record that represents a unique customer will be used very differently than a record that represents one of the customer's individual orders. Normalization encourages you to separate these types of information into their own tables, while joins provide you with the mechanism required to combine them when needed. +You can think of normalization as a strategy for maintaining data consistency by logically separating data according to how reusable its discrete parts are. For instance, a record that represents a unique customer will be used very differently than a record that represents one of the customer's individual orders. Normalization encourages you to separate these types of information into their own tables, while joins provide you with the mechanism required to combine them when needed. -Joins require consistently structured data with fields that can be mapped to one another to match individual records. Because the document model doesn't enforce a set structure for records, a join is a more difficult operation to conceive of within that paradigm. With that being said, it's important to note that: +Joins require consistently structured data with fields that can be mapped to one another to match individual records. Because the document model doesn't enforce a set structure for records, a join is a more difficult operation to conceive of within that paradigm. With that being said, it's important to note that: -* similar functionality can be achieved in many document database systems -* the model itself encourages less diffuse data storage +- similar functionality can be achieved in many document database systems +- the model itself encourages less diffuse data storage -What this means in practice is that the document structure does not have the same pressures driving towards normalization that relational databases have. Records can and often do contain nested information that would usually be separated in a relational model, allowing you to retrieve related information in a single document rather than stitching together multiple entities. Even so, [aggregation operators](https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/lookup/#pipe._S_lookup) can provide some of the same basic functionality as joins within the document model using some of the same methods as relational databases. +What this means in practice is that the document structure does not have the same pressures driving towards normalization that relational databases have. Records can and often do contain nested information that would usually be separated in a relational model, allowing you to retrieve related information in a single document rather than stitching together multiple entities. Even so, [aggregation operators](https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/lookup/#pipe._S_lookup) can provide some of the same basic functionality as joins within the document model using some of the same methods as relational databases. ## Querying data -Relational databases traditionally tend to implement [Structured Query Language](/intro/database-glossary#sql), or SQL, as their primary query language. SQL can be used to query for existing data, insert new data, create or modify database structures, and manage the general database environment. 
SQL has been around since the 1970s, has many different iterations and implementations, and is generally regarded as the standard interface for databases that follow a table-based structure. +Relational databases traditionally tend to implement [Structured Query Language](/intro/database-glossary#sql), or SQL, as their primary query language. SQL can be used to query for existing data, insert new data, create or modify database structures, and manage the general database environment. SQL has been around since the 1970s, has many different iterations and implementations, and is generally regarded as the standard interface for databases that follow a table-based structure. -SQL has many proponents and detractors and can be a polarizing subject, in part due to its long history, inconsistent standards, and different ergonomics compared to traditional programming languages. SQL is fairly expressive and flexible in what it can access and retrieve, but it can be cumbersome to deal with programmatically because its grammar and the structure of its statements can be difficult to parse and construct. Furthermore, it doesn't follow the patterns laid out by many other languages because, as its name suggests, it was conceived primarily as a command and querying language rather than a general purpose language for programming with data. +SQL has many proponents and detractors and can be a polarizing subject, in part due to its long history, inconsistent standards, and different ergonomics compared to traditional programming languages. SQL is fairly expressive and flexible in what it can access and retrieve, but it can be cumbersome to deal with programmatically because its grammar and the structure of its statements can be difficult to parse and construct. Furthermore, it doesn't follow the patterns laid out by many other languages because, as its name suggests, it was conceived primarily as a command and querying language rather than a general purpose language for programming with data. -For relational databases, SQL is often the primary means of interfacing with the database system. With document databases, the record structure isn't aligned with the assumptions inherent within SQL, so new querying languages are required for interacting with the system and entering, querying, and modifying data. Some languages take a good deal of inspiration from SQL while others implement entirely new languages in an attempt to escape from some of SQL's more frustrating warts. +For relational databases, SQL is often the primary means of interfacing with the database system. With document databases, the record structure isn't aligned with the assumptions inherent within SQL, so new querying languages are required for interacting with the system and entering, querying, and modifying data. Some languages take a good deal of inspiration from SQL while others implement entirely new languages in an attempt to escape from some of SQL's more frustrating warts. -In many document databases, the querying language has been designed to follow access patterns that may be more familiar to application developers. They often implement an API-like interface that emulates the tooling that developers use for JSON-like data management so that they can reuse existing patterns. While relational databases tend to converge around the SQL standard, document databases and other NoSQL databases may have very different interfaces from one another. 
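To give a rough feel for the ergonomic difference discussed above, the same lookup can be written as a raw SQL string or as an object-shaped query of the kind ORMs and many document databases expose. This is only a sketch; the `User` model and its `email` field are placeholders invented for the example:

```ts
import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

async function findAlice() {
  // 1. Raw SQL: the query is assembled and parsed as text.
  const viaSql =
    await prisma.$queryRaw`SELECT * FROM "User" WHERE "email" = ${'alice@example.com'}`

  // 2. Object-shaped query: the same intent expressed as a data structure,
  //    closer to the API-like interfaces many document databases favor.
  const viaObjectApi = await prisma.user.findMany({
    where: { email: 'alice@example.com' },
  })

  return { viaSql, viaObjectApi }
}
```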
Often the querying language that's developed alongside the actual database is one of the points of differentiation between different document database solutions. +In many document databases, the querying language has been designed to follow access patterns that may be more familiar to application developers. They often implement an API-like interface that emulates the tooling that developers use for JSON-like data management so that they can reuse existing patterns. While relational databases tend to converge around the SQL standard, document databases and other NoSQL databases may have very different interfaces from one another. Often the querying language that's developed alongside the actual database is one of the points of differentiation between different document database solutions. ## Scaling beyond a single database -Most traditional relational databases were initially designed in an era that assumed a fairly simple infrastructure environment. In many cases, [vertical scaling](/intro/database-glossary#vertical-scaling), or scaling the performance of the database by adding additional resources to a single host, was the simplest strategy and one that could usually satisfy any new requirements. +Most traditional relational databases were initially designed in an era that assumed a fairly simple infrastructure environment. In many cases, [vertical scaling](/intro/database-glossary#vertical-scaling), or scaling the performance of the database by adding additional resources to a single host, was the simplest strategy and one that could usually satisfy any new requirements. -[Horizontal scaling](/intro/database-glossary#horizontal-scaling) was also possible either by [replicating](/intro/database-glossary#replication) databases to multiple followers or by [sharding data](/intro/database-glossary#shard): splitting up databases between several hosts so that each is responsible for a subset of the complete data. These strategies allow for some flexibility beyond just managing the resources of a single host and also provided additional benefits like increased availability and failure mitigation. +[Horizontal scaling](/intro/database-glossary#horizontal-scaling) was also possible either by [replicating](/intro/database-glossary#replication) databases to multiple followers or by [sharding data](/intro/database-glossary#shard): splitting up databases between several hosts so that each is responsible for a subset of the complete data. These strategies allow for some flexibility beyond just managing the resources of a single host and also provided additional benefits like increased availability and failure mitigation. -There are limits to the ways that relational databases can easily scale across multiple hosts however. The focus on data consistency means that [write operations](/intro/database-glossary#write-operation) especially are difficult to implement across hosts as there must be a way to reach consensus on what changes are applied to datasets. The typical solutions for this are to either forward all write operations to a single host or to coordinate expensive consensus algorithms between multiple hosts, which often have large impacts on throughput and performance. +There are limits to the ways that relational databases can easily scale across multiple hosts however. 
The focus on data consistency means that [write operations](/intro/database-glossary#write-operation) especially are difficult to implement across hosts as there must be a way to reach consensus on what changes are applied to datasets. The typical solutions for this are to either forward all write operations to a single host or to coordinate expensive consensus algorithms between multiple hosts, which often have large impacts on throughput and performance. +There are limits to the ways that relational databases can easily scale across multiple hosts however. The focus on data consistency means that [write operations](/intro/database-glossary#write-operation) especially are difficult to implement across hosts as there must be a way to reach consensus on what changes are applied to datasets. The typical solutions for this are to either forward all write operations to a single host or to coordinate expensive consensus algorithms between multiple hosts, which often have large impacts on throughput and performance. -Newer relational databases have been developed to help address these issues, so there has been significant progress in this area. However, some of the difficulty in scaling is inherent in the relational model itself. When constraints and consistency requirements are set across large portions of the data, some level of coordination overhead must exist to manage the state data across various hosts. In some important ways, the relational model seems to require a centralized management architecture. +Newer relational databases have been developed to help address these issues, so there has been significant progress in this area. However, some of the difficulty in scaling is inherent in the relational model itself. When constraints and consistency requirements are set across large portions of the data, some level of coordination overhead must exist to manage the state data across various hosts. In some important ways, the relational model seems to require a centralized management architecture. -Document databases, on the other hand, are able to avoid many of these shortcomings due to the way that they data is structured within their systems. By collocating related data together in a single document, the coordination between different hosts can be minimized. Sharding datasets is a much more common strategy in document databases. This is due to the fact that document-based operations typically don't require as much coordination since many actions target individual records. +Document databases, on the other hand, are able to avoid many of these shortcomings due to the way that their data is structured within their systems. By collocating related data together in a single document, the coordination between different hosts can be minimized. Sharding datasets is a much more common strategy in document databases. This is due to the fact that document-based operations typically don't require as much coordination since many actions target individual records. -Because fewer constraints and links exist between individual documents and collections within document databases, coordination is often easier and operations tend to be more self-contained.
This allows document database providers to prioritize performance and availability, where relational databases are often forced to make concessions in the name of consistency. This is a trade-off that can have many implications for the safety of your data and how well your systems can handle outages and network partitions. The major difference ends up being that document databases tend to have much more flexibility in tuning the level of consistency versus performance and availability while relational databases often require consistency to always be the first priority. ## Conclusion -Relational and document databases have, in some ways, very different approaches to organizing data and making it manageable and accessible. Their basic designs have far-reaching implications for the types of applications they are suitable for and what challenges you are likely to see when working with them. +Relational and document databases have, in some ways, very different approaches to organizing data and making it manageable and accessible. Their basic designs have far-reaching implications for the types of applications they are suitable for and what challenges you are likely to see when working with them. -While relational databases and document databases can both be used in some scenarios, the priorities and development practices of your project often align better with one system or another. It is worthwhile to carefully evaluate both options when possible to understand the trade-offs you may be making and which system offers the features you really hold important for your work. +While relational databases and document databases can both be used in some scenarios, the priorities and development practices of your project often align better with one system or another. It is worthwhile to carefully evaluate both options when possible to understand the trade-offs you may be making and which system offers the features you really hold important for your work. -If you are using [Prisma to manage your MongoDB database](https://www.prisma.io/docs/concepts/database-connectors/mongodb), you need to set a connection URI within a 'datasource' block in your [Prisma schema file](https://www.prisma.io/docs/concepts/components/prisma-schema). You must provide a [connection URI for the 'url' field](https://www.prisma.io/docs/concepts/database-connectors/mongodb#example) so that Prisma can connect to your database. +If you are using [Prisma to manage your MongoDB database](https://www.prisma.io/docs/orm/overview/databases/mongodb), you need to set a connection URI within a 'datasource' block in your [Prisma schema file](https://www.prisma.io/docs/orm/prisma-schema/overview). You must provide a [connection URI for the 'url' field](https://www.prisma.io/docs/orm/overview/databases/mongodb#example) so that Prisma can connect to your database. diff --git a/content/03-types/02-relational/01-what-is-an-orm.mdx b/content/03-types/02-relational/01-what-is-an-orm.mdx index 3fe9ba42..0418d912 100644 --- a/content/03-types/02-relational/01-what-is-an-orm.mdx +++ b/content/03-types/02-relational/01-what-is-an-orm.mdx @@ -1,46 +1,46 @@ --- title: 'What is an ORM?' -metaTitle: "What is an ORM (Object Relational Mapper)?" +metaTitle: 'What is an ORM (Object Relational Mapper)?' metaDescription: "What are ORMs and how are they useful? In this guide, we'll explain what ORMs are and why they are helpful for many projects." 
authors: ['justinellingwood'] --- ## Introduction -What is an ORM and why are they so common when working with database? If you are new to programming applications backed by relational databases, you may have come across the term ORM during your research. In this guide, we'll briefly cover what an ORM is and how they can be helpful. +What is an ORM and why are they so common when working with database? If you are new to programming applications backed by relational databases, you may have come across the term ORM during your research. In this guide, we'll briefly cover what an ORM is and how they can be helpful. -[Prisma](https://www.prisma.io/docs/concepts/overview/what-is-prisma) is an ORM focused on making it easy for Node.js and TypeScript applications to work with databases. You can learn more about what Prisma offers in our [Why Prisma? page](https://www.prisma.io/docs/concepts/overview/why-prisma). +[Prisma](https://www.prisma.io/docs/orm/overview/introduction/what-is-prisma) is an ORM focused on making it easy for Node.js and TypeScript applications to work with databases. You can learn more about what Prisma offers in our [Why Prisma? page](https://www.prisma.io/docs/orm/overview/introduction/why-prisma). ## What is an ORM? -An [ORM](/intro/database-glossary#orm), or Object Relational Mapper, is a piece of software designed to translate between the data representations used by databases and those used in object-oriented programming. Basically, these two ways of working with data don't naturally fit together, so an ORM attempts to bridge the gap between the two systems' data designs. +An [ORM](/intro/database-glossary#orm), or Object Relational Mapper, is a piece of software designed to translate between the data representations used by databases and those used in object-oriented programming. Basically, these two ways of working with data don't naturally fit together, so an ORM attempts to bridge the gap between the two systems' data designs. -From a developer's perspective, an ORM allows you to work with database-backed data using the same object-oriented structures and mechanisms you'd use for any type of internal data. The promise of ORMs is that you won't need to rely on special techniques or necessarily learn a new querying language like [SQL](/intro/database-glossary#sql) to be productive with your data. +From a developer's perspective, an ORM allows you to work with database-backed data using the same object-oriented structures and mechanisms you'd use for any type of internal data. The promise of ORMs is that you won't need to rely on special techniques or necessarily learn a new querying language like [SQL](/intro/database-glossary#sql) to be productive with your data. -In general, ORMs serve as an abstraction layer between the application and the database. They attempt to increase developer productivity by removing the need for boilerplate code and avoiding the use of awkward techniques that might break the idioms and ergonomics that you expect from your language of choice. +In general, ORMs serve as an abstraction layer between the application and the database. They attempt to increase developer productivity by removing the need for boilerplate code and avoiding the use of awkward techniques that might break the idioms and ergonomics that you expect from your language of choice. ## Do I need an ORM? -While ORMs can be helpful, it's important to view them as a tool. They won't be useful in every scenario and there may be trade-offs you need to account for. 
+While ORMs can be helpful, it's important to view them as a tool. They won't be useful in every scenario and there may be trade-offs you need to account for. -In general, an ORM might be a good fit if you are using many object-oriented features of your language to manage a lot of state. The implications of managing state encapsulated in objects that have complex inheritance relationships, for instance, may be difficult to account for manually. They can also help get your project off the ground easier and can manage changes in your data structure through functionality like schema migration. +In general, an ORM might be a good fit if you are using many object-oriented features of your language to manage a lot of state. The implications of managing state encapsulated in objects that have complex inheritance relationships, for instance, may be difficult to account for manually. They can also help get your project off the ground easier and can manage changes in your data structure through functionality like schema migration. -While ORMs are often useful, they're not perfect. Sometimes the level of abstraction introduced by an ORM can make debugging difficult. There are also times when the representation the ORM uses to translate between the database and your application might not be completely accurate or might leak details of your internal implementation. These may be problems for certain use cases. +While ORMs are often useful, they're not perfect. Sometimes the level of abstraction introduced by an ORM can make debugging difficult. There are also times when the representation the ORM uses to translate between the database and your application might not be completely accurate or might leak details of your internal implementation. These may be problems for certain use cases. -It's important to understand what your project's requirements are and how you want to spend your resources when building your software. ORMs are a tool that can help you build database-backed applications more easily, but you'll have to decide for yourself if add value for your project. +It's important to understand what your project's requirements are and how you want to spend your resources when building your software. ORMs are a tool that can help you build database-backed applications more easily, but you'll have to decide for yourself if add value for your project. ## Conclusion -In this guide, we took a brief look at what ORMs are and how they can be useful. In general, it's a good idea to ask yourself early on whether an ORM would help your project. Evaluating the trade-offs can be a good exercise to help you understand how you wish to focus your efforts. +In this guide, we took a brief look at what ORMs are and how they can be useful. In general, it's a good idea to ask yourself early on whether an ORM would help your project. Evaluating the trade-offs can be a good exercise to help you understand how you wish to focus your efforts. To take a look at how ORMs compare to other ways of interacting with databases, take a look at our [comparison of SQL, query builders, and ORMs](/types/relational/comparing-sql-query-builders-and-orms). -[Prisma](https://www.prisma.io/docs/concepts/overview/what-is-prisma) is an ORM focused on making it easy for Node.js and TypeScript applications to work with databases. You can learn more about what Prisma offers in our [Why Prisma? page](https://www.prisma.io/docs/concepts/overview/why-prisma). 
+[Prisma](https://www.prisma.io/docs/orm/overview/introduction/what-is-prisma) is an ORM focused on making it easy for Node.js and TypeScript applications to work with databases. You can learn more about what Prisma offers in our [Why Prisma? page](https://www.prisma.io/docs/orm/overview/introduction/why-prisma). diff --git a/content/03-types/02-relational/02-comparing-sql-query-builders-and-orms.mdx b/content/03-types/02-relational/02-comparing-sql-query-builders-and-orms.mdx index 6a677cf9..29377b38 100644 --- a/content/03-types/02-relational/02-comparing-sql-query-builders-and-orms.mdx +++ b/content/03-types/02-relational/02-comparing-sql-query-builders-and-orms.mdx @@ -23,7 +23,7 @@ As a simplified summary, here is a general view of each approach's strengths and -Working with relational databases? Checkout [if Prisma fits your use case](https://www.prisma.io/docs/concepts/overview/should-you-use-prisma) and why you should be using [Prisma](https://www.prisma.io/docs/concepts/overview/why-prisma). +Working with relational databases? Checkout [if Prisma fits your use case](https://www.prisma.io/docs/orm/overview/introduction/should-you-use-prisma) and why you should be using [Prisma](https://www.prisma.io/docs/orm/overview/introduction/why-prisma). @@ -63,7 +63,7 @@ While we've primarily talked about SQL in this section, most of the information -If you're using Prisma Client, you can use [raw database access](https://www.prisma.io/docs/concepts/components/prisma-client/raw-database-access) to send SQL directly to your database. +If you're using Prisma Client, you can use [raw database access](https://www.prisma.io/docs/orm/prisma-client/queries/raw-database-access) to send SQL directly to your database. @@ -103,7 +103,7 @@ Overall, SQL query builders offer a thin layer of abstraction that specifically ## Managing data with ORMs -A step further up the abstraction hierarchy are ORMs. ORMs generally aim for a more complete abstraction with the hope of integrating with the application data more fluidly. +A step further up the abstraction hierarchy are ORMs. ORMs generally aim for a more complete abstraction with the hope of integrating with the application data more fluidly. ### What are ORMs? @@ -166,7 +166,7 @@ ORMs can be useful abstractions that make working with databases a lot easier. T When working with technologies that interface between databases and applications, you might encounter some terminology that you're not familiar with. In this section, we'll briefly go over some of the most common terms you might come across, some of which were covered earlier in this article and some of which were not. - **Data mapper:** A [data mapper](/intro/database-glossary#data-mapper-orm) is a design pattern or piece of software that maps programming data structures to those stored in a database. Data mappers attempt to synchronize changes between the two sources while keeping them independent of each other. The mapper itself is responsible for maintaining a working translation, freeing developers to iterate the application data structures without concern for the database representation. -- **Database driver:** A [database driver](https://en.wikipedia.org/wiki/Open_Database_Connectivity#Drivers_and_Managers) is a piece of software designed to encapsulate and enable connections between an application and a database. Database drivers abstract the low level details of how to make and manage connections and provide a unified, programmatic interface to the database system. 
Typically, database drivers are the lowest level of abstraction that developers use to interact with databases, with higher level tools building on the capabilities provided by the driver. +- **Database driver:** A [database driver](https://en.wikipedia.org/wiki/Open_Database_Connectivity#Drivers_and_Managers) is a piece of software designed to encapsulate and enable connections between an application and a database. Database drivers abstract the low level details of how to make and manage connections and provide a unified, programmatic interface to the database system. Typically, database drivers are the lowest level of abstraction that developers use to interact with databases, with higher level tools building on the capabilities provided by the driver. - **Injection attack:** An [injection attack](/intro/database-glossary#sql-injection) is an attack in which a malicious user attempts to execute unwanted database operations using specially crafted input in user-facing application fields. Often, this is used to retrieve data that should not be accessible or to delete or mangle information in the database. - **ORM:** [ORMs](/intro/database-glossary#orm), or object-relational mappers, are abstraction layers that translate between the data representations used in relational databases and the representation in memory used with object-oriented programming. The ORM provides an object-oriented interface to data within the database, attempting to reduce the amount of code and use familiar archetypes to speed up development. - **Object-relational impedance mismatch:** [Object-relational impedance mismatch](/intro/database-glossary#object-relational-impedence-mismatch) refers to the difficulty of translating between an object-oriented application and a relational database. Since the data structures vary significantly, it can be difficult to faithfully and performantly mutate and transcribe the programmatic data structures to the format used by the storage backend. @@ -184,7 +184,7 @@ Each of these approaches have their uses and some may be well-suited for certain
What is a visual SQL query builder? -A visual SQL query builder is a graphical user interface for creating SQL queries. +A visual SQL query builder is a graphical user interface for creating SQL queries. Typically, query builders are relatively light-weight, focus on easing data access and data representation, and do not attempt to translate the data into a specific programming paradigm. @@ -192,7 +192,7 @@ Typically, query builders are relatively light-weight, focus on easing data acce
What is an online SQL query builder? -An online SQL query builder is a cloud-based tool that helps quickly build SQL queries without the need to know how to write SQL. +An online SQL query builder is a cloud-based tool that helps quickly build SQL queries without the need to know how to write SQL. Typically, query builders are relatively light-weight, focus on easing data access and data representation, and do not attempt to translate the data into a specific programming paradigm. @@ -208,7 +208,7 @@ Active record implementations allow you to manage your database by creating and
What is the data mapper pattern? -The [data mapper pattern](https://en.wikipedia.org/wiki/Data_mapper_pattern) attempts to act as an independent layer between your code and your database that mediates between the two. +The [data mapper pattern](https://en.wikipedia.org/wiki/Data_mapper_pattern) attempts to act as an independent layer between your code and your database that mediates between the two. It focuses on trying to decouple and translate between them while letting each exist independently. This can help separate your business logic from database-related details that deal with mappings, representation, serialization, etc. diff --git a/content/03-types/02-relational/03-what-are-database-migrations.mdx b/content/03-types/02-relational/03-what-are-database-migrations.mdx index 303531be..513770d1 100644 --- a/content/03-types/02-relational/03-what-are-database-migrations.mdx +++ b/content/03-types/02-relational/03-what-are-database-migrations.mdx @@ -117,6 +117,6 @@ Generally speaking, while schema migration tools are optional, the organization, -To perform migrations with [Prisma](https://github.com/prisma/prisma), you can use the [Prisma Migrate](https://www.prisma.io/docs/concepts/components/prisma-migrate). Prisma Migrate generates migration files based on the declarative [Prisma schema](https://www.prisma.io/docs/concepts/components/prisma-schema) and applies them to your database. +To perform migrations with [Prisma](https://github.com/prisma/prisma), you can use the [Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate). Prisma Migrate generates migration files based on the declarative [Prisma schema](https://www.prisma.io/docs/orm/prisma-schema/overview) and applies them to your database. diff --git a/content/03-types/02-relational/04-migration-strategies.mdx b/content/03-types/02-relational/04-migration-strategies.mdx index e5ce8000..66408cb4 100644 --- a/content/03-types/02-relational/04-migration-strategies.mdx +++ b/content/03-types/02-relational/04-migration-strategies.mdx @@ -22,7 +22,7 @@ In other cases, like splitting or combining [columns](/intro/database-glossary#c -To perform migrations with [Prisma](https://github.com/prisma/prisma), you can use the [Prisma Migrate](https://www.prisma.io/docs/concepts/components/prisma-migrate). [Developing with Prisma Migrate](https://www.prisma.io/docs/guides/migrate/developing-with-prisma-migrate) generates migration files based on the declarative [Prisma schema](https://www.prisma.io/docs/concepts/components/prisma-schema) and applies them to your database. +To perform migrations with [Prisma](https://github.com/prisma/prisma), you can use the [Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate). [Developing with Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate/workflows) generates migration files based on the declarative [Prisma schema](https://www.prisma.io/docs/orm/prisma-schema/overview) and applies them to your database. @@ -197,6 +197,6 @@ While there are many deployment and migration strategies you can use to implemen -To perform migrations with [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), use the [Prisma Migrate tool](https://www.prisma.io/docs/concepts/components/prisma-migrate). Prisma Migrate analyzes your schema files, generates migration files, and applies them to target databases. +To perform migrations with [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), use the [Prisma Migrate tool](https://www.prisma.io/docs/orm/prisma-migrate). 
Prisma Migrate analyzes your schema files, generates migration files, and applies them to target databases. diff --git a/content/03-types/02-relational/05-expand-and-contract-pattern.mdx b/content/03-types/02-relational/05-expand-and-contract-pattern.mdx index cbb9abea..35d7b3e1 100644 --- a/content/03-types/02-relational/05-expand-and-contract-pattern.mdx +++ b/content/03-types/02-relational/05-expand-and-contract-pattern.mdx @@ -63,7 +63,7 @@ Other times, however, it might be less clear. For example, if you have a `name` -To perform migrations with [Prisma](https://github.com/prisma/prisma), you can use the [Prisma Migrate](https://www.prisma.io/docs/concepts/components/prisma-migrate). [Developing with Prisma Migrate](https://www.prisma.io/docs/guides/migrate/developing-with-prisma-migrate) generates migration files based on the declarative [Prisma schema](https://www.prisma.io/docs/concepts/components/prisma-schema) and applies them to your database. +To perform migrations with [Prisma](https://github.com/prisma/prisma), you can use the [Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate). [Developing with Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate/workflows) generates migration files based on the declarative [Prisma schema](https://www.prisma.io/docs/orm/prisma-schema/overview) and applies them to your database. @@ -260,6 +260,6 @@ The ideas presented in this guide are not unique, but they are powerful ways to For some guidance on how the expand and contract pattern can be used with -[Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), check out the matching section in [Prisma's documentation on advanced migration scenarios](https://www.prisma.io/docs/guides/migrate/developing-with-prisma-migrate/customizing-migrations). +[Prisma Client](https://www.prisma.io/docs/orm/prisma-client), check out the matching section in [Prisma's documentation on advanced migration scenarios](https://www.prisma.io/docs/orm/prisma-migrate/workflows/customizing-migrations). diff --git a/content/03-types/03-document/01-what-are-document-dbs.mdx b/content/03-types/03-document/01-what-are-document-dbs.mdx index 38c57979..a5349513 100644 --- a/content/03-types/03-document/01-what-are-document-dbs.mdx +++ b/content/03-types/03-document/01-what-are-document-dbs.mdx @@ -7,27 +7,30 @@ authors: ['alexemerich'] --- ## Introduction + The storage and organization of data are critical to the success of applications. Methods have evolved since the days of punch cards and player pianos. [Relational databases](/intro/database-glossary#relational-database) keeping data stored in a series of [rows](/intro/database-glossary#row) and [columns](/intro/database-glossary#column) among joined [tables](/intro/database-glossary#table) has been the overwhelming preferred choice for the last decades. These databases rely on [structured query language (SQL)](/intro/database-glossary#sql) to access the information and communicate its results to the requester. -As application design has continued developing, new databases have become increasingly popular for their different strengths. In this guide, we will cover one of the popular [NoSQL](/intro/database-glossary#nosql) database types, document-oriented databases. We will discuss what they are and where they came from, how [documents](/intro/database-glossary#document) work, their features, and their advantages and disadvantages. 
+As application design has continued developing, new databases have become increasingly popular for their different strengths. In this guide, we will cover one of the popular [NoSQL](/intro/database-glossary#nosql) database types, document-oriented databases. We will discuss what they are and where they came from, how [documents](/intro/database-glossary#document) work, their features, and their advantages and disadvantages. ## What are document databases? + [Document databases](/intro/database-glossary#document-database) are a category of NoSQL database that stores data as JSON and other data serialization format documents instead of columns and rows like in a SQL relational database. They are a subclass of the key-value store NoSQL database concept. Document databases deliver a better developer experience because of their closeness to modern programming techniques. It is easy to read JSON, and it is translatable to the languages developers are most often writing today. Document databases offer starkly different structure and experience from traditional relational databases. Relational databases store data in separate programmer-defined tables that may see a single object spread across several tables. This separation requires [join statements](/intro/database-glossary#join) to get to a desired return from the database. The document model stores all of an object’s information in a single instance in the database, and every object in the database can be starkly different than the next. This capability, in theory, removes the need for an [object-relational mapper (ORM)](/intro/database-glossary#orm) depending on the use case. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). -## Documents +## Documents + As discussed, documents are at the crux of any document database. Depending on the document database, documents encapsulate and encode data in JSON, XML, YAML, or a binary form like BSON. -One of the attractive elements for developers to use the document model is its similarities to objects in programming languages. There is a familiarity with the structure or lack thereof when using documents. +One of the attractive elements for developers to use the document model is its similarities to objects in programming languages. There is a familiarity with the structure or lack thereof when using documents. The basic format of a document looks as follows: @@ -49,57 +52,62 @@ Expanding on the basic syntax, a single document within a collection of authors "Books": { 'Grey Bees', 'Death and the Penguin' }, "Author": "Andrey Kurkov" } -``` +``` What is critical to notice is the ability to store multiple books within the `Books` field. In a relational database, this would not be possible. 
There would need to be an `Author` table and a `Book` table joined by a [key](/intro/database-glossary#foreign-key). This foreign key in the `Book` table would most likely be something like `author.id` where every record is assigned to an author. We can visualize the difference in the following tables: -| author.id | name | -|----------- |---| -|001 | Andrey Kurkov | +| author.id | name | +| --------- | ------------- | +| 001 | Andrey Kurkov | -| book.title | author.id | -|----------- |---| -|Grey Bees | 001 | -|Death and the Penguin | 001 | +| book.title | author.id | +| --------------------- | --------- | +| Grey Bees | 001 | +| Death and the Penguin | 001 | With an idea of the structure and capabilities of documents, we can take a step further and explore the advantages and disadvantages that the document model presents. ## Advantages of the document model + There are clear strengths and weaknesses with a document database, and it depends from application to application whether the document model is the right fit. The document model's flexibility, ease of scale, and quick-start agility are advantages of the document model with considerable trade-offs. ### Flexibility -Document databases offer flexibility unmatched by relational databases. Document databases define the structure of each document separately. The form is a characteristic that the document itself defines rather than an external structure the record must conform to. This is contrary to the rigidity of a relational database. + +Document databases offer flexibility unmatched by relational databases. Document databases define the structure of each document separately. The form is a characteristic that the document itself defines rather than an external structure the record must conform to. This is contrary to the rigidity of a relational database. The document model does not make structural changes as expensive as relational databases. A change does not require altering all existing records to match the new structure. You can change the data you want to record for individual records on the fly, hold off on, or skip other documents that do not have the same structure without any requirement. -Your database structure can evolve quickly alongside your application logic as you develop. This makes changes less burdensome as there is less of a required synchronization and [migration](/intro/database-glossary#migration) process associated with each structural change. The database system will allow any new document structure you want to apply to exist alongside all previous structures. +Your database structure can evolve quickly alongside your application logic as you develop. This makes changes less burdensome as there is less of a required synchronization and [migration](/intro/database-glossary#migration) process associated with each structural change. The database system will allow any new document structure you want to apply to exist alongside all previous structures. The flexibility provided by the document model encourages the iteration and evolution of your storage logic. However, it is essential to keep in mind that the software itself is unlikely to be able to provide you with as many guarantees about your data as you make changes. Suppose there is no agreed-upon standard for the data collections' shape. In that case, it is up to you as a developer to enforce consistency and modify documents where appropriate to keep your data in a well-understood state. 
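As a rough illustration of the relational-side cost described above (a minimal sketch, assuming a hypothetical `authors` table rather than anything defined in this guide), adding a new attribute in a relational schema typically means altering the table and backfilling every existing row before the application can rely on it:

```
-- Hypothetical relational schema change: every existing row is affected.
ALTER TABLE authors
  ADD COLUMN nationality text;

-- Existing rows usually need an explicit backfill (or a DEFAULT) to stay consistent.
UPDATE authors
SET nationality = 'unknown'
WHERE nationality IS NULL;
```

In a document database, by contrast, newly written documents can simply start carrying the extra field while older documents remain untouched.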
### Scalability + Document models generally allow you to avoid [vertical scaling](/intro/database-glossary#vertical-scaling) and adopt a more cost-efficient [horizontal scaling](/intro/database-glossary#horizontal-scaling) approach when your application grows. Despite growth in this area, there is an inherent difficulty in the relational model around scalability. Document databases can avoid many of these shortcomings experienced by relational databases due to how their systems structure data. The coordination between different hosts can be minimized by collocating related data together in a single document. [Sharding](/intro/database-glossary#shard) datasets is a much more common strategy in document databases. This is because document-based operations typically don't require much coordination since many actions target individual records. -Because fewer [constraints](/intro/database-glossary#constraint) and links exist between individual documents and collections within document databases, coordination is often more accessible, and operations tend to be more self-contained. This allows document database providers to prioritize performance and [availability](/intro/database-glossary#availability), where relational databases force concessions in the name of [consistency](/intro/database-glossary#consistency). +Because fewer [constraints](/intro/database-glossary#constraint) and links exist between individual documents and collections within document databases, coordination is often more accessible, and operations tend to be more self-contained. This allows document database providers to prioritize performance and [availability](/intro/database-glossary#availability), where relational databases force concessions in the name of [consistency](/intro/database-glossary#consistency). This results in a trade-off implicating the safety of your data and how well your systems can handle outages and network partitions. The significant difference is that document databases tend to have much more flexibility in tuning the level of consistency versus performance and availability. In contrast, relational databases often require consistency always to be the priority. -### Agility -The document model’s [schema-less](/intro/database-glossary#schema) capabilities can have databases up and running very quickly. Minimal maintenance is required once you create a document, and you can start inserting objects as documents immediately. +### Agility + +The document model’s [schema-less](/intro/database-glossary#schema) capabilities can have databases up and running very quickly. Minimal maintenance is required once you create a document, and you can start inserting objects as documents immediately. The agility provided by document databases makes it, so you do not have to know the exact structure of your data at the time of implementation. Data models are subject to change, and formulating a clear plan can be challenging when development begins. The combination of agility and flexibility allows developers to spin up a database instance and populate it with collections of documents right away and evolve the model alongside the application's evolution. -However, with this lack of schema comes the trade-off. The consistency of your data needs to be managed continuously rather than from the plan of a pre-defined schema. There is an advantage to having a good picture of your data's look and access patterns ahead of time. Relational databases force this consideration. 
+However, with this lack of schema comes the trade-off. The consistency of your data needs to be managed continuously rather than from the plan of a pre-defined schema. There is an advantage to having a good picture of your data's look and access patterns ahead of time. Relational databases force this consideration. ## Conclusion + This article covered document databases and why they are one of the most popular NoSQL offerings. We covered the structure and capabilities of documents and the document model advantages, and their associated trade-offs. -Document databases offer a different approach than relational databases to organizing and accessing data. This evolution from only the traditional models is exciting for the choice it now gives to developers. Based on your application, you can decide which features and advantages best align with your philosophies and goals. +Document databases offer a different approach than relational databases to organizing and accessing data. This evolution from only the traditional models is exciting for the choice it now gives to developers. Based on your application, you can decide which features and advantages best align with your philosophies and goals. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). diff --git a/content/04-postgresql/01-benefits-of-postgresql.mdx b/content/04-postgresql/01-benefits-of-postgresql.mdx index 29affdf4..1eff9252 100644 --- a/content/04-postgresql/01-benefits-of-postgresql.mdx +++ b/content/04-postgresql/01-benefits-of-postgresql.mdx @@ -119,7 +119,7 @@ PostgreSQL is notable for offering excellent implementation of core relational f -You can use [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). +You can use [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). 
diff --git a/content/04-postgresql/02-getting-to-know-postgresql.mdx b/content/04-postgresql/02-getting-to-know-postgresql.mdx index a5a24862..93305b5a 100644 --- a/content/04-postgresql/02-getting-to-know-postgresql.mdx +++ b/content/04-postgresql/02-getting-to-know-postgresql.mdx @@ -47,7 +47,7 @@ Concepts: -[Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) is another powerful way to work with your PostgreSQL databases using the [PostgreSQL connector](https://www.prisma.io/docs/concepts/database-connectors/postgresql). You can try it out be following along with our [PostgreSQL getting started guide](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql) +[Prisma Client](https://www.prisma.io/docs/orm/prisma-client) is another powerful way to work with your PostgreSQL databases using the [PostgreSQL connector](https://www.prisma.io/docs/orm/overview/databases/postgresql). You can try it out be following along with our [PostgreSQL getting started guide](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql) @@ -90,7 +90,7 @@ Concepts: -When working with Prisma Client, [data models](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model) in the Prisma schema are equivalent to tables in PostgreSQL. +When working with Prisma Client, [data models](https://www.prisma.io/docs/orm/prisma-schema/data-model) in the Prisma schema are equivalent to tables in PostgreSQL. diff --git a/content/04-postgresql/03-5-ways-to-host-postgresql.mdx b/content/04-postgresql/03-5-ways-to-host-postgresql.mdx index b09dcbee..d61103d9 100644 --- a/content/04-postgresql/03-5-ways-to-host-postgresql.mdx +++ b/content/04-postgresql/03-5-ways-to-host-postgresql.mdx @@ -264,6 +264,6 @@ The right choice for hosting your database depends significantly on your applica -Once you have a PostgreSQL database, you can use [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage it from within your JavaScript or TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). +Once you have a PostgreSQL database, you can use [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage it from within your JavaScript or TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). diff --git a/content/04-postgresql/04-setting-up-a-local-postgresql-database.mdx b/content/04-postgresql/04-setting-up-a-local-postgresql-database.mdx index 0aa36307..7a40058f 100644 --- a/content/04-postgresql/04-setting-up-a-local-postgresql-database.mdx +++ b/content/04-postgresql/04-setting-up-a-local-postgresql-database.mdx @@ -23,7 +23,7 @@ Navigate to the sections that match the platforms you will be working with. 
-Once you have a PostgreSQL database, you can use [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage it from within your JavaScript or TypeScript applications. Try our [PostgreSQL getting started guide](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql) to get started. +Once you have a PostgreSQL database, you can use [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage it from within your JavaScript or TypeScript applications. Try our [PostgreSQL getting started guide](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql) to get started. diff --git a/content/04-postgresql/05-setting-up-postgresql-on-rds.mdx b/content/04-postgresql/05-setting-up-postgresql-on-rds.mdx index da6a6b3b..c11ae1dd 100644 --- a/content/04-postgresql/05-setting-up-postgresql-on-rds.mdx +++ b/content/04-postgresql/05-setting-up-postgresql-on-rds.mdx @@ -377,6 +377,6 @@ As your needs change, keep in mind that you can scale your database instance to -Once you've configured your PostgreSQL instance on Amazon RDS, you can connect to it and manage it using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client). Configure access to your database instance by setting the `url` parameter in the [PostgreSQL database connector](https://www.prisma.io/docs/concepts/database-connectors/postgresql) settings to point to your RDS instance. +Once you've configured your PostgreSQL instance on Amazon RDS, you can connect to it and manage it using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client). Configure access to your database instance by setting the `url` parameter in the [PostgreSQL database connector](https://www.prisma.io/docs/orm/overview/databases/postgresql) settings to point to your RDS instance. diff --git a/content/04-postgresql/06-connecting-to-postgresql-databases.mdx b/content/04-postgresql/06-connecting-to-postgresql-databases.mdx index 236fe3a4..3fb33e10 100644 --- a/content/04-postgresql/06-connecting-to-postgresql-databases.mdx +++ b/content/04-postgresql/06-connecting-to-postgresql-databases.mdx @@ -132,6 +132,6 @@ Knowing how to connect to various PostgreSQL instances is vital as you start to -The [PostgreSQL database connector](https://www.prisma.io/docs/concepts/database-connectors/postgresql) can help you manage PostgreSQL databases from JavaScript and TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). +The [PostgreSQL database connector](https://www.prisma.io/docs/orm/overview/databases/postgresql) can help you manage PostgreSQL databases from JavaScript and TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). 
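Once a connection is established (whether locally, on RDS, or through a connector), a quick sanity check can confirm that you are talking to the database and role you expect. This is only a suggested snippet using built-in PostgreSQL functions, not part of the linked guides:

```
-- Report the current database, the connected role, and the server version.
SELECT current_database(), current_user, version();
```

If any of these values look wrong, revisit the connection settings before moving on.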
diff --git a/content/04-postgresql/08-create-and-delete-databases-and-tables.mdx b/content/04-postgresql/08-create-and-delete-databases-and-tables.mdx index 39260c96..b424770e 100644 --- a/content/04-postgresql/08-create-and-delete-databases-and-tables.mdx +++ b/content/04-postgresql/08-create-and-delete-databases-and-tables.mdx @@ -573,7 +573,7 @@ As mentioned earlier, the SQL statements covered in this PostgreSQL tutorial, pa -When using Prisma to develop with PostgreSQL, you will usually create databases and tables with [Prisma Migrate](https://www.prisma.io/docs/concepts/components/prisma-migrate). You can learn how use it in our guide on [developing with Prisma Migrate](https://www.prisma.io/docs/guides/migrate/developing-with-prisma-migrate). +When using Prisma to develop with PostgreSQL, you will usually create databases and tables with [Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate). You can learn how use it in our guide on [developing with Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate/workflows). diff --git a/content/04-postgresql/09-introduction-to-data-types.mdx b/content/04-postgresql/09-introduction-to-data-types.mdx index 4b624e64..f99bd2fb 100644 --- a/content/04-postgresql/09-introduction-to-data-types.mdx +++ b/content/04-postgresql/09-introduction-to-data-types.mdx @@ -1,6 +1,6 @@ --- title: 'An introduction to PostgreSQL data types' -metaTitle: "PostgreSQL Data Types - Numeric, Text, and More" +metaTitle: 'PostgreSQL Data Types - Numeric, Text, and More' metaDescription: "PostgreSQL's data type system allows you to define your data structures and store data in various formats. These are some of the most common data types." metaImage: '/social/generic-postgresql.png' authors: ['justinellingwood'] @@ -8,70 +8,70 @@ authors: ['justinellingwood'] ## Introduction -One of the primary features of relational databases in general is the ability to define [schemas](/intro/database-glossary#schema) or [table structures](/intro/database-glossary#table) that exactly specify the format of the data they will contain. This is done by prescribing the columns that these structures contain along with their [*data type*](/intro/database-glossary#data-type) and any constraints. +One of the primary features of relational databases in general is the ability to define [schemas](/intro/database-glossary#schema) or [table structures](/intro/database-glossary#table) that exactly specify the format of the data they will contain. This is done by prescribing the columns that these structures contain along with their [_data type_](/intro/database-glossary#data-type) and any constraints. -Data types specify a general pattern for the data they accept and store. Values must adhere to the requirements that they outline in order to be accepted by PostgreSQL. While it is possible to define custom requirements, data types provide the basic building blocks that allow PostgreSQL to validate input and work with the data using appropriate operations. +Data types specify a general pattern for the data they accept and store. Values must adhere to the requirements that they outline in order to be accepted by PostgreSQL. While it is possible to define custom requirements, data types provide the basic building blocks that allow PostgreSQL to validate input and work with the data using appropriate operations. 
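To make the validation behavior described above concrete, here is a minimal sketch using a hypothetical `measurements` table (the specific column types are covered later in this guide):

```
-- The declared types constrain what each column will accept.
CREATE TABLE measurements (
    id      serial PRIMARY KEY,
    sensor  varchar(50),
    reading numeric(5, 2)
);

-- Accepted: the value conforms to numeric(5, 2).
INSERT INTO measurements (sensor, reading) VALUES ('probe-1', 123.45);

-- Rejected with an error: the text value is not valid input for a numeric column.
INSERT INTO measurements (sensor, reading) VALUES ('probe-1', 'not a number');
```

The failed statement shows the type system acting as the first line of validation, before other tools like constraints come into play.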
-PostgreSQL includes [a wide range of data types](https://www.postgresql.org/docs/current/datatype.html) that are used to label and validate that values conform to appropriate types. In this guide, we will discuss the most common data types available in PostgreSQL, the different input and output formats they use, and how to configure various fields to meet your applications' needs. +PostgreSQL includes [a wide range of data types](https://www.postgresql.org/docs/current/datatype.html) that are used to label and validate that values conform to appropriate types. In this guide, we will discuss the most common data types available in PostgreSQL, the different input and output formats they use, and how to configure various fields to meet your applications' needs. ### What are the data types in PostgreSQL? Before going into detail, let's take a broad view of what data types PostgreSQL provides. -PostgreSQL supports a wide range of data types suitable for various types of simple and complex data. These include: - -* `integer` -* `smallint` -* `bigint` -* `serial` -* `smallserial` -* `bigserial` -* `numeric` -* `float` -* `double precision` -* `money` -* `char` -* `varchar` -* `text` -* `boolean` -* `date` -* `time` -* `timestamp` -* `timestamptz` -* `interval` -* `enum` -* `uuid` -* `json` -* `jsonb` -* `xml` -* `inet` (network address) -* `cidr` (network address) -* `macaddr` -* `polygon` -* `line` -* `lseg` (line segment) -* `box` (rectangular box) -* `bytea` (hex format) -* `tsvector` (text search) -* `tsquery` (text search) +PostgreSQL supports a wide range of data types suitable for various types of simple and complex data. These include: + +- `integer` +- `smallint` +- `bigint` +- `serial` +- `smallserial` +- `bigserial` +- `numeric` +- `float` +- `double precision` +- `money` +- `char` +- `varchar` +- `text` +- `boolean` +- `date` +- `time` +- `timestamp` +- `timestamptz` +- `interval` +- `enum` +- `uuid` +- `json` +- `jsonb` +- `xml` +- `inet` (network address) +- `cidr` (network address) +- `macaddr` +- `polygon` +- `line` +- `lseg` (line segment) +- `box` (rectangular box) +- `bytea` (hex format) +- `tsvector` (text search) +- `tsquery` (text search) We'll cover the most common of these in more depth throughout this guide. ### Getting started with PostgreSQL data types -PostgreSQL comes with a large number of types built-in to the software itself. It also allows you to define your own complex types by combining types of different kinds and specifying their parameters. This allows administrators to precisely define the types of data they expect each column to accept when using `CREATE TABLE` among other commands. PostgreSQL can then automatically check proposed values to ensure they match the provided criteria. +PostgreSQL comes with a large number of types built-in to the software itself. It also allows you to define your own complex types by combining types of different kinds and specifying their parameters. This allows administrators to precisely define the types of data they expect each column to accept when using `CREATE TABLE` among other commands. PostgreSQL can then automatically check proposed values to ensure they match the provided criteria. -As you get started with types, it's important to remember that types alone are not always a complete solution to data validation, but a component. Other database tools, like [constraints](https://www.prisma.io/dataguide/postgresql/column-and-table-constraints) also have a role to play in defining correctness. 
Still, data types are often the first line of defense against invalid data. +As you get started with types, it's important to remember that types alone are not always a complete solution to data validation, but a component. Other database tools, like [constraints](https://www.prisma.io/dataguide/postgresql/column-and-table-constraints) also have a role to play in defining correctness. Still, data types are often the first line of defense against invalid data. -For many cases, the general types provided by PostgreSQL are appropriate for the kinds of data you'll be storing. However, sometimes, more specific types are available that can provide additional operators, associated functions, or built-in constraint-like validation. For example, while you could store the coordinates of a geometric point in two different number columns, the provided [`point` type](https://www.postgresql.org/docs/current/datatype-geometric.html#id-1.5.7.16.5) is purpose built to store and validate exactly this type of information. When choosing types, check to see that you are using the most specific type applicable to your use case. +For many cases, the general types provided by PostgreSQL are appropriate for the kinds of data you'll be storing. However, sometimes, more specific types are available that can provide additional operators, associated functions, or built-in constraint-like validation. For example, while you could store the coordinates of a geometric point in two different number columns, the provided [`point` type](https://www.postgresql.org/docs/current/datatype-geometric.html#id-1.5.7.16.5) is purpose built to store and validate exactly this type of information. When choosing types, check to see that you are using the most specific type applicable to your use case. ## Numbers and numeric values -PostgreSQL includes a good range of numeric data types suitable for different scenarios. These include *integers*, *floating points*, *arbitrary precision*, and a special integer type with additional features called *serial*. +PostgreSQL includes a good range of numeric data types suitable for different scenarios. These include _integers_, _floating points_, _arbitrary precision_, and a special integer type with additional features called _serial_. ### Integers -The *integer* data type is a category of types used to store numbers without any fractions or decimals. These can be either positive or negative values, and different integer types can store different ranges of numbers. Integer types with smaller ranges of acceptable values take up space than those with wider ranges. +The _integer_ data type is a category of types used to store numbers without any fractions or decimals. These can be either positive or negative values, and different integer types can store different ranges of numbers. Integer types with smaller ranges of acceptable values take up less space than those with wider ranges. The basic list of integer types includes the following: @@ -81,25 +81,25 @@ The basic list of integer types includes the following: | `smallint` | 2 bytes | -32768 to 32767 | It is rare to use this except in places with tight storage constraints. | | `bigint` | 8 bytes | -9223372036854775808 to 9223372036854775807 | Typically this type is reserved for scenarios where the `integer` type would have an insufficient range. | -The types above are limited by their valid range.
Any value outside of the range will result in an error. -In addition to the standard integer types mentioned above, PostgreSQL includes a set of special integer types called *serial* types. These types are primarily used to create unique identifiers, or [*primary keys*](/intro/database-glossary#primary-key), for records. +In addition to the standard integer types mentioned above, PostgreSQL includes a set of special integer types called _serial_ types. These types are primarily used to create unique identifiers, or [_primary keys_](/intro/database-glossary#primary-key), for records. -By default, serial types will automatically use the next integer in an internally tracked sequence when new records are added. So if the last integer used in a serial column was 8559, by default, the next record will automatically use 8560. To guarantee that each value is unique, you can further add a `UNIQUE` constraint on the column. +By default, serial types will automatically use the next integer in an internally tracked sequence when new records are added. So if the last integer used in a serial column was 8559, by default, the next record will automatically use 8560. To guarantee that each value is unique, you can further add a `UNIQUE` constraint on the column. There is a serial type for each of the integer sizes mentioned earlier, which dictates the type of integer used in the sequence: -* **`serial`:** a serial type that automatically increments a column with the next `integer` value. -* **`smallserial`:** a serial type that automatically increments a column with the next `smallint` value. -* **`bigserial`:** a serial type that automatically increments a column with the next `bigint` value. +- **`serial`:** a serial type that automatically increments a column with the next `integer` value. +- **`smallserial`:** a serial type that automatically increments a column with the next `smallint` value. +- **`bigserial`:** a serial type that automatically increments a column with the next `bigint` value. ### Arbitrary precision -Arbitrary precision types are used to control the amount of *precision* or specificity possible for a number with decimals. In PostgreSQL, this can be controlled by manipulating two factors: precision and scale. +Arbitrary precision types are used to control the amount of _precision_ or specificity possible for a number with decimals. In PostgreSQL, this can be controlled by manipulating two factors: precision and scale. -*Precision* is the maximum amount of total digits that a number can have. In contrast, *scale* is the number of digits to the right of the decimal point. By manipulating these numbers, you can control how large the fractional and non-fractional components of a number are allowed to be. +_Precision_ is the maximum amount of total digits that a number can have. In contrast, _scale_ is the number of digits to the right of the decimal point. By manipulating these numbers, you can control how large the fractional and non-fractional components of a number are allowed to be. -These two arguments are used to control arbitrary precision using the *`numeric`* data type. The `numeric` type takes zero to two arguments. +These two arguments are used to control arbitrary precision using the _`numeric`_ data type. The `numeric` type takes zero to two arguments. 
With no arguments, the column can store values of any precision and scale: @@ -107,13 +107,13 @@ With no arguments, the column can store values of any precision and scale: NUMERIC ``` -When a single argument is provided, it is interpreted as the precision of the column with scale set to 0. Though not stored this way on disk, this effectively allows you to specify the maximum number of digits in an integer-like number (no fractional or decimal components). For example, if you need a 5 digit whole number, you can specify: +When a single argument is provided, it is interpreted as the precision of the column with scale set to 0. Though not stored this way on disk, this effectively allows you to specify the maximum number of digits in an integer-like number (no fractional or decimal components). For example, if you need a 5 digit whole number, you can specify: ``` NUMERIC(5) ``` -Specify precision followed by scale when configuring a column using both controls. PostgreSQL will round the decimal component of any input to the correct number of digits using the scale number. Afterwards, it will check whether the complete rounded number (both the whole and decimal components) exceeds the given precision number. If it does, PostgreSQL will produce an error. +Specify precision followed by scale when configuring a column using both controls. PostgreSQL will round the decimal component of any input to the correct number of digits using the scale number. Afterwards, it will check whether the complete rounded number (both the whole and decimal components) exceeds the given precision number. If it does, PostgreSQL will produce an error. For example, we can specify a column with a total precision of 5 and a scale of 2: @@ -123,17 +123,17 @@ NUMERIC(5, 2) This column would have the following behavior: -Input value | Rounded value | Accepted (fits precision)? ------------ | ------------- | --------- -400.28080 | 400.28 | Yes -8.332799 | 8.33 | Yes -11799.799 | 11799.80 | No -11799 | 11799 | Yes -2802.27 | 2802.27 | No +| Input value | Rounded value | Accepted (fits precision)? | +| ----------- | ------------- | -------------------------- | +| 400.28080 | 400.28 | Yes | +| 8.332799 | 8.33 | Yes | +| 11799.799 | 11799.80 | No | +| 11799 | 11799 | Yes | +| 2802.27 | 2802.27 | No | ### Floating point -Floating point numbers are another way to express decimal numbers, but without exact, consistent precision. Instead, floating point types only have a concept of a maximum precision which is often related to the architecture and platform of the hardware. +Floating point numbers are another way to express decimal numbers, but without exact, consistent precision. Instead, floating point types only have a concept of a maximum precision which is often related to the architecture and platform of the hardware. For example, to limit a floating point column to 8 digits of precision, you can type: @@ -141,17 +141,17 @@ For example, to limit a floating point column to 8 digits of precision, you can FLOAT(8) ``` -Because of these design choices, floating point numbers can work with numbers with large number of decimals efficiently, but not always exactly. The internal representation of numbers may cause slight differences between the input and output. This can cause unexpected behavior when comparing values, doing floating point math, or performing operations that require exact values. 
+Because of these design choices, floating point numbers can work with numbers with large number of decimals efficiently, but not always exactly. The internal representation of numbers may cause slight differences between the input and output. This can cause unexpected behavior when comparing values, doing floating point math, or performing operations that require exact values. ### Double precision (floating point) vs numeric -Both floating point numbers provided by types like `float` and `double precision` and arbitrary precision numbers provided by the `numeric` type can be used to store decimal values. How do you know which one to use? +Both floating point numbers provided by types like `float` and `double precision` and arbitrary precision numbers provided by the `numeric` type can be used to store decimal values. How do you know which one to use? -The general rule is that if you need exactness in your calculations, the `numeric` type is always the better choice. The `numeric` type will store values exactly as they are provided, meaning that the results are entirely predictable when retrieving or computing over values. The `numeric` type is called arbitrary precision because you specify the amount of precision the type requires and it will store that exact amount of digits in the field. +The general rule is that if you need exactness in your calculations, the `numeric` type is always the better choice. The `numeric` type will store values exactly as they are provided, meaning that the results are entirely predictable when retrieving or computing over values. The `numeric` type is called arbitrary precision because you specify the amount of precision the type requires and it will store that exact amount of digits in the field. -In contrast, types like `float` and `double precision` are variable precision types. The amount of precision they maintain depends on the input value. When they reach the end of their allowed level of precision, they may round the remaining digits, leading to differences between the submitted and retrieved values. +In contrast, types like `float` and `double precision` are variable precision types. The amount of precision they maintain depends on the input value. When they reach the end of their allowed level of precision, they may round the remaining digits, leading to differences between the submitted and retrieved values. -So when would you use variable precision types? Variable precision types like `float` and `double precision` are well suited for scenarios where exact values are not necessary (for example, if they'll be rounded anyways) and when speed is highly valuable. Variable precision will generally offer performance benefits over the `numeric` type. +So when would you use variable precision types? Variable precision types like `float` and `double precision` are well suited for scenarios where exact values are not necessary (for example, if they'll be rounded anyways) and when speed is highly valuable. Variable precision will generally offer performance benefits over the `numeric` type. ### Monetary values @@ -163,20 +163,19 @@ The `money` type does not take arguments, so the column definitions use only the MONEY ``` -The `money` type has a fixed fractional component that takes its precision from the `lc_monetary` PostgreSQL localization option. If that variable is undefined, the precision is taken from the `LC_MONETARY` environment variable in Linux or Unix-like environments or equivalent locale settings in other operating systems. 
In many instances, the precision will be set to use two decimal places to match common usage. +The `money` type has a fixed fractional component that takes its precision from the `lc_monetary` PostgreSQL localization option. If that variable is undefined, the precision is taken from the `LC_MONETARY` environment variable in Linux or Unix-like environments or equivalent locale settings in other operating systems. In many instances, the precision will be set to use two decimal places to match common usage. -Because of this precision, [it is recommended](https://www.postgresql.org/message-id/flat/7696-1364569697-520061%40sneakemail.com#df183031e88ecf9e3d77e58ec710c7b1) to only use the `money` type when fractions of cents are not possible or important. Similarly, since no currency is attached to the type, it is not well suited for situations where currency conversions are necessary. The `money` type has great performance for simple use cases, however, so in spite of these constraints, it can still be valuable. +Because of this precision, [it is recommended](https://www.postgresql.org/message-id/flat/7696-1364569697-520061%40sneakemail.com#df183031e88ecf9e3d77e58ec710c7b1) to only use the `money` type when fractions of cents are not possible or important. Similarly, since no currency is attached to the type, it is not well suited for situations where currency conversions are necessary. The `money` type has great performance for simple use cases, however, so in spite of these constraints, it can still be valuable. Because of the dependency on locale settings of the PostgreSQL installation or execution environment, `money` values, it is critical to ensure that these values match when transferring data between different systems. -Care must also be taken when casting values in and out of the `money` type since it can lose precision data when converting between certain types. It is safe for `money` values to cast to and from the `numeric` type (used for arbitrary precision, as shown above), so it is recommended to always use `numeric` as an intermediary before performing converting to other types. - +Care must also be taken when casting values in and out of the `money` type since it can lose precision data when converting between certain types. It is safe for `money` values to cast to and from the `numeric` type (used for arbitrary precision, as shown above), so it is recommended to always use `numeric` as an intermediary before performing converting to other types. ## Text and characters -PostgreSQL's character types and string types can be placed into two categories: *fixed length* and *variable length*. The choice between these two affects how PostgreSQL allocates space for each value and how it validates input. +PostgreSQL's character types and string types can be placed into two categories: _fixed length_ and _variable length_. The choice between these two affects how PostgreSQL allocates space for each value and how it validates input. -The simplest character-based data type within PostgreSQL is the *`char`* type. With no arguments, the `char` type accepts a single character as input: +The simplest character-based data type within PostgreSQL is the _`char`_ type. 
With no arguments, the `char` type accepts a single character as input: ``` CHAR @@ -190,14 +189,13 @@ CHAR(10) If a string is provided with fewer characters, blank spaces will be appended to pad the length: -Input | # of input characters | Stored value | # of stored characters ------ | --------------------- | ------------ | ---------------------- -'tree' | 4 | 'tree      ' | 10 - +| Input | # of input characters | Stored value | # of stored characters | +| ------ | --------------------- | ------------------------------------------ | ---------------------- | +| 'tree' | 4 | 'tree      ' | 10 | -If a string is given with greater than the allowed number of characters, PostgreSQL will raise an error. As an exception to this rule, if the overflowing characters are all spaces, PostgreSQL will simply truncate the excess spaces to fit the field. PostgreSQL doesn't recommend using `char` unless these characteristics are specifically desirable. +If a string is given with greater than the allowed number of characters, PostgreSQL will raise an error. As an exception to this rule, if the overflowing characters are all spaces, PostgreSQL will simply truncate the excess spaces to fit the field. PostgreSQL doesn't recommend using `char` unless these characteristics are specifically desirable. -The alternative to fixed length character fields are variable length fields. For this, PostgreSQL provides the *`varchar`* type. The `varchar` type stores characters with no fixed size. By default, with no integer given, `varchar` columns will accept strings of any length: +The alternative to fixed length character fields are variable length fields. For this, PostgreSQL provides the _`varchar`_ type. The `varchar` type stores characters with no fixed size. By default, with no integer given, `varchar` columns will accept strings of any length: ``` VARCHAR @@ -211,13 +209,13 @@ VARCHAR(10) This differs from using the `char` type with an integer in that `varchar` will not pad the value if the input does not meet the maximum field length: -Input | # of input characters | Stored value | # of stored characters ------ | --------------------- | ------------ | ---------------------- -'tree' | 4 | 'tree' | 4 +| Input | # of input characters | Stored value | # of stored characters | +| ------ | --------------------- | ------------ | ---------------------- | +| 'tree' | 4 | 'tree' | 4 | -If the string is greater than the maximum length, PostgreSQL will throw an error. The same truncation behavior that's present in `char` fields occurs here: if the overflowing characters are spaces, they will be truncated to fit inside the maximum character length. +If the string is greater than the maximum length, PostgreSQL will throw an error. The same truncation behavior that's present in `char` fields occurs here: if the overflowing characters are spaces, they will be truncated to fit inside the maximum character length. -The third data type that PostgreSQL provides for strings and character storage is called *`text`*. This type operates exactly like the `varchar` type without a maximum field length. It is used to store strings of any length: +The third data type that PostgreSQL provides for strings and character storage is called _`text`_. This type operates exactly like the `varchar` type without a maximum field length. 
It is used to store strings of any length: ``` TEXT @@ -236,13 +234,13 @@ BOOLEAN In keeping with [SQL standards](https://en.wikipedia.org/wiki/Three-valued_logic#SQL), the PostgreSQL boolean data type can actually express three states: -* **true**: Represented by the SQL keyword `TRUE`. As input values, the following strings also evaluate to true: true, yes, on, and 1. The output function represents true values with the string "t". -* **false**: Represented by the SQL keyword `FALSE`. As input values, the following strings also evaluate to false: false, no, off, and 0. The output function represents false values with the string "f". -* **unknown**: Represented by the SQL keyword `NULL`. In the context of SQL, a `NULL` value in a boolean column is meant to indicate that the value is unknown. +- **true**: Represented by the SQL keyword `TRUE`. As input values, the following strings also evaluate to true: true, yes, on, and 1. The output function represents true values with the string "t". +- **false**: Represented by the SQL keyword `FALSE`. As input values, the following strings also evaluate to false: false, no, off, and 0. The output function represents false values with the string "f". +- **unknown**: Represented by the SQL keyword `NULL`. In the context of SQL, a `NULL` value in a boolean column is meant to indicate that the value is unknown. As mentioned above, PostgreSQL is somewhat flexible on boolean input values, but stores values using the dedicated `TRUE`, `FALSE`, and `NULL` keywords. -Care must be taken when working with the boolean `NULL`. While PostgreSQL can correctly interpret `TRUE` and `FALSE` as booleans, it cannot make that assumption for `NULL` due to its multiple uses. You can explicitly cast `NULL` values to the `boolean` type in these situations to avoid this ambiguity: +Care must be taken when working with the boolean `NULL`. While PostgreSQL can correctly interpret `TRUE` and `FALSE` as booleans, it cannot make that assumption for `NULL` due to its multiple uses. You can explicitly cast `NULL` values to the `boolean` type in these situations to avoid this ambiguity: ``` NULL::boolean @@ -260,18 +258,18 @@ The `date` type can store a date without an associated time value: DATE ``` -When processing input for `date` columns, PostgreSQL can interpret many different formats to determine the correct date to store. Some formats are based on well known standards, while others are colloquial formats used in many real world contexts. +When processing input for `date` columns, PostgreSQL can interpret many different formats to determine the correct date to store. Some formats are based on well known standards, while others are colloquial formats used in many real world contexts. The full range of input formats for dates that PostgreSQL understands is shown in the ["Date Input" table in the PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-DATETIME-DATE-TABLE). -To deal with ambiguous input, like 07/12/2019 (which could be interpreted as either July 12, 2019 or December 07, 2019 depending on format), you can set the expected ordering using the [`DateStyle` parameter](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-DATESTYLE). This can be set to `DMY`, `MDY`, or `YMD` to define the expected ordering. By default, PostgreSQL will set it to `MDY` or use the `lc_time` locale to determine the appropriate ordering. 
+To deal with ambiguous input, like 07/12/2019 (which could be interpreted as either July 12, 2019 or December 07, 2019 depending on format), you can set the expected ordering using the [`DateStyle` parameter](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-DATESTYLE). This can be set to `DMY`, `MDY`, or `YMD` to define the expected ordering. By default, PostgreSQL will set it to `MDY` or use the `lc_time` locale to determine the appropriate ordering. PostgreSQL can also output dates using various formats: -* **ISO**: Outputs dates according to [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). March 18, 2009 would be represented as `2009-03-18`. -* **SQL**: The traditional SQL date format. March 18, 2009 would be represented as `03/18/2009`. -* **Postgres**: Mirrors ISO format for dates. March 18, 2009 would be represented as `2009-03-18`. -* **German**: The German regional style. March 18, 2009 would be represented as `18.03.2009`. +- **ISO**: Outputs dates according to [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). March 18, 2009 would be represented as `2009-03-18`. +- **SQL**: The traditional SQL date format. March 18, 2009 would be represented as `03/18/2009`. +- **Postgres**: Mirrors ISO format for dates. March 18, 2009 would be represented as `2009-03-18`. +- **German**: The German regional style. March 18, 2009 would be represented as `18.03.2009`. Both the `SQL` and `Postgres` formats respect the `DateStyle` value, mentioned earlier, to determine the ordering of the month, day, and years in output. @@ -279,13 +277,13 @@ Both the `SQL` and `Postgres` formats respect the `DateStyle` value, mentioned e The `time` data type (also called `time without time zone`) can store a specific time of day without an associated timezone or date. -> PostgreSQL does not recommend using `time with time zone`, the `time` type's variant that pairs a time zone with the clock time. This is due to complications and ambiguities that arise that cannot be resolved without additional context, like an associated date. For times that require a time zone component, the `timezonetz` type, covered in the next section, is a good alternative that provides the date component context. +> PostgreSQL does not recommend using `time with time zone`, the `time` type's variant that pairs a time zone with the clock time. This is due to complications and ambiguities that arise and cannot be resolved without additional context, like an associated date. For times that require a time zone component, the `timestamptz` type, covered in the next section, is a good alternative that provides the date component context. -When processing input for `time` columns, PostgreSQL can interpret many different formats to determine the correct time to store. Most of these are variations on the [ISO 8601 standard](https://en.wikipedia.org/wiki/ISO_8601), with flexibility to catch different variations. +When processing input for `time` columns, PostgreSQL can interpret many different formats to determine the correct time to store. Most of these are variations on the [ISO 8601 standard](https://en.wikipedia.org/wiki/ISO_8601), with flexibility to catch different variations. The full range of input formats for times that PostgreSQL understands is shown in the ["Time Input" table in the PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-DATETIME-TIME-TABLE). -PostgreSQL can store time values with a microsecond resolution.
The amount of precision can be defined when the column is created by specifying a '(p)' or precision value, which can be any integer between 0 and 6. +PostgreSQL can store time values with a microsecond resolution. The amount of precision can be defined when the column is created by specifying a '(p)' or precision value, which can be any integer between 0 and 6. For example, to store time values with 3 decimal places of fractional seconds, you could define the time column like this: @@ -295,20 +293,20 @@ TIME (3) If no `(p)` value is provided, the column will store according to the input's precision, up to 6 decimal places. -When outputting times, PostgreSQL relies on the same format definitions available for date options. These are mostly result in the same or similar outputs: +When outputting times, PostgreSQL relies on the same format definitions available for date options. These are mostly result in the same or similar outputs: -* **ISO**: Outputs time according to [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). 04:28 PM and 52 seconds would be represented as `16:28:52`. -* **SQL**: The traditional SQL time format. 04:28 PM and 52 seconds would be represented as `16:28:52.00`. -* **Postgres**: Uses the Unix date / time format. 04:28 PM and 52 seconds would be represented as `16:28:52`. -* **German**: The German regional style. 04:28 PM and 52 seconds would be represented as `16:28:52.00`. +- **ISO**: Outputs time according to [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). 04:28 PM and 52 seconds would be represented as `16:28:52`. +- **SQL**: The traditional SQL time format. 04:28 PM and 52 seconds would be represented as `16:28:52.00`. +- **Postgres**: Uses the Unix date / time format. 04:28 PM and 52 seconds would be represented as `16:28:52`. +- **German**: The German regional style. 04:28 PM and 52 seconds would be represented as `16:28:52.00`. -As you can see, the output format doesn't have much of an affect on time representations as it does on dates. The main difference can be seen in the timestamp output that we'll see next. +As you can see, the output format doesn't have much of an affect on time representations as it does on dates. The main difference can be seen in the timestamp output that we'll see next. ## Timestamps -PostgreSQL can represent [timestamps](https://en.wikipedia.org/wiki/Timestamp), a combination of a date and time used to represent a specific moment in time, in two different variations: with and without an associated [time zone](https://en.wikipedia.org/wiki/Time_zone). Timestamps with a specified time zone can be stored in the `timestamptz` data type (also known as `timestamp with time zone`), while the `timestamp` data type (can also write as `timestamp without time zone`) is used for timestamps without a time zone. +PostgreSQL can represent [timestamps](https://en.wikipedia.org/wiki/Timestamp), a combination of a date and time used to represent a specific moment in time, in two different variations: with and without an associated [time zone](https://en.wikipedia.org/wiki/Time_zone). Timestamps with a specified time zone can be stored in the `timestamptz` data type (also known as `timestamp with time zone`), while the `timestamp` data type (can also write as `timestamp without time zone`) is used for timestamps without a time zone. -Like the `time` type, the `timestamp` and `timestamptz` types can take a `(p)` value to control the amount of precision that is stored. This can again be a number between zero and six. 
+Like the `time` type, the `timestamp` and `timestamptz` types can take a `(p)` value to control the amount of precision that is stored. This can again be a number between zero and six.

To declare a `timestamp` column with 3 digits of precision, you could type:

@@ -322,7 +320,7 @@ To do the same with a timestamp that includes a timezone, type:

TIMESTAMPTZ (3)
```

-When inputting values for `timestamp` columns, all that is needed is a valid [date format](#dates) followed by a valid [time format](#time), separated by a space. PostgreSQL also recognizes the "Postgres original style" format, which is similar to the [default output used by the Unix date command](https://www.gnu.org/software/coreutils/manual/html_node/date-invocation.html#date-invocation), but with the time zone, if present, at the end:
+When inputting values for `timestamp` columns, all that is needed is a valid [date format](#dates) followed by a valid [time format](#time), separated by a space. PostgreSQL also recognizes the "Postgres original style" format, which is similar to the [default output used by the Unix date command](https://www.gnu.org/software/coreutils/manual/html_node/date-invocation.html#date-invocation), but with the time zone, if present, at the end:

```
Wed Mar 18 16:28:52 2009 EST

@@ -330,51 +328,52 @@ Wed Mar 18 16:28:52 2009 EST

For `timestamp` columns, any provided time zone values will be ignored.

-Providing values for `timestamptz` fields are exactly the same as for `timestamp` but with the addition of a time zone. Time zones can be specified in [a number of different formats](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONE-TABLE), which use labels, abbreviations, or offsets from UTC. The time zone indicator designation is included after the date and time in timestamps.
+Providing values for `timestamptz` fields is exactly the same as for `timestamp`, but with the addition of a time zone. Time zones can be specified in [a number of different formats](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONE-TABLE), which use labels, abbreviations, or offsets from UTC. The time zone designation is included after the date and time in timestamps.

-When storing `timestamptz` values, PostgreSQL converts the input to UTC for storage. This simplifies the storage since the time zone used for output may be different from the input.
+When storing `timestamptz` values, PostgreSQL converts the input to UTC for storage. This simplifies the storage since the time zone used for output may be different from the input.

When outputting timestamps, the same four formats that influence `date` and `time` can influence how PostgreSQL represents timestamp values:

-* **ISO**: Outputs timestamps according to [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). The point in time of 04:28 PM and 52 seconds on March 18, 2009 in Eastern Standard Time would be represented as `2009-03-18 16:28:52-05`. For `timestamp` columns, which do not include the time zone, the `-05` would be omitted. Rather than separating the date and time components with a capital 'T', as ISO 8601 defines, PostgreSQL uses a space to delimit these fields.
-* **SQL**: The traditional SQL date format. The point in time of 04:28 PM and 52 seconds on March 18, 2009 in Eastern Standard Time would be represented as `03/18/2009 16:28:52.00 EST`. For `timestamp` columns, which do not include the time zone, the `EST` would be omitted.
-* **Postgres**: Resembles the format used by the Unix `date` command.
The point in time of 04:28 PM and 52 seconds on March 18, 2009 in Eastern Standard Time would be represented as `Wed Mar 18 16:28:52 2009 EST`. For `timestamp` columns, which do not include the time zone, the `EST` would be omitted. -* **German**: The German regional style. The point in time of 04:28 PM and 52 seconds on March 18, 2009 in Eastern Standard Time would be represented as `18.03.2009 16:28:52.00 EST`. For `timestamp` columns, which do not include the time zone, the `EST` would be omitted. +- **ISO**: Outputs timestamps according to [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). The point in time of 04:28 PM and 52 seconds on March 18, 2009 in Eastern Standard Time would be represented as `2009-03-18 16:28:52-05`. For `timestamp` columns, which do not include the time zone, the `-05` would be omitted. Rather than separating the date and time components with a capital 'T', as ISO 8601 defines, PostgreSQL uses a space to delimit these fields. +- **SQL**: The traditional SQL date format. The point in time of 04:28 PM and 52 seconds on March 18, 2009 in Eastern Standard Time would be represented as `03/18/2009 16:28:52.00 EST`. For `timestamp` columns, which do not include the time zone, the `EST` would be omitted. +- **Postgres**: Resembles the format used by the Unix `date` command. The point in time of 04:28 PM and 52 seconds on March 18, 2009 in Eastern Standard Time would be represented as `Wed Mar 18 16:28:52 2009 EST`. For `timestamp` columns, which do not include the time zone, the `EST` would be omitted. +- **German**: The German regional style. The point in time of 04:28 PM and 52 seconds on March 18, 2009 in Eastern Standard Time would be represented as `18.03.2009 16:28:52.00 EST`. For `timestamp` columns, which do not include the time zone, the `EST` would be omitted. ### Intervals -PostgreSQL can also store and work with values that represent temporal intervals. These basically describe the amount of time between two specific timestamps. +PostgreSQL can also store and work with values that represent temporal intervals. These basically describe the amount of time between two specific timestamps. -Like `time`, `timestamp` and `timestamptz`, the `interval` data type can represent time differences to the microsecond level. Again, the `(p)` argument is used to represent the amount of precision, or decimal places, the number will record, which can range from zero to six: +Like `time`, `timestamp` and `timestamptz`, the `interval` data type can represent time differences to the microsecond level. Again, the `(p)` argument is used to represent the amount of precision, or decimal places, the number will record, which can range from zero to six: ``` INTERVAL (3) ``` -While the `(p)` argument affects how fractions of seconds are stored, the `interval` type has another argument that can alter the amount of specificity more generally. By providing one of the follow values when defining an `interval` column, you can control the level of detail by limiting the fields used to store interval data: +While the `(p)` argument affects how fractions of seconds are stored, the `interval` type has another argument that can alter the amount of specificity more generally. 
By providing one of the follow values when defining an `interval` column, you can control the level of detail by limiting the fields used to store interval data: -* `YEAR`: Store only the number of years -* `MONTH`: Store only the number of months -* `DAY`: Store only the number of days -* `HOUR`: Store only the number of hours -* `MINUTE`: Store only the number of minutes -* `SECOND`: Store only the number of seconds -* `YEAR TO MONTH`: Store only the number of years and months -* `DAY TO HOUR`: Store only the number of days and hours -* `DAY TO MINUTE`: Store only the number of days, hours, and minutes -* `DAY TO SECOND`: Store only the number of days, hours, minutes, and seconds -* `HOUR TO MINUTE`: Store only the number of hours and minutes -* `HOUR TO SECOND`: Store only the number of hours, minutes, and seconds -* `MINUTE TO SECOND`: Store only the number of minutes and seconds +- `YEAR`: Store only the number of years +- `MONTH`: Store only the number of months +- `DAY`: Store only the number of days +- `HOUR`: Store only the number of hours +- `MINUTE`: Store only the number of minutes +- `SECOND`: Store only the number of seconds +- `YEAR TO MONTH`: Store only the number of years and months +- `DAY TO HOUR`: Store only the number of days and hours +- `DAY TO MINUTE`: Store only the number of days, hours, and minutes +- `DAY TO SECOND`: Store only the number of days, hours, minutes, and seconds +- `HOUR TO MINUTE`: Store only the number of hours and minutes +- `HOUR TO SECOND`: Store only the number of hours, minutes, and seconds +- `MINUTE TO SECOND`: Store only the number of minutes and seconds -When specifying both the fields to store and precision, the fields come first. Therefore, to create a column with 5 digits of precision that only stores the days, hours, minutes, and seconds of a given interval, you could type: +When specifying both the fields to store and precision, the fields come first. Therefore, to create a column with 5 digits of precision that only stores the days, hours, minutes, and seconds of a given interval, you could type: ``` INTERVAL DAY TO SECOND (5) ``` + Be aware that you can only specify precision if your declared field ranges include the seconds value, since this is the only case where this argument matters. -There are a number of different ways to format input when adding values to an `interval` column. The most straightforward way is to specify the amount and the unit of each column or field you wish to provide, separated by a space. For example: +There are a number of different ways to format input when adding values to an `interval` column. The most straightforward way is to specify the amount and the unit of each column or field you wish to provide, separated by a space. For example: ``` 7 days 3 hours 27 minutes 8 seconds @@ -392,48 +391,48 @@ The interval described above can also be represented without units by providing 7 3:27:08 ``` -Similarly, intervals that only express years and months can be represented by a year, a dash, and a month. So 38 years and 4 months would look like: +Similarly, intervals that only express years and months can be represented by a year, a dash, and a month. So 38 years and 4 months would look like: ``` 38-4 ``` -PostgreSQL also understands abbreviated input based on [ISO 8601 timestamps](https://en.wikipedia.org/wiki/ISO_8601), which can be used to represent intervals that use a greater number of fields. There are two separate input formats based on this standard. 
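As a rough illustration of the interval declarations and input formats discussed above (the `task_log` table and its values are hypothetical, not taken from this guide's examples):

```sql
-- A column restricted to day-through-second fields with 3 fractional digits
CREATE TABLE task_log (
    task_name text,
    duration  interval DAY TO SECOND (3)
);

-- The same interval written in two of the accepted input styles
INSERT INTO task_log (task_name, duration)
VALUES
    ('verbose units', '7 days 3 hours 27 minutes 8 seconds'),
    ('abbreviated',   '7 3:27:08');

-- Both rows store the same value
SELECT task_name, duration FROM task_log;
```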
+PostgreSQL also understands abbreviated input based on [ISO 8601 timestamps](https://en.wikipedia.org/wiki/ISO_8601), which can be used to represent intervals that use a greater number of fields. There are two separate input formats based on this standard. The first uses the following unit abbreviations, represented by the bold component of the following fields: -* **Y**ears -* **M**onths -* **W**eeks -* **D**ays -* **H**ours -* **M**inutes -* **S**econds +- **Y**ears +- **M**onths +- **W**eeks +- **D**ays +- **H**ours +- **M**inutes +- **S**econds -You may notice that **M** is used to label both months and minutes. In this format, the date component is separated from the time component by a "**T**". If the M appears before the T, it is interpreted as months; if it occurs after the T, it means minutes. +You may notice that **M** is used to label both months and minutes. In this format, the date component is separated from the time component by a "**T**". If the M appears before the T, it is interpreted as months; if it occurs after the T, it means minutes. -Formats based on ISO 8601 begin with a **P** and then contain the interval string. So to represent an interval of 4 years, 2 months, 7 days, 3 hours, 27 minutes, and 8 seconds, the following string would work: +Formats based on ISO 8601 begin with a **P** and then contain the interval string. So to represent an interval of 4 years, 2 months, 7 days, 3 hours, 27 minutes, and 8 seconds, the following string would work: ``` P4Y2M7DT3H27M8S ``` -The other ISO 8601 format does not use unit abbreviations at all. Instead, it separates the date components with dashes and the time components with colons. Again, the string begins with a P and separates the date and time components with a T. The same interval expressed earlier could be written as: +The other ISO 8601 format does not use unit abbreviations at all. Instead, it separates the date components with dashes and the time components with colons. Again, the string begins with a P and separates the date and time components with a T. The same interval expressed earlier could be written as: ``` P4-2-7T3:27:8 ``` -PostgreSQL can output values from `interval` columns in several formats. The output style is determined by the `intervalstyle` setting, which can be one of the following: +PostgreSQL can output values from `interval` columns in several formats. The output style is determined by the `intervalstyle` setting, which can be one of the following: -* **postgres**: The default style. This format uses the units `years`, `mons`, and `days`, each separated by a space to represent date components. It uses `HH:MM:SS` to represent time components. -* **postgres_verbose**: This format begins with an ampersand (@). It includes units or unit abbreviations for all fields, separated by spaces: `years`, `mons`, `days`, `hours`, `mins`, and `secs`. -* **sql_standard**: Follows the SQL standard output spec. Years and months are separated by a dash: `YYYY-MM`. Afterwards, day and time components are represented by an independent day field and a `HH:MM:SS` time field. The complete representation would look like: `YYYY-MM D HH:MM:SS` -* **iso_8601**: This style produces output with ISO 8601's "format with designators" (the first ISO 8601 style described above). Replacing the pound signs with actual values, the output would look like this: `P#Y#M#DT#H#M#S` +- **postgres**: The default style. This format uses the units `years`, `mons`, and `days`, each separated by a space to represent date components. 
It uses `HH:MM:SS` to represent time components. +- **postgres_verbose**: This format begins with an ampersand (@). It includes units or unit abbreviations for all fields, separated by spaces: `years`, `mons`, `days`, `hours`, `mins`, and `secs`. +- **sql_standard**: Follows the SQL standard output spec. Years and months are separated by a dash: `YYYY-MM`. Afterwards, day and time components are represented by an independent day field and a `HH:MM:SS` time field. The complete representation would look like: `YYYY-MM D HH:MM:SS` +- **iso_8601**: This style produces output with ISO 8601's "format with designators" (the first ISO 8601 style described above). Replacing the pound signs with actual values, the output would look like this: `P#Y#M#DT#H#M#S` ## Other useful types -Along with the types we covered with some depth above, there are additional types that are useful in specific scenarios. We'll cover these briefly to give you an idea of how to use them and when they may be useful. +Along with the types we covered with some depth above, there are additional types that are useful in specific scenarios. We'll cover these briefly to give you an idea of how to use them and when they may be useful. ### Does PostgreSQL support user defined types? @@ -441,9 +440,9 @@ PostgreSQL supports user defined types in a few different ways. #### Enumerated types -PostgreSQL enumerated types are user-defined types that have a set number of valid values. This functions similar to a drop down menu in that a choice can be made from a specific set of options. For example, an `enum` type called `season` could be created with the values `winter`, `spring`, `summer`, and `autumn`. +PostgreSQL enumerated types are user-defined types that have a set number of valid values. This functions similar to a drop down menu in that a choice can be made from a specific set of options. For example, an `enum` type called `season` could be created with the values `winter`, `spring`, `summer`, and `autumn`. -To use an `enum` type as a column, you must first define it to declare its name and range of values. You can create the `season` type we described above by typing: +To use an `enum` type as a column, you must first define it to declare its name and range of values. You can create the `season` type we described above by typing: ``` CREATE TYPE season AS ENUM ('winter', 'spring', 'summer', 'autumn'); @@ -459,15 +458,15 @@ SEASON #### Other user defined types -Other types can also be defined with the `CREATE TYPE` command. These include: +Other types can also be defined with the `CREATE TYPE` command. These include: -* **Composite types:** Composite types are types that are defined as a combination of two or more different types. For instance, you could create an `event` type that combines a geographical location and a timestamp to pinpoint a specific time and place. -* **Range types:** Range types include a valid range for a specified data type. While PostgreSQL includes [some range types](https://www.postgresql.org/docs/current/rangetypes.html#RANGETYPES-BUILTIN) by default, the `CREATE TYPE` command allows you to create your own. -* **Base types:** Base types are used to define a completely new type of data that isn't reliant on modifying existing types. To do this, you'll need to code up type functions in C to show PostgreSQL how to input, output, and process the data. +- **Composite types:** Composite types are types that are defined as a combination of two or more different types. 
For instance, you could create an `event` type that combines a geographical location and a timestamp to pinpoint a specific time and place. +- **Range types:** Range types include a valid range for a specified data type. While PostgreSQL includes [some range types](https://www.postgresql.org/docs/current/rangetypes.html#RANGETYPES-BUILTIN) by default, the `CREATE TYPE` command allows you to create your own. +- **Base types:** Base types are used to define a completely new type of data that isn't reliant on modifying existing types. To do this, you'll need to code up type functions in C to show PostgreSQL how to input, output, and process the data. ### UUID -[Universally Unique Identifiers](https://en.wikipedia.org/wiki/Universally_unique_identifier), or UUIDs, are 128-bit numbers used to distinctively identify pieces of information. They are used in many different contexts in order to assign a global identifier that is extremely unlikely to be assigned elsewhere. PostgreSQL includes the `uuid` type to work with these values: +[Universally Unique Identifiers](https://en.wikipedia.org/wiki/Universally_unique_identifier), or UUIDs, are 128-bit numbers used to distinctively identify pieces of information. They are used in many different contexts in order to assign a global identifier that is extremely unlikely to be assigned elsewhere. PostgreSQL includes the `uuid` type to work with these values: ``` UUID @@ -479,32 +478,32 @@ UUIDs have 32 digits, and are generally visually separated into five groups with ########-####-####-####-############ ``` -Each placeholder contains a hexadecimal digit (0 through 9, plus the lower case letters "a" through "f"). PostgreSQL uses this standard format for output. +Each placeholder contains a hexadecimal digit (0 through 9, plus the lower case letters "a" through "f"). PostgreSQL uses this standard format for output. For input, PostgreSQL understands a number of formats including using upper case letters, different digit groupings, no digit groupings, and surrounding the UUID with curly brackets. ### JSON -PostgreSQL supports columns in [JSON](https://en.wikipedia.org/wiki/JSON) using the `json` and `jsonb` format. Data stored as `json` is stored as-is, while data stored with `jsonb` is interpreted and processed and stored in binary for faster execution and processing. PostgreSQL can also index `jsonb` columns for better performance. In general, `jsonb` is recommended for JSON data for this reason: +PostgreSQL supports columns in [JSON](https://en.wikipedia.org/wiki/JSON) using the `json` and `jsonb` format. Data stored as `json` is stored as-is, while data stored with `jsonb` is interpreted and processed and stored in binary for faster execution and processing. PostgreSQL can also index `jsonb` columns for better performance. In general, `jsonb` is recommended for JSON data for this reason: ``` JSONB ``` -There are some slight differences between the two formats. The `json` type preserves incidental white space, key ordering, and duplicate keys. The `jsonb` format removes insignificant white space, overwrites duplicate keys, and provides no key ordering guarantees. +There are some slight differences between the two formats. The `json` type preserves incidental white space, key ordering, and duplicate keys. The `jsonb` format removes insignificant white space, overwrites duplicate keys, and provides no key ordering guarantees. 
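As a small, hedged sketch of how a `jsonb` column might be used in practice (the `events` table and its payload are illustrative only):

```sql
-- A table with a binary-stored JSON column
CREATE TABLE events (
    id      serial PRIMARY KEY,
    payload jsonb
);

INSERT INTO events (payload)
VALUES ('{"type": "signup", "user": {"name": "Sue"}, "tags": ["new", "beta"]}');

-- ->> extracts a field as text; @> tests jsonb containment
SELECT id, payload ->> 'type' AS event_type
FROM events
WHERE payload @> '{"tags": ["beta"]}';
```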
-PostgreSQL includes [JSON operators](https://www.postgresql.org/docs/current/functions-json.html), can [index `jsonb` columns](https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), test whether [`jsonb` objects contain other `jsonb` objects](https://www.postgresql.org/docs/current/datatype-json.html#JSON-CONTAINMENT), and can [transform values to data types used in different languages](https://www.postgresql.org/docs/current/datatype-json.html#id-1.5.7.22.19). These are outside of the scope of this guide, but will be covered in a future article on working with JSON with PostgreSQL. +PostgreSQL includes [JSON operators](https://www.postgresql.org/docs/current/functions-json.html), can [index `jsonb` columns](https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), test whether [`jsonb` objects contain other `jsonb` objects](https://www.postgresql.org/docs/current/datatype-json.html#JSON-CONTAINMENT), and can [transform values to data types used in different languages](https://www.postgresql.org/docs/current/datatype-json.html#id-1.5.7.22.19). These are outside of the scope of this guide, but will be covered in a future article on working with JSON with PostgreSQL. ## Conclusion -We've covered many of the most common data types used in PostgreSQL in this guide. While these provide a good starting point for data types in PostgreSQL, [additional types are available](https://www.postgresql.org/docs/current/datatype.html) to store other forms of data. Using the most appropriate types for your data allows you to use the database system to validate and operate on your data easily. +We've covered many of the most common data types used in PostgreSQL in this guide. While these provide a good starting point for data types in PostgreSQL, [additional types are available](https://www.postgresql.org/docs/current/datatype.html) to store other forms of data. Using the most appropriate types for your data allows you to use the database system to validate and operate on your data easily. -Understanding data types is essential when [designing schemas and tables](/postgresql/create-and-delete-databases-and-tables#create-tables-within-databases) in PostgreSQL. It also affects how to interact with the database from your applications, as the type system influences how data must be formatted and how it may be expressed when outputted. Learning about the options available within PostgreSQL and the side effects your choices might have is the best way to plan ahead when designing your data structures. +Understanding data types is essential when [designing schemas and tables](/postgresql/create-and-delete-databases-and-tables#create-tables-within-databases) in PostgreSQL. It also affects how to interact with the database from your applications, as the type system influences how data must be formatted and how it may be expressed when outputted. Learning about the options available within PostgreSQL and the side effects your choices might have is the best way to plan ahead when designing your data structures. -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to work with your PostgreSQL databases, you can find a mapping between some of the common PostgreSQL and Prisma types in [Prisma's PostgreSQL data connectors docs](https://www.prisma.io/docs/concepts/database-connectors/postgresql#type-mapping-between-postgresql-to-prisma-schema). 
+If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to work with your PostgreSQL databases, you can find a mapping between some of the common PostgreSQL and Prisma types in [Prisma's PostgreSQL data connectors docs](https://www.prisma.io/docs/orm/overview/databases/postgresql#type-mapping-between-postgresql-to-prisma-schema). -In the data model used by Prisma schema, data types are represented by [field types](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model#defining-fields). You might also want to take a look at our guide on [mapping fields to a specific native type](https://www.prisma.io/docs/concepts/components/prisma-migrate/supported-types-and-db-features#mapping-fields-to-a-specific-native-type). +In the data model used by Prisma schema, data types are represented by [field types](https://www.prisma.io/docs/orm/prisma-schema/data-model/models#defining-fields). You might also want to take a look at our guide on [mapping fields to a specific native type](https://www.prisma.io/docs/orm/prisma-migrate/workflows/native-database-types#mapping-fields-to-a-specific-native-type). diff --git a/content/04-postgresql/10-column-and-table-constraints.mdx b/content/04-postgresql/10-column-and-table-constraints.mdx index 188b6355..5bfa70bc 100644 --- a/content/04-postgresql/10-column-and-table-constraints.mdx +++ b/content/04-postgresql/10-column-and-table-constraints.mdx @@ -199,12 +199,6 @@ DETAIL: Failing row contains (A poor film, Misguided director, 2019-07-16, 128, In this case, the film has satisfied every condition except for the number of votes required. PostgreSQL rejects the submission since it does not pass the final table check constraint. - - -If you are using Prisma, you can learn how to use check constraints with our [PostgreSQL data validation guide](https://www.prisma.io/docs/guides/other/advanced-database-tasks/data-validation/postgresql). - - - ### Not null constraints The `NOT NULL` constraint is much more focused. It guarantees that values within a column are not null. While this is a simple constraint, it is used very frequently. @@ -250,7 +244,7 @@ CREATE TABLE national_capitals ( -When working with Prisma Client, you can control whether each field is [optional or mandatory](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model#optional-and-mandatory-fields) to get equivalent functionality to the `NOT NULL` constraint in PostgreSQL. +When working with Prisma Client, you can control whether each field is [optional or mandatory](https://www.prisma.io/docs/orm/prisma-schema/data-model/models#optional-and-mandatory-fields) to get equivalent functionality to the `NOT NULL` constraint in PostgreSQL. @@ -369,7 +363,7 @@ DETAIL: Key (country, capital)=(Bolivia, Sucre) already exists. -If you are using Prisma, you can define a [unique field](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model#defining-a-unique-field) in your Prisma schema. +If you are using Prisma, you can define a [unique field](https://www.prisma.io/docs/orm/prisma-schema/data-model/models#defining-a-unique-field) in your Prisma schema. @@ -429,7 +423,7 @@ CREATE TABLE national_capitals ( -When using Prisma, a primary key is synonymous with an [id field](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model#defining-an-id-field). +When using Prisma, a primary key is synonymous with an [id field](https://www.prisma.io/docs/orm/prisma-schema/data-model/models#defining-an-id-field). 
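As a hedged sketch of the primary key behavior referenced above, reusing the guide's `national_capitals` example (the exact column definitions are assumed here):

```sql
-- A composite primary key: each country/capital pair must be unique and non-null
CREATE TABLE national_capitals (
    country text,
    capital text,
    PRIMARY KEY (country, capital)
);

INSERT INTO national_capitals (country, capital) VALUES ('Bolivia', 'Sucre');

-- Re-inserting the same pair violates the primary key and is rejected
INSERT INTO national_capitals (country, capital) VALUES ('Bolivia', 'Sucre');
```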
@@ -528,7 +522,7 @@ CREATE TABLE example ( -We cover how to define [relations](https://www.prisma.io/docs/concepts/components/prisma-schema/relations) in the Prisma schema in our documentation. +We cover how to define [relations](https://www.prisma.io/docs/orm/prisma-schema/data-model/relations) in the Prisma schema in our documentation. diff --git a/content/04-postgresql/11-date-types.mdx b/content/04-postgresql/11-date-types.mdx index 7e75b6bd..e5e84920 100644 --- a/content/04-postgresql/11-date-types.mdx +++ b/content/04-postgresql/11-date-types.mdx @@ -1,7 +1,7 @@ --- title: 'Working with dates in PostgreSQL' -metaTitle: "PostgreSQL Date Types - Format, Functions, and More" -metaDescription: "Read on to learn about how to work with date types in PostgreSQL, including information on formats, functions, and more." +metaTitle: 'PostgreSQL Date Types - Format, Functions, and More' +metaDescription: 'Read on to learn about how to work with date types in PostgreSQL, including information on formats, functions, and more.' metaImage: '/social/generic-postgresql.png' authors: ['alexemerich'] --- @@ -12,8 +12,6 @@ The ability to store date values inside of your database allows you to add a tim In this guide, we are going to discuss storing `DATE` types in PostgreSQL and the various ways that you can work with them. - - ## PostgreSQL `DATE` data type The [`DATE` type in PostgreSQL](https://www.prisma.io/dataguide/postgresql/introduction-to-data-types#dates-and-time) can store a date without an associated time value: @@ -33,7 +31,7 @@ CREATE TABLE checkouts ( author_id serial PRIMARY KEY, author_name VARCHAR (255) NOT NULL, book_title VARCHAR (255) NOT NULL, - published_date DATE NOT NULL, + published_date DATE NOT NULL, last_checkout DATE NOT NULL DEFAULT CURRENT_DATE ); ``` @@ -50,20 +48,16 @@ Then when querying the `checkouts` table, we get the following: ```sql SELECT * FROM checkouts; - author_id | author_name | book_title | published_date | last_checkout + author_id | author_name | book_title | published_date | last_checkout -----------+-------------+------------+----------------+--------------- 1 | James Joyce | Ulysses | 1922-02-02 | 2021-09-27 (1 row) ``` - - ## PostgreSQL `DATE` functions By knowing the ins and outs of the `DATE` type in PostgreSQL, you are then able to use functions working with the information that you store. We'll walk through some common functions building off of the table introduced in the prior section. - - ### Get the current date In PostgreSQL, you can get the current date and time by using the built-in `NOW()` function. The following statement will return both the day and time: @@ -71,7 +65,7 @@ In PostgreSQL, you can get the current date and time by using the built-in `NOW( ```sql SELECT NOW(); - now + now ------------------------------- 2021-09-27 15:22:53.679985+02 (1 row) @@ -82,7 +76,7 @@ If the time is not of interest, you can also specify to only return the date wit ```sql SELECT NOW()::date; - now + now ------------ 2021-09-27 (1 row) @@ -93,15 +87,13 @@ Using `CURRENT_DATE` is another way to get the current date as demonstrated belo ```sql SELECT CURRENT_DATE; - current_date + current_date -------------- 2021-09-27 (1 row) ``` -All three of these options will return you the date in the `yyyy-mm-dd` format. Within PostgreSQL, you can adjust the format of this output if desired. - - +All three of these options will return you the date in the `yyyy-mm-dd` format. Within PostgreSQL, you can adjust the format of this output if desired. 
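To compare the three approaches just mentioned side by side, here is a small self-contained query (the output will reflect the moment and time zone in which it runs):

```sql
SELECT
    NOW()        AS now_with_time,     -- full timestamp with time zone
    NOW()::date  AS now_cast_to_date,  -- timestamp cast down to a date
    CURRENT_DATE AS current_date_only; -- SQL-standard way to get today's date
```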
### Output a date value in a specific format @@ -113,7 +105,7 @@ To output a date value in a specific format, you use the `TO_CHAR()` function. T ```sql SELECT TO_CHAR(NOW()::date, 'dd/mm/yyyy'); - to_char + to_char ------------ 27/09/2021 (1 row) @@ -124,7 +116,7 @@ You can also display the date in a format like `Sep 27, 2021`: ```sql SELECT TO_CHAR(NOW():: DATE, 'Mon dd, yyyy'); - to_char + to_char -------------- Sep 27, 2021 (1 row) @@ -132,8 +124,6 @@ SELECT TO_CHAR(NOW():: DATE, 'Mon dd, yyyy'); Depending on the requirements of a system, you may need a date formatted in a specific way. This is a scenario where being able to specify the output in PostgreSQL is useful. - - ### Get the interval between two dates PostgreSQL allows you to get the [interval](https://www.postgresqltutorial.com/postgresql-interval/) between two dates using the `-` operator. Using this operator allows you to calculate things like the tenure of an employee or time since the publishing of a book. @@ -152,14 +142,12 @@ FROM Resulting in: ```sql - author_name | book_title | diff + author_name | book_title | diff -------------+------------+---------------------------- - James Joyce | Ulysses | 36397 days + James Joyce | Ulysses | 36397 days (1 row) ``` - - ### Calculating age using date values We can continue with the same example to calculate the age at the current date in years, months, and days using the `AGE()` function. The following statement uses the `AGE()` function to calculate the age of a publication from our library `checkouts` tables: @@ -176,7 +164,7 @@ FROM With this function we can calculate how old a book in inventory is: ```sql - author_name | book_title | age + author_name | book_title | age -------------+------------+------------------------- James Joyce | Ulysses | 99 years 7 mons 25 days (1 row) @@ -197,14 +185,12 @@ FROM Resulting in: ```sql - author_name | book_title | age + author_name | book_title | age -------------+------------+-------------------------- James Joyce | Ulysses | 77 years 10 mons 27 days (1 row) ``` - - ### Extracting year, quarter, month, week, or day from a date value The last function that we are going to cover is the `EXTRACT()` function in PostgreSQL that allows you to separate the components of date like the year, quarter, month, and day. @@ -212,20 +198,20 @@ The last function that we are going to cover is the `EXTRACT()` function in Post The following statement pulls out the year, month, and day from the published date of Ulysses: ```sql -SELECT - author_name, - book_title, +SELECT + author_name, + book_title, EXTRACT(YEAR FROM published_date) AS YEAR, - EXTRACT(MONTH FROM published_date) AS MONTH, - EXTRACT(DAY FROM published_date) AS DAY -FROM + EXTRACT(MONTH FROM published_date) AS MONTH, + EXTRACT(DAY FROM published_date) AS DAY +FROM checkouts; ``` The results will look like the following: ```sql - author_name | book_title | year | month | day + author_name | book_title | year | month | day -------------+------------+------+-------+----- James Joyce | Ulysses | 1922 | 2 | 2 (1 row) @@ -233,15 +219,13 @@ The results will look like the following: This is a useful function to be aware of when you may only need a portion of a date value for a calculation with your data for example. - - ## Conclusion -In this guide, we covered the basics of what you can do with the `DATE` data type in PostgreSQL. It is important to know how date data works inside of your database. 
Having a grasp on the ways you can access it and operate on it allows you to make age calculations, execute extractions in your queries, and also configure your output if necessary for matching another system's requirements. +In this guide, we covered the basics of what you can do with the `DATE` data type in PostgreSQL. It is important to know how date data works inside of your database. Having a grasp on the ways you can access it and operate on it allows you to make age calculations, execute extractions in your queries, and also configure your output if necessary for matching another system's requirements. -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) with PostgreSQL, here's some information about how Prisma Client translates [PostgreSQL's date type](https://www.prisma.io/docs/reference/api-reference/prisma-schema-reference#postgresql-6). +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) with PostgreSQL, here's some information about how Prisma Client translates [PostgreSQL's date type](https://www.prisma.io/docs/orm/reference/prisma-schema-reference#postgresql-6). @@ -255,12 +239,12 @@ The date format for the date data type in PostgreSQL is `yyyy-mm-dd`. This is th
What is the `DATE_PART()` function in PostgreSQL?

-The `DATE_PART()` function in PostgreSQL is used to subquery for subfields from a date or time value.
+The `DATE_PART()` function in PostgreSQL is used to retrieve subfields, such as the year or hour, from a date or time value.

The basic syntax looks like:

```sql
-SELECT DATE_PART(field, source);
+SELECT DATE_PART(field, source);
```

For example, you can write the following and return the hour, `20`:

@@ -278,9 +262,9 @@ The `TO_DATE()` function in PostgreSQL can be used to convert a string of text i

The basic syntax looks as follows:

```sql
-TO_DATE(text, format);
+TO_DATE(text, format);
```
-
+
An example such as:

```sql
@@ -315,7 +299,7 @@ This statement’s return will be `2022`.

You can truncate a timestamp in PostgreSQL by using the `DATE_TRUNC()` function. This function truncates a `TIMESTAMP` or `INTERVAL` based on a specified date part such as year, month, day, etc.

-The basic syntax looks as follows:
+The basic syntax looks as follows:

```sql
DATE_TRUNC('datepart', field);
```

An example of this function in action:

```sql
DATE_TRUNC('hour', TIMESTAMP '2022-03-17 02:09:30');
```

The return for this statement would be `2022-03-17 02:00:00`.
+
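Pulling the FAQ answers above together, a hedged example showing that `DATE_PART()`, `EXTRACT()`, and `DATE_TRUNC()` operate on the same timestamp (the literal below is chosen only for illustration):

```sql
SELECT
    DATE_PART('hour', TIMESTAMP '2022-03-17 20:09:30')  AS via_date_part,   -- 20
    EXTRACT(HOUR FROM TIMESTAMP '2022-03-17 20:09:30')  AS via_extract,     -- 20
    DATE_TRUNC('hour', TIMESTAMP '2022-03-17 20:09:30') AS truncated_value; -- 2022-03-17 20:00:00
```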
diff --git a/content/04-postgresql/12-inserting-and-modifying-data/01-inserting-and-deleting-data.mdx b/content/04-postgresql/12-inserting-and-modifying-data/01-inserting-and-deleting-data.mdx index a2855304..2d19c45b 100644 --- a/content/04-postgresql/12-inserting-and-modifying-data/01-inserting-and-deleting-data.mdx +++ b/content/04-postgresql/12-inserting-and-modifying-data/01-inserting-and-deleting-data.mdx @@ -1,20 +1,20 @@ --- title: 'How to insert and delete data in PostgreSQL' metaTitle: 'How to insert, update, and delete data in PostgreSQL tables' -metaDescription: "The `INSERT` and `DELETE` commands are the primary way that PostgreSQL adds and removes records from tables. This guide demonstrates how to use them to control the data your tables manage." +metaDescription: 'The `INSERT` and `DELETE` commands are the primary way that PostgreSQL adds and removes records from tables. This guide demonstrates how to use them to control the data your tables manage.' metaImage: '/social/generic-postgresql.png' authors: ['justinellingwood'] --- ## Introduction -Adding and removing records from tables are some of the most common operations that databases perform. Adding data involves specifying the [table](/intro/database-glossary#table) and [column](/intro/database-glossary#column) names you wish to add values to as well as the values you wish to enter into each fields. Deleting records involves identifying the correct row or rows and removing them from the table. +Adding and removing records from tables are some of the most common operations that databases perform. Adding data involves specifying the [table](/intro/database-glossary#table) and [column](/intro/database-glossary#column) names you wish to add values to as well as the values you wish to enter into each fields. Deleting records involves identifying the correct row or rows and removing them from the table. -In this guide, we will cover how to use the SQL `INSERT` and `DELETE` commands with PostgreSQL. This includes the basic syntax, how to return data information about the data that was processed, and how to add or remove multiple rows in a single statement. +In this guide, we will cover how to use the SQL `INSERT` and `DELETE` commands with PostgreSQL. This includes the basic syntax, how to return data information about the data that was processed, and how to add or remove multiple rows in a single statement. ## Reviewing the table's structure -Before using the `INSERT` command, you must know the table's structure so that you can accommodate the requirements imposed by the table's columns, [data types](/intro/database-glossary#data-type), and [constraints](/intro/database-glossary#constraint). There are a few different ways of doing this depending on your database client. +Before using the `INSERT` command, you must know the table's structure so that you can accommodate the requirements imposed by the table's columns, [data types](/intro/database-glossary#data-type), and [constraints](/intro/database-glossary#constraint). There are a few different ways of doing this depending on your database client. If you are using the `psql` command line client, the most straightforward way to find this information is to use the `\d+` meta command built into the tool. 
@@ -23,6 +23,7 @@ For instance, to find the structure of a table called `employee`, you would type ```sql no-lines \d+ employee ``` + ``` Table "public.employee" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description @@ -41,12 +42,13 @@ Access method: heap The output displays the table's column names, data types, and default values, among others. -The `\d+` meta command is only available with the `psql` client, so if you are using a different client, you might have to query the table information directly. You can get most of the relevant information with a query like this: +The `\d+` meta command is only available with the `psql` client, so if you are using a different client, you might have to query the table information directly. You can get most of the relevant information with a query like this: ```sql SELECT column_name, data_type, column_default, is_nullable, character_maximum_length FROM information_schema.columns WHERE table_name ='employee'; ``` + ``` column_name | data_type | column_default | is_nullable | character_maximum_length -------------+-----------------------------+-----------------------------------------------+-------------+-------------------------- @@ -61,7 +63,7 @@ These should give you a good idea of the table's structure so that you can inser ## Using `INSERT` to add new records to tables -The SQL `INSERT` command is used to add rows of data to an existing table. Once you know the table's structure, you can construct a command that matches the table's columns with the corresponding values you wish to insert for the new record. +The SQL `INSERT` command is used to add rows of data to an existing table. Once you know the table's structure, you can construct a command that matches the table's columns with the corresponding values you wish to insert for the new record. The basic syntax of the command looks like this: @@ -84,15 +86,17 @@ As an example, to insert a new employee into the `employee` table listed above, INSERT INTO employee(first_name, last_name) VALUES ('Bob', 'Smith'); ``` + ``` INSERT 0 1 ``` -Here, we provide values for the `first_name` and `last_name` columns while leaving the other columns to be populated by their default values. If you query the table, you can see that the new record has been added: +Here, we provide values for the `first_name` and `last_name` columns while leaving the other columns to be populated by their default values. If you query the table, you can see that the new record has been added: ```sql SELECT * FROM employee; ``` + ``` employee_id | first_name | last_name | last_update -------------+------------+-----------+---------------------------- @@ -102,13 +106,13 @@ SELECT * FROM employee; -You can also use the Prisma Client to add data to your tables by issuing a [create query](https://www.prisma.io/docs/concepts/components/prisma-client/crud#create). +You can also use the Prisma Client to add data to your tables by issuing a [create query](https://www.prisma.io/docs/orm/prisma-client/queries/crud#create). ## Returning data from `INSERT` statements -If you want additional information about the data that was added to the table, you can include the `RETURNING` clause at the end of your statement. The `RETURNING` clause specifies the columns to display of the records that were just inserted. +If you want additional information about the data that was added to the table, you can include the `RETURNING` clause at the end of your statement. 
The `RETURNING` clause specifies the columns to display of the records that were just inserted. For instance, to display all of the columns for the records that were just inserted, you could type something like this: @@ -117,6 +121,7 @@ INSERT INTO my_table(column_name, column_name_2) VALUES ('value', 'value2') RETURNING *; ``` + ``` column_name | column_name_2 -------------+--------------- @@ -133,6 +138,7 @@ INSERT INTO employee(first_name, last_name) VALUES ('Sue', 'Berns') RETURNING *; ``` + ``` employee_id | first_name | last_name | last_update -------------+------------+-----------+-------------------------- @@ -142,15 +148,16 @@ RETURNING *; INSERT 0 1 ``` -You can also choose to return only specific columns from insertions. For instance, here, we only are interested in the new employee's ID: +You can also choose to return only specific columns from insertions. For instance, here, we only are interested in the new employee's ID: ```sql INSERT INTO employee(first_name, last_name) VALUES ('Delores', 'Muniz') RETURNING employee_id; ``` + ``` - employee_id + employee_id ------------- 3 (1 row) @@ -165,6 +172,7 @@ INSERT INTO employee(first_name, last_name) VALUES ('Simone', 'Kohler') RETURNING employee_id AS "Employee ID"; ``` + ``` Employee ID ------------- @@ -176,7 +184,7 @@ INSERT 0 1 ## Using `INSERT` to add multiple rows at once -Inserting records one statement at a time is more time consuming and less efficient than inserting multiple rows at once. PostgreSQL allows you to specify multiple rows to add to the same table. Each new row is encapsulated in parentheses, with each set of parentheses separated by commas. +Inserting records one statement at a time is more time consuming and less efficient than inserting multiple rows at once. PostgreSQL allows you to specify multiple rows to add to the same table. Each new row is encapsulated in parentheses, with each set of parentheses separated by commas. The basic syntax for multi-record insertion looks like this: @@ -198,13 +206,14 @@ VALUES ('Katie', 'Singh'), ('Felipe', 'Espinosa'); ``` + ``` INSERT 0 4 ``` ## Using `DELETE` to remove rows from tables -The SQL `DELETE` command is used to remove rows from tables, functioning as the complementary action to `INSERT`. In order to remove rows from a table, you must identify the rows you wish to target by providing match criteria within a `WHERE` clause. +The SQL `DELETE` command is used to remove rows from tables, functioning as the complementary action to `INSERT`. In order to remove rows from a table, you must identify the rows you wish to target by providing match criteria within a `WHERE` clause. The basic syntax looks like this: @@ -219,6 +228,7 @@ For instance, to every row in our `employee` table that has its `first_name` set DELETE FROM employee WHERE first_name = 'Abigail'; ``` + ``` DELETE 1 ``` @@ -227,7 +237,7 @@ The return value here indicates that the `DELETE` command was processed with a s -To remove data from your tables using Prisma Client, use a [delete query](https://www.prisma.io/docs/concepts/components/prisma-client/crud#delete). +To remove data from your tables using Prisma Client, use a [delete query](https://www.prisma.io/docs/orm/prisma-client/queries/crud#delete). 
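Before moving on to deletions, here is a hedged sketch tying together two of the `INSERT` features covered above, multi-row inserts and the `RETURNING` clause, against the same hypothetical `employee` table:

```sql
-- Insert several employees at once and get their generated IDs back in one statement
INSERT INTO employee (first_name, last_name)
VALUES
    ('Katie', 'Singh'),
    ('Felipe', 'Espinosa')
RETURNING employee_id, first_name, last_name;
```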
@@ -235,7 +245,6 @@ To remove data from your tables using Prisma Client, use a [delete query](https: As with the `INSERT` command, you can return the affected rows or specific columns from the deleted rows by adding a `RETURNING` clause: - ```sql DELETE FROM my_table WHERE @@ -249,6 +258,7 @@ DELETE FROM employee WHERE last_name = 'Smith' RETURNING *; ``` + ``` employee_id | first_name | last_name | last_update -------------+------------+-----------+---------------------------- @@ -269,8 +279,9 @@ DELETE FROM employee WHERE employee_id in (3,4) RETURNING *; ``` + ``` - employee_id | first_name | last_name | last_update + employee_id | first_name | last_name | last_update -------------+------------+-----------+---------------------------- 3 | Delores | Muniz | 2020-08-19 21:17:06.943608 4 | Simone | Kohler | 2020-08-19 21:19:19.298833 @@ -285,6 +296,7 @@ You can even leave out the `WHERE` clause to remove all of the rows from a given DELETE FROM employee RETURNING *; ``` + ``` employee_id | first_name | last_name | last_update -------------+------------+-----------+---------------------------- @@ -301,15 +313,15 @@ Be aware, however, that using `DELETE` to empty a table of data is [not as effic -Prisma Client uses a separate query called [deleteMany](https://www.prisma.io/docs/concepts/components/prisma-client/crud#deletemany) to delete multiple rows of data at one time. +Prisma Client uses a separate query called [deleteMany](https://www.prisma.io/docs/orm/prisma-client/queries/crud#deletemany) to delete multiple rows of data at one time. ## Conclusion -In this article, we introduced some of the most important commands to control what data is in your PostgreSQL tables. The `INSERT` command can be used to add new data to tables, while the `DELETE` command specifies which rows should be removed. Both commands are able to return the rows they affect and can operate on multiple rows at once. +In this article, we introduced some of the most important commands to control what data is in your PostgreSQL tables. The `INSERT` command can be used to add new data to tables, while the `DELETE` command specifies which rows should be removed. Both commands are able to return the rows they affect and can operate on multiple rows at once. -These two commands are the primary mechanisms used to manage increase or decrease the number of records your table contains. Getting a handle on their basic syntax as well as the ways that they can be combined with other clauses will allow you to populate and clean your tables as necessary. +These two commands are the primary mechanisms used to manage increase or decrease the number of records your table contains. Getting a handle on their basic syntax as well as the ways that they can be combined with other clauses will allow you to populate and clean your tables as necessary. ## FAQ @@ -341,7 +353,7 @@ VALUES
How do you check if a record exists before inserting in PostgreSQL? -One way to check if a record exists in PostgreSQL before inserting is by using the [`EXISTS` subquery expression](https://www.postgresql.org/docs/8.1/functions-subquery.html). +One way to check if a record exists in PostgreSQL before inserting is by using the [`EXISTS` subquery expression](https://www.postgresql.org/docs/8.1/functions-subquery.html). The `EXISTS` condition is used in combination with a subquery for the data you are checking for. It is considered to be met if the subquery returns at least one row. If no row is returned, then the record does not yet exist. diff --git a/content/04-postgresql/12-inserting-and-modifying-data/02-updating-existing-data.mdx b/content/04-postgresql/12-inserting-and-modifying-data/02-updating-existing-data.mdx index 75f3679f..38793cdc 100644 --- a/content/04-postgresql/12-inserting-and-modifying-data/02-updating-existing-data.mdx +++ b/content/04-postgresql/12-inserting-and-modifying-data/02-updating-existing-data.mdx @@ -1,16 +1,16 @@ --- title: 'How to update existing data in PostgreSQL' metaTitle: "How to update existing data in PostgreSQL | Prisma's Data Guide" -metaDescription: "The `UPDATE` command is the primary method of altering existing data within PostgreSQL. This guide demonstrates how to use the `UPDATE` operation to modify the values within your tables." +metaDescription: 'The `UPDATE` command is the primary method of altering existing data within PostgreSQL. This guide demonstrates how to use the `UPDATE` operation to modify the values within your tables.' metaImage: '/social/generic-postgresql.png' authors: ['justinellingwood'] --- ## Introduction -Records stored within databases are not often static. They must be updated to reflect changes in the systems they represent to remain relevant. PostgreSQL allows you to change the values in records using the `UPDATE` SQL command. +Records stored within databases are not often static. They must be updated to reflect changes in the systems they represent to remain relevant. PostgreSQL allows you to change the values in records using the `UPDATE` SQL command. -In many ways, `UPDATE` functions similar to `INSERT` (in that you specify columns and their desired values) and `DELETE` (in that you provide the criteria needed to target specific records). You can modify the data in any of the columns of a table either one at a time or in bulk. In this guide, we will explore how to use this command effectively to manage your data once it's already in tables. +In many ways, `UPDATE` functions similar to `INSERT` (in that you specify columns and their desired values) and `DELETE` (in that you provide the criteria needed to target specific records). You can modify the data in any of the columns of a table either one at a time or in bulk. In this guide, we will explore how to use this command effectively to manage your data once it's already in tables. 
## Using `UPDATE` to modify data @@ -27,11 +27,11 @@ WHERE As shown above, the basic structure involves three separate clauses: -* specifying a [table](/intro/database-glossary#table) to act on, -* providing the [columns](/intro/database-glossary#column) you wish to update as well as their new values, and -* defining any criteria PostgreSQL needs to evaluate to determine which records to match +- specifying a [table](/intro/database-glossary#table) to act on, +- providing the [columns](/intro/database-glossary#column) you wish to update as well as their new values, and +- defining any criteria PostgreSQL needs to evaluate to determine which records to match -In the basic template above, we demonstrated a style assigning values to columns directly. You can also use the column list syntax too, as is often seen in `INSERT` commands. +In the basic template above, we demonstrated a style assigning values to columns directly. You can also use the column list syntax too, as is often seen in `INSERT` commands. For instance, the example above could also be specified like this: @@ -51,13 +51,13 @@ UPDATE -To update data with Prisma Client, issue an [update query](https://www.prisma.io/docs/concepts/components/prisma-client/crud#update). +To update data with Prisma Client, issue an [update query](https://www.prisma.io/docs/orm/prisma-client/queries/crud#update). ## Returning records modified by the `UPDATE` command -Like many other commands, PostgreSQL allows you to append a `RETURNING` clause onto the `UPDATE` command. This causes the commands to return all or part of the records that were modified. +Like many other commands, PostgreSQL allows you to append a `RETURNING` clause onto the `UPDATE` command. This causes the commands to return all or part of the records that were modified. You can use the star `*` symbol to return all of the columns of the modified rows: @@ -87,9 +87,9 @@ Here, we also used a column alias to set the label of the column header in the o ## Updating records based on values in another table -Updates based on providing new external data are relatively straightforward. You just need to provide the table, the columns, the new values, and the targeting criteria. +Updates based on providing new external data are relatively straightforward. You just need to provide the table, the columns, the new values, and the targeting criteria. -However, you can also use `UPDATE` to conditionally update table values based on information stored in a joined table. The basic syntax looks like this: +However, you can also use `UPDATE` to conditionally update table values based on information stored in a joined table. The basic syntax looks like this: ```sql UPDATE table1 @@ -98,7 +98,7 @@ FROM table2 WHERE table1.column2 = table2.column2; ``` -Here, we are updating the value of `column1` in the `table1` table to ``, but only in rows where `column2` of `table1` match `column2` of `table2`. The `FROM` clause indicates a join between the two tables and `WHERE` construction specifies the join conditions. +Here, we are updating the value of `column1` in the `table1` table to ``, but only in rows where `column2` of `table1` match `column2` of `table2`. The `FROM` clause indicates a join between the two tables and `WHERE` construction specifies the join conditions. As an example, suppose that we have two tables called `film` and `director`. @@ -134,7 +134,7 @@ VALUES
-These two tables have a relation with `film.director_id` referencing `director.id`. Currently, the `latest_film` for the `director` table is `NULL`. However, we can populate it by with the director's latest film title using `FROM` and `WHERE` clauses to bring to bring the two tables together. +These two tables have a relation with `film.director_id` referencing `director.id`. Currently, the `latest_film` for the `director` table is `NULL`. However, we can populate it with the director's latest film title using `FROM` and `WHERE` clauses to bring the two tables together. Here, we use a `WITH` clause to create a Common Table Expression (CTE) called `latest_films` that we can reference in our `UPDATE` statement: @@ -156,6 +156,7 @@ If you query the `director` table, it should show you each director's latest fil ```sql SELECT * FROM director; ``` + ``` id | name | latest_film ----+-------+-------------- @@ -167,4 +168,4 @@ SELECT * FROM director; ## Conclusion -In this guide, we've taken a look at the basic ways that you can modify existing data within a table using the `UPDATE` command. Using these basic concepts, you can specify the exact criteria necessary to identify the existing rows within a table, update column names with new values, and optionally return the rows that were impacted. The `UPDATE` command is essential for managing your data after its initial ingestion into your databases. +In this guide, we've taken a look at the basic ways that you can modify existing data within a table using the `UPDATE` command. Using these basic concepts, you can specify the exact criteria necessary to identify the existing rows within a table, update column names with new values, and optionally return the rows that were impacted. The `UPDATE` command is essential for managing your data after its initial ingestion into your databases. diff --git a/content/04-postgresql/12-inserting-and-modifying-data/03-insert-on-conflict.mdx b/content/04-postgresql/12-inserting-and-modifying-data/03-insert-on-conflict.mdx index 27a5527b..aef24b0d 100644 --- a/content/04-postgresql/12-inserting-and-modifying-data/03-insert-on-conflict.mdx +++ b/content/04-postgresql/12-inserting-and-modifying-data/03-insert-on-conflict.mdx @@ -8,9 +8,9 @@ authors: ['justinellingwood'] ## Introduction -PostgreSQL lets you either add or modify a record within a table depending on whether the record already exists. This is commonly known as an ["upsert" operation](/intro/database-glossary#upsert) (a portmanteau of "insert" and "update"). +PostgreSQL lets you either add or modify a record within a table depending on whether the record already exists. This is commonly known as an ["upsert" operation](/intro/database-glossary#upsert) (a portmanteau of "insert" and "update"). -The actual implementation within PostgreSQL uses the `INSERT` command with a special `ON CONFLICT` clause to specify what to do if the record already exists within the table. You can specify whether you want the record to be updated if it's found in the table already or silently skipped. +The actual implementation within PostgreSQL uses the `INSERT` command with a special `ON CONFLICT` clause to specify what to do if the record already exists within the table. You can specify whether you want the record to be updated if it's found in the table already or silently skipped. ## How to use the `INSERT...ON CONFLICT` construct @@ -26,20 +26,19 @@ VALUES ON CONFLICT ; -In this context, the `` specifies what conflict you want to define a policy for.
This can be any of these: +In this context, the `` specifies what conflict you want to define a policy for. This can be any of these: -* The name of a specific column or columns: `(column1)` -* The name of a unique constraint: `ON CONSTRAINT ` +- The name of a specific column or columns: `(column1)` +- The name of a unique constraint: `ON CONSTRAINT ` -The companion `` item will define what PostgreSQL should do if a conflict arises. The `` specified can be one of the following: +The companion `` item will define what PostgreSQL should do if a conflict arises. The `` specified can be one of the following: -* `DO NOTHING`: Tells PostgreSQL to leave the conflicting record as-is. In essence, this action makes no changes, but suppresses the error that would normally occur if you tried to insert a row that violates a condition. -* `DO UPDATE`: This tells PostgreSQL that you want to update the row that is already in the table. The syntax for the update mirrors that of the normal `UPDATE` command. +- `DO NOTHING`: Tells PostgreSQL to leave the conflicting record as-is. In essence, this action makes no changes, but suppresses the error that would normally occur if you tried to insert a row that violates a condition. +- `DO UPDATE`: This tells PostgreSQL that you want to update the row that is already in the table. The syntax for the update mirrors that of the normal `UPDATE` command. -When `DO UPDATE` is specified, a special virtual table called `EXCLUDED` is available for use within the `UPDATE` clause. The table contains the values suggested in the original `INSERT` command (that conflicted with the existing table values). +When `DO UPDATE` is specified, a special virtual table called `EXCLUDED` is available for use within the `UPDATE` clause. The table contains the values suggested in the original `INSERT` command (that conflicted with the existing table values). - -**Note:** If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), you can perform upsert operations using the dedicated [upsert operation](https://www.prisma.io/docs/concepts/components/prisma-client/crud#upsert). +**Note:** If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), you can perform upsert operations using the dedicated [upsert operation](https://www.prisma.io/docs/orm/prisma-client/queries/crud#upsert). ## Using the `DO NOTHING` action @@ -63,7 +62,7 @@ VALUES
-Let's take a look at how PostgreSQL normally handles an insertion where a proposed row conflicts with existing data. Assuming there's already a director with an `id` of 3, PostgreSQL throws an error: +Let's take a look at how PostgreSQL normally handles an insertion where a proposed row conflicts with existing data. Assuming there's already a director with an `id` of 3, PostgreSQL throws an error: ```sql INSERT INTO director (id, name) @@ -71,12 +70,13 @@ VALUES (3, 'susan'), (4, 'delores'); ``` + ``` ERROR: duplicate key value violates unique constraint "director_pkey" DETAIL: Key (id)=(3) already exists. ``` -In this case, neither of the proposed records were added, even if only the first one had a conflict. If we want to continue adding any rows that do not have a conflict, we can use a `ON CONFLICT DO NOTHING` clause. +In this case, neither of the proposed records were added, even if only the first one had a conflict. If we want to continue adding any rows that do not have a conflict, we can use a `ON CONFLICT DO NOTHING` clause. Here, we tell PostgreSQL to move on if a conflict occurs and continue processing the other rows: @@ -87,6 +87,7 @@ VALUES (4, 'delores') ON CONFLICT (id) DO NOTHING; ``` + ``` INSERT 0 1 ``` @@ -96,6 +97,7 @@ If you query the table, it will show that the second record was added even thoug ```sql SELECT * FROM director; ``` + ``` id | name | latest_film ----+---------+-------------- @@ -121,17 +123,19 @@ VALUES ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name; ``` + ``` INSERT 0 3 ``` -This time, we specify a modification to make to the existing row if it conflicts with one of our proposed insertions. We use the virtual `EXCLUDED` table, which contains the items we intended to insert, to update the `name` column to a new value on conflict. +This time, we specify a modification to make to the existing row if it conflicts with one of our proposed insertions. We use the virtual `EXCLUDED` table, which contains the items we intended to insert, to update the `name` column to a new value on conflict. You can show that the records were all updated or added by typing: ```sql SELECT * FROM director; ``` + ``` id | name | latest_film ----+---------+-------------- @@ -146,6 +150,6 @@ SELECT * FROM director; ## Conclusion -PostgreSQL's `INSERT...ON CONFLICT` construct allows you to choose between two options when a proposed record conflicts with an existing record. Both `DO NOTHING` and `DO UPDATE` have their uses depending on the way the data you're adding relates to the existing content. +PostgreSQL's `INSERT...ON CONFLICT` construct allows you to choose between two options when a proposed record conflicts with an existing record. Both `DO NOTHING` and `DO UPDATE` have their uses depending on the way the data you're adding relates to the existing content. -The `DO NOTHING` option allows you to silently skip conflicting rows, allowing you to add any additional records that do _not_ conflict. Meanwhile, the `DO UPDATE` choice let's you conditionally alter the existing record when a conflict occurs, optionally using values from the original proposed row. Understanding the scenario where each may be useful and learning how to this use general format can help simplify your queries when adding new data to an existing data set. +The `DO NOTHING` option allows you to silently skip conflicting rows, allowing you to add any additional records that do _not_ conflict. 
Meanwhile, the `DO UPDATE` choice lets you conditionally alter the existing record when a conflict occurs, optionally using values from the original proposed row. Understanding the scenario where each may be useful and learning how to use this general format can help simplify your queries when adding new data to an existing data set. diff --git a/content/04-postgresql/12-inserting-and-modifying-data/05-using-transactions.mdx b/content/04-postgresql/12-inserting-and-modifying-data/05-using-transactions.mdx index d41faef9..de54b425 100644 --- a/content/04-postgresql/12-inserting-and-modifying-data/05-using-transactions.mdx +++ b/content/04-postgresql/12-inserting-and-modifying-data/05-using-transactions.mdx @@ -8,106 +8,106 @@ authors: ['justinellingwood'] ## Introduction -Transactions are a mechanism that encapsulates multiple statements into a single operation for the database to process. Instead of feeding in individual statements, the database is able to interpret and act on the group of commands as a cohesive unit. This helps ensure the consistency of the dataset over the course of many closely related statements. +Transactions are a mechanism that encapsulates multiple statements into a single operation for the database to process. Instead of feeding in individual statements, the database is able to interpret and act on the group of commands as a cohesive unit. This helps ensure the consistency of the dataset over the course of many closely related statements. -In this guide, we'll start by discussing what transactions are and why they are beneficial. Afterwards, we'll take a look at how PostgreSQL implements transactions and the various options you have when using them. +In this guide, we'll start by discussing what transactions are and why they are beneficial. Afterwards, we'll take a look at how PostgreSQL implements transactions and the various options you have when using them. ## What are transactions? -[Transactions](/intro/database-glossary#transaction) are a way to group together and isolate multiple statements for processing as a single operation. Instead of executing each command individually as it is sent to the server, in a transaction, commands are bundled together and executed in a separate context than other requests. +[Transactions](/intro/database-glossary#transaction) are a way to group together and isolate multiple statements for processing as a single operation. Instead of executing each command individually as it is sent to the server, in a transaction, commands are bundled together and executed in a separate context than other requests. -Isolation is an important part of transactions. Within a transaction, the executed statements can only affect the environment within the transaction itself. From inside the transaction, statements can modify data and the results are immediately visible. From the outside, no changes are made until the transaction is committed, at which time all of the actions within the transaction become visible at once. +Isolation is an important part of transactions. Within a transaction, the executed statements can only affect the environment within the transaction itself. From inside the transaction, statements can modify data and the results are immediately visible. From the outside, no changes are made until the transaction is committed, at which time all of the actions within the transaction become visible at once.
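To make that visibility behavior more tangible, here is a minimal sketch of a transaction block. The `accounts` table is an assumed example (a similar table is introduced properly later in this guide), and the comments describe what a second, concurrent session would observe:

```sql
BEGIN;

UPDATE accounts
SET balance = balance - 100
WHERE id = 1;

-- Inside this transaction, the new balance is already visible:
SELECT balance FROM accounts WHERE id = 1;

-- A separate session querying the same row still sees the old balance;
-- nothing is published until the COMMIT below.

COMMIT;  -- all changes from the block become visible together
```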
-These features help databases achieve [ACID compliance](/intro/database-glossary#acid) by providing [atomicity](/intro/database-glossary#atomicity) (actions in a transaction are either all committed or all rolled back) and [isolation](/intro/database-glossary#isolation) (outside of the transaction, nothing changes until the commit while inside, the statements have consequences). These together help the database maintain [consistency](/intro/database-glossary#consistency) (by guaranteeing that partial data transformations cannot occur). Furthermore, changes in transactions are not returned as successful until they are committed to non-volatile storage, which provides [durability](/intro/database-glossary#durability). +These features help databases achieve [ACID compliance](/intro/database-glossary#acid) by providing [atomicity](/intro/database-glossary#atomicity) (actions in a transaction are either all committed or all rolled back) and [isolation](/intro/database-glossary#isolation) (outside of the transaction, nothing changes until the commit while inside, the statements have consequences). These together help the database maintain [consistency](/intro/database-glossary#consistency) (by guaranteeing that partial data transformations cannot occur). Furthermore, changes in transactions are not returned as successful until they are committed to non-volatile storage, which provides [durability](/intro/database-glossary#durability). -To achieve these goals, transactions employ a number of different strategies and different database systems use different methods. PostgreSQL uses a system called [Multiversion Concurrency Control (MVCC)](/intro/database-glossary#multiversion-concurrency-control), which allows the database to perform these actions without unnecessary locking using data snapshots. All together, these systems comprise one of the fundamental building blocks of modern relational databases, allowing them to safely process complex data in a crash-resistant manner. +To achieve these goals, transactions employ a number of different strategies and different database systems use different methods. PostgreSQL uses a system called [Multiversion Concurrency Control (MVCC)](/intro/database-glossary#multiversion-concurrency-control), which allows the database to perform these actions without unnecessary locking using data snapshots. All together, these systems comprise one of the fundamental building blocks of modern relational databases, allowing them to safely process complex data in a crash-resistant manner. ## Types of consistency failures -One reason people use transactions is to gain certain guarantees about the consistency of their data and the environment in which it is processed. Consistency can be broken in many different ways, which affects how databases attempt to prevent them. +One reason people use transactions is to gain certain guarantees about the consistency of their data and the environment in which it is processed. Consistency can be broken in many different ways, which affects how databases attempt to prevent them. -There are four primary ways that inconsistency can arise depending on the transaction implementation. Your tolerance for scenarios where these scenarios may arise will affect how you use transactions in your applications. +There are four primary ways that inconsistency can arise depending on the transaction implementation. Your tolerance for scenarios where these scenarios may arise will affect how you use transactions in your applications. 
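Which of the following scenarios can actually occur depends on the isolation level your sessions run under, so it can help to check what that level currently is while reading through the failure modes. A quick sketch using PostgreSQL's built-in run-time settings, assuming nothing beyond an open session:

```sql
-- Isolation level in effect for the current transaction/session
SHOW transaction_isolation;

-- Server default applied to transactions that don't set one explicitly
SHOW default_transaction_isolation;
```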
### Dirty reads -[Dirty reads](/intro/database-glossary#dirty-read) occur when the statements within a transaction are able to read data written by other in-progress transactions. This means that even though the statements of a transaction have *not* been committed yet, they can be read and thus influence other transactions. +[Dirty reads](/intro/database-glossary#dirty-read) occur when the statements within a transaction are able to read data written by other in-progress transactions. This means that even though the statements of a transaction have _not_ been committed yet, they can be read and thus influence other transactions. -This is often considered a severe breach of consistency, as transactions are not properly isolated form one another. Statements that may never be committed to the database can affect the execution of other transactions, modifying their behavior. +This is often considered a severe breach of consistency, as transactions are not properly isolated from one another. Statements that may never be committed to the database can affect the execution of other transactions, modifying their behavior. Transactions that allow dirty reads cannot make any reasonable claims about the consistency of the resulting data. ### Non-repeatable reads -[Nonrepeatable reads](/intro/database-glossary#nonrepeatable-read) occur when a commit outside of the transaction alters the data seen within the transaction. You can recognize this type of problem if, within a transaction, the same data is read twice but different values are retrieved in each instance. +[Nonrepeatable reads](/intro/database-glossary#nonrepeatable-read) occur when a commit outside of the transaction alters the data seen within the transaction. You can recognize this type of problem if, within a transaction, the same data is read twice but different values are retrieved in each instance. -As with dirty reads, transactions that allow non-repeatable reads don't offer full isolation between transactions. The difference is that with non-repeatable reads, the statements affecting the transaction have actually been committed outside of the transaction. +As with dirty reads, transactions that allow non-repeatable reads don't offer full isolation between transactions. The difference is that with non-repeatable reads, the statements affecting the transaction have actually been committed outside of the transaction. ### Phantom read A [phantom read](/intro/database-glossary#phantom-read) is a specific type of non-repeatable read that occurs when the rows returned by a query are different the second time it is executed within a transaction. -For instance, if a query within the transaction returns four rows the first time it is executed, but five rows the second time, this is a phantom read. Phantom reads are caused by commits outside of the transaction altering the number of rows that satisfy the query. +For instance, if a query within the transaction returns four rows the first time it is executed, but five rows the second time, this is a phantom read. Phantom reads are caused by commits outside of the transaction altering the number of rows that satisfy the query. ### Serialization anomalies -[Serialization anomalies](/intro/database-glossary#serialization-anomaly) occur when the results of multiple transactions committed concurrently will result in different outcomes than if they were committed one after another.
This can occur any time that a transaction allows two commits to occur that each modify the same table or data without resolving conflicts. +[Serialization anomalies](/intro/database-glossary#serialization-anomaly) occur when the results of multiple transactions committed concurrently will result in different outcomes than if they were committed one after another. This can occur any time that a transaction allows two commits to occur that each modify the same table or data without resolving conflicts. -Serialization anomalies are a special type of problem that early types of transactions had no understanding of. This is because early transactions were implemented with locking, where one could not continue if another transaction was reading from or altering the same piece of data. +Serialization anomalies are a special type of problem that early types of transactions had no understanding of. This is because early transactions were implemented with locking, where one could not continue if another transaction was reading from or altering the same piece of data. ## Transaction isolation levels -Transactions are not a "one size fits all" solution. Different scenarios require different trade-offs between performance and protection. Fortunately, PostgreSQL allows you to specify the type of transaction isolation you need. +Transactions are not a "one size fits all" solution. Different scenarios require different trade-offs between performance and protection. Fortunately, PostgreSQL allows you to specify the type of transaction isolation you need. The isolation levels offered by most database systems include the following: ### Read uncommitted -[**Read uncommitted**](/intro/database-glossary#read-uncommitted-isolation-level) is the isolation level that offers the fewest guarantees about maintaining data consistency and isolation. While transactions using `read uncommitted` have certain features frequently associated with transactions, like the ability to commit multiple statements at once or to roll back statements if a mistake occurs, they *do* allow numerous situations where consistency can be broken. +[**Read uncommitted**](/intro/database-glossary#read-uncommitted-isolation-level) is the isolation level that offers the fewest guarantees about maintaining data consistency and isolation. While transactions using `read uncommitted` have certain features frequently associated with transactions, like the ability to commit multiple statements at once or to roll back statements if a mistake occurs, they _do_ allow numerous situations where consistency can be broken. Transactions configured with the `read uncommitted` isolation level allow: -* dirty reads -* non-repeatable reads -* phantom reads -* serialization anomalies +- dirty reads +- non-repeatable reads +- phantom reads +- serialization anomalies -This level of isolation is actually not implemented in PostgreSQL. Although PostgreSQL recognizes the isolation level name, internally, it is not actually supported and "read committed" (described below) will be used instead. +This level of isolation is actually not implemented in PostgreSQL. Although PostgreSQL recognizes the isolation level name, internally, it is not actually supported and "read committed" (described below) will be used instead. ### Read committed -[**Read committed**](/intro/database-glossary#read-committed-isolation-level) is an isolation level that specifically protects against dirty reads. 
When transactions use the `read committed` level of consistency, uncommitted data can never affect the internal context of a transaction. This provides a basic level of consistency by ensuring that uncommitted data never influences a transaction. +[**Read committed**](/intro/database-glossary#read-committed-isolation-level) is an isolation level that specifically protects against dirty reads. When transactions use the `read committed` level of consistency, uncommitted data can never affect the internal context of a transaction. This provides a basic level of consistency by ensuring that uncommitted data never influences a transaction. -Although `read committed` offers greater protection than `read uncommitted`, it does not protect against all types of inconsistency. These problems can still arise: +Although `read committed` offers greater protection than `read uncommitted`, it does not protect against all types of inconsistency. These problems can still arise: -* non-repeatable reads -* phantom reads -* serialization anomalies +- non-repeatable reads +- phantom reads +- serialization anomalies PostgreSQL will use the `read committed` level by default if no other isolation level is specified. ### Repeatable read -The [**repeatable read**](/intro/database-glossary#repeatable-read-isolation-level) isolation level builds off of the guarantee provided by `read committed`. It avoids dirty reads as before, but prevents non-repeatable reads as well. +The [**repeatable read**](/intro/database-glossary#repeatable-read-isolation-level) isolation level builds off of the guarantee provided by `read committed`. It avoids dirty reads as before, but prevents non-repeatable reads as well. -This means that no changes committed outside of the transaction will ever impact the data read within the transaction. A query executed at the start of a transaction will never have a different result at the end of the transaction unless directly caused by statements within the transaction. +This means that no changes committed outside of the transaction will ever impact the data read within the transaction. A query executed at the start of a transaction will never have a different result at the end of the transaction unless directly caused by statements within the transaction. -While the standard definition of the `repeatable read` isolation level requires only that dirty and non-repeatable reads are prevented, PostgreSQL also prevents phantom reads at this level. This means that commits outside of the transaction cannot alter the number of rows that satisfy a query. +While the standard definition of the `repeatable read` isolation level requires only that dirty and non-repeatable reads are prevented, PostgreSQL also prevents phantom reads at this level. This means that commits outside of the transaction cannot alter the number of rows that satisfy a query. -Since the state of the data seen within the transaction can deviate from the up-to-date data in the database, transactions can fail on commit if the two datasets cannot be reconciled. Because of this, one drawback of this isolation level is that you may have to retry transactions if there is a serialization failure on commit. +Since the state of the data seen within the transaction can deviate from the up-to-date data in the database, transactions can fail on commit if the two datasets cannot be reconciled. Because of this, one drawback of this isolation level is that you may have to retry transactions if there is a serialization failure on commit. 
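As a minimal sketch of the retry caveat just described, the block below runs at the `repeatable read` level against an assumed `accounts` table; the retry logic itself would live in application code rather than SQL:

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- Every statement in this block works from the snapshot taken
-- at the first query.
SELECT balance FROM accounts WHERE id = 1;

UPDATE accounts
SET balance = balance - 100
WHERE id = 1;

-- If concurrent commits make this snapshot impossible to reconcile,
-- PostgreSQL raises SQLSTATE 40001 (serialization_failure) and the
-- whole transaction should be retried from BEGIN.
COMMIT;
```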
PostgreSQL's `repeatable read` isolation level blocks most types of consistency issues but serialization anomalies can still occur. ### Serializable -The [**serializable**](/intro/database-glossary#serializable-isolation-level) isolation level offers the highest level of isolation and consistency. It prevents all of the scenarios that the `repeatable read` level does while also removing the possibility of serialization anomalies. +The [**serializable**](/intro/database-glossary#serializable-isolation-level) isolation level offers the highest level of isolation and consistency. It prevents all of the scenarios that the `repeatable read` level does while also removing the possibility of serialization anomalies. -Serializable isolation guarantees that concurrent transactions are committed as if they were executed one after another. If a scenario occurs where a serialization anomaly could be introduced, one of the transactions will have a serialization failure instead of introducing inconsistency to the data set. +Serializable isolation guarantees that concurrent transactions are committed as if they were executed one after another. If a scenario occurs where a serialization anomaly could be introduced, one of the transactions will have a serialization failure instead of introducing inconsistency to the data set. ## Defining a transaction Now that we've covered the different isolation levels that PostgreSQL can use in transactions, let's demonstrate how to define transactions. -In PostgreSQL, every statement *outside* of an explicitly marked transaction is actually executed in its own, single-statement transaction. To explicitly start a transaction block, you can use either the `BEGIN` or `START TRANSACTION` commands (they are synonymous). To commit a transaction, issue the `COMMIT` command. +In PostgreSQL, every statement _outside_ of an explicitly marked transaction is actually executed in its own, single-statement transaction. To explicitly start a transaction block, you can use either the `BEGIN` or `START TRANSACTION` commands (they are synonymous). To commit a transaction, issue the `COMMIT` command. The basic syntax of a transaction therefore looks like this: @@ -119,7 +119,7 @@ statements COMMIT; ``` -As a more concrete example, imagine that we are attempting to transfer $1000 from one account to another. We want to ensure that the money will always be in one of the two accounts but never in both of them. +As a more concrete example, imagine that we are attempting to transfer $1000 from one account to another. We want to ensure that the money will always be in one of the two accounts but never in both of them. We can wrap the two statements that together encapsulate this transfer in a transaction that looks like this: @@ -137,15 +137,15 @@ UPDATE accounts COMMIT; ``` -Here, the $1000 will not be taken out of the account with `id = 1` without also putting $1000 into the account with `id = 2`. While these two statements are executed sequentially *within* the transaction, they will be committed, and thus be executed on the underlying data set, simultaneously. +Here, the $1000 will not be taken out of the account with `id = 1` without also putting $1000 into the account with `id = 2`. While these two statements are executed sequentially _within_ the transaction, they will be committed, and thus be executed on the underlying data set, simultaneously. ## Rolling back transactions -Within a transaction, either all or none of the statements will be committed to the database. 
Abandoning the statements and modifications made within a transaction instead of applying them to the database is known as "rolling back" the transaction. +Within a transaction, either all or none of the statements will be committed to the database. Abandoning the statements and modifications made within a transaction instead of applying them to the database is known as "rolling back" the transaction. -Transactions can be rolled back either automatically or manually. PostgreSQL automatically rolls back transactions if one of the statements within results in an error. It also rolls back the transaction if serialization error would occur if the chosen isolation level does not allow them. +Transactions can be rolled back either automatically or manually. PostgreSQL automatically rolls back transactions if one of the statements within results in an error. It also rolls back the transaction if serialization error would occur if the chosen isolation level does not allow them. -To manually roll back statements that have been given during the current transaction, you can use the `ROLLBACK` command. This will cancel all of the statements within the transaction, in essence turning back the clock to the start of the transaction. +To manually roll back statements that have been given during the current transaction, you can use the `ROLLBACK` command. This will cancel all of the statements within the transaction, in essence turning back the clock to the start of the transaction. For instance, supposing we're using the same bank accounts example we were using before, if we find out after issuing the `UPDATE` statements that we accidentally transferred the wrong amount or used the wrong accounts, we could rollback the changes instead of committing them: @@ -155,7 +155,7 @@ For instance, supposing we're using the same bank accounts example we were using UPDATE accounts SET balance = balance - 1500 WHERE id = 1; - + UPDATE accounts SET balance = balance + 1500 WHERE id = 3; -- Wrong account number here! Must rollback @@ -168,11 +168,11 @@ Once we `ROLLBACK`, the $1500 will still be in the account with `id = 1`. ## Using save points when rolling back -By default, the `ROLLBACK` command resets the transaction to where it was when the `BEGIN` or `START TRANSACTION` commands were first called. But what if we only want to revert some of the statements within the transaction? +By default, the `ROLLBACK` command resets the transaction to where it was when the `BEGIN` or `START TRANSACTION` commands were first called. But what if we only want to revert some of the statements within the transaction? -While you cannot specify arbitrary places to roll back to when issuing `ROLLBACK` command, you *can* roll back to any "save points" that you've set throughout the transaction. You can mark places in your transaction ahead of time with the `SAVEPOINT` command and then reference those specific locations when you need to roll back. +While you cannot specify arbitrary places to roll back to when issuing `ROLLBACK` command, you _can_ roll back to any "save points" that you've set throughout the transaction. You can mark places in your transaction ahead of time with the `SAVEPOINT` command and then reference those specific locations when you need to roll back. -These save points allow you to create an intermediate roll back point. You can then optionally revert any statements made between where you are currently and the save point and then continue working on your transaction. 
+These save points allow you to create an intermediate roll back point. You can then optionally revert any statements made between where you are currently and the save point and then continue working on your transaction. To specify a save point, issue the `SAVEPOINT` command followed by a name for the save point: @@ -190,7 +190,7 @@ Let's continue the account-focused example we've been using: ```sql BEGIN; - + UPDATE accounts SET balance = balance - 1500 WHERE id = 1; @@ -213,11 +213,11 @@ SAVEPOINT save_1; COMMIT; ``` -Here, we're able to recover from a mistake we made without losing all of the work we've done in the transaction so far. After rolling back, we continue with the transaction as planned using the correct statements. +Here, we're able to recover from a mistake we made without losing all of the work we've done in the transaction so far. After rolling back, we continue with the transaction as planned using the correct statements. ## Setting the isolation level of transactions -To set the level of isolation you'd like for a transaction, you can add an `ISOLATION LEVEL` clause to your `START TRANSACTION` or `BEGIN` command. The basic syntax looks like this: +To set the level of isolation you'd like for a transaction, you can add an `ISOLATION LEVEL` clause to your `START TRANSACTION` or `BEGIN` command. The basic syntax looks like this: ```sql BEGIN ISOLATION LEVEL ; @@ -229,24 +229,24 @@ COMMIT; The `` can be any of these (described in detail earlier): -* `READ UNCOMMITTED` (will result in `READ COMMITTED` since this level isn't implemented in PostgreSQL) -* `READ COMMITTED` -* `REPEATABLE READ` -* `SERIALIZABLE` +- `READ UNCOMMITTED` (will result in `READ COMMITTED` since this level isn't implemented in PostgreSQL) +- `READ COMMITTED` +- `REPEATABLE READ` +- `SERIALIZABLE` -The `SET TRANSACTION` command can also used to set the isolation level after a transaction is started. However, you can only use `SET TRANSACTION` before any queries or data modifying commands are executed, so it doesn't allow for increased flexibility. +The `SET TRANSACTION` command can also used to set the isolation level after a transaction is started. However, you can only use `SET TRANSACTION` before any queries or data modifying commands are executed, so it doesn't allow for increased flexibility. ## Chaining transactions If you have multiple transactions that should be executed sequentially, you can optionally chain them together using the `COMMIT AND CHAIN` command. -The `COMMIT AND CHAIN` command completes the current transaction by committing the statements within. After the commit has been processed, it immediately opens a new transaction. This allows you to group another set of statements together in a transaction. +The `COMMIT AND CHAIN` command completes the current transaction by committing the statements within. After the commit has been processed, it immediately opens a new transaction. This allows you to group another set of statements together in a transaction. The statement works exactly as if you'd issued `COMMIT; BEGIN`: ```sql BEGIN; - + UPDATE accounts SET balance = balance - 1500 WHERE id = 1; @@ -273,12 +273,12 @@ Chaining transactions doesn't offer much in terms of new functionality, but it c ## Conclusion -Transactions are not a silver bullet. There are a lot of trade offs that come with various isolation levels and understanding what types of consistency you need to protect can take thought and planning. 
This is especially true with long running transactions where the underlying data may change significantly and the possibility of conflict with other concurrent transactions increases. +Transactions are not a silver bullet. There are a lot of trade offs that come with various isolation levels and understanding what types of consistency you need to protect can take thought and planning. This is especially true with long running transactions where the underlying data may change significantly and the possibility of conflict with other concurrent transactions increases. -That being said, the transaction mechanic offers a lot of flexibility and power. It goes a long way towards ensuring ACID guarantees are maintained even while performing interrelated, concurrent operations. Knowing when and how to properly use transactions to perform complex, safe operations is invaluable. +That being said, the transaction mechanic offers a lot of flexibility and power. It goes a long way towards ensuring ACID guarantees are maintained even while performing interrelated, concurrent operations. Knowing when and how to properly use transactions to perform complex, safe operations is invaluable. -If you are using JavaScript or TypeScript, you can use [Prisma to manage your PostgreSQL database](https://www.prisma.io/docs/concepts/database-connectors/postgresql). Any operations using the [transactions API](https://www.prisma.io/docs/concepts/components/prisma-client/transactions#the-transaction-api) will use the PostgreSQL server's default isolation level. As an alternative to using transactions interactively, Prisma also provides transaction behavior through [nested writes](https://www.prisma.io/docs/concepts/components/prisma-client/transactions#nested-writes) and with [bulk or batch operations](https://www.prisma.io/docs/concepts/components/prisma-client/transactions#batchbulk-operations). You can learn more by reading [Prisma's transaction guide](https://www.prisma.io/docs/guides/performance-and-optimization/prisma-client-transactions-guide). +If you are using JavaScript or TypeScript, you can use [Prisma to manage your PostgreSQL database](https://www.prisma.io/docs/orm/overview/databases/postgresql). Any operations using the [transactions API](https://www.prisma.io/docs/orm/prisma-client/queries/transactions#the-transaction-api) will use the PostgreSQL server's default isolation level. As an alternative to using transactions interactively, Prisma also provides transaction behavior through [nested writes](https://www.prisma.io/docs/orm/prisma-client/queries/transactions#nested-writes) and with [bulk or batch operations](https://www.prisma.io/docs/orm/prisma-client/queries/transactions#batchbulk-operations). You can learn more by reading [Prisma's transaction guide](https://www.prisma.io/docs/orm/prisma-client/queries/transactions). diff --git a/content/04-postgresql/13-reading-and-querying-data/01-basic-select.mdx b/content/04-postgresql/13-reading-and-querying-data/01-basic-select.mdx index 3bb44307..ba1a2d9a 100644 --- a/content/04-postgresql/13-reading-and-querying-data/01-basic-select.mdx +++ b/content/04-postgresql/13-reading-and-querying-data/01-basic-select.mdx @@ -1,16 +1,16 @@ --- title: 'How to perform basic queries with `SELECT` in PostgreSQL' metaTitle: "Basic Select | PostgreSQL Basic Queries | Prisma's Data Guide" -metaDescription: "The `SELECT` command is the main way to query the data within tables and views in PostgreSQL. 
This guide demonstrates the basic syntax and operation of this highly flexible command." +metaDescription: 'The `SELECT` command is the main way to query the data within tables and views in PostgreSQL. This guide demonstrates the basic syntax and operation of this highly flexible command.' metaImage: '/social/generic-postgresql.png' authors: ['justinellingwood'] --- ## Introduction -The `SELECT` command is the primary way to query and read information about records stored within database tables within PostgreSQL. Its usefulness, however, is not restricted to [read-only operations](/intro/database-glossary#read-operation). The `SELECT` syntax is combined with many other commands to target specific records or fields within databases for updates, deletions, and more complex operations. +The `SELECT` command is the primary way to query and read information about records stored within database tables within PostgreSQL. Its usefulness, however, is not restricted to [read-only operations](/intro/database-glossary#read-operation). The `SELECT` syntax is combined with many other commands to target specific records or fields within databases for updates, deletions, and more complex operations. -In this guide, we'll show how the basic syntax of `SELECT` supports gathering data from tables. While we'll leave the vast number of optional clauses to the command for other articles, it will hopefully become evident how even the most basic components provide a strong foundation for querying data. These fundamentals only require you to learn a few clauses and constructions. +In this guide, we'll show how the basic syntax of `SELECT` supports gathering data from tables. While we'll leave the vast number of optional clauses to the command for other articles, it will hopefully become evident how even the most basic components provide a strong foundation for querying data. These fundamentals only require you to learn a few clauses and constructions. ## The general syntax of the `SELECT` command @@ -22,14 +22,14 @@ SELECT FROM ; This statement is composed of a few different pieces: -* `SELECT`: The `SELECT` command itself. This SQL statement indicates that we want to query tables or views for data they contains. The arguments and clauses surrounding it determine both the contents and the format of the output by defining criteria. -* ``: The `SELECT` statement can return entire rows (indicated by the `*` wildcard character) or a subset of the available [columns](/intro/database-glossary#column). If you want to output only specific columns, provide the column names you'd like to display, separated by commas. -* `FROM `: The `FROM` keyword is used to indicate the [table](/intro/database-glossary#table) or [view](/intro/database-glossary#view) that should be queried. In most simple queries, this consists of a single table that contains the data you're interested in. -* ``: A large number of filters, output modifiers, and conditions can be specified as additions to the `SELECT` command. You can use these to help pinpoint data with specific properties, modify the output formatting, or further process the results. +- `SELECT`: The `SELECT` command itself. This SQL statement indicates that we want to query tables or views for data they contain. The arguments and clauses surrounding it determine both the contents and the format of the output by defining criteria.
+- ``: The `SELECT` statement can return entire rows (indicated by the `*` wildcard character) or a subset of the available [columns](/intro/database-glossary#column). If you want to output only specific columns, provide the column names you'd like to display, separated by commas. +- `FROM `: The `FROM` keyword is used to indicate the [table](/intro/database-glossary#table) or [view](/intro/database-glossary#view) that should be queried. In most simple queries, this consists of a single table that contains the data you're interested in. +- ``: A large number of filters, output modifiers, and conditions can be specified as additions to the `SELECT` command. You can use these to help pinpoint data with specific properties, modify the output formatting, or further process the results. -You can learn about how to query with Prisma Client in our guide on [CRUD operations](https://www.prisma.io/docs/concepts/components/prisma-client/crud). +You can learn about how to query with Prisma Client in our guide on [CRUD operations](https://www.prisma.io/docs/orm/prisma-client/queries/crud). @@ -43,9 +43,9 @@ For ad hoc querying and during data exploration, one of the most helpful options SELECT * FROM my_table; ``` -This will display all of the records from `my_table` since we do not provide any filtering to narrow the results. All of the columns for each record will be shown in the order that they are defined within the table. +This will display all of the records from `my_table` since we do not provide any filtering to narrow the results. All of the columns for each record will be shown in the order that they are defined within the table. -You can also choose to view a subset of available column by specifying them by name. Column names are separated by commas and are displayed in the order in which they are given: +You can also choose to view a subset of available column by specifying them by name. Column names are separated by commas and are displayed in the order in which they are given: ```sql SELECT column2, column1 FROM my_table; @@ -55,7 +55,7 @@ This will display all of the records from `my_table`, but only show the columns -If using Prisma Client, you can achieve the same results using [select fields](https://www.prisma.io/docs/concepts/components/prisma-client/select-fields). +If using Prisma Client, you can achieve the same results using [select fields](https://www.prisma.io/docs/orm/prisma-client/queries/select-fields). @@ -67,21 +67,21 @@ You can optionally set _column aliases_ to modify the name used for columns in t SELECT column1 AS "first column" FROM my_table; ``` -This will show the each of the values for `column1` in `my_table`. However, the column in the output will be labeled as `first column` instead of `column1`. +This will show the each of the values for `column1` in `my_table`. However, the column in the output will be labeled as `first column` instead of `column1`. This is especially useful if the output combines column names from multiple tables that might share names or if it includes computed columns that don't already have a name. ## Defining sort order with `ORDER BY` -The `ORDER BY` clause can be used to sort the resulting rows according to the criteria given. The general syntax looks like this: +The `ORDER BY` clause can be used to sort the resulting rows according to the criteria given. The general syntax looks like this: ```sql SELECT * FROM my_table ORDER BY ; ``` -This will display the values for all columns in all records within `my_table`. 
The results will be ordered according to the expression represented by the placeholder ``. +This will display the values for all columns in all records within `my_table`. The results will be ordered according to the expression represented by the placeholder ``. -For example, suppose we have a `customer` table that contains columns for `first_name`, `last_name`, `address`, and `phone_number`. If we want to display the results in alphabetical order by `last_name`, we could use the following command: +For example, suppose we have a `customer` table that contains columns for `first_name`, `last_name`, `address`, and `phone_number`. If we want to display the results in alphabetical order by `last_name`, we could use the following command: @@ -139,7 +139,7 @@ sue | abed | 456 side st | 5557654321 -You can also sort by multiple columns. Here, we sort first by `last_name`, and then by `first_name` for any columns with the same `last_name` value. Both sorts are in ascending order: +You can also sort by multiple columns. Here, we sort first by `last_name`, and then by `first_name` for any columns with the same `last_name` value. Both sorts are in ascending order: @@ -167,7 +167,7 @@ john | smith | 123 main st | 5551234567 -One additional option that is often important is clarifying where `NULL` values should be presented in the sort order. You can do this by adding `NULLS FIRST` (the default) or `NULLS LAST` for any sort column: +One additional option that is often important is clarifying where `NULL` values should be presented in the sort order. You can do this by adding `NULLS FIRST` (the default) or `NULLS LAST` for any sort column: ```sql SELECT * FROM customer ORDER BY last_name NULLS LAST; @@ -175,13 +175,13 @@ SELECT * FROM customer ORDER BY last_name NULLS LAST; -You can [sort your results](https://www.prisma.io/docs/concepts/components/prisma-client/filtering-and-sorting) with Prisma Client in much the same way as you would in an SQL query. +You can [sort your results](https://www.prisma.io/docs/orm/prisma-client/queries/filtering-and-sorting) with Prisma Client in much the same way as you would in an SQL query. ## Getting distinct results -If you want to find the range of values for a column in PostgreSQL, you can use the `SELECT DISTINCT` variant. This will display a single row for each distinct value of a column. +If you want to find the range of values for a column in PostgreSQL, you can use the `SELECT DISTINCT` variant. This will display a single row for each distinct value of a column. The basic syntax looks like this: @@ -196,6 +196,7 @@ For example, to display all of the different values for `color` that your `shirt ```sql SELECT DISTINCT color FROM shirt; ``` + ``` color ------ @@ -213,6 +214,7 @@ For instance, this will display all of the different combinations of `color` and ```sql SELECT DISTINCT color,shirt_size FROM shirt; ``` + ``` color | shirt_size -------+----------- @@ -229,7 +231,7 @@ yellow | S This displays every unique combination of `color` and `shirt_size` within the table. -An often more flexible variant is PostgreSQL's `SELECT DISTINCT ON` command. This format allows you to specify a list of columns that should be unique in combination and separately list the columns you wish to display. +An often more flexible variant is PostgreSQL's `SELECT DISTINCT ON` command. This format allows you to specify a list of columns that should be unique in combination and separately list the columns you wish to display. 
The general syntax looks like this, with the column or columns that should be unique listed in the parentheses after `SELECT DISTINCT ON`, followed by the columns you wish to display: @@ -242,6 +244,7 @@ For example, if you want to display a single color for each shirt size, you coul ```sql SELECT DISTINCT ON (shirt_size) color,shirt_size FROM shirt; ``` + ``` color | shirt_size ------+----------- @@ -250,13 +253,14 @@ green | L green | S ``` -This will show a single row for each unique value in `shirt_size`. For each row, it will display the `color` column, followed by the `shirt_size` column. +This will show a single row for each unique value in `shirt_size`. For each row, it will display the `color` column, followed by the `shirt_size` column. If using an `ORDER BY` clause, the column selected for ordering must match the column selected within the `DISTINCT ON` parentheses for the output to have predictable results: ```sql SELECT DISTINCT ON (shirt_size) color,shirt_size FROM shirt ORDER BY shirt_size DESC; ``` + ``` color | shirt_size ------+----------- @@ -267,18 +271,18 @@ green | L -You can filter duplicate rows from your query with Prisma Client by using the [distinct](https://www.prisma.io/docs/concepts/components/prisma-client/aggregation-grouping-summarizing#select-distinct) functionality. +You can filter duplicate rows from your query with Prisma Client by using the [distinct](https://www.prisma.io/docs/orm/prisma-client/queries/aggregation-grouping-summarizing#select-distinct) functionality. ## Conclusion -In this guide, we covered some of the basic ways you can use the `SELECT` command to identify and display records from your tables and views. The `SELECT` command is one of the most flexible and powerful operations within SQL-oriented databases, with many different ways to add clauses, conditions, and filtering. +In this guide, we covered some of the basic ways you can use the `SELECT` command to identify and display records from your tables and views. The `SELECT` command is one of the most flexible and powerful operations within SQL-oriented databases, with many different ways to add clauses, conditions, and filtering. -While we only covered basic usage in this guide, the general format you learned here will serve as the foundation for all other read and many write queries. Learning ways to filter and target the results more accurately extend the capabilities that we covered today. +While we only covered basic usage in this guide, the general format you learned here will serve as the foundation for all other read and many write queries. Learning ways to filter and target the results more accurately extend the capabilities that we covered today. -You can learn more about sorting and filtering queries with Prisma Client in our [filtering and sorting documentation](https://www.prisma.io/docs/concepts/components/prisma-client/filtering-and-sorting). +You can learn more about sorting and filtering queries with Prisma Client in our [filtering and sorting documentation](https://www.prisma.io/docs/orm/prisma-client/queries/filtering-and-sorting). 
diff --git a/content/04-postgresql/13-reading-and-querying-data/02-filtering-data.mdx b/content/04-postgresql/13-reading-and-querying-data/02-filtering-data.mdx index 0821cd56..24fb0354 100644 --- a/content/04-postgresql/13-reading-and-querying-data/02-filtering-data.mdx +++ b/content/04-postgresql/13-reading-and-querying-data/02-filtering-data.mdx @@ -8,15 +8,15 @@ authors: ['justinellingwood'] ## Introduction -To work with data in a database, you need to be able to retrieve and target specific [records](/intro/database-glossary#record) effectively. By using filtering clauses within your queries, you can add specific criteria in order to return only the most relevant records. +To work with data in a database, you need to be able to retrieve and target specific [records](/intro/database-glossary#record) effectively. By using filtering clauses within your queries, you can add specific criteria in order to return only the most relevant records. -In this guide, we will take a look at some of the most common filtering operations available within PostgreSQL and demonstrate how to use them to narrow the focus of your statements. We will show how to test against characteristics within individual records with `WHERE` clauses, how to group records together to summarize information with `GROUP BY`, how to filter groups of records with the `HAVING` subclause, and how to set the maximum number of returned rows with the `LIMIT` clause. +In this guide, we will take a look at some of the most common filtering operations available within PostgreSQL and demonstrate how to use them to narrow the focus of your statements. We will show how to test against characteristics within individual records with `WHERE` clauses, how to group records together to summarize information with `GROUP BY`, how to filter groups of records with the `HAVING` subclause, and how to set the maximum number of returned rows with the `LIMIT` clause. ## Using the `WHERE` clause to define match criteria -One of the most common and broadly useful ways to indicate your query requirements is the `WHERE` clause. The `WHERE` clause lets you define actual search criteria for query statements by specifying conditions that must be true for all matching records. +One of the most common and broadly useful ways to indicate your query requirements is the `WHERE` clause. The `WHERE` clause lets you define actual search criteria for query statements by specifying conditions that must be true for all matching records. -`WHERE` clauses work by defining boolean expressions that are checked against each candidate row of data. If the result of the expression is false, the row will be removed from the results and will not be returned or continue to the next stage of processing. If the result of the expression is true, it satisfies the criteria of the search and will continue on for any further processing as a candidate row. +`WHERE` clauses work by defining boolean expressions that are checked against each candidate row of data. If the result of the expression is false, the row will be removed from the results and will not be returned or continue to the next stage of processing. If the result of the expression is true, it satisfies the criteria of the search and will continue on for any further processing as a candidate row. The basic syntax of the `WHERE` clause looks like this: @@ -24,72 +24,72 @@ The basic syntax of the `WHERE` clause looks like this: SELECT * FROM my_table WHERE ; ``` -The `` can be anything that results in a boolean value. 
In PostgreSQL, a boolean value is any of `TRUE`, `FALSE`, or `NULL`. +The `` can be anything that results in a boolean value. In PostgreSQL, a boolean value is any of `TRUE`, `FALSE`, or `NULL`. Conditions are often formed using one or more of the following operators: -* `=`: equal to -* `>`: greater than -* `<`: less than -* `>=`: greater than or equal to -* `<=`: less than or equal to -* `<>` or `!=`: not equal -* `AND`: the logical "and" operator — joins two conditions and returns `TRUE` if both of the conditions are `TRUE` -* `OR`: logical "or" operator — joins two conditions and returns `TRUE` if at least one of the conditions are `TRUE` -* `IN`: value is contained in the list, series, or range that follows -* `BETWEEN`: value is contained within the range the minimum and maximum values that follow, inclusive -* `IS NULL`: matches if value is `NULL` -* `NOT`: negates the boolean value that follows -* `EXISTS`: the query that follows contains results -* `LIKE`: matches against a pattern (using the wildcards `%` to match 0 or more characters and `_` to match a single character) -* `ILIKE`: matches against a pattern (using the wildcards `%` to match 0 or more characters and `_` to match a single character), case insensitive -* `SIMILAR TO`: matches against a pattern using [SQL's regular expression dialect](https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP) -* `~`: matches against a pattern using [POSIX regular expressions](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended), case sensitive -* `~*`: matches against a pattern using [POSIX regular expressions](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended), case insensitive -* `!~`: does not match against a pattern using [POSIX regular expressions](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended), case sensitive -* `!~*`: does not match against a pattern using [POSIX regular expressions](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended), case insensitive +- `=`: equal to +- `>`: greater than +- `<`: less than +- `>=`: greater than or equal to +- `<=`: less than or equal to +- `<>` or `!=`: not equal +- `AND`: the logical "and" operator — joins two conditions and returns `TRUE` if both of the conditions are `TRUE` +- `OR`: logical "or" operator — joins two conditions and returns `TRUE` if at least one of the conditions are `TRUE` +- `IN`: value is contained in the list, series, or range that follows +- `BETWEEN`: value is contained within the range the minimum and maximum values that follow, inclusive +- `IS NULL`: matches if value is `NULL` +- `NOT`: negates the boolean value that follows +- `EXISTS`: the query that follows contains results +- `LIKE`: matches against a pattern (using the wildcards `%` to match 0 or more characters and `_` to match a single character) +- `ILIKE`: matches against a pattern (using the wildcards `%` to match 0 or more characters and `_` to match a single character), case insensitive +- `SIMILAR TO`: matches against a pattern using [SQL's regular expression dialect](https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP) +- `~`: matches against a pattern using [POSIX regular expressions](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended), case sensitive +- `~*`: matches against a pattern using [POSIX regular expressions](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended), case insensitive 
+- `!~`: does not match against a pattern using [POSIX regular expressions](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended), case sensitive +- `!~*`: does not match against a pattern using [POSIX regular expressions](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended), case insensitive While the above list represents some of the most common test constructs, there are many other [operators that yield boolean results](https://www.postgresql.org/docs/current/functions.html) that can be used in conjunction with a `WHERE` clause. -Prisma Client supports filtering by multiple criteria. Check out our [documentation on filtering](https://www.prisma.io/docs/concepts/components/prisma-client/filtering-and-sorting) to learn more. +Prisma Client supports filtering by multiple criteria. Check out our [documentation on filtering](https://www.prisma.io/docs/orm/prisma-client/queries/filtering-and-sorting) to learn more. ### Examples using `WHERE` -One of the most common and straightforward checks is for equality, using the `=` operator. Here, we check whether each row in the `customer` table has a `last_name` value equal to `Smith`: +One of the most common and straightforward checks is for equality, using the `=` operator. Here, we check whether each row in the `customer` table has a `last_name` value equal to `Smith`: ```sql SELECT * FROM customer WHERE last_name = 'Smith'; ``` -We can add additional conditions to this to create compound expressions using logical operators. This example uses the `AND` clause to add an additional test against the `first_name` column. Valid rows must satisfy both of the given conditions: +We can add additional conditions to this to create compound expressions using logical operators. This example uses the `AND` clause to add an additional test against the `first_name` column. Valid rows must satisfy both of the given conditions: ```sql SELECT * FROM customer WHERE first_name = 'John' AND last_name = 'Smith'; ``` -Similarly, we can check whether any of a series of conditions are met. Here, we check rows from the `address` table to see whether the `zip_code` value is equal to 60626 or the `neighborhood` column is equal to the string "Roger's Park". We use two single quotation marks to indicate that a literal single quote should be searched for: +Similarly, we can check whether any of a series of conditions are met. Here, we check rows from the `address` table to see whether the `zip_code` value is equal to 60626 or the `neighborhood` column is equal to the string "Roger's Park". We use two single quotation marks to indicate that a literal single quote should be searched for: ```sql SELECT * FROM address WHERE zip_code = '60626' OR neighborhood = 'Roger''s Park'; ``` -The `IN` operator can work like an comparison between a number of values, wrapped in parentheses. If there is a match with any of the given values, the expression is `TRUE`: +The `IN` operator can work like an comparison between a number of values, wrapped in parentheses. If there is a match with any of the given values, the expression is `TRUE`: ```sql SELECT * FROM customer WHERE last_name IN ('Smith', 'Johnson', 'Fredrich'); ``` -Here, we check against a string pattern using `LIKE`. The `%` works as a wildcard matching zero or more characters, so "Pete", "Peter", and any other string that begins with "Pete" would match: +Here, we check against a string pattern using `LIKE`. 
The `%` works as a wildcard matching zero or more characters, so "Pete", "Peter", and any other string that begins with "Pete" would match: ```sql SELECT * FROM customer WHERE last_name LIKE 'Pete%'; ``` -We could do a similar search using the `~*` operator to check for matches using POSIX regular expressions without regard to case. In this case, we check whether the value of `last_name` begins with a "d" and contains the substring "on", which would match names like "Dickson", "Donald", and "Devon": +We could do a similar search using the `~*` operator to check for matches using POSIX regular expressions without regard to case. In this case, we check whether the value of `last_name` begins with a "d" and contains the substring "on", which would match names like "Dickson", "Donald", and "Devon": ```sql SELECT * FROM customer WHERE last_name ~* '^D.*on.*'; @@ -101,7 +101,7 @@ We can check whether a street number is within the 4000 block of addresses using SELECT * FROM address WHERE street_number BETWEEN 4000 AND 4999; ``` -Here, we can display any `customer` entries that have social security numbers that are not 9 digits long. We use the `LENGTH()` operator to get the number of digits in the field and the `<>` to check for inequality: +Here, we can display any `customer` entries that have social security numbers that are not 9 digits long. We use the `LENGTH()` operator to get the number of digits in the field and the `<>` to check for inequality: ```sql SELECT * FROM customer WHERE LENGTH(SSN) <> 9; @@ -109,24 +109,24 @@ SELECT * FROM customer WHERE LENGTH(SSN) <> 9; ## Using the `GROUP BY` clause to summarize multiple records -The `GROUP BY` clause is another very common way to filter results by representing multiple results with a single row. The basic syntax of the `GROUP BY` clause looks like this: +The `GROUP BY` clause is another very common way to filter results by representing multiple results with a single row. The basic syntax of the `GROUP BY` clause looks like this: ```sql SELECT FROM some_table GROUP BY ``` -When a `GROUP BY` clause is added to a statement, it tells PostgreSQL to display a single row for each unique value for the given column or columns. This has some important implications. +When a `GROUP BY` clause is added to a statement, it tells PostgreSQL to display a single row for each unique value for the given column or columns. This has some important implications. -Since the `GROUP BY` clause is a way of representing multiple rows as a single row, PostgreSQL can only execute the query if it can calculate a value for each of the columns it is tasked with displaying. This means that each column identified by the `SELECT` portion of the statement has to either be: +Since the `GROUP BY` clause is a way of representing multiple rows as a single row, PostgreSQL can only execute the query if it can calculate a value for each of the columns it is tasked with displaying. This means that each column identified by the `SELECT` portion of the statement has to either be: -* included in the `GROUP BY` clause to guarantee that each row has a unique value -* abstracted to summarize all of the rows within each group +- included in the `GROUP BY` clause to guarantee that each row has a unique value +- abstracted to summarize all of the rows within each group Practically speaking, this means that any columns in the `SELECT` list not included in the `GROUP BY` clause must use an aggregate function to produce a single result for the column for each group. 
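As a minimal sketch of this rule (using a hypothetical `sale` table with `region` and `amount` columns, not one of the tables from this guide), a column that is neither grouped nor aggregated is rejected, while wrapping it in an aggregate function works:

```sql
-- Rejected: "amount" is neither listed in GROUP BY nor aggregated
SELECT region, amount FROM sale GROUP BY region;

-- Accepted: "amount" is collapsed to a single value for each group
SELECT region, sum(amount) AS total_amount FROM sale GROUP BY region;
```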
-If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), you can use [aggregations](https://www.prisma.io/docs/concepts/components/prisma-client/aggregation-grouping-summarizing) to compute over and summarize values. +If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), you can use [aggregations](https://www.prisma.io/docs/orm/prisma-client/queries/aggregation-grouping-summarizing) to compute over and summarize values. @@ -155,11 +155,12 @@ INSERT INTO pet (type, name, color, age) VALUES ('rabbit', 'Briony', 'brown', 6); ``` -The simplest use of `GROUP BY` is to display the range of unique values for a single column. To do so, use the same column in `SELECT` and `GROUP BY`. Here, we see all of the colors used in the table: +The simplest use of `GROUP BY` is to display the range of unique values for a single column. To do so, use the same column in `SELECT` and `GROUP BY`. Here, we see all of the colors used in the table: ```sql SELECT color FROM pet GROUP BY color; ``` + ``` color -------- @@ -173,11 +174,12 @@ SELECT color FROM pet GROUP BY color; As you move beyond a single column in the `SELECT` column list, you must either add the columns to the `GROUP BY` clause or use an aggregate function to produce a single value for the group of rows being represented. -Here, we add `type` to the `GROUP BY` clause, meaning that each row will represent a unique combination of `type` and `color` values. We also add the `age` column, summarized by the `avg()` function to find the average age of each of the groups: +Here, we add `type` to the `GROUP BY` clause, meaning that each row will represent a unique combination of `type` and `color` values. We also add the `age` column, summarized by the `avg()` function to find the average age of each of the groups: ```sql SELECT type, color, avg(age) AS average_age FROM pet GROUP BY type, color; ``` + ``` type | color | average_age --------+--------+-------------------- @@ -191,11 +193,12 @@ SELECT type, color, avg(age) AS average_age FROM pet GROUP BY type, color; (7 rows) ``` -Aggregate functions work just as well with a single column in the `GROUP BY` clause. Here, we find the average age of each type of animal: +Aggregate functions work just as well with a single column in the `GROUP BY` clause. Here, we find the average age of each type of animal: ```sql SELECT type, avg(age) AS average_age FROM PET GROUP BY type; ``` + ``` type | average_age --------+-------------------- @@ -205,11 +208,12 @@ SELECT type, avg(age) AS average_age FROM PET GROUP BY type; (3 rows) ``` -If we want to display the oldest of each type of animal, we could instead use the `max()` function on the `age` column. The `GROUP BY` clause collapses the results into the same rows as before, but the new function alters the result in the other column: +If we want to display the oldest of each type of animal, we could instead use the `max()` function on the `age` column. The `GROUP BY` clause collapses the results into the same rows as before, but the new function alters the result in the other column: ```sql SELECT type, max(age) AS oldest FROM pet GROUP BY type; ``` + ``` type | oldest --------+------- @@ -221,7 +225,7 @@ SELECT type, max(age) AS oldest FROM pet GROUP BY type; ## Using the `HAVING` clause to filter groups of records -The `GROUP BY` clause is a way to summarize data by collapsing multiple records into a single representative row. 
But what if you want to narrow these groups based on additional factors? +The `GROUP BY` clause is a way to summarize data by collapsing multiple records into a single representative row. But what if you want to narrow these groups based on additional factors? The `HAVING` clause is a modifier for the `GROUP BY` clause that lets you specify conditions that each group must satisfy to be included in the results. @@ -237,11 +241,12 @@ The operation is very similar to the `WHERE` clause, with the difference being t Using the same table we introduced in the last section, we can demonstrate how the `HAVING` clause works. -Here, we group the rows of the `pet` table by unique values in the `type` column, finding the minimum value of `age` as well. The `HAVING` clause then filters the results to remove any groups where the age is not greater than 1: +Here, we group the rows of the `pet` table by unique values in the `type` column, finding the minimum value of `age` as well. The `HAVING` clause then filters the results to remove any groups where the age is not greater than 1: ```sql SELECT type, min(age) AS youngest FROM pet GROUP BY type HAVING min(age) > 1; ``` + ``` type | youngest --------+---------- @@ -250,12 +255,12 @@ SELECT type, min(age) AS youngest FROM pet GROUP BY type HAVING min(age) > 1; (2 rows) ``` -In this example, we group the rows in `pet` by their color. We then filter the groups that only represent a single row. The result shows us every color that appears more than once: - +In this example, we group the rows in `pet` by their color. We then filter the groups that only represent a single row. The result shows us every color that appears more than once: ```sql SELECT color FROM pet GROUP BY color HAVING count(color) > 1; ``` + ``` color ------- @@ -269,6 +274,7 @@ We can perform a similar query to get the combinations of `type` and `color` tha ```sql SELECT type, color FROM pet GROUP BY type, color HAVING count(color) = 1; ``` + ``` type | color --------+-------- @@ -282,7 +288,7 @@ SELECT type, color FROM pet GROUP BY type, color HAVING count(color) = 1; ## Using the `LIMIT` clause to set the maximum number of records -The `LIMIT` clause offers a different approach to paring down the records your query returns. Rather than eliminating rows of data based on criteria within the row itself, the `LIMIT` clause sets the maximum number of records returned by a query. +The `LIMIT` clause offers a different approach to paring down the records your query returns. Rather than eliminating rows of data based on criteria within the row itself, the `LIMIT` clause sets the maximum number of records returned by a query. The basic syntax of `LIMIT` looks like this: @@ -290,13 +296,13 @@ The basic syntax of `LIMIT` looks like this: SELECT * FROM my_table LIMIT [OFFSET ]; ``` -Here, the `` indicates the maximum number of rows to display from the executed query. This is often used in conjunction with `ORDER BY` clauses to get the rows with the most extreme values in a certain column. For example, to get the five best scores on an exam, a user could `ORDER BY` a `score` column and then `LIMIT` the results to 5. +Here, the `` indicates the maximum number of rows to display from the executed query. This is often used in conjunction with `ORDER BY` clauses to get the rows with the most extreme values in a certain column. For example, to get the five best scores on an exam, a user could `ORDER BY` a `score` column and then `LIMIT` the results to 5. 
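As a quick sketch of the exam example mentioned above (assuming a hypothetical `exam_result` table with a `score` column), the query might look like this:

```sql
-- Sort by score from highest to lowest, then keep only the first five rows
SELECT * FROM exam_result ORDER BY score DESC LIMIT 5;
```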
-While `LIMIT` counts from the top of the results by default, the optional `OFFSET` keyword can be used to offset the starting position it uses. In effect, this allows you to paginate through results by displaying the number of results defined by `LIMIT` and then adding the `LIMIT` number to the `OFFSET` to retrieve the following page. +While `LIMIT` counts from the top of the results by default, the optional `OFFSET` keyword can be used to offset the starting position it uses. In effect, this allows you to paginate through results by displaying the number of results defined by `LIMIT` and then adding the `LIMIT` number to the `OFFSET` to retrieve the following page. -If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), you can use [pagination](https://www.prisma.io/docs/concepts/components/prisma-client/pagination) to iterate through results. +If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), you can use [pagination](https://www.prisma.io/docs/orm/prisma-client/queries/pagination) to iterate through results. @@ -304,11 +310,12 @@ If you are connecting to your database with [Prisma Client](https://www.prisma.i We will use the `pet` table from earlier for the examples in this section. -As mentioned above, `LIMIT` is often combined with an `ORDER BY` clause to explicitly define the ordering of the rows before slicing the appropriate number. Here, we sort the `pet` entries according to their `age`, from oldest to youngest. We then use `LIMIT` to display the top 5 oldest animals: +As mentioned above, `LIMIT` is often combined with an `ORDER BY` clause to explicitly define the ordering of the rows before slicing the appropriate number. Here, we sort the `pet` entries according to their `age`, from oldest to youngest. We then use `LIMIT` to display the top 5 oldest animals: ```sql SELECT * FROM pet ORDER BY age DESC LIMIT 5; ``` + ``` type | name | color | age | id --------+---------+--------+-----+---- @@ -320,13 +327,14 @@ SELECT * FROM pet ORDER BY age DESC LIMIT 5; (5 rows) ``` -Without an `ORDER BY` clause, `LIMIT` will make selections in an entirely predictable way. The results returned may be effected by the order of the entries within the table or by indexes. This is not always a bad thing. +Without an `ORDER BY` clause, `LIMIT` will make selections in an entirely predictable way. The results returned may be effected by the order of the entries within the table or by indexes. This is not always a bad thing. -If we need a record for any single `dog` within the table, we could construct a query like this. Keep in mind that while the result might be difficult to predict, this is not a random selection and should not be used as such: +If we need a record for any single `dog` within the table, we could construct a query like this. Keep in mind that while the result might be difficult to predict, this is not a random selection and should not be used as such: ```sql SELECT * FROM pet WHERE type = 'dog' LIMIT 1; ``` + ``` type | name | color | age | id ------+------+-------+-----+---- @@ -334,13 +342,14 @@ SELECT * FROM pet WHERE type = 'dog' LIMIT 1; (1 row) ``` -We can use the `OFFSET` clause to paginate through results. We include an `ORDER BY` clause to define a specific order for the results. +We can use the `OFFSET` clause to paginate through results. We include an `ORDER BY` clause to define a specific order for the results. 
For the first query, we limit the results without specifying an `OFFSET` to get the first 3 youngest entries: ```sql SELECT * FROM pet ORDER BY age LIMIT 3; ``` + ``` type | name | color | age | id ------+-------+-------+-----+---- @@ -355,8 +364,9 @@ To get the next 3 youngest, we can add the number defined in `LIMIT` to the `OFF ```sql SELECT * FROM pet ORDER BY age LIMIT 3 OFFSET 3; ``` + ``` - type | name | color | age | id + type | name | color | age | id --------+---------+-------+-----+---- rabbit | Buttons | grey | 4 | 7 rabbit | Briany | brown | 6 | 9 @@ -369,6 +379,7 @@ If we add the `LIMIT` to the `OFFSET` again, we'll get the next 3 results: ```sql SELECT * FROM pet ORDER BY age LIMIT 3 OFFSET 6; ``` + ``` type | name | color | age | id --------+---------+--------+-----+---- @@ -382,9 +393,9 @@ This lets us retrieve rows of data from a query in manageable chunks. ## Conclusion -There are many ways to filter and otherwise constrain the results you get from queries. Clauses like `WHERE` and `HAVING` evaluate potential rows or groups of rows to see if they satisfy certain criteria. The `GROUP BY` clause helps you summarize data by grouping together records that have one or more column values in common. The `LIMIT` clause offers users the ability to set a hard maximum on the number of records to retrieve. +There are many ways to filter and otherwise constrain the results you get from queries. Clauses like `WHERE` and `HAVING` evaluate potential rows or groups of rows to see if they satisfy certain criteria. The `GROUP BY` clause helps you summarize data by grouping together records that have one or more column values in common. The `LIMIT` clause offers users the ability to set a hard maximum on the number of records to retrieve. -Learning how these clauses can be applied, individually or in combination, will allow you to extract specific data from large datasets. Query modifiers and filters are essential for turning the data that lives within PostgreSQL into useful answers. +Learning how these clauses can be applied, individually or in combination, will allow you to extract specific data from large datasets. Query modifiers and filters are essential for turning the data that lives within PostgreSQL into useful answers. ## FAQ @@ -396,7 +407,7 @@ An example of an additional condition is the logical operator `AND`. An example ```sql SELECT * FROM customer WHERE first_name = 'John' AND last_name = 'Smith'; -``` +```
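As a further sketch using the same `customer` table (the first names here are just example values), parentheses make the precedence explicit when mixing `AND` with `OR`:

```sql
-- Without the parentheses, AND would bind more tightly than OR
SELECT * FROM customer
WHERE last_name = 'Smith'
  AND (first_name = 'John' OR first_name = 'Jane');
```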
@@ -414,7 +425,7 @@ SELECT * FROM my_table LIMIT [OFFSET ];
What is an aggregate function in PostgreSQL? -[Aggregate functions](https://www.postgresql.org/docs/current/functions-aggregate.html) in PostgreSQL are functions that compute a single result from a set of input values. +[Aggregate functions](https://www.postgresql.org/docs/current/functions-aggregate.html) in PostgreSQL are functions that compute a single result from a set of input values.
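As a brief illustration using the `pet` table from earlier in this guide, each aggregate collapses the entire input set into a single value:

```sql
-- count(*) and avg() each return one value for the whole table
SELECT count(*) AS total_pets, avg(age) AS average_age FROM pet;
```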
diff --git a/content/04-postgresql/13-reading-and-querying-data/03-joining-tables.mdx b/content/04-postgresql/13-reading-and-querying-data/03-joining-tables.mdx index 7cf53518..91d0d374 100644 --- a/content/04-postgresql/13-reading-and-querying-data/03-joining-tables.mdx +++ b/content/04-postgresql/13-reading-and-querying-data/03-joining-tables.mdx @@ -1,6 +1,6 @@ --- title: 'Using joins to combine data from different tables in PostgreSQL' -metaTitle: "Joining tables in Postgres | Combine data from different tables" +metaTitle: 'Joining tables in Postgres | Combine data from different tables' metaDescription: "Joins allow you to bring together data from multiple tables by stitching together columns that contain common values. In this guide, we'll talk about the different types of joins PostgreSQL supports and how to use joins to construct more valuable queries." metaImage: '/social/generic-postgresql.png' authors: ['justinellingwood'] @@ -8,13 +8,13 @@ authors: ['justinellingwood'] ## Introduction -Splitting related data into separate tables can be beneficial from the standpoint of consistency, flexibility, and certain types of performance. However, you still need a reasonable way of reintegrating records when the relevant information spans multiple tables. +Splitting related data into separate tables can be beneficial from the standpoint of consistency, flexibility, and certain types of performance. However, you still need a reasonable way of reintegrating records when the relevant information spans multiple tables. -In relational databases, *joins* offer a way to combine the records in two or more tables based on common field values. Different types of joins can achieve different results depending on how unmatched rows should be handled. In this guide, we'll discuss the various types of joins that PostgreSQL offers and how you can use them to combine table data from multiple sources. +In relational databases, _joins_ offer a way to combine the records in two or more tables based on common field values. Different types of joins can achieve different results depending on how unmatched rows should be handled. In this guide, we'll discuss the various types of joins that PostgreSQL offers and how you can use them to combine table data from multiple sources. ## What are joins? -In short, [*joins*](/intro/database-glossary#join) are a way of displaying data from multiple tables. They do this by stitching together records from different sources based on matching values in certain columns. Each resulting row consists of a record from the first table combined with a row from the second table, based on one or more columns in each table having the same value. +In short, [_joins_](/intro/database-glossary#join) are a way of displaying data from multiple tables. They do this by stitching together records from different sources based on matching values in certain columns. Each resulting row consists of a record from the first table combined with a row from the second table, based on one or more columns in each table having the same value. The basic syntax of a join looks like this: @@ -27,21 +27,21 @@ FROM ; ``` -In a join, each resulting row is constructed by including all of the columns of the first table followed by all of the columns from the second table. The `SELECT` portion of the query can be used to specify the exact columns you wish to display. +In a join, each resulting row is constructed by including all of the columns of the first table followed by all of the columns from the second table. 
The `SELECT` portion of the query can be used to specify the exact columns you wish to display. -Multiple rows may be constructed from the original tables if the values in the columns used for comparison are not unique. For example, imagine you have a column being compared from the first table that has two records with a value of "red". Matched with this is a column from the second table that has three rows with that value. The join will produce six different rows for that value representing the various combinations that can be achieved. +Multiple rows may be constructed from the original tables if the values in the columns used for comparison are not unique. For example, imagine you have a column being compared from the first table that has two records with a value of "red". Matched with this is a column from the second table that has three rows with that value. The join will produce six different rows for that value representing the various combinations that can be achieved. -The type of join and the join conditions determine how each row that is displayed is constructed. This impacts what happens to the rows from each table that do and do *not* have a match on the join condition. +The type of join and the join conditions determine how each row that is displayed is constructed. This impacts what happens to the rows from each table that do and do _not_ have a match on the join condition. -For the sake of convenience, many joins match the primary key on one table with an associated foreign key on the second table. Although primary and foreign keys are only used by the database system to maintain consistency guarantees, their relationship often makes them a good candidate for join conditions. +For the sake of convenience, many joins match the primary key on one table with an associated foreign key on the second table. Although primary and foreign keys are only used by the database system to maintain consistency guarantees, their relationship often makes them a good candidate for join conditions. ## Different types of joins -Various types of joins are available, each of which will potentially produce different results. Understanding how each type is constructed will help you determine which is appropriate for different scenarios. +Various types of joins are available, each of which will potentially produce different results. Understanding how each type is constructed will help you determine which is appropriate for different scenarios. ### Inner join -The default join is called an [*inner join*](/intro/database-glossary#inner-join). In PostgreSQL, this can be specified using either `INNER JOIN` or just simply `JOIN`. +The default join is called an [_inner join_](/intro/database-glossary#inner-join). In PostgreSQL, this can be specified using either `INNER JOIN` or just simply `JOIN`. Here is a typical example demonstrating the syntax of an inner join: @@ -54,13 +54,13 @@ FROM ON table_1.id = table_2.table_1_id; ``` -An inner join is the most restrictive type of join because it only displays rows created by combining rows from each table. Any rows in the constituent tables that did not have a matching counterpart in the other table are removed from the results. For example, if the first table has a value of "blue" in the comparison column, and the second table has no record with that value, that row will be suppressed from the output. +An inner join is the most restrictive type of join because it only displays rows created by combining rows from each table. 
Any rows in the constituent tables that did not have a matching counterpart in the other table are removed from the results. For example, if the first table has a value of "blue" in the comparison column, and the second table has no record with that value, that row will be suppressed from the output. -If you represent the results as a Venn diagram of the component tables, an inner join allows you to represent the overlapping area of the two circles. None of values that only existed in one of the tables are displayed. +If you represent the results as a Venn diagram of the component tables, an inner join allows you to represent the overlapping area of the two circles. None of values that only existed in one of the tables are displayed. ### Left join -A [left join](/intro/database-glossary#left-join) is a join that shows all of the records found in an inner join, plus all of the *unmatched* rows from the first table. In PostgreSQL, this can be specified as a `LEFT OUTER JOIN` or as just a `LEFT JOIN`. +A [left join](/intro/database-glossary#left-join) is a join that shows all of the records found in an inner join, plus all of the _unmatched_ rows from the first table. In PostgreSQL, this can be specified as a `LEFT OUTER JOIN` or as just a `LEFT JOIN`. The basic syntax of a left join follows this pattern: @@ -73,13 +73,13 @@ LEFT JOIN table_2 ON table_1.id = table_2.table_1_id; ``` -A left join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from the first table are also included. Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the second table. +A left join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from the first table are also included. Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the second table. -If you represent the results as a Venn diagram of the component tables, a left join allows you to represent the entire left circle. The parts of the left circle represented by the intersection between the two circles will have additional data supplemented by the right table. +If you represent the results as a Venn diagram of the component tables, a left join allows you to represent the entire left circle. The parts of the left circle represented by the intersection between the two circles will have additional data supplemented by the right table. ### Right join -A [right join](/intro/database-glossary#right-join) is a join that shows all of the records found in an inner join, plus all of the *unmatched* rows from the second table. In PostgreSQL, this can be specified as a `RIGHT OUTER JOIN` or as just a `RIGHT JOIN`. +A [right join](/intro/database-glossary#right-join) is a join that shows all of the records found in an inner join, plus all of the _unmatched_ rows from the second table. In PostgreSQL, this can be specified as a `RIGHT OUTER JOIN` or as just a `RIGHT JOIN`. The basic syntax of a right join follows this pattern: @@ -92,13 +92,13 @@ RIGHT JOIN table_2 ON table_1.id = table_2.table_1_id; ``` -A right join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from the second table are also included. 
Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the first table. +A right join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from the second table are also included. Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the first table. -If you represent the results as a Venn diagram of the component tables, a right join allows you to represent the entire right circle. The parts of the right circle represented by the intersection between the two circles will have additional data supplemented by the left table. +If you represent the results as a Venn diagram of the component tables, a right join allows you to represent the entire right circle. The parts of the right circle represented by the intersection between the two circles will have additional data supplemented by the left table. ### Full join -A [full join](/intro/database-glossary#outer-join) is a join that shows all of the records found in an inner join, plus all of the *unmatched* rows from both component tables. In PostgreSQL, this can be specified as a `FULL OUTER JOIN` or as just a `FULL JOIN`. +A [full join](/intro/database-glossary#outer-join) is a join that shows all of the records found in an inner join, plus all of the _unmatched_ rows from both component tables. In PostgreSQL, this can be specified as a `FULL OUTER JOIN` or as just a `FULL JOIN`. The basic syntax of a full join follows this pattern: @@ -111,15 +111,15 @@ FULL JOIN table_2 ON table_1.id = table_2.table_1_id; ``` -A full join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from both tables are also included. Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the unmatched other table. +A full join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from both tables are also included. Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the unmatched other table. -If you represent the results as a Venn diagram of the component tables, a full join allows you to represent both of the component circles entirely. The intersection of the two circles will have values supplied by each of the component tables. The parts of the circles outside of the overlapping area will have the values from the table they belong to, using `NULL` to fill in the columns found in the other table. +If you represent the results as a Venn diagram of the component tables, a full join allows you to represent both of the component circles entirely. The intersection of the two circles will have values supplied by each of the component tables. The parts of the circles outside of the overlapping area will have the values from the table they belong to, using `NULL` to fill in the columns found in the other table. ### Cross join -A special join called a `CROSS JOIN` is also available. A cross join does not use any comparisons to determine whether the rows in each table match one another. 
Instead, results are constructed by simply adding each of the rows from the first table to each of the rows of the second table. +A special join called a `CROSS JOIN` is also available. A cross join does not use any comparisons to determine whether the rows in each table match one another. Instead, results are constructed by simply adding each of the rows from the first table to each of the rows of the second table. -This produces a Cartesian product of the rows in two or more tables. In effect, this style of join combines rows from each table unconditionally. So, if each table has three rows, the resulting table would have nine rows containing all of the columns from both tables. +This produces a Cartesian product of the rows in two or more tables. In effect, this style of join combines rows from each table unconditionally. So, if each table has three rows, the resulting table would have nine rows containing all of the columns from both tables. For example, if you have a table called `t1` combined with a table called `t2`, each with rows `r1`, `r2`, and `r3`, the result would be nine rows combined like so: @@ -137,11 +137,11 @@ t1.r3 + t2.r3 ### Self join -A self join is any join that combines the rows of a table with itself. It may not be immediately apparent how this could be useful, but it actually has many common applications. +A self join is any join that combines the rows of a table with itself. It may not be immediately apparent how this could be useful, but it actually has many common applications. -Often, tables describe entities that can fulfill multiple roles in relationship to one another. For instance, if you have a table of `people`, each row could potentially contain a `mother` column that reference other `people` in the table. A self join would allow you to stitch these different rows together by joining a second instance of the table to the first where these values match. +Often, tables describe entities that can fulfill multiple roles in relationship to one another. For instance, if you have a table of `people`, each row could potentially contain a `mother` column that reference other `people` in the table. A self join would allow you to stitch these different rows together by joining a second instance of the table to the first where these values match. -Since self joins reference the same table twice, table aliases are required to disambiguate the references. In the example above, for instance, you could join the two instances of the `people` table using the aliases `people AS children` and `people AS mothers`. That way, you can specify which instance of the table you are referring to when defining join conditions. +Since self joins reference the same table twice, table aliases are required to disambiguate the references. In the example above, for instance, you could join the two instances of the `people` table using the aliases `people AS children` and `people AS mothers`. That way, you can specify which instance of the table you are referring to when defining join conditions. Here is another example, this time representing relationships between employees and managers: @@ -156,13 +156,13 @@ JOIN people AS manager ## Join conditions -When combining tables, the join condition determines how rows will be matched together to form the composite results. The basic premise is to define the columns in each table that must match for the join to occur on that row. 
+When combining tables, the join condition determines how rows will be matched together to form the composite results. The basic premise is to define the columns in each table that must match for the join to occur on that row. ### The `ON` clause -The most standard way of defining the conditions for table joins is with the `ON` clause. The `ON` clause uses an equals sign to specify the exact columns from each table that will be compared to determine when a join may occur. PostgreSQL uses the provided columns to stitch together the rows from each table. +The most standard way of defining the conditions for table joins is with the `ON` clause. The `ON` clause uses an equals sign to specify the exact columns from each table that will be compared to determine when a join may occur. PostgreSQL uses the provided columns to stitch together the rows from each table. -The `ON` clause is the most verbose, but also the most flexible of the available join conditions. It allows for specificity regardless of how standardized the column names are of each table being combined. +The `ON` clause is the most verbose, but also the most flexible of the available join conditions. It allows for specificity regardless of how standardized the column names are of each table being combined. The basic syntax of the `ON` clause looks like this: @@ -177,13 +177,13 @@ ON table1.id = table2.ident; ``` -Here, the rows from `table1` and `table2` will be joined whenever the `id` column from `table1` matches the `ident` column from `table2`. Because an inner join is used, the results will only show the rows that were joined. Since the query uses the wildcard `*` character, all of the columns from both tables will be displayed. +Here, the rows from `table1` and `table2` will be joined whenever the `id` column from `table1` matches the `ident` column from `table2`. Because an inner join is used, the results will only show the rows that were joined. Since the query uses the wildcard `*` character, all of the columns from both tables will be displayed. -This means that both the `id` column from `table1` and the `ident` column from `table2` will be displayed, even though they have the same exact value by virtue of satisfying the join condition. You can avoid this duplication by calling out the exact columns you wish to display in the `SELECT` column list. +This means that both the `id` column from `table1` and the `ident` column from `table2` will be displayed, even though they have the same exact value by virtue of satisfying the join condition. You can avoid this duplication by calling out the exact columns you wish to display in the `SELECT` column list. ### The `USING` clause -The `USING` clause is a shorthand for specifying the conditions of an `ON` clause that can be used when the columns being compared have the same name in both tables. The `USING` clause takes a list, enclosed in parentheses, of the shared column names that should be compared. +The `USING` clause is a shorthand for specifying the conditions of an `ON` clause that can be used when the columns being compared have the same name in both tables. The `USING` clause takes a list, enclosed in parentheses, of the shared column names that should be compared. The general syntax of the `USING` clause uses this format: @@ -213,11 +213,11 @@ ON table1.id = table2.id AND table1.state = table2.state; ``` -While both of the above joins would result in the same rows being constructed with the same data present, they would be displayed slightly different. 
While the `ON` clause includes all of the columns from both tables, the `USING` clause suppresses the duplicate columns. So instead of there being two separate `id` columns and two separate `state` columns (one for each table), the results would just have one of each of the shared columns, followed by all of the other columns provided by `table1` and `table2`. +While both of the above joins would result in the same rows being constructed with the same data present, they would be displayed slightly different. While the `ON` clause includes all of the columns from both tables, the `USING` clause suppresses the duplicate columns. So instead of there being two separate `id` columns and two separate `state` columns (one for each table), the results would just have one of each of the shared columns, followed by all of the other columns provided by `table1` and `table2`. ### The `NATURAL` clause -The `NATURAL` clause is yet another shorthand that can further reduce the verbosity of the `USING` clause. A `NATURAL` join does not specify *any* columns to be matched. Instead, PostgreSQL will automatically join the tables based on all columns that have matching columns in each database. +The `NATURAL` clause is yet another shorthand that can further reduce the verbosity of the `USING` clause. A `NATURAL` join does not specify _any_ columns to be matched. Instead, PostgreSQL will automatically join the tables based on all columns that have matching columns in each database. The general syntax of the `NATURAL` join clause looks like this: @@ -258,17 +258,17 @@ USING Like the `USING` clause, the `NATURAL` clause suppresses duplicate columns, so there would be only a single instance of each of the joined columns in the results. -While the `NATURAL` clause can reduce the verbosity of your queries, care must be exercised when using it. Because the columns used for joining the tables are automatically calculated, if the columns in the component tables change, the results can be vastly different due to new join conditions. +While the `NATURAL` clause can reduce the verbosity of your queries, care must be exercised when using it. Because the columns used for joining the tables are automatically calculated, if the columns in the component tables change, the results can be vastly different due to new join conditions. ## Join conditions and the `WHERE` clause -Join conditions share many characteristics with the comparisons used to filter rows of data using `WHERE` clauses. Both constructs define expressions that must evaluate to true for the row to be considered. Because of this, it's not always intuitive what the difference is between including additional comparisons in a `WHERE` construct versus defining them within the join clause itself. +Join conditions share many characteristics with the comparisons used to filter rows of data using `WHERE` clauses. Both constructs define expressions that must evaluate to true for the row to be considered. Because of this, it's not always intuitive what the difference is between including additional comparisons in a `WHERE` construct versus defining them within the join clause itself. -In order to understand the differences that will result, we have to take a look at the order in which PostgreSQL processes different portions of a query. In this case, the predicates in the join condition are processed first to construct the virtual joined table in memory. After this stage, the expressions within the `WHERE` clause are evaluated to filter the resulting rows. 
+In order to understand the differences that will result, we have to take a look at the order in which PostgreSQL processes different portions of a query. In this case, the predicates in the join condition are processed first to construct the virtual joined table in memory. After this stage, the expressions within the `WHERE` clause are evaluated to filter the resulting rows. -As an example, suppose that we have two tables called `customer` and `order` that we need to join together. We want to join the two tables by matching the `customer.id` column with the `order.customer_id` column. Additionally, we're interested in the rows in the `order` table that have a `product_id` of 12345. +As an example, suppose that we have two tables called `customer` and `order` that we need to join together. We want to join the two tables by matching the `customer.id` column with the `order.customer_id` column. Additionally, we're interested in the rows in the `order` table that have a `product_id` of 12345. -Given the above requirements, we have two conditions that we care about. The way we express these conditions, however, will determine the results we receive. +Given the above requirements, we have two conditions that we care about. The way we express these conditions, however, will determine the results we receive. First, let's use both as the join conditions for a `LEFT JOIN`: @@ -304,12 +304,12 @@ The results could potentially look something like this: PostgreSQL arrived at this result by performing the following operations: 1. Combine any rows in the `customer` table with the `order` table where: - * `customer.id` matches `order.customer_id`. - * `order.product_id` matches 12345 -2. Because we are using a left join, include any *unmatched* rows from the left table (`customer`), padding out the columns from the right table (`order`) with `NULL` values. + - `customer.id` matches `order.customer_id`. + - `order.product_id` matches 12345 +2. Because we are using a left join, include any _unmatched_ rows from the left table (`customer`), padding out the columns from the right table (`order`) with `NULL` values. 3. Display only the columns listed in the `SELECT` column specification. -The outcome is that all of our joined rows match both of the conditions that we are looking for. However, the left join causes PostgreSQL to also include any rows from the first table that did not satisfy the join condition. This results in "left over" rows that don't seem to follow the apparent intent of the query. +The outcome is that all of our joined rows match both of the conditions that we are looking for. However, the left join causes PostgreSQL to also include any rows from the first table that did not satisfy the join condition. This results in "left over" rows that don't seem to follow the apparent intent of the query. If we move the second query (`order.product_id` = 12345) to a `WHERE` clause, instead of including it as a join condition, we get different results: @@ -340,26 +340,26 @@ This time, only three rows are displayed: (3 rows) ``` -The order in which the comparisons are executed is the reason for these differences. This time, PostgreSQL processes the query like this: +The order in which the comparisons are executed is the reason for these differences. This time, PostgreSQL processes the query like this: 1. Combine any rows in the `customer` table with the `order` table where `customer.id` matches `order.customer_id`. -2. 
Because we are using a left join, include any *unmatched* rows from the left table (`customer`), padding out the columns from the right table (`order`) with `NULL` values. +2. Because we are using a left join, include any _unmatched_ rows from the left table (`customer`), padding out the columns from the right table (`order`) with `NULL` values. 3. Evaluate the `WHERE` clause to remove any rows that do not have 12345 as the value for the `order.product_id` column. 4. Display only the columns listed in the `SELECT` column specification. -This time, even though we are using a left join, the `WHERE` clause truncates the results by filtering out all of the rows without the correct `product_id`. Because any unmatched rows would have `product_id` set to `NULL`, this removes all of the unmatched rows that were populated by the left join. It also removes any of the rows that were matched by the join condition that did not pass this second round of checks. +This time, even though we are using a left join, the `WHERE` clause truncates the results by filtering out all of the rows without the correct `product_id`. Because any unmatched rows would have `product_id` set to `NULL`, this removes all of the unmatched rows that were populated by the left join. It also removes any of the rows that were matched by the join condition that did not pass this second round of checks. Understanding the basic process that PostgreSQL uses to execute your queries can help you avoid some easy-to-make but difficult-to-debug mistakes as you work with your data. ## Conclusion -In this guide, we covered how joins enable relational databases to combine data from different tables to provide more valuable answers. We talked about the various joins that PostgreSQL supports, the way each type assembles its results, and what to expect when using specific kinds of joins. Afterwards, we went over different ways to define join conditions and looked at how the interplay between joins and the `WHERE` clause can lead to surprises. +In this guide, we covered how joins enable relational databases to combine data from different tables to provide more valuable answers. We talked about the various joins that PostgreSQL supports, the way each type assembles its results, and what to expect when using specific kinds of joins. Afterwards, we went over different ways to define join conditions and looked at how the interplay between joins and the `WHERE` clause can lead to surprises. -Joins are an essential part of what makes relational databases powerful and flexible enough to handle so many different types of queries. Organizing data using logical boundaries while still being able to recombine the data in novel ways on a case-by-case basis gives relational databases like PostgreSQL incredible versatility. Learning how to perform this stitching between tables will allow you to create more complex queries and rely on the database to create complete pictures of your data. +Joins are an essential part of what makes relational databases powerful and flexible enough to handle so many different types of queries. Organizing data using logical boundaries while still being able to recombine the data in novel ways on a case-by-case basis gives relational databases like PostgreSQL incredible versatility. Learning how to perform this stitching between tables will allow you to create more complex queries and rely on the database to create complete pictures of your data. 
-Prisma allows you to [define relations](https://www.prisma.io/docs/concepts/components/prisma-schema/relations) between models in the Prisma schema file. You can then use [relation queries](https://www.prisma.io/docs/concepts/components/prisma-client/relation-queries) to work with data that spans multiple models. +Prisma allows you to [define relations](https://www.prisma.io/docs/orm/prisma-schema/data-model/relations) between models in the Prisma schema file. You can then use [relation queries](https://www.prisma.io/docs/orm/prisma-client/queries/relation-queries) to work with data that spans multiple models. @@ -380,13 +380,13 @@ LEFT JOIN table_2
What is a lateral join in PostgreSQL? The `LATERAL` key word in [PostgreSQL](https://www.postgresql.org/docs/current/sql-select.html#SQL-FROM) can precede a sub-SELECT FROM item and allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)
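As a minimal sketch (the `customer` and `order` tables here are the hypothetical ones from the join condition discussion above, with assumed `name`, `total`, `created_at`, and `customer_id` columns; `order` is quoted because it is a reserved word), a lateral subquery can look up the three most recent orders for each customer:

```sql
-- The subquery may reference c.id only because of the LATERAL keyword
SELECT c.name, recent.total, recent.created_at
FROM customer AS c
CROSS JOIN LATERAL (
  SELECT o.total, o.created_at
  FROM "order" AS o
  WHERE o.customer_id = c.id
  ORDER BY o.created_at DESC
  LIMIT 3
) AS recent;
```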
Does PostgreSQL support cross joins? Yes, a `CROSS JOIN` can be done in PostgreSQL. The syntax will look something like: diff --git a/content/04-postgresql/13-reading-and-querying-data/04-optimizing-postgresql.mdx b/content/04-postgresql/13-reading-and-querying-data/04-optimizing-postgresql.mdx index 643473b6..c3049d07 100644 --- a/content/04-postgresql/13-reading-and-querying-data/04-optimizing-postgresql.mdx +++ b/content/04-postgresql/13-reading-and-querying-data/04-optimizing-postgresql.mdx @@ -14,7 +14,7 @@ In this guide, we'll talk about different ways to identify poorly performing que -The [PostgreSQL database connector](https://www.prisma.io/docs/concepts/database-connectors/postgresql) can help you manage PostgreSQL databases from JavaScript and TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). +The [PostgreSQL database connector](https://www.prisma.io/docs/orm/overview/databases/postgresql) can help you manage PostgreSQL databases from JavaScript and TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). @@ -669,6 +669,6 @@ We also discussed how to use slow query logging to pinpoint exactly which querie -The [PostgreSQL database connector](https://www.prisma.io/docs/concepts/database-connectors/postgresql) can help you manage PostgreSQL databases from JavaScript and TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). +The [PostgreSQL database connector](https://www.prisma.io/docs/orm/overview/databases/postgresql) can help you manage PostgreSQL databases from JavaScript and TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). diff --git a/content/04-postgresql/14-short-guides/01-quoting-rules.mdx b/content/04-postgresql/14-short-guides/01-quoting-rules.mdx index 07d11bff..e36d69d4 100644 --- a/content/04-postgresql/14-short-guides/01-quoting-rules.mdx +++ b/content/04-postgresql/14-short-guides/01-quoting-rules.mdx @@ -1,28 +1,28 @@ --- title: 'How to use single and double quotes in PostgreSQL' -metaTitle: "PostgreSQL Quotes: How to Use Single and Double Quotes" -metaDescription: "Read on to learn how PostgreSQL interprets single and double quotes and the side effects of each type." 
+metaTitle: 'PostgreSQL Quotes: How to Use Single and Double Quotes' +metaDescription: 'Read on to learn how PostgreSQL interprets single and double quotes and the side effects of each type.' metaImage: '/social/generic-postgresql.png' authors: ['justinellingwood'] --- ## Introduction -Single and double quotation marks are used within PostgreSQL for different purposes. When getting started working with these databases, it can be difficult to understand the differences between these two types of quotes and how to use them correctly. +Single and double quotation marks are used within PostgreSQL for different purposes. When getting started working with these databases, it can be difficult to understand the differences between these two types of quotes and how to use them correctly. -In this guide, we'll take a look at how PostgreSQL interprets both single and double quotes. We'll talk about the side effects of using various quotes and provide examples of scenarios where each are used. +In this guide, we'll take a look at how PostgreSQL interprets both single and double quotes. We'll talk about the side effects of using various quotes and provide examples of scenarios where each are used. ## Double quotes -In PostgreSQL, double quotes (like "a red dog") are always used to denote *delimited identifiers*. In this context, an *identifier* is the name of an object within PostgreSQL, such as a table name or a column name. Delimited identifiers are identifiers that have a specifically marked beginning and end. +In PostgreSQL, double quotes (like "a red dog") are always used to denote _delimited identifiers_. In this context, an _identifier_ is the name of an object within PostgreSQL, such as a table name or a column name. Delimited identifiers are identifiers that have a specifically marked beginning and end. -For example, to select all of the information from a `customer` table, you could type the following. Here, the table name is encapsulated in double quotes. +For example, to select all of the information from a `customer` table, you could type the following. Here, the table name is encapsulated in double quotes. ```sql SELECT * FROM "customer"; ``` -While double quotes indicate an identifier, not all identifiers use double quotes. For examples like the above, it is much more common to see the identifier unquoted entirely: +While double quotes indicate an identifier, not all identifiers use double quotes. For examples like the above, it is much more common to see the identifier unquoted entirely: ```sql SELECT * FROM customer; @@ -30,23 +30,23 @@ SELECT * FROM customer; ### Quoting identifiers and the problem of case sensitivity -While the two formats used above both work correctly for a `customer` table, there *are* important differences. +While the two formats used above both work correctly for a `customer` table, there _are_ important differences. -Unquoted identifiers (like the second version) are *case insensitive*. This means that PostgreSQL will recognize `customer`, `Customer`, and `CUSTOMER` as the same object. +Unquoted identifiers (like the second version) are _case insensitive_. This means that PostgreSQL will recognize `customer`, `Customer`, and `CUSTOMER` as the same object. -However, quoted identifiers are *case sensitive*. This leads to PostgreSQL treating `"CUSTOMER"` and `"customer"` as entirely different objects. +However, quoted identifiers are _case sensitive_. This leads to PostgreSQL treating `"CUSTOMER"` and `"customer"` as entirely different objects. 
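As a brief illustrative sketch (the table name is hypothetical), creating an object with a quoted, mixed-case identifier means you must keep quoting it the same way afterwards:

```sql
CREATE TABLE "CustomerOrders" (id integer);

SELECT * FROM "CustomerOrders"; -- matches the quoted, mixed-case name exactly
SELECT * FROM CustomerOrders;   -- folded to customerorders, so the table is not found
```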
-This difference allows you to create identifiers that would otherwise not be legal within PostgreSQL. For instance, if you need to create a column with a period in it, you would need to use double quotes so that PostgreSQL interprets it correctly. +This difference allows you to create identifiers that would otherwise not be legal within PostgreSQL. For instance, if you need to create a column with a period in it, you would need to use double quotes so that PostgreSQL interprets it correctly. -However, keep in mind that this can lead to usability issues if not used carefully. For example, suppose you use double quotes to preserve upper-case characters in the identifier when creating an object. From then on, you will be required to use double quotes to match that case every time you reference it. +However, keep in mind that this can lead to usability issues if not used carefully. For example, suppose you use double quotes to preserve upper-case characters in the identifier when creating an object. From then on, you will be required to use double quotes to match that case every time you reference it. -Use double quotes sparingly for better compatibility, especially when creating objects. If you want to use double quotes, keep in mind that the case problem does not arise if you use double quotes with fully lower-cased identifiers. +Use double quotes sparingly for better compatibility, especially when creating objects. If you want to use double quotes, keep in mind that the case problem does not arise if you use double quotes with fully lower-cased identifiers. ## Single quotes -Single quotes, on the other hand, are used to indicate that a token is a string. This is used in many different contexts throughout PostgreSQL. +Single quotes, on the other hand, are used to indicate that a token is a string. This is used in many different contexts throughout PostgreSQL. -In general, if an item is a string, it needs to be surrounded by single quotation marks. Keep in mind that when creating and referencing objects, identifiers must be represented by unquoted or double quoted text. +In general, if an item is a string, it needs to be surrounded by single quotation marks. Keep in mind that when creating and referencing objects, identifiers must be represented by unquoted or double quoted text. For example, here we use single quotes to insert a string into a `text` field within a database: @@ -62,7 +62,7 @@ INSERT INTO "my_table"("text") VALUES ('hello there!'); The two statements above are the same, assuming that both `my_table` and the `text` column were unquoted or lower-case when created. -If you need to include a single quote *within* your string, you can do so by instead inserting two sequential single quotes (Two single quotes, not a double quote). +If you need to include a single quote _within_ your string, you can do so by instead inserting two sequential single quotes (Two single quotes, not a double quote). For example, you could insert another string with an embedded single quote by typing: @@ -86,8 +86,8 @@ CREATE ROLE "user1" WITH LOGIN PASSWORD 'secretpassword'; The statement has two quoted components: -* `user1` is in double quotes because it will reference a role, which is an identifier. -* `secretpassword` is a string that will exist in a table column. It is therefore a string value and needs single quotations. +- `user1` is in double quotes because it will reference a role, which is an identifier. +- `secretpassword` is a string that will exist in a table column. 
It is therefore a string value and needs single quotations. ### Checking if your current user has privileges necessary to manage roles @@ -99,19 +99,19 @@ SELECT 'Yes' AS "Can I manage roles?" FROM pg_roles WHERE rolname = :'USER' AND There are a few different quoting patterns in use here: -* `Yes` is in single quotes because it's a value that will be printed within the context of a column value. -* `Can I manage roles?` is in double quotes because it will be the name of the column in the constructed table, and is therefore an identifier. -* `USER` is in single quotes because we are checking the value of a string. -* The `:'USER'` syntax is a special format used to interpolate the `psql` `USER` variable while placing the resulting value in single quotes. +- `Yes` is in single quotes because it's a value that will be printed within the context of a column value. +- `Can I manage roles?` is in double quotes because it will be the name of the column in the constructed table, and is therefore an identifier. +- `USER` is in single quotes because we are checking the value of a string. +- The `:'USER'` syntax is a special format used to interpolate the `psql` `USER` variable while placing the resulting value in single quotes. ## Conclusion -In this guide, we took a look at both single and double quoting in PostgreSQL. Double quotes are used to indicate identifiers within the database, which are objects like tables, column names, and roles. In contrast, single quotes are used to indicate string literals. +In this guide, we took a look at both single and double quoting in PostgreSQL. Double quotes are used to indicate identifiers within the database, which are objects like tables, column names, and roles. In contrast, single quotes are used to indicate string literals. -Learning how to correctly use quotes in PostgreSQL, as well as the implications of different quotation choices, will help you avoid frustrating mistakes. While the quotation rules may not correspond to other systems you may be familiar with, they are useful once you understand their distinct purposes. +Learning how to correctly use quotes in PostgreSQL, as well as the implications of different quotation choices, will help you avoid frustrating mistakes. While the quotation rules may not correspond to other systems you may be familiar with, they are useful once you understand their distinct purposes. -If you are using [Prisma to manage your PostgreSQL database](https://www.prisma.io/docs/concepts/database-connectors/postgresql), the quotation types are resolved automatically before being sent to the database. The exception to this is if you are using [raw queries](https://www.prisma.io/docs/concepts/components/prisma-client/raw-database-access), in which case, you will want to pay attention to the information covered here to avoid mixing how PostgreSQL interprets different types of quotations. +If you are using [Prisma to manage your PostgreSQL database](https://www.prisma.io/docs/orm/overview/databases/postgresql), the quotation types are resolved automatically before being sent to the database. The exception to this is if you are using [raw queries](https://www.prisma.io/docs/orm/prisma-client/queries/raw-database-access), in which case, you will want to pay attention to the information covered here to avoid mixing how PostgreSQL interprets different types of quotations. 
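Tying back to the escaping rule above, a minimal sketch of a doubled single quote inside a string literal (reusing the `my_table` example) might look like:

```sql
-- The two sequential single quotes store one literal single quote in the string.
INSERT INTO "my_table"("text") VALUES ('Isn''t this easier than it looks?');
```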
diff --git a/content/04-postgresql/14-short-guides/03-connection-uris.mdx b/content/04-postgresql/14-short-guides/03-connection-uris.mdx index fe0447d3..535f4144 100644 --- a/content/04-postgresql/14-short-guides/03-connection-uris.mdx +++ b/content/04-postgresql/14-short-guides/03-connection-uris.mdx @@ -1,33 +1,33 @@ --- title: 'Introduction to PostgreSQL connection URIs' -metaTitle: "Understanding connection URI strings in PostgreSQL" -metaDescription: "Learn how to encode PostgreSQL connection details in connection URIs for applications and libraries" +metaTitle: 'Understanding connection URI strings in PostgreSQL' +metaDescription: 'Learn how to encode PostgreSQL connection details in connection URIs for applications and libraries' metaImage: '/social/generic-postgresql.png' authors: ['justinellingwood'] --- ## Introduction -Connecting to your database server is usually one of the first tasks you need to accomplish when designing and configuring database-backed applications. While there are many methods of providing the address, listening port, credentials, and other details to applications, connection URIs, sometimes called connection strings or connection URLs, are one of the most powerful and flexible ways of specifying complex configuration in a compact format. +Connecting to your database server is usually one of the first tasks you need to accomplish when designing and configuring database-backed applications. While there are many methods of providing the address, listening port, credentials, and other details to applications, connection URIs, sometimes called connection strings or connection URLs, are one of the most powerful and flexible ways of specifying complex configuration in a compact format. -In this guide, we'll talk about how to format a connection URI with your PostgreSQL database information and [authentication](/intro/database-glossary#authentication) details. Connection URIs are divided into sections, so we'll cover each part as we go along. +In this guide, we'll talk about how to format a connection URI with your PostgreSQL database information and [authentication](/intro/database-glossary#authentication) details. Connection URIs are divided into sections, so we'll cover each part as we go along. ## Percent encoding values -Before we begin, we should mention that PostgreSQL connection URIs expect [percent-encoded values](https://en.wikipedia.org/wiki/Percent-encoding). This means that any characters that have a special meaning within the URL must be converted to their percent-encoded counterparts to ensure that libraries and applications can interpret them correctly. +Before we begin, we should mention that PostgreSQL connection URIs expect [percent-encoded values](https://en.wikipedia.org/wiki/Percent-encoding). This means that any characters that have a special meaning within the URL must be converted to their percent-encoded counterparts to ensure that libraries and applications can interpret them correctly. Characters you should percent-encode include: -* (space): `%20` -* `%`: `%25` -* `&`: `%26` -* `/`: `%2F` -* `:`: `%3A` -* `=`: `%3D` -* `?`: `%3F` -* `@`: `%40` -* `[`: `%5B` -* `]`: `%5D` +- (space): `%20` +- `%`: `%25` +- `&`: `%26` +- `/`: `%2F` +- `:`: `%3A` +- `=`: `%3D` +- `?`: `%3F` +- `@`: `%40` +- `[`: `%5B` +- `]`: `%5D` These have special meaning within the connection URI. @@ -43,7 +43,7 @@ pe@ce&lo\/3 pe%40ce%26lo\%2F3 ``` -If you are unsure about whether a character should be percent-encoded, it's usually best to encode it anyways. 
For example, if you are unsure if the `\` character is reserved, you can use it's percent-encoded equivalent, `%5C`, to be safe: +If you are unsure about whether a character should be percent-encoded, it's usually best to encode it anyways. For example, if you are unsure if the `\` character is reserved, you can use it's percent-encoded equivalent, `%5C`, to be safe: ``` pe%40ce%26lo%5C%2F3 @@ -67,20 +67,20 @@ postgres[ql]://[username[:password]@][host[:port],]/database[?parameter_list] |- hostspec ``` -The parts in square brackets indicate optional parts. You may have noticed that most parts of the URI are optional. It might also be apparent that there are many pieces of information you can encode in the URI. +The parts in square brackets indicate optional parts. You may have noticed that most parts of the URI are optional. It might also be apparent that there are many pieces of information you can encode in the URI. A quick description of each of the individual components: -* `postgres[ql]`: The schema identifier. Can be `postgresql` or simply `postgres`. -* `userspec`: An optional component of the URI that can be used to specify the user and password to connect as. - * `username`: An optional username. If included, it should start after the second slash (`/`) and continue until a colon (`:`) (if a password is also provided) or until an at sign (`@`) (if a password is not provided). - * `password`: An optional password. If included, it begins after a colon (`:`) and continues until the at sign (`@`). -* `hostspec`: An optional component used to specify the hostname and port to connect to. - * `host`: An optional IP address, DNS name, or locally resolvable name of the server to connect to. The host continues until a colon (`:`) (if a port is included) or until a slash (if a port is not included) - * `port`: An optional port specification to indicate which port PostgreSQL is listening to on the server. The port begins with a colon (`:`) and continues until the slash (`/`) -* `database name`: The name of the individual database to connect to. -* `parameter list`: An optional list of additional parameters that can affect the connection behavior. The parameter list begins with a question mark (`?`). - * `parameter pairs`: The parameter list is composed of key-value pairs. The key and value within each pair are separated by an equal sign (`=`) and each pair is separated from the next by an ampersand (`&`). +- `postgres[ql]`: The schema identifier. Can be `postgresql` or simply `postgres`. +- `userspec`: An optional component of the URI that can be used to specify the user and password to connect as. + - `username`: An optional username. If included, it should start after the second slash (`/`) and continue until a colon (`:`) (if a password is also provided) or until an at sign (`@`) (if a password is not provided). + - `password`: An optional password. If included, it begins after a colon (`:`) and continues until the at sign (`@`). +- `hostspec`: An optional component used to specify the hostname and port to connect to. + - `host`: An optional IP address, DNS name, or locally resolvable name of the server to connect to. The host continues until a colon (`:`) (if a port is included) or until a slash (if a port is not included) + - `port`: An optional port specification to indicate which port PostgreSQL is listening to on the server. The port begins with a colon (`:`) and continues until the slash (`/`) +- `database name`: The name of the individual database to connect to. 
+- `parameter list`: An optional list of additional parameters that can affect the connection behavior. The parameter list begins with a question mark (`?`). + - `parameter pairs`: The parameter list is composed of key-value pairs. The key and value within each pair are separated by an equal sign (`=`) and each pair is separated from the next by an ampersand (`&`). Here is an example of a PostgreSQL connection URI that incorporates all of these components: @@ -96,9 +96,9 @@ postgresql://sally:sallyspassword@dbserver.example:5555/userdata?connect_timeout ## Specifying the URI type -The item in a connection URI is usually the protocol specification or application type. Since the URI will be used to connect and authenticate to a PostgreSQL database, we need to use a signifier that signifies that to the applications and libraries we're using. +The item in a connection URI is usually the protocol specification or application type. Since the URI will be used to connect and authenticate to a PostgreSQL database, we need to use a signifier that signifies that to the applications and libraries we're using. -The [PostgreSQL project accepts both `postgresql://` and `postgres://`](https://www.postgresql.org/docs/current/libpq-connect.html#id-1.7.3.8.3.6) as valid URI schema designators. Therefore, you should start your connection URI with either of these two strings: +The [PostgreSQL project accepts both `postgresql://` and `postgres://`](https://www.postgresql.org/docs/current/libpq-connect.html#id-1.7.3.8.3.6) as valid URI schema designators. Therefore, you should start your connection URI with either of these two strings: ``` postgresql:// @@ -109,7 +109,7 @@ The schema designator will ensure that the information that follows is interpret ## Specifying a username and password -The next part of the URI is the user credentials. This is called the `userspec` in the specification. The `userspec` is technically optional, but is typically required if you don't want to rely on defaults configured by either your application or the database. +The next part of the URI is the user credentials. This is called the `userspec` in the specification. The `userspec` is technically optional, but is typically required if you don't want to rely on defaults configured by either your application or the database. If included, the `userspec` begins after the colon and double forward slash (`://`) and ends with an at sign (`@`). @@ -119,19 +119,19 @@ To specify only a username, you can place it in between those two symbols: postgresql://username@ ``` -To specify a username *and* a password, provide the username first, followed by a colon (`:`), and then the password and at sign: +To specify a username _and_ a password, provide the username first, followed by a colon (`:`), and then the password and at sign: ``` postgresql://username:password@ ``` -Applications are able to interpret this data as the `userspec` by noting the inclusion of the terminating at sign (`@`). If only one field is provided (if no colon is present between the slashes and the at sign), it is interpreted as a username. +Applications are able to interpret this data as the `userspec` by noting the inclusion of the terminating at sign (`@`). If only one field is provided (if no colon is present between the slashes and the at sign), it is interpreted as a username. ## Specifying where the server is listening -After the `userspec` comes the `hostspec` which defines where the server is listening. 
The `hostspec` is, again, optional, but almost always useful if you aren't relying on defaults set in your client or library. +After the `userspec` comes the `hostspec` which defines where the server is listening. The `hostspec` is, again, optional, but almost always useful if you aren't relying on defaults set in your client or library. -The `hostspec` consists of a `host` and an optional `port`. The `host` can either be a locally resolvable host name, a name resolved by an external name system like DNS, or an IP address or other direct address. The port signifies the port number where PostgreSQL is listening. +The `hostspec` consists of a `host` and an optional `port`. The `host` can either be a locally resolvable host name, a name resolved by an external name system like DNS, or an IP address or other direct address. The port signifies the port number where PostgreSQL is listening. To specify that the application should attempt to connect to the default PostgreSQL port (5432) on the local computer, you can use: @@ -145,25 +145,25 @@ If you needed to include a username and password, that information would come fi postgresql://username:password@localhost ``` -To specify a remote server running on a non-standard port, separate those details with a colon. For example, to connect to port 3333 on a host at `198.51.100.22`, you could use: +To specify a remote server running on a non-standard port, separate those details with a colon. For example, to connect to port 3333 on a host at `198.51.100.22`, you could use: ``` postgresql://username:password@198.51.100.22:3333 ``` -You can actually provide more than one host and port pairs, separated by the commas (`,`) to tell the application to try the latter servers if the first cannot be reached. For example, to extend the previous example to include a fallback server listening on port 5555 on `198.51.100.33`, you could use: +You can actually provide more than one host and port pairs, separated by the commas (`,`) to tell the application to try the latter servers if the first cannot be reached. For example, to extend the previous example to include a fallback server listening on port 5555 on `198.51.100.33`, you could use: ``` postgresql://username:password@198.51.100.22:3333,198.51.100.33:5555 ``` -Conforming clients and applications will try to first connect to the server listening at `198.51.100.22:3333`. If that fails, they will try to reach a PostgreSQL database listening on `198.51.100.33:5555`. +Conforming clients and applications will try to first connect to the server listening at `198.51.100.22:3333`. If that fails, they will try to reach a PostgreSQL database listening on `198.51.100.33:5555`. ## Providing the database name -After the `hostspec`, the next piece of data is the database name. While not true for all database management systems, with PostgreSQL, you must connect to a specific database when establishing a connection. +After the `hostspec`, the next piece of data is the database name. While not true for all database management systems, with PostgreSQL, you must connect to a specific database when establishing a connection. -The database name begins with a forward slash (`/`) and proceeds until either the end of the line or a question mark (`?`). You must include the database name if you aren't relying on the default values. +The database name begins with a forward slash (`/`) and proceeds until either the end of the line or a question mark (`?`). You must include the database name if you aren't relying on the default values. 
To connect to a database called `sales` hosted on a PostgreSQL server listening on `198.51.100.22:3333`, you could type: @@ -173,9 +173,9 @@ postgresql://username:password@198.51.100.22:3333/sales ## Specifying additional parameters -The last part of the connection URI is used to provide additional parameters for the connection. The list of parameters is introduced by a leading question mark (`?`) and continues until the end of the line. +The last part of the connection URI is used to provide additional parameters for the connection. The list of parameters is introduced by a leading question mark (`?`) and continues until the end of the line. -Each parameter listed is defined as a key and value pair joined with an equals sign (`=`). After the first parameter pair, each additional key-value pair is separated by an ampersand (`&`). +Each parameter listed is defined as a key and value pair joined with an equals sign (`=`). After the first parameter pair, each additional key-value pair is separated by an ampersand (`&`). For example, to specify that the client should apply a 10 second timeout for the connection we were previously defining, you could use: @@ -183,7 +183,7 @@ For example, to specify that the client should apply a 10 second timeout for the postgresql://username:password@198.51.100.22:3333/sales?connect_timeout=10 ``` -If you wanted to provide additional parameters, you'd add them afterwards with an ampersand (`&`) between each pair. For instance, we could additionally specify that we require SSL and want to connect only if the server is a primary in a replica set, we could additionally add: +If you wanted to provide additional parameters, you'd add them afterwards with an ampersand (`&`) between each pair. For instance, we could additionally specify that we require SSL and want to connect only if the server is a primary in a replica set, we could additionally add: ``` postgresql://username:password@198.51.100.22:3333/sales?connect_timeout=10&sslmode=require&target_session_attrs=primary @@ -193,10 +193,10 @@ The PostgreSQL documentation has a [full list of parameters](https://www.postgre ## Conclusion -In this guide, we discussed what a PostgreSQL connection URI is, how to interpret the various components, and how to construct your own URIs given a set of connection information. Connection URIs encode all of the information required to connect to a given database within a single string. Because of this flexibility and due to their wide adoption, understanding how to parse and construct those strings can be pretty helpful. +In this guide, we discussed what a PostgreSQL connection URI is, how to interpret the various components, and how to construct your own URIs given a set of connection information. Connection URIs encode all of the information required to connect to a given database within a single string. Because of this flexibility and due to their wide adoption, understanding how to parse and construct those strings can be pretty helpful. -If you are using [Prisma to manage your PostgreSQL database](https://www.prisma.io/docs/concepts/database-connectors/postgresql), you need to set a connection URI within a 'datasource' block in your [Prisma schema file](https://www.prisma.io/docs/concepts/components/prisma-schema). You must provide a [connection URI for the 'url' field](https://www.prisma.io/docs/concepts/database-connectors/postgresql#example) so that Prisma can connect to your database. 
+If you are using [Prisma to manage your PostgreSQL database](https://www.prisma.io/docs/orm/overview/databases/postgresql), you need to set a connection URI within a 'datasource' block in your [Prisma schema file](https://www.prisma.io/docs/orm/prisma-schema/overview). You must provide a [connection URI for the 'url' field](https://www.prisma.io/docs/orm/overview/databases/postgresql#example) so that Prisma can connect to your database. diff --git a/content/04-postgresql/14-short-guides/04-exporting-schemas.mdx b/content/04-postgresql/14-short-guides/04-exporting-schemas.mdx index 94e789c0..88e1ddf0 100644 --- a/content/04-postgresql/14-short-guides/04-exporting-schemas.mdx +++ b/content/04-postgresql/14-short-guides/04-exporting-schemas.mdx @@ -14,7 +14,7 @@ In this short guide, we'll discuss how to export PostgreSQL database schemas usi -You can use [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). +You can use [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). @@ -118,6 +118,6 @@ Being able to export your schemas allows you to save your database structures ou -You can use [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). +You can use [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. Learn how to add Prisma to an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-postgresql) or how to [start with Prisma from scratch](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-postgresql). 
diff --git a/content/05-mysql/07-introduction-to-data-types.mdx b/content/05-mysql/07-introduction-to-data-types.mdx index 8a6fefab..63b45da8 100644 --- a/content/05-mysql/07-introduction-to-data-types.mdx +++ b/content/05-mysql/07-introduction-to-data-types.mdx @@ -8,102 +8,102 @@ authors: ['justinellingwood'] ## Introduction -One of the primary features of relational databases in general is the ability to define [schemas](/intro/intro-to-schemas) or [table structures](/mysql/create-and-delete-databases-and-tables#create-tables-within-databases) that exactly specify the format of the data they will contain. This is done by prescribing the columns that these structures contain along with their *data type* and any constraints. +One of the primary features of relational databases in general is the ability to define [schemas](/intro/intro-to-schemas) or [table structures](/mysql/create-and-delete-databases-and-tables#create-tables-within-databases) that exactly specify the format of the data they will contain. This is done by prescribing the columns that these structures contain along with their _data type_ and any constraints. -[Data types](/intro/database-glossary#data-type) specify a general pattern for the data they accept and store. Values must adhere to the requirements that they outline in order to be accepted by MySQL. While it is possible to define custom requirements, data types provide the basic building blocks that allow MySQL to validate input and work with the data using appropriate operations. +[Data types](/intro/database-glossary#data-type) specify a general pattern for the data they accept and store. Values must adhere to the requirements that they outline in order to be accepted by MySQL. While it is possible to define custom requirements, data types provide the basic building blocks that allow MySQL to validate input and work with the data using appropriate operations. -MySQL includes [a wide range of data types](https://dev.mysql.com/doc/refman/8.0/en/data-types.html) that are used to label and validate that values conform to appropriate types. In this guide, we will discuss the most common data types available in MySQL, the different input and output formats they use, and how to configure various fields to meet your applications' needs. +MySQL includes [a wide range of data types](https://dev.mysql.com/doc/refman/8.0/en/data-types.html) that are used to label and validate that values conform to appropriate types. In this guide, we will discuss the most common data types available in MySQL, the different input and output formats they use, and how to configure various fields to meet your applications' needs. ### What are the data types in MySQL? Before going into detail, let's take a broad view of what data types MySQL provides. -MySQL supports a reasonable range of data types suitable for various types of simple and complex data. These include: - -* `TINYINT` -* `SMALLINT` -* `MEDIUMINT` -* `INT` -* `BIGINT` -* `DECIMAL` -* `NUMERIC` -* `FLOAT` -* `DOUBLE` -* `BIT` -* `DATE` -* `DATETIME` -* `TIMESTAMP` -* `TIME` -* `YEAR` -* `CHAR` -* `VARCHAR` -* `BINARY` -* `VARBINARY` -* `BLOB` -* `TEXT` -* `ENUM` -* `SET` -* `GEOMETRY` -* `POINT` -* `LINESTRING` -* `POLYGON` -* `MULTIPOINT` -* `MULTILINESTRING` -* `MULTIPOLYGON` -* `GEOMETRYCOLLECTION` -* `JSON` +MySQL supports a reasonable range of data types suitable for various types of simple and complex data. 
These include: + +- `TINYINT` +- `SMALLINT` +- `MEDIUMINT` +- `INT` +- `BIGINT` +- `DECIMAL` +- `NUMERIC` +- `FLOAT` +- `DOUBLE` +- `BIT` +- `DATE` +- `DATETIME` +- `TIMESTAMP` +- `TIME` +- `YEAR` +- `CHAR` +- `VARCHAR` +- `BINARY` +- `VARBINARY` +- `BLOB` +- `TEXT` +- `ENUM` +- `SET` +- `GEOMETRY` +- `POINT` +- `LINESTRING` +- `POLYGON` +- `MULTIPOINT` +- `MULTILINESTRING` +- `MULTIPOLYGON` +- `GEOMETRYCOLLECTION` +- `JSON` We'll cover the most common of these in more depth throughout this guide. ### Getting started with MySQL data types -As you get started with types, it's important to remember that types alone are not always a complete solution to data validation, but a component. Other database tools, like [constraints](/intro/database-glossary#constraint) also have a role to play in defining correctness. Still, data types are often the first line of defense against invalid data. +As you get started with types, it's important to remember that types alone are not always a complete solution to data validation, but a component. Other database tools, like [constraints](/intro/database-glossary#constraint) also have a role to play in defining correctness. Still, data types are often the first line of defense against invalid data. -For many cases, the general types provided by MySQL are appropriate for the kinds of data you'll be storing. For example, while you could store the coordinates of a geometric point in two different number columns, the provided [`point` type](https://dev.mysql.com/doc/refman/8.0/en/gis-class-point.html) is purpose built to store and validate exactly this type of information. When choosing types, check to see that you are using the most specific type applicable to your use case. +For many cases, the general types provided by MySQL are appropriate for the kinds of data you'll be storing. For example, while you could store the coordinates of a geometric point in two different number columns, the provided [`point` type](https://dev.mysql.com/doc/refman/8.0/en/gis-class-point.html) is purpose built to store and validate exactly this type of information. When choosing types, check to see that you are using the most specific type applicable to your use case. ## Numbers and numeric values -MySQL includes a range of numeric data types suitable for different scenarios. The appropriate type depends on the exact nature of the values you plan to store as well as your precision requirements. +MySQL includes a range of numeric data types suitable for different scenarios. The appropriate type depends on the exact nature of the values you plan to store as well as your precision requirements. ### Integers -The *integer* data type is a category of types used to store numbers without any fractions or decimals. These can be either positive or negative values, and different integer types can store different ranges of numbers. Integer types with smaller ranges of acceptable values take up less space than those with wider ranges. +The _integer_ data type is a category of types used to store numbers without any fractions or decimals. These can be either positive or negative values, and different integer types can store different ranges of numbers. Integer types with smaller ranges of acceptable values take up less space than those with wider ranges. 
The basic list of integer types includes the following: | Integer type | Length | Applicable signed range | Applicable unsigned range | -| ------------ | ------- | --------------------------| ------------------------- | +| ------------ | ------- | ------------------------- | ------------------------- | | `TINYINT` | 1 bytes | -128 to 127 | 0 to 255 | | `SMALLINT` | 2 bytes | -32768 to 32767 | 0 to 65535 | | `MEDIUMINT` | 3 bytes | -8388608 to 8388607 | 0 to 16777215 | | `INT` | 4 bytes | -2147483648 to 2147483647 | 0 to 4294967295 | | `BIGINT` | 8 bytes | -2^63 to 2^63-1 | 0 to 2^64-1 | -The types above are limited by their valid range. Any value outside of the range will result in an error. +The types above are limited by their valid range. Any value outside of the range will result in an error. -In addition to the types mentioned above, MySQL also recognizes an alias called `SERIAL`. Marking a column as `SERIAL` will give it these properties: `BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE`. This is used as a shorthand for common primary key column properties. The column will automatically assign a new unique value whenever a record is added. +In addition to the types mentioned above, MySQL also recognizes an alias called `SERIAL`. Marking a column as `SERIAL` will give it these properties: `BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE`. This is used as a shorthand for common primary key column properties. The column will automatically assign a new unique value whenever a record is added. ### Fixed point -Fixed point types are used to control the amount of *precision* or specificity possible for a number with decimals. In MySQL, this can be controlled by manipulating two factors: precision and scale. +Fixed point types are used to control the amount of _precision_ or specificity possible for a number with decimals. In MySQL, this can be controlled by manipulating two factors: precision and scale. -*Precision* is the maximum amount of total digits that a number can have. In contrast, *scale* is the number of digits to the right of the decimal point. By manipulating these numbers, you can control how large the fractional and non-fractional components of a number are allowed to be. +_Precision_ is the maximum amount of total digits that a number can have. In contrast, _scale_ is the number of digits to the right of the decimal point. By manipulating these numbers, you can control how large the fractional and non-fractional components of a number are allowed to be. -These two arguments are used to control arbitrary precision using the *`numeric`* or *`decimal`* data types (these two types are synonymous in MySQL). The `numeric` type takes zero to two arguments. +These two arguments are used to control arbitrary precision using the _`numeric`_ or _`decimal`_ data types (these two types are synonymous in MySQL). The `numeric` type takes zero to two arguments. -With no arguments, the column is defined as having a precision of 10 and a scale of 0. This means that the column can hold up to 10 digits, but none of these can be after the decimal point: +With no arguments, the column is defined as having a precision of 10 and a scale of 0. This means that the column can hold up to 10 digits, but none of these can be after the decimal point: ``` NUMERIC ``` -When a single argument is provided, it is interpreted as the precision of the column with scale set to 0. This effectively allows you to specify the maximum number of digits in an integer-like number (no fractional or decimal components). 
For example, if you need a 5 digit whole number, you can specify: +When a single argument is provided, it is interpreted as the precision of the column with scale set to 0. This effectively allows you to specify the maximum number of digits in an integer-like number (no fractional or decimal components). For example, if you need a 5 digit whole number, you can specify: ``` NUMERIC(5) ``` -Specify precision followed by scale when configuring a column using both controls. MySQL will round the decimal component of any input to the correct number of digits using the scale number. MySQL will use the precision and scale to determine how many digits are allowed on the left side of the decimal point. If an entry exceeds the allowed number of digits, MySQL will produce an error. +Specify precision followed by scale when configuring a column using both controls. MySQL will round the decimal component of any input to the correct number of digits using the scale number. MySQL will use the precision and scale to determine how many digits are allowed on the left side of the decimal point. If an entry exceeds the allowed number of digits, MySQL will produce an error. For example, we can specify a column with a total precision of 5 and a scale of 2: @@ -113,17 +113,17 @@ NUMERIC(5, 2) This column would have the following behavior: -Input value | Rounded value | Accepted (fits precision)? ------------ | ------------- | --------- -400.28080 | 400.28 | Yes -8.332799 | 8.33 | Yes -11799.799 | 11799.80 | No -11799 | 11799 | No -2802.27 | 2802.27 | No +| Input value | Rounded value | Accepted (fits precision)? | +| ----------- | ------------- | -------------------------- | +| 400.28080 | 400.28 | Yes | +| 8.332799 | 8.33 | Yes | +| 11799.799 | 11799.80 | No | +| 11799 | 11799 | No | +| 2802.27 | 2802.27 | No | ### Floating point -Floating point numbers are another way to express decimal numbers, but without exact, consistent precision. Instead, floating point types only have a concept of a maximum precision which is often related to the architecture and platform of the hardware. +Floating point numbers are another way to express decimal numbers, but without exact, consistent precision. Instead, floating point types only have a concept of a maximum precision which is often related to the architecture and platform of the hardware. For example, to limit a floating point column to 8 digits of precision, you can use the `FLOAT` type, which stores results using 4 bytes with anywhere from 0 to 23 digits of precision: @@ -133,23 +133,23 @@ FLOAT(8) Similarly, the `DOUBLE` type uses 8 bytes to store data and can use precisions of 24 to 53 digits. -Because of these design choices, floating point numbers can work with numbers with large number of decimals efficiently, but not always exactly. The internal representation of numbers may cause slight differences between the input and output. This can cause unexpected behavior when comparing values, doing floating point math, or performing operations that require exact values. +Because of these design choices, floating point numbers can work with numbers with large number of decimals efficiently, but not always exactly. The internal representation of numbers may cause slight differences between the input and output. This can cause unexpected behavior when comparing values, doing floating point math, or performing operations that require exact values. 
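As a minimal sketch of the difference (the table and column names are hypothetical), a fixed point column stores the declared scale exactly while a floating point column stores an approximation:

```sql
CREATE TABLE price_samples (
  exact_price  DECIMAL(5, 2), -- fixed point: up to 5 digits total, 2 after the decimal point
  approx_price FLOAT          -- floating point: approximate, variable precision
);

-- exact_price is rounded to the declared scale and stored as exactly 400.28;
-- approx_price is stored as a binary approximation of 400.2808.
INSERT INTO price_samples VALUES (400.2808, 400.2808);
```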
### Floating point vs numeric -Both floating point numbers provided by types like `FLOAT` and `DOUBLE` and fixed point numbers provided by the `NUMERIC` or `DECIMAL` types can be used to store decimal values. How do you know which one to use? +Both floating point numbers provided by types like `FLOAT` and `DOUBLE` and fixed point numbers provided by the `NUMERIC` or `DECIMAL` types can be used to store decimal values. How do you know which one to use? -The general rule is that if you need exactness in your calculations, the `NUMERIC` type is always the better choice. The `NUMERIC` type will store values exactly as they are provided, meaning that the results are entirely predictable when retrieving or computing over values. The `NUMERIC` type is called arbitrary precision because you specify the amount of precision the type requires and it will store that exact amount of digits in the field. +The general rule is that if you need exactness in your calculations, the `NUMERIC` type is always the better choice. The `NUMERIC` type will store values exactly as they are provided, meaning that the results are entirely predictable when retrieving or computing over values. The `NUMERIC` type is called arbitrary precision because you specify the amount of precision the type requires and it will store that exact amount of digits in the field. -In contrast, types like `FLOAT` and `DOUBLE` are variable precision types. The amount of precision they maintain depends on the input value. When they reach the end of their allowed level of precision, they may round the remaining digits, leading to differences between the submitted and retrieved values. +In contrast, types like `FLOAT` and `DOUBLE` are variable precision types. The amount of precision they maintain depends on the input value. When they reach the end of their allowed level of precision, they may round the remaining digits, leading to differences between the submitted and retrieved values. -So when would you use variable precision types? Variable precision types like `FLOAT` and `DOUBLE` are well suited for scenarios where exact values are not necessary (for example, if they'll be rounded anyways) and when speed is highly valuable. Variable precision will generally offer performance benefits over the `NUMERIC` type. +So when would you use variable precision types? Variable precision types like `FLOAT` and `DOUBLE` are well suited for scenarios where exact values are not necessary (for example, if they'll be rounded anyways) and when speed is highly valuable. Variable precision will generally offer performance benefits over the `NUMERIC` type. ## String types -MySQL's character types and string types can be placed into two categories: *fixed length* and *variable length*. The choice between these two affects how MySQL allocates space for each value and how it validates input. +MySQL's character types and string types can be placed into two categories: _fixed length_ and _variable length_. The choice between these two affects how MySQL allocates space for each value and how it validates input. -The simplest character-based data type within MySQL is the *`char`* type. With no arguments, the `char` type accepts a single character as input: +The simplest character-based data type within MySQL is the _`char`_ type. 
With no arguments, the `char` type accepts a single character as input: ``` CHAR @@ -163,14 +163,13 @@ CHAR(10) If a string is provided with fewer characters, blank spaces will be appended to pad the length: -Input | # of input characters | Stored value | # of stored characters ------ | --------------------- | ------------ | ---------------------- -'tree' | 4 | 'tree      ' | 10 +| Input | # of input characters | Stored value | # of stored characters | +| ------ | --------------------- | ------------------------------------------ | ---------------------- | +| 'tree' | 4 | 'tree      ' | 10 | +If a string is given with greater than the allowed number of characters, MySQL will raise an error. As an exception to this rule, if the overflowing characters are all spaces, MySQL will simply truncate the excess spaces to fit the field. -If a string is given with greater than the allowed number of characters, MySQL will raise an error. As an exception to this rule, if the overflowing characters are all spaces, MySQL will simply truncate the excess spaces to fit the field. - -The alternative to fixed length character fields are variable length fields. For this, MySQL provides the *`varchar`* type. The `varchar` type stores characters with no fixed size. Unlike `char`, `varchar` cannot be used without specifying the maximum number of characters to store. +The alternative to fixed length character fields are variable length fields. For this, MySQL provides the _`varchar`_ type. The `varchar` type stores characters with no fixed size. Unlike `char`, `varchar` cannot be used without specifying the maximum number of characters to store. By defining a `varchar` with a positive integer, you can set a maximum string length: @@ -180,25 +179,25 @@ VARCHAR(10) This differs from using the `char` type with an integer in that `varchar` will not pad the value if the input does not meet the maximum field length: -Input | # of input characters | Stored value | # of stored characters ------ | --------------------- | ------------ | ---------------------- -'tree' | 4 | 'tree' | 4 +| Input | # of input characters | Stored value | # of stored characters | +| ------ | --------------------- | ------------ | ---------------------- | +| 'tree' | 4 | 'tree' | 4 | -If the string is greater than the maximum length, MySQL will throw an error. The same truncation behavior that's present in `char` fields occurs here: if the overflowing characters are spaces, they will be truncated to fit inside the maximum character length. +If the string is greater than the maximum length, MySQL will throw an error. The same truncation behavior that's present in `char` fields occurs here: if the overflowing characters are spaces, they will be truncated to fit inside the maximum character length. -MySQL also supports the *`binary`* and *`varbinary`* data types. These operate in a similar manner to the `char` and `varchar` types, but store binary strings rather than character strings. This has implications on how they are stored and operated on (for things like comparisons, sorting, etc.). +MySQL also supports the _`binary`_ and _`varbinary`_ data types. These operate in a similar manner to the `char` and `varchar` types, but store binary strings rather than character strings. This has implications on how they are stored and operated on (for things like comparisons, sorting, etc.). For `binary` and `varbinary` types, the integer given when defining the column type represents the number of bytes instead of the number of characters. 
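As a minimal sketch of the padding behavior described above (the table and column names are hypothetical):

```sql
CREATE TABLE labels (
  fixed_label    CHAR(10),   -- padded with spaces to 10 characters on storage
  variable_label VARCHAR(10) -- stores only the characters supplied, up to 10
);

-- 'tree' becomes 'tree      ' in fixed_label but stays 'tree' in variable_label.
INSERT INTO labels VALUES ('tree', 'tree');
```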
-Two other data types that MySQL provides for strings and character storage are *`blob`* and *`text`*. These types operate similar to the `varchar` and `varbinary` types respectively and are meant for storing large objects. They operate mostly the same as their counterparts, but have a few differences like not begin able to have default values and requiring a prefix length when creating an index. +Two other data types that MySQL provides for strings and character storage are _`blob`_ and _`text`_. These types operate similarly to the `varbinary` and `varchar` types, respectively, and are meant for storing large objects. They operate mostly the same as their counterparts, but have a few differences like not being able to have default values and requiring a prefix length when creating an index. ## Booleans MySQL does not actually have a native boolean type to represent true and false values. -MySQL recognizes the types `BOOL` or `BOOLEAN` in an effort for compatibility with other database systems. Its internal implementation, however, [uses a `TINYINT(1)` column to store the values](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html#idm45912181298400) and interprets them as true or false based on a set of rules. +MySQL recognizes the types `BOOL` or `BOOLEAN` in an effort for compatibility with other database systems. Its internal implementation, however, [uses a `TINYINT(1)` column to store the values](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html#idm45912181298400) and interprets them as true or false based on a set of rules. -When interpreting numeric values in a boolean context, the value of `0` is considered false. All non-zero values are considered true. +When interpreting numeric values in a boolean context, the value of `0` is considered false. All non-zero values are considered true. MySQL recognizes the [boolean literals `TRUE` and `FALSE`](https://dev.mysql.com/doc/refman/8.0/en/boolean-literals.html) and converts `TRUE` to 1 and `FALSE` to 0 when storing them. @@ -214,9 +213,9 @@ The `date` type can store a date without an associated time value: DATE ``` -When processing input for `date` columns, MySQL can interpret different formats to determine the correct date to store. However, the component parts must always come in the same sequence: year, month, and then day. The [`STR_TO_DATE()` function](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_str-to-date) is available to help convert other date formats to a format MySQL will interpret correctly. +When processing input for `date` columns, MySQL can interpret different formats to determine the correct date to store. However, the component parts must always come in the same sequence: year, month, and then day. The [`STR_TO_DATE()` function](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_str-to-date) is available to help convert other date formats to a format MySQL will interpret correctly. -When displaying dates, MySQL uses the `YYYY-MM-DD` format. You can use the [`DATE_FORMAT()` function](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format) to format output in other formats. +When displaying dates, MySQL uses the `YYYY-MM-DD` format. You can use the [`DATE_FORMAT()` function](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format) to format output in other formats. The `date` type can store values ranging from `1000-01-01` to `9999-12-31`.
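As a small sketch of the two conversion functions mentioned above (the input values are hypothetical):

```sql
-- Parse a day/month/year string into a value MySQL can store in a DATE column.
SELECT STR_TO_DATE('15/12/2023', '%d/%m/%Y'); -- 2023-12-15

-- Render a stored date in a friendlier output format.
SELECT DATE_FORMAT('2023-12-15', '%M %e, %Y'); -- December 15, 2023
```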
@@ -224,37 +223,37 @@ The `date` type can store values ranging from `1000-01-01` to `9999-12-31`. The `time` data type can store a specific time of day without an associated timezone or date. -When processing input for `time` columns, MySQL can interpret multiple formats to determine the correct time to store. When input has colons, it is generally interpreted as `hh:mm:ss`. Any shortened value (using only one column) will be interpreted as using `hh:mm`. When the input does *not* have colons, the time is processed to fill up the smallest value first. For example, `1045` is taken as 10 minutes and 45 seconds. +When processing input for `time` columns, MySQL can interpret multiple formats to determine the correct time to store. When input has colons, it is generally interpreted as `hh:mm:ss`. Any shortened value (using only one colon) will be interpreted as using `hh:mm`. When the input does _not_ have colons, the time is processed to fill up the smallest value first. For example, `1045` is taken as 10 minutes and 45 seconds. -MySQL also supports fractional seconds if a decimal point is given. It stores up to 6 digits of precision after the decimal. Values in `time` columns can range from `-838:59:59.000000` to `838:59:59.000000`. +MySQL also supports fractional seconds if a decimal point is given. It stores up to 6 digits of precision after the decimal. Values in `time` columns can range from `-838:59:59.000000` to `838:59:59.000000`. -When displaying time values, MySQL uses the `hh:mm:ss` format. As with dates, a function is provided, called [`TIME_FORMAT()`](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_time-format) to display time values using other formats. +When displaying time values, MySQL uses the `hh:mm:ss` format. As with dates, a function is provided, called [`TIME_FORMAT()`](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_time-format) to display time values using other formats. ## Timestamps and datetime -MySQL can represent [timestamps](https://en.wikipedia.org/wiki/Timestamp), a combination of a date and time used to represent a specific moment in time, in two different variations: using the *`timestamp`* type and the *`datetime`* type. +MySQL can represent [timestamps](https://en.wikipedia.org/wiki/Timestamp), a combination of a date and time used to represent a specific moment in time, in two different variations: using the _`timestamp`_ type and the _`datetime`_ type. -The `datetime` type can represent values from `1000-01-01 00:00:00` to `9999-12-31 23:59:59`. It can also include fractional seconds of up to six digits similar to the `time` type. +The `datetime` type can represent values from `1000-01-01 00:00:00` to `9999-12-31 23:59:59`. It can also include fractional seconds of up to six digits similar to the `time` type. -The `timestamp` type can represent values from `1970-01-01 00:00:01` UTC to `2038-01-19 03:14:07` UTC. It can handle fractional seconds as well. When storing `timestamp` values, all values are converted from the given timezone to UTC for storage and converted back to the local timezone on retrieval. The `datetime` type does not do this. +The `timestamp` type can represent values from `1970-01-01 00:00:01` UTC to `2038-01-19 03:14:07` UTC. It can handle fractional seconds as well. When storing `timestamp` values, all values are converted from the given timezone to UTC for storage and converted back to the local timezone on retrieval. The `datetime` type does not do this.
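As a minimal sketch of this conversion behavior (the table, column names, and time zone offsets are hypothetical):

```sql
CREATE TABLE events (
  happened_ts TIMESTAMP, -- converted to UTC on storage, back to the session time zone on retrieval
  happened_dt DATETIME   -- stored and returned exactly as given
);

SET time_zone = '+02:00';
INSERT INTO events VALUES ('2024-06-01 12:00:00', '2024-06-01 12:00:00');

SET time_zone = '+00:00';
SELECT * FROM events;
-- happened_ts is now shown as 2024-06-01 10:00:00; happened_dt is still 2024-06-01 12:00:00.
```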
-From MySQL 8.0.19 onward, you can include a timezone offset when storing a `timestamp` to explicitly set the timezone for the stored value. You do this by including a value after the time component, with no space to indicate the offset. The range of accepted values goes from `-14:00` to `+14:00`, which represents the offset of the stored value from UTC. +From MySQL 8.0.19 onward, you can include a timezone offset when storing a `timestamp` to explicitly set the timezone for the stored value. You do this by including an offset value directly after the time component, with no intervening space. The range of accepted values goes from `-14:00` to `+14:00`, which represents the offset of the stored value from UTC. When deciding whether to store date and time values using `datetime` or `timestamp` types, it's often helpful to separate them by what they are best for. -Think of `datetime` values as a specific date and time, in relationship to the calendar and clock wherever it is retrieved. If a person goes to bed at 11pm at night, a `datetime` value can represent that value, regardless of what timezone the person is in currently. +Think of `datetime` values as a specific date and time, in relation to the calendar and clock wherever it is retrieved. If a person goes to bed at 11pm, a `datetime` value can represent that value, regardless of what timezone the person is currently in. -On the other hand, `timezone` values are best at representing a specific moment in time that is unambiguous across timezones. To send a video call invite, a `timezone` value would be able to make sure the meeting occurs at the same time for everyone, regardless of what timezone the participant is in. +On the other hand, `timestamp` values are best at representing a specific moment in time that is unambiguous across timezones. For a video call invite, a `timestamp` value can make sure the meeting occurs at the same time for everyone, regardless of what timezone each participant is in. ## Other useful types -Along with the types we covered with some depth above, there are additional types that are useful in specific scenarios. We'll cover these briefly to give you an idea of how to use them and when they may be useful. +Along with the types we covered in some depth above, there are additional types that are useful in specific scenarios. We'll cover these briefly to give you an idea of how to use them and when they may be useful. ### Enumerated and set types -Two related types that allow users to dictate the valid values for a column are the *`enum`* and *`set`* types. +Two related types that allow users to dictate the valid values for a column are the _`enum`_ and _`set`_ types. -The [`enum` type](https://dev.mysql.com/doc/refman/8.0/en/enum.html) is a string type that allows the user to define a collection of valid values when the column is created. Any value that matches one of the defined values is accepted and all other values are rejected. This functions similar to a drop down menu in that a choice can be made from a specific set of options. For example, an `enum` called `season` could be created with the values `winter`, `spring`, `summer`, and `autumn`. +The [`enum` type](https://dev.mysql.com/doc/refman/8.0/en/enum.html) is a string type that allows the user to define a collection of valid values when the column is created. Any value that matches one of the defined values is accepted and all other values are rejected.
This functions similarly to a drop-down menu in that a choice can be made from a specific set of options. For example, an `enum` called `season` could be created with the values `winter`, `spring`, `summer`, and `autumn`. To create an `enum` column, specify the type as `enum`, giving the possible values as strings, separated by commas, inside of a set of parentheses, like this: @@ -262,7 +261,7 @@ To create an `enum` column, specify the type as `enum`, giving the possible valu season ENUM('winter', 'spring', 'summer', 'autumn') ``` -A similar type of user-defined type is the [`set` type](https://dev.mysql.com/doc/refman/8.0/en/set.html). Like the `enum` type, `set` types allow users to specify valid values as strings upon definition. The difference between these two types is that in a `set`, more than one value can be stored for each record. +A similar user-defined type is the [`set` type](https://dev.mysql.com/doc/refman/8.0/en/set.html). Like the `enum` type, `set` types allow users to specify valid values as strings upon definition. The difference between these two types is that in a `set`, more than one value can be stored for each record. For example, if you needed a column to represent the days of the week volunteers are available to work, you could have a `set` column like this: @@ -270,7 +269,7 @@ For example, if you needed a column to a represent the days of the week voluntee availability SET('sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday') ``` -When entering values for the `availability` column we just created, you provide a single string with commas separating all of the days the volunteer is available. For example: +When entering values for the `availability` column we just created, you provide a single string with commas separating all of the days the volunteer is available. For example: ``` 'monday,tuesday,wednesday,thursday,friday' ``` @@ -283,7 +282,7 @@ For `set` types in MySQL, duplicate values in input are always removed and upon ### JSON -MySQL supports columns in [JSON](https://en.wikipedia.org/wiki/JSON) using the `json` type. Data stored as `json` is stored in binary for faster execution and processing so that the server does not have to interpret a string to operate on `JSON` values. +MySQL supports columns in [JSON](https://en.wikipedia.org/wiki/JSON) using the `json` type. Data stored as `json` is stored in a binary format for faster execution and processing, so that the server does not have to interpret a string to operate on `JSON` values. ``` JSON ``` @@ -293,15 +292,15 @@ To operate on `JSON` columns, MySQL provides [a number of functions](https://dev ## Conclusion -In this article, we've covered a lot of the most common data types that are useful when working with MySQL databases. There are [additional types](https://dev.mysql.com/doc/refman/8.0/en/data-types.html) not covered in this guide that are helpful to know about, but these represent a good starting point for most use cases. +In this article, we've covered a lot of the most common data types that are useful when working with MySQL databases. There are [additional types](https://dev.mysql.com/doc/refman/8.0/en/data-types.html) not covered in this guide that are helpful to know about, but these represent a good starting point for most use cases. -It is important to use the type system appropriately so that you can control valid values and operate on data as expected.
There are pitfalls you can run into if you choose a type not suited for your data, so giving it thought before you commit to a data type is worthwhile in most cases. +It is important to use the type system appropriately so that you can control valid values and operate on data as expected. There are pitfalls you can run into if you choose a type not suited for your data, so giving it thought before you commit to a data type is worthwhile in most cases. -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to work with your MySQL databases, you can find a mapping between some of the common MySQL and Prisma types in [Prisma's MySQL data connectors docs](https://www.prisma.io/docs/concepts/database-connectors/mysql#type-mapping-between-mysql-to-prisma-schema). +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to work with your MySQL databases, you can find a mapping between some of the common MySQL and Prisma types in [Prisma's MySQL data connectors docs](https://www.prisma.io/docs/orm/overview/databases/mysql#type-mapping-between-mysql-to-prisma-schema). -In the data model used by Prisma schema, data types are represented by [field types](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model#defining-fields). Check out our documentation to learn more. +In the data model used by Prisma schema, data types are represented by [field types](https://www.prisma.io/docs/orm/prisma-schema/data-model/models#defining-fields). Check out our documentation to learn more. @@ -311,21 +310,21 @@ In the data model used by Prisma schema, data types are represented by [field ty The declaration syntax for a `DECIMAL` column is `DECIMAL(M, D)`. The ranges of values for the arguments are as follows: -* *M* is the maximum number of digits (the precision). It has a range of 1 to 65. -* *D* is the number of digits to the right of the decimal point (the scale). It has a range of 0 to 30 and must be no larger than *M*. +- _M_ is the maximum number of digits (the precision). It has a range of 1 to 65. +- _D_ is the number of digits to the right of the decimal point (the scale). It has a range of 0 to 30 and must be no larger than _M_.
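For example, a hypothetical `DECIMAL(5, 2)` column (the table name below is invented) can hold values between `-999.99` and `999.99`:

```sql
-- 5 total digits, 2 of them after the decimal point
CREATE TABLE price_demo (
    amount DECIMAL(5, 2)
);

INSERT INTO price_demo VALUES (123.45);   -- accepted
INSERT INTO price_demo VALUES (1234.50);  -- rejected in strict SQL mode: out of range for DECIMAL(5,2)
```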
How does MySQL store `TEXT`? -The [storage requirements](https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html) for string types in MySQL can be represented in the following table with *L* representing the actual length in bytes of a given string value. +The [storage requirements](https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html) for string types in MySQL can be represented in the following table with _L_ representing the actual length in bytes of a given string value. -| Data Type | Storage Required | -| --------- | ---------------- | -| TINYTEXT | *L* + 1 bytes, where *L* < 2^8 | -| TEXT | *L* + 2 bytes, where *L* < 2^16 | -| MEDIUMTEXT | *L* + 3 bytes, where *L* < 2^24| -| LONGTEXT | *L* + 4 bytes, where *L* < 2^32 | +| Data Type | Storage Required | +| ---------- | ------------------------------- | +| TINYTEXT | _L_ + 1 bytes, where _L_ < 2^8 | +| TEXT | _L_ + 2 bytes, where _L_ < 2^16 | +| MEDIUMTEXT | _L_ + 3 bytes, where _L_ < 2^24 | +| LONGTEXT | _L_ + 4 bytes, where _L_ < 2^32 |
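In practice, the length prefix mostly matters when picking which variant fits your expected payload. A minimal sketch of how the variants might be assigned in a table (the table and column names are invented):

```sql
-- Rough capacity guide: TINYTEXT ~255 bytes, TEXT ~64 KB,
-- MEDIUMTEXT ~16 MB, LONGTEXT ~4 GB
CREATE TABLE articles_demo (
    summary TINYTEXT,
    body    TEXT,
    archive LONGTEXT
);
```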
@@ -339,7 +338,7 @@ The example definition syntax of a `VARCHAR` would look like this: ``` VARCHAR(10) -``` +```
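Assuming a column declared as `VARCHAR(10)` like the one above (the table name here is invented), input that exceeds the declared length is handled according to the SQL mode:

```sql
CREATE TABLE varchar_demo (
    label VARCHAR(10)
);

INSERT INTO varchar_demo VALUES ('short');  -- fits within 10 characters
INSERT INTO varchar_demo VALUES ('definitely too long');
-- With strict SQL mode enabled, this fails with a "Data too long for column" error;
-- in non-strict mode the value is truncated to 10 characters with a warning.
```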
@@ -357,8 +356,8 @@ The effective maximum length of a `VARCHAR` is subject to the maximum row size a It has the same storage requirements as all columns: -| Data Type | Storage Required | -| --------- | ---------------- | -| ENUM | 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum) | +| Data Type | Storage Required | +| --------- | ----------------------------------------------------------------------------------- | +| ENUM | 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum) |
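Because `enum` values are stored as integer indexes into the definition list, you can surface the stored index by evaluating the column in a numeric context. A small sketch (the table name is invented):

```sql
-- 'winter' = 1, 'spring' = 2, 'summer' = 3, 'autumn' = 4
CREATE TABLE season_demo (
    season ENUM('winter', 'spring', 'summer', 'autumn')
);

INSERT INTO season_demo VALUES ('summer');

SELECT season, season + 0 AS stored_index FROM season_demo;
-- Returns 'summer' and 3
```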
diff --git a/content/05-mysql/08-column-and-table-constraints.mdx b/content/05-mysql/08-column-and-table-constraints.mdx index 89efcbfa..a1c80d87 100644 --- a/content/05-mysql/08-column-and-table-constraints.mdx +++ b/content/05-mysql/08-column-and-table-constraints.mdx @@ -8,9 +8,9 @@ authors: ['justinellingwood'] ## What are MySQL column and table constraints? -[Constraints](/intro/database-glossary#constraint) are user defined requirements that define what values are valid for a column or table. You can think of them as additional restrictions to narrow in on acceptable values more strictly than [data types](/mysql/introduction-to-data-types) allow. +[Constraints](/intro/database-glossary#constraint) are user defined requirements that define what values are valid for a column or table. You can think of them as additional restrictions to narrow in on acceptable values more strictly than [data types](/mysql/introduction-to-data-types) allow. -Constraints allow you to define qualities that all entries must have, with the server itself enforcing the restrictions upon data entry or update. As an example, it might not make sense for a column representing boiling point of various substances to be lower than its freezing point. A constraint can enforce this type of requirement, even though types would not be able to. +Constraints allow you to define qualities that all entries must have, with the server itself enforcing the restrictions upon data entry or update. As an example, it might not make sense for a column representing boiling point of various substances to be lower than its freezing point. A constraint can enforce this type of requirement, even though types would not be able to. ## Where constraints are defined: column vs table constraints @@ -18,23 +18,23 @@ MySQL allows you to create constraints associated with a specific column or with Almost all constraints can be used in both forms without modification: -Constraint | Column | Table ----------- | ------ | ----- -CHECK | Yes | Yes -NOT NULL | Yes | No* -UNIQUE | Yes | Yes -PRIMARY KEY | Yes | Yes -FOREIGN KEY | No | Yes +| Constraint | Column | Table | +| ----------- | ------ | ----- | +| CHECK | Yes | Yes | +| NOT NULL | Yes | No\* | +| UNIQUE | Yes | Yes | +| PRIMARY KEY | Yes | Yes | +| FOREIGN KEY | No | Yes | -*: `NOT NULL` cannot be used as a table constraint. However, you can approximate the results by using `IS NOT NULL` as the statement within a `CHECK` table constraint. +\*: `NOT NULL` cannot be used as a table constraint. However, you can approximate the results by using `IS NOT NULL` as the statement within a `CHECK` table constraint. Let's look at how column and table constraints differ. ### Column constraints -**Column constraints** are constraints attached to a single [column](/intro/database-glossary#column). They are used to determine whether a proposed value for a column is valid or not. Column constraints are evaluated after the input is validated against basic type requirements (like making sure a value is a whole number for `int` columns). +**Column constraints** are constraints attached to a single [column](/intro/database-glossary#column). They are used to determine whether a proposed value for a column is valid or not. Column constraints are evaluated after the input is validated against basic type requirements (like making sure a value is a whole number for `int` columns). -Column constraints are great for expressing requirements that are limited to a single field. 
They attach the constraint condition directly to the column involved. For instance, we could model the `age` restriction in a `person` table by adding a constraint after the column name and data type: +Column constraints are great for expressing requirements that are limited to a single field. They attach the constraint condition directly to the column involved. For instance, we could model the `age` restriction in a `person` table by adding a constraint after the column name and data type: ```sql CREATE TABLE person ( @@ -44,11 +44,11 @@ CREATE TABLE person ( ); ``` -This snippet defines a `person` table with one of the columns being an `int` called `age`. The `age` must be greater than or equal to zero. Column constraints are easy to understand because they are added as additional requirements onto the column they affect. +This snippet defines a `person` table with one of the columns being an `int` called `age`. The `age` must be greater than or equal to zero. Column constraints are easy to understand because they are added as additional requirements onto the column they affect. ### Table constraints -The other type of constraint is called a **table constraint**. [Table](/intro/database-glossary#table) constraints can express almost any restrictions that a column constraint can, but can additionally express restrictions that involve more than one column. Instead of being attached to a specific column, table constraints are defined as a separate component of the table and can reference any of the table's columns. +The other type of constraint is called a **table constraint**. [Table](/intro/database-glossary#table) constraints can express almost any restrictions that a column constraint can, but can additionally express restrictions that involve more than one column. Instead of being attached to a specific column, table constraints are defined as a separate component of the table and can reference any of the table's columns. The column constraint we saw earlier could be expressed as a table constraint like this: @@ -61,9 +61,9 @@ CREATE TABLE person ( ); ``` -The same basic syntax is used, but the constraint is listed separately. To take advantage of the ability for table constraints to introduce compound restrictions, we can use the logical `AND` operator to join multiple conditions from different columns. +The same basic syntax is used, but the constraint is listed separately. To take advantage of the ability for table constraints to introduce compound restrictions, we can use the logical `AND` operator to join multiple conditions from different columns. -For example, in a banking database, a table called `qualified_borrowers` might need to check whether individuals have an existing account and the ability to offer collateral in order to qualify for a loan. It might make sense to include both of these in the same check: +For example, in a banking database, a table called `qualified_borrowers` might need to check whether individuals have an existing account and the ability to offer collateral in order to qualify for a loan. It might make sense to include both of these in the same check: ```sql CREATE TABLE qualified_borrowers ( @@ -75,24 +75,25 @@ CREATE TABLE qualified_borrowers ( ); ``` -Here, we use the `CHECK` constraint again to check that the `account_number` is not null and that the loan officer has marked the client as having acceptable collateral by checking the `acceptable_collateral` column. A table constraint is necessary since multiple columns are being checked. 
+Here, we use the `CHECK` constraint again to check that the `account_number` is not null and that the loan officer has marked the client as having acceptable collateral by checking the `acceptable_collateral` column. A table constraint is necessary since multiple columns are being checked. -Now is a good time to mention that although we'll mainly be using the `CREATE TABLE` SQL command in these examples to create a new table, you can also add constraints to an existing table with [`ALTER TABLE`](https://dev.mysql.com/doc/refman/8.0/en/alter-table.html). When using `ALTER TABLE`, new constraints cause the values currently in the table to be checked against the new constraint. If the values violate the constraint, the constraint cannot be added. +Now is a good time to mention that although we'll mainly be using the `CREATE TABLE` SQL command in these examples to create a new table, you can also add constraints to an existing table with [`ALTER TABLE`](https://dev.mysql.com/doc/refman/8.0/en/alter-table.html). When using `ALTER TABLE`, new constraints cause the values currently in the table to be checked against the new constraint. If the values violate the constraint, the constraint cannot be added. ## Creating names for constraints ### Default constraint names -When you create constraints using the syntax above, MySQL automatically chooses a reasonable, but vague, name. In the case of the `qualified_borrowers` table above, MySQL would name the constraint `qualified_borrowers_chk_1`: +When you create constraints using the syntax above, MySQL automatically chooses a reasonable, but vague, name. In the case of the `qualified_borrowers` table above, MySQL would name the constraint `qualified_borrowers_chk_1`: ```sql INSERT INTO qualified_borrowers VALUES (123, false); ``` + ``` ERROR 3819 (HY000): Check constraint 'qualified_borrowers_chk_1' is violated. ``` -This name gives you information about the table and type of constraint when a constraint is violated. In cases where multiple constraints are present on a table, however, more descriptive names are helpful to help troubleshooting. +This name gives you information about the table and type of constraint when a constraint is violated. In cases where multiple constraints are present on a table, however, more descriptive names are helpful to help troubleshooting. ### Custom constraint names @@ -104,7 +105,7 @@ The basic syntax for adding a custom name is this: CONSTRAINT ``` -For example, if you wanted to name the constraint in the `qualified_borrowers` table `loan_worthiness`, you could instead define the table like this: +For example, if you wanted to name the constraint in the `qualified_borrowers` table `loan_worthiness`, you could instead define the table like this: ```sql CREATE TABLE qualified_borrowers ( @@ -121,6 +122,7 @@ Now, when we violate a constraint, we get our more descriptive label: ```sql INSERT INTO qualified_borrowers VALUES (123, false); ``` + ``` ERROR 3819 (HY000): Check constraint 'loan_worthiness' is violated. ``` @@ -143,7 +145,7 @@ Now that we've covered some of the basics of how constraints work, we can take a [**Check** constraints](/intro/database-glossary#check-constraint) are a general purpose constraint that allows you to specify an expression involving column or table values that evaluates to a boolean. -You've already seen a few examples of check constraints earlier. Check constraints begin with the keyword `CHECK` and then provide an expression enclosed in parentheses. 
For column constraints, this is placed after the data type declaration. For table constraints, these can be placed anywhere after the columns that they interact with are defined. +You've already seen a few examples of check constraints earlier. Check constraints begin with the keyword `CHECK` and then provide an expression enclosed in parentheses. For column constraints, this is placed after the data type declaration. For table constraints, these can be placed anywhere after the columns that they interact with are defined. For example, we can create a `film_nominations` table that contains films that have been nominated and are eligible for a feature length award for 2019: @@ -158,9 +160,9 @@ CREATE TABLE film_nominations ( ); ``` -We have one column check restraint that checks that the `release_date` is within 2019. Afterwards, we have a table check constraint ensuring that the film has received enough votes to be nominated and that the length qualifies it for the "feature length" category. +We have one column check constraint that checks that the `release_date` is within 2019. Afterwards, we have a table check constraint ensuring that the film has received enough votes to be nominated and that the length qualifies it for the "feature length" category. -When evaluating check constraints, acceptable values evaluate as being true. If the new record's values satisfy all type requirements and constraints, the record will be added to the table: +When evaluating check constraints, acceptable values evaluate as being true. If the new record's values satisfy all type requirements and constraints, the record will be added to the table: ```sql INSERT INTO film_nominations VALUES ( @@ -171,6 +173,7 @@ INSERT INTO film_nominations VALUES ( 45 ); ``` + ``` Query OK, 1 row affected (0.01 sec) ``` @@ -186,15 +189,16 @@ INSERT INTO film_nominations VALUES ( 1 ); ``` + ``` ERROR 3819 (HY000): Check constraint 'film_nominations_chk_2' is violated. ``` -In this case, the film has satisfied every condition except for the number of votes required. MySQL rejects the submission since it does not pass the final table check constraint. +In this case, the film has satisfied every condition except for the number of votes required. MySQL rejects the submission since it does not pass the final table check constraint. ### Not null constraints -The `NOT NULL` constraint is much more focused. It guarantees that values within a column are not null. While this is a simple constraint, it is used very frequently. +The `NOT NULL` constraint is much more focused. It guarantees that values within a column are not null. While this is a simple constraint, it is used very frequently. #### How to add not null constraints in MySQL @@ -207,7 +211,7 @@ CREATE TABLE national_capitals ( ); ``` -In the above example, we have a simple two column table mapping countries to their national capitals. Since both of these are required fields that would not make sense to leave blank, we add the `NOT NULL` constraint. +In the above example, we have a simple two-column table mapping countries to their national capitals. Since both of these are required fields that would not make sense to leave blank, we add the `NOT NULL` constraint. Inserting a null value now results in an error: @@ -217,11 +221,12 @@ INSERT INTO national_capitals VALUES ( 'London', ); ``` + ``` ERROR 1048 (23000): Column 'country' cannot be null ``` -The `NOT NULL` constraint functions only as a column constraint (it cannot be used as a table constraint).
However, you can easily work around this by using `IS NOT NULL` within a table `CHECK` constraint. +The `NOT NULL` constraint functions only as a column constraint (it cannot be used as a table constraint). However, you can easily work around this by using `IS NOT NULL` within a table `CHECK` constraint. For example, this offers equivalent guarantees using a table constraint: @@ -235,15 +240,15 @@ CREATE TABLE national_capitals ( -When working with Prisma Client, you can control whether each field is [optional or mandatory](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model#optional-and-mandatory-fields) to get equivalent functionality to the `NOT NULL` constraint in PostgreSQL. +When working with Prisma Client, you can control whether each field is [optional or mandatory](https://www.prisma.io/docs/orm/prisma-schema/data-model/models#optional-and-mandatory-fields) to get equivalent functionality to the `NOT NULL` constraint in MySQL. ### Unique constraints -The `UNIQUE` constraint tells MySQL that each value within a column must not be repeated. This is useful in many different scenarios where having the same value in multiple records should be impossible. +The `UNIQUE` constraint tells MySQL that each value within a column must not be repeated. This is useful in many different scenarios where having the same value in multiple records should be impossible. -For example, columns that deals with IDs of any kind should, by definition, have unique values. A social security number, a student or customer ID, or a product UPC (barcode number) would be useless if they were not able to differentiate between specific people or items. +For example, columns that deal with IDs of any kind should, by definition, have unique values. A social security number, a student or customer ID, or a product UPC (barcode number) would be useless if they were not able to differentiate between specific people or items. A `UNIQUE` constraint can be specified at the column level: @@ -266,7 +271,7 @@ CREATE TABLE supplies ( ); ``` -One of the advantages of using `UNIQUE` table constraints is that it allows you to perform uniqueness checks on a combination of columns. This works by specifying two or more columns that MySQL should evaluate together. The values in individual columns may repeat but the combination of values specified must be unique. +One of the advantages of using `UNIQUE` table constraints is that it allows you to perform uniqueness checks on a combination of columns. This works by specifying two or more columns that MySQL should evaluate together. The values in individual columns may repeat but the combination of values specified must be unique. As an example, let's look back at the `national_capitals` table we used before: @@ -286,7 +291,7 @@ CREATE TABLE national_capitals ( ); ``` -This would ensure that both the countries and capitals are only present once in each table. However, [some countries have multiple capitals](https://en.wikipedia.org/wiki/List_of_countries_with_multiple_capitals). This would mean we may have multiple entries with the same `country` value. These wouldn't work with the current design: +This would ensure that both the countries and capitals are only present once in each table. However, [some countries have multiple capitals](https://en.wikipedia.org/wiki/List_of_countries_with_multiple_capitals). This would mean we may have multiple entries with the same `country` value.
These wouldn't work with the current design: ```sql INSERT INTO national_capitals VALUES ( @@ -298,6 +303,7 @@ INSERT INTO national_capitals VALUES ( 'La Paz' ); ``` + ``` ERROR 1062 (23000): Duplicate entry 'Bolivia' for key 'national_capitals.country' ``` @@ -324,6 +330,7 @@ INSERT INTO national_capitals VALUES ( 'La Paz' ); ``` + ``` Query OK, 1 row affected (0.00 sec) Query OK, 1 row affected (0.00 sec) @@ -341,6 +348,7 @@ INSERT INTO national_capitals VALUES ( 'Sucre' ); ``` + ``` Query OK, 1 row affected (0.00 sec) ERROR 1062 (23000): Duplicate entry 'Bolivia-Sucre' for key 'national_capitals.country' @@ -348,9 +356,9 @@ ERROR 1062 (23000): Duplicate entry 'Bolivia-Sucre' for key 'national_capitals.c ### Primary key constraints -The [`PRIMARY KEY`](/intro/database-glossary#primary-key) constraint serves a special purpose. It indicates that the column can be used to uniquely identify a record within the table. This means that it must be reliably unique and that every record must have a value in that column. +The [`PRIMARY KEY`](/intro/database-glossary#primary-key) constraint serves a special purpose. It indicates that the column can be used to uniquely identify a record within the table. This means that it must be reliably unique and that every record must have a value in that column. -Primary keys are recommended for every table but not required, and every table may only have one primary key. Primary keys are mainly used to identify, retrieve, modify, or delete individual records within a table. They allow users and administrators to target the operation using an identifier that is guaranteed by MySQL to match exactly one record. +Primary keys are recommended for every table but not required, and every table may only have one primary key. Primary keys are mainly used to identify, retrieve, modify, or delete individual records within a table. They allow users and administrators to target the operation using an identifier that is guaranteed by MySQL to match exactly one record. Let's use the `supplies` table we saw before as an example: @@ -362,7 +370,7 @@ CREATE TABLE supplies ( ); ``` -Here we've identified that the `supply_id` should be unique. If we wanted to use this column as our primary key (guaranteeing uniqueness and a non-null value), we could simply change the `UNIQUE` constraint to `PRIMARY KEY`: +Here we've identified that the `supply_id` should be unique. If we wanted to use this column as our primary key (guaranteeing uniqueness and a non-null value), we could simply change the `UNIQUE` constraint to `PRIMARY KEY`: ```sql CREATE TABLE supplies ( @@ -382,6 +390,7 @@ INSERT INTO supplies VALUES ( ); UPDATE supplies set inventory = 10 WHERE supply_id = 38; ``` + ``` Query OK, 1 row affected (0.00 sec) Query OK, 1 row affected (0.00 sec) @@ -390,7 +399,7 @@ Rows matched: 1 Changed: 1 Warnings: 0 While many tables use a single column as the primary key, it is also possible to create a primary key using a set of columns, as a table constraint. -The `national_capitals` table is a good candidate to demonstrate this. If we wanted to create a primary key using the existing columns, we could replace the `UNIQUE` table constraint with `PRIMARY KEY`: +The `national_capitals` table is a good candidate to demonstrate this. 
If we wanted to create a primary key using the existing columns, we could replace the `UNIQUE` table constraint with `PRIMARY KEY`: ```sql CREATE TABLE national_capitals ( @@ -402,9 +411,9 @@ CREATE TABLE national_capitals ( ### Foreign key constraints -[**Foreign keys**](/intro/database-glossary#foreign-key) are columns within one table that reference column values within another table. This is desirable and often necessary in a variety of scenarios where tables contain related data. This ability for the database to easily connect and reference data stored in separate tables is one of the primary features of relational databases. +[**Foreign keys**](/intro/database-glossary#foreign-key) are columns within one table that reference column values within another table. This is desirable and often necessary in a variety of scenarios where tables contain related data. This ability for the database to easily connect and reference data stored in separate tables is one of the primary features of relational databases. -For example, you may have a `orders` table to track individual orders and a `customers` table to track contact info and information about your customers. It makes sense to put this information separately since customers may have many orders. However, it also makes sense to be able to easily link the records in these two tables to allow more complex operations. +For example, you may have a `orders` table to track individual orders and a `customers` table to track contact info and information about your customers. It makes sense to put this information separately since customers may have many orders. However, it also makes sense to be able to easily link the records in these two tables to allow more complex operations. #### How to create foreign key constraints in MySQL @@ -419,9 +428,9 @@ CREATE TABLE customers ( ); ``` -This table is pretty simple. It includes columns to store the customer's first name, last name, and phone number. It also specifies an ID column that uses the `PRIMARY KEY` constraint. The `serial` alias is used to automatically generate the next ID in the sequence if an ID is not specified. +This table is pretty simple. It includes columns to store the customer's first name, last name, and phone number. It also specifies an ID column that uses the `PRIMARY KEY` constraint. The `serial` alias is used to automatically generate the next ID in the sequence if an ID is not specified. -For the `orders` table, we want to be able to specify information about individual orders. One essential piece of data is what customer placed the order. We can use a foreign key to link the order to the customer without duplicating information. We do this with the `FOREIGN KEY` constraint, which defines a foreign key relationship to a column in another table: +For the `orders` table, we want to be able to specify information about individual orders. One essential piece of data is what customer placed the order. We can use a foreign key to link the order to the customer without duplicating information. We do this with the `FOREIGN KEY` constraint, which defines a foreign key relationship to a column in another table: ```sql CREATE TABLE orders ( @@ -434,7 +443,7 @@ CREATE TABLE orders ( Here, we are indicating that the `customer` column in the `orders` table has a foreign key relationship with the `customer_id` column in the `customers` table. -We have to ensure that the type of the foreign key column is compatible with the type used in the foreign table. 
The `customer_id` column in the `customers` table uses the `SERIAL` alias, which stands for `BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE`, so we can use `BIGINT UNSIGNED` as our data type for the `customer` column in the `orders` table to match. +We have to ensure that the type of the foreign key column is compatible with the type used in the foreign table. The `customer_id` column in the `customers` table uses the `SERIAL` alias, which stands for `BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE`, so we can use `BIGINT UNSIGNED` as our data type for the `customer` column in the `orders` table to match. If we try to insert a value into the `orders` table that doesn't reference a valid customer, MySQL will reject it: @@ -445,6 +454,7 @@ INSERT INTO orders VALUES ( 300 ); ``` + ``` ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`prisma`.`orders`, CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`customer`) REFERENCES `customers` (`customer_id`)) ``` @@ -464,6 +474,7 @@ INSERT INTO orders VALUES ( 300 ); ``` + ``` Query OK, 1 row affected (0.00 sec) Query OK, 1 row affected (0.00 sec) @@ -482,7 +493,7 @@ CREATE TABLE example ( -We cover how to define [relations](https://www.prisma.io/docs/concepts/components/prisma-schema/relations) in the Prisma schema in our documentation. +We cover how to define [relations](https://www.prisma.io/docs/orm/prisma-schema/data-model/relations) in the Prisma schema in our documentation. @@ -490,17 +501,17 @@ We cover how to define [relations](https://www.prisma.io/docs/concepts/component One consideration you'll need to think about when defining foreign key constraints is what to do when a referenced table value is deleted or updated. -As an example, let's look at the `customers` and `orders` tables again. We need to specify how we want the system to respond when we delete a customer from the `customers` table when the customer has an associated order in the `orders` table. +As an example, let's look at the `customers` and `orders` tables again. We need to specify how we want the system to respond when we delete a customer from the `customers` table when the customer has an associated order in the `orders` table. We can choose between the following options: -* **RESTRICT**: Choosing to restrict deletions means that MySQL will refuse to delete the `customer` record if it's referenced by a record in the `orders` table. To delete a customer, you will first have to remove any associated records from the `orders` table. Only then will you be able to remove the value from the customer table. This is the default action. -* **CASCADE**: Selecting the cascade option means that when we delete the `customer` record, the records that reference it in the `orders` table are *also* deleted. This is useful in many cases but must be used with care to avoid deleting data by mistake. -* **NO ACTION**: Although some other database systems allow you to defer checks with the `NO ACTION` option, in MySQL, this is equivalent to `RESTRICT`. The system will reject the update or deletion request. -* **SET NULL**: This option tells MySQL to set the referencing columns to `NULL` when the referenced records are removed. So if we delete a customer from the `customers` table, the `customer` column in the `orders` table will be set to `NULL`. 
-* **SET DEFAULT**: Although some other database systems allow you to set a column to a default value in case of a reference deletion or update, MySQL does not actually allow this action and will not let you define tables using this option. +- **RESTRICT**: Choosing to restrict deletions means that MySQL will refuse to delete the `customer` record if it's referenced by a record in the `orders` table. To delete a customer, you will first have to remove any associated records from the `orders` table. Only then will you be able to remove the value from the customer table. This is the default action. +- **CASCADE**: Selecting the cascade option means that when we delete the `customer` record, the records that reference it in the `orders` table are _also_ deleted. This is useful in many cases but must be used with care to avoid deleting data by mistake. +- **NO ACTION**: Although some other database systems allow you to defer checks with the `NO ACTION` option, in MySQL, this is equivalent to `RESTRICT`. The system will reject the update or deletion request. +- **SET NULL**: This option tells MySQL to set the referencing columns to `NULL` when the referenced records are removed. So if we delete a customer from the `customers` table, the `customer` column in the `orders` table will be set to `NULL`. +- **SET DEFAULT**: Although some other database systems allow you to set a column to a default value in case of a reference deletion or update, MySQL does not actually allow this action and will not let you define tables using this option. -These actions can be specified when defining a foreign key constraint by adding `ON DELETE` followed by the action. So if we want to remove associated orders from our system when a customer is deleted, we could specify that like this: +These actions can be specified when defining a foreign key constraint by adding `ON DELETE` followed by the action. So if we want to remove associated orders from our system when a customer is deleted, we could specify that like this: ```sql CREATE TABLE orders ( @@ -511,10 +522,10 @@ CREATE TABLE orders ( ); ``` -These type of actions can also be applied when *updating* a referenced column instead of deleting one by using `ON UPDATE` instead of `ON DELETE`. +These type of actions can also be applied when _updating_ a referenced column instead of deleting one by using `ON UPDATE` instead of `ON DELETE`. ## Conclusion -In this guide, we covered what constraints are and how they can help you control the data that is entered into your MySQL tables. We discussed the difference between column and table constraints and the increased flexibility offered by using the table format. We then went over what constraints MySQL supports and how to use them in your tables. +In this guide, we covered what constraints are and how they can help you control the data that is entered into your MySQL tables. We discussed the difference between column and table constraints and the increased flexibility offered by using the table format. We then went over what constraints MySQL supports and how to use them in your tables. -Constraints help you define the exact requirements of your table columns and as such, they are indispensable in many scenarios. Understanding the way various constraints work and what scenarios they help you prevent will go a long way into ensuring that you data conforms to the standards you require. Once defined, MySQL can help you enforce constraints automatically to prevent problems before they occur. 
+Constraints help you define the exact requirements of your table columns and as such, they are indispensable in many scenarios. Understanding the way various constraints work and what scenarios they help you prevent will go a long way into ensuring that you data conforms to the standards you require. Once defined, MySQL can help you enforce constraints automatically to prevent problems before they occur. diff --git a/content/05-mysql/09-inserting-and-modifying-data/01-inserting-and-deleting-data.mdx b/content/05-mysql/09-inserting-and-modifying-data/01-inserting-and-deleting-data.mdx index 50866c5b..4d19985c 100644 --- a/content/05-mysql/09-inserting-and-modifying-data/01-inserting-and-deleting-data.mdx +++ b/content/05-mysql/09-inserting-and-modifying-data/01-inserting-and-deleting-data.mdx @@ -1,16 +1,16 @@ --- title: 'How to insert and delete data in MySQL' metaTitle: 'How to insert, update, and delete data in MySQL tables' -metaDescription: "Learn how to use the `INSERT` and `DELETE` queries to add and remove data from MySQL tables." +metaDescription: 'Learn how to use the `INSERT` and `DELETE` queries to add and remove data from MySQL tables.' metaImage: '/social/generic-mysql.png' authors: ['justinellingwood'] --- ## Introduction -Adding and deleting data are fairly foundational operations that allow you to control what data is maintained by the database. To insert, you specify items that fulfill each of the [column](/intro/database-glossary#column) requirements of the table for each new [row](/intro/database-glossary#row). To remove, you provide the match criteria for rows in the table you wish to delete. +Adding and deleting data are fairly foundational operations that allow you to control what data is maintained by the database. To insert, you specify items that fulfill each of the [column](/intro/database-glossary#column) requirements of the table for each new [row](/intro/database-glossary#row). To remove, you provide the match criteria for rows in the table you wish to delete. -In this article, we'll take a look at how to use the `INSERT` and `DELETE` commands to add or remove data from MySQL [tables](/intro/database-glossary#table). We will cover the syntax as well as slightly more advanced variations like operating on multiple rows in a single statement. +In this article, we'll take a look at how to use the `INSERT` and `DELETE` commands to add or remove data from MySQL [tables](/intro/database-glossary#table). We will cover the syntax as well as slightly more advanced variations like operating on multiple rows in a single statement. ## Reviewing the table's structure @@ -21,6 +21,7 @@ To find the structure of a table called `employee`, you can use the MySQL `DESCR ```sql no-lines DESCRIBE employee; ``` + ``` +-------------+-----------------+------+-----+-------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | @@ -35,11 +36,12 @@ DESCRIBE employee; The output displays the table's column names, data types, and default values, among other information. -An alternative is to show the information that could be used to recreate the table. You can find this information with the `SHOW CREATE TABLE` command: +An alternative is to show the information that could be used to recreate the table. You can find this information with the `SHOW CREATE TABLE` command: ```sql no-lines SHOW CREATE TABLE employee\G ``` + ``` *************************** 1. 
row *************************** Table: employee @@ -53,13 +55,13 @@ Create Table: CREATE TABLE `employee` ( 1 row in set (0.00 sec) ``` -Here, we use the `\G` terminator to display the output vertically for better readability. Along with the properties manually set during table creation, the output shows any of the values that were set due to MySQL defaults. +Here, we use the `\G` terminator to display the output vertically for better readability. Along with the properties manually set during table creation, the output shows any of the values that were set due to MySQL defaults. These should give you a good idea of the table's structure so that you can insert values correctly. ## Using `INSERT` to add new records to tables -The SQL `INSERT` command is used to add rows of data to an existing table. Once you know the table's structure, you can construct a command that matches the table's columns with the corresponding values you wish to insert for the new record. +The SQL `INSERT` command is used to add rows of data to an existing table. Once you know the table's structure, you can construct a command that matches the table's columns with the corresponding values you wish to insert for the new record. The basic syntax of the command looks like this: @@ -77,11 +79,12 @@ INSERT INTO employee(first_name, last_name) VALUES ('Bob', 'Smith'); ``` -Here, we provide values for the `first_name` and `last_name` columns while leaving the other columns to be populated by their default values. If you query the table, you can see that the new record has been added: +Here, we provide values for the `first_name` and `last_name` columns while leaving the other columns to be populated by their default values. If you query the table, you can see that the new record has been added: ```sql SELECT * FROM employee; ``` + ``` +-------------+------------+-----------+---------------------+ | employee_id | first_name | last_name | last_update | @@ -93,13 +96,13 @@ SELECT * FROM employee; -You can also use the Prisma Client to add data to your tables by issuing a [create query](https://www.prisma.io/docs/concepts/components/prisma-client/crud#create). +You can also use the Prisma Client to add data to your tables by issuing a [create query](https://www.prisma.io/docs/orm/prisma-client/queries/crud#create). ## Using `INSERT` to add multiple rows at once -Inserting records one statement at a time is more time consuming and less efficient than inserting multiple rows at once. MySQL allows you to specify multiple rows to add to the same table. Each new row is encapsulated in parentheses, with each set of parentheses separated by commas. +Inserting records one statement at a time is more time consuming and less efficient than inserting multiple rows at once. MySQL allows you to specify multiple rows to add to the same table. Each new row is encapsulated in parentheses, with each set of parentheses separated by commas. The basic syntax for multi-record insertion looks like this: @@ -121,6 +124,7 @@ VALUES ('Katie', 'Singh'), ('Felipe', 'Espinosa'); ``` + ``` Query OK, 4 rows affected (0.01 sec) Records: 4 Duplicates: 0 Warnings: 0 @@ -128,7 +132,7 @@ Records: 4 Duplicates: 0 Warnings: 0 ## Using `DELETE` to remove rows from tables -The SQL `DELETE` command is used to remove rows from tables, functioning as the complementary action to `INSERT`. In order to remove rows from a table, you must identify the rows you wish to target by providing match criteria within a `WHERE` clause. 
+The SQL `DELETE` command is used to remove rows from tables, functioning as the complementary action to `INSERT`. In order to remove rows from a table, you must identify the rows you wish to target by providing match criteria within a `WHERE` clause. The basic syntax looks like this: @@ -143,6 +147,7 @@ For instance, to remove every row in our `employee` table that has its `first_na DELETE FROM employee WHERE first_name = 'Abigail'; ``` + ``` Query OK, 1 row affected (0.01 sec) ``` @@ -151,7 +156,7 @@ The return value here indicates that the `DELETE` command was processed with a s -To remove data from your tables using Prisma Client, use a [delete query](https://www.prisma.io/docs/concepts/components/prisma-client/crud#delete). +To remove data from your tables using Prisma Client, use a [delete query](https://www.prisma.io/docs/orm/prisma-client/queries/crud#delete). @@ -165,6 +170,7 @@ For instance, to remove multiple rows by ID, you could type something like this: DELETE FROM employee WHERE employee_id in (3,4); ``` + ``` Query OK, 2 rows affected (0.00 sec) ``` @@ -174,6 +180,7 @@ You can even leave out the `WHERE` clause to remove all of the rows from a given ```sql DELETE FROM employee; ``` + ``` Query OK, 2 rows affected (0.00 sec) ``` @@ -182,12 +189,12 @@ Be aware, however, that using `DELETE` to empty a table of data [may not be as e -Prisma Client uses a separate query called [deleteMany](https://www.prisma.io/docs/concepts/components/prisma-client/crud#deletemany) to delete multiple rows of data at one time. +Prisma Client uses a separate query called [deleteMany](https://www.prisma.io/docs/orm/prisma-client/queries/crud#deletemany) to delete multiple rows of data at one time. ## Conclusion -In this article, we discussed how to insert and remove data from MySQL tables. First, we covered how to find the table's structure to help construct valid data insertion queries. Then we inserted data one at a time and in batches using the `INSERT` command. Finally, we covered the `DELETE` command to remove records from the table according to query conditions. +In this article, we discussed how to insert and remove data from MySQL tables. First, we covered how to find the table's structure to help construct valid data insertion queries. Then we inserted data one at a time and in batches using the `INSERT` command. Finally, we covered the `DELETE` command to remove records from the table according to query conditions. -While fairly basic, the `INSERT` and `DELETE` commands are some of the most useful commands for managing what data your tables actually maintain. Understanding their basic syntax and operation will allow you to add or remove records from your database structures quickly and efficiently. +While fairly basic, the `INSERT` and `DELETE` commands are some of the most useful commands for managing what data your tables actually maintain. Understanding their basic syntax and operation will allow you to add or remove records from your database structures quickly and efficiently. 
diff --git a/content/05-mysql/09-inserting-and-modifying-data/02-updating-existing-data.mdx b/content/05-mysql/09-inserting-and-modifying-data/02-updating-existing-data.mdx index 203eb8ed..1cf0bf32 100644 --- a/content/05-mysql/09-inserting-and-modifying-data/02-updating-existing-data.mdx +++ b/content/05-mysql/09-inserting-and-modifying-data/02-updating-existing-data.mdx @@ -1,16 +1,16 @@ --- title: 'How to update existing data in MySQL' metaTitle: "How to update existing data in MySQL | Prisma's Data Guide" -metaDescription: "The `UPDATE` command allows you to change the data in existing MySQL records. This guide demonstrates how to use the `UPDATE` operation to modify the values within your tables." +metaDescription: 'The `UPDATE` command allows you to change the data in existing MySQL records. This guide demonstrates how to use the `UPDATE` operation to modify the values within your tables.' metaImage: '/social/generic-mysql.png' authors: ['justinellingwood'] --- ## Introduction -Many database tables manage data that will need to be changed or updated from time to time. The SQL `UPDATE` command can help in these situations by allowing you to change the values stored in [records](/intro/database-glossary#record) within a table. +Many database tables manage data that will need to be changed or updated from time to time. The SQL `UPDATE` command can help in these situations by allowing you to change the values stored in [records](/intro/database-glossary#record) within a table. -To update records, you must provide the columns where changes will occur and their new values. To tell MySQL which records to target, you need to also give match criteria so it can determine which row or rows to change. In this article, we'll discuss how to use `UPDATE` to change the values of your table data one at a time or in bulk. +To update records, you must provide the columns where changes will occur and their new values. To tell MySQL which records to target, you need to also give match criteria so it can determine which row or rows to change. In this article, we'll discuss how to use `UPDATE` to change the values of your table data one at a time or in bulk. ## Using `UPDATE` to modify data @@ -27,9 +27,9 @@ WHERE As shown above, the basic structure involves three separate clauses: -* specifying a table to act on, -* providing the columns you wish to update as well as their new values, and -* defining criteria to determine which records to match +- specifying a table to act on, +- providing the columns you wish to update as well as their new values, and +- defining criteria to determine which records to match When successfully committed, MySQL confirms the action by outputting the number of rows matched and altered: @@ -40,15 +40,15 @@ Rows matched: 1 Changed: 1 Warnings: 0 -To update data with Prisma Client, issue an [update query](https://www.prisma.io/docs/concepts/components/prisma-client/crud#update). +To update data with Prisma Client, issue an [update query](https://www.prisma.io/docs/orm/prisma-client/queries/crud#update). ## Updating records based on values in another table -Updates based on providing new external data are relatively straightforward. You just need to provide the table, the columns, the new values, and the targeting criteria. +Updates based on providing new external data are relatively straightforward. You just need to provide the table, the columns, the new values, and the targeting criteria. 
-However, you can also use `UPDATE` to conditionally update table values based on information stored in a [joined table](/intro/database-glossary#join). The basic syntax looks like this: +However, you can also use `UPDATE` to conditionally update table values based on information stored in a [joined table](/intro/database-glossary#join). The basic syntax looks like this: ```sql UPDATE <table1>, <table2> SET <table1>.<column1> = <table2>.<column1> WHERE <table1>.<column2> = <table2>.<column2>; ``` -Here, we are updating the value of `column1` in the `table1` table to the value stored in `column1` of `table2`, but only in rows where `column2` of `table1` match `column2` of `table2`. Even though the value is only changing in one table, we need to add both tables to the list of tables that `UPDATE` operates on. The `WHERE` construction specifies the join conditions to integrate the two tables. +Here, we are updating the value of `column1` in the `table1` table to the value stored in `column1` of `table2`, but only in rows where `column2` of `table1` matches `column2` of `table2`. Even though the value is only changing in one table, we need to add both tables to the list of tables that `UPDATE` operates on. The `WHERE` construction specifies the join conditions to integrate the two tables. As an example, suppose that we have two tables called `film` and `director`. @@ -92,7 +92,7 @@ VALUES
-These two tables have a relation with `film.director_id` referencing `director.id`. Currently, the `latest_film` for the `director` table is `NULL`. However, we can populate it by with the director's latest film title using the `WHERE` clause to bring to bring the two tables together. +These two tables have a relation with `film.director_id` referencing `director.id`. Currently, the `latest_film` column in the `director` table is `NULL`. However, we can populate it with the director's latest film title, using the `WHERE` clause to bring the two tables together. Here, we use a `WITH` clause to create a Common Table Expression (CTE) called `latest_films` that we can reference in our `UPDATE` statement: @@ -123,6 +123,7 @@ If you query the `director` table, it should show you each director's latest fil ```sql SELECT * FROM director; ``` + ``` +----+-------+--------------+ | id | name | latest_film | @@ -136,4 +137,4 @@ SELECT * FROM director; ## Conclusion -In this article, we've demonstrated how to use the `UPDATE` command to alter the values of existing MySQL records. The `UPDATE` command is very flexible when combined with other SQL constructs, allowing you to modify data in interesting ways according to conditions and values found throughout the database. As you get familiar with the operation, you will be able to find new ways of changing your data to match your requirements. +In this article, we've demonstrated how to use the `UPDATE` command to alter the values of existing MySQL records. The `UPDATE` command is very flexible when combined with other SQL constructs, allowing you to modify data in interesting ways according to conditions and values found throughout the database. As you get familiar with the operation, you will be able to find new ways of changing your data to match your requirements. diff --git a/content/05-mysql/09-inserting-and-modifying-data/04-importing-and-exporting-data-in-mysql.mdx b/content/05-mysql/09-inserting-and-modifying-data/04-importing-and-exporting-data-in-mysql.mdx index fe8a2ace..468c59e7 100644 --- a/content/05-mysql/09-inserting-and-modifying-data/04-importing-and-exporting-data-in-mysql.mdx +++ b/content/05-mysql/09-inserting-and-modifying-data/04-importing-and-exporting-data-in-mysql.mdx @@ -32,8 +32,8 @@ mysqldump DB_NAME > OUTPUT_FILE You need to replace the `DB_NAME` and `OUTPUT_FILE` placeholders with the respective values for: -* your **database name** -* the name of the desired **output file** (should end in `.sql` for best interoperability) +- your **database name** +- the name of the desired **output file** (should end in `.sql` for best interoperability) For example, to export data from a local MySQL server from a database called `mydb` into a file called `mydb.sql`, you can use the following command: @@ -57,7 +57,7 @@ To authenticate against the MySQL database server, you can use the following arg | `--user` (short: `-u`) | - | The name of the database user. | | `--password` (short: `-p`) | - | Trigger password prompt.
| -For example, if you want to export data from a MySQL database that has the following [connection string](https://www.prisma.io/docs/concepts/database-connectors/mysql): +For example, if you want to export data from a MySQL database that has the following [connection string](https://www.prisma.io/docs/orm/overview/databases/mysql): ``` mysql://opnmyfngbknppm:XXX@ec2-46-137-91-216.eu-west-1.compute.amazonaws.com:5432/d50rgmkqi2ipus @@ -75,18 +75,18 @@ Note that **this command will trigger a prompt where you need to specify the pas There might be cases where you don't want to dump the _entire_ database, for example you might want to: -* dump only the actual data but exclude the [DDL](/intro/database-glossary#data-definition-language) (i.e. the SQL statements that define your database schema like `CREATE TABLE`,...) -* dump only the DDL but exclude the actual data -* exclude specific tables +- dump only the actual data but exclude the [DDL](/intro/database-glossary#data-definition-language) (i.e. the SQL statements that define your database schema like `CREATE TABLE`,...) +- dump only the DDL but exclude the actual data +- exclude specific tables Here's an overview of a few command line options you can use in these scenarios: -| Argument | Default | Description | -| ------------------------------ | -------------------------------- | ------------------------------------------------------------------------------------------------ | +| Argument | Default | Description | +| ------------------------------ | -------------------------------- | ------------------------------------------------------------------------------------------------------------------ | | `--no-create-db` (short: `-n`) | `false` | Exclude any [DDL](https://dev.mysql.com/doc/refman/8.0/en/glossary.html#glos_ddl) statements and export only data. | | `--no-data` (short: `-d`) | `false` | Exclude data and export only [DDL](https://dev.mysql.com/doc/refman/8.0/en/glossary.html#glos_ddl) statements. | -| `--tables` | _includes all tables by default_ | Explicitly specify the names of the tables to be dumped. | -| `--ignore-table` | - | Exclude specific tables from the dump. | +| `--tables` | _includes all tables by default_ | Explicitly specify the names of the tables to be dumped. | +| `--ignore-table` | - | Exclude specific tables from the dump. | ## Importing data from SQL files @@ -100,8 +100,8 @@ Note that your [MySQL installation](https://dev.mysql.com/doc/refman/8.0/en/inst You need to replace the `DB_NAME` and `INPUT_FILE` placeholders with the respective values for: -* your **database name** (a database with that name must be created beforehand!) -* the name of the target **input file** (likely ends in `.sql`) +- your **database name** (a database with that name must be created beforehand!) +- the name of the target **input file** (likely ends in `.sql`) For example: @@ -123,4 +123,4 @@ CREATE DATABASE mydb; ## Conclusion -Exporting data from MySQL and ingesting it again to recreate your data structures and populate databases is a good way to migrate data, back up and recover, or prepare for replication. Understanding how the `mysqldump` and `mysql` tools work together to accomplish this task will help you transfer data across the boundaries of your databases. +Exporting data from MySQL and ingesting it again to recreate your data structures and populate databases is a good way to migrate data, back up and recover, or prepare for replication. 
Understanding how the `mysqldump` and `mysql` tools work together to accomplish this task will help you transfer data across the boundaries of your databases. diff --git a/content/05-mysql/10-reading-and-querying-data/01-basic-select.mdx b/content/05-mysql/10-reading-and-querying-data/01-basic-select.mdx index 3ad26974..2f4055ac 100644 --- a/content/05-mysql/10-reading-and-querying-data/01-basic-select.mdx +++ b/content/05-mysql/10-reading-and-querying-data/01-basic-select.mdx @@ -1,16 +1,16 @@ --- title: 'How to perform basic queries with `SELECT` in MySQL' metaTitle: "Basic Select | MySQL Basic Queries | Prisma's Data Guide" -metaDescription: "The `SELECT` command is the main way to query the data within tables and views in MySQL. This guide demonstrates the basic syntax and operation of this highly flexible command." +metaDescription: 'The `SELECT` command is the main way to query the data within tables and views in MySQL. This guide demonstrates the basic syntax and operation of this highly flexible command.' metaImage: '/social/generic-mysql.png' authors: ['justinellingwood'] --- ## Introduction -`SELECT` is the most suitable SQL command for when you are trying to query and return information inside of your MySQL tables. As its name implies, it is used to specify criteria used to select matching records from within the database. This is a broadly useful role that is suitable not only for [reading data](/intro/database-glossary#read-operation), but also for targeting updates and other actions. +`SELECT` is the most suitable SQL command for when you are trying to query and return information inside of your MySQL tables. As its name implies, it is used to specify criteria used to select matching records from within the database. This is a broadly useful role that is suitable not only for [reading data](/intro/database-glossary#read-operation), but also for targeting updates and other actions. -In this article, we'll introduce the basic form of the `SELECT` command and demonstrate how to use it to return data. While `SELECT` supports many advanced use cases, we'll stick to some of the simpler forms to demonstrate the basic command structure. +In this article, we'll introduce the basic form of the `SELECT` command and demonstrate how to use it to return data. While `SELECT` supports many advanced use cases, we'll stick to some of the simpler forms to demonstrate the basic command structure. ## The general syntax of the `SELECT` command @@ -22,10 +22,10 @@ SELECT FROM ; This statement is composed of a few different pieces: -* `SELECT`: The `SELECT` command itself. This SQL command indicates that we want to query [tables](/intro/database-glossary#table) or [views](/intro/database-glossary#view) for data they contain. The arguments and clauses surrounding it determine both the contents and the format of the output returned. -* ``: The `SELECT` statement can return entire rows (if specified with the `*` wildcard character) or a subset of the available columns. If you want to output only specific columns, provide the column names you'd like to display, separated by commas. -* `FROM `: The `FROM` keyword is used to indicate the table or view that should be queried. In most simple queries, this consists of a single table that contains the data you're interested in. -* ``: A large number of filters, output modifiers, and conditions can be specified as additions to the `SELECT` command. 
You can use these to help pinpoint data with specific properties, modify the output formatting, or further process the results.
+- `SELECT`: The `SELECT` command itself. This SQL command indicates that we want to query [tables](/intro/database-glossary#table) or [views](/intro/database-glossary#view) for data they contain. The arguments and clauses surrounding it determine both the contents and the format of the output returned.
+- ``: The `SELECT` statement can return entire rows (if specified with the `*` wildcard character) or a subset of the available columns. If you want to output only specific columns, provide the column names you'd like to display, separated by commas.
+- `FROM `: The `FROM` keyword is used to indicate the table or view that should be queried. In most simple queries, this consists of a single table that contains the data you're interested in.
+- ``: A large number of filters, output modifiers, and conditions can be specified as additions to the `SELECT` command. You can use these to help pinpoint data with specific properties, modify the output formatting, or further process the results.

## Specifying columns to display with `SELECT`

@@ -37,7 +37,7 @@ For ad hoc querying and during data exploration, one of the most helpful options

SELECT * FROM my_table;
```

-This will display all of the records from `my_table` since we do not provide any filtering to narrow the results. All of the columns for each record will be shown in the order that they are defined within the table.
+This will display all of the records from `my_table` since we do not provide any filtering to narrow the results. All of the columns for each record will be shown in the order that they are defined within the table.

One modification you may want to use if querying a table with many columns is to end your statement with `\G` instead of a semicolon `;`:

@@ -45,9 +45,9 @@ One modification you may want to use if querying a table with many columns is to

SELECT * FROM my_table\G
```

-The `\G` statement terminator tells MySQL to display the results vertically instead of horizontally, which can improve readability in tables with many columns or long values. You can use `\G` to terminate any statement, not just with `SELECT`.
+The `\G` statement terminator tells MySQL to display the results vertically instead of horizontally, which can improve readability in tables with many columns or long values. You can use `\G` to terminate any statement, not just with `SELECT`.

-You can also choose to view a subset of available column by specifying them by name. Column names are separated by commas and are displayed in the order in which they are given:
+You can also choose to view a subset of the available columns by specifying them by name. Column names are separated by commas and are displayed in the order in which they are given:

```sql
SELECT column2, column1 FROM my_table;

@@ -63,21 +63,21 @@ You can optionally set _column aliases_ to modify the name used for columns in t

SELECT column1 AS "first column" FROM my_table;
```

-This will show the each of the values for `column1` in `my_table`. However, the column in the output will be labeled as `first column` instead of `column1`.
+This will show each of the values for `column1` in `my_table`. However, the column in the output will be labeled as `first column` instead of `column1`.

This is especially useful if the output combines column names from multiple tables that might share names or if it includes computed columns that don't already have a name.
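As a small illustration of that last point, here is a hypothetical query against the `customer` table described in the next section (assumed to have `first_name` and `last_name` columns); without the alias, the computed column would be labeled with the raw expression:

```sql
-- Label the computed column with a readable name instead of the CONCAT(...) expression
SELECT CONCAT(first_name, ' ', last_name) AS "full name" FROM customer;
```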
## Defining sort order with `ORDER BY`

-The `ORDER BY` clause can be used to sort the resulting rows according to the criteria given. The general syntax looks like this:
+The `ORDER BY` clause can be used to sort the resulting rows according to the criteria given. The general syntax looks like this:

```sql
SELECT * FROM my_table ORDER BY ;
```

-This will display the values for all columns in all records within `my_table`. The results will be ordered according to the expression represented by the placeholder ``.
+This will display the values for all columns in all records within `my_table`. The results will be ordered according to the expression represented by the placeholder ``.

-For example, suppose we have a `customer` table that contains columns for `first_name`, `last_name`, `address`, and `phone_number`. If we want to display the results in alphabetical order by `last_name`, we could use the following command:
+For example, suppose we have a `customer` table that contains columns for `first_name`, `last_name`, `address`, and `phone_number`. If we want to display the results in alphabetical order by `last_name`, we could use the following command:

@@ -141,7 +141,7 @@ SELECT * FROM customer ORDER BY last_name DESC;

-You can also sort by multiple columns. Here, we sort first by `last_name`, and then by `first_name` for any columns with the same `last_name` value. Both sorts are in ascending order:
+You can also sort by multiple columns. Here, we sort first by `last_name`, and then by `first_name` for any rows with the same `last_name` value. Both sorts are in ascending order:

@@ -174,13 +174,13 @@ SELECT * FROM customer ORDER BY last_name, first_name;

-You can [sort your results](https://www.prisma.io/docs/concepts/components/prisma-client/filtering-and-sorting) with Prisma Client in much the same way as you would in an SQL query.
+You can [sort your results](https://www.prisma.io/docs/orm/prisma-client/queries/filtering-and-sorting) with Prisma Client in much the same way as you would in an SQL query.

## Getting distinct results

-If you want to find the range of values for a column in MySQL, you can use the `SELECT DISTINCT` variant. This will display a single row for each distinct value of a column.
+If you want to find the range of values for a column in MySQL, you can use the `SELECT DISTINCT` variant. This will display a single row for each distinct value of a column.

The basic syntax looks like this:

@@ -195,6 +195,7 @@ For example, to display all of the different values for `color` that your `shirt

```sql
SELECT DISTINCT color FROM shirt;
```
+
```
+--------+
| color |
@@ -215,6 +216,7 @@ For instance, this will display all of the different combinations of `color` and

```sql
SELECT DISTINCT color,shirt_size FROM shirt;
```
+
```
+--------+------------+
| color | shirt_size |
@@ -236,10 +238,10 @@ This displays every unique combination of `color` and `shirt_size` within the ta

-You can filter duplicate rows from your query with Prisma Client by using the [distinct](https://www.prisma.io/docs/concepts/components/prisma-client/aggregation-grouping-summarizing#select-distinct) functionality.
+You can filter duplicate rows from your query with Prisma Client by using the [distinct](https://www.prisma.io/docs/orm/prisma-client/queries/aggregation-grouping-summarizing#select-distinct) functionality.

## Conclusion

-In this article, we introduced some basic elements of the `SELECT` command to demonstrate how to return data from MySQL tables.
There are many more optional clauses that modify the behavior of the command, allowing you to narrow down which results you want, specify the number of rows to return, and more. In later articles, we explore these modifiers to enhance the usefulness of `SELECT`. +In this article, we introduced some basic elements of the `SELECT` command to demonstrate how to return data from MySQL tables. There are many more optional clauses that modify the behavior of the command, allowing you to narrow down which results you want, specify the number of rows to return, and more. In later articles, we explore these modifiers to enhance the usefulness of `SELECT`. diff --git a/content/05-mysql/10-reading-and-querying-data/02-filtering-data.mdx b/content/05-mysql/10-reading-and-querying-data/02-filtering-data.mdx index dc18a420..7306c652 100644 --- a/content/05-mysql/10-reading-and-querying-data/02-filtering-data.mdx +++ b/content/05-mysql/10-reading-and-querying-data/02-filtering-data.mdx @@ -1,24 +1,24 @@ --- title: 'How to filter query results in MySQL' metaTitle: 'Filtering Data in MySQL | How to filter query results' -metaDescription: "This articles shows you how to use filtering clauses to limit the results for `SELECT` commands and other SQL statements." +metaDescription: 'This articles shows you how to use filtering clauses to limit the results for `SELECT` commands and other SQL statements.' metaImage: '/social/generic-mysql.png' authors: ['justinellingwood'] --- ## Introduction -The `SELECT` command is the primary means of retrieving data from a MySQL database. While the basic command allows you to specify the columns you want to display, the table to pull from, and the output format to use, much of the power of `SELECT` comes from its ability to filter results. +The `SELECT` command is the primary means of retrieving data from a MySQL database. While the basic command allows you to specify the columns you want to display, the table to pull from, and the output format to use, much of the power of `SELECT` comes from its ability to filter results. -Filtering queries allows you to return only the results that you're interested in by providing specific criteria that the records must match. There are many different ways to filter queries in SQL and in this guide, we'll introduce some of the most common filtering options available for your MySQL databases: `WHERE`, `GROUP BY`, `HAVING`, and `LIMIT`. +Filtering queries allows you to return only the results that you're interested in by providing specific criteria that the records must match. There are many different ways to filter queries in SQL and in this guide, we'll introduce some of the most common filtering options available for your MySQL databases: `WHERE`, `GROUP BY`, `HAVING`, and `LIMIT`. By familiarizing yourself with these optional clauses, you can learn to construct queries that target the correct data, even in databases with many records. ## Using the `WHERE` clause to define match criteria -One of the most flexible and most common ways of specifying your data requirements is with the `WHERE` clause. The `WHERE` clause provides a way of specifying the requirements that a [record](/intro/database-glossary#record) must meet to match the query. If a record does not satisfy all of the conditions specified by the `WHERE` clause, it is not included in the query results. +One of the most flexible and most common ways of specifying your data requirements is with the `WHERE` clause. 
The `WHERE` clause provides a way of specifying the requirements that a [record](/intro/database-glossary#record) must meet to match the query. If a record does not satisfy all of the conditions specified by the `WHERE` clause, it is not included in the query results. -The `WHERE` clause works by specifying boolean expressions that are checked against each candidate row of data. If the result of the expression is false, the row will be removed from the results and will not be returned or continue to the next stage of processing. If the result of the expression is true, it satisfies the criteria of the search and will continue on for any further processing as a candidate row. +The `WHERE` clause works by specifying boolean expressions that are checked against each candidate row of data. If the result of the expression is false, the row will be removed from the results and will not be returned or continue to the next stage of processing. If the result of the expression is true, it satisfies the criteria of the search and will continue on for any further processing as a candidate row. The basic syntax of the `WHERE` clause looks like this: @@ -26,71 +26,71 @@ The basic syntax of the `WHERE` clause looks like this: SELECT * FROM WHERE ; ``` -The `` can be anything that results in a boolean value. MySQL does not have a dedicated builtin boolean type and uses the `TINYINT` type to express boolean values instead. MySQL recognizes `BOOLEAN` and `BOOL` as aliases for the `TINYINT` type. +The `` can be anything that results in a boolean value. MySQL does not have a dedicated builtin boolean type and uses the `TINYINT` type to express boolean values instead. MySQL recognizes `BOOLEAN` and `BOOL` as aliases for the `TINYINT` type. -Because of this implementation, [nonzero values are considered true, while `0` is considered false](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html#idm45863523900464). To handle the reverse case, the constant `TRUE` is an alias for `1`, while `FALSE` likewise is an alias for `0`. +Because of this implementation, [nonzero values are considered true, while `0` is considered false](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html#idm45863523900464). To handle the reverse case, the constant `TRUE` is an alias for `1`, while `FALSE` likewise is an alias for `0`. 
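As a quick sketch of this behavior (no tables required), you can evaluate a few comparisons directly and see that MySQL reports the results as `1` and `0` rather than as a separate boolean type:

```sql
-- Each comparison evaluates to a TINYINT: 1 for true, 0 for false
SELECT 3 > 2 AS is_true, 3 < 2 AS is_false, 2 IS TRUE AS nonzero_is_true;
```

The last column also demonstrates the nonzero-is-true rule: `2` is neither the constant `TRUE` (`1`) nor `FALSE` (`0`), but it still tests as true.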
Conditions are often formed using one or more of the following operators: -* `=`: equal to -* `>`: greater than -* `<`: less than -* `>=`: greater than or equal to -* `<=`: less than or equal to -* `<>` or `!=`: not equal -* `<=>`: `NULL`-safe equal to (returns 1 if both values are `NULL` and 0 if just one value is `NULL`) -* `AND`: the logical "and" operator — joins two conditions and returns `TRUE` if both of the conditions are `TRUE` -* `OR`: logical "or" operator — joins two conditions and returns `TRUE` if at least one of the conditions are `TRUE` -* `IN`: value is contained in the list, series, or range that follows -* `BETWEEN`: value is contained within the range the minimum and maximum values that follow, inclusive -* `IS NULL`: matches if value is `NULL` -* `NOT`: negates the boolean value that follows -* `EXISTS`: the query that follows contains results -* `LIKE`: matches against a pattern (using the wildcards `%` to match 0 or more characters and `_` to match a single character) -* [`REGEXP` or `REGEXP_LIKE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html): matches against a pattern using [regular expressions](https://en.wikipedia.org/wiki/Regular_expression) -* `STRCMP`: Compares strings using lexicographical sort to determine which value comes first. +- `=`: equal to +- `>`: greater than +- `<`: less than +- `>=`: greater than or equal to +- `<=`: less than or equal to +- `<>` or `!=`: not equal +- `<=>`: `NULL`-safe equal to (returns 1 if both values are `NULL` and 0 if just one value is `NULL`) +- `AND`: the logical "and" operator — joins two conditions and returns `TRUE` if both of the conditions are `TRUE` +- `OR`: logical "or" operator — joins two conditions and returns `TRUE` if at least one of the conditions are `TRUE` +- `IN`: value is contained in the list, series, or range that follows +- `BETWEEN`: value is contained within the range the minimum and maximum values that follow, inclusive +- `IS NULL`: matches if value is `NULL` +- `NOT`: negates the boolean value that follows +- `EXISTS`: the query that follows contains results +- `LIKE`: matches against a pattern (using the wildcards `%` to match 0 or more characters and `_` to match a single character) +- [`REGEXP` or `REGEXP_LIKE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html): matches against a pattern using [regular expressions](https://en.wikipedia.org/wiki/Regular_expression) +- `STRCMP`: Compares strings using lexicographical sort to determine which value comes first. While the above list represents some of the most common test constructs, there are many other [operators that may yield boolean results](https://dev.mysql.com/doc/refman/8.0/en/functions.html) that can be used in conjunction with a `WHERE` clause. -Prisma Client supports filtering by multiple criteria. Check out our [documentation on filtering](https://www.prisma.io/docs/concepts/components/prisma-client/filtering-and-sorting) to learn more. +Prisma Client supports filtering by multiple criteria. Check out our [documentation on filtering](https://www.prisma.io/docs/orm/prisma-client/queries/filtering-and-sorting) to learn more. ### Examples using `WHERE` -One of the most common and straightforward checks is for equality, using the `=` operator. Here, we check whether each row in the `customer` table has a `last_name` value equal to `Smith`: +One of the most common and straightforward checks is for equality, using the `=` operator. 
Here, we check whether each row in the `customer` table has a `last_name` value equal to `Smith`: ```sql SELECT * FROM customer WHERE last_name = 'Smith'; ``` -We can add additional conditions to this to create compound expressions using logical operators. This example uses the `AND` clause to add an additional test against the `first_name` column. Valid rows must satisfy both of the given conditions: +We can add additional conditions to this to create compound expressions using logical operators. This example uses the `AND` clause to add an additional test against the `first_name` column. Valid rows must satisfy both of the given conditions: ```sql SELECT * FROM customer WHERE first_name = 'John' AND last_name = 'Smith'; ``` -Similarly, we can check whether any of a series of conditions are met. Here, we check rows from the `address` table to see whether the `zip_code` value is equal to 60626 or the `neighborhood` column is equal to the string "Roger's Park": +Similarly, we can check whether any of a series of conditions are met. Here, we check rows from the `address` table to see whether the `zip_code` value is equal to 60626 or the `neighborhood` column is equal to the string "Roger's Park": ```sql SELECT * FROM address WHERE zip_code = '60626' OR neighborhood = "Roger's Park"; ``` -The `IN` operator can work like an comparison between a number of values, wrapped in parentheses. If there is a match with any of the given values, the expression is `TRUE`: +The `IN` operator can work like an comparison between a number of values, wrapped in parentheses. If there is a match with any of the given values, the expression is `TRUE`: ```sql SELECT * FROM customer WHERE last_name IN ('Smith', 'Johnson', 'Fredrich'); ``` -Here, we check against a string pattern using `LIKE`. The `%` works as a wildcard matching zero or more characters, so "Pete", "Peter", and any other string that begins with "Pete" would match: +Here, we check against a string pattern using `LIKE`. The `%` works as a wildcard matching zero or more characters, so "Pete", "Peter", and any other string that begins with "Pete" would match: ```sql SELECT * FROM customer WHERE last_name LIKE 'Pete%'; ``` -We could do a similar search using the `REGEXP` function to check for matches using regular expressions. In this case, we check whether the value of `last_name` begins with a "d" and contains the substring "on", which would match names like "Dickson", "Donald", and "Devon": +We could do a similar search using the `REGEXP` function to check for matches using regular expressions. In this case, we check whether the value of `last_name` begins with a "d" and contains the substring "on", which would match names like "Dickson", "Donald", and "Devon": ```sql SELECT * FROM customer WHERE last_name REGEXP '^D.*on.*'; @@ -102,7 +102,7 @@ We can check whether a street number is within the 4000 block of addresses using SELECT * FROM address WHERE street_number BETWEEN 4000 AND 4999; ``` -Here, we can display any `customer` entries that have social security numbers that are not 9 digits long. We use the `LENGTH()` function to get the number of digits in the field and the `<>` to check for inequality: +Here, we can display any `customer` entries that have social security numbers that are not 9 digits long. 
We use the `LENGTH()` function to get the number of digits in the field and the `<>` to check for inequality: ```sql SELECT * FROM customer WHERE LENGTH(SSN) <> 9; @@ -110,24 +110,24 @@ SELECT * FROM customer WHERE LENGTH(SSN) <> 9; ## Using the `GROUP BY` clause to summarize multiple records -The `GROUP BY` clause is another very common way to filter results by representing multiple results with a single row. The basic syntax of the `GROUP BY` clause looks like this: +The `GROUP BY` clause is another very common way to filter results by representing multiple results with a single row. The basic syntax of the `GROUP BY` clause looks like this: ```sql SELECT FROM
GROUP BY ; ``` -When a `GROUP BY` clause is added to a statement, it tells MySQL to display a single row for each unique value for the given column or columns. This has some important implications. +When a `GROUP BY` clause is added to a statement, it tells MySQL to display a single row for each unique value for the given column or columns. This has some important implications. -Since the `GROUP BY` clause is a way of representing multiple rows as a single row, MySQL can only execute the query if it can calculate a value for each of the columns it is tasked with displaying. This means that each column identified by the `SELECT` portion of the statement has to either be: +Since the `GROUP BY` clause is a way of representing multiple rows as a single row, MySQL can only execute the query if it can calculate a value for each of the columns it is tasked with displaying. This means that each column identified by the `SELECT` portion of the statement has to either be: -* included in the `GROUP BY` clause to guarantee that each row has a unique value -* abstracted to summarize all of the rows within each group +- included in the `GROUP BY` clause to guarantee that each row has a unique value +- abstracted to summarize all of the rows within each group Practically speaking, this means that any columns in the `SELECT` list not included in the `GROUP BY` clause must use an aggregate function to produce a single result for the column for each group. -If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), you can use [aggregations](https://www.prisma.io/docs/concepts/components/prisma-client/aggregation-grouping-summarizing) to compute over and summarize values. +If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), you can use [aggregations](https://www.prisma.io/docs/orm/prisma-client/queries/aggregation-grouping-summarizing) to compute over and summarize values. @@ -156,11 +156,12 @@ INSERT INTO pet (type, name, color, age) VALUES ('rabbit', 'Briony', 'brown', 6); ``` -The simplest use of `GROUP BY` is to display the range of unique values for a single column. To do so, use the same column in `SELECT` and `GROUP BY`. Here, we see all of the colors used in the table: +The simplest use of `GROUP BY` is to display the range of unique values for a single column. To do so, use the same column in `SELECT` and `GROUP BY`. Here, we see all of the colors used in the table: ```sql SELECT color FROM pet GROUP BY color; ``` + ``` +--------+ | color | @@ -176,11 +177,12 @@ SELECT color FROM pet GROUP BY color; As you move beyond a single column in the `SELECT` column list, you must either add the columns to the `GROUP BY` clause or use an aggregate function to produce a single value for the group of rows being represented. -Here, we add `type` to the `GROUP BY` clause, meaning that each row will represent a unique combination of `type` and `color` values. We also add the `age` column, summarized by the `avg()` function to find the average age of each of the groups: +Here, we add `type` to the `GROUP BY` clause, meaning that each row will represent a unique combination of `type` and `color` values. 
We also add the `age` column, summarized by the `avg()` function to find the average age of each of the groups: ```sql SELECT type, color, avg(age) AS average_age FROM pet GROUP BY type, color; ``` + ``` +--------+--------+-------------+ | type | color | average_age | @@ -196,11 +198,12 @@ SELECT type, color, avg(age) AS average_age FROM pet GROUP BY type, color; 7 rows in set (0.00 sec) ``` -Aggregate functions work just as well with a single column in the `GROUP BY` clause. Here, we find the average age of each type of animal: +Aggregate functions work just as well with a single column in the `GROUP BY` clause. Here, we find the average age of each type of animal: ```sql SELECT type, avg(age) AS average_age FROM pet GROUP BY type; ``` + ``` +--------+-------------+ | type | average_age | @@ -212,11 +215,12 @@ SELECT type, avg(age) AS average_age FROM pet GROUP BY type; 3 rows in set (0.00 sec) ``` -If we want to display the oldest of each type of animal, we could instead use the `max()` function on the `age` column. The `GROUP BY` clause collapses the results into the same rows as before, but the new function alters the result in the other column: +If we want to display the oldest of each type of animal, we could instead use the `max()` function on the `age` column. The `GROUP BY` clause collapses the results into the same rows as before, but the new function alters the result in the other column: ```sql SELECT type, max(age) AS oldest FROM pet GROUP BY type; ``` + ``` +--------+--------+ | type | oldest | @@ -230,7 +234,7 @@ SELECT type, max(age) AS oldest FROM pet GROUP BY type; ## Using the `HAVING` clause to filter groups of records -The `GROUP BY` clause is a way to summarize data by collapsing multiple records into a single representative row. But what if you want to narrow these groups based on additional factors? +The `GROUP BY` clause is a way to summarize data by collapsing multiple records into a single representative row. But what if you want to narrow these groups based on additional factors? The `HAVING` clause is a modifier for the `GROUP BY` clause that lets you specify conditions that each group must satisfy to be included in the results. @@ -246,11 +250,12 @@ The operation is very similar to the `WHERE` clause, with the difference being t Using the same table we introduced in the last section, we can demonstrate how the `HAVING` clause works. -Here, we group the rows of the `pet` table by unique values in the `type` column, finding the minimum value of `age` as well. The `HAVING` clause then filters the results to remove any groups where the age is not greater than 1: +Here, we group the rows of the `pet` table by unique values in the `type` column, finding the minimum value of `age` as well. The `HAVING` clause then filters the results to remove any groups where the age is not greater than 1: ```sql SELECT type, min(age) AS youngest FROM pet GROUP BY type HAVING min(age) > 1; ``` + ``` +--------+----------+ | type | youngest | @@ -261,12 +266,12 @@ SELECT type, min(age) AS youngest FROM pet GROUP BY type HAVING min(age) > 1; 2 rows in set (0.00 sec) ``` -In this example, we group the rows in `pet` by their color. We then filter the groups that only represent a single row. The result shows us every color that appears more than once: - +In this example, we group the rows in `pet` by their color. We then filter the groups that only represent a single row. 
The result shows us every color that appears more than once: ```sql SELECT color FROM pet GROUP BY color HAVING count(color) > 1; ``` + ``` +-------+ | color | @@ -282,6 +287,7 @@ We can perform a similar query to get the combinations of `type` and `color` tha ```sql SELECT type, color FROM pet GROUP BY type, color HAVING count(color) = 1; ``` + ``` +--------+--------+ | type | color | @@ -297,7 +303,7 @@ SELECT type, color FROM pet GROUP BY type, color HAVING count(color) = 1; ## Using the `LIMIT` clause to set the maximum number of records -The `LIMIT` clause offers a different approach to paring down the records your query returns. Rather than eliminating rows of data based on criteria within the row itself, the `LIMIT` clause sets the maximum number of records returned by a query. +The `LIMIT` clause offers a different approach to paring down the records your query returns. Rather than eliminating rows of data based on criteria within the row itself, the `LIMIT` clause sets the maximum number of records returned by a query. The basic syntax of `LIMIT` looks like this: @@ -305,13 +311,13 @@ The basic syntax of `LIMIT` looks like this: SELECT * FROM
LIMIT [OFFSET ]; ``` -Here, the `` indicates the maximum number of rows to display from the executed query. This is often used in conjunction with `ORDER BY` clauses to get the rows with the most extreme values in a certain column. For example, to get the five best scores on an exam, a user could `ORDER BY` a `score` column and then `LIMIT` the results to 5. +Here, the `` indicates the maximum number of rows to display from the executed query. This is often used in conjunction with `ORDER BY` clauses to get the rows with the most extreme values in a certain column. For example, to get the five best scores on an exam, a user could `ORDER BY` a `score` column and then `LIMIT` the results to 5. -While `LIMIT` counts from the top of the results by default, the optional `OFFSET` keyword can be used to offset the starting position it uses. In effect, this allows you to paginate through results by displaying the number of results defined by `LIMIT` and then adding the `LIMIT` number to the `OFFSET` to retrieve the following page. +While `LIMIT` counts from the top of the results by default, the optional `OFFSET` keyword can be used to offset the starting position it uses. In effect, this allows you to paginate through results by displaying the number of results defined by `LIMIT` and then adding the `LIMIT` number to the `OFFSET` to retrieve the following page. -If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), you can use [pagination](https://www.prisma.io/docs/concepts/components/prisma-client/pagination) to iterate through results. +If you are connecting to your database with [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), you can use [pagination](https://www.prisma.io/docs/orm/prisma-client/queries/pagination) to iterate through results. @@ -319,11 +325,12 @@ If you are connecting to your database with [Prisma Client](https://www.prisma.i We will use the `pet` table from earlier for the examples in this section. -As mentioned above, `LIMIT` is often combined with an `ORDER BY` clause to explicitly define the ordering of the rows before slicing the appropriate number. Here, we sort the `pet` entries according to their `age`, from oldest to youngest. We then use `LIMIT` to display the top 5 oldest animals: +As mentioned above, `LIMIT` is often combined with an `ORDER BY` clause to explicitly define the ordering of the rows before slicing the appropriate number. Here, we sort the `pet` entries according to their `age`, from oldest to youngest. We then use `LIMIT` to display the top 5 oldest animals: ```sql SELECT * FROM pet ORDER BY age DESC LIMIT 5; ``` + ``` +----+--------+---------+--------+------+ | id | type | name | color | age | @@ -337,11 +344,12 @@ SELECT * FROM pet ORDER BY age DESC LIMIT 5; 5 rows in set (0.00 sec) ``` -If we need a record for any single `dog` within the table, we could construct a query like this. Keep in mind that while the result might be difficult to predict, this is not a random selection and should not be used as such: +If we need a record for any single `dog` within the table, we could construct a query like this. 
Keep in mind that while the result might be difficult to predict, this is not a random selection and should not be used as such: ```sql SELECT * FROM pet WHERE type = 'dog' LIMIT 1; ``` + ``` +----+------+------+-------+------+ | id | type | name | color | age | @@ -351,13 +359,14 @@ SELECT * FROM pet WHERE type = 'dog' LIMIT 1; 1 row in set (0.00 sec) ``` -We can use the `OFFSET` clause to paginate through results. We include an `ORDER BY` clause to define a specific order for the results. +We can use the `OFFSET` clause to paginate through results. We include an `ORDER BY` clause to define a specific order for the results. For the first query, we limit the results without specifying an `OFFSET` to get the first 3 youngest entries: ```sql SELECT * FROM pet ORDER BY age LIMIT 3; ``` + ``` +----+--------+---------+-------+------+ | id | type | name | color | age | @@ -374,6 +383,7 @@ To get the next 3 youngest, we can add the number defined in `LIMIT` to the `OFF ```sql SELECT * FROM pet ORDER BY age LIMIT 3 OFFSET 3; ``` + ``` +----+--------+---------+-------+------+ | id | type | name | color | age | @@ -390,6 +400,7 @@ If we add the `LIMIT` to the `OFFSET` again, we'll get the next 3 results: ```sql SELECT * FROM pet ORDER BY age LIMIT 3 OFFSET 6; ``` + ``` +----+--------+---------+--------+------+ | id | type | name | color | age | @@ -405,6 +416,6 @@ This lets us retrieve rows of data from a query in manageable chunks. ## Conclusion -Most of the time, when retrieving data from MySQL tables, you will likely be applying filtering conditions to pick out the appropriate records. Whether that is an unambiguous `WHERE` clause that matches a specific `id` using the `=` equality operator, or a `GROUP BY` clause that helps you summarize multiple records in a single value, filtering data is a normal part of working with records. +Most of the time, when retrieving data from MySQL tables, you will likely be applying filtering conditions to pick out the appropriate records. Whether that is an unambiguous `WHERE` clause that matches a specific `id` using the `=` equality operator, or a `GROUP BY` clause that helps you summarize multiple records in a single value, filtering data is a normal part of working with records. -Understanding how to use these optional clauses to evaluate potential data against your criteria and mould the results accordingly allows MySQL to do the selection work for you. Using these constructs, you can extract useful information from large, semi-organized collections of data. +Understanding how to use these optional clauses to evaluate potential data against your criteria and mould the results accordingly allows MySQL to do the selection work for you. Using these constructs, you can extract useful information from large, semi-organized collections of data. diff --git a/content/05-mysql/10-reading-and-querying-data/03-joining-tables.mdx b/content/05-mysql/10-reading-and-querying-data/03-joining-tables.mdx index 9c116c72..b64e3ce4 100644 --- a/content/05-mysql/10-reading-and-querying-data/03-joining-tables.mdx +++ b/content/05-mysql/10-reading-and-querying-data/03-joining-tables.mdx @@ -1,20 +1,20 @@ --- title: 'Using joins to combine data from different tables in MySQL' -metaTitle: "Joining tables in MySQL | Combine data from different tables" -metaDescription: "Joining tables allows you to associate and bring together related data stored in different tables in MySQL." 
+metaTitle: 'Joining tables in MySQL | Combine data from different tables' +metaDescription: 'Joining tables allows you to associate and bring together related data stored in different tables in MySQL.' metaImage: '/social/generic-mysql.png' authors: ['justinellingwood'] --- ## Introduction -Though it's often useful to separate data into discrete tables for performance and consistency purposes, you often need to consult data from multiple tables to answer certain requests. *Joining* tables is a way of combining the data from various tables by matching each record based on common field values. +Though it's often useful to separate data into discrete tables for performance and consistency purposes, you often need to consult data from multiple tables to answer certain requests. _Joining_ tables is a way of combining the data from various tables by matching each record based on common field values. -There are a few different types of joins, which offer various ways of combining table records. In this article, we'll cover how MySQL implements joins and discuss the scenarios in which each is most useful. +There are a few different types of joins, which offer various ways of combining table records. In this article, we'll cover how MySQL implements joins and discuss the scenarios in which each is most useful. ## What are joins? -In short, [*joins*](/intro/database-glossary#join) are a way of displaying data from multiple tables. They do this by stitching together records from different sources based on matching values in certain columns. Each resulting row consists of a record from the first table combined with a row from the second table, based on one or more columns in each table having the same value. +In short, [_joins_](/intro/database-glossary#join) are a way of displaying data from multiple tables. They do this by stitching together records from different sources based on matching values in certain columns. Each resulting row consists of a record from the first table combined with a row from the second table, based on one or more columns in each table having the same value. The basic syntax of a join looks like this: @@ -27,21 +27,21 @@ FROM ; ``` -In a join, each resulting row is constructed by including all of the columns of the first table followed by all of the columns from the second table. The `SELECT` portion of the query can be used to specify the exact columns you wish to display. +In a join, each resulting row is constructed by including all of the columns of the first table followed by all of the columns from the second table. The `SELECT` portion of the query can be used to specify the exact columns you wish to display. -Multiple rows may be constructed from the original tables if the values in the columns used for comparison are not unique. For example, imagine you have a column being compared from the first table that has two records with a value of "red". Matched with this is a column from the second table that has three rows with that value. The join will produce six different rows for that value representing the various combinations that can be achieved. +Multiple rows may be constructed from the original tables if the values in the columns used for comparison are not unique. For example, imagine you have a column being compared from the first table that has two records with a value of "red". Matched with this is a column from the second table that has three rows with that value. 
The join will produce six different rows for that value representing the various combinations that can be achieved. -The type of join and the join conditions determine how each row that is displayed is constructed. This impacts what happens to the rows from each table that do and do *not* have a match on the join condition. +The type of join and the join conditions determine how each row that is displayed is constructed. This impacts what happens to the rows from each table that do and do _not_ have a match on the join condition. -For the sake of convenience, many joins match the primary key on one table with an associated foreign key on the second table. Although primary and foreign keys are only used by the database system to maintain consistency guarantees, their relationship often makes them a good candidate for join conditions. +For the sake of convenience, many joins match the primary key on one table with an associated foreign key on the second table. Although primary and foreign keys are only used by the database system to maintain consistency guarantees, their relationship often makes them a good candidate for join conditions. ## Different types of joins -Various types of joins are available, each of which will potentially produce different results. Understanding how each type is constructed will help you determine which is appropriate for different scenarios. +Various types of joins are available, each of which will potentially produce different results. Understanding how each type is constructed will help you determine which is appropriate for different scenarios. ### Inner and cross joins -The default join is called an [*inner join*](/intro/database-glossary#inner-join). In MySQL, this can be specified using either `INNER JOIN`, just `JOIN`, or `CROSS JOIN`. For other database systems, `INNER JOIN` and `CROSS JOIN` are often two separate concepts, but MySQL implements them in the same construct. +The default join is called an [_inner join_](/intro/database-glossary#inner-join). In MySQL, this can be specified using either `INNER JOIN`, just `JOIN`, or `CROSS JOIN`. For other database systems, `INNER JOIN` and `CROSS JOIN` are often two separate concepts, but MySQL implements them in the same construct. Here is a typical example demonstrating the syntax of an inner join: @@ -54,13 +54,13 @@ FROM ON table_1.id = table_2.table_1_id; ``` -An inner join is the most restrictive type of join because it only displays rows created by combining rows from each table. Any rows in the constituent tables that did not have a matching counterpart in the other table are removed from the results. For example, if the first table has a value of "blue" in the comparison column, and the second table has no record with that value, that row will be suppressed from the output. +An inner join is the most restrictive type of join because it only displays rows created by combining rows from each table. Any rows in the constituent tables that did not have a matching counterpart in the other table are removed from the results. For example, if the first table has a value of "blue" in the comparison column, and the second table has no record with that value, that row will be suppressed from the output. -If you represent the results as a Venn diagram of the component tables, an inner join allows you to represent the overlapping area of the two circles. None of values that only existed in one of the tables are displayed. 
+If you represent the results as a Venn diagram of the component tables, an inner join allows you to represent the overlapping area of the two circles. None of values that only existed in one of the tables are displayed. -As mentioned above, MySQL also uses this format to produce cross joins. In MySQL, you can produce a cross join using an inner join without any match conditions. A cross join does not use any comparisons to determine whether the rows in each table match one another. Instead, results are constructed by simply adding each of the rows from the first table to each of the rows of the second table. +As mentioned above, MySQL also uses this format to produce cross joins. In MySQL, you can produce a cross join using an inner join without any match conditions. A cross join does not use any comparisons to determine whether the rows in each table match one another. Instead, results are constructed by simply adding each of the rows from the first table to each of the rows of the second table. -This produces a Cartesian product of the rows in two or more tables. In effect, this style of join combines rows from each table unconditionally. So, if each table has three rows, the resulting table would have nine rows containing all of the columns from both tables. +This produces a Cartesian product of the rows in two or more tables. In effect, this style of join combines rows from each table unconditionally. So, if each table has three rows, the resulting table would have nine rows containing all of the columns from both tables. For example, if you have a table called `t1` combined with a table called `t2`, each with rows `r1`, `r2`, and `r3`, the result would be nine rows combined like so: @@ -78,7 +78,7 @@ t1.r3 + t2.r3 ### Left join -A [left join](/intro/database-glossary#left-join) is a join that shows all of the records found in an inner join, plus all of the *unmatched* rows from the first table. In MYSQL, this can be specified as a `LEFT OUTER JOIN` or as just a `LEFT JOIN`. +A [left join](/intro/database-glossary#left-join) is a join that shows all of the records found in an inner join, plus all of the _unmatched_ rows from the first table. In MYSQL, this can be specified as a `LEFT OUTER JOIN` or as just a `LEFT JOIN`. The basic syntax of a left join follows this pattern: @@ -91,13 +91,13 @@ LEFT JOIN table_2 ON table_1.id = table_2.table_1_id; ``` -A left join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from the first table are also included. Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the second table. +A left join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from the first table are also included. Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the second table. -If you represent the results as a Venn diagram of the component tables, a left join allows you to represent the entire left circle. The parts of the left circle represented by the intersection between the two circles will have additional data supplemented by the right table. +If you represent the results as a Venn diagram of the component tables, a left join allows you to represent the entire left circle. 
The parts of the left circle represented by the intersection between the two circles will have additional data supplemented by the right table. ### Right join -A [right join](/intro/database-glossary#right-join) is a join that shows all of the records found in an inner join, plus all of the *unmatched* rows from the second table. In MySQL, this can be specified as a `RIGHT OUTER JOIN` or as just a `RIGHT JOIN`. +A [right join](/intro/database-glossary#right-join) is a join that shows all of the records found in an inner join, plus all of the _unmatched_ rows from the second table. In MySQL, this can be specified as a `RIGHT OUTER JOIN` or as just a `RIGHT JOIN`. The basic syntax of a right join follows this pattern: @@ -110,17 +110,17 @@ RIGHT JOIN table_2 ON table_1.id = table_2.table_1_id; ``` -A right join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from the second table are also included. Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the first table. +A right join is constructed by first performing an inner join to construct rows from all of the matching records in both tables. Afterwards, the unmatched records from the second table are also included. Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the first table. -If you represent the results as a Venn diagram of the component tables, a right join allows you to represent the entire right circle. The parts of the right circle represented by the intersection between the two circles will have additional data supplemented by the left table. +If you represent the results as a Venn diagram of the component tables, a right join allows you to represent the entire right circle. The parts of the right circle represented by the intersection between the two circles will have additional data supplemented by the left table. For portability reasons, MySQL recommends you use left joins instead of right joins where possible. ### Full join -A [full join](/intro/database-glossary#outer-join) is a join that shows all of the records found in an inner join, plus all of the *unmatched* rows from both component tables. MySQL does not natively implement full joins, but we can emulate the behavior using a few tricks. +A [full join](/intro/database-glossary#outer-join) is a join that shows all of the records found in an inner join, plus all of the _unmatched_ rows from both component tables. MySQL does not natively implement full joins, but we can emulate the behavior using a few tricks. -To replicate the results of a full outer join, we will perform a left join to all of the results that are shared by both tables and all of the unmatched rows from the left table. Then we will use the [`UNION ALL` set operator](https://dev.mysql.com/doc/refman/8.0/en/union.html#union-distinct-all) to combine those results with an "anti-join" for the right table. An "anti-join" is a join operation that exclusively finds the results *not* in common between tables. +To replicate the results of a full outer join, we will perform a left join to all of the results that are shared by both tables and all of the unmatched rows from the left table. 
Then we will use the [`UNION ALL` set operator](https://dev.mysql.com/doc/refman/8.0/en/union.html#union-distinct-all) to combine those results with an "anti-join" for the right table. An "anti-join" is a join operation that exclusively finds the results _not_ in common between tables. The basic syntax of a full join follows this pattern: @@ -145,15 +145,15 @@ UNION ALL Since each row in a join includes the columns of both tables, the unmatched columns use `NULL` as the value for all of the columns in the unmatched other table. -If you represent the results as a Venn diagram of the component tables, a full join allows you to represent both of the component circles entirely. The intersection of the two circles will have values supplied by each of the component tables. The parts of the circles outside of the overlapping area will have the values from the table they belong to, using `NULL` to fill in the columns found in the other table. +If you represent the results as a Venn diagram of the component tables, a full join allows you to represent both of the component circles entirely. The intersection of the two circles will have values supplied by each of the component tables. The parts of the circles outside of the overlapping area will have the values from the table they belong to, using `NULL` to fill in the columns found in the other table. ### Self join -A self join is any join that combines the rows of a table with itself. It may not be immediately apparent how this could be useful, but it actually has many common applications. +A self join is any join that combines the rows of a table with itself. It may not be immediately apparent how this could be useful, but it actually has many common applications. -Often, tables describe entities that can fulfill multiple roles in relationship to one another. For instance, if you have a table of `people`, each row could potentially contain a `mother` column that reference other `people` in the table. A self join would allow you to stitch these different rows together by joining a second instance of the table to the first where these values match. +Often, tables describe entities that can fulfill multiple roles in relationship to one another. For instance, if you have a table of `people`, each row could potentially contain a `mother` column that reference other `people` in the table. A self join would allow you to stitch these different rows together by joining a second instance of the table to the first where these values match. -Since self joins reference the same table twice, table aliases are required to disambiguate the references. In the example above, for instance, you could join the two instances of the `people` table using the aliases `people AS children` and `people AS mothers`. That way, you can specify which instance of the table you are referring to when defining join conditions. +Since self joins reference the same table twice, table aliases are required to disambiguate the references. In the example above, for instance, you could join the two instances of the `people` table using the aliases `people AS children` and `people AS mothers`. That way, you can specify which instance of the table you are referring to when defining join conditions. Here is another example, this time representing relationships between employees and managers: @@ -168,13 +168,13 @@ JOIN people AS manager ## Join conditions -When combining tables, the join condition determines how rows will be matched together to form the composite results. 
The basic premise is to define the columns in each table that must match for the join to occur on that row. +When combining tables, the join condition determines how rows will be matched together to form the composite results. The basic premise is to define the columns in each table that must match for the join to occur on that row. ### The `ON` clause -The most standard way of defining the conditions for table joins is with the `ON` clause. The `ON` clause uses an equals sign to specify the exact columns from each table that will be compared to determine when a join may occur. MySQL uses the provided columns to stitch together the rows from each table. +The most standard way of defining the conditions for table joins is with the `ON` clause. The `ON` clause uses an equals sign to specify the exact columns from each table that will be compared to determine when a join may occur. MySQL uses the provided columns to stitch together the rows from each table. -The `ON` clause is the most verbose, but also the most flexible of the available join conditions. It allows for specificity regardless of how standardized the column names are of each table being combined. +The `ON` clause is the most verbose, but also the most flexible of the available join conditions. It allows for specificity regardless of how standardized the column names are of each table being combined. The basic syntax of the `ON` clause looks like this: @@ -189,13 +189,13 @@ ON table1.id = table2.ident; ``` -Here, the rows from `table1` and `table2` will be joined whenever the `id` column from `table1` matches the `ident` column from `table2`. Because an inner join is used, the results will only show the rows that were joined. Since the query uses the wildcard `*` character, all of the columns from both tables will be displayed. +Here, the rows from `table1` and `table2` will be joined whenever the `id` column from `table1` matches the `ident` column from `table2`. Because an inner join is used, the results will only show the rows that were joined. Since the query uses the wildcard `*` character, all of the columns from both tables will be displayed. -This means that both the `id` column from `table1` and the `ident` column from `table2` will be displayed, even though they have the same exact value by virtue of satisfying the join condition. You can avoid this duplication by calling out the exact columns you wish to display in the `SELECT` column list. +This means that both the `id` column from `table1` and the `ident` column from `table2` will be displayed, even though they have the same exact value by virtue of satisfying the join condition. You can avoid this duplication by calling out the exact columns you wish to display in the `SELECT` column list. ### The `USING` clause -The `USING` clause is a shorthand for specifying the conditions of an `ON` clause that can be used when the columns being compared have the same name in both tables. The `USING` clause takes a list, enclosed in parentheses, of the shared column names that should be compared. +The `USING` clause is a shorthand for specifying the conditions of an `ON` clause that can be used when the columns being compared have the same name in both tables. The `USING` clause takes a list, enclosed in parentheses, of the shared column names that should be compared. 
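For instance, assuming two hypothetical tables that both name their shared key column `customer_id` (these table and column names are illustrative, not taken from the examples above), a sketch of a join written with `USING` might look like this:

```sql
SELECT *
FROM customers
JOIN orders
USING (customer_id);
```

Functionally, this matches joining with `ON customers.customer_id = orders.customer_id`.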
The general syntax of the `USING` clause uses this format:

@@ -225,11 +225,11 @@ ON table1.id = table2.id AND table1.state = table2.state;
```

-While both of the above joins would result in the same rows being constructed with the same data present, they would be displayed slightly different. While the `ON` clause includes all of the columns from both tables, the `USING` clause suppresses the duplicate columns. So instead of there being two separate `id` columns and two separate `state` columns (one for each table), the results would just have one of each of the shared columns, followed by all of the other columns provided by `table1` and `table2`.
+While both of the above joins would result in the same rows being constructed with the same data present, they would be displayed slightly differently. While the `ON` clause includes all of the columns from both tables, the `USING` clause suppresses the duplicate columns. So instead of there being two separate `id` columns and two separate `state` columns (one for each table), the results would just have one of each of the shared columns, followed by all of the other columns provided by `table1` and `table2`.

### The `NATURAL` clause

-The `NATURAL` clause is yet another shorthand that can further reduce the verbosity of the `USING` clause. A `NATURAL` join does not specify *any* columns to be matched. Instead, MySQL will automatically join the tables based on all columns that have matching columns in each database.
+The `NATURAL` clause is yet another shorthand that can further reduce the verbosity of the `USING` clause. A `NATURAL` join does not specify _any_ columns to be matched. Instead, MySQL will automatically join the tables based on all of the columns that share the same name in both tables.

The general syntax of the `NATURAL` join clause looks like this:

@@ -270,17 +270,17 @@ USING

Like the `USING` clause, the `NATURAL` clause suppresses duplicate columns, so there would be only a single instance of each of the joined columns in the results.

-While the `NATURAL` clause can reduce the verbosity of your queries, care must be exercised when using it. Because the columns used for joining the tables are automatically calculated, if the columns in the component tables change, the results can be vastly different due to new join conditions.
+While the `NATURAL` clause can reduce the verbosity of your queries, care must be exercised when using it. Because the columns used for joining the tables are automatically calculated, if the columns in the component tables change, the results can be vastly different due to new join conditions.

## Join conditions and the `WHERE` clause

-Join conditions share many characteristics with the comparisons used to filter rows of data using `WHERE` clauses. Both constructs define expressions that must evaluate to true for the row to be considered. Because of this, it's not always intuitive what the difference is between including additional comparisons in a `WHERE` construct versus defining them within the join clause itself.
+Join conditions share many characteristics with the comparisons used to filter rows of data using `WHERE` clauses. Both constructs define expressions that must evaluate to true for the row to be considered. Because of this, it's not always intuitive what the difference is between including additional comparisons in a `WHERE` construct versus defining them within the join clause itself.
-In order to understand the differences that will result, we have to take a look at the order in which MySQL processes different portions of a query. In this case, the predicates in the join condition are processed first to construct the virtual joined table in memory. After this stage, the expressions within the `WHERE` clause are evaluated to filter the resulting rows. +In order to understand the differences that will result, we have to take a look at the order in which MySQL processes different portions of a query. In this case, the predicates in the join condition are processed first to construct the virtual joined table in memory. After this stage, the expressions within the `WHERE` clause are evaluated to filter the resulting rows. -As an example, suppose that we have two tables called `customers` and `orders` that we need to join together. We want to join the two tables by matching the `customers.id` column with the `orders.customer_id` column. Additionally, we're interested in the rows in the `orders` table that have a `product_id` of 12345. +As an example, suppose that we have two tables called `customers` and `orders` that we need to join together. We want to join the two tables by matching the `customers.id` column with the `orders.customer_id` column. Additionally, we're interested in the rows in the `orders` table that have a `product_id` of 12345. -Given the above requirements, we have two conditions that we care about. The way we express these conditions, however, will determine the results we receive. +Given the above requirements, we have two conditions that we care about. The way we express these conditions, however, will determine the results we receive. First, let's use both as the join conditions for a `LEFT JOIN`: @@ -316,12 +316,12 @@ The results could potentially look something like this: MySQL arrived at this result by performing the following operations: 1. Combine any rows in the `customers` table with the `orders` table where: - * `customers.id` matches `orders.customers_id`. - * `orders.product_id` matches 12345 -2. Because we are using a left join, include any *unmatched* rows from the left table (`customers`), padding out the columns from the right table (`orders`) with `NULL` values. + - `customers.id` matches `orders.customers_id`. + - `orders.product_id` matches 12345 +2. Because we are using a left join, include any _unmatched_ rows from the left table (`customers`), padding out the columns from the right table (`orders`) with `NULL` values. 3. Display only the columns listed in the `SELECT` column specification. -The outcome is that all of our joined rows match both of the conditions that we are looking for. However, the left join causes MySQL to also include any rows from the first table that did not satisfy the join condition. This results in "left over" rows that don't seem to follow the apparent intent of the query. +The outcome is that all of our joined rows match both of the conditions that we are looking for. However, the left join causes MySQL to also include any rows from the first table that did not satisfy the join condition. This results in "left over" rows that don't seem to follow the apparent intent of the query. If we move the second query (`orders.product_id` = 12345) to a `WHERE` clause, instead of including it as a join condition, we get different results: @@ -354,25 +354,25 @@ This time, only three rows are displayed: 3 rows in set (0.00 sec) ``` -The order in which the comparisons are executed is the reason for these differences. 
This time, MySQL processes the query like this: +The order in which the comparisons are executed is the reason for these differences. This time, MySQL processes the query like this: 1. Combine any rows in the `customers` table with the `orders` table where `customers.id` matches `orders.customers_id`. -2. Because we are using a left join, include any *unmatched* rows from the left table (`customers`), padding out the columns from the right table (`orders`) with `NULL` values. +2. Because we are using a left join, include any _unmatched_ rows from the left table (`customers`), padding out the columns from the right table (`orders`) with `NULL` values. 3. Evaluate the `WHERE` clause to remove any rows that do not have 12345 as the value for the `orders.product_id` column. 4. Display only the columns listed in the `SELECT` column specification. -This time, even though we are using a left join, the `WHERE` clause truncates the results by filtering out all of the rows without the correct `product_id`. Because any unmatched rows would have `product_id` set to `NULL`, this removes all of the unmatched rows that were populated by the left join. It also removes any of the rows that were matched by the join condition that did not pass this second round of checks. +This time, even though we are using a left join, the `WHERE` clause truncates the results by filtering out all of the rows without the correct `product_id`. Because any unmatched rows would have `product_id` set to `NULL`, this removes all of the unmatched rows that were populated by the left join. It also removes any of the rows that were matched by the join condition that did not pass this second round of checks. Understanding the basic process that MySQL uses to execute your queries can help you avoid some easy-to-make but difficult-to-debug mistakes as you work with your data. ## Conclusion -In this article, we discussed what joins are and how MySQL implements them as a way of combining records from multiple tables. We covered the different types of joins available and the way that different conditions like the `ON` and `WHERE` clauses affect the way that the database constructs results. +In this article, we discussed what joins are and how MySQL implements them as a way of combining records from multiple tables. We covered the different types of joins available and the way that different conditions like the `ON` and `WHERE` clauses affect the way that the database constructs results. -As you get more familiar with joins, you'll be able to use them as a regular part of your toolkit to pull in data from various sources and stitch together pieces of information to create a more full picture. Joins help bring together the data that organization principles and performance considerations may separate. Learning how to effectively use joins can help you bring together data regardless of how it's organized in the system. +As you get more familiar with joins, you'll be able to use them as a regular part of your toolkit to pull in data from various sources and stitch together pieces of information to create a more full picture. Joins help bring together the data that organization principles and performance considerations may separate. Learning how to effectively use joins can help you bring together data regardless of how it's organized in the system. -Prisma allows you to [define relations](https://www.prisma.io/docs/concepts/components/prisma-schema/relations) between models in the Prisma schema file. 
You can then use [relation queries](https://www.prisma.io/docs/concepts/components/prisma-client/relation-queries) to work with data that spans multiple models.
+Prisma allows you to [define relations](https://www.prisma.io/docs/orm/prisma-schema/data-model/relations) between models in the Prisma schema file. You can then use [relation queries](https://www.prisma.io/docs/orm/prisma-client/queries/relation-queries) to work with data that spans multiple models.

diff --git a/content/05-mysql/10-reading-and-querying-data/04-identifying-slow-queries.mdx b/content/05-mysql/10-reading-and-querying-data/04-identifying-slow-queries.mdx
index 7e362564..84f6aa9c 100644
--- a/content/05-mysql/10-reading-and-querying-data/04-identifying-slow-queries.mdx
+++ b/content/05-mysql/10-reading-and-querying-data/04-identifying-slow-queries.mdx
@@ -467,6 +467,6 @@ The MySQL ecosystem has a lot of tooling built to make these tasks easier. Looki

-If you are using Prisma with your MySQL database, you can read about ways to optimize your queries in the [query optimization section of the docs](https://www.prisma.io/docs/guides/performance-and-optimization/query-optimization-performance). This will help you understand how various query constructions can impact your database performance when using Prisma.
+If you are using Prisma with your MySQL database, you can read about ways to optimize your queries in the [query optimization section of the docs](https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance). This will help you understand how various query constructions can impact your database performance when using Prisma.

diff --git a/content/05-mysql/10-reading-and-querying-data/05-optimizing-slow-queries.mdx b/content/05-mysql/10-reading-and-querying-data/05-optimizing-slow-queries.mdx
index 1dbea0fd..dcae0652 100644
--- a/content/05-mysql/10-reading-and-querying-data/05-optimizing-slow-queries.mdx
+++ b/content/05-mysql/10-reading-and-querying-data/05-optimizing-slow-queries.mdx
@@ -447,6 +447,6 @@ However, the database is only able to automatically optimize in a limited sense.

-If you are using Prisma with your MySQL database, you can read about ways to optimize your queries in the [query optimization section of the docs](https://www.prisma.io/docs/guides/performance-and-optimization/query-optimization-performance). This will help you understand how various query constructions can impact your database performance when using Prisma.
+If you are using Prisma with your MySQL database, you can read about ways to optimize your queries in the [query optimization section of the docs](https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance). This will help you understand how various query constructions can impact your database performance when using Prisma.

diff --git a/content/05-mysql/11-tools/01-mysql-config-editor.mdx b/content/05-mysql/11-tools/01-mysql-config-editor.mdx
index e8efccce..e4a84f1a 100644
--- a/content/05-mysql/11-tools/01-mysql-config-editor.mdx
+++ b/content/05-mysql/11-tools/01-mysql-config-editor.mdx
@@ -1,26 +1,26 @@
---
title: 'Using `mysql_config_editor` to manage MySQL credentials'
-metaTitle: "Managing connection information with `mysql_config_editor`"
-metaDescription: "Manage connection information like usernames, passwords, and MySQL servers with `mysql_config_editor`."
+metaTitle: 'Managing connection information with `mysql_config_editor`' +metaDescription: 'Manage connection information like usernames, passwords, and MySQL servers with `mysql_config_editor`.' metaImage: '/social/generic-mysql.png' authors: ['justinellingwood'] --- ## Introduction -Maintaining credentials for different users and databases across a variety of hosts can be challenging from a usability perspective. If you regularly log into multiple MySQL servers or if you have projects that have separate user accounts with unique privileges for security reasons, you can easily lose track of how to connect to the accounts you need. +Maintaining credentials for different users and databases across a variety of hosts can be challenging from a usability perspective. If you regularly log into multiple MySQL servers or if you have projects that have separate user accounts with unique privileges for security reasons, you can easily lose track of how to connect to the accounts you need. -Fortunately, [MySQL provides a small utility called `mysql_config_editor`](https://dev.mysql.com/doc/refman/8.0/en/mysql-config-editor.html) specifically designed to store and manage MySQL credentials so that you can [authenticate](/intro/database-glossary#authentication) easily with MySQL clients and tools. In this guide, we'll cover how `mysql_config_editor` works, how to manage multiple credentials securely, and how to tell your other MySQL tools to leverage the configuration to authenticate to your servers. +Fortunately, [MySQL provides a small utility called `mysql_config_editor`](https://dev.mysql.com/doc/refman/8.0/en/mysql-config-editor.html) specifically designed to store and manage MySQL credentials so that you can [authenticate](/intro/database-glossary#authentication) easily with MySQL clients and tools. In this guide, we'll cover how `mysql_config_editor` works, how to manage multiple credentials securely, and how to tell your other MySQL tools to leverage the configuration to authenticate to your servers. ## How does `mysql_config_editor` work? -The `mysql_config_editor` utility is a small program included in MySQL installations that is used to manage credentials for connecting to different MySQL servers or different accounts. It encrypts credential information and stores it in a file called `.mylogin.cnf` in your home directory. +The `mysql_config_editor` utility is a small program included in MySQL installations that is used to manage credentials for connecting to different MySQL servers or different accounts. It encrypts credential information and stores it in a file called `.mylogin.cnf` in your home directory. -Each set of credentials describing how to log in to a MySQL account is called a "login path". These usually specify the account's username and password and can additionally store relevant information about how to connect to the appropriate MySQL server like the hostname and port where the MySQL is listening. +Each set of credentials describing how to log in to a MySQL account is called a "login path". These usually specify the account's username and password and can additionally store relevant information about how to connect to the appropriate MySQL server like the hostname and port where the MySQL is listening. -MySQL clients and tools are automatically configured to use the information in the `.mylogin.cnf` file to help login to MySQL servers. You can use the `--login-path=` parameter on MySQL tools like the `mysql` client to specify which login details should be used. 
If no login path is provided, the tools will use the credentials associated with the default login path, known as `client`, if it is defined.
+MySQL clients and tools are automatically configured to use the information in the `.mylogin.cnf` file to help log in to MySQL servers. You can use the `--login-path=` parameter on MySQL tools like the `mysql` client to specify which login details should be used. If no login path is provided, the tools will use the credentials associated with the default login path, known as `client`, if it is defined.

-If the login paths don't define certain values, the MySQL clients and tools will use their configured default values instead. For instance, if you do not specify a host when creating a login path with `mysql_config_editor`, the `mysql` client will automatically assume `localhost`, just as if you were to omit the `--host=` option when providing credentials manually on the command line.
+If the login paths don't define certain values, the MySQL clients and tools will use their configured default values instead. For instance, if you do not specify a host when creating a login path with `mysql_config_editor`, the `mysql` client will automatically assume `localhost`, just as if you were to omit the `--host=` option when providing credentials manually on the command line.

## Defining credentials by creating a new login path

@@ -34,16 +34,16 @@ mysql_config_editor set [options]

Typically, you'll include some of the following options:

-* `--login-path=`: The label you want to use for these credentials
-* `--user=`: The account username
-* `--password`: A flag to tell `mysql_config_editor` to prompt for a password for the account. The password prompt allows you to securely enter the password so that it isn't recorded to shell history files as it would be if provided it directly on the command line.
-* `--host=`: The host name or IP address where the MySQL server is hosted.
-* `--port=`: The port number where the MySQL server is listening.
-* `--socket=`: The path to the local socket file to connect with if you are connecting to a local server through Unix sockets.
+- `--login-path=`: The label you want to use for these credentials
+- `--user=`: The account username
+- `--password`: A flag to tell `mysql_config_editor` to prompt for a password for the account. The password prompt allows you to securely enter the password so that it isn't recorded to shell history files as it would be if you provided it directly on the command line.
+- `--host=`: The host name or IP address where the MySQL server is hosted.
+- `--port=`: The port number where the MySQL server is listening.
+- `--socket=`: The path to the local socket file to connect with if you are connecting to a local server through Unix sockets.

You only need to provide the information that differs from the default options for the MySQL utilities.

-As you create entries, keep in mind that `mysql_config_editor` provides no way to edit the details associated with a login path once it's created. To change any details, you'll need to respecify all of the appropriate connection information again to overwrite the previous entry.
### Setting connection information for a local account @@ -53,9 +53,9 @@ For example, to create a login for a user named `salesadmin` on the local MySQL mysql_config_editor set --login-path=sales --user=salesadmin --password ``` -You will be prompted for the account password and the new connection information will be saved to the `.mylogin.cnf` file under a label called `sales`. We provide the account name with `--user=salesadmin` and tell `mysql_config_editor` to prompt for the password by including the `--password` flag. +You will be prompted for the account password and the new connection information will be saved to the `.mylogin.cnf` file under a label called `sales`. We provide the account name with `--user=salesadmin` and tell `mysql_config_editor` to prompt for the password by including the `--password` flag. -Since this is a local account, it will connect through a local socket file if running on a Unix-like system. If you've not modified MySQL to run differently, however, the MySQL tools will know what to do and you do not need to provide those details when configuring. +Since this is a local account, it will connect through a local socket file if running on a Unix-like system. If you've not modified MySQL to run differently, however, the MySQL tools will know what to do and you do not need to provide those details when configuring. ### Setting connection information for a remote account @@ -69,13 +69,13 @@ The entry for the `testing` login path will have all of the information required ### Setting the default connection information -MySQL tools are designed to use reasonable defaults when called without explicit connection information. For instance, on Unix-like systems, they will try to connect using the following details if not overwritten: +MySQL tools are designed to use reasonable defaults when called without explicit connection information. For instance, on Unix-like systems, they will try to connect using the following details if not overwritten: -* **User**: Your operating system username -* **Password**: No password -* **Host**: `localhost`, which by default means you'll be connecting over a Unix socket at the default location for your platform. +- **User**: Your operating system username +- **Password**: No password +- **Host**: `localhost`, which by default means you'll be connecting over a Unix socket at the default location for your platform. -If these options aren't appropriate for your use case, you can change the default connection information with `mysql_config_editor`. To do so, provide the connection information you'd like use by default without specifying a login path. +If these options aren't appropriate for your use case, you can change the default connection information with `mysql_config_editor`. To do so, provide the connection information you'd like use by default without specifying a login path. 
For example: @@ -114,6 +114,7 @@ You can verify the user you're connected as by typing: ``` SELECT user(); ``` + ``` +----------------------+ | user() | @@ -128,12 +129,13 @@ If you want to verify the user and the method you're connecting with, you can us ``` status ``` + ``` -------------- mysql Ver 8.0.27-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu)) - + Connection id: 28 - Current database: + Current database: Current user: sammy@localhost SSL: Not in use Current pager: stdout @@ -149,7 +151,7 @@ status UNIX socket: /var/run/mysqld/mysqld.sock Binary data as: Hexadecimal Uptime: 1 day 21 hours 37 min 49 sec - + Threads: 2 Questions: 66 Slow queries: 0 Opens: 186 Flush tables: 3 Open tables: 105 Queries per second avg: 0.000 -------------- ``` @@ -158,26 +160,28 @@ You can see the current user under `Current user`, but you can also view the det ## Displaying available login paths -While the connection details you define are stored in a file called `.mylogin.cnf` in your home directory, the contents are encrypted for security. To view the configured information, you need to use the `mysql_config_editor` again. +While the connection details you define are stored in a file called `.mylogin.cnf` in your home directory, the contents are encrypted for security. To view the configured information, you need to use the `mysql_config_editor` again. To view the default login information you've configured, which is stored under the `client` login path, you can use the `print` subcommand without any additional options: ``` mysql_config_editor print ``` + ``` [client] user = "root" password = ***** ``` -MySQL uses an [INI style file format](https://en.wikipedia.org/wiki/INI_file) to group connection details under the appropriate login path label. You may also notice that the password is obscured. This, again, is a security measure so as not to leak the saved password. +MySQL uses an [INI style file format](https://en.wikipedia.org/wiki/INI_file) to group connection details under the appropriate login path label. You may also notice that the password is obscured. This, again, is a security measure so as not to leak the saved password. To view a different login path, you can supply the `--login-path=` option as usual: ``` mysql_config_editor print --login-path=testing ``` + ``` [testing] user = "testuser" @@ -191,6 +195,7 @@ To show all of the configured login paths, you can add the `--all` flag instead: ``` mysql_config_editor print --all ``` + ``` [sales] user = "salesadmin" @@ -207,7 +212,7 @@ password = ***** ## Removing connection information -You can remove the connection information associated with a login path with the `remove` subcommand. Providing the `--login-path` will allow `mysql_config_editor` to target the appropriate entry. +You can remove the connection information associated with a login path with the `remove` subcommand. Providing the `--login-path` will allow `mysql_config_editor` to target the appropriate entry. For example, to remove the connection information for the `sales` login path, you can type: @@ -220,6 +225,7 @@ If you check the configured entries, you will find that the `sales` login path h ``` mysql_config_editor print --all ``` + ``` [testing] user = "testuser" @@ -231,7 +237,7 @@ user = "root" password = ***** ``` -You can also remove a specific parameter from the login path's connection information. For instance, if the MySQL server at "dev.example.com" has been reconfigured to now run on the default 3306 port, you can remove the port information. 
To do so, you'd provide the `--port` flag along with the `--login-path=`:
+You can also remove a specific parameter from the login path's connection information. For instance, if the MySQL server at "dev.example.com" has been reconfigured to now run on the default 3306 port, you can remove the port information. To do so, you'd provide the `--port` flag along with the `--login-path=`:

```
mysql_config_editor remove --login-path=testing --port
```

@@ -242,6 +248,7 @@ You can verify that the port specification has been removed from the `testing` l

```
mysql_config_editor print --all
```
+
```
[testing]
user = "testuser"
@@ -262,12 +269,12 @@ This removes all of the configured login paths.

## Conclusion

-In this guide, we took a look at `mysql_config_editor`, one of MySQL's small utilities designed to improve user experience by managing connection information. We covered how to configure connection information using login paths and how to call MySQL tools using our configured credentials. We also discussed how to override defaults and manage existing login path information.
+In this guide, we took a look at `mysql_config_editor`, one of MySQL's small utilities designed to improve user experience by managing connection information. We covered how to configure connection information using login paths and how to call MySQL tools using our configured credentials. We also discussed how to override defaults and manage existing login path information.

-By taking advantage of `mysql_config_editor` and other tools that the MySQL project provides, you can remove some of the frustration that can arise when managing multiple projects from a single location. It is a good example of a relatively simple tool designed to streamline repetitive, error-prone tasks to help you focus on more important work.
+By taking advantage of `mysql_config_editor` and other tools that the MySQL project provides, you can remove some of the frustration that can arise when managing multiple projects from a single location. It is a good example of a relatively simple tool designed to streamline repetitive, error-prone tasks to help you focus on more important work.

-If you are using Prisma, you can set connection information for MySQL databases when configuring the [MySQL database connector](https://www.prisma.io/docs/concepts/database-connectors/mysql#connection-details).
+If you are using Prisma, you can set connection information for MySQL databases when configuring the [MySQL database connector](https://www.prisma.io/docs/orm/overview/databases/mysql#connection-details).

diff --git a/content/05-mysql/12-short-guides/02-validate-configuration.mdx b/content/05-mysql/12-short-guides/02-validate-configuration.mdx
index 25f6257f..a76619bc 100644
--- a/content/05-mysql/12-short-guides/02-validate-configuration.mdx
+++ b/content/05-mysql/12-short-guides/02-validate-configuration.mdx
@@ -1,39 +1,40 @@
---
title: 'How to check your MySQL configuration for syntax errors'
-metaTitle: "Validating MySQL server configuration"
-metaDescription: "Learn how to check your MySQL server configuration file for syntax errors."
+metaTitle: 'Validating MySQL server configuration'
+metaDescription: 'Learn how to check your MySQL server configuration file for syntax errors.'
metaImage: '/social/generic-mysql.png'
authors: ['justinellingwood']
---

## Introduction

-When configuring services, it's important to try to validate your changes before applying them to the environments that they will impact.
This is especially true for changes made to essential services like MySQL databases.
+When configuring services, it's important to try to validate your changes before applying them to the environments that they will impact. This is especially true for changes made to essential services like MySQL databases.

-While many changes need to be checked manually for semantic correctness (making sure the configuration means what you intend it to mean), often you can automate checks for syntactic correctness (making sure the changes are in an expected format). This provides a limited, but extremely helpful check that can catch spelling mistakes, incorrect punctuation, and malformed configuration options that would otherwise prevent the server from starting correctly when attempting to incorporate new changes.
+While many changes need to be checked manually for semantic correctness (making sure the configuration means what you intend it to mean), often you can automate checks for syntactic correctness (making sure the changes are in an expected format). This provides a limited, but extremely helpful check that can catch spelling mistakes, incorrect punctuation, and malformed configuration options that would otherwise prevent the server from starting correctly when attempting to incorporate new changes.

-In this short guide, we'll cover how to validate MySQL server configuration files using the built-in options. This can help you validate that the edits you perform won't interfere with MySQL when you restart the service.
+In this short guide, we'll cover how to validate MySQL server configuration files using the built-in options. This can help you validate that the edits you perform won't interfere with MySQL when you restart the service.

## Checking MySQL server configuration files

-The `mysqld` binary used to run the MySQL server includes many options that you might not use on a regular basis. One of these is the `--validate-config` flag.
+The `mysqld` binary used to run the MySQL server includes many options that you might not use on a regular basis. One of these is the `--validate-config` flag.

-The `--validate-config` flag causes the `mysqld` binary to parse its configuration file and then exit. You can run the validation check by typing:
+The `--validate-config` flag causes the `mysqld` binary to parse its configuration file and then exit. You can run the validation check by typing:

```shell
mysqld --validate-config
```

-If no problems are found, the program will exit successfully with no output instead of attempting to start the MySQL server. You can verify that the process exited successfully by checking its exit code:
+If no problems are found, the program will exit successfully with no output instead of attempting to start the MySQL server. You can verify that the process exited successfully by checking its exit code:

```shell
echo $?
```
+
```
0
```

-If any errors are found, MySQL will abort the process as it would in an actual startup scenario and output information about the file and line where the problem occurred. You can add syntax errors to your MySQL configuration file to trigger this process see the output. For instance, you can
+If any errors are found, MySQL will abort the process as it would in an actual startup scenario and output information about the file and line where the problem occurred. You can add syntax errors to your MySQL configuration file to trigger this process and see the output. For instance, you can append an invalid line to the file:

```shell
echo "hello there" | sudo tee --append /etc/mysql/my.cnf
@@ -51,11 +52,12 @@ The exit code also verifies that an error occurred:

```shell
echo $?
```
+
```
1
```

-You can use `sed` to remove the line we added to the MySQL configuration file. Validate the configuration file again to ensure that the removal was successful:
+You can use `sed` to remove the line we added to the MySQL configuration file. Validate the configuration file again to ensure that the removal was successful:

```shell
sudo sed --in-place '/hello there/d' /etc/mysql/my.cnf
@@ -66,11 +68,10 @@ The server configuration file is now verified to be syntactically correct again.

## Conclusion

-In this guide, we covered how to use the `mysqld` binary with the `--validate-config` flag to validate the MySQL server's configuration files for errors. While this cannot be used to detect all errors, it goes a long way towards preventing typos, incorrect configuration grammar, and invalid options. It is a good idea to always use the `--validate-config` flag before restarting your server to ensure that your new configuration is valid before stopping your database service.
-
+In this guide, we covered how to use the `mysqld` binary with the `--validate-config` flag to validate the MySQL server's configuration files for errors. While this cannot be used to detect all errors, it goes a long way towards preventing typos, incorrect configuration grammar, and invalid options. It is a good idea to always use the `--validate-config` flag before restarting your server to ensure that your new configuration is valid before stopping your database service.

-If you are using Prisma, you can connect with your MySQL server and manage your data using the [MySQL database connector](https://www.prisma.io/docs/concepts/database-connectors/mysql#connection-details).
+If you are using Prisma, you can connect with your MySQL server and manage your data using the [MySQL database connector](https://www.prisma.io/docs/orm/overview/databases/mysql#connection-details).

diff --git a/content/05-mysql/12-short-guides/03-exporting-schemas.mdx b/content/05-mysql/12-short-guides/03-exporting-schemas.mdx
index 0818ae7a..2f6a035e 100644
--- a/content/05-mysql/12-short-guides/03-exporting-schemas.mdx
+++ b/content/05-mysql/12-short-guides/03-exporting-schemas.mdx
@@ -1,20 +1,20 @@
---
title: 'How to export database and table schemas in MySQL'
-metaTitle: "Export schemas for MySQL databases and tables"
-metaDescription: "Learn how to export your database objects from MySQL for analysis, migrating, backups, and more."
+metaTitle: 'Export schemas for MySQL databases and tables'
+metaDescription: 'Learn how to export your database objects from MySQL for analysis, migrating, backups, and more.'
metaImage: '/social/generic-mysql.png'
authors: ['justinellingwood']
---

## Introduction

-In relational databases, the [database schema](/intro/database-glossary#schema) defines the structure of the database and its component parts like tables, fields, and indexes.
Extracting and exporting this information is useful in many scenarios, including backups, migrating to new environments, visualizing data structures, and managing these structures within a codebase. -In this short guide, we'll discuss how to export MySQL database schemas using the `mysqldump` command. While this utility can export many types of data from MySQL, we'll focus on extracting the data structures themselves in this guide. +In this short guide, we'll discuss how to export MySQL database schemas using the `mysqldump` command. While this utility can export many types of data from MySQL, we'll focus on extracting the data structures themselves in this guide. -If you are using Prisma, you can connect with your MySQL server and manage your data using the [MySQL database connector](https://www.prisma.io/docs/concepts/database-connectors/mysql#connection-details). +If you are using Prisma, you can connect with your MySQL server and manage your data using the [MySQL database connector](https://www.prisma.io/docs/orm/overview/databases/mysql#connection-details). @@ -30,16 +30,16 @@ The options here can be divided into two separate categories. The first category defines the generic basic connection information that you need to provide in order to connect with any MySQL utility: -* `--user=` / `-u`: The database username that you want to authenticate with -* `--password` / `-p`: Force `mysqldump` to prompt for a password to authenticate -* `--host=` / `-h`: The hostname or IP address of where MySQL is located -* `--port=` / `-p`: The port number where MySQL is listening +- `--user=` / `-u`: The database username that you want to authenticate with +- `--password` / `-p`: Force `mysqldump` to prompt for a password to authenticate +- `--host=` / `-h`: The hostname or IP address of where MySQL is located +- `--port=` / `-p`: The port number where MySQL is listening If you are connecting to a local MySQL instance running in the default configuration, you can typically omit the host and port options. The second category tells `mysqldump` what to export: -* `--no-data` / `-d`: This tells the utility to only export the structure itself, not the records they contain +- `--no-data` / `-d`: This tells the utility to only export the structure itself, not the records they contain Additionally, the first non-option argument (represented here by the word "DATABASE") indicates the exact database to export. @@ -51,14 +51,14 @@ mysqldump --user=sales_reporter --password --no-data SALES > sales_database_sche ## Modifying the export behavior -The basic usage discussed above will output every structure related to the database in question. We can modify this behavior with a number of additional options. +The basic usage discussed above will output every structure related to the database in question. We can modify this behavior with a number of additional options. ### Targeting more than one database You can modify how many databases the export will target with one of the following options: -* `--databases` / `-B`: Treat all name arguments as database names. This allows you to export the schema from multiple databases at the same time. -* `--all-databases` / `-A`: Export all databases within MySQL (with the exception of the `performance_schema` database that is used internally) +- `--databases` / `-B`: Treat all name arguments as database names. This allows you to export the schema from multiple databases at the same time. 
+- `--all-databases` / `-A`: Export all databases within MySQL (with the exception of the `performance_schema` database that is used internally)

So to dump all databases, you could use:

@@ -82,14 +82,14 @@ For example, if three of the tables in your `SALES` database are called `EMPLOYE
mysqldump --user=USERNAME --password --no-data SALES EMPLOYEE STORE INVENTORY > some_sales_tables.sql
```

-In this construction, the first argument is always assumed to the be database name and all additional named arguments are taken to be tables within that database. Because of this, this usage is incompatible with the `--databases` option which modifies how `mysqldump` interprets additional arguments.
+In this construction, the first argument is always assumed to be the database name and all additional named arguments are taken to be tables within that database. Because of this, this usage is incompatible with the `--databases` option, which modifies how `mysqldump` interprets additional arguments.

### Exporting additional structures

In addition to databases and tables, you can also explicitly export event and routine definitions by including these options:

-* `--routines` / `-R`: Include stored procedures and functions within the exported schema dump
-* `--events` / `-E`: Include the definition of Event Scheduler events in the output
+- `--routines` / `-R`: Include stored procedures and functions within the exported schema dump
+- `--events` / `-E`: Include the definition of Event Scheduler events in the output

For example, to include a dump of the database `SALES` that includes these extra definitions, you could type:

@@ -101,17 +101,17 @@ mysqldump --user=USERNAME --password --no-data --routines --events SALES > all_s

Some additional options that can be useful depending on your goals include:

-* `--add-drop-database`: Add a `DROP DATABASE` statement to the dump file prior to each `CREATE DATABASE` statement. This ensures that any previously defined structure for a given database is removed first to avoid conflicts.
-* `--single-transaction`: Sets the transaction isolation level to "[repeatable read](/intro/database-glossary#repeatable-read-isolation-level)" to help ensure a more consistent database state with storage engines like InnoDB. This dumps a snapshot of the database at the point-in-time when the dump is initialized.
+- `--add-drop-database`: Add a `DROP DATABASE` statement to the dump file prior to each `CREATE DATABASE` statement. This ensures that any previously defined structure for a given database is removed first to avoid conflicts.
+- `--single-transaction`: Sets the transaction isolation level to "[repeatable read](/intro/database-glossary#repeatable-read-isolation-level)" to help ensure a more consistent database state with storage engines like InnoDB. This dumps a snapshot of the database at the point-in-time when the dump is initialized.

These options can be added to your schema dump commands without altering the basic semantics or meaning of the other components.

## Conclusion

-Being able to export your schemas allows you to save your database structures outside of the database itself.
This is helpful when setting up new environments, evolving your schema as your needs change, and visualizing the structure of the information you are storing. -If you are using Prisma, you can connect with your MySQL server and manage your data using the [MySQL database connector](https://www.prisma.io/docs/concepts/database-connectors/mysql#connection-details). +If you are using Prisma, you can connect with your MySQL server and manage your data using the [MySQL database connector](https://www.prisma.io/docs/orm/overview/databases/mysql#connection-details). diff --git a/content/06-sqlite/03-creating-and-deleting-databases-and-tables.mdx b/content/06-sqlite/03-creating-and-deleting-databases-and-tables.mdx index 6b60d4b4..63f466f1 100644 --- a/content/06-sqlite/03-creating-and-deleting-databases-and-tables.mdx +++ b/content/06-sqlite/03-creating-and-deleting-databases-and-tables.mdx @@ -192,6 +192,6 @@ Within the statements mentioned, like `CREATE TABLE` and `DROP TABLE`, many addi -When using [Prisma](https://github.com/prisma/prisma), you can use the [Prisma Migrate](https://www.prisma.io/docs/concepts/components/prisma-migrate) to create your databases and tables. [Developing with Prisma Migrate](https://www.prisma.io/docs/guides/migrate/developing-with-prisma-migrate) generates migration files based on the declarative [Prisma schema](https://www.prisma.io/docs/concepts/components/prisma-schema) and applies them to your database. +When using [Prisma](https://github.com/prisma/prisma), you can use the [Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate) to create your databases and tables. [Developing with Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate/workflows) generates migration files based on the declarative [Prisma schema](https://www.prisma.io/docs/orm/prisma-schema/overview) and applies them to your database. diff --git a/content/06-sqlite/04-inserting-and-deleting-data.mdx b/content/06-sqlite/04-inserting-and-deleting-data.mdx index 7b6f8c02..c215210b 100644 --- a/content/06-sqlite/04-inserting-and-deleting-data.mdx +++ b/content/06-sqlite/04-inserting-and-deleting-data.mdx @@ -36,11 +36,11 @@ To get a more easily read result, you can use the `.fullschema --indent` command ```sql CREATE TABLE student ( - id INTEGER PRIMARY KEY, - first_name TEXT, - last_name TEXT, - age INTEGER, - student_email TEXT NOT NULL, + id INTEGER PRIMARY KEY, + first_name TEXT, + last_name TEXT, + age INTEGER, + student_email TEXT NOT NULL, class TEXT ); ``` @@ -83,7 +83,7 @@ SELECT * FROM student; -You can also use the Prisma Client to add data to your tables by issuing a [create query](https://www.prisma.io/docs/concepts/components/prisma-client/crud#create). +You can also use the Prisma Client to add data to your tables by issuing a [create query](https://www.prisma.io/docs/orm/prisma-client/queries/crud#create). @@ -99,7 +99,7 @@ VALUES ('value', 'value2'), ('value3', 'value4'), ('value5', 'value6'); - + ``` For the `student` table we've been referencing, you can add three new students in a single statement by typing: @@ -110,7 +110,7 @@ VALUES ('Abigail', 'Spencer', 'abispence@university.com'), ('Tamal', 'Wayne', 'tamalwayne@university.com'), ('Felipe', 'Espinosa', 'felesp@university.com'); - + ``` ## Using `DELETE` to remove rows from tables @@ -133,7 +133,7 @@ WHERE last_name = 'Wayne'; -To remove data from your tables using Prisma Client, use a [delete query](https://www.prisma.io/docs/concepts/components/prisma-client/crud#delete). 
+To remove data from your tables using Prisma Client, use a [delete query](https://www.prisma.io/docs/orm/prisma-client/queries/crud#delete). @@ -156,7 +156,7 @@ DELETE FROM student; -Prisma Client uses a separate query called [deleteMany](https://www.prisma.io/docs/concepts/components/prisma-client/crud#deletemany) to delete multiple rows of data at one time. +Prisma Client uses a separate query called [deleteMany](https://www.prisma.io/docs/orm/prisma-client/queries/crud#deletemany) to delete multiple rows of data at one time. @@ -165,4 +165,3 @@ Prisma Client uses a separate query called [deleteMany](https://www.prisma.io/do In this article, we covered the basics of how to insert and remove data from SQLite tables. We first discussed how to find your table's structure to ensure the construction of valid data insertion queries. We then covered how to insert and delete data both one at a time and in batches. The `INSERT` and `DELETE` commands are some of the most useful commands for managing what data is maintained inside of your tables. A comprehension of their basic syntax and operation will allow you to add or remove records from your database structures quickly and effectively. - diff --git a/content/06-sqlite/05-basic-select.mdx b/content/06-sqlite/05-basic-select.mdx index 751c473d..e6f4c464 100644 --- a/content/06-sqlite/05-basic-select.mdx +++ b/content/06-sqlite/05-basic-select.mdx @@ -1,14 +1,14 @@ --- title: 'How to perform basic queries with `SELECT` with SQLite' metaTitle: "Basic queries with SELECT | SQLite | Prisma's Data Guide" -metaDescription: "Learn how to perform basic queries with `SELECT` using SQLite." +metaDescription: 'Learn how to perform basic queries with `SELECT` using SQLite.' metaImage: '/social/generic-sqlite.png' authors: ['alexemerich'] --- ## Introduction -The `SELECT` [SQL](/intro/database-glossary#sql) command is the most fitting command for querying and returning information from inside your tables in SQLite. This command achieves what its name implies by *selecting* the matching records based on the criteria specified in the command. This command is not only useful for [reading](/intro/database-glossary#read-operation) data, but also for targeting updates and other actions within your database. +The `SELECT` [SQL](/intro/database-glossary#sql) command is the most fitting command for querying and returning information from inside your tables in SQLite. This command achieves what its name implies by _selecting_ the matching records based on the criteria specified in the command. This command is not only useful for [reading](/intro/database-glossary#read-operation) data, but also for targeting updates and other actions within your database. In this article, we will cover the basics of the `SELECT` command and demonstrate how to use it to return data. `SELECT` is capable of handling many more advanced use cases, but we will stick to the simpler forms in our demonstration to highlight the basic command structure. @@ -22,13 +22,13 @@ SELECT FROM ; This statement is made up of several components: -* `SELECT`: The `SELECT` command itself. This SQL command indicates that we want to query tables or views for data they contain. The arguments and clauses surrounding it determine both the contents and the format of the output returned. +- `SELECT`: The `SELECT` command itself. This SQL command indicates that we want to query tables or views for data they contain. 
The arguments and clauses surrounding it determine both the contents and the format of the output returned. -* ``: The `SELECT` statement can return entire rows (if specified with the `*` wildcard character) or a subset of the available columns. If you want to output only specific columns, provide the column names you'd like to display, separated by commas. +- ``: The `SELECT` statement can return entire rows (if specified with the `*` wildcard character) or a subset of the available columns. If you want to output only specific columns, provide the column names you'd like to display, separated by commas. -* `FROM `: The `FROM` keyword is used to indicate the table or view that should be queried. In most simple queries, this consists of a single table that contains the data you're interested in. +- `FROM `: The `FROM` keyword is used to indicate the table or view that should be queried. In most simple queries, this consists of a single table that contains the data you're interested in. -* ``: A large number of filters, output modifiers, and conditions can be specified as additions to the `SELECT` command. You can use these to help pinpoint data with specific properties, modify the output formatting, or further process the results. +- ``: A large number of filters, output modifiers, and conditions can be specified as additions to the `SELECT` command. You can use these to help pinpoint data with specific properties, modify the output formatting, or further process the results. ## Specifying columns to display with `SELECT` @@ -42,12 +42,8 @@ SELECT * FROM my_table; This will display all of the records from `my_table` since there is no specified column name in the statement. All of the columns for each record will be shown in the order they are defined within `my_table`. - - **Note**: The asterisk wildcard option is going to be best for testing, ad hoc querying, and data exploration. It is not a useful method for real application development where a more controlled, explicit statement syntax is stronger and more reliable. - - You can also choose to view a subset of available columns by specifying their names. Column names are separated by commas and will be displayed in the order you specify: ```sql @@ -58,7 +54,7 @@ This will display all of the records from `my_table`, but only show `column2` an -When using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), you can control the columns that are returned with the [select fields](https://www.prisma.io/docs/concepts/components/prisma-client/select-fields) functionality. +When using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), you can control the columns that are returned with the [select fields](https://www.prisma.io/docs/orm/prisma-client/queries/select-fields) functionality. @@ -71,6 +67,7 @@ First you use the `.header` command which is an on|off switch for the display of ```sql .header on ``` + Second you use the `.mode` command to set the output mode to column. This makes it so the headers are in alignment with the corresponding column values: @@ -80,6 +77,7 @@ Second you use the `.mode` command to set the output mode to column. This makes ```sql .mode column ``` + @@ -92,10 +90,11 @@ Second you use the `.mode` command to set the output mode to column. This makes | rachael | smith | 789 other st | 5559876543 | +------------+-----------+--------------+--------------+ ``` + -Now when running a query, the output display will include column names above the results. 
This now allows you to optionally set *column aliases* to modify the name used for columns in the output: +Now when running a query, the output display will include column names above the results. This now allows you to optionally set _column aliases_ to modify the name used for columns in the output: ```sql SELECT column1 AS 'first column' FROM my_table; @@ -123,7 +122,7 @@ To give an example, suppose there is a `student` table that contains columns for SELECT * FROM student ORDER BY last_name; ``` -The result will display the student last names from A to Z according to the values in `last_name`. +The result will display the student last names from A to Z according to the values in `last_name`. ``` +-------------+------------+-----------+--------------------------+ @@ -139,8 +138,6 @@ The result will display the student last names from A to Z according to the valu +-------------+------------+-----------+--------------------------+ ``` - - To reverse the resulting order, we can add the `DESC` modifier to the end of the `ORDER BY` clause: ```sql @@ -163,8 +160,6 @@ The result will be the reverse of the previous query showing results Z to A acco +-------------+------------+-----------+--------------------------+ ``` - - It is also possible to sort by multiple columns. This can be useful especially in a case where people share a surname for instance. The query would look like this: ```sql @@ -246,4 +241,3 @@ This displays all of the unique combinations of `color` and `shoe_size` within y ## Conclusion This article introduces the basics of the `SELECT` command for returning data from SQLite tables. There are many more optional clauses that modify the behavior of the command, allowing you to control the results to the specifications you want. In later articles, we dive into these modifiers to develop even more the usefulness of `SELECT`. - diff --git a/content/06-sqlite/06-update-data.mdx b/content/06-sqlite/06-update-data.mdx index 4d592b6b..7c4e6910 100644 --- a/content/06-sqlite/06-update-data.mdx +++ b/content/06-sqlite/06-update-data.mdx @@ -1,7 +1,7 @@ --- title: 'How to update existing data with SQLite' metaTitle: "Updating existing data | SQLite | Prisma's Data Guide" -metaDescription: "Learn how to update existing data using SQLite." +metaDescription: 'Learn how to update existing data using SQLite.' metaImage: '/social/generic-sqlite.png' authors: ['alexemerich'] --- @@ -27,9 +27,9 @@ WHERE The basic structure involves three seperate clauses: -* specifying a table to act on -* providing the columns you wish to update as well as their new values -* defining any criteria SQLite needs to evaluate to determine which records to match +- specifying a table to act on +- providing the columns you wish to update as well as their new values +- defining any criteria SQLite needs to evaluate to determine which records to match While you can assign values directly to columns like we did above, you can also use the column list syntax too, as is often seen in `INSERT` commands. @@ -45,7 +45,7 @@ WHERE -To update data with Prisma Client, issue an [update query](https://www.prisma.io/docs/concepts/components/prisma-client/crud#update). +To update data with Prisma Client, issue an [update query](https://www.prisma.io/docs/orm/prisma-client/queries/crud#update). @@ -95,14 +95,13 @@ Here, we are directly updating the value of `column1` in `table1` to be the retu As an example, let's suppose that we have two tables called `book` and `author`. -
Expand to see the commands to create and populate these tables ```sql CREATE TABLE author ( - id INTEGER PRIMARY KEY, - first_name TEXT, - last_name TEXT, + id INTEGER PRIMARY KEY, + first_name TEXT, + last_name TEXT, last_publication TEXT ); @@ -118,7 +117,7 @@ VALUES ('Leo', 'Tolstoy'), ('James', 'Joyce'), ('Jean-Paul', 'Sarte'); - + INSERT INTO book (author_id, title, publication_year) VALUES (1, 'Anna Karenina', '1877'), @@ -127,18 +126,18 @@ VALUES (2, 'Dubliners', '1914'), (3, 'Nausea', '1938'); ``` +
These two tables have a relation with `book.author_id` referencing `author.id`. Currently the `last_publication` for the `author` table is `NULL`. We can populate it with the author's latest published book in our `book` table using `FROM` and `WHERE` clauses to bring the two tables together. Here, we show an example updating `last_publication`: - ```sql -UPDATE author +UPDATE author SET last_publication=( - SELECT title - FROM book + SELECT title + FROM book WHERE author_id = author.id ORDER BY author_id, publication_year DESC); ``` @@ -221,7 +220,7 @@ RETURNING *;
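To confirm the effect of an update like the one above, a quick follow-up query is usually enough. This is a minimal sketch that assumes the `author` table from the example:

```sql
-- Inspect the column that the correlated subquery populated
SELECT first_name, last_name, last_publication
FROM author;
```

Each row should now show the title of that author's most recently published book from the `book` table.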
How do you insert or update in SQLite depending on whether data exists? -In SQLite, there is not an `IF EXISTS` clause like many other relational databases. +In SQLite, there is not an `IF EXISTS` clause like many other relational databases. To control an `INSERT` or `UPDATE` for data existing, you will want to add an [`ON CONFLICT` clause to your statement](https://sqlite.org/lang_conflict.html). diff --git a/content/06-sqlite/07-exporting-schemas.mdx b/content/06-sqlite/07-exporting-schemas.mdx index 518434a0..bf07c5f3 100644 --- a/content/06-sqlite/07-exporting-schemas.mdx +++ b/content/06-sqlite/07-exporting-schemas.mdx @@ -1,6 +1,6 @@ --- title: 'How to export database and table schemas in SQLite' -metaTitle: "Export schemas for SQLite databases and tables" +metaTitle: 'Export schemas for SQLite databases and tables' metaDescription: 'Learn how to export your database objects from SQLite for analysis, migrating, backups, and more.' metaImage: '/social/generic-sqlite.png' authors: ['justinellingwood'] @@ -8,13 +8,13 @@ authors: ['justinellingwood'] ## Introduction -In relational databases, the [database schema](/intro/database-glossary#schema) defines the structure of the database and its component parts like tables, fields, and indexes. Extracting and exporting this information is useful in many scenarios, including backups, migrating to new environments, visualizing data structures, and managing these structures within a codebase. +In relational databases, the [database schema](/intro/database-glossary#schema) defines the structure of the database and its component parts like tables, fields, and indexes. Extracting and exporting this information is useful in many scenarios, including backups, migrating to new environments, visualizing data structures, and managing these structures within a codebase. -In this short guide, we'll discuss how to export SQLite database schemas using the `sqlite3` command. The same command that you can use to manage your databases can be used to export database data and structures. We'll focus on extracting the data structures themselves in this guide. +In this short guide, we'll discuss how to export SQLite database schemas using the `sqlite3` command. The same command that you can use to manage your databases can be used to export database data and structures. We'll focus on extracting the data structures themselves in this guide. -You can use [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. To learn how to use Prisma with SQLite, check out [Prisma's SQLite database connector page](https://www.prisma.io/docs/concepts/database-connectors/sqlite). +You can use [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. To learn how to use Prisma with SQLite, check out [Prisma's SQLite database connector page](https://www.prisma.io/docs/orm/overview/databases/sqlite). @@ -26,11 +26,11 @@ The basic command needed to export the database schema from SQLite looks like th sqlite3 DATABASE_FILE.sqlite '.schema' > schema.sql ``` -Here, the `DATABASE_FILE.sqlite` is the SQLite database file that contains your data and structures. The `'.schema'` component is the command that tells SQLite to export the database schema without any accompanying data. The `schema.sql` file is the target file that will received the exported database structures. 
+Here, the `DATABASE_FILE.sqlite` is the SQLite database file that contains your data and structures. The `'.schema'` component is the command that tells SQLite to export the database schema without any accompanying data. The `schema.sql` file is the target file that will received the exported database structures. ### Executing interactively -The above command can be executed from the command line. You can perform this same procedure interactively within the `sqlite3` shell. +The above command can be executed from the command line. You can perform this same procedure interactively within the `sqlite3` shell. First, open the SQLite database file with the `sqlite3` command: @@ -80,7 +80,7 @@ You can also export specific tables by including the table name after `.schema`: sqlite3 DATABASE_FILE.sqlite '.schema TABLE' > table_schema.sql ``` -An alternative to this approach is to use a wildcard match instead of the specific name. The `schema` command uses [LIKE pattern matching](https://sqlite.org/lang_expr.html#like) for this, which means that the percent character (`%`) is used to match zero or more characters and the underscore (`_`) can stand in for exactly one character. +An alternative to this approach is to use a wildcard match instead of the specific name. The `schema` command uses [LIKE pattern matching](https://sqlite.org/lang_expr.html#like) for this, which means that the percent character (`%`) is used to match zero or more characters and the underscore (`_`) can stand in for exactly one character. For example, to export all of the tables that start with `inventory`, you could type: @@ -92,7 +92,7 @@ This syntax does not allow for specifying multiple patterns or multiple specific ## Include database statistics in the schema dump -You can use the `.fullschema` command in place of the `.schema` command to also include all of the statistics tables that SQLite uses internally to decide on query plans, etc. This can be useful information when trying to debug why a query executed in a specific way: +You can use the `.fullschema` command in place of the `.schema` command to also include all of the statistics tables that SQLite uses internally to decide on query plans, etc. This can be useful information when trying to debug why a query executed in a specific way: ```command sqlite3 DATABASE_FILE.sqlite '.fullschema' > schema_with_statistics.sql @@ -102,10 +102,10 @@ Note that the `.fullschema` command does not allow you to filter by table name. ## Conclusion -Being able to export your schemas allows you to save your database structures outside of the database itself. This is helpful when setting up new environments, evolving your schema as your needs change, and visualizing the structure of the information you are storing. +Being able to export your schemas allows you to save your database structures outside of the database itself. This is helpful when setting up new environments, evolving your schema as your needs change, and visualizing the structure of the information you are storing. -You can use [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. To learn how to use Prisma with SQLite, check out [Prisma's SQLite database connector page](https://www.prisma.io/docs/concepts/database-connectors/sqlite). +You can use [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage PostgreSQL databases from within your JavaScript or TypeScript applications. 
To learn how to use Prisma with SQLite, check out [Prisma's SQLite database connector page](https://www.prisma.io/docs/orm/overview/databases/sqlite). diff --git a/content/07-mssql/01-setting-up-a-local-sql-server-database.mdx b/content/07-mssql/01-setting-up-a-local-sql-server-database.mdx index 66d3644a..3dd07af1 100644 --- a/content/07-mssql/01-setting-up-a-local-sql-server-database.mdx +++ b/content/07-mssql/01-setting-up-a-local-sql-server-database.mdx @@ -78,7 +78,7 @@ EXIT -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/concepts/database-connectors/sql-server) to connect, map your models, and manage your data. +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/orm/overview/databases/sql-server) to connect, map your models, and manage your data. You can also check out our guides to see how to use Prisma with Microsoft SQL Server on a [new project](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-sqlserver) or in an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-sqlserver). @@ -173,7 +173,7 @@ To persist the data in your SQL Server container, you can use [one of the techni -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/concepts/database-connectors/sql-server) to connect, map your models, and manage your data. +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/orm/overview/databases/sql-server) to connect, map your models, and manage your data. You can also check out our guides to see how to use Prisma with Microsoft SQL Server on a [new project](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-sqlserver) or in an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-sqlserver). @@ -313,7 +313,7 @@ EXIT -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/concepts/database-connectors/sql-server) to connect, map your models, and manage your data. +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/orm/overview/databases/sql-server) to connect, map your models, and manage your data. You can also check out our guides to see how to use Prisma with Microsoft SQL Server on a [new project](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-sqlserver) or in an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-sqlserver). 
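For orientation, Prisma's SQL Server support is configured through the `datasource` block of your Prisma schema. The fragment below is a sketch rather than an excerpt from the connector documentation; treat the connection string shape mentioned afterwards as an assumption and confirm the exact parameters against the connector page linked above:

```prisma
// Hypothetical schema.prisma fragment for a local SQL Server instance
datasource db {
  provider = "sqlserver"
  url      = env("DATABASE_URL")
}
```

Here `DATABASE_URL` would hold a connection string along the lines of `sqlserver://localhost:1433;database=mydb;user=sa;password=<password>;encrypt=true`, adjusted to match however you exposed the server in the steps above.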
@@ -464,7 +464,7 @@ EXIT -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/concepts/database-connectors/sql-server) to connect, map your models, and manage your data. +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/orm/overview/databases/sql-server) to connect, map your models, and manage your data. You can also check out our guides to see how to use Prisma with Microsoft SQL Server on a [new project](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-sqlserver) or in an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-sqlserver). @@ -557,7 +557,7 @@ To persist the data in your SQL Server container, you can use [one of the techni -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/concepts/database-connectors/sql-server) to connect, map your models, and manage your data. +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) with SQL Server, you can use the [SQL Server connector](https://www.prisma.io/docs/orm/overview/databases/sql-server) to connect, map your models, and manage your data. You can also check out our guides to see how to use Prisma with Microsoft SQL Server on a [new project](https://www.prisma.io/docs/getting-started/setup-prisma/start-from-scratch/relational-databases-typescript-sqlserver) or in an [existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/relational-databases-typescript-sqlserver). diff --git a/content/08-mongodb/01-what-is-mongodb.mdx b/content/08-mongodb/01-what-is-mongodb.mdx index af96d43b..e3e678fc 100644 --- a/content/08-mongodb/01-what-is-mongodb.mdx +++ b/content/08-mongodb/01-what-is-mongodb.mdx @@ -9,13 +9,13 @@ authors: ['alexemerich'] ## What is MongoDB? -MongoDB is an [open-source](https://github.com/mongodb/mongo) document-oriented [NoSQL](/intro/database-glossary#nosql) database system. Data is stored using JavaScript Object Notation(JSON)-like structures that can be specified at the time of data storage. +MongoDB is an [open-source](https://github.com/mongodb/mongo) document-oriented [NoSQL](/intro/database-glossary#nosql) database system. Data is stored using JavaScript Object Notation(JSON)-like structures that can be specified at the time of data storage. Each document can have its own structure with as much or as little complexity as needed. MongoDB provides non-SQL methods and commands to manage and query data programmatically or interactively. MongoDB is known for its fast performance, scalability, and for enabling a rapid development pace. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. 
To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -23,21 +23,23 @@ To get started working with MongoDB and Prisma, checkout our [getting started fr ## Origin story -MongoDB was founded in 2007 by Dwight Merriman, Eliot Horowitz, and Kevin Ryan. They previously worked at an internet advertising company that was serving 400,000 ads per second. +MongoDB was founded in 2007 by Dwight Merriman, Eliot Horowitz, and Kevin Ryan. They previously worked at an internet advertising company that was serving 400,000 ads per second. -The team was developing many custom data stores to accommodate this traffic. They regularly ran into struggles around scalability and agility, which led to the inspiration to create a database that solved for these issues. And voila, MongoDB. +The team was developing many custom data stores to accommodate this traffic. They regularly ran into struggles around scalability and agility, which led to the inspiration to create a database that solved for these issues. And voila, MongoDB. ## How it works -To achieve better scalability and agility, MongoDB works with the JSON variant, Binary JSON, or [BSON](https://www.prisma.io/dataguide/mongodb/mongodb-datatypes). BSON accommodates more data types than JSON, giving more flexibility to the types of data that can be stored in MongoDB. + +To achieve better scalability and agility, MongoDB works with the JSON variant, Binary JSON, or [BSON](https://www.prisma.io/dataguide/mongodb/mongodb-datatypes). BSON accommodates more data types than JSON, giving more flexibility to the types of data that can be stored in MongoDB. ### MongoDB data structure + Rather than tables made up of [rows](/intro/database-glossary#row) and [columns](/intro/database-glossary#column), MongoDB has [collections](/intro/database-glossary#collection) made up of [documents](/intro/database-glossary#document). Documents are comprised of [field](/intro/database-glossary#field) and [value](/intro/database-glossary#value) pairs. In comparison to [relational databases](/intro/database-glossary#relational-database), the pieces that make up a MongoDB database can loosely be thought of as follows: -| Relational | | Document | -|----------- |---|--------- | -|Table | = |Collection| -|Row | = | Document | -|Column | =| Field| +| Relational | | Document | +| ---------- | --- | ---------- | +| Table | = | Collection | +| Row | = | Document | +| Column | = | Field | The basic structure looks as follows: @@ -53,7 +55,7 @@ The basic structure looks as follows: Documents look very similar to a JSON object. Documents can hold any of the available BSON data types as well as other documents, arrays, and arrays of documents. This JSON-style format directly maps to native objects in most modern programming languages, making it a natural choice for developers. - An example would look like: +An example would look like: ``` var mydoc = { @@ -67,16 +69,17 @@ var mydoc = { ``` - `_id` holds an [*ObjectId*]. -- `name` holds an *embedded document* that contains the fields `first` and `last`. +- `name` holds an _embedded document_ that contains the fields `first` and `last`. - `birth` and `death` hold values of the [*Date*] type. - `books` holds an array of [*strings*]. - `sales` holds a value of the [*NumberLong*] type. 
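As a usage sketch, a document like `mydoc` can be stored and read back directly from the shell session where it was defined. The `literature` database and `authors` collection names here are hypothetical, not part of the original example:

```
// Switch to a hypothetical database; MongoDB creates it lazily on first write
use literature

// Store the document shown above, then read it back by a nested field
db.authors.insertOne(mydoc)
db.authors.findOne({ "name.last": "Joyce" })
```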
-Depending on the data model, rather than a single document housing all of the information on James Joyce and his work, a relational database could potentially have an `author` table and a `book` table that is joined with something like `author_id`. +Depending on the data model, rather than a single document housing all of the information on James Joyce and his work, a relational database could potentially have an `author` table and a `book` table that is joined with something like `author_id`. -In this particular simple use case, the document model simplifies data access and keeps data that is accessed together, stored together. +In this particular simple use case, the document model simplifies data access and keeps data that is accessed together, stored together. + +### MongoDB document vs SQL table record -### MongoDB document vs SQL table record Expanding on the previous example, let's look at how a document in MongoDB looks compared to the same record in a relational database. In a relational database, a table storing records might look something like this: ``` @@ -86,6 +89,7 @@ Expanding on the previous example, let's look at how a document in MongoDB looks | 2 | Sarah | Green | 84 | (NULL) | 555-8088 | | 3 | Sam | White | 22 | sammi@123.org | 555-1234 | ``` + Whereas, as we saw with our author document previously, a collection of similar documents in MongoDB might be modeled in JSON like this: ``` @@ -117,10 +121,12 @@ Whereas, as we saw with our author document previously, a collection of similar ] } ``` + Deciding on a document database versus a relational database comes down to the type of data that the database is storing and how it should be accessed. MongoDB's documents promote flexibility over rigidity and ease the way data is accessed from an application. ### MongoDB query language -Since MongoDB databases are document-oriented, the data does not typically follow a pre-defined schema like a relational database. This makes accessing data different then when querying a relational database with structured query language (SQL). MongoDB has its own query language simply called MongoDB Query Language (MQL). + +Since MongoDB databases are document-oriented, the data does not typically follow a pre-defined schema like a relational database. This makes accessing data different then when querying a relational database with structured query language (SQL). MongoDB has its own query language simply called MongoDB Query Language (MQL). Similar to SQL, MQL also allows a user to access data under certain specifications. You can control for returning documents that match criteria relevant to the need. We won’t go into the finer details of querying in MongoDB in this guide, but you can read more about it in our [querying in MongoDB guide](https://www.prisma.io/dataguide/mongodb/querying-documents). @@ -137,14 +143,15 @@ db..find().pretty() ``` ## Conclusion -MongoDB has become the most popular NoSQL database on the market. Developers enjoy working with MongoDB because of its ability to keep them working flexibly and with agility. -MongoDB provides an alternative to the rigidity of relational databases and is better suited for applications with evolving schemas. Checkout out some of our other MongoDB content for in depth guides on [getting started with MongoDB, indexes, transactions, and more](https://www.prisma.io/dataguide/mongodb). +MongoDB has become the most popular NoSQL database on the market. 
Developers enjoy working with MongoDB because of its ability to keep them working flexibly and with agility. + +MongoDB provides an alternative to the rigidity of relational databases and is better suited for applications with evolving schemas. Checkout out some of our other MongoDB content for in depth guides on [getting started with MongoDB, indexes, transactions, and more](https://www.prisma.io/dataguide/mongodb). -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). - \ No newline at end of file + diff --git a/content/08-mongodb/02-setting-up-a-local-mongodb-database.mdx b/content/08-mongodb/02-setting-up-a-local-mongodb-database.mdx index 75d0020a..72e7d2b7 100644 --- a/content/08-mongodb/02-setting-up-a-local-mongodb-database.mdx +++ b/content/08-mongodb/02-setting-up-a-local-mongodb-database.mdx @@ -23,7 +23,7 @@ Navigate to the sections that match the platforms you will be working with. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -424,7 +424,7 @@ sudo systemctl stop mongod.service -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). 
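After starting (or before stopping) the service, it can be worth confirming that the server is actually reachable. This is a sketch for a systemd-based Linux install listening on the default port; adjust the service name and client invocation to match your platform:

```bash
# Check that the mongod service is running
sudo systemctl status mongod.service

# Confirm that a client can reach the server on the default port (27017)
mongo --eval 'db.runCommand({ connectionStatus: 1 })'
```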
diff --git a/content/08-mongodb/03-connecting-to-mongodb.mdx b/content/08-mongodb/03-connecting-to-mongodb.mdx index 9945c131..4149cbdb 100644 --- a/content/08-mongodb/03-connecting-to-mongodb.mdx +++ b/content/08-mongodb/03-connecting-to-mongodb.mdx @@ -8,15 +8,15 @@ authors: ['justinellingwood'] ## Introduction -Once you have a [MongoDB](/intro/database-glossary#mongodb) server available, one of the first and most common actions you'll need to take is to connect to the actual database. This requires coordination to make sure that the database is configured in a way that allows your client to connect and authenticate. +Once you have a [MongoDB](/intro/database-glossary#mongodb) server available, one of the first and most common actions you'll need to take is to connect to the actual database. This requires coordination to make sure that the database is configured in a way that allows your client to connect and authenticate. -This means that you'll need to understand how to connect to your MongoDB database by providing the server location, connection parameters, and the correct credentials. In this guide, we'll focus on how to connect to the database from the client side using the [`mongo` MongoDB shell client](https://docs.mongodb.com/manual/mongo/), designed mainly for interactive sessions with your databases. +This means that you'll need to understand how to connect to your MongoDB database by providing the server location, connection parameters, and the correct credentials. In this guide, we'll focus on how to connect to the database from the client side using the [`mongo` MongoDB shell client](https://docs.mongodb.com/manual/mongo/), designed mainly for interactive sessions with your databases. -In a companion guide, you can find out how to configure MongoDB's authentication settings to match your requirements. Consider reading both pieces for a complete picture of how authentication is implemented from the perspective of both parties. +In a companion guide, you can find out how to configure MongoDB's authentication settings to match your requirements. Consider reading both pieces for a complete picture of how authentication is implemented from the perspective of both parties. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -24,12 +24,12 @@ To get started working with MongoDB and Prisma, checkout our [getting started fr ## Basic information about the `mongo` client -The `mongo` client is a command line JavaScript client for connecting to, controlling, and interacting with MongoDB database servers. In many ways, it is the simplest way to connect to and start using your MongoDB database because it is included in the MongoDB installation and available on all popular platforms. 
The `mongo` client is especially useful for performing initial configuration and for interactive sessions where you want to explore your data or iterate on queries based on preliminary results. +The `mongo` client is a command line JavaScript client for connecting to, controlling, and interacting with MongoDB database servers. In many ways, it is the simplest way to connect to and start using your MongoDB database because it is included in the MongoDB installation and available on all popular platforms. The `mongo` client is especially useful for performing initial configuration and for interactive sessions where you want to explore your data or iterate on queries based on preliminary results. -The way that you connect with the `mongo` shell depends on the configuration of the MongoDB server and the options available for you to authenticate to an account. In the following sections, we'll go over some of the basic connection options. For clarity's sake, we'll differentiate between local and remote connections: +The way that you connect with the `mongo` shell depends on the configuration of the MongoDB server and the options available for you to authenticate to an account. In the following sections, we'll go over some of the basic connection options. For clarity's sake, we'll differentiate between local and remote connections: -* **local connection**: a connection where the client and the MongoDB instance are located on the same server -* **remote connection**: where the client is connecting to a network-accessible MongoDB instance running on a different computer +- **local connection**: a connection where the client and the MongoDB instance are located on the same server +- **remote connection**: where the client is connecting to a network-accessible MongoDB instance running on a different computer Let's start with connecting to a database from the same computer. @@ -37,7 +37,7 @@ Let's start with connecting to a database from the same computer. Without any arguments, the `mongo` command attempts to connect to a local MongoDB instance. -To do this, it attempts to connect to port 27017 on the local loopback address: `127.0.0.1:27017`. This is one of the interfaces that MongoDB servers bind to in their default configuration (MongoDB may also be accessible through a local socket file). +To do this, it attempts to connect to port 27017 on the local loopback address: `127.0.0.1:27017`. This is one of the interfaces that MongoDB servers bind to in their default configuration (MongoDB may also be accessible through a local socket file). You can connect to a local MongoDB server running with its default configuration by typing: @@ -60,28 +60,28 @@ On a successful connection, you will likely see a fairly long set of messages fo --- Enable MongoDB's free cloud-based monitoring service, which will then receive and display metrics about your deployment (disk utilization, CPU, operation statistics, etc). - + The monitoring data will be available on a MongoDB website with a unique URL accessible to you and anyone you share the URL with. MongoDB may use this information to make product improvements and to suggest MongoDB products and deployment options to you. - + To enable free monitoring, run the following command: db.enableFreeMonitoring() To permanently disable this reminder, run the following command: db.disableFreeMonitoring() --- > ``` -The output shows logs generated by the `mongo` command while establishing the connection, followed by some warnings generated by the MongoDB server on startup. 
Finally, there is a notice about a MongoDB monitoring service that you may choose to take advantage of or disable. +The output shows logs generated by the `mongo` command while establishing the connection, followed by some warnings generated by the MongoDB server on startup. Finally, there is a notice about a MongoDB monitoring service that you may choose to take advantage of or disable. -One of the warnings from the MongoDB server indicates that access control is not enabled currently. This is the reason we were able to connect without providing credentials or other authentication details. +One of the warnings from the MongoDB server indicates that access control is not enabled currently. This is the reason we were able to connect without providing credentials or other authentication details. -If you are connecting to a local MongoDB server that *has* been configured with access control, you will need to provide additional information to connect. You will need to provide at least a username and password to connect using the associated `--username` and `--password` options: +If you are connecting to a local MongoDB server that _has_ been configured with access control, you will need to provide additional information to connect. You will need to provide at least a username and password to connect using the associated `--username` and `--password` options: ```bash mongo --username --password ``` -Placing the `--password` option at the end and not providing the password inline indicates that you want MongoDB to prompt for a password instead. This is more secure than providing a password in the command itself as that may be visible or recoverable through shell history, process lists, and other mechanisms. +Placing the `--password` option at the end and not providing the password inline indicates that you want MongoDB to prompt for a password instead. This is more secure than providing a password in the command itself as that may be visible or recoverable through shell history, process lists, and other mechanisms. The MongoDB server will prompt you for the user's password before connecting to the database: @@ -108,13 +108,13 @@ First, connect to the MongoDB database without providing credentials: mongo ``` -You will be given a command prompt like usual, but if access control is enabled, you won't have permission perform many actions until you authenticate. For instance, the `show dbs` command will likely be empty since you don't have access to query the available databases: +You will be given a command prompt like usual, but if access control is enabled, you won't have permission perform many actions until you authenticate. For instance, the `show dbs` command will likely be empty since you don't have access to query the available databases: ``` show dbs ``` -To authenticate, first, select the database that your user is defined in. Most often, that will be the `admin` database: +To authenticate, first, select the database that your user is defined in. 
Most often, that will be the `admin` database: ``` use admin @@ -144,6 +144,7 @@ You will now have the regular access of the user you authenticated as: ``` show dbs ``` + ``` admin 0.000GB config 0.000GB @@ -155,6 +156,7 @@ You can view the list authenticated users and roles associated with the current ``` db.runCommand("connectionStatus") ``` + ``` { "authInfo" : { @@ -189,7 +191,7 @@ If you are looking to get started working with MongoDB and Prisma, checkout our If you want to connect to a remote MongoDB database, you'll have to provide some additional details when using the `mongo` shell. -Specifically, you'll need to include the `--host` option and potentially the `--port` option as well if the MongoDB server is listening on a non-default port. In almost all cases, you'll also need to provide the `--user` and `--password` options to authenticate to the remote server too. +Specifically, you'll need to include the `--host` option and potentially the `--port` option as well if the MongoDB server is listening on a non-default port. In almost all cases, you'll also need to provide the `--user` and `--password` options to authenticate to the remote server too. The basic structure of the command when connecting to a remote MongoDB database therefore looks something like this: @@ -197,7 +199,7 @@ The basic structure of the command when connecting to a remote MongoDB database mongo --host --port --user --password ``` -As mentioned in the section on connecting to a local database, placing the `--password` option at the end and not providing the password inline indicates that you want the `mongo` shell to prompt for a password instead. This is more secure than providing a password in the command itself as that may be visible or recoverable through shell history, process lists, and other mechanisms. +As mentioned in the section on connecting to a local database, placing the `--password` option at the end and not providing the password inline indicates that you want the `mongo` shell to prompt for a password instead. This is more secure than providing a password in the command itself as that may be visible or recoverable through shell history, process lists, and other mechanisms. The MongoDB server will prompt you for the user's password before connecting to the database: @@ -216,20 +218,19 @@ mongo "mongodb://:@:" Since we've indicated that the user has a password with the `:` syntax, but haven't provided one, the `mongo` shell will prompt for the password. - ## Adjusting a MongoDB server's authentication configuration -If you want to modify the rules that dictate how users can authenticate to your MongoDB instances, you can do so by modifying your server's configuration. You can find out [how to modify MongoDB's authentication configuration in this article](/mongodb/configuring-mongodb-user-accounts-and-authentication). +If you want to modify the rules that dictate how users can authenticate to your MongoDB instances, you can do so by modifying your server's configuration. You can find out [how to modify MongoDB's authentication configuration in this article](/mongodb/configuring-mongodb-user-accounts-and-authentication). ## Conclusion -In this guide, we covered MongoDB authentication from the client side. We demonstrated how to use the `mongo` shell to connect to both local and remote database instances using a variety of methods. +In this guide, we covered MongoDB authentication from the client side. 
We demonstrated how to use the `mongo` shell to connect to both local and remote database instances using a variety of methods. -Knowing how to connect to various MongoDB instances is vital as you start to work the database system. You may run a local MongoDB instance for development that doesn't need any special authentication, but your databases in staging and production will almost certainly require authentication. Being able to authenticate in either case will allow you to work well in different environments. +Knowing how to connect to various MongoDB instances is vital as you start to work the database system. You may run a local MongoDB instance for development that doesn't need any special authentication, but your databases in staging and production will almost certainly require authentication. Being able to authenticate in either case will allow you to work well in different environments. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). diff --git a/content/08-mongodb/04-mongodb-atlas-setup.mdx b/content/08-mongodb/04-mongodb-atlas-setup.mdx index 69aed993..ee7ede1f 100644 --- a/content/08-mongodb/04-mongodb-atlas-setup.mdx +++ b/content/08-mongodb/04-mongodb-atlas-setup.mdx @@ -16,7 +16,7 @@ In this guide, we are going to walk through the steps to provision a MongoDB Atl -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -89,7 +89,7 @@ Lastly, there are two sections displaying spec information on IOPS, max connecti -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. 
to get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -141,7 +141,7 @@ In this guide, we walked through all of the MongoDB Atlas setup sections. We dis -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). diff --git a/content/08-mongodb/05-configuring-mongodb-user-accounts-and-authentication.mdx b/content/08-mongodb/05-configuring-mongodb-user-accounts-and-authentication.mdx index 0b5052ed..5aee7986 100644 --- a/content/08-mongodb/05-configuring-mongodb-user-accounts-and-authentication.mdx +++ b/content/08-mongodb/05-configuring-mongodb-user-accounts-and-authentication.mdx @@ -1,22 +1,22 @@ --- title: 'How to manage users and authentication in MongoDB' -metaTitle: "MongoDB Users and Authentication - Create, List, and Delete" -metaDescription: "Find out how to configure user accounts and set up authentication credentials on your MongoDB server, including how to create, delete, and show list of users." +metaTitle: 'MongoDB Users and Authentication - Create, List, and Delete' +metaDescription: 'Find out how to configure user accounts and set up authentication credentials on your MongoDB server, including how to create, delete, and show list of users.' metaImage: '/social/generic-mongodb.png' authors: ['justinellingwood'] --- ## Introduction -Managing users and [authentication](/intro/database-glossary#authentication) are some of the most important administration tasks of managing MongoDB servers. You must ensure that the server is configured to be able to properly identify your users and applications and deny connections or operations that are unable to authenticate correctly. +Managing users and [authentication](/intro/database-glossary#authentication) are some of the most important administration tasks of managing MongoDB servers. You must ensure that the server is configured to be able to properly identify your users and applications and deny connections or operations that are unable to authenticate correctly. -To manage these requirements, you must be able to decide which the users your server requires and create those accounts. As part of this process, you can set the authentication details to allow external access using the new identity. +To manage these requirements, you must be able to decide which the users your server requires and create those accounts. As part of this process, you can set the authentication details to allow external access using the new identity. -In this guide, we will walk through how to create, view, and remove user accounts. 
We will go over how to set up authentication for your accounts and how to update the credentials when you need to change your user passwords. +In this guide, we will walk through how to create, view, and remove user accounts. We will go over how to set up authentication for your accounts and how to update the credentials when you need to change your user passwords. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -30,54 +30,54 @@ To follow along with this guide, you'll need an account on a [MongoDB](/intro/da To create, modify, and delete users within MongoDB and configure authentication, the core methods you need are: -* `db.createUser`: create a new MongoDB user account -* `db.updateUser`: update the details of a user account -* `db.changeUserPassword`: change the password used by a user account -* `db.dropUser`: delete a MongoDB user account +- `db.createUser`: create a new MongoDB user account +- `db.updateUser`: update the details of a user account +- `db.changeUserPassword`: change the password used by a user account +- `db.dropUser`: delete a MongoDB user account Additionally, the following database command is useful for finding information about users on the system: -* `db.runCommand('usersInfo')`: show information about one or more MongoDB user accounts +- `db.runCommand('usersInfo')`: show information about one or more MongoDB user accounts ### Required privileges -To execute the commands above, you need to login to MongoDB with an account with a number of different [privilege actions](https://docs.mongodb.com/manual/reference/privilege-actions/). The specific privileges you require depend on the commands you need to use. +To execute the commands above, you need to login to MongoDB with an account with a number of different [privilege actions](https://docs.mongodb.com/manual/reference/privilege-actions/). The specific privileges you require depend on the commands you need to use. 
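Before working through the per-task requirements below, it can help to see which privileges your current session already has. One way to check is the built-in `connectionStatus` command with its optional `showPrivileges` flag (a sketch; run it from the shell you are already authenticated in):

```
db.runCommand({ connectionStatus: 1, showPrivileges: true })
```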
To get info about other users, your current user must have the following privilege action enabled: -* [`viewUser` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-viewUser) +- [`viewUser` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-viewUser) To create new users, your current user must have the following privilege actions enabled: -* [`createUser` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-createUser) -* [`grantRole` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-grantRole) +- [`createUser` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-createUser) +- [`grantRole` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-grantRole) To change a user's password or account details, you might need the following privileges: -* [`changeOwnPassword` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-changeOwnPassword) to change your own account password -* [`changeOwnCustomData` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-changeOwnCustomData) to change your own account's custom data -* [`changePassword` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-changePassword) to change other users' passwords -* [`changeCustomData` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-changeCustomData) to change other users' custom data +- [`changeOwnPassword` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-changeOwnPassword) to change your own account password +- [`changeOwnCustomData` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-changeOwnCustomData) to change your own account's custom data +- [`changePassword` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-changePassword) to change other users' passwords +- [`changeCustomData` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-changeCustomData) to change other users' custom data We won't be covering role management in this guide, so the `grantRole` and `revokeRole` privilege actions are not be required. To delete a user account, your current user must have the following privilege action enabled: -* [`dropUser` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-dropUser) +- [`dropUser` privilege action](https://docs.mongodb.com/manual/reference/privilege-actions/#mongodb-authaction-dropUser) ## Understanding how MongoDB implements users and authentication Before we start creating and managing accounts, it's helpful to take some time to get familiar with how MongoDB defines and stores this information. -In MongoDB, user accounts are a combination of the account username along with a specific authentication database. The authentication database is simply the database where the user is defined and does not imply a limitation on scope or rights. Authentication databases are regular databases used to manage other data and are not special, dedicated databases. 
+In MongoDB, user accounts are a combination of the account username along with a specific authentication database. The authentication database is simply the database where the user is defined and does not imply a limitation on scope or rights. Authentication databases are regular databases used to manage other data and are not special, dedicated databases. -A user account name must be unique in its authentication database. However, the same username may be reused with a different authentication database to create a new, distinct user account. +A user account name must be unique in its authentication database. However, the same username may be reused with a different authentication database to create a new, distinct user account. -As a result of this design, an account can only be accurately identified by including the username and authentication database. To authenticate to an account, one also needs to be able to provide the credentials associated with an account. This is usually a password, but can also be a certificate. +As a result of this design, an account can only be accurately identified by including the username and authentication database. To authenticate to an account, one also needs to be able to provide the credentials associated with an account. This is usually a password, but can also be a certificate. ## How do you create users? -Now that we've taken a look at how MongoDB conceptualizes user accounts, we can discuss how to create new users. Remember to log in to your MongoDB server with a user that has the appropriate privileges to follow along. +Now that we've taken a look at how MongoDB conceptualizes user accounts, we can discuss how to create new users. Remember to log in to your MongoDB server with a user that has the appropriate privileges to follow along. To create a new user, you must first switch to the database you want to use as the new user's authentication database. @@ -86,6 +86,7 @@ First, you can get a list of the databases that are already configured on your s ``` show dbs ``` + ``` admin 0.000GB config 0.000GB @@ -97,11 +98,12 @@ Switch to the database the user will be associated with using the `use` command: ``` use admin ``` + ``` switched to db admin ``` -To create a new user, you can use either the `db.createUser()` method or you can use the `createUser` database command. Either way, you will need to pass the username (the `user` field), password (the `pwd` field), and an array of roles that the user should be added to (the `roles` key) within a `user` object. +To create a new user, you can use either the `db.createUser()` method or you can use the `createUser` database command. Either way, you will need to pass the username (the `user` field), password (the `pwd` field), and an array of roles that the user should be added to (the `roles` key) within a `user` object. To create a new user called `tom` with a password set to `hellothere` with an empty roles array using the `db.createUser()` method, you can type: @@ -112,6 +114,7 @@ db.createUser({ roles: [] }) ``` + ``` Successfully added user: { "user" : "tom", "roles" : [ ] } ``` @@ -125,13 +128,14 @@ db.runCommand({ roles: [] }) ``` + ``` Successfully added user: { "user" : "tom", "roles" : [ ] } ``` -The two different options are very similar, so we'll only be showing the database methods where applicable moving forward. 
However, if you prefer the database command syntax, you can find each of the associated commands in the [MongoDB command reference documentation](https://docs.mongodb.com/manual/reference/command/). +The two different options are very similar, so we'll only be showing the database methods where applicable moving forward. However, if you prefer the database command syntax, you can find each of the associated commands in the [MongoDB command reference documentation](https://docs.mongodb.com/manual/reference/command/). -In the above commands, we explicitly defined the password inline within the `user` object. To prevent the password from being logged and retrievable, you can alternatively use the `passwordPrompt()` method within the `user` document to have MongoDB interactively prompt you for a password when the command is run. The password will not be visible, so your command history will be clean: +In the above commands, we explicitly defined the password inline within the `user` object. To prevent the password from being logged and retrievable, you can alternatively use the `passwordPrompt()` method within the `user` document to have MongoDB interactively prompt you for a password when the command is run. The password will not be visible, so your command history will be clean: ``` db.createUser({ @@ -140,6 +144,7 @@ db.createUser({ roles: [] }) ``` + ``` Enter password: Successfully added user: { "user" : "tom", "roles" : [ ] } @@ -151,7 +156,7 @@ Keep in mind that the password will still be sent to the server in plain text if Next, let's take a look at how to find information about the existing users. -To return multiple users, you can use the `db.getUsers()` method on to show all of the users within the current database. First, switch to the database you're interested in querying: +To return multiple users, you can use the `db.getUsers()` method on to show all of the users within the current database. First, switch to the database you're interested in querying: ``` use admin @@ -162,6 +167,7 @@ Next, use the `db.getUsers()` method to return all of the users associated with ``` db.getUsers() ``` + ``` [ { @@ -202,6 +208,7 @@ db.getUsers({ showCredentials: true }) ``` + ``` [ { @@ -257,6 +264,7 @@ db.getUsers({ } }) ``` + ``` [ { @@ -278,12 +286,13 @@ db.getUsers({ ] ``` -To get a specific user, you can use the `db.getUser()` method instead. This works like the `db.getUsers()` method, but returns a single user. Instead of passing an object to the method, you pass a string containing the username you wish to retrieve: +To get a specific user, you can use the `db.getUser()` method instead. This works like the `db.getUsers()` method, but returns a single user. 
Instead of passing an object to the method, you pass a string containing the username you wish to retrieve: ``` use admin db.getUser("tom") ``` + ``` { "_id" : "admin.tom", @@ -300,9 +309,9 @@ db.getUser("tom") You can optionally include an extra `args` object that allows you to specify additional information you'd like by setting the following keys to `true`: -* `showCredentials`: shows credential information in addition to the regular output -* `showPrivileges`: shows privilege information in addition to the regular output -* `showAuthenticationRestrictions`: shows authentication restrictions on the account in addition to the regular output +- `showCredentials`: shows credential information in addition to the regular output +- `showPrivileges`: shows privilege information in addition to the regular output +- `showAuthenticationRestrictions`: shows authentication restrictions on the account in addition to the regular output For example, you can tell MongoDB to supply you will all of the above information by typing: @@ -315,6 +324,7 @@ db.getUser("tom", showAuthenticationRestrictions: true }) ``` + ``` { "_id" : "admin.tom", @@ -349,7 +359,7 @@ db.getUser("tom", ## How do you change the password for a MongoDB user? -To change a user's password, you can use the `db.changeUserPassword()` method. Again, you must switch to the user's authentication database before executing the command. +To change a user's password, you can use the `db.changeUserPassword()` method. Again, you must switch to the user's authentication database before executing the command. The `db.changeUserPassword()` method takes two arguments: the username of the account you wish to change and the new password for the account. @@ -360,30 +370,31 @@ use admin db.changeUserPassword("tom", "secretpassword") ``` -Just as with the `db.createUser()` method, you can use the `passwordPrompt()` method for the second argument instead of providing a password inline. MongoDB will prompt you to enter a password when the command is executed: +Just as with the `db.createUser()` method, you can use the `passwordPrompt()` method for the second argument instead of providing a password inline. MongoDB will prompt you to enter a password when the command is executed: ``` use admin db.changeUserPassword("tom", passwordPrompt()) ``` + ``` Enter password: ``` ## How do you change other user account details? -To change other information associated with a user account, you can use the `db.updateUser()` method. Make sure you switch to the user's authentication database before updating their details. +To change other information associated with a user account, you can use the `db.updateUser()` method. Make sure you switch to the user's authentication database before updating their details. -The `db.updateUser()` method requires you to specify the username and then provide an object containing the data you wish to update. Any field you choose to update will be completely replaced with the new information, so be sure to include the original data as well as the new data in your object if you only hope to append new information. +The `db.updateUser()` method requires you to specify the username and then provide an object containing the data you wish to update. Any field you choose to update will be completely replaced with the new information, so be sure to include the original data as well as the new data in your object if you only hope to append new information. 
-The object that you include in the command with the change information can contain many different fields. Let's go over them:
+The object that you include in the command with the change information can contain many different fields. Let's go over them:

-* `customData`: Any arbitrary data to be associated with the user account.
-* `roles`: The roles that the user is granted. It is often better to use the [`db.grantRolesToUser()`](https://docs.mongodb.com/manual/reference/method/db.grantRolesToUser/) and [`db.revokeRolesFromUser()`](https://docs.mongodb.com/manual/reference/method/db.revokeRolesFromUser/) methods to control role membership rather than updating with this key, as you can append and remove roles individually.
-* `pwd`: The user's password. Using the `db.changeUserPassword()` method is usually easier if that is the only field that needs to be updated.
-* `authenticationRestrictions`: Specifies restrictions for the account that can limit the IP addresses users can connect from or to. The value of this key is an object or array that defines `clientSource` and/or `serverAddress`, which contain arrays specifying the valid IP addresses or ranges. Find out more in the MongoDB docs on [authentication restrictions](https://docs.mongodb.com/manual/reference/method/db.createUser/#authentication-restrictions).
-* `mechanisms`: The specific authentication mechanisms to be used for credentials. Can be set to either one or both of `SCRAM-SHA-1` or `SCRAM-SHA-256`, but can only be changed to a subset of the current mechanisms if a new password isn't currently being supplied.
-* `passwordDigestor`: Specifies which component processes the user's password. Can be either `server` (the default) or `client`.
+- `customData`: Any arbitrary data to be associated with the user account.
+- `roles`: The roles that the user is granted. It is often better to use the [`db.grantRolesToUser()`](https://docs.mongodb.com/manual/reference/method/db.grantRolesToUser/) and [`db.revokeRolesFromUser()`](https://docs.mongodb.com/manual/reference/method/db.revokeRolesFromUser/) methods to control role membership rather than updating with this key, as you can append and remove roles individually.
+- `pwd`: The user's password. Using the `db.changeUserPassword()` method is usually easier if that is the only field that needs to be updated.
+- `authenticationRestrictions`: Specifies restrictions for the account that can limit the IP addresses users can connect from or to. The value of this key is an object or array that defines `clientSource` and/or `serverAddress`, which contain arrays specifying the valid IP addresses or ranges. Find out more in the MongoDB docs on [authentication restrictions](https://docs.mongodb.com/manual/reference/method/db.createUser/#authentication-restrictions).
+- `mechanisms`: The specific authentication mechanisms to be used for credentials. Can be set to either one or both of `SCRAM-SHA-1` or `SCRAM-SHA-256`, but can only be changed to a subset of the current mechanisms if a new password isn't currently being supplied.
+- `passwordDigestor`: Specifies which component processes the user's password. Can be either `server` (the default) or `client`.
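Because `db.updateUser()` replaces any field you pass in its entirety, it can help to see a quick sketch of a `customData` update before the authentication restriction example below. The `employeeId` and `department` values here are purely illustrative:

```
use admin
db.updateUser("tom", {
  customData: {
    employeeId: 12345,
    department: "engineering"
  }
})
```

If `tom` already had other keys stored in `customData`, this call would discard them, so include any existing values you want to keep in the new object.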
As an example, we can update the `tom` account that authenticates against the `admin` database to only be able to login from the same computer that hosts the server itself by changing the `authenticationRestrictions` field: @@ -405,6 +416,7 @@ db.getUser("tom", { showAuthenticationRestrictions: true }) ``` + ``` { "_id" : "admin.tom", @@ -445,7 +457,7 @@ db.changeUser("tom", { ## How do you delete MongoDB users? -To remove MongoDB user accounts, you can use the `db.dropUser()` method. Be sure to connect to the user's authentication database before removing them. +To remove MongoDB user accounts, you can use the `db.dropUser()` method. Be sure to connect to the user's authentication database before removing them. To execute the `db.dropUser()` method, you need to supply the name of the user you wish to remove: @@ -463,11 +475,11 @@ If the account did not exist in the current database, it will instead return `fa ## Conclusion -MongoDB's user management and authentication configuration lets you control who can connect to your servers and what their user properties are. In a following article, we'll cover how to restrict the level of access that users have by tackling the authorization portion of user management. +MongoDB's user management and authentication configuration lets you control who can connect to your servers and what their user properties are. In a following article, we'll cover how to restrict the level of access that users have by tackling the authorization portion of user management. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -477,7 +489,7 @@ To get started working with MongoDB and Prisma, checkout our [getting started fr
How do you list existing users in MongoDB?

-To list existing users in MongoDB, you can use the `db.getUsers()` method to show all of the users within the current database.
+To list existing users in MongoDB, you can use the `db.getUsers()` method to show all of the users within the current database.

The syntax would look like:

@@ -485,15 +497,16 @@ The syntax would look like:
```
use admin
db.getUsers()
```
+
For more details, see the section on [`db.getUsers()`](https://www.prisma.io/dataguide/mongodb/configuring-mongodb-user-accounts-and-authentication#how-do-you-show-existing-users).
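If you want to check every user on the deployment instead of just the current database, the underlying `usersInfo` command accepts a `forAllDBs` flag when run against the `admin` database. A minimal sketch:

```
use admin
db.runCommand({
  usersInfo: { forAllDBs: true }
})
```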
How do you create a database admin user in MongoDB? -In order to create a [database admin user](https://www.prisma.io/dataguide/mongodb/authorization-and-privileges#what-roles-are-available-in-mongodb-by-default) in MongoDB, you will want to use the `db.createUser()` method in the `admin` database. +In order to create a [database admin user](https://www.prisma.io/dataguide/mongodb/authorization-and-privileges#what-roles-are-available-in-mongodb-by-default) in MongoDB, you will want to use the `db.createUser()` method in the `admin` database. -The following demonstrates the syntax to use for creating database admins. +The following demonstrates the syntax to use for creating database admins. ``` use admin @@ -543,4 +556,5 @@ db.getUser("tom", showPrivileges: true, }) ``` +
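To confirm which privileges your own account ends up with after changes like these, you can also ask the server directly with the `connectionStatus` command. This is a minimal sketch rather than a required step:

```
db.runCommand({
  connectionStatus: 1,
  showPrivileges: true
})
```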
diff --git a/content/08-mongodb/06-authorization-and-privileges.mdx b/content/08-mongodb/06-authorization-and-privileges.mdx index f857b696..94339fdb 100644 --- a/content/08-mongodb/06-authorization-and-privileges.mdx +++ b/content/08-mongodb/06-authorization-and-privileges.mdx @@ -1,6 +1,6 @@ --- title: 'How to manage authorization and privileges in MongoDB' -metaTitle: "MongoDB Role Management, Authorization, and Privileges" +metaTitle: 'MongoDB Role Management, Authorization, and Privileges' metaDescription: "Learn how to use MongoDB's authorization and privilege systems to manage access to your data, including how to assign roles to your user list." metaImage: '/social/generic-mongodb.png' authors: ['justinellingwood'] @@ -8,13 +8,13 @@ authors: ['justinellingwood'] ## Introduction -[Authorization](/intro/database-glossary#authorization) is an essential part of user management and access control that defines the policies for what each user is allowed to do on the system. Deciding on what policies are appropriate and implementing them in your databases ensures that users can interact with the resources they require while protecting against inappropriate behavior. +[Authorization](/intro/database-glossary#authorization) is an essential part of user management and access control that defines the policies for what each user is allowed to do on the system. Deciding on what policies are appropriate and implementing them in your databases ensures that users can interact with the resources they require while protecting against inappropriate behavior. -In this guide, we'll go over how authorization works in MongoDB. We'll take a look at how MongoDB conceptualizes access management, what types of privileges can be granted to users, and how to attach policies to user accounts. +In this guide, we'll go over how authorization works in MongoDB. We'll take a look at how MongoDB conceptualizes access management, what types of privileges can be granted to users, and how to attach policies to user accounts. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -26,514 +26,514 @@ To follow along with this guide, you'll need an account on a MongoDB server with To adjust the configuration of MongoDB and enable authorization on the database, you need `root` level access on the server. -Additionally, within MongoDB, you'll need an account that has at least the `userAdmin` role so that role-based authorization policies can be set. Roles that include the `userAdmin` role, listed from most narrowly focused to the broadest level of privileges are: +Additionally, within MongoDB, you'll need an account that has at least the `userAdmin` role so that role-based authorization policies can be set. 
Roles that include the `userAdmin` role, listed from most narrowly focused to the broadest level of privileges are: -* `userAdmin` -* `dbOwner` -* `userAdminAnyDatabase` -* `root` +- `userAdmin` +- `dbOwner` +- `userAdminAnyDatabase` +- `root` ## How does authorization work in MongoDB? -Authorization and privilege management in MongoDB is implemented using [Role-Based Access Control (RBAC)](/intro/database-glossary#role-based-access-control). In this system, different levels of access are associated with individual roles. To give a user permission to perform an action, you grant them membership to a role that has the appropriate privileges. +Authorization and privilege management in MongoDB is implemented using [Role-Based Access Control (RBAC)](/intro/database-glossary#role-based-access-control). In this system, different levels of access are associated with individual roles. To give a user permission to perform an action, you grant them membership to a role that has the appropriate privileges. -Roles in MongoDB can be nested. This means that roles can be granted to other roles. A role that contains another role inherits all of the child role's privileges. This makes it possible to create new roles that have the desired privileges by combining roles appropriately. +Roles in MongoDB can be nested. This means that roles can be granted to other roles. A role that contains another role inherits all of the child role's privileges. This makes it possible to create new roles that have the desired privileges by combining roles appropriately. -Privileges themselves are defined by a combination of an action and a resource. The action component describes the type of behavior that is permitted by the privilege, while the resource indicates the target or scope of the action. +Privileges themselves are defined by a combination of an action and a resource. The action component describes the type of behavior that is permitted by the privilege, while the resource indicates the target or scope of the action. ## What resources are available in MongoDB? -The target or scope of an action is known as a *resource* within MongoDB's access control model. Each action can only be applied to certain types of resources. When configuring privileges, you specify the exact resources for which the privilege should be scoped. +The target or scope of an action is known as a _resource_ within MongoDB's access control model. Each action can only be applied to certain types of resources. When configuring privileges, you specify the exact resources for which the privilege should be scoped. We can go over the available resources in order from the narrowest focus to the broadest. -Privileges can be most narrowly defined by scoping them to a specific [**collection**](/intro/database-glossary#collections) within a specific database within the cluster. Within the same databases, different collections can specify different privileges. This allows you to implement granular policies for different types of data. +Privileges can be most narrowly defined by scoping them to a specific [**collection**](/intro/database-glossary#collections) within a specific database within the cluster. Within the same databases, different collections can specify different privileges. This allows you to implement granular policies for different types of data. -The next largest resource you can enact policies against is a [**database**](/intro/database-glossary#database). 
Managing privileges on the database level allows you to provide a general policy that will affect the database as a whole and all of the collections within.
+The next largest resource you can enact policies against is a [**database**](/intro/database-glossary#database). Managing privileges on the database level allows you to provide a general policy that will affect the database as a whole and all of the collections within.

-You can also set policies that apply to collections of the same name across all databases. This allows you to use naming conventions to implement access control for specific collections throughout your system. A broader version of this is to apply policy to all databases and non-system collections on the system.
+You can also set policies that apply to collections of the same name across all databases. This allows you to use naming conventions to implement access control for specific collections throughout your system. A broader version of this is to apply policy to all databases and non-system collections on the system.

-Finally, you can apply policy against the entire [**cluster**](/intro/database-glossary#cluster). Actions targeting the cluster affect the general system instead of the data it is managing directly. Policies on the cluster level tend to be focused on administrative operations.
+Finally, you can apply policy against the entire [**cluster**](/intro/database-glossary#cluster). Actions targeting the cluster affect the general system instead of the data it is managing directly. Policies on the cluster level tend to be focused on administrative operations.

## What actions are available in MongoDB?

-There are a large number of actions available within MongoDB that are related to general usage, database management, and system management. In general, actions correspond to one or more commands or methods within MongoDB.
+There are a large number of actions available within MongoDB that are related to general usage, database management, and system management. In general, actions correspond to one or more commands or methods within MongoDB.

To view the list of actions available in MongoDB as well as their function, expand the section below:
List of MongoDB actions -| Action | Scope | Description | -|--------|-------|-------------| -| `find` | database or collection | Allows read operations on a database | -| `insert` | database or collection | Allows write operations on a database | -| `remove` | database or collection | Allows delete operations on a database | -| `update` | database or collection | Allows replacement operations on a database | -| `bypassDocumentValidation` | database or collection | Lets the user to ignore data validation policies for documents. | -| `useUUID` | cluster | Lets the user use `UUID` values to find for documents | -| `changeCustomData` | database | The user can modify custom data associated with any user in the database | -| `changeOwnCustomData` | database | Allows the user to change the custom data associated with their own user | -| `changeOwnPassword` | database | Lets a user change their own account password | -| `changePassword` | database | Lets a user change the password for any user in the database | -| `createCollection` | database or collection | Allows the user to create collections in the database | -| `createIndex` | database or collection | Lets the user create indexes for the database | -| `createRole` | database | Allows the user to create custom roles in the database | -| `createUser` | database | Lets the user define new user accounts | -| `dropCollection` | database or collection | Allows the user to delete collections | -| `dropRole` | database | Lets the user delete roles | -| `dropUser` | database | Lets the user delete users | -| `enableProfiler` | database | Allows the user to enable performance profiling | -| `grantRole` | database | Lets the user grant roles associated with the database to any user on the system | -| `killCursors` | collection | Lets a user kill their own cursors in versions of MongoDB prior to 4.2 | -| `killAnyCursor` | collection | Lets a user kill other users' cursors | -| `revokeRole` | database | Allows the user to remove roles from any user in the system | -| `setAuthenticationRestriction` | database | Allows the user to set authentication requirements for users and roles | -| `unlock` | cluster | Lets the user reduce the number of write locks on the cluster | -| `viewRole` | database | Allows viewing details about roles in the database | -| `viewUser` | database | Allows viewing details about users in the database | -| `authSchemaUpgrade` | cluster | Allows user to upgrade the authentication mechanisms between MongoDB versions | -| `cleanupOrphaned` | cluster | Lets the user clean up orphaned documents in MongoDB versions prior to 4.4 | -| `cpuProfiler` | cluster | Allows the user to enable CPU profiling | -| `inprog` | cluster | Lets the user view information on other users' in progress or queued operations | -| `invalidateUserCache` | cluster | Lets the user manually flush user details from the cache | -| `killop` | cluster | Allows the user to kill other users' operations | -| `planCacheRead` | database or collection | Lets the user view information about cached query plans | -| `planCacheWrite` | database or collection | Allows the user to delete cached query plans | -| `storageDetails` | database or collection | Deprecated | -| `changeStream` | collection, database, or cluster | Allows the user to access real time change data from non-system collections | -| `appendOpLogNote` | cluster | Lets the user add notes to the oplog | -| `replSetConfigure` | cluster | Allows configuring replica sets | -| `replSetGetConfig` | cluster | Lets the user 
view the current replica set configuration | -| `replSetGetStatus` | cluster | Lets the user find the current status of the replica set | -| `replSetHeartbeat` | cluster | Deprecated | -| `replSetStateChange` | cluster | Allows the user to manage the state of cluster replica sets | -| `resync` | cluster | Deprecated | -| `addShard` | cluster | Lets the user add a shard replica to a sharded cluster | -| `clearJumboFlag` | database or collection | Lets the user clean up oversized chunks in a shard | -| `enableSharding` | cluster, database, or collection | Allows user to enable sharding on clusters and database or manage sharding on a cluster level | -| `refineCollectionShardKey` | database or collection | Lets the user add additional fields to an existing shard key | -| `flushRouterConfig` | cluster | Lets the user mark a cached routing table as obsolete | -| `getShardVersion` | database | Internal command | -| `listShards` | cluster | Lets user see a list of configured shards for the cluster | -| `moveChunk` | database or collection | Allows a user to move a chunk to a new shard | -| `removeShard` | cluster | Lets the user drain chunks from a shard and then remove the shard from the cluster | -| `shardingState` | cluster | Allows the user to view whether the MongoDB server is part of a sharded cluster | -| `splitChunk` | database or collection | Allows the user to merge or split chunks in a shard | -| `splitVector` | database or collection | Internal command | -| `applicationMessage` | cluster | Lets the user add custom messages to the audit log | -| `closeAllDatabases` | cluster | Deprecated | -| `collMod` | database or collection | Lets the user modify the options associated with a collection | -| `compact` | database or collection | Allows the user to defragment data and indexes in a collection | -| `connPoolSync` | cluster | Internal command | -| `convertToCapped` | database or collection | Lets the user convert a collection into a capped collection with a set maximum size | -| `dropConnections` | cluster | Lets the user remove outgoing connections from MongoDB to a specified host | -| `dropDatabase` | database | Lets the user delete the current database | -| `dropIndex` | database or collection | Lets the user delete an index | -| `forceUUID` | cluster | Lets users define collections using globally unique UUIDs | -| `fsync` | cluster | Allows the user to flush all pending writes to storage and lock the cluster for writes | -| `getDefaultRWConcern` | cluster | Lets the user view the default read and write consistency and isolation settings | -| `getParameter` | cluster | Lets the user query for the value of a parameter | -| `hostInfo` | cluster | Allows the user to see information about the server running the MongoDB instance | -| `logRotate` | cluster | Lets the user trigger log rotation | -| `reIndex` | database or collection | Lets the user rebuild the indexes for a collection | -| `renameCollectionSameDB` | database | Lets the user rename a collection in the current database | -| `setDefaultRWConcern` | cluster | Lets the user specify the default read and write consistency and isolation settings | -| `setParameter` | cluster | Allows the user to define the value for parameters | -| `shutdown` | cluster | Lets the user shutdown the MongoDB instance | -| `touch` | cluster | Deprecated | -| `impersonate` | cluster | Lets the user kill sessions associated with other users and roles | -| `listSessions` | cluster | Lets the user list all sessions by all users | -| `killAnySession` | 
cluster | Lets the user kill all sessions for a specific user or a pattern | -| `checkFreeMonitoringStatus` | cluster | Allows the user to see the status of cloud monitoring functionality | -| `setFreeMonitoring` | cluster | Lets the user enable or disable cloud monitoring functionality | -| `collStats` | database or collection | Allows the user to view statistics about collections | -| `connPoolStats` | cluster | Lets the user see the status of outgoing connections from MongoDB instances | -| `dbHash` | database or collection | Lets the user query for hashed values of collections in a database | -| `dbStats` | database | Allows the user to view storage statistics | -| `getCmdLineOpts` | cluster | Lets the user see the command line options used to start the MongoDB instance | -| `getLog` | cluster | Lets the user see the most recent MongoDB events | -| `listDatabases` | cluster | Lets the user see the list of all databases | -| `listCollections` | database | Allows the user to see a list of collections and views in a database | -| `listIndexes` | database or collection | Lets the user see what indexes are associated with a specific collection | -| `netstat` | cluster | Internal command | -| `serverStatus` | cluster | Lets the user view information about the database's current state | -| `validate` | database or collection | Allows the user to check a collections data and indexes for correctness | -| `top` | cluster | Lets the user see usage statistics for collections | +| Action | Scope | Description | +| ------------------------------ | -------------------------------- | --------------------------------------------------------------------------------------------- | +| `find` | database or collection | Allows read operations on a database | +| `insert` | database or collection | Allows write operations on a database | +| `remove` | database or collection | Allows delete operations on a database | +| `update` | database or collection | Allows replacement operations on a database | +| `bypassDocumentValidation` | database or collection | Lets the user to ignore data validation policies for documents. 
| +| `useUUID` | cluster | Lets the user use `UUID` values to find for documents | +| `changeCustomData` | database | The user can modify custom data associated with any user in the database | +| `changeOwnCustomData` | database | Allows the user to change the custom data associated with their own user | +| `changeOwnPassword` | database | Lets a user change their own account password | +| `changePassword` | database | Lets a user change the password for any user in the database | +| `createCollection` | database or collection | Allows the user to create collections in the database | +| `createIndex` | database or collection | Lets the user create indexes for the database | +| `createRole` | database | Allows the user to create custom roles in the database | +| `createUser` | database | Lets the user define new user accounts | +| `dropCollection` | database or collection | Allows the user to delete collections | +| `dropRole` | database | Lets the user delete roles | +| `dropUser` | database | Lets the user delete users | +| `enableProfiler` | database | Allows the user to enable performance profiling | +| `grantRole` | database | Lets the user grant roles associated with the database to any user on the system | +| `killCursors` | collection | Lets a user kill their own cursors in versions of MongoDB prior to 4.2 | +| `killAnyCursor` | collection | Lets a user kill other users' cursors | +| `revokeRole` | database | Allows the user to remove roles from any user in the system | +| `setAuthenticationRestriction` | database | Allows the user to set authentication requirements for users and roles | +| `unlock` | cluster | Lets the user reduce the number of write locks on the cluster | +| `viewRole` | database | Allows viewing details about roles in the database | +| `viewUser` | database | Allows viewing details about users in the database | +| `authSchemaUpgrade` | cluster | Allows user to upgrade the authentication mechanisms between MongoDB versions | +| `cleanupOrphaned` | cluster | Lets the user clean up orphaned documents in MongoDB versions prior to 4.4 | +| `cpuProfiler` | cluster | Allows the user to enable CPU profiling | +| `inprog` | cluster | Lets the user view information on other users' in progress or queued operations | +| `invalidateUserCache` | cluster | Lets the user manually flush user details from the cache | +| `killop` | cluster | Allows the user to kill other users' operations | +| `planCacheRead` | database or collection | Lets the user view information about cached query plans | +| `planCacheWrite` | database or collection | Allows the user to delete cached query plans | +| `storageDetails` | database or collection | Deprecated | +| `changeStream` | collection, database, or cluster | Allows the user to access real time change data from non-system collections | +| `appendOpLogNote` | cluster | Lets the user add notes to the oplog | +| `replSetConfigure` | cluster | Allows configuring replica sets | +| `replSetGetConfig` | cluster | Lets the user view the current replica set configuration | +| `replSetGetStatus` | cluster | Lets the user find the current status of the replica set | +| `replSetHeartbeat` | cluster | Deprecated | +| `replSetStateChange` | cluster | Allows the user to manage the state of cluster replica sets | +| `resync` | cluster | Deprecated | +| `addShard` | cluster | Lets the user add a shard replica to a sharded cluster | +| `clearJumboFlag` | database or collection | Lets the user clean up oversized chunks in a shard | +| `enableSharding` | cluster, 
database, or collection | Allows user to enable sharding on clusters and database or manage sharding on a cluster level | +| `refineCollectionShardKey` | database or collection | Lets the user add additional fields to an existing shard key | +| `flushRouterConfig` | cluster | Lets the user mark a cached routing table as obsolete | +| `getShardVersion` | database | Internal command | +| `listShards` | cluster | Lets user see a list of configured shards for the cluster | +| `moveChunk` | database or collection | Allows a user to move a chunk to a new shard | +| `removeShard` | cluster | Lets the user drain chunks from a shard and then remove the shard from the cluster | +| `shardingState` | cluster | Allows the user to view whether the MongoDB server is part of a sharded cluster | +| `splitChunk` | database or collection | Allows the user to merge or split chunks in a shard | +| `splitVector` | database or collection | Internal command | +| `applicationMessage` | cluster | Lets the user add custom messages to the audit log | +| `closeAllDatabases` | cluster | Deprecated | +| `collMod` | database or collection | Lets the user modify the options associated with a collection | +| `compact` | database or collection | Allows the user to defragment data and indexes in a collection | +| `connPoolSync` | cluster | Internal command | +| `convertToCapped` | database or collection | Lets the user convert a collection into a capped collection with a set maximum size | +| `dropConnections` | cluster | Lets the user remove outgoing connections from MongoDB to a specified host | +| `dropDatabase` | database | Lets the user delete the current database | +| `dropIndex` | database or collection | Lets the user delete an index | +| `forceUUID` | cluster | Lets users define collections using globally unique UUIDs | +| `fsync` | cluster | Allows the user to flush all pending writes to storage and lock the cluster for writes | +| `getDefaultRWConcern` | cluster | Lets the user view the default read and write consistency and isolation settings | +| `getParameter` | cluster | Lets the user query for the value of a parameter | +| `hostInfo` | cluster | Allows the user to see information about the server running the MongoDB instance | +| `logRotate` | cluster | Lets the user trigger log rotation | +| `reIndex` | database or collection | Lets the user rebuild the indexes for a collection | +| `renameCollectionSameDB` | database | Lets the user rename a collection in the current database | +| `setDefaultRWConcern` | cluster | Lets the user specify the default read and write consistency and isolation settings | +| `setParameter` | cluster | Allows the user to define the value for parameters | +| `shutdown` | cluster | Lets the user shutdown the MongoDB instance | +| `touch` | cluster | Deprecated | +| `impersonate` | cluster | Lets the user kill sessions associated with other users and roles | +| `listSessions` | cluster | Lets the user list all sessions by all users | +| `killAnySession` | cluster | Lets the user kill all sessions for a specific user or a pattern | +| `checkFreeMonitoringStatus` | cluster | Allows the user to see the status of cloud monitoring functionality | +| `setFreeMonitoring` | cluster | Lets the user enable or disable cloud monitoring functionality | +| `collStats` | database or collection | Allows the user to view statistics about collections | +| `connPoolStats` | cluster | Lets the user see the status of outgoing connections from MongoDB instances | +| `dbHash` | database or collection | Lets 
the user query for hashed values of collections in a database | +| `dbStats` | database | Allows the user to view storage statistics | +| `getCmdLineOpts` | cluster | Lets the user see the command line options used to start the MongoDB instance | +| `getLog` | cluster | Lets the user see the most recent MongoDB events | +| `listDatabases` | cluster | Lets the user see the list of all databases | +| `listCollections` | database | Allows the user to see a list of collections and views in a database | +| `listIndexes` | database or collection | Lets the user see what indexes are associated with a specific collection | +| `netstat` | cluster | Internal command | +| `serverStatus` | cluster | Lets the user view information about the database's current state | +| `validate` | database or collection | Allows the user to check a collections data and indexes for correctness | +| `top` | cluster | Lets the user see usage statistics for collections |
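To make the action-and-resource pairing concrete, the following sketch shows how individual privileges are written when you assemble them yourself, for example inside a custom role definition. The `reporting` database and `metrics` collection names are hypothetical:

```
// Scoped to a single collection in a single database
{ resource: { db: "reporting", collection: "metrics" }, actions: ["find", "listIndexes"] }

// Scoped to all non-system collections in a single database
{ resource: { db: "reporting", collection: "" }, actions: ["find"] }

// Scoped to the cluster as a whole
{ resource: { cluster: true }, actions: ["serverStatus"] }
```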
## What roles are available in MongoDB by default?

-MongoDB includes a number of useful roles that combine similar privileges together. These roles allow you to grant and revoke privileges to database resources in a concise way.
+MongoDB includes a number of useful roles that combine similar privileges together. These roles allow you to grant and revoke privileges to database resources in a concise way.

To view the list of roles available in MongoDB by default as well as the privileges they include, expand the section below:
List of default MongoDB roles -* `read`: Provides read access to non-system collections - * Actions: - * `changeStream` - * `collStats` - * `dbHash` - * `dbStats` - * `find` - * `killCursors` - * `listIndexes` - * `listCollections` -* `readWrite`: Provides read and write access to non-system collections - * Actions: - * `collStats` - * `convertToCapped` - * `createCollection` - * `dbHash` - * `dbStats` - * `dropCollection` - * `createIndex` - * `dropIndex` - * `find` - * `insert` - * `killCursors` - * `listIndexes` - * `listCollections` - * `remove` - * `renameCollectionSameDB` - * `update` -* `dbAdmin`: Provides access to administrative tasks at the database level, excluding role and user management - * Actions within the `system.profile` collection: - * `changeStream` - * `collStats` - * `convertToCapped` - * `createCollection` - * `dbHash` - * `dbStats` - * `dropCollection` - * `find` - * `killCursors` - * `listCollections` - * `listIndexes` - * `planCacheRead` - * Actions in non-system collections: - * `bypassDocumentValidation` - * `collMod` - * `collStats` - * `compact` - * `convertToCapped` - * `createCollection` - * `createIndex` - * `dbStats` - * `dropCollection` - * `dropDatabase` - * `dropIndex` - * `enableProfiler` - * `listCollections` - * `listIndexes` - * `planCacheIndexFilter` - * `planCacheRead` - * `planCacheWrite` - * `reIndex` - * `renameCollectionSameDB` - * `storageDetails` - * `validate` -* `userAdmin`: Provides access to create and modify users and roles - * Actions: - * `changeCustomData` - * `changePassword` - * `createRole` - * `createUser` - * `dropRole` - * `dropUser` - * `grantRole` - * `revokeRole` - * `setAuthenticationRestriction` - * `viewRole` - * `viewUser` -* `dbOwner`: Provides administrative access to the database including role and user management - * Roles this role inherits: - * `readWrite` - * `dbAdmin` - * `userAdmin` -* `clusterMonitor`: Provides read access to the cluster - * Actions for the whole cluster: - * `checkFreeMonitoringStatus` - * `connPoolStats` - * `getCmdLineOpts` - * `getDefaultRWConcern` - * `getLog` - * `getParameter` - * `getShardMap` - * `hostInfo` - * `inprog` - * `listDatabases` - * `listSessions` - * `listShards` - * `netstat` - * `replSetGetConfig` - * `replSetGetStatus` - * `serverStatus` - * `setFreeMonitoring` - * `shardingState` - * `top` - * Actions for all databases within the cluster: - * `collStats` - * `dbStats` - * `getShardVersion` - * `indexStats` - * `useUUID` - * Actions for all `system.profile` collections: - * `find` - * Actions on the non-system collections in the `config` database: - * `collStats` - * `dbHash` - * `dbStats` - * `find` - * `getShardVersion` - * `indexStats` - * `killCursors` - * `listCollections` - * `listIndexes` - * `planCacheRead` - * Actions on the `system.js` collection in the `config` database: - * `collStats` - * `dbHash` - * `dbStats` - * `find` - * `killCursors` - * `listCollections` - * `listIndexes` - * `planCacheRead` - * Actions on all collections in the `local` database: - * `collStats` - * `dbHash` - * `dbStats` - * `find` - * `getShardVersion` - * `indexStats` - * `killCursors` - * `listCollections` - * `listIndexes` - * `planCacheRead` - * Actions on the `system.js` collection in the `local` database: - * `collStats` - * `dbHash` - * `dbStats` - * `find` - * `killCursors` - * `listCollections` - * `listIndexes` - * `planCacheRead` - * Actions on the `system.replset` and `system.profile` collections in the `local` database: - * `find` -* `clusterManager`: Provides 
monitoring and management access on the cluster through the `config` and `local` databases - * Actions on the entire cluster: - * `addShard` - * `appendOplogNote` - * `applicationMessage` - * `cleanupOrphaned` - * `flushRouterConfig` - * `getDefaultRWConcern` - * `listSessions` - * `listShards` - * `removeShard` - * `replSetConfigure` - * `replSetGetConfig` - * `replSetGetStatus` - * `replSetStateChange` - * `resync` - * `setDefaultRWConcern` - * `setFeatureCompatibilityVersion` - * `setFreeMonitoring` - * Actions on all databases within the cluster: - * `clearJumboFlag` - * `enableSharding` - * `refineCollectionShardKey` - * `moveChunk` - * `splitChunk` - * `splitVector` - * Actions on non-system collections in the `config` database: - * `collStats` - * `dbHash` - * `dbStats` - * `enableSharding` - * `find` - * `insert` - * `killCursors` - * `listCollections` - * `listIndexes` - * `moveChunk` - * `planCacheRead` - * `remove` - * `splitChunk` - * `splitVector` - * `update` - * Actions on the `system.js` collection in the `config` database: - * `collStats` - * `dbHash` - * `dbStats` - * `find` - * `killCursors` - * `listCollections` - * `listIndexes` - * `planCacheRead` - * Actions on all non-system collections in the `local` database: - * `enableSharding` - * `insert` - * `moveChunk` - * `remove` - * `splitChunk` - * `splitVector` - * `update` - * Actions for the `system.replset` collection in the `local` database: - * `collStats` - * `dbHash` - * `dbStats` - * `find` - * `killCursors` - * `listCollections` - * `listIndexes` - * `planCacheRead` -* `hostManager`: Provides the ability to monitor and manage servers. - * Actions on the entire cluster: - * `applicationMessage` - * `closeAllDatabases` - * `connPoolSync` - * `flushRouterConfig` - * `fsync` - * `invalidateUserCache` - * `killAnyCursor` - * `KillAnySession` - * `killop` - * `logRotate` - * `resync` - * `setParameter` - * `shutdown` - * `touch` - * `unlock` -* `clusterAdmin`: Provides all cluster management access - * Roles this role inherits: - * `clusterManager` - * `clusterMonitor` - * `hostManager` - * Additional actions: - * `dropDatabase` -* `backup`: Provides privileges needed to back up data - * Actions on all resources: - * `listDatabases` - * `listCollections` - * `listIndexes` - * Actions on the entire cluster: - * `appendOplogNote` - * `getParameter` - * `listDatabases` - * `serverStatus` - * Actions on non-system collections, the `system.js` and `system.profile` collections, the `admin.system.users` and `admin.system.roles` collections, and the `config.settings` collection: - * `find` - * Actions on the `config.settings` collection: - * `insert` - * `update` -* `restore`: Provides the privileges to restore data to the cluster - * Actions on the entire cluster: - * `getParameter` - * Actions on non-system collections: - * `bypassDocumentValidation` - * `changeCustomData` - * `changePassword` - * `collMod` - * `convertToCapped` - * `createCollection` - * `createIndex` - * `createRole` - * `createUser` - * `dropCollection` - * `dropRole` - * `dropUser` - * `grantRole` - * `insert` - * `revokeRole` - * `viewRole` - * `viewUser` - * Actions on the `system.js` collection: - * `bypassDocumentValidation` - * `collMod` - * `createCollection` - * `createIndex` - * `dropCollection` - * `insert` - * Actions on any resource - * `listCollections` - * Actions on non-system collections in the `config` and `local` databases: - * `bypassDocumentValidation` - * `collMod` - * `createCollection` - * `createIndex` - * `dropCollection` - * 
`insert` - * Actions on the `admin.system.version` collection: - * `bypassDocumentValidation` - * `collMod` - * `createCollection` - * `createIndex` - * `dropCollection` - * `find` - * `insert` - * Actions on the `admin.system.roles` collection: - * `createIndex` - * Actions on the `admin.system.users` collection: - * `bypassDocumentValidation` - * `collMod` - * `createCollection` - * `createIndex` - * `dropCollection` - * `find` - * `insert` - * `remove` - * `update` -* `readAnyDatabase`: Provides same privileges as `read` to all databases except `local` and `config` - * Additional actions on the entire cluster: - * `listDatabases` -* `readWriteAnyDatabase`: Provides the same privileges as `readWrite` to all databases except `local` and `config` - * Additional actions on the entire cluster: - * `listDatabases` -* `userAdminAnyDatabase`: Provides the same privileges as `userAdmin` on all databases except `local` and `config`. - * Additional actions on the entire cluster: - * `authSchemaUpgrade` - * `invalidateUserCache` - * `listDatabases` - * Additional actions on the `system.users` and `system.roles` clusters in the `admin` database: - * `collStats` - * `dbHash` - * `dbStats` - * `find` - * `killCursors` - * `planCacheRead` -* `dbAdminAnyDatabase`: Provides the same privileges as `dbAdmin` for all databases except `local` and `config`. - * Additional actions on the entire cluster: - * `listDatabases` -* `root`: Provides complete access to the entire system. - * Roles this role inherits: - * `readWriteAnyDatabase` - * `dbAdminAnyDatabase` - * `userAdminAnyDatabase` - * `clusterAdmin` - * `restore` - * `backup` - * Additional actions on `system` collections: - * `validate` +- `read`: Provides read access to non-system collections + - Actions: + - `changeStream` + - `collStats` + - `dbHash` + - `dbStats` + - `find` + - `killCursors` + - `listIndexes` + - `listCollections` +- `readWrite`: Provides read and write access to non-system collections + - Actions: + - `collStats` + - `convertToCapped` + - `createCollection` + - `dbHash` + - `dbStats` + - `dropCollection` + - `createIndex` + - `dropIndex` + - `find` + - `insert` + - `killCursors` + - `listIndexes` + - `listCollections` + - `remove` + - `renameCollectionSameDB` + - `update` +- `dbAdmin`: Provides access to administrative tasks at the database level, excluding role and user management + - Actions within the `system.profile` collection: + - `changeStream` + - `collStats` + - `convertToCapped` + - `createCollection` + - `dbHash` + - `dbStats` + - `dropCollection` + - `find` + - `killCursors` + - `listCollections` + - `listIndexes` + - `planCacheRead` + - Actions in non-system collections: + - `bypassDocumentValidation` + - `collMod` + - `collStats` + - `compact` + - `convertToCapped` + - `createCollection` + - `createIndex` + - `dbStats` + - `dropCollection` + - `dropDatabase` + - `dropIndex` + - `enableProfiler` + - `listCollections` + - `listIndexes` + - `planCacheIndexFilter` + - `planCacheRead` + - `planCacheWrite` + - `reIndex` + - `renameCollectionSameDB` + - `storageDetails` + - `validate` +- `userAdmin`: Provides access to create and modify users and roles + - Actions: + - `changeCustomData` + - `changePassword` + - `createRole` + - `createUser` + - `dropRole` + - `dropUser` + - `grantRole` + - `revokeRole` + - `setAuthenticationRestriction` + - `viewRole` + - `viewUser` +- `dbOwner`: Provides administrative access to the database including role and user management + - Roles this role inherits: + - `readWrite` + - `dbAdmin` + - 
`userAdmin` +- `clusterMonitor`: Provides read access to the cluster + - Actions for the whole cluster: + - `checkFreeMonitoringStatus` + - `connPoolStats` + - `getCmdLineOpts` + - `getDefaultRWConcern` + - `getLog` + - `getParameter` + - `getShardMap` + - `hostInfo` + - `inprog` + - `listDatabases` + - `listSessions` + - `listShards` + - `netstat` + - `replSetGetConfig` + - `replSetGetStatus` + - `serverStatus` + - `setFreeMonitoring` + - `shardingState` + - `top` + - Actions for all databases within the cluster: + - `collStats` + - `dbStats` + - `getShardVersion` + - `indexStats` + - `useUUID` + - Actions for all `system.profile` collections: + - `find` + - Actions on the non-system collections in the `config` database: + - `collStats` + - `dbHash` + - `dbStats` + - `find` + - `getShardVersion` + - `indexStats` + - `killCursors` + - `listCollections` + - `listIndexes` + - `planCacheRead` + - Actions on the `system.js` collection in the `config` database: + - `collStats` + - `dbHash` + - `dbStats` + - `find` + - `killCursors` + - `listCollections` + - `listIndexes` + - `planCacheRead` + - Actions on all collections in the `local` database: + - `collStats` + - `dbHash` + - `dbStats` + - `find` + - `getShardVersion` + - `indexStats` + - `killCursors` + - `listCollections` + - `listIndexes` + - `planCacheRead` + - Actions on the `system.js` collection in the `local` database: + - `collStats` + - `dbHash` + - `dbStats` + - `find` + - `killCursors` + - `listCollections` + - `listIndexes` + - `planCacheRead` + - Actions on the `system.replset` and `system.profile` collections in the `local` database: + - `find` +- `clusterManager`: Provides monitoring and management access on the cluster through the `config` and `local` databases + - Actions on the entire cluster: + - `addShard` + - `appendOplogNote` + - `applicationMessage` + - `cleanupOrphaned` + - `flushRouterConfig` + - `getDefaultRWConcern` + - `listSessions` + - `listShards` + - `removeShard` + - `replSetConfigure` + - `replSetGetConfig` + - `replSetGetStatus` + - `replSetStateChange` + - `resync` + - `setDefaultRWConcern` + - `setFeatureCompatibilityVersion` + - `setFreeMonitoring` + - Actions on all databases within the cluster: + - `clearJumboFlag` + - `enableSharding` + - `refineCollectionShardKey` + - `moveChunk` + - `splitChunk` + - `splitVector` + - Actions on non-system collections in the `config` database: + - `collStats` + - `dbHash` + - `dbStats` + - `enableSharding` + - `find` + - `insert` + - `killCursors` + - `listCollections` + - `listIndexes` + - `moveChunk` + - `planCacheRead` + - `remove` + - `splitChunk` + - `splitVector` + - `update` + - Actions on the `system.js` collection in the `config` database: + - `collStats` + - `dbHash` + - `dbStats` + - `find` + - `killCursors` + - `listCollections` + - `listIndexes` + - `planCacheRead` + - Actions on all non-system collections in the `local` database: + - `enableSharding` + - `insert` + - `moveChunk` + - `remove` + - `splitChunk` + - `splitVector` + - `update` + - Actions for the `system.replset` collection in the `local` database: + - `collStats` + - `dbHash` + - `dbStats` + - `find` + - `killCursors` + - `listCollections` + - `listIndexes` + - `planCacheRead` +- `hostManager`: Provides the ability to monitor and manage servers. 
+ - Actions on the entire cluster: + - `applicationMessage` + - `closeAllDatabases` + - `connPoolSync` + - `flushRouterConfig` + - `fsync` + - `invalidateUserCache` + - `killAnyCursor` + - `KillAnySession` + - `killop` + - `logRotate` + - `resync` + - `setParameter` + - `shutdown` + - `touch` + - `unlock` +- `clusterAdmin`: Provides all cluster management access + - Roles this role inherits: + - `clusterManager` + - `clusterMonitor` + - `hostManager` + - Additional actions: + - `dropDatabase` +- `backup`: Provides privileges needed to back up data + - Actions on all resources: + - `listDatabases` + - `listCollections` + - `listIndexes` + - Actions on the entire cluster: + - `appendOplogNote` + - `getParameter` + - `listDatabases` + - `serverStatus` + - Actions on non-system collections, the `system.js` and `system.profile` collections, the `admin.system.users` and `admin.system.roles` collections, and the `config.settings` collection: + - `find` + - Actions on the `config.settings` collection: + - `insert` + - `update` +- `restore`: Provides the privileges to restore data to the cluster + - Actions on the entire cluster: + - `getParameter` + - Actions on non-system collections: + - `bypassDocumentValidation` + - `changeCustomData` + - `changePassword` + - `collMod` + - `convertToCapped` + - `createCollection` + - `createIndex` + - `createRole` + - `createUser` + - `dropCollection` + - `dropRole` + - `dropUser` + - `grantRole` + - `insert` + - `revokeRole` + - `viewRole` + - `viewUser` + - Actions on the `system.js` collection: + - `bypassDocumentValidation` + - `collMod` + - `createCollection` + - `createIndex` + - `dropCollection` + - `insert` + - Actions on any resource + - `listCollections` + - Actions on non-system collections in the `config` and `local` databases: + - `bypassDocumentValidation` + - `collMod` + - `createCollection` + - `createIndex` + - `dropCollection` + - `insert` + - Actions on the `admin.system.version` collection: + - `bypassDocumentValidation` + - `collMod` + - `createCollection` + - `createIndex` + - `dropCollection` + - `find` + - `insert` + - Actions on the `admin.system.roles` collection: + - `createIndex` + - Actions on the `admin.system.users` collection: + - `bypassDocumentValidation` + - `collMod` + - `createCollection` + - `createIndex` + - `dropCollection` + - `find` + - `insert` + - `remove` + - `update` +- `readAnyDatabase`: Provides same privileges as `read` to all databases except `local` and `config` + - Additional actions on the entire cluster: + - `listDatabases` +- `readWriteAnyDatabase`: Provides the same privileges as `readWrite` to all databases except `local` and `config` + - Additional actions on the entire cluster: + - `listDatabases` +- `userAdminAnyDatabase`: Provides the same privileges as `userAdmin` on all databases except `local` and `config`. + - Additional actions on the entire cluster: + - `authSchemaUpgrade` + - `invalidateUserCache` + - `listDatabases` + - Additional actions on the `system.users` and `system.roles` clusters in the `admin` database: + - `collStats` + - `dbHash` + - `dbStats` + - `find` + - `killCursors` + - `planCacheRead` +- `dbAdminAnyDatabase`: Provides the same privileges as `dbAdmin` for all databases except `local` and `config`. + - Additional actions on the entire cluster: + - `listDatabases` +- `root`: Provides complete access to the entire system. 
+ - Roles this role inherits: + - `readWriteAnyDatabase` + - `dbAdminAnyDatabase` + - `userAdminAnyDatabase` + - `clusterAdmin` + - `restore` + - `backup` + - Additional actions on `system` collections: + - `validate`
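To see how these built-in roles are applied in practice, you can attach one of them to an account at creation time. The following is a minimal sketch rather than an example from this guide: the username is a placeholder, and `passwordPrompt()` asks for the password interactively instead of embedding it in the command. It grants the `clusterMonitor` role described in the list above:

```
use admin
db.createUser({
  user: "metrics_reader",                    // hypothetical account name
  pwd: passwordPrompt(),                     // prompt for the password rather than hard-coding it
  roles: [
    { role: "clusterMonitor", db: "admin" }  // built-in role from the list above
  ]
})
```

Any other built-in role can be granted the same way by swapping out the `role` value in the `roles` array.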
## How to enable authorization in MongoDB -Before MongoDB can use authorization to manage user privileges, you must enable the functionality on your server or cluster. To do so, you must log in to your server with `root` or other administrative privileges. +Before MongoDB can use authorization to manage user privileges, you must enable the functionality on your server or cluster. To do so, you must log in to your server with `root` or other administrative privileges. **Note:** Before enabling authorization, double check to make sure that you have access to at least one role with the privileges necessary to manage roles. -Modify the MongoDB server's configuration by opening the `/etc/mongod.conf` file in a text editor as an administrator. This command will open the file using the text editor defined in the `EDITOR` environment variable and fall back to `vi`, which is available on almost all systems: +Modify the MongoDB server's configuration by opening the `/etc/mongod.conf` file in a text editor as an administrator. This command will open the file using the text editor defined in the `EDITOR` environment variable and fall back to `vi`, which is available on almost all systems: ```bash sudo ${EDITOR:-vi} /etc/mongod.conf ``` -The MongoDB configuration file uses [the YAML serialization format](https://en.wikipedia.org/wiki/YAML) to define the configuration. Uncomment or add a `security:` section key to the file. Beneath this key, indent a line using spaces (tabs are not permitted in YAML) and set `authorization` to `enabled`: +The MongoDB configuration file uses [the YAML serialization format](https://en.wikipedia.org/wiki/YAML) to define the configuration. Uncomment or add a `security:` section key to the file. Beneath this key, indent a line using spaces (tabs are not permitted in YAML) and set `authorization` to `enabled`: ```yaml . . . @@ -544,7 +544,7 @@ security: Save and close the file when you are finished. -To enable the new settings, restart your MongoDB server process. If your MongoDB server is running on a Linux host, the operation will look like this: +To enable the new settings, restart your MongoDB server process. If your MongoDB server is running on a Linux host, the operation will look like this: ```bash sudo systemctl restart mongod.service @@ -568,7 +568,7 @@ db.getRoles({ The returned list will include a whole list of nested information about each of the roles and the privileges they have on various resources throughout the system. -To get information about a specific role, use the `db.getRole()` method instead. You must be on the database where the user is defined before executing the command: +To get information about a specific role, use the `db.getRole()` method instead. You must be on the database where the user is defined before executing the command: ``` use admin @@ -599,7 +599,7 @@ db.getUser("root") To grant additional privileges to a user, you must grant them access to an existing role. -The `db.grantRolesToUser()` method allows you to specify additional roles that you want to add to a user. Its first argument is the user you wish to grant additional privileges to and the second argument is an array of additional roles you wish to add: +The `db.grantRolesToUser()` method allows you to specify additional roles that you want to add to a user. 
Its first argument is the user you wish to grant additional privileges to and the second argument is an array of additional roles you wish to add: ``` db.grantRolesToUser( @@ -625,7 +625,7 @@ db.grantRolesToUser( ) ``` -To revoke roles from a user, you can use the companion method called `db.revokeRolesFromUser()`. The argument syntax works exactly the same, but this time, the command removes the roles from the specified account. +To revoke roles from a user, you can use the companion method called `db.revokeRolesFromUser()`. The argument syntax works exactly the same, but this time, the command removes the roles from the specified account. To remove roles defined in the current database you can use the role names without mentioning the database: @@ -653,18 +653,18 @@ db.revokeRolesFromUser( ## Creating and managing custom roles -There are times when the system's built-in roles do not adequately match up with the types of permissions you need to assign. In these cases, you can create your own custom roles. +There are times when the system's built-in roles do not adequately match up with the types of permissions you need to assign. In these cases, you can create your own custom roles. ### Creating new roles -The `db.createRole()` method allows you to define a new role that you can assign privileges and other roles to. You can then grant your new role to users to give them the specific privileges you defined. +The `db.createRole()` method allows you to define a new role that you can assign privileges and other roles to. You can then grant your new role to users to give them the specific privileges you defined. -The basic syntax of the `db.createRole()` method involves passing a document that defines the role's characteristics. The document can have the following fields: +The basic syntax of the `db.createRole()` method involves passing a document that defines the role's characteristics. The document can have the following fields: -* `role`: The name you want to give the role -* `privileges`: An array containing the set of loose privileges you want to assign to the role. Each privilege is defined in a nested document that defines a `resource` document (specifying which resources this privilege applies to) as well as an array of `actions` that are being granted -* `roles`: An array of additional roles that this role should inherit from. The new role will acquire all of the privileges granted to any of the roles listed here. -* `authenticationRestrictions`: An array that specifies any restrictions on authentication for the role. This allows you to deny a role's privileges if the user has not authenticated in a way the role approves of. +- `role`: The name you want to give the role +- `privileges`: An array containing the set of loose privileges you want to assign to the role. Each privilege is defined in a nested document that defines a `resource` document (specifying which resources this privilege applies to) as well as an array of `actions` that are being granted +- `roles`: An array of additional roles that this role should inherit from. The new role will acquire all of the privileges granted to any of the roles listed here. +- `authenticationRestrictions`: An array that specifies any restrictions on authentication for the role. This allows you to deny a role's privileges if the user has not authenticated in a way the role approves of. The first three fields are required for every new role created. 
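To make that structure concrete, here is a minimal sketch of the kind of document you might pass to `db.createRole()`. The role name, database, and action choices below are purely illustrative and are not taken from this guide; the sketch only shows the three required fields in place:

```
use admin
db.createRole({
  role: "reportViewer",                               // hypothetical role name
  privileges: [
    {
      resource: { db: "reporting", collection: "" },  // hypothetical database, all of its collections
      actions: [ "find", "listCollections" ]          // loose privileges granted by this role
    }
  ],
  roles: []                                           // no inherited roles
})
```

Once defined, the role can be granted to users with `db.grantRolesToUser()` in the same way as a built-in role.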
@@ -713,7 +713,7 @@ db.getRole( ### Granting additional privileges to custom roles -To grant additional privileges to an existing user-defined role, you can use the `db.grantPrivilegesToRole()` method. It takes an array of privileges that are defined by documents containing a `resource` document and `actions` array, just as we saw above with `db.createRole()`. +To grant additional privileges to an existing user-defined role, you can use the `db.grantPrivilegesToRole()` method. It takes an array of privileges that are defined by documents containing a `resource` document and `actions` array, just as we saw above with `db.createRole()`. For instance, to add the `listCollections` privilege to the `salesMonitor` role, you could type: @@ -721,7 +721,7 @@ For instance, to add the `listCollections` privilege to the `salesMonitor` role, db.grantPrivilegesToRole( "salesMonitor", [ - { + { resource: { db: "sales", collection: "" }, actions: [ "listCollections" ] } @@ -737,7 +737,7 @@ If you change your mind, you can use the `db.revokePrivilegesFromRole()` method db.revokePrivilegesFromRole( "salesMonitor", [ - { + { resource: { db: "sales", collection: "" }, actions: [ "listCollections" ] } @@ -747,7 +747,7 @@ db.revokePrivilegesFromRole( ### Granting roles to custom roles -To add the privileges defined by a role to another role, you can use the `db.grantRolesToRole()` method. The method takes the role you want to modify and an array of roles you want to add to it as arguments. +To add the privileges defined by a role to another role, you can use the `db.grantRolesToRole()` method. The method takes the role you want to modify and an array of roles you want to add to it as arguments. To specify that you want to use the `read` role for the `salesMonitor` role after all, you can do so by typing: @@ -775,11 +775,11 @@ db.revokeRolesFromRole( ### Replacing the values of a custom role -To redefine the characteristics of a user-defined role, you can use the `db.updateRole()` command. It works by *replacing* the fields it specifies instead of *appending* or *truncating* them. For this reason, it's a good idea to be careful when issuing the command so that you do not accidentally overwrite important information. +To redefine the characteristics of a user-defined role, you can use the `db.updateRole()` command. It works by _replacing_ the fields it specifies instead of _appending_ or _truncating_ them. For this reason, it's a good idea to be careful when issuing the command so that you do not accidentally overwrite important information. -The syntax for the `db.updateRole()` command involves passing the role name as the first argument and a document specifying the field or fields you wish to replace as the second argument. The fields that can be replaced include the `privileges` array, the `roles` array, and the `authenticationRestrictions` array. At least one of these must be included in the document. +The syntax for the `db.updateRole()` command involves passing the role name as the first argument and a document specifying the field or fields you wish to replace as the second argument. The fields that can be replaced include the `privileges` array, the `roles` array, and the `authenticationRestrictions` array. At least one of these must be included in the document. 
-For example, once we've finally decided we want the `salesMonitor` role to use the `read` role on the `sales` database, we may want to redefine the role's privilege and role arrays to clean up any extra privileges that have been left behind by our experimentation. You can do this by updating the role with the new information you want to set: +For example, once we've finally decided we want the `salesMonitor` role to use the `read` role on the `sales` database, we may want to redefine the role's privilege and role arrays to clean up any extra privileges that have been left behind by our experimentation. You can do this by updating the role with the new information you want to set: ``` db.updateRole( @@ -810,13 +810,13 @@ The role will be removed from the system and any privileges granted to users by ## Conclusion -In this article, we covered a lot of ground about how MongoDB implements access control and privilege management. We took a look at the conceptual underpinnings of the system, saw the roles, actions, and resources available for administrators to manage, and then learned about how to use the roles system to configure authorization across the system. +In this article, we covered a lot of ground about how MongoDB implements access control and privilege management. We took a look at the conceptual underpinnings of the system, saw the roles, actions, and resources available for administrators to manage, and then learned about how to use the roles system to configure authorization across the system. -These skills are necessary to provide users with access to the resources they need to complete their required tasks while limiting exposure to unrelated parts of the system. Learning how to define and leverage roles increases your ability to offer fine grained access control on the MongoDB systems you manage. +These skills are necessary to provide users with access to the resources they need to complete their required tasks while limiting exposure to unrelated parts of the system. Learning how to define and leverage roles increases your ability to offer fine grained access control on the MongoDB systems you manage. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -863,7 +863,7 @@ The `root` role in MongoDB provides access to all of the operations and resource
What is Role-Based Access Control (RBAC) in MongoDB? -MongoDB employs [RBAC](https://www.mongodb.com/docs/manual/core/authorization/#role-based-access-control) to govern access to a MongoDB system. RBAC is a security strategy that restricts the operations permitted to a user based on their assigned roles. +MongoDB employs [RBAC](https://www.mongodb.com/docs/manual/core/authorization/#role-based-access-control) to govern access to a MongoDB system. RBAC is a security strategy that restricts the operations permitted to a user based on their assigned roles. MongoDB does not allow operations or access to a database if a user does not have an assigned role. diff --git a/content/08-mongodb/07-creating-dbs-and-collections.mdx b/content/08-mongodb/07-creating-dbs-and-collections.mdx index 1fa79d6d..ae7b8625 100644 --- a/content/08-mongodb/07-creating-dbs-and-collections.mdx +++ b/content/08-mongodb/07-creating-dbs-and-collections.mdx @@ -8,13 +8,13 @@ authors: ['justinellingwood'] ## Introduction -MongoDB uses [document-oriented structures](/intro/database-glossary#document-database) to store, manage, and process data. Individual [documents](/intro/database-glossary#document) are organized into [collections](/intro/database-glossary#collections), which in turn, are stored in databases. Because the schema of each document is not defined by a static schema, document based systems offer more flexibility than relational systems that are composed of tables and records. +MongoDB uses [document-oriented structures](/intro/database-glossary#document-database) to store, manage, and process data. Individual [documents](/intro/database-glossary#document) are organized into [collections](/intro/database-glossary#collections), which in turn, are stored in databases. Because the schema of each document is not defined by a static schema, document based systems offer more flexibility than relational systems that are composed of tables and records. -In this guide, we'll talk about how to create and manage the structures that MongoDB uses to organize data. We'll cover how to create and manage databases and then how to make collections to hold similar or related documents. +In this guide, we'll talk about how to create and manage the structures that MongoDB uses to organize data. We'll cover how to create and manage databases and then how to make collections to hold similar or related documents. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -22,13 +22,14 @@ To get started working with MongoDB and Prisma, checkout our [getting started fr ## How to view existing databases -Before we begin creating new databases, it's helpful to get familiar with some of the methods that MongoDB provides for finding information about existing databases. 
This can help you understand the current state of the system before you begin making changes. +Before we begin creating new databases, it's helpful to get familiar with some of the methods that MongoDB provides for finding information about existing databases. This can help you understand the current state of the system before you begin making changes. To display all of the databases on the system that you have access to, use the `show dbs` method: ``` show dbs ``` + ``` admin 0.000GB config 0.000GB @@ -42,17 +43,19 @@ To see which database you are currently set to operate on, use the `db.getName() ``` db ``` + ``` test ``` -You may find that you are currently using a database that wasn't listed by the `show dbs` command. This is because in MongoDB, until you write the first document to the database, the database is not actually created. So, in the example output above, the shell is prepared to operate on a `test` database, but since it does not exist yet, it will not be returned by the `show dbs` command. +You may find that you are currently using a database that wasn't listed by the `show dbs` command. This is because in MongoDB, until you write the first document to the database, the database is not actually created. So, in the example output above, the shell is prepared to operate on a `test` database, but since it does not exist yet, it will not be returned by the `show dbs` command. To switch to a different database, you can use the `use` command: ``` use admin ``` + ``` switched to db admin ``` @@ -62,6 +65,7 @@ To get some basic information about your current database, you can use the `db.s ``` db.stats() ``` + ``` { "db" : "admin", @@ -86,7 +90,7 @@ The output shows information about the number of collections within the database ## How to create databases -MongoDB does not have an explicit command for creating a new database. Instead, as mentioned earlier, you have to instead indicate to MongoDB that you want to write new documents to a new database. When those documents are created, they will implicitly create the database. +MongoDB does not have an explicit command for creating a new database. Instead, as mentioned earlier, you have to instead indicate to MongoDB that you want to write new documents to a new database. When those documents are created, they will implicitly create the database. To prepare MongoDB to write to a new database, issue the `use` command to switch to a non-existent database. @@ -95,6 +99,7 @@ Here, we will set up MongoDB to create a new database called `playground`: ``` use playground ``` + ``` switched to db playground ``` @@ -104,6 +109,7 @@ If you check your current database, it will confirm that the `playground` databa ``` db ``` + ``` playground ``` @@ -113,6 +119,7 @@ However, as mentioned before, since we have not yet created any documents, the d ``` show dbs ``` + ``` admin 0.000GB config 0.000GB @@ -121,10 +128,9 @@ local 0.000GB To actually create the new database, we will need to create something first. - ## How to view the collections in a database -In MongoDB, *collections* are structures used to group documents together using whatever system of categorization you want to implement. They live inside databases and store documents. +In MongoDB, _collections_ are structures used to group documents together using whatever system of categorization you want to implement. They live inside databases and store documents. You can see the available collections in the database you're currently using by using the `show collections` method. 
@@ -134,6 +140,7 @@ Here, we'll switch to the `admin` database that has some collections available t use admin show collections ``` + ``` system.roles system.users @@ -145,6 +152,7 @@ Alternatively, you can retrieve the same collection names in an array using the ``` db.getCollectionNames() ``` + ``` [ "system.roles", "system.users", "system.version" ] ``` @@ -154,6 +162,7 @@ To show additional information about the collections in the current database, us ``` db.getCollectionInfos() ``` + ``` [ { @@ -213,7 +222,7 @@ db.getCollectionInfos() ] ``` -You can also optionally pass in a document to the command to filter the results. For example, if you are only interested in seeing the information about the `system.version` collection, you could type: +You can also optionally pass in a document to the command to filter the results. For example, if you are only interested in seeing the information about the `system.version` collection, you could type: ``` db.getCollectionInfos( @@ -222,6 +231,7 @@ db.getCollectionInfos( } ) ``` + ``` [ { @@ -245,11 +255,12 @@ db.getCollectionInfos( ] ``` -To check how many documents a collection contains, use the `db..count()` method. For instance, the following command checks how many documents are in the `system.users` collection: +To check how many documents a collection contains, use the `db..count()` method. For instance, the following command checks how many documents are in the `system.users` collection: ``` db.system.users.count() ``` + ``` 2 ``` @@ -266,20 +277,21 @@ The command may output more information than you can easily consume, but contain To create a new collection, there are two options: you can create collections either implicitly or explicitly. -As with databases, MongoDB can automatically create collections the first time a document is written to them. This method tells MongoDB to create a new collection by inserting a document into a collection that does not exist yet. +As with databases, MongoDB can automatically create collections the first time a document is written to them. This method tells MongoDB to create a new collection by inserting a document into a collection that does not exist yet. -For instance, we can change back to the `playground` database that we were interested in earlier. Once we are in that namespace, we can insert a new document into a collection by calling the `insert.()` command on the name we'd like to use for the new collection. Here, we can create a document about a slide in a new collection called `equipment`: +For instance, we can change back to the `playground` database that we were interested in earlier. Once we are in that namespace, we can insert a new document into a collection by calling the `insert.()` command on the name we'd like to use for the new collection. Here, we can create a document about a slide in a new collection called `equipment`: ``` use playground db.equipment.insert({name: "slide"}) ``` + ``` switched to db playground WriteResult({ "nInserted" : 1 }) ``` -The output indicates that one document was written. The above command performed three separate actions. First, MongoDB created the `playground` database that we'd referenced in our `use` command. It also created the `equipment` collection within the database since we call the `insert()` command on that collection name. Finally, it creates the actual document within the `equipment` collection using the input we provided to the `insert()` command. +The output indicates that one document was written. 
The above command performed three separate actions. First, MongoDB created the `playground` database that we'd referenced in our `use` command. It also created the `equipment` collection within the database since we called the `insert()` command on that collection name. Finally, it created the actual document within the `equipment` collection using the input we provided to the `insert()` command. You can verify that all of these actions have been performed with the following commands: @@ -292,13 +304,14 @@ db.equipment.find() The output should show that the `playground` database is now among the listed databases, that the `equipment` collection is listed, that there is one document within the `equipment` collection, and that the document is the `{name: "slide"}` document we inserted in the command. -The other option for creating collections is to explicitly use the `db.createCollection()` method. This allows you to create collections without adding any documents to them. +The other option for creating collections is to explicitly use the `db.createCollection()` method. This allows you to create collections without adding any documents to them. For example, you can create a new collection in the `playground` database called `maintenance.requests` by typing: ``` db.createCollection("maintenance.requests") ``` + ``` { "ok" : 1 } ``` @@ -309,13 +322,14 @@ We can verify that the new collection shows up when we query for it, but that it show collections db.maintenance.requests.count() ``` + ``` equipment maintenance.requests 0 ``` -The `db.createCollection()` method is primarily useful because it allows you to specify various options upon creation. For example, we may want to create a *capped collection*, which is a collection that maintains an upper limit on its allocated size by deleting its oldest documents when it is full. +The `db.createCollection()` method is primarily useful because it allows you to specify various options upon creation. For example, we may want to create a _capped collection_, which is a collection that maintains an upper limit on its allocated size by deleting its oldest documents when it is full. To create a capped collection called `notifications` that can store, at most, 10240 bytes of information, you could call: @@ -328,6 +342,7 @@ db.createCollection( } ) ``` + ``` { "ok" : 1} ``` @@ -337,6 +352,7 @@ This will create a capped `notifications` collection, which we can verify by typi ``` db.getCollectionInfos({"options.capped": true}) ``` + ``` [ { @@ -361,7 +377,6 @@ db.getCollectionInfos({"options.capped": true}) ] ``` - ## How to delete collections To delete a collection, you can use the `drop()` method on the collection itself. @@ -371,6 +386,7 @@ For example, to drop the capped `notifications` collection we created, you can t ``` db.notifications.drop() ``` + ``` true ``` @@ -380,6 +396,7 @@ You can verify that the operation was successful by listing the collections in t ``` show collections ``` + ``` equipment maintenance.requests @@ -387,12 +404,13 @@ maintenance.requests ## How to delete databases -To delete a whole database, call the `db.dropDatabase()` command.
This will delete the current database, so be sure you are on the correct database before executing: ``` use playground db.dropDatabase() ``` + ``` switched to db playground { "dropped" : "playground", "ok" : 1 } @@ -403,28 +421,30 @@ If you check the list of available databases, `playground` is no longer displaye ``` show dbs ``` + ``` admin 0.000GB config 0.000GB local 0.000GB ``` -Since we haven't switched to a new database yet, MongoDB is still set up to create a `playground` database should we choose to add a new collection or document. You can verify this with the `db` command: +Since we haven't switched to a new database yet, MongoDB is still set up to create a `playground` database should we choose to add a new collection or document. You can verify this with the `db` command: ``` db ``` + ``` playground ``` ## Conclusion -Creating and managing databases and collections is an important skill when using MongoDB. These basic organizational tools allow you to group related documents together, query subsets of information, and set up authorization policies for different types of data. Getting familiar with how to effectively manage these structures will allow you to manage your data more effectively with fewer surprises. +Creating and managing databases and collections is an important skill when using MongoDB. These basic organizational tools allow you to group related documents together, query subsets of information, and set up authorization policies for different types of data. Getting familiar with how to effectively manage these structures will allow you to manage your data more effectively with fewer surprises. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -448,11 +468,12 @@ To drop a database, the basic syntax looks like: use playground db.dropDatabase() ``` +
How do you rename a collection in MongoDB? -To rename a collection in MongoDB, you can use the [`renameCollection()`](https://docs.mongodb.com/manual/reference/method/db.collection.renameCollection/) method. +To rename a collection in MongoDB, you can use the [`renameCollection()`](https://docs.mongodb.com/manual/reference/method/db.collection.renameCollection/) method. The basic syntax would look like: @@ -480,6 +501,7 @@ Alternatively, you can also use the `db.getCollectionNames()` method to get the ``` db.getCollectionNames() ``` + ``` [ "system.roles", "system.users", "system.version" ] ``` diff --git a/content/08-mongodb/08-managing-documents.mdx b/content/08-mongodb/08-managing-documents.mdx index 682921d9..616c6e95 100644 --- a/content/08-mongodb/08-managing-documents.mdx +++ b/content/08-mongodb/08-managing-documents.mdx @@ -1,20 +1,20 @@ --- title: 'How to manage documents in MongoDB' -metaTitle: "MongoDB Documents - How to Delete, Update, Query, and More" -metaDescription: "Read on to learn how to create and manage documents within MongoDB, including how to delete, update, query, and more." +metaTitle: 'MongoDB Documents - How to Delete, Update, Query, and More' +metaDescription: 'Read on to learn how to create and manage documents within MongoDB, including how to delete, update, query, and more.' metaImage: '/social/generic-mongodb.png' authors: ['justinellingwood'] --- ## Introduction -When using [MongoDB](/intro/database-glossary#mongodb), you'll spend most of your time managing [documents](/intro/database-glossary#document) in some way or other. Whether you are creating new documents and adding them to collections, retrieving documents, updating data, or pruning stale items, documents are at the center of the MongoDB model. +When using [MongoDB](/intro/database-glossary#mongodb), you'll spend most of your time managing [documents](/intro/database-glossary#document) in some way or other. Whether you are creating new documents and adding them to collections, retrieving documents, updating data, or pruning stale items, documents are at the center of the MongoDB model. In this guide, we'll cover what MongoDB documents are and then cover the common operations you will likely need to know about to manage a [document-centered environment](/intro/database-glossary#document-database). -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -22,11 +22,11 @@ To get started working with MongoDB and Prisma, checkout our [getting started fr ## What are MongoDB documents? -In MongoDB, all data within databases and collections are stored in documents. 
Since collections do not specify a required [schema](/intro/database-glossary#schema) by default, documents within a collection can contain an arbitrarily complex structure and need not match the format used by sibling documents. This provides incredible flexibility and allows schema to develop organically as application requirements change. +In MongoDB, all data within databases and collections are stored in documents. Since collections do not specify a required [schema](/intro/database-glossary#schema) by default, documents within a collection can contain an arbitrarily complex structure and need not match the format used by sibling documents. This provides incredible flexibility and allows schema to develop organically as application requirements change. -MongoDB documents themselves use the [BSON data serialization format](https://bsonspec.org/), a binary representation of [the JSON JavaScript Object Notation](https://www.json.org/json-en.html). This provides an organized structure with defined data types that can be queried and operated upon programmatically. +MongoDB documents themselves use the [BSON data serialization format](https://bsonspec.org/), a binary representation of [the JSON JavaScript Object Notation](https://www.json.org/json-en.html). This provides an organized structure with defined data types that can be queried and operated upon programmatically. -BSON documents are represented by a pair of curly braces (`{}`) which contain key-value pairs. In BSON, these data couplets are known as the *field* and its *value*. The field comes first and is represented by a string. The value can be any valid [BSON data type](https://docs.mongodb.com/manual/reference/bson-types/). A colon (`:`) separates the field from its value. A comma is used to separate each field and value pair from one another. +BSON documents are represented by a pair of curly braces (`{}`) which contain key-value pairs. In BSON, these data couplets are known as the _field_ and its _value_. The field comes first and is represented by a string. The value can be any valid [BSON data type](https://docs.mongodb.com/manual/reference/bson-types/). A colon (`:`) separates the field from its value. A comma is used to separate each field and value pair from one another. As an example, here is a valid BSON document that MongoDB can understand: @@ -50,31 +50,31 @@ As an example, here is a valid BSON document that MongoDB can understand: Here, we can see quite a few types: -* `_id` is an integer -* `vehicle_type` and `color` are strings -* `mileage` is a float -* `markets` is an array of strings -* `options` contains a nested document with values consisting of a string, an integer, and a boolean +- `_id` is an integer +- `vehicle_type` and `color` are strings +- `mileage` is a float +- `markets` is an array of strings +- `options` contains a nested document with values consisting of a string, an integer, and a boolean -Due to this flexibility, documents are a fairly flexible medium for storing data. New fields can be added easily, documents can be embedded within one another, and the structural complexity exactly matches the data being stored. +Due to this flexibility, documents are a fairly flexible medium for storing data. New fields can be added easily, documents can be embedded within one another, and the structural complexity exactly matches the data being stored. ## How to create new documents -To create a new document, change to a database where you want to store the created document. 
We'll use a `school` database for demonstration purposes in this article: +To create a new document, change to a database where you want to store the created document. We'll use a `school` database for demonstration purposes in this article: ``` use school ``` -You'll also want to choose the collection where you want to insert the documents. As with databases, you do not have to explicitly create the collection where you want to insert the document. MongoDB will automatically create it when the first data is written. For this example, we'll use a collection called `students`. +You'll also want to choose the collection where you want to insert the documents. As with databases, you do not have to explicitly create the collection where you want to insert the document. MongoDB will automatically create it when the first data is written. For this example, we'll use a collection called `students`. Now that you know where the document will be stored, you can insert a new document using one of the following methods. -### Using the `insert()` method +### Using the `insert()` method The `insert()` method allows you to insert one or more documents into the collection it is called on. -To insert a single document, pass the document to the method by calling it on the collection. Here, we insert a new document for a student named Ashley: +To insert a single document, pass the document to the method by calling it on the collection. Here, we insert a new document for a student named Ashley: ``` db.students.insert( @@ -86,11 +86,12 @@ db.students.insert( } ) ``` + ``` WriteResult({ "nInserted" : 1 }) ``` -If you want to insert more than one document at the same time, instead of passing a document to `insert()`, pass an array of documents. We can add two new documents for students named Brian and Leah: +If you want to insert more than one document at the same time, instead of passing a document to `insert()`, pass an array of documents. We can add two new documents for students named Brian and Leah: ``` db.students.insert( @@ -109,6 +110,7 @@ db.students.insert( ] ) ``` + ``` BulkWriteResult({ "writeErrors" : [ ], @@ -126,11 +128,11 @@ Since we performed a bulk [write operation](/intro/database-glossary#write-opera While the `insert()` method is flexible, it has been deprecated in many MongoDB drivers in favor of the following two methods. -### Using the `insertOne()` method +### Using the `insertOne()` method -The `insertOne()` method can be used to insert a single document. Unlike the `insert()` method, it can only insert one document at a time, which makes its behavior a bit more predictable. +The `insertOne()` method can be used to insert a single document. Unlike the `insert()` method, it can only insert one document at a time, which makes its behavior a bit more predictable. -The syntax is the same as when you use `insert()` to add a single document. We can add another student named Naomi: +The syntax is the same as when you use `insert()` to add a single document. We can add another student named Naomi: ``` db.students.insertOne( @@ -140,6 +142,7 @@ db.students.insertOne( } ) ``` + ``` { "acknowledged" : true, @@ -147,11 +150,11 @@ db.students.insertOne( } ``` -Unlike with `insert()`, the `insertOne()` method returns a document containing some additional useful information. It confirms that the write was acknowledged by the cluster and it includes the object ID that was assigned to the document since we did not provide one. 
+Unlike with `insert()`, the `insertOne()` method returns a document containing some additional useful information. It confirms that the write was acknowledged by the cluster and it includes the object ID that was assigned to the document since we did not provide one. -### Using the `insertMany()` method +### Using the `insertMany()` method -To cover scenarios where you want to insert multiple documents at once, the `insertMany()` method is now recommended. Just as when inserting multiple documents with `insert()`, `insertMany()` takes an array of documents. +To cover scenarios where you want to insert multiple documents at once, the `insertMany()` method is now recommended. Just as when inserting multiple documents with `insert()`, `insertMany()` takes an array of documents. We can add three new students named Jasmine, Michael, and Toni: @@ -176,6 +179,7 @@ db.students.insertMany( ] ) ``` + ``` { "acknowledged" : true, @@ -191,15 +195,16 @@ As with `insertOne()`, `insertMany()` returns a document which acknowledges the ## How to query for existing documents -Querying documents is a fairly expansive topic that warrants its own article. You can find details about how to formulate queries to retrieve different types of documents in our guide on [querying data within MongoDB](/mongodb/querying-documents). +Querying documents is a fairly expansive topic that warrants its own article. You can find details about how to formulate queries to retrieve different types of documents in our guide on [querying data within MongoDB](/mongodb/querying-documents). -While the details are best left in the article linked above, we can at least cover the methods that MongoDB provides to query documents. The main way to fetch documents from MongoDB is by calling the `find()` method on the collection in question. +While the details are best left in the article linked above, we can at least cover the methods that MongoDB provides to query documents. The main way to fetch documents from MongoDB is by calling the `find()` method on the collection in question. For instance, to collect all of the documents from the `students`, you can call `find()` with no arguments: ``` db.students.find() ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } @@ -215,6 +220,7 @@ To make the output more readable, you can also chain the `pretty()` method after ``` db..find().pretty() ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), @@ -261,7 +267,7 @@ db..find().pretty() } ``` -You can see that an `_id` field has been added to each of the documents. MongoDB requires a unique `_id` for each document in a collection. If you do not provide one upon object creation, it will add one for you. You can use this ID to retrieve a single object reliably: +You can see that an `_id` field has been added to each of the documents. MongoDB requires a unique `_id` for each document in a collection. If you do not provide one upon object creation, it will add one for you. 
You can use this ID to retrieve a single object reliably: ``` db.students.find( @@ -270,6 +276,7 @@ db.students.find( } ) ``` + ``` { "_id" : ObjectId("60e8792d4655cbf49ff7cb89"), "first_name" : "Toni", "last_name" : "Fowler" } ``` @@ -278,13 +285,13 @@ You can find out more about various ways of querying data with the article linke ## How to update existing documents -Many or most use cases for databases require you to be able to modify existing data within the database. A field might need to be updated to reflect a new value or you may need to append additional information to an existing document as it becomes available. +Many or most use cases for databases require you to be able to modify existing data within the database. A field might need to be updated to reflect a new value or you may need to append additional information to an existing document as it becomes available. MongoDB uses a few related methods to update existing documents: -* `updateOne()`: Updates a single document within a collection based on the provided filter. -* `updateMany()`: Updates multiple documents within a collection that match the provided filter. -* `replaceOne()`: Replaces an entire document in a collection based on the provided filter. +- `updateOne()`: Updates a single document within a collection based on the provided filter. +- `updateMany()`: Updates multiple documents within a collection that match the provided filter. +- `replaceOne()`: Replaces an entire document in a collection based on the provided filter. We will cover how to use each of these varieties to perform different types of updates. @@ -292,63 +299,63 @@ We will cover how to use each of these varieties to perform different types of u Before we take a look at each of the methods to update documents, we should go over some of the update operators that are available. -* `$currentDate`: Sets a field's value to the current date, either as a date or timestamp type. - * Syntax: `{ $currentDate: { : , ... } }` -* `$inc`: Increments a field's value by a set amount. - * Syntax: `{ $inc: { : , ... } }` -* `$min`: Updates a field's value if the specified value is less than the current value. - * Syntax: `{ $min: { : , ... } }` -* `$max`: Updates a field's value if the specified value is more than the current value. - * Syntax: `{ $max: { : , ... } }` -* `$mul`: Updates a field's value by multiplying it by the given number. - * Syntax: `{ $mul: { : , ... } }` -* `$rename`: Renames a field name to a new identifier. - * Syntax: `{ $rename: { : , ... } }` -* `$set`: Replaces the value of a field with the given value. - * Syntax: `{ $set: { : value, ... } }` -* `$setOnInsert`: During upsert operations, sets the value of a field if a new document is being created and does nothing otherwise. - * Syntax: `{ $setOnInsert: { : , ... } }` -* `$unset`: Removes a field from the document. - * Syntax: `{ $unset: { : "", ... } }` -* `$`: A placeholder for the first array element that satisfies the query. - * Syntax: `{ : {.$: } }` -* `$[]`: A placeholder for all array elements that satisfy the query. - * Syntax: `{ : { .$[]: } }` -* `$addToSet`: Adds values to the array unless they're already present. - * Syntax: `{ $addToSet: { : , ... } }` -* `$pop`: Removes the first or last element of an array. - * Syntax: `{ $pop: { : (-1 or 1), ... } }` -* `$pull`: Removes all elements of an array that match a condition. - * Syntax: `{ $pull: { : , ... } }` -* `$push`: Appends a value to an array. - * Syntax: `{ $push: { : , ... 
} }` -* `$pullAll`: Removes all of the specified elements from an array. - * Syntax: `{ $pullAll: { : [ , ... ], ...} }` -* `$each`: Modifies `$addToSet` and `$push` operators so that they add each element of an array instead of an array as a single element. - * Syntax: `{ : { : { $each: [ , ... ] }, ... } }` -* `$position`: Used with `$each` and specifies the position the `$push` operator should insert at. - * Syntax: `{ $push: { : { $each: [ , ... ], $position: } } }` -* `$slice`: Used with `$each` and `$push` to limit the number of total elements in the array. - * Syntax: `{ $push: { : { $each: [ , ... ], $slice: } } }` -* `$sort`: Used with `$each` and `$push` to sort array elements. - * Syntax: `{ $push: { : { $each: [ , ... ], $sort: } } }` +- `$currentDate`: Sets a field's value to the current date, either as a date or timestamp type. + - Syntax: `{ $currentDate: { : , ... } }` +- `$inc`: Increments a field's value by a set amount. + - Syntax: `{ $inc: { : , ... } }` +- `$min`: Updates a field's value if the specified value is less than the current value. + - Syntax: `{ $min: { : , ... } }` +- `$max`: Updates a field's value if the specified value is more than the current value. + - Syntax: `{ $max: { : , ... } }` +- `$mul`: Updates a field's value by multiplying it by the given number. + - Syntax: `{ $mul: { : , ... } }` +- `$rename`: Renames a field name to a new identifier. + - Syntax: `{ $rename: { : , ... } }` +- `$set`: Replaces the value of a field with the given value. + - Syntax: `{ $set: { : value, ... } }` +- `$setOnInsert`: During upsert operations, sets the value of a field if a new document is being created and does nothing otherwise. + - Syntax: `{ $setOnInsert: { : , ... } }` +- `$unset`: Removes a field from the document. + - Syntax: `{ $unset: { : "", ... } }` +- `$`: A placeholder for the first array element that satisfies the query. + - Syntax: `{ : {.$: } }` +- `$[]`: A placeholder for all array elements that satisfy the query. + - Syntax: `{ : { .$[]: } }` +- `$addToSet`: Adds values to the array unless they're already present. + - Syntax: `{ $addToSet: { : , ... } }` +- `$pop`: Removes the first or last element of an array. + - Syntax: `{ $pop: { : (-1 or 1), ... } }` +- `$pull`: Removes all elements of an array that match a condition. + - Syntax: `{ $pull: { : , ... } }` +- `$push`: Appends a value to an array. + - Syntax: `{ $push: { : , ... } }` +- `$pullAll`: Removes all of the specified elements from an array. + - Syntax: `{ $pullAll: { : [ , ... ], ...} }` +- `$each`: Modifies `$addToSet` and `$push` operators so that they add each element of an array instead of an array as a single element. + - Syntax: `{ : { : { $each: [ , ... ] }, ... } }` +- `$position`: Used with `$each` and specifies the position the `$push` operator should insert at. + - Syntax: `{ $push: { : { $each: [ , ... ], $position: } } }` +- `$slice`: Used with `$each` and `$push` to limit the number of total elements in the array. + - Syntax: `{ $push: { : { $each: [ , ... ], $slice: } } }` +- `$sort`: Used with `$each` and `$push` to sort array elements. + - Syntax: `{ $push: { : { $each: [ , ... ], $sort: } } }` These various update operators allow you to update various fields of your documents in different ways. ### Updating a single document in a collection -MongoDB's `updateOne()` method is used to update a single document within a collection. The method takes two required arguments as well as a document specifying optional arguments. 
+MongoDB's `updateOne()` method is used to update a single document within a collection. The method takes two required arguments as well as a document specifying optional arguments. -The first argument is a document that specifies the filter conditions that will be used to select documents. Since the `updateOne()` method modifies at most one document in a collection, the first document that satisfies the filter conditions will be used. +The first argument is a document that specifies the filter conditions that will be used to select documents. Since the `updateOne()` method modifies at most one document in a collection, the first document that satisfies the filter conditions will be used. -The second argument specifies the update operation that should be executed. The update operations given above can be specified here to alter the contents of the matched document. +The second argument specifies the update operation that should be executed. The update operations given above can be specified here to alter the contents of the matched document. -The third argument is a document of various options to modify the behavior of the method. The most important potential values are: +The third argument is a document of various options to modify the behavior of the method. The most important potential values are: -* `upsert`: Turns the operation into an upsert procedure by inserting a new document if the filter does not match any existent documents. -* `collation`: A document that defines language-specific rules that should apply for the operation. +- `upsert`: Turns the operation into an upsert procedure by inserting a new document if the filter does not match any existent documents. +- `collation`: A document that defines language-specific rules that should apply for the operation. -As an example, we can update a single student record that we filter by the `_id` field to ensure that we target the correct document. We can set the `grade_level` to a new value: +As an example, we can update a single student record that we filter by the `_id` field to ensure that we target the correct document. We can set the `grade_level` to a new value: ``` db.students.updateOne( @@ -356,6 +363,7 @@ db.students.updateOne( { $set: { grade_level: 3 } } ) ``` + ``` { "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 } ``` @@ -374,6 +382,7 @@ db.teachers.updateMany( { $set: { "subjects.$": "writing" } } ) ``` + ``` { "acknowledged" : true, "matchedCount" : 3, "modifiedCount" : 3 } ``` @@ -383,6 +392,7 @@ If you check the documents, each instance of "composition" should have been repl ``` db.teachers.find() ``` + ``` { "_id" : ObjectId("60eddca65eb74f5c676f3baa"), "first_name" : "Nancy", "last_name" : "Smith", "subjects" : [ "vocabulary", "pronunciation" ] } { "_id" : ObjectId("60eddca65eb74f5c676f3bab"), "first_name" : "Ronald", "last_name" : "Taft", "subjects" : [ "literature", "grammar", "writing" ] } @@ -393,14 +403,14 @@ db.teachers.find() ### Replacing a document -The `replaceOne()` method works similar to the `updateOne()` method, but replaces the entire document instead of updating individual fields. The syntax is the same as the previous two commands. +The `replaceOne()` method works similar to the `updateOne()` method, but replaces the entire document instead of updating individual fields. The syntax is the same as the previous two commands. 
For instance, if Nancy Smith leaves your school and you replace her with a teacher named Clara Newman who teaches literature, you could type the following: ``` db.teachers.replaceOne( { - $and: [ + $and: [ { first_name: "Nancy" }, { last_name: "Smith" } ] @@ -412,6 +422,7 @@ db.teachers.replaceOne( } ) ``` + ``` { "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 } ``` @@ -421,6 +432,7 @@ You can see that the matched document has been removed and that the specified do ``` db.teachers.find() ``` + ``` { "_id" : ObjectId("60eddca65eb74f5c676f3baa"), "first_name" : "Clara", "last_name" : "Newman", "subjects" : [ "literature" ] } { "_id" : ObjectId("60eddca65eb74f5c676f3bab"), "first_name" : "Ronald", "last_name" : "Taft", "subjects" : [ "literature", "grammar", "writing" ] } @@ -431,9 +443,9 @@ db.teachers.find() ## How to delete documents -Removing documents from collections is also part of the document life cycle. To remove a document, you can use the `deleteOne()` or `deleteMany()` methods. They have the same syntax, and differ only in how many documents they operate on. +Removing documents from collections is also part of the document life cycle. To remove a document, you can use the `deleteOne()` or `deleteMany()` methods. They have the same syntax, and differ only in how many documents they operate on. -For the most part, all you have to do to delete documents with either of these methods is to provide it with a filter document that specifies how you wish to select the document to be deleted. The `deleteOne()` method will delete at most one document (regardless of how many matches the filter produces) while the `deleteMany()` method deletes every document that matches the filter conditions. +For the most part, all you have to do to delete documents with either of these methods is to provide it with a filter document that specifies how you wish to select the document to be deleted. The `deleteOne()` method will delete at most one document (regardless of how many matches the filter produces) while the `deleteMany()` method deletes every document that matches the filter conditions. For example, to delete a single student, you can provide an `_id` to match them explicitly: @@ -442,6 +454,7 @@ db.students.deleteOne({ _id: ObjectId("60e8792d4655cbf49ff7cb87") }) ``` + ``` { "acknowledged" : true, "deletedCount" : 1 } ``` @@ -453,6 +466,7 @@ db.students.deleteMany({ grade_level: { $eq: null } }) ``` + ``` { "acknowledged" : true, "deletedCount" : 2 } ``` @@ -462,6 +476,7 @@ If we check, we should see that all of the remaining students have a grade level ``` db.students.find() ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } @@ -471,11 +486,11 @@ db.students.find() ## Conclusion -Learning how to create, query for, update, and remove documents gives you the skills you need to effectively manage documents within MongoDB on a daily basis. Becoming familiar with the various document and collection methods and the operators that allow you to match and modify information lets you express complex thoughts that the database system can understand. +Learning how to create, query for, update, and remove documents gives you the skills you need to effectively manage documents within MongoDB on a daily basis. 
Becoming familiar with the various document and collection methods and the operators that allow you to match and modify information lets you express complex thoughts that the database system can understand.

-If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence.
+If you're using MongoDB, check out Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence.

To get started working with MongoDB and Prisma, check out our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb).

@@ -485,14 +500,14 @@ To get started working with MongoDB and Prisma, checkout our [getting started fr
What is an embedded document in MongoDB? -An [embedded, or nested, document](https://www.mongodb.com/basics/embedded-mongodb) in MongoDB is a document which contains another document inside of it. +An [embedded, or nested, document](https://www.mongodb.com/basics/embedded-mongodb) in MongoDB is a document which contains another document inside of it. The following is an example of an embedded document where `address` —denoted as a subdocument by additional curly brackets— can be accessed with the `user` record. ``` db.user.findOne({_id: 111111}) - -{ + +{ _id: 111111, email: “email@example.com”, name: {given: “Jane”, family: “Han”}, @@ -529,6 +544,7 @@ db.students.deleteOne({ _id: ObjectId("60e8792d4655cbf49ff7cb87") }) ``` + And to delete many documents matching a certain criteria, the syntax looks similarly: ``` @@ -553,8 +569,8 @@ There is not a specific method to explicitly compare one document to another in Comparison can also be done by configuring an [aggregation pipeline](https://docs.mongodb.com/manual/aggregation/). This method allows you to create stages that: -* group values from multiple documents together -* perform operations on the grouped data to return a single result -* analyze data changes over time +- group values from multiple documents together +- perform operations on the grouped data to return a single result +- analyze data changes over time
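As a minimal sketch of that idea, reusing the `students` collection from the guide above (the choice of stages here is illustrative, not prescriptive), you could group documents by `grade_level` and count how many fall into each bucket:

```
db.students.aggregate([
  // stage 1: group documents that share a grade_level value
  { $group: { _id: "$grade_level", count: { $sum: 1 } } },
  // stage 2: order the grouped results for easier comparison
  { $sort: { _id: 1 } }
])
```

Each stage's output feeds the next, so further stages such as `$match` can be prepended to narrow which documents are compared in the first place.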
diff --git a/content/08-mongodb/09-querying-documents.mdx b/content/08-mongodb/09-querying-documents.mdx index e9cfc31b..f23aadaf 100644 --- a/content/08-mongodb/09-querying-documents.mdx +++ b/content/08-mongodb/09-querying-documents.mdx @@ -8,13 +8,13 @@ authors: ['justinellingwood'] ## Introduction -Querying [documents](/intro/database-glossary#document) is an essential skill necessary to do many different operations within [MongoDB](/intro/database-glossary#mongodb). You need to be able to query to effectively retrieve the documents you need, to update existing information within your databases, and to understand commonalities and differences between your documents. +Querying [documents](/intro/database-glossary#document) is an essential skill necessary to do many different operations within [MongoDB](/intro/database-glossary#mongodb). You need to be able to query to effectively retrieve the documents you need, to update existing information within your databases, and to understand commonalities and differences between your documents. -In this guide, we'll cover the basics of how to compose queries for MongoDB to help you retrieve documents according to your requirements. We will show you how queries work on a general level, then we will explore various operators that MongoDB provides to help you narrow down results by evaluating your conditions. +In this guide, we'll cover the basics of how to compose queries for MongoDB to help you retrieve documents according to your requirements. We will show you how queries work on a general level, then we will explore various operators that MongoDB provides to help you narrow down results by evaluating your conditions. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -121,13 +121,14 @@ db.teachers.insertMany([ ## Basic querying syntax -Now that you have two collections with documents in them, you can experiment with how to retrieve individual documents or groups of documents. The main way to fetch documents from MongoDB is by calling the `find()` method on the collection in question. +Now that you have two collections with documents in them, you can experiment with how to retrieve individual documents or groups of documents. The main way to fetch documents from MongoDB is by calling the `find()` method on the collection in question. 
For instance, to collect all of the documents from the `students` collection, you can call `find()` with no arguments: ``` db.students.find() ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } @@ -143,6 +144,7 @@ To make the output more readable, you can also chain the `pretty()` method after ``` db..find().pretty() ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), @@ -189,7 +191,7 @@ db..find().pretty() } ``` -You can see that an `_id` field has been added to each of the documents. MongoDB requires a unique `_id` for each document in a collection. If you do not provide one upon object creation, it will add one for you. You can use this ID to retrieve a single object reliably: +You can see that an `_id` field has been added to each of the documents. MongoDB requires a unique `_id` for each document in a collection. If you do not provide one upon object creation, it will add one for you. You can use this ID to retrieve a single object reliably: ``` db.student.find( @@ -198,6 +200,7 @@ db.student.find( } ) ``` + ``` { "_id" : ObjectId("60e8792d4655cbf49ff7cb89"), "first_name" : "Toni", "last_name" : "Fowler" } ``` @@ -211,11 +214,12 @@ For instance, you can get a list of the students named "Brian" with the followin ``` db.students.find({first_name: "Brian"}) ``` + ``` { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } ``` -Any qualities that you specify using the field-value notation will be interpreted as an equality query. If you give multiple fields, all of the values must be equal for a document to match. +Any qualities that you specify using the field-value notation will be interpreted as an equality query. If you give multiple fields, all of the values must be equal for a document to match. For instance, if we perform the same equality match as before, but include `grade_level` as 3, no documents will be returned: @@ -225,9 +229,9 @@ db.students.find({first_name: "Brian", grade_level: 3}) ## Filtering using comparison operators -While the simple equality filtering is useful, it is fairly limited in what it can express. For other types of comparisons, MongoDB provides various comparison operators so that you can query in other ways. +While the simple equality filtering is useful, it is fairly limited in what it can express. For other types of comparisons, MongoDB provides various comparison operators so that you can query in other ways. -The basic function of the available comparison operators is likely fairly familiar if you work with other programming languages. Most operators work by passing an object to the field name that contains the operator and the value you want to compare against, like this: +The basic function of the available comparison operators is likely fairly familiar if you work with other programming languages. Most operators work by passing an object to the field name that contains the operator and the value you want to compare against, like this: ``` : { : } @@ -235,7 +239,7 @@ The basic function of the available comparison operators is likely fairly famili ### Equal to -The `$eq` operator checks for equality between the value provided and the field values in the documents. 
In most cases, this has the same functionality as the equality comparisons we used above. +The `$eq` operator checks for equality between the value provided and the field values in the documents. In most cases, this has the same functionality as the equality comparisons we used above. For example, we can express the same query for students named "Brian" by typing: @@ -244,13 +248,14 @@ db.students.find({ first_name: { $eq: "Brian" } }) ``` + ``` { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } ``` ### Not equal to -You can also query for documents that are *not* equal to a provided value. The operator for this is `$ne`. +You can also query for documents that are _not_ equal to a provided value. The operator for this is `$ne`. For instance, one way to find all students who have a `grade_level` set is to search for entries where the field is not set to `null`: @@ -259,6 +264,7 @@ db.students.find({ grade_level: { $ne: null } }) ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } @@ -276,6 +282,7 @@ db.students.find({ grade_level: { $gt: 6 } }) ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } ``` @@ -291,6 +298,7 @@ db.students.find({ grade_level: { $gte: 6 } }) ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } { "_id" : ObjectId("60e8792d4655cbf49ff7cb88"), "first_name" : "Michael", "last_name" : "Rodgers", "dob" : ISODate("2008-02-25T00:00:00Z"), "grade_level" : 6 } @@ -307,6 +315,7 @@ db.students.find({ dob: { $lt: new Date("January 1, 2010") } }) ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } { "_id" : ObjectId("60e875d54655cbf49ff7cb85"), "first_name" : "Leah", "last_name" : "Drake", "dob" : ISODate("2009-10-03T00:00:00Z") } @@ -324,6 +333,7 @@ db.students.find({ grade_level: { $lte: 6 } }) ``` + ``` { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } { "_id" : ObjectId("60e8792d4655cbf49ff7cb88"), "first_name" : "Michael", "last_name" : "Rodgers", "dob" : ISODate("2008-02-25T00:00:00Z"), "grade_level" : 6 } @@ -331,9 +341,9 @@ db.students.find({ ### Match any of a group of values -The `$in` operator works like the `$eq` equality operator, but allows you to provide multiple possible values in an array. For instance, instead of checking whether a field value is equal to 8, it can check whether the value is any of `[8, 9, 10, 11]`. +The `$in` operator works like the `$eq` equality operator, but allows you to provide multiple possible values in an array. For instance, instead of checking whether a field value is equal to 8, it can check whether the value is any of `[8, 9, 10, 11]`. -The `$in` operator also works with regular expressions. 
For example, we can find all students whose first name ends with either an 'i' or an 'e' by typing: +The `$in` operator also works with regular expressions. For example, we can find all students whose first name ends with either an 'i' or an 'e' by typing: ``` db.students.find({ @@ -345,6 +355,7 @@ db.students.find({ } }) ``` + ``` { "_id" : ObjectId("60e877914655cbf49ff7cb86"), "first_name" : "Naomi", "last_name" : "Pyani" } { "_id" : ObjectId("60e8792d4655cbf49ff7cb87"), "first_name" : "Jasmine", "last_name" : "Took", "dob" : ISODate("2011-04-11T00:00:00Z") } @@ -353,9 +364,9 @@ db.students.find({ ### Match none of a group of values -The inverse of the above procedure is to find all documents that have values not in a given array. The operator for that is `$nin`. +The inverse of the above procedure is to find all documents that have values not in a given array. The operator for that is `$nin`. -For instance, we can find all students who have first names that *don't* end in 'i' or 'e' by typing: +For instance, we can find all students who have first names that _don't_ end in 'i' or 'e' by typing: ``` db.students.find({ @@ -367,6 +378,7 @@ db.students.find({ } }) ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } @@ -376,11 +388,11 @@ db.students.find({ ## Filtering using logical operators -To form more complex queries, you can compose multiple conditions using logical operators. Logical operators work by passing them an object of an expression or an array containing multiple objects of expressions. +To form more complex queries, you can compose multiple conditions using logical operators. Logical operators work by passing them an object of an expression or an array containing multiple objects of expressions. ### The logical AND operator -The `$and` operator will return results that satisfy all of the expressions that have been passed to it. Every expression within the `$and` expression must evaluate to true in order to be returned. +The `$and` operator will return results that satisfy all of the expressions that have been passed to it. Every expression within the `$and` expression must evaluate to true in order to be returned. For example, you can use `$and` to query for students that have both a birth date and a grade level set: @@ -392,6 +404,7 @@ db.students.find({ ] }) ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } @@ -400,7 +413,7 @@ db.students.find({ ### The logical OR operator -The `$or` operator performs a logical OR calculation. If *any* of the expressions that are passed to it are true, the entire clause is considered satisfied. +The `$or` operator performs a logical OR calculation. If _any_ of the expressions that are passed to it are true, the entire clause is considered satisfied. 
You can use this, for example, to query students who are missing either of the fields we queried for above: @@ -412,6 +425,7 @@ db.students.find({ ] }) ``` + ``` { "_id" : ObjectId("60e875d54655cbf49ff7cb85"), "first_name" : "Leah", "last_name" : "Drake", "dob" : ISODate("2009-10-03T00:00:00Z") } { "_id" : ObjectId("60e877914655cbf49ff7cb86"), "first_name" : "Naomi", "last_name" : "Pyani" } @@ -421,21 +435,22 @@ db.students.find({ ### The logical NOT operator -The `$not` operator negates the value of the expression that is passed to it. Instead of operating on an array of expressions, since `$not` is a unary operator, it operates on a single single defining an operator expression directly. +The `$not` operator negates the value of the expression that is passed to it. Instead of operating on an array of expressions, since `$not` is a unary operator, it operates on a single single defining an operator expression directly. -This leads to a slightly different syntax than the previous operators. Instead of wrapping a full field and value expression, you use `$not` as part of the value of the field match and it takes only an *operator expression* as its argument rather than a full expression (the field name is outside of the `$not` expression instead of inside of it). +This leads to a slightly different syntax than the previous operators. Instead of wrapping a full field and value expression, you use `$not` as part of the value of the field match and it takes only an _operator expression_ as its argument rather than a full expression (the field name is outside of the `$not` expression instead of inside of it). -For instance, we can find all students who *do not* have a birthday before 2010 by typing. This differs from checking for `dob` entries that are less than 2010 because it also returns any documents that do not have that field set at all: +For instance, we can find all students who _do not_ have a birthday before 2010 by typing. This differs from checking for `dob` entries that are less than 2010 because it also returns any documents that do not have that field set at all: ``` db.students.find({ dob: { - $not: { + $not: { $lt: new Date("January 1, 2010") } } }) ``` + ``` { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } { "_id" : ObjectId("60e877914655cbf49ff7cb86"), "first_name" : "Naomi", "last_name" : "Pyani" } @@ -445,7 +460,7 @@ db.students.find({ ### The logical NOR operator -The `$nor` operator takes an array of objects and returns documents that do not match *any* of the conditions specified in those objects. Only documents that fail all of the conditions will be returned. +The `$nor` operator takes an array of objects and returns documents that do not match _any_ of the conditions specified in those objects. Only documents that fail all of the conditions will be returned. For example, if you want to retrieve documents of students who are not in grade 6 who also do not have a last name that ends in 's', you could type: @@ -457,6 +472,7 @@ db.students.find({ ] }) ``` + ``` { "_id" : ObjectId("60e875d54655cbf49ff7cb85"), "first_name" : "Leah", "last_name" : "Drake", "dob" : ISODate("2009-10-03T00:00:00Z") } { "_id" : ObjectId("60e877914655cbf49ff7cb86"), "first_name" : "Naomi", "last_name" : "Pyani" } @@ -468,7 +484,7 @@ db.students.find({ Some other ways to test are based on the state of a field or value. 
-For instance, the `$exists` filter checks for the existence of a field within a document. You can set `$exists` to `true` or `false` to determine which documents to retrieve. +For instance, the `$exists` filter checks for the existence of a field within a document. You can set `$exists` to `true` or `false` to determine which documents to retrieve. For instance, if you wanted to find student documents that have a grade level, you can type: @@ -477,6 +493,7 @@ db.students.find({ grade_level: { $exists: true } }) ``` + ``` { "_id" : ObjectId("60e8743b4655cbf49ff7cb83"), "first_name" : "Ashley", "last_name" : "Jenkins", "dob" : ISODate("2003-01-08T00:00:00Z"), "grade_level" : 8 } { "_id" : ObjectId("60e875d54655cbf49ff7cb84"), "first_name" : "Brian", "last_name" : "McMantis", "dob" : ISODate("2010-09-18T00:00:00Z"), "grade_level" : 2 } @@ -485,11 +502,11 @@ db.students.find({ ## Filtering based on array characteristics -You can also query documents through the arrays they hold. There are a number of operators that can be used to match based on array elements or other qualities. +You can also query documents through the arrays they hold. There are a number of operators that can be used to match based on array elements or other qualities. ### Specifying required elements -The `$all` operator returns documents that have an array containing *all* of the elements given. +The `$all` operator returns documents that have an array containing _all_ of the elements given. For example, if you want to retrieve only teachers that teach both composition and grammar, you could type: @@ -500,6 +517,7 @@ db.teachers.find({ } }) ``` + ``` { "_id" : ObjectId("60eddca65eb74f5c676f3bab"), "first_name" : "Ronald", "last_name" : "Taft", "subjects" : [ "literature", "grammar", "composition" ] } { "_id" : ObjectId("60eddca65eb74f5c676f3bac"), "first_name" : "Casey", "last_name" : "Meyers", "subjects" : [ "literature", "composition", "grammar" ] } @@ -522,6 +540,7 @@ db.teachers.find({ } }) ``` + ``` { "_id" : ObjectId("60eddca65eb74f5c676f3baa"), "first_name" : "Nancy", "last_name" : "Smith", "subjects" : [ "vocabulary", "pronunciation" ] } { "_id" : ObjectId("60eddca65eb74f5c676f3bae"), "first_name" : "Sophie", "last_name" : "Daggs", "subjects" : [ "literature", "composition", "grammar", "vocabulary", "pronunciation" ] } @@ -531,13 +550,14 @@ Both of the teachers who teach "pronunciation" are listed here, as that's the on ### Querying by array size -Finally, you can use the `$size` operator to query for documents of a certain size. For instance, to find all of the teachers who teach three subjects, type: +Finally, you can use the `$size` operator to query for documents of a certain size. For instance, to find all of the teachers who teach three subjects, type: ``` db.teachers.find({ subjects: { $size: 3 } }) ``` + ``` { "_id" : ObjectId("60eddca65eb74f5c676f3bab"), "first_name" : "Ronald", "last_name" : "Taft", "subjects" : [ "literature", "grammar", "composition" ] } { "_id" : ObjectId("60eddca65eb74f5c676f3bac"), "first_name" : "Casey", "last_name" : "Meyers", "subjects" : [ "literature", "composition", "grammar" ] } @@ -545,13 +565,13 @@ db.teachers.find({ ## Conclusion -In this guide, we've covered how to query for documents with MongoDB databases. We covered the basic way that the `find()` method works and how to make its output more readable. Afterwards, we took a look at many of the operators that MongoDB provides to specify the exact parameters of the documents you are interested in. 
+In this guide, we've covered how to query for documents with MongoDB databases. We covered the basic way that the `find()` method works and how to make its output more readable. Afterwards, we took a look at many of the operators that MongoDB provides to specify the exact parameters of the documents you are interested in.

-Understanding how to compose queries to narrow down results and pick out documents that match your specifications is important both when reading and updating data. By getting familiar with the various ways that operators can be chained together, you can express complex requirements that match different types of documents.
+Understanding how to compose queries to narrow down results and pick out documents that match your specifications is important both when reading and updating data. By getting familiar with the various ways that operators can be chained together, you can express complex requirements that match different types of documents.

-If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence.
+If you're using MongoDB, check out Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence.

To get started working with MongoDB and Prisma, check out our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb).

@@ -565,13 +585,13 @@ You can use the `$gt` operator within a find statement to find documents with a

The basic syntax looks like the following:

-    db.collection.find( { : { $gt:ISODate('Date here') } } )
+    db.collection.find( { : { $gt:ISODate('Date here') } } )
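As a concrete sketch of that pattern against the `students` collection used throughout this guide (the cutoff date below is arbitrary), the query might look like:

    db.students.find( { dob: { $gt: ISODate("2010-01-01") } } )

This would match only documents whose `dob` field holds a date later than January 1, 2010.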
What is the MongoDB query profiler?

-[The MongoDB Database query profiler](https://www.mongodb.com/docs/manual/tutorial/manage-the-database-profiler/) is a tool that collects detailed information about database commands executed against a running `mongod` instance.
+[The MongoDB Database query profiler](https://www.mongodb.com/docs/manual/tutorial/manage-the-database-profiler/) is a tool that collects detailed information about database commands executed against a running `mongod` instance. This includes CRUD operations as well as configuration and administrative commands.

This can be particularly useful when trying to track down slow operations.

@@ -583,7 +603,7 @@ To query for how long the length of a string is, you can use the `$strLenCP` ope

The basic syntax looks as follows:

-    { $strLenCP: "Hello World!" }
+    { $strLenCP: "Hello World!" }

This particular string will return a value of `12`.
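Because `$strLenCP` is an aggregation expression, it is typically wrapped in a stage such as `$project`; a small sketch against the `students` collection from this guide (the `nameLength` field name is just illustrative) could look like:

    db.students.aggregate([ { $project: { first_name: 1, nameLength: { $strLenCP: "$first_name" } } } ])

This would return each document's `first_name` along with the number of UTF-8 code points it contains.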
What is the MongoDB query to find distinct values?

-To query for only unique values of a field within a greater collection, you can use the `distinct()` method.
+To query for only the unique values of a field within a collection, you can use the `distinct()` method.

The basic syntax looks like:

-    db.collection.distinct("")
+    db.collection.distinct("")

This returns all of the unique values within the collection for a particular field with no repetition.

@@ -607,6 +627,6 @@ You can export your database contents to JSON with the `mongoexport` command lin

The basic syntax looks as follows where we specify the output of the collection export to be `json`:

-    mongoexport --collection=events --db=reporting --out=events.json
+    mongoexport --collection=events --db=reporting --out=events.json
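If you only need a subset of documents, `mongoexport` also accepts a filter. A sketch reusing the `events` and `reporting` names from above (the `type` field is hypothetical) might look like:

    mongoexport --collection=events --db=reporting --query='{"type": "click"}' --out=events.json

Only documents matching the supplied query are written to the output file.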
diff --git a/content/08-mongodb/10-mongodb-datatypes.mdx b/content/08-mongodb/10-mongodb-datatypes.mdx index 26ee5892..f4817c29 100644 --- a/content/08-mongodb/10-mongodb-datatypes.mdx +++ b/content/08-mongodb/10-mongodb-datatypes.mdx @@ -8,22 +8,22 @@ authors: ['alexemerich'] ## Introduction -When using [MongoDB](/intro/database-glossary#mongodb), you have the ability to be flexible with the structure of your data. You are not locked into maintaining a certain [schema](/intro/database-glossary#schema) that all of your [documents](/intro/database-glossary#document) must fit into. For any given field in a document, you are able to use any of the available [*data types*](/intro/database-glossary#data-type) supported by MongoDB. Despite this default way of working, you are able to impose a [JSON Schema](https://docs.mongodb.com/manual/core/schema-validation/) in MongoDB to add validation on your collections if desired. We won't go into the details of schema design in this guide, but it can have an effect on data typing if implemented. +When using [MongoDB](/intro/database-glossary#mongodb), you have the ability to be flexible with the structure of your data. You are not locked into maintaining a certain [schema](/intro/database-glossary#schema) that all of your [documents](/intro/database-glossary#document) must fit into. For any given field in a document, you are able to use any of the available [_data types_](/intro/database-glossary#data-type) supported by MongoDB. Despite this default way of working, you are able to impose a [JSON Schema](https://docs.mongodb.com/manual/core/schema-validation/) in MongoDB to add validation on your collections if desired. We won't go into the details of schema design in this guide, but it can have an effect on data typing if implemented. -Data types specify a general pattern for the data they accept and store. It is paramount to understand when to choose a certain data type over another when planning your database. The type chosen is going to dictate how you're able to operate on your data and how it is stored. +Data types specify a general pattern for the data they accept and store. It is paramount to understand when to choose a certain data type over another when planning your database. The type chosen is going to dictate how you're able to operate on your data and how it is stored. ## JSON and BSON -Before getting into the details of specific data types, it is important to have an understanding of how MongoDB stores data. MongoDB and many other [document-based NoSQL databases](/intro/database-glossary#document-database) use [JSON](https://www.mongodb.com/json-and-bson) (JavaScript Object Notation) to represent data records as [documents](https://www.prisma.io/dataguide/mongodb/managing-documents). +Before getting into the details of specific data types, it is important to have an understanding of how MongoDB stores data. MongoDB and many other [document-based NoSQL databases](/intro/database-glossary#document-database) use [JSON](https://www.mongodb.com/json-and-bson) (JavaScript Object Notation) to represent data records as [documents](https://www.prisma.io/dataguide/mongodb/managing-documents). There are many advantages to using JSON to store data. 
Some of them being: -* easiness to read, learn, and its familiarity among developers -* flexibility in format, whether sparse, hierarchical, or deeply nested -* self-describing, which allows for applications to easily operate with JSON data -* allows for the focus on a minimal number of basic types +- easiness to read, learn, and its familiarity among developers +- flexibility in format, whether sparse, hierarchical, or deeply nested +- self-describing, which allows for applications to easily operate with JSON data +- allows for the focus on a minimal number of basic types -JSON supports all the basic data types like string, number, boolean, etc. MongoDB actually stores data records as Binary-encoded JSON ([BSON](https://bsonspec.org/)) documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON allows for additional data types that are not available to JSON. +JSON supports all the basic data types like string, number, boolean, etc. MongoDB actually stores data records as Binary-encoded JSON ([BSON](https://bsonspec.org/)) documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON allows for additional data types that are not available to JSON. ## What are the data types in MongoDB? @@ -33,38 +33,38 @@ MongoDB supports a range of data types suitable for various types of both simple Text -* `String` +- `String` Numeric -* `32-Bit Integer` -* `64-Bit Integer` -* `Double` -* `Decimal128` +- `32-Bit Integer` +- `64-Bit Integer` +- `Double` +- `Decimal128` Date/Time -* `Date` -* `Timestamp` +- `Date` +- `Timestamp` Other -* `Object` -* `Array` -* `Binary Data` -* `ObjectId` -* `Boolean` -* `Null` -* `Regular Expression` -* `JavaScript` -* `Min Key` -* `Max Key` +- `Object` +- `Array` +- `Binary Data` +- `ObjectId` +- `Boolean` +- `Null` +- `Regular Expression` +- `JavaScript` +- `Min Key` +- `Max Key` In MongoDB, each BSON type has both an integer and string identifiers. We'll cover the most common of these in more depth throughout this guide. -With MongoDB's document model, you are able to store data in embedded documents. Check out how you can find, create, update, and delete [composite types with the Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client/composite-types), and model your data by [defining fields in your Prisma Schema](https://www.prisma.io/docs/concepts/components/prisma-schema/data-model#defining-fields). +With MongoDB's document model, you are able to store data in embedded documents. Check out how you can find, create, update, and delete [composite types with the Prisma Client](https://www.prisma.io/docs/orm/prisma-client/special-fields-and-types/composite-types), and model your data by [defining fields in your Prisma Schema](https://www.prisma.io/docs/orm/prisma-schema/data-model/models#defining-fields). @@ -78,7 +78,7 @@ The string type is the most commonly used MongoDB data type. Any value written i | String | 2 | "string" | ``` -Generally, drivers for programming languages will convert from the language's string format to UTF-8 when serializing and deserializing BSON. This makes BSON an attractive method for storing international characters with ease for example. +Generally, drivers for programming languages will convert from the language's string format to UTF-8 when serializing and deserializing BSON. This makes BSON an attractive method for storing international characters with ease for example. 
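For instance, a string containing non-Latin characters can be stored without any special handling. A quick sketch, using the same `mytestcoll` collection as the examples below:

```shell
db.mytestcoll.insertOne({greeting: "こんにちは"})
```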
Inserting a document with a `String` data type will look something like this: @@ -95,7 +95,7 @@ Querying the collection will return the following: ```shell db.mytestcoll.find().pretty() { - _id: ObjectId("614b37296a124db40ae74d15"), + _id: ObjectId("614b37296a124db40ae74d15"), first_name: "Alex" } ``` @@ -138,19 +138,17 @@ Because we did not insert any `Null` type values into our collection, the result You can use the same method with all of the following types that we will discuss. - - ## Numbers and numeric values -MongoDB includes a range of numeric data types suitable for different scenarios. Deciding which type to use depends on the nature of the values you plan to store and your use cases for the data. JSON calls anything with numbers a *Number*. That forces the system to figure out how to turn it into the nearest native data type. We'll start off by exploring integers and how they work in MongoDB. +MongoDB includes a range of numeric data types suitable for different scenarios. Deciding which type to use depends on the nature of the values you plan to store and your use cases for the data. JSON calls anything with numbers a _Number_. That forces the system to figure out how to turn it into the nearest native data type. We'll start off by exploring integers and how they work in MongoDB. ### Integer The `Integer` data type is used to store numbers as whole numbers without any fractions or decimals. Integers can either be positive or negative values. There are two types in MongoDB, `32-Bit Integer` and `64-Bit Integer`. They can be represented in the two ways depicted in the table below, `number` and `alias`: ``` -| Integer type | number | alias | -| ------------ | ----- | ------------ | +| Integer type | number | alias | +| ------------ | ----- | ------------ | |`32-bit integer`| 16 | "int" | |`64-bit integer`| 18 | "long" | ``` @@ -161,7 +159,7 @@ The ranges a value can fit into for each type are the following: | Integer type | Applicable signed range | Applicable unsigned range | | ------------ | ------------------------------ | ------------------------------- | |`32-bit integer`| -2,147,483,648 to 2,147,483,647| 0 to 4,294,967,295 | -|`64-bit integer`| -9,223,372,036,854,775,808 to | 0 to 18,446,744,073,709,551,615 +|`64-bit integer`| -9,223,372,036,854,775,808 to | 0 to 18,446,744,073,709,551,615 9,223,372,036,854,775,807 ``` @@ -188,7 +186,7 @@ As suggested by the names, a `32-Bit Integer` has 32 bits of integer precision w ### Double -In BSON, the default replacement for JSON's *Number* is the `Double` data type. The `Double` data type is used to store a floating-point value and can be represented in MongoDB like so: +In BSON, the default replacement for JSON's _Number_ is the `Double` data type. The `Double` data type is used to store a floating-point value and can be represented in MongoDB like so: ``` | Type | Number | Alias | @@ -196,7 +194,7 @@ In BSON, the default replacement for JSON's *Number* is the `Double` data type. | Double | 1 | "double" | ``` -Floating point numbers are another way to express decimal numbers, but without exact, consistent precision. +Floating point numbers are another way to express decimal numbers, but without exact, consistent precision. Floating point numbers can work with a large number of decimals efficiently but not always exactly. 
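A quick way to see this inexactness is to evaluate a plain floating-point sum directly in `mongosh`, which runs ordinary JavaScript double arithmetic:

```shell
0.1 + 0.2
// returns 0.30000000000000004 rather than exactly 0.3
```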
The following is an example of entering a document with the `Double` type into your collection: @@ -208,7 +206,7 @@ db.mytestcoll.insertOne({testScore: 89.6}) } ``` -There can be slight differences between input and output when calculating with doubles that could potentially lead to unexpected behavior. When performing operations that require exact values MongoDB has a more precise type. +There can be slight differences between input and output when calculating with doubles that could potentially lead to unexpected behavior. When performing operations that require exact values MongoDB has a more precise type. ### Decimal128 @@ -222,7 +220,7 @@ If you are working with very large numbers with lots of floating point range, th The BSON type, `Decimal128`, provides 128 bits of decimal representation for storing numbers where rounding decimals exactly is important. `Decimal128` supports 34 decimal digits of precision, or a [sinificand](https://en.wikipedia.org/wiki/Significand) with a range of -6143 to +6144. This allows for a high amount of precision. -Inserting a value using the `Decimal128` data type requires using the [`NumberDecimal()`](https://docs.mongodb.com/manual/core/shell-types/#std-label-shell-type-decimal) constructor with your number as a `String` to keep MongoDB from using the default numeric type, `Double`. +Inserting a value using the `Decimal128` data type requires using the [`NumberDecimal()`](https://docs.mongodb.com/manual/core/shell-types/#std-label-shell-type-decimal) constructor with your number as a `String` to keep MongoDB from using the default numeric type, `Double`. Here, we demonstrate this: @@ -239,15 +237,13 @@ When querying the collection, you then get the following return: ```shell db.mytestcoll.find().pretty() { - _id: ObjectId("614b37296a124db40ae74d12"), - price: "5.099" + _id: ObjectId("614b37296a124db40ae74d12"), + price: "5.099" } ``` The numeric value maintains its precision allowing for exact operations. To demonstrate the `Decimal128` type versus the `Double`, we can go through the following exercise. - - ### How precision can be lost based on data type Say we want to insert a number with many decimal values as a `Double` into MongoDB with the following: @@ -265,7 +261,7 @@ When we query for this data, we get the following result: ```shell db.mytestcoll.find().pretty() { - _id: ObjectId("614b37296a124db40ae74d24"), + _id: ObjectId("614b37296a124db40ae74d24"), price: 9999999.5 } ``` @@ -287,18 +283,18 @@ db.mytestcoll.insertOne({ price: NumberDecimal( 9999999.4999999999 ) }) **Note**: When making this insertion in the MongoDB shell, the following warning message is displayed: ``` -Warning: NumberDecimal: specifying a number as argument is deprecated and may lead to +Warning: NumberDecimal: specifying a number as argument is deprecated and may lead to loss of precision, pass a string instead ``` -This warning message indicates that the number you are trying to pass could be subject to a loss of precision. They suggest to use a `String` using `NumberDecimal()` so that you do not lose any precision. +This warning message indicates that the number you are trying to pass could be subject to a loss of precision. They suggest to use a `String` using `NumberDecimal()` so that you do not lose any precision. 
If we ignore the warning and insert the document anyways, the loss of precision is seen in the query results from the rounding up of the value: ```shell db.mytestcoll.find().pretty() { - _id: ObjectId("614b37296a124db40ae74d14"), + _id: ObjectId("614b37296a124db40ae74d14"), price: Decimal128("9999999.50000000") } ``` @@ -312,15 +308,13 @@ db.mytestcoll.insertOne({ price: NumberDecimal( "9999999.4999999999" ) } ) ```shell db.mytestcoll.find().pretty() { - _id: ObjectId("614b37296a124db40ae74d14"), + _id: ObjectId("614b37296a124db40ae74d14"), price: Decimal128("9999999.4999999999") } ``` For any use case requiring precise, exact values, this return could cause issues. Any work involving monetary operations is an example where precision is going to be extremely important and having exact values is critical to accurate calculations. This demonstration highlights the importance of knowing which numeric data type is going to be best suited for your data. - - ## Date The BSON `Date` data type is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This data type stores the current date or time and can be returned as either a date object or as a string. `Date` is represented in MongoDB as follows: @@ -333,11 +327,9 @@ The BSON `Date` data type is a 64-bit integer that represents the number of mill **Note**: BSON `Date` type is signed. Negative values represent dates before 1970. +There are three methods for returning date values. - -There are three methods for returning date values. - -1. `Date()` - returns a string +1. `Date()` - returns a string 2. `new Date()` - returns a date object using the `ISODate()` wrapper @@ -369,14 +361,12 @@ db.mytestcoll.find().pretty() } ``` - - ## Timestamp -There is also the `Timestamp` data type in MongoDB for representing time. However, `Timestamp` is going to be most useful for internal use and ***is not*** associated with the `Date` type. The type itself is a sequence of characters used to describe the date and time when an event occurs. `Timestamp` is a 64 bit value where: +There is also the `Timestamp` data type in MongoDB for representing time. However, `Timestamp` is going to be most useful for internal use and **_is not_** associated with the `Date` type. The type itself is a sequence of characters used to describe the date and time when an event occurs. `Timestamp` is a 64 bit value where: -* the most significant 32 bits are `time_t` value (seconds since Unix epoch) -* the least significant 32 bits are an incrementing `ordinal` for operations within a given second +- the most significant 32 bits are `time_t` value (seconds since Unix epoch) +- the least significant 32 bits are an incrementing `ordinal` for operations within a given second Its representation in MongoDB will look as follows: @@ -386,7 +376,7 @@ Its representation in MongoDB will look as follows: | Timestamp | 17 | "timestamp" | ``` -When inserting a document that contains top-level fields with empty timestamps, MongoDB will replace the empty timestamp value with the current timestamp value. The exception to this is if the `_id` field contains an empty timestamp. The timestamp value will always be inserted as is and not replaced. +When inserting a document that contains top-level fields with empty timestamps, MongoDB will replace the empty timestamp value with the current timestamp value. The exception to this is if the `_id` field contains an empty timestamp. The timestamp value will always be inserted as is and not replaced. 
Inserting a new `Timestamp` value in MongoDB will use the `new Timestamp()` function and look something like this: @@ -403,16 +393,14 @@ When querying the collection, you'll return a result resembling: ```shell db.mytestcoll.find().pretty() { - "_id" : ObjectId("614b37296a124db40ae74d24"), + "_id" : ObjectId("614b37296a124db40ae74d24"), "ts" : Timestamp( { t: 1412180887, i: 1 }) } ``` - - ## Object -The `Object` data type in MongoDB is used for storing embedded documents. An embedded document is a series of nested documents in ``key: value`` pair format. We demonstrate the `Object` type below: +The `Object` data type in MongoDB is used for storing embedded documents. An embedded document is a series of nested documents in `key: value` pair format. We demonstrate the `Object` type below: ```shell var classGrades = {"Physics": 88, "German": 92, "LitTheoery": 79} @@ -438,8 +426,6 @@ db.mytestcoll.find().pretty() The `Object` data type optimizes for storing data that is best accessed together. It provides some efficiencies around storage, speed, and durability as opposed to storing each class mark, from the above example, separately. - - ## Binary data The `Binary data`, or `BinData`, data type does exactly what its name implies and stores binary data for a field's value. `BinData` is best used when you are storing and searching data, because of its efficiency in representing bit arrays. This data type can be represented in the following ways: @@ -471,8 +457,6 @@ db.mytestcoll.find().pretty() } ``` - - ## ObjectId The `ObjectId` type is specific to MongoDB and it stores the document's unique ID. MongoDB provides an `_id` field for every document. ObjectId is 12 bytes in size and can be represented as follows: @@ -485,34 +469,30 @@ The `ObjectId` type is specific to MongoDB and it stores the document's unique I ObjectId consists of three parts that make up its 12-byte makeup: -* a 4-byte *timestamp value*, representing the ObjectId's creation, measured in seconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time) -* a 5-byte *random value* -* a 3-byte *incrementing counter* initialized to a random value +- a 4-byte _timestamp value_, representing the ObjectId's creation, measured in seconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time) +- a 5-byte _random value_ +- a 3-byte _incrementing counter_ initialized to a random value In MongoDB, each document within a collection requires a unique `_id` to act as a primary key. If the `_id` field is left empty for an inserted document, MongoDB will automatically generate an ObjectId for the field. There are several benefits to using ObjectIds for the `_id`: -* in [`mongosh`](https://docs.mongodb.com/mongodb-shell/#mongodb-binary-bin.mongosh) (MongoDB shell), the creation time of the `ObjectId` is accessible using the [`ObjectId.getTimestamp()`](https://docs.mongodb.com/manual/reference/method/ObjectId.getTimestamp/#mongodb-method-ObjectId.getTimestamp) method. -* sorting on an `_id` field that stores `ObjectId` data types is a close equivalent to sorting by creation time. +- in [`mongosh`](https://docs.mongodb.com/mongodb-shell/#mongodb-binary-bin.mongosh) (MongoDB shell), the creation time of the `ObjectId` is accessible using the [`ObjectId.getTimestamp()`](https://docs.mongodb.com/manual/reference/method/ObjectId.getTimestamp/#mongodb-method-ObjectId.getTimestamp) method. +- sorting on an `_id` field that stores `ObjectId` data types is a close equivalent to sorting by creation time. 
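For example, the creation time mentioned in the first point can be read straight off one of the IDs shown in this guide (a quick sketch in `mongosh`):

```shell
ObjectId("614b37296a124db40ae74d19").getTimestamp()
// ISODate("2021-09-22T14:01:13Z"), the creation time encoded in the ID's first four bytes
```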
We've seen ObjectIds throughout the examples so far, and they will look similar to this: ```shell db.mytestcoll.find().pretty() -{ +{ _id: ObjectId("614b37296a124db40ae74d19") } ``` - - **Note**: ObjectId values should increase over time, however they are not necessarily monotonic. This is because they: -* Only contain one second of temporal resolution, so values created within the same second do not have guaranteed ordering -* values are generated by clients, which may have differing system clocks - - +- Only contain one second of temporal resolution, so values created within the same second do not have guaranteed ordering +- values are generated by clients, which may have differing system clocks ## Boolean @@ -545,8 +525,6 @@ db.mytestcoll.find().pretty() } ``` - - ## Regular Expression The `Regular Expression` data type in MongoDB allows for the storage of regular expressions as the value of a field. MongoDB uses [PCRE (Perl Compatible Regular Expression)](https://www.pcre.org/) as its regular expression language. @@ -559,7 +537,7 @@ Its can be represented in the following way: | Regular Expression | 11 | "regex" | ``` -BSON allows you to avoid the typical "convert from string" step that is commonly experienced when working with regular expressions and databases. This type is going to be most useful when you are writing database objects that require validation patterns or matching triggers. +BSON allows you to avoid the typical "convert from string" step that is commonly experienced when working with regular expressions and databases. This type is going to be most useful when you are writing database objects that require validation patterns or matching triggers. For instance, you can insert the `Regular Expression` data type like this: @@ -582,14 +560,12 @@ This sequence of statements will add these documents to your collection. You are db.mytestcoll.find().pretty() { _id: ObjectId("614b37296a124db40ae74d16"), exampleregex: /tt/, - _id: ObjectId("614b37296a124db40ae74d17"), exampleregex: /t+/ + _id: ObjectId("614b37296a124db40ae74d17"), exampleregex: /t+/ } ``` The regular expression patterns are stored as regex and not as strings. This allows you to query for a particular string and get returned documents that have a regular expression matching the desired string. - - ## JavaScript (without scope) Much like the previously mentioned `Regular Expression` data type, BSON allows for MongoDB to store JavaScript functions without scope as their own type. The `JavaScript` type can be recognized as follows: @@ -614,19 +590,17 @@ This functionality allows you to store JavaScript functions inside of your Mongo **Note**: With [MongoDB version 4.4](https://docs.mongodb.com/manual/release-notes/4.4-compatibility/#remove-support-for-bson-type-javascript-code-with-scope) and above, an alternative JavaScript type, the `JavaScript with Scope` data type, has been deprecated - - ## Conclusion In this article, we've covered most of the common data types that are useful when working with MongoDB databases. There are [additional types](https://docs.mongodb.com/manual/reference/bson-types/) that are not explicitly covered in this guide that may be helpful depending on the use case. Getting started by knowing these types covers most use cases. It is a strong foundation to begin modelling your MongoDB database. -It is important to know what data types are available to you when using a database so that you're using valid values and operating on the data with expected results. 
There are risks you can run into without properly typing your data like demonstrated in the `Double` versus `Decimal128` exercise. It's important to think about this ahead of committing to any given type. +It is important to know what data types are available to you when using a database so that you're using valid values and operating on the data with expected results. There are risks you can run into without properly typing your data like demonstrated in the `Double` versus `Decimal128` exercise. It's important to think about this ahead of committing to any given type. -If you are interested in checking out Prisma with a MongoDB database, you can check out the [data connector documentation](https://www.prisma.io/docs/concepts/database-connectors/mongodb). +If you are interested in checking out Prisma with a MongoDB database, you can check out the [data connector documentation](https://www.prisma.io/docs/orm/overview/databases/mongodb). -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). diff --git a/content/08-mongodb/11-mongodb-indexes.mdx b/content/08-mongodb/11-mongodb-indexes.mdx index aef38781..d4513729 100644 --- a/content/08-mongodb/11-mongodb-indexes.mdx +++ b/content/08-mongodb/11-mongodb-indexes.mdx @@ -18,7 +18,7 @@ Indexes can be thought of as shortcuts for accessing your data so that the entir -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -90,7 +90,7 @@ Index information includes the keys and options used to create the index. -If you're interested in using full text indexes with MongoDB, Prisma has the [`fullTextIndex` Preview feature](https://www.prisma.io/docs/concepts/components/prisma-schema/indexes#full-text-indexes-mysql-and-mongodb) which allows you to easily migrate full text indexes into a Prisma schema for type safety and protection against validation errors. 
+If you're interested in using full text indexes with MongoDB, Prisma has the [`fullTextIndex` Preview feature](https://www.prisma.io/docs/orm/prisma-schema/data-model/indexes#full-text-indexes-mysql-and-mongodb) which allows you to easily migrate full text indexes into a Prisma schema for type safety and protection against validation errors. @@ -272,7 +272,7 @@ We covered the basics of creating, analyzing, and dropping indexes in MongoDB. K -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). diff --git a/content/08-mongodb/12-mongodb-transactions.mdx b/content/08-mongodb/12-mongodb-transactions.mdx index 79189f9b..7751fcf5 100644 --- a/content/08-mongodb/12-mongodb-transactions.mdx +++ b/content/08-mongodb/12-mongodb-transactions.mdx @@ -14,7 +14,7 @@ In this guide, we'll begin by discussing what transactions are, when to use them -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -256,7 +256,7 @@ Transactions are a fundamental need for relational databases, and are also neede -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). 
diff --git a/content/08-mongodb/13-connection-uris.mdx b/content/08-mongodb/13-connection-uris.mdx index dda95074..89ac7cd0 100644 --- a/content/08-mongodb/13-connection-uris.mdx +++ b/content/08-mongodb/13-connection-uris.mdx @@ -1,29 +1,30 @@ --- title: 'Introduction to MongoDB connection URIs' -metaTitle: "Format Connection URI with MongoDB Database Information" -metaDescription: "Learn how to encode MongoDB connection details in connection URIs for applications and libraries, including authentication details and other parameters." +metaTitle: 'Format Connection URI with MongoDB Database Information' +metaDescription: 'Learn how to encode MongoDB connection details in connection URIs for applications and libraries, including authentication details and other parameters.' metaImage: '/social/generic-mongodb.png' authors: ['justinellingwood'] --- ## Introduction -Connecting to your database server is usually one of the first tasks you need to accomplish when designing and configuring database-backed applications. While there are many methods of providing the address, listening port, credentials, and other details to applications, connection URIs, sometimes called connection strings or connection URLs, are one of the most powerful and flexible ways of specifying complex configuration in a compact format. +Connecting to your database server is usually one of the first tasks you need to accomplish when designing and configuring database-backed applications. While there are many methods of providing the address, listening port, credentials, and other details to applications, connection URIs, sometimes called connection strings or connection URLs, are one of the most powerful and flexible ways of specifying complex configuration in a compact format. -In this guide, we'll talk about how to format a connection URI with your MongoDB database information and [authentication](/intro/database-glossary#authentication) details. Connection URIs are divided into sections, so we'll cover each part as we go along. +In this guide, we'll talk about how to format a connection URI with your MongoDB database information and [authentication](/intro/database-glossary#authentication) details. Connection URIs are divided into sections, so we'll cover each part as we go along. ## Percent encoding values -Before we begin, we should mention that MongoDB connection URIs expect [percent-encoded values](https://en.wikipedia.org/wiki/Percent-encoding). This means that any characters that have a special meaning within the URL must be converted to their percent-encoded counterparts to ensure that libraries and applications can interpret them correctly. +Before we begin, we should mention that MongoDB connection URIs expect [percent-encoded values](https://en.wikipedia.org/wiki/Percent-encoding). This means that any characters that have a special meaning within the URL must be converted to their percent-encoded counterparts to ensure that libraries and applications can interpret them correctly. Characters you should percent-encode include: -* `:`: `%3A` -* `/`: `%2F` -* `?`: `%3F` -* `#`: `%23` -* `[`: `%5B` -* `]`: `%5D` -* `@`: `%40` + +- `:`: `%3A` +- `/`: `%2F` +- `?`: `%3F` +- `#`: `%23` +- `[`: `%5B` +- `]`: `%5D` +- `@`: `%40` These have special meaning within the connection URI. @@ -39,7 +40,7 @@ pe@ce&lo\/3 pe%40ce&lo\%2F3 ``` -If you are unsure about whether a character should be percent-encoded, it's usually best to encode it anyways. 
For example, if you are unsure if the `\` character is reserved, you can use it's percent-encoded equivalent, `%5C`, to be safe: +If you are unsure about whether a character should be percent-encoded, it's usually best to encode it anyways. For example, if you are unsure if the `\` character is reserved, you can use it's percent-encoded equivalent, `%5C`, to be safe: ``` pe%40ce%26lo%5C%2F3 @@ -63,20 +64,20 @@ mongodb://[username:password@]host[:port][,...hostN[:port]][/[database][?paramet |- host specifier ``` -The parts in square brackets indicate optional parts. You may have noticed that most parts of the URI are optional. It might also be apparent that there are many pieces of information you can encode in the URI. +The parts in square brackets indicate optional parts. You may have noticed that most parts of the URI are optional. It might also be apparent that there are many pieces of information you can encode in the URI. A quick description of each of the individual components: -* `mongodb://`: The schema identifier used to identify the string as a MongoDB connection URI. -* `auth credentials`: An optional component of the URI that can be used to specify the user and password to connect as. - * `username`: An optional username. If included, it should start after the second slash (`/`) and continue until a colon (`:`). Must be accompanied by a `password` if included. - * `password`: An optional password. If included, it begins after a colon (`:`) and continues until the at sign (`@`). Must be accompanied by a `username` if included. -* `host specifier`: A required component used to specify one or more hostnames and ports to connect to. - * `host`: An IP address, DNS name, or locally resolvable name of the server to connect to. The host continues until a colon (`:`) (if a port is included), until a comma (`,`) if more than one host is specified, or else until a slash (`/`). At least one host must be provided. - * `port`: An optional port specification to indicate which port MongoDB is listening to on the host. The port begins with a colon (`:`) and continues until a comma (`,`) if another host is provided or until the slash (`/`) if not. -* `default authentication database`: The name of the database to authenticate to if a more specific `authSource` is not provided in the parameter list. If no database is specified here or with `authSource`, MongoDB will attempt to authenticate to the standard `admin` database. -* `parameter list`: An optional list of additional parameters that can affect the connection behavior. The parameter list begins with a question mark (`?`). If no default authentication database is provided, you must begin the parameter list with both the slash and question mark (`/?`) after the last host definition. - * `parameter pairs`: The parameter list is composed of key-value pairs. The key and value within each pair are separated by an equal sign (`=`) and each pair is separated from the next by an ampersand (`&`). +- `mongodb://`: The schema identifier used to identify the string as a MongoDB connection URI. +- `auth credentials`: An optional component of the URI that can be used to specify the user and password to connect as. + - `username`: An optional username. If included, it should start after the second slash (`/`) and continue until a colon (`:`). Must be accompanied by a `password` if included. + - `password`: An optional password. If included, it begins after a colon (`:`) and continues until the at sign (`@`). Must be accompanied by a `username` if included. 
+- `host specifier`: A required component used to specify one or more hostnames and ports to connect to. + - `host`: An IP address, DNS name, or locally resolvable name of the server to connect to. The host continues until a colon (`:`) (if a port is included), until a comma (`,`) if more than one host is specified, or else until a slash (`/`). At least one host must be provided. + - `port`: An optional port specification to indicate which port MongoDB is listening to on the host. The port begins with a colon (`:`) and continues until a comma (`,`) if another host is provided or until the slash (`/`) if not. +- `default authentication database`: The name of the database to authenticate to if a more specific `authSource` is not provided in the parameter list. If no database is specified here or with `authSource`, MongoDB will attempt to authenticate to the standard `admin` database. +- `parameter list`: An optional list of additional parameters that can affect the connection behavior. The parameter list begins with a question mark (`?`). If no default authentication database is provided, you must begin the parameter list with both the slash and question mark (`/?`) after the last host definition. + - `parameter pairs`: The parameter list is composed of key-value pairs. The key and value within each pair are separated by an equal sign (`=`) and each pair is separated from the next by an ampersand (`&`). Here is an example of a MongoDB connection URI that incorporates all of these components: @@ -92,9 +93,9 @@ mongodb://sally:sallyspassword@dbserver.example:5555/userdata?tls=true&connectio ## Specifying the URI type -The item in a connection URI is usually the protocol specification or application type. Since the URI will be used to connect and authenticate to a MongoDB database, we need to use a signifier that signifies that to the applications and libraries we're using. +The first item in a connection URI is usually the protocol specification or application type. Since the URI will be used to connect and authenticate to a MongoDB database, we need to use an identifier that signals this to the applications and libraries we're using. -The MongoDB project only accepts `mongodb` as valid URI schema designators. Therefore, you should always start your connection URI like this: +The MongoDB project only accepts `mongodb` as a valid URI schema designator. Therefore, you should always start your connection URI like this: ``` mongodb:// @@ -104,7 +105,7 @@ The schema designator will ensure that the information that follows is interpret ## Specifying a username and password -The next part of the URI is the user credentials. User credentials are optional, but typically required if you don't want to rely on defaults configured by either your application or the database. +The next part of the URI is the user credentials. User credentials are optional, but typically required if you don't want to rely on defaults configured by either your application or the database. To include user credentials, provide the username after the schema identifier, followed by a colon (`:`), the password, and finally an at sign (`@`): @@ -116,9 +117,9 @@ User credentials are optional, but if included, you must provide both the userna ## Specifying where the server is listening -After the user credentials comes the host specifier which defines where the server is listening. One or more hosts can be defined within the host specifier, but since the host specifier is **required**, at least one host must be provided.
+After the user credentials comes the host specifier which defines where the server is listening. One or more hosts can be defined within the host specifier, but since the host specifier is **required**, at least one host must be provided. -Each host definition consists of a `host` and an optional `port`. The `host` can either be a locally resolvable host name, a name resolved by an external name system like DNS, or an IP address or other direct address. The port signifies the port number on the host where MongoDB is listening. +Each host definition consists of a `host` and an optional `port`. The `host` can either be a locally resolvable host name, a name resolved by an external name system like DNS, or an IP address or other direct address. The port signifies the port number on the host where MongoDB is listening. To specify that the application should attempt to connect to the default MongoDB port (27017) on the local computer, you can use: @@ -132,25 +133,25 @@ If you needed to include a username and password, that information would come fi mongodb://username:password@localhost ``` -To specify a remote server running on a non-standard port, separate those details with a colon. For example, to connect to port 3333 on a host at `198.51.100.22`, you could use: +To specify a remote server running on a non-standard port, separate those details with a colon. For example, to connect to port 3333 on a host at `198.51.100.22`, you could use: ``` mongodb://username:password@198.51.100.22:3333 ``` -To define more than one host and port pair, separate the sets by commas (`,`) to tell the application to try the latter servers if the first cannot be reached. For example, to extend the previous example to include a fallback server listening on port 5555 on `198.51.100.33`, you could use: +To define more than one host and port pair, separate the sets by commas (`,`) to tell the application to try the latter servers if the first cannot be reached. For example, to extend the previous example to include a fallback server listening on port 5555 on `198.51.100.33`, you could use: ``` mongodb://username:password@198.51.100.22:3333,198.51.100.33:5555 ``` -Conforming clients and applications will try to first connect to the server listening at `198.51.100.22:3333`. If that fails, they will try to reach a MongoDB database listening on `198.51.100.33:5555`. +Conforming clients and applications will try to first connect to the server listening at `198.51.100.22:3333`. If that fails, they will try to reach a MongoDB database listening on `198.51.100.33:5555`. ## Providing the default authentication database -After the host specifier, the next piece of data is the default authentication database. While not true for all database management systems, with MongoDB, you must authenticate against a specific database when establishing a connection. +After the host specifier, the next piece of data is the default authentication database. While not true for all database management systems, with MongoDB, you must authenticate against a specific database when establishing a connection. -The database name begins with a forward slash (`/`) and proceeds until either the end of the line or a question mark (`?`). The default authentication database will be used if an `authSource` option is not provided within the parameter list. If neither are provided, the client will authenticate against the `admin` database. 
+The database name begins with a forward slash (`/`) and proceeds until either the end of the line or a question mark (`?`). The default authentication database will be used if an `authSource` option is not provided within the parameter list. If neither are provided, the client will authenticate against the `admin` database. To connect to a database called `sales` hosted on a MongoDB server listening on `198.51.100.22:3333`, you could type: @@ -160,9 +161,9 @@ mongodb://username:password@198.51.100.22:3333/sales ## Specifying additional parameters -The last part of the connection URI is used to provide additional parameters for the connection. The list of parameters is introduced by a leading question mark (`?`) and continues until the end of the line. If no default authentication database is provided, the trailing slash indicating the end of the host specification must directly precede the question mark (`/?`). +The last part of the connection URI is used to provide additional parameters for the connection. The list of parameters is introduced by a leading question mark (`?`) and continues until the end of the line. If no default authentication database is provided, the trailing slash indicating the end of the host specification must directly precede the question mark (`/?`). -Each parameter listed is defined as a key and value pair joined with an equals sign (`=`). After the first parameter pair, each additional key-value pair is separated by an ampersand (`&`). +Each parameter listed is defined as a key and value pair joined with an equals sign (`=`). After the first parameter pair, each additional key-value pair is separated by an ampersand (`&`). For example, to specify that the client should apply a 10 second timeout for the connection we were previously defining, you could use: @@ -170,7 +171,7 @@ For example, to specify that the client should apply a 10 second timeout for the mongodb://username:password@198.51.100.22:3333/sales?connectTimeoutMS=10000 ``` -If you wanted to provide additional parameters, you'd add them afterwards with an ampersand (`&`) between each pair. For instance, we could additionally specify that we require SSL and that the specified hosts are members of a replica set we want to connect to: +If you wanted to provide additional parameters, you'd add them afterwards with an ampersand (`&`) between each pair. For instance, we could additionally specify that we require SSL and that the specified hosts are members of a replica set we want to connect to: ``` mongodb://username:password@198.51.100.22:3333,198.51.100.33:5555/sales?connectTimeoutMS=10000&tls=true&replicaSet=someReplicaSet @@ -180,10 +181,10 @@ The MongoDB documentation has a [full list of parameters](https://www.mongodb.co ## Conclusion -In this guide, we discussed what a MongoDB connection URI is, how to interpret the various components, and how to construct your own URIs given a set of connection information. Connection URIs encode all of the information required to connect to a given database within a single string. Because of this flexibility and due to their wide adoption, understanding how to parse and construct those strings can be pretty helpful. +In this guide, we discussed what a MongoDB connection URI is, how to interpret the various components, and how to construct your own URIs given a set of connection information. Connection URIs encode all of the information required to connect to a given database within a single string. 
Because of this flexibility and due to their wide adoption, understanding how to parse and construct those strings can be pretty helpful. -If you are using [Prisma to manage your MongoDB database](https://www.prisma.io/docs/concepts/database-connectors/mongodb), you need to set a connection URI within a 'datasource' block in your [Prisma schema file](https://www.prisma.io/docs/concepts/components/prisma-schema). You must provide a [connection URI for the 'url' field](https://www.prisma.io/docs/concepts/database-connectors/mongodb#example) so that Prisma can connect to your database. +If you are using [Prisma to manage your MongoDB database](https://www.prisma.io/docs/orm/overview/databases/mongodb), you need to set a connection URI within a 'datasource' block in your [Prisma schema file](https://www.prisma.io/docs/orm/prisma-schema/overview). You must provide a [connection URI for the 'url' field](https://www.prisma.io/docs/orm/overview/databases/mongodb#example) so that Prisma can connect to your database. diff --git a/content/08-mongodb/14-working-with-dates.mdx b/content/08-mongodb/14-working-with-dates.mdx index b99907a3..3029fc8c 100644 --- a/content/08-mongodb/14-working-with-dates.mdx +++ b/content/08-mongodb/14-working-with-dates.mdx @@ -1,20 +1,20 @@ --- title: 'Working with dates and times in MongoDB' -metaTitle: "Using dates and times in MongoDB" -metaDescription: "Learn how to store and manage date information in MongoDB." +metaTitle: 'Using dates and times in MongoDB' +metaDescription: 'Learn how to store and manage date information in MongoDB.' metaImage: '/social/generic-mongodb.png' authors: ['justinellingwood'] --- ## Introduction -Date and time data is commonly managed by database systems and is incredibly important, but can often be trickier to handle correctly than it initially appears. Databases must be able to store date and time data in clear, unambiguous formats, transform that data into user-friendly formats to interact with client applications, and perform time-based operations taking into account complexities like different timezones and changes in daylight savings time. +Date and time data is commonly managed by database systems and is incredibly important, but can often be trickier to handle correctly than it initially appears. Databases must be able to store date and time data in clear, unambiguous formats, transform that data into user-friendly formats to interact with client applications, and perform time-based operations taking into account complexities like different timezones and changes in daylight savings time. -In this guide, we'll discuss some of the tools that MongoDB provides to work effectively with date and time data. We'll explore relevant data types, take a look at the operators and methods, and go over how to best use these tools to keep your date and time data in good order. +In this guide, we'll discuss some of the tools that MongoDB provides to work effectively with date and time data. We'll explore relevant data types, take a look at the operators and methods, and go over how to best use these tools to keep your date and time data in good order. -If you are using [MongoDB with Prisma](https://www.prisma.io/mongodb), you can use the [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb) to connect to and manage your database. Prisma's `date` type [maps directly](https://www.prisma.io/docs/concepts/database-connectors/mongodb#mapping-from-prisma-to-mongodb-types-on-migration) to MongoDB's `Date` type. 
+If you are using [MongoDB with Prisma](https://www.prisma.io/mongodb), you can use the [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb) to connect to and manage your database. Prisma's `date` type [maps directly](https://www.prisma.io/docs/orm/overview/databases/mongodb#mapping-from-prisma-to-mongodb-types-on-migration) to MongoDB's `Date` type. @@ -22,7 +22,7 @@ If you are using [MongoDB with Prisma](https://www.prisma.io/mongodb), you can u The [`DATE` type in MongoDB](https://www.prisma.io/dataguide/mongodb/mongodb-datatypes#date) can store date and time values as a combined unit. -Here, the left column represents the [BSON (binary JSON)](https://bsonspec.org/spec.html) name for the data type and the second column represents the ID number associated with that type. The final "Alias" column represents the string that MongoDB uses to represent the type: +Here, the left column represents the [BSON (binary JSON)](https://bsonspec.org/spec.html) name for the data type and the second column represents the ID number associated with that type. The final "Alias" column represents the string that MongoDB uses to represent the type: ``` | Type | Number | Alias | @@ -30,14 +30,14 @@ Here, the left column represents the [BSON (binary JSON)](https://bsonspec.org/s | Date | 9 | "date" | ``` -The BSON Date type is a *signed* 64-bit integer representing the number of milliseconds since the [Unix epoch (Jan 1, 1970)](https://en.wikipedia.org/wiki/Unix_time). Positive numbers represent the time elapsed since the epoch while negative numbers represent time moving backwards from the epoch. +The BSON Date type is a _signed_ 64-bit integer representing the number of milliseconds since the [Unix epoch (Jan 1, 1970)](https://en.wikipedia.org/wiki/Unix_time). Positive numbers represent the time elapsed since the epoch while negative numbers represent time moving backwards from the epoch. Storing the date and time data as a large integer is beneficial because it: -* allows MongoDB to store dates with millisecond precision -* provides flexibility in how the date and time can be displayed +- allows MongoDB to store dates with millisecond precision +- provides flexibility in how the date and time can be displayed -Because the date type does not store additional information like timezones, that context must be stored separately if it is relevant. MongoDB will store date and time information using [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) internally, but can easily convert to other timezones at time of retrieval as needed. +Because the date type does not store additional information like timezones, that context must be stored separately if it is relevant. MongoDB will store date and time information using [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) internally, but can easily convert to other timezones at time of retrieval as needed. MongoDB also provides a [`Timestamp` type](https://www.prisma.io/dataguide/mongodb/mongodb-datatypes#timestamp) that is mainly used internally: @@ -47,11 +47,11 @@ MongoDB also provides a [`Timestamp` type](https://www.prisma.io/dataguide/mongo | Timestamp | 17 | "timestamp" | ``` -Because this is mainly implemented to help coordinate internal processes like replication and sharding, you should probably not use this in your own application's logic. The date type can usually satisfy any requirements for times that you might have. 
+Because this is mainly implemented to help coordinate internal processes like replication and sharding, you should probably not use this in your own application's logic. The date type can usually satisfy any requirements for times that you might have. -When managing a MongoDB database with Prisma, the MongoDB `Date` type [maps directly](https://www.prisma.io/docs/concepts/database-connectors/mongodb#mapping-from-prisma-to-mongodb-types-on-migration) to the `date` type within Prisma. +When managing a MongoDB database with Prisma, the MongoDB `Date` type [maps directly](https://www.prisma.io/docs/orm/overview/databases/mongodb#mapping-from-prisma-to-mongodb-types-on-migration) to the `date` type within Prisma. @@ -59,20 +59,20 @@ When managing a MongoDB database with Prisma, the MongoDB `Date` type [maps dire You can create a new `Date` object in two different ways: -* `new Date()`: Returns a date and time as a `Date` object. -* `ISODate()`: Returns a date and time as a `Date` object. +- `new Date()`: Returns a date and time as a `Date` object. +- `ISODate()`: Returns a date and time as a `Date` object. Both the `new Date()` and `ISODate()` methods produce a `Date` object that is wrapped in an `ISODate()` helper function. Additionally, calling `Date()` function without the `new` constructor returns a date and time as as string instead of a `Date` object: -* `Date()`: Returns a date and time as a string. +- `Date()`: Returns a date and time as a string. -It is important to keep in mind this distinction between these two types as it affects what operations are available, the way the information is stored, and how much flexibility it will give you. In general, it's almost always best to store date information using the `Date` type and then format it for output as needed. +It is important to keep in mind this distinction between these two types as it affects what operations are available, the way the information is stored, and how much flexibility it will give you. In general, it's almost always best to store date information using the `Date` type and then format it for output as needed. Let's take a look at how this works in a MongoDB shell session. -First, we can switch to a new temporary database and create three documents that each have a `date` field. We use a different method for populating the `date` field for each object: +First, we can switch to a new temporary database and create three documents that each have a `date` field. We use a different method for populating the `date` field for each object: ```javascript use temp_db @@ -92,6 +92,7 @@ db.dates.insertMany([ }, ]) ``` + ```javascript { "acknowledged" : true, @@ -103,24 +104,24 @@ db.dates.insertMany([ } ``` -By default, each of these mechanisms will store the current date and time. You can store a different date and time by adding an [ISO 8601 formatted date string](https://en.wikipedia.org/wiki/ISO_8601) as an argument: +By default, each of these mechanisms will store the current date and time. You can store a different date and time by adding an [ISO 8601 formatted date string](https://en.wikipedia.org/wiki/ISO_8601) as an argument: ```javascript db.dates.insertMany([ - { - name: "Future date", - date: ISODate("2040-10-28T23:58:18Z"), - }, - { - name: "Past date", - date: new Date("1852-01-15T11:25"), - }, + { + name: 'Future date', + date: ISODate('2040-10-28T23:58:18Z'), + }, + { + name: 'Past date', + date: new Date('1852-01-15T11:25'), + }, ]) ``` These will create a `Date` object at the appropriate date and time. 
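Because the `Date` type is stored as a signed millisecond offset from the Unix epoch, you can also construct a `Date` object directly from an integer. As a small sketch using an arbitrary offset (evaluated on its own here so the `dates` collection used in the following examples stays unchanged), the shell should echo the value back wrapped in the familiar `ISODate()` helper:

```javascript
// build a Date from a raw millisecond offset relative to the Unix epoch
new Date(1651665653307)
```

```javascript
ISODate("2022-05-04T12:00:53.307Z")
```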
-One thing to note is the inclusion of the trailing `Z` in the first new document above. This indicates that the date and time is being provided as UTC. Specifying the date without the `Z` will cause MongoDB interpret the input in relation to the current local time (though it will always convert and store it as a UTC date internally). +One thing to note is the inclusion of the trailing `Z` in the first new document above. This indicates that the date and time is being provided as UTC. Specifying the date without the `Z` will cause MongoDB to interpret the input in relation to the current local time (though it will always convert and store it as a UTC date internally). ## Validating the type of date objects @@ -129,6 +130,7 @@ Next, we can display the resulting documents to see how MongoDB stored the date ```javascript db.dates.find().pretty() ``` + ```javascript { "_id" : ObjectId("62726af5a3dc7398b97e6e93"), @@ -157,62 +159,61 @@ db.dates.find().pretty() } ``` -As expected, the `date` fields that were populated with `ISODate()` and `new Date()` contain `Date` objects (wrapped in the `ISODate` helper). In contrast, the field populated by the bare `Date()` function call is stored as a string. +As expected, the `date` fields that were populated with `ISODate()` and `new Date()` contain `Date` objects (wrapped in the `ISODate` helper). In contrast, the field populated by the bare `Date()` function call is stored as a string. -You can verify that which of the `date` fields contain an actual `Date` object by calling a `map` function over the collection. The map checks each `date` field to see if the object it stores is an instance of the `Date` type and displays the result in a new field called `is_a_Date_object`. Additionally, we will use the `valueOf()` method to show how each `date` field is actually stored by MongoDB: +You can verify which of the `date` fields contain an actual `Date` object by calling a `map` function over the collection. The map checks each `date` field to see if the object it stores is an instance of the `Date` type and displays the result in a new field called `is_a_Date_object`.
Additionally, we will use the `valueOf()` method to show how each `date` field is actually stored by MongoDB: ```javascript -db.dates.find().map( - function(date_doc) { - date_doc["is_a_Date_object"] = date_doc.date instanceof Date; - date_doc["date_storage_value"] = date_doc.date.valueOf(); - return date_doc; - } -) +db.dates.find().map(function (date_doc) { + date_doc['is_a_Date_object'] = date_doc.date instanceof Date + date_doc['date_storage_value'] = date_doc.date.valueOf() + return date_doc +}) ``` + ```javascript -[ - { - "_id" : ObjectId("62726af5a3dc7398b97e6e93"), - "name" : "Created with `Date()`", - "date" : "Wed May 04 2022 12:00:53 GMT+0000 (UTC)", - "is_a_Date_object" : false, - "date_storage_value" : "Wed May 04 2022 12:00:53 GMT+0000 (UTC)" - }, - { - "_id" : ObjectId("62726af5a3dc7398b97e6e94"), - "name" : "Created with `new Date()`", - "date" : ISODate("2022-05-04T12:00:53.307Z"), - "is_a_Date_object" : true, - "date_storage_value" : 1651665653307 - }, - { - "_id" : ObjectId("62726af5a3dc7398b97e6e95"), - "name" : "Created with `ISODate()`", - "date" : ISODate("2022-05-04T12:00:53.307Z"), - "is_a_Date_object" : true, - "date_storage_value" : 1651665653307 - }, - { - "_id" : ObjectId("62728b57a3dc7398b97e6e96"), - "name" : "Future date", - "date" : ISODate("2040-10-28T23:58:18Z"), - "is_a_Date_object" : true, - "date_storage_value" : 2235081498000 - }, - { - "_id" : ObjectId("62728c5ca3dc7398b97e6e97"), - "name" : "Past date", - "date" : ISODate("1852-01-15T11:25:00Z"), - "is_a_Date_object" : true, - "date_storage_value" : -3722502900000 - } +;[ + { + _id: ObjectId('62726af5a3dc7398b97e6e93'), + name: 'Created with `Date()`', + date: 'Wed May 04 2022 12:00:53 GMT+0000 (UTC)', + is_a_Date_object: false, + date_storage_value: 'Wed May 04 2022 12:00:53 GMT+0000 (UTC)', + }, + { + _id: ObjectId('62726af5a3dc7398b97e6e94'), + name: 'Created with `new Date()`', + date: ISODate('2022-05-04T12:00:53.307Z'), + is_a_Date_object: true, + date_storage_value: 1651665653307, + }, + { + _id: ObjectId('62726af5a3dc7398b97e6e95'), + name: 'Created with `ISODate()`', + date: ISODate('2022-05-04T12:00:53.307Z'), + is_a_Date_object: true, + date_storage_value: 1651665653307, + }, + { + _id: ObjectId('62728b57a3dc7398b97e6e96'), + name: 'Future date', + date: ISODate('2040-10-28T23:58:18Z'), + is_a_Date_object: true, + date_storage_value: 2235081498000, + }, + { + _id: ObjectId('62728c5ca3dc7398b97e6e97'), + name: 'Past date', + date: ISODate('1852-01-15T11:25:00Z'), + is_a_Date_object: true, + date_storage_value: -3722502900000, + }, ] ``` This confirms that the fields displayed as `ISODATE(...)` are instances of the `Date` type while the `date` created with the bare `Date()` function is not. -Additionally, the above output shows that objects stored with the `Date` type are recorded as signed integers. As expected, the date object associated with the date from 1852 is negative because it is counting backwards from January 1970. +Additionally, the above output shows that objects stored with the `Date` type are recorded as signed integers. As expected, the date object associated with the date from 1852 is negative because it is counting backwards from January 1970. 
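If you would rather perform this kind of type check on the server instead of in client-side JavaScript, the [`$type` aggregation operator](https://www.mongodb.com/docs/manual/reference/operator/aggregation/type/) (available in MongoDB 3.4 and later) reports the BSON type of a field directly. A minimal sketch against the same collection, which should produce output along these lines:

```javascript
// label each document with the BSON type currently stored in its date field
db.dates.aggregate([
  { $project: { _id: 0, name: 1, date_bson_type: { $type: '$date' } } },
])
```

```javascript
{ "name" : "Created with `Date()`", "date_bson_type" : "string" }
{ "name" : "Created with `new Date()`", "date_bson_type" : "date" }
{ "name" : "Created with `ISODate()`", "date_bson_type" : "date" }
{ "name" : "Future date", "date_bson_type" : "date" }
{ "name" : "Past date", "date_bson_type" : "date" }
```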
## Querying for date objects @@ -221,10 +222,13 @@ If you have a collection with mixed representations of dates like this, you can For instance, to query for all of the documents where `date` is a `Date` object, you could type: ```javascript -db.dates.find({ - date: { $type: "date" }, -}).pretty() +db.dates + .find({ + date: { $type: 'date' }, + }) + .pretty() ``` + ```javascript { "_id" : ObjectId("62726af5a3dc7398b97e6e94"), @@ -251,10 +255,13 @@ db.dates.find({ To find instances where the `date` field is stored as a string instead, type: ```javascript -db.dates.find({ - date: { $type: "string" }, -}).pretty() +db.dates + .find({ + date: { $type: 'string' }, + }) + .pretty() ``` + ```javascript { "_id" : ObjectId("62726af5a3dc7398b97e6e93"), @@ -265,15 +272,18 @@ db.dates.find({ The `Date` type allows you to perform queries that understand the relationship between time units. -For instance, you can compare `Date` objects ordinally as you would with other types. To check for future dates, you could type: +For instance, you can compare `Date` objects ordinally as you would with other types. To check for future dates, you could type: ```javascript -db.dates.find({ +db.dates + .find({ date: { - $gt: new Date() - } -}).pretty() + $gt: new Date(), + }, + }) + .pretty() ``` + ```javascript { "_id" : ObjectId("62728b57a3dc7398b97e6e96"), @@ -282,16 +292,16 @@ db.dates.find({ } ``` -## How to use `Date` type methods +## How to use `Date` type methods -You can operate on `Date` objects with a variety of included methods and operators. For instance, you can extract different date and time components from a date and print in many different formats. +You can operate on `Date` objects with a variety of included methods and operators. For instance, you can extract different date and time components from a date and print in many different formats. A demonstration is probably the quickest way to showcase this functionality. First, let's select the date from a document with a date object: ```javascript -date_obj = db.dates.findOne({"name": "Future date"}).date +date_obj = db.dates.findOne({ name: 'Future date' }).date ``` Now, we can select the `date` field and extract different components from it by calling various methods on the object: @@ -304,16 +314,17 @@ date_obj.getUTCHours() date_obj.getUTCMinutes() date_obj.getUTCSeconds() ``` + ```javascript -2040 // year -9 // month -28 // date -23 // hour -58 // minutes -18 // seconds +2040 // year +9 // month +28 // date +23 // hour +58 // minutes +18 // seconds ``` -There are also companion methods that can be used to set the time by providing different time and date components. For example, you can change the year by calling the `.setUTCFullYear()` method: +There are also companion methods that can be used to set the time by providing different time and date components. For example, you can change the year by calling the `.setUTCFullYear()` method: ```javascript date_obj.toString() @@ -321,6 +332,7 @@ date_obj.setUTCFullYear(2028) date_obj.toString() date_obj.setUTCFullYear(2040) ``` + ```javascript Sun Oct 28 2040 23:58:18 GMT+0000 (UTC) 1856390298000 // integer stored for the new date value @@ -339,6 +351,7 @@ date_obj.toLocaleTimeString() date_obj.toString() date_obj.toTimeString() ``` + ```javascript Sun Oct 28 2040 // .toDateString() Sun, 28 Oct 2040 23:58:18 GMT // .toUTCString() @@ -353,37 +366,39 @@ These are all mainly methods associated with JavaScript's `Date` type. 
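Before moving on to the aggregation helpers, note that ordinal comparisons like the `$gt` query above also make bounded range queries straightforward. For example, to find documents whose `date` falls within a single calendar year (a small sketch against the same `dates` collection):

```javascript
// match dates anywhere within calendar year 2040 (UTC)
db.dates.find({
  date: {
    $gte: ISODate('2040-01-01T00:00:00Z'),
    $lt: ISODate('2041-01-01T00:00:00Z'),
  },
})
```

In this collection, that should match only the `Future date` document.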
## How to use MongoDB `Date` aggregation functions -MongoDB offers some other functions that can manipulate dates as well. One useful example of this is the [`$dateToString()` aggregation function](https://www.mongodb.com/docs/manual/reference/operator/aggregation/dateToString/). You can pass call `$dateToString()` with a `Date` object, a format string specifier, and a timezone indicator. MongoDB will use the format string as a template to figure out how to output the given `Date` object with the timezone being used to offset the output from UTC correctly. +MongoDB offers some other functions that can manipulate dates as well. One useful example of this is the [`$dateToString()` aggregation function](https://www.mongodb.com/docs/manual/reference/operator/aggregation/dateToString/). You can pass call `$dateToString()` with a `Date` object, a format string specifier, and a timezone indicator. MongoDB will use the format string as a template to figure out how to output the given `Date` object with the timezone being used to offset the output from UTC correctly. -Here, we will format the dates in our `dates` collection using an arbitrary string. We'll also cast the dates to the New York timezone. +Here, we will format the dates in our `dates` collection using an arbitrary string. We'll also cast the dates to the New York timezone. First, we need to remove any stray documents that might have saved the `date` field as a string: ```javascript -db.dates.deleteMany({"date": {$type: "string"}}) +db.dates.deleteMany({ date: { $type: 'string' } }) ``` Now we can run an aggregation with the `$dateToString` function: ```javascript -db.dates.aggregate( - [ - { - $project: { - "_id": 0, - "date": "$date", - "my_date": { - $dateToString: { - date: "$date", - format: "Day %d of Month %m (Day %j of year %Y) at %H hours, %M minutes, and %S seconds (timezone offset: %z)", - timezone: "America/New_York", - } - } - } - } - ] -).pretty() +db.dates + .aggregate([ + { + $project: { + _id: 0, + date: '$date', + my_date: { + $dateToString: { + date: '$date', + format: + 'Day %d of Month %m (Day %j of year %Y) at %H hours, %M minutes, and %S seconds (timezone offset: %z)', + timezone: 'America/New_York', + }, + }, + }, + }, + ]) + .pretty() ``` + ```javascript { "date" : ISODate("2022-05-04T12:00:53.307Z"), @@ -403,24 +418,23 @@ db.dates.aggregate( } ``` -The `$dateToParts()` function is similarly useful. It can be used to decompose a `Date` field into its constituent parts. +The `$dateToParts()` function is similarly useful. It can be used to decompose a `Date` field into its constituent parts. For example, we can type: ```javascript -db.dates.aggregate( - [ - { - $project: { - _id: 0, - date: { - $dateToParts: { date: "$date" } - } - } - } - ] -) +db.dates.aggregate([ + { + $project: { + _id: 0, + date: { + $dateToParts: { date: '$date' }, + }, + }, + }, +]) ``` + ```javascript { "date" : { "year" : 2022, "month" : 5, "day" : 4, "hour" : 12, "minute" : 0, "second" : 53, "millisecond" : 307 } } { "date" : { "year" : 2022, "month" : 5, "day" : 4, "hour" : 12, "minute" : 0, "second" : 53, "millisecond" : 307 } } @@ -432,14 +446,14 @@ The [MongoDB documentation on aggregation functions](https://www.mongodb.com/doc ## Conclusion -In this guide, we covered some of the different ways that you can work with date and time data within MongoDB. Most temporal data should probably be stored in MongoDB's `Date` data type as this provides a good deal of flexibility when operating on the data or displaying it. 
+In this guide, we covered some of the different ways that you can work with date and time data within MongoDB. Most temporal data should probably be stored in MongoDB's `Date` data type as this provides a good deal of flexibility when operating on the data or displaying it. -Getting familiar with how date and time data is stored internally, how to coerce it into desirable formats on output, and how to compare, modify, and decompose the data into useful chunks can help you solve many different problems. While date information can be challenging to work with, taking advantage of the available methods and operators can help mitigate some of the heavy lifting. +Getting familiar with how date and time data is stored internally, how to coerce it into desirable formats on output, and how to compare, modify, and decompose the data into useful chunks can help you solve many different problems. While date information can be challenging to work with, taking advantage of the available methods and operators can help mitigate some of the heavy lifting. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). - \ No newline at end of file + diff --git a/content/08-mongodb/15-mongodb-encryption.mdx b/content/08-mongodb/15-mongodb-encryption.mdx index d8d3233c..5c1b06de 100644 --- a/content/08-mongodb/15-mongodb-encryption.mdx +++ b/content/08-mongodb/15-mongodb-encryption.mdx @@ -14,7 +14,7 @@ The need for data encryption is even more paramount for organizations handling s -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -99,7 +99,7 @@ With more and more data entering the digital universe, more threats are trying t -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. 
+If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). diff --git a/content/08-mongodb/16-mongodb-database-tools.mdx b/content/08-mongodb/16-mongodb-database-tools.mdx index 3e6e7434..eeea7168 100644 --- a/content/08-mongodb/16-mongodb-database-tools.mdx +++ b/content/08-mongodb/16-mongodb-database-tools.mdx @@ -1,34 +1,32 @@ --- title: 'Introduction to MongoDB database tools & utilities' -metaTitle: "MongoDB Database Tools & Utilities Installation and Overview" -metaDescription: "No matter the database you are working with, there are likely database tools available to help. Learn about MongoDB Database Tools & Utilities" +metaTitle: 'MongoDB Database Tools & Utilities Installation and Overview' +metaDescription: 'No matter the database you are working with, there are likely database tools available to help. Learn about MongoDB Database Tools & Utilities' metaImage: '/social/generic-mongodb.png' authors: ['alexemerich'] --- - ## Introduction to MongoDB Database Tools & Utilities No matter the database you are working with, there are likely database tools available to help you work with your database. **Database tools** is a collective term for tools, utilities, and assistants that can make life easier when performing database administration tasks. -While not necessary to use, database tools and utilities can save you time and effort. MongoDB has a first party collection of extremely helpful, good to know command-line utilities that you can use in your deployment. In this article we are going to briefly mention installation and then cover the most useful utilities to know. +While not necessary to use, database tools and utilities can save you time and effort. MongoDB has a first party collection of extremely helpful, good to know command-line utilities that you can use in your deployment. In this article we are going to briefly mention installation and then cover the most useful utilities to know. MongoDB separates its tools and utilities into four categories: Binary Import / Export, Data Import / Export, Diagnostic Tools, and GridFS so we'll cover them accordingly. - ## Installing MongoDB database tools Starting in MongoDB version 4.4, the MongoDB Database Tools are released separately from the download of the MongoDB Server. They are also maintained on their own versioning compared to previous instances when these tools were released alongside a respective MongoDB Server version. We won’t cover the steps for installation, but if you are working with MongoDB 4.4 or later, then the following will walk you through each OS installation process. - + - [Linux](https://www.mongodb.com/docs/database-tools/installation/installation-linux/) - [macOS](https://www.mongodb.com/docs/database-tools/installation/installation-macos/) - [Windows](https://www.mongodb.com/docs/database-tools/installation/installation-windows/) -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! 
You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -40,13 +38,13 @@ To get started working with MongoDB and Prisma, checkout our [getting started fr `mongodump` is a utility for creating a binary export of the contents of a database. This utility can export data from standalone, replica set, and sharded cluster deployments. The exports can be executed from either `mongod` or `mongos` instances. It is important to note that `mongodump` needs to be run from the system command line, not the `mongo` shell. -`mongodump` can be a partner with `mongorestore` (more upcoming) to form part of a complete backup and recovery strategy. `mongodump` can also generate partial backups based on a collection, query, or syncing from production to development environment. +`mongodump` can be a partner with `mongorestore` (more upcoming) to form part of a complete backup and recovery strategy. `mongodump` can also generate partial backups based on a collection, query, or syncing from production to development environment. -While a viable strategy for smaller deployments, `mongodump` should be set aside for another backup strategy for larger MongoDB deployments. Because `mongodump` operates by interacting with a running `mongod` instance, it can impact the performance of your running database. On top of creating traffic, the tool also forces the database to read all data through memory. When MongoDB needs to read infrequently accessed data, this can take away from more frequently accessed data, diminishing the regular workload’s performance. +While a viable strategy for smaller deployments, `mongodump` should be set aside for another backup strategy for larger MongoDB deployments. Because `mongodump` operates by interacting with a running `mongod` instance, it can impact the performance of your running database. On top of creating traffic, the tool also forces the database to read all data through memory. When MongoDB needs to read infrequently accessed data, this can take away from more frequently accessed data, diminishing the regular workload’s performance. The basic syntax for `mongodump` looks as follows in the system command line: - mongodump + mongodump `mongodump` will generate a file and store it in a `dump/` directory for you to access. You can read more about the [connection string configuration](https://www.mongodb.com/docs/database-tools/mongodump/#connect-to-a-mongodb-instance) and additional [options](https://www.mongodb.com/docs/database-tools/mongodump/#options) in the official MongoDB documentation. @@ -54,86 +52,90 @@ The basic syntax for `mongodump` looks as follows in the system command line: `mongorestore` is the partner tool to `mongodump` for creating a sufficient backup strategy for small deployments. The `mongorestore` program loads data from either a binary database dump (`mongodump` file) or the standard input into a `mongod` or `mongos` instance. 
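As a rough sketch of how the two tools pair up, a small-deployment backup and restore might look something like the following (the host, database name, and backup directory are hypothetical):

    mongodump --host=localhost --port=27017 --db=sales --out=/backups/sales-dump

    mongorestore --host=localhost --port=27017 /backups/sales-dump

Here `mongorestore` is simply pointed at the directory that `mongodump` produced.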
-Like `mongodump`, `mongorestore` needs to be run in the system command-line rather than the `mongo` shell. It works against the running `mongod` instance as well making it inefficient as a restoration strategy for anything more than a small deployment. +Like `mongodump`, `mongorestore` needs to be run in the system command-line rather than the `mongo` shell. It works against the running `mongod` instance as well making it inefficient as a restoration strategy for anything more than a small deployment. The basic syntax for `mongorestore` looks like the following: - mongorestore + mongorestore The additional [options](https://www.mongodb.com/docs/database-tools/mongorestore/#options) for `mongorestore` can be added to meet whatever requirements you may need for your backup strategy or standalone imports. ### `bsondump` -`bsondump` is a tool for reading binary files produced from using `mongodump`. The `bsondump` utility converts BSON files into human-readable formats, including JSON. -`bsondump` must be run in the command line, and it is a diagnostic tool for inspecting BSON files. It is not meant to be used for data ingestion or other application use. +`bsondump` is a tool for reading binary files produced from using `mongodump`. The `bsondump` utility converts BSON files into human-readable formats, including JSON. + +`bsondump` must be run in the command line, and it is a diagnostic tool for inspecting BSON files. It is not meant to be used for data ingestion or other application use. `bsondump` uses [Extended JSON v2.0 (Canonical Mode)](https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/) to format its data. By default `bsondump` writes to standard output. To create a JSON file, you can use the following `--outFile` option: - bsondump --outFile=file.json file.bson + bsondump --outFile=file.json file.bson `--outFile` specifies the path of the file which `bsondump` should write its output JSON data. `file.bson` specifies the file to be converted. Other additional options are available in depth in the [MongoDB Documentation](https://www.mongodb.com/docs/database-tools/bsondump/#options). `bsondump` is particularly useful for any `mongodump` debugging tasks where the file needs to become human readable. For example, you can do the following to produce a debugging output: - bsondump --type=debug file.bson - + bsondump --type=debug file.bson ## Data import / export ### `mongoexport` -The `mongoexport` tool can also export data from a MongoDB instance. This command-line tool, however, produces a JSON or CSV export of the data rather than a binary dump like `mongodump` making it a slower operation. + +The `mongoexport` tool can also export data from a MongoDB instance. This command-line tool, however, produces a JSON or CSV export of the data rather than a binary dump like `mongodump` making it a slower operation. In order to use `mongoexport`, a user requires at least read access on the target database. They can either be connected to a `mongod` or `mongos` instance. The basic syntax for `mongoexport` looks as follows: - mongoexport --collection= + mongoexport --collection= -There are many [additional options](https://www.mongodb.com/docs/database-tools/mongoexport/#options) you can incorporate depending on your connection needs and use case. 
Because `mongoexport` produces a JSON or CSV export, in order to preserve all rich [BSON data types](https://www.prisma.io/dataguide/mongodb/mongodb-datatypes) for a full instance backup you will need to specify [Extended JSON v2.0 (Canonical mode)](https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/). +There are many [additional options](https://www.mongodb.com/docs/database-tools/mongoexport/#options) you can incorporate depending on your connection needs and use case. Because `mongoexport` produces a JSON or CSV export, in order to preserve all rich [BSON data types](https://www.prisma.io/dataguide/mongodb/mongodb-datatypes) for a full instance backup you will need to specify [Extended JSON v2.0 (Canonical mode)](https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/). This is an important option to know because JSON can only directly represent some of the types supported by BSON. Therefore, you must append the `--jsonFormat` option and set to `canonical`. An example would something like the following: - mongoexport --jsonFormat=canonical --collection= + mongoexport --jsonFormat=canonical --collection= Like `mongodump`, `mongoexport` has a partner import tool that will be able to render the exported file for import into MongoDB. ### `mongoimport` -The `mongoimport` tool imports the data captured from an Extended JSON (`mongoexport` file with preserved BSON data types), CSV, or TSV exports created from the `mongoexport` tool. With the correct formatting, `mongoimport` can also import files from a third-party export tool. + +The `mongoimport` tool imports the data captured from an Extended JSON (`mongoexport` file with preserved BSON data types), CSV, or TSV exports created from the `mongoexport` tool. With the correct formatting, `mongoimport` can also import files from a third-party export tool. The `mongoimport` tool can only be used from the system command-line and not the `mongo` shell. It has the following basic syntax: - mongoimport + mongoimport -`mongoimport` restores a database from a backup taken with `mongoexport`. Therefore most of the arguments for both are the same. It is best practice that when using these tools together for a backup strategy that they are on the same version. +`mongoimport` restores a database from a backup taken with `mongoexport`. Therefore most of the arguments for both are the same. It is best practice that when using these tools together for a backup strategy that they are on the same version. `mongoimport` also only supports data files that are UTF-8 encoded. If you attempt to import with any other encoding, then it will result in an error. An exhaustive list of additional option configuration can be found in the [official MongoDB documentation](https://www.mongodb.com/docs/database-tools/mongoimport/#options). ## Diagnostic tools ### `mongostat` + MongoDB also has helpful tools for gathering insights on any of your database instances. One such tool is `mongostat`. `mongostat` is a diagnostic tool that provides a quick overview of the status of a currently running `mongod` or `mongos` instance. If you are familiar with UNIX/Linux, this will sound familiar to `vmstat` except in a MongoDB context. The `mongostat` utility can only be run from the system command line and not the `mongo` shell. In order to connect to a `mongod` instance and use the `mongostat` tool, a user must have the `serverStatus` privilege action on the cluster. MongoDB has a built-in role called `clusterMonitor` that provides this. 
It is also possible to [customize other roles](https://www.mongodb.com/docs/manual/tutorial/manage-users-and-roles/#std-label-create-role-for-mongostat) to take advantage of `mongostat`.

The basic syntax for `mongostat` is as follows:

- mongostat + mongostat

-By default, `mongostat` reports values that reflect operations over a 1 second period. However, you can adjust this with the [`` argument](https://www.mongodb.com/docs/database-tools/mongostat/#std-option-mongostat.-sleeptime-). Adjusting this time period to anything greater than 1 second averages the statistics to reflect the average operations per second. +By default, `mongostat` reports values that reflect operations over a 1 second period. However, you can adjust this with the [`` argument](https://www.mongodb.com/docs/database-tools/mongostat/#std-option-mongostat.-sleeptime-). Adjusting this time period to anything greater than 1 second averages the statistics to reflect the average operations per second.

`mongostat` returns many fields, and it can be customized to return only the fields of interest. Another important option to know is `--rowcount=, -n=`. This option limits the number of rows returned by `mongostat`. Some examples of the fields returned are:

-- `inserts` : The number of objects inserted into the database per second. +- `inserts` : The number of objects inserted into the database per second. - `query` : The number of query operations per second. -- `vsize` : The amount of virtual memory in megabytes used by the process at the time of the last `mongostat` call. +- `vsize` : The amount of virtual memory in megabytes used by the process at the time of the last `mongostat` call. - `repl` : The replication status of the member.

There are many more fields covered in the [Official MongoDB Documentation](https://www.mongodb.com/docs/database-tools/mongostat/#fields), but these few examples demonstrate the `mongostat` utility’s capability for database monitoring from the system command-line.

### `mongotop` +

While `mongostat` is a useful tool for monitoring on a database level, `mongotop` is a useful tool for providing statistics on a per-collection level. Specifically, `mongotop` provides a method to track the amount of time a `mongod` instance spends reading and writing data every second.

`mongotop` can only be run from the command line, and its basic syntax looks as follows:

- mongotop + mongotop

`mongotop` returns the following fields:

@@ -146,14 +148,16 @@ While `mongostat` is a useful tool for monitoring on a database level, `mongotop

`mongotop` allows a database user to monitor the traffic of a collection within a database. You’ll be able to form a picture of when the collection is experiencing spikes or lulls in read or write operations.

## GridFS

-[GridFS](https://www.mongodb.com/docs/manual/core/gridfs/) is a convention for storing large files in a MongoDB database. All of the official MongoDB drivers support this convention, as does the following `mongofiles` program. It acts as an abstraction layer for storage and recovery of large files such as videos, audios, and images. + +[GridFS](https://www.mongodb.com/docs/manual/core/gridfs/) is a convention for storing large files in a MongoDB database. All of the official MongoDB drivers support this convention, as does the following `mongofiles` program. It acts as an abstraction layer for storing and retrieving large files such as video, audio, and image files. 
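To give a sense of how the convention looks from application code rather than the command line, here is a minimal, hypothetical sketch using the Node.js driver's `GridFSBucket` helper. The connection string, the `files` database name, and the `video.mp4` file name are placeholder assumptions for this example, not values from the surrounding guide:

```javascript
const fs = require('fs')
const { MongoClient, GridFSBucket } = require('mongodb')

async function main() {
  // Connect to a locally running mongod (assumed connection string)
  const client = await MongoClient.connect('mongodb://localhost:27017')
  const db = client.db('files')

  // GridFS splits the file into chunks and stores them in the
  // fs.files and fs.chunks collections behind the scenes
  const bucket = new GridFSBucket(db)

  // Stream a local file into GridFS storage
  await new Promise((resolve, reject) => {
    fs.createReadStream('video.mp4')
      .pipe(bucket.openUploadStream('video.mp4'))
      .on('finish', resolve)
      .on('error', reject)
  })

  await client.close()
}

main()
```

The `mongofiles` utility described next exposes this same storage convention directly from the system shell.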
### `mongofiles` +

The `mongofiles` tool makes it possible to manipulate files stored in a MongoDB instance as GridFS objects from the system’s command line. This is particularly useful because it provides an interface between the objects stored in your file system and GridFS.

The basic syntax of `mongofiles` is the following:

- mongofiles + mongofiles

The `` component determines what action you would like the `mongofiles` utility to take. Some example [commands](https://www.mongodb.com/docs/database-tools/mongofiles/#std-label-mongofiles-commands) are:

@@ -161,16 +165,17 @@ The `` component determines what action you would like the `mongofiles`

- `search ` : Lists the files in the GridFS store with names that match any portion of ``. - `delete ` : Deletes the specified file from GridFS storage.

-`mongofiles` provides interconnectivity between your local file system and GridFS that is conveniently navigable via the system command line. This makes file management and file storage a simpler task for database administrators and enhances data processing. +`mongofiles` provides interconnectivity between your local file system and GridFS that is conveniently navigable via the system command line. This makes file management and file storage a simpler task for database administrators and enhances data processing.

## Conclusion

-In this article, we discussed some of the MongoDB database tools and utilities that make important database tasks simpler via the command-line. A tool may be an essential to everyday database administrative operations or only needed ad-hoc. -Whether it’s exporting/importing data for maintaining a sound backup/recovery strategy, diagnostic monitoring on a database or collection level, or simplifying the interface between file systems for file management, MongoDB has you covered. +In this article, we discussed some of the MongoDB database tools and utilities that make important database tasks simpler via the command line. A tool may be essential to everyday database administration or only needed ad hoc. + +Whether it’s exporting/importing data for maintaining a sound backup/recovery strategy, diagnostic monitoring on a database or collection level, or simplifying the interface between file systems for file management, MongoDB has you covered.

-If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, check out Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence.

To get started working with MongoDB and Prisma, check out our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). 
diff --git a/content/08-mongodb/17-mongodb-sorting.mdx b/content/08-mongodb/17-mongodb-sorting.mdx index d11196a6..59d06513 100644 --- a/content/08-mongodb/17-mongodb-sorting.mdx +++ b/content/08-mongodb/17-mongodb-sorting.mdx @@ -1,114 +1,112 @@ --- title: 'How to sort query results in MongoDB' -metaTitle: "MongoDB Sort Records: How to Sort by Date, Name, and More" -metaDescription: "Learn how to sort MongoDB data in ascending and descending order, with single or multiple fields, including by date." +metaTitle: 'MongoDB Sort Records: How to Sort by Date, Name, and More' +metaDescription: 'Learn how to sort MongoDB data in ascending and descending order, with single or multiple fields, including by date.' metaImage: '/social/generic-mongodb.png' authors: ['justinellingwood'] --- ## Introduction -Sorting data on display or retrieval is a key operation for most database systems that helps differentiate them from other data storage mechanisms. Being able to manipulate the ordering, prioritization, and interpretation of various fields independently of their stored ordinality is one of the most useful features of both the database itself and its associated querying system. +Sorting data on display or retrieval is a key operation for most database systems that helps differentiate them from other data storage mechanisms. Being able to manipulate the ordering, prioritization, and interpretation of various fields independently of their stored ordinality is one of the most useful features of both the database itself and its associated querying system. -MongoDB provides many ways of controlling the way data is sorted when returned from queries. In this guide, we'll cover how to sort data in a variety of ways depending on your use case. We'll go over simple and compound sorts, how to change sort ordering, and how sorting is applied in combination with other operators. +MongoDB provides many ways of controlling the way data is sorted when returned from queries. In this guide, we'll cover how to sort data in a variety of ways depending on your use case. We'll go over simple and compound sorts, how to change sort ordering, and how sorting is applied in combination with other operators. -When using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) with the [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb), you can sort your results using the [`orderBy` API](https://www.prisma.io/docs/reference/api-reference/prisma-client-reference#orderby). +When using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) with the [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb), you can sort your results using the [`orderBy` API](https://www.prisma.io/docs/orm/reference/prisma-client-reference#orderby). -The documentation includes an overview of how to use this feature to [sort results in many flexible ways](https://www.prisma.io/docs/concepts/components/prisma-client/filtering-and-sorting#sorting). +The documentation includes an overview of how to use this feature to [sort results in many flexible ways](https://www.prisma.io/docs/orm/prisma-client/queries/filtering-and-sorting#sorting). ## Setting up example data -In order to demonstrate how sorting works, we'll query a number of documents contained within a `students` [collection](/intro/database-glossary#collections). 
You can create the `students` collection and insert the documents we'll be querying by copying and pasting the following: +In order to demonstrate how sorting works, we'll query a number of documents contained within a `students` [collection](/intro/database-glossary#collections). You can create the `students` collection and insert the documents we'll be querying by copying and pasting the following:
View the data insertion command ```javascript -db.students.insertMany( - [ - { - "first_name": "Carol", - "last_name": "Apple", - dob: ISODate("2010-10-30"), - address: { - street: { - name: "Flint Rd.", - number: "803", - }, - city: "Camden", - zip: "10832", - }, - }, - { - "first_name": "Spencer", - "last_name": "Burton", - dob: ISODate("2008-12-04"), - address: { - street: { - name: "Edgecombe St.", - number: "2083b", - }, - city: "Zoofreid", - zip: "80828", - }, - }, - { - "first_name": "Nixie", - "last_name": "Languin", - dob: ISODate("2011-02-11"), - address: { - street: { - name: "Kensington Ln.", - number: "33", - }, - city: "Zoofreid", - zip: "80829", - }, - }, - { - "first_name": "Anthony", - "last_name": "Apple", - dob: ISODate("2009-08-16"), - address: { - street: { - name: "Flint Rd.", - number: "803", - }, - city: "Camden", - zip: "10832", - }, - }, - { - "first_name": "Rose", - "last_name": "Southby", - dob: ISODate("2011-03-03"), - address: { - street: { - name: "Plainfield Dr.", - number: "4c", - }, - city: "Nambles", - zip: "38008", - }, - }, - { - "first_name": "Lain", - "last_name": "Singh", - dob: ISODate("2013-06-22"), - address: { - street: { - name: "Plainfield Dr.", - number: "308", - }, - city: "Brighton", - zip: "18002", - }, - }, - ] -) +db.students.insertMany([ + { + first_name: 'Carol', + last_name: 'Apple', + dob: ISODate('2010-10-30'), + address: { + street: { + name: 'Flint Rd.', + number: '803', + }, + city: 'Camden', + zip: '10832', + }, + }, + { + first_name: 'Spencer', + last_name: 'Burton', + dob: ISODate('2008-12-04'), + address: { + street: { + name: 'Edgecombe St.', + number: '2083b', + }, + city: 'Zoofreid', + zip: '80828', + }, + }, + { + first_name: 'Nixie', + last_name: 'Languin', + dob: ISODate('2011-02-11'), + address: { + street: { + name: 'Kensington Ln.', + number: '33', + }, + city: 'Zoofreid', + zip: '80829', + }, + }, + { + first_name: 'Anthony', + last_name: 'Apple', + dob: ISODate('2009-08-16'), + address: { + street: { + name: 'Flint Rd.', + number: '803', + }, + city: 'Camden', + zip: '10832', + }, + }, + { + first_name: 'Rose', + last_name: 'Southby', + dob: ISODate('2011-03-03'), + address: { + street: { + name: 'Plainfield Dr.', + number: '4c', + }, + city: 'Nambles', + zip: '38008', + }, + }, + { + first_name: 'Lain', + last_name: 'Singh', + dob: ISODate('2013-06-22'), + address: { + street: { + name: 'Plainfield Dr.', + number: '308', + }, + city: 'Brighton', + zip: '18002', + }, + }, +]) ```
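If you want to confirm the sample data loaded correctly before moving on, a quick count in the same shell should report six documents. This is only a sanity check under the assumption that the `insertMany()` call above succeeded:

```javascript
// should return 6 if all of the sample documents were inserted
db.students.countDocuments()
```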
@@ -117,21 +115,27 @@ Once you've inserted the above documents, continue to the next section to learn ## How to sort a single field -The basic approach to sorting results in MongoDB is to append the `.sort()` method onto a query. The `.sort()` method takes a document as an argument specifying the fields to sort as well as the sort direction. +The basic approach to sorting results in MongoDB is to append the `.sort()` method onto a query. The `.sort()` method takes a document as an argument specifying the fields to sort as well as the sort direction. The most basic way to sort results it to provide a document specifying a single field indicating the column name with a value of `1` indicating an ascending sort: -> Note that we're providing a [MongoDB projection](https://www.mongodb.com/docs/manual/tutorial/project-fields-from-query-results/) as the second argument to `.find()` to only display certain fields. We're also appending the `.pretty()` method to make the output more readable. +> Note that we're providing a [MongoDB projection](https://www.mongodb.com/docs/manual/tutorial/project-fields-from-query-results/) as the second argument to `.find()` to only display certain fields. We're also appending the `.pretty()` method to make the output more readable. ```javascript -db.students.find({}, { - _id: 0, - first_name: 1, - last_name: 1, - dob: 1 -}).sort({ - dob: 1 -}).pretty() +db.students + .find( + {}, + { + _id: 0, + first_name: 1, + last_name: 1, + dob: 1, + } + ) + .sort({ + dob: 1, + }) + .pretty() ``` The above query will return the students organized by their date of birth in the default ascending order: @@ -172,15 +176,22 @@ The above query will return the students organized by their date of birth in the To reverse the ordering, set the sort column to `-1` instead of `1`: ```javascript -db.students.find({}, { - _id: 0, - first_name: 1, - last_name: 1, - dob: 1 -}).sort({ - dob: -1 -}).pretty() +db.students + .find( + {}, + { + _id: 0, + first_name: 1, + last_name: 1, + dob: 1, + } + ) + .sort({ + dob: -1, + }) + .pretty() ``` + ```javascript { "first_name" : "Lain", @@ -216,19 +227,26 @@ db.students.find({}, { ## How to sort on additional fields -MongoDB can use additional fields to control sorting for cases where the primary sort field contains duplicates. To do so, you can pass the extra fields and their sort order within the document that you pass to the `sort()` function. +MongoDB can use additional fields to control sorting for cases where the primary sort field contains duplicates. To do so, you can pass the extra fields and their sort order within the document that you pass to the `sort()` function. 
For example, if we sort the `student` documents by `last_name`, we can get an alphabetical list of students based on that one field: ```javascript -db.students.find({}, { - _id: 0, - first_name: 1, +db.students + .find( + {}, + { + _id: 0, + first_name: 1, + last_name: 1, + } + ) + .sort({ last_name: 1, -}).sort({ - last_name: 1 -}).pretty() + }) + .pretty() ``` + ```javascript { "first_name" : "Carol", "last_name" : "Apple" } { "first_name" : "Anthony", "last_name" : "Apple" } @@ -243,15 +261,22 @@ However, there are two students with the last name of "Apple" and the returned o To fix this, we can use `first_name` as a secondary sort field: ```javascript -db.students.find({}, { - _id: 0, - first_name: 1, - last_name: 1, -}).sort({ +db.students + .find( + {}, + { + _id: 0, + first_name: 1, + last_name: 1, + } + ) + .sort({ last_name: 1, - first_name: 1 -}).pretty() + first_name: 1, + }) + .pretty() ``` + ```javascript { "first_name" : "Anthony", "last_name" : "Apple" } { "first_name" : "Carol", "last_name" : "Apple" } @@ -265,20 +290,27 @@ After that further specification, the results match the conventional alphabetica ## How to sort using embedded document fields -MongoDB can also sort results based on the values included in embedded documents. To do so, use [dot notation](https://www.mongodb.com/docs/v5.0/core/document/#dot-notation) to drill down to the appropriate field in the embedded document. +MongoDB can also sort results based on the values included in embedded documents. To do so, use [dot notation](https://www.mongodb.com/docs/v5.0/core/document/#dot-notation) to drill down to the appropriate field in the embedded document. -For example, you can sort the `student` data based on the `city` where they live, which is a component of the `address` within each document. Keep in mind that when using dot notation, you need to quote the field names to ensure that they are interpreted correctly: +For example, you can sort the `student` data based on the `city` where they live, which is a component of the `address` within each document. 
Keep in mind that when using dot notation, you need to quote the field names to ensure that they are interpreted correctly: ```javascript -db.students.find({}, { - _id: 0, - first_name: 1, - last_name: 1, - "address.city": 1 -}).sort({ - "address.city": 1 -}).pretty() +db.students + .find( + {}, + { + _id: 0, + first_name: 1, + last_name: 1, + 'address.city': 1, + } + ) + .sort({ + 'address.city': 1, + }) + .pretty() ``` + ```javascript { "first_name" : "Lain", @@ -327,28 +359,34 @@ db.students.find({}, { You can couple this with additional sort fields to ensure that the results are ordered exactly as you'd like them to be: ```javascript -db.students.find({}, { - _id: 0, - first_name: 1, - last_name: 1, - "address.city": 1, - "address.street": 1 -}).sort({ - "address.city": 1, - "address.street.name": 1, - "address.street.number": 1, +db.students + .find( + {}, + { + _id: 0, + first_name: 1, + last_name: 1, + 'address.city': 1, + 'address.street': 1, + } + ) + .sort({ + 'address.city': 1, + 'address.street.name': 1, + 'address.street.number': 1, last_name: 1, first_name: 1, -}).pretty() + }) + .pretty() ``` In this example, we sorted by the following fields in order: -* City -* Street name -* Street number -* Last name -* First name +- City +- Street name +- Street number +- Last name +- First name The results of the query look like this: @@ -421,22 +459,28 @@ The results of the query look like this: } ``` -Now is also a good time to mention that the fields that you sort with do *not* have to be a subset of those you provide for the projection. +Now is also a good time to mention that the fields that you sort with do _not_ have to be a subset of those you provide for the projection. For example, we can achieve the same exact ordering but only return the student names by typing: ```javascript -db.students.find({}, { - _id: 0, - first_name: 1, - last_name: 1, -}).sort({ - "address.city": 1, - "address.street.name": 1, - "address.street.number": 1, +db.students + .find( + {}, + { + _id: 0, + first_name: 1, + last_name: 1, + } + ) + .sort({ + 'address.city': 1, + 'address.street.name': 1, + 'address.street.number': 1, last_name: 1, first_name: 1, -}).pretty() + }) + .pretty() ``` The query returns the following data: @@ -454,14 +498,14 @@ If you compare the results to that of the previous query, you can verify that th ## Conclusion -In this article, we took a look at how to use the `sort()` method to control how MongoDB orders the results of its queries. We covered single field sorting, sorting multiple fields by priority, changing the sort ordinality, and sorting based on embedded document fields. +In this article, we took a look at how to use the `sort()` method to control how MongoDB orders the results of its queries. We covered single field sorting, sorting multiple fields by priority, changing the sort ordinality, and sorting based on embedded document fields. -Combined with features like [document collation](https://www.mongodb.com/docs/manual/reference/method/cursor.collation/) and [result limiting](https://www.mongodb.com/docs/manual/reference/method/cursor.limit/), sorting enables you to control exactly how documents and fields are compared against one another and how they are returned. Getting familiar with these features can help you write better queries and return data in a state closer to how you'll use it. 
+Combined with features like [document collation](https://www.mongodb.com/docs/manual/reference/method/cursor.collation/) and [result limiting](https://www.mongodb.com/docs/manual/reference/method/cursor.limit/), sorting enables you to control exactly how documents and fields are compared against one another and how they are returned. Getting familiar with these features can help you write better queries and return data in a state closer to how you'll use it. -When using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) with the [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb), you can sort your results using the [`orderBy` API](https://www.prisma.io/docs/reference/api-reference/prisma-client-reference#orderby). +When using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) with the [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb), you can sort your results using the [`orderBy` API](https://www.prisma.io/docs/orm/reference/prisma-client-reference#orderby). -The documentation includes an overview of how to use this feature to [sort results in many flexible ways](https://www.prisma.io/docs/concepts/components/prisma-client/filtering-and-sorting#sorting). +The documentation includes an overview of how to use this feature to [sort results in many flexible ways](https://www.prisma.io/docs/orm/prisma-client/queries/filtering-and-sorting#sorting). diff --git a/content/08-mongodb/18-mongodb-aggregation-framework.mdx b/content/08-mongodb/18-mongodb-aggregation-framework.mdx index 52ab971a..e2ce8bb0 100644 --- a/content/08-mongodb/18-mongodb-aggregation-framework.mdx +++ b/content/08-mongodb/18-mongodb-aggregation-framework.mdx @@ -1,19 +1,20 @@ --- title: 'Introduction to MongoDB Aggregation Framework' -metaTitle: "MongoDB Aggregation Framework: How to simplify complex logic into stages" +metaTitle: 'MongoDB Aggregation Framework: How to simplify complex logic into stages' metaDescription: "Learn about MongoDB's Aggregation Framework and how to break complex logic into stages." metaImage: '/social/generic-mongodb.png' authors: ['alexemerich'] --- -## Introduction -MongoDB is a document-based NoSQL database where data is organized in collections that are made up of JSON documents. As with any database, MongoDB has a language that a user can use to access data. In MongoDB’s case, this language is the MongoDB Query Language or simply, MQL. Whether MQL or SQL, database queries can start off simple, but as a database scales more complex queries arise. +## Introduction -The [MongoDB Aggregation Framework](https://www.mongodb.com/docs/manual/core/aggregation-pipeline/) is a way to query documents from MongoDB in a way that breaks down these more confounding queries. It separates complex logic into sequential operations. In this guide, we will introduce the MongoDB Aggregation Framework, discuss common aggregation stages, and finish up with a simple aggregation pipeline example. +MongoDB is a document-based NoSQL database where data is organized in collections that are made up of JSON documents. As with any database, MongoDB has a language that a user can use to access data. In MongoDB’s case, this language is the MongoDB Query Language or simply, MQL. Whether MQL or SQL, database queries can start off simple, but as a database scales more complex queries arise. 
+ +The [MongoDB Aggregation Framework](https://www.mongodb.com/docs/manual/core/aggregation-pipeline/) is a way to query documents from MongoDB in a way that breaks down these more confounding queries. It separates complex logic into sequential operations. In this guide, we will introduce the MongoDB Aggregation Framework, discuss common aggregation stages, and finish up with a simple aggregation pipeline example. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). @@ -21,22 +22,21 @@ To get started working with MongoDB and Prisma, checkout our [getting started fr ## How does the MongoDB Aggregation Framework work? -The purpose of MongoDB’s Aggregation Framework is to design a pipeline consisting of multiple stages for processing documents. You start with your collection's data and after each stage of the pipeline you are closer to the end result which will be the desired documents. +The purpose of MongoDB’s Aggregation Framework is to design a pipeline consisting of multiple stages for processing documents. You start with your collection's data and after each stage of the pipeline you are closer to the end result which will be the desired documents. -Each stage performs an operation on the documents. There are several operations that can be conducted. For example, a stage can filter, group, or even calculate values on the data. After each stage, the outputted documents are passed into the next stage and so on until no stages are left. +Each stage performs an operation on the documents. There are several operations that can be conducted. For example, a stage can filter, group, or even calculate values on the data. After each stage, the outputted documents are passed into the next stage and so on until no stages are left. With an aggregation framework, one can achieve several goals. We’ll go into specific examples with the actual operation syntax, but in theory an analyst for the fiction department at a bookstore could set up a framework that groups the number of purchases based on genre or author to inform the sales floor. They are able to iterate their query by adding stages until the data is just what they are looking for. No matter the team, there are insights to be had from data that are all the more easily discovered with the composition of an aggregation pipeline. - ## What are the most common MongoDB aggregation operations? There are approximately 38 aggregation stages available in the MongoDB framework at the time of this writing. We are not going to delve into all of them in this guide, but you can view the whole list in the [official MongoDB documentation](https://www.mongodb.com/docs/manual/reference/operator/aggregation-pipeline/#alphabetical-listing-of-stages). 
We’ll spend some time to highlight a few that will also get used in an example pipeline. - `$project` : Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document. -- `$match` : Filters the document stream to allow only matching documents to pass unmodified into the next pipeline stage. `$match` uses standard MongoDB queries. For each input document, outputs either one document (a match) or zero documents (no match). +- `$match` : Filters the document stream to allow only matching documents to pass unmodified into the next pipeline stage. `$match` uses standard MongoDB queries. For each input document, outputs either one document (a match) or zero documents (no match). - `$group` : Groups input documents by a specified identifier expression and applies the accumulator expression(s), if specified, to each group. Consumes all input documents and outputs one document per each distinct group. The output documents only contain the identifier field and, if specified, accumulated fields. -- `$sort` : Reorders the document stream by a specified sort key. Only the order changes; the documents remain unmodified. For each input document, outputs one document. -- `$skip` : Skips the first `n` documents where `n` is the specified skip number and passes the remaining documents unmodified to the pipeline. For each input document, outputs either zero documents (for the first `n` documents) or one document (if after the first `n` documents). +- `$sort` : Reorders the document stream by a specified sort key. Only the order changes; the documents remain unmodified. For each input document, outputs one document. +- `$skip` : Skips the first `n` documents where `n` is the specified skip number and passes the remaining documents unmodified to the pipeline. For each input document, outputs either zero documents (for the first `n` documents) or one document (if after the first `n` documents). - `$limit` : Passes the first `n` documents unmodified to the pipeline where `n` is the specified limit. For each input document, outputs either one document (for the first `n` documents) or zero documents (after the first `n` documents). - `$unwind` : Deconstructs an array field from the input documents to output a document for each element. Each output document replaces the array with an element value. For each input document, outputs `n` documents where `n` is the number of array elements and can be zero for an empty array. @@ -44,7 +44,7 @@ There are approximately 38 aggregation stages available in the MongoDB framework To bring aggregation to life with a practical example, we’ll run through setting up a pipeline with an imaginary bookstore. We’ll start with some inventory order data, and we’ll create a pipeline that takes this raw data and outputs which authors have multiple orders and how many copies of their books were ordered. -To begin, we’ll insert some sample order documents into the collection `bookOrders`. +To begin, we’ll insert some sample order documents into the collection `bookOrders`. ``` db.bookOrders.insertMany ( [ @@ -62,20 +62,20 @@ db.bookOrders.insertMany ( [ ] ) ``` -Now that our collection has some sample documents, we can start our query. Aggregation pipelines run with the `db..aggregate()` method. Our goal is to design a query that returns a list of the authors with the most total copies of their fiction books ordered. An example aggregation query can be found below with each stage described. 
+Now that our collection has some sample documents, we can start our query. Aggregation pipelines run with the `db..aggregate()` method. Our goal is to design a query that returns a list of the authors with the most total copies of their fiction books ordered. An example aggregation query can be found below with each stage described. ``` -db.bookOrders.aggregate ( [ - // Stage 1: The $match operator scans the collection for documents - matching the specified condition to pass to the next stage. - { - $match: - { - genre: "Fiction" - } - }, - - // Stage 2: The $project operator specifies which fields +db.bookOrders.aggregate ( [ + // Stage 1: The $match operator scans the collection for documents + matching the specified condition to pass to the next stage. + { + $match: + { + genre: "Fiction" + } + }, + + // Stage 2: The $project operator specifies which fields in the matched documents should pass onto the next stage. { $project: @@ -84,22 +84,22 @@ db.bookOrders.aggregate ( [ quantity : 1 } }, - - // Stage 3: The $group operator groups the documents by the specified expression + + // Stage 3: The $group operator groups the documents by the specified expression and outputs a document for each unique grouping. The _id field specifies the distinct key to group by. - { - $group: - { - _id: "$last_name", - totalQuantity: { $sum: "$quantity" } } - }, - - // Stage 4: The $sort operator specifies the field(s) to sort by and the order. - -1 specifies a descending order and 1 specifies ascending order. - { - $sort: - { totalQuantity: -1 } - } + { + $group: + { + _id: "$last_name", + totalQuantity: { $sum: "$quantity" } } + }, + + // Stage 4: The $sort operator specifies the field(s) to sort by and the order. + -1 specifies a descending order and 1 specifies ascending order. + { + $sort: + { totalQuantity: -1 } + } ] ) ``` @@ -118,7 +118,7 @@ After running our aggregation query, we get the following output: This example is intentionally simple, but it demonstrates how an aggregation pipeline can take some of the complexity out of some queries. Each step to reaching your desired output is clearly broken down and compartmentalized into a clear stage. -Depending on the collection and document data structure, there are optmizations to consider when building an aggregation pipeline. Additionally, this framework may not work for all complex logic. It is case dependent. +Depending on the collection and document data structure, there are optmizations to consider when building an aggregation pipeline. Additionally, this framework may not work for all complex logic. It is case dependent. One small optimization that should be pointed out can be seen in the first two stages of our example. Generally, the `$match` operator is used to begin most pipelines and is best practice. However, if your collection is full of very large documents, then it is recommended to begin with the `$project` operator instead. Starting with `$project` limits the amount of fields that get passed onto the next stage earlier in the pipeline and reduces some unnecessary load. @@ -126,12 +126,12 @@ One small optimization that should be pointed out can be seen in the first two s In this article, we introduced MongoDB’s Aggregation Framework. We discussed what it is and how it can be a tool for simplifying complex logic and longwinded queries. An aggregation pipeline’s stages break logic down into blocks that can be easily followed and manipulated. -Aggregation pipelines simplify data access, and it is important to understand how it works. 
MongoDB’s Aggregation Framework can be used to do even more than we demonstrated in our bookstore example, and we hope this introduction starts you down the path of further exploration. +Aggregation pipelines simplify data access, and it is important to understand how it works. MongoDB’s Aggregation Framework can be used to do even more than we demonstrated in our bookstore example, and we hope this introduction starts you down the path of further exploration. -If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/concepts/database-connectors/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage production MongoDB databases with confidence. +If you're using MongoDB, checkout Prisma's [MongoDB connector](https://www.prisma.io/docs/orm/overview/databases/mongodb)! You can use the [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage production MongoDB databases with confidence. To get started working with MongoDB and Prisma, checkout our [getting started from scratch guide](https://www.prisma.io/docs/getting-started) or how to [add to an existing project](https://www.prisma.io/docs/getting-started/setup-prisma/add-to-existing-project/mongodb-typescript-mongodb). - \ No newline at end of file + diff --git a/content/09-database-tools/01-top-11-nodejs-orms-query-builders-and-database-libraries.mdx b/content/09-database-tools/01-top-11-nodejs-orms-query-builders-and-database-libraries.mdx index cced6e7a..adbef489 100644 --- a/content/09-database-tools/01-top-11-nodejs-orms-query-builders-and-database-libraries.mdx +++ b/content/09-database-tools/01-top-11-nodejs-orms-query-builders-and-database-libraries.mdx @@ -107,7 +107,7 @@ Prisma currently supports **PostgreSQL, MySQL, MSSQL, and SQLite**. Additionally - VSCode plugin - Autocompletion support -For a full list of features consult [Database Features](https://www.prisma.io/docs/reference/database-reference/database-features) from the Prisma official documentation. +For a full list of features consult [Database Features](https://www.prisma.io/docs/orm/reference/database-features) from the Prisma official documentation. ### Usage example @@ -125,7 +125,7 @@ Although Prisma is a newer database tool and has gone through several iterations -If you want to learn more about why we think Prisma is a great option, check out our [Why Prisma? page](https://www.prisma.io/docs/concepts/overview/why-prisma). +If you want to learn more about why we think Prisma is a great option, check out our [Why Prisma? page](https://www.prisma.io/docs/orm/overview/introduction/why-prisma). @@ -207,7 +207,7 @@ Sequelize is an established, stable ActiveRecord ORM and due to its popularity a -For a more focused comparison of Prisma and Sequelize, you can look at our [Sequelize comparison page](https://www.prisma.io/docs/concepts/more/comparisons/prisma-and-sequelize). +For a more focused comparison of Prisma and Sequelize, you can look at our [Sequelize comparison page](https://www.prisma.io/docs/orm/more/comparisons/prisma-and-sequelize). @@ -286,7 +286,7 @@ TypeORM and Sequelize are the two most popular relational database ORMs. TypeORM -For a more focused comparison of Prisma and TypeORM, you can look at our [TypeORM comparison page](https://www.prisma.io/docs/concepts/more/comparisons/prisma-and-typeorm). 
+For a more focused comparison of Prisma and TypeORM, you can look at our [TypeORM comparison page](https://www.prisma.io/docs/orm/more/comparisons/prisma-and-typeorm). @@ -361,7 +361,7 @@ If you're using a MongoDB database with Node and want to use an ODM, Mongoose is -For a more focused comparison of Prisma and Mongoose, you can look at our [Mongoose comparison page](https://www.prisma.io/docs/concepts/more/comparisons/prisma-and-mongoose). +For a more focused comparison of Prisma and Mongoose, you can look at our [Mongoose comparison page](https://www.prisma.io/docs/orm/more/comparisons/prisma-and-mongoose). diff --git a/content/09-database-tools/02-evaluating-type-safety-in-the-top-8-typescript-orms.md b/content/09-database-tools/02-evaluating-type-safety-in-the-top-8-typescript-orms.md index 2fae4202..f1001a55 100644 --- a/content/09-database-tools/02-evaluating-type-safety-in-the-top-8-typescript-orms.md +++ b/content/09-database-tools/02-evaluating-type-safety-in-the-top-8-typescript-orms.md @@ -1,7 +1,7 @@ --- title: 'Top 8 TypeScript ORMs, query builders, & database libraries: evaluating type safety' metaTitle: 'Top 8 TypeScript ORMs, Query Builders, Libraries: Evaluate Type Safety' -metaDescription: "This article assesses the type safety of popular TypeScript ORMs, query builders, and database libraries." +metaDescription: 'This article assesses the type safety of popular TypeScript ORMs, query builders, and database libraries.' metaImage: '/social/typescript-orms-2022.png' --- @@ -13,29 +13,29 @@ While all of the libraries considered in this article have TypeScript bindings f This article will look at the following: -* **Source**: Are library type definitions officially built-in, or sourced from the [DefinitelyTyped](https://github.com/DefinitelyTyped/DefinitelyTyped) @types repository? -* **Record Creation:** Are models type-safe and can records be created in a type-safe manner? -* **Record Fetching**: When fetching data, are objects type-safe, even for partial models and relations? +- **Source**: Are library type definitions officially built-in, or sourced from the [DefinitelyTyped](https://github.com/DefinitelyTyped/DefinitelyTyped) @types repository? +- **Record Creation:** Are models type-safe and can records be created in a type-safe manner? +- **Record Fetching**: When fetching data, are objects type-safe, even for partial models and relations? This article will assume some familiarity with TypeScript and type safety. To learn more, please consult the official [TypeScript documentation](https://www.typescriptlang.org/docs). It will also assume some familiarity with ORMs and query builders. To learn more about these database tools, please see [Comparing SQL, query builders, and ORMs](https://www.prisma.io/dataguide/types/relational/comparing-sql-query-builders-and-orms), also from Prisma's [Data Guide](https://www.prisma.io/dataguide). -**Note:** This article was originally published on October 2, 2020. It was most recently updated on February 15, 2022. +**Note:** This article was originally published on October 2, 2020. It was most recently updated on February 15, 2022. 
## Prisma ### Evaluation summary -* **Type definitions**: Built-in -* **Record creation**: Type-safe -* **Record fetching**: Type-safe +- **Type definitions**: Built-in +- **Record creation**: Type-safe +- **Record fetching**: Type-safe ### Overview -* [Website](https://www.prisma.io/) -* [GitHub](https://github.com/prisma/prisma) -* [npm: @prisma/client](https://www.npmjs.com/package/@prisma/client) +- [Website](https://www.prisma.io/) +- [GitHub](https://github.com/prisma/prisma) +- [npm: @prisma/client](https://www.npmjs.com/package/@prisma/client) -Prisma differs from most ORMs in that models are not defined in classes but in the *Prisma schema*, the main configuration and data model definition file used by the Prisma toolkit. In the Prisma schema you define your data source, like a PostgreSQL database, and models, like `users` and `posts` and the relations between them. Using this schema, Prisma generates a type-safe *Client* that exposes a Create-Read-Update-Delete (CRUD) API, which you then use to query your database. This Prisma Client functions as a rich query builder that you can use in your Node.js app to return plain JavaScript objects, not instances of a model class. +Prisma differs from most ORMs in that models are not defined in classes but in the _Prisma schema_, the main configuration and data model definition file used by the Prisma toolkit. In the Prisma schema you define your data source, like a PostgreSQL database, and models, like `users` and `posts` and the relations between them. Using this schema, Prisma generates a type-safe _Client_ that exposes a Create-Read-Update-Delete (CRUD) API, which you then use to query your database. This Prisma Client functions as a rich query builder that you can use in your Node.js app to return plain JavaScript objects, not instances of a model class. ### What is Prisma? @@ -43,7 +43,7 @@ Prisma is a newer ORM and has gone through several iterations and redesigns. Its -If you want to learn more about why we think Prisma is a great option, check out our [Why Prisma? page](https://www.prisma.io/docs/concepts/overview/why-prisma). +If you want to learn more about why we think Prisma is a great option, check out our [Why Prisma? page](https://www.prisma.io/docs/orm/overview/introduction/why-prisma). @@ -89,21 +89,21 @@ This means that attempting to access `post` fields that weren't selected, like ` ### Type safety: strong -Prisma's unique design of generating a local CRUD client that encodes your data model allows it to achieve an unparalleled level of type safety among TypeScript ORMs. When using Prisma to manipulate and query data from your database, you'll have accurate typings for nested relation queries and also partial queries that modify the shape of returned models. +Prisma's unique design of generating a local CRUD client that encodes your data model allows it to achieve an unparalleled level of type safety among TypeScript ORMs. When using Prisma to manipulate and query data from your database, you'll have accurate typings for nested relation queries and also partial queries that modify the shape of returned models. 
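As a rough sketch of what that looks like in practice, consider a partial query with nested relation selection. This assumes a hypothetical Prisma schema with a `User` model that has `name` and `email` fields and a `posts` relation to a `Post` model with a `title` field; the exact inferred type depends on your own schema:

```javascript
import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

// Select only a subset of fields, including a nested relation selection
const usersWithPostTitles = await prisma.user.findMany({
  select: {
    name: true,
    posts: {
      select: { title: true },
    },
  },
})

// The result is inferred as { name: string; posts: { title: string }[] }[],
// so accessing a field that was not selected, such as
// usersWithPostTitles[0].email, is a compile-time error
```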
## Sequelize ### Evaluation summary -* **Type definitions**: Built-in -* **Record creation**: Not Type-safe -* **Record fetching**: Not Type-safe +- **Type definitions**: Built-in +- **Record creation**: Not Type-safe +- **Record fetching**: Not Type-safe ### Overview -* [Website](https://sequelize.org/) -* [GitHub](https://github.com/sequelize/sequelize/) -* [npm: sequelize](https://www.npmjs.com/package/sequelize) +- [Website](https://sequelize.org/) +- [GitHub](https://github.com/sequelize/sequelize/) +- [npm: sequelize](https://www.npmjs.com/package/sequelize) ### What is Sequelize @@ -111,7 +111,7 @@ Sequelize is an established, mature, promise-based Node.js ORM that supports Pos -For a more focused comparison of Prisma and Sequelize, you can look at our [Sequelize comparison page](https://www.prisma.io/docs/concepts/more/comparisons/prisma-and-sequelize). +For a more focused comparison of Prisma and Sequelize, you can look at our [Sequelize comparison page](https://www.prisma.io/docs/orm/more/comparisons/prisma-and-sequelize). @@ -121,7 +121,7 @@ As of v5 (at the time of writing, Sequelize is v6.16.1), Sequelize contains buil ### Record creation: not type-safe -Out-of-the-box, Sequelize will not provide strict type-checking for model properties. To implement this, the developer must write a [non-trivial](https://sequelize.org/master/manual/typescript.html) amount of boilerplate including `interfaces`, classes and definitions for CRUD methods for any relations. For complex data models with multiple relations, this can quickly become cumbersome and unwieldy. When creating records using mixins added to models or using nested models, it is again up to the developer to provide type definitions. +Out-of-the-box, Sequelize will not provide strict type-checking for model properties. To implement this, the developer must write a [non-trivial](https://sequelize.org/master/manual/typescript.html) amount of boilerplate including `interfaces`, classes and definitions for CRUD methods for any relations. For complex data models with multiple relations, this can quickly become cumbersome and unwieldy. When creating records using mixins added to models or using nested models, it is again up to the developer to provide type definitions. Sequelize also allows you to define models without type checking their attributes. Using this approach, you can get up and running quickly with Sequelize and TypeScript, but lose all type safety when working with your data. @@ -137,15 +137,15 @@ As of v5, Sequelize provides built-in type definitions, but to have any sort of ### Evaluation summary -* **Type definitions**: Built-in -* **Record creation**: Type-safe -* **Record fetching**: Partially Type-safe +- **Type definitions**: Built-in +- **Record creation**: Type-safe +- **Record fetching**: Partially Type-safe ### Overview -* [Website](https://typeorm.io/#/) -* [GitHub](https://github.com/typeorm) -* [npm: TypeORM](https://www.npmjs.com/package/typeorm) +- [Website](https://typeorm.io/#/) +- [GitHub](https://github.com/typeorm) +- [npm: TypeORM](https://www.npmjs.com/package/typeorm) ### What is TypeORM? @@ -153,17 +153,17 @@ TypeORM is a Hibernate-influenced JavaScript and TypeScript ORM that can run on -For a more focused comparison of Prisma and TypeORM, you can look at our [TypeORM comparison page](https://www.prisma.io/docs/concepts/more/comparisons/prisma-and-typeorm). 
+For a more focused comparison of Prisma and TypeORM, you can look at our [TypeORM comparison page](https://www.prisma.io/docs/orm/more/comparisons/prisma-and-typeorm). ### Type definitions: built-in -TypeORM is a TypeScript-first ORM that was explicitly designed for use with TypeScript. Types are built-in to the library and it leverages TypeScript features like decorators when defining models. +TypeORM is a TypeScript-first ORM that was explicitly designed for use with TypeScript. Types are built-in to the library and it leverages TypeScript features like decorators when defining models. ### Record creation: type safe -With TypeORM, models are defined using the `Entity` class. You decorate a model class (like `User`) with the `@Entity()` decorator, and decorate its properties like `id` and `name` with column decorators like `@PrimaryGeneratedColumn()` and `@Column`. If you're using the DataMapper pattern, a record is then defined by creating a new instance of the now type-safe model class and setting its properties. The record is saved using a model-specific `Repository` object, which is also typed. +With TypeORM, models are defined using the `Entity` class. You decorate a model class (like `User`) with the `@Entity()` decorator, and decorate its properties like `id` and `name` with column decorators like `@PrimaryGeneratedColumn()` and `@Column`. If you're using the DataMapper pattern, a record is then defined by creating a new instance of the now type-safe model class and setting its properties. The record is saved using a model-specific `Repository` object, which is also typed. Nested writes are accomplished by creating an instance of the related class (for example a `Post` for a `User`) and then saving both the `User` and `Post` objects. Using the `cascade` feature, this can be done with one `save` call. With TypeORM, model type-safety is available out-of-the-box. @@ -171,14 +171,14 @@ Using the query builder, model properties are also type-checked: ```javascript await conn - .createQueryBuilder() - .insert() - .into(User) - .values([ - { firstName: "Timber", lastName: "Saw" }, - { firstName: "Phantom", lastName: "Lancer" } - ]) - .execute(); + .createQueryBuilder() + .insert() + .into(User) + .values([ + { firstName: 'Timber', lastName: 'Saw' }, + { firstName: 'Phantom', lastName: 'Lancer' }, + ]) + .execute() ``` If the `User` class does not have a `firstName` field, the compiler will emit an error. @@ -186,18 +186,14 @@ If the `User` class does not have a `firstName` field, the compiler will emit an When using relations with the query builder, type safety breaks down as the following does not emit a compiler error: ```javascript -await conn - .createQueryBuilder() - .relation(User, "postsssss") - .of(user) - .add(post); +await conn.createQueryBuilder().relation(User, 'postsssss').of(user).add(post) ``` Even though there is no valid `postssss` relation. ### Record fetching: partially type safe -Fetching records from the database can be accomplished in many different ways. Using typed, model-specific `Repository` objects, the developer calls a method on the repository like `userRepo.find()`, where the return type is correctly inferred as `User[]`. +Fetching records from the database can be accomplished in many different ways. Using typed, model-specific `Repository` objects, the developer calls a method on the repository like `userRepo.find()`, where the return type is correctly inferred as `User[]`. 
When including relations like `userRepo.find({relations: ["posts"]});` , the return type is still inferred as `User[]` and the compiler is not aware of the included relation. It is up to the developer to access the `user.posts` property in a defensive manner. @@ -205,24 +201,24 @@ Using the built-in query builder, a query like the following is typed as `User`: ```javascript const firstUser = await conn - .getRepository(User) - .createQueryBuilder("user") - .where("user.id = :id", { id: 1 }) - .getOne(); + .getRepository(User) + .createQueryBuilder('user') + .where('user.id = :id', { id: 1 }) + .getOne() ``` And in a query like the following: ```javascript -const user = await conn.manager.findOne(User, 1); +const user = await conn.manager.findOne(User, 1) user.photos = await getConnection() - .createQueryBuilder() - .relation(User, "photos") - .of(user) // you can use just post id as well - .loadMany(); + .createQueryBuilder() + .relation(User, 'photos') + .of(user) // you can use just post id as well + .loadMany() ``` -The type of `user.photos` is `Photo[]`. +The type of `user.photos` is `Photo[]`. ### Type safety: strong @@ -232,15 +228,15 @@ TypeORM is TypeScript ORM with good type safety around its models. Its query bui ### Evaluation summary -* **Type definitions**: @types -* **Record creation**: Not Type-safe -* **Record fetching**: Not Type-safe +- **Type definitions**: @types +- **Record creation**: Not Type-safe +- **Record fetching**: Not Type-safe ### Overview -* [Website](https://bookshelfjs.org/) -* [GitHub](https://github.com/bookshelf/bookshelf) -* [npm: Bookshelf](https://www.npmjs.com/package/bookshelf) +- [Website](https://bookshelfjs.org/) +- [GitHub](https://github.com/bookshelf/bookshelf) +- [npm: Bookshelf](https://www.npmjs.com/package/bookshelf) ### What is Bookshelf.js? @@ -266,15 +262,15 @@ Although Bookshelf.js does have `@types` type definitions, these provide the bar ### Evaluation summary -* **Type definitions**: Built-in -* **Record creation**: Type-safe -* **Record fetching**: Partially Type-safe +- **Type definitions**: Built-in +- **Record creation**: Type-safe +- **Record fetching**: Partially Type-safe ### Overview -* [Website](https://vincit.github.io/objection.js/) -* [GitHub](https://github.com/Vincit/objection.js) -* [npm: Objection](https://www.npmjs.com/package/objection) +- [Website](https://vincit.github.io/objection.js/) +- [GitHub](https://github.com/Vincit/objection.js) +- [npm: Objection](https://www.npmjs.com/package/objection) ### What is Objection.js? @@ -282,28 +278,28 @@ Objection.js is self-described as more of a "relational query builder" than an O ### Type definitions: built-in -Objection.js provides [built-in TypeScript support](https://github.com/Vincit/objection.js/blob/master/typings/objection/index.d.ts). Like Bookshelf.js, Objection.js began as a JavaScript library and typings were added later as TypeScript grew in popularity and adoption. However, unlike Bookshelf.js, Objection.js provides thorough type safety when working with models and queries. +Objection.js provides [built-in TypeScript support](https://github.com/Vincit/objection.js/blob/master/typings/objection/index.d.ts). Like Bookshelf.js, Objection.js began as a JavaScript library and typings were added later as TypeScript grew in popularity and adoption. However, unlike Bookshelf.js, Objection.js provides thorough type safety when working with models and queries. ### Record creation: type safe -Models are defined in Objection.js by extending the `Model` class. 
Within a, say, `User` model, the developer defines non-nullable and optional properties like `name!` and `age?`, and provides a required `tableName` property. The developer can also provide an optional [JSON Schema](https://json-schema.org) for Model validation. Relations to other models like `HasMany` are also defined in the model class. +Models are defined in Objection.js by extending the `Model` class. Within a, say, `User` model, the developer defines non-nullable and optional properties like `name!` and `age?`, and provides a required `tableName` property. The developer can also provide an optional [JSON Schema](https://json-schema.org) for Model validation. Relations to other models like `HasMany` are also defined in the model class. -When creating new records, the `User.query().insert()` method is type-safe. Model properties are autocompleted and attempting to add properties not defined in the model class will result in compiler errors. +When creating new records, the `User.query().insert()` method is type-safe. Model properties are autocompleted and attempting to add properties not defined in the model class will result in compiler errors. -When creating new records for relations, like a new `Post` for a `User`, the developer uses the `user.$relatedQuery('posts').insert()` call. This is also type safe and although you can replace `posts` with a non-existent model or relation, the chained `insert` call will then spit out compiler errors. Model properties are autocompleted within the `insert()` command and including undefined `Post` properties will result in a compiler error. +When creating new records for relations, like a new `Post` for a `User`, the developer uses the `user.$relatedQuery('posts').insert()` call. This is also type safe and although you can replace `posts` with a non-existent model or relation, the chained `insert` call will then spit out compiler errors. Model properties are autocompleted within the `insert()` command and including undefined `Post` properties will result in a compiler error. Nested writes can also be done using the `insertGraph()` operation: ```javascript - const user = await User.query().insertGraph({ - firstName: 'Sylvester', - lastName: 'Stallone', - posts: [ - { - title: 'My first post', - } - ] - }); +const user = await User.query().insertGraph({ + firstName: 'Sylvester', + lastName: 'Stallone', + posts: [ + { + title: 'My first post', + }, + ], +}) ``` This operation is also type-safe and model properties are autocompleted for the nested model. @@ -313,9 +309,7 @@ This operation is also type-safe and model properties are autocompleted for the When fetching records from the database, queries and return objects are typed. When fetching relations using `relatedQuery`, the return type of the relation is also correctly inferred. In the following example, the return type of posts is `Post[]`: ```javascript -const posts = await User.relatedQuery('posts') - .for(1) - .orderBy('title'); +const posts = await User.relatedQuery('posts').for(1).orderBy('title') console.log(posts[0].name) ``` @@ -333,7 +327,7 @@ In this case the type of `userWithPosts` is inferred as `User`. The compiler emi If instead of `'posts'` you enter a model or relation that doesn't exist, the compiler won't emit any errors. 
For example the following code would be valid according to the compiler: ```javascript -const userWithPosts = await User.query().findById(1).withGraphFetched('postssss'); +const userWithPosts = await User.query().findById(1).withGraphFetched('postssss') ``` ### Type safety: strong @@ -344,15 +338,15 @@ Along with MikroORM and Bookshelf.js, Objection.js is an ORM-like library built ### Evaluation summary -* **Type definitions**: Built-in -* **Record creation**: Type-safe -* **Record fetching**: Type-safe +- **Type definitions**: Built-in +- **Record creation**: Type-safe +- **Record fetching**: Type-safe ### Overview -* [Website](https://mikro-orm.io/) -* [GitHub](https://github.com/mikro-orm/mikro-orm) -* [npm](https://www.npmjs.com/package/mikro-orm) +- [Website](https://mikro-orm.io/) +- [GitHub](https://github.com/mikro-orm/mikro-orm) +- [npm](https://www.npmjs.com/package/mikro-orm) ### What is MikroORM? @@ -367,25 +361,25 @@ As a TypeScript-first ORM, MikroORM builds in its own extensive set of type defi Defining models with MikroORM involves extending a `BaseEntity` class where the model's properties are declared, typed, and decorated with `@Property` and relation decorators. With these classes defined, records can be created in a type-safe manner by creating instances of these model classes. Model fields are type-checked and autocompleted. Models linked by a relation can be persisted at the same time in a transaction using `persistAndFlush()`. For example: ```javascript -const user = new User('Dave Johnson', 'dave@johns.on'); +const user = new User('Dave Johnson', 'dave@johns.on') user.age = 14 -const post1 = new Post("Dave's First Post", user); -const post2 = new Post("Dave's Second Post", user); +const post1 = new Post("Dave's First Post", user) +const post2 = new Post("Dave's Second Post", user) // Persist the post, author will be automatically cascade persisted -await DI.em.persistAndFlush([post1, post2]); +await DI.em.persistAndFlush([post1, post2]) ``` -Here the `Post` model requires a `title` and `User` in its constructor, and record creation will fail if these are not provided. You can access the post's author object using its properties, e.g. `post1.author.title`. +Here the `Post` model requires a `title` and `User` in its constructor, and record creation will fail if these are not provided. You can access the post's author object using its properties, e.g. `post1.author.title`. ### Record fetching: type-safe -MikroORM also provides strong type safety when fetching records from the database. Records can be fetched using EntityRepositories or an EntityManager. +MikroORM also provides strong type safety when fetching records from the database. Records can be fetched using EntityRepositories or an EntityManager. When fetching records using a repository for a given model, say a `userRepository`, the return object is typed and you cannot query based on properties that haven't been defined in the model. Furthermore, including relations will result in the object's type reflecting which relations were loaded. For example, with a `User` model linked to `Post` and `Item` models, the following command: ```javascript -const UserWithPosts = await DI.userRepository.findOne(1, ['posts']); +const UserWithPosts = await DI.userRepository.findOne(1, ['posts']) ``` Results in this type: @@ -402,7 +396,7 @@ Here we see that posts were loaded and items were not. 
One limitation is that in A similar level of type-safety applies when using `EntityManager`'s `find()` or `findOne()` functions, like in the following example: ```javascript -const userWithPosts = await DI.em.findOne(User, {email: 'dave@johns.on'}, ['posts']) +const userWithPosts = await DI.em.findOne(User, { email: 'dave@johns.on' }, ['posts']) ``` The type is again inferred as: @@ -422,15 +416,15 @@ MikroORM is a powerful ORM that also packs in the flexible Knex.js query builder ### Evaluation summary -* **Type definitions**: @types -* **Record creation**: Not Type-safe -* **Record fetching**: Not Type-safe +- **Type definitions**: @types +- **Record creation**: Not Type-safe +- **Record fetching**: Not Type-safe ### Overview -* [Website](https://sailsjs.com/documentation/reference/waterline-orm) -* [GitHub](https://github.com/balderdashy/waterline) -* [npm: Waterline](https://www.npmjs.com/package/waterline) +- [Website](https://sailsjs.com/documentation/reference/waterline-orm) +- [GitHub](https://github.com/balderdashy/waterline) +- [npm: Waterline](https://www.npmjs.com/package/waterline) ### What is Waterline? @@ -450,31 +444,31 @@ When fetching records from the database using the Waterline instance and the giv ### Type safety: weak -Waterline's models are not type-safe and data manipulation and creation operations are similarly not type-safe. Waterline is primarily a JavaScript library and its typings provide the bare minimum for TypeScript code to compile. +Waterline's models are not type-safe and data manipulation and creation operations are similarly not type-safe. Waterline is primarily a JavaScript library and its typings provide the bare minimum for TypeScript code to compile. ## Typegoose and Mongoose ### Evaluation summary -* **Type definitions**: @types -* **Record creation**: Type-safe -* **Record fetching**: Not Type-safe +- **Type definitions**: @types +- **Record creation**: Type-safe +- **Record fetching**: Not Type-safe ### Overview -* [Website](https://typegoose.github.io/typegoose/) -* [GitHub](https://github.com/typegoose/typegoose) -* [npm: Typegoose](https://www.npmjs.com/package/typegoose) +- [Website](https://typegoose.github.io/typegoose/) +- [GitHub](https://github.com/typegoose/typegoose) +- [npm: Typegoose](https://www.npmjs.com/package/typegoose) ### What is Mongoose? -Mongoose is a popular and well maintained Node.js data modeling tool for MongoDB. It allows you to model your data using schemas and it includes built-in type casting, validation, query building, and business logic hooks. If you're using a MongoDB database with Node.js and want to use an ORM-like tool to map objects to database documents (or ODM), Mongoose is a safe bet: it is a popular, mature project that continues to be actively maintained. +Mongoose is a popular and well maintained Node.js data modeling tool for MongoDB. It allows you to model your data using schemas and it includes built-in type casting, validation, query building, and business logic hooks. If you're using a MongoDB database with Node.js and want to use an ORM-like tool to map objects to database documents (or ODM), Mongoose is a safe bet: it is a popular, mature project that continues to be actively maintained. There are two main ways to use strong TypeScript typing with Mongoose. One way is to use types from the `@types` repository and write custom interfaces for your models. The other is to use [Typegoose](https://github.com/typegoose/typegoose) along with typings from `@types`. 
Typegoose allows you to define Mongoose models using classes. In this article we'll consider Typegoose. -For a more focused comparison of Prisma and Mongoose, you can look at our [Mongoose comparison page](https://www.prisma.io/docs/concepts/more/comparisons/prisma-and-mongoose). +For a more focused comparison of Prisma and Mongoose, you can look at our [Mongoose comparison page](https://www.prisma.io/docs/orm/more/comparisons/prisma-and-mongoose). @@ -484,15 +478,15 @@ To use Typegoose you first have to install Mongoose and its `@types` type defini ### Record creation: type-safe -To create models with Typegoose, you define model classes, like `User`, and their properties, like `name` and `age`. Properties are decorated with the `@prop()` decorator to specify additional information like whether or not the properties are required and how they are related to other models. +To create models with Typegoose, you define model classes, like `User`, and their properties, like `name` and `age`. Properties are decorated with the `@prop()` decorator to specify additional information like whether or not the properties are required and how they are related to other models. Once the models have been defined, records can be created in a type-safe manner using Mongoose `Model` objects. Model properties are autocompleted and attempting to add undefined properties results in a compiler error. The return object type corresponds to the defined Model class (`DocumentType`) and its properties can be accessed in a type-safe manner. This type safety also extends to nested models (for example saving a `User` with nested `Post` objects). ### Record fetching: not type-safe -When querying records from the database using `Model.find()`, filter properties are not type checked and it is possible to append properties that haven't been defined without any compiler error. This will result in Mongoose attempting to cast the filter. If this fails, a `CastError` will be thrown at runtime. +When querying records from the database using `Model.find()`, filter properties are not type checked and it is possible to append properties that haven't been defined without any compiler error. This will result in Mongoose attempting to cast the filter. If this fails, a `CastError` will be thrown at runtime. -When using `.populate()` on a model to populate references to other documents, anything can be entered into the `.populate()` method without compiler error, so this operation similarly is not type-safe. +When using `.populate()` on a model to populate references to other documents, anything can be entered into the `.populate()` method without compiler error, so this operation similarly is not type-safe. The return type from a `find()` or `findOne()` command is correctly typed according to the model used to query the database. @@ -506,23 +500,23 @@ This article focuses on the type safety of the most popular ORMs referenced in [ ### Knex.js -* [GitHub](https://github.com/knex/knex) -* [Website](https://knexjs.org) -* [npm](https://www.npmjs.com/package/knex) +- [GitHub](https://github.com/knex/knex) +- [Website](https://knexjs.org) +- [npm](https://www.npmjs.com/package/knex) Knex.js is a Node.js query builder (not ORM) that supports multiple databases and includes features like transaction support, connection pooling, and a streaming interface. It allows you to work at a level above the database driver and avoid writing SQL by hand. 
However, as it is a lower level library, familiarity with SQL and relational database concepts like joins and indices is expected. Official TypeScript bindings are built-in to the `knex` NPM package. TypeScript support is best-effort and "not all usage patterns can be type-checked." The knex documentation also states that "lack of type errors doesn't currently guarantee that the generated queries will be correct." ### PgTyped -* [GitHub](https://github.com/adelsz/pgtyped) -* [Website](https://pgtyped.now.sh/) +- [GitHub](https://github.com/adelsz/pgtyped) +- [Website](https://pgtyped.now.sh/) -PgTyped's goal is to allow you to write raw SQL and also guarantee the type-safety of the queries you write. It automatically generates TypeScript typings for the parameters and results of SQL queries by processing a SQL file and connecting directly to a running PostgreSQL database. It currently only supports PostgreSQL. +PgTyped's goal is to allow you to write raw SQL and also guarantee the type-safety of the queries you write. It automatically generates TypeScript typings for the parameters and results of SQL queries by processing a SQL file and connecting directly to a running PostgreSQL database. It currently only supports PostgreSQL. ### @slonik/typegen -* [GitHub](https://github.com/mmkal/slonik-tools/tree/master/packages/typegen#sloniktypegen) -* [npm](https://www.npmjs.com/package/@slonik/typegen) +- [GitHub](https://github.com/mmkal/slonik-tools/tree/master/packages/typegen#sloniktypegen) +- [npm](https://www.npmjs.com/package/@slonik/typegen) A similar package to PgTyped is the Slonik typegen library that uses the [Slonik PostgreSQL client](https://github.com/gajus/slonik) to generate TypeScript interfaces from raw SQL queries. To use the typegen library, you import it and use a proxy object that it generates to run queries. After running a query, typegen will inspect the field types of the query result and generate a TypeScript interface for that query. Subsequent queries can then be executed in a type-safe manner. diff --git a/content/09-database-tools/03-connection-pooling.mdx b/content/09-database-tools/03-connection-pooling.mdx index 5212c2c0..ea8c8f99 100644 --- a/content/09-database-tools/03-connection-pooling.mdx +++ b/content/09-database-tools/03-connection-pooling.mdx @@ -1,21 +1,21 @@ --- title: 'What is connection pooling and how does it work?' metaTitle: 'What is connection pooling in database management?' -metaDescription: "Learn how connection pooling helps databases handle more clients with the same resources." +metaDescription: 'Learn how connection pooling helps databases handle more clients with the same resources.' authors: ['justinellingwood'] --- ## Introduction -While development and staging environments can help you anticipate many of the conditions you'll face in production, some challenges only begin to surface at scale. Database connection management falls squarely in this category: the number of requests from client instances can quickly scale beyond the connection limit supported by the database software. +While development and staging environments can help you anticipate many of the conditions you'll face in production, some challenges only begin to surface at scale. Database connection management falls squarely in this category: the number of requests from client instances can quickly scale beyond the connection limit supported by the database software. 
-Connection management policies and tooling are required to address this resource contention in order to prevent long queue times, failed requests, and user-impacting errors. **Connection pooling**, a strategy based around deploying an intermediary queuing system to manage and recycle database connections, is often successfully employed to mitigate these problems. +Connection management policies and tooling are required to address this resource contention in order to prevent long queue times, failed requests, and user-impacting errors. **Connection pooling**, a strategy based around deploying an intermediary queuing system to manage and recycle database connections, is often successfully employed to mitigate these problems. -In this guide, we'll talk about what connection pooling is, what specific conditions it seeks to address, and how it works. We'll introduce a few popular implementations to act as representative examples, and we'll discuss how they alter the way that clients behave when client requests outstrip the database's available connections. +In this guide, we'll talk about what connection pooling is, what specific conditions it seeks to address, and how it works. We'll introduce a few popular implementations to act as representative examples, and we'll discuss how they alter the way that clients behave when client requests outstrip the database's available connections. -Connection pooling is one of the core features offered by [Prisma Accelerate](https://www.prisma.io/docs/data-platform/accelerate) on the [Prisma Data Platform](https://console.prisma.io/). If you are using Prisma to work with your database, start a free project to easily manage your connections and browse your data. +Connection pooling is one of the core features offered by [Prisma Accelerate](https://www.prisma.io/docs/accelerate) on the [Prisma Data Platform](https://console.prisma.io/). If you are using Prisma to work with your database, start a free project to easily manage your connections and browse your data. @@ -23,101 +23,101 @@ Connection pooling is one of the core features offered by [Prisma Accelerate](ht Before we talk about connection management generally and connection pooling specifically, it may be helpful to take a close look at what goes on during a database connection. -For a client application to open a connection to a database, a surprising number of steps must occur. For each connection, some or all of the following steps must occur: +For a client application to open a connection to a database, a surprising number of steps must occur. 
For each connection, some or all of the following steps must occur: -* Any DNS lookups required to locate the IP address of the database server -* Conduct the [three-way handshake](https://en.wikipedia.org/wiki/Handshaking#TCP_three-way_handshake) required to establish a TCP connection to the server -* Negotiate and enable encryption for the connection through a [TLS handshake](https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/) -* Exchange preferences and requirements with the database software to establish the session parameters -* Perform database authentication checks to establish the client's identity -* Perform initial authorization checks to establish that the client has access to the requested database objects -* Perform the actual query and return the results -* Tear down the database session, TLS encryption, and TCP connection +- Any DNS lookups required to locate the IP address of the database server +- Conduct the [three-way handshake](https://en.wikipedia.org/wiki/Handshaking#TCP_three-way_handshake) required to establish a TCP connection to the server +- Negotiate and enable encryption for the connection through a [TLS handshake](https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/) +- Exchange preferences and requirements with the database software to establish the session parameters +- Perform database authentication checks to establish the client's identity +- Perform initial authorization checks to establish that the client has access to the requested database objects +- Perform the actual query and return the results +- Tear down the database session, TLS encryption, and TCP connection ## How does the number of connections affect database server resources? -Beyond the time required to complete all of the above, each connection also requires resources to establish and maintain. In PostgreSQL, for instance, [some tested workloads resulted in 1.5-14.5MB of memory used per connection](https://aws.amazon.com/blogs/database/resources-consumed-by-idle-postgresql-connections/#:~:text=The%20amount%20of%20memory%20consumed,1.5%E2%80%9314.5%20MB%20per%20connection.). +Beyond the time required to complete all of the above, each connection also requires resources to establish and maintain. In PostgreSQL, for instance, [some tested workloads resulted in 1.5-14.5MB of memory used per connection](https://aws.amazon.com/blogs/database/resources-consumed-by-idle-postgresql-connections/#:~:text=The%20amount%20of%20memory%20consumed,1.5%E2%80%9314.5%20MB%20per%20connection.). -CPU usage also rises with the number of connections as the server has to manage the state of each new connection. As more memory and CPU are used for managing connections, they can begin to affect other things like [the rate that transactions can be executed](https://aws.amazon.com/blogs/database/performance-impact-of-idle-postgresql-connections/#:~:text=Transaction%20rate%20impact). The overhead of managing a large number of connections starts to interfere with the database system's ability to optimally cache results and decreases its ability to perform useful work. +CPU usage also rises with the number of connections as the server has to manage the state of each new connection. As more memory and CPU are used for managing connections, they can begin to affect other things like [the rate that transactions can be executed](https://aws.amazon.com/blogs/database/performance-impact-of-idle-postgresql-connections/#:~:text=Transaction%20rate%20impact). 
The overhead of managing a large number of connections starts to interfere with the database system's ability to optimally cache results and decreases its ability to perform useful work. ## How does the number of connections affect client applications? -While above paragraph describes the cost of connections that the database server must pay, there are also direct impacts on clients. Database servers can only be configured to accept a certain number of connections. When that limit is reached, additional connection requests are rejected. +While the above paragraph describes the cost of connections that the database server must pay, there are also direct impacts on clients. Database servers can only be configured to accept a certain number of connections. When that limit is reached, additional connection requests are rejected. -This means that, by default, your client code needs to implement logic to repeat requests with an exponential backoff algorithm to handle these failures. More importantly, however, is that your client may be forced to use that functionality frequently, leading to stalled queries, delays, and problems that can quickly bubble up to your users. +This means that, by default, your client code needs to implement logic to repeat requests with an exponential backoff algorithm to handle these failures. More importantly, however, your client may be forced to use that functionality frequently, leading to stalled queries, delays, and problems that can quickly bubble up to your users. Without a mediating middle component, you would be forced to consider: -* increasing the database server's connection limit (affecting the memory and CPU usage as well as transaction rate of the database server), -* scaling up your database to allocate more memory, CPU, or network capacity, or -* scaling out your database to distribute the requests across a greater number of machines +- increasing the database server's connection limit (affecting the memory and CPU usage as well as transaction rate of the database server), +- scaling up your database to allocate more memory, CPU, or network capacity, or +- scaling out your database to distribute the requests across a greater number of machines -These choices are not necessarily negative in their own right, but they can be overkill for the type of congestion we're describing here. Connection pooling is an alternative that offers to help you do more with the resources you currently have. +These choices are not necessarily negative in their own right, but they can be overkill for the type of congestion we're describing here. Connection pooling is an alternative that offers to help you do more with the resources you currently have. ## What is connection pooling? -Connection pooling is a strategy that involves recycling database connections for multiple requests instead of closing them immediately when a query has been resolved. Typically, this is done by introducing a piece of software called a connection pooler between the database server and its client applications that is responsible for managing the connections between the two. +Connection pooling is a strategy that involves recycling database connections for multiple requests instead of closing them immediately when a query has been resolved. Typically, this is done by introducing a piece of software called a connection pooler between the database server and its client applications that is responsible for managing the connections between the two.
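To make the reuse idea concrete on the application side, here is a minimal sketch using the `Pool` class from the `pg` (node-postgres) driver, which keeps a small set of connections open and hands them out per query rather than opening a fresh connection every time. The connection string and the `users` table are hypothetical placeholders:

```javascript
// Minimal sketch of driver-level connection reuse with node-postgres.
// The connection string and the `users` table are hypothetical.
const { Pool } = require('pg')

const pool = new Pool({
  connectionString: 'postgresql://app_user:secret@localhost:5432/app_db',
  max: 10, // cap on the number of connections the pool will open
})

async function countUsers() {
  // pool.query() borrows an idle connection, runs the query, and then
  // returns the connection to the pool instead of closing it
  const result = await pool.query('SELECT count(*) AS total FROM users')
  return result.rows[0].total
}

countUsers()
  .then((total) => console.log(`users: ${total}`))
  .catch((err) => console.error(err))
  .finally(() => pool.end()) // close all pooled connections on shutdown
```

A dedicated pooler such as `pgbouncer` applies the same principle one step further out, sharing a pool of database connections across many application instances rather than within a single process.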
-As discussed earlier, when forming a connection, a fairly long series of operations must execute before a query is actually run by the database server. The connection pooler attempts to amortize the cost of these operations by keeping the connection open after its initial query and reusing it to run additional queries. +As discussed earlier, when forming a connection, a fairly long series of operations must execute before a query is actually run by the database server. The connection pooler attempts to amortize the cost of these operations by keeping the connection open after its initial query and reusing it to run additional queries. -In this system, clients connect to the connection pooler instead of directly to the database. The client treats the pooler as if it were the database itself and the pooler interprets queries to hand an appropriate connection to the client. One thing to notice is that there is still overhead involved in opening and closing connections between the clients and the pooler. These connections, however, typically have lower overhead because the majority of the heavy processes occur when establishing connections to the database itself. +In this system, clients connect to the connection pooler instead of directly to the database. The client treats the pooler as if it were the database itself and the pooler interprets queries to hand an appropriate connection to the client. One thing to notice is that there is still overhead involved in opening and closing connections between the clients and the pooler. These connections, however, typically have lower overhead because the majority of the heavy processes occur when establishing connections to the database itself. ## How does connection pooling work? -A connection pooler is responsible for opening, closing and maintaining connections to the database on behalf of clients. It does so by following algorithms similar to that which a caching system might use. +A connection pooler is responsible for opening, closing and maintaining connections to the database on behalf of clients. It does so by following algorithms similar to that which a caching system might use. -When a client connects to the connection pooler and requests a connection, the pooler performs a quick assessment of the request characteristics. It might look at information such as the database user, the specific operations that will be performed, the type of encryption, or the database objects accessed. +When a client connects to the connection pooler and requests a connection, the pooler performs a quick assessment of the request characteristics. It might look at information such as the database user, the specific operations that will be performed, the type of encryption, or the database objects accessed. -Once it has this information, it looks at its pool of available connections to see if there are any existing connections that could be used to run the new request. If it finds a suitable, available connection, it hands it to the client and allows it to run its query through the connection. If no connections appropriate for running the new request exist in the pool, it will open a new connection to the database using the required parameters and hand that to the client instead. +Once it has this information, it looks at its pool of available connections to see if there are any existing connections that could be used to run the new request. 
If it finds a suitable, available connection, it hands it to the client and allows it to run its query through the connection. If no connections appropriate for running the new request exist in the pool, it will open a new connection to the database using the required parameters and hand that to the client instead. -The client executes its query using the connection as usual. When the query is complete, instead of terminating the connection, the pooler places the connection back in the pool so that it can potentially be reused by a subsequent query. The pooler can garbage collect connections within its pool asynchronously using whatever algorithm it chooses (time since establishment, time since last use, etc.). +The client executes its query using the connection as usual. When the query is complete, instead of terminating the connection, the pooler places the connection back in the pool so that it can potentially be reused by a subsequent query. The pooler can garbage collect connections within its pool asynchronously using whatever algorithm it chooses (time since establishment, time since last use, etc.). ## What is the difference between internal and external pooling? -Broadly speaking, connection pooling refers to the algorithm that maintains connections over the course of multiple requests. This can be implemented either internally in a client application or externally, using an external tool or service. +Broadly speaking, connection pooling refers to the algorithm that maintains connections over the course of multiple requests. This can be implemented either internally in a client application or externally, using an external tool or service. -Internal implementations of connection pooling are often a function of a database driver, ORM (object relational mapper), or other database client that might be incorporated into a client application. These solutions provide some of the benefits of connection pooling by maintaining long running connections to the database server and reusing them for multiple queries within the codebase. +Internal implementations of connection pooling are often a function of a database driver, ORM (object relational mapper), or other database client that might be incorporated into a client application. These solutions provide some of the benefits of connection pooling by maintaining long running connections to the database server and reusing them for multiple queries within the codebase. -While internal connection pooling is useful, it does have some practical limitations. Each instance of the application that is being executed must generally maintain its own pool. This impacts how broadly the connections can be shared and reused between queries since each pool only serves a single application instance. +While internal connection pooling is useful, it does have some practical limitations. Each instance of the application that is being executed must generally maintain its own pool. This impacts how broadly the connections can be shared and reused between queries since each pool only serves a single application instance. -The alternative solution is to implement external connection pooling. This approach deploys a separate piece of software that can communicate with and pool connections for multiple client instances. While this deployment scenario does introduce an additional network hop, it generally provides additional scalability and flexibility. 
The connection pooler can be deployed alongside the database server to serve many different clients or deployed alongside the client applications to serve whatever application instances are running on a single server. +The alternative solution is to implement external connection pooling. This approach deploys a separate piece of software that can communicate with and pool connections for multiple client instances. While this deployment scenario does introduce an additional network hop, it generally provides additional scalability and flexibility. The connection pooler can be deployed alongside the database server to serve many different clients or deployed alongside the client applications to serve whatever application instances are running on a single server. ## What are some common external connection poolers? -There are a number of connection poolers available for different database systems. We can look at a few implementations available for PostgreSQL to get a better understanding of how different solutions approach the problem. +There are a number of connection poolers available for different database systems. We can look at a few implementations available for PostgreSQL to get a better understanding of how different solutions approach the problem. ### `pgbouncer` Perhaps the most well-known connection pooler for PostgreSQL is [`pgbouncer`](https://www.pgbouncer.org/). -Created in 2007, `pgbouncer` is focused on providing a lightweight pooling mechanism for managing PostgreSQL connections. It offers a good deal of flexibility both in terms of where it can be deployed and how exactly it performs pooling. +Created in 2007, `pgbouncer` is focused on providing a lightweight pooling mechanism for managing PostgreSQL connections. It offers a good deal of flexibility both in terms of where it can be deployed and how exactly it performs pooling. -For situations where connections from clients are short-lived, the [project recommends deploying `pgbouncer` on the web server](https://www.pgbouncer.org/faq.html#should-pgbouncer-be-installed-on-the-web-server-or-database-server) where the client code will execute. Putting the pooler with the client software allows for connections between the two to use lighter weight mechanisms than TCP, reducing latency. For scenarios where connections from many different clients need to be pooled together, you can deploy alongside the database server instead. +For situations where connections from clients are short-lived, the [project recommends deploying `pgbouncer` on the web server](https://www.pgbouncer.org/faq.html#should-pgbouncer-be-installed-on-the-web-server-or-database-server) where the client code will execute. Putting the pooler with the client software allows for connections between the two to use lighter weight mechanisms than TCP, reducing latency. For scenarios where connections from many different clients need to be pooled together, you can deploy alongside the database server instead. -One of the most important decisions when using `pgbouncer` is to choose which pooling mode you wish to use. The three available options are: +One of the most important decisions when using `pgbouncer` is to choose which pooling mode you wish to use. The three available options are: -* **transaction pooling**: connections are revoked after every transaction by a client. For any subsequent transactions, a connection will allocated again. This allows `pgbouncer` to quickly reclaim connections while the client might be performing other operations between transactions. 
-* **session pooling**: connections are assigned to clients for the duration of the client's connection with the pooler. This means that each client connection is paired with a dedicated connection to the database. The connection is still reused once the client session ends, but the number of clients that can use the pooler at one time is greatly reduced. -* **statement pooling**: connections are assigned to execute individual statements. This results in rapid allocation and deallocation of connections, which allows for many clients to use a limited number of connections but can break transaction semantics and lead to unexpected behavior in some cases. +- **transaction pooling**: connections are revoked after every transaction by a client. For any subsequent transactions, a connection will be allocated again. This allows `pgbouncer` to quickly reclaim connections while the client might be performing other operations between transactions. +- **session pooling**: connections are assigned to clients for the duration of the client's connection with the pooler. This means that each client connection is paired with a dedicated connection to the database. The connection is still reused once the client session ends, but the number of clients that can use the pooler at one time is greatly reduced. +- **statement pooling**: connections are assigned to execute individual statements. This results in rapid allocation and deallocation of connections, which allows for many clients to use a limited number of connections but can break transaction semantics and lead to unexpected behavior in some cases. In most cases, transaction pooling provides the best balance in terms of recycling idle connections, managing a fair number of clients, and maintaining expected behavior regarding the database session and transaction semantics. ### `pgpool` -Another PostgreSQL connection pooler is `pgpool-II`, often just referred to as `pgpool`. While `pgbouncer` is a lightweight tool focused exclusively on connection pooling, `pgpool` offers a larger selection of related functionality. +Another PostgreSQL connection pooler is `pgpool-II`, often just referred to as `pgpool`. While `pgbouncer` is a lightweight tool focused exclusively on connection pooling, `pgpool` offers a larger selection of related functionality. -Besides connection pooling, `pgpool` supports load balancing queries between a number of backend database instances and has a watchdog service to enable high availability operation for automatic failover. Additionally, it provides a management GUI that can be very useful for certain deployment scenarios. +Besides connection pooling, `pgpool` supports load balancing queries between a number of backend database instances and has a watchdog service to enable high availability operation for automatic failover. Additionally, it provides a management GUI that can be very useful for certain deployment scenarios. -Despite these advanced features, `pgpool` is usually considered a bit more limited when it comes to actually managing connection pooling.
While `pgbouncer` allows three pooling modes, `pgpool` can only operate with the equivalent of session mode, meaning that connections are only reassigned when the client disconnects. This decreases the number of clients that `pgpool` can handle in comparison. ## Conclusion -In this guide, we took a look at the idea of connection pooling and how it can be used to help reduce database load and resource consumption. We outlined some of the reasons that connections can be expensive to establish between clients and databases and described how reusing connections can reduce that cost on subsequent queries. Afterwards, we discussed how connection poolers actually work and took a look at some representative examples of poolers from the PostgreSQL ecosystem. +In this guide, we took a look at the idea of connection pooling and how it can be used to help reduce database load and resource consumption. We outlined some of the reasons that connections can be expensive to establish between clients and databases and described how reusing connections can reduce that cost on subsequent queries. Afterwards, we discussed how connection poolers actually work and took a look at some representative examples of poolers from the PostgreSQL ecosystem. -Database connection limits can quickly cause problems as your application scales and the querying load becomes more complex. While it's impossible to completely remove the overhead associated with connection management, connection poolers are an invaluable tool for maintaining performance and increasing the number of clients a database can serve. +Database connection limits can quickly cause problems as your application scales and the querying load becomes more complex. While it's impossible to completely remove the overhead associated with connection management, connection poolers are an invaluable tool for maintaining performance and increasing the number of clients a database can serve. -Connection pooling is one of the core features offered by [Prisma Accelerate](https://www.prisma.io/docs/data-platform/accelerate) on the [Prisma Data Platform](https://console.prisma.io/). If you are using Prisma to work with your database, start a free project to easily manage your connections and browse your data. +Connection pooling is one of the core features offered by [Prisma Accelerate](https://www.prisma.io/docs/accelerate) on the [Prisma Data Platform](https://console.prisma.io/). If you are using Prisma to work with your database, start a free project to easily manage your connections and browse your data. diff --git a/content/10-managing-databases/01-database-troubleshooting.mdx b/content/10-managing-databases/01-database-troubleshooting.mdx index 97a7944f..14072c6b 100644 --- a/content/10-managing-databases/01-database-troubleshooting.mdx +++ b/content/10-managing-databases/01-database-troubleshooting.mdx @@ -31,7 +31,7 @@ If your database appears to be down, your application logs might give you insigh -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), you can [configure logging](https://www.prisma.io/docs/concepts/components/prisma-client/working-with-prismaclient/logging) to control how logs are generated. +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), you can [configure logging](https://www.prisma.io/docs/orm/prisma-client/observability-and-logging/logging) to control how logs are generated. @@ -79,7 +79,7 @@ The log above indicates that your application server can't reach the database. 
I -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), the [error code reference](https://www.prisma.io/docs/reference/api-reference/error-reference#error-codes) can help diagnose what errors mean and how to fix them. +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), the [error code reference](https://www.prisma.io/docs/orm/reference/error-reference#error-codes) can help diagnose what errors mean and how to fix them. diff --git a/content/10-managing-databases/02-how-to-spot-bottlenecks-in-performance.mdx b/content/10-managing-databases/02-how-to-spot-bottlenecks-in-performance.mdx index 68583037..5eab6fb0 100644 --- a/content/10-managing-databases/02-how-to-spot-bottlenecks-in-performance.mdx +++ b/content/10-managing-databases/02-how-to-spot-bottlenecks-in-performance.mdx @@ -139,6 +139,6 @@ There's no silver bullet for optimizing queries. However, diligent efforts to an -If you are using Prisma, you can learn about how to measure and optimize your queries in our [performance and optimization docs](https://www.prisma.io/docs/guides/performance-and-optimization). +If you are using Prisma, you can learn about how to measure and optimize your queries in our [performance and optimization docs](https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance). diff --git a/content/10-managing-databases/03-syncing-development-databases-between-team-members.mdx b/content/10-managing-databases/03-syncing-development-databases-between-team-members.mdx index eb2b1a91..86b10955 100644 --- a/content/10-managing-databases/03-syncing-development-databases-between-team-members.mdx +++ b/content/10-managing-databases/03-syncing-development-databases-between-team-members.mdx @@ -36,7 +36,7 @@ If this approach is taken, it requires extra effort to keep seeding data and log -If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), you can use the [integrated seeding functionality](https://www.prisma.io/docs/guides/migrate/seed-database) to easily populate a database. +If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), you can use the [integrated seeding functionality](https://www.prisma.io/docs/orm/prisma-migrate/workflows/seeding) to easily populate a database. @@ -74,7 +74,7 @@ Specific database migration strategies differ between languages and frameworks. -[Prisma Migrate](https://www.prisma.io/docs/concepts/components/prisma-migrate) makes it easy to describe your database models and produce migrations from them. It offers a very simple way to keep data in-sync between team members. +[Prisma Migrate](https://www.prisma.io/docs/orm/prisma-migrate) makes it easy to describe your database models and produce migrations from them. It offers a very simple way to keep data in-sync between team members. 
diff --git a/content/10-managing-databases/04-database-replication/01-database-replication-introduction.mdx b/content/10-managing-databases/04-database-replication/01-database-replication-introduction.mdx index 7018f379..28a94c09 100644 --- a/content/10-managing-databases/04-database-replication/01-database-replication-introduction.mdx +++ b/content/10-managing-databases/04-database-replication/01-database-replication-introduction.mdx @@ -75,6 +75,6 @@ Database replication is not a one size fits all process, so it is important to k -To perform database migrations with [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client), use the [Prisma Migrate tool](https://www.prisma.io/docs/concepts/components/prisma-migrate). Prisma Migrate analyzes your schema files, generates migration files, and applies them to target databases. +To perform database migrations with [Prisma Client](https://www.prisma.io/docs/orm/prisma-client), use the [Prisma Migrate tool](https://www.prisma.io/docs/orm/prisma-migrate). Prisma Migrate analyzes your schema files, generates migration files, and applies them to target databases. diff --git a/content/10-managing-databases/08-testing-in-production.mdx b/content/10-managing-databases/08-testing-in-production.mdx index a3f78af0..8b9101aa 100644 --- a/content/10-managing-databases/08-testing-in-production.mdx +++ b/content/10-managing-databases/08-testing-in-production.mdx @@ -7,105 +7,105 @@ authors: ['justinellingwood'] ## Introduction -Testing in software development traditionally is relegated to development and staging environments that are separate from the production deployment. The reasoning behind this distancing is both to minimize the performance impact of running the tests alongside production traffic and to reduce the chance of a breaking change impacting the production environment. +Testing in software development traditionally is relegated to development and staging environments that are separate from the production deployment. The reasoning behind this distancing is both to minimize the performance impact of running the tests alongside production traffic and to reduce the chance of a breaking change impacting the production environment. -Despite these concerns, there has been a growing movement towards completing partial testing within the production environment. This strategy, known simply as testing in production or TIP, can help teams have a better understanding of how new code will interact with the actual systems and data it will need to be compatible with. +Despite these concerns, there has been a growing movement towards completing partial testing within the production environment. This strategy, known simply as testing in production or TIP, can help teams have a better understanding of how new code will interact with the actual systems and data it will need to be compatible with. -In this guide, we'll explore the idea of testing software changes in production. We'll cover some of the historical reasons for reluctance towards production testing, changes that have made adoption more attractive, and discuss some of the benefits that it can have on development velocity and error reduction. +In this guide, we'll explore the idea of testing software changes in production. We'll cover some of the historical reasons for reluctance towards production testing, changes that have made adoption more attractive, and discuss some of the benefits that it can have on development velocity and error reduction. 
-The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. +The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. ## What is testing in production? -Testing in production is a philosophy that encourages developers to defer some or all of their software testing until the code is deployed in a production environment. But why would a team want to move towards testing in production? +Testing in production is a philosophy that encourages developers to defer some or all of their software testing until the code is deployed in a production environment. But why would a team want to move towards testing in production? -At first glance, the idea may seem counter-intuitive and even dangerous. Perhaps unsurprisingly, it is not without its risks. However, many of the hazards can be eliminated by introducing a deployment and testing strategy that not only tests your software more thoroughly, but enables you to build a more resilient production environment. +At first glance, the idea may seem counter-intuitive and even dangerous. Perhaps unsurprisingly, it is not without its risks. However, many of the hazards can be eliminated by introducing a deployment and testing strategy that not only tests your software more thoroughly, but enables you to build a more resilient production environment. -Testing in production is valuable because it eliminates a category of problems that occur because of environmental drift between your production and testing environments. Rather than testing in a dedicated environment and then deploying code that passes to your production environment in the hopes that its behavior remains consistent, you instead test where the code will need to run. +Testing in production is valuable because it eliminates a category of problems that occur because of environmental drift between your production and testing environments. Rather than testing in a dedicated environment and then deploying code that passes to your production environment in the hopes that its behavior remains consistent, you instead test where the code will need to run. 
The focus for those who adopt a testing-in-production philosophy can be summarized like this: -* **Deploy code to production as soon as possible:** This allows your team to test code early *within* the environment where it has to run -* **Decouple deploying code from releasing code:** Separating the deployment process from act of releasing changes gives you more flexibility in testing and activating new features -* **Test with real data:** Database mocks or stubs and databases filled with dummy records cannot accurately replicate the data that your code will have to accommodate -* **Minimize the area of impact for changes:** Changes can be tested more safely and effectively if you can limit and adjust the scope during testing and release +- **Deploy code to production as soon as possible:** This allows your team to test code early _within_ the environment where it has to run +- **Decouple deploying code from releasing code:** Separating the deployment process from act of releasing changes gives you more flexibility in testing and activating new features +- **Test with real data:** Database mocks or stubs and databases filled with dummy records cannot accurately replicate the data that your code will have to accommodate +- **Minimize the area of impact for changes:** Changes can be tested more safely and effectively if you can limit and adjust the scope during testing and release ## What risks are associated with testing in production? -Many people are skeptical that the benefits of testing in production can outweigh the costs. Before we continue on, let's talk about some of the risks of this strategy and how it's possible to mitigate them. +Many people are skeptical that the benefits of testing in production can outweigh the costs. Before we continue on, let's talk about some of the risks of this strategy and how it's possible to mitigate them. ### Risk of defects -The primary concern with testing code in a production environment is the possibility of introducing software defects that may impact your users and services. The idea behind traditional software testing pipelines is to thoroughly vet code and check for known weaknesses before it is promoted to production responsibilities. +The primary concern with testing code in a production environment is the possibility of introducing software defects that may impact your users and services. The idea behind traditional software testing pipelines is to thoroughly vet code and check for known weaknesses before it is promoted to production responsibilities. -This perspective has its merits, but the situation is frequently not as clear cut in practice. The tests that organizations run in staging environments are often not a good approximation of what the code will deal with in production. Replicating environments, including infrastructure and workloads, is not an easy task and often falls out of sync with the actual production system. +This perspective has its merits, but the situation is frequently not as clear cut in practice. The tests that organizations run in staging environments are often not a good approximation of what the code will deal with in production. Replicating environments, including infrastructure and workloads, is not an easy task and often falls out of sync with the actual production system. -The upshot of this is that what you test in your staging environment may not actually be applicable to how your code will perform once released. 
Furthermore, a significant amount of effort, time, and infrastructure is required in the attempt regardless of whether its successful or not. +The upshot of this is that what you test in your staging environment may not actually be applicable to how your code will perform once released. Furthermore, a significant amount of effort, time, and infrastructure is required in the attempt regardless of whether it's successful or not. ### Impact on the live system -A closely related worry is the impact that new changes and the testing process itself could have on the production environment. System stability is a priority for most organizations as it can affect the usability and availability of services and harm user trust. +A closely related worry is the impact that new changes and the testing process itself could have on the production environment. System stability is a priority for most organizations as it can affect the usability and availability of services and harm user trust. -While its true that any code deployed to production has the possibility of impacting operations, there are ways to minimize the potential impact. Implementing controls to limit the amount of traffic new code receives, setting up active standby infrastructure, and implementing monitoring based scaling are some of the ways that this process can be made safer. The great thing about these types of mitigations is that these investments directly improve your production system's resilience to system failures of all kinds. +While it's true that any code deployed to production has the possibility of impacting operations, there are ways to minimize the potential impact. Implementing controls to limit the amount of traffic new code receives, setting up active standby infrastructure, and implementing monitoring-based scaling are some of the ways that this process can be made safer. The great thing about these types of mitigations is that these investments directly improve your production system's resilience to system failures of all kinds. ## How to test in production -How does testing in production actually work? In this section, we'll talk about some of the most common techniques and strategies organizations implement in order to test code reliably in production. While it's not necessary to adopt each of these ideas, many of these approaches complement one another and can be integrated as part of a more comprehensive system. +How does testing in production actually work? In this section, we'll talk about some of the most common techniques and strategies organizations implement in order to test code reliably in production. While it's not necessary to adopt each of these ideas, many of these approaches complement one another and can be integrated as part of a more comprehensive system. ### Implement a feature flag system -[Feature flags](/intro/database-glossary#feature-flags) are a programming and release technique that involves making features easy to activate or deactivate externally. The basic idea is to wrap new functionality in conditional logic that checks the value of a configuration variable before running. The "flag" or "toggle" variable is often configured in an external store like Redis, where the organization can easily change the value as needed. +[Feature flags](/intro/database-glossary#feature-flags) are a programming and release technique that involves making features easy to activate or deactivate externally.
The basic idea is to wrap new functionality in conditional logic that checks the value of a configuration variable before running. The "flag" or "toggle" variable is often configured in an external store like Redis, where the organization can easily change the value as needed. -Feature flags are a valuable tool in production testing because they allow you to safely deploy code without affecting the current logic of the production system. The new code path can be deactivated to start and then activated at a later time when ready to test the new code. Many implementations of the feature flag concept include more fine-grained controls than "enabled" or "disabled" with options to enable it for a percentage of traffic, A/B test different logic, or only select paths in specific cases. +Feature flags are a valuable tool in production testing because they allow you to safely deploy code without affecting the current logic of the production system. The new code path can be deactivated to start and then activated at a later time when ready to test the new code. Many implementations of the feature flag concept include more fine-grained controls than "enabled" or "disabled" with options to enable it for a percentage of traffic, A/B test different logic, or only select paths in specific cases. -By using feature flags, you can deploy your code to production in a deactivated state. You can test the new code path with your testing suite while production traffic continues to use the older logic. You can then slowly release the code by slowly increasing the amount of production traffic the new code path receives as a [canary release](/intro/database-glossary#canary-releases) while monitoring the impact. +By using feature flags, you can deploy your code to production in a deactivated state. You can test the new code path with your testing suite while production traffic continues to use the older logic. You can then slowly release the code by slowly increasing the amount of production traffic the new code path receives as a [canary release](/intro/database-glossary#canary-releases) while monitoring the impact. ### Use CI/CD to deploy and test on production infrastructure -One misconception about testing in production is the assumption that testing must be completed in production. While proponents recognize the value of testing in the environment where the code will eventually run, not all testing requires this degree of fidelity and many fast, focused tests can be automated and run as part of the lead up towards deploying the code. +One misconception about testing in production is the assumption that testing must be completed in production. While proponents recognize the value of testing in the environment where the code will eventually run, not all testing requires this degree of fidelity and many fast, focused tests can be automated and run as part of the lead up towards deploying the code. -The simplest and most effective way of implementing this is through a well-tuned CI/CD pipeline. CI/CD, which stands for continuous integration and continuous delivery or deployment, is a system that automatically tests new code as it is added to a repository. Once the test suites pass successfully, the pipeline, continuous delivery allows developers to deploy the changes with a click of the button, while continuous deployment automatically deploys all successfully tested code. +The simplest and most effective way of implementing this is through a well-tuned CI/CD pipeline. 
### Use CI/CD to deploy and test on production infrastructure -One misconception about testing in production is the assumption that testing must be completed in production. While proponents recognize the value of testing in the environment where the code will eventually run, not all testing requires this degree of fidelity and many fast, focused tests can be automated and run as part of the lead up towards deploying the code. +One misconception about testing in production is the assumption that testing must be completed in production. While proponents recognize the value of testing in the environment where the code will eventually run, not all testing requires this degree of fidelity, and many fast, focused tests can be automated and run as part of the lead-up to deploying the code. -The simplest and most effective way of implementing this is through a well-tuned CI/CD pipeline. CI/CD, which stands for continuous integration and continuous delivery or deployment, is a system that automatically tests new code as it is added to a repository. Once the test suites pass successfully, the pipeline, continuous delivery allows developers to deploy the changes with a click of the button, while continuous deployment automatically deploys all successfully tested code. +The simplest and most effective way of implementing this is through a well-tuned CI/CD pipeline. CI/CD, which stands for continuous integration and continuous delivery or deployment, is a system that automatically tests new code as it is added to a repository. Once the test suites pass successfully, continuous delivery allows developers to deploy the changes with the click of a button, while continuous deployment automatically deploys all successfully tested code. -CI/CD has many benefits outside of the context of testing in production. For the systems we're describing, the pipeline acts as an automated part of the deployment process and the goal is to deploy and test in the production environment. While simple tests like unit testing can be done in isolation, more complex stages like integration testing should ideally be done by deploying the code to production infrastructure and testing there. This allows your code to be tested against the actual services it will interact with on the final infrastructure it will run on with the same running context. +CI/CD has many benefits outside of the context of testing in production. For the systems we're describing, the pipeline acts as an automated part of the deployment process and the goal is to deploy and test in the production environment. While simple tests like unit testing can be done in isolation, more complex stages like integration testing should ideally be done by deploying the code to production infrastructure and testing there. This allows your code to be tested against the actual services it will interact with, on the final infrastructure it will run on, with the same running context. -A CI/CD pipeline helps to build confidence in new changes and relieve anxiety over deploying code so quickly to production. Combined with feature flags, development teams can trust that some aspects of the code have been vetted and then can control how the heavier testing is conducted. +A CI/CD pipeline helps to build confidence in new changes and relieve anxiety about deploying code to production so quickly. Combined with feature flags, development teams can trust that some aspects of the code have been vetted and can then control how the heavier testing is conducted. ### Configure your services to allow dark launches -[**Dark launching**](/intro/database-glossary#dark-launching) is a way to deploy software changes and test them using real traffic without user-facing consequences. The idea is to mirror production traffic and send duplicate requests to your newly deployed code so that you can ensure that it both performs correctly and can handle an actual production load. +[**Dark launching**](/intro/database-glossary#dark-launching) is a way to deploy software changes and test them using real traffic without user-facing consequences. The idea is to mirror production traffic and send duplicate requests to your newly deployed code so that you can ensure that it both performs correctly and can handle an actual production load.
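+As a rough illustration, request mirroring can be as simple as a piece of middleware that forwards a copy of each incoming request to the new deployment and discards the result. The sketch below assumes an Express application running on Node 18 or later (for the global `fetch`) and a hypothetical `SHADOW_HOST` environment variable pointing at the dark-launched service; many teams implement the same idea at the load balancer or service mesh layer instead.
+
+```typescript
+// shadow-traffic.ts: a minimal sketch of mirroring live requests to a dark launch.
+// SHADOW_HOST is a hypothetical environment variable; express.json() is assumed
+// to have parsed the request body before this middleware runs.
+import type { Request, Response, NextFunction } from "express";
+
+const SHADOW_HOST = process.env.SHADOW_HOST; // e.g. "http://checkout-v2.internal"
+
+export function mirrorTraffic(req: Request, _res: Response, next: NextFunction) {
+  if (SHADOW_HOST) {
+    // Fire-and-forget: the shadow response is never shown to the user, and
+    // failures in the new code must never affect the live request.
+    fetch(`${SHADOW_HOST}${req.originalUrl}`, {
+      method: req.method,
+      headers: { "content-type": req.get("content-type") ?? "application/json" },
+      body: ["GET", "HEAD"].includes(req.method) ? undefined : JSON.stringify(req.body),
+    }).catch(() => {
+      /* ignore or log shadow errors separately from production errors */
+    });
+  }
+
+  next(); // the production handler still serves the real response
+}
+```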
-Dark launching can be quite complex as it requires you to replay existing traffic on multiple instances of your application without incurring slowdowns that would affect the legitimacy of your tests. An alternative to duplicating requests in real time is to play back previous requests from event logs or other sources. This strategy requires you to capture all of the relevant context from the initial request. +Dark launching can be quite complex as it requires you to replay existing traffic on multiple instances of your application without incurring slowdowns that would affect the legitimacy of your tests. An alternative to duplicating requests in real time is to play back previous requests from event logs or other sources. This strategy requires you to capture all of the relevant context from the initial request. -The benefit of dark launching is that it allows you to see how your code functions as if it were already released to users. The results of running the traffic through your new code are never displayed for users, but they can provide insight into how your code behaves and what types of conditions it will need to account for. Once you've tested a dark launched version of your application, you can be fairly confident with how it will perform in production given that it's already faced those pressures. +The benefit of dark launching is that it allows you to see how your code functions as if it were already released to users. The results of running the traffic through your new code are never displayed for users, but they can provide insight into how your code behaves and what types of conditions it will need to account for. Once you've tested a dark-launched version of your application, you can be fairly confident in how it will perform in production given that it's already faced those pressures. ### Implement robust monitoring and metrics collection -In addition to the core tests performed during deployment and release, it is important to have systems in place that will allow you to continue to monitor its behavior once it is in production. A variety of related techniques can be implemented to help you gain insight into your service health over the long term, allowing you to spot anomalies more quickly if new code causes behavioral shifts. +In addition to the core tests performed during deployment and release, it is important to have systems in place that allow you to continue monitoring the new code's behavior once it is in production. A variety of related techniques can be implemented to help you gain insight into your service health over the long term, allowing you to spot anomalies more quickly if new code causes behavioral shifts. -Monitoring and metrics tracking are basic tools you can use to understand how your services perform over time and in different conditions. New changes can have side effects that might be difficult to spot on a short timeline, but may become obvious over time. Monitoring and metrics can help associate these changes in behavior with specific releases. +Monitoring and metrics tracking are basic tools you can use to understand how your services perform over time and in different conditions. New changes can have side effects that might be difficult to spot on a short timeline, but may become obvious over time. Monitoring and metrics can help associate these changes in behavior with specific releases.
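+As one example of what this can look like in application code, the sketch below uses the `prom-client` library for Node.js to record errors and latency with a `release` label, so a new rollout can be compared directly against the previous one. The metric names and labels are illustrative, and any metrics stack (StatsD, OpenTelemetry, a hosted APM, and so on) can fill the same role.
+
+```typescript
+// metrics.ts: a minimal sketch of release-aware metrics using prom-client.
+import { Counter, Histogram, register } from "prom-client";
+
+export const requestErrors = new Counter({
+  name: "app_request_errors_total",
+  help: "Request errors, labeled by release so new code can be compared to old",
+  labelNames: ["release", "route"],
+});
+
+export const requestDuration = new Histogram({
+  name: "app_request_duration_seconds",
+  help: "Request latency, labeled by release",
+  labelNames: ["release", "route"],
+  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5],
+});
+
+// Inside a request handler:
+//   const done = requestDuration.startTimer({ release: "v2", route: "/checkout" });
+//   ...handle the request...
+//   done();
+//   On failure: requestErrors.inc({ release: "v2", route: "/checkout" });
+
+// Expose the metrics for scraping, for example with Express:
+//   app.get("/metrics", async (_req, res) => {
+//     res.set("Content-Type", register.contentType);
+//     res.send(await register.metrics());
+//   });
+```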
## How does testing in production affect the database? -One of the most complex aspects of testing in production is figuring out an effective way to perform tests that interact with the database. While it is possible to have your new code read from and write to a different data source, this once again introduces the possibility of testing against a poorly replicated environment. +One of the most complex aspects of testing in production is figuring out an effective way to perform tests that interact with the database. While it is possible to have your new code read from and write to a different data source, this once again introduces the possibility of testing against a poorly replicated environment. -The better, but more difficult to achieve, option is to test with your production database. This can be challenging to implement, especially if your database schema and tooling isn't configured for this possibility from the start, but it provides the best opportunity for understanding how your code will act upon release. +The better, but more difficult to achieve, option is to test with your production database. This can be challenging to implement, especially if your database schema and tooling aren't configured for this possibility from the start, but it provides the best opportunity for understanding how your code will act upon release. -Allowing your new code to read data from the production database is fairly straightforward. So long as the testing does not lead to heavy read contention, it should not significantly affect the database's production responsibilities and there is no danger to your production data. +Allowing your new code to read data from the production database is fairly straightforward. So long as the testing does not lead to heavy read contention, it should not significantly affect the database's production responsibilities and there is no danger to your production data. -Testing how your code performs write operations, however, is a bit trickier. The most straightforward method of testing involves creating dedicated testing users within the production database so that your new code can operate on data without touching real user data. This can still be quite a scary proposition as the test operations will be performed alongside real data coming from your active code. +Testing how your code performs write operations, however, is a bit trickier. The most straightforward method of testing involves creating dedicated testing users within the production database so that your new code can operate on data without touching real user data. This can still be quite a scary proposition as the test operations will be performed alongside real data coming from your active code. ## Conclusion -Testing in production can be challenging to implement and may require many changes for existing projects. However, the benefits of adopting a system where you can test the real behavior of your code in the environment that matters are hard to overstate. Testing is meant to identify bugs and build confidence in your software, two goals that are difficult to wholly achieve in a non-representative environment. +Testing in production can be challenging to implement and may require many changes for existing projects. However, the benefits of adopting a system where you can test the real behavior of your code in the environment that matters are hard to overstate. Testing is meant to identify bugs and build confidence in your software, two goals that are difficult to wholly achieve in a non-representative environment. -By getting to know the challenges involved with this approach, you can evaluate how well it might fit with your organization's work style. Testing in production is an exercise in trade-offs and your success may largely depend on how much effort you are willing and able to devote to the process.
While it isn't necessarily an easy adjustment, its advantages can serve you well over the longer term. -The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. +The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. diff --git a/content/10-managing-databases/09-backup-considerations.mdx b/content/10-managing-databases/09-backup-considerations.mdx index 72761b30..6e4ee31c 100644 --- a/content/10-managing-databases/09-backup-considerations.mdx +++ b/content/10-managing-databases/09-backup-considerations.mdx @@ -7,109 +7,109 @@ authors: ['justinellingwood'] ## Introduction -Backing up databases is one of the most important routines involved in managing data. Data is often one of the most important assets an organization manages, so being able to recover from accidental deletion, corruption, hardware failures, and other disasters is a high priority. +Backing up databases is one of the most important routines involved in managing data. Data is often one of the most important assets an organization manages, so being able to recover from accidental deletion, corruption, hardware failures, and other disasters is a high priority. -While it is not difficult to recognize the value of reliable backups, it is not always straightforward to figure out the details. Deciding on the backup mechanism, medium, schedule, level of fidelity, and security are all considerations you need to account for and the right mix will often differ from project to project. +While it is not difficult to recognize the value of reliable backups, it is not always straightforward to figure out the details. Deciding on the backup mechanism, medium, schedule, level of fidelity, and security are all considerations you need to account for and the right mix will often differ from project to project. -In this guide, we'll go over the key decisions you'll have to make when deciding on a backup strategy for your databases. We'll cover different backup methods, where to store data at various points in its life cycle, and discuss how security intersects with backup design in various ways. +In this guide, we'll go over the key decisions you'll have to make when deciding on a backup strategy for your databases. We'll cover different backup methods, where to store data at various points in its life cycle, and discuss how security intersects with backup design in various ways. -The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. +The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. 
If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. ## Why are database backups important? -Before continuing, it may be helpful to underline what value proper backups can deliver for your team. Some of the key benefits include: +Before continuing, it may be helpful to underline what value proper backups can deliver for your team. Some of the key benefits include: -* Rebuilding after hardware failures: Hardware failure can affect any system and is often unpredictable. Having comprehensive backups allows you to restore your data after replacing failed components. -* Restoring files following corruption: Data corruption is another occurrence that can result from software errors, hardware problems, or environmental factors. Multiple layers of backups may allow you to replace corrupt data with good versions within your backup sets. -* Recovery from accidental deletion: User error can also remove valuable data from your databases. With backups, you can recover that data to avoid permanent loss. -* Auditing and compliance: Many industries have certain standards of backups and auditing trails for compliance reasons. You may be compelled to implement a robust backup routine as part of your agreement to handle user data in various capacities. -* Reassurance when making changes: Having reliable backups make changing your software, environment, or operations less dangerous. You can make changes with confidence knowing that you are not risking your organization's data if something goes wrong. +- Rebuilding after hardware failures: Hardware failure can affect any system and is often unpredictable. Having comprehensive backups allows you to restore your data after replacing failed components. +- Restoring files following corruption: Data corruption is another occurrence that can result from software errors, hardware problems, or environmental factors. Multiple layers of backups may allow you to replace corrupt data with good versions within your backup sets. +- Recovery from accidental deletion: User error can also remove valuable data from your databases. With backups, you can recover that data to avoid permanent loss. +- Auditing and compliance: Many industries have certain standards for backups and auditing trails for compliance reasons. You may be compelled to implement a robust backup routine as part of your agreement to handle user data in various capacities. +- Reassurance when making changes: Having reliable backups makes changing your software, environment, or operations less dangerous. You can make changes with confidence knowing that you are not risking your organization's data if something goes wrong. ## Types of backups -There are many different *types* of backups that you may want to consider. In this section, we'll cover some of the different backups you can perform and how they may fit together into a more comprehensive system. +There are many different _types_ of backups that you may want to consider. In this section, we'll cover some of the different backups you can perform and how they may fit together into a more comprehensive system. ### Full backups -**Full backups** are backups that duplicate the entire dataset from the original location. They have the largest scope of any backups because, by definition, they read and write all data from the source to the target.
+**Full backups** are backups that duplicate the entire dataset from the original location. They have the largest scope of any backups because, by definition, they read and write all data from the source to the target. -Full backups are important because they give you a complete copy of your dataset that can be used to partially or fully restore missing data. Every strategy should contain full backups as the core component of their routine. +Full backups are important because they give you a complete copy of your dataset that can be used to partially or fully restore missing data. Every strategy should contain full backups as the core component of their routine. -While full backups are necessary and provide the foundation of most backup strategies, they have some major drawbacks as well. Because they must read and write the entire dataset, they can take a long time to complete and may tax the system for the entirety of that time. Additionally, maintaining many complete copies of your databases over time can consume a great deal of storage space. +While full backups are necessary and provide the foundation of most backup strategies, they have some major drawbacks as well. Because they must read and write the entire dataset, they can take a long time to complete and may tax the system for the entirety of that time. Additionally, maintaining many complete copies of your databases over time can consume a great deal of storage space. ### Differential backups -Differential backups copy all data that has changed since the most recent full backup. This allows you to perform expensive, full backups once per backup cycle and then record further changes in smaller backups that use the full backup as a starting point. +Differential backups copy all data that has changed since the most recent full backup. This allows you to perform expensive, full backups once per backup cycle and then record further changes in smaller backups that use the full backup as a starting point. -Differential backups solve some of the core problems that come with taking frequent full backups. They are smaller than a full backup and thus take up less storage space and are faster to perform. +Differential backups solve some of the core problems that come with taking frequent full backups. They are smaller than a full backup and thus take up less storage space and are faster to perform. -While differential backups are an improvement over full backups, they still have some shortcomings. The more time that's passed since the most recent full backup, the larger the differential backups get. Additionally, having multiple differential backups lets you construct different points in time, but at the cost of essentially backing up the same changes more than once. +While differential backups are an improvement over full backups, they still have some shortcomings. The more time that's passed since the most recent full backup, the larger the differential backups get. Additionally, having multiple differential backups lets you construct different points in time, but at the cost of essentially backing up the same changes more than once. ### Incremental backups -Incremental backups are a modification of the strategy used by differential backups. While differential backups always record the differences since the last full backup, incremental backups record the differences since the last full backup *or* incremental backup. 
This means that instead of always layering on a full backup, incremental backups can restore data by starting with a full backup and then restoring multiple incremental backups. +Incremental backups are a modification of the strategy used by differential backups. While differential backups always record the differences since the last full backup, incremental backups record the differences since the last full backup _or_ incremental backup. This means that instead of always layering on a full backup, incremental backups can restore data by starting with a full backup and then restoring multiple incremental backups. -This system allows you to back up frequently while only recording each change in the system once. Each backup will only contain changes that have occurred since the last time any backup was run. This helps keep the size of each incremental backup manageable and allows you to construct different points in time by combining the most recent full backup with various numbers of incremental backups. +This system allows you to back up frequently while only recording each change in the system once. Each backup will only contain changes that have occurred since the last time any backup was run. This helps keep the size of each incremental backup manageable and allows you to construct different points in time by combining the most recent full backup with various numbers of incremental backups. -One downside of incremental backups is that restoring data can be a bit more complicated. The longer it's been since a full backup, the more incremental backups you'll have to apply to get to recent changes. This can take longer than restoring a single full backup and (at most) a single differential backup. +One downside of incremental backups is that restoring data can be a bit more complicated. The longer it's been since a full backup, the more incremental backups you'll have to apply to get to recent changes. This can take longer than restoring a single full backup and (at most) a single differential backup. ### Write ahead or transaction log backups -Databases frequently implement safety mechanisms to help recover from system crashes and unsafe shutdowns. Depending on the system, these may be called write-ahead logs (WAL) or transaction logs. While these are primarily used for crash recovery purposes, they can be used as a component of a backup strategy to allow for more flexible archiving. +Databases frequently implement safety mechanisms to help recover from system crashes and unsafe shutdowns. Depending on the system, these may be called write-ahead logs (WAL) or transaction logs. While these are primarily used for crash recovery purposes, they can be used as a component of a backup strategy to allow for more flexible archiving. -The basic idea behind WAL-based backups is to take a regular backup of the database's file system and then use the WAL to restore a consistent state to the database and replay any changes that occurred following the backup. This sounds similar to incremental backups, but there are some key differences. +The basic idea behind WAL-based backups is to take a regular backup of the database's file system and then use the WAL to restore a consistent state to the database and replay any changes that occurred following the backup. This sounds similar to incremental backups, but there are some key differences. -The first important difference is that the two components use different mediums. 
The full backup in this instance is a backup of database files without regard to having the records locked in a coherent state. The WAL then is responsible for fixing the state of the data and catching it up to the point where you want to restore. +The first important difference is that the two components use different mediums. The full backup in this instance is a backup of database files without regard to having the records locked in a coherent state. The WAL then is responsible for fixing the state of the data and catching it up to the point where you want to restore. -This system is also more flexible than traditional incremental backups because a single WAL can be replayed to various points in time. This gives you choices in how much data you want to restore. One disadvantage to this style of backup is that it affects the database system as a whole. You cannot only restore parts of the database while leaving the other parts intact. Because of this, it's not suitable for restoring individual tables or other database objects. +This system is also more flexible than traditional incremental backups because a single WAL can be replayed to various points in time. This gives you choices in how much data you want to restore. One disadvantage to this style of backup is that it affects the database system as a whole. You cannot only restore parts of the database while leaving the other parts intact. Because of this, it's not suitable for restoring individual tables or other database objects. ## Online vs offline backups One thing to keep in mind when deciding on a backup strategy is that certain backup mechanisms cannot be performed on a live system. -Backup methods that require the system to be offline typically have that restriction to ensure that the tool can capture a consistent view of the database at a certain point in time. If the system is being updated and records are changing as the backup is executing, the data from the beginning might not be valid by the time the process completes. +Backup methods that require the system to be offline typically have that restriction to ensure that the tool can capture a consistent view of the database at a certain point in time. If the system is being updated and records are changing as the backup is executing, the data from the beginning might not be valid by the time the process completes. -Offline backups do have a few advantages, however. Taking the database down means that it won't have to share resources with active users, which can make it faster to complete. The backup process itself can also be much less sophisticated since there won't be any processes actively changing the data. +Offline backups do have a few advantages, however. Taking the database down means that it won't have to share resources with active users, which can make it faster to complete. The backup process itself can also be much less sophisticated since there won't be any processes actively changing the data. -With that being said, taking the primary database down for every backup is not acceptable in many scenarios. Fortunately, there are backup methods that are designed to be used on live systems. Generally, this involves using a utility to query the database system directly so that the database structure and data can be copied to the filesystem. Aside from the permissions required, this is not much different from a regular client asking for table structure information and the data contained within. 
+With that being said, taking the primary database down for every backup is not acceptable in many scenarios. Fortunately, there are backup methods that are designed to be used on live systems. Generally, this involves using a utility to query the database system directly so that the database structure and data can be copied to the filesystem. Aside from the permissions required, this is not much different from a regular client asking for table structure information and the data contained within. -One of the main benefits of these "logical" backup tools (as opposed to the physical tools that work with raw files) is that they can rely on the database's own ability to present a unified consistent snapshot of the data at a specific point-in-time. The backup process in this context operates as a database user, so the cost of this capability is that it will be in contention for the limited database resources with other clients. +One of the main benefits of these "logical" backup tools (as opposed to the physical tools that work with raw files) is that they can rely on the database's own ability to present a unified consistent snapshot of the data at a specific point-in-time. The backup process in this context operates as a database user, so the cost of this capability is that it will be in contention for the limited database resources with other clients. ## Is replication a backup? One point of confusion for some users is why backups are required for databases at all if replication is configured. -[Database replication](/intro/database-glossary#replication) is a method of streaming a log of changes from one server to another server to mirror changes on a different system. Like backups, this also creates a duplicate of the system's data. However, replication should not be considered a safe backup strategy for a few important reasons. +[Database replication](/intro/database-glossary#replication) is a method of streaming a log of changes from one server to another server to mirror changes on a different system. Like backups, this also creates a duplicate of the system's data. However, replication should not be considered a safe backup strategy for a few important reasons. -Replication won't safeguard your data as well as a backup would in a large number of failure scenarios. The same mechanism that ensures that all changes are copied to a secondary server will also duplicate any problems with your primary database. For instance, if a record is unintentionally deleted on the primary database, that change will also be executed on any downstream replicas. This same process means that corrupt data will also be disseminated. +Replication won't safeguard your data as well as a backup would in a large number of failure scenarios. The same mechanism that ensures that all changes are copied to a secondary server will also duplicate any problems with your primary database. For instance, if a record is unintentionally deleted on the primary database, that change will also be executed on any downstream replicas. This same process means that corrupt data will also be disseminated. -Another reason replication is not a backup is that it doesn't have any obvious ability to restore data. While you can replicate changes back and forth between servers, replication isn't concerned with keeping previous versions of data and any historic view of your data will likely be lost when the logs are rotated. 
This "deficiency" is a reminder that replication is primarily designed to help organizations increase availability and performance rather than as a data preservation tool. +Another reason replication is not a backup is that it doesn't have any obvious ability to restore data. While you can replicate changes back and forth between servers, replication isn't concerned with keeping previous versions of data and any historic view of your data will likely be lost when the logs are rotated. This "deficiency" is a reminder that replication is primarily designed to help organizations increase availability and performance rather than as a data preservation tool. -With that being said, replication can be an important component of a backup and disaster recovery plan. For instance, replication can be useful in the event of a hardware failure on the primary database. In this case, administrators can quickly promote a replication follower to the leader role and continue to serve client requests to avoid a lengthy restoration procedure. Some organizations also set up a "delayed" replica which only applies changes after a period of time has elapsed, allowing them to quickly "roll back" the database state by switching to the replica if necessary. +With that being said, replication can be an important component of a backup and disaster recovery plan. For instance, replication can be useful in the event of a hardware failure on the primary database. In this case, administrators can quickly promote a replication follower to the leader role and continue to serve client requests to avoid a lengthy restoration procedure. Some organizations also set up a "delayed" replica which only applies changes after a period of time has elapsed, allowing them to quickly "roll back" the database state by switching to the replica if necessary. -Another case where replication is often involved in backup processes is as the target of backup operations. Many times, it's easier to back up a replica than the primary server. For instance, to get a consistent file-level backup, you could configure a secondary database as a replica of your production database. Once the database is synchronized, you can turn off replication temporarily and perform a backup of the replica without affecting your production traffic. When the backup is complete, you can turn replication back on to resynchronize the server with the changes that occurred during the backup window. +Another case where replication is often involved in backup processes is as the target of backup operations. Many times, it's easier to back up a replica than the primary server. For instance, to get a consistent file-level backup, you could configure a secondary database as a replica of your production database. Once the database is synchronized, you can turn off replication temporarily and perform a backup of the replica without affecting your production traffic. When the backup is complete, you can turn replication back on to resynchronize the server with the changes that occurred during the backup window. ## How do backups affect security? Backup strategy intersects with security considerations in a few different ways. -Backups can help protect against certain security incidents by providing secondary sources of data in scenarios like ransomware attacks. For instance, if an intruder is able to access and encrypt the contents of your primary database, having access to historical snapshots may give you more options for addressing the situation. 
For this to be an option, your backup destination must be in a separate security context than your source data to prevent an attacker from impacting your backups as well. +Backups can help protect against certain security incidents by providing secondary sources of data in scenarios like ransomware attacks. For instance, if an intruder is able to access and encrypt the contents of your primary database, having access to historical snapshots may give you more options for addressing the situation. For this to be an option, your backup destination must be in a separate security context from your source data to prevent an attacker from impacting your backups as well. -It's important to understand that your security policies need to be reflected in your backup strategy or else you could be inadvertently exposing sensitive information when you back up your data. For example, f your production systems implement security surrounding personally identifiable information (PII), your backups should be structured in a way to maintain that level of security. This might mean using separate encryption for different types of data or using separate backup locations for different types of data. Your specific situation will dictate what types of protections you need to incorporate within your backup strategy. +It's important to understand that your security policies need to be reflected in your backup strategy or else you could be inadvertently exposing sensitive information when you back up your data. For example, if your production systems implement security surrounding personally identifiable information (PII), your backups should be structured in a way that maintains that level of security. This might mean using separate encryption for different types of data or using separate backup locations for different types of data. Your specific situation will dictate what types of protections you need to incorporate within your backup strategy. -In some cases of sensitive data that won't be valuable long after collection, it is worthwhile to consider *not* backing up that data. Certain types of data collected or generated by systems can be useful in the moment but may represent an increased risk if stored long term. If the value you gain from the data is closely tied to its recency, you may be better off excluding it from your backup sets. +In some cases of sensitive data that won't be valuable long after collection, it is worthwhile to consider _not_ backing up that data. Certain types of data collected or generated by systems can be useful in the moment but may represent an increased risk if stored long term. If the value you gain from the data is closely tied to its recency, you may be better off excluding it from your backup sets. ## Where to store backups -One decision that you'll have to make as you design your backup strategy is where you wish to store the actual backup data. Many different factors may influence what type of backup destination is best for you. +One decision that you'll have to make as you design your backup strategy is where you wish to store the actual backup data. Many different factors may influence what type of backup destination is best for you. -In many cases, choosing a backup destination is a balancing act between ease of access, cost, security, and convenience. Having on-site backups allows for quick backups, but managing physical disks may be more than you're willing to manage. Furthermore, on-site backups don't protect against site-specific disasters like fire, flood, or theft.
On the other hand, cloud-based backups can be convenient, but may tie you into a specific provider, cost more, and potentially take longer to recover. +In many cases, choosing a backup destination is a balancing act between ease of access, cost, security, and convenience. Having on-site backups allows for quick backups, but managing physical disks may be more than you want to take on. Furthermore, on-site backups don't protect against site-specific disasters like fire, flood, or theft. On the other hand, cloud-based backups can be convenient, but may tie you into a specific provider, cost more, and potentially take longer to recover. -Most of the time, it is a good idea to use multiple storage locations and mediums to help balance between the risks and benefits and to gain additional protection against data loss. As an example, you may use an object storage provider like Amazon S3 as the target for your primary backup rotation. Every month or so, you might move some of those backups for longer storage to an archival storage like Amazon S3 Glacier. You might also wish to back up your data to a different provider than you use for your production infrastructure. +Most of the time, it is a good idea to use multiple storage locations and mediums to help balance the risks and benefits and to gain additional protection against data loss. As an example, you may use an object storage provider like Amazon S3 as the target for your primary backup rotation. Every month or so, you might move some of those backups for longer storage to an archival storage service like Amazon S3 Glacier. You might also wish to back up your data to a different provider than you use for your production infrastructure. ## Backup tips @@ -117,38 +117,38 @@ Now that we've covered many of the components of a robust backup strategy, we ca ### Establish a backup rotation strategy -One of the first things you'll want to figure out is how frequently you want to perform backups and what combination backup types is most helpful. To do this, you need to establish a backup rotation so that you can back up as often as you need to without using unnecessary amounts of storage. +One of the first things you'll want to figure out is how frequently you want to perform backups and what combination of backup types is most helpful. To do this, you need to establish a backup rotation so that you can back up as often as you need to without using unnecessary amounts of storage. -A backup rotation is basically a schedule that determines what type of backups to take at what interval. Generally, organizations implement a rotation so that they can have many recent backups while still keeping a useful amount of historical backups as an archive. Schedules are often created based on the performance impact of your backup mechanisms, what types of backups you need to take, your backup storage capacity, and cost. +A backup rotation is basically a schedule that determines what type of backups to take at what interval. Generally, organizations implement a rotation so that they can have many recent backups while still keeping a useful amount of historical backups as an archive. Schedules are often created based on the performance impact of your backup mechanisms, what types of backups you need to take, your backup storage capacity, and cost.
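+One way to keep a rotation explicit and reviewable is to describe it as data that your backup scripts or scheduler can read. The sketch below encodes the conventional weekly rotation described next; the type names and retention values are illustrative rather than any standard format.
+
+```typescript
+// backup-rotation.ts: a minimal sketch of a backup rotation encoded as data.
+type BackupType = "full" | "incremental";
+
+interface RotationPolicy {
+  schedule: Record<string, BackupType>; // keyed by day of the week
+  keepFullBackups: number;              // how many weekly full backups to retain
+  archiveWal: boolean;                  // continuous WAL archiving for point-in-time recovery
+}
+
+export const weeklyRotation: RotationPolicy = {
+  schedule: {
+    sunday: "full",
+    monday: "incremental",
+    tuesday: "incremental",
+    wednesday: "incremental",
+    thursday: "incremental",
+    friday: "incremental",
+    saturday: "incremental",
+  },
+  keepFullBackups: 2,
+  archiveWal: true,
+};
+
+// A cron job or backup service can read the policy to decide what to run today
+// and which old backup sets can be deleted or moved to long-term storage.
+export function backupTypeFor(day: string): BackupType {
+  return weeklyRotation.schedule[day.toLowerCase()] ?? "incremental";
+}
+```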
-As an example of a fairly conventional backup strategy, you may start with taking one full backup of your database once a week. On the other days of the week, you may schedule an incremental backup so that you can restore to any specific day. You may wish to keep two full backups plus all of the intervening incremental backups at any one time. You may continuously archive daily WAL data as well so that you can restore to any specific point in time during that day. +As an example of a fairly conventional backup strategy, you may start with taking one full backup of your database once a week. On the other days of the week, you may schedule an incremental backup so that you can restore to any specific day. You may wish to keep two full backups plus all of the intervening incremental backups at any one time. You may continuously archive daily WAL data as well so that you can restore to any specific point in time during that day. -When a new backup occurs, the oldest backup may be deleted or occasionally transferred to longer-term storage. This gives you access to historical data if you need it, but the archive doesn't remain in your normal backup rotation. A system like this can help give you options for recovery while minimizing the increase in your total storage over time. +When a new backup occurs, the oldest backup may be deleted or occasionally transferred to longer-term storage. This gives you access to historical data if you need it, but the archive doesn't remain in your normal backup rotation. A system like this can help give you options for recovery while minimizing the increase in your total storage over time. ### Schedule and automate the backup process -Once you have decided on a backup rotation, its important to schedule and automate as much of the process as possible. Ensuring that your backups occur without supervision is an important part of safekeeping your data. +Once you have decided on a backup rotation, it's important to schedule and automate as much of the process as possible. Ensuring that your backups occur without supervision is an important part of keeping your data safe. -Most backup mechanisms include scheduling components, so it shouldn't require special effort in most cases. It is important, however, to ensure that you have mechanisms in place to alert you when scheduled backups do not occur. This might be an email alert when a backup fails or a ping to a chat channel that your organization monitors. +Most backup mechanisms include scheduling components, so scheduling shouldn't require special effort in most cases. It is important, however, to ensure that you have mechanisms in place to alert you when scheduled backups do not occur. This might be an email alert when a backup fails or a ping to a chat channel that your organization monitors. -Automating the backup process may also require you to think about how best to implement your security requirements and access requirements. Does a pull-based backup system, where the backup target pulls data from your production systems, make sense? What type of access does the backup process need to your systems?
What permissions can you remove to limit the reach and impact of the backup account? These are the types of questions you'll need to ask yourself while implementing your backup system and especially when automating the process. ### Test your backups often -One of the most important, and frequently ignored, activities of a robust backup system is testing. You need to regularly test that your backup files can be used to successfully restore data. If you cannot guarantee that your backups are valid, the entire process is of limited or no value. +One of the most important, and frequently ignored, activities of a robust backup system is testing. You need to regularly test that your backup files can be used to successfully restore data. If you cannot guarantee that your backups are valid, the entire process is of limited or no value. -Testing backups involves applying the backed up data to a clean system or to a system that has partial or differing data. It is important to know that you can restore in these scenarios and the exact process you need to execute to recover. This not only validates the integrity of your backup archives, it also ensures that your organization knows what steps must be executed to restore your system during high stress scenarios. It also gives you valuable data on how long different types of data restoration may take. +Testing backups involves applying the backed up data to a clean system or to a system that has partial or differing data. It is important to know that you can restore in these scenarios and the exact process you need to execute to recover. This not only validates the integrity of your backup archives, it also ensures that your organization knows what steps must be executed to restore your system during high stress scenarios. It also gives you valuable data on how long different types of data restoration may take. -Backups may be tested manually from time to time, but ideally, the recovery process should be part of the automation you implement for the rest of your backups. Backups can be restored to testing environments and test suites can be run to make sure that the data has the values and structure you expect it to have. If you do automate backup testing, don't forget to set up alerts in the event that the restoration fails. +Backups may be tested manually from time to time, but ideally, the recovery process should be part of the automation you implement for the rest of your backups. Backups can be restored to testing environments and test suites can be run to make sure that the data has the values and structure you expect it to have. If you do automate backup testing, don't forget to set up alerts in the event that the restoration fails. ## Conclusion -In this guide, we covered why database backups are so important and introduced some of the things you'll need to think about when implementing them. We went over the advantages of backups, different types and scopes of backups, the differences between online and offline backups, why replication isn't a backup strategy, and more. +In this guide, we covered why database backups are so important and introduced some of the things you'll need to think about when implementing them. We went over the advantages of backups, different types and scopes of backups, the differences between online and offline backups, why replication isn't a backup strategy, and more. 
-Being familiar with what choices you have as an organization and how different decisions can affect your performance, security, and availability is essential. While every project's backup needs are different, there is commonality in the need for persistent, trusted long-term storage to help recover from data problems. Taking the time to sort through your requirements and develop a comprehensive plan will allow you to safely and confidently move forward with less danger. +Being familiar with what choices you have as an organization and how different decisions can affect your performance, security, and availability is essential. While every project's backup needs are different, there is commonality in the need for persistent, trusted long-term storage to help recover from data problems. Taking the time to sort through your requirements and develop a comprehensive plan will allow you to safely and confidently move forward with less danger. -The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. +The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. diff --git a/content/10-managing-databases/10-intro-to-full-text-search.mdx b/content/10-managing-databases/10-intro-to-full-text-search.mdx index 348abef3..3e5e7b5f 100644 --- a/content/10-managing-databases/10-intro-to-full-text-search.mdx +++ b/content/10-managing-databases/10-intro-to-full-text-search.mdx @@ -7,86 +7,84 @@ authors: ['justinellingwood'] ## Introduction -One of the most familiar functions of a database system is the ability to retrieve items according to a given query. While conventional database queries are well suited to structured commands by users who are familiar with the system's tools and the structure of the data, most search functionality exposed to users uses a different technique. +One of the most familiar functions of a database system is the ability to retrieve items according to a given query. While conventional database queries are well suited to structured commands by users who are familiar with the system's tools and the structure of the data, most search functionality exposed to users uses a different technique. -In this article, we'll introduce the ideas behind full-text search. We'll discuss how this type of text processing and retrieval differs from conventional database querying and why it is helpful in many cases. We will explore some of the decisions that you can make during the indexing and querying process to affect the results you retrieve, and we'll discuss some of the trade-offs you'll have to make when interacting with these systems to get the results you need. +In this article, we'll introduce the ideas behind full-text search. We'll discuss how this type of text processing and retrieval differs from conventional database querying and why it is helpful in many cases. 
We will explore some of the decisions that you can make during the indexing and querying process to affect the results you retrieve, and we'll discuss some of the trade-offs you'll have to make when interacting with these systems to get the results you need. -The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. +The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. ## What is full-text search? -**[Full-text search](/intro/database-glossary#full-text-search)** is a term for the family of techniques that allow you to search the complete text of documents within a database system. This is in direct opposition to search functionality that relies only on metadata, partial text sources, and other incomplete assessments. +**[Full-text search](/intro/database-glossary#full-text-search)** is a term for the family of techniques that allow you to search the complete text of documents within a database system. This stands in contrast to search functionality that relies only on metadata, partial text sources, and other incomplete assessments. -In contrast to regular database queries, full-text search is language-aware. This means that it relies on some level of understanding of the language used both in the stored documents and the search itself in order to retrieve results that are semantically meaningful. - -Full-text search often also includes the ability to rank and sort results based on their relative similarity to the user's query. This can help the engine display the most appropriate results first or disqualify low-ranking items unless explicitly requested. +In contrast to regular database queries, full-text search is language-aware. This means that it relies on some level of understanding of the language used both in the stored documents and the search itself in order to retrieve results that are semantically meaningful. +Full-text search often also includes the ability to rank and sort results based on their relative similarity to the user's query. This can help the engine display the most appropriate results first or disqualify low-ranking items unless explicitly requested. ## The importance of indexing While many database operations can be improved by indexing, full-text search is especially dependent on an asynchronous indexing process.
-Indexing is the process of parsing existing documents to compile an index. An [index](/intro/database-glossary#index) is a smaller, optimized structure specifically designed for quickly finding requested items. While normal database indexes might analyze a few fields to create an index, a full-text search index is created by analyzing the complete text of documents. +Indexing is the process of parsing existing documents to compile an index. An [index](/intro/database-glossary#index) is a smaller, optimized structure specifically designed for quickly finding requested items. While normal database indexes might analyze a few fields to create an index, a full-text search index is created by analyzing the complete text of documents. -Without an index, searches must complete a processes called [serial scanning](/intro/database-glossary#serial-scanning) where the system analyzes each item at query time to see whether they match the query criteria. For those familiar with Unix-like systems, this is akin to the difference between searching for a filename using the `find` tool, which scans the filesystem during a query, and the faster `locate` command, which relies on an index that is periodically updated. +Without an index, searches must complete a process called [serial scanning](/intro/database-glossary#serial-scanning) where the system analyzes each item at query time to see whether it matches the query criteria. For those familiar with Unix-like systems, this is akin to the difference between searching for a filename using the `find` tool, which scans the filesystem during a query, and the faster `locate` command, which relies on an index that is periodically updated. ### How indexes are created -The indexing process is composed of a number of related stages. First, documents are scanned by a parser to divide the text into individual "tokens". [Tokens](/intro/database-glossary#token) are discrete words that are known to the system and can be categorized based on their part of speech, relation to similar words, etc. +The indexing process is composed of a number of related stages. First, documents are scanned by a parser to divide the text into individual "tokens". [Tokens](/intro/database-glossary#token) are discrete words that are known to the system and can be categorized based on their part of speech, relation to similar words, etc. -Tokens are then processed into ["lexemes"](/intro/database-glossary#lexeme), a language-level unit of meaning. During this process, terms are often normalized to collapse related words into a single entry which allows the querying engine to return relevant results that are slight variations of the literal search. +Tokens are then processed into ["lexemes"](/intro/database-glossary#lexeme), language-level units of meaning. During this process, terms are often normalized to collapse related words into a single entry, which allows the querying engine to return relevant results that are slight variations of the literal search. -The results of the analysis are then sorted and stored in an optimized index so that the querying engine can find relevant results and compare different factors about each document. The index includes information about the lexemes found in each document and may include additional context like positional data, lexeme density, etc. to help with more sophisticated search criteria. +The results of the analysis are then sorted and stored in an optimized index so that the querying engine can find relevant results and compare different factors about each document. The index includes information about the lexemes found in each document and may include additional context like positional data, lexeme density, etc. to help with more sophisticated search criteria.
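+To make these stages more tangible, the toy sketch below tokenizes a set of documents, filters out a few common words, applies a crude suffix-stripping normalization, and records term positions in an inverted index. It is purely illustrative; real engines rely on language-specific parsers, dictionaries, and stemming algorithms that are far more sophisticated.
+
+```typescript
+// indexer.ts: a toy sketch of the indexing stages described above.
+const STOP_WORDS = new Set(["the", "a", "an", "it", "and", "or", "of"]);
+
+// A naive stand-in for real stemming: strip a few common English suffixes.
+function stem(token: string): string {
+  return token.replace(/(ing|ed|es|s)$/, "");
+}
+
+// lexeme -> document id -> positions where the lexeme occurs
+type InvertedIndex = Map<string, Map<number, number[]>>;
+
+export function buildIndex(docs: string[]): InvertedIndex {
+  const index: InvertedIndex = new Map();
+
+  docs.forEach((doc, docId) => {
+    const tokens = doc.toLowerCase().match(/[a-z]+/g) ?? [];
+    tokens.forEach((token, position) => {
+      if (STOP_WORDS.has(token)) return; // drop low-value words
+      const lexeme = stem(token);
+
+      const postings = index.get(lexeme) ?? new Map<number, number[]>();
+      const positions = postings.get(docId) ?? [];
+      positions.push(position);
+      postings.set(docId, positions);
+      index.set(lexeme, postings);
+    });
+  });
+
+  return index;
+}
+
+// buildIndex(["The cook cooked the dishes", "Cooking a dish"]) maps the lexeme
+// "cook" to both documents, so a query for "cooks" can match either of them.
+```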
There are many different ways to index text, and the way that you choose to do so can have a large impact on the performance and characteristics of your search functionality. +While we talked about the general steps required to create a full-text index, we left out some of the key factors that shape the index and resulting querying functionality. There are many different ways to index text, and the way that you choose to do so can have a large impact on the performance and characteristics of your search functionality. -One optimization that almost all indexing processes adopt is the use of stop words. [Stop words](/intro/database-glossary#stop-words) are a list of words deemed irrelevant or too ambiguous to be useful during a search. They are often the most common words in a language that contain the least amount of relevant context. Some examples in English include "the", "a", and "it". During indexing, stop words are removed from the index, which helps keep the index smaller and faster for more relevant searches. +One optimization that almost all indexing processes adopt is the use of stop words. [Stop words](/intro/database-glossary#stop-words) are a list of words deemed irrelevant or too ambiguous to be useful during a search. They are often the most common words in a language that contain the least amount of relevant context. Some examples in English include "the", "a", and "it". During indexing, stop words are removed from the index, which helps keep the index smaller and faster for more relevant searches. -As mentioned earlier, the indexing process might also collapse closely related entries into a single item to keep the index small and provide a wider range of results for similar queries. One of these techniques, called ["stemming"](/intro/database-glossary#stemming), combines words that are variations of the same word "stem". For instance, "cook", "cooking", and "cooked" would be combined into a single entry. Other techniques might consult language-aware tools like thesauruses to map synonyms or identify phrases that might stand in for a word. +As mentioned earlier, the indexing process might also collapse closely related entries into a single item to keep the index small and provide a wider range of results for similar queries. One of these techniques, called ["stemming"](/intro/database-glossary#stemming), combines words that are variations of the same word "stem". For instance, "cook", "cooking", and "cooked" would be combined into a single entry. Other techniques might consult language-aware tools like thesauruses to map synonyms or identify phrases that might stand in for a word. -Many times, the decisions you make during indexing influence the quality of the search results you can get during the querying process. This is sometimes discussed as a balance between prioritizing recall and precision, both of which are technical terms used to describe different aspects of search effectiveness. +Many times, the decisions you make during indexing influence the quality of the search results you can get during the querying process. This is sometimes discussed as a balance between prioritizing recall and precision, both of which are technical terms used to describe different aspects of search effectiveness. -[Recall](/intro/database-glossary#recall) is the ratio of the returned relevant results compared to the total number of relevant results within a data set. A query with high recall retrieves a large percentage of the possible relevant results. 
+[Recall](/intro/database-glossary#recall) is the ratio of the returned relevant results compared to the total number of relevant results within a data set. A query with high recall retrieves a large percentage of the possible relevant results. -Related to this is the [precision](/intro/database-glossary#precision) of the search, which describes how many of the returned results were actually relevant. A query with high precision has a limited number of results that aren't very relevant to the given query. +Related to this is the [precision](/intro/database-glossary#precision) of the search, which describes how many of the returned results were actually relevant. A query with high precision has a limited number of results that aren't very relevant to the given query. -Techniques like stop words can increase the precision of the results by eliminating words that aren't important from the analysis. Stemming, on the other hand, primarily increases recall by catching instances where relevant results would be missing due to small word differences. These two concepts can influence one another, so both must be accounted for as you build your indexes to make sure your results have both the relevancy and volume you desire. +Techniques like stop words can increase the precision of the results by eliminating words that aren't important from the analysis. Stemming, on the other hand, primarily increases recall by catching instances where relevant results would be missing due to small word differences. These two concepts can influence one another, so both must be accounted for as you build your indexes to make sure your results have both the relevancy and volume you desire. ## Optimizing the indexing process Beyond stop words and stemming, there are other ways that database administrators can optimize the indexing process. -Some systems allow document authors to provide a list of relevant keywords along with the document text. These can be used as hints to the indexer as to the appropriate context of words and also can help dictate the intended primary subject of a document. They can also be used to help the indexer distinguish between [homographs](https://en.wikipedia.org/wiki/Homograph), words that are spelled the same but can have different meanings (like the difference between a "mean" person and the arithmetic "mean" of a number set). - -Developers and admins can also dictate the parsers, dictionaries, and token types that are used. These determine how the text is processed, broken down, and categorized during the indexing process. Switching out a parsing algorithm or the indexing pattern can change the structure of the index created, its performance with different types of queries, and how flexible it is in accommodating complex queries. +Some systems allow document authors to provide a list of relevant keywords along with the document text. These can be used as hints to the indexer as to the appropriate context of words and also can help dictate the intended primary subject of a document. They can also be used to help the indexer distinguish between [homographs](https://en.wikipedia.org/wiki/Homograph), words that are spelled the same but can have different meanings (like the difference between a "mean" person and the arithmetic "mean" of a number set). -A further point of influence that can have a large impact on future queries is weighing different factors in the document text. 
Administrators can assign increased ["weight"](/intro/database-glossary#search-weight), or relevancy, to words included in a document's title rather than its footnotes, for instance. Depending on the system, this may be fairly sophisticated and expressive. For instance, you might use a subject matter-specific word list upon analyzing a document title to assign increased weight to terms that are relevant to the subject on a per-document basis. +Developers and admins can also dictate the parsers, dictionaries, and token types that are used. These determine how the text is processed, broken down, and categorized during the indexing process. Switching out a parsing algorithm or the indexing pattern can change the structure of the index created, its performance with different types of queries, and how flexible it is in accommodating complex queries. +A further point of influence that can have a large impact on future queries is weighing different factors in the document text. Administrators can assign increased ["weight"](/intro/database-glossary#search-weight), or relevancy, to words included in a document's title rather than its footnotes, for instance. Depending on the system, this may be fairly sophisticated and expressive. For instance, you might use a subject matter-specific word list upon analyzing a document title to assign increased weight to terms that are relevant to the subject on a per-document basis. ## Influencing the query engine -For most full-text search systems, it is also important to allow expressiveness during the actual querying process. The querying interface can expose this expressiveness in many different ways. +For most full-text search systems, it is also important to allow expressiveness during the actual querying process. The querying interface can expose this expressiveness in many different ways. -One of the easiest ways to increase the level of control a user has during querying structured items is to allow searching by field. This is not as relevant in unstructured text, but can be incredibly useful when coupled with metadata to search using fields like authors, publication dates, titles, genres, and more. For fields with a small number of possible values, these can be directly selectable within the interface rather than searchable to increase usability. +One of the easiest ways to increase the level of control a user has during querying structured items is to allow searching by field. This is not as relevant in unstructured text, but can be incredibly useful when coupled with metadata to search using fields like authors, publication dates, titles, genres, and more. For fields with a small number of possible values, these can be directly selectable within the interface rather than searchable to increase usability. -Compound query operators are another simple way for full-text search tools to allow users to influence the querying engine. This allows users to structure queries using simple boolean logic like "and" to include multiple terms that should be present and "or" to include alternatives. It can also enable more complex functionality by letting users provide lists of terms that should *not* be included, phrases that should be present, or queries that account for proximity between words within text. +Compound query operators are another simple way for full-text search tools to allow users to influence the querying engine. 
This allows users to structure queries using simple boolean logic like "and" to include multiple terms that should be present and "or" to include alternatives. It can also enable more complex functionality by letting users provide lists of terms that should _not_ be included, phrases that should be present, or queries that account for proximity between words within text. -Another important way that search interfaces can influence the querying engine is by enabling "strict" querying modes. While it might be helpful to include close matches during regular operation, it is sometimes helpful to search only for the exact word or phrase given. Allowing users to change the querying mode between fuzzy and exact matching increases the likelihood of surfacing relevant results. +Another important way that search interfaces can influence the querying engine is by enabling "strict" querying modes. While it might be helpful to include close matches during regular operation, it is sometimes helpful to search only for the exact word or phrase given. Allowing users to change the querying mode between fuzzy and exact matching increases the likelihood of surfacing relevant results. ## Conclusion -In this article, we talked about what full-text search is and introduced some of the core concepts behind it. We discussed the difference between full-text search and conventional database queries, explained why indexes are of critical importance in this context, and went over some of the factors you might need to take into account when designing your search indexes. +In this article, we talked about what full-text search is and introduced some of the core concepts behind it. We discussed the difference between full-text search and conventional database queries, explained why indexes are of critical importance in this context, and went over some of the factors you might need to take into account when designing your search indexes. -Full-text search is an incredibly broad topic with many nuances, optimizations, balance considerations, and implementations. While this article isn't intended to be a definitive resource, it should hopefully serve as a strong conceptual foundation to build on as you continue to learn. +Full-text search is an incredibly broad topic with many nuances, optimizations, balance considerations, and implementations. While this article isn't intended to be a definitive resource, it should hopefully serve as a strong conceptual foundation to build on as you continue to learn. -The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/concepts/components/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. +The [Prisma Data Platform](https://www.prisma.io/data-platform) can help simplify access to your database in production environments. If you are using [Prisma Client](https://www.prisma.io/docs/orm/prisma-client) to manage your database connections, the Prisma Data Platform may help you manage your production workloads more easily. 
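To ground the indexing and querying concepts discussed above in one concrete implementation, the following sketch uses PostgreSQL's built-in full-text search functions. It is illustrative only: the `articles` table and its columns are hypothetical, and other database systems expose the same ideas (tokenization, stop words, stemming, weighting, and compound queries) through different interfaces.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE articles (
    id    serial PRIMARY KEY,
    title text NOT NULL,
    body  text NOT NULL
);

-- Tokenization and normalization: stop words are dropped and related
-- word forms are stemmed to a shared lexeme ('cooks', 'cooking', and
-- 'cooked' all collapse to 'cook').
SELECT to_tsvector('english', 'The cooks are cooking a cooked meal');
-- Returns something like: 'cook':2,4,6 'meal':7

-- Build a full-text index so queries can avoid a serial scan.
CREATE INDEX articles_fts_idx
    ON articles
    USING GIN (to_tsvector('english', title || ' ' || body));

-- Weighting and ranking: terms from the title contribute more to the
-- relevancy score than terms from the body. The compound query matches
-- documents that contain "cook" but not "burn".
SELECT id,
       ts_rank(
           setweight(to_tsvector('english', title), 'A') ||
           setweight(to_tsvector('english', body), 'D'),
           query
       ) AS rank
FROM articles,
     to_tsquery('english', 'cook & !burn') AS query
WHERE to_tsvector('english', title || ' ' || body) @@ query
ORDER BY rank DESC;
```

Note that the same `to_tsvector` expression appears in both the index definition and the `WHERE` clause; if the expressions differ, the planner cannot use the expression index and falls back to scanning every row.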
diff --git a/content/11-serverless/02-serverless-comparison.mdx b/content/11-serverless/02-serverless-comparison.mdx index e40f9690..3834ce3e 100644 --- a/content/11-serverless/02-serverless-comparison.mdx +++ b/content/11-serverless/02-serverless-comparison.mdx @@ -309,7 +309,7 @@ Cloudflare Workers are capable of delivering high performance at a lower cost to -You can learn how to deploy a Cloudflare Worker that uses Prisma to save every request to a MongoDB database for inspection later in the [Prisma docs](https://www.prisma.io/docs/guides/deployment/deployment-guides/deploying-to-cloudflare-workers). +You can learn how to deploy a Cloudflare Worker that uses Prisma to save every request to a MongoDB database for inspection later in the [Prisma docs](https://www.prisma.io/docs/orm/prisma-client/deployment/edge/deploy-to-cloudflare-workers). @@ -400,7 +400,7 @@ While different from some of the other offerings because of this positioning, Ne -You can easily deploy Node.JS applications built with Prisma to Netlify. This [deployment guide](https://www.prisma.io/docs/guides/deployment/deployment-guides/deploying-to-netlify) covers the process in greater detail. +You can easily deploy Node.js applications built with Prisma to Netlify. This [deployment guide](https://www.prisma.io/docs/orm/prisma-client/deployment/serverless/deploy-to-netlify) covers the process in greater detail. @@ -582,7 +582,7 @@ Learn more about how [Prisma and Planetscale work together](https://www.prisma.i ### MongoDB Atlas Serverless -MongoDB Atlas is a hosted, multi-cloud version of MongoDB. They are now offering [MongoDB Atlas Serverless](https://www.mongodb.com/cloud/atlas/serverless) as a way of interacting with your MongoDB database according to the serverless model. The serverless version of their hosted database offers a seamless scaling experience to accommodate highly variable and infrequent workloads while only charging for the resources used. +MongoDB Atlas is a hosted, multi-cloud version of MongoDB. They are now offering [MongoDB Atlas Serverless](https://www.mongodb.com/cloud/atlas/serverless) as a way of interacting with your MongoDB database according to the serverless model. The serverless version of their hosted database offers a seamless scaling experience to accommodate highly variable and infrequent workloads while only charging for the resources used. ### CockroachDB Serverless @@ -608,7 +608,7 @@ Deno Deploy hopes to provide you with a unified developing experience by using t ### Prisma Accelerate -[Prisma Accelerate](https://www.prisma.io/docs/data-platform/accelerate) is a tool designed to make connecting to databases simpler in a serverless context. Because the instances used by serverless providers to execute functions are, by nature, ephemeral, it is easy to exhaust the connection pool to any backing databases if too many function instances are active at once. +[Prisma Accelerate](https://www.prisma.io/docs/accelerate) is a tool designed to make connecting to databases simpler in a serverless context. Because the instances used by serverless providers to execute functions are, by nature, ephemeral, it is easy to exhaust the connection pool to any backing databases if too many function instances are active at once. Prisma Accelerate provides a solution for this problem by acting as an intermediary to a database.
Serverless instances can instead connect to Prisma Accelerate which manages the connection pooling automatically to avoid failed or delayed serverless invocations due to resource contention. diff --git a/content/11-serverless/03-serverless-challenges.mdx b/content/11-serverless/03-serverless-challenges.mdx index b7ca8d0f..689a86b1 100644 --- a/content/11-serverless/03-serverless-challenges.mdx +++ b/content/11-serverless/03-serverless-challenges.mdx @@ -8,99 +8,98 @@ authors: ['justinellingwood'] ## Introduction -The [serverless](https://www.prisma.io/dataguide/serverless/what-is-serverless) paradigm represents a notable shift in the way that application and web developers interact with infrastructure, language runtimes, and supplemental services. It offers the freedom to focus on your primary area of concern by abstracting away and taking responsibility for many of the environmental factors that traditionally affect the way code runs in production. +The [serverless](https://www.prisma.io/dataguide/serverless/what-is-serverless) paradigm represents a notable shift in the way that application and web developers interact with infrastructure, language runtimes, and supplemental services. It offers the freedom to focus on your primary area of concern by abstracting away and taking responsibility for many of the environmental factors that traditionally affect the way code runs in production. -While [serverless computing has many benefits](https://www.prisma.io/dataguide/serverless/what-is-serverless#when-is-serverless-a-good-choice), it also has some challenges that must be acknowledged or addressed before you can be successful. In this guide, we'll talk about some of the main pain points of the current generation of solutions and discuss what they mean and how you might work around them. You should come away with a better understanding of what requirements you may have to fulfill and what roadblocks you may run into. +While [serverless computing has many benefits](https://www.prisma.io/dataguide/serverless/what-is-serverless#when-is-serverless-a-good-choice), it also has some challenges that must be acknowledged or addressed before you can be successful. In this guide, we'll talk about some of the main pain points of the current generation of solutions and discuss what they mean and how you might work around them. You should come away with a better understanding of what requirements you may have to fulfill and what roadblocks you may run into. -[Prisma Accelerate](https://www.prisma.io/docs/data-platform/accelerate) provides one way to handle connection issues between your serverless applications and backend databases. It can help manage ephemeral connections from your serverless functions to avoid exhausting your database connection pool. Check it out now! +[Prisma Accelerate](https://www.prisma.io/docs/accelerate) provides one way to handle connection issues between your serverless applications and backend databases. It can help manage ephemeral connections from your serverless functions to avoid exhausting your database connection pool. Check it out now! ## Cold start problems -One of the most commonly discussed challenges when working with serverless is called the [*cold start*](/serverless/serverless-glossary#cold-start) problem. While the goal with serverless is allow functions to immediately be executed on demand, there are some scenarios that may result in predictable delays.
+One of the most commonly discussed challenges when working with serverless is called the [_cold start_](/serverless/serverless-glossary#cold-start) problem. While the goal with serverless is to allow functions to immediately be executed on demand, there are some scenarios that may result in predictable delays. ### What is the cold start problem? -One big selling point for serverless is the ability to scale to zero in periods of no activity. If a function is not actively being executed, the function's resources are spun down, returning capacity to the platform and reducing the cost for the user of reserving those components. This is ideal from a cost perspective as it means users only pay for the time and resources their code actually executes. +One big selling point for serverless is the ability to scale to zero in periods of no activity. If a function is not actively being executed, the function's resources are spun down, returning capacity to the platform and reducing the cost for the user of reserving those components. This is ideal from a cost perspective as it means users only pay for the time and resources their code actually executes. -The downside to this is that when the resources spin down completely, there is a predictable delay the next time it needs to execute. The resources need to be reallocated to run the function, which takes time. You end up with one set of performance characteristics for "hot" functions that have been recently used and another profile for "cold" functions that need to wait for the platform to create the execution environment. +The downside to this is that when the resources spin down completely, there is a predictable delay the next time the function needs to execute. The resources need to be reallocated to run the function, which takes time. You end up with one set of performance characteristics for "hot" functions that have been recently used and another profile for "cold" functions that need to wait for the platform to create the execution environment. ### How do developers try to address cold start? -There are a number of ways developers and platforms have tried to address this problem. Some developers schedule "dummy" requests to keep the resources associated with their functions on standby. Many platforms have added an additional tier to their services to allow developers to automatically keep resources on standby. +There are a number of ways developers and platforms have tried to address this problem. Some developers schedule "dummy" requests to keep the resources associated with their functions on standby. Many platforms have added an additional tier to their services to allow developers to automatically keep resources on standby. -These solutions start to blur the line a bit as to what constitutes a serverless environment. When developers are forced to pay for standby resources when their code is not actively executing, it raises some questions about some of the fundamental claims of the serverless paradigm. +These solutions start to blur the line a bit as to what constitutes a serverless environment. When developers are forced to pay for standby resources when their code is not actively executing, it raises some questions about some of the fundamental claims of the serverless paradigm. -A recent alternative to preallocating resources has been to sidestep the problem by switching to a lighter runtime environment.
Runtimes like V8 have a much different execution strategy than traditional serverless and are able to avoid cold start problems by using different isolation technologies and a more pared down environment. They avoid cold start issues at the expense of compatibility with functions that have dependencies on a more robust environment. +A recent alternative to preallocating resources has been to sidestep the problem by switching to a lighter runtime environment. Runtimes like V8 have a much different execution strategy than traditional serverless and are able to avoid cold start problems by using different isolation technologies and a more pared down environment. They avoid cold start issues at the expense of compatibility with functions that have dependencies on a more robust environment. ## Application design constraints -Another challenge that is fundamental to the serverless model is the application design it imposes. Serverless platforms are only useful for applications that can work within their constraints. Some of these are inherent in cloud computing in general, while other requirements are dictated by the serverless model specifically. +Another challenge that is fundamental to the serverless model is the application design it imposes. Serverless platforms are only useful for applications that can work within their constraints. Some of these are inherent in cloud computing in general, while other requirements are dictated by the serverless model specifically. ### Designing a cloud-friendly architecture -The first requirement that applications must meet to use serverless platforms is to be designed in a cloud-friendly way. In the context of this discussion, at the very least this means that the application must be at least partly deployable to a cloud service that the other components are able to communicate with. And while it's possible to implement [monolithic functions](/intro/database-glossary#monolithic-architecture) in your design, the serverless model best accommodates [microservice architectures](/intro/database-glossary#microservice-architecture). +The first requirement that applications must meet to use serverless platforms is to be designed in a cloud-friendly way. In the context of this discussion, at the very least this means that the application must be at least partly deployable to a cloud service that the other components are able to communicate with. And while it's possible to implement [monolithic functions](/intro/database-glossary#monolithic-architecture) in your design, the serverless model best accommodates [microservice architectures](/intro/database-glossary#microservice-architecture). -The upshot of this is that your application must be designed in part as a series of functions executed by your serverless provider. You must be comfortable with the processing taking place on infrastructure that you do not control. Furthermore, you must be able to decompose your application's functionality into discrete functions that can be executed remotely. +The upshot of this is that your application must be designed in part as a series of functions executed by your serverless provider. You must be comfortable with the processing taking place on infrastructure that you do not control. Furthermore, you must be able to decompose your application's functionality into discrete functions that can be executed remotely. ### Dealing with stateless execution -Serverless functions are, by design, stateless. 
That means that, while some information may possibly be cached if the function is executed with the same resources, you can't rely on any state being shared between invocations of your functions. +Serverless functions are, by design, stateless. That means that, while some information may possibly be cached if the function is executed with the same resources, you can't rely on any state being shared between invocations of your functions. -You must design your functions to have all of the information they need to execute internally. Any external state must be fetched at the beginning of invocation and exported before finishing. Since functions may be executed in parallel, this also limits what type of state may reasonable be acted upon. In general, the less state that your functions have to manage, the faster and cheaper they will be to execute and the less complexity you will have to manage. +You must design your functions to have all of the information they need to execute internally. Any external state must be fetched at the beginning of invocation and exported before finishing. Since functions may be executed in parallel, this also limits what type of state may reasonably be acted upon. In general, the less state that your functions have to manage, the faster and cheaper they will be to execute and the less complexity you will have to manage. -There are other side effects of the function's ephemeral nature as well. If your functions need to reach out to a database system, there is a good chance you may quickly exhaust your database's connection pool. Since each invocation of your functions can be executed in a different context, your database's connection pool can quickly drain as it responds to different invocations or tries to return resources to its pool. Solutions like [Prisma Accelerate](https://www.prisma.io/docs/data-platform/accelerate) help mitigate these issues by managing the connection resources for the serverless instances in front of whatever connection pooling is in place. +There are other side effects of the function's ephemeral nature as well. If your functions need to reach out to a database system, there is a good chance you may quickly exhaust your database's connection pool. Since each invocation of your functions can be executed in a different context, your database's connection pool can quickly drain as it responds to different invocations or tries to return resources to its pool. Solutions like [Prisma Accelerate](https://www.prisma.io/docs/accelerate) help mitigate these issues by managing the connection resources for the serverless instances in front of whatever connection pooling is in place. ## Provider lock-in concerns -One challenge that is difficult to get away from with serverless is provider lock-in. When you architect your application to rely on external functions running on a specific provider's platform, it can be difficult to migrate to a different platform at a later time. +One challenge that is difficult to get away from with serverless is provider lock-in. When you architect your application to rely on external functions running on a specific provider's platform, it can be difficult to migrate to a different platform at a later time. ### What types of lock-in can occur? -For applications built targeting a specific serverless platform, many different factors can interfere with cleanly migrating to another provider.
These may result from the serverless implementation itself or from use of the provider's related services that might be integrated into the application design. -In terms of lock-in caused by the actual serverless implementation, one of the most basic differences between providers can be the languages supported for defining functions. If your application functions are written in a language not supported by other candidate providers, migration will be impossible without reimplementing the logic in a supported language. A more subtle example of serverless incompatibilities are the differences in the way that different providers conceptualize and expose the triggering mechanisms for functions within the platform. You might need to redefine how your trigger is implemented on your new platform if those mechanisms differ significantly. +In terms of lock-in caused by the actual serverless implementation, one of the most basic differences between providers can be the languages supported for defining functions. If your application functions are written in a language not supported by other candidate providers, migration will be impossible without reimplementing the logic in a supported language. A more subtle example of serverless incompatibilities is the difference in the way that different providers conceptualize and expose the triggering mechanisms for functions within the platform. You might need to redefine how your trigger is implemented on your new platform if those mechanisms differ significantly. -Other types of lock-in can occur when serverless applications use other services in their provider's ecosystem to support their application. For example, since serverless functions don't handle state, it's common to use the provider's object storage offering to store any artifacts produced during invocation. While object storage is widely implemented using a standard interface, it demonstrates how the constraints of the serverless architecture can lead to greater adoption and dependence on the ecosystem of other available services. +Other types of lock-in can occur when serverless applications use other services in their provider's ecosystem to support their application. For example, since serverless functions don't handle state, it's common to use the provider's object storage offering to store any artifacts produced during invocation. While object storage is widely implemented using a standard interface, it demonstrates how the constraints of the serverless architecture can lead to greater adoption and dependence on the ecosystem of other available services. ### What developers do to try to limit lock-in There are some ways that developers can attempt to minimize the likelihood or impact of lock-in for their applications. -Writing your functions in a widely supported language like JavaScript is one of the easiest ways to avoid hard dependencies.
If your language of choice is supported by many providers, it gives you options for other platforms that might be able to run your code. -Developers can also try to limit their use of services to those that are commodity offerings supported almost the same on each platform. For instance, the object storage example we used before is actually an ideal example of a service that is likely replaceable by another provider's offering. The more specialized the service you're depending on, the more difficult it will be to move out of the ecosystem. This is a trade-off you'll have to evaluate on a case-by-case basis, as you might have to forgo specialized tools for their more generic counterparts. +Developers can also try to limit their use of services to those that are commodity offerings supported in much the same way on each platform. For instance, the object storage example we used before is actually an ideal example of a service that is likely replaceable by another provider's offering. The more specialized the service you're depending on, the more difficult it will be to move out of the ecosystem. This is a trade-off you'll have to evaluate on a case-by-case basis, as you might have to forgo specialized tools for their more generic counterparts. ## Concerns about lack of control and insight when debugging -One of the common complaints levied at serverless by developers evaluating it for future projects is the lack of control and insight serverless platforms provide. Part of this is inherent in the offering itself, as control of the infrastructure running the code would, necessarily, disqualify the service from the serverless category. Still, developers are often still apprehensive about deploying in an environment that limits visibility and control, especially when it comes to diagnosing issues that might affect uptime and impact production. +One of the common complaints levied at serverless by developers evaluating it for future projects is the lack of control and insight serverless platforms provide. Part of this is inherent in the offering itself, as control of the infrastructure running the code would, necessarily, disqualify the service from the serverless category. Still, developers are often apprehensive about deploying in an environment that limits visibility and control, especially when it comes to diagnosing issues that might affect uptime and impact production. ### What types of differences can developers expect? -The promise of the serverless paradigm is to shift the responsibility for everything but the code itself to the platform provider. This can yield many advantages in terms of operations overhead and simplifying the execution environment for developers, but it also makes many techniques and tools that developers might typically rely on either more difficult or impossible to use. +The promise of the serverless paradigm is to shift the responsibility for everything but the code itself to the platform provider. This can yield many advantages in terms of operations overhead and simplifying the execution environment for developers, but it also makes many techniques and tools that developers might typically rely on either more difficult or impossible to use. -For instance, some developers are used to being able to debug by accessing the programming environment directly, either by connecting to a host with SSH or by introspecting the code and using data that is exposed by the process.
These are not generally possible or easy in serverless environments because the execution environment is opaque to the user and only specific interfaces like function logs are available for debugging. This can make it difficult to diagnose problems, especially when it's impossible to reproduce locally or when multiple functions are invoked in a pipeline. +For instance, some developers are used to being able to debug by accessing the programming environment directly, either by connecting to a host with SSH or by introspecting the code and using data that is exposed by the process. These are not generally possible or easy in serverless environments because the execution environment is opaque to the user and only specific interfaces like function logs are available for debugging. This can make it difficult to diagnose problems, especially when it's impossible to reproduce locally or when multiple functions are invoked in a pipeline. ### What options are available to help? There are a number of different strategies developers can adopt to help them work within this more limited debugging environment. -Some serverless functionality can be run or emulated locally, allowing developers to debug on their own machine what they are unable to debug in production on their provider. A number of tools were designed to emulate common serverless platforms so that developers can recapture some of the diagnostic capabilities they might be missing. They can allow you to step through functions, see state information, and set breakpoints. +Some serverless functionality can be run or emulated locally, allowing developers to debug on their own machine what they are unable to debug in production on their provider. A number of tools were designed to emulate common serverless platforms so that developers can recapture some of the diagnostic capabilities they might be missing. They can allow you to step through functions, see state information, and set breakpoints. -For debugging on the platform itself, you have to try to take advantage of all of the tools offered by the provider. This often means logging heavily within your functions, using API testing tools to trigger functions automatically with different input, and using any metrics the platform offers to try to gain insight into what may be happening in the execution environment. +For debugging on the platform itself, you have to try to take advantage of all of the tools offered by the provider. This often means logging heavily within your functions, using API testing tools to trigger functions automatically with different input, and using any metrics the platform offers to try to gain insight into what may be happening in the execution environment. ## Wrapping up -Serverless environments offer a lot of value in terms of developer productivity, reduced operational complexity, and real cost savings. However, it's important to remain aware of the limitations of the paradigm and some of the special challenges that you might have to address when designing applications to operate in serverless environments. - -By gaining familiarity with the different obstacles that you might face, you can make a better informed decision as to what applications might benefit most from the trade-offs present. You will also be better prepared to approach these problems with a better understanding of how to mitigate them or avoid them through additional tooling or design considerations. 
+Serverless environments offer a lot of value in terms of developer productivity, reduced operational complexity, and real cost savings. However, it's important to remain aware of the limitations of the paradigm and some of the special challenges that you might have to address when designing applications to operate in serverless environments. +By gaining familiarity with the different obstacles that you might face, you can make a better-informed decision as to what applications might benefit most from the trade-offs present. You will also be better prepared to approach these problems, with a clearer understanding of how to mitigate or avoid them through additional tooling or design considerations. -[Prisma Accelerate](https://www.prisma.io/docs/data-platform/accelerate) provides one way to handle connection issues between your serverless applications and backend databases. It can help manage ephemeral connections from your serverless functions to avoid exhausting your database connection pool. Check it out now! +[Prisma Accelerate](https://www.prisma.io/docs/accelerate) provides one way to handle connection issues between your serverless applications and backend databases. It can help manage ephemeral connections from your serverless functions to avoid exhausting your database connection pool. Check it out now! diff --git a/content/11-serverless/04-traditional-vs-serverless-databases.mdx b/content/11-serverless/04-traditional-vs-serverless-databases.mdx index 30c88f2d..722fcbfd 100644 --- a/content/11-serverless/04-traditional-vs-serverless-databases.mdx +++ b/content/11-serverless/04-traditional-vs-serverless-databases.mdx @@ -148,6 +148,6 @@ While serverless databases are not suitable for every type of application, they -[Prisma Accelerate](https://www.prisma.io/docs/data-platform/accelerate) provides one way to handle connection issues between your serverless applications and backend databases. It can help manage ephemeral connections from your serverless functions to avoid exhausting your database connection pool. Check it out now! +[Prisma Accelerate](https://www.prisma.io/docs/accelerate) provides one way to handle connection issues between your serverless applications and backend databases. It can help manage ephemeral connections from your serverless functions to avoid exhausting your database connection pool. Check it out now!