Update concepts.qmd
Grammar fixes
maouw authored Dec 13, 2023
1 parent c65b0db commit 94d9474
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions docs/storage/concepts.qmd
@@ -11,23 +11,23 @@ There are many different types of storage. Some of the most common types of stor

### Object storage

-Object storage is a type of storage that stores data as objects. Each object contains data, metadata, and a unique identifier. Objects are stored in a flat address space, and can be accessed using a URL. Object storage is designed to store large amounts of unstructured data. Object storage is not designed to be used as a file system. Object storage is typically accessed using a REST API. Object storage is typically used for storing data that is not actively being worked on, but needs to be stored for a long time. [Amazon S3][#amazon-web-services] and [Azure Blobs](#azure) are examples of object storage services.
+Object storage is a type of storage that stores data as objects. Each object contains data, metadata, and a unique identifier. Objects are stored in a flat address space and can be accessed using a URL. Object storage is designed to store large amounts of unstructured data. Object storage is *not* designed to be used as a file system. Object storage is typically accessed using a REST API. Object storage is typically used for storing data that is not actively being worked on but needs to be stored for a long time. [Amazon S3][#amazon-web-services] and [Azure Blobs](#azure) are examples of object storage services.

### Block storage

-Block storage is a type of storage that stores data as blocks. Each block contains data and a unique identifier. Blocks are stored in a hierarchical address space, and can be accessed using a file system. Block storage is designed to store large amounts of structured data. Block storage is typically used for storing data that is actively being worked on. [Amazon EBS](#amazon-web-services) is an example of block storage.
+Block storage is a type of storage that stores data as blocks. Each block contains data and a unique identifier. Blocks are stored in a hierarchical address space and can be accessed using a file system. Block storage is designed to store large amounts of structured data. Block storage is typically used for storing data that is actively being worked on. [Amazon EBS](#amazon-web-services) is an example of block storage.

### File storage

File storage is a type of storage that stores data as files. Your computer's local filesystem uses file storage. Network filesystems such as NFS and SMB, and cloud storage services such as [Azure Files](#azure) and [Amazon EFS](#amazon-web-services) also use file storage. File storage is designed to store large amounts of structured data. File storage is designed to be used by multiple users at the same time and supports creating, reading, updating, and deleting files. File storage is typically used for storing data that is actively being worked on. [Azure Files](#azure) is an example of file storage.

#### Sync services

-Cloud-based file sync services, such as [Dropbox](#dropbox), [Google Drive](#google-drive), and [OneDrive](#uw-onedrive-for-business), offer file storage that can easily be accessed from multiple devices. These services usually offer convenient features such as file versioning, recovery, and file sharing. You can access the files stored in these services from a directory on your computer, and the files are automatically synchronized with the cloud storage service. However, the services may not support all of the features of a traditional file system (e.g. file permissions, symbolic links) and often have restrictions on the length of file names and the types of characters that can be used in file names, which may cause serious problems when using files stored in these services with some applications (e.g., [FreeSurfer](../software/freesurfer.qmd)). Caution is advised when working with files stored in these services.
+Cloud-based file sync services, such as [Dropbox](#dropbox), [Google Drive](#google-drive), and [OneDrive](#uw-onedrive-for-business), offer file storage that can easily be accessed from multiple devices. These services usually offer convenient features such as file versioning, recovery, and file sharing. You can access the files stored in these services from a directory on your computer, and the files are automatically synchronized with the cloud storage service. However, the services may not support all of the features of a traditional file system (e.g., file permissions, symbolic links) and often have restrictions on the length of file names and the types of characters that can be used in file names, which may cause serious problems when using files stored in these services with some applications (e.g., [FreeSurfer](../software/freesurfer.qmd)). Caution is advised when working with files stored in these services.

### Version control systems

-Version control systems, such as [Git](https://git-scm.com/), are designed to store and manage different versions of files. Version control systems are typically used for storing source code, but can also be used for storing other types of files. There are some version control systems that are designed to handle datasets, such as [DVC](https://dvc.org/). [GitHub](#github) provides online hosting and collaboration services for Git repositories.
+Version control systems, such as [Git](https://git-scm.com/), are designed to store and manage different versions of files. Version control systems are typically used for storing source code but can also be used for storing other types of files. There are some version control systems that are designed to handle datasets, such as [DVC](https://dvc.org/). [GitHub](#github) provides online hosting and collaboration services for Git repositories.

### Data repositories

@@ -39,36 +39,36 @@ There are a number of characteristics and considerations that are important when

### Storage capacity

-How much data is being stored. The more data that is being stored, the more it will cost to store the data. Storage services may have minimum and/or maximum storage capacity requirements, and may charge for storage in increments of a certain size (e.g. 1 TiB).
+How much data is being stored. The more data that is being stored, the more it will cost to store the data. Storage services may have minimum and/or maximum storage capacity requirements. Some services may charge for storage in increments of a certain size (e.g., 1 TiB).
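
As a rough illustration of increment-based billing, the sketch below rounds usage up to the next whole increment before pricing it. The 1 TiB increment follows the example in the text; the price of 10.0 per TiB per month and the function names are hypothetical and do not describe any real provider.

```python
TIB = 2**40  # bytes in one tebibyte


def billable_tib(used_bytes: int, increment_tib: int = 1) -> int:
    """Round usage up to the next whole billing increment, in TiB."""
    if used_bytes < 0:
        raise ValueError("used_bytes must be non-negative")
    increment_bytes = increment_tib * TIB
    # Integer ceiling division avoids floating-point rounding surprises.
    increments = -(-used_bytes // increment_bytes)
    return increments * increment_tib


def monthly_cost(used_bytes: int, price_per_tib: float, increment_tib: int = 1) -> float:
    """Monthly cost when storage is billed in whole increments (hypothetical pricing)."""
    return billable_tib(used_bytes, increment_tib) * price_per_tib


if __name__ == "__main__":
    used = int(2.3 * TIB)  # 2.3 TiB actually stored
    print(billable_tib(used))        # capacity billed, in TiB
    print(monthly_cost(used, 10.0))  # cost at the assumed price
```

Under these assumptions, storing 2.3 TiB is billed as 3 TiB, so the gap between stored and billed capacity can matter when comparing services.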

### Storage latency

-How quickly data can start being read or written. The lower the latency, the faster data can be read or written. Storage services may charge more for lower latency. A solid-state drive (SSD) has lower latency than a hard disk drive (HDD), which has lower latency than a tape drive. However, capacity on SSDs is more expensive than capacity on HDDs, which is more expensive than capacity on tape drives.
+How quickly data can start being read or written. The lower the latency, the faster data can be read or written. Storage services may charge more for lower latency. A solid-state drive (SSD) has lower latency than a hard disk drive (HDD), which has lower latency than a tape drive. However, storage capacity on SSDs is more expensive than storage capacity on HDDs, which is more expensive than storage capacity on tape drives.

### Data transfer

The rate at which data can be downloaded from or uploaded to a service may be limited. The higher the bandwidth, the faster data can be downloaded from or uploaded to the service. Service providers may charge more for higher bandwidth.

-Data transfer costs may be charged for uploading data to and downloading data from a service, also known as *ingress* and *egress*. Typically, ingress is free and egress is charged. Some services may charge more for uploading data to and downloading data from certain locations (e.g. outside of the United States). Some providers offer options such as physical media transfer (e.g. mailing a hard drive) for uploading and downloading large amounts of data (see [AWS Snowball](https://aws.amazon.com/snowball/)).
+Data transfer costs may be charged for uploading data to and downloading data from a service, also known as *ingress* and *egress*. Typically, ingress is free and egress is charged. Some services may charge more for uploading data to and downloading data from certain locations (e.g., outside of the United States). Some providers offer options such as physical media transfer (e.g., mailing a hard drive) for uploading and downloading large amounts of data (see [AWS Snowball](https://aws.amazon.com/snowball/)).
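
To see why bandwidth limits can make physical media transfer attractive for large datasets, a back-of-the-envelope estimate may help. The sketch below assumes the link sustains its full rated bandwidth with no protocol overhead, which real transfers rarely achieve; the function name and the example figures are illustrative only.

```python
def transfer_seconds(size_bytes: int, bandwidth_mbit_per_s: float) -> float:
    """Idealized transfer time: total bits divided by link rate in bits per second."""
    if bandwidth_mbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    bits = size_bytes * 8
    return bits / (bandwidth_mbit_per_s * 1_000_000)


if __name__ == "__main__":
    one_tb = 10**12  # 1 TB (decimal terabyte) in bytes
    for mbit in (100, 1000):  # assumed link speeds in Mbit/s
        hours = transfer_seconds(one_tb, mbit) / 3600
        print(f"{mbit} Mbit/s: about {hours:.1f} hours per TB")
```

Under these idealized assumptions, 1 TB takes roughly 22 hours at 100 Mbit/s, so tens of terabytes over such a link would take weeks.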

### Retention period

How long data needs to be stored. The longer data needs to be stored, the more it will cost to store the data. Billing for storage services is typically done on a monthly or annual basis.

### Backup and recovery options

-How often data is backed up, how long backups are kept, and how quickly data can be recovered from backups. The more often data is backed up, the longer backups are kept, and the faster data can be recovered from backups, the more it will cost to store the data. Some services may not provide backup and recovery options, or may charge extra for backup and recovery options.
+How often data is backed up, how long backups are kept, and how quickly data can be recovered from backups. The more often data is backed up, the longer backups are kept, and the faster data can be recovered from backups, the more it will cost to store the data. Backup and recovery options may not be available or involve extra charges on some services.

### Frequency of access

-How often data is accessed. Some services may charge for each time data is accessed, or may charge more for more frequent access.
+How often data is accessed. Some services may charge for each time data is accessed or charge more for more frequent access.

### Data access restrictions

-Who can access the data. Data access restrictions may be inherent to a service (e.g., only people connected to the UW intranet can access the data) or may be imposed by the user, the organization, or the service provider. Sharing data with outside collaborators may require additional steps such as creating accounts for the collaborators, or may not be possible at all.
+Who can access the data. Data access restrictions may be inherent to a service (e.g., only people connected to the UW intranet can access the data) or may be imposed by the user, the organization, or the service provider. Sharing data with outside collaborators may require additional steps, such as creating accounts for the collaborators, or may not be possible at all.

### Data security

How secure the data is against unauthorized access. While [access restrictions](#data-access-restrictions) can limit who can access the data in certain circumstances, measures such as encryption may be necessary to prevent access by users who have access to the data but should not be able to access the data (e.g., system administrators).

-Research data that is subject to HIPAA or FERPA regulations must be stored in a HIPAA or FERPA compliant service. For more information about HIPAA at UW, see the [HIPPA Guidance](https://www.washington.edu/research/policies/guidance-hipaa-2/) page.
+Research data that is subject to HIPAA or FERPA regulations must be stored in a HIPAA- or FERPA-compliant service. For more information about HIPAA at UW, see the [HIPPA Guidance](https://www.washington.edu/research/policies/guidance-hipaa-2/) page.
