1 parent 47bfa0b, commit b6eec65
Showing 11 changed files with 170 additions and 22 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Abstract data type | ||
## Definition | ||
An abstract data type (ADT) is a **mathematical model** for data types, analogous to an algebraic structure. It consists of a domain, a collection of operations, and a set of constraints the operations must satisfy. | ||
ADTs are a **theoretical concept**, used in formal semantics and program verification and, less strictly, in the design and analysis of algorithms. | ||
## ADT vs data structures | ||
An ADT, as a mathematical model, contrasts with a data structure, which is a concrete representation of data and reflects the point of view of an implementer, not a user. | ||
For example, a stack has push/pop operations that follow a LIFO rule, and can be concretely implemented using either a list or an array. |
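
The stack example can be sketched in Python as follows. Both classes expose the same push/pop behaviour (the ADT), while the storage differs (the data structure). The names `ArrayStack`, `LinkedStack`, and `drain` are hypothetical and exist only for illustration.

```python
class ArrayStack:
    """Stack backed by a Python list, which is a dynamic array."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items


class _Node:
    """One cell of a singly linked list."""

    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node


class LinkedStack:
    """Stack backed by a singly linked list; the head is the top."""

    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = _Node(item, self._head)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        node = self._head
        self._head = node.next
        return node.value

    def is_empty(self):
        return self._head is None


def drain(stack, items):
    """Push every item, then pop until empty; works for any stack."""
    for item in items:
        stack.push(item)
    popped = []
    while not stack.is_empty():
        popped.append(stack.pop())
    return popped
```

Because `drain` relies only on the operations and the LIFO rule, it returns the items in reverse order for either implementation.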
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Data structures | ||
|
||
## What is a data structure? | ||
A data structure is a data organization and storage format for efficient access to data. More precisely, it's a collection of data values, the relationships among them, and the operations that can be applied to the data (i.e., it's an *algebraic structure* about data). | ||
We can split data structures into two things: the interface and the implementation. | ||
## Interface | ||
The interface is like a contract that specifies how we can interact with the data structure -- what operations we can perform on it, what inputs it expects, and what outputs we can expect. | ||
For example, consider a dynamic array. The interface would include operations like appending, insertion, removal, updating, and more. | ||
## Implementation | ||
The implementation is the code that actually makes the data structure work. It includes the details of how the data is stored and how the operations are performed. | ||
For example, the implementation of a dynamic array might involve allocating memory for the list, tracking the size, and rearranging the elements when an operation like remove is called. |
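
A minimal sketch of such an implementation, assuming a fixed-capacity backing list that doubles when full. The class name `DynamicArray`, its method names, and the starting capacity of 2 are assumptions made for illustration, not a reference implementation.

```python
class DynamicArray:
    """Toy dynamic array over a fixed-capacity backing list."""

    def __init__(self):
        self._capacity = 2                      # allocated slots
        self._size = 0                          # slots actually in use
        self._slots = [None] * self._capacity

    def __len__(self):
        return self._size

    def get(self, index):
        self._check(index)
        return self._slots[index]

    def append(self, value):
        if self._size == self._capacity:
            self._grow()
        self._slots[self._size] = value
        self._size += 1

    def remove_at(self, index):
        self._check(index)
        removed = self._slots[index]
        # Shift every later element one slot to the left.
        for i in range(index, self._size - 1):
            self._slots[i] = self._slots[i + 1]
        self._size -= 1
        self._slots[self._size] = None          # clear the vacated slot
        return removed

    def _grow(self):
        # Allocate a backing list twice as large and copy the elements over.
        self._capacity *= 2
        new_slots = [None] * self._capacity
        for i in range(self._size):
            new_slots[i] = self._slots[i]
        self._slots = new_slots

    def _check(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
```

The interface (`append`, `get`, `remove_at`) stays the same even if the growth policy or the storage layout changes.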
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Collisions | ||
## Definition | ||
When different keys hash to the same integer, it is called a collision. If collisions are not handled, older entries get overwritten and data is lost. | ||
## Collision resolution | ||
There are [multiple ways](https://en.wikipedia.org/wiki/Hash_table#Collision_resolution) to handle collisions. | ||
### Chaining | ||
We store linked lists inside the hash map's array instead of the elements themselves. The linked list nodes store both the key and the value. | ||
If there are collisions, the collided key-value pairs are linked together in a linked list. Then, when trying to access one of these key-value pairs, we traverse through the linked list until the key matches. |
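
Chaining as described above can be sketched like this. The class `ChainedHashMap`, its method names, and the default of 8 buckets are hypothetical. The sketch uses Python's built-in `hash` and never resizes, so it is only a sketch under those assumptions.

```python
class _Node:
    """Linked list node holding both the key and the value."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashMap:
    def __init__(self, num_buckets=8):
        # Each bucket holds the head of a linked list, or None if empty.
        self._buckets = [None] * num_buckets

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self._buckets[i]
        while node is not None:
            if node.key == key:          # key already stored: update in place
                node.value = value
                return
            node = node.next
        # New key: link it at the head of the bucket's list.
        self._buckets[i] = _Node(key, value, self._buckets[i])

    def get(self, key):
        node = self._buckets[self._index(key)]
        while node is not None:          # traverse until the key matches
            if node.key == key:
                return node.value
            node = node.next
        raise KeyError(key)
```

With 4 buckets, the integer keys 1 and 5 land in the same bucket in CPython, yet both remain retrievable because they are linked together rather than overwritten.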
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
# Hash function | ||
## Definition | ||
A hash function is any function that can be used to map data of arbitrary size to fixed-size values (though there are some hash functions that support variable-length output). | ||
A hash function may be considered to perform **three functions**: | ||
- Convert variable-length keys into fixed-length values, by folding them by words or other units using a parity-preserving operator like ADD or XOR. | ||
- Scramble the bits of the key so that the resulting values are uniformly distributed over the keyspace. | ||
- Map the key values into ones less than or equal to the size of the table. | ||
### Hash values and digests | ||
The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes. | ||
The values are usually used to index a fixed-size table called a *hash table*. | ||
## Usage | ||
- **Data storage and retrieval**: Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. | ||
- **Integrity checking**: Identical hash values for two files indicate, with very high probability for a well-designed hash function, that the files are equal, providing a reliable means to detect file modifications. | ||
- **Key derivation**: Minor input changes result in a random-looking output alteration. | ||
- **Message authentication codes** (MACs): By combining a secret key with the input data, hash functions can generate MACs that ensure the authenticity of the data (e.g., HMACs). | ||
- **Signatures**: Message hashes are signed rather than the whole message. | ||
## What makes a good hash function? | ||
A good hash function satisfies two basic properties: | ||
- Should be very fast to compute. | ||
- Should minimize duplication of output values (collisions). | ||
Hash functions rely on generating favorable probability distributions for their effectiveness, reducing access time to nearly constant. | ||
High table loading factors, pathological key sets, and poorly designed hash functions can result in access times approaching linear in the number of items in the table. | ||
## Collision resolution | ||
A necessary adjunct to the hash function is a collision-resolution method that employs an auxiliary data structure like linked lists, or systematic probing of the table to find an empty slot. | ||
## Properties | ||
### Uniformity | ||
It should map the expected inputs as evenly as possible over its output range. That is, every hash value should be generated with roughly the same probability. | ||
|
||
> [!tip] Uniformly distributed is not random | ||
> This criterion only requires the value to be *uniformly distributed*, not *random* in any sense. A good randomizing function is (barring computational efficiency concerns) generally a good choice as a hash function, but the converse need not be true. | ||
#### Absolute uniformity | ||
In special cases when the keys are known in advance and the key set is static, a hash function can be found that achieves absolute (or collision-less) uniformity. Such a hash function is said to be *perfect*. | ||
There is no algorithmic way of constructing such a function -- the cost of searching for one is a factorial function of the number of keys to be mapped versus the number of table slots that they are mapped into. | ||
Finding a perfect hash function over more than a very small set of keys is usually computationally infeasible; the resulting function is likely to be more computationally complex than a standard hash function and provides only a marginal advantage over a function with good statistical properties that yields a minimum number of collisions. | ||
### Testing and measurement | ||
When testing a hash function, the uniformity of the distribution can be evaluated with the [chi-squared test](https://en.wikipedia.org/wiki/Chi-squared_test), which is a goodness-of-fit measure: it compares the actual distribution of items in buckets with the expected distribution of items. | ||
### Efficiency | ||
A hash function takes a finite amount of time to map a potentially large keyspace to a feasible amount of storage space searchable in a bounded amount of time regardless of the number of keys. | ||
In most applications, the hash function should be computable with minimum latency and secondarily in a minimum number of instructions. | ||
|
||
> [!tip] Space-time trade-off | ||
> In data storage and retrieval applications, the use of a hash function is a trade-off between search time and data storage space. | ||
> If [memory](https://en.wikipedia.org/wiki/Computer_memory "Computer memory") is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if infinite time is available, values can be stored without regard for their keys, and a [binary search](https://en.wikipedia.org/wiki/Binary_search "Binary search") or [linear search](https://en.wikipedia.org/wiki/Linear_search "Linear search") can be used to retrieve the element. | ||
### Applicability | ||
A hash function that allows only certain table sizes, accepts strings only up to a certain length, or cannot accept a seed is less useful than one without these restrictions. | ||
### Deterministic | ||
A hash procedure **must be deterministic** -- for a given input value, it must always generate the same hash value. | ||
### Defined range | ||
It is often desirable that the output of a hash function have fixed size. |
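
The three functions listed in the definition (fold, scramble, map) can be illustrated with a toy hash. Everything here is an assumption made for illustration: the names `fold`, `scramble`, `to_index`, and `toy_hash`, the 4-byte word size, and the mixing constant. It is not a production-quality hash function.

```python
def fold(key: bytes) -> int:
    """Fold a variable-length key into one 32-bit word by XOR-ing 4-byte chunks."""
    word = 0
    for i in range(0, len(key), 4):
        word ^= int.from_bytes(key[i:i + 4], "little")
    return word


def scramble(word: int) -> int:
    """Mix the bits with a multiply and a shift-xor (illustrative constants)."""
    word = (word * 2654435761) & 0xFFFFFFFF   # keep the result within 32 bits
    word ^= word >> 16
    return word


def to_index(word: int, table_size: int) -> int:
    """Map the scrambled word to a slot in the range 0 .. table_size - 1."""
    return word % table_size


def toy_hash(key: bytes, table_size: int) -> int:
    return to_index(scramble(fold(key)), table_size)
```

The sketch is deterministic and has a defined range, but plain XOR folding is weak: keys whose 4-byte chunks are the same in a different order fold to the same word, and two identical chunks cancel each other out.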
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Hash map | ||
## Definition | ||
A hash map (also known as hash table or dictionary) is an **unordered data structure that stores key-value pairs**. | ||
It implements an associative array, which is an abstract data type that maps keys to values. It uses a [[Hash function]] to compute an *index*, into an array of *buckets* or *slots*, from which the desired value can be found. | ||
![[Pasted image 20241114124706.png]] | ||
Typically, the only constraint on a hash map's key is that it has to be **immutable**. | ||
## Advantages | ||
- It can reduce the time complexity of an algorithm by a factor of $O(n)$ for a large number of problems. | ||
- It can add, update, remove, and check for the existence of elements in $O(1)$ on average. | ||
## Disadvantages | ||
- For smaller input sizes, they can be slower due to overhead. | ||
- Can take up more space than arrays. | ||
- When implemented using a fixed-size array, resizing is much more expensive than for a normal array because every existing key needs to be re-hashed. A hash table may also use an array that is significantly larger than the number of elements stored, which wastes space. |
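
As a sketch of the points above, Python's built-in `dict` is a hash map. The function `find_pair_with_sum` and the variable names are hypothetical. The function replaces a nested-loop search with a single pass by remembering the values seen so far, and the second snippet shows a mutable key being rejected.

```python
def find_pair_with_sum(numbers, target):
    """Return the indices of two numbers that add up to target, or None."""
    seen = {}  # value -> index of its most recent occurrence
    for index, value in enumerate(numbers):
        complement = target - value
        if complement in seen:           # average O(1) membership check
            return (seen[complement], index)
        seen[value] = index              # average O(1) insert or update
    return None


# Keys must be hashable: an immutable tuple works, a mutable list does not.
inventory = {("apple", "red"): 3}
list_key_rejected = False
try:
    inventory[["apple", "green"]] = 5
except TypeError:
    list_key_rejected = True
```

The pair search does one dictionary lookup per element instead of comparing every pair, which is where the factor-of-$O(n)$ saving comes from.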
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Hashing | ||
## Definition | ||
The use of a hash function to index a [[Hash map]] is called *hashing* or *scatter-storage addressing*. | ||
Hashing is a computationally- and storage-space-efficient form of data access that **avoids the non-constant access time** of ordered and unordered lists and structured trees, and the often-exponential storage requirements of direct access of state spaces of large or variable-length keys. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Sets | ||
## Definition | ||
A set is an **unordered** abstract data type that can **store unique values**. It is a computer implementation of the mathematical concept of a *finite set*. | ||
|
||
> [!tip] Test membership | ||
> Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set. | ||
## Advantages | ||
You can add, remove, and check if an element exists in a set in $O(1)$ on average. | ||
## Sets vs hash table | ||
Sets use the same mechanism for hashing keys into integers but the difference is that sets do not map their keys to anything. | ||
Sets are convenient to use when you only care about checking whether an element exists. |
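
A short sketch using Python's built-in `set`. The variable `seen_users` and the function `first_duplicate` are hypothetical names chosen for illustration.

```python
seen_users = {"ana", "bo"}
seen_users.add("cy")        # add a new element
seen_users.add("ana")       # already present, so the set is unchanged
seen_users.discard("bo")    # remove an element


def first_duplicate(items):
    """Return the first item that was already seen, or None if all are unique."""
    seen = set()
    for item in items:
        if item in seen:    # membership test, no value is retrieved
            return item
        seen.add(item)
    return None
```

Only membership is tested here; nothing is mapped to the stored elements, which is the difference from a hash map.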
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Universal hashing | ||
## Definition | ||
Universal hashing refers to selecting a hash function at random from a family of hash functions with a certain mathematical property. This guarantees a low number of collisions in expectation. | ||
Many universal families are known (for hashing integers, vectors, strings), and their evaluation is often very efficient. |
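
One well-known universal family for integer keys has the form $h_{a,b}(x) = ((ax + b) \bmod p) \bmod m$, with $p$ a prime larger than any key and $m$ the table size. The sketch below draws a random member of that family. The function name `random_hash_function`, the choice of $p = 2^{61} - 1$, and the restriction to non-negative integer keys below $p$ are assumptions of this example.

```python
import random

_PRIME = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be smaller


def random_hash_function(table_size, rng=random):
    """Pick h(x) = ((a*x + b) mod p) mod table_size with random a and b."""
    a = rng.randrange(1, _PRIME)   # a is drawn from 1 .. p - 1
    b = rng.randrange(0, _PRIME)   # b is drawn from 0 .. p - 1

    def h(key):
        return ((a * key + b) % _PRIME) % table_size

    return h
```

Each call returns a different function, but any one returned function is deterministic. Because the choice is random, no fixed key set is bad for every member of the family.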
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,16 @@ | ||
# Data structures | ||
## General | ||
- [[Data structure|What's a data structure?]] | ||
- [[Abstract Data Type]] | ||
## Arrays | ||
- [[Array|What's an array?]] | ||
- [[Two pointers]] | ||
- [[Sliding window]] | ||
- [[Prefix sum]] | ||
## Hashing | ||
- [[Hash function]] | ||
- [[Collisions]] | ||
- [[Hashing]] | ||
- [[Universal hashing]] | ||
- [[Hash map]] | ||
- [[Sets]] |