
Prefetcher

lukemartinlogan edited this page May 16, 2023 · 19 revisions

The objective of the Prefetcher is to promote content that is expected to be used in the near future or frequently, and to demote content that is not. The prefetcher only applies to data that is already staged within Hermes. To activate prefetching, attach a Prefetcher Trait to a Tag (or Bucket). The trait indicates both that prefetching is enabled and which kind of prefetching should be applied.

Usage

To enable prefetching, attach a Prefetcher trait to a Tag (or Bucket). In this example, we attach a DeterministicPrefetcherTrait to SimulationBucket, the bucket that holds the data for the simulation workload.

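```cpp
// Look up the bucket that holds the simulation's data and the trait to attach
TagId bkt_id = HERMES->GetBucketId("SimulationBucket");
TraitId trait_id = HERMES->GetTraitId("DeterministicPrefetcherTrait");
// Enable deterministic prefetching on that bucket
HERMES->AttachTrait(trait_id, bkt_id);
```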

Application Tracing

To support prefetching, Hermes includes a tracing system. The tracer is called on every Put and Get operation. It records the parameters of each Put or Get in a multiple-producer, single-consumer (MPSC) shared-memory queue, which the prefetcher drains asynchronously.
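
The page does not describe how the MPSC queue is built. As a minimal sketch of the pattern only, the C++ below uses a `std::mutex`-protected `std::deque` in ordinary process memory, with threads standing in for the producing processes. `TraceEvent`, `MpscQueue`, and `RunDemo` are hypothetical names, not Hermes APIs. Many I/O threads push events, and a single prefetcher thread takes them in batches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical trace event (illustrative fields only, not a Hermes type).
struct TraceEvent {
  bool is_put;            // true for Put, false for Get
  std::uint64_t blob_id;  // runtime blob identifier
};

// Hypothetical in-process stand-in for the shared-memory MPSC queue.
// Any number of producers may call Push; only one consumer calls Drain.
class MpscQueue {
 public:
  // Producer side: invoked by the tracer on every Put or Get.
  void Push(const TraceEvent &ev) {
    std::lock_guard<std::mutex> lock(mu_);
    q_.push_back(ev);
  }

  // Consumer side: the prefetcher takes every pending event in one batch.
  std::vector<TraceEvent> Drain() {
    std::lock_guard<std::mutex> lock(mu_);
    std::vector<TraceEvent> batch(q_.begin(), q_.end());
    q_.clear();
    return batch;
  }

 private:
  std::mutex mu_;
  std::deque<TraceEvent> q_;
};

// Demo: four producer threads each trace 100 Gets; after they finish,
// the single consumer drains the queue. Returns the number of events drained.
std::size_t RunDemo() {
  MpscQueue queue;
  std::vector<std::thread> producers;
  for (std::uint64_t t = 0; t < 4; ++t) {
    producers.emplace_back([&queue, t] {
      for (std::uint64_t i = 0; i < 100; ++i) {
        queue.Push(TraceEvent{false, t * 100 + i});
      }
    });
  }
  for (auto &p : producers) p.join();
  return queue.Drain().size();
}
```

A mutex keeps the sketch short and easy to verify. A shared-memory implementation would instead place the buffer in a mapped segment visible to every process, and would typically try to keep locking off the producer path, which this sketch does not attempt.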

The tracer collects the following information:

  1. Operation (Put or Get)
  2. Blob Id
  3. Bucket Id
  4. Blob Size
  5. Timestamp (from program start)
  6. Rank (if MPI)
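
As a sketch, the six fields above could be held in a struct like the one below. `TraceOp`, `TraceEntry`, and `MakeTraceEntry` are hypothetical. The IDs are modeled as plain 64-bit integers (an assumption, since Hermes's ID types may differ), and the rank is assumed to be 0 outside MPI. "From program start" is approximated by a monotonic `std::chrono::steady_clock` reading taken during static initialization.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical operation code for a traced call.
enum class TraceOp : std::uint8_t { kPut, kGet };

// Hypothetical in-memory trace entry mirroring the six listed fields.
struct TraceEntry {
  TraceOp op;
  std::uint64_t blob_id;       // runtime Blob Id (assumed 64-bit; may change between runs)
  std::uint64_t bucket_id;     // runtime Bucket Id (assumed 64-bit)
  std::uint64_t blob_size;     // bytes
  std::uint64_t timestamp_ns;  // nanoseconds since program start
  int rank;                    // MPI rank; assumed 0 when not running under MPI
};

// Captured once during static initialization; serves as "program start".
static const std::chrono::steady_clock::time_point kProgramStart =
    std::chrono::steady_clock::now();

// Nanoseconds elapsed since kProgramStart, measured with a monotonic clock.
std::uint64_t NowSinceStartNs() {
  auto elapsed = std::chrono::steady_clock::now() - kProgramStart;
  return static_cast<std::uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
}

// Builds the entry a tracer could enqueue for one Put or Get.
TraceEntry MakeTraceEntry(TraceOp op, std::uint64_t blob_id,
                          std::uint64_t bucket_id, std::uint64_t blob_size,
                          int rank) {
  return TraceEntry{op, blob_id, bucket_id, blob_size, NowSinceStartNs(), rank};
}
```

A monotonic clock is used so that timestamps never go backwards if the system's wall clock is adjusted during a run.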

In the binary trace log file, we store the following information:

  1. Operation (Put or Get)
  2. Blob Name (64-bit Hash)
  3. Bucket Name (64-bit Hash)
  4. Blob Size (64-bit)
  5. Timestamp (from program start)
  6. Rank (if MPI)
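
A possible on-disk layout for one log record is sketched below. The field order, the 1-byte operation code, the 32-bit rank, and the nanosecond timestamps are all assumptions for illustration, and `TraceRecord`, `WriteRecord`, and `ReadRecord` are hypothetical names. Fields are copied one at a time into a packed buffer so that compiler-inserted struct padding never reaches the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, for an in-memory round trip

// Hypothetical on-disk record; field widths and order are assumptions.
struct TraceRecord {
  std::uint8_t op;             // 0 = Put, 1 = Get (assumed encoding)
  std::uint64_t blob_hash;     // 64-bit hash of the Blob Name
  std::uint64_t bucket_hash;   // 64-bit hash of the Bucket Name
  std::uint64_t blob_size;     // blob size in bytes
  std::uint64_t timestamp_ns;  // nanoseconds since program start
  std::int32_t rank;           // MPI rank
};

// Packed size: fields back to back, no padding (1 + 8*4 + 4 = 37 bytes).
constexpr std::size_t kRecordBytes = 1 + 8 + 8 + 8 + 8 + 4;

// Copies one field's bytes (host byte order) and advances the cursor.
template <typename T>
char *PackField(char *p, const T &v) {
  std::memcpy(p, &v, sizeof(T));
  return p + sizeof(T);
}

template <typename T>
const char *UnpackField(const char *p, T &v) {
  std::memcpy(&v, p, sizeof(T));
  return p + sizeof(T);
}

void WriteRecord(std::ostream &out, const TraceRecord &r) {
  char buf[kRecordBytes];
  char *p = buf;
  p = PackField(p, r.op);
  p = PackField(p, r.blob_hash);
  p = PackField(p, r.bucket_hash);
  p = PackField(p, r.blob_size);
  p = PackField(p, r.timestamp_ns);
  PackField(p, r.rank);
  out.write(buf, kRecordBytes);
}

// Returns false at end of stream or on a short (truncated) record.
bool ReadRecord(std::istream &in, TraceRecord &r) {
  char buf[kRecordBytes];
  if (!in.read(buf, kRecordBytes)) return false;
  const char *p = buf;
  p = UnpackField(p, r.op);
  p = UnpackField(p, r.blob_hash);
  p = UnpackField(p, r.bucket_hash);
  p = UnpackField(p, r.blob_size);
  p = UnpackField(p, r.timestamp_ns);
  UnpackField(p, r.rank);
  return true;
}
```

Bytes are written in host byte order, so a log produced on a big-endian machine could not be read on a little-endian one without a conversion step.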

We store 64-bit hashes of the Blob Name and Bucket Name, rather than the full strings, to reduce the size of the binary log file. The full names are not needed, and 64-bit hashes are very unlikely to collide. We also record names instead of the Blob Id and Bucket Id because IDs can change between application runs, whereas names remain consistent.
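
The page does not say which 64-bit hash Hermes uses. Purely as an example, the sketch below uses FNV-1a, a simple and fully specified 64-bit hash. `Fnv1a64` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
constexpr std::uint64_t Fnv1a64(std::string_view name) {
  std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
  for (char c : name) {
    h ^= static_cast<unsigned char>(c);
    h *= 0x100000001b3ULL;                  // FNV 64-bit prime
  }
  return h;
}
```

A log record would then store `Fnv1a64("SimulationBucket")` instead of the string itself. A fixed, fully specified function matters here because the log is read in later runs. The C++ standard only requires `std::hash` to be consistent within a single execution of a program, so its values are not a safe choice for a persisted log.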

Deterministic Prefetcher

Currently, we implement a deterministic prefetcher. Many applications exhibit completely deterministic I/O patterns. For example, deep learning applications produce the same I/O pattern when the random seed is fixed and all other parameters remain the same. Many HPC workloads are executed repeatedly with the same parameters for reasons such as reproducibility. This prefetcher assumes that the user will supply an I/O trace log.
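
One way a trace-driven prefetcher could use such a log is sketched below. It keeps a cursor into the recorded sequence of blob-name hashes. Each observed access advances the cursor to the matching trace position, and the next `lookahead` blobs in the trace become promotion candidates. `TraceReplayPrefetcher`, the cursor-and-lookahead policy, and the forward-search resynchronization are assumptions for illustration, not the Hermes implementation. Demotion is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical trace-replay prefetcher. `trace` is the sequence of
// blob-name hashes recorded in a previous run's log.
class TraceReplayPrefetcher {
 public:
  TraceReplayPrefetcher(std::vector<std::uint64_t> trace, std::size_t lookahead)
      : trace_(std::move(trace)), lookahead_(lookahead) {}

  // Called for each observed access. Searches the trace from the cursor
  // forward for this blob; on a match, moves the cursor just past it and
  // returns up to `lookahead_` upcoming blobs to promote. Returns an empty
  // list (cursor unchanged) if the blob is not found ahead of the cursor.
  std::vector<std::uint64_t> OnAccess(std::uint64_t blob_hash) {
    std::size_t pos = cursor_;
    while (pos < trace_.size() && trace_[pos] != blob_hash) ++pos;
    if (pos == trace_.size()) return {};
    cursor_ = pos + 1;
    std::size_t end = std::min(cursor_ + lookahead_, trace_.size());
    auto first = trace_.begin() + static_cast<std::ptrdiff_t>(cursor_);
    auto last = trace_.begin() + static_cast<std::ptrdiff_t>(end);
    return std::vector<std::uint64_t>(first, last);
  }

 private:
  std::vector<std::uint64_t> trace_;
  std::size_t lookahead_;
  std::size_t cursor_ = 0;  // index of the next expected access
};
```

Searching forward rather than requiring an exact match at the cursor lets the sketch tolerate accesses that appear in the trace but are skipped in the live run; a production version would likely bound that search.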

Future Work

  1. Live Prefetcher: Use some sort of short-term memory models to ensure that data to a bucket