[Feature] cache_data expire time #201

Open
jcpunk opened this issue Jun 6, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@jcpunk

jcpunk commented Jun 6, 2022

Affected Puppet, Ruby, OS and module versions/distributions

  • Puppet: 7.14
  • Ruby: 3.0
  • Distribution: Fedora
  • Module version: 6.0.0

What behaviour did you expect instead

Could an additional parameter be added to the cache_data function setting the max lifetime of the file? This would let me trivially rotate certain resources to a new value over time.

Any additional information you'd like to impart

This could have interesting interactions with facter's new caching infrastructure.

@ekohl ekohl added the enhancement New feature or request label Jun 9, 2022
@ekohl
Member

ekohl commented Jun 9, 2022

Interesting idea. Given that the current function simply dumps the data (as YAML) into the file (source), there is no room for metadata. That means it would need to take the file's mtime into account. Do you think that would be reliable enough? IMHO it would be, and I would approve a PR that implements it.
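
For anyone wondering what that check might look like, here is a minimal sketch (cache_expired? is a hypothetical helper name, not part of the module):

# Treat a cache file as expired once it is older than max_age seconds,
# or when it doesn't exist at all
def cache_expired?(path, max_age)
  return true unless File.exist?(path)

  Time.now - File.mtime(path) > max_age
end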

@alexjfisher
Member

I was taking a look at this function. I wonder if we should have a version that takes the name of a function, so that the function doesn't get run if the cache can return the data instead.

I think my potential use case is slightly different though. I'd like a generic way to cache the results of expensive function calls, e.g. if I have an expensive puppetdb query or ldapquery and I don't need to worry if the data is up to 5 minutes old.

@tam116

tam116 commented Mar 13, 2024

> I think my potential use case is slightly different though. I'd like a generic way to cache the results of expensive function calls, e.g. if I have an expensive puppetdb query or ldapquery and I don't need to worry if the data is up to 5 minutes old.

I was looking for this exact thing as well. I've been able to create a crude version using exported resources, because I only have one node with the "query" class applied and several other nodes that can realize the resource. However, this isn't very intuitive code, and my cache timeout is tied to how often Puppet runs on the "query" node.

@tam116

tam116 commented Mar 14, 2024

Using the cache_data function as a starting point, I created a new function, cache_function, which does what @alexjfisher and I would like: cache the results of other functions with a configurable timeout. I've never done rspec testing and have barely done any Ruby coding, so I'm going to leave this here for now so others can at least start using it and suggest improvements. If I find the time I'll try to learn how to do proper testing and submit a complete pull request, but I'm not going to complain if someone else beats me to it.

# frozen_string_literal: true

require "fileutils"
require "yaml"
require "etc"

# @summary Caches the result of a function call until timeout
#
# Retrieves data from a cache file, or runs the supplied function if the
# file doesn't exist or is older than the cache timeout.
#
# Useful for temporarily caching the results of expensive or slow functions on the
# master side when the results don't change that frequently (e.g. ldapquery::search).
# Because the cache is stored on disk on the master, it doesn't work when you use
# multiple Puppet masters that don't share their vardir.
#
# @example Calling the function
#   $ldap_result = cache_function('ldap', 'my_query_result', 300, 'ldapquery::search', [undef, $ldap_search_string, $ldap_attributes])
#
Puppet::Functions.create_function(:'cache_function') do
  # @param namespace Namespace for the cache
  # @param name Cache key within the namespace
  # @param timeout Number of seconds to cache the data for; a setting of 0 disables caching
  # @param function The function to run when there is no cache yet
  # @param function_args The arguments to pass to the function as an array
  # @return The cached value while the cache is fresh; otherwise the result of calling the function
  dispatch :cache_function do
    param "String[1]", :namespace
    param "String[1]", :name
    param "Integer[0]", :timeout
    param "String[1]", :function
    param "Optional[Array[Any]]", :function_args
    return_type "Any"
  end

  def cache_function(namespace, name, timeout, function, function_args)
    cache_dir = File.join(Puppet[:vardir], namespace)
    cache = File.join(cache_dir, name)

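    # Serve the cached value when caching is enabled and the file is still fresh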
    if File.exist?(cache) && timeout > 0 && File.mtime(cache) > Time.now - timeout
      return YAML.safe_load(File.read(cache))
    end
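    # Cache miss (or caching disabled): call the function and rewrite the cache file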
    result = nil
    FileUtils.mkdir_p(cache_dir)
    File.open(cache, "w", 0o600) do |c|
      result = call_function(function, *function_args)
      c.write(YAML.dump(result))
    end
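    # Align ownership with the vardir owner so later runs as that user can read the cache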
    File.chown(File.stat(Puppet[:vardir]).uid, nil, cache)
    File.chown(File.stat(Puppet[:vardir]).uid, nil, cache_dir)
    result
  end
end
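
For the rspec part, a minimal rspec-puppet sketch could be a starting point (untested; it assumes the module's usual spec_helper setup and that the spec run's Puppet[:vardir] points at a writable temporary directory):

# frozen_string_literal: true

require 'spec_helper'

describe 'cache_function' do
  # Wrap the cheap built-in split function so no external service is needed
  it { is_expected.to run.with_params('spec_cache', 'split_result', 60, 'split', ['a,b', ',']).and_return(['a', 'b']) }

  # Wrong arity should be rejected by the dispatch
  it { is_expected.to run.with_params.and_raise_error(ArgumentError) }
end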

@alexjfisher
Member

I wrote an implementation too, but it caches in memory, as I wasn't sure how to safely have multiple JRuby puppet processes using the same cache files. Not sure if it's worth taking further or not, but I'll leave it here in case anyone's interested.

# frozen_string_literal: true

require 'benchmark'
require 'digest'
require 'json'

Puppet::Functions.create_function(:'cache_function', Puppet::Functions::InternalFunction) do
  dispatch :cache_function do
    scope_param
    param 'String[1]', :function
    optional_param 'Array', :args
    optional_param 'Integer[0]', :expiry
  end

  def cache_function(scope, function, args = [], expiry = 300)
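    # Include the call site (file and line) in the cache key so identical
    # calls made from different places don't collide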
    stacktrace = Puppet::Pops::PuppetStack.stacktrace
    file, line = stacktrace[0]

    key = generate_cache_key(function, args, file, line)

    result = nil
    from_cache = false

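    # Time the lookup so the cache's effect is visible in the info log below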
    time_in_seconds = Benchmark.realtime do
      if (result = fetch_from_cache(key))
        @cache_hits = cache_hits + 1
        from_cache = true
      else
        @cache_misses = cache_misses + 1
        result = scope.call_function(function, args)
        store_in_cache(key, result, expiry) if result
      end
    end

    Puppet.info("Function `#{function}` took #{(time_in_seconds * 1000).round}ms in #{file}, line:#{line}, from_cache:#{from_cache}, total_hits:#{cache_hits}, total_misses:#{cache_misses}")

    result
  end

  def cache
    # This cache is per environment and per puppet/jruby instance
    @cache ||= {}
  end

  def cache_hits
    @cache_hits ||= 0
  end

  def cache_misses
    @cache_misses ||= 0
  end

  def generate_cache_key(*args)
    Digest::SHA2.hexdigest args.to_json
  end

  def fetch_from_cache(key)
    expire_cache
    return unless cache[key]

    cache[key][:value]
  end

  def expire_cache
    Puppet.info("Expiring from function cache. Current size: #{cache.size}")
    cache.each do |k, v|
      cache.delete(k) if v[:ttl] < Time.now.to_i
    end
    Puppet.info("New cache size: #{cache.size}")
  end

  def store_in_cache(key, result, expiry)
    cache[key] = {
      ttl: Time.now.to_i + expiry,
      value: result
    }
  end
end
