[Feature] cache_data expire time #201

Open
jcpunk opened this issue Jun 6, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@jcpunk

jcpunk commented Jun 6, 2022

Affected Puppet, Ruby, OS and module versions/distributions

  • Puppet: 7.14
  • Ruby: 3.0
  • Distribution: Fedora
  • Module version: 6.0.0

What behaviour did you expect instead

Could an additional parameter be added to the cache_data function setting the max lifetime of the file? This would let me trivially rotate certain resources to a new value over time.

Any additional information you'd like to impart

This could have interesting interactions with facter's new caching infrastructure.

@ekohl ekohl added the enhancement New feature or request label Jun 9, 2022
@ekohl
Member

ekohl commented Jun 9, 2022

Interesting idea. Given that the current function simply dumps the data (as YAML) into the file (source), there is no room for metadata. That means it would need to take the file's mtime into account. Do you think that would be reliable enough? IMHO it would be, and I would approve a PR that implements it.
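
For anyone wondering what that check might look like, here is a minimal sketch (cache_expired? is a hypothetical helper name, not part of the module):

# Treat a cache file as expired once it is older than max_age seconds,
# or when it doesn't exist at all
def cache_expired?(path, max_age)
  return true unless File.exist?(path)

  Time.now - File.mtime(path) > max_age
end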

@alexjfisher
Member

I was taking a look at this function. I wonder if we should have a version that takes the name of a function, so that the function doesn't get run if the cache can return the data instead.

I think my potential use case is slightly different though. I'd like a generic way to cache the results of expensive function calls, e.g. if I have an expensive puppetdb query or ldapquery and I don't need to worry if the data is up to 5 minutes old.

@tam116

tam116 commented Mar 13, 2024

> I think my potential use case is slightly different though. I'd like a generic way to cache the results of expensive function calls, e.g. if I have an expensive puppetdb query or ldapquery and I don't need to worry if the data is up to 5 minutes old.

I was looking for this exact thing as well. I've been able to create a crude version using exported resources, because I only have one node with the "query" class applied and several other nodes that can realize the resource. However, this isn't very intuitive code, and my cache timeout is tied to how often Puppet runs on the "query" node.

@tam116

tam116 commented Mar 14, 2024

Using the cache_data function as a starting point, I created a new function, cache_function, which does what @alexjfisher and I would like: cache the results of other functions with a configurable timeout. I've never done rspec testing and have barely done any Ruby coding, so I'm going to leave this here for now so others can at least start using it and suggest improvements. If I find the time I'll try to learn how to do proper testing and submit a complete pull request, but I'm not going to complain if someone else beats me to it.

# frozen_string_literal: true

require "fileutils"
require "yaml"
require "etc"

# @summary Caches the result of a function call until timeout
#
# Retrieves data from a cache file, or runs the supplied function if the
# file doesn't exist or is older than the cache timeout.
#
# Useful for temporarily caching the results of expensive or slow functions on the
# master side when the results don't change that frequently (e.g. ldapquery::search).
# Because the cache is stored on disk on the master, it doesn't work when you use
# multiple Puppet masters that don't share their vardir.
#
# @example Calling the function
#   $ldap_result = cache_function('ldap', 'my_query_result', 300, 'ldapquery::search', [undef, $ldap_search_string, $ldap_attributes])
#
Puppet::Functions.create_function(:'cache_function') do
  # @param namespace Namespace for the cache
  # @param name Cache key within the namespace
  # @param timeout Number of seconds to cache the data for; a setting of 0 disables caching
  # @param function The function to run when there is no cache yet
  # @param function_args The arguments to pass to the function as an array
  # @return The cached value while the cache is fresh; otherwise the result of calling the function
  dispatch :cache_function do
    param "String[1]", :namespace
    param "String[1]", :name
    param "Integer[0]", :timeout
    param "String[1]", :function
    param "Optional[Array[Any]]", :function_args
    return_type "Any"
  end

  def cache_function(namespace, name, timeout, function, function_args)
    cache_dir = File.join(Puppet[:vardir], namespace)
    cache = File.join(cache_dir, name)

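    # Serve the cached value when caching is enabled and the file is still fresh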
    if File.exist?(cache) && timeout > 0 && File.mtime(cache) > Time.now - timeout
      return YAML.safe_load(File.read(cache))
    end
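    # Cache miss (or caching disabled): call the function and rewrite the cache file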
    result = nil
    FileUtils.mkdir_p(cache_dir)
    File.open(cache, "w", 0o600) do |c|
      result = call_function(function, *function_args)
      c.write(YAML.dump(result))
    end
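    # Align ownership with the vardir owner so later runs as that user can read the cache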
    File.chown(File.stat(Puppet[:vardir]).uid, nil, cache)
    File.chown(File.stat(Puppet[:vardir]).uid, nil, cache_dir)
    result
  end
end
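
For the rspec part, a minimal rspec-puppet sketch could be a starting point (untested; it assumes the module's usual spec_helper setup and that the spec run's Puppet[:vardir] points at a writable temporary directory):

# frozen_string_literal: true

require 'spec_helper'

describe 'cache_function' do
  # Wrap the cheap built-in split function so no external service is needed
  it { is_expected.to run.with_params('spec_cache', 'split_result', 60, 'split', ['a,b', ',']).and_return(['a', 'b']) }

  # Wrong arity should be rejected by the dispatch
  it { is_expected.to run.with_params.and_raise_error(ArgumentError) }
end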

@alexjfisher
Member

I wrote an implementation too, but it caches in memory, as I wasn't sure how to safely have multiple JRuby puppet processes using the same cache files. Not sure if it's worth taking further or not, but I'll leave it here in case anyone's interested.

# frozen_string_literal: true

require 'benchmark'
require 'digest'
require 'json'

Puppet::Functions.create_function(:'cache_function', Puppet::Functions::InternalFunction) do
  dispatch :cache_function do
    scope_param
    param 'String[1]', :function
    optional_param 'Array', :args
    optional_param 'Integer[0]', :expiry
  end

  def cache_function(scope, function, args = [], expiry = 300)
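    # Include the call site (file and line) in the cache key so identical
    # calls made from different places don't collide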
    stacktrace = Puppet::Pops::PuppetStack.stacktrace
    file, line = stacktrace[0]

    key = generate_cache_key(function, args, file, line)

    result = nil
    from_cache = false

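    # Time the lookup so the cache's effect is visible in the info log below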
    time_in_seconds = Benchmark.realtime do
      if (result = fetch_from_cache(key))
        @cache_hits = cache_hits + 1
        from_cache = true
      else
        @cache_misses = cache_misses + 1
        result = scope.call_function(function, args)
        store_in_cache(key, result, expiry) if result
      end
    end

    Puppet.info("Function `#{function}` took #{(time_in_seconds * 1000).round}ms in #{file}, line:#{line}, from_cache:#{from_cache}, total_hits:#{cache_hits}, total_misses:#{cache_misses}")

    result
  end

  def cache
    # This cache is per environment and per puppet/jruby instance
    @cache ||= {}
  end

  def cache_hits
    @cache_hits ||= 0
  end

  def cache_misses
    @cache_misses ||= 0
  end

  def generate_cache_key(*args)
    Digest::SHA2.hexdigest args.to_json
  end

  def fetch_from_cache(key)
    expire_cache
    return unless cache[key]

    cache[key][:value]
  end

  def expire_cache
    Puppet.info("Expiring from function cache. Current size: #{cache.size}")
    cache.each do |k, v|
      cache.delete(k) if v[:ttl] < Time.now.to_i
    end
    Puppet.info("New cache size: #{cache.size}")
  end

  def store_in_cache(key, result, expiry)
    cache[key] = {
      ttl: Time.now.to_i + expiry,
      value: result
    }
  end
end
