This ARM is focused specifically on the problem of providing Hiera data in Puppet modules.
Currently, module authors use a 'params class pattern' to provide defaults for the parameters their classes accept. A module for managing a database service mydb, for example, will provide a class mydb::params, which contains parameter assignments like $mydb::params::tcp_port, $mydb::params::install_dir, etc. These assignments can use the Puppet DSL for conditional logic so that install_dir follows each OS's filesystem conventions. The parameter values are then used in the module's other classes, either in the prototype for the class or directly in the manifest:
class mydb::packages (
  $tcp_port    = $mydb::params::tcp_port,
  $install_dir = $mydb::params::install_dir,
  ...
) {
  # class resources go here
}
The params pattern has some drawbacks (summarized from user comments on #16856 and the subsequent mailing list discussion):
- it lives apart from hiera, which we've landed on and promoted as being the solution to the data/code separation problem
- it’s an attempt to supply default data to your classes and falls over whenever someone wants to change data or provide their own for their site. They’re forced to either run with a modified module or commit changes upstream, which may contain private data.
- We'd like to build a simple, consistent interface for data in modules that ties into our larger data/code separation story. This means building on the concepts in Hiera, which advanced users have found to be flexible and powerful enough for their needs without imposing a huge cognitive load on their non-expert team members.
- The implementation described here is a low risk, opt-in mechanism to integrate technology based on Cloudsmith's work in Geppetto and Stackhammer. It presents a significant opportunity to improve areas that have historically been troublesome in Puppet and hiera: data typing, lookup precedence, introspection of variable lookups, round-trips from the Puppet DSL to a UI and back.
None stated; but this probably means we need help questioning our assumptions.
If our approach is successful:
- new users who find and integrate modules that make use of hiera2 will find that it's easy to understand and meets their needs for site customization.
- module authors will find the approach to be a significant improvement over their current methods of providing bindings of values-to-variables which are based on conditional logic.
- current hiera users will find the new capabilities to be an advancement in expressiveness, power and predictability.
I think this is largely covered by the Goals section.
The functionality consists of:
- Hiera-2 - a reimplementation of Hiera (the existing implementation is referred to as Hiera-1)
- Composition of data across multiple sources (most notably modules)
- Implicit lookup (parameters for parameterized classes). Can combine Hiera-1 (at site level) with Hiera-2 from modules (or use Hiera-2 throughout). Hiera-1 is not supported inside modules.
- Explicit lookup via function
If you like examples more than theory, you may want to skip to the next section, or even go directly to the Examples and come back here when a term is unclear.
Binding - conceptually this term means "value bound to a key". If you are already familiar with Hiera: writing apache::port: 80 in a Hiera data file creates a binding that binds the value 80 to the key/name apache::port.
Bindings - simply a collection of bindings.
Category / Categories - if you are already familiar with Hiera, this is the term chosen to denote an entry in the hierarchy. You can use it interchangeably with the term hierarchy, but that term is difficult to use when talking about a specific entry in the hierarchy. The best English word seems to be 'strata', which is a bit ceremonial, so the term "category" was chosen instead. It makes entries easier to talk about, e.g. 'the node category has higher priority than the environment category', 'In which category did we find the value for the key 'apache::port'?', 'Do we have a category for data-center?'. The term was also chosen because multiple things make up the total "hierarchy": there is data at the site level as well as the module level, and it is possible to do overrides. The term "hierarchy" was kept when dealing with Hiera-based data (for the sake of familiarity), but the term "categories" is used when dealing with the configuration of the overall system.
Layers - this is simply the top-most specification of the hierarchy; as you will see later, this defines that site may override modules, and if you like, that some modules may override others.
Hiera-2 is different from Hiera-1 in the following ways:
- The hiera.yaml has more information.
- Interpolation uses Puppet language interpolation (syntax and expressive power).
- All decisions about composition of data are made "in the data" (by the author of the data), not when looking things up.
- Only the equivalent of Hiera-1’s priority lookup is supported in the first implementation.
There are also internal changes that affect those who want to write backends.
Hiera-2’s config file looks like this:
---
version: 2
hierarchy:
  [ ['osfamily', '${osfamily}', '${osfamily}'],
    ['common', 'true', 'common']
  ]
backends:
  - yaml
  - json
(NOTE: there are several ways to write this in YAML, the notation above is shorter than an explicit listing of each entry using '-' ; what to use is up to the author, as long as the file is valid YAML and contains an array of arrays of three strings).
The hierarchy is no longer a simple list of paths to search (as in Hiera-1); instead, an array of arrays is used where each inner array describes three things:
- the name of the “category”
- the value / expression of that category (e.g. ‘kermit.example.com’ in the category ‘node’)
- the path to the Hiera-2 data file
What are these used for? Why the change?
The category name (i.e. ‘node’, ‘environment’, ‘common’) is used to ensure that all contributed data bindings bind them with the same priority. Since all contributions are composed into one coherent set of bindings it would be very bad if modules did not have the same relative order.
The category value is there to enable checking that all modules have the same notion of what it means to be in a category. Some providers of bindings may filter the result (of what they contribute) based on these values, and it would be bad if bindings were contributed that would never take effect.
The path is there (just like in Hiera-1) to point the implementation to the data file that contains the values for this particular category. The path is relative to the directory containing the hiera.yaml file.
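The three-element shape of each hierarchy entry can be checked mechanically. The following is a minimal Ruby sketch that assumes only the file format described above; the helper parse_hierarchy and the HieraEntry struct are hypothetical names and not part of Hiera-2. Interpolations such as ${osfamily} are deliberately left unevaluated.

```ruby
require 'yaml'

# Hypothetical value object for one hierarchy entry (not a Hiera-2 class).
HieraEntry = Struct.new(:category, :value, :path)

# Parse the text of a hiera.yaml and verify that the hierarchy is an
# array of arrays of three strings: [category, value, path].
def parse_hierarchy(yaml_text)
  config = YAML.safe_load(yaml_text)
  raise ArgumentError, 'version must be 2' unless config['version'] == 2

  Array(config['hierarchy']).map do |entry|
    well_formed = entry.is_a?(Array) && entry.size == 3 &&
                  entry.all? { |e| e.is_a?(String) }
    unless well_formed
      raise ArgumentError, "expected [category, value, path], got #{entry.inspect}"
    end
    HieraEntry.new(*entry)
  end
end

SAMPLE = <<~'YAML'
  ---
  version: 2
  hierarchy:
    [ ['osfamily', '${osfamily}', '${osfamily}'],
      ['common', 'true', 'common'] ]
  backends:
    - yaml
    - json
YAML

parse_hierarchy(SAMPLE).each do |e|
  puts "#{e.category}: value=#{e.value} path=#{e.path}"
end
```

The entries are returned in file order, which is also their priority order (highest first).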
Four categories are predefined: node, osfamily, environment, and common. These have to have this relative precedence (node having the highest priority of the four, and common the lowest).
Note that “everything” is in the common category. (The implementation may change to allow ‘common’ to be specified without the category value since it is meaningless, although a path to a file is still meaningful. By convention, the value should be set to ‘true’.)
The backends array has the same semantics as in Hiera-1. The first implementation has support for json and yaml, and it is possible to plug in/extend the system with Hiera-2 backends (as well as extending the system in other ways; a backend is not always the best choice).
The data files (in json or yaml) work the same way as in Hiera-1 except that interpolations are written using Puppet syntax (the future parser is used for this). This means that ${} should be used instead of the %{} syntax used for interpolation in Hiera-1.
This also means that any r-value expression can be used, not only variable references. Most notably, it is possible to look up a value!
Here are some examples:
some_key: 'Hello ${some_var}'
some_other_key: 'Hello ${some_array[2]}'
yay: 'Hello ${lookup(akey)}'
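When porting simple Hiera-1 data, the change of delimiter can be applied mechanically. The sketch below is a hypothetical helper (to_hiera2_interpolation is not part of Hiera or Puppet) that rewrites %{...} to ${...}. It assumes the text between the braces is already a valid Puppet expression, such as a plain variable name, and it does not attempt to translate Hiera-1 specific lookup functions.

```ruby
# Hypothetical migration helper, not part of Hiera or Puppet.
# Rewrites Hiera-1 style %{expr} interpolation to the ${expr} form.
def to_hiera2_interpolation(text)
  text.gsub(/%\{([^}]*)\}/) { '${' + Regexp.last_match(1) + '}' }
end

puts to_hiera2_interpolation('Hello %{some_var}')
```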
The composition of the bindings is driven by a configuration. If nothing is specified, a default configuration will be used (as shown below). If there is a $confdir/binder_config.yaml, it will be used instead of the default, and finally, if the setting :binder_conf is set to point to a yaml file, that file is used instead of any existing $confdir/binder_config.yaml.
The binder_config.yaml defines how all contributions to the bindings are composed. The default looks like this:
---
version: 1
layers: [
  { 'name': 'site',
    'include': ['confdir-hiera:/', 'confdir:/default?optional']
  },
  { 'name': 'modules',
    'include': ['module-hiera:/*/', 'module:/*::default']
  }
]
categories:
  [['node', "${fqdn}"],
   ['osfamily', "${osfamily}"],
   ['environment', "${environment}"],
   ['common', "true"]
  ]
There are two main entries: layers (which defines the layering of contributions), and categories (which defines the names of valid categories, and their priority).
Contributions in a layer with higher priority completely override those in lower layers (i.e. by default, anything defined at site level overrides everything contributed from modules). Note that nothing prevents bindings from the $confdir from being placed in the same layer as contributions from modules, with only site-wide overrides in the top layer.
The categories entry is a list in priority order (highest priority first) giving the name of each category and the expression that evaluates to the category value. That is, all categories used in all modules must have a corresponding entry in this categories section, but no paths are specified.
Note: Although this may look like a chore at first, it is expected that almost all public modules only contain default values in the common category. It is not an error to include a category that is not used in a particular configuration; so once your organization has defined the various categories to use, they can stay unchanged.
Note about the category value expression: it should use variable names without '::' (i.e. for facts) as the evaluation takes place in top scope, and it is slightly more efficient to leave out the '::' (it also works to enter them with '::', e.g. '::fqdn'). For lookup in the ::settings namespace there is no difference in speed.
At the point of evaluation of categories, only facts and ::settings are initialized, and the top scope has not yet finished its evaluation (i.e. the site.pp manifest has not yet been evaluated).
The layers describe which contributions to include in the resulting bindings. Contributions in higher layers have higher priority; their contributions win over those in lower layers even if the priority of an individual binding is lower than the corresponding binding in a lower layer. That is, this is an override mechanism that is used to “repair” things that are broken, or to ensure that certain bindings cannot be specified by contributions from the lower layers. More simply put, the simplest use-case is to allow you to override the contributions from the modules at the site level.
The number of layers to use is up to you; maybe you want your organization’s modules to have higher priority than public ones, maybe you have some that are more sensitive than others etc.
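The interaction between layers and categories can be illustrated with a small in-memory model. This is only a sketch of the precedence rule described above, not the binder's implementation: the LAYERS data, the resolve helper, and the assumption that the data has already been filtered down to the categories in effect for the current node are all hypothetical.

```ruby
# Categories in priority order, highest first (as in the default config).
CATEGORY_ORDER = %w[node osfamily environment common].freeze

# Hypothetical bindings: layers listed highest first, each mapping
# category name => { key => value }.
LAYERS = [
  { name: 'site',
    bindings: { 'common' => { 'apache::port' => 8080 } } },
  { name: 'modules',
    bindings: { 'osfamily' => { 'apache::port' => 80, 'apache::user' => 'www' },
                'common'   => { 'apache::user' => 'apache' } } }
].freeze

# Returns [value, layer name, category] for the winning binding, or nil.
# A higher layer wins even when its binding sits in a lower category.
def resolve(key, layers = LAYERS, categories = CATEGORY_ORDER)
  layers.each do |layer|
    categories.each do |category|
      data = layer[:bindings].fetch(category, {})
      return [data[key], layer[:name], category] if data.key?(key)
    end
  end
  nil
end

p resolve('apache::port')
p resolve('apache::user')
```

Here 'apache::port' resolves to the site value even though the modules layer binds it in the higher-priority osfamily category, while 'apache::user' is only bound in the modules layer, so the usual category order decides.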
A layer is specified with:
- name - a name that is used only in information output (error messages etc.) - it has no effect on the composition of data.
- include - a list of bindings provider URIs that denote the bindings to include
- exclude - a list of bindings provider URIs that should not be included
The version has to be specified, but the remaining sections are optional and will get default values. If a section is given it must be given in full.
The bindings provider URI is specific to the scheme used. In the first implementation there are four such schemes:
- confdir-hiera
- module-hiera
- confdir
- module
The confdir-hiera scheme is used with a path that is relative to the confdir. The path should point to a directory where a hiera.yaml file is located. If the hiera.yaml for the site is located in the same directory as the binder_config.yaml, the entry would simply be confdir-hiera:/
The module-hiera scheme is used with a path where the first entry on the path is the name of the module, or a wildcard * denoting all modules. When using the wildcard, those contributions that have a hiera.yaml in the designated directory will be included (those that do not are simply ignored). The path following the module name is relative to the module root.
Thus, the URI module-hiera:/* uses the hiera.yaml in the root of every module. The URI module-hiera:/*/windows loads from every module's windows directory, etc.
It is expected that include is used with broad patterns, and that a handful of exclusions are made (broken/bad module data, or data that simply should not be used in a particular configuration).
When the URI contains a wildcard and there is no hiera.yaml present, that entry is just ignored. When an explicit URI is used, it is an error if the config file (or indeed the module) is missing.
In the first release of data-in-modules, the module and confdir schemes are only used to refer to bindings made in Ruby. In later releases these schemes will also be used to refer to bindings defined in the Puppet Language (as explored in ARM-8 Puppet Bindings). You can safely skip the rest of this section on a first read-through if you want to concentrate on the features provided when only using the much simpler Hiera-2 structures.
The module and confdir schemes use symbolic references, e.g. module:/modulename::bindingsname. In the first implementation it is possible to define such bindings in Ruby by placing a file matching the qualified name under lib/puppet/bindings (in the example, <module>/lib/puppet/bindings/modulename/bindingsname.rb) and then defining the binding using the Ruby class Puppet::Bindings like this:
Puppet::Bindings.newbindings("modulename::bindingsname") do
  bind.name("modulename::foo::bar").to(42)
end
This way of binding is much more powerful than the Hiera-syntax as the full range of features available in the Puppet Bindings system can be used.
The intention is to make the same capabilities available with concrete Puppet Language syntax as described in ARM-8.
The module and confdir schemes differ in that the confdir scheme does not support module wildcards, and internally they handle optionality differently (they need to be able to determine if there is a file to load or not). For both the module and confdir schemes, the loading is done the same way: confdir:/foo::bar and module:/foo::bar refer to the same named bindings, and the only difference is the check for optionality. (This is not ideal and should be addressed.)
To make inclusion more flexible, it is possible to define that an (explicit) URI is optional, which means that it is ignored if the URI cannot be resolved. The optionality is expressed using a URI query of ?optional. As an example, the rule "if the module ‘devdata’ is present its contributions should be used, otherwise it is ignored" is expressed as module-hiera:/devdata?optional.
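The wildcard and ?optional rules for the module-hiera scheme can be summarized in a few lines of Ruby. This is a sketch under stated assumptions, not the binder's code: parse_bindings_uri, select_modules and the BindingsUri struct are hypothetical, and the list of modules that actually have a hiera.yaml at the requested location is passed in rather than read from disk.

```ruby
# Hypothetical parsed form of a bindings provider URI.
BindingsUri = Struct.new(:scheme, :path, :optional)

# Split "scheme:/path?optional" into its parts.
def parse_bindings_uri(uri)
  m = %r{\A([a-z-]+):/(.*?)(\?optional)?\z}.match(uri)
  raise ArgumentError, "not a bindings provider URI: #{uri}" unless m
  BindingsUri.new(m[1], m[2], !m[3].nil?)
end

# For a module-hiera URI, decide which modules contribute.
# modules_with_config: names of modules that have a hiera.yaml in the
# directory the URI points to (hypothetical input).
def select_modules(uri, modules_with_config)
  parsed = parse_bindings_uri(uri)
  name = parsed.path.split('/').first
  return modules_with_config if name == '*'             # wildcard: missing files are ignored
  return [name] if modules_with_config.include?(name)   # explicit and present
  return [] if parsed.optional                          # explicit, missing, but optional
  raise ArgumentError, "no hiera.yaml found for #{uri}" # explicit and missing: error
end

p parse_bindings_uri('confdir:/default?optional').to_a
p select_modules('module-hiera:/*/windows', %w[apache ntp])
p select_modules('module-hiera:/devdata?optional', %w[apache ntp])
```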
Implicit lookup of class parameters is done by first consulting the Puppet Bindings and then Hiera-1 (if present).
Bindings are done the same way as in Hiera-1, i.e. by binding <fully qualified class name>::<parameter-name> to a value.
Explicit lookup is done with (surprise...) the function lookup. It takes one or two parameters: the first is always the name to look up, and the optional second parameter is a type description (a string) that, when used, asserts that the returned data is type compliant.
lookup('apache::port')
lookup('apache::port', 'Integer')
lookup('mything::list_of_users', 'Array[String]')
If the produced value does not comply with the given type an error is raised.
The type system contains both concrete types, Integer, Float, Boolean, String, Pattern (regular expression pattern), Array, Hash, and Ruby (represents a type in the Ruby type system, i.e. a Class), as well as the abstract types Literal, Data, Collection, and Object. The Array and Hash types are parameterized, Array[V] and Hash[K,V], where if K is omitted, it defaults to Literal, and if V is omitted, it defaults to Data.
The Ruby type (i.e. representing a Ruby class not represented by any of the other types) does not have much value in puppet manifests but is valuable when describing bindings of puppet extensions. The Ruby type is parameterized with a string denoting the class name, i.e. Ruby['Puppet::Bindings'] is a valid type.
The abstract types are:
- Literal - Integer, Float, Boolean, String, Pattern
- Data - any Literal, Array[Data], or Hash[Literal, Data]
- Collection - any Array or Hash
- Object - any type
The bindings system uses the type system to infer the type of bound data, as key when performing a lookup, and to assert type compliance of produced results.
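To make the type rules concrete, here is a minimal Ruby sketch of a compliance check for type strings such as 'Array[String]'. It is not the Puppet type system: the complies? and split_params helpers are hypothetical, the Pattern and Ruby types are left out for brevity, and Literal is therefore approximated as Integer, Float, String or Boolean.

```ruby
LITERAL_CLASSES = [Integer, Float, String, TrueClass, FalseClass].freeze

# Split "K, V" at top-level commas only, so nested brackets stay intact.
def split_params(text)
  parts = []
  depth = 0
  current = +''
  text.each_char do |c|
    depth += 1 if c == '['
    depth -= 1 if c == ']'
    if c == ',' && depth.zero?
      parts << current.strip
      current = +''
    else
      current << c
    end
  end
  parts << current.strip
end

# True if value complies with the type description (a string).
def complies?(value, type)
  m = /\A(\w+)(?:\[(.*)\])?\z/.match(type.strip)
  raise ArgumentError, "cannot parse type #{type}" unless m
  name = m[1]
  args = m[2] ? split_params(m[2]) : []

  case name
  when 'Integer' then value.is_a?(Integer)
  when 'Float'   then value.is_a?(Float)
  when 'String'  then value.is_a?(String)
  when 'Boolean' then value == true || value == false
  when 'Literal' then LITERAL_CLASSES.any? { |k| value.is_a?(k) }
  when 'Data'
    complies?(value, 'Literal') || complies?(value, 'Array[Data]') ||
      complies?(value, 'Hash[Literal, Data]')
  when 'Array'
    element = args[0] || 'Data' # omitted element type defaults to Data
    value.is_a?(Array) && value.all? { |v| complies?(v, element) }
  when 'Hash'
    # With a single parameter, it is the value type and the key is Literal.
    key_type, value_type = args.size == 2 ? args : ['Literal', args[0] || 'Data']
    value.is_a?(Hash) &&
      value.all? { |k, v| complies?(k, key_type) && complies?(v, value_type) }
  when 'Collection' then value.is_a?(Array) || value.is_a?(Hash)
  when 'Object'     then true
  else raise ArgumentError, "unknown type #{name}"
  end
end

p complies?(80, 'Integer')
p complies?(%w[fred mary], 'Array[String]')
p complies?({ 'web' => [80, 'x'] }, 'Hash[String, Array[Integer]]')
```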
This section contains a series of examples starting with the simplest possible configuration that makes use of data in modules.
NOTE: In order to activate the "data-in-modules" and "Hiera-2" it is required to:
- Have the gem rgen >= 0.6.5 installed
- Use one of these settings (in puppet's config or made from the command line). Neither is on by default in Puppet 3.3.x:
  - --binder
  - --parser future
Note: The opt-in with --binder basically exists to avoid adding a required runtime dependency on Rgen for those that upgrade their puppet installation (i.e. it would in a way be a breaking change). The future parser, however, already depends on Rgen, so it also turns on data-in-modules. The future parser will also always use the bindings system to handle extensions (e.g. the puppet templates and puppet heredoc make use of extensions for syntax checking; the bindings/injector is not just for application data).
First, --binder or --parser future must be in effect.
Then, in order for a module to be able to contribute bindings (i.e. data) using Hiera-2 and the default configuration, the module needs to have a hiera.yaml in its root directory. (This is not entirely true; to opt in using bindings expressed in Ruby there is no need to have a hiera.yaml file.) Also, as you will see later, you can modify the overall composition to place the module's hiera.yaml elsewhere, or have alternatives for the user to choose from. The convention is that the defaults for a module should be in the root of the module, and optional/alternative configurations placed elsewhere.
This hiera.yaml is needed because it:
- indicates that there is data that can be used in the module
- tells the system how the data is organized
Hiera-2 has sane defaults, so all that is required is a file in the root of the module:
hiera.yaml:
---
version: 2
Yes, that is all that is needed to be able to express common bindings (and bindings for osfamily).
The default configuration declares that all data is in a directory called data, that common is in data/common{.json, .yaml}, and that osfamily is in data/osfamily/<osfamily>{.json, .yaml}.
You also get both yaml and json backends by default, where yaml has higher priority.
At this point, we have not defined any data in the module, but this can now be added.
To define that 'has_funny_hat' should have the value 'the pope' in general we add the following file:
data/common.yaml:
---
has_funny_hat: 'the pope'
You can now obtain this data anywhere in your puppet logic. Here is a quick way to check:
puppet apply -e 'notice lookup(has_funny_hat)'
which results in this output:
Notice: Scope(Class[main]): the pope
Now, when running on OS X (osfamily "Darwin"), we want the value to be different. Since the default already places the category "osfamily" at higher priority than "common" and has defined where it is supposed to be found, we simply add the following file/content.
data/osfamily/darwin.yaml:
---
has_funny_hat: 'steve martin'
and then run:
puppet apply -e 'notice lookup(has_funny_hat)'
the output is:
Notice: Scope(Class[main]): steve martin
That is, if you are running this on an OS X machine (Darwin osfamily), otherwise you would get 'the pope'.
It is expected that almost all modules only contain default data, or default data for an os family. This is because in general modules (especially those published by others) cannot know anything about the names of your environments, nor the names/identities of your nodes, data centers etc.
If you however have your own modules you may want to add data for a category with values specific for your organization.
When adding a category, you need to specify the entire section (i.e. hierarchy since this is a hiera configuration).
hiera.yaml:
---
version: 2
hierarchy:
  [['osfamily', '$osfamily', 'data/osfamily/$osfamily'],
   ['environment', '$environment', 'data/env/$environment'],
   ['common', 'true', 'data/common']
  ]
Since "environment" (and also "node") are already defined in the default binder_config.yaml, this is the only change we have to make in order to define data for a specific environment. We can now add a data file for an environment:
data/env/production.yaml:
---
has_funny_hat: 'comedians'
Now, we get 'comedians' as default when running in production (except for 'darwin' where we get 'steve martin').
Note that the hiera.yaml only needs to list the categories/hierarchies that it is using, and those that are used have to be present in the binder_config.yaml and have the same relative order (i.e. you are allowed to skip categories that are not in use, but not change their relative order).
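The relative-order rule amounts to requiring that a module's category names form an ordered subsequence of the categories in binder_config.yaml. A minimal Ruby sketch of that check follows; the helper valid_category_order? is hypothetical and the constant simply mirrors the default categories.

```ruby
# Category names from the default binder_config.yaml, highest priority first.
BINDER_CATEGORIES = %w[node osfamily environment common].freeze

# True if every category is known and the relative order is preserved.
# Categories may be skipped, but not reordered.
def valid_category_order?(module_categories, binder_categories = BINDER_CATEGORIES)
  positions = module_categories.map { |c| binder_categories.index(c) }
  return false if positions.any?(&:nil?) # unknown category
  positions.each_cons(2).all? { |a, b| a < b }
end

p valid_category_order?(%w[osfamily environment common])
p valid_category_order?(%w[environment osfamily common])
```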
When doing a lookup it is possible to also state the expected return type, which will be asserted by the lookup function (i.e. raise an error if the result does not comply with the specified type).
To assert that the result is a String:
lookup(has_funny_hat, 'String')
To assert that the result is an array of integers:
lookup(port_numbers, 'Array[Integer]')
To assert that the result is a Hash with String keys and values being arrays of integers:
lookup(named_port_numbers, 'Hash[String, Array[Integer]]')
and so on...
The type specification is one of:
- the basic types: Integer, String, Float, Boolean, or Pattern (regular expression)
- an Array with an optional element type given in [], that when not given defaults to [Data].
- a Hash with optional key and value types given in [], where key type defaults to Literal and value to Data; if only one type is given, the key defaults to Literal.
- the abstract type Literal which is one of the basic types
- the abstract type Data which is Literal, or type compatible with Array[Data], or Hash[Literal, Data].
- the abstract type Collection which is Array or Hash of any element type.
- the abstract type Object which represents an object of any type
Just like in Hiera-1, it is possible to bind default values for parameterized classes. This now also works for data in modules. The fully qualified name of the parameter should be used when binding a value - e.g:
'apache::port': 80
Which binds the parameter port to 80 for the class apache.
Note: There is no limitation on where parameters are defined; one module may provide default parameters for classes in other modules. (Which leads us to the next example).
If you are looking something up and want to fall back to a default specified in your puppet code when the lookup does not produce a value, you can do this as follows.
In puppet 3, by something similar to:
$x = lookup('something')
$looked_up = $x ? { undef => 'nothing', default => $x }
When --parser future is used, a lambda can be used instead:
# lookup and provide default
$looked_up = lookup('something') |$result| { if $result {$result} else {'nothing'} }
# lookup and fail if not defined
$looked_up = lookup('something') |$result| {
  unless $result { fail('data for something is missing') }
  $result
}
The advantage of using a lambda is that there is no leakage of an extra variable. In the puppet 3 example above, both $x and $looked_up are externally referenceable if they are in a class. When using a lambda, its variables are always local.
If we find that two modules define a value for the same name, say 'apache::port', which one should be used if they are defined with the same priority?
We can solve this in two ways: have a module that binds a value at a higher priority or, more practically, bind a value at the site level, since these bindings by default override all bindings contributed from modules.
If Hiera-1 is not in use, simply add a hiera.yaml in the environment/confdir root:
hiera.yaml:
---
version: 2
and add the file:
data/common.yaml:
---
'apache::port': 8080
By doing so, we have now overridden the (conflicting) bindings from the modules layer.
We may want to keep the bindings made for the sole purpose of overriding modules in a separate layer. The reason could be that we do not want to mix them up with the regular site level bindings, as it can become difficult to disentangle them later (not knowing why certain bindings have been made).
We do this by creating a binder_config.yaml in the confdir, as we can no longer use the default configuration, and specify it like this (a comment shows what is added to the default):
binder_config.yaml:
---
version: 1
layers:
  [{name: site, include: 'confdir-hiera:/'},
   {name: overrides, include: 'confdir-hiera:/overrides'}, # <--added
   {name: modules,
    include: ['module-hiera:/*', 'module:/*::default']}
  ]
We do not have to enter the categories, since we are not going to change those from the default.
We can now define the data for overrides under 'overrides'. This is done by adding:
overrides/hiera.yaml:
---
version: 2
overrides/data/common.yaml:
---
'apache::port': 8080
We find that there is something wrong with a particular module. It may have syntax errors in the data, or it may define data that we are not happy with in general; in short, we just do not want the data it contributes by default.
To do this, we need a binder_config.yaml (the default does not exclude anything). As an example, assume we want to exclude the default Hiera-2 data from module 'bad':
binder_config.yaml:
---
version: 1
layers:
  [{name: site, include: 'confdir-hiera:/'},
   {name: overrides, include: 'confdir-hiera:/overrides'},
   {name: modules,
    include: ['module-hiera:/*', 'module:/*::default'],
    exclude: 'module-hiera:/bad'}
  ]
Suppose you are publishing a complex module, say one that is configurable to be used in combination with other modules or standalone, and you want to help users consume it by including different sets of data.
This is easily done by having one default (say for standalone) configuration and one for each alternative. You could organize this like this:
<module>
|-- hiera.yaml
|-- data
|-- |-- default
|   |   |-- common.yaml
|-- |-- for_x
|   |   |-- hiera.yaml
|   |   |-- data
|   |   |   |-- common.yaml
|-- |-- for_y
|   |   |-- hiera.yaml
|   |   |-- data
|   |   |   |-- common.yaml
Here, the <module>/hiera.yaml uses paths that point to 'data/default', and the hiera.yaml files in 'for_x' and 'for_y' use paths starting with 'data' (i.e. relative to the respective hiera.yaml file).
The documentation for the module needs to explain that it comes with different configuration options. The user of the module then selects which one to use (or several at the same time if that makes sense).
If there is a default contribution that is mutually exclusive, the user needs to add it to the excluded list before including one of the alternatives. The user may end up with these two entries in their modules layer in binder_config.yaml:
include: [... , 'module-hiera:/the_module/for_x' ],
exclude: [... , 'module-hiera:/the_module']
If you have complex data you may want to consider defining it in Ruby instead, as you can then define everything per option (i.e. default.rb, for_x.rb, and for_y.rb) and have them all in the same place.
The Puppet Binding system is capable of binding data in more advanced ways than what can be expressed with simple Hiera definitions. (Adding these capabilities to Hiera data would make the regular, straightforward cases far more complicated to enter. Instead, the capability to define bindings in Ruby was added. Later it is expected that the same capabilities are made available directly in the Puppet Language based on the ideas in ARM-8.)
You have already seen the URIs module: and confdir: in the examples above, and you may have wondered what they refer to. These schemes use symbolic names rather than paths, and they are capable of loading ruby logic using conventional ruby name to path conversion.
The only difference between the two schemes is that the module scheme supports wildcard notation for the module name.
You do not have to modify the default configuration to opt-in since the defaults are to include all ruby based default bindings from modules and the confdir. All that is needed is for a ruby file defining bindings to be in the expected location with the expected name.
The ruby bindings are loaded using the auto loader, and may thus be loaded from the confdir, modules on the module path, gems, or directories on the configurable load PATH.
In our module we can define a data binding using ruby.
<module>/lib/puppet/bindings/mymodule/default.rb:
Puppet::Bindings.newbindings('mymodule::default') do
  bind.name('has_funny_hat').to('the_pope')
end
This can also be written as:
Puppet::Bindings.newbindings('mymodule::default') do
  bind {
    name 'has_funny_hat'
    to 'the_pope'
  }
end
The location in the file system and the name of the resulting binding must match. By convention, all bindings from modules should have a fully qualified name that starts with the module name. For bindings in the confdir/an-environment, care must be taken with naming: use either unqualified names (like default.rb) or a name that does not clash with any module names.
In ruby, we do not have to store data separately for the different categories, instead we can organize the data per concern (i.e. have the bindings that have a common purpose in one file, and bindings for a different purpose in another file).
The examples above where different people got to be the person having a funny hat can all be written in one place.
Puppet::Bindings.newbindings('mymodule::default') do
  bind {
    name 'has_funny_hat'
    to 'the_pope'
  }
  when_in_category('osfamily', 'darwin') {
    bind {
      name 'has_funny_hat'
      to 'steve martin'
    }
  }
  when_in_category('environment', 'production') {
    bind {
      name 'has_funny_hat'
      to 'comedians'
    }
  }
end
You can use Multibinding to collect data into an array or a hash. To do this the multibind is first declared, and the bindings that should be included are contributed by stating that they are made in the scope of the multibind. It may sound more complicated than it is... here is an example:
Puppet::Bindings.newbindings('mymodule::default') do
  multibind('collected-users') do
    name 'users'
    hash_of_data  # determines the resulting type
  end
  bind.in_multibind('collected-users') do
    name 'fred'
    to 'Fred Flintstone'
  end
  bind.in_multibind('collected-users') do
    name 'mary'
    to 'Mary Poppins'
  end
end
As you may have guessed, the individual contributions can have different priority, and the binding with the highest priority of a given contributed key wins.
The multibindings take additional parameters that are passed as options (defaults in parentheses). These are the options for a hash based multibinding:
- :conflict_resolution (:priority) is one of :error, :merge, :append, :priority, :ignore
  - ignore, the first found highest priority contribution is used, the rest are ignored
  - error, any duplicate key is an error
  - append, element type must be compatible with Array, makes elements be arrays and appends all found
  - merge, element type must be compatible with Hash, merges hashes with retention of highest priority hash content
  - priority, the first found highest priority contribution is used, duplicates with the same priority raise an error, the rest are ignored
- :flatten (false), if appended elements should be flattened. If the argument is a number it is used as the max level of flattening; a value of true is the same as :flatten => -1
- :uniq (false), if the appended result should be made unique (i.e. duplicates removed)
- :transformer (nil), a Ruby or Puppet lambda that is called to transform the produced value before it is returned to the caller. The Ruby lambda gets scope and value, the Puppet lambda gets the value. (All producers support this option.)
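To make the conflict resolution strategies listed above more concrete, here is a minimal plain-Ruby sketch. It is not the binder's implementation and does not use the bindings API; the method name resolve_contributions and the [priority, key, value] triples are assumptions made purely for illustration, with a larger number meaning higher priority.

```ruby
# Standalone illustration; this is NOT the Puppet binder implementation.
# A contribution is a hypothetical [priority, key, value] triple, where a
# larger priority number means higher precedence.
def resolve_contributions(contributions, strategy = :priority)
  # Highest priority first; the index keeps equal priorities in input order.
  ordered = contributions.each_with_index
                         .sort_by { |(priority, _key, _value), index| [-priority, index] }
                         .map(&:first)
  result = {}
  winning_priority = {}

  ordered.each do |priority, key, value|
    unless result.key?(key)
      # First (highest priority) contribution seen for this key.
      result[key] = strategy == :append ? [value].flatten(1) : value
      winning_priority[key] = priority
      next
    end

    case strategy
    when :error
      raise ArgumentError, "duplicate key '#{key}'"
    when :priority
      if priority == winning_priority[key]
        raise ArgumentError, "duplicate key '#{key}' at priority #{priority}"
      end
    when :ignore
      # Keep the value that was already chosen.
    when :append
      result[key] += [value].flatten(1)
    when :merge
      # The lower priority hash only fills in keys that are still missing.
      result[key] = value.merge(result[key])
    else
      raise ArgumentError, "unknown strategy #{strategy.inspect}"
    end
  end
  result
end

contributions = [
  [10, 'fred', { 'shell' => '/bin/bash', 'uid' => 1001 }],
  [20, 'fred', { 'shell' => '/bin/zsh' }],
  [10, 'mary', { 'shell' => '/bin/sh' }]
]

by_priority = resolve_contributions(contributions, :priority)
merged      = resolve_contributions(contributions, :merge)
```

With :priority, the priority 20 entry for 'fred' wins outright and the lower one is dropped. With :merge, the lower priority hash still contributes its 'uid' key while the higher priority 'shell' value is retained.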
Here is an example that collects all entries bound in 'collected-users' in a resulting merged hash.
Puppet::Bindings.newbindings('mymodule::default') do
  multibind('collected-users') do
    name 'users'
    hash_of_data
    producer_options(:conflict_resolution => :merge)
  end
end
This example collects and merges users like the previous example, but removes all users in the 'banned-users' list of users.
Puppet::Bindings.newbindings('mymodule::default') do
  multibind('collected-users') do
    name 'users'
    hash_of_data
    producer_options(
      :conflict_resolution => :merge,
      :transformer => lambda do |scope, value|
        banned = scope.compiler.injector.lookup('banned-users')
        value.delete_if {|k,v| banned.include? k }
        value
      end
    )
  end
end
And yes, you can contribute multibinds to multibinds if you need several levels of collection.
For completeness, here are the options for an array based multibind:
- :uniq (false) if the collected result should be post-processed to contain only unique entries
- :flatten (false) if the collected result should be post-processed so all contained arrays are flattened. May be set to an Integer value to indicate the level of recursion (-1 is endless, 0 is none).
- :priority_on_named (true) if the named element with the highest precedence should win, or if all should be included
- :priority_on_unnamed (false) if the unnamed element with the highest precedence should win, or if all should be included
- :transformer (nil), a Ruby or Puppet lambda that is called to transform the produced value before it is returned to the caller. The Ruby lambda gets scope and value, the Puppet lambda gets the value. (All producers support this option.)
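The effect of the :flatten and :uniq options can be pictured with ordinary Ruby arrays. The sketch below is not the binder's code; post_process is a hypothetical helper that only mirrors the documented meaning of the option values (true or -1 for unlimited flattening, 0 or false for none, a positive Integer for a fixed number of levels).

```ruby
# Plain Ruby illustration of :flatten / :uniq style post-processing.
# post_process is a hypothetical helper, not part of the bindings API.
def post_process(collected, flatten: false, uniq: false)
  # Translate the option value into the level understood by Array#flatten.
  level = case flatten
          when true then -1        # flatten without limit
          when false, nil then 0   # leave nesting untouched
          else flatten             # explicit number of levels
          end
  result = collected.flatten(level)
  uniq ? result.uniq : result
end

collected = ['a', ['b', ['c', 'a']], 'b']

untouched = post_process(collected)
one_level = post_process(collected, flatten: 1)
all_uniq  = post_process(collected, flatten: true, uniq: true)
```

Here one_level still contains the inner array ['c', 'a'], while all_uniq is fully flattened with duplicates removed (the first occurrence of each entry is kept).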
Say you want myclass::service::port to have the same value as what is bound to apache::port. This is how you achieve this:
Puppet::Bindings.newbindings('mymodule::default') do
  bind do
    name 'myclass::service::port'
    to_lookup_of(type_factory.integer, 'apache::port')
  end
end
A user can naturally bind 'myclass::service::port' to something else with higher priority.
Say you want to bind the port parameter to either apache::port or nginx::port (you know one will be used, but not both).
Here is how you do this:
Puppet::Bindings.newbindings('mymodule::default') do
  bind do
    name 'myclass::service::port'
    integer
    to_first_found('apache::port', 'nginx::port')
  end
end
The list given to to_first_found may be an array of arrays, each specifying a type and name combination. We did not need this in the example above, since the type of the binding is integer, and an assertion is made that the result complies with the type.
We could have written the longer:
to_first_found([T.integer, 'apache::port'], [T.integer, 'nginx::port'])
The result would be the same if there is no error, but in the latter case an error is reported for the looked up name (apache, or nginx) instead of for the 'myclass::service::port' lookup.
You may also provide a default value using a transformer option (i.e. if neither apache::port nor nginx::port were defined, the example below produces the value 80).
Puppet::Bindings.newbindings('mymodule::default') do
  bind do
    name 'myclass::service::port'
    integer
    to_first_found('apache::port', 'nginx::port')
    producer_options(:transformer => lambda do |scope, value|
      value or 80
    end)
  end
end
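The first-found lookup with a transformer default can be approximated in plain Ruby as follows. This is only a sketch of the semantics described above: the Hash of bound names stands in for the real bindings, and the first_found helper is hypothetical rather than part of the bindings API. The lambda takes scope and value to mirror the Ruby transformer signature; the scope is simply nil here.

```ruby
# Hypothetical stand-in for the bindings: a Hash of bound names to values.
def first_found(bound, names, transformer = nil)
  # Pick the first name that has a bound (non-nil) value, if any.
  name  = names.find { |candidate| !bound[candidate].nil? }
  value = name ? bound[name] : nil
  # The transformer sees the looked up value (or nil) and may replace it.
  transformer ? transformer.call(nil, value) : value
end

default_to_80 = lambda { |_scope, value| value or 80 }
names = ['apache::port', 'nginx::port']

nginx_only = first_found({ 'nginx::port' => 8080 }, names, default_to_80)
neither    = first_found({}, names, default_to_80)
no_default = first_found({}, names)
```

In this sketch nginx_only is 8080, neither falls back to 80 through the transformer, and no_default is nil because nothing was bound and no transformer was given.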
The following sections have information for those who want to dig deeper into the bindings system. This information is intended for Ruby developers.
The block given to the Puppet::Bindings.newbindings method is evaluated in an anonymous class inheriting from Puppet::Pops::Binder::BindingsFactory::BindingsContainerBuilder. This enables calls to bind and when_in_category and a few others.
When calling bind, which also accepts a block, the evaluation takes place in a BindingsBuilder. This class has the methods name, type, to, and a dozen others for more special forms of bindings.
The result produced by the block must respond to the method model, which should return an instance of Puppet::Pops::Binder::Bindings::NamedBindings. This enables you to call out to whatever logic you want to produce the result. As an example, it could call to some external service and transform the result into a bindings model (most easily done using the BindingsFactory).
There are several ways to extend Data in Modules with custom implementations in Ruby. This section is intended as an overview of the capabilities.
- Custom Hiera-2 backend
- Custom bindings scheme handler
- Custom producers
Custom Hiera-2 backends are implemented in Ruby and have a very simple API; given a directory and a filename (without extension), its responsibility is to produce a Hash.
Implementing a Hiera-2 backend is easy, but does not give access to the more powerful features in the bindings system. You can use this if you need to read simple name to value definitions in some file format.
Here is a toy backend that binds an echo of the input parameters. The example below should be placed in a module (in the example, in the module 'awesome'). The method read_data should return a Hash with the wanted map of names to values.
<module-root>/lib/puppetx/awesome/echo_backend.rb:
require 'puppetx/puppet/hiera2_backend'

module Puppetx::Awesome
  class EchoBackend < Puppetx::Puppet::Hiera2Backend
    def read_data(directory, file_name)
      {"echo::#{file_name}" => "echo... #{File.basename(directory)}/#{file_name}"}
    end
  end
end
This custom backend is registered in the binder_config.yaml like this:
---
# categories, and layers
# ...
extensions:
  hiera_backends:
    echo: 'Puppetx::Awesome::EchoBackend'
Which can then be included in a hiera.yaml under the name 'echo' like this:
---
hierarchy:
  - ['node', '$fqdn', '$fqdn']
  - ['common', 'true', 'common']
backends:
  - yaml
  - json
  - echo
Now, when compiling for the node 'localhost', a lookup of 'echo::common' or 'echo::localhost' produces "echo... awesome/common" or "echo... awesome/localhost", respectively.
(Yes, this is a toy example, the purpose is to show the API).
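To try the read_data contract without a Puppet installation, the backend can be exercised against a stub base class. In the sketch below the empty Hiera2Backend class is a stand-in defined only so the example runs on its own, and the directory path is a made-up location for the 'awesome' module.

```ruby
# Stub of the base class so this runs without Puppet; in a real module the
# class comes from: require 'puppetx/puppet/hiera2_backend'
module Puppetx
  module Puppet
    class Hiera2Backend; end
  end
end

module Puppetx::Awesome
  class EchoBackend < Puppetx::Puppet::Hiera2Backend
    # Returns a Hash of names to values for the given directory and file name.
    def read_data(directory, file_name)
      {"echo::#{file_name}" => "echo... #{File.basename(directory)}/#{file_name}"}
    end
  end
end

backend = Puppetx::Awesome::EchoBackend.new
# Hypothetical module location; only the last path segment is used.
common    = backend.read_data('/etc/puppet/modules/awesome', 'common')
localhost = backend.read_data('/etc/puppet/modules/awesome', 'localhost')
```

Each call returns a one-entry Hash, which is all the backend API asks for: a map from names to values for a given directory and file name.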
A custom bindings scheme handler is responsible for interpreting a URI such as module:/foo::bar and turning it into bindings. The scheme handler's responsibility is to process URIs for inclusion and exclusion when wildcards/query/optionality is supported, and to produce a bindings model. (The example does not show wildcard expansion).
Here is a toy echo scheme handler example:
require 'puppetx/puppet/bindings_scheme_handler'

module Puppetx::Awesome
  # A binding scheme that echoes its path
  # 'echo:/quick/brown/fox' becomes:
  # '::quick::brown::fox' => 'echo: quick brown fox'.
  class EchoSchemeHandler < Puppetx::Puppet::BindingsSchemeHandler
    def contributed_bindings(uri, scope, composer)
      factory = ::Puppet::Pops::Binder::BindingsFactory
      bindings = factory.named_bindings("echo")
      # strip (not strip!) is used, since strip! returns nil when nothing is removed
      bindings.bind.name(uri.path.gsub(/\//, '::'))
        .to("echo: #{uri.path.gsub(/\//, ' ').strip}")
      result = factory.contributed_bindings("echo", bindings.model, nil)
    end
  end
end
The new 'echo' scheme is registered in binder_config.yaml under the key extensions, scheme_handlers.
This example also shows the use of the echo scheme by including it in a layer.
---
version: 1
# categories
layers: [ ...
  [name: testing, include: 'echo:/quick/brown/fox']
]
extensions:
  scheme_handlers:
    echo: 'Puppetx::Awesome::EchoSchemeHandler'
With the content above, a lookup of '::quick::brown::fox' produces 'echo: quick brown fox'.
(Yes, this is a toy example, the purpose is to show the API).
For a full description, look at the yard documentation for the classes.
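The name and value that the echo scheme derives from a URI can be checked with nothing but Ruby's standard URI library. This sketch covers only the string handling, not the bindings factory calls, and it assumes the handler receives a parsed URI object whose path method behaves like the one from URI.parse.

```ruby
require 'uri'

# 'echo' is not a scheme Ruby knows about, so this yields a generic URI
# whose path is everything after the scheme.
uri = URI.parse('echo:/quick/brown/fox')

# Every '/' becomes '::' to form the bound name.
bound_name = uri.path.gsub(/\//, '::')

# Every '/' becomes a space; strip removes the leading blank. The
# non-destructive strip is used because strip! returns nil when there is
# nothing to remove.
bound_value = "echo: #{uri.path.gsub(/\//, ' ').strip}"
```

The resulting pair is '::quick::brown::fox' and 'echo: quick brown fox', matching the lookup described above.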
Both the Hiera-2 backend, and scheme handler extensions, deal with the production of bindings. These bindings are created up-front before catalog compilation and they are therefore somewhat static in nature. While it is possible to bind a puppet expression (that is evaluated either before compilation starts, or on each lookup), there may be the need to create a reusable value producer to keep the amount of repeated logic down, or to provide features not already available from one of the existing producers.
Note that it is only possible to bind a name to a custom producer when defining the bindings in Ruby. The Hiera-2 backends only provide binding to constant values, and to string interpolation.
Using a custom producer is the most powerful way to integrate into the bindings system, but since it is feature rich it is also a bit more complex. One example of a custom producer is a producer that maintains a connection to another system and has a custom method of producing a value; other producers look up this producer and use its API to perform remote lookups. A custom producer could also produce a series of unique values, or produce the same unique value after having produced it once, etc. However, don't worry if this seems complex; these features are mainly intended for integrators. For users of custom producers, it is simply a matter of giving the name of the producer and declaring the options, i.e. no more difficult than calling a function.
While the producer sub-system has many features, a simple producer implementation is quite small. All that is needed is a class derived from Puppet::Pops::Binder::Producers::Producer that implements the method internal_produce(scope), which returns the "looked up value". The producer will also need to implement the method initialize, which is given information about the context where this producer is being used, the arguments defined in the bindings (the producer options), and the binding itself. A simple producer that returns a constant value may look like this:
class MyConstantProducer < Puppet::Pops::Binder::Producers::Producer
  def initialize(injector, binding, scope, options)
    super
    @value = options[:value]
  end

  def internal_produce(scope)
    @value
  end
end
When used from Ruby:
Puppet::Bindings.newbindings('mymodule::default') do
  bind {
    name 'has_funny_hat'
    to_producer(MyConstantProducer)
    producer_options(:value => 'the_pope')
  }
end
If you are interested in the full capabilities, you will need to study the source of the Producer class and its subclasses and read the yard documentation for the classes.
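As an illustration of a producer that yields a series of unique values, the sketch below follows the same shape as the constant producer. The Producer base class here is a stub written only so the example runs without Puppet; its produce method, the SeriesProducer class, and the :prefix option are all assumptions for the sake of the example, not part of the real producer API.

```ruby
# Stub base class so the sketch runs standalone. The real class is
# Puppet::Pops::Binder::Producers::Producer and does considerably more.
module Puppet
  module Pops
    module Binder
      module Producers
        class Producer
          def initialize(injector, binding, scope, options)
          end

          # Assumed entry point: delegates to the subclass implementation.
          def produce(scope)
            internal_produce(scope)
          end
        end
      end
    end
  end
end

# Hypothetical producer: every lookup yields the next value in a series.
class SeriesProducer < Puppet::Pops::Binder::Producers::Producer
  def initialize(injector, binding, scope, options)
    super
    @prefix = options[:prefix]  # :prefix is a made-up producer option
    @count = 0
  end

  def internal_produce(scope)
    @count += 1
    "#{@prefix}#{@count}"
  end
end

producer = SeriesProducer.new(nil, nil, nil, :prefix => 'host-')
first  = producer.produce(nil)
second = producer.produce(nil)
```

Because the producer instance keeps its own state, consecutive lookups against the same instance return 'host-1' and then 'host-2'.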
If your Hiera-1 configuration is in hiera.yaml, the confdir-hiera: scheme will detect that it is not Hiera-2-compliant and silently ignore it (letting the existing Hiera-1 deal with the data instead).
If you want to use both Hiera-1 and -2 at the same time at the site level, simply alter the configuration
for Hiera-1 by using one of the alternatives:
By moving the Hiera 1 configuration:
- rename $confdir/hiera.yaml to $confdir/hiera1.yaml
- change the setting :hiera_config to $confdir/hiera1.yaml
- add a $confdir/hiera.yaml for hiera-2, i.e. a file that has version: 2
By changing the binder_config.yaml:
- modify the site layer to reference confdir-hiera:/hiera2 instead of the default confdir-hiera:/
- put a hiera.yaml in $confdir/hiera2
No. This has to work the same way across all modules. The name of the file signals if it is contributing data or not.
An alternative would be to alter the meta-data in the module. This may be a future feature.
Yes. Hiera-1 continues to work as before. The hiera functions will not look up in bindings, and the lookup function will not look up using Hiera-1.
Parameterized classes will get their default from bindings first, and from Hiera-1 second (if present).
Yes, this is possible, you can even call lookup, or one of the Hiera-1 lookup functions if you like. (It protects you from doing endless recursion).
The data files are compatible unless you use interpolation, in which case you need to change from the %{} syntax in Hiera-1 to the Puppet DSL syntax ${} in Hiera-2. Also, if you rely on Hiera-1 lookups that use something other than priority lookup (e.g. merge, merge deeper, etc.) you will not get the same result from the lookup function. If you need this, you should either stay with Hiera-1, or use the more advanced Ruby bindings.
The reason for this is that the decision about how data gets composed is defined in the bindings, not when looking up.
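For the interpolation change mentioned above, a purely textual rewrite from %{name} to ${name} can be sketched in a few lines of Ruby. The helper name to_hiera2_interpolation is hypothetical, and the sketch makes no attempt to handle anything beyond plain %{name} references, so treat it as a starting point for a migration script rather than a complete converter.

```ruby
# Hypothetical helper: rewrites Hiera-1 style %{name} references into the
# ${name} form. Only plain references are handled.
def to_hiera2_interpolation(text)
  text.gsub(/%\{([^}]*)\}/) { "${#{Regexp.last_match(1)}}" }
end

converted = to_hiera2_interpolation('/opt/%{::osfamily}/%{app}/conf')
unchanged = to_hiera2_interpolation('no interpolation here')
```

Strings without any %{} reference pass through untouched, so the helper can be applied to every string value in a data file.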
No. Not yet at least.
No. Not yet at least.
Say I want osfamily 'darwin' to have different values for environments 'test' and 'production', how do I specify that?
There are two ways; by adding a combined category, or by defining the binding in Ruby.
Adding a combined category requires you to:
- add the hierarchy to hiera.yaml, e.g. ['os-env', '${osfamily}-${env}', 'data/os/${osfamily}/${env}']
- add the category 'os-env' to the binder_config.yaml to indicate how this category is prioritized against all others (across all modules).
- write the data in data/os/darwin/production.yaml
Or in Ruby:
Puppet::Bindings.newbindings('mymodule::default') do
  when_in_categories({ 'environment' => 'production', 'osfamily' => 'darwin'}) do
    bind {
      name 'has_funny_hat'
      to 'the pope'
    }
  end
end
When doing this in Ruby, the result is automatically bound at a priority higher than the highest of the given categories, but lower than the next higher category.
Yes, in Ruby, but not in Hiera-2.
bind.name('mypattern').to(/[a-z]+/)
Note: at the moment, the regular expression value is not that useful, as match in the puppet language does not accept an expression on the RHS, but it can be used with custom resource types that know how to deal with a regexp, or be passed to a custom function.
Yes, this is available on the Puppet master branch.
Yes, bindings schemes can be defined in the binder_config.yaml.
Yes, Hiera-2 backends can be defined in the binder_config.yaml.
As an alternative you can instead contribute bindings directly using Puppet::Bindings::newbindings.
You will get more flexibility if you instead write a scheme handler, since you can then control what the user can specify as query parameters, fragments, etc. in the URI. A backend is only recommended if you want to provide something that is path based and loads from the file system.
No, all the bindings are computed first. Once computed they are immutable. Calling functions happens much later. If you really need this, you may want to implement a Producer since they can have lazy evaluation.
Yes, in Ruby, by accessing the scope (peek at the top scope variables and then define bindings suitable for the particular catalog compilation). Also see "Can scope be accessed in newbindings?" below.
No. Access to variables interpolated into looked up data is always in top scope only. Anything else depends on the parse order.
Yes, it is passed as a parameter for the block - e.g:
Puppet::Bindings.newbindings('mymodule::default') do |scope|
  # logic
end
The scope is always topscope. It may be used to only produce bindings that are relevant for the current compilation.
For small sets of data it is not meaningful to filter; simply produce all the bindings. If you however have very large data sets, or do something advanced like calling an external service to get data, you may want to limit the request to the particular node, osfamily etc. and not send information about every possible node over the wire in the response.
Nothing special is required to test and evaluate.
Continued use of Hiera-1 search based algorithm was considered, but rejected because of poor composability traits.
No particular risks identified.
The implementation of this ARM is a subset of the proposed implementation described in ARM-8. The implementation requires RGen.
How will this work impact other parts of the platform, the product, and the contributors working on them? Omit any irrelevant items.
- Other Puppet components: External tools that visualize or edit "data"
- Compatibility: The implementation is backwards compatible with respect to Hiera-1
- Documentation: There is quite a lot to document
- Spin-offs/Future work: Continuing with concrete syntax for bindings (i.e. ARM-8).