Within Puppet we use modules to describe specific technical components which we want to configure on a system.
This can be achieved either with upstream library modules (some refer to these as component modules), which can be found on the Puppet Forge, or with self-written Puppet code, which we usually refer to as technical implementation profiles.
Since Puppet uses a client-server model, the server must be aware of each node and must know which classes a node needs. This process is called node classification.
Within Puppet there are several ways to classify nodes.
This article describes the classical node classification and its limitations.
We then demonstrate usage and examples for a more sophisticated, Hiera data-driven node classification process.
Classical classification concepts
The Puppet server uses the manifest configuration option to check for a directory in which the Puppet server expects the site.pp file.
The default setting has the following value: /etc/puppetlabs/code/environments/production/manifests
Within the site.pp file (and any other *.pp file in this directory or any subdirectory) we can add Puppet code for node classification.
Node resource
The simplest approach is to use the node resource type. The node resource uses the Puppet agent certificate name (certname) as identifier. It is also possible to use a regular expression instead of a fixed name.
# manifests/site.pp
...
node 'app1web03-dev.domain.tld' {
  include profile::base
  include profile::accounts::dev
  include profile::webserver::nginx
  include profile::application::app01
}
...
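Please note that once node resources are in use, Puppet Server must find a node classification object for every agent; otherwise catalog compilation fails for nodes without a match.
Therefore we can add an empty default fallback node to site.pp: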
node default {}
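The interplay of fixed names, regular expressions and the default node can be pictured with a small Python model. This is only a simplified sketch: the function name and the sample patterns are made up for illustration, and Puppet's real matching has additional steps that are left out here.

```python
import re


def pick_node_definition(certname, exact, patterns, has_default=True):
    """Return a label for the node definition that would apply to certname.

    Simplified model: an exact name wins, then the first matching regular
    expression, then the default node. Returns None when nothing matches.
    """
    if certname in exact:
        return certname
    for pattern in patterns:
        if re.search(pattern, certname):
            return f"/{pattern}/"
    return "default" if has_default else None


# Hypothetical node definitions: one fixed name and one regular expression.
exact_names = {"app1web03-dev.domain.tld"}
regex_names = [r"^app1web\d+-prod\.domain\.tld$"]

for name in ("app1web03-dev.domain.tld", "app1web07-prod.domain.tld", "db01.domain.tld"):
    print(name, "->", pick_node_definition(name, exact_names, regex_names))
```

Calling the function with has_default=False for an unmatched name returns None, which corresponds to the situation the empty default node is meant to prevent.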
The node resource is a simple solution for small environments with only a couple of nodes.
In larger infrastructures this approach becomes very time consuming to maintain, even when we group servers in individual files and directories.
Role classification
In the site.pp file we can also add any Puppet code, like querying for specific Puppet agent facts or other data.
If we add a fact called role to our Puppet agents, we can use the fact to include role classes:
# manifests/site.pp
include "role::${facts['role']}"
The role pattern makes sense if you have larger groups of servers which must be configured identically.
On the other hand every node with individual configuration must receive its own role.
If an infrastructure consists of many different roles, this concept becomes very time intensive to maintain.
External Node Classifier
Puppet is able to make use of other tools for node classification. These tools are called External Node Classifiers (ENC). Puppet Enterprise and Foreman make use of this feature.
Attention: Please note that the ENC is an add-on to the node resource classification and not a replacement!
If a node is classified both in the ENC and within the manifests directory, both classification objects are used.
To configure Puppet to make use of an ENC script, one must add the following two configuration options to puppet.conf (Puppet 7 and later name this section [server]; [master] is the older name):
[master]
node_terminus = exec
external_nodes = /usr/local/bin/enc
Puppet Server runs the command specified via external_nodes and passes the client's certname to the script.
The command is executed by the Puppet Server user. It can be written in any language and can query remote services for data (a web API, a database, file contents). It has to return YAML output for the given certname:
---
environment: production
classes:
  profile::base:
    time_servers: ['time.domain.tld']
  profile::accounts::dev: {}
  profile::webserver::nginx: {}
  profile::application::app01: {}
Within the classes section the Puppet Server expects an array or a hash of classes to include for that node. When using a hash, one can also pass a sub-hash with class parameters.
With the optional environment key we can force an agent to use the specified Puppet environment.
The following is an example ENC shell script, which uses files in a directory:
#!/bin/bash
# /usr/local/bin/enc
# Print the node specific YAML file if it exists, otherwise the default file.
if [ -e "/etc/enc/nodes/$1.yaml" ]; then
  cat "/etc/enc/nodes/$1.yaml"
else
  cat /etc/enc/default.yaml
fi
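Since an ENC can be written in any language, the same file-based logic could also be sketched in Python. The directory layout is taken from the shell example; the function name classification_for and the certname check are assumptions added for illustration, not part of any Puppet API.

```python
#!/usr/bin/env python3
"""Hypothetical file-based ENC: prints the YAML classification for a certname."""
import re
import sys
from pathlib import Path

# Assumed layout, taken from the shell example.
NODES_DIR = Path("/etc/enc/nodes")
DEFAULT_FILE = Path("/etc/enc/default.yaml")

# Only accept names made of letters, digits, dots, dashes and underscores, so
# that a crafted name containing "/" cannot point outside the nodes directory.
SAFE_CERTNAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")


def classification_for(certname, nodes_dir=NODES_DIR, default_file=DEFAULT_FILE):
    """Return the YAML text for certname, falling back to the default file."""
    if SAFE_CERTNAME.match(certname):
        node_file = Path(nodes_dir) / f"{certname}.yaml"
        if node_file.is_file():
            return node_file.read_text()
    return Path(default_file).read_text()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Puppet Server passes the certname as the only argument.
    sys.stdout.write(classification_for(sys.argv[1]))
```

Keeping the lookup in a function with injectable paths makes it possible to exercise the script against a temporary directory before pointing external_nodes at it.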
The Puppet ENC classification is useful if you want to offer the possibility to add new nodes or change classification on existing nodes with a separate tool.
This is an elegant solution to separate Puppet and the classification process.
While this allows one to develop a solution of arbitrary complexity, we always recommend keeping the classification process as simple as possible and only as complex as required.
An ENC should return an answer very fast. In larger environments one must also consider load and performance. When using remote systems, one wants to ensure high availability and high performance.
Hiera
Hiera is the Puppet built-in data backend in which we usually add parameters which differ within the infrastructure, e.g. servers in datacenter A use a different DNS server setting than servers in datacenter B.
Within the Hiera configuration file we can specify different layers of hierarchies. We usually recommend the following approach:
- Node specific data
- Application and stage data
- Location or network zone data
- OS specific data (only if needed)
- Common or global data
More information about Hiera can be found on the Puppet Hiera website.
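The layered hierarchy can be pictured as an ordered list of data sources that is searched from the most specific to the most general layer. The following Python sketch models only that default first-found behavior; the layer names, keys and values are invented for illustration and this is not how Hiera is implemented internally.

```python
# Hypothetical data, ordered from the most specific to the most general layer.
HIERARCHY = [
    ("node/app1web03-dev.domain.tld", {}),
    ("application/app01-dev", {"dns_server": "10.1.0.53"}),
    ("location/datacenter_a", {"dns_server": "10.0.0.53", "ntp_server": "time.domain.tld"}),
    ("common", {"ntp_server": "pool.domain.tld", "motd": "managed by Puppet"}),
]


def lookup_first(key, hierarchy=HIERARCHY, default=None):
    """Return the value from the first layer that defines key."""
    for _layer_name, data in hierarchy:
        if key in data:
            return data[key]
    return default


# The application layer shadows the location layer for dns_server.
print(lookup_first("dns_server"))
# ntp_server is first found in the location layer, motd only in common.
print(lookup_first("ntp_server"))
print(lookup_first("motd"))
```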
One can place any kind of key-value pairs in the Hiera YAML files.
This allows us to also use Hiera for node classification by querying a specific key using the Puppet lookup function.
Within the lookup function we can specify several options:
- key name
- expected data type
- default value
- merge behavior
Hiera Array
To benefit most from Hiera, one should add classifications to specific Hiera levels (application1-dev, net-dmz, os-version, ...).
By default Hiera returns the first value found when iterating over the hierarchies. This can be overridden by specifying the merge behavior:
lookup( {
  'name'          => 'classes',
  'value_type'    => Array,
  'default_value' => [],
  'merge'         => {
    'strategy' => 'unique',
  },
} ).each | $c | {
  # Note: we can not use the variable `$class` as this is a reserved word!
  include $c
}
Within the data hierarchies we can then add the 'classes' key where needed:
# common.yaml
---
classes:
- profile::base
# os/CentOS.yaml
---
classes:
- profile::base::centos
# stage/dev.yaml
---
classes:
- profile::accounts::dev
# application/app01.yaml
---
classes:
- profile::webserver::nginx
- profile::application::app01
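To see what the unique merge produces for the four example files, the following Python sketch combines the lists from all layers and drops duplicates. The function name merge_unique and the order of the layers are assumptions for illustration; the real order depends on the hierarchy configured in Hiera.

```python
def merge_unique(layers, key):
    """Collect the values of key from all layers into one list without duplicates."""
    merged = []
    for data in layers:
        value = data.get(key, [])
        if not isinstance(value, list):
            value = [value]  # a scalar is treated like a one-element list
        for item in value:
            if item not in merged:
                merged.append(item)
    return merged


# The example data, ordered from the most specific to the most general layer.
LAYERS = [
    {"classes": ["profile::webserver::nginx", "profile::application::app01"]},  # application/app01.yaml
    {"classes": ["profile::accounts::dev"]},                                    # stage/dev.yaml
    {"classes": ["profile::base::centos"]},                                     # os/CentOS.yaml
    {"classes": ["profile::base"]},                                             # common.yaml
]

for class_name in merge_unique(LAYERS, "classes"):
    print("include", class_name)
```

A class that is listed in more than one layer appears only once in the result, so it is safe to repeat a class in several Hiera files.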
Another nice feature built into hiera is the possibility to set the lookup behavior within hiera data itself.
One can place the merge behavior into common.yaml and remove it from lookup function:
# common.yaml
---
lookup_options:
  'classes':
    merge:
      strategy: 'unique'
We can now remove the merge key from the lookup function:
# manifests/site.pp
lookup( {
  'name'          => 'classes',
  'value_type'    => Array,
  'default_value' => [],
} ).each | $c | {
  include $c
}
If you need to configure a single system where things are different, you can set lookup_options on a higher hierarchy level, e.g. to only add some specific users to a system, but not all users from common or any layer below the node data layer.
In this case you can add lookup_options to the node hierarchy:
# node/hr_server.domain.tld.yaml
---
lookup_options:
  'classes':
    merge:
      strategy: 'first'
classes:
  - profile::base::hr
  - profile::accounts::hr
Please note that with the first strategy only this list is used, so all classifications from other hierarchies that the node still needs must be added at this location.
Hiera Hash
An even more sophisticated option is the usage of hashes instead of arrays, which allow overrides and exceptions to the list of classes.
We must adapt two settings:
common.yaml: switch the merge strategy from unique to deep
---
lookup_options:
  'classes':
    merge:
      strategy: 'deep'
site.pp: switch value_type from Array to Hash and iterate using $key, $c
lookup( {
  'name'          => 'classes',
  'value_type'    => Hash,
  'default_value' => {},
} ).each | $key, $c | {
  # Skip keys whose class name was set to an empty string in Hiera
  if $c != '' {
    include $c
  }
}
Now we can set Hashes in hiera:
# common.yaml
---
classes:
  base_class: 'profile::base'
# os yaml
---
classes:
  security_class: 'profile::base::centos'
# stage yaml
---
classes:
  accounts_class: 'profile::accounts::dev'
# app yaml
classes:
  webserver_class: 'profile::webserver::nginx'
  application_class: 'profile::application::app01'
The classes hash consists of keys, which are arbitrary strings, and values, which are class names.
The keys are used within hiera only, not within Puppet.
The key:value option allows us to override classification for a node from common classification:
# node yaml
---
classes:
  webserver_class: 'profile::webserver::tomcat'
On this node we override the webserver class to use a tomcat class instead of nginx.
We can even decide to NOT manage a class key at all on a node (or group of nodes, depending on where in Hiera we place the setting):
---
classes:
  application_class: ''
This overrides the class defined in more general Hiera layers with an empty class name. Within the Puppet code we skip classes with empty names. Optionally one can use the notice function to log information about an empty class hash element.
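The effect of both overrides can be traced with a small Python model of the merged hash. It is only a sketch: the function names are made up, the layer order is assumed, and because all values here are plain strings the deep merge behaves like a simple key-by-key override.

```python
def merge_hash(layers, key):
    """Merge the hashes stored under key; more specific layers win per key."""
    merged = {}
    # Walk from the most general to the most specific layer so that the
    # more specific value overwrites the general one.
    for data in reversed(layers):
        merged.update(data.get(key, {}))
    return merged


def classes_to_include(merged):
    """Drop entries whose class name was blanked out with an empty string."""
    return [class_name for class_name in merged.values() if class_name != ""]


# The example data, ordered from the most specific to the most general layer.
LAYERS = [
    {"classes": {"webserver_class": "profile::webserver::tomcat",
                 "application_class": ""}},                            # node yaml
    {"classes": {"webserver_class": "profile::webserver::nginx",
                 "application_class": "profile::application::app01"}},  # app yaml
    {"classes": {"accounts_class": "profile::accounts::dev"}},          # stage yaml
    {"classes": {"security_class": "profile::base::centos"}},           # os yaml
    {"classes": {"base_class": "profile::base"}},                       # common.yaml
]

merged = merge_hash(LAYERS, "classes")
print(classes_to_include(merged))
```

On this hypothetical node the nginx class is replaced by the tomcat class, and the application class is not included at all.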
Hiera Multiple Hashes
Another option is to use different lookups for different purposes like common, os, application classes.
The following is an example for the usage of different Hiera Hash keys to identify different classification based on 'kernel' fact:
# common.yaml
---
lookup_options:
  # Hiera treats a lookup_options key as a regular expression if it starts with ^
  '^.*_classes$':
    merge:
      strategy: 'deep'
# manifests/site.pp
$kernel_down = $facts['kernel'].downcase
lookup( {
  'name'          => "${kernel_down}_classes",
  'value_type'    => Hash,
  'default_value' => {},
} ).each | $key, $c | {
  if $c != '' {
    include $c
  }
}
Now we can add the os specific classes key:
linux_classes:
  hostname: 'profile::linux::hostname'
  repo: 'profile::linux::repo'
  sudo: 'profile::linux::sudo'
  ssh: 'profile::linux::ssh'
  mail: 'postfix'
  webshop: 'profile::application::webshop::nginx'
windows_classes:
  hostname: 'profile::windows::hostname'
  hosts: 'profile::windows::hosts'
  features: 'profile::windows::features'
  time: 'profile::windows::time'
  users: 'profile::windows::ad_auth'
  webserver: 'iis'
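How the kernel fact selects one of these hashes can be sketched in a few lines of Python. The function name classes_for_kernel is hypothetical and the data is a shortened copy of the example; the kernel values used below ("Linux", "windows", "Darwin") are assumed fact values.

```python
# Shortened copy of the example data.
DATA = {
    "linux_classes": {
        "hostname": "profile::linux::hostname",
        "ssh": "profile::linux::ssh",
        "mail": "postfix",
    },
    "windows_classes": {
        "hostname": "profile::windows::hostname",
        "webserver": "iis",
    },
}


def classes_for_kernel(kernel, data=DATA):
    """Build the lookup key from the kernel fact and return the classes to include."""
    key = f"{kernel.lower()}_classes"
    # A missing key behaves like the empty default_value hash of the lookup.
    return [class_name for class_name in data.get(key, {}).values() if class_name != ""]


print(classes_for_kernel("Linux"))
print(classes_for_kernel("windows"))
print(classes_for_kernel("Darwin"))
```

A kernel without its own key simply yields no classes, which mirrors the empty default_value in the lookup call.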
We can even expand this hash driven lookup by implementing ordering using pre and post classes in combination with Puppet tags and build dependencies:
# manifests/site.pp
lookup( {
  'name'          => 'pre_classes',
  'value_type'    => Hash,
  'default_value' => {},
} ).each | $key, $c | {
  if $c != '' {
    # Declare the class with the tag metaparameter so it can be referenced later
    class { $c:
      tag => 'pre',
    }
  }
}
# manifests/site.pp
$kernel_down = $facts['kernel'].downcase
lookup( {
  'name'          => "${kernel_down}_classes",
  'value_type'    => Hash,
  'default_value' => {},
} ).each | $key, $c | {
  if $c != '' {
    include $c
    # Apply everything tagged 'pre' before this class
    Class<| tag == 'pre' |> -> Class[$c]
  }
}
This pattern even allows one to add application specific lookups.
Conclusion
Depending on your infrastructure and your requirements, you should check which of the mentioned options are useful for you.
A more complex infrastructure needs a more sophisticated node classification process.
Because node resources and roles are too limited for most setups, we usually recommend the more flexible Hiera based node classification.
Depending on the variety of servers and operating systems, you might consider the array based classification for less complex infrastructures and the hash based classification for infrastructures with a huge variety of operating systems, stages, applications, ...
The option to use multiple hashes is useful if you have a complex infrastructure with several teams and split responsibilities.
Here one can separate the classification of base and applications by adding additional lookups for 'app_classes' or even 'pre_classes' and 'post_classes'.
Happy puppetizing,
Martin Alfke
Top comments (1)
Nice post Martin. Worth mentioning that the Hiera Multiple Hashes approach is used and available out of the box in example42's psick module.
It's enough to include (classify) the psick class and then manage classification via Hiera using the following keys (for each one a hash of classes can be set):
Same entrypoints are available for windows_classes and darwin_classes.