Hiera can do anything

I have been meaning to write this blog post about using Hiera as a sort of ENC for a very long time, but I never got round to it. On a recent trip to Melbourne I suffered from jet lag, so I finally wrote it.

I believe that the best solution to a problem is normally the simplest one. I recently did some work for a client, and I will use that solution to show a way of driving everything from Hiera and the data we look up from it. This means that we need no ENC and no node definitions, just Puppet code and a smartly crafted hierarchy. This is just an example of how we met that client’s requirements, but with a few tweaks I feel it can meet 99% of all use cases very easily. I suggest that you first read these blog posts so that you are familiar with Hiera and create_resources:

  1. Installing and using Hiera
  2. Create_resource

Node Information

In order to classify a node you first need to know some critical pieces of information about it. In our example we needed to know:

  1. Environment (Production|Staging|Test)
  2. Location (Physical location of the machine)
  3. Network (Database|Management|Web)
  4. Function (What the machine actually does for the business)

How do we obtain this information? It could be looked up from a CMDB or another data source, such as RackTables, LDAP or MySQL. In our particular example the client had all the necessary information stored within the FQDN of the host. We need to present it in such a way that Hiera can use it to alter the data that is looked up. To do this I wrote a simple function that reads the certname (please see previous posts about why I chose certname) and returns the requested piece of data, which we then assign to a top-level variable that Hiera can use.

The function looked something very similar to this:

module Puppet::Parser::Functions
  newfunction(:lookupme, :type => :rvalue, :doc => "Return the
    type requested from the certname") do |arguments|
    if arguments.size != 1
      raise(Puppet::ParseError, "lookupme(): Wrong number of arguments " +
        "given #{arguments.size} for 1 argument required")
    end
    type = arguments[0]
    # Assumption: the certname is read from the clientcert variable; the
    # original listing did not show where host was set.
    host = lookupvar('clientcert')
    # Captures: 1 = function, 2 = environment letter, 3 = network letter, 4 = location
    rx = /([a-z]{2}[a-z0-9][a-z]?)[0-9]{2}\.([a-z])([a-z])\.([a-z]{3}[a-z0-9][0-9])/
    if match = rx.match(host)
      if type == "environment"
        case match[2]
        when "p" then env = "production"
        when "s" then env = "staging"
        when "t" then env = "test"
        else env = "unknown"
        end
        return env
      elsif type == "location"
        return match[4]
      elsif type == "network"
        case match[3]
        when "a" then zone = "web"
        when "d" then zone = "database"
        when "m" then zone = "management"
        else zone = "unknown"
        end
        return zone
      elsif type == "function"
        return match[1]
      else
        fail("Invalid type requested")
      end
    else
      fail("Invalid fqdn unable to run rx on #{host}")
    end
  end
end

This uses a regex to return information about a host based on its certname. For example pup01.pm.ukdc1.example.com would have a:

  • function of pup
  • environment of production
  • network of management
  • location of ukdc1
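To make the mapping easier to try out, here is a rough standalone sketch of the same parsing in plain Ruby, outside of Puppet. The parse_certname helper and the two lookup tables are my own names for illustration and are not part of the function used at the client; the regex and the letter codes are the ones described above.

```ruby
# Standalone sketch of the certname parsing. parse_certname is a
# hypothetical helper, not part of Puppet or of the original function.
CERTNAME_RX = /([a-z]{2}[a-z0-9][a-z]?)[0-9]{2}\.([a-z])([a-z])\.([a-z]{3}[a-z0-9][0-9])/

ENVIRONMENTS = { "p" => "production", "s" => "staging", "t" => "test" }
NETWORKS     = { "a" => "web", "d" => "database", "m" => "management" }

def parse_certname(certname)
  match = CERTNAME_RX.match(certname)
  raise ArgumentError, "Invalid fqdn unable to run rx on #{certname}" unless match

  {
    "function"    => match[1],
    "environment" => ENVIRONMENTS.fetch(match[2], "unknown"),
    "network"     => NETWORKS.fetch(match[3], "unknown"),
    "location"    => match[4],
  }
end

# Print each piece of information on its own line.
parse_certname("pup01.pm.ukdc1.example.com").each do |name, value|
  puts "#{name}: #{value}"
end
```

Returning all four values in one hash is only a convenience for experimenting; the Puppet function returns one value per call because each call feeds a single top-level variable.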

We can simply use this function within Puppet to set some top-level variables in site.pp like this:

$ourenvironment = lookupme('environment')
$ourlocation = lookupme('location')
$ournetwork = lookupme('network')
$ourfunction = lookupme('function')

Remember that this function could have queried anything from a database to a REST API, but in my example the company had all the required information contained within the certname of the agent. Now that we have these top-level variables set we can use them within Hiera.

Hiera Hierarchy

With these top-level variables set, we can use them inside our hierarchy to alter how we look up data from Hiera.

The Hiera config I used at the client is as follows:

:backends:
  - yaml

:logger: console

:hierarchy:
  - harddefaults
  - nodes/%{ourlocation}/%{ourenvironment}/%{clientcert}
  - functions/%{ourfunction}
  - network/%{ourlocation}/%{ournetwork}
  - environments/%{ourenvironment}
  - locations/%{ourlocation}
  - global

:yaml:
  :datadir: '/etc/puppetlabs/puppet/hieradata'
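The %{...} placeholders in the hierarchy are filled in from the top-level variables for each node. As a rough illustration of what that expansion produces for our example host, here is a small Ruby sketch. The expand_hierarchy helper is hypothetical (Hiera does this internally), and substituting an empty string for an unknown name is an assumption of the sketch.

```ruby
# Sketch of %{variable} interpolation for a Hiera hierarchy.
# expand_hierarchy is a hypothetical helper; real Hiera does this internally.
HIERARCHY = [
  "harddefaults",
  "nodes/%{ourlocation}/%{ourenvironment}/%{clientcert}",
  "functions/%{ourfunction}",
  "network/%{ourlocation}/%{ournetwork}",
  "environments/%{ourenvironment}",
  "locations/%{ourlocation}",
  "global",
]

def expand_hierarchy(hierarchy, scope)
  hierarchy.map do |entry|
    # Replace each %{name} with its value from scope
    # (assumption: an unknown name becomes an empty string).
    entry.gsub(/%\{([^}]+)\}/) { scope.fetch($1, "") }
  end
end

scope = {
  "clientcert"     => "pup01.pm.ukdc1.example.com",
  "ourenvironment" => "production",
  "ourlocation"    => "ukdc1",
  "ournetwork"     => "management",
  "ourfunction"    => "pup",
}

# One data source per line, highest priority first.
puts expand_hierarchy(HIERARCHY, scope)
```

Each printed entry corresponds to one YAML file under the data directory, searched from top to bottom.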

One thing that we need to understand is that we can override a key at any level, as long as we override it in a file that sits above the one whose value we want to override.

If you look at the hierarchy we have:

  • harddefaults:
    • This is at the top of our hierarchy
    • These are values that should never ever get overridden at any other level
    • These are required to be the same throughout the infrastructure due to PCI DSS requirements.

An example of this file will look like this:

profile::password_exp_max_days: 25
profile::password_exp_min_days: 1
profile::password_exp_min_len: 7
profile::password_exp_warn_age: 7
profile::password_exp_inactive_age: 7

As you can see, all our Hiera keys are prefixed with the module / class name so that they are compatible with the data bindings introduced in Puppet 3.0.

  • nodes/%{ourlocation}/%{ourenvironment}/%{clientcert}:
    • Specific machine configuration items
    • Overrides needed for a specific machine

An example of this file, nodes/ukdc1/production/pup01.pm.ukdc1.example.com.yaml:

network::interfaces:
  <interface>:
    macaddress: '00:50:56:FF:3F:70'
    bootproto: 'static'
    ipaddress: ''
    netmask: ''
    ensure: 'up'

The only key we have is a hash containing the information needed to configure the networking for the specific machine. We can then use this with the create_resources function to configure the networking for that machine.

  • functions/%{ourfunction}:
    • Configuration data specific to the machine’s function

An example of this file would be functions/pup.yaml:

classes:
  - 'puppet::master'
  - 'puppetdb'

puppet::master::storeconfigs: true
ntp::force_datetime: true

This contains the classes array, which lists the Puppet classes that need to be applied for this machine’s function. In this case it’s a Puppet master and a PuppetDB server. We also override ntp::force_datetime because we need to set the date and time correctly straight away rather than slewing the clock, since the Puppet master and PuppetDB functions on this machine are very picky about time.

  • network/%{ourlocation}/%{ournetwork}:
    • Specific configuration needed for the machine on a specific network

An example of this file would be network/ukdc1/management.yaml:

ssh::log::loglogins: true

This contains only three Hiera keys: one for logging all SSH logins to this machine, another for opening up the firewall rules needed to allow logins from the networks, and finally the default gateway needed for the network. This data only changes with the network and location that a machine is in; we don’t care what function the machine is performing.

  • environments/%{ourenvironment}:
    • Specific configuration data for the environment the machine is in

An example of this file would be environments/production.yaml:

rsyslog::loggingtarget: logvip.p.ukdc1.example.com
apache::loglevel: error

I have trimmed this file to only two entries. It contains any configuration data needed for Puppet modules in production. For example, in production Apache has its log level set to error and rsyslog sends all its log data to the logging target. These values will change between production and staging. We don’t care about the location or function of the machine, just what environment it is in. So every machine that is in production and has Apache deployed will have loglevel set to error, whatever its location or function.

But remember that we can override this value at any point above this in the hierarchy, so we could change the loglevel for all pup machines, or even for a specific host, if we needed to.
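The override rule boils down to "the first level that defines the key wins". Here is a minimal Ruby sketch of that behaviour, using in-memory hashes instead of YAML files. The priority_lookup helper is hypothetical, and the "info" override for pup machines is made up purely to show the effect; the other values come from the examples in this post.

```ruby
# Sketch of a priority lookup: walk the levels from top to bottom and
# return the first value found. priority_lookup is a hypothetical helper.
def priority_lookup(key, levels)
  levels.each do |_name, data|
    return data[key] if data.key?(key)
  end
  nil
end

levels = [
  # Hypothetical override, not taken from the client's data.
  ["functions/pup", { "apache::loglevel" => "info" }],
  ["environments/production", {
    "apache::loglevel"       => "error",
    "rsyslog::loggingtarget" => "logvip.p.ukdc1.example.com",
  }],
  ["global", { "ntp::force_datetime" => false }],
]

puts priority_lookup("apache::loglevel", levels)       # info (function level wins)
puts priority_lookup("rsyslog::loggingtarget", levels) # logvip.p.ukdc1.example.com
```

If the functions/pup entry is removed, the same lookup falls through to the environment level and returns error.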

  • locations/%{ourlocation}:
    • Physical location dependent information

An example of this file would be locations/ukdc1.yaml:

                                   - ''
                                   - ''
                                   - ''
                          - ''
                          - ''
                          - ''

I have also trimmed this file. It contains any data that is specific to a location, in this case the DNS servers that are present within ukdc1 and the local networks that BIND needs to present a location view for. We don’t care about anything other than that the machine is in this data center.

  • global:
    • Any data that all machines should have

An example of this file would be global.yaml:

ntp::force_datetime: false
classes:
  - 'ntp'
  - 'networking'
  - 'network::resolv_conf'
  - 'profile'
  - 'hosts'
  - 'motd'
  - 'ssh::client'
  - 'ssh::server'
  - 'timezone'

This YAML file contains anything that should apply to every machine on our network. As you can see we have the same classes key as in the function-specific YAML file. I will explain how this works later, but for now it is an array of all the classes that should be on every machine in our infrastructure. So every machine will have those Puppet classes applied to it as well as the classes in the function-specific YAML file. We also have another value, ntp::force_datetime. If you remember, we overrode this in the function-specific YAML file, but for every machine that doesn’t have that function the value is set to false.

Using the data

Now that we have set up the hierarchy we actually need to use the data within our modules. Currently we have two ways of doing this, depending on the version of Puppet we are using.

If we are using a version of Puppet before 3.0 then we need to make explicit function calls to Hiera, whereas with Puppet 3.0 or later we can use data bindings unless we need hiera_hash or hiera_array.
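To illustrate why the keys are named the way they are, here is a small Ruby sketch of how data bindings derive a Hiera key from a class parameter: the key is the class name, two colons, then the parameter name. The resolve_params helper and the servers parameter are hypothetical, and the sketch only covers the case where a Hiera value replaces the default written in the class.

```ruby
# Sketch of automatic parameter lookup: each class parameter is looked up
# as "<class name>::<parameter name>", falling back to the class default.
# resolve_params and the sample parameters are hypothetical.
def resolve_params(class_name, defaults, hiera_data)
  defaults.each_with_object({}) do |(param, default), resolved|
    key = "#{class_name}::#{param}"
    resolved[param] = hiera_data.key?(key) ? hiera_data[key] : default
  end
end

hiera_data = { "ntp::force_datetime" => true }
defaults   = { "force_datetime" => false, "servers" => [] }

resolve_params("ntp", defaults, hiera_data).each do |param, value|
  puts "#{param} = #{value.inspect}"
end
```

Here force_datetime picks up the value from Hiera, while servers keeps its default because no ntp::servers key exists in the data.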

Because of the new data bindings in Puppet 3.0 it makes sense to use class parameters to do the lookup of the Hiera data. An example of this in 2.7 would be:

class networking(
  $defaultgateway = hiera('network::defaultgateway'),
  $interfaces     = hiera('network::interfaces')
) {
  create_resources('network::interface', $interfaces)
}

Notice that within the class we use the create_resources function to dynamically generate resources of the network::interface type from the hash that we have just looked up from Hiera using the network::interfaces key.
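If create_resources is new to you, it may help to see what it expands to. The Ruby sketch below only renders the equivalent Puppet declarations as text: each top-level key of the hash becomes a resource title and its value becomes that resource's parameters. The render_resources helper is hypothetical, and eth0 with its parameter values is sample data I made up for the illustration.

```ruby
# Sketch of what create_resources does with a hash: every top-level key is a
# resource title and its value is that resource's parameters.
# render_resources is a hypothetical helper that renders the result as text.
def render_resources(type, resources)
  resources.map do |title, params|
    body = params.map { |name, value| "  #{name} => '#{value}'," }.join("\n")
    "#{type} { '#{title}':\n#{body}\n}"
  end
end

# Made-up sample data; "eth0" is an assumed interface name.
interfaces = {
  "eth0" => { "bootproto" => "static", "ensure" => "up" },
}

puts render_resources("network::interface", interfaces)
```

Adding a second interface to the hash would produce a second declaration without touching any Puppet code, which is the whole point of driving this from Hiera.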

We also had the classes key defined at multiple levels of the hierarchy. To find every occurrence we can use the hiera_array function, which returns an array concatenated from the values of that key at every level of the hierarchy, and pass this to the class resource to declare all the classes for that specific node.

In order to do this in site.pp I used the following code:

$classes = hiera_array('classes')
class { $classes: }

This means that if we add another class at any level of the hierarchy that applies to the specific node, it will be assigned to the machine. So if we need to add a class to all machines we add it to global.yaml, but if we only need a class for a specific function we add it to functions/%{ourfunction}, or for a specific machine we can add it to nodes/%{ourlocation}/%{ourenvironment}/%{clientcert}.
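Unlike a normal lookup, an array lookup does not stop at the first match. A rough Ruby sketch of the merge, in the spirit of hiera_array, looks like this: collect the key from every level that defines it, flatten the result and drop duplicates. The array_lookup helper is hypothetical and the levels are trimmed versions of the example files above.

```ruby
# Sketch of an array-merge lookup in the spirit of hiera_array: collect the
# key from every level, flatten, and remove duplicates.
# array_lookup is a hypothetical helper.
def array_lookup(key, levels)
  levels.flat_map { |data| Array(data[key]) }.uniq
end

levels = [
  { "classes" => ["puppet::master", "puppetdb"] },        # functions/pup
  { "apache::loglevel" => "error" },                      # environments/production
  { "classes" => ["ntp", "networking", "ssh::server"] },  # global (trimmed)
]

# One class name per line, highest-priority level first.
puts array_lookup("classes", levels)
```

Levels that do not define the key, such as the environment level here, simply contribute nothing to the result.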

Although this is an elegant solution, we can’t easily tell what classes have been assigned to a machine, because this is controlled at multiple levels of the hierarchy. To overcome this I created a little function called hiera_debug. This will create a file per node on the Puppet master, in JSON format, that lists all the keys looked up for a specific node and the values for those keys.

This allowed me to create a super simple web page that lists all the nodes and the Hiera keys looked up for each machine with their values. As classes is just a Hiera lookup, we can now easily tell what classes have been assigned to a specific node.
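I have not included the hiera_debug function itself here, but the idea of one JSON file per node is easy to sketch in plain Ruby. Everything in the example below is an assumption made for illustration: the write_debug_file helper, the file naming scheme and the sample keys are not the real implementation.

```ruby
require "json"
require "tmpdir"

# Sketch of a per-node debug dump: one JSON file per certname containing the
# keys that were looked up and their values. write_debug_file and the
# "<certname>.json" naming are hypothetical, not the real hiera_debug.
def write_debug_file(dir, certname, lookups)
  path = File.join(dir, "#{certname}.json")
  File.write(path, JSON.pretty_generate(lookups))
  path
end

# Use a temporary directory so the sketch leaves nothing behind.
Dir.mktmpdir do |dir|
  lookups = {
    "classes"             => ["puppet::master", "puppetdb", "ntp"],
    "ntp::force_datetime" => true,
  }
  path = write_debug_file(dir, "pup01.pm.ukdc1.example.com", lookups)
  puts File.read(path)
end
```

A web page like the one described above would then only need to read these files and render one entry per node.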

Thanks for reading. If you have any questions or think I should add anything, simply enter a comment below.