Monday 29 July 2013

Roadmap for Heat Havana (part 2)

With the havana2 workload and holidays delaying this follow-up post, it's probably a bit late to really call this a roadmap. What follows is a status update, with some further details on what we're working on delivering (or have already delivered) for Heat's Havana cycle:

Ceilometer Integration

Some great work has been going on to add alarming features to Ceilometer, and patches have recently been landing which integrate Heat with this alarming capability.  This should allow us to move away from maintaining a metric store and alarming functionality inside Heat, which will provide many benefits:


  • Align with one OpenStack metric/alarm solution
  • Some alarms can use existing hypervisor-level metrics instead of an in-instance agent
  • Allow extensible alarm resources via Provider templates
  • Removal of heat-engine periodic evaluation tasks (which will allow easier engine scale-out)

Figure: Heat (Grizzly) metric collection mechanism
The diagram above illustrates how metric collection works in Grizzly Heat - all metric data is collected via a "cfn-push-stats" agent (typically via a cron job defined in the stack template), which requires credentials (a Keystone ec2-keypair) to be deployed inside the instance.  The metric data is stored in the heat-engine database, and a periodic task evaluates the currently stored data against the alarm thresholds defined in the template.  All in all, a crude (but simple) mechanism which has proven sufficient for initial Heat development purposes in the absence of Ceilometer metric/alarm functionality.
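To make the "periodic task evaluates stored data against thresholds" step a bit more concrete, here is a minimal sketch of that kind of evaluation pass. This is not the heat-engine code: the Alarm and Sample classes, the evaluate function and the averaging statistic are all hypothetical, and the comparison and state names are simply modelled on CloudWatch-style alarms.

```python
import operator
from dataclasses import dataclass
from statistics import mean

# Hypothetical mapping, modelled on CloudWatch-style comparison names.
COMPARATORS = {
    "GreaterThanThreshold": operator.gt,
    "GreaterThanOrEqualToThreshold": operator.ge,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


@dataclass
class Alarm:
    """Hypothetical alarm definition, as it might be read from a template."""
    metric: str
    threshold: float
    comparison: str
    period: int  # length of the evaluation window, in seconds


@dataclass
class Sample:
    """Hypothetical stored datapoint, as pushed by an in-instance agent."""
    metric: str
    value: float
    timestamp: float  # seconds since some epoch


def evaluate(alarm, samples, now):
    """Run one evaluation pass and return the resulting alarm state."""
    # Only consider datapoints for this metric inside the current window.
    window = [
        s.value
        for s in samples
        if s.metric == alarm.metric and now - alarm.period <= s.timestamp <= now
    ]
    if not window:
        return "INSUFFICIENT_DATA"
    # Assumption: the statistic is a plain average over the window.
    breached = COMPARATORS[alarm.comparison](mean(window), alarm.threshold)
    return "ALARM" if breached else "OK"
```

A scheduler would call something like evaluate() for every alarm on every tick, which is exactly the kind of per-engine periodic work that moving alarm evaluation out of Heat avoids.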

The Havana Heat metric collection mechanism will look different: it introduces a dependency on the Ceilometer service, which can provide access to hypervisor-level statistics, avoiding the in-instance aspect of the method described above for many metric types:

Figure: Heat (Havana) metric collection/alarms via Ceilometer
We are also planning to support a compatibility mode (probably for one release cycle) which will allow existing templates using cfn-push-stats to work with the new Ceilometer-based alarm mechanism.


This should give existing users of the Heat metric/alarm features time to migrate to the new metric collection method, and also give us time to work out whether a Ceilometer tool or agent will be developed which can replace cfn-push-stats (or whether cfn-push-stats can be reworked to direct metric data to a Ceilometer API equivalent of PutMetricData).  The exact way forward here is still under discussion.

Keystone Trusts Integration

Work is in progress to integrate with the Keystone explicit impersonation "Trusts" feature, which was added as a v3 API extension for Grizzly.  The initial focus will be to remove the requirement to store encrypted credentials in the Heat DB (which are used for post-create stack actions, for example AutoScaling adjustments).  Instead, we will create a trust token with the minimum possible roles to perform these actions.
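As a rough illustration of the "minimum possible roles" idea, here is a toy model of delegation. It does not call any Keystone API: the Trust class and the delegate function are hypothetical, and only sketch the rule that a trust should carry the roles the deferred actions need, and nothing the trustor doesn't already hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trust:
    """Toy stand-in for a trust: who delegates what to whom."""
    trustor: str
    trustee: str
    roles: frozenset
    impersonation: bool = True


def delegate(trustor, trustee, held_roles, required_roles):
    """Build a trust carrying only the roles the deferred actions need."""
    # A trustor cannot delegate roles it does not itself hold.
    missing = set(required_roles) - set(held_roles)
    if missing:
        raise ValueError("trustor lacks required roles: %s" % sorted(missing))
    # Delegate exactly the required roles, not everything the trustor holds.
    return Trust(trustor, trustee, frozenset(required_roles))
```

For example, a user holding both an "admin" and a "member" role would delegate only "member" if that is all the deferred action needs (the role names here are purely illustrative).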

A second thread of this work is to provide an alternative to creating actual Keystone users for the User, AccessKey and WaitConditionHandle resources.  Because these resources depend on creating an ec2-keypair, we need a way to create a keypair from a trust token, which has been proposed as a new Keystone feature but not yet implemented.  As such it's not yet clear if we'll be able to complete this second step in the Havana time-frame, but we're looking into it! :)

HOT Progress

Work has been progressing well in delivering the abstractions related to the new HOT DSL.  In particular, the work related to Provider resources and Environments is now largely complete, the initial "hello world" HOT parser implementation has been completed, and work is under way on the various additional blueprints required to enable more complex templates to be expressed.  It's a huge piece of work, but all those involved are doing a great job pushing things in the right direction.

And Much More...

There is much more that I've not covered here (more stack update improvements, more Neutron fixes and functionality, Heat standalone mode, converting InstanceGroups to nested stacks, event persistence, to name a few), but that's all I have time for today - hopefully the info above provides some useful context and detail!