In late 2012, Ceilometer was established as a framework that collects usage metrics in order to monitor and meter the OpenStack cloud environment. Ceilometer was initially created to give billing systems the meters they need for customer billing across all relevant OpenStack core components. However, the framework can also collect measurements from other sources to serve custom metering needs.
For example, Ceilometer collects and monitors:
- Compute resources, including types and location.
- Network activity, including data transfer (in/out) by location.
- Storage, including Swift storage, volumes, and I/O.
According to OpenStack, Ceilometer contributors aim to achieve the following goals:
- Collecting metering data efficiently in terms of CPU and network costs
- Gathering data from notifications sent by services or by polling the infrastructure
- Configuring collected data types to meet operational enterprise needs
- Gaining complete access to metered data by means of the REST API
- Broadening the framework for custom usage data collection with supplementary plugins
- Having the metering messages signed and non-repudiable
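To make the last goal more concrete, the sketch below signs a metering message with an HMAC-SHA256 digest computed over a canonical JSON encoding, then verifies it on the receiving side. This is an illustration under stated assumptions, not Ceilometer's own code: the function names, the message fields, and the shared secret are all hypothetical. Note also that a shared-secret HMAC only shows that a message came from some holder of the secret and was not altered; strict non-repudiation would generally call for asymmetric signatures.

```python
import hashlib
import hmac
import json


def sign_message(message: dict, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    # Sorting keys and fixing separators makes the encoding deterministic,
    # so sender and receiver hash exactly the same bytes.
    payload = json.dumps(message, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_message(message: dict, signature: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_message(message, secret)
    return hmac.compare_digest(expected, signature)


# Hypothetical message and secret, for illustration only.
secret = b"example-shared-secret"
message = {"resource_id": "example-resource-id", "meter": "cpu", "volume": 42}
signature = sign_message(message, secret)

print(verify_message(message, signature, secret))   # signature matches
tampered = dict(message, volume=43)
print(verify_message(tampered, signature, secret))  # altered payload is rejected
```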
Ceilometer’s goal of wide-ranging data measurement is impressive, but no solution is perfect. Although it is improving at a decent pace, it is still a young project, and this is reflected in its capabilities and stability. Below are a few of the challenges that OpenStack developers currently face.
The Ceilometer Mechanism
A Ceilometer agent runs on every compute node so that the central server can poll the raw metering metrics from each node. In addition, the central server collects usage and utilization metrics from resources that are not tied to a specific node. A collector then monitors the queues of messages sent by the agents. Once the messages are processed, they are stored in a dedicated database server that supports intensive write operations.
The Ceilometer Backend Database
Ceilometer currently supports MongoDB, MySQL, PostgreSQL, DB2, and HBase. According to a recent research study at Indiana University, “Understanding Cloud Usage through Resource Allocation Analysis”, metering an OpenStack production environment at scale generates approximately 386 writes per second and 33,360,480 events per day. Because the workload is so write-intensive, a load of this size would require 239 GB of storage per month for statistics alone. Although other databases are supported, the OpenStack community predominantly uses MongoDB as Ceilometer’s default backend database. While the backend is supposed to be replaceable, most of the testing was done with MongoDB because of limitations in the other database solutions.
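As a quick sanity check on these figures, the arithmetic below relates the per-second and per-day numbers and derives the average storage per event that they imply. The 30-day month and the decimal gigabyte (10^9 bytes) are assumptions made here for illustration; the text above does not state either.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the text above.
writes_per_second = 386
events_per_day_reported = 33_360_480
storage_per_month_gb = 239

# Assumptions for this sketch (not stated in the source).
DAYS_PER_MONTH = 30
BYTES_PER_GB = 1e9

# 386 writes/s sustained over a full day.
events_per_day_from_rate = writes_per_second * SECONDS_PER_DAY

# The average rate implied by the reported daily total.
implied_rate = events_per_day_reported / SECONDS_PER_DAY

# Average bytes stored per event implied by the monthly volume.
events_per_month = events_per_day_reported * DAYS_PER_MONTH
bytes_per_event = storage_per_month_gb * BYTES_PER_GB / events_per_month

print(f"events/day at 386 writes/s: {events_per_day_from_rate:,}")
print(f"implied average rate: {implied_rate:.2f} writes/s")
print(f"implied size per event: {bytes_per_event:.0f} bytes")
```

The two quoted rates agree to within rounding, and under these assumptions the monthly volume works out to a few hundred bytes per stored event.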
Managing big data at scale is no easy feat. For example, supporting metering services can be quite difficult when a massive amount of data streamed from all of the OpenStack services is aggregated into a persistence layer for query analysis. Recently, Ceilometer added a new HBase backend, which comes with a variety of tools specifically designed for big data analytics, to assist with these types of jobs.
At Cloudyn, we have gained extensive experience in cloud management and in optimizing cloud utilization over the past few years. Through our own development work in the field, we have learned that collecting and tracking cloud resources, including inventory and usage, can be quite challenging, especially on immature cloud platforms. This is mainly due to a lack of solid documentation, along with the fact that the available data is often insufficient to track and analyze the appropriate resources and usage.
The following scenario demonstrates the difficulties involved in tracking the state of an OpenStack cloud and in retrieving the necessary resource metrics through the Ceilometer API.
The main mode of collecting data from Ceilometer is via the event API. However, in many cases, the event API is inadequate. It is important to be able to see which events are tied together, a need that is discussed in Ceilometer’s wiki and blueprint. When fully implemented, this feature holds great promise for successful metering.
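For readers unfamiliar with the API, the sketch below only builds the URL and headers for a query that asks for the samples of one meter on one resource; it does not send any request. The endpoint host, the port, the `/v2/meters/<meter>` path, the `q.field`/`q.op`/`q.value` filter parameters, the meter name, and the resource ID are assumptions based on our understanding of the Ceilometer v2 REST API and should be checked against the documentation for the release in use.

```python
from urllib.parse import quote, urlencode

# Hypothetical endpoint; adjust host and port to your deployment.
CEILOMETER_ENDPOINT = "http://ceilometer.example.com:8777"


def build_sample_query(meter: str, resource_id: str) -> str:
    """Build a URL asking for samples of `meter` limited to one resource."""
    # Assumed v2 filter syntax: field / operator / value triples.
    params = [
        ("q.field", "resource_id"),
        ("q.op", "eq"),
        ("q.value", resource_id),
    ]
    return f"{CEILOMETER_ENDPOINT}/v2/meters/{quote(meter)}?{urlencode(params)}"


def build_headers(token: str) -> dict:
    """OpenStack APIs typically expect a Keystone token in X-Auth-Token."""
    return {"X-Auth-Token": token, "Accept": "application/json"}


url = build_sample_query("cpu_util", "example-resource-id")
headers = build_headers("example-token")
print(url)
```

Each returned sample describes a single measurement in isolation, so correlating related events still has to be done by the caller.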
To demonstrate the complexities mentioned above, the examples below illustrate the lack of information and consistency in compute status monitoring:
Example #1 (Havana)
The images below, concerning resource ‘641a’, show how the state of the resource (e.g., active, suspended, or terminated) is not expressed:
If we take another instance and suspend it, keeping in mind that the status is suspended during the transition, we subsequently find that all of the metadata is gone. As a result, we have no indication of the status of the resource, and it simply looks as if it is active.
After resuming the instance, more metadata comes in. However, it is the same metadata that was received when the instance was suspended, without dates or time stamps.
The collection process consists of a sampling procedure that periodically checks the state of the instance. The examples above show how easy it is to miss certain pieces of data and generate inaccurate results. Ceilometer is a promising solution in terms of providing a single endpoint to monitor and track the lifecycles of cloud resources. Nonetheless, the examples above illustrate the difficulties that accompany the current API. Hopefully this will improve in upcoming releases.
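The sampling problem can be reproduced with a small simulation. In the sketch below, an instance is suspended for two minutes; a poller that checks every ten minutes never observes the suspended state, while a poller that checks every minute does. The timeline, the polling intervals, and the function names are hypothetical values chosen for illustration, not Ceilometer defaults.

```python
import bisect


def state_at(timeline: list, t: int) -> str:
    """Return the state in effect at time t.

    `timeline` is a list of (start_second, state) pairs sorted by start time.
    """
    starts = [start for start, _ in timeline]
    index = bisect.bisect_right(starts, t) - 1
    return timeline[index][1]


def poll(timeline: list, interval: int, end: int) -> list:
    """Sample the instance state every `interval` seconds up to `end`."""
    return [(t, state_at(timeline, t)) for t in range(0, end + 1, interval)]


# Hypothetical lifecycle: suspended from t=130s until t=250s.
timeline = [(0, "active"), (130, "suspended"), (250, "active")]

coarse = poll(timeline, interval=600, end=1200)
fine = poll(timeline, interval=60, end=1200)

coarse_states = {state for _, state in coarse}
fine_states = {state for _, state in fine}

print("10-minute polling saw:", sorted(coarse_states))
print("1-minute polling saw:", sorted(fine_states))
```

A shorter interval narrows the gap at the cost of more writes, but only event notifications emitted at the moment of the transition would remove it entirely.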
The Future of Ceilometer
At Cloudyn, we see the OpenStack cloud as an integral part of the enterprise cloud environment. Today we use Ceilometer to extract usage and performance data from its API. Looking at the commit graph above, you can see that the project is very active, especially before each release cycle. Additionally, Ceilometer’s Heat and autoscaling features, which are based on its alarms and event APIs, have gained a lot of momentum lately. This attention is a sure-fire sign that more robust performance is in store. It is, however, still a work in progress.
There is clearly a need for a central solution that monitors inventory, events, and resource usage in general. Given the current situation, however, it seems that joint community efforts are the best way to produce better cloud monitoring systems, as well as better documentation for developers and users.