In this article, we would like to share the basic principles and structure of Cloudyn’s cloud optimization model. One of the main challenges it faces is analyzing and optimizing heterogeneous cloud (IaaS) environments, including Amazon, Google Cloud Platform (GCP), OpenStack and Azure. This challenge stems from various environmental factors, such as differing terminology and APIs, as well as each vendor’s unique data and structure. Our data model is based on three components: Data Capturing, Normalization and Categorization.
Cloudyn’s data engines have been designed to filter and organize data collected from various clouds, ensuring accurate presentation, analysis and optimization. The data collected primarily relates to cloud costs, performance and inventory, though it is not limited to these categories. There are essentially two types of data: customer-specific and vendor-specific. Customer-specific data reflects a particular customer’s usage patterns, such as reservation usage and performance metrics. Vendor-specific data, on the other end of the spectrum, covers resource list prices, instance types and reservation offerings, to name a few. Both types of data are collected and normalized by provider-specific data collectors.
The cloud data that is captured, digested and normalized is then stored in what we call “cloud-vendor-agnostic tables”. These tables ensure compatibility across the board: since they are not semantically tied to any specific provider, they use terms such as “resource type” instead of the more vendor-specific “instance type”. All further optimization and analysis, even when data moves from one vendor to another, is done on the basis of these vendor-agnostic tables. Thanks to their more coherent and organized structure, the agnostic tables hold less data than the provider-specific tables they are derived from. The agnostic table data can then be used by Cloudyn’s optimization algorithms, including price, sizing and utilization analysis. This generic data layer enables Cloudyn to give users who run multiple clouds a holistic view of their overall multi-cloud deployments.
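To make the idea of a vendor-agnostic table concrete, here is a minimal sketch of what one normalized record might look like. The class and field names are our illustration, not Cloudyn’s actual schema; the point is that only generic terms (resource type, region, cost) survive normalization, with the provider kept purely as metadata.

```python
from dataclasses import dataclass

# Hypothetical vendor-agnostic usage record. Field names are
# assumptions for illustration, not Cloudyn's real table layout.
@dataclass
class AgnosticUsageRecord:
    provider: str       # "aws", "azure", "gcp", ... kept only as metadata
    resource_type: str  # generic term; a vendor's "instance type" maps here
    region: str         # one canonical region name per physical region
    usage_hours: float  # normalized usage quantity
    cost: float         # cost in a single reference currency
```

Because every provider’s collector emits this same shape, downstream optimization code never needs to know which cloud a record came from.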
Below are a few examples that show how information varies between cloud providers, and even within a single cloud:
1. AWS (variation within a single cloud): the AWS billing file can report VM usage in the us-east-1 region as “BoxUsage:m1.xlarge”, whereas usage in the Asia-Pacific region is reported as “APN1-BoxUsage:m1.large”, and the CLI refers to that region as “ap-northeast-1”.
2. Fragmented Azure billing data: as seen in the table below, there is no consistency when reporting the same service type (A1 VM vs. BASIC.A1 VM) or the same region (West Europe vs. EU West). Instead, we prefer to let a combination of several fields define an instance type, which results in fewer rows.
3. No uniformity between clouds: in the AWS pricing model, the price of a proprietary OS is included in the price of each VM, whereas GCE and Azure report it as a separate, aggregated, billable item. In this case, Cloudyn separates the VM price from the operating-system price in order to establish a least common denominator.
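The first example above can be sketched in code. The snippet below shows one plausible way to split an AWS billing usage string into a canonical region and resource type; the alias table is an illustrative stand-in, not Cloudyn’s actual mapping, and the bare-prefix fallback to us-east-1 mirrors how AWS omits the region code for that region in billing data.

```python
# Illustrative region-alias table: maps billing prefixes and display
# names to one canonical region identifier. Entries are examples only.
REGION_ALIASES = {
    "APN1": "ap-northeast-1",  # AWS billing prefix -> CLI region name
    "EU West": "westeurope",   # Azure display-name variants collapse
    "West Europe": "westeurope",
}

def normalize_usage_type(raw: str) -> tuple:
    """Split an AWS billing usage string such as 'APN1-BoxUsage:m1.large'
    into a canonical (region, resource_type) pair."""
    prefix, _, resource = raw.partition(":")
    region_code, _, _ = prefix.partition("-")
    # A bare 'BoxUsage' prefix carries no region code: AWS reports
    # us-east-1 that way in the billing file.
    region = REGION_ALIASES.get(region_code, "us-east-1")
    return region, resource
```

With every collector funneling through a step like this, “APN1”, “ap-northeast-1” and similar variants all land on a single canonical name in the agnostic tables.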
Proper categorization is crucial with such a complicated data model. It helps you understand how a cloud environment should be structured in terms of cost and usage consumption. Additionally, it allows you to see all of the different angles and dimensions that are monitored, captured and normalized. Categorization is done for many different types of data points across the different assets, including compute, storage and network.
Metrics, such as the number of instances running per hour or the difference in usage between block and standard storage, raise questions about where spend is allocated, what resources cost, and what inventory exists. Detecting and analyzing these metrics relies on carefully aggregating and categorizing data sets retrieved from various sources. Our goal is to provide users with one single, encompassing answer.
Optimization and Aggregation
We generate a second level of tables based on the collected data. After normalization, that data is kept in these secondary, provider-agnostic tables. Once it is categorized, the Cloudyn engines can run their optimization algorithms. For example, to provide Reserved Instance (RI) recommendations, Cloudyn holds reservation prices and finds their break-even point based on each on-demand instance’s type, region and price-utilization analysis.
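The break-even idea reduces to simple arithmetic: a reservation pays off once its upfront cost is covered by the hourly savings over the on-demand rate. The function below is our simplified sketch of that calculation, not Cloudyn’s actual recommendation engine, and the prices used are placeholders.

```python
# Simplified break-even sketch for a Reserved Instance decision.
def ri_break_even_hours(upfront: float, reserved_hourly: float,
                        on_demand_hourly: float) -> float:
    """Hours of usage after which the reservation becomes cheaper
    than paying the on-demand rate for the same instance."""
    hourly_savings = on_demand_hourly - reserved_hourly
    if hourly_savings <= 0:
        return float("inf")  # the reservation never pays off
    return upfront / hourly_savings

# Placeholder rates: $300 upfront, $0.02/h reserved vs. $0.10/h on-demand
# breaks even at roughly 3750 hours of sustained usage.
hours = ri_break_even_hours(300.0, 0.02, 0.10)
```

A real recommendation would compare this figure against the instance’s observed utilization per type and region, which is exactly the price-utilization analysis described above.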
We also generate our own classification data with features for every object we analyze. For example, one classification might be the “relative power” of different clouds’ compute flavors. In addition, we continuously benchmark the different clouds. All of this data is used to compare various cloud offerings and provide our customers with enhanced cross-cloud optimization opportunities.
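One way such a “relative power” classification could feed cross-cloud comparison is by ranking flavors on price per unit of benchmark score. The sketch below is purely illustrative: the flavor names, prices and scores are invented, and the metric is our assumption, not Cloudyn’s published methodology.

```python
# All numbers below are made-up placeholders for illustration.
flavors = [
    {"cloud": "aws",   "flavor": "m1.large",      "hourly_price": 0.10, "bench_score": 8.0},
    {"cloud": "gcp",   "flavor": "n1-standard-2", "hourly_price": 0.09, "bench_score": 7.5},
    {"cloud": "azure", "flavor": "A1",            "hourly_price": 0.08, "bench_score": 5.0},
]

def rank_by_price_performance(flavors):
    """Order flavors so the cheapest cost per benchmark-score unit comes first."""
    return sorted(flavors, key=lambda f: f["hourly_price"] / f["bench_score"])
```

Whatever the actual benchmark, the key point is that a common score lets otherwise incomparable vendor flavors be ranked on one axis.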
One of the main benefits of the Cloudyn cloud management model is its structured representation of usage and costs. It makes no difference where data comes from: once aggregated and normalized, it is treated as a unified pool of information.
Furthermore, once the data is normalized into Cloudyn’s agnostic database, it is sorted by its common denominators across the various normalized environments. Our system can then perform analyses on specific vendors and run optimization engines across all vendors. The beauty is that with Cloudyn, one consolidated data source is enough to efficiently and effectively generate the information and insights that support quick decision making.