DevOps Data Integration with Seiso
At Expedia, our automation efforts involve a lot of tools and data integration. This is a challenge because individual teams have considerable flexibility to adopt the tools and practices that help them achieve fast results. Where we think a common capability will lead to a better and faster result, our goal is to provide it in a way that preserves that flexibility.
One key area in this regard is data integration. We want a shared and integrated view of our development and delivery world, even in the face of a wide range of data sources.
To this end we created the Seiso data hub. Seiso maintains an inventory of all your running services and hardware, and exposes that data to users (developers, operators, administrators, etc.) and other tools.
Seiso has both a UI and a REST API. It supports users and automation alike by providing an entry point into pre-integrated, normalized service configuration and state data, with jumping-off points to external sources as necessary.
Seiso is particularly helpful to automation efforts: by reducing the number of integration points required to get a comprehensive view of the data, we increase the leverage of any given piece of automation.
Seiso is open source software. You can get it from the Seiso repository on GitHub.
This post explains in more detail how Seiso works, how we use it at Expedia and the benefits we’ve realized.
How Seiso works
Here are some key design principles that explain how Seiso works at a high level.
Seiso’s purpose is to make service data readily available to human and automated clients. To accomplish this, we need a way to get the data into Seiso, and also a way to ensure that it stays up to date.
The best way to create convergence toward data completeness and correctness is to establish tools and processes (ideally automated) that consume Seiso data. That way, if data is missing, things don’t work; if data is wrong, things break. The reciprocal relationship between the data and the processes that use it creates a closed loop.
Automation data is spread out across many different systems. There’s no single system, for example, that manages user identity information, source code and monitoring alert subscriptions. And the more ambitious your automation efforts, the more systems you’ll need to pull in. Seiso’s federated design embraces this reality.
The idea is to sync data from existing source systems into Seiso, using either batch or real-time processes as appropriate. Data consumers can grab pre-integrated data from Seiso. And nothing prevents consumers from going straight to the data sources if they desire.
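As a rough illustration of the batch sync idea, here is a minimal sketch in which each source contributes only the fields it owns, and the hub accumulates them on a shared record. The function name `sync_batch`, the in-memory dictionary used as the hub, and the sample data are all hypothetical; none of this reflects Seiso's actual sync code or storage.

```python
def sync_batch(hub, source_items):
    """Upsert one batch of source records into the hub.

    hub: dict mapping an item key to a dict of integrated fields.
    source_items: dict mapping an item key to the fields one source owns.
    Returns the number of field values that were added or changed.
    """
    changed = 0
    for key, fields in source_items.items():
        record = hub.setdefault(key, {})
        for name, value in fields.items():
            if name not in record or record[name] != value:
                record[name] = value
                changed += 1
    return changed


# Hypothetical data: two sources contribute different fields for one service.
hub = {}
sync_batch(hub, {"search-api": {"owner": "team-a"}})
sync_batch(hub, {"search-api": {"vip": "10.0.0.5"}})
print(hub)
```

Because the sync is an upsert, running the same batch again changes nothing, which is what makes a periodic batch schedule safe to repeat.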
In principle, we should be able to automate anything a person can do. Seiso therefore has an API to mediate access to service data. The UI acts against the API instead of invoking the database directly. Anything a person can do through the UI, a sufficiently smart robot can do through the API.
How we use Seiso at Expedia
We use Seiso primarily in conjunction with another system called Eos. At its core, Eos is an automated reactor: it watches for events in the environment, and reacts by kicking off automated workflows to address them.
Eos uses Seiso as a data backend, which creates the closed loop we described above. This allows Eos to manage a large number of services (160 at the time I’m writing) with a single data integration to Seiso. And because Eos is useful, teams are motivated to get their service data into Seiso and then keep it up to date. The net result:
- It’s dramatically easier to apply automation across a large range of services.
- We have increased visibility of our services, both broad and deep.
To better understand these benefits, let’s get into some specifics about how we use Seiso and Eos to support two major use cases: deployment automation and automated remediation.
Eos knows which version of software is supposed to be on any given server. When the version is wrong, Eos kicks off a deployment workflow to deploy the correct version of the software. It does so in a controlled manner, ensuring that we maintain required capacity to avoid disrupting the service during deployment.
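The version check described above can be sketched as a simple comparison of desired state against reported state. This is only an illustration: the function name `nodes_needing_deploy`, the node names and the version strings are assumptions, not part of Eos.

```python
def nodes_needing_deploy(desired_version, node_versions):
    """Return the names of nodes whose installed version differs from the desired one.

    node_versions maps a node name to its installed version (None if unknown).
    """
    return sorted(
        name
        for name, version in node_versions.items()
        if version != desired_version
    )


# Hypothetical inventory: one node is current, one is stale, one is unknown.
drifted = nodes_needing_deploy(
    "1.4.2",
    {"web01": "1.4.2", "web02": "1.4.1", "web03": None},
)
print(drifted)
```

A node with an unknown version is treated as drifted here, on the assumption that redeploying is safer than guessing.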
Seiso has an entity called a service instance, which is essentially a service as it’s instantiated in a given environment (e.g., development, test, production, DR, etc.). Typically a service instance represents a load-balanced set of service nodes, though load balancing isn’t required. Nodes in turn run on machines. (So you could have multiple service instances running on a machine: each service instance would have at least one node running on the machine.)
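One way to picture these relationships is as a small set of data classes. The class and field names below are illustrative assumptions and do not mirror Seiso's actual schema; the sketch only shows that a service instance owns nodes, that each node runs on a machine, and that one machine can therefore host nodes from several service instances.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Machine:
    hostname: str


@dataclass(frozen=True)
class Node:
    name: str
    machine: Machine


@dataclass
class ServiceInstance:
    key: str
    service: str
    environment: str
    nodes: list = field(default_factory=list)


def instances_on(machine, instances):
    """Return the keys of service instances with at least one node on the machine."""
    return [
        si.key
        for si in instances
        if any(node.machine == machine for node in si.nodes)
    ]


# Hypothetical example: two service instances share one machine.
shared = Machine("host01")
other = Machine("host02")
search = ServiceInstance("search-prod", "search", "production",
                         [Node("search-prod-1", shared), Node("search-prod-2", other)])
cart = ServiceInstance("cart-prod", "cart", "production",
                       [Node("cart-prod-1", shared)])
print(instances_on(shared, [search, cart]))
```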
Here’s what a service instance looks like in Seiso:
For deployment automation, Eos grabs service instance, node and machine data from Seiso. Here are the key pieces of data for the deployment use case:
|Item type||Key data for deployment automation|
* Eos doesn’t currently source software versions from Seiso, nor does it tell Seiso which builds it has deployed to individual nodes. We plan to add both, but for now Eos itself knows which software version to deploy.
When we perform production releases, we generally want to do so without interrupting service to our customers. So we need to stagger our releases, leaving a certain amount of capacity in rotation at any given time. Moreover, we need to account for unhealthy nodes and nodes that may be out of rotation for reasons unrelated to the release. Seiso supplies the data described above, and that data drives the behavior of the Eos automation.
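To make the capacity constraint concrete, here is a minimal sketch of planning a staggered release. It is not the Eos algorithm. It assumes a hypothetical `plan_batches` function, a fixed minimum number of serving nodes, and that each batch returns to rotation before the next one starts. Nodes that are already unhealthy or out of rotation go first, since taking them down costs no capacity.

```python
def plan_batches(node_serving, min_serving):
    """Split nodes into deployment batches that keep min_serving nodes in rotation.

    node_serving maps a node name to True if the node is healthy and in rotation.
    Returns a list of batches, each a list of node names to deploy together.
    """
    serving = [name for name, ok in node_serving.items() if ok]
    idle = [name for name, ok in node_serving.items() if not ok]

    # Nodes that are not serving traffic can be deployed to immediately.
    batches = [idle] if idle else []

    if serving:
        # The spare capacity above the minimum bounds the size of each batch.
        batch_size = len(serving) - min_serving
        if batch_size < 1:
            raise ValueError("no spare capacity: cannot deploy without dropping below the minimum")
        for start in range(0, len(serving), batch_size):
            batches.append(serving[start:start + batch_size])
    return batches


# Hypothetical service instance: six nodes, one of them unhealthy.
nodes = {"n1": True, "n2": True, "n3": True, "n4": True, "n5": True, "n6": False}
print(plan_batches(nodes, min_serving=3))
```

With five serving nodes and a minimum of three, at most two serving nodes are out of rotation at once. If there is no spare capacity at all, the sketch refuses to plan rather than risk an outage.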
The data integration and logic required to accomplish these goals are non-trivial. Many teams decided that it made more sense to load their service data into Seiso and avail themselves of the deployment capability that Eos provides.
Another important use case is automated server remediation. Development teams create health checks for their services, and Eos uses them to detect unhealthy services. It responds by first trying to “bounce” the service (shutting it down and starting it back up), and if that doesn’t work, it escalates the problem for manual intervention.
As with the deployment use case, Eos sources its data from Seiso. Eos has a monitoring module that uses Seiso node data to run the health checks against specific nodes and then report the results to Seiso. (This supports capacity management in the way we described above.) When a check fails, Eos invokes a remediation script, provided by the development team, to attempt the service bounce.
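The check, bounce and escalate sequence can be sketched as follows. This is an illustration under assumptions, not Eos code: `remediate` and its callable parameters are hypothetical stand-ins for the team-provided health check, the remediation script and the escalation path.

```python
def remediate(node, check_health, bounce, escalate, max_bounces=1):
    """Try to recover an unhealthy node, escalating if bouncing does not help.

    check_health(node) -> bool, bounce(node) and escalate(node) are supplied
    by the caller. Returns "healthy", "recovered" or "escalated".
    """
    if check_health(node):
        return "healthy"
    for _ in range(max_bounces):
        bounce(node)  # shut the service down and start it back up
        if check_health(node):
            return "recovered"
    escalate(node)  # hand the problem to a human
    return "escalated"


# Hypothetical usage: a node that stays unhealthy after one bounce.
events = []
outcome = remediate(
    "web02",
    check_health=lambda node: False,
    bounce=lambda node: events.append(("bounce", node)),
    escalate=lambda node: events.append(("escalate", node)),
)
print(outcome, events)
```

Passing the check and the bounce in as callables mirrors the division of labor in the text: development teams own the service-specific pieces, and the automation owns the control flow.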
So far we’ve covered some ways in which we consume Seiso data. We happen to be using Eos, though Seiso and Eos are not tightly coupled to each other. (We achieve the decoupling through service orientation, Eos’ plugin architecture and messaging.) Now let’s look quickly at how we get data into Seiso.
Seiso data sources
Seiso isn’t tied to any particular data sources. The whole point is to be able to integrate the data sources you happen to be using. At Expedia, Seiso integrates data from a variety of sources.
|Data source|Data types|Sync mechanism|
| | |Batch sync on commit|
| | |Batch sync, every 30 min|
|NetScaler load balancers| |Daily batch sync|
|Seyren (work in progress)| |Batch sync, every 30 min|
|Grafana (Elastic backend; work in progress)| |Batch sync, every 30 min|
We use GitHub for a lot of high-level data (and some mid-level data as well) just because it’s a good practice to version control your configuration management data. We’re working on configuration authoring tools to make it easier to work with the data files.
Here are some annotated screenshots showing how Seiso pulls together data into an integrated view.
Recap: How Seiso Helps
By now you have a pretty good idea of how Seiso works, both from the automation side and from the data source side. It will help to review the two major benefits that Seiso offers.
The first is that Seiso presents end users with an integrated view of your service data. The view is broad in that it covers all the services you’ve onboarded. It also has depth in that it pulls in data from a variety of sources, giving you access to data supporting development, testing, release and operational use cases.
Second, by offering programmatic access to integrated service data, Seiso accelerates your automation efforts. One reason is that pre-integration makes it easier to write automation in the first place. Another is that having comprehensive, normalized data allows you to write automation once and apply it across a large range of services.
Learn more about Seiso at seiso.io.