Published
August 10, 2015

Devops Data Integration with Seiso

By Willie Wheeler (@williewheeler) in Devops on August 10, 2015

At Expedia, our automation efforts involve a lot of tools and data integration. This is a challenge because individual teams have considerable flexibility to adopt the tools and practices that help them achieve fast results. In cases where we think a common capability will lead to better and faster results, our goal is to provide it in a way that maintains that flexibility.

One key area in this regard is data integration. We want a shared and integrated view of our development and delivery world, even in the face of a wide range of data sources.

To this end we created the Seiso data hub. Seiso maintains an inventory of all your running services and hardware, and exposes that data to users (developers, operators, administrators, etc.) and other tools.

Seiso has both a UI and a REST API. It supports users and automation alike by providing an entry point into pre-integrated, normalized service configuration and state data, with jumping-off points to external sources as necessary.

Seiso is particularly helpful to automation efforts: by reducing the number of integration points required to get a comprehensive view of the data, we increase the leverage of any given piece of automation.

Seiso is open source software. You can get it from the Seiso repository at GitHub.

This post explains in more detail how Seiso works, how we use it at Expedia and the benefits we’ve realized.

How Seiso works

Here are some key design principles that explain how Seiso works from a high level.

Closed-loop design

Seiso’s purpose is to make service data readily available to human and automated clients. To accomplish this, we need a way to get the data into Seiso, and also a way to ensure that it stays up to date.

The best way to create convergence toward data completeness and correctness is to establish tools and processes (ideally automated) that consume Seiso data. That way, if data is missing, things don’t work; if data is wrong, things break. The reciprocal relationship between the data and the processes that use it creates a closed loop.

Data federation

Automation data is spread out across many different systems. There’s no single system, for example, that manages user identity information, source code and monitoring alert subscriptions. And the more ambitious your automation efforts, the more systems you’ll need to pull in. Seiso’s federated design embraces this reality.

The idea is to sync data from existing source systems into Seiso, using either batch or real-time processes as appropriate. Data consumers can grab pre-integrated data from Seiso. And nothing prevents consumers from going straight to the data sources if they desire.
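To make the federation idea concrete, here is a minimal Python sketch of a hub that merges partial records from several sources. The `Hub` class, the two source functions and the field names are all hypothetical, made up for illustration; they are not Seiso's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical record shape: each source yields (item_key, partial_fields) pairs.
Record = Tuple[str, dict]


@dataclass
class Hub:
    """Toy in-memory data hub that merges partial records from many sources."""
    items: Dict[str, dict] = field(default_factory=dict)

    def upsert(self, key: str, fields: dict) -> None:
        # Merge the new fields into whatever the hub already knows about the item.
        self.items.setdefault(key, {}).update(fields)

    def sync(self, source: Callable[[], Iterable[Record]]) -> int:
        """Pull every record from one source; return how many were applied."""
        count = 0
        for key, fields in source():
            self.upsert(key, fields)
            count += 1
        return count


# Two stand-in sources, each owning a different slice of the data.
def config_repo_source() -> Iterable[Record]:
    yield "node-1", {"service_instance": "search-prod", "machine": "host-a"}


def load_balancer_source() -> Iterable[Record]:
    yield "node-1", {"rotation_status": "enabled"}


hub = Hub()
hub.sync(config_repo_source)    # could run as a batch job
hub.sync(load_balancer_source)  # could be triggered by a real-time event
print(hub.items["node-1"])
```

A consumer reading `hub.items["node-1"]` sees one merged record, even though no single source knew all of its fields.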

Automation-first design

In principle, we should be able to automate anything a person can do. Seiso therefore has an API to mediate access to service data. The UI acts against the API instead of invoking the database directly. Anything a person can do through the UI, a sufficiently smart robot can do through the API.
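As a rough illustration of that layering, the Python sketch below routes both a UI-style function and a robot-style function through the same API object, which is the only code that touches the store. Every name here (`ServiceDataApi`, `put_node`, `get_node`, the `health` field) is an assumption made for the example and says nothing about Seiso's real REST API.

```python
from typing import Dict, List


class ServiceDataApi:
    """Hypothetical API layer; the only code allowed to touch the store."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}  # stand-in for the database

    def put_node(self, name: str, fields: dict) -> None:
        self._store[name] = dict(fields)

    def get_node(self, name: str) -> dict:
        return dict(self._store[name])  # copy, so callers can't mutate the store


def ui_node_summary(api: ServiceDataApi, name: str) -> str:
    """What a UI might render for a person."""
    node = api.get_node(name)
    return f"{name} is {node['health']}"


def robot_find_unhealthy(api: ServiceDataApi, names: List[str]) -> List[str]:
    """What an automated client might compute from the same calls."""
    return [n for n in names if api.get_node(n)["health"] != "healthy"]


api = ServiceDataApi()
api.put_node("node-1", {"health": "healthy"})
api.put_node("node-2", {"health": "unhealthy"})

summary = ui_node_summary(api, "node-2")
unhealthy = robot_find_unhealthy(api, ["node-1", "node-2"])
print(summary)
print(unhealthy)
```

Neither caller gets a private path to the data, so anything the UI can show is available to automation too.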

How we use Seiso at Expedia

We use Seiso primarily in conjunction with another system called Eos. At its core, Eos is an automated reactor: it watches for events in the environment, and reacts by kicking off automated workflows to address them.
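The reactor pattern can be sketched in a few lines of Python. This is not Eos code: the `Reactor` class, the event types and the workflow functions are hypothetical, and real workflows would do actual work instead of returning strings.

```python
from typing import Callable, Dict, List

# Hypothetical workflow signature: takes an event, returns a description of what it did.
Workflow = Callable[[dict], str]


class Reactor:
    """Toy reactor: maps event types to the workflows that should handle them."""

    def __init__(self) -> None:
        self._workflows: Dict[str, List[Workflow]] = {}

    def on(self, event_type: str, workflow: Workflow) -> None:
        self._workflows.setdefault(event_type, []).append(workflow)

    def handle(self, event: dict) -> List[str]:
        # Events with no registered workflow are ignored.
        return [wf(event) for wf in self._workflows.get(event["type"], [])]


reactor = Reactor()
reactor.on("version_mismatch", lambda e: f"deploy {e['expected']} to {e['node']}")
reactor.on("health_check_failed", lambda e: f"bounce {e['node']}")

actions = reactor.handle(
    {"type": "version_mismatch", "node": "node-1", "expected": "1.4.2"}
)
print(actions)
```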

Eos uses Seiso as a data backend, which creates the closed loop we described above. This allows Eos to manage a large number of services (160 at the time I’m writing) with a single data integration to Seiso. And because Eos is useful, teams are motivated to get their service data into Seiso and then keep it up to date. The net result:

  • It’s dramatically easier to apply automation across a large range of services.
  • We have increased visibility of our services, both broad and deep.

To better understand these benefits, let’s get into some specifics about how we use Seiso and Eos to support two major use cases: deployment automation and automated remediation.

Deployment automation

Eos knows which version of software is supposed to be on any given server. When the version is wrong, Eos kicks off a deployment workflow to deploy the correct version of the software. It does so in a controlled manner, ensuring that we maintain required capacity to avoid disrupting the service during deployment.

Seiso has an entity called a service instance, which is essentially a service as it’s instantiated in a given environment (e.g., development, test, production, DR, etc.). Typically a service instance represents a load-balanced set of service nodes, though load balancing isn’t required. Nodes in turn run on machines. (So you could have multiple service instances running on a machine: each service instance would have at least one node running on the machine.)
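One way to picture these relationships is with a few Python dataclasses. The class and field names below are illustrative assumptions, not Seiso's actual data model; the point is only that two service instances can each have a node on the same machine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Machine:
    fqdn: str
    ip_address: str


@dataclass
class Node:
    name: str
    machine: Machine  # the machine this node runs on


@dataclass
class ServiceInstance:
    name: str
    service: str
    environment: str
    nodes: List[Node] = field(default_factory=list)


def instances_on(machine: Machine, instances: List[ServiceInstance]) -> List[str]:
    """Names of the service instances with at least one node on the machine."""
    return [
        si.name
        for si in instances
        if any(node.machine is machine for node in si.nodes)
    ]


shared = Machine("host-a.example.com", "10.0.0.5")
search_prod = ServiceInstance(
    "search-prod", "search", "production", [Node("search-prod-1", shared)]
)
rates_prod = ServiceInstance(
    "rates-prod", "rates", "production", [Node("rates-prod-1", shared)]
)

hosted = instances_on(shared, [search_prod, rates_prod])
print(hosted)
```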

Here’s what a service instance looks like in Seiso:

[Screenshot: A service instance in Seiso]

For deployment automation, Eos grabs service instance, node and machine data from Seiso. Here are the key pieces of data for the deployment use case:

Item type | Key data for deployment automation
Service instance | Name; service; environment; ports; minimum capacity level to maintain during deployment
Node | Name; IP addresses; health status; node rotation status (for load balancing); IP address rotation status (for load balancing); software version running on the node*
Machine | FQDN; IP address

* Eos doesn’t currently source software versions from Seiso, nor does it tell Seiso which builds it’s deployed to individual nodes. We’ll do those shortly but we haven’t gotten to them yet. For now Eos itself knows which software version to deploy.

When we perform production releases, we generally want to do so without interrupting service to our customers. So we need to stagger our releases, leaving a certain amount of capacity in rotation at any given time. Moreover we need to account for unhealthy nodes and nodes that may be out of rotation for reasons unrelated to the release. Seiso data drives the behavior of the Eos automation, providing the data described above.
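Here is a minimal Python sketch of how a staggered release could be planned from that data. It is not Eos's algorithm. It assumes the minimum capacity is expressed as a count of serving nodes, that a node is "serving" only when it is both healthy and in rotation, and that non-serving nodes can be deployed first because they carry no traffic. `NodeState` and `plan_batches` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeState:
    name: str
    healthy: bool
    in_rotation: bool

    @property
    def serving(self) -> bool:
        # Assumption: only healthy, in-rotation nodes count toward capacity.
        return self.healthy and self.in_rotation


def plan_batches(nodes: List[NodeState], min_serving: int) -> List[List[str]]:
    """Group nodes into deployment batches that keep min_serving nodes serving."""
    serving = [n.name for n in nodes if n.serving]
    idle = [n.name for n in nodes if not n.serving]

    # Nodes that aren't serving go first; deploying to them costs no capacity.
    batches = [idle] if idle else []
    if not serving:
        return batches

    # Each batch may take out only the headroom above the minimum.
    batch_size = len(serving) - min_serving
    if batch_size < 1:
        raise ValueError("too few serving nodes to deploy without dropping below minimum")

    for start in range(0, len(serving), batch_size):
        batches.append(serving[start:start + batch_size])
    return batches


nodes = [
    NodeState("n1", healthy=True, in_rotation=True),
    NodeState("n2", healthy=True, in_rotation=True),
    NodeState("n3", healthy=False, in_rotation=True),   # unhealthy
    NodeState("n4", healthy=True, in_rotation=True),
    NodeState("n5", healthy=True, in_rotation=False),   # out of rotation
    NodeState("n6", healthy=True, in_rotation=True),
]

plan = plan_batches(nodes, min_serving=2)
print(plan)
```

With four serving nodes and a minimum of two, the sketch deploys to the two non-serving nodes first and then takes the serving nodes out two at a time. It conservatively assumes the non-serving nodes never rejoin the pool during the release.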

The data integration and logic required to accomplish these goals are non-trivial. Many teams decided that it made more sense to load their service data into Seiso and avail themselves of the deployment capability that Eos provides.

Automated remediation

Another important use case is automated server remediation. Development teams create health checks for their services, and Eos uses them to detect unhealthy services. It responds by first trying to “bounce” the service (shutting it down and starting it back up), and if that doesn’t work, it escalates the problem for manual intervention.

As with the deployment use case, Eos sources its data from Seiso. Eos has a monitoring module that uses Seiso node data to run the health checks against specific nodes and then inform Seiso as to the result. (This supports capacity management in the way we described above.) When the check fails, Eos invokes a remediation script, provided by the development team, to attempt the service bounce.
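The check, bounce and escalate sequence can be sketched as a small Python function. This is an illustration only: `remediate`, its callback parameters and the `max_bounces` limit are assumptions, and the fake `check` and `bounce` functions stand in for a team's real health check and remediation script.

```python
from typing import Callable, Dict, List


def remediate(
    node: str,
    check: Callable[[str], bool],
    bounce: Callable[[str], None],
    escalate: Callable[[str], None],
    max_bounces: int = 1,  # assumption: how many bounces to try before escalating
) -> str:
    """Run the health check; bounce on failure; escalate if bouncing doesn't help."""
    if check(node):
        return "healthy"
    for _ in range(max_bounces):
        bounce(node)
        if check(node):
            return "recovered"
    escalate(node)
    return "escalated"


# Fake environment for the example: both nodes start out unhealthy.
health: Dict[str, bool] = {"node-1": False, "node-2": False}
escalations: List[str] = []


def check(node: str) -> bool:
    return health[node]


def bounce(node: str) -> None:
    # Pretend a restart fixes node-1 but not node-2.
    if node == "node-1":
        health[node] = True


first = remediate("node-1", check, bounce, escalations.append)
second = remediate("node-2", check, bounce, escalations.append)
print(first, second, escalations)
```

In a fuller version, each check result would also be reported back to the data hub so that capacity calculations see current node health.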

So far we’ve covered some ways in which we consume Seiso data. We happen to be using Eos, though Seiso and Eos aren’t tightly coupled to each other. (We achieve the decoupling through service orientation, Eos’ plugin architecture and messaging.) Now let’s look quickly at how we get data into Seiso.

Seiso data sources

Seiso isn’t tied to any particular data sources. The whole point is to be able to integrate the data sources you happen to be using. At Expedia, Seiso integrates data from a variety of sources.

Data source | Data types | Sync mechanism
GitHub | Various: service groups, services, environments, service instances, nodes, machines (in cases where the team isn’t using Chef), infrastructure providers, regions, data centers, load balancers, and others | Batch sync on commit
Chef server | Machine data (i.e., Chef nodes) | Batch sync, every 30 min
Eos | Server health; deployed version (planned) | Real-time sync
NetScaler load balancers | Node and IP address rotation status | Real-time sync
Active Directory | Person data (name, title, contact, organization, etc.) | Daily batch sync
Seyren (work in progress) | Seyren checks (i.e., monitoring alerts) | Batch sync, every 30 min
Grafana (Elastic backend; work in progress) | Grafana dashboards | Batch sync, every 30 min

We use GitHub for a lot of high-level data (and some mid-level data as well) just because it’s a good practice to version control your configuration management data. We’re working on configuration authoring tools to make it easier to work with the data files.

Here are some annotated screenshots showing how Seiso pulls together data into an integrated view.

[Two annotated screenshots: sources-1, sources-2]

Recap: How Seiso helps

By now you have a pretty good idea of how Seiso works, both from the automation side and from the data source side. It will help to review the two major benefits that Seiso offers.

The first is that Seiso presents end users with an integrated view of your service data. The view is broad in that it gives you views into all the services you’ve onboarded. It also has some depth in that it pulls in data from a variety of sources, giving you access to data supporting development, testing, release and operational use cases.

Second, by offering programmatic access to integrated service data, Seiso accelerates your automation efforts. One reason is that pre-integration makes it easier to write automation in the first place. Another is that having comprehensive, normalized data allows you to write automation once and apply it across a large range of services.

Learn more about Seiso at seiso.io.