March 7, 2016
John Fischer and Waqas Sheikh
Time To Read
Estimated reading time: 10 minutes

How Expedia Finds your Flights: A Detailed View

By John Fischer and Waqas Sheikh in Travel Technology on March 7, 2016 |

As explained in part 1 of this series, Best Fare Search (BFS) is the service for finding and pricing the flights that you see in Expedia search results. Finding routes and prices for air travel is difficult because it is a famously difficult computer science problem, the airlines adjust prices many times a day which frustrates caching, and a host of other complexities. We’re going to drill into some of the details around the complexity of BFS and delivering quality search results.

Flight Searching / Selection

When a user searches for a flight we need to decide which flights to show them (which we’ll call flight selection), and then ensure the pricing is accurate (we’ll call this flight pricing). The two problems are not independent as choosing competitive flights requires considering the price of the flights.

We’ll begin with describing flight selection. At any given time BFS stores millions of flight schedule records, each describing a flight. Flights are identified by a flight number, offered on a specific schedule for a specific span of time. This is a mockup of what data might appear in a flight schedule record, for a fictional airline with a code of “ZZ”:

  • ZZ 2152, Seattle-Billings, departing 10:35am UTC-07:00 Mon/Wed/Friday. Duration 1 hr 55 min. United States Domestic Flight. Effective May 1 2016 to June 30 2016.

This flight schedule record defines 26 individual flights (Monday, May 2; Wednesday, May 4; and so on). BFS subscribes to flight schedule data roughly one year into the future; even if each schedule record defined a single flight there would still be over ten thousand flights per day. Flight selection explores a network of tens of thousands of flights for each travel date.

The next bit of complexity we come across is knowing what the “best” flight really is.  Should BFS favor generating flight itineraries that are inexpensive yet have multiple stops/connections, or should it favor itineraries that are more expensive but more direct? Also, while having a diverse range of airlines represented in flight search results can be beneficial for customers, ensuring this diversity can entail making tradeoffs against price and/or directness.

Many airlines’ tickets have single letters printed next to the flights; these are booking codes. Airlines divide their flights’ seat inventory across several such codes; 5 seats in T, 7 seats in Y, and so on. Booking code seat counts can be quite volatile, as airline yield management algorithms manipulate the number of seats in each booking code to maximize revenue. When you fly there’s a good chance the person you sit next to paid a different price for their seat depending on the day they bought, how many other people were showing interest in the flight at the time they bought, and/or how full the airline was forecasting the plane to be.

Both flight selection and pricing must consult booking code availability data. Because flight selection must look at many more flights than pricing does it must deal with larger volumes of highly volatile flight availability data.

To return any results BFS must somehow access this availability data loaded into the system. The airlines expose their availability data in a variety of ways. One method of sharing availability data is a “real time proxy”. The real time proxy provides a detailed, near perfect view of an airline’s availability, but is expensive to build and requires considerable domain expertise to use properly. There are also simpler systems known as AVS  systems that provide availability data in a much more compact, simpler form, but with less accuracy. BFS consumes availability data from both of these models.

Selecting routes and summarizing prices

Now that we have a large dynamic database of flight schedules, booking codes, and availability, we can start building itineraries to service the user’s search. To gauge whether any part of a potential flight itinerary is attractive, it’s tempting to say “let’s price it”. However, flight pricing is a higher-latency operation so flight selection cannot afford to price prospective flight itineraries exhaustively. To get a sense for why this is the case, let’s explore pricing in more detail.

Booking code availability alone does not suffice to determine the price of your flight itinerary. Airlines organize pricing via fares. A fare is a certification for travel on a given portion of a flight itinerary at a given “base” price. Fares are filed in markets; a Chicago-New York fare applies to travel between Chicago and New York. Each has an alphanumeric descriptor known as a fare basis code, and includes a variety of rules restricting its usage.

The rules on a fare used to be handwritten; with the information age came the development of computerized encoding of fares and fare rules. Organizations such as ATPCO and SITA define networks of data tables within which airlines write and link together fare rules. Pricing engines like BFS then subscribe to this data, receiving multiple updates every day.

Let’s look at some examples to see how the fare, fare rules, and fare markets interact to influence search results.

First consider the case of one adult flying outbound Seattle-Denver-Dallas-Atlanta (SEA-DEN-DFW-ATL) on airline A on March 3, and inbound Atlanta-Chicago-Seattle (ATL-ORD-SEA) on airline B on March 7. To price the itinerary every flight must be covered by a fare in some way.

Suppose airline A has filed a fare with code YZ7AS in the ATL-SEA market at $455, and airline B has filed fare CB7C in the same market at $465. (Why not SEA-ATL? In some markets, fares can be used for travel in either direction, so the markets’ two airport codes are filed in lexicographic order. In general fare directionality is more complicated; details are omitted for brevity.)

One way to cover the itinerary with fares would be to cover the entire outbound portion, SEA-DEN-DFW-ATL, with fare YZ7AS, and to cover ATL-ORD-SEA with CB7C. The total fare amount is $455 + $465 = $920, to which surcharges, taxes, and other fees are added.

This is just one possible way to cover the itinerary with fares though; we could divide the itinerary into fare markets in many other ways. Suppose airline A has a fare TL14XWG in the DEN-SEA market at $125, and a fare WL14HD in the ATL-DEN market at $170. The itinerary could then be covered with fares as follows:

  • Seattle-Denver: $125, on airline A’s fare TL14XWG
  • Denver-Dallas-Atlanta: $170, on airline A’s fare WL14HD
  • Atlanta-Chicago-Seattle: $465, on airline B’s fare CB7C

The total fare amount when using these fares is $125 + $170 + $465 = $760. Have we found a cheaper itinerary price? It’s definitely cheaper, but there’s another complicating factor to consider. Each fare has a set of rules that dictate how it can be used, including how multiple fares may be combined. We always have to consider these rules to determine if a proposed fare combination is allowed.

To generate the list of possible flight results, all combinations of fare markets must be considered to find the one which obeys all the fare rules and whose total price is cheapest. In the attached figure you can see just four of many possible ways to cover the same round trip flight route from Seattle to Atlanta:

A diagram describing 4 ways of dividing the same round trip flight route into fare markets.

Fare Rules

The following mockup example shows some of what is filed on a (simplified) fare record for an airline ‘ZZ’. It’s very similar to our flight schedule record from earlier but now it includes fare rule information:

  • ZZ fare TL14XWG, in market DEN-SEA at $125. Effective May 16 2016 to July 15 2016, subject to rules in fare rule tables 236257, 1482, 19624, 239243, and 204682.

Examples of properties on which fare rule tables can impose restrictions:

  • Passenger age
  • Which flight(s) the fare covers
  • The booking code(s) on the flight(s) the fare covers
  • The current date, the dates of travel, and/or relationships among these dates
  • Private fare status (whether a fare is publicly available or restricted to a certain subset of buyers, sites, etc.)

This means BFS needs to store several more millions of rule records relating to flight seat availability and/or booking codes, and tens of millions of rule records relating to private fare status.

Complicating things even further, rule records referenced on the fare record in turn contain references to deeper fare rules; this continues several levels deep. In certain areas rules can also be strung together, yielding ‘composite’ rules such as “If the flight number on any transatlantic flight covered by the fare is 2xxx, then allow purchase until 14 days before travel; otherwise allow purchase until 7 days before travel”.

This stringing makes evaluating fare rules difficult to optimize. Given this, it’s tempting to hope that the pricing problem as a whole exhibits some sort of optimal substructure.

Optimal substructure allows one to employ dynamic programming, as one would for the shortest path problem. However fare rules systems include rules restricting which other fares a fare can coexist with on an itinerary (combination rules). This thwarts a naive dynamic programming approach.

More Complicating Factors

At this point one hopes that airlines aren’t filing many fares per market, given the need to check fare combination rules. This hope is dashed by constructed fares. Fare systems provide mechanisms for synthesizing non-filed fares from filed ones. These synthesized fares must be considered alongside the filed fares; it’s important to do so as discount fare creation is one of the functions of fare synthesis. So BFS needs to maintain several more millions of data records, and tens of millions of linkages among records, that encode rules for synthesized fare construction.

Fare synthesis can even yield a fare whose market differs from that of the original filed fare(s), blurring our earlier ‘bucketing’ of fares into separate markets. For instance, when pricing a Chicago-Paris flight itinerary, one could potentially use a synthesized fare constructed from a New York-London filed fare, even if neither New York nor London is part of the itinerary!

At this point pricing looks daunting, but glimmers of hope remain. If the total price of the itinerary equals the sum of the prices of the fares covering the itinerary, this ‘linearity’ can be exploited. Unfortunately even this is not always true. Surcharges and taxes can apply to larger portions of the itinerary than individual fares, and some surcharges must only be applied a limited number of times based on the number and arrangement of fares on the itinerary.

While the features of airline pricing described above are among the most vexing, they’re just the tip of the iceberg. Multi-passenger fare rules, subtleties of flight availability and airline yield management, and taxation could each fill an entire blog post.

Pricing is difficult enough when the goal is to accurately price one thousand or so priced itineraries to the user. To most accurately guide flight selection, however, one would need to compute billions of itinerary prices or more. A significant challenge of the flight selection problem is exploring how to approximate or truncate pricing so that it is faster yet still accurate enough to guide the discovery of competitive flights.

On top of all this airline flight search and pricing is a moving target. For example ten years ago surcharges used to be relatively small; nowadays surcharges can exceed filed fare costs and are filed with increasingly byzantine applicability restrictions. BFS has had to adjust as large surcharges function as a sort of de facto second pricing system wherein costs are accumulated according to an entirely different paradigm.

The designs of flight availability and fare rule systems continue to change and expand in response to airlines’ needs. For over fifteen years BFS has evolved alongside changes in the air industry to provide Expedia customers with diverse and attractive flight search results.