Author Archive

Published
December 29, 2016
Time To Read
Estimated reading time: 9 minutes

Operationalizing Spark Streaming (Part 1)

In Big Data, Lessons Learned on December 29, 2016

For those looking to run Spark Streaming in production, this two-part article collects tips and best practices from the front lines of a recent exercise in taking Spark Streaming to production. In my use case, Spark Streaming serves as the core processing engine for a new real-time Lodging Market Intelligence system used across the Lodging Shopping stack on Expedia.com, Hotels.com, and other brands. The system integrates with Kafka, S3, Aurora, and Redshift and processes an average of 500 msg/sec, with spikes up to 2000 msg/sec. The topics discussed are: Availability: Getting Spark running and…

Principal Engineer at Expedia working on scalable streaming data processing systems.