Author Archive

December 29, 2016
Estimated reading time: 10 minutes

Operationalizing Spark Streaming (Part 1)

In Big Data, Data Science, Lessons Learned on December 29, 2016

For those looking to run Spark Streaming in production, this two-part article collects tips and best practices from a recent exercise in taking Spark Streaming to production. In my use case, Spark Streaming serves as the core processing engine for a new real-time Lodging Market Intelligence system used across the Lodging Shopping stack on multiple brands. The system integrates with Kafka, S3, Aurora, and Redshift and processes an average of 500 msg/sec, with spikes up to 2000 msg/sec. The topics discussed are: Availability: Getting Spark running and…


Principal Engineer at Expedia working on scalable streaming data processing systems.