Presentation: Massive Scale Anomaly Detection Framework

Track: Predictive Architectures in the Real World

Location: Cyril Magnin I + II

Duration: 11:40am - 12:20pm

Day of week: Tuesday


Early detection of abnormal events can be critical for many business applications; however, there are numerous challenges when implementing real-time anomaly detection models at scale. Server failures, developer errors and malicious activities are very different scenarios with different engineering requirements. Moreover, most analytical models have traditionally been designed for the batch processing paradigm and usually cannot be easily adapted to unbounded datasets and real-time latencies.
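To make the batch-versus-stream gap concrete: a batch z-score needs the mean and standard deviation of the whole dataset, which never exists for an unbounded stream. A common workaround is to maintain those statistics incrementally. The following is a minimal Python sketch using Welford's online algorithm, assuming a single numeric metric per event; the names `RunningStats` and `detect`, the 3-sigma threshold and the warm-up length are illustrative assumptions, not part of PayPal's framework.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance in O(1) memory per update."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses both old and new mean

    @property
    def std(self) -> float:
        # Sample standard deviation; treated as 0 for fewer than two points.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def detect(stream, threshold=3.0, warmup=5):
    """Yield (value, is_anomaly) per event, scoring each event against the
    statistics of the events seen before it, then folding it into the state.
    Hypothetical helper for illustration only."""
    stats = RunningStats()
    for x in stream:
        is_anomaly = False
        if stats.count >= warmup and stats.std > 0:
            z = abs(x - stats.mean) / stats.std
            is_anomaly = z > threshold
        yield x, is_anomaly
        stats.update(x)


events = [10, 11, 9, 10, 12, 10, 11, 50, 10]
print([x for x, bad in detect(events) if bad])  # prints [50]
```

In this sketch the outlier is still folded into the running statistics after it is scored, which temporarily widens the detection band; a production detector might exclude flagged points or use more robust statistics.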

At PayPal, we must be able to analyze billions of events every day in real time across a wide range of services, devices and locations. In a collaboration between our platform engineering team and data science teams, we have built a generic framework for developing robust and scalable anomaly detection streaming applications, with a focus on the flexibility to support different types of statistical and machine learning models. Inspired by the design of scikit-learn and Spark MLlib, we have designed a simple pipeline-based API on top of Spark Structured Streaming that captures common patterns of the anomaly detection domain.
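The abstract does not show the framework's actual API. As a rough illustration of what a pipeline-based design over micro-batches can look like, the following Python sketch chains a stateless feature stage with a stateful detector, in the spirit of scikit-learn's `Pipeline`. Plain lists of dicts stand in for Spark Structured Streaming micro-batches, and `Stage`, `Pipeline`, `SelectFeature`, `EwmaDetector`, the `latency_ms` field and the EWMA parameters are all hypothetical names chosen for this example.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]


class Stage(ABC):
    """One step in a hypothetical streaming pipeline (not PayPal's API)."""

    @abstractmethod
    def process(self, batch: List[Record]) -> List[Record]:
        """Transform one micro-batch; stateful stages also update their state."""


class Pipeline(Stage):
    """Chains stages in order, similar in spirit to scikit-learn's Pipeline."""

    def __init__(self, stages: Iterable[Stage]):
        self.stages = list(stages)

    def process(self, batch: List[Record]) -> List[Record]:
        for stage in self.stages:
            batch = stage.process(batch)
        return batch


class SelectFeature(Stage):
    """Stateless stage: copies one numeric input field into a 'value' column."""

    def __init__(self, field: str):
        self.field = field

    def process(self, batch: List[Record]) -> List[Record]:
        return [{**r, "value": float(r[self.field])} for r in batch]


class EwmaDetector(Stage):
    """Stateful stage: flags values more than k standard deviations from an
    exponentially weighted moving mean. State persists across micro-batches."""

    def __init__(self, alpha: float = 0.2, k: float = 3.0, warmup: int = 5):
        self.alpha, self.k, self.warmup = alpha, k, warmup
        self.mean: Optional[float] = None
        self.var = 0.0
        self.seen = 0

    def process(self, batch: List[Record]) -> List[Record]:
        out = []
        for r in batch:
            x = r["value"]
            anomalous = False
            if self.mean is None:
                self.mean = x  # the first observation seeds the mean
            else:
                std = math.sqrt(self.var)
                # Score against state built from earlier events, then update it.
                anomalous = (self.seen >= self.warmup and std > 0
                             and abs(x - self.mean) > self.k * std)
                diff = x - self.mean
                incr = self.alpha * diff
                self.mean += incr
                self.var = (1 - self.alpha) * (self.var + diff * incr)
            self.seen += 1
            out.append({**r, "anomaly": anomalous})
        return out


pipeline = Pipeline([SelectFeature("latency_ms"), EwmaDetector(alpha=0.2, k=3.0)])
micro_batches = [
    [{"latency_ms": v} for v in (100, 104, 96, 100, 104)],
    [{"latency_ms": v} for v in (96, 100, 200, 100)],
]
for batch in micro_batches:
    for r in pipeline.process(batch):
        if r["anomaly"]:
            print("anomaly:", r["latency_ms"])  # prints "anomaly: 200"
```

One design choice in this sketch is collapsing scikit-learn's separate `fit` and `transform` into a single `process` step, because a streaming detector keeps learning from every micro-batch rather than training once up front. New models can be plugged in by implementing `Stage.process`.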

At the base of the framework, we took advantage of Spark Structured Streaming's fast and scalable execution engine, together with stream-oriented building blocks that allow easy extension to new production-grade models. We have found that real-time anomaly detection provides powerful capabilities in many different fields; internally, we use the framework for a variety of use cases ranging from fraud prevention to operations and even security.

Speaker: Guy Gerson

Big Data Developer @PayPal

Guy Gerson is a software engineer on the core team of PayPal's next-generation stream processing platform. He is currently working on adapting statistical and machine learning methodologies for use in real-time data pipelines. Prior to PayPal, he was a researcher in the IBM Cloud and Data Technologies group, focusing on designing large-scale Internet of Things analytics architectures.

Speaker: Uri Silberstein

Senior Cloud & Big Data Developer @PayPal

As part of the platform engineering team at PayPal, Uri Silberstein is responsible for building PayPal's large-scale stream processing platform, including an analytical framework that provides an easy way to run statistical and machine learning models. Before joining PayPal, Uri worked at several leading startups and at IBM, in both the research and storage divisions. Besides all the fun above, Uri also enjoys sports, traveling and spending quality time with his wife and two sons.

2019 Tracks

  • Grokking Timeseries & Sequential Data

    Techniques, practices, and approaches around time series and sequential data. Expect topics including image recognition, NLP/NLU, preprocessing, and crunching of related algorithms.

  • Deep Learning in Practice

    Deep learning use cases around edge computing, deep learning for search, explainability, fairness, and perception.