Large-Scale Recommendation System with Python and Spark
By Phil Anderson

New product discovery is an established activity within brick-and-mortar grocery stores, but is still ripe for experimentation within an online setting. In this talk, we discuss a customer-level product recommendation system we developed for the Kroger Company, using Python, Apache Spark, and Apache Airflow.

Saturday noon–12:30 p.m. in Cartoon 1

Abstract

We will briefly cover the Kroger Company and its digital properties, along with its current recommendation systems and the need for a new one. We will then take a deep dive into the system we developed, covering the Python API for Spark, a large-scale data processing tool, and the underlying Hadoop Distributed File System (HDFS), focusing on how we used each in our implementation. We’ll also discuss process scheduling and coordination via Apache Airflow, along with its Python API and use of Python eggs. Finally, we will show the recommendation system in action and discuss plans for testing and improvement.

The talk will be organized as follows:

Intro - Context Setting (5 min)

  • What is Kroger?
  • What is 84.51?
    • What is Digital Personalization at 84.51?
  • Landscape: Kroger’s digital properties

  • Current systems are typically retention-focused recommendation systems
    • These tend to work extremely well with grocery’s cyclic purchase patterns
  • Need for Acquisition-based Recommendation System

Body - Technical Deep Dive (20 min)

New Product Recommender - Ensemble Recommendation System

Part 0:

  • Hadoop & Spark, and their Python API

Part 1: Collaborative Filtering

  • Overview
  • Training (PySpark)
  • Implementation (PySpark)
    • Roadblocks
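
To make the collaborative filtering idea in Part 1 concrete, here is a minimal sketch in plain Python. It is not the PySpark implementation covered in the talk: it swaps in a simple item-to-item cosine similarity over binary purchase data, and the customers, products, and function names are all hypothetical. It only illustrates the acquisition-style goal of ranking products a customer has not bought yet.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase history: customer -> set of products bought.
purchases = {
    "c1": {"milk", "bread", "eggs"},
    "c2": {"milk", "bread", "salsa"},
    "c3": {"bread", "eggs"},
    "c4": {"milk", "salsa", "chips"},
}


def item_similarities(purchases):
    """Cosine similarity between products over binary purchase vectors."""
    buyers = defaultdict(set)  # product -> customers who bought it
    for customer, basket in purchases.items():
        for product in basket:
            buyers[product].add(customer)

    sims = defaultdict(dict)
    products = sorted(buyers)
    for a in products:
        for b in products:
            if a == b:
                continue
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                sims[a][b] = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend_new(customer, purchases, sims, k=2):
    """Rank products the customer has never bought by summed similarity."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for product in owned:
        for other, sim in sims.get(product, {}).items():
            if other not in owned:
                scores[other] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


sims = item_similarities(purchases)
recs = recommend_new("c3", purchases, sims)
print(recs)
```

At the scale discussed in the talk, the pairwise loop would not be practical on a single machine, which is one reason a distributed tool such as Spark is used instead.
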

Part 2: Regularized Regression

  • Overview
  • Training
  • Implementation (PySpark)

Part 3: Process Scheduling

  • Overview of Airflow
    • Directed Acyclic Graphs
    • Python directive script layout
    • Python Eggs
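
The directed acyclic graph idea behind Part 3 can be sketched with the Python standard library alone. This is not Airflow's API and not the pipeline described in the talk: the step names are hypothetical, and `graphlib` (Python 3.9+) stands in for the scheduler only to show how declared dependencies determine a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
dependencies = {
    "extract_transactions": set(),
    "train_collaborative_filter": {"extract_transactions"},
    "train_regression": {"extract_transactions"},
    "blend_scores": {"train_collaborative_filter", "train_regression"},
    "publish_recommendations": {"blend_scores"},
}


def run_order(dependencies):
    """Return one order in which every step runs after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # which is why the graph must be acyclic.
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(dependencies)
print(order)
```

The two training steps have no dependency on each other, so a scheduler is free to run them in either order or in parallel; only the edges in the graph constrain execution.
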

Part 4: Live view of system on kroger.com

Conclusion (5 min)

  • Next Steps - Testing

Phil Anderson

Phil Anderson is a Lead Data Scientist at 84.51, a Cincinnati-based analytics services provider and wholly owned subsidiary of the Kroger Company. He is a member of 84.51’s Digital Personalization team, which builds large-scale recommendation systems for Kroger’s digital assets, including Kroger.com and Kroger’s mobile applications. He has worked for 84.51/dunnhumbyUSA for five years in analytics roles related to ad targeting and measurement, CRM program deployment, and the development of enterprise applications for price/promotion evaluation. He holds a bachelor’s degree in economics from the University of Notre Dame and is working on a master’s degree in statistics from Texas A&M University (expected completion 2018).
