
Transforming Legacy Code to Leverage Spark

Rachita Chandra

Audience level: Intermediate
Topic area: Modeling

Description

In this session we cover how we selected and prioritized the components of a solution to leverage Spark, and the issues we faced while developing and testing the transformed solution with Spark code. The solution is an end-to-end, multi-tenant enterprise application comprising several components: data transformations, quality checks, analytics, ML functions, and visualization.

Abstract

We built a prototype for identifying key risk predictors for high-cost patients using claims and membership data. It worked well on data for 5 million patients in a non-distributed environment. The prototype utilized several Python data science libraries and Machine Learning (ML) models.

In this session we cover how we selected and prioritized the components of the solution to leverage Spark, and the issues we faced while developing and testing the transformed solution with Spark code. This session will be useful for attendees interested in leveraging Spark for existing solutions. Attendees will take away the following:

  • Finding compatible components across both environments (non-distributed and Spark environment)

  • Architectural changes and mapping libraries from non-distributed environment to distributed cluster and modifications needed in the existing codebase to leverage Spark

  • Challenges faced while porting existing Machine Learning modules to use Spark