
Real-time Meatspace Data Science

Jason Walsh, Penn Medicine

Audience level: Intermediate
Topic area: Streaming


Penn Signals is an award-winning microservices software platform for processing real-time clinical data from a variety of systems. This talk demonstrates how the data science team at Penn Medicine has combined open source technologies so that data scientists and researchers can create and use predictive applications that support improvements in health care.



Intro (5 mins)

  • Who we are (2.5 mins)

  • Penn Signals (2.5 mins)

Problem (10 mins)

  • Healthcare data standards or lack thereof

    • HL7
    • Clinical database solutions
    • "RESTful" APIs /s
  • Various types of healthcare data issues

    • Timestamps, latency, time zones (data from the future!)
    • Pipeline availability (where’s my freakin’ data?)
    • Patient identifier inconsistency (80-year-old newborns?!)
    • Multiple fields referring to the same types of data (heart rate, pulse, etc.)
    • AND MORE!
  • Delivering products to clinicians

    • Security
    • Ease of use
    • Integrating with the Electronic Medical Record (EMR)
    • Providing actionable insights
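
Two of the data issues listed above (timestamps that land in the future, and several field names for the same measurement) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the Penn Signals implementation: the alias table, the function names, and the five-minute tolerance are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical alias table: source field names mapped to one canonical name.
FIELD_ALIASES = {
    "heart rate": "heart_rate",
    "hr": "heart_rate",
    "pulse": "heart_rate",
}


def canonical_field(name):
    """Map a source field name to a canonical name (illustrative rules only)."""
    key = name.strip().lower()
    return FIELD_ALIASES.get(key, key.replace(" ", "_"))


def to_utc(ts):
    """Convert an aware timestamp to UTC; refuse naive ones rather than guess."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: source time zone is unknown")
    return ts.astimezone(timezone.utc)


def is_from_future(ts, now, tolerance=timedelta(minutes=5)):
    """Flag a reading stamped later than `now` plus an assumed clock-skew tolerance."""
    return to_utc(ts) > to_utc(now) + tolerance


# A fixed UTC-5 offset stands in for a hospital system reporting local time.
local = timezone(timedelta(hours=-5))
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
reading = datetime(2020, 1, 1, 12, 0, tzinfo=local)  # 17:00 UTC, after `now`

print(canonical_field("Pulse"), is_from_future(reading, now))
```

Rejecting naive timestamps outright is a design choice for the sketch: guessing a time zone is one way readings end up "from the future" in the first place.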

Solution (10 mins)

  • Open source software (5 mins):

  • Putting it all together (5 mins)

  • Demo (5 mins)

    • Demonstrate ingesting data from a remote source and serializing it into Apache Avro format (2.5 mins)
    • Using pyspark with Jupyter Notebook to query serialized data (2.5 mins)
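
To give a feel for the first demo step, here is a rough sketch of what serializing a record into Avro's binary encoding involves. It hand-encodes a flat record using only the standard library so the byte layout is visible; a real pipeline would use an Avro library and the container file format, which embeds the schema. The schema, field names, and the inline JSON payload (standing in for data fetched from a remote source) are hypothetical.

```python
import json
import struct

# Hypothetical schema for one vital-sign reading; field names are illustrative.
SCHEMA = {
    "type": "record",
    "name": "VitalSign",
    "fields": [
        {"name": "patient_id", "type": "string"},
        {"name": "observed_at", "type": "long"},  # epoch milliseconds
        {"name": "heart_rate", "type": "double"},
    ],
}


def encode_long(n):
    """Avro long: zigzag-encode, then emit 7 bits per byte, low bits first."""
    zz = (n << 1) ^ (n >> 63)
    out = bytearray()
    while zz & ~0x7F:
        out.append((zz & 0x7F) | 0x80)  # high bit set: more bytes follow
        zz >>= 7
    out.append(zz)
    return bytes(out)


def encode_string(s):
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_double(x):
    """Avro double: 8 bytes, IEEE 754, little-endian."""
    return struct.pack("<d", x)


ENCODERS = {"long": encode_long, "string": encode_string, "double": encode_double}


def encode_record(schema, record):
    """Avro record: field encodings concatenated in schema order, no field names."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]]) for field in schema["fields"]
    )


# Stand-in for a payload fetched from a remote source.
payload = '{"patient_id": "A1", "observed_at": 64, "heart_rate": 72.0}'
encoded = encode_record(SCHEMA, json.loads(payload))
print(len(encoded), encoded.hex())
```

Because field names never appear in the encoded bytes, a reader needs the same schema to decode them, which is why Avro files carry the schema alongside the data.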

Q & A