On-Demand Webinar: How Regeneron Accelerates Genomic Discovery at Biobank-scale with Apache Spark™

The field of genomics has matured to a stage where DNA sequencing projects have reached population scale. And while many organizations have invested in large genomic datasets like the UK Biobank, few have the expertise or proper technology architecture to turn these massive volumes of raw DNAseq data into actionable insights.

Regeneron, a leading biotech company committed to creating therapeutic innovations, has built one of the world’s most comprehensive genetics databases with over 500,000 exomes. On their journey to turning this data into novel therapeutic insights, Regeneron encountered numerous challenges. For example, how do you enable fast and accurate queries from >300B data points? And how do you expedite novel statistical tests on TB-scale data?

In this session, Regeneron will share the challenges they faced building one of the world’s largest genetics databases, how they overcame these challenges with a scalable and performant informatics infrastructure powered by Apache Spark™, Databricks and AWS, and the key lessons learned along the way.

Join this webinar to learn:

  • About the role genomics plays in accelerating drug development at Regeneron
  • What challenges they faced turning 500k exomes and electronic medical records into actionable insights
  • How Apache Spark, Databricks and AWS enable them to easily scale informatics and improve query speeds by 600x
  • Demo of a machine learning model for genome-wide disease risk scoring powered by Apache Spark and Databricks
