The field of genomics has matured to a stage where DNA sequencing projects have reached population scale. And while many organizations have invested in large genomic datasets like the UK Biobank, few have the expertise or proper technology architecture to turn these massive volumes of raw DNAseq data into actionable insights.
Regeneron, a leading biotech company committed to creating therapeutic innovations, has built one of the world’s most comprehensive genetics databases, with over 500,000 exomes. On its journey to turn this data into novel therapeutic insights, Regeneron encountered numerous challenges. For example, how do you enable fast and accurate queries across more than 300 billion data points? And how do you speed up novel statistical tests on terabyte-scale data?
In this session, Regeneron will share the challenges it faced building one of the world’s largest genetics databases, how it overcame them with a scalable, performant informatics infrastructure powered by Apache Spark™, Databricks, and AWS, and the key lessons learned along the way.
Join this webinar to learn: