Accelerate and Scale Joint Genotyping in the Cloud

December 10th, 2019 - 10:00 AM - 11:00 AM PST
Simplify multi-sample variant calling with Databricks, Apache Spark™, Delta Lake and Glow

Many organizations have successfully scaled single-sample variant calling pipelines to support hundreds of thousands of whole genomes. Multi-sample variant calling is the next step toward improving the accuracy of these population-scale studies. However, moving from per-sample gVCFs to a single project VCF is a challenge. Organizations struggle to orchestrate the GATK’s CombineGVCFs and GenotypeGVCFs commands across tens of thousands of gVCFs. This is even harder in the cloud, because the storage layers of the GATK’s joint genotyping stack are difficult to integrate with cloud storage systems.

To simplify this process, the Databricks Unified Data Analytics Platform for Genomics offers a fully managed implementation of the GATK4’s joint genotyping engine that takes less than 5 minutes to configure, improves CPU efficiency by 2x and seamlessly scales in the cloud.

Join this webinar to learn:
  • About the opportunities and challenges presented by multi-sample variant calling
  • How Databricks rearchitected the GATK4’s Joint Genotyping engine to leverage Apache Spark and Delta Lake to scale for larger cohorts while retaining high accuracy
  • How to set up joint genotyping in the cloud in less than 5 minutes, shown in a live demo on the Databricks Genomics Runtime
  • How joint genotyping can be combined with Project Glow—open source software jointly developed by Databricks and the Regeneron Genetics Center—to rapidly move into tertiary analytics on genotype data

Speakers:
  • Frank Austin Nothaft, Healthcare and Life Sciences Technical Director, Databricks 
  • Michael Ortega, Industry and Solutions Marketing, Databricks
