How Databricks and Machine Learning Are Powering the Future of Genomics

On-Demand Webinar
As the cost of sequencing a single genome has dropped drastically, many organizations across biotechnology, pharmaceuticals, biomedical research, and agriculture have begun to use genome sequencing. A single genome can provide insight into the individual who was sequenced, but deriving the most insight from genomic data requires querying across a cohort of hundreds to thousands of individuals.

Join this webinar to learn how Databricks, powered by Apache Spark™, enables interactive queries across a database of genomic data and simplifies applying machine learning models and statistical tests to genomic data across patients, yielding more insight into the biological processes driven by genomic alterations.
In this webinar, we will:

  • Demonstrate how Databricks can rapidly query annotated variants across a cohort of 1,000 samples.
  • Look at a case study that uses Databricks to improve the performance of an expression quantitative trait loci (eQTL) test across samples from the GEUVADIS project.
  • Show how we can parallelize conventional genomics tools using Databricks.
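
For readers unfamiliar with eQTL testing, the statistical core of the second item can be sketched in a few lines. This is a minimal illustration, assuming the common formulation in which gene expression is regressed on genotype dosage (0, 1, or 2 copies of the alternate allele). It is not the implementation shown in the webinar; the function name `eqtl_linear_test` and the sample values are hypothetical, and distributing the test across many variant-gene pairs with Spark is not shown.

```python
from math import sqrt


def eqtl_linear_test(dosages, expression):
    """Regress expression on genotype dosage for one variant-gene pair.

    Hypothetical helper for illustration only.
    Returns (slope, t_statistic) from ordinary least squares.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired samples")

    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n

    # Sum of squares of the genotype and its cross-product with expression.
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant has no variation in this cohort")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares gives the standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, expression))
    std_err = sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else float("inf")
    return slope, t_stat


# Made-up example: six samples, expression rising with alternate allele count.
example_dosages = [0, 0, 1, 1, 2, 2]
example_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, t_stat = eqtl_linear_test(example_dosages, example_expression)
```

Because each variant-gene pair is tested independently, a function like this could in principle be applied in parallel across pairs, which is one reason a cohort-scale eQTL analysis suits a distributed engine.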

Frank Austin Nothaft
Genomics Data Engineer

Frank Austin Nothaft is a Genomics Data Engineer at Databricks and a PhD candidate in Computer Science at UC Berkeley. Frank is one of the core developers on the ADAM/Big Data Genomics project, pioneering methods for processing large genomics datasets using Apache Spark. Frank holds a Master of Science in Computer Science from UC Berkeley and a Bachelor of Science with Honors in Electrical Engineering from Stanford University. Prior to joining UC Berkeley, Frank worked at Broadcom Corporation on design automation techniques for industrial-scale wireless communication chips.

Sign up today