
Save your seat!

Friday 7th June 2019 - Cambridge

The field of genomics has matured to a stage where organisations are sequencing DNA at population scale. However, transforming raw DNA sequencing data into a format suitable for analysis has become the new bottleneck to genomic discovery. Typically, teams glue together a series of bioinformatics tools with custom scripts and process data on single-node machines, one sample at a time. As a result, bioinformatics scientists spend more time building and maintaining pipelines than modelling data. To ease the burden of analysing population-scale genomic data, a number of open-source bioinformatics tools, such as GATK4, Hail, and ADAM, have moved to Apache Spark™, but mastering these tools is no easy task.

In this workshop, we’ll walk through how the Databricks Unified Analytics Platform for Genomics simplifies the end-to-end process of turning raw sequencing data into actionable insights at scale. Introduced by the original creators of Apache Spark, this platform makes it simple to deploy Spark-based bioinformatics tools in the cloud and accelerates common genomic analyses.

Join this half-day technical workshop to learn how to:

Call variants, both in a single sample and across multiple samples, using our accelerated GATK4 pipelines
Use Spark SQL to characterise the association of variants in a population with phenotypes
Use machine learning to model genome-wide disease risk across multiple variants associated with a phenotype of interest

Key technologies employed: GATK4 variant calling, genotype-phenotype association tests, population-scale risk modelling via ML, and ML model training and deployment.

Location:

Wellcome Genome Campus

James Watson Pavilion

Hinxton, Cambridgeshire

CB10 1SA

Agenda:

8:30-9:00 Registration, Breakfast & Networking

9:00-9:50 Opening Remarks, Customer Use Case and Set-up

9:50-10:00 Break

10:00-10:45 Workshop #1: Accelerating Variant Calls with Apache Spark

10:45-11:30 Workshop #2: Characterising Genetic Variants with Spark SQL

11:30-11:45 Break

11:45-12:15 Workshop #3: Disease Risk Scoring with Machine Learning

12:15-12:30 Q&A

Presenter:

Frank Austin Nothaft

Technical Director Healthcare

Databricks

Please fill out the form to confirm your spot

First Name:

Last Name:

Company:

Job Title:

How would you describe yourself

Company Email

Phone Number:

Where do you intend to run your analytic workloads?

Currently Using Apache Spark:

How big is your company? (# of employees)

Dietary Restrictions:

Keep me informed with occasional updates about Databricks and Apache Spark™.
