Webinar
Public Sector Code-Along with Azure Databricks and MoJ's Splink
Available On-demand
Code-Along Materials & Solutions Accelerator Launch
The first mission of the UK Government's National Data Strategy is to unlock the value of data across the economy, and it notes that there is much untapped potential in linking data sets from different organisations.
However, many organisations struggle to link their own data sets, let alone link them with data sets from a completely different organisation. When analysts and other data professionals spend huge amounts of time retrieving, merging, cleaning and verifying their data, that is time not spent on the valuable work of understanding their data and synthesising their analysis into actionable information.
In January, Microsoft, Databricks and the Ministry of Justice welcomed a number of Public Sector colleagues to a code-along at Imperial College and the National Innovation Centre for Data. At the event, participants collaborated with Public Sector colleagues on data-linking challenges using Azure Databricks and the MoJ's Splink package.
In this session, join the Databricks, Microsoft and Splink teams as they recap the code-along and share what's new:
Agenda
Code-Along recap
5 Minutes
Isabella Puscasu, Data and AI Specialist, Public Sector, Microsoft
Benefits of Public Sector communities collaborating on data problems and Splink updates
15 Minutes
Ross Kennedy, Lead Data Scientist, MoJ
Databricks post event updates & demo
10 Minutes
Robert Whiffin, Solutions Architect, Public Sector, Databricks
Q&A
15 Minutes
Automated Record Connector (ARC)
ARC simplifies the process of data linking. Building on the MoJ’s powerful Splink library, ARC can produce a data-linking model with just a few lines of code, without any expert data-linking knowledge required.
Tightly integrated with Spark, MLflow and Hyperopt, ARC intelligently chooses the best set of model parameters. Spark provides scale to link billions of records, MLflow provides built-in tracking for full reproducibility, and Hyperopt provides a Bayesian parameter optimisation approach. Simplifying the requirements for data linking increases the pool of potential projects. For example, automatic linking allows for low-effort evaluation of the potential links between different data sets - rather than committing valuable analyst time to a prospective linking project, use software to first determine which projects are worth investing in.
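To give a feel for the approach, Splink is built on the Fellegi-Sunter model of probabilistic record linkage: each field comparison contributes a Bayes factor based on how often that field agrees among true matches (m) versus non-matches (u), and the log of these factors sums to a match weight. The sketch below is a minimal, self-contained illustration of that scoring idea only; the field names and m/u values are invented for the example and are not Splink or ARC output (in practice, Splink estimates these parameters from the data).

```python
import math

# Illustrative m/u probabilities (assumptions, not estimated values):
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
M_U = {
    "first_name": (0.90, 0.010),
    "surname":    (0.90, 0.005),
    "dob":        (0.95, 0.001),
}

def match_weight(record_a, record_b):
    """Sum log2 Bayes factors: m/u when a field agrees, (1-m)/(1-u) when it disagrees."""
    weight = 0.0
    for field, (m, u) in M_U.items():
        if record_a[field] == record_b[field]:
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight

a = {"first_name": "Jane", "surname": "Smith", "dob": "1980-01-01"}
b = {"first_name": "Jane", "surname": "Smith", "dob": "1980-01-01"}
c = {"first_name": "John", "surname": "Jones", "dob": "1975-06-30"}

print(match_weight(a, b))  # strongly positive: all fields agree
print(match_weight(a, c))  # strongly negative: all fields disagree
```

A positive weight is evidence the pair is a match, a negative weight evidence it is not. Splink and ARC apply this logic at scale on Spark, with blocking rules to avoid comparing every pair of records.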
Read more in our recent blog post.
This event is being held in collaboration with Microsoft. Please see the Microsoft Privacy Policy here.
Watch Now
© Databricks 2024. All rights reserved. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation.