Webinar
Public Sector Code-Along with Azure Databricks and MoJ's Splink
Available On-demand
Code-Along Materials & Solutions Accelerator Launch
The first mission of the UK Government's National Data Strategy is to unlock the value of data across the economy. The strategy notes that there is much untapped potential in linking data sets from different organisations.
However, many organisations struggle to link their own data sets, let alone link them with data sets from a completely different organisation. Time that analysts and other data professionals spend retrieving, merging, cleaning and verifying data is time not spent on the valuable work of understanding their analysis and synthesising it into actionable information.
In January, Microsoft, Databricks and the Ministry of Justice welcomed a number of Public Sector colleagues to a code-along at Imperial College and the National Innovation Centre for Data. At the event, participants collaborated on data challenges using Azure Databricks and the MoJ's Splink package.
In this session, join the Databricks, Microsoft and Splink teams as they recap the code-along and share post-event updates.
Agenda
Code-Along recap (5 minutes): Isabella Puscasu, Data and AI Specialist, Public Sector, Microsoft
Benefits of Public Sector communities collaborating on data problems and Splink updates (15 minutes): Ross Kennedy, Lead Data Scientist, MoJ
Databricks post-event updates & demo (10 minutes): Robert Whiffin, Solutions Architect, Public Sector, Databricks
Q&A (15 minutes)
Automated Record Connector (ARC)
ARC simplifies the process of data linking. Building on the MoJ’s powerful Splink library, ARC can produce a data-linking model with just a few lines of code, with no expert data-linking knowledge required.
Tightly integrated with Spark, MLflow and Hyperopt, ARC intelligently chooses the best set of model parameters. Spark provides the scale to link billions of records, MLflow provides built-in tracking for full reproducibility, and Hyperopt provides Bayesian parameter optimisation.
Simplifying the requirements for data linking increases the pool of potential projects. For example, automatic linking allows for low-effort evaluation of the potential links between different data sets. Rather than committing valuable analyst time to a prospective linking project, use software to first determine which projects are worth investing in.
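To make the idea of a low-effort linkage evaluation concrete, here is a minimal plain-Python sketch. It is not the ARC or Splink API: the function names (similarity, estimate_link_rate), the toy records and the 0.85 threshold are all assumptions made for illustration, and it uses simple string similarity where Splink uses a probabilistic model. It only shows the shape of the question being asked: roughly what share of records in one data set have a plausible match in another?

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalised strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def estimate_link_rate(left, right, fields, threshold=0.85):
    """Fraction of `left` records with at least one plausible match in `right`.

    A pair counts as a plausible match when the mean similarity across
    `fields` reaches `threshold` (an arbitrary, illustrative cut-off).
    """
    if not left:
        return 0.0
    linked = 0
    for l_rec in left:
        for r_rec in right:
            score = sum(similarity(l_rec[f], r_rec[f]) for f in fields) / len(fields)
            if score >= threshold:
                linked += 1
                break  # one plausible match is enough for this record
    return linked / len(left)


# Toy, entirely fictional records standing in for two organisations' data sets.
dataset_a = [
    {"name": "Jane Smith", "city": "Leeds"},
    {"name": "Omar Khan", "city": "Bristol"},
    {"name": "Li Wei", "city": "Cardiff"},
]
dataset_b = [
    {"name": "Jane Smyth", "city": "Leeds"},
    {"name": "Omar Khan", "city": "Bristol"},
    {"name": "Ana Souza", "city": "York"},
]

rate = estimate_link_rate(dataset_a, dataset_b, ["name", "city"])
print(f"Estimated share of linkable records: {rate:.0%}")
```

This brute-force comparison of every pair only works for tiny samples. Linking at the scale described above depends on the distributed processing that Spark provides.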
Read more in our recent blog post.
This event is being held in collaboration with Microsoft. Please see the Microsoft Privacy Policy here.
© Databricks 2025. All rights reserved. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation.