Fundamental economic data, financial stock tick data, and alternative data sets such as geospatial or transactional data are all indexed by time, often at irregular intervals. Solving business problems in finance such as investment risk, fraud, transaction cost analysis, and compliance ultimately rests on being able to analyze millions of time series in parallel. Older RDBMS-based technologies do not scale easily when analyzing trading strategies or conducting regulatory analyses over years of historical data.
In this webinar, you will learn how to:
- Build time series functions on hundreds of thousands of tickers in parallel using Apache Spark™.
- Modularize functions in a local IDE and create rich time series feature sets with Databricks Connect.
- Scale pandas-based data preparation that feeds financial anomaly detection or other statistical analyses, using a market manipulation example to show how Koalas makes scaling transparent to the typical data science workflow.
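To give a flavor of the kind of per-ticker time series function discussed above, here is a minimal pandas sketch, not the webinar's actual code. The column names (`ticker`, `event_ts`, `price`), the sample values, and the helper `add_rolling_mean` are all hypothetical. It computes a time-based rolling mean over irregularly spaced ticks, separately for each ticker. Koalas aims to mirror the pandas API on Spark, but whether every call used here is supported there is an assumption that should be checked.

```python
import pandas as pd

# Hypothetical tick data: column names and values are illustrative only.
ticks = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "event_ts": pd.to_datetime([
        "2020-01-02 09:30:00", "2020-01-02 09:30:07", "2020-01-02 09:31:15",
        "2020-01-02 09:30:01", "2020-01-02 09:30:45", "2020-01-02 09:32:10",
    ]),
    "price": [10.0, 10.2, 10.1, 50.0, 49.5, 49.8],
})


def add_rolling_mean(df: pd.DataFrame, window: str = "60s") -> pd.DataFrame:
    """Return a copy of df with a per-ticker, time-based rolling mean of price."""
    # Sort so that each ticker's timestamps are increasing, which
    # time-based rolling windows require.
    out = df.sort_values(["ticker", "event_ts"]).reset_index(drop=True)
    rolled = (
        out.set_index("event_ts")
           .groupby("ticker")["price"]
           .rolling(window)   # window covers (t - 60s, t] for each tick at time t
           .mean()
    )
    # groupby keeps tickers in sorted order and rows in their original order
    # within each ticker, so the values line up with the sorted frame.
    out["rolling_mean_price"] = rolled.to_numpy()
    return out


features = add_rolling_mean(ticks)
print(features)
```

Because the window is defined in time rather than in row counts, ticks arriving at irregular intervals are handled naturally: a tick with no neighbors in the preceding 60 seconds simply averages over itself.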
Presented by:
Ricardo Portilla
Solutions Architect, Databricks