Decision support using R and DataOps at a European Union bank regulator

Authors

  • Jonas Bergstrom, Single Resolution Board

  • Nicolas Pochet, Single Resolution Board

The views expressed by the authors in this article are personal and cannot be attributed to the institutions to which they are or were affiliated.

Abstract

The Single Resolution Board (SRB) is the Resolution Authority1 for the EU Banking Union. Its mission is to promote financial stability and protect EU taxpayers by safely managing any failures of large banking groups. In the event of a bank failure, the SRB carries out a public interest assessment to determine whether the bank should be resolved or go through normal insolvency proceedings.

The SRB develops quantitative models to estimate the impact of bank failures on EU member states and the financial system. Decisions taken by the SRB can have significant financial impact and may be subject to legal proceedings. It is therefore crucial that all SRB models are correct, reliable, reproducible and auditable.

To address these needs, the SRB has adopted a DataOps methodology centered around an RStudio-based infrastructure. This has allowed the SRB to deploy high-quality models and data pipelines while at the same time reducing delivery times. This note describes the infrastructure and processes adopted by the SRB, initial results and some remaining challenges.

Keywords

R package deployment, R application deployment, RStudio Server, RStudio Connect, DataOps

Findings

Meeting the requirements placed on SRB quantitative models calls for appropriate infrastructure as well as processes. The SRB has adopted a set of core principles inspired by the DataOps Manifesto2 and is developing its infrastructure and processes in line with these principles.

Infrastructure

The SRB infrastructure is centered on an RStudio Team deployment consisting of RStudio Workbench, RStudio Connect and RStudio Package Manager. Around this RStudio core are an Elastic Stack setup used for centralized logging, an Atlassian Bitbucket server for version control and Atlassian Bamboo for continuous integration and deployment. All RStudio products are deployed behind an F5 load-balancing service. System and user configuration is managed with Ansible, ensuring that configuration is reproducible and that server nodes can be added or upgraded as needed.

Figure: The SRB R infrastructure.

The majority of the data used for analysis is held in an Oracle-based RDBMS that stores versioned regulatory reports submitted by banks to the SRB. Other sources of data include ECB/Eurostat statistical data and financial market data (e.g. Bloomberg, IHS Markit). Smaller data sets are stored as pins in RStudio Connect, enabling logging as well as data set versioning.
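As an illustration, a minimal sketch of how an analysis might read from these sources is shown below. The ODBC DSN, table and pin names are hypothetical, and the pins calls assume the pins 1.x API with an RStudio Connect board.

```r
library(DBI)
library(pins)

# Connect to the Oracle RDBMS through an ODBC data source (hypothetical DSN name)
con <- dbConnect(odbc::odbc(), dsn = "SRB_ORACLE")

# Read one version of a regulatory report (hypothetical schema, table and column)
liabilities <- dbGetQuery(
  con,
  "SELECT * FROM reporting.liability_structure WHERE report_version = 3"
)
dbDisconnect(con)

# Smaller, shared data sets are read as pins from RStudio Connect
board <- board_rsconnect()  # authenticates via CONNECT_SERVER / CONNECT_API_KEY
lei_mapping <- pin_read(board, "data-team/lei_mapping")  # hypothetical pin name
```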

The full setup is replicated in four environments (Development, Test, Acceptance and Production). Development and Test are used for the development of supporting R packages and for testing infrastructure changes, while Acceptance and Production have access to confidential data and are used for model development.

Processes

Model, Data Pipeline and Package development

SRB data pipelines and models are developed using RStudio Workbench and deployed to RStudio Connect as Shiny applications or RMarkdown documents. Reusable code is developed as R packages and made available through an internal RStudio Package Manager.
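As an example, analysts can install such an internal package by pointing R at the Package Manager repository; the repository URL and package name below are hypothetical.

```r
# Point R at the internal RStudio Package Manager repository (hypothetical URL),
# which also proxies CRAN, then install an internal package in the usual way.
options(repos = c(SRB = "https://rspm.srb.local/internal/latest"))
install.packages("srbmodels")  # hypothetical internal package
```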

Every project is tracked using four branches in Bitbucket, each corresponding to an infrastructure environment (Development, Test, Acceptance and Production). As models are developed using live confidential data, model development is restricted to the Acceptance and Production environments (no confidential data is allowed on the Development and Test systems). Supporting libraries, Shiny modules and projects that do not depend on live data go through all four environments.

Data pipelines are deployed as scheduled RMarkdown documents on RStudio Connect. Typically, the final data is published as a pin in RStudio Connect (for small data sets) or inserted into the Oracle infrastructure (for larger data sets such as financial market data).
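The sketch below illustrates what the publishing step of such a pipeline could look like; the helper fetching the data, the pin name, the Oracle table and the size threshold are all illustrative rather than the actual SRB code.

```r
library(DBI)
library(pins)

# Hypothetical helper standing in for the real market data source
market_data <- fetch_market_data(as_of = Sys.Date())

if (nrow(market_data) < 1e5) {
  # Small result: publish as a versioned pin on RStudio Connect
  board <- board_rsconnect()
  pin_write(board, market_data, name = "market_data", type = "rds")
} else {
  # Large result: append to the Oracle infrastructure
  con <- dbConnect(odbc::odbc(), dsn = "SRB_ORACLE")
  dbWriteTable(con, "MARKET_DATA", market_data, append = TRUE)
  dbDisconnect(con)
}
```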

Projects and pipelines integrate automated testing using testthat or shinytest and are instrumented with futile.logger to ensure that business, application and security events are logged to the Elastic Stack.
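The snippet below shows one way such instrumentation could look with futile.logger: JSON-formatted events are written to a log file which a shipper (for example Filebeat) can forward to the Elastic Stack. The logger name, file path and shipping mechanism are assumptions, not a description of the exact SRB configuration.

```r
library(futile.logger)

# Dedicated logger writing JSON events to a file picked up by the log shipper
flog.logger(
  "srb.pipeline",
  threshold = INFO,
  appender  = appender.file("/var/log/srb/pipeline.log")
)
flog.layout(layout.json, name = "srb.pipeline")

# Business and application events emitted from a pipeline
flog.info("Market data refresh started", name = "srb.pipeline")
flog.warn(sprintf("Missing quotes for %d instruments", 12), name = "srb.pipeline")
```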

R Package and Application deployment

The SRB strives to publish reusable code as internal packages, be it for statistical models or for Shiny modules. R package development follows the automated workflow below:

Figure: Automated testing and deployment of SRB R packages.

  1. The package is developed with RStudio Server and tested locally
  2. Changes are pushed to the correct Bitbucket branch
  3. The user creates a pull request that can be reviewed if necessary
  4. Bitbucket triggers the tests on Bamboo
  5. Bamboo spawns a new container from an R Docker image, executes the tests and sends the results back to Bitbucket (see the sketch after this list)
  6. The pull request is merged automatically (or after peer review)
  7. RStudio Package Manager fetches the latest commits
  8. RStudio Package Manager builds and publishes the new version of the package in the appropriate environment (Development, Test, Acceptance or Production)
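In step 5, the test run inside the container can be as simple as an R CMD check executed programmatically; a minimal sketch of such a build script is shown below (the actual Bamboo task configuration is not reproduced here).

```r
# Run inside the R Docker container spawned by Bamboo: a full R CMD check,
# which also executes the testthat suite, failing the build on any warning.
install.packages("rcmdcheck")
rcmdcheck::rcmdcheck(path = ".", args = "--no-manual", error_on = "warning")
```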

Shiny applications are tested and deployed using a similar workflow; in the final step, Bamboo publishes the application directly to RStudio Connect, as sketched below.
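The deployment step could, for instance, be a call along the following lines issued by the build agent; the server name, account and application name are hypothetical, and the SRB setup may equally use the RStudio Connect API directly.

```r
# Register the RStudio Connect server and an API-key user for the build agent
# (hypothetical server URL and account; the key is supplied by the build environment)
rsconnect::addConnectServer("https://connect.srb.local", name = "srb-connect")
rsconnect::connectApiUser(
  account = "bamboo-deploy",
  server  = "srb-connect",
  apiKey  = Sys.getenv("CONNECT_API_KEY")
)

# Publish the checked-out Shiny application
rsconnect::deployApp(appDir = ".", appName = "resolution-dashboard", server = "srb-connect")
```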

Figure: Automated testing and deployment of Shiny applications.

Application Logging

A key component of the SRB approach is the centralized collection of business, application and security events in the Elastic Stack. These logs are used to generate notifications in case of abnormal events or failed data quality checks.
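For instance, a failed data quality check in a pipeline can be surfaced as a structured log event on which an Elastic alerting rule notifies the responsible team. The check below is a toy illustration reusing the hypothetical logger configured earlier.

```r
library(futile.logger)

# Toy data quality check: flag regulatory reports with missing total assets
reports <- data.frame(lei = c("A1", "B2", "C3"), total_assets = c(100, NA, 250))

n_missing <- sum(is.na(reports$total_assets))
if (n_missing > 0) {
  flog.error(
    sprintf("Data quality check failed: %d report(s) without total assets", n_missing),
    name = "srb.pipeline"
  )
}
```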

Figure: Logging from RStudio Connect to the Elastic Stack.

Conclusions

The adoption of R, RStudio and DataOps principles has allowed the SRB to move from largely manual analyses to automated pipelines that provide logging and feedback to users and developers. As an example, the time needed to add a new data or analytical pipeline has been reduced from several months to 1-2 weeks. Some challenges remain, such as the management of interconnected data pipelines in RStudio. The SRB will continue to improve its infrastructure and processes to address these challenges and further increase quality while reducing delivery times.

Acknowledgements