The number of models in use varies dramatically from institution to institution. A bank with 150 models can face more than 600 model monitoring runs a year.
A traditional reaction would be to pile on more bodies — modelers, data managers, validators — to take on the tasks of data review and validation, model refreshes and reruns, and test results reviews. These demands would require three to five times the staff used today. Most banks don’t want to grow their teams by this magnitude and, even if they did, finding and keeping a qualified team would be nearly impossible.
Instead, they can ease the burden by leveraging capabilities that are available to the industry and, in some cases, using artificial intelligence to speed the process.
TURNING DOWN THE HEAT
As a first step, some banks have created a more structured model management process we describe as a “model factory.”
In a model factory, bankers focus on efficiency and effectiveness by looking at each stage of the planned process: determine what needs to be done, figure out ways to do it with the least effort, set rules on data source usage, and establish common development standards.
This aligns processes to reduce a complex set of production tasks into distinct, rigorously defined roles: architect, data scientist, data engineer, modeler, and model validator. This “factory” environment structures the approach, eliminates unnecessary redundancies, and decreases the amount of rework. Many banks are benefiting from this streamlined and consistently conducted process, but the number of monitoring events keeps increasing. Most teams will be swamped in the next 24 months without a radical re-evaluation of their workflow.
TAKING THE POT OFF THE STOVE
Banks have done everything “humanly possible” to improve this situation. Now they need to take advantage of technology to improve the process in three dimensions.
Data. It is surprising that the biggest challenge banks continue to face is getting and maintaining data in usable condition for modeling. There are no guardrails directing modelers to where and what data can be used, allowing variations in sourcing that then require separate validation processes.
And once in production, source data is often maintained in multiple data stores — each with separate rules, refresh routines, and validation processes. Even banks that are rolling out an enterprise data lake as a single source of truth are struggling to operationalize their data assets and build scalable processes. Seldom is there a complete registry of, and lineage back to, sources, which makes validation cumbersome. Further, the data used to build models are often not the same data used in production.
To solve this conundrum, advanced banks are investing in high-performance data platforms that position the data to be centrally managed, curated, and usable for both modeling and production, with automated checks and balances in place (see Figure 2). All data is cleaned, mapped, and harmonized on first ingestion. Subsequent refreshes can then be fully automated, with rules in place for verifying sourcing and validating each element. As a result, the dream of validating a model without a lengthy, low-value debate on data viability is quickly becoming a reality.
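To make the refresh-time checks concrete, here is a minimal sketch of the idea: each incoming record is verified against a registry of approved sources and per-field validation rules before it enters the curated store. All names here (the registry, the rule table, the record fields) are illustrative assumptions, not the API of any particular bank platform.

```python
# Hypothetical illustration of automated refresh checks: records from
# unregistered sources, or with fields that fail their rules, are
# quarantined rather than loaded into the curated data store.

REGISTERED_SOURCES = {"core_banking", "loan_origination"}  # assumed registry

VALIDATION_RULES = {  # assumed per-element rules
    "balance": lambda v: isinstance(v, (int, float)) and v >= 0,
    "customer_id": lambda v: isinstance(v, str) and len(v) == 10,
}

def validate_refresh(records):
    """Split a refresh batch into (accepted, rejected) record lists."""
    accepted, rejected = [], []
    for rec in records:
        # Guardrail 1: only registered, lineage-tracked sources are allowed.
        if rec.get("source") not in REGISTERED_SOURCES:
            rejected.append((rec, "unregistered source"))
            continue
        # Guardrail 2: every present field must pass its validation rule.
        failed = [field for field, rule in VALIDATION_RULES.items()
                  if field in rec and not rule(rec[field])]
        if failed:
            rejected.append((rec, f"failed checks: {failed}"))
        else:
            accepted.append(rec)
    return accepted, rejected

good = {"source": "core_banking", "balance": 100.0, "customer_id": "CUST000042"}
bad = {"source": "spreadsheet", "balance": -5}
accepted, rejected = validate_refresh([good, bad])
```

In this sketch, validating a model's input data reduces to inspecting the rejection log rather than debating data viability case by case.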