Reviewing progress towards IT and operational resilience implementation

Benjamin Brundell, Head of Operational and Technology Risk, Lloyds Banking Group

Below is an insight into what can be expected from Benjamin’s session at Risk EMEA 2023

The views and opinions expressed in this article are those of the thought leader as an individual, and are not attributed to CeFPro or any particular organization.

In what ways have hybrid working schemes affected resilience plans? 

The almost-immediate shift to hybrid working – almost three years ago – was the most significant test of our ability to work away from the office.  That itself wasn’t all smooth: from a colleague point of view, many were ill-equipped to do their best work at home, and we had to look again at our risk tolerances in terms of data and surveillance.

By and large the move to hybrid working didn’t impact our businesses dramatically and was a success story of the Covid response.  We might yet face longer-term effects of this shift, for example, how do we recruit and retain colleagues that are critical to supporting older systems without face-to-face contact time?

I think we can all remember examples earlier in our careers of learning by observing and being part of a wider response that ran from an office where we all sat together.  What happens when we inhibit those experiences?

We’ve added a new and proven tool in terms of our response to resilience challenges, but we need to be conscious of the longer-term, principally human, consequences that might yet impact our overall resilience.

Why should technology recovery programs be integrated into organizational frameworks? 

The remediation of long-standing technology resilience issues is often seen as a necessary evil.  Investment in older technology, even if just to remediate, requires the use of scarce subject matter expertise and exposes us to change risk; on the benefits side, it only allows us to have confidence that the service we expected five years ago will be delivered today with the same predictability.  Often there is no shift in the functionality or agility that comes from these in-situ uplifts.

To counter this, we need to take a different view, and flex some of the organizational frameworks that usually govern investment management and prioritization.  With simple good risk practice, we need to quantify the risk we are willing to tolerate and draw a line at what’s acceptable and what’s not.  This needs to be data driven and encompass the real-world experience that goes with un-remediated technology: time to recover, supplier commitment, ability to upskill SMEs and so on.  This pure risk-based discussion can help in environments where the focus is on cost-control rather than growth and allows the firm to prioritize and remediate within an already-understood framework.

A growth-based view would look at the ‘friction’ that legacy technology causes to transformational change.  To what degree do we unlock a quicker, safer, and more predictable delivery if we move to a modernized infrastructure?  What additional capabilities can we open up if we host these workloads on cloud platforms and take advantage of native services that bring us better insight through data?  Quantifying this convincingly is difficult but can enable the discussion on collective priorities and trade-offs.

Finally, we need to make more effort to ensure remediations are aligned with the strategic demands in terms of resilience; but also, that they don’t inherit the same weaknesses as the platforms they replace – can we be sure that our new deployments will always be up to date, always secure and always available?  If we can’t answer those questions positively, we’re simply storing up a future challenge.

What impact has DORA had within the third-party risk sector? 

I think we’re seeing big vendors view both the EU’s DORA and the UK’s emerging Critical Third Party expectations as two closely linked regulatory objectives.  First, they set a benchmark that the treatment of critical third parties is a significant and cross-jurisdiction priority for supervision.  Even in relatively recent regulatory terms, the outsourcing arrangements I worked on five years ago didn’t have the benefit of very clear and specific expectations for what both the customer and supplier need to do; these types of regulations help establish a benchmark for how we view Operational Resilience and third parties.

Secondly, I think there will be an emerging issue on both coverage and level of assurance – where regulatory objectives can be met by focusing on the subset of critical suppliers whose failure would have a systemic impact; but where financial institutions are hoping or expecting that direct regulation will provide them with additional confidence on their own supplier agreements.  Looking ahead, I think it’s prudent to almost discount the comfort we could take from Critical Third Party supervision until we fully understand how HM Treasury might define the list of CTPs based on their role in systemic stability; the services they provide; and their level of use by firms.

Finally, we need to keep in mind that regardless of the level of confidence we place in suppliers – as mandated through DORA or the UK’s CTP work – regulators still expect senior individuals to ask the right questions and get the right level of assurance in the context of the services they provide.  This month’s enforcement action against the former CIO of TSB is a good illustration that we can’t just assume that a compliant service will be resilient without doing our own risk-based analysis and review.

How can financial institutions effectively coordinate across all departments to ensure a holistic view of resilience? 

This is really tough, especially in the larger banks that have grown through acquisition; or have been cost-led and therefore slowed investment; or have had bigger priorities to focus on.  It’s not unheard of to have a web of technologies on multiple platforms knitted together to form end-to-end systems, used by different parts of the business each with their own priorities, backed by fragile human expertise and a collection of third parties.

The right answer would be to start with a better organization design that views end-to-end services as the building block of the organization, with alignment between services, customer base, the change ‘engine room’, supplier management and then strategic planning and prioritization.  In this environment senior leaders can join their own teams together to form a view of today’s position and the steps needed to achieve the outcomes their business needs.  Naturally this consistency of purpose would also show itself in alignment of policies, standards and a shared view of risks.

Unfortunately, very few of us have the luxury of a greenfield site that looks like that, so the answer is more about using the standard resilience lifecycle to identify what’s important and then joining those elements together with a thread running between each ingredient – overlaid with specific responsibilities for resilience.  This has a lot more friction and feels like the type of approach that drives even more complexity and bureaucracy.

With something as important and non-negotiable as Operational Resilience, I don’t see an option to avoid that extra work, but we can be really smart about what we aim for and what we ask for: forensic prioritization, collapsing side-of-desk roles into a smaller number of more professional full-time jobs, great support from the central functions, and relying where we can on the organizational models that are already in place.

The final overlay here is data – once we understand who’s supposed to be doing what and where we can inject some measurement, it’s possible to drive insight at an end-to-end level and describe the full picture of resilience surrounding individual services – without needing to negotiate around coverage, data sources, or what’s important – because we’ve already done that work in preparation.