Regulation 2.0: the AI tech revolution is here and now
In an increasingly interconnected world, financial services are rapidly blurring boundaries. Firms need to constantly keep track of changing regulatory obligations in different jurisdictions: fragmented, differing rules written in different languages and using different taxonomies. One specific area of technology capable of making a huge impact on regulatory compliance, now and in the future, is artificial intelligence (AI).
Both regulators and financial institutions realise that the compliance function can and should be reconceptualised, offering tremendous cost savings for financial institutions while giving regulators an opportunity for enhanced real-time oversight. There is a huge amount of regulation globally, and now, with COVID-19 and remote work, regulators are starting to understand that moving to the cloud is essential.
2020 has shaped up to be a big year for data science and supervisory technology. Regulators globally are seriously considering speeding up initiatives to digitise regulations. According to a recent survey by the World Bank and the Cambridge Centre for Alternative Finance (CCAF), 72% of regulators said they had either accelerated or introduced initiatives on digital infrastructure in 2020, 58% had done so for regtech or supervisory tech, and 56% for innovation offices. And in November, the Global Financial Innovation Network (GFIN) announced a global sandbox initiative involving 23 regulators.
For firms supervised by multiple regulators, it is very hard to keep track of fast-changing rules because they are published in different formats and taxonomies, which means that implementing changes pushed out by different regulators takes much longer. That is why the most forward-thinking regulators are making serious moves towards digital, machine-readable regulation.
We often hear about digital, machine-readable regulation. What does it really mean in 2020, and what are the latest developments in this space?
The project
At the end of 2019, ClauseMatch was tasked by the Financial Services Regulatory Authority (FSRA) of Abu Dhabi Global Market (ADGM) to fully digitise the ADGM rulebooks, express them as a set of application programming interfaces (APIs) and publish them with innovative tools that firms can interact with dynamically. The main idea was to help regulated financial services firms achieve better compliance and risk management outcomes while reducing regulatory costs and burden.
Following the initial stages of the collaboration, in April 2020 the FSRA (ADGM) launched three proofs of concept, including knowledge graphs and API-enabled rulebooks.
Here are the main goals that the joint team set and achieved in several phases:
- Taxonomise all the content
- Develop AI/machine learning to enable auto-tagging
- Create visual dynamic knowledge graphs
In essence, the team was expected to completely reimagine the regulatory framework. This was achieved by taking the content of the regulatory requirements, automatically categorising it using AI models, creating tags and interlinking it at a granular, paragraph level based on the most common themes.
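The article does not name the production models, but as a minimal sketch (assuming a simple supervised classifier, with hypothetical topic labels and training snippets), paragraph-level auto-tagging might look like this:

```python
# Minimal sketch of paragraph-level auto-tagging (not ClauseMatch's actual model).
# Topic labels and training snippets below are hypothetical illustrations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny expert-labelled training set: regulatory paragraphs -> topic tags.
train_paragraphs = [
    "A firm must maintain adequate capital resources at all times.",
    "Customer due diligence must be performed before establishing a business relationship.",
    "A firm must report suspicious transactions to the regulator without delay.",
    "Capital requirements are calculated as a percentage of risk-weighted assets.",
]
train_tags = ["capital", "aml", "reporting", "capital"]

# TF-IDF features plus logistic regression: a simple, explainable baseline tagger.
tagger = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                       LogisticRegression(max_iter=1000))
tagger.fit(train_paragraphs, train_tags)

# Tag an unseen paragraph from the corpus.
print(tagger.predict(["The firm shall hold sufficient regulatory capital."]))
# -> ['capital']
```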
For most of the AI models, the accuracy score is higher than 90%. The trained models learned to understand financial concepts; when applied to the whole ADGM corpus, they detected hundreds of thousands of occurrences of thousands of entities across various concepts, many of which were not even present at the training stage. Yet the models were able to detect these unseen cases successfully.
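The project's entity models are not public, but an off-the-shelf transformer NER pipeline (the public checkpoint `dslim/bert-base-NER` is a stand-in here, not the project's model) illustrates the same behaviour of recognising entity mentions it never saw verbatim in training:

```python
# Sketch: a transformer NER model tagging entities it never saw verbatim in training.
# The checkpoint "dslim/bert-base-NER" is a public stand-in, not the ADGM project's model.
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

text = ("The Financial Services Regulatory Authority of Abu Dhabi Global Market "
        "collaborated with ClauseMatch in London.")

for entity in ner(text):
    print(entity["entity_group"], "->", entity["word"], f"({entity['score']:.2f})")
# e.g. ORG -> Financial Services Regulatory Authority ...  LOC -> London
```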
This tagging is the first step towards creating dynamic, interconnected knowledge graphs. Enabled by artificial intelligence and advanced natural language processing (NLP) algorithms, the knowledge graphs are designed to represent regulatory data in a structured and visual format.
During the graph-building phase, we aimed to improve and extend the graph's functionality by linking it with internal documentation, not only for analytical purposes but also so that it could take over automatable, repetitive tasks for financial institutions. Creating AI-based interconnected tags and exposing the rules and regulations as a knowledge graph helped connect all the context in and around the words of the regulations, making every word function as a data point.
Clicking on a bubble in the graph takes you exactly to the places where a given topic is covered in the regulation. Visual, dynamic knowledge graphs were then created for internal documentation such as policies, procedures and controls, mapping them to the requirements in a visualised, dynamic form.
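As a minimal sketch (with hypothetical paragraph references and tags), shared tags are what interlink paragraphs into the graph behind those bubbles:

```python
# Sketch: interlinking tagged regulation paragraphs into a knowledge graph.
# Paragraph references and tags are hypothetical illustrations.
from itertools import combinations
import networkx as nx

# Paragraph -> tags, as produced by the auto-tagging stage.
tagged = {
    "COBS 3.2.1": {"client classification", "disclosure"},
    "COBS 3.4.2": {"disclosure", "record keeping"},
    "AML 7.1.1":  {"record keeping", "due diligence"},
}

G = nx.Graph()
for para, tags in tagged.items():
    G.add_node(para, tags=sorted(tags))  # each node points back to a rule paragraph

# Link any two paragraphs that share at least one tag, labelled by the shared themes.
for (p1, t1), (p2, t2) in combinations(tagged.items(), 2):
    shared = t1 & t2
    if shared:
        G.add_edge(p1, p2, themes=sorted(shared))

print(list(G.edges(data=True)))
# [('COBS 3.2.1', 'COBS 3.4.2', {'themes': ['disclosure']}), ...]
```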
We defined the concept and the obligation template, and tagged over a thousand pages to create a training dataset. We then trained the tagging models, evaluated the results with experts, extracted relations, constructed a knowledge graph over the regulatory documents and ran a live demo of real-time document-to-obligation comparison.
The training and inference processes support parallelism, so you can either pay $500 per hour for two hours or $2 per hour for 20 days; either way, the total compute cost is roughly the same (about $1,000), and the only trade-off is turnaround time.
The mechanism can be replicated for any regulator, and this will inevitably have to be done to allow an automated, consistent merge of regulations from various regulators in different jurisdictions, as sketched below.
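As a rough illustration of such a merge (the regulator names and rule references are hypothetical placeholders), two per-regulator graphs can be composed wherever they share tags:

```python
# Sketch: merging per-regulator knowledge graphs into one cross-jurisdiction view.
# Rule references are hypothetical; shared tag nodes become the merge points.
import networkx as nx

uk = nx.Graph()
uk.add_edge("FCA SYSC 6.1", "tag:aml", regulator="FCA")

adgm = nx.Graph()
adgm.add_edge("ADGM AML 7.1", "tag:aml", regulator="ADGM")

merged = nx.compose(uk, adgm)  # shared tag nodes automatically align the two corpora
print(list(merged.neighbors("tag:aml")))
# -> ['FCA SYSC 6.1', 'ADGM AML 7.1']
```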
Knowledge graphs: where do the roots come from?
The general idea of putting information into a knowledge graph and then operating over it has been in the air for a long time, since the late 1980s. Serious developments in this area started at the beginning of the 2000s, and the first notable adoption of the technology was completed successfully by Google in 2012.
Since then, knowledge graphs have clearly become a trend, as more companies such as Airbnb, Uber, Facebook and Amazon have reported building variations of these graphs into their systems. Although reasoning over knowledge graphs remains a challenge for machines, the situation is changing rapidly thanks to recent advances in natural language processing (NLP) and language modelling.
In 2017, a new building block, the transformer, was developed; its attention mechanism allowed models to work with much more sophisticated concepts. The turning point for NLP came in 2018 with the introduction of the Bidirectional Encoder Representations from Transformers (BERT) language model, which brought ML models to a new level of quality. The concept of transfer learning allowed models from various other sectors to be adopted effectively in compliance, and various benchmarks showed that the new models could handle the necessary association inference at or above human level. Unfortunately, the same could not be said for causal inference.
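As a minimal sketch of that transfer-learning pattern (the checkpoint, labels and examples are assumptions, not the project's setup), fine-tuning a pretrained BERT for compliance-paragraph classification looks like this:

```python
# Sketch: transfer learning, adapting a pretrained BERT to compliance tagging.
# Checkpoint, labels and training examples are illustrative assumptions.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["A firm must maintain adequate capital resources.",   # label 1: obligation
         "This chapter applies to authorised persons."]        # label 0: scope
labels = [1, 0]

class ParagraphDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ParagraphDataset(texts, labels),
)
trainer.train()  # the pretrained language knowledge transfers; only the head is new
```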
The race of stacking transformers continues to the present day, delivering new state-of-the-art (SOTA) results at ever higher cost. Models like GPT-3, whose training reportedly cost $4.6 million in computational power, reveal questionable cost-effectiveness and a lack of explainable results. The deep learning research community is clear that the next advances should come not from billions sunk into the language-modelling stage, but from combining the power of transformer-based models with first-order logic over knowledge graphs, and from giving machines a human-like ability to reason about facts using techniques such as reinforcement learning.
The tagging and relation extraction stages of the knowledge graph construction project were often the most challenging in terms of the cost of manual work, but these new models make it possible to automate the whole process. Causal reasoning over graphs is now at the research frontier, as in the recent DeepMind release of 23 October 2020.
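As a simplistic stand-in for those trained extraction models, relation extraction can be sketched as pulling subject-verb-object patterns from a dependency parse:

```python
# Sketch: rule-based relation extraction via dependency parsing.
# A simplistic stand-in for trained extraction models, for illustration only.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm

def extract_relations(sentence):
    """Yield (subject, relation, object) triples from one sentence (head words only)."""
    doc = nlp(sentence)
    for token in doc:
        if token.dep_ == "ROOT" and token.pos_ == "VERB":
            subjects = [w for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
            objects = [w for w in token.rights if w.dep_ in ("dobj", "pobj", "attr")]
            for s in subjects:
                for o in objects:
                    yield (s.text, token.lemma_, o.text)

print(list(extract_relations("An authorised firm must submit an annual report.")))
# -> [('firm', 'submit', 'report')]
```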
All of this progress already allows us to put regulation into code and even to automate judgment on top of it. Digitalisation and the ability to transform all of the regulation into code mean that we can now write a request, in other words a piece of code, feed in all the regulation and get back a knowledge graph that can itself be viewed as code.
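As a minimal sketch of what "regulation as queryable code" can look like (the vocabulary and rule references are hypothetical), RDF triples plus a SPARQL query give a flavour of it:

```python
# Sketch: regulation expressed as queryable triples (hypothetical vocabulary).
from rdflib import Graph, Literal, Namespace

REG = Namespace("http://example.org/regulation/")
g = Graph()

# Encode two rules as subject-predicate-object triples.
g.add((REG.Rule_3_2_1, REG.appliesTo, REG.AuthorisedFirm))
g.add((REG.Rule_3_2_1, REG.requires, Literal("client classification before advising")))
g.add((REG.Rule_7_1_1, REG.appliesTo, REG.AuthorisedFirm))
g.add((REG.Rule_7_1_1, REG.requires, Literal("customer due diligence records")))

# "Which obligations apply to an authorised firm?" answered as a query, not by reading PDFs.
query = """
PREFIX reg: <http://example.org/regulation/>
SELECT ?rule ?obligation WHERE {
    ?rule reg:appliesTo reg:AuthorisedFirm ;
          reg:requires ?obligation .
}
"""
for rule, obligation in g.query(query):
    print(rule, "->", obligation)
```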
Future vision: Regulation-as-a-Service
It's clear that the digital transformation agenda has steadily become one of the most important focuses in financial institution boardrooms, and we are seeing the impact of that on all aspects of our lives. Newer frameworks such as the Second Payment Services Directive (PSD2) and digital banking, as well as guidance on topics such as "ethics in AI" and "encryption and storage of virtual assets", are a good indication of how regulation is reflecting these changes. Yet the rapid evolution of finance is fast making the current analogue regulatory system and frameworks obsolete.
Knowledge graphs are clearly the future of regulation. We now live in a much more digital paradigm and are moving towards digital regulation; in fact, we're witnessing the beginning of regulation 2.0. After Software-as-a-Service, Cloud-as-a-Service and Banking-as-a-Service, we're now moving into Regulation-as-a-Service, which will truly usher in the new developments.
We have been using NLP, semantics and machine learning to identify not only the relevant subjects, objects and concepts but also the relationships between them: the "who", the "what" and, more importantly, the "why". This same process of converting words and sentences into "data points" also means we can see linked and associated words and concepts, i.e. we can start to see context.
The regulatory corpus used to be published in an analogue format, complicated to process and hard to navigate. Graph representations enable the regulator to infer new relationships, gain a deeper understanding and uncover patterns within the regulation that would not otherwise have been spotted. During our joint project, all of the content of the regulation was expressed as knowledge graphs and APIs.
Essentially, we created semantic triples that can be turned into code, so we can start expressing specific parts of the rules and regulations as code, as sketched below. It is exciting, as this milestone marks the beginning of a very interesting journey.
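As a final hypothetical sketch (the rule, relation name and threshold are invented for illustration, not drawn from the ADGM rulebooks), here is one way a semantic triple can be "turned into code" as an executable compliance check:

```python
# Sketch: turning a semantic triple into an executable compliance check.
# The rule, relation name, threshold and firm-data fields are hypothetical.
from typing import Callable

# Triple extracted from the rulebook: (subject, relation, object).
triple = ("AuthorisedFirm", "must_hold_minimum_capital_usd", 250_000)

def compile_rule(triple) -> Callable[[dict], bool]:
    """Compile a (subject, relation, threshold) triple into a check over firm data."""
    subject, relation, threshold = triple
    if relation == "must_hold_minimum_capital_usd":
        # Rule applies only to firms of the given type; non-applicable firms pass vacuously.
        return lambda firm: (firm["capital_usd"] >= threshold
                             if firm["type"] == subject else True)
    raise ValueError(f"no compiler for relation {relation!r}")

check = compile_rule(triple)
print(check({"type": "AuthorisedFirm", "capital_usd": 300_000}))  # True  (compliant)
print(check({"type": "AuthorisedFirm", "capital_usd": 100_000}))  # False (breach)
```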