Large language models: risk mitigation best practices
Over the past year, AI has captured global attention.
The recent sharp rise in the value of AI technology companies and AI chipmakers reflects a widely shared expectation of expansive AI adoption in the years ahead.
Within the broader AI landscape, it is Large Language Models (LLMs) specifically that are drawing attention. LLMs are machine learning models designed to understand, generate, and manipulate human language. They are trained on vast amounts of text data, allowing them to predict or generate human-like text based on the inputs they receive.
What distinguishes these new types of AI models from conventional ones is the scale at which LLMs operate and their extensive use of unstructured data. This leaves LLMs exposed to data and privacy risks of significantly higher magnitude than their conventional AI counterparts.
LLMs power a wide range of solutions worldwide, with ChatGPT being the most prominent. Some organisations allow their employees to use ChatGPT to boost productivity significantly.
Besides ChatGPT, LLMs are also commonly utilised in several other ways, such as embedded solutions in the form of chatbots used in customer service platforms, transcription and summarisation tools, or document and code writing assistants. Chatbots can serve a variety of purposes within organisations, ranging from providing internal IT support to facilitating interactions with external customers.
To fully understand the risk landscape of any LLM solution and subsequently mitigate these risks, we should focus on how they are implemented and deployed within an organisation.
Generally, these solutions can be categorised into two types: open-source and vendor-provided.
Open source solutions are built on LLMs whose code, and often the trained model weights, are made public, allowing researchers and developers to use and modify them. In contrast, vendor-provided solutions are developed, hosted, and maintained by a specific vendor, making them readily accessible to users, typically through an API.
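To make the distinction concrete, here is a minimal sketch in Python contrasting the two deployment types. Everything in it is illustrative rather than prescriptive: the model names are examples, and the vendor half assumes an API key is available in the environment.

```python
# Vendor-provided: the prompt leaves your network and is processed on the
# vendor's infrastructure, reached through an API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": "Summarise our Q3 report."}],
)
print(response.choices[0].message.content)

# Open source: the model weights are downloaded once, and inference runs
# wherever the organisation chooses to host it.
from transformers import pipeline

generator = pipeline(
    "text-generation", model="mistralai/Mistral-7B-Instruct-v0.2"
)
print(generator("Summarise our Q3 report.", max_new_tokens=200)[0]["generated_text"])
```

In the first case the vendor operates the model and sees the prompt; in the second, the organisation operates the model itself, which is what makes the private-deployment strategies discussed later possible.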
Let’s explore three common risk mitigation strategies that organisations can employ to protect themselves against potential risks.
Upgrade to the enterprise edition
In the case of vendor solutions, some options, such as ChatGPT, are freely available on the web. ChatGPT incorporates several built-in controls, including encryption, network security, and data masking. However, these measures may not meet the privacy and security standards typically required by large organisations.
For this reason, organisations can turn to an enterprise version. OpenAI offers ChatGPT Enterprise, a variant designed to meet the scalability and security needs of businesses and organisations.
It provides the robust security features essential for safeguarding sensitive and proprietary information. Organisations should therefore consider adopting the enterprise version, as it aligns more closely with the security and privacy controls they require.
Deploy a secure gateway
Similarly, for vendor solutions, organisations can implement a secure gateway that inspects all information fed into the solution before it leaves the organisation.
The use of a secure gateway can significantly enhance security provisions, an important aspect when dealing with enterprise-level use. Serving as a checkpoint, the secure gateway controls and inspects data as it moves between disparate networks, such as a private office network and a vendor’s cloud-based services.
This management of traffic can prevent unauthorised access and leakage of sensitive data while also preserving data integrity. Furthermore, it ensures that all data sent to and retrieved from the LLM solution is encrypted, so that intercepted data remains unreadable without the encryption key.
A secure gateway provides numerous benefits in terms of access control and monitoring. Organisations can implement stringent access control measures to determine which team members have access to the LLM solution, thereby reinforcing internal security.
Additionally, a secure gateway maintains logs of all data exchanges between the internal network and the LLM solution, assisting in audits and post-event analyses. Thus, the implementation of a secure gateway could dramatically enhance data security.
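The sketch below illustrates the checkpoint idea in Python: inspect outbound prompts, mask sensitive values, log both directions for audit, and only then forward the request. It is a simplification built on stated assumptions: a production gateway would use a dedicated DLP/PII detection engine rather than two regexes, and forward_to_llm is a hypothetical stand-in for the encrypted call to the vendor's API.

```python
import logging
import re

# Illustrative patterns only; real gateways rely on proper PII detection.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

logging.basicConfig(filename="gateway_audit.log", level=logging.INFO)

def forward_to_llm(prompt: str) -> str:
    """Hypothetical placeholder for the HTTPS call to the vendor's API."""
    return f"(model reply to: {prompt})"

def redact(prompt: str) -> str:
    """Mask sensitive values before the prompt leaves the network."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED:{label}]", prompt)
    return prompt

def gateway(user_id: str, prompt: str) -> str:
    """Checkpoint between the internal network and the vendor's service."""
    safe_prompt = redact(prompt)
    logging.info("user=%s outbound=%r", user_id, safe_prompt)  # audit trail
    reply = forward_to_llm(safe_prompt)
    logging.info("user=%s inbound=%r", user_id, reply)
    return reply

print(gateway("a.user", "Summarise the call with jane@example.com"))
```

The same pattern extends naturally to the access controls mentioned above: the gateway function is a single choke point where user permissions can be checked before anything is forwarded.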
Invest in a private cloud
Another risk mitigation strategy involves deploying an LLM solution on a private cloud. Publicly accessible LLM solutions are typically hosted on a public cloud; in the case of ChatGPT, for instance, interactions take place via OpenAI’s API, transmitting all input data over the public internet.
When an organisation instead hosts an LLM on its private cloud, the data remains fully isolated from external access, even where it moves beyond the organisation’s immediate network.
However, a question arises: if the LLM solution is provided by a vendor and the underlying model isn’t necessarily open source, how can one deploy these solutions on a private cloud?
The answer lies in using an open source alternative, where one is available. Such a model can be deployed and operated privately, offering an experience similar to the vendor-hosted LLM without exposing company data.
Because a self-hosted open source solution never touches vendor-managed networks, enterprise data entered into the system remains entirely under the organisation’s control. Moreover, all network traffic can be confined solely within the organisation’s network, granting it full administrative authority.
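As a sketch of what fully private operation can look like, the snippet below loads an open source model from local storage using Hugging Face’s transformers library. The model name is an illustrative assumption, and local_files_only=True presumes the weights were already downloaded inside the network.

```python
# Minimal sketch of private inference: weights load from local storage,
# and no prompt or completion ever leaves infrastructure the organisation
# controls. The model name is an illustrative example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
# local_files_only=True refuses any network download, keeping inference
# confined to the organisation's own environment.
tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_name, local_files_only=True)

inputs = tokenizer("Draft a reply to this customer complaint:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```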
The open source route also unlocks more advanced techniques, such as Retrieval Augmented Generation (RAG) and fine-tuning, which allow better control over outputs and greater adaptability to specific business processes.
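To show the RAG pattern at a small scale, the sketch below embeds a handful of internal documents, retrieves the one most similar to a query by cosine similarity, and places it in the prompt. The embedding model and the sample documents are illustrative assumptions, and the final generation step is left to a locally hosted LLM such as the one above.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

# Stand-ins for an organisation's internal knowledge base.
documents = [
    "Expense claims must be submitted within 30 days.",
    "Remote work requires manager approval.",
]
doc_vectors = embedder.encode(documents)  # one vector per document

def retrieve(query: str) -> str:
    """Return the document most similar to the query (cosine similarity)."""
    q = embedder.encode([query])[0]
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return documents[int(np.argmax(scores))]

query = "How long do I have to file expenses?"
context = retrieve(query)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be sent to the privately hosted LLM.
print(prompt)
```

Because the retrieval index and the model both live inside the organisation’s network, RAG adds business-specific grounding without reintroducing the data exposure the private deployment was meant to avoid.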
Additional considerations
In addition to the risk mitigation measures discussed above, it’s important to assess and effectively remediate the full spectrum of risks to which LLM solutions are exposed.
For instance, in the case of LLM solutions providing transcription services, one must be mindful of data security, data privacy, and the accuracy and reliability of transcriptions. To mitigate data security risks, data should be encrypted both in transit and at rest to prevent unauthorised access.
Furthermore, establishing access controls and clearly defined user management roles ensures that only authorised personnel can access transcription data. As for data privacy, data anonymisation and masking techniques, similar to the redaction shown in the gateway sketch above, can be highly effective.
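As one concrete illustration of encryption at rest, the sketch below uses the Fernet scheme from Python’s cryptography library. The inline key generation is for illustration only; in practice, keys belong in a key management service, not in application code.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice, fetch from a key management service
fernet = Fernet(key)

transcript = "Customer J. Doe called about invoice 4411."
ciphertext = fernet.encrypt(transcript.encode())  # stored form is unreadable
# Only services holding the key can recover the original text.
assert fernet.decrypt(ciphertext).decode() == transcript
```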
From a governance perspective, it’s essential for organisations to conduct regular security audits. These audits provide an additional layer of oversight and help identify potential security vulnerabilities.
Summary
LLMs have dramatically changed our approach to tasks involving natural language processing, ranging from document summarisation to chatbot deployment. Their ability to understand and generate human-like text brings substantial added value across all industries.
While LLM solutions free us up to engage in more creative tasks, it’s important to remain aware of the risks associated with these tools and strategise plans for effective risk mitigation.
The views expressed here are strictly my own and do not reflect the opinions or beliefs of any organisation I am affiliated with.
About the author
Anil Sood has spent the past 18 years working at the intersection of technology and finance. He currently leads the AI Governance practice at EY Canada. Connect with him on LinkedIn.