Lessons at the Intersection of Security and AI



Here on the Open at Intel podcast, we like to ask guests what area in open source currently excites them. The most common answer? AI. With the AI industry at an inflection point, some of the best minds in open source are consumed with the impact AI will have on open source. While the possibilities are unknown, one thing is for certain—for AI models to be widely adopted, they need to be secure. 

At All Things Open 2023, we sat down with open source leader Christine Abernathy to hear about her work in security and AI. The conversation covers common types of attacks and how to mitigate risks. Abernathy says to think like a hacker: “I’ve learned it’s almost like a game...people are creative and they're going to find ways to try to get past the system you’ve built.” 

Listen to the full episode here. This conversation has been edited and condensed for brevity and clarity. 

Katherine Druckman: You gave a timely talk here at All Things Open. Will you introduce yourself and tell us about your talk?  

Christine Abernathy: I’m currently at F5, which focuses on ensuring apps and APIs are delivered in a secure, optimized fashion. I’m also a part of the Open Source Security Foundation (OpenSSF) because it intersects with that work. When I joined OpenSSF, there was a working group related to AI/ML, which I jumped into because—like everyone the week after ChatGPT launched—I was talking about it a lot and I could see the possibilities. Between the renewed interest in AI/ML and my interest in security because of my work at F5 and OpenSSF, I felt like this would be a good talk to give.  
So full disclaimer: The first thing I did when putting together the talk proposal was ask ChatGPT to give me some ideas for a talk about AI and security. It gave me a nice summary, and I tweaked it and made it my own. The talk got accepted at All Things Open, and I was quite excited and thought, “Oh, now I have to learn more about this topic.” It wasn’t so bad, because ever since I became fascinated with ChatGPT, I’ve attended a couple AI conferences and tried to keep up with the news. The space is so fast it’s dizzying. But I decided that by the time the talk was here, I’d share what I know. I’m not an expert, but I think everyone has something to share, even if it’s just news someone may not know because they weren’t paying attention to social media last week. 

Vulnerability Points

Katherine Druckman: What concerns around AI and security do you highlight in your talk? 

Christine Abernathy: In the context of an open source project, somebody could submit malicious code. For example, maybe the creator of a chat application is naïve and thinks that everyone’s going to put only the nicest things in the application. We’ve seen this exploited over time, most notably in 2016 when Microsoft released its chat application, Tay. It was supposed to be a nice, friendly chatbot that helps people on Twitter, but within a day, Microsoft had to take it down because the internet community trained it to say very harmful things. 

Taking the same concept further, usually when large language models (LLMs) are trained, they’re given guidance that helps them learn to be a friendly chatbot intended to help people. Behind the scenes, that’s what’s called a system prompter, or sometimes a meta prompt. When a user enters their question, it comes together with the system prompter and both are fed into the LLM. Someone could add something a little extra that tricks the model. So instead of telling the model, “You are a friendly chatbot,” it says, “Forget everything I just told you.” It’s like a form of malicious code, but in this case, it’s a malicious input to trick the LLM into what they call acting out of alignment. Sometimes it’s just for fun and trying to game the system, but it could cause harm, especially if it makes its way back into where the LLM is trained. There are some things that you should extend in any new system being developed. Think like a bad actor and see what can be done to cause harm. I’ve also learned it’s almost like a game. Every time you patch something, it becomes a game again and people try to break it. Security is never done. People are creative, and they’re going to find ways to try to get past the system you’ve built. 

Mitigating Risks

Katherine Druckman: Could you share a few methods for mitigating these kinds of risks?  

Christine Abernathy: There are different ways you can attack a model. When a machine model is being trained, it’s given a lot of data so that it can learn over time and improve its experience and then apply it to new data it hasn’t seen before. During that process, somebody could do something called data poisoning. The way you’d mitigate that is to sanitize your data. And there are some basic things around access control—who has access to your model to train? Are you making it available so users can train it continuously, like in the Microsoft Tay example? Are you accepting the data as is and assuming it’s good, or are you cleaning and validating the data?  

There are also things people can do after the model has been trained, such as replicating the model. In supervised learning, you give the model lots of labeled data and then make adjustments to get the right output. If your model is accessible through, say, an API, people can mimic the training to create a copy of your model. Once they’ve replicated your model, they can take it offline and ask it a lot of questions to see which inputs they can use to make the model do what it’s not supposed to do. Once they’ve got that magic info, they can use it on the original model. For example, attackers replicated a spam detection model and found a way to get the model to misclassify spam. The way to mitigate that is to make it slightly harder for attackers to replicate the model, like changing the output slightly in such a way that it’s hard to make the correlation between input and output. Another mitigation tactic is good, old-fashioned monitoring. Look into who’s making the calls. Because attackers have to send the model a lot of inputs, you may be able to identify when it’s happening. It’s the same thing in the real world when somebody’s trying to rob a bank and they hang around the bank a lot. Limiting model access is one way to mitigate that scenario. 

For prompt injection, where people try to trick your model out of alignment once it’s been deployed, the industry has been trying to mitigate these attacks through adversarial training. Adversarial training could be identifying prompts that are used to trick you and then training your model to recognize them. However, attackers are always coming up with new tricks, so prompt injection is a moving target. You just have to stay on top of it and keep up with the research. 

Another thing that people are talking about is a dual LLM architecture, where one model handles untrusted input and another model executes tasks. For example, ChatGPT was initially focused on question-and-answer tasks, but plugins extended the functionality so that you go to a site, ask a question about travel, and the LLM helps you book a trip. In open source, the LangChain library mimics this process, so you could have an LLM that can automatically send an email for you. In these scenarios, you could have one LLM that’s just taking in the untrusted input, then you have a trusted boundary where the other LLM is interacting with that other piece of it. It’s like you’re treating one LLM as a trusted user and you’re treating another LLM as though it can actually do the hard stuff. Additionally, make sure there’s a human in the middle. For example, if an application is supposed to send an email, have somebody go in and approve it first. A human in the loop is another way to have someone perform privileged operations. 

Protecting End Users

Katherine Druckman: Related to keeping the data model secure, there’s an adjacent conversation to be had about protecting the privacy of end users. What are your thoughts about how this applies to end users? 

Christine Abernathy: Security might not be enough. That’s where regulation could play a role. Part of what governments are supposed to be doing is protecting their citizens, and part of that protection could be data privacy. For instance, the European Union has an AI Act with the goal of protection. Now a lot of companies are thinking about AI in the sense of transparency, safety, and explainability—can I explain why the AI model did this? That’s going to help combat things like bias. I think this is where regulation could step in, but the pace of AI innovation moves so fast, even for governments.  

I think about security and privacy hand in hand because if someone compromises security, usually they’re trying to get to your data. The solution may involve fines or damages to stop or slow down the attacker. It doesn’t mean that they’re going to stop. I read somewhere that if the security industry were a country, it would be the fourth-largest economy. You can’t stop this from happening, but at least we can educate people. Legislature has to play a role in it.  

Academia is doing a lot of research about what could go wrong. The industry needs to work hand in hand with them so we’re marrying the right kind of research with what the industry should be focusing on. A lot of esoteric research could go on, but if you’re building a system, you need to know which area you should be paying attention to.  

To hear more of this conversation and others, subscribe to the Open at Intel podcast: 


About the Author

Katherine Druckman, Open Source Evangelist, Intel 

Katherine Druckman, an Intel open source evangelist, hosts the podcasts Open at Intel, Reality 2.0, and FLOSS Weekly. A security and privacy advocate, software engineer, and former digital director of Linux Journal, she’s a longtime champion of open source and open standards.