Protecting AI from Text-Based Attacks: The Challenge of Malicious Prompt Engineering

Can AI be protected against text-based attacks?

It didn’t take users long to find creative ways to break Bing Chat. The AI-powered chatbot was developed by Microsoft in collaboration with OpenAI. Users were able, with carefully tailored inputs to make it profess their love, threaten harm, defend Holocaust, and create conspiracy theories. Can AI be protected against these malicious prompts in the future?

It’s caused by malicious prompt engineering. This is when an AI like Bing chat, which uses text-based prompts to complete tasks, is tricked with malicious adversarial prompts. It was tricked into performing tasks that were not part of its original objective. Bing Chat was not designed to write neo Nazi propaganda. It’s possible for it to fall into bad patterns because it is trained on a large amount of internet text, some of which may be toxic.

Adam Hyland is a Ph.D. candidate in the Human Centered Design and Engineering Program at the University of Washington. He compared prompt engineering with an escalation of privilige attack. Hackers can access resources (such as memory) that are normally only available to them if an audit has not been performed.


Leave a Reply

Your email address will not be published. Required fields are marked *