Jailbreaking AI Chatbots with ASCII Art: “ArtPrompt” Bypasses Safety Measures

Researchers in Washington and Chicago have developed ArtPrompt, a new method for getting around the safety measures built into large language models (LLMs). According to the research paper “ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs”, chatbots such as GPT-3.5, GPT-4, Gemini, Claude and Llama2 can be induced to answer requests they would normally refuse when those requests are rewritten as prompts containing ASCII art produced by the ArtPrompt tool. The attack is simple and effective: the authors show examples of chatbots that, when targeted by ArtPrompt, gave advice on how to build bombs and produce counterfeit money.

ArtPrompt works in two steps: word masking and cloaked prompt generation. In the first step, the attacker masks the sensitive words that could conflict with the LLM's safety alignment and would otherwise cause the prompt to be rejected. In the second step, the attacker uses an ASCII art generator to render each masked word as ASCII art and substitutes that ASCII art for the masked word in the original prompt. The resulting cloaked prompt is then sent to the target LLM to elicit a response.

AI chatbots are being equipped with increasingly strict safety measures to prevent malicious misuse, as developers try to stop their products from being steered into producing harmful content. ArtPrompt is a worrying development because it makes it easy to bypass the protections of current LLMs: by replacing the “safety words” with ASCII art representations, it leaves the safety measures unable to recognize what the prompt is asking for. ArtPrompt's creators state that their tool deceives today's LLMs both effectively and efficiently, that on average it outperforms all the other attack types they compared it against, and that it remains a practical attack against current multimodal language models.
