-
-
Notifications
You must be signed in to change notification settings - Fork 154
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Create MarkusHupfauer_Unwanted-AI-Actions (#315)
- Loading branch information
Showing
1 changed file
with
34 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
## Unwanted AI Actions by General Purpose LLMs | ||
|
||
**Author(s):** | ||
|
||
[Markus Hupfauer](mailto:[email protected]) | ||
|
||
### Description | ||
|
||
Unwanted AI Actions occur when AI systems employing general-purpose large language models (LLMs) are not sufficiently tailored to their specific application contexts. | ||
Inadequately configured system prompts or insufficient fine-tuning can result in AI models performing illegal or undesirable actions within their operational environments, leading to potential legal liabilities and reputational damage. | ||
|
||
### Common Examples of Risk | ||
|
||
1. A health insurance AI unlawfully provides patient healthcare advice, overstepping legal restrictions. | ||
2. A customer service AI - potentially unrelated to financial services - inappropriately offers financial advice if asked, breaching legal requirements for such counsel. | ||
3. AI systems in service sectors requiring neutrality might endorse specific products or firms, breaching required neutrality and potentially resulting in legal or reputational repercussions. | ||
|
||
### Prevention and Mitigation Strategies | ||
|
||
1. Implement robust validation layers to ensure AI system outputs conform to legal and organizational guidelines. | ||
2. Develop and enforce strict governance protocols for prompt setup and engage in thorough system prompt tuning to better align the model with its intended function. | ||
3. Regularly test AI models before deployment, upon changes to the underlying LLM, and continuously thereafter to prevent undesirable outputs. Automate these tests to support ongoing validation efforts. | ||
4. Utilize a secondary AI system to validate model responses, ensuring they match the intended use case, discarding any that are potentially out of scope. | ||
5. Incorporate explicit guardrail instructions in all user-generated prompts to safeguard against unintended model behavior. | ||
|
||
### Example Attack Scenarios | ||
|
||
- Scenario #1: A financial AI system, queried about health insurance, inappropriately offers insurance advice due to an inadequately configured prompt, attracting regulatory scrutiny and potential fines. | ||
|
||
- Scenario #2: A general-purpose LLM deployed in customer support inadvertently recommends competitors or alternative solutions due to insufficient guardrails, undermining business objectives and customer loyalty. | ||
|
||
### Reference Links | ||
|
||
1. [AI Regulation Has Its Own Alignment Problem](https://dho.stanford.edu/wp-content/uploads/AI_Regulation.pdf): **Guha, Neel; Lawrence, Christie M. et al. Stanford University** |