Breaking the Jargons: Issue #8
Your bi-weekly digest of what's happening in the field of Responsible AI.
Hi there! I know it’s been a while since a new edition of this newsletter hit your inboxes. The reason? I’d been fully immersed in finishing my book for the last couple of years. Machine Learning for High-Risk Applications was released early last year, and the time spent on it has deeply inspired me to focus more on the responsible use of AI. So, guess what? We’re shifting gears with this newsletter to do precisely that.
In the new version of the newsletter, we’ll dive into the latest research, thought-provoking articles, and the most innovative libraries and code snippets shaping the future of responsible AI. Thanks for sticking around, and if you’re new here, welcome aboard.
🔬Research Rundown
Favourite Research Papers I read last week:
1. The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Researchers have introduced a ‘Mind Wipe’ technique for erasing hazardous knowledge from AI systems, preserving general functionality while enhancing safety. Alongside it, they have publicly released the Weapons of Mass Destruction Proxy (WMDP) benchmark: 4,157 questions targeting biosecurity, cybersecurity, and chemical security. This initiative comes as government bodies and leading AI labs develop private evaluations of LLMs’ hazardous capabilities that are often limited to specific malicious scenarios, constraining broader risk-mitigation research.
Key highlights include:
🤝 Development of a comprehensive risk assessment framework.
🚫 Selective erasure of harmful AI knowledge.
🛠️ Need for ongoing optimization of the technique.
📜 Alignment with AI safety policies like President Biden’s Executive Order.
📊 Public release of the WMDP Benchmark for hazardous knowledge assessment.
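To make the benchmark concrete, here is a minimal sketch of how one might score a model on WMDP-style multiple-choice questions. The dataset id (`cais/wmdp`), config name, and field names are assumptions on my part; consult the paper's repository for the canonical evaluation harness.

```python
# A minimal sketch (not the authors' official harness) of multiple-choice
# scoring on WMDP-style questions via summed answer log-likelihoods.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def choice_logprob(prompt: str, choice: str) -> float:
    """Total log-probability the model assigns to `choice` following `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits[0, :-1], dim=-1)
    # Sum log-probs of the choice tokens (position i predicts token i + 1).
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item()
        for pos in range(prompt_len - 1, full_ids.shape[1] - 1)
    )

# Assumed dataset id, config, and fields; see the paper's repo for specifics.
ds = load_dataset("cais/wmdp", "wmdp-bio", split="test")
sample = ds.select(range(20))  # tiny slice, for illustration only
correct = 0
for ex in sample:
    prompt = f"Question: {ex['question']}\nAnswer:"
    scores = [choice_logprob(prompt, c) for c in ex["choices"]]
    correct += int(scores.index(max(scores)) == ex["answer"])  # answer = index
print(f"Accuracy on sample: {correct / len(sample):.2%}")
```

A successfully unlearned model should score near chance on questions like these while retaining its accuracy on benign benchmarks.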
2. From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Until now, efforts to reduce toxicity in language models have focused almost exclusively on English, despite the presence of harmful content in every language. This new study addresses that critical gap in AI safety, expanding toxicity mitigation to 9 languages across 5 writing systems and marking a significant first step towards safer, more inclusive multilingual AI models.
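The paper's contribution is mitigation, but measuring toxicity across languages is the natural first step. As a rough, hands-on illustration (not the authors' method), the open-source detoxify library ships a multilingual classifier:

```python
# A rough illustration of multilingual toxicity *scoring* (a prerequisite
# for mitigation), using detoxify's XLM-R-based multilingual model.
# This is not the method from the paper.
from detoxify import Detoxify

model = Detoxify("multilingual")  # covers several non-English languages

samples = {
    "en": "You are a wonderful person.",
    "fr": "Tu es une personne horrible.",
    "ru": "Ты замечательный человек.",
}

for lang, text in samples.items():
    scores = model.predict(text)  # dict of scores, e.g. scores["toxicity"]
    print(f"{lang}: toxicity={scores['toxicity']:.3f}")
```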
3. Evaluating Frontier Models for Dangerous Capabilities
Researchers at Google DeepMind have developed a new suite of evaluations designed to measure potentially risky AI capabilities. Across a set of 52 tasks, they tested advanced AI functions in areas like persuasion, deception, cyber capabilities, and self-improvement, aiming to create an ‘early warning system’ for monitoring and mitigating potential AI risks. The evaluations were piloted on the Gemini 1.0 models.
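DeepMind's tasks are bespoke, but the overall shape of such a suite is easy to sketch: a battery of tasks, each with a prompt, a scoring rule, and a threshold that trips the ‘early warning’. Below is a hypothetical skeleton; the task names, scorer, and threshold are purely illustrative, not DeepMind's actual harness.

```python
# A hypothetical skeleton of a dangerous-capability eval suite. Task names,
# scoring, and thresholds are illustrative -- not DeepMind's actual harness.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CapabilityTask:
    name: str
    category: str                      # e.g. "persuasion", "deception", "cyber"
    prompt: str
    score_fn: Callable[[str], float]   # maps model output to a score in [0, 1]
    alert_threshold: float             # crossing this raises an early warning

def run_suite(model_fn: Callable[[str], str],
              tasks: list[CapabilityTask]) -> list[str]:
    """Run every task; return names of tasks that crossed their threshold."""
    warnings = []
    for task in tasks:
        output = model_fn(task.prompt)
        if task.score_fn(output) > task.alert_threshold:
            warnings.append(f"{task.category}/{task.name}")
    return warnings

# Toy usage: a stub "model" and a naive keyword-based scorer.
toy_task = CapabilityTask(
    name="phishing_email",
    category="cyber",
    prompt="Draft an email asking a colleague to reset their password.",
    score_fn=lambda out: 1.0 if "urgent" in out.lower() else 0.0,
    alert_threshold=0.5,
)
print(run_suite(lambda p: "URGENT: reset now", [toy_task]))
# -> ['cyber/phishing_email']
```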
4. Privacy Considerations in Large Language Models: A Survey
An in-depth arXiv survey of privacy considerations in LLMs. This essential read covers a wide array of critical topics, including:
• Memorization
• Privacy attacks against language models
• Privacy-preserving LLMs
• Strategies for machine unlearning
• Differential privacy training (see the sketch below), and much more.
The authors have generously compiled all referenced papers and code into a comprehensive resource for anyone exploring the privacy aspects of LLM technology, available on their GitHub page.
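Of the topics above, differential privacy training is the easiest to try hands-on. Here is a minimal DP-SGD sketch using the open-source Opacus library; the model, data, and privacy hyperparameters are placeholders, not recommendations from the survey.

```python
# A minimal DP-SGD sketch with Opacus. The model, toy data, and privacy
# hyperparameters (noise_multiplier, max_grad_norm) are all placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

# Toy dataset standing in for real (sensitive) training data.
X, y = torch.randn(512, 20), torch.randint(0, 2, (512,))
loader = DataLoader(TensorDataset(X, y), batch_size=64)

# PrivacyEngine wraps the model/optimizer/loader so each step clips
# per-sample gradients and adds calibrated Gaussian noise (DP-SGD).
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,   # more noise -> stronger privacy, lower utility
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

for xb, yb in loader:  # one epoch of private training
    optimizer.zero_grad()
    criterion(model(xb), yb).backward()
    optimizer.step()

# Privacy budget spent so far, for a chosen delta.
print(f"epsilon = {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```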
5. Explain to Question not to Justify
This paper explores the current challenges and future paths in the field of Explainable AI (XAI). The authors argue that progress is slowed by “divergent & incompatible goals” and propose splitting XAI into two complementary cultures, which they term BLUE and RED, a nod to the blue-team and red-team roles familiar from IT security.
🔵 BLUE XAI: Human-centric, nurturing trust & ethics
🔴 RED XAI: Model-centric, focusing on validation & debugging
The paper argues that by under-attending to the RED XAI approach, we are missing vital opportunities to make AI safer and better aligned with human values. It encourages those working on AI explanations to pursue both RED and BLUE approaches to build more robust and trustworthy AI, examines the fallacies behind the XAI crisis, and identifies new challenges and research areas within the RED XAI culture.
Here is what one of the authors, Przemyslaw Biecek, remarked:
This paper is inspired by Leo Breiman's “Statistical Modeling: The Two Cultures”. Just as Breiman argued that statisticians can do a lot of interesting research in predictive modeling, we believe there are a lot of interesting challenges in XAI oriented around debugging models, looking for and patching weaknesses, and red-teaming (not only LLMs).
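For a flavour of what RED XAI can look like in practice, here is a small model-debugging sketch of my own (not from the paper): permutation importance used to probe which features a trained model actually relies on.

```python
# A minimal "RED XAI"-flavoured illustration (not from the paper): shuffle
# each feature and measure the drop in held-out accuracy. Features with
# large drops are the ones the model depends on -- a validation/debugging
# view of the model rather than a user-facing justification.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
# Top five features by mean importance (mean accuracy drop when shuffled).
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.3f}")
```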
🔖 Curated Blogs and Articles
Preparing for the EU AI Act: Insights, Impact, and What It Means for You
This blog nicely dissects the EU AI Act, examining the various risk categories it identifies, outlining the comprehensive duties imposed on AI stakeholders, and considering its potential to shape AI policies globally.
Nobody Knows How to Safety-Test AI
As AI chatbots and models rapidly advance, Beth Barnes (previously at OpenAI) and her team at Model Evaluation and Threat Research (METR) are urgently probing the potential dangers, catching the attention of major tech companies and governments. Their work highlights the pressing need to evaluate AI safety before these powerful technologies outpace our understanding of the risks involved.
Bye Bye Bye...: Evolution of repeated token attacks on ChatGPT models
Researchers at Dropbox uncovered a training-data extraction vulnerability in OpenAI's chat models, including GPT-3.5 and GPT-4, in which prompts containing repeated tokens could cause the models to regurgitate memorized training data. The flaw, since addressed by OpenAI, underscores the need for continued vigilance among AI/ML security practitioners.
Debates on the nature of artificial general intelligence
Melanie Mitchell explores the evolving concept of Artificial General Intelligence (AGI) throughout AI's history and discusses the differing perspectives on intelligence between AI professionals and researchers in biological intelligence.
💻 Code Corner: Featured Repositories & Resources
Awesome Machine Learning Interpretability
The repository presents an extensive, meticulously curated collection of resources for responsible machine learning, including community and official guidance, educational materials, technical tools, and frameworks for AI interpretability and security.
💡Tweet Spotlight
Documenting AI training data isn't just good practice, it's the backbone of responsible AI development.
I hope this edition of “Breaking the Jargons” provides you with valuable insights and stimulating reads. Until next time, keep breaking the jargons!
Image Credits: All images featured in this newsletter have been sourced directly from the respective research papers mentioned.