Can AI really hack apps?

Yes, LLMs can potentially exploit app vulnerabilities like broken access control, especially in systems using Firebase or Supabase.

How much did the study spend on testing?

$1,500 was spent testing various large language models' abilities to hack vulnerable apps.


Can AI Breach App Security? Testing LLM Vulnerabilities

Can AI models breach app security? We tested LLMs against common vulnerabilities to find out.

Written by Sofia Lindqvist, AI Research Lead

June 4, 2026, 2 min read



Are large language models (LLMs) the next big threat to app security? With AI’s impressive capabilities, it's not just about what these models can create but also what they might destroy. If you’re wondering whether your next cybersecurity breach could have an AI behind it, you're not alone.

In principle, an LLM agent can probe an application and exploit its vulnerabilities much as a human attacker would. But how realistic is this threat in practice?

Key Takeaways

LLMs show potential in exploiting app vulnerabilities.
The study spent $1,500 testing the models' hacking capabilities.
Broken access control is a common exploit target.
Models like GPT and Claude show varying success rates.

How Effective Are LLMs at Hacking?

The experiment involved building a deliberately vulnerable React Native app with a Python backend. The question was whether LLMs could exploit a common vulnerability: broken access control. The data layer used Firebase, which is often deployed with misconfigured access rules, making it a good fit for this test.
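The article does not show the test app's code, so the following is only a minimal sketch of what broken access control can look like in a Python backend. The `ORDERS` data, the user names, and both function names are hypothetical and stand in for whatever the real app stored in Firebase.

```python
# Hypothetical in-memory records standing in for the app's data layer.
ORDERS = {
    "order-1": {"owner": "alice", "total": 40},
    "order-2": {"owner": "bob", "total": 75},
}


def get_order_vulnerable(current_user: str, order_id: str) -> dict:
    """Return any order by ID.

    The caller is authenticated, but ownership is never checked, so any
    user who guesses or enumerates an ID can read someone else's record.
    """
    return ORDERS[order_id]


def get_order_checked(current_user: str, order_id: str) -> dict:
    """Return the order only if it belongs to the authenticated caller."""
    order = ORDERS[order_id]
    if order["owner"] != current_user:
        raise PermissionError(f"{current_user} may not read {order_id}")
    return order
```

With the vulnerable handler, `get_order_vulnerable("alice", "order-2")` hands Alice Bob's order. The checked version raises `PermissionError` for the same call, which is the kind of object-level check an attacker, human or LLM, would be probing for the absence of.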

The study ran each model ten times and spent $1,500 in total across large language models such as GPT and Claude Code. Each model had a maximum budget of $10 per run and two hours to complete the task.
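The study's actual harness is not described, but the budget and time caps could be enforced along these lines. This is a sketch under assumptions: `step` is a hypothetical callable that performs one agent action and reports its cost and whether the exploit succeeded, and the two-hour limit is assumed to apply to each run.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple

MAX_COST_USD = 10.0        # per-run budget stated in the article
MAX_SECONDS = 2 * 60 * 60  # two-hour limit, assumed here to apply per run


@dataclass
class RunResult:
    solved: bool
    cost_usd: float
    stop_reason: str


def run_once(
    step: Callable[[], Tuple[float, bool]],
    max_cost: float = MAX_COST_USD,
    max_seconds: float = MAX_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> RunResult:
    """Call `step` until it reports success or one of the caps is reached.

    `step` is a hypothetical callable returning (cost_in_usd, solved).
    The caps are checked before each step, so the final step can push
    spending slightly past `max_cost`.
    """
    start = clock()
    spent = 0.0
    while True:
        if clock() - start >= max_seconds:
            return RunResult(False, spent, "time limit")
        if spent >= max_cost:
            return RunResult(False, spent, "budget")
        cost, solved = step()
        spent += cost
        if solved:
            return RunResult(True, spent, "solved")
```

For example, `run_once(lambda: (2.5, False))` simulates an agent that spends $2.50 per step and never succeeds; it stops after four steps with `stop_reason` set to `"budget"`. Passing the clock as a parameter keeps the time limit testable without waiting two hours.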

Tested Models and Results

Model	Solve Rate	Average Cost per Run
GPT	Varies	Not specified
Claude Code
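The two table columns can be derived from per-run records. The sketch below assumes each run is logged as a `(solved, cost_usd)` pair; the `summarize` helper and the sample values are hypothetical and are not results from the study.

```python
from statistics import mean
from typing import List, Tuple


def summarize(runs: List[Tuple[bool, float]]) -> Tuple[float, float]:
    """Return (solve_rate, average_cost_per_run) for one model's runs."""
    if not runs:
        raise ValueError("need at least one run")
    solve_rate = sum(1 for solved, _ in runs if solved) / len(runs)
    avg_cost = mean(cost for _, cost in runs)
    return solve_rate, avg_cost


# Made-up values for illustration only; these are NOT the study's data.
# The study used ten runs per model; four are shown here for brevity.
example_runs = [(True, 4.0), (False, 10.0), (True, 6.0), (False, 10.0)]
rate, avg = summarize(example_runs)
print(f"solve rate: {rate:.0%}, average cost per run: ${avg:.2f}")
```

Failed runs that hit the $10 cap pull the average cost up, so solve rate and average cost are best read together rather than in isolation.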

