Policy Update: What Happens When a Government Actually Tries to Govern AI (AISI Research Agenda)
The UK’s AI Security Institute (AISI) just released its research agenda, revealing how it plans to test, control, and align powerful AI systems before they become unmanageable.
The UK has just released one of the most detailed government research plans on AI security to date. The AI Security Institute’s research agenda offers a rare look into how experts are testing powerful models, probing for weaknesses, and building tools to keep AI systems honest and under control. It is serious, technical work, but the implications are disruptive. If you really want to understand what real AI safety looks like behind the scenes, this is where to start.
A Turning Point for AI Security
The AI Security Institute’s (AISI) research agenda, published in May 2025, sets out a focused and pragmatic effort by the UK government to understand and address the real-world risks posed by frontier AI systems.
AISI, established in November 2023, is now the largest government-backed team globally working on AI security. AISI’s role is directly embedded in government and national security strategy, with the mission of making advanced AI safe for public deployment and societal adoption.
According to the research agenda, AI is already being integrated into core sectors, including public services, national infrastructure, and scientific development. The Institute sees these developments as transformative but also recognises a growing set of threats. These include AI-enabled cyberattacks, criminal use of AI to bypass existing safeguards, and even future autonomous systems that may act in ways beyond human control.
The agenda identifies the most critical risks: cyber misuse, criminal misuse, autonomous systems, human influence, and societal resilience as immediate priorities.
The Institute is building technical capacity and creating infrastructure to measure, simulate, and test these risks in controlled environments. For example, its teams are already running over 80 cyber risk evaluations, assessing whether models can enable sophisticated attacks or help threat actors scale cybercrime.
What sets AISI apart is how tightly it is linked to policymaking. The Institute is not just studying AI risks, it is producing insights that inform decisions at the highest levels of government, both in the UK and with international partners. Its outputs are being used to shape policies, influence standards, and develop best practice protocols. These insights are also being fed back to frontier AI labs like OpenAI and Anthropic, enabling rapid development of safety-focused technologies before models are publicly released.
🏛 What Is AISI and Why Should You Care?
The AI Security Institute (AISI) is a UK government body dedicated to understanding and reducing risks from advanced artificial intelligence. It was established in November 2023 in response to growing concerns about the safety and misuse of powerful AI models. Since then, AISI has become the largest government-backed organisation in the world focused on AI safety and security research.
AISI operates within government but collaborates extensively with external partners. These include major AI labs, national security agencies, and research centres like the Laboratory for AI Security Research (LASR) and the National Cyber Security Centre (NCSC). Its main goal is to provide the UK government with a scientific and technical foundation to make decisions about advanced AI systems.
The Institute focuses on understanding how AI might be misused or fail in ways that could pose serious harm to national security, public safety, or democratic processes. Its work is technical and practical. The team builds tools, evaluations, and tests that directly assess what AI systems are capable of and how they might be exploited or go wrong.
AISI’s mission can be grouped into three broad tasks:
Supporting the state: AISI ensures that policymakers have access to the latest and most accurate information on AI risks, so they can make informed decisions. This includes sharing findings with international governments and partners in security.
Shaping international norms: It plays a leading role in setting standards and best practices for AI safety. The aim is to ensure consistency and coordination between different countries and tech companies.
Testing advanced AI models: AISI works directly with companies developing cutting-edge models to test their systems and recommend safety improvements. It conducts independent technical evaluations to identify weak spots and suggest how risks can be mitigated.
To carry out its work, AISI has recruited experts across multiple fields including cybersecurity, biology, machine learning, and social sciences. These specialists help map how AI might intersect with real-world threats. Their research agenda reflects the belief that understanding frontier AI risks is not just a technical exercise but a public responsibility.
🇬🇧 AISI is a central part of the UK’s national strategy for managing the promises and threats of artificial intelligence. Its role is not only to understand the technology, but to ensure it can be used safely by society.
How AISI Does Research: Their Three-Point Approach
The AI Security Institute (AISI) approaches research with a clear and disciplined structure. It does not only examine risks but works to ensure that its findings translate into real-world impact.
The institute’s research agenda revolves around three main pillars: supporting government awareness, shaping global standards, and providing independent technical analysis of advanced AI models.
First, AISI plays a central role in increasing state awareness. It shares key findings with policymakers in the UK and abroad, including the United States and international partners. This enables governments to stay updated on AI’s fast-evolving risks and make well-informed policy and governance decisions. The goal is to ensure that those in charge of regulating and legislating AI understand what the most capable systems can and cannot do.
Second, AISI works on international protocols by translating their research into practical guidance. These are intended to become shared best practices and technical standards across borders. AISI does this by collaborating with model developers and international actors to develop clear safety and security protocols. By doing so, it helps to harmonise how nations and developers handle the dangers of powerful AI systems.
Third, AISI serves as an independent technical partner to leading AI companies. The institute runs tests on cutting-edge models and examines how they behave, particularly in situations that could present harm. When AISI identifies concerning capabilities or flaws in model safeguards, it reports them directly to the developers. This can lead to updates in refusal strategies, improved model alignment, or strengthened oversight tools. For instance, if a model is found to produce harmful outputs when asked about cyberattacks, AISI can recommend ways to adjust the training process or implement better rejection mechanisms.
What makes AISI’s role unique is its direct involvement with both public institutions and private AI labs. It has deep ties to the UK government, the broader national security community, and technical experts across cybersecurity, machine learning, and other relevant fields.
Its research is focused, its partnerships are strategic, and its position as a government-backed body gives it the trust and access needed to intervene when necessary.
🔧 What Solutions AISI Is Betting On
The UK’s AI Security Institute (AISI) takes a practical and evidence-led approach to preventing the misuse of advanced AI.
Their research is focused on three major solution areas: safeguards, control, and alignment. Each area plays a specific role in ensuring that AI systems are safe, secure, and do not act against human interests as their capabilities grow.
Safeguards are the first line of defence. AISI tests whether safety features built into AI models can actually withstand attacks and prevent harmful outputs. Their team runs detailed experiments to break these safety measures, including simulated “jailbreak” attacks, where both automated and human testers try to get the model to behave dangerously.
They found that many leading AI systems can still follow harmful instructions when prompted cleverly, even without any hacks. This work has led AISI to call for stronger safety evaluations and the development of better safeguards, especially for agents acting independently in real-world environments.
Control research is about ensuring that, even as AI systems become more autonomous, they stay within limits set by humans. AISI builds test environments that closely mimic real-world deployment scenarios, such as AI agents operating within company systems.
They stress-test these setups to check whether current control technique, like monitoring systems or restricted access, are strong enough. So far, their findings suggest that while today's control tools might be enough for current models, they are unlikely to scale as AI becomes more powerful. AISI is developing new control protocols and preparing for more complex scenarios, such as models attempting to bypass oversight or conduct hidden activities.
Alignment is the most ambitious part of AISI’s agenda. It is focused on making sure AI systems are honest, reliable, and do not intentionally deceive or act against human values. AISI uses a mix of theory and real-world tests to study whether an AI system will behave safely even as it becomes more capable.
One area they are exploring is the development of “honesty guarantees”, ways to ensure that even highly capable systems cannot lie or manipulate. They are also looking at how to prevent models from hiding dangerous behaviours during safety tests, a risk known as sandbagging.
Together, these three areas form the foundation of AISI’s solutions work. Their research is not just about pointing out problems but actively testing and building the tools needed to keep AI safe as it scales. 🛠️🧠📊
There is growing concern about “honesty” in AI models, not just whether they work, but whether they tell the truth. UK researchers are now building ways to test honesty in large systems, especially under pressure or when the model knows it is being evaluated. If AI can lie or hide its true capabilities, that is a serious problem no one will catch until it’s too late.
UK researchers are simulating how advanced AI could take harmful actions without being detected, like copying itself across systems or stealing sensitive files. They are using simulated networks and mock infrastructure to see what is really possible. The idea is simple: if you want to stop dangerous AI later, you need to know what it can do now.