Mistral Devstral Small is a custom, developer-tuned variant of Mistral AI’s “Small” language model series, built specifically for software development use cases. Developed in collaboration with All Hands AI, Devstral transforms Mistral’s base Small 3.1 model into an “agentic” coding assistant – in other words, an AI that can act as a full software engineering agent rather than just a code autocompleter.
With 24 billion parameters and an extended 128,000-token context window, Devstral packs high performance into a relatively lightweight package.
This “small” size (by modern LLM standards) means Devstral can run on accessible hardware – even a single NVIDIA RTX 4090 GPU or a 32GB RAM MacBook – bringing low latency and local deployment capability to developers.
Crucially, Mistral has open-sourced Devstral Small (versions 1.0 and the upgraded 1.1) under the permissive Apache 2.0 license, allowing anyone to use and fine-tune it without restrictions. This makes Devstral a lightweight enterprise LLM solution: powerful enough for serious coding tasks, yet efficient, private, and flexible for organizations to adopt.
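To make the local-deployment claim concrete, here is a minimal inference sketch using Hugging Face transformers. The model ID follows Mistral’s Hugging Face naming for the 1.1 release (mistralai/Devstral-Small-2507), but treat the details as assumptions and check the model card for the currently recommended serving stack – Mistral points to vLLM for production serving.

```python
# Minimal local-inference sketch, assuming the transformers-compatible release
# of Devstral Small 1.1 on Hugging Face (repo id is an assumption; verify it).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Devstral-Small-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user",
             "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated continuation, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

On a single 24GB GPU you would typically load a quantized build instead of full-precision weights; quantized community releases are covered later in this article.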
Devstral 1.0 vs 1.1: The Devstral Small series debuted with version 1.0 in May 2025 as a research preview and was followed by the upgraded Devstral Small 1.1 in July 2025. Both versions share the same 24B architecture (derived from Mistral Small 3.1), but 1.1 introduces significant fine-tuning improvements.
We will explore these differences in detail below, including boosts in benchmark performance, better instruction adherence, reduced hallucinations, and expanded integration capabilities.
First, let’s look at what key features set Devstral apart from other AI models in the coding domain.
Key Differentiators of Devstral Small LLMs
Devstral Small 1.0/1.1 comes with several key differentiators that make it stand out as a developer-focused language model:
Agentic Coding Abilities
Unlike general-purpose LLMs that excel at isolated tasks (e.g. writing a short function or completing a line of code), Devstral is trained to tackle real-world software engineering problems across an entire codebase. It was fine-tuned on solving actual GitHub issues, learning to navigate large projects, understand context across multiple files, and identify subtle bugs in complex systems.
In practice, this means Devstral can ingest a project’s code (up to 128k tokens worth) and provide holistic assistance – from understanding how different components interrelate, to generating multi-file fixes and feature implementations.
This contextual reasoning over big codebases and “full agent” behavior is a major leap beyond basic code completion models.
Low Latency, Lightweight Deployment
With 24B parameters, Devstral hits a sweet spot between capability and efficiency. The model is compact enough to run with limited computing resources, enabling fast inference speeds and low latency responses. In fact, Mistral reports running Devstral at ~150 tokens/second on a single high-end GPU.
Developers can self-host the model on local hardware or edge devices for near real-time interactions, without requiring a server farm.
This is a strategic advantage for enterprises concerned with cost, scalability, or data privacy – Devstral can be deployed on-premises or even on personal laptops to keep sensitive codebases local.
Compared to larger proprietary models, Devstral offers a cost-effective, resource-friendly solution that doesn’t compromise significantly on accuracy. Its lightweight nature and open license also remove the friction of vendor lock-in, making it easy to integrate into existing infrastructure.
Developer Tool Integration & Function Calling
Devstral was built with developer workflows in mind, featuring out-of-the-box integration with popular coding agent frameworks and tool APIs. It works seamlessly with OpenHands (formerly OpenDevin), SWE-Agent, and similar agentic scaffolds that allow the model to execute tools or run tests during its reasoning process.
In other words, Devstral can not only generate code, but also call functions, run code, navigate file systems, and interact with a development environment through these tools.
Version 1.1 further expanded these integration capabilities by adding support for Mistral’s function calling format (a structured way for the model to return function-callable outputs) as well as XML output formats.
This increases Devstral’s interoperability with various systems and APIs, enabling accurate function calling and tool use directly from the model’s responses.
For example, in an IDE plugin scenario, Devstral could draft a function and also produce a JSON or XML snippet calling an API correctly, all in one go.
Such tight coupling with developer tools and structured output is a key differentiator that improves the model’s utility in real software development tasks.
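As a rough illustration of what that looks like in practice, the sketch below defines a tool schema and sends it to a locally served Devstral through an OpenAI-compatible endpoint (the kind vLLM exposes when launched with tool-call parsing enabled). The endpoint URL, model name, and the create_ticket tool are illustrative assumptions, not part of Devstral itself.

```python
# Hedged sketch: one-shot function calling against a locally served Devstral
# via an OpenAI-compatible API. Endpoint, model name, and tool are assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "create_ticket",  # hypothetical issue-tracker API
        "description": "Open a ticket in the issue tracker",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "severity": {"type": "string", "enum": ["low", "high"]},
            },
            "required": ["title"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mistralai/Devstral-Small-2507",
    messages=[{"role": "user",
               "content": "File a high-severity ticket: login times out."}],
    tools=tools,
)

# The model's structured output comes back as parsed tool calls.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```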
Code Generation and Completion Excellence
At its core, Devstral is an excellent code generation model, capable of producing code in over 80 programming languages (leveraging the multilingual training of its Mistral base).
It can certainly handle classic coding assistant duties: writing functions or classes from natural language prompts, completing partial code, generating unit tests, refactoring code, and so on.
Its training on real GitHub issues and diverse repositories has honed its ability to produce coherent, correct code that adheres to instructions. Notably, Devstral’s performance on the SWE-Bench benchmark – which involves solving real coding problems – underscores its coding prowess (more on this in the next section).
Additionally, Devstral’s Tekken tokenizer (with a 131k token vocabulary) is optimized for coding languages, meaning it can represent code syntax and structure very efficiently. This leads to more fluent code completions and fewer tokenization quirks compared to models using general-purpose tokenizers.
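For a feel of the tokenizer side, the snippet below uses Mistral’s open mistral-common library to encode a chat request with the Tekken variant of its v3 tokenizer. This follows the documented mistral-common usage pattern; in a real deployment you would load the exact tokenizer files shipped in the Devstral model repository.

```python
# Sketch of encoding a request with a Tekken tokenizer via mistral-common.
# MistralTokenizer.v3(is_tekken=True) loads the Tekken variant; Devstral's
# own tokenizer files should come from its model repo.
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3(is_tekken=True)
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content="def quicksort(arr):")])
)
print(len(tokenized.tokens), "tokens")  # code-dense text tokenizes compactly
```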
Optimized Reasoning for Software & Data Workflows
Beyond just spitting out code, Devstral is optimized for the reasoning process required in software development workflows. Its fine-tuning process involved reinforcement learning and alignment specifically targeting success on coding tasks.
As a result, Devstral demonstrates strong problem-solving abilities like step-by-step debugging, evaluating error messages, and figuring out how to modify code to meet a given objective.
It can follow multi-step instructions such as “find the bug in this function, fix it, and then call an API to notify the user,” handling each part methodically.
This makes it well-suited for complex workflows that mix code understanding, generation, and external tool use (e.g. reading logs or querying a database as part of debugging).
Moreover, the model’s large context window means it can incorporate substantial background information – such as technical documentation, config files, or data schemas – into its reasoning.
In an enterprise data scenario, Devstral could be used to orchestrate data pipeline tasks or perform retrieval-augmented generation (RAG) by reading knowledge base documents and writing code to interface with data sources.
All of these capabilities underscore Devstral’s orientation toward real-world developer tasks, not just toy examples.
Performance: Benchmark Leadership in an Open Model
One of Devstral’s biggest claims to fame is its record-setting performance on coding benchmarks despite its smaller size. In particular, Mistral highlighted Devstral’s results on SWE-Bench Verified, a benchmark of 500 real-world GitHub coding issues (with known solutions).
Devstral Small 1.0 achieved 46.8% success on this benchmark, outperforming all prior open-source models by a large margin and even surpassing some much larger closed models.
For example, Devstral beat the previous open-source state-of-the-art by over 6 percentage points, and exceeded OpenAI’s GPT-4.1-mini by more than 20 points on this test.
This is remarkable considering GPT-4.1-mini (and other closed competitors such as Anthropic’s Claude 3.5 models or Google DeepMind’s Gemini models) are widely believed to run at far greater parameter counts. It speaks to the effectiveness of Devstral’s fine-tuning for coding tasks.
Figure: Devstral’s SWE-Bench performance (higher is better) vs. other models and parameter scales. Devstral Small 1.0 (24B) led open models with 46.8%, even outperforming some closed models many times its size. Devstral 1.1 further boosts this to 53.6%, solidifying Mistral’s lead in code benchmark rankings.
In July 2025, Devstral Small 1.1 raised the bar even higher. Version 1.1 scored 53.6% on SWE-Bench Verified, setting a new state-of-the-art for any open model on that benchmark. This improvement reflects the continued tuning and refinement of the model (more details on what changed in 1.1 are below).
It’s worth noting that Mistral also introduced a larger Devstral Medium model (with more parameters) available via API, which scores about 61.6% on SWE-Bench.
However, Devstral Small remains the flagship open-source release, striking an excellent balance of performance and efficiency. Overall, these benchmark results establish Devstral as perhaps the best open LLM for coding agents currently available.
Beyond benchmarks, anecdotal reports from developers indicate Devstral feels fast and capable in practical use. Many appreciate that they can run it locally and get prompt responses while working on coding tasks (for example, using it to modify a code snippet or update a config file automatically).
The combination of performance and portability means Devstral is not just a research demo – it’s a model that developers and teams can immediately put to work in real projects.
Use Cases: From Code Generation to Enterprise AI Integration
Devstral’s design opens up a wide range of use cases across software development and enterprise AI workflows.
Here are some of the key scenarios where Devstral Small 1.0/1.1 shines:
Intelligent Code Generation and Completion
At a basic level, Devstral can serve as a powerful code generation engine. Developers can use natural language prompts to have Devstral write functions, generate algorithms, produce code templates, or translate pseudocode into actual code.
Thanks to its training, it handles nuanced instructions and produces cohesive, well-documented code rather than just boilerplate. It’s equally adept at code completion – for instance, integrating with an IDE to autocomplete lines or entire blocks based on the context.
This can speed up development by handling routine coding patterns. Where Devstral goes further is in generating code with an understanding of project-level context.
For example, if you prompt it to “Implement a caching layer for our data fetch function,” it can consider relevant classes or config defined elsewhere in the repository when writing the code. This context-aware generation is extremely valuable for large projects.
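One simple way to exploit that context window is to pack the relevant source files into the prompt yourself. The helper below is a hypothetical sketch (the file paths and delimiters are placeholders); agent scaffolds like OpenHands do this kind of retrieval and packing for you automatically.

```python
# Hypothetical helper: pack selected repo files plus a task description into
# a single prompt so the model can reason over project-level context.
from pathlib import Path

def build_context_prompt(repo_root: str, files: list[str], task: str) -> str:
    parts = []
    for rel in files:
        source = Path(repo_root, rel).read_text()
        parts.append(f"--- File: {rel} ---\n{source}")
    parts.append(f"--- Task ---\n{task}")
    return "\n\n".join(parts)

prompt = build_context_prompt(
    ".",
    ["app/cache.py", "app/fetch.py"],  # hypothetical paths
    "Implement a caching layer for our data fetch function.",
)
```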
Developer Assistant in IDEs and CI/CD
Because Devstral can run locally and has a permissive license, it’s ideal for embedding within developer tools. Teams can integrate Devstral into IDE extensions (VS Code, JetBrains, etc.) to act as a pair programmer AI.
In this role, Devstral can answer questions about the codebase (“What does this function do?”), suggest improvements or fixes, and even automatically refactor code on command.
The Mistral team explicitly notes that if you are building an agentic coding IDE or plugin, Devstral is a great model to include.
Beyond the editor, Devstral can integrate into continuous integration (CI) pipelines or DevOps workflows – for example, analyzing test failures and proposing code changes, or generating release notes from commit histories.
Its ability to operate with no internet connection (running entirely on local hardware) is crucial for these use cases in secure environments.
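As a sketch of the CI idea, the script below runs the test suite and, on failure, asks a locally served Devstral for a diagnosis. The endpoint and model name assume a self-hosted OpenAI-compatible server; how you surface the answer (PR comment, build log) is up to your pipeline.

```python
# Hypothetical CI step: pipe failing test output to a locally served Devstral.
# Endpoint and model name are assumptions about your self-hosted setup.
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

result = subprocess.run(["pytest", "-x", "--tb=short"],
                        capture_output=True, text=True)
if result.returncode != 0:
    review = client.chat.completions.create(
        model="mistralai/Devstral-Small-2507",
        messages=[{"role": "user",
                   "content": "These tests failed. Diagnose and propose a fix:\n"
                              + result.stdout[-8000:]}],  # keep within budget
    )
    print(review.choices[0].message.content)
```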
Autonomous Coding Agents & Tool Orchestration
Devstral truly shines when used as the brain of an autonomous coding agent. Paired with frameworks like OpenHands or SWE-Agent, it can iteratively plan and execute coding tasks.
For instance, given a high-level goal (“Add a user authentication feature to the app”), an agent powered by Devstral could: read relevant parts of the codebase, create new files or modify existing ones, run tests to verify its changes, and repeat this process until the goal is achieved. Devstral’s training on agentic behavior and tool use allows it to accurately issue API calls or CLI commands through a tool interface.
It can decide to open certain files, search for error messages, compile the project, etc., as a human developer would when debugging or adding features. This opens up possibilities for AI-driven software maintenance, automated bug fixing, and even AI DevOps bots that can manage routine programming tasks.
Additionally, Devstral’s support for function call outputs means it can respond with structured data that could trigger external functions directly – enabling tight integration in agent pipelines (e.g. returning a JSON that an automation script then executes).
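A minimal agent loop, under those assumptions, might look like the sketch below. This is not OpenHands itself – just an illustration of the plan–act–observe cycle with a single hypothetical shell tool. Executing model-chosen shell commands is dangerous; a real deployment would sandbox them.

```python
# Minimal agent-loop sketch: Devstral picks a tool via function calling, we
# execute it, and feed the result back until it produces a final answer.
# Endpoint, model name, and the run_shell tool are illustrative assumptions.
import json
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
TOOLS = [{"type": "function", "function": {
    "name": "run_shell",
    "description": "Run a shell command in the repository",
    "parameters": {"type": "object",
                   "properties": {"cmd": {"type": "string"}},
                   "required": ["cmd"]}}}]

messages = [{"role": "user",
             "content": "Run the test suite and summarize any failures."}]
for _ in range(5):  # cap iterations to avoid runaway loops
    reply = client.chat.completions.create(
        model="mistralai/Devstral-Small-2507", messages=messages, tools=TOOLS)
    msg = reply.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # final answer
        break
    messages.append(msg)
    for call in msg.tool_calls:
        cmd = json.loads(call.function.arguments)["cmd"]
        # WARNING: sandbox this in any real system.
        out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": (out.stdout + out.stderr)[-4000:]})
```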
Enterprise Knowledge Integration (RAG systems)
Many enterprises are exploring retrieval-augmented generation (RAG) systems, where an LLM is used in conjunction with company-specific data. Devstral’s features make it a strong candidate for these applications. Its large context window (128k tokens) means it can absorb lengthy corporate documents, wikis, or knowledge base articles alongside a query, allowing it to provide very informed and specific answers.
For example, an enterprise could use Devstral to power an internal chatbot that answers technical questions by retrieving relevant policy documents or code snippets and feeding them into the model. Because Devstral is a smaller model that can run on-premises, this can be done while keeping data completely private – a critical factor for industries with strict compliance needs.
Devstral can also be fine-tuned on internal data (or even “continued pre-trained” on domain-specific code) for improved accuracy on proprietary knowledge.
Use cases here include: an IT support assistant that reads configuration files to troubleshoot issues, a data analytics helper that writes SQL queries based on company databases, or a documentation assistant that drafts answers by combining code comments with official manuals. Essentially, Devstral can serve as the AI layer connecting enterprise data silos with natural language and code.
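A toy RAG pipeline along these lines might look like the following sketch. The embedding model, document store, and endpoint are placeholders (sentence-transformers is just one common choice); only the final chat call involves Devstral.

```python
# Toy RAG sketch: rank internal docs by embedding similarity, then answer
# with the best match in context. Embedder and endpoint are placeholders.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "VPN setup guide: ...",
    "Deployment runbook: ...",
    "SQL style guide: ...",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def answer(question: str) -> str:
    q = embedder.encode([question], normalize_embeddings=True)[0]
    best = docs[int(np.argmax(doc_vecs @ q))]  # cosine similarity on unit vectors
    resp = client.chat.completions.create(
        model="mistralai/Devstral-Small-2507",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided document."},
            {"role": "user",
             "content": f"Document:\n{best}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```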
Customized Mistral Deployments
As an open model, Devstral is highly customizable. Organizations can take Devstral and fine-tune it on their own codebase or problem domain to create a private, internal version. Mistral AI even offers support for such fine-tuning or distilling Devstral’s capabilities into other models for specialized needs.
This means a company could create, say, a Devstral-Finance model tuned for fintech development, or integrate Devstral into a larger system with other AI components. Because Devstral Small is relatively lightweight, it can be one component among many in an AI stack – working alongside other LLMs or micro-models in a microservice-like architecture.
Its open-source nature also means users can inspect its outputs for auditing, add guardrails, or extend its knowledge without restriction. For enterprise adopters, this level of control and extensibility is a huge plus compared to closed API-only models.
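For the fine-tuning route, a hedged LoRA sketch with the Hugging Face trl/peft stack is shown below. The dataset file, hyperparameters, and output directory are purely illustrative, and the Apache 2.0 license permits this kind of derivative work; Mistral’s own fine-tuning services are an alternative path.

```python
# Hedged LoRA fine-tuning sketch with trl + peft; dataset file and
# hyperparameters are illustrative placeholders, not a recommended recipe.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="internal_code_tasks.jsonl")["train"]  # hypothetical data

trainer = SFTTrainer(
    model="mistralai/Devstral-Small-2507",
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
    args=SFTConfig(
        output_dir="devstral-internal",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```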
Devstral 1.1 vs 1.0: Improvements in the Latest Version
The jump from Devstral Small 1.0 to 1.1 brought a series of refinements aimed at making the model even more useful and reliable for developers and enterprises. Here’s a comparison of the two versions and what improved in Devstral 1.1:
Higher Instruction Compliance
Devstral 1.1 has been further aligned to follow user instructions and system prompts accurately. While 1.0 was already good at adhering to task descriptions, 1.1 introduces tweaks (including an updated default system prompt and chat template) that make its responses more consistently on-target.
Mistral’s team and community contributors worked to debug and correct issues in the initial model, fine-tuning how the model handles various prompt formats.
The result is that Devstral 1.1 is less likely to go off track or to require as much prompt engineering to get the desired output. It also generalizes better across interaction styles – for example, it can handle both OpenAI’s ChatML format and a plainer assistant style, whereas 1.0 sometimes expected one specific prompt format.
For developers, this means less hassle in integrating the model into custom applications; Devstral 1.1 will comply with instructions whether you use it in a Jupyter notebook, an IDE chat, or a backend service.
Reduced Hallucination and Error Rate
Thanks to fine-tuning on more data and reinforcement learning from human feedback (RLHF) focusing on correctness, Devstral 1.1 exhibits a lower hallucination rate compared to 1.0.
In coding terms, “hallucination” might mean suggesting code that doesn’t actually solve the problem or referencing functions that don’t exist – 1.1 makes those mistakes less frequently. Its improved SWE-Bench score (53.6% vs 46.8%) is one concrete measure of this gain in correctness.
Behind that jump are likely improvements in how the model validates its own outputs against tests during training, leading to answers that are more factual and on-point. Moreover, the Mistral team applied safety alignment techniques in training Devstral, which not only prevents toxic outputs but also encourages truthful, justifiable answers.
Anecdotally, early users of Devstral 1.1 report that it’s more reliable in solving problems without making things up. For enterprise use (where hallucinated answers can be costly), this enhancement adds confidence in deploying Devstral widely.
Better Tool and System Integration
A major focus of the 1.1 update was versatility across different agent systems and toolchains. Devstral 1.0 was primarily released alongside the OpenHands tool scaffold, which it excelled at.
Devstral 1.1 retains that excellence with OpenHands, and now handles a broader range of formats. Specifically, 1.1 supports Mistral’s new function-calling syntax as well as XML-style outputs for tool instructions.
This means it can plug into Mistral’s own API function calling interface (useful if you’re using Mistral’s platform services) and also work with any custom XML-based agent protocols. In practice, 1.1 can adapt to however your system orchestrates AI actions.
The model’s responses in 1.1 are also tuned to require less bespoke formatting; for instance, Unsloth’s Devstral 1.1 distributions include chat template fixes that help it respond well in standard chat UIs. All of these changes make Devstral 1.1 easier to integrate into real products.
Whether you want it to output a JSON for a function call or just follow a new system role convention, it’s ready out-of-the-box.
This flexibility is critical for enterprise adoption, where one company might use a different conversation schema or tool API than another. Devstral 1.1 meets those needs without requiring you to heavily customize the model.
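If you are parsing raw completions yourself rather than relying on a server-side parser, Mistral’s raw function-calling markup uses a [TOOL_CALLS] marker followed by a JSON array. The sketch below extracts such calls; verify the exact markup against the chat template that ships with the Devstral version you deploy.

```python
# Hedged sketch: parsing Mistral-style raw function-call output. The
# [TOOL_CALLS] marker reflects Mistral's published tokenizer format; verify
# it against the chat template your Devstral build actually uses.
import json
import re

raw = '[TOOL_CALLS][{"name": "run_tests", "arguments": {"path": "tests/"}}]'

match = re.search(r"\[TOOL_CALLS\](\[.*\])", raw, re.DOTALL)
if match:
    for call in json.loads(match.group(1)):
        print(call["name"], call["arguments"])
```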
Performance and Efficiency Boosts
Even though Devstral 1.1 did not change the model’s size, the training improvements have made it effectively more efficient in solving tasks. It finds solutions with fewer attempts and can generalize from fewer examples, as evidenced by the benchmark gains.
Mistral noted that 1.1 sets a new Pareto point in cost-performance, beating some larger rivals like Gemini 2.5 Pro at a fraction of the cost. For a decision-maker, this means Devstral 1.1 delivers more value per compute dollar – an important consideration when deploying at scale.
Furthermore, the community has produced quantized versions of Devstral 1.1 (e.g. 4-bit GGUF formats via Unsloth) that maintain its strong performance with minimal accuracy loss. Companies can leverage these optimized models to serve Devstral 1.1 at even lower latency or on smaller devices (like edge servers or advanced laptops), broadening the range of deployment options.
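Running one of those quantized builds locally can be as simple as the llama-cpp-python sketch below. The repository and file names follow Unsloth’s usual GGUF naming conventions but are assumptions; verify them on Hugging Face before use.

```python
# Sketch of running a community 4-bit GGUF quantization with llama-cpp-python.
# Repo id and filename pattern are assumptions based on Unsloth's conventions.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Devstral-Small-2507-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",                     # 4-bit quantization variant
    n_ctx=32768,                                 # context budget for local runs
)
out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Refactor this loop into a list comprehension: ..."}]
)
print(out["choices"][0]["message"]["content"])
```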
In summary, Devstral Small 1.1 is a significant iteration that refines the original vision of Devstral 1.0. It didn’t reinvent the wheel – the architecture and core concept remain – but it polished the model where it matters: accuracy, compliance, and integration.
Teams already using Devstral 1.0 will find 1.1 to be a drop-in upgrade that brings tangible quality improvements. New adopters should directly choose 1.1 for the best experience.
And given Mistral’s commitment to iterating on this series, we can expect future versions to continue pushing boundaries (with hints of larger Devstral Medium and beyond on the horizon).
Conclusion: Developer-Tuned LLMs for the Enterprise Era
Mistral’s Devstral Small series exemplifies the emerging trend of smaller, domain-optimized language models that bridge the gap between cutting-edge AI and practical deployment in enterprise settings.
By tuning a “small” 24B parameter model specifically for software development, Mistral AI has delivered a system that speaks the language of developers – from writing code to controlling developer tools – all while remaining efficient enough to run anywhere.
This developer-tuned focus, combined with open availability, makes Devstral a compelling choice for organizations seeking to infuse AI into their software engineering workflows.
For developers and software teams, Devstral offers a reliable AI coding assistant that can handle complex projects, not just toy examples. It operates at the speed of thought, with low-latency suggestions, and can be deeply integrated into IDEs, version control systems, and continuous deployment pipelines.
This has the potential to accelerate development cycles, catch bugs faster, and offload mundane tasks to AI – freeing human developers to focus on creative design and problem-solving.
For enterprise AI adopters and decision-makers, Devstral checks crucial boxes: it’s cost-effective (no exorbitant API fees, and it runs on commodity hardware), it’s private (you can keep your code on your own servers), and it’s customizable to your domain. The Apache 2.0 license means no legal headaches in using it commercially.
Whether a company wants to build an internal coding Copilot, automate IT operations, or deploy smarter chatbots that can both converse and script solutions, Devstral provides a robust foundation.
Moreover, its success on benchmarks and in-the-field usage demonstrates that lightweight models can deliver heavyweight performance – an encouraging sign for those who need AI solutions but can’t or won’t rely on gigantic proprietary models.
In conclusion, Devstral Small 1.1 (and its predecessor 1.0) illustrate how a targeted, developer-tuned LLM can drive innovation in software development and enterprise AI integration. With lower latency, better tool use, and improved reasoning, Devstral shows that “small” models can handle big challenges.
As Mistral AI continues to refine this series and introduce new versions, the Devstral lineup is poised to remain at the forefront of open-source AI for coding.
It invites developers and enterprises alike to embrace a future where AI is a hands-on collaborator in every coding project – efficient, knowledgeable, and always ready to commit the next great piece of code.