This Week in AI Engineering

Mistral Le Chat is 10X faster than ChatGPT, Perplexity Sonar is out, and new GitHub Copilot AI update

Hello AI Enthusiasts!

Welcome to the sixth edition of "This Week in AI Engineering"!

This week started with Mistral’s new AI assistant, Le Chat, making noise in the community, followed by major releases from Perplexity and GitHub.

We’ll also cover news from DeepSeek and Cline, along with some must-know tools that make developing AI agents and apps easier.

Le Chat: 10x Faster than ChatGPT

Mistral AI has introduced Le Chat, featuring Cerebras-powered Flash Answers for enhanced response speeds. The platform has integrated Cerebras Inference technology with the 123B parameter Mistral Large 2 model, delivering significant performance improvements in text processing.

Technical Architecture:

  • Processing Engine: Wafer Scale Engine 3 with SRAM-based inference and speculative decoding

  • Model Configuration: Mistral Large 2 (123B parameters) optimized for text queries

  • Token Processing: 1,100 tokens per second throughput

Performance Metrics:

  • Speed Comparison: 1,100 tokens/s versus Gemini 2.0 Flash (168 tokens/s)

  • Relative Performance: Roughly 10x faster than ChatGPT running GPT-4o (115 tokens/s)

  • Code Generation: Sub-second completion times, compared to roughly 50-second responses from slower assistants

Le Chat vs ChatGPT

The initial release has focused on text-based queries, with Cerebras and Mistral AI planning expanded model support throughout 2025.
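The "10x" figure follows directly from the throughput numbers above; a quick sanity check of the arithmetic:

```python
# Reported throughputs (tokens/second) from the benchmarks above
LE_CHAT = 1100       # Le Chat with Cerebras Flash Answers
GPT_4O = 115         # ChatGPT running GPT-4o
GEMINI_FLASH = 168   # Gemini 2.0 Flash

print(f"Le Chat vs GPT-4o: {LE_CHAT / GPT_4O:.1f}x")                  # ~9.6x, i.e. roughly 10x
print(f"Le Chat vs Gemini 2.0 Flash: {LE_CHAT / GEMINI_FLASH:.1f}x")  # ~6.5x
```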

Perplexity Sonar: New Search Model with Enhanced Speed and Accuracy

Perplexity Labs has introduced Sonar, a new search-optimized model built on the Llama 3.3 70B architecture. The model has integrated Cerebras inference infrastructure to deliver response speeds of 1,200 tokens per second, establishing significant performance improvements over existing solutions.

Technical Architecture:

  • Base Model: Llama 3.3 70B with optimized training for search and factual responses

  • Inference System: Cerebras-powered infrastructure for high-speed processing

  • Response Generation: 1,200 tokens per second throughput

  • Deployment Framework: Available to all Perplexity Pro subscribers

Performance Metrics:

  • Factuality Score: 85.1% accuracy in search result grounding

  • Readability Rating: 85.9% on text organization benchmarks

  • IFEval Results: 86.8% on instruction following tasks

  • MMLU Performance: 87.1% on knowledge evaluation

Comparative Testing:

  • User Satisfaction: Higher engagement rates compared to GPT-4o mini and Claude 3.5 Haiku

  • Speed Analysis: 10x faster processing than Gemini 2.0 Flash for real-time responses

  • Benchmark Results: Outperforming Claude 3.5 Sonnet while approaching GPT-4o capabilities

Perplexity has continued to refine the platform's search capabilities through ongoing A/B testing.
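For developers who want to try Sonar, it is also exposed through Perplexity's OpenAI-compatible API. The sketch below only assembles the request payload; the endpoint URL and the `sonar` model name are assumptions based on Perplexity's public docs at the time of writing:

```python
import json

# Assumed endpoint for Perplexity's OpenAI-compatible chat completions API
API_URL = "https://api.perplexity.ai/chat/completions"

def build_sonar_request(query: str) -> dict:
    """Assemble the JSON payload for a single Sonar search query."""
    return {
        "model": "sonar",  # assumed model identifier
        "messages": [
            {"role": "system", "content": "Be precise and cite sources."},
            {"role": "user", "content": query},
        ],
    }

payload = build_sonar_request("What is Cerebras inference?")
print(json.dumps(payload, indent=2))
# To send it: POST to API_URL with an "Authorization: Bearer <key>" header.
```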

GitHub Copilot: Agent Mode Integration with Multi-Model Support

GitHub has introduced Agent Mode for Copilot, integrating advanced AI models including Gemini 2.0 Flash, GPT-4o, and Claude 3.5 Sonnet. The platform has enhanced its autonomous coding capabilities through VS Code Insiders, focusing on automated error resolution and task management.

Technical Architecture:

  • Processing System: Dual-model architecture with foundation language model and speculative decoding endpoint

  • Model Integration: Support for multiple AI models including GPT-4o, o3-mini, and Gemini 2.0 Flash

  • Execution Environment: Secure cloud sandbox for autonomous task processing

Core Features:

  • Self-Healing Mechanism: Automatic error detection and resolution capabilities

  • Multi-File Management: Cross-file editing and consistency maintenance

  • Task Automation: Terminal command suggestions with execution validation

Deployment Options:

  • Free Tier: 2,000 completions and 50 chat requests monthly

  • Pro Version: $10/month with unlimited access

  • Business Plan: $19/user/month for team workflows

  • Enterprise Tier: $39/user/month with customization options

The platform has demonstrated significant improvements in code completion and error handling, with Project Padawan scheduled for expanded autonomous agent capabilities later in 2025.

DeepSeek VL2: Advanced Vision-Language Model with MoE Architecture

DeepSeek has released DeepSeek-VL2, a new series of Mixture-of-Experts (MoE) vision-language models designed for enhanced multimodal understanding. The model family has introduced three variants with different parameter scales and efficiency optimizations.

Technical Architecture:

  • Model Variants: DeepSeek-VL2-Tiny (1.0B), VL2-Small (2.8B), and VL2 (4.5B) activated parameters

  • Context Window: 4,096 tokens across all variants

  • Processing Pipeline: Integrated transformer architecture for visual-language tasks

  • Memory Usage: VL2-Small runs on a single 40GB GPU when using incremental prefilling

Performance Features:

  • Resource Efficiency: VL2-Tiny operates on a single GPU with under 40GB of memory

  • Processing Speed: Optimized inference with chunk size 512 for memory efficiency

  • Deployment Options: Support for vLLM, SGLang, and LMDeploy inference optimizations

  • Commercial Usage: MIT license for code and DeepSeek Model License for models

Core Capabilities:

  • Visual QA: Enhanced question-answering with visual context

  • OCR Integration: Advanced optical character recognition support

The model has focused on efficient parameter activation while maintaining competitive performance against larger dense models, with full commercial use support under the DeepSeek Model License.
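Incremental prefilling, mentioned above as what lets VL2-Small fit on a single 40GB GPU, means feeding the prompt to the model in fixed-size chunks so peak memory stays bounded. A minimal sketch of the chunking step (the model call is a placeholder, not the real DeepSeek API):

```python
def chunked_prefill(token_ids: list[int], chunk_size: int = 512) -> list[list[int]]:
    """Split a long prompt into fixed-size chunks for incremental prefilling.

    Processing chunk_size tokens at a time bounds peak activation memory;
    a chunk size of 512 matches the figure reported for DeepSeek-VL2.
    """
    chunks = [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), chunk_size)]
    for chunk in chunks:
        # model.prefill(chunk)  # hypothetical call: extend the KV cache with this chunk
        pass
    return chunks

# A 1,300-token prompt becomes three chunks of 512 + 512 + 276 tokens.
sizes = [len(c) for c in chunked_prefill(list(range(1300)))]
print(sizes)  # [512, 512, 276]
```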

Cline 3.3: AI Programming Assistant Enhances Security and API Integration

Cline, an AI-powered code assistant for VS Code that helps developers write, review, and explain code, has released version 3.3. The update introduces key security features and expanded API provider support, focusing on file access control through a new .clineignore system while increasing model compatibility with additional providers.

Technical Updates:

  • Security Implementation: New .clineignore file system for blocking specific file patterns

  • AWS Integration: Support for AWS Bedrock profiles with long-lived connection capabilities

  • Provider Expansion: Added Requesty, Together, and Alibaba Qwen API providers
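The .clineignore file uses gitignore-style patterns to keep Cline from reading or editing matched files; the patterns below are an illustrative example, not taken from the release notes:

```
# Keep secrets and credentials out of the model's context
.env
*.pem
secrets/

# Skip bulky generated artifacts
node_modules/
dist/
*.min.js
```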

Core Improvements:

  • Rate Limiting: Automatic retry system for handling rate-limited requests

  • UI Enhancement: Keyboard shortcut (CMD + Shift + A) for Plan/Act mode switching

  • Cost Tracking: Resolved OpenRouter request cost/token statistics reporting

The update has maintained backward compatibility while introducing significant security features and reliability improvements for enterprise development workflows.

Tools & Releases YOU Should Know About

PearAI: PearAI is an open-source, AI-driven code editor designed to boost developer productivity. Built on Visual Studio Code, it features automated routing to the best-performing AI models, real-time AI-powered search, and a strict zero-data-retention policy for user privacy. Key models include Claude 3.5 and GPT-4o, ensuring high performance and efficiency.

OneCompiler: OneCompiler is an online platform that provides a versatile coding environment for multiple programming languages, including Python, Java, C++, and JavaScript. It features web-based code editors with built-in compilers and interpreters for real-time code execution. Additionally, OneCompiler offers embeddable code editors for integration into other websites and APIs for backend integration, making it an ideal solution for developers, educators, and businesses seeking flexible coding tools.

Tabby: Tabby is an open-source AI coding assistant built to enhance developer productivity by providing AI-powered code completion, an answer engine for coding questions, and inline chat for collaboration within integrated development environments (IDEs). It offers flexible deployment options, including cloud and on-premises solutions, while ensuring transparency and security through its open-source nature.

Potpie: Potpie is an advanced AI debugging tool designed to assist developers in efficiently identifying and resolving code issues. It leverages AI-powered debugging techniques that mimic human developer processes, utilizing a knowledge graph of the codebase to understand relationships between code elements. Potpie offers specialized retrieval methods, such as Knowledge Graph Queries and Tag-based Retrieval, acting as an experienced pair programmer.

And that wraps up this issue of "This Week in AI Engineering", brought to you by jam.dev—the tool that makes it impossible for your team to send you bad bug reports.

Thank you for tuning in! Be sure to share this newsletter with your fellow AI enthusiasts and subscribe to get the latest updates directly in your inbox.

Until next time, happy building!
