NVIDIA Deploys Alibaba Qwen3.5 VLM on Blackwell GPUs for AI Agent Development





Jessie A Ellis
Feb 27, 2026 18:05

NVIDIA offers free GPU-accelerated endpoints for Alibaba’s 397B-parameter Qwen3.5 vision-language model, enabling developers to build multimodal AI agents.





NVIDIA has rolled out free GPU-accelerated endpoints for Alibaba’s Qwen3.5 vision-language model, giving developers immediate access to the 397-billion-parameter system on Blackwell-architecture hardware. The move positions both tech giants to capture the emerging market for multimodal AI agents capable of understanding and navigating user interfaces.

The Qwen3.5 model, which Alibaba released on February 16, 2026, represents a significant architectural shift in large language models. Despite its massive 397B total parameters, only 17 billion activate per forward pass—a 4.28% activation rate achieved through a hybrid mixture-of-experts (MoE) design combined with Gated Delta Networks. This efficiency translates to real cost savings: Alibaba claims the system runs 60% cheaper and handles large workloads eight times more efficiently than its predecessor.
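The sparse-activation arithmetic is easy to verify from the published figures. A quick sanity check (not vendor code):

```python
# Sanity-check the sparse-activation figures quoted above.
total_params_b = 397   # total parameters, in billions (published figure)
active_params_b = 17   # parameters active per forward pass, in billions

activation_rate = active_params_b / total_params_b
print(f"{activation_rate:.2%}")  # ≈ 4.28%
```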

Technical Specifications Worth Noting

The model supports an input context length of 256K tokens, extensible to 1 million—enough to process approximately two hours of video content natively. It handles 200+ languages and runs 512 experts per layer, with 11 experts (10 routed plus 1 shared) activated per token across 60 layers.
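The per-token expert count can be illustrated with a minimal top-k routing sketch. This is a generic mixture-of-experts pattern, not Alibaba's actual router: the expert counts are the published figures, while the softmax gating and random logits are purely illustrative.

```python
import math
import random

NUM_ROUTED_EXPERTS = 512  # routed experts per layer (published figure)
TOP_K = 10                # routed experts chosen per token (published figure)

def route_token(router_logits):
    """Generic top-k MoE routing: keep the k largest logits and
    softmax-normalize their gate weights (illustrative, not Qwen's router)."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    z = max(router_logits[i] for i in top)  # subtract max for numerical stability
    weights = [math.exp(router_logits[i] - z) for i in top]
    total = sum(weights)
    return {i: w / total for i, w in zip(top, weights)}

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_ROUTED_EXPERTS)]
gates = route_token(logits)

# 10 routed experts plus 1 always-on shared expert = 11 active per token
active_per_token = len(gates) + 1
print(active_per_token)  # 11
```

Each of the 60 layers repeats this selection independently, which is how the model keeps only a small fraction of its weights hot for any given token.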

Developers can access Qwen3.5 through NVIDIA’s build.nvidia.com platform with free registration in the NVIDIA Developer Program. The API follows OpenAI-compatible conventions, making integration straightforward for teams already working with similar tool-calling patterns.
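Because the endpoint follows OpenAI-compatible conventions, a multimodal request can be assembled in the standard chat-completions shape. The base URL, model id, and environment-variable name below are illustrative assumptions; check build.nvidia.com for the exact values.

```python
import json
import os
import urllib.request

# Assumed values -- verify the real model id and base URL on build.nvidia.com.
BASE_URL = "https://integrate.api.nvidia.com/v1"  # assumed hosted-endpoint base URL
MODEL = "qwen/qwen3.5-vl"                         # hypothetical model id

def build_vision_request(prompt, image_url):
    """Assemble an OpenAI-style chat request mixing text and image content."""
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": 512,
    }

payload = build_vision_request("List the clickable UI elements in this screenshot.",
                               "https://example.com/screenshot.png")

# Sending requires an API key from the (free) NVIDIA Developer Program:
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
#              "Content-Type": "application/json"})
# print(json.load(urllib.request.urlopen(req))["choices"][0]["message"]["content"])
```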

Production Deployment Options

For enterprises moving beyond experimentation, NVIDIA NIM packages the model as containerized inference microservices. These can run on-premises, in cloud environments, or across hybrid deployments. The NeMo framework provides fine-tuning capabilities for domain-specific applications—NVIDIA specifically highlights a medical visual QA tutorial demonstrating radiological dataset training.
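Since a NIM container exposes the same OpenAI-compatible API surface as the hosted endpoint, moving from experimentation to self-hosted production can be as small as a base-URL change in client code. The local port below is an assumption; consult the NIM container documentation for the actual value.

```python
# Same client code, different deployment target: only the base URL changes.
HOSTED_BASE = "https://integrate.api.nvidia.com/v1"  # assumed hosted endpoint
LOCAL_NIM_BASE = "http://localhost:8000/v1"          # assumed local NIM port

def endpoint_base(use_local_nim: bool) -> str:
    """Pick the API base URL for the current deployment target."""
    return LOCAL_NIM_BASE if use_local_nim else HOSTED_BASE

print(endpoint_base(False))  # hosted (experimentation)
print(endpoint_base(True))   # self-hosted NIM (production)
```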

Alibaba has continued expanding the Qwen3.5 family since the initial release. On February 24, the company pushed out three additional variants: Qwen3.5-122B-A10B, Qwen3.5-35B-A3B, and Qwen3.5-27B, offering smaller footprint options for different deployment scenarios.

Alibaba, trading with a market cap around $372 billion as of February 27, has positioned Qwen3.5 against GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro on benchmark performance. The open-weight models remain available on Hugging Face Hub and ModelScope for developers who prefer self-hosting over NVIDIA’s managed endpoints.

Image source: Shutterstock


