Banana Code Blog

Release notes and deeper dives. More posts will show up here over time.

All posts

May 16, 2026 · Guide

Using Qwen3.6 35B A3B With Banana Code

Qwen and Banana Code local Ollama setup preview

Qwen3.6 35B A3B is a strong local coding model that you can run on your own machine and connect to Banana Code through Ollama. This gives you a private local setup where Banana Code can work with your project without sending prompts, source code, or file contents to a cloud AI provider.

The main benefit is simple: you can use Banana Code with a capable local model instead of paying per-token API costs or relying on cloud rate limits.

Qwen3.6 35B A3B is especially interesting because it gets close to frontier cloud coding models on several coding benchmarks. It is not automatically better than Claude Sonnet 4.5 overall, but for a local open-weight model, the benchmark results are very good.

Hardware Requirements

Qwen3.6 35B A3B and Qwen3.6 27B are not the same type of model.

Qwen3.6 35B A3B is a MoE model. That means it has 35B total parameters, but only part of the model is active per token. Because of that, it is more realistic to run on consumer hardware with partial RAM offload.

Qwen3.6 27B is a dense model. That means the full model is active during inference. Because of that, CPU/RAM offload is much worse for speed. For Qwen3.6 27B, 24GB VRAM is the realistic recommendation if you want a usable local coding experience.

Model Type Hardware recommendation
Qwen3.6 35B A3B MoE 16GB VRAM minimum with RAM offload. 24GB VRAM highly recommended.
Qwen3.6 27B Dense 24GB VRAM recommended. CPU/RAM offload is possible, but it will usually be extremely slow.
Claude Sonnet 4.5 Cloud model No local VRAM needed, but prompts and code are sent to a cloud provider.

If you only have 16GB VRAM, Qwen3.6 35B A3B is the better local choice. It can work with RAM offload, although it will be slower than running more of the model on GPU.

If you have 24GB VRAM or more, Qwen3.6 27B becomes very interesting. It is dense, strong, and performs very well on coding benchmarks, but it needs more VRAM to feel usable locally.

Benchmark Comparison

These benchmark results show why Qwen3.6 is interesting for local coding. Qwen3.6 35B A3B is already very good for a local MoE model, while Qwen3.6 27B is even stronger on several coding benchmarks.

Benchmark Qwen3.6 35B A3B Qwen3.6 27B Dense Claude Sonnet 4.5 Notes
SWE-bench Verified 73.4 77.2 77.2 primary / 82.0 high-compute Qwen3.6 27B matches Sonnet 4.5's primary reported SWE-bench Verified score. That is very good for a local model.
SWE-bench Multilingual 67.2 Not listed here Not listed here Qwen3.6 35B A3B has a strong multilingual software-engineering result.
SWE-bench Pro 49.5 53.5 Not listed here Qwen3.6 27B is stronger than 35B A3B on this harder coding-agent benchmark.
Terminal-Bench 2.0 51.5 59.3 Not listed here Qwen3.6 27B is much stronger for terminal-based agent tasks.
LiveCodeBench v6 80.4 Not listed here Not listed here Qwen3.6 35B A3B already performs strongly on coding.
GPQA 86.0 Not listed here Not listed here Strong reasoning and knowledge result for 35B A3B.
AIME 2026 92.7 Not listed here Not listed here Very strong math benchmark result for 35B A3B.

The important takeaway: Claude Sonnet 4.5 is still a top-tier cloud model and is likely stronger overall for complex long-running agentic coding. But Qwen3.6 35B A3B reaching 73.4 on SWE-bench Verified is very good for a local model.

Qwen3.6 27B is even more impressive if you have enough VRAM. It is a dense 27B model and reaches 77.2 on SWE-bench Verified, matching Sonnet 4.5's primary reported score in this comparison. For a local model, that is very good.

For 16GB VRAM, use Qwen3.6 35B A3B with RAM offload.

For 24GB VRAM or more, Qwen3.6 27B is probably the better coding model to try first.

1. Install Ollama

First, install Ollama from the official download page:

Download Ollama

After installing it, open Ollama once so the local server starts.

Ollama usually runs a local API server at:

http://localhost:11434

Banana Code can connect to this local server and use Ollama as a model provider.

2. Download Qwen3.6 35B A3B

Open your terminal and run:

ollama pull qwen3.6:35b-a3b

This downloads Qwen3.6 35B A3B into Ollama.

This is a large model, so the download can take a while. Make sure you have enough disk space before starting.

You can test the model directly with:

ollama run qwen3.6:35b-a3b

Then try a simple prompt:

Write a simple JavaScript function that checks whether a number is prime.

If the model responds, Ollama is working.

3. Optional: Download Qwen3.6 27B Instead

If you have 24GB VRAM or more and want to try the stronger dense coding model, you can also download Qwen3.6 27B:

ollama pull qwen3.6:27b

Then run it with:

ollama run qwen3.6:27b

Only use Qwen3.6 27B if your hardware can handle it. Because it is dense, CPU/RAM offload will usually make it extremely slow compared with running it mostly or fully on GPU.

If you are on 16GB VRAM, Qwen3.6 35B A3B is usually the more realistic choice.

4. Start Banana Code

Go into the project you want to work on:

cd your-project

Then start Banana Code:

banana

During setup, choose Ollama as the provider.

When Banana Code asks for the Ollama server URL, use:

http://localhost:11434

For the model name, use Qwen3.6 35B A3B:

qwen3.6:35b-a3b

Or, if you downloaded Qwen3.6 27B and have enough VRAM, use:

qwen3.6:27b

After this, Banana Code will send requests to your local Ollama server instead of a cloud provider.

5. Test the Connection

Inside Banana Code, try a small coding request:

Explain the structure of this project and suggest the first file I should inspect.

Or:

Find the main entry point of this project.

If Banana Code responds using Qwen3.6, the setup is working.

6. Why Use Qwen3.6 Locally?

Running Qwen3.6 through Ollama gives you several practical benefits.

Benefit Why it matters
Local privacy Your prompts and code stay on your machine.
No per-token API cost You do not pay for every input and output token.
No cloud rate limits You are limited by your own hardware instead of provider quotas.
Good coding performance Qwen3.6 performs very well for a local open-weight coding model.
Works with Banana Code You can use an AI coding assistant workflow without relying on a cloud model.

This is especially useful for private repositories, local experiments, and projects where you do not want to upload source code to external model providers.

7. Which Qwen3.6 Model Should You Choose?

Use this simple rule:

Your hardware Recommended model
16GB VRAM Qwen3.6 35B A3B with RAM offload
24GB VRAM Qwen3.6 27B or Qwen3.6 35B A3B
More than 24GB VRAM Try Qwen3.6 27B first for coding
CPU-only Not recommended for either model unless you are only testing and can accept very slow output

Qwen3.6 35B A3B is better if you need something that can survive on lower VRAM with offload.

Qwen3.6 27B is better if you have enough VRAM and want the stronger dense coding model.

8. Performance Notes

Qwen3.6 35B A3B is large, so performance depends heavily on your hardware.

If you have 16GB VRAM, it can be usable with RAM offload, but it may feel slower. If you have 24GB VRAM or more, the experience should be much better.

Qwen3.6 27B is different. Since it is dense, offloading a lot of it to CPU/RAM can make it very slow. For that model, 24GB VRAM is strongly recommended.

If either model feels too slow, you can try a smaller Qwen model first, then switch back to Qwen3.6 when you need stronger reasoning or better coding quality.

You can also keep Ollama running in the background so Banana Code can connect to it instantly whenever you start a coding session.

9. Finished

You now have Banana Code connected to Qwen3.6 locally through Ollama.

From here, you can use Banana Code normally:

banana

Then ask it to inspect files, explain code, make edits, generate tests, or help debug your project while using a local model instead of a cloud API.

Sources: Qwen3.6 35B A3B model card; Qwen3.6 27B model card; Qwen3.6 27B blog post; Claude Sonnet 4.5 announcement.

Image source: Qwen_Logo.svg on Wikimedia Commons. Copyright © Alibaba Cloud.

License: Apache License, Version 2.0. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Trademark notice: The Qwen logo may be protected as a trademark in some jurisdictions. Its use here is only to identify Qwen in a guide about using Qwen3.6 with Banana Code and does not imply endorsement, sponsorship, or affiliation with Alibaba Cloud or Qwen.

May 10, 2026 · Guide

Using Banana Code Completly For Free & 100% Private

Free and local Banana Code with Ollama preview

Banana Code can run against local models, which means you can use it without paying for API tokens and without sending your prompts, code, or file contents to a cloud AI provider. The easiest way to do that is to run a model locally with Ollama and connect Banana Code to the local Ollama server.

1. Download Ollama

First, install Ollama from the official download page:

Download Ollama

Open Ollama after installing it. It runs a local server on your machine, usually at http://localhost:11434, which Banana Code can use as a provider.

2. Pull Gemma 4

Once Ollama is installed, open your terminal and download the model:

ollama pull gemma4

This downloads Gemma 4 to your machine. After that, the model runs locally through Ollama.

If your PC has no dedicated GPU, or your GPU has less than 12GB of VRAM, start with the smaller edge model instead:

ollama pull gemma4:e2b

You can still try the normal gemma4 model without a tag, but expect it to be slower on lower-end hardware. You can also choose other Gemma 4 tags from the Ollama Gemma 4 library page.

3. Install or Open Banana Code

If you do not have Banana Code installed yet, install it with npm:

npm install -g @banaxi/banana-code

Then open Banana Code in the project you want to work on:

banana

4. Select Ollama in Banana Code

During first-time setup, choose Ollama as your provider. If Banana Code is already set up, switch providers inside Banana Code:

/provider ollama

Use the local Ollama URL when asked:

http://localhost:11434

Then select gemma4 as the model. You can also use /model later to switch between the models installed on your machine.

Why This Is Free and Private

For the most private setup, keep Banana Remote disabled and use the local Ollama provider. That gives you a fully local AI coding workflow with Banana Code and Gemma 4.

Short version: install Ollama, run ollama pull gemma4, open Banana Code, switch to /provider ollama, and start coding for free.

Image source: Ollama-logo.svg on Wikimedia Commons. Original source: ollama/ollama docs/ollama-logo.svg. Author listed by Wikimedia Commons: ParthSareen on ollama.

License: MIT/Expat License. Copyright © The author(s). Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: the above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. The Software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the Software or the use or other dealings in the Software.

May 2026 · New Feature

Browser Use & Edit with AI: Code Changes from the Page Itself

Browser Use and Edit with AI Preview

Banana Code Studio now connects the AI coding workflow directly to the browser. Instead of describing a UI element from memory or pasting screenshots into chat, you can open the page, point at the exact element, and ask Banana Code to change the local source code that produced it.

🌐 Browser Use in Studio

Browser Use gives Banana Code a visible browser panel inside Studio. The assistant can open pages, inspect the current state, click, type, scroll, and capture page context while you watch. This makes UI work more grounded because the AI can reason about the running app instead of only reading files.

✏️ Edit with AI

Edit with AI is built for the moment when you are looking at the page and know exactly what should change. Press Ctrl+Alt+E, hover over any element, right-click, and choose Edit using AI. A small prompt opens next to the selected element, so you can type requests like Make this headline blue, Increase the card spacing, or Make this button more prominent.

When you send the prompt, Banana Code attaches the selected DOM element to the normal chat turn. The context includes the page URL, selector, XPath, visible text, attributes, nearby HTML, computed style, and framework source hints when available. That gives the coding agent enough detail to search the active workspace for matching components, classes, and text.

Local Code First

The feature is designed for real project work. If the active workspace contains the source code for the page, Banana Code can make the code changes locally using its normal file-editing flow. If the browser is on a site whose code is not in the workspace, Banana Code will say what it needs instead of pretending it can edit a remote page.

Why It Matters

How to Try It

Open Banana Code Studio, start or load a workspace, and ask the AI to open your local app in the browser. Once the page is visible, use Ctrl+Alt+E to enable element picking, choose Edit using AI, type a short instruction, and send it. The request will appear in the regular chat with the selected element attached.

Install or upgrade with npm install -g @banaxi/banana-code, then launch Studio to try Browser Use and Edit with AI.

April 23, 2026 · Release

BananaCode v2.4.0: DeepReview & Enhanced Personalization

2.4.0 Release Preview

BananaCode v2.4.0 is here, focusing on giving users more control over how the AI interacts and introducing a powerful new audit mode.

🔍 DeepReview: Full Codebase Audit

The new /deepreview command switches BananaCode into a specialized review mode. You can choose between a Full Review (auditing the entire current codebase) or a Diff Review (reviewing only staged/unstaged changes via git diff). In this mode, BananaCode focuses purely on providing a structured report with Critical, Warning, and Suggestion findings, without making any file modifications.

✨ Emoji & Style Personalization

We've added more ways to customize your AI pair programmer's personality:

🛠️ New Tools & UI Polish

Bug Fixes & Reliability

We've improved tool execution error handling to better manage user cancellations and repair dangling tool calls. Additionally, the startup telemetry now correctly uses https for more secure connections.

Update now with npm install -g @banaxi/banana-code and try out the new /deepreview command!

April 2026 · New Feature

Local Intelligence: LM Studio Support is Here!

LM Studio Support Preview

Banana Code has always been about flexibility, and today we're taking a huge leap towards local-first development. We are excited to announce full, first-class support for LM Studio.

Why LM Studio?

LM Studio has become the go-to tool for running large language models (LLMs) locally on your own hardware. By integrating LM Studio, Banana Code users can now leverage powerful models like Llama 3, Mistral, and many others without needing an API key or an active internet connection for the model inference.

First-Class Features

This isn't just a simple proxy; we've implemented a full provider suite tailored for the local experience:

Getting Started

Switching to LM Studio is simple. Just run the following command in your terminal:

/provider lmstudio

Banana Code will ask for your local server URL (defaulting to http://localhost:1234/v1) and then let you pick from your loaded models. You can also configure it during initial setup with banana --setup.

Optimized for Performance

We've included automatic JSON schema sanitization for local models, ensuring that even strict local inference engines can understand and use Banana Code's tool definitions without errors.

Download Banana Code using npm install -g @banaxi/banana-code and then download LM Studio at lmstudio.ai and start coding locally today!

April 2026 · Release

2.0.0 Released, What changed?

2.0.0 Release Preview

Version 2.0.0 is a major step forward for Banana Code as a terminal-native AI pair programmer. Here is a concise tour of what shipped, aligned with the actual app behavior.

Smarter Auto Mode (model + effort)

When you pick Auto Mode as your model, a small router model still picks the best concrete model for each user turn—but for Claude, it now also selects a reasoning effort level (low through max, including xhigh where supported). That keeps simple questions cheap and fast while reserving depth for hard tasks. Use /effort to adjust effort manually when you are on Claude.

Interactive terminal suite

Banana Code moves beyond one-shot shell runs. New tools drive a persistent PTY:

Together, these let the agent work through flows that used to stall on non-interactive runners—while one-off tasks still use execute_command.

Financial intelligence

For providers that expose usage (notably Anthropic), the app tracks real session spend and estimates what you saved with Prompt Caching. Run /context for a breakdown (messages, estimated tokens by category, cost, cache savings). On exit, you get a final session cost summary when costing is available.

Skill Creator mode

New command /skill-creator switches the assistant into a mode that helps you author Agent Skills: structured SKILL.md files with YAML frontmatter, written under ~/.config/banana-code/skills/<skill-name>/. The status bar shows SKILL CREATOR MODE; return with /agent.

New slash commands and style

Built-in docs for the model

The get_banana_docs tool gives the model a reliable summary of Banana Code (plus README when present), so answers about slash commands and setup stay accurate.

UltraMemory (optional)

Enable UltraMemory under /settings to run background summarization of eligible chats into global memory. It can significantly increase API usage; the CLI asks for confirmation before turning it on, and only processes activity after you enable it.

Richer @-mentions

File mentions support quoted paths (spaces), ~ expansion, and attaching images via @@path for multimodal providers.

Headless API security

banana --api now uses a generated API token stored at ~/.config/banana-code/token.json. HTTP requests need Authorization: Bearer <token> or ?token=; WebSockets should connect with ?token=... unless you explicitly use --no-auth (discouraged).

Other polish

Install or upgrade with npm install -g @banaxi/banana-code and read the full docs on the Docs page.