May 16, 2026 · Guide

Using Qwen3.6 35B A3B With Banana Code


Qwen3.6 35B A3B is a strong local coding model that you can run on your own machine and connect to Banana Code through Ollama. This gives you a private local setup where Banana Code can work with your project without sending prompts, source code, or file contents to a cloud AI provider.

The main benefit is simple: you can use Banana Code with a capable local model instead of paying per-token API costs or relying on cloud rate limits.

Qwen3.6 35B A3B is especially interesting because it comes close to frontier cloud coding models on several coding benchmarks. It is not better than Claude Sonnet 4.5 overall, but its benchmark results are strong for a local open-weight model.

Hardware Requirements

Qwen3.6 35B A3B and Qwen3.6 27B are not the same type of model.

Qwen3.6 35B A3B is a Mixture-of-Experts (MoE) model. It has 35B total parameters, but only a subset is active for each token (about 3B, as the A3B suffix suggests). Because less of the model is used per token, it tolerates partial offload to system RAM and is more realistic to run on consumer hardware.

Qwen3.6 27B is a dense model: all 27B parameters are used for every token. That makes CPU/RAM offload much slower, because the offloaded weights are needed on every token. For Qwen3.6 27B, 24GB VRAM is the realistic recommendation if you want a usable local coding experience.

Model	Type	Hardware recommendation
Qwen3.6 35B A3B	MoE	16GB VRAM minimum with RAM offload. 24GB VRAM highly recommended.
Qwen3.6 27B	Dense	24GB VRAM recommended. CPU/RAM offload is possible, but it will usually be extremely slow.
Claude Sonnet 4.5	Cloud model	No local VRAM needed, but prompts and code are sent to a cloud provider.

If you only have 16GB VRAM, Qwen3.6 35B A3B is the better local choice. It can work with RAM offload, although it will be slower than running more of the model on GPU.

If you have 24GB VRAM or more, Qwen3.6 27B becomes very interesting. It is dense, strong, and performs very well on coding benchmarks, but it needs more VRAM to feel usable locally.
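As a rough sanity check on these recommendations, weight memory is approximately the parameter count times the bits per weight, divided by 8. The sketch below assumes 4-bit quantization purely for illustration; the quantization used by the Ollama tags in this guide is not stated here, and the estimate ignores the KV cache and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed 4-bit quantization, for illustration only.
moe_35b_gb = weight_memory_gb(35, 4)    # about 17.5 GB of weights
dense_27b_gb = weight_memory_gb(27, 4)  # about 13.5 GB of weights
```

Under that assumption, the 35B A3B weights alone exceed 16GB of VRAM, which is consistent with needing RAM offload on a 16GB card. The 27B weights come to about 13.5 GB before the KV cache and overhead, which leaves little headroom on a 16GB card.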

Benchmark Comparison

These benchmark results show why Qwen3.6 is interesting for local coding. Qwen3.6 35B A3B is already very good for a local MoE model, while Qwen3.6 27B is even stronger on several coding benchmarks.

Benchmark	Qwen3.6 35B A3B	Qwen3.6 27B Dense	Claude Sonnet 4.5	Notes
SWE-bench Verified	73.4	77.2	77.2 primary / 82.0 high-compute	Qwen3.6 27B matches Sonnet 4.5's primary reported SWE-bench Verified score. That is very good for a local model.
SWE-bench Multilingual	67.2	Not listed here	Not listed here	Qwen3.6 35B A3B has a strong multilingual software-engineering result.
SWE-bench Pro	49.5	53.5	Not listed here	Qwen3.6 27B is stronger than 35B A3B on this harder coding-agent benchmark.
Terminal-Bench 2.0	51.5	59.3	Not listed here	Qwen3.6 27B is much stronger for terminal-based agent tasks.
LiveCodeBench v6	80.4	Not listed here	Not listed here	Qwen3.6 35B A3B already performs strongly on coding.
GPQA	86.0	Not listed here	Not listed here	Strong reasoning and knowledge result for 35B A3B.
AIME 2026	92.7	Not listed here	Not listed here	Very strong math benchmark result for 35B A3B.

The important takeaway: Claude Sonnet 4.5 is still a top-tier cloud model and is likely stronger overall for complex, long-running agentic coding. Even so, Qwen3.6 35B A3B reaches 73.4 on SWE-bench Verified, which is within about four points of Sonnet 4.5's primary reported score of 77.2.

Qwen3.6 27B is even more impressive if you have enough VRAM. This dense 27B model reaches 77.2 on SWE-bench Verified, matching Sonnet 4.5's primary reported score in this comparison, although Sonnet 4.5's high-compute result of 82.0 is still higher.

For 16GB VRAM, use Qwen3.6 35B A3B with RAM offload.

For 24GB VRAM or more, Qwen3.6 27B is probably the better coding model to try first.

1. Install Ollama

First, install Ollama from the official download page:

Download Ollama

After installing it, open Ollama once so the local server starts.

Ollama usually runs a local API server at:

http://localhost:11434

Banana Code can connect to this local server and use Ollama as a model provider.
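If you want to confirm that the server is reachable before configuring Banana Code, one option is to query Ollama's /api/version endpoint. This is a minimal sketch using only the Python standard library; the function names are illustrative and are not part of Banana Code or Ollama.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434"  # default address used in this guide


def parse_version(body: bytes) -> Optional[str]:
    """Pull the version string out of an /api/version response body."""
    return json.loads(body).get("version")


def ollama_version(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> Optional[str]:
    """Return the server version, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return parse_version(resp.read())
    except (urllib.error.URLError, OSError, ValueError):
        return None
```

Calling ollama_version() returns a version string when Ollama is running and None when nothing answers at that address.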

2. Download Qwen3.6 35B A3B

Open your terminal and run:

ollama pull qwen3.6:35b-a3b

This downloads Qwen3.6 35B A3B into Ollama.

This is a large model, so the download can take a while. Make sure you have enough disk space before starting.

You can test the model directly with:

ollama run qwen3.6:35b-a3b

Then try a simple prompt:

Write a simple JavaScript function that checks whether a number is prime.

If the model responds, Ollama is working.
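The same check can be scripted against Ollama's HTTP API instead of the interactive prompt. The sketch below posts to /api/generate with streaming turned off; the helper names are illustrative, and the model must already be pulled.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

For example, generate("qwen3.6:35b-a3b", "Write a simple JavaScript function that checks whether a number is prime.") should return the model's answer as a string. The long timeout is deliberate, since the first request also has to load the model.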

3. Optional: Download Qwen3.6 27B Instead

If you have 24GB VRAM or more and want to try the stronger dense coding model, you can also download Qwen3.6 27B:

ollama pull qwen3.6:27b

Then run it with:

ollama run qwen3.6:27b

Only use Qwen3.6 27B if your hardware can handle it. Because it is dense, CPU/RAM offload will usually make it extremely slow compared with running it mostly or fully on GPU.

If you are on 16GB VRAM, Qwen3.6 35B A3B is usually the more realistic choice.

4. Start Banana Code

Go into the project you want to work on:

cd your-project

Then start Banana Code:

banana

During setup, choose Ollama as the provider.

When Banana Code asks for the Ollama server URL, use:

http://localhost:11434

For the model name, use Qwen3.6 35B A3B:

qwen3.6:35b-a3b

Or, if you downloaded Qwen3.6 27B and have enough VRAM, use:

qwen3.6:27b

After this, Banana Code will send requests to your local Ollama server instead of a cloud provider.
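A mistyped model name is an easy mistake at this step. As a hedged sketch, you can list what Ollama has installed through its /api/tags endpoint and check the name before entering it. The helper names below are illustrative and are not part of Banana Code.

```python
import json
import urllib.request
from typing import List


def installed_models(tags_response: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def is_installed(model: str, tags_response: dict) -> bool:
    """True if the exact model name (including its tag) is installed."""
    return model in installed_models(tags_response)


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> dict:
    """Fetch the installed-model list from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With Ollama running, is_installed("qwen3.6:35b-a3b", fetch_tags()) tells you whether the pull from step 2 completed.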

5. Test the Connection

Inside Banana Code, try a small coding request:

Explain the structure of this project and suggest the first file I should inspect.

Or:

Find the main entry point of this project.

If Banana Code responds using Qwen3.6, the setup is working.

6. Why Use Qwen3.6 Locally?

Running Qwen3.6 through Ollama gives you several practical benefits.

Benefit	Why it matters
Local privacy	Your prompts and code stay on your machine.
No per-token API cost	You do not pay for every input and output token.
No cloud rate limits	You are limited by your own hardware instead of provider quotas.
Good coding performance	Qwen3.6 performs very well for a local open-weight coding model.
Works with Banana Code	You can use an AI coding assistant workflow without relying on a cloud model.

This is especially useful for private repositories, local experiments, and projects where you do not want to upload source code to external model providers.

7. Which Qwen3.6 Model Should You Choose?

Use this simple rule:

Your hardware	Recommended model
16GB VRAM	Qwen3.6 35B A3B with RAM offload
24GB VRAM	Qwen3.6 27B or Qwen3.6 35B A3B
More than 24GB VRAM	Try Qwen3.6 27B first for coding
CPU-only	Not recommended for either model unless you are only testing and can accept very slow output

Qwen3.6 35B A3B is better if you need something that can survive on lower VRAM with offload.

Qwen3.6 27B is better if you have enough VRAM and want the stronger dense coding model.
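The rule above can be written as a small lookup. This is only a sketch of the table in this section: treating values between the listed tiers (for example 20GB) like the tier below them is an assumption, and at exactly 24GB the table lists both models, so the function simply returns the one the guide suggests trying first for coding.

```python
from typing import Optional


def recommend_model(vram_gb: float) -> Optional[str]:
    """Map available VRAM to the Ollama tag suggested by this guide's table.

    Returns None below 16GB, where the guide recommends neither model.
    """
    if vram_gb >= 24:
        return "qwen3.6:27b"       # dense model, needs the VRAM
    if vram_gb >= 16:
        return "qwen3.6:35b-a3b"   # MoE model, works with RAM offload
    return None


print(recommend_model(16))  # qwen3.6:35b-a3b
print(recommend_model(24))  # qwen3.6:27b
```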

8. Performance Notes

Qwen3.6 35B A3B is large, so performance depends heavily on your hardware.

If you have 16GB VRAM, it can be usable with RAM offload, but expect slower generation than with the model fully on the GPU. If you have 24GB VRAM or more, the experience should be much better.

Qwen3.6 27B is different. Since it is dense, offloading a lot of it to CPU/RAM can make it very slow. For that model, 24GB VRAM is strongly recommended.

If either model feels too slow, you can try a smaller Qwen model first, then switch back to Qwen3.6 when you need stronger reasoning or better coding quality.

You can also keep Ollama running in the background so Banana Code can connect to it instantly whenever you start a coding session.
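Keeping the server running does not by itself keep the model in memory, because Ollama unloads idle models after a while. One way to avoid a slow first reply is to preload the model and ask Ollama to keep it loaded, using the keep_alive field of /api/generate with no prompt. The sketch below assumes a 30-minute window, which is an arbitrary example value, and the helper names are illustrative.

```python
import json
import urllib.request


def build_preload_request(model: str, keep_alive: str = "30m",
                          base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a request that loads the model without generating any text."""
    payload = {"model": model, "keep_alive": keep_alive}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model: str, keep_alive: str = "30m", timeout: float = 300.0) -> None:
    """Send the preload request (requires a running Ollama server)."""
    with urllib.request.urlopen(build_preload_request(model, keep_alive), timeout=timeout):
        pass
```

Running preload("qwen3.6:35b-a3b") before starting a session means the model is already loaded when Banana Code sends its first request.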

9. Finished

You now have Banana Code connected to Qwen3.6 locally through Ollama.

From here, you can use Banana Code normally:

banana

Then ask it to inspect files, explain code, make edits, generate tests, or help debug your project while using a local model instead of a cloud API.

Sources: Qwen3.6 35B A3B model card; Qwen3.6 27B model card; Qwen3.6 27B blog post; Claude Sonnet 4.5 announcement.

License: Apache License, Version 2.0. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Trademark notice: The Qwen logo may be protected as a trademark in some jurisdictions. Its use here is only to identify Qwen in a guide about using Qwen3.6 with Banana Code and does not imply endorsement, sponsorship, or affiliation with Alibaba Cloud or Qwen.

May 10, 2026 · Guide

Using Banana Code Completely For Free & 100% Private


Banana Code can run against local models, which means you can use it without paying for API tokens and without sending your prompts, code, or file contents to a cloud AI provider. The easiest way to do that is to run a model locally with Ollama and connect Banana Code to the local Ollama server.

1. Download Ollama

First, install Ollama from the official download page:

Download Ollama

Open Ollama after installing it. It runs a local server on your machine, usually at http://localhost:11434, which Banana Code can use as a provider.

2. Pull Gemma 4

Once Ollama is installed, open your terminal and download the model:

ollama pull gemma4

This downloads Gemma 4 to your machine. After that, the model runs locally through Ollama.

If your PC has no dedicated GPU, or your GPU has less than 12GB of VRAM, start with the smaller edge model instead:

ollama pull gemma4:e2b

You can still try the normal gemma4 model without a tag, but expect it to be slower on lower-end hardware. You can also choose other Gemma 4 tags from the Ollama Gemma 4 library page.

3. Install or Open Banana Code

If you do not have Banana Code installed yet, install it with npm:

npm install -g @banaxi/banana-code

Then open Banana Code in the project you want to work on:

banana

4. Select Ollama in Banana Code

During first-time setup, choose Ollama as your provider. If Banana Code is already set up, switch providers inside Banana Code:

/provider ollama

Use the local Ollama URL when asked:

http://localhost:11434

Then select gemma4 as the model. You can also use /model later to switch between the models installed on your machine.

Why This Is Free and Private

No paid API calls: the model runs through Ollama on your own computer.
No cloud model provider: your prompts and code are processed by the local model instead of OpenAI, Anthropic, Google, or another hosted API.
Your files stay local: Banana Code reads your project files locally and sends model requests to your local Ollama server.

For the most private setup, keep Banana Remote disabled and use the local Ollama provider. That gives you a fully local AI coding workflow with Banana Code and Gemma 4.

Short version: install Ollama, run ollama pull gemma4, open Banana Code, switch to /provider ollama, and start coding for free.

Image source: Ollama-logo.svg on Wikimedia Commons. Original source: ollama/ollama docs/ollama-logo.svg. Author listed by Wikimedia Commons: ParthSareen on ollama.

License: MIT/Expat License. Copyright © The author(s). Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: the above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. The Software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the Software or the use or other dealings in the Software.

May 2026 · New Feature

Browser Use & Edit with AI: Code Changes from the Page Itself

Banana Code Studio now connects the AI coding workflow directly to the browser. Instead of describing a UI element from memory or pasting screenshots into chat, you can open the page, point at the exact element, and ask Banana Code to change the local source code that produced it.

🌐 Browser Use in Studio

Browser Use gives Banana Code a visible browser panel inside Studio. The assistant can open pages, inspect the current state, click, type, scroll, and capture page context while you watch. This makes UI work more grounded because the AI can reason about the running app instead of only reading files.

✏️ Edit with AI

Edit with AI is built for the moment when you are looking at the page and know exactly what should change. Press Ctrl+Alt+E, hover over any element, right-click, and choose Edit using AI. A small prompt opens next to the selected element, so you can type requests like Make this headline blue, Increase the card spacing, or Make this button more prominent.

When you send the prompt, Banana Code attaches the selected DOM element to the normal chat turn. The context includes the page URL, selector, XPath, visible text, attributes, nearby HTML, computed style, and framework source hints when available. That gives the coding agent enough detail to search the active workspace for matching components, classes, and text.

Local Code First

The feature is designed for real project work. If the active workspace contains the source code for the page, Banana Code can make the code changes locally using its normal file-editing flow. If the browser is on a site whose code is not in the workspace, Banana Code will say what it needs instead of pretending it can edit a remote page.

Why It Matters

Less explaining: Point at the exact element instead of describing where it is.
Better targeting: DOM context helps the AI find the right file, component, selector, or CSS rule.
Faster UI iteration: Ask for a change while looking at the running page, then review the local diff.
Persistent browser context: Studio remembers the browser page for the chat, so you can return to the same work without reopening the page manually.

How to Try It

Open Banana Code Studio, start or load a workspace, and ask the AI to open your local app in the browser. Once the page is visible, use Ctrl+Alt+E to enable element picking, choose Edit using AI, type a short instruction, and send it. The request will appear in the regular chat with the selected element attached.

Install or upgrade with npm install -g @banaxi/banana-code, then launch Studio to try Browser Use and Edit with AI.

April 23, 2026 · Release

BananaCode v2.4.0: DeepReview & Enhanced Personalization

BananaCode v2.4.0 is here, focusing on giving users more control over how the AI interacts and introducing a powerful new audit mode.

🔍 DeepReview: Full Codebase Audit

The new /deepreview command switches BananaCode into a specialized review mode. You can choose between a Full Review (auditing the entire current codebase) or a Diff Review (reviewing only staged/unstaged changes via git diff). In this mode, BananaCode focuses purely on providing a structured report with Critical, Warning, and Suggestion findings, without making any file modifications.

✨ Emoji & Style Personalization

We've added more ways to customize your AI pair programmer's personality:

Emoji Modes (/emoji): Choose between Normal, Minimal, or More emojis. Whether you want a strictly professional vibe or a lively, expressive output, the choice is yours.
Concise Style: Added to the /style command, the new Concise mode provides terse, code-first responses, skipping long preambles and summaries.

🛠️ New Tools & UI Polish

Rename File Tool: The new rename_file tool allows the agent to move or rename files and directories safely with your permission.
Dynamic Spinners: We've replaced the static "Thinking..." text with randomized, provider-specific verbs like "Clauding...", "Gemming...", and "Ollaming...".
Transparent Guard: Banana Guard now explicitly states the reason behind its auto-approval decisions in the terminal.

Bug Fixes & Reliability

We've improved tool execution error handling to better manage user cancellations and repair dangling tool calls. Additionally, the startup telemetry now correctly uses HTTPS for more secure connections.

Update now with npm install -g @banaxi/banana-code and try out the new /deepreview command!

April 2026 · New Feature

Local Intelligence: LM Studio Support is Here!

Banana Code has always been about flexibility, and today we're taking a huge leap towards local-first development. We are excited to announce full, first-class support for LM Studio.

Why LM Studio?

LM Studio has become the go-to tool for running large language models (LLMs) locally on your own hardware. By integrating LM Studio, Banana Code users can now use powerful models like Llama 3, Mistral, and many others without needing an API key or an active internet connection for model inference.

First-Class Features

This isn't just a simple proxy; we've implemented a full provider suite tailored for the local experience:

Automatic Model Discovery: Banana Code can now talk to LM Studio to see which models you have currently loaded. No more manual typing of long model identifiers.
Full Tool Calling: Local models can now use the entire Banana Code tool suite. Whether it's reading files, running terminal commands, or searching the web, your local model is now an agent.
Streaming Responses: Get immediate feedback with real-time token streaming, just like with cloud providers.
OpenAI-Compatible: Leverages the standardized local server API provided by LM Studio for maximum compatibility.

Getting Started

Switching to LM Studio is simple. Run the following slash command inside Banana Code:

/provider lmstudio

Banana Code will ask for your local server URL (defaulting to http://localhost:1234/v1) and then let you pick from your loaded models. You can also configure it during initial setup with banana --setup.
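To see which model identifiers the LM Studio server exposes, you can query its OpenAI-compatible model list yourself. This is a minimal sketch against the default URL mentioned above; it illustrates the public /models endpoint and is not a description of how Banana Code implements discovery internally.

```python
import json
import urllib.request
from typing import List

LMSTUDIO_URL = "http://localhost:1234/v1"  # default from this post


def model_ids(models_response: dict) -> List[str]:
    """Extract model identifiers from an OpenAI-style /models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_models(base_url: str = LMSTUDIO_URL, timeout: float = 3.0) -> dict:
    """Fetch the model list from a running LM Studio server."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return json.load(resp)
```

With the LM Studio server started, model_ids(fetch_models()) returns the identifiers you can pick from.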

Optimized for Performance

We've included automatic JSON schema sanitization for local models, ensuring that even strict local inference engines can understand and use Banana Code's tool definitions without errors.
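To illustrate what schema sanitization can look like, the sketch below recursively removes a few JSON Schema keywords from a tool definition. The set of keywords is hypothetical and chosen only for the example; the post does not say which keywords Banana Code actually removes or how.

```python
from typing import Any

# Hypothetical list: which keywords a given local engine rejects is an assumption.
UNSUPPORTED_KEYS = {"$schema", "additionalProperties", "default", "examples"}


def sanitize_schema(node: Any) -> Any:
    """Return a copy of the schema with unsupported keywords removed."""
    if isinstance(node, list):
        return [sanitize_schema(item) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {}
    for key, value in node.items():
        if key in UNSUPPORTED_KEYS:
            continue
        if key == "properties" and isinstance(value, dict):
            # Keys under "properties" are parameter names, not keywords: keep them all.
            cleaned[key] = {name: sanitize_schema(sub) for name, sub in value.items()}
        else:
            cleaned[key] = sanitize_schema(value)
    return cleaned
```

The special case for "properties" matters: a tool parameter that happens to be named default should survive, while a default keyword inside its definition is dropped. The input schema is not modified.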

Install Banana Code with npm install -g @banaxi/banana-code, download LM Studio from lmstudio.ai, and start coding locally today!

April 2026 · Release

2.0.0 Released: What Changed?

Version 2.0.0 is a major step forward for Banana Code as a terminal-native AI pair programmer. Here is a concise tour of what shipped, aligned with the actual app behavior.

Smarter Auto Mode (model + effort)

When you pick Auto Mode as your model, a small router model still picks the best concrete model for each user turn. For Claude, it now also selects a reasoning effort level (low through max, including xhigh where supported). That keeps simple questions cheap and fast while reserving depth for hard tasks. Use /effort to adjust effort manually when you are on Claude.

Interactive terminal suite

Banana Code moves beyond one-shot shell runs. New tools drive a persistent PTY:

execute_command_in_terminal — start an interactive command (e.g. npm init, wizards).
send_to_terminal — send stdin (remember \n for Enter) for Y/N, prompts, or editors.
terminate_terminal_session — clean up when the session is done.

Together, these let the agent work through flows that used to stall on non-interactive runners, while one-off tasks still use execute_command.
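The start, send, and terminate pattern can be sketched in a few lines. Note that this example uses plain pipes from Python's subprocess module rather than a real PTY, and the function names are illustrative stand-ins, not Banana Code's tools. It shows why the trailing newline matters: the child process only sees the answer once Enter is sent.

```python
import subprocess
import sys


def start_session(argv):
    """Start an interactive child process with piped stdin and stdout."""
    return subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)


def send_line(proc, text):
    """Send one line of input; the newline plays the role of pressing Enter."""
    proc.stdin.write(text + "\n")
    proc.stdin.flush()


def close_session(proc, timeout=5.0):
    """Close stdin, collect remaining output, and kill the process if it hangs."""
    try:
        output, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        output, _ = proc.communicate()
    return output


# A tiny stand-in for a wizard that waits for a Y/N style answer.
child = [sys.executable, "-c", "ans = input(); print('got ' + ans)"]
session = start_session(child)
send_line(session, "y")
print(close_session(session).strip())  # prints "got y"
```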

Financial intelligence

For providers that expose usage (notably Anthropic), the app tracks real session spend and estimates what you saved with Prompt Caching. Run /context for a breakdown (messages, estimated tokens by category, cost, cache savings). On exit, you get a final session cost summary when costing is available.
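The cache-savings figure is, at its core, simple arithmetic: tokens served from cache are billed at a lower rate than normal input tokens. The sketch below shows that calculation under stated assumptions. The prices are placeholder values for illustration, not real provider pricing, and the sketch ignores any extra charge for writing to the cache, so it is not necessarily the formula Banana Code uses.

```python
def cache_savings(cached_input_tokens: int,
                  input_price_per_mtok: float,
                  cache_read_price_per_mtok: float) -> float:
    """Estimated savings from reading tokens from cache instead of paying full price.

    Prices are per million tokens, in whatever currency the provider bills.
    """
    price_difference = input_price_per_mtok - cache_read_price_per_mtok
    return cached_input_tokens * price_difference / 1_000_000


# Placeholder prices: 3.00 per million input tokens, 0.30 per million cached reads.
saved = cache_savings(2_000_000, 3.00, 0.30)
print(f"Estimated cache savings: {saved:.2f}")  # Estimated cache savings: 5.40
```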

Skill Creator mode

New command /skill-creator switches the assistant into a mode that helps you author Agent Skills: structured SKILL.md files with YAML frontmatter, written under ~/.config/banana-code/skills/<skill-name>/. The status bar shows SKILL CREATOR MODE; return with /agent.
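For a sense of what such a file looks like, here is a minimal sketch that renders a SKILL.md with YAML frontmatter and computes where it would live. The frontmatter fields name and description are an assumption for illustration; the post does not list the exact fields Skill Creator mode writes.

```python
import json
from pathlib import Path
from typing import Optional


def render_skill_md(name: str, description: str, body: str) -> str:
    """Render a SKILL.md document: YAML frontmatter followed by Markdown.

    json.dumps is used for quoting because JSON strings are valid YAML scalars,
    which keeps values containing colons from breaking the frontmatter.
    """
    frontmatter = f"---\nname: {json.dumps(name)}\ndescription: {json.dumps(description)}\n---\n"
    return f"{frontmatter}\n{body.rstrip()}\n"


def skill_path(skill_name: str, root: Optional[Path] = None) -> Path:
    """Location of the skill file under the skills directory named in this post."""
    if root is None:
        root = Path("~/.config/banana-code/skills").expanduser()
    return root / skill_name / "SKILL.md"
```

Writing the result of render_skill_md(...) to skill_path("my-skill") would create the file by hand, although /skill-creator exists so that you do not have to.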

New slash commands and style

/style — Normal, Explanatory, or Formal writing tone.
/effort — Claude reasoning effort (provider-specific tiers).
Documentation table and help output also list /skill-creator alongside existing plan/ask/security flows.

Built-in docs for the model

The get_banana_docs tool gives the model a reliable summary of Banana Code (plus README when present), so answers about slash commands and setup stay accurate.

UltraMemory (optional)

Enable UltraMemory under /settings to run background summarization of eligible chats into global memory. It can significantly increase API usage; the CLI asks for confirmation before turning it on, and only processes activity after you enable it.

Richer @-mentions

File mentions support quoted paths (spaces), ~ expansion, and attaching images via @@path for multimodal providers.
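As an illustration of the three behaviors listed above, the sketch below parses mentions out of a message. The exact syntax is an assumption: it treats @path and @"quoted path" as file mentions and @@path as an image attachment, and it is not Banana Code's actual parser.

```python
import os
import re
from typing import List, Tuple

# One or two @ signs at a word start, followed by a quoted or unquoted path.
MENTION_RE = re.compile(r'(?<!\S)(@@?)(?:"([^"]+)"|(\S+))')


def parse_mentions(message: str) -> List[Tuple[str, str]]:
    """Return (kind, path) pairs, where kind is "file" or "image"."""
    mentions = []
    for marker, quoted, bare in MENTION_RE.findall(message):
        raw_path = quoted or bare
        kind = "image" if marker == "@@" else "file"
        mentions.append((kind, os.path.expanduser(raw_path)))  # handles ~
    return mentions


example = 'Compare @src/app.ts with @"docs/design notes.md" and @@~/shots/ui.png'
for kind, path in parse_mentions(example):
    print(kind, path)
```

The lookbehind keeps an @ in the middle of a word, such as in an email address, from being treated as a mention.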

Headless API security

banana --api now uses a generated API token stored at ~/.config/banana-code/token.json. HTTP requests need Authorization: Bearer <token> or ?token=; WebSockets should connect with ?token=... unless you explicitly use --no-auth (discouraged).
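A client script would read the token and attach it to each request in one of the two ways described above. In the sketch below, the JSON key name token inside token.json is an assumption, since the post only gives the file location, and the helper names are illustrative.

```python
import json
import urllib.parse
from pathlib import Path
from typing import Dict

TOKEN_FILE = Path("~/.config/banana-code/token.json")  # location from this post


def token_from_json(text: str, key: str = "token") -> str:
    """Extract the token from the file contents (the key name is an assumption)."""
    return json.loads(text)[key]


def load_token(path: Path = TOKEN_FILE, key: str = "token") -> str:
    """Read the generated API token from disk."""
    return token_from_json(path.expanduser().read_text(encoding="utf-8"), key)


def auth_headers(token: str) -> Dict[str, str]:
    """Headers for HTTP requests to the headless API."""
    return {"Authorization": f"Bearer {token}"}


def with_query_token(url: str, token: str) -> str:
    """Append ?token=... to a URL, for example for WebSocket connections."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}token={urllib.parse.quote(token, safe='')}"
```

Prefer the header form for HTTP requests where possible, since tokens placed in URLs are more likely to end up in logs.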

Other polish

Claude: Opus 4.7 in the roster, prompt-cache-aware costing, extended streaming/thinking behavior where the API supports it.
Sessions: More reliable save paths on exit, Ctrl+C, and errors; terminal sessions are cleaned up on shutdown.
Startup: Refreshed ASCII banner and messaging.

Install or upgrade with npm install -g @banaxi/banana-code and read the full docs on the Docs page.
