Release notes and deeper dives. More posts will show up here over time.
May 16, 2026 · Guide
Using Qwen3.6 35B A3B With Banana Code
Qwen3.6 35B A3B is a strong local coding model that you can run on your own machine and connect to Banana Code through Ollama. This gives you a private local setup where Banana Code can work with your project without sending prompts, source code, or file contents to a cloud AI provider.
The main benefit is simple: you can use Banana Code with a capable local model instead of paying per-token API costs or relying on cloud rate limits.
Qwen3.6 35B A3B is especially interesting because it gets close to frontier cloud coding models on several coding benchmarks. It is not automatically better than Claude Sonnet 4.5 overall, but for a local open-weight model, the benchmark results are very good.
Hardware Requirements
Qwen3.6 35B A3B and Qwen3.6 27B are not the same type of model.
Qwen3.6 35B A3B is a mixture-of-experts (MoE) model. It has 35B total parameters, but only a fraction of them is active per token (about 3B, as the "A3B" suffix suggests). Because of that, it is more realistic to run on consumer hardware with partial RAM offload.
Qwen3.6 27B is a dense model. That means the full model is active during inference. Because of that, CPU/RAM offload is much worse for speed. For Qwen3.6 27B, 24GB VRAM is the realistic recommendation if you want a usable local coding experience.
| Model | Type | Hardware recommendation |
| --- | --- | --- |
| Qwen3.6 35B A3B | MoE | 16GB VRAM minimum with RAM offload. 24GB VRAM highly recommended. |
| Qwen3.6 27B | Dense | 24GB VRAM recommended. CPU/RAM offload is possible, but it will usually be extremely slow. |
| Claude Sonnet 4.5 | Cloud model | No local VRAM needed, but prompts and code are sent to a cloud provider. |
If you only have 16GB VRAM, Qwen3.6 35B A3B is the better local choice. It can work with RAM offload, although it will be slower than running more of the model on GPU.
If you have 24GB VRAM or more, Qwen3.6 27B becomes very interesting. It is dense, strong, and performs very well on coding benchmarks, but it needs more VRAM to feel usable locally.
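To see why 16GB of VRAM implies RAM offload for the 35B model, here is a rough back-of-the-envelope sketch. It assumes weight size ≈ parameters × bits per weight / 8 and ignores the KV cache and runtime overhead. The 4 bits per weight used in the example is a hypothetical quantization level; this guide does not state which quantization the Ollama tag uses.

```shell
# Rough estimate of the memory needed for model weights only.
# Assumption: size in GB = parameters (in billions) * bits per weight / 8.
# KV cache and runtime overhead are NOT included.

# estimate_weights_gb <params_in_billions> <bits_per_weight>
estimate_weights_gb() {
  LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# needs_offload <params_in_billions> <bits_per_weight> <vram_gb>
# Prints "yes" if the weights alone exceed the given VRAM, otherwise "no".
needs_offload() {
  LC_ALL=C awk -v p="$1" -v b="$2" -v v="$3" \
    'BEGIN { if (p * b / 8 > v) print "yes"; else print "no" }'
}
```

Under that assumption, `estimate_weights_gb 35 4` prints `17.5`, so `needs_offload 35 4 16` prints `yes` and `needs_offload 35 4 24` prints `no`. Treat the numbers as a lower bound, since context length adds memory on top of the weights.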
Benchmark Comparison
These benchmark results show why Qwen3.6 is interesting for local coding. Qwen3.6 35B A3B is already very good for a local MoE model, while Qwen3.6 27B is even stronger on several coding benchmarks.
| Benchmark | Qwen3.6 35B A3B | Qwen3.6 27B Dense | Claude Sonnet 4.5 | Notes |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | 73.4 | 77.2 | 77.2 primary / 82.0 high-compute | Qwen3.6 27B matches Sonnet 4.5's primary reported SWE-bench Verified score. That is very good for a local model. |
| SWE-bench Multilingual | 67.2 | Not listed here | Not listed here | Qwen3.6 35B A3B has a strong multilingual software-engineering result. |
| SWE-bench Pro | 49.5 | 53.5 | Not listed here | Qwen3.6 27B is stronger than 35B A3B on this harder coding-agent benchmark. |
| Terminal-Bench 2.0 | 51.5 | 59.3 | Not listed here | Qwen3.6 27B is much stronger for terminal-based agent tasks. |
| LiveCodeBench v6 | 80.4 | Not listed here | Not listed here | Qwen3.6 35B A3B already performs strongly on coding. |
| GPQA | 86.0 | Not listed here | Not listed here | Strong reasoning and knowledge result for 35B A3B. |
| AIME 2026 | 92.7 | Not listed here | Not listed here | Very strong math benchmark result for 35B A3B. |
The important takeaway: Claude Sonnet 4.5 is still a top-tier cloud model and is likely stronger overall for complex, long-running agentic coding. Even so, Qwen3.6 35B A3B scores 73.4 on SWE-bench Verified, only 3.8 points below Sonnet 4.5's primary reported score of 77.2.
Qwen3.6 27B is even more impressive if you have enough VRAM. This dense 27B model reaches 77.2 on SWE-bench Verified, matching Sonnet 4.5's primary reported score in this comparison (Sonnet 4.5's high-compute score is 82.0).
For 16GB VRAM, use Qwen3.6 35B A3B with RAM offload.
For 24GB VRAM or more, Qwen3.6 27B is probably the better coding model to try first.
1. Install Ollama
First, install Ollama from the official download page:
Download Ollama
After installing it, open Ollama once so the local server starts.
Ollama usually runs a local API server at:
http://localhost:11434
Banana Code can connect to this local server and use Ollama as a model provider.
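Before pointing Banana Code at the server, you can check that it answers. The sketch below uses Ollama's `/api/version` endpoint. `OLLAMA_BASE_URL` is a hypothetical variable used only by this example, not something Ollama or Banana Code reads.

```shell
# Build a full URL for an Ollama API path.
# OLLAMA_BASE_URL is a hypothetical override for this sketch; the default is the URL from the guide.
ollama_endpoint() {
  printf '%s%s\n' "${OLLAMA_BASE_URL:-http://localhost:11434}" "$1"
}

# Returns 0 if the server answers on /api/version within 5 seconds, non-zero otherwise.
check_ollama() {
  curl -fsS --max-time 5 "$(ollama_endpoint /api/version)" >/dev/null
}
```

A typical call would be `check_ollama && echo "Ollama is reachable"`. If it fails, open Ollama again so the local server starts.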
2. Download Qwen3.6 35B A3B
Open your terminal and run:
ollama pull qwen3.6:35b-a3b
This downloads Qwen3.6 35B A3B into Ollama.
This is a large model, so the download can take a while. Make sure you have enough disk space before starting.
You can test the model directly with:
ollama run qwen3.6:35b-a3b
Then try a simple prompt:
Write a simple JavaScript function that checks whether a number is prime.
If the model responds, Ollama is working.
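You can run the same smoke test over HTTP, which is closer to how a client talks to the local server. This is a minimal sketch against Ollama's `/api/generate` endpoint with streaming turned off. The helper names are made up for illustration, and the payload builder does not escape quotes or backslashes, so keep the prompt simple.

```shell
# generate_payload <model> <prompt>: JSON body for /api/generate.
# Limitation of this sketch: quotes and backslashes in the prompt are not escaped.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# ask_ollama <model> <prompt>: send one non-streaming request to the local server.
ask_ollama() {
  curl -fsS http://localhost:11434/api/generate -d "$(generate_payload "$1" "$2")"
}
```

For example, `ask_ollama qwen3.6:35b-a3b "Write a simple JavaScript function that checks whether a number is prime."` should return a JSON object whose `response` field contains the model's answer.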
3. Optional: Download Qwen3.6 27B Instead
If you have 24GB VRAM or more and want to try the stronger dense coding model, you can also download Qwen3.6 27B:
ollama pull qwen3.6:27b
Then run it with:
ollama run qwen3.6:27b
Only use Qwen3.6 27B if your hardware can handle it. Because it is dense, CPU/RAM offload will usually make it extremely slow compared with running it mostly or fully on GPU.
If you are on 16GB VRAM, Qwen3.6 35B A3B is usually the more realistic choice.
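If you have pulled more than one tag, it helps to confirm which ones are actually installed before configuring Banana Code. The sketch below filters the output of `ollama list`. It assumes the usual layout with a header row and the model name in the first column; `has_model` is a hypothetical helper name.

```shell
# has_model <tag>: reads `ollama list` output on stdin and succeeds
# if the tag appears in the first (NAME) column. The header row is skipped.
has_model() {
  awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit (found ? 0 : 1) }'
}
```

Usage: `ollama list | has_model qwen3.6:35b-a3b && echo "model is installed"`.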
4. Start Banana Code
Go into the project you want to work on:
cd your-project
Then start Banana Code:
banana
During setup, choose Ollama as the provider.
When Banana Code asks for the Ollama server URL, use:
http://localhost:11434
For the model name, use Qwen3.6 35B A3B:
qwen3.6:35b-a3b
Or, if you downloaded Qwen3.6 27B and have enough VRAM, use:
qwen3.6:27b
After this, Banana Code will send requests to your local Ollama server instead of a cloud provider.
5. Test the Connection
Inside Banana Code, try a small coding request:
Explain the structure of this project and suggest the first file I should inspect.
Or:
Find the main entry point of this project.
If Banana Code responds using Qwen3.6, the setup is working.
6. Why Use Qwen3.6 Locally?
Running Qwen3.6 through Ollama gives you several practical benefits.
| Benefit | Why it matters |
| --- | --- |
| Local privacy | Your prompts and code stay on your machine. |
| No per-token API cost | You do not pay for every input and output token. |
| No cloud rate limits | You are limited by your own hardware instead of provider quotas. |
| Good coding performance | Qwen3.6 performs very well for a local open-weight coding model. |
| Works with Banana Code | You can use an AI coding assistant workflow without relying on a cloud model. |
This is especially useful for private repositories, local experiments, and projects where you do not want to upload source code to external model providers.
7. Which Qwen3.6 Model Should You Choose?
Use this simple rule:
| Your hardware | Recommended model |
| --- | --- |
| 16GB VRAM | Qwen3.6 35B A3B with RAM offload |
| 24GB VRAM | Qwen3.6 27B or Qwen3.6 35B A3B |
| More than 24GB VRAM | Try Qwen3.6 27B first for coding |
| CPU-only | Not recommended for either model unless you are only testing and can accept very slow output |
Qwen3.6 35B A3B is better if you need something that can survive on lower VRAM with offload.
Qwen3.6 27B is better if you have enough VRAM and want the stronger dense coding model.
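The rule above can be written as a small helper. This is only a sketch of the guide's recommendation: it takes VRAM in whole gigabytes, and two cases are assumptions that the guide does not cover, namely that values between 16 and 24 are treated like 16, and that anything below 16 (including CPU-only, passed as 0) is reported as not recommended.

```shell
# recommend_model <vram_gb>: maps whole gigabytes of VRAM to the guide's suggestion.
# Assumptions: 17-23 GB is treated like 16 GB; below 16 GB is "not recommended".
recommend_model() {
  vram_gb=$1
  if [ "$vram_gb" -gt 24 ]; then
    echo "qwen3.6:27b"
  elif [ "$vram_gb" -eq 24 ]; then
    echo "qwen3.6:27b or qwen3.6:35b-a3b"
  elif [ "$vram_gb" -ge 16 ]; then
    echo "qwen3.6:35b-a3b"
  else
    echo "not recommended"
  fi
}
```

For example, `recommend_model 16` prints `qwen3.6:35b-a3b` and `recommend_model 32` prints `qwen3.6:27b`.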
8. Performance Notes
Qwen3.6 35B A3B is large, so performance depends heavily on your hardware.
If you have 16GB VRAM, it can be usable with RAM offload, but it may feel slower. If you have 24GB VRAM or more, the experience should be much better.
Qwen3.6 27B is different. Since it is dense, offloading a lot of it to CPU/RAM can make it very slow. For that model, 24GB VRAM is strongly recommended.
If either model feels too slow, you can try a smaller Qwen model first, then switch back to Qwen3.6 when you need stronger reasoning or better coding quality.
You can also keep Ollama running in the background so Banana Code can connect to it instantly whenever you start a coding session.
9. Finished
You now have Banana Code connected to Qwen3.6 locally through Ollama.
From here, you can use Banana Code normally:
banana
Then ask it to inspect files, explain code, make edits, generate tests, or help debug your project while using a local model instead of a cloud API.
Sources: Qwen3.6 35B A3B model card; Qwen3.6 27B model card; Qwen3.6 27B blog post; Claude Sonnet 4.5 announcement.
Image source: Qwen_Logo.svg on Wikimedia Commons. Copyright © Alibaba Cloud.
License: Apache License, Version 2.0. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Trademark notice: The Qwen logo may be protected as a trademark in some jurisdictions. Its use here is only to identify Qwen in a guide about using Qwen3.6 with Banana Code and does not imply endorsement, sponsorship, or affiliation with Alibaba Cloud or Qwen.
May 10, 2026 · Guide
Using Banana Code Completely For Free & 100% Private
Banana Code can run against local models, which means you can use it without paying for API tokens and without sending your prompts, code, or file contents to a cloud AI provider. The easiest way to do that is to run a model locally with Ollama and connect Banana Code to the local Ollama server.
1. Download Ollama
First, install Ollama from the official download page:
Download Ollama
Open Ollama after installing it. It runs a local server on your machine, usually at http://localhost:11434, which Banana Code can use as a provider.
2. Pull Gemma 4
Once Ollama is installed, open your terminal and download the model:
ollama pull gemma4
This downloads Gemma 4 to your machine. After that, the model runs locally through Ollama.
If your PC has no dedicated GPU, or your GPU has less than 12GB of VRAM, start with the smaller edge model instead:
ollama pull gemma4:e2b
You can still try the normal gemma4 model without a tag, but expect it to be slower on lower-end hardware. You can also choose other Gemma 4 tags from the Ollama Gemma 4 library page.
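The tag choice from this step fits in a one-line helper. The sketch takes VRAM in whole gigabytes, with 0 standing for "no dedicated GPU"; `pick_gemma_tag` is a hypothetical name used only here.

```shell
# pick_gemma_tag <vram_gb>: smaller edge model below 12 GB (or with no GPU, passed as 0),
# otherwise the default gemma4 tag.
pick_gemma_tag() {
  if [ "$1" -lt 12 ]; then
    echo "gemma4:e2b"
  else
    echo "gemma4"
  fi
}
```

You could then run `ollama pull "$(pick_gemma_tag 8)"`, which pulls `gemma4:e2b`.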
3. Install or Open Banana Code
If you do not have Banana Code installed yet, install it with npm:
npm install -g @banaxi/banana-code
Then open Banana Code in the project you want to work on:
banana
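If `banana` does not start, a quick check is whether the commands used in this guide are on your PATH. The helper below is a small sketch using the standard `command -v` lookup; `missing_tools` is a hypothetical name.

```shell
# missing_tools <cmd>...: prints each command that is not found on PATH, one per line.
# Prints nothing when every command is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Running `missing_tools ollama npm banana` lists whichever of the three still needs to be installed.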
4. Select Ollama in Banana Code
During first-time setup, choose Ollama as your provider. If Banana Code is already set up, switch providers inside Banana Code:
/provider ollama
Use the local Ollama URL when asked:
http://localhost:11434
Then select gemma4 as the model. You can also use /model later to switch between the models installed on your machine.
Why This Is Free and Private
- No paid API calls: the model runs through Ollama on your own computer.
- No cloud model provider: your prompts and code are processed by the local model instead of OpenAI, Anthropic, Google, or another hosted API.
- Your files stay local: Banana Code reads your project files locally and sends model requests to your local Ollama server.
For the most private setup, keep Banana Remote disabled and use the local Ollama provider. That gives you a fully local AI coding workflow with Banana Code and Gemma 4.
Short version: install Ollama, run ollama pull gemma4, open Banana Code, switch to /provider ollama, and start coding for free.
Image source: Ollama-logo.svg on Wikimedia Commons. Original source: ollama/ollama docs/ollama-logo.svg. Author listed by Wikimedia Commons: ParthSareen on ollama.
License: MIT/Expat License. Copyright © The author(s). Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: the above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. The Software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the Software or the use or other dealings in the Software.
May 2026 · New Feature
Browser Use & Edit with AI: Code Changes from the Page Itself
Banana Code Studio now connects the AI coding workflow directly to the browser. Instead of describing a UI element from memory or pasting screenshots into chat, you can open the page, point at the exact element, and ask Banana Code to change the local source code that produced it.
🌐 Browser Use in Studio
Browser Use gives Banana Code a visible browser panel inside Studio. The assistant can open pages, inspect the current state, click, type, scroll, and capture page context while you watch. This makes UI work more grounded because the AI can reason about the running app instead of only reading files.
✏️ Edit with AI
Edit with AI is built for the moment when you are looking at the page and know exactly what should change. Press Ctrl+Alt+E, hover over any element, right-click, and choose Edit using AI. A small prompt opens next to the selected element, so you can type requests like Make this headline blue, Increase the card spacing, or Make this button more prominent.
When you send the prompt, Banana Code attaches the selected DOM element to the normal chat turn. The context includes the page URL, selector, XPath, visible text, attributes, nearby HTML, computed style, and framework source hints when available. That gives the coding agent enough detail to search the active workspace for matching components, classes, and text.
Local Code First
The feature is designed for real project work. If the active workspace contains the source code for the page, Banana Code can make the code changes locally using its normal file-editing flow. If the browser is on a site whose code is not in the workspace, Banana Code will say what it needs instead of pretending it can edit a remote page.
Why It Matters
- Less explaining: Point at the exact element instead of describing where it is.
- Better targeting: DOM context helps the AI find the right file, component, selector, or CSS rule.
- Faster UI iteration: Ask for a change while looking at the running page, then review the local diff.
- Persistent browser context: Studio remembers the browser page for the chat, so you can return to the same work without reopening the page manually.
How to Try It
Open Banana Code Studio, start or load a workspace, and ask the AI to open your local app in the browser. Once the page is visible, use Ctrl+Alt+E to enable element picking, choose Edit using AI, type a short instruction, and send it. The request will appear in the regular chat with the selected element attached.
Install or upgrade with npm install -g @banaxi/banana-code, then launch Studio to try Browser Use and Edit with AI.
April 2026 · Release
2.0.0 Released: What Changed?
Version 2.0.0 is a major step forward for Banana Code as a terminal-native AI pair programmer. Here is a concise tour of what shipped, aligned with the actual app behavior.
Smarter Auto Mode (model + effort)
When you pick Auto Mode as your model, a small router model still picks the best concrete model for each user turn. For Claude, it now also selects a reasoning effort level (low through max, including xhigh where supported). That keeps simple questions cheap and fast while reserving depth for hard tasks. Use /effort to adjust effort manually when you are on Claude.
Interactive terminal suite
Banana Code moves beyond one-shot shell runs. New tools drive a persistent PTY:
- execute_command_in_terminal: start an interactive command (e.g. npm init, wizards).
- send_to_terminal: send stdin (remember \n for Enter) for Y/N, prompts, or editors.
- terminate_terminal_session: clean up when the session is done.
Together, these let the agent work through flows that used to stall on non-interactive runners, while one-off tasks still use execute_command.
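To make the "\n for Enter" detail concrete, here is an analogy in plain shell. It is not how Banana Code implements these tools; it only shows that an interactive prompt accepts an answer once a full line, ending in a newline, arrives on stdin. The `confirm` function is a stand-in for any Y/N wizard.

```shell
# confirm: reads one line from stdin, the way an interactive Y/N prompt would.
# The prompt text goes to stderr so stdout carries only the result.
confirm() {
  printf 'Continue? [y/N] ' >&2
  read -r answer
  case "$answer" in
    y|Y) echo "confirmed" ;;
    *)   echo "cancelled" ;;
  esac
}

# The trailing \n plays the role of the Enter key.
printf 'y\n' | confirm
```

The last line writes `confirmed` to stdout. Sending `n\n` instead produces `cancelled`.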
Financial intelligence
For providers that expose usage (notably Anthropic), the app tracks real session spend and estimates what you saved with Prompt Caching. Run /context for a breakdown (messages, estimated tokens by category, cost, cache savings). On exit, you get a final session cost summary when costing is available.
Skill Creator mode
New command /skill-creator switches the assistant into a mode that helps you author Agent Skills: structured SKILL.md files with YAML frontmatter, written under ~/.config/banana-code/skills/<skill-name>/. The status bar shows SKILL CREATOR MODE; return with /agent.
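For orientation, this is roughly what the resulting layout looks like on disk. The sketch only illustrates the path and the frontmatter idea: the `name` and `description` keys are an assumption, since this post says the file has YAML frontmatter but does not list its fields, and both helper names are hypothetical.

```shell
# skill_file <skill-name>: path of the SKILL.md for a given skill,
# following the directory stated in the release notes.
skill_file() {
  printf '%s/.config/banana-code/skills/%s/SKILL.md\n' "$HOME" "$1"
}

# skill_template <skill-name> <description>: prints a SKILL.md skeleton.
# The frontmatter keys (name, description) are assumed, not confirmed by the post.
skill_template() {
  printf '%s\n' '---' "name: $1" "description: $2" '---' '' "# $1"
}
```

To write a skeleton by hand, you could run `mkdir -p "$(dirname "$(skill_file my-skill)")"` followed by `skill_template my-skill "What the skill does" > "$(skill_file my-skill)"`. In normal use, /skill-creator authors the file for you.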
New slash commands and style
- /style: Normal, Explanatory, or Formal writing tone.
- /effort: Claude reasoning effort (provider-specific tiers).
- Documentation table and help output also list /skill-creator alongside existing plan/ask/security flows.
Built-in docs for the model
The get_banana_docs tool gives the model a reliable summary of Banana Code (plus README when present), so answers about slash commands and setup stay accurate.
UltraMemory (optional)
Enable UltraMemory under /settings to run background summarization of eligible chats into global memory. It can significantly increase API usage; the CLI asks for confirmation before turning it on, and only processes activity after you enable it.
Richer @-mentions
File mentions support quoted paths (spaces), ~ expansion, and attaching images via @@path for multimodal providers.
Headless API security
banana --api now uses a generated API token stored at ~/.config/banana-code/token.json. HTTP requests need Authorization: Bearer <token> or ?token=; WebSockets should connect with ?token=... unless you explicitly use --no-auth (discouraged).
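A client can attach the token in either of the two ways described above. The sketch below only builds the header and the query-string form. `BANANA_API_URL` and `BANANA_API_TOKEN` in the usage line are hypothetical variables, because this post does not state the port, the routes, or the layout of token.json, so copy the token value out of that file yourself.

```shell
# auth_header <token>: header line for HTTP requests to the headless API.
auth_header() {
  printf 'Authorization: Bearer %s\n' "$1"
}

# with_token <url> <token>: appends the token as a query parameter,
# e.g. for WebSocket URLs. Assumes the URL has no query string yet.
with_token() {
  printf '%s?token=%s\n' "$1" "$2"
}
```

With those helpers, an HTTP call could look like `curl -H "$(auth_header "$BANANA_API_TOKEN")" "$BANANA_API_URL"`, and a WebSocket client would connect to `$(with_token "$BANANA_API_URL" "$BANANA_API_TOKEN")`.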
Other polish
- Claude: Opus 4.7 in the roster, prompt-cache-aware costing, extended streaming/thinking behavior where the API supports it.
- Sessions: More reliable save paths on exit, Ctrl+C, and errors; terminal sessions are cleaned up on shutdown.
- Startup: Refreshed ASCII banner and messaging.
Install or upgrade with npm install -g @banaxi/banana-code and read the full docs on the Docs page.