CodexSigma — Build Your Own LLM

Why Local AI

Your Code, Your Model, Your Rules

Most AI-assisted IDEs send your code to someone else's server. CodexSigma runs on your machine, with your model.

🔒

100% Private

With a local model, your code never leaves your laptop. No telemetry, no cloud calls, no data collection. Keeping code on-device makes HIPAA, PCI, and GDPR requirements easier to meet from day one.

💰

Zero Per-Query Cost

Once the model is downloaded, every query is free. No API bills, no token counting, no surprise charges at scale.

⚡

Works Offline

No internet needed after setup. Works on planes, in dorms, behind firewalls, in air-gapped environments.

Model Options

Pick Your Model

CodexSigma works with any model that Ollama supports — from 1.5B to 70B parameters.

🤖

DeepSeek-Coder 6.7B

Best for general coding. Strong at Python, JavaScript, and TypeScript. Needs about 4GB RAM, so it runs on most laptops.

ollama pull deepseek-coder:6.7b

⚡

Qwen2.5-Coder 7B

Excellent instruction following. Supports tools/function calling natively. 4.5GB RAM.

ollama pull qwen2.5-coder:7b

🧠

CodeLlama 13B

Stronger reasoning for complex tasks. 8GB RAM. Recommended for teams with decent hardware.

ollama pull codellama:13b

🚀

Llama 3.2 3B

Fastest option. Runs on modest hardware, including a Raspberry Pi with enough RAM. Good for quick edits and simple tasks.

ollama pull llama3.2:3b

🏆

Qwen3-Coder 30B

Best quality. Near GPT-4 level coding. Requires 20GB RAM; a GPU is recommended.

ollama pull qwen3-coder:30b

🧠

EngineMind

Enterprise-grade LLM for code generation and reasoning. IBM-influenced architecture. Optimized for business logic and compliance workflows.

ollama pull enginemind
export AI_MODEL=enginemind

🍎

MLX Models (Mac)

Optimized for Apple Silicon with native Metal acceleration. Can run 2-3x faster than Ollama on a Mac, depending on the model and hardware.

pip install mlx-lm && mlx_lm.generate
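The RAM figures quoted for the models above are roughly what you get from a simple back-of-the-envelope estimate. The sketch below assumes 4-bit quantized weights and about 20% overhead for context and runtime buffers; both numbers are illustrative assumptions, not values published by CodexSigma or Ollama.

```python
def estimate_ram_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Rough memory estimate for a quantized model, in GB.

    bits_per_weight=4 and overhead=1.2 are assumptions for illustration;
    real usage depends on the quantization scheme and context length.
    """
    # 1 billion parameters at 8 bits per weight is about 1 GB.
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead


for tag, size in [("deepseek-coder:6.7b", 6.7),
                  ("qwen2.5-coder:7b", 7.0),
                  ("codellama:13b", 13.0)]:
    print(f"{tag}: ~{estimate_ram_gb(size):.1f} GB")
```

Under these assumptions the estimates land close to the 4GB, 4.5GB, and 8GB figures listed on the model cards.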

Fine-Tune Your Own

Custom Model for Your Codebase

Fine-tune a model on your company's code, style, and standards. Then use it in CodexSigma.

Prepare your dataset

Collect code examples from your repos and format them as instruction-response pairs. Train with Axolotl or Unsloth.
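One common way to store instruction-response pairs is a JSONL file with one record per line. The sketch below assumes an Alpaca-style layout (`instruction`, `input`, `output`); the field names and the sample pairs are illustrative, so check which dataset format your trainer expects.

```python
import json
from pathlib import Path


def write_pairs(pairs, out_path):
    """Write (instruction, response) pairs as one JSON object per line."""
    out = Path(out_path)
    with out.open("w", encoding="utf-8") as f:
        for instruction, response in pairs:
            # Alpaca-style record; "input" is left empty for single-turn pairs.
            record = {"instruction": instruction, "input": "", "output": response}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out


# Hypothetical examples; in practice these would be mined from your repos.
examples = [
    ("Write a function that returns the square of a number.",
     "def square(x):\n    return x * x"),
    ("Add a docstring to: def add(a, b): return a + b",
     'def add(a, b):\n    """Return the sum of a and b."""\n    return a + b'),
]
```

Calling `write_pairs(examples, "train.jsonl")` produces a file that can be referenced from the training config.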

Fine-tune with LoRA

Train on a single GPU. LoRA adapters are small (around 100MB) and leave the base model's weights unchanged.

python -m axolotl.cli.train config.yml
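To see why LoRA adapters stay in the tens-to-hundreds of megabytes, it helps to count the parameters they add. The sketch below assumes a hypothetical 7B-class model (hidden size 4096, 32 layers) with LoRA applied to four square attention projections per layer and weights stored in 16-bit precision; none of these values come from CodexSigma.

```python
def lora_adapter_mb(hidden_size, num_layers, matrices_per_layer, rank,
                    bytes_per_param=2):
    """Approximate LoRA adapter size in MB for square projection matrices.

    Each adapted d x d matrix gains two low-rank factors, A (rank x d) and
    B (d x rank), which adds 2 * rank * d parameters.
    """
    params = 2 * rank * hidden_size * matrices_per_layer * num_layers
    return params * bytes_per_param / 1e6


# Hypothetical model dimensions, chosen only for illustration.
for rank in (16, 64):
    size = lora_adapter_mb(hidden_size=4096, num_layers=32,
                           matrices_per_layer=4, rank=rank)
    print(f"rank {rank}: ~{size:.0f} MB")
```

With these assumed dimensions, rank 16 gives about 34 MB and rank 64 about 134 MB, which is the same order of magnitude as the ~100MB figure.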

Export to GGUF

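Merge the LoRA adapter into the base model, then convert the result to GGUF so Ollama can load it. The command assumes llama.cpp's conversion script; ./merged-model is a placeholder for your merged model directory.

python convert_hf_to_gguf.py ./merged-model --outfile model.gguf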

Create Ollama model

Create a Modelfile and import it into Ollama. CodexSigma detects the new model automatically.

ollama create my-custom-model -f Modelfile
export AI_MODEL=my-custom-model
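A Modelfile for a local GGUF file can be very short. The sketch below generates one using Ollama's `FROM`, `PARAMETER`, and `SYSTEM` instructions; the file path, temperature, and system prompt are placeholder assumptions to adapt to your own model.

```python
def build_modelfile(gguf_path, system_prompt, temperature=0.2):
    """Return Modelfile text that points Ollama at a local GGUF file."""
    return (
        f"FROM {gguf_path}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """{system_prompt}"""\n'
    )


# Placeholder values for illustration only.
modelfile_text = build_modelfile(
    "./model.gguf",
    "You are a coding assistant that follows our internal style guide.",
)
print(modelfile_text)
```

Saving the printed text as `Modelfile` next to `model.gguf` gives `ollama create` something to import.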

Quick Start

Use Your Model in CodexSigma

Once your model is in Ollama, CodexSigma finds it automatically.

# 1. List available models
ollama list

# 2. Set your model
export AI_MODEL=my-custom-model

# 3. Start CodexSigma
JWT_SECRET=$(node -e "console.log(require('crypto').randomBytes(48).toString('hex'))") \
AI_MODEL=my-custom-model \
npx tsx apps/ide/packages/server/src/index.ts

# ✓ Dr. Q will use your custom model
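If the model does not seem to be picked up, it can help to confirm that Ollama actually lists it. The sketch below queries Ollama's `/api/tags` endpoint on the default port 11434; the helper names are hypothetical and this is not how CodexSigma itself performs detection.

```python
import json
import urllib.request


def parse_model_names(payload):
    """Extract model tags from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def is_installed(wanted, installed):
    """Match 'my-model' against 'my-model:latest' as well as exact tags."""
    candidates = {wanted} if ":" in wanted else {wanted, f"{wanted}:latest"}
    return any(name in candidates for name in installed)


def fetch_model_names(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))
```

With Ollama running, `is_installed("my-custom-model", fetch_model_names())` should return `True` once `ollama create` has succeeded.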

Multi-Provider Fallback

Never Get Stuck

Configure multiple models. CodexSigma falls through automatically if one fails.

🦙

Ollama (Local)

Your custom model is tried first. It runs on your machine, so there is no network latency and no connection required.

🤖

Groq (Cloud, Free)

Free-tier 70B model via API. Used as a fallback if the local model fails or is overloaded.

☁️

OpenAI/Anthropic

Final fallback. Only used when local and free options are exhausted.
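The fall-through order described above (local first, then a free cloud tier, then a paid provider) can be sketched as a simple loop over providers. This is a minimal illustration with stand-in functions, not CodexSigma's actual implementation; every name here is hypothetical.

```python
class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration; real ones would call each backend.
def local_ollama(prompt):
    raise ConnectionError("local model unavailable")


def groq_cloud(prompt):
    return f"[groq] {prompt}"


provider_name, answer = complete_with_fallback(
    "Explain this function",
    [("ollama", local_ollama), ("groq", groq_cloud)],
)
print(provider_name, answer)
```

Keeping the provider list ordered makes the priority explicit: a cloud provider is only reached after every earlier entry has raised an error.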

Bring Your Own Local LLM