
Overview

Local Models support allows you to run AI models directly on your machine, ensuring complete privacy and security of your code while providing fast, offline AI assistance.
Coming Soon: Local Models support is currently in development and will be available to beta users first.

Why Use Local Models?

Privacy and Security

  • No Data Transmission: Your code never leaves your machine
  • No Internet Required: Work completely offline
  • No Third-Party Access: No external services can access your code
  • Corporate Compliance: Meet strict security requirements
  • Full Ownership: You control all your data
  • No Logging: No usage logs or analytics
  • Custom Policies: Set your own data retention rules
  • Audit Trail: Complete control over data access

Performance Benefits

  • No Network Latency: Instant responses
  • No Rate Limits: Use as much as you need
  • Consistent Performance: No network-related slowdowns
  • Predictable Costs: No per-request charges
  • Always Available: No internet connection required
  • No Service Outages: Independent of external services
  • Consistent Quality: Same model performance every time
  • Custom Tuning: Optimize for your specific use cases

Supported Platforms

Ollama Integration

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull codellama

# Start the Ollama server
ollama serve

Supported models include:

  • Code Llama: Specialized for code generation
  • Llama 2: General-purpose language model
  • Mistral: Fast and efficient coding assistant
  • Custom Models: Your own fine-tuned models
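A quick way to confirm the server and model are working is to send a generation request to Ollama's HTTP API, which listens on port 11434 by default:

# Send a one-off, non-streaming generation request to the local server
curl http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a Swift function that reverses a string.",
  "stream": false
}'

A JSON reply containing a "response" field means the model is serving requests.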

LM Studio Integration

  • Model Management: Easy model switching
  • Performance Tuning: Optimize for your hardware
  • Custom Prompts: Fine-tune model behavior
  • Resource Monitoring: Track GPU/CPU usage

Supported model formats:

  • GGML: Optimized for CPU inference
  • GGUF: Next-generation format, successor to GGML
  • ONNX: Cross-platform compatibility
  • Custom Formats: Your preferred model format
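LM Studio's built-in local server speaks an OpenAI-compatible API (default port 1234), so you can sanity-check a loaded model with a plain HTTP request. The "model" value below is a placeholder; depending on your LM Studio version it may be ignored in favor of whichever model is currently loaded:

# Query LM Studio's OpenAI-compatible endpoint (default port 1234)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Explain Swift optionals in one paragraph."}]
  }'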

Hardware Requirements

Minimum Requirements (CPU-only)

  • RAM: 16GB minimum (32GB recommended)
  • CPU: 8-core processor (Intel i7/AMD Ryzen 7 or better)
  • Storage: 50GB available space
  • OS: macOS 12.0 or later

Recommended (GPU-accelerated)

  • GPU: Apple Silicon M1/M2/M3 or NVIDIA GPU with 8GB+ VRAM
  • RAM: 32GB minimum (64GB recommended)
  • Storage: 100GB available space
  • Performance: Roughly 10x faster than CPU-only inference
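Before downloading multi-gigabyte models, you can check how your Mac measures up against these requirements with a few standard commands:

# CPU model (reports the Apple Silicon chip on M-series Macs)
sysctl -n machdep.cpu.brand_string

# Installed RAM in GB
echo "$(($(sysctl -n hw.memsize) / 1073741824)) GB RAM"

# Free disk space on the boot volume
df -h /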

Basic Setup

  • Apple Silicon M1/M2
  • 16GB RAM
  • 256GB SSD
  • Good for small to medium projects

Professional Setup

  • Apple Silicon M2 Pro/Max
  • 32GB RAM
  • 512GB SSD
  • Ideal for large codebases

Enterprise Setup

  • Apple Silicon M3 Max
  • 64GB+ RAM
  • 1TB+ SSD
  • Maximum performance and capability

Custom Setup

  • Custom hardware configuration
  • Optimized for specific use cases
  • Maximum flexibility and control

Model Selection Guide

Code-Specific Models

Code Llama
Best for: General code generation and completion
  • Size: 7B, 13B, or 34B parameters
  • Specialization: Code understanding and generation
  • Performance: Excellent for Swift and iOS development
  • Resource Usage: Moderate to high

Mistral
Best for: Fast responses and efficient resource usage
  • Size: 7B parameters
  • Specialization: Balanced performance and efficiency
  • Performance: Good for most coding tasks
  • Resource Usage: Low to moderate

Custom Models
Best for: Specialized use cases and company-specific needs
  • Size: Variable
  • Specialization: Trained on your specific codebase
  • Performance: Optimized for your patterns
  • Resource Usage: Depends on model size
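A useful rule of thumb when choosing a size: a 4-bit quantized model needs roughly half a byte of memory per parameter, plus overhead for the context window. For example:

# Approximate RAM/VRAM needed at 4-bit quantization (rule of thumb):
#   7B ≈ 3.5 GB    13B ≈ 6.5 GB    34B ≈ 17 GB
# Pull the largest size that comfortably fits your hardware, e.g.:
ollama pull codellama:13b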

Performance vs. Resource Trade-offs

  • Speed Priority: Favor smaller models with GPU acceleration for the fastest responses
  • Quality Priority: Favor larger models (13B+) for more accurate, context-aware suggestions
  • Resource Constrained: Favor small quantized models that fit comfortably in available memory

Recommended starting point: Code Llama 7B with GPU acceleration
  • Fast responses (1-3 seconds)
  • Good code quality
  • Moderate resource usage
  • Suitable for most developers
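Before settling on a model, it is worth timing a representative request on your own hardware; ollama run accepts a one-shot prompt, which makes a rough benchmark trivial:

# Time a single completion to gauge real-world latency on this machine
time ollama run codellama:7b "Write a Swift struct representing a 2D point."

Note that the first run includes model-load time; repeat the command to measure steady-state latency.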

Setup and Configuration

Initial Setup

Step 1: Install Ollama

# macOS installation
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

Step 2: Download Models

# Download Code Llama (recommended)
ollama pull codellama:7b

# Or download Mistral for efficiency
ollama pull mistral:7b
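
To confirm the downloads completed and see how much disk space each model occupies:

# List installed models with their sizes and modification dates
ollama list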

Step 3: Configure Parsaa

  1. Open Parsaa settings in Xcode
  2. Go to “Local Models” section
  3. Select your preferred model
  4. Configure performance settings

Step 4: Test Setup

// Test with simple code
func hello() {
    print("Hello, Parsaa!")
}
If you get AI suggestions for this code, your setup is working!

Advanced Configuration

Model parameters (example values):

{
  "model": "codellama:7b",
  "gpu_layers": 32,
  "context_length": 4096,
  "batch_size": 512,
  "threads": 8
}

Memory options:

{
  "low_vram": true,
  "mmap": true,
  "mlock": false,
  "f16_kv": true
}

Prompt templates:

{
  "system_prompt": "You are a Swift expert specializing in iOS development...",
  "user_prompt": "Help me with this Swift code:",
  "assistant_prompt": "Here's how I can help you:"
}

Hybrid Mode

Cloud + Local Processing

  • Simple Tasks: Process locally for speed and privacy
  • Complex Tasks: Use cloud for better quality
  • Fallback: Cloud processing when local model is unavailable
  • User Choice: Override automatic routing

Example hybrid configuration:

{
  "hybrid_mode": true,
  "local_threshold": 0.7,
  "cloud_models": ["gpt-4", "claude-3"],
  "local_models": ["codellama:7b"],
  "fallback_to_cloud": true
}
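Parsaa's routing logic is internal, but the fallback_to_cloud behavior conceptually reduces to an availability check against the local server. A minimal sketch of that idea (Ollama's /api/tags endpoint lists installed models and doubles as a health check):

# Sketch: decide routing based on whether the local server responds quickly
if curl -sf --max-time 2 http://localhost:11434/api/tags > /dev/null; then
  echo "Local server reachable: route request locally"
else
  echo "Local server unavailable: fall back to cloud models"
fi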

Privacy Levels

Maximum Privacy

  • All processing local
  • No data transmission
  • Complete offline operation
  • Corporate compliance

Balanced Approach

  • Sensitive code local
  • General queries cloud
  • User-controlled routing
  • Best of both worlds

Performance Priority

  • Cloud for complex tasks
  • Local for simple tasks
  • Automatic optimization
  • Maximum speed

Custom Rules

  • Define your own rules
  • Project-specific settings
  • Team-wide policies
  • Flexible configuration

Troubleshooting

Common Issues

Model fails to load
Solutions:
  1. Check available disk space
  2. Verify the model download completed
  3. Restart the Ollama service
  4. Check system resources

Slow responses
Solutions:
  1. Reduce model size
  2. Enable GPU acceleration
  3. Increase system RAM
  4. Optimize model parameters

High memory usage
Solutions:
  1. Reduce context length
  2. Enable memory mapping
  3. Close other applications
  4. Use smaller models

Parsaa can't connect to the model
Solutions:
  1. Check that Ollama is running
  2. Verify the port configuration
  3. Restart the Parsaa extension
  4. Check firewall settings
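For the connection issues above, a few commands narrow down whether the problem is the Ollama process, the port, or the network layer:

# Is the Ollama process running?
pgrep -fl ollama

# Does the API respond on the default port?
curl -s http://localhost:11434/api/version

# Is anything actually listening on port 11434?
lsof -iTCP:11434 -sTCP:LISTEN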

Performance Optimization

Hardware:

  • GPU Acceleration: Use Apple Silicon or an NVIDIA GPU
  • Memory: Increase RAM for larger models
  • Storage: Use an SSD for faster model loading
  • Cooling: Ensure adequate cooling for sustained performance

Software:

  • Model Quantization: Use quantized models for efficiency
  • Context Length: Optimize the context window size
  • Batch Processing: Process multiple requests together
  • Caching: Cache frequent responses
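For context-length tuning, Ollama lets you derive a variant of a model with a smaller context window via a Modelfile; the file and model names below are arbitrary examples:

# Inspect the defaults baked into the model
ollama show codellama:7b --modelfile

# Create a lower-memory variant with a 2048-token context window
cat > Modelfile.small <<'EOF'
FROM codellama:7b
PARAMETER num_ctx 2048
EOF
ollama create codellama-small -f Modelfile.small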

Best Practices

Model Management

  • Model Selection: Start with smaller models, test performance on your hardware, and upgrade to larger models as needed; keep multiple models for different tasks
  • Resource Monitoring: Track GPU/CPU and memory usage while models run
  • Backup and Recovery: Keep copies of custom models and configurations

Security Considerations

Data Isolation

  • Keep models in secure directories
  • Use encrypted storage if needed
  • Restrict access permissions
  • Regular security audits
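On a default install, Ollama keeps its models under ~/.ollama; tightening that directory's permissions to your user alone is a simple first step:

# Remove group/other access from the model directory...
chmod -R go-rwx ~/.ollama
# ...and verify the result
ls -ld ~/.ollama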

Network Security

  • Disable unnecessary network access
  • Use VPN for model downloads
  • Monitor network traffic
  • Keep software updated
Beta Feature: Local Models support is currently in development. Join our waitlist to get early access and help shape this privacy-focused feature.