
Overview

Local Models support allows you to run AI models directly on your machine, ensuring complete privacy and security of your code while providing fast, offline AI assistance.
Coming Soon: Local Models support is currently in development and will be available to beta users first.

Why Use Local Models?

Privacy and Security

  • No Data Transmission: Your code never leaves your machine
  • No Internet Required: Work completely offline
  • No Third-Party Access: No external services can access your code
  • Corporate Compliance: Meet strict security requirements
  • Full Ownership: You control all your data
  • No Logging: No usage logs or analytics
  • Custom Policies: Set your own data retention rules
  • Audit Trail: Complete control over data access

Performance Benefits

  • No Network Latency: Instant responses
  • No Rate Limits: Use as much as you need
  • Consistent Performance: No network-related slowdowns
  • Predictable Costs: No per-request charges
  • Always Available: No internet connection required
  • No Service Outages: Independent of external services
  • Consistent Quality: Same model performance every time
  • Custom Tuning: Optimize for your specific use cases

Supported Platforms

Ollama Integration

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull codellama

# Start the Ollama server
ollama serve

Supported models include:

  • Code Llama: Specialized for code generation
  • Llama 2: General-purpose language model
  • Mistral: Fast and efficient coding assistant
  • Custom Models: Your own fine-tuned models
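A quick way to confirm the server and model are working is to send a generation request to Ollama's HTTP API, which listens on port 11434 by default:

# Send a one-off, non-streaming generation request to the local server
curl http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a Swift function that reverses a string.",
  "stream": false
}'

A JSON reply containing a "response" field means the model is serving requests.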

LM Studio Integration

  • Model Management: Easy model switching
  • Performance Tuning: Optimize for your hardware
  • Custom Prompts: Fine-tune model behavior
  • Resource Monitoring: Track GPU/CPU usage

Supported model formats:

  • GGML: Optimized for CPU inference
  • GGUF: Next-generation format, successor to GGML
  • ONNX: Cross-platform compatibility
  • Custom Formats: Your preferred model format
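LM Studio's built-in local server speaks an OpenAI-compatible API (default port 1234), so you can sanity-check a loaded model with a plain HTTP request. The "model" value below is a placeholder; depending on your LM Studio version it may be ignored in favor of whichever model is currently loaded:

# Query LM Studio's OpenAI-compatible endpoint (default port 1234)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Explain Swift optionals in one paragraph."}]
  }'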

Hardware Requirements

Minimum Requirements (CPU-only)

  • RAM: 16GB minimum (32GB recommended)
  • CPU: 8-core processor (Intel i7/AMD Ryzen 7 or better)
  • Storage: 50GB available space
  • OS: macOS 12.0 or later

Recommended (GPU-accelerated)

  • GPU: Apple Silicon M1/M2/M3 or NVIDIA GPU with 8GB+ VRAM
  • RAM: 32GB minimum (64GB recommended)
  • Storage: 100GB available space
  • Performance: Roughly 10x faster than CPU-only inference
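Before downloading multi-gigabyte models, you can check how your Mac measures up against these requirements with a few standard commands:

# CPU model (reports the Apple Silicon chip on M-series Macs)
sysctl -n machdep.cpu.brand_string

# Installed RAM in GB
echo "$(($(sysctl -n hw.memsize) / 1073741824)) GB RAM"

# Free disk space on the boot volume
df -h /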

Basic Setup

  • Apple Silicon M1/M2
  • 16GB RAM
  • 256GB SSD
  • Good for small to medium projects

Professional Setup

  • Apple Silicon M2 Pro/Max
  • 32GB RAM
  • 512GB SSD
  • Ideal for large codebases

Enterprise Setup

  • Apple Silicon M3 Max
  • 64GB+ RAM
  • 1TB+ SSD
  • Maximum performance and capability

Custom Setup

  • Custom hardware configuration
  • Optimized for specific use cases
  • Maximum flexibility and control

Model Selection Guide

Code-Specific Models

Code Llama
Best for: General code generation and completion
  • Size: 7B, 13B, or 34B parameters
  • Specialization: Code understanding and generation
  • Performance: Excellent for Swift and iOS development
  • Resource Usage: Moderate to high

Mistral
Best for: Fast responses and efficient resource usage
  • Size: 7B parameters
  • Specialization: Balanced performance and efficiency
  • Performance: Good for most coding tasks
  • Resource Usage: Low to moderate

Custom Models
Best for: Specialized use cases and company-specific needs
  • Size: Variable
  • Specialization: Trained on your specific codebase
  • Performance: Optimized for your patterns
  • Resource Usage: Depends on model size
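A useful rule of thumb when choosing a size: a 4-bit quantized model needs roughly half a byte of memory per parameter, plus overhead for the context window. For example:

# Approximate RAM/VRAM needed at 4-bit quantization (rule of thumb):
#   7B ≈ 3.5 GB    13B ≈ 6.5 GB    34B ≈ 17 GB
# Pull the largest size that comfortably fits your hardware, e.g.:
ollama pull codellama:13b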

Performance vs. Resource Trade-offs

  • Speed Priority: Favor smaller models with GPU acceleration for the fastest responses
  • Quality Priority: Favor larger models (13B+) for more accurate, context-aware suggestions
  • Resource Constrained: Favor small quantized models that fit comfortably in available memory

Recommended starting point: Code Llama 7B with GPU acceleration
  • Fast responses (1-3 seconds)
  • Good code quality
  • Moderate resource usage
  • Suitable for most developers
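Before settling on a model, it is worth timing a representative request on your own hardware; ollama run accepts a one-shot prompt, which makes a rough benchmark trivial:

# Time a single completion to gauge real-world latency on this machine
time ollama run codellama:7b "Write a Swift struct representing a 2D point."

Note that the first run includes model-load time; repeat the command to measure steady-state latency.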

Setup and Configuration

Initial Setup

Step 1: Install Ollama

# macOS installation
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

Step 2: Download Models

# Download Code Llama (recommended)
ollama pull codellama:7b

# Or download Mistral for efficiency
ollama pull mistral:7b
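
To confirm the downloads completed and see how much disk space each model occupies:

# List installed models with their sizes and modification dates
ollama list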

Step 3: Configure Parsaa

  1. Open Parsaa settings in Xcode
  2. Go to “Local Models” section
  3. Select your preferred model
  4. Configure performance settings

Step 4: Test Setup

// Test with simple code
func hello() {
    print("Hello, Parsaa!")
}
If you get AI suggestions for this code, your setup is working!

Advanced Configuration

Model parameters (example values):

{
  "model": "codellama:7b",
  "gpu_layers": 32,
  "context_length": 4096,
  "batch_size": 512,
  "threads": 8
}

Memory options:

{
  "low_vram": true,
  "mmap": true,
  "mlock": false,
  "f16_kv": true
}

Prompt templates:

{
  "system_prompt": "You are a Swift expert specializing in iOS development...",
  "user_prompt": "Help me with this Swift code:",
  "assistant_prompt": "Here's how I can help you:"
}

Hybrid Mode

Cloud + Local Processing

  • Simple Tasks: Process locally for speed and privacy
  • Complex Tasks: Use cloud for better quality
  • Fallback: Cloud processing when local model is unavailable
  • User Choice: Override automatic routing

Example hybrid configuration:

{
  "hybrid_mode": true,
  "local_threshold": 0.7,
  "cloud_models": ["gpt-4", "claude-3"],
  "local_models": ["codellama:7b"],
  "fallback_to_cloud": true
}
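Parsaa's routing logic is internal, but the fallback_to_cloud behavior conceptually reduces to an availability check against the local server. A minimal sketch of that idea (Ollama's /api/tags endpoint lists installed models and doubles as a health check):

# Sketch: decide routing based on whether the local server responds quickly
if curl -sf --max-time 2 http://localhost:11434/api/tags > /dev/null; then
  echo "Local server reachable: route request locally"
else
  echo "Local server unavailable: fall back to cloud models"
fi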

Privacy Levels

Maximum Privacy

  • All processing local
  • No data transmission
  • Complete offline operation
  • Corporate compliance

Balanced Approach

  • Sensitive code local
  • General queries cloud
  • User-controlled routing
  • Best of both worlds

Performance Priority

  • Cloud for complex tasks
  • Local for simple tasks
  • Automatic optimization
  • Maximum speed

Custom Rules

  • Define your own rules
  • Project-specific settings
  • Team-wide policies
  • Flexible configuration

Troubleshooting

Common Issues

Model fails to load
Solutions:
  1. Check available disk space
  2. Verify the model download completed
  3. Restart the Ollama service
  4. Check system resources

Slow responses
Solutions:
  1. Reduce model size
  2. Enable GPU acceleration
  3. Increase system RAM
  4. Optimize model parameters

High memory usage
Solutions:
  1. Reduce context length
  2. Enable memory mapping
  3. Close other applications
  4. Use smaller models

Parsaa can't connect to the model
Solutions:
  1. Check that Ollama is running
  2. Verify the port configuration
  3. Restart the Parsaa extension
  4. Check firewall settings
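For the connection issues above, a few commands narrow down whether the problem is the Ollama process, the port, or the network layer:

# Is the Ollama process running?
pgrep -fl ollama

# Does the API respond on the default port?
curl -s http://localhost:11434/api/version

# Is anything actually listening on port 11434?
lsof -iTCP:11434 -sTCP:LISTEN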

Performance Optimization

Hardware:

  • GPU Acceleration: Use Apple Silicon or an NVIDIA GPU
  • Memory: Increase RAM for larger models
  • Storage: Use an SSD for faster model loading
  • Cooling: Ensure adequate cooling for sustained performance

Software:

  • Model Quantization: Use quantized models for efficiency
  • Context Length: Optimize the context window size
  • Batch Processing: Process multiple requests together
  • Caching: Cache frequent responses
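For context-length tuning, Ollama lets you derive a variant of a model with a smaller context window via a Modelfile; the file and model names below are arbitrary examples:

# Inspect the defaults baked into the model
ollama show codellama:7b --modelfile

# Create a lower-memory variant with a 2048-token context window
cat > Modelfile.small <<'EOF'
FROM codellama:7b
PARAMETER num_ctx 2048
EOF
ollama create codellama-small -f Modelfile.small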

Best Practices

Model Management

  • Model Selection: Start with smaller models, test performance on your hardware, and upgrade to larger models as needed; keep multiple models for different tasks
  • Resource Monitoring: Track GPU/CPU and memory usage while models run
  • Backup and Recovery: Keep copies of custom models and configurations

Security Considerations

Data Isolation

  • Keep models in secure directories
  • Use encrypted storage if needed
  • Restrict access permissions
  • Regular security audits
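On a default install, Ollama keeps its models under ~/.ollama; tightening that directory's permissions to your user alone is a simple first step:

# Remove group/other access from the model directory...
chmod -R go-rwx ~/.ollama
# ...and verify the result
ls -ld ~/.ollama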

Network Security

  • Disable unnecessary network access
  • Use VPN for model downloads
  • Monitor network traffic
  • Keep software updated
Beta Feature: Local Models support is currently in development. Join our waitlist to get early access and help shape this privacy-focused feature.