Overview
Local Models support allows you to run AI models directly on your machine, ensuring complete privacy and security of your code while providing fast, offline AI assistance.
Coming Soon: Local Models support is currently in development and will be available to beta users first.
Why Use Local Models?
Privacy and Security
Complete Privacy
- No Data Transmission: Your code never leaves your machine
- No Internet Required: Work completely offline
- No Third-Party Access: No external services can access your code
- Corporate Compliance: Meet strict security requirements
Data Control
- Full Ownership: You control all your data
- No Logging: No usage logs or analytics
- Custom Policies: Set your own data retention rules
- Audit Trail: Complete control over data access
Performance Benefits
Speed Advantages
- No Network Latency: Instant responses
- No Rate Limits: Use as much as you need
- Consistent Performance: No network-related slowdowns
- Predictable Costs: No per-request charges
Reliability
- Always Available: No internet connection required
- No Service Outages: Independent of external services
- Consistent Quality: Same model performance every time
- Custom Tuning: Optimize for your specific use cases
Supported Platforms
Ollama Integration
Easy Setup
Install Ollama, pull a supported model (for example with `ollama pull codellama`), and select it in Parsaa's Local Models settings. See Setup and Configuration below for step-by-step instructions.
Supported Models
- Code Llama: Specialized for code generation
- Llama 2: General-purpose language model
- Mistral: Fast and efficient coding assistant
- Custom Models: Your own fine-tuned models
LM Studio Integration
Advanced Configuration
- Model Management: Easy model switching
- Performance Tuning: Optimize for your hardware
- Custom Prompts: Fine-tune model behavior
- Resource Monitoring: Track GPU/CPU usage
Supported Formats
- GGML: Optimized for CPU inference
- GGUF: Successor to GGML with improved metadata support
- ONNX: Cross-platform compatibility
- Custom Formats: Your preferred model format
Hardware Requirements
Minimum Requirements
CPU-Only Setup
- RAM: 16GB minimum (32GB recommended)
- CPU: 8-core processor (Intel i7/AMD Ryzen 7 or better)
- Storage: 50GB available space
- OS: macOS 12.0 or later
GPU-Accelerated Setup
- GPU: Apple Silicon M1/M2/M3 or NVIDIA GPU with 8GB+ VRAM
- RAM: 32GB minimum (64GB recommended)
- Storage: 100GB available space
- Performance: Up to roughly 10x faster than CPU-only inference
Recommended Configurations
Basic Setup
- Apple Silicon M1/M2
- 16GB RAM
- 256GB SSD
- Good for small to medium projects
Professional Setup
- Apple Silicon M2 Pro/Max
- 32GB RAM
- 512GB SSD
- Ideal for large codebases
Enterprise Setup
- Apple Silicon M3 Max
- 64GB+ RAM
- 1TB+ SSD
- Maximum performance and capability
Custom Setup
- Custom hardware configuration
- Optimized for specific use cases
- Maximum flexibility and control
Model Selection Guide
Code-Specific Models
Code Llama
Best for: General code generation and completion
- Size: 7B, 13B, 34B parameters
- Specialization: Code understanding and generation
- Performance: Excellent for Swift and iOS development
- Resource Usage: Moderate to high
Mistral 7B
Best for: Fast responses and efficient resource usage
- Size: 7B parameters
- Specialization: Balanced performance and efficiency
- Performance: Good for most coding tasks
- Resource Usage: Low to moderate
Custom Fine-tuned Models
Best for: Specialized use cases and company-specific needs
- Size: Variable
- Specialization: Trained on your specific codebase
- Performance: Optimized for your patterns
- Resource Usage: Depends on model size
Performance vs. Resource Trade-offs
Balance speed, quality, and resource usage against your hardware: Speed Priority, Quality Priority, or Resource Constrained.
Recommended starting point: Code Llama 7B with GPU acceleration
- Fast responses (1-3 seconds)
- Good code quality
- Moderate resource usage
- Suitable for most developers
Setup and Configuration
Initial Setup
1. Install Ollama
Download and install Ollama from the official website (or via Homebrew), then make sure the Ollama service is running.
2. Download Models
Pull one or more supported models from the command line, for example `ollama pull codellama` or `ollama pull mistral`.
3. Configure Parsaa
- Open Parsaa settings in Xcode
- Go to “Local Models” section
- Select your preferred model
- Configure performance settings
4. Test Setup
Open a Swift file and start typing. If you get AI suggestions for code like the snippet below, your setup is working!
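Any ordinary Swift code will do for this check; the snippet below is only an example and has no special meaning to Parsaa.

```swift
// Example snippet to type in Xcode.
// Start the function body and pause; Parsaa should offer an inline suggestion.
struct Greeter {
    let name: String

    func greeting() -> String {
        // A local model will typically suggest a completion along these lines.
        return "Hello, \(name)!"
    }
}
```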
Advanced Configuration
Performance Tuning
Adjust generation parameters, context window size, and GPU usage to match your hardware; see Performance Optimization below for the main options.
Memory Optimization
If you hit memory pressure, reduce the context length, enable memory mapping, or switch to a smaller or quantized model (see Troubleshooting: Memory Issues).
Custom Prompts
Supply your own prompt templates to fine-tune how the model responds to your code; an illustrative example follows.
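The example below is purely illustrative; the exact configuration format depends on Parsaa's settings UI, and this is not its actual API.

```swift
// Hypothetical example of a custom system prompt; the real configuration
// mechanism is whatever Parsaa's Local Models settings expose.
let customSystemPrompt = """
You are a Swift and iOS coding assistant.
Prefer modern Swift concurrency (async/await) over completion handlers.
Follow the project's existing naming conventions and keep answers concise.
"""
```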
Hybrid Mode
Cloud + Local Processing
Smart Routing
Parsaa decides where each request runs based on the task and your settings (a conceptual sketch follows this list):
- Simple Tasks: Process locally for speed and privacy
- Complex Tasks: Use cloud for better quality
- Fallback: Cloud processing when local model is unavailable
- User Choice: Override automatic routing
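The sketch below is conceptual only; the types and names are hypothetical and do not represent Parsaa's actual implementation.

```swift
// Conceptual sketch of smart routing; all names here are hypothetical.
enum Backend { case local, cloud }

struct RoutingPolicy {
    var userOverride: Backend? = nil      // "User Choice": an explicit override wins

    func route(taskIsComplex: Bool,
               codeIsSensitive: Bool,
               localModelAvailable: Bool) -> Backend {
        if let choice = userOverride { return choice }
        guard localModelAvailable else { return .cloud }  // fallback when the local model is unavailable
        if codeIsSensitive { return .local }               // keep sensitive code on-device
        return taskIsComplex ? .cloud : .local             // complex -> cloud, simple -> local
    }
}
```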
Configuration
Hybrid Mode is configured from the same Local Models section of Parsaa's settings; choose one of the privacy levels below or define your own routing rules.
Privacy Levels
Maximum Privacy
- All processing local
- No data transmission
- Complete offline operation
- Corporate compliance
Balanced Approach
- Sensitive code local
- General queries cloud
- User-controlled routing
- Best of both worlds
Performance Priority
- Cloud for complex tasks
- Local for simple tasks
- Automatic optimization
- Maximum speed
Custom Rules
- Define your own rules
- Project-specific settings
- Team-wide policies
- Flexible configuration
Troubleshooting
Common Issues
Model Not Loading
Solutions:
- Check available disk space
- Verify model download completed
- Restart Ollama service
- Check system resources
Slow Performance
Solutions:
- Reduce model size
- Enable GPU acceleration
- Increase system RAM
- Optimize model parameters
Memory Issues
Solutions:
- Reduce context length
- Enable memory mapping
- Close other applications
- Use smaller models
Connection Problems
Solutions:
- Check Ollama is running
- Verify port configuration
- Restart Parsaa extension
- Check firewall settings
Performance Optimization
Hardware Optimization
- GPU Acceleration: Use Apple Silicon or NVIDIA GPU
- Memory: Increase RAM for larger models
- Storage: Use SSD for faster model loading
- Cooling: Ensure adequate cooling for sustained performance
Software Optimization
- Model Quantization: Use quantized models for efficiency
- Context Length: Optimize context window size
- Batch Processing: Process multiple requests together
- Caching: Cache frequent responses (a minimal sketch follows this list)
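To make the caching idea concrete, here is a minimal in-memory sketch; `requestCompletion` is a hypothetical stand-in for whatever actually produces a model response, not a real Parsaa or Ollama API.

```swift
// Illustrative only: a tiny in-memory cache for repeated prompts.
final class CompletionCache {
    private var store: [String: String] = [:]

    func completion(for prompt: String,
                    requestCompletion: (String) -> String) -> String {
        if let cached = store[prompt] {
            return cached                      // reuse the previous response for an identical prompt
        }
        let fresh = requestCompletion(prompt)  // otherwise ask the model
        store[prompt] = fresh
        return fresh
    }
}
```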
Best Practices
Model Management
Key areas to manage: Model Selection, Resource Monitoring, and Backup and Recovery.
When selecting models:
- Start with smaller models
- Test performance on your hardware
- Upgrade to larger models as needed
- Keep multiple models for different tasks
Security Considerations
Data Isolation
- Keep models in secure directories
- Use encrypted storage if needed
- Restrict access permissions
- Regular security audits
Network Security
- Disable unnecessary network access
- Use VPN for model downloads
- Monitor network traffic
- Keep software updated
Beta Feature: Local Models support is currently in development. Join our waitlist to get early access and help shape this privacy-focused feature.
