OllamaMax now uses a unified model-pulling system built around the `ollama_pull_and_run.sh` script, which handles:
- Automatic Ollama installation (if needed)
- Hardware detection (GPU/CPU)
- Optimal quantization selection
- Model downloading and preparation
`./ollama_pull_and_run.sh llama3.1:8b`
- Called automatically when a user selects a model from the dropdown
- Accepts model name as parameter
- Returns success/failure status
- Handles quantization automatically based on hardware
`./ollama_pull_and_run.sh`
- Shows menu with all 15 models (A-O options)
- User selects model interactively
- Good for manual testing or setup
**Frontend (index.html/script.js)**
- User selects model from dropdown
- Sends model name to backend via WebSocket/API
**Backend (main.go)**
- Receives model selection
- Checks if model is installed (`checkModelInstalled()`)
- If not installed, calls `pullOllamaModel()`
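The installed-check can be sketched in shell (a hypothetical analogue of `checkModelInstalled()`; the `model_installed` helper is illustrative and reads `ollama list` output from stdin so the sketch runs without Ollama):

```bash
# Hypothetical analogue of checkModelInstalled(): `ollama list` prints a header
# row followed by one model per line, with the model name in column 1.
model_installed() {
  awk 'NR > 1 { print $1 }' | grep -Fqx -- "$1"
}

# Example: feed a captured `ollama list` listing.
printf 'NAME            ID      SIZE\nllama3.1:8b     abc123  4.7GB\n' \
  | model_installed "llama3.1:8b" && echo "installed"
```

In the real backend the listing would come from running `ollama list` directly rather than from stdin.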
**Model Pull (`pullOllamaModel()`)**
- Executes: `bash ./ollama_pull_and_run.sh <model-name>`
- Script handles:
  - Ollama installation check
  - Hardware detection (Apple Silicon, NVIDIA GPU, CPU)
  - Quantization selection:
    - Apple Silicon: Native (FP16)
    - NVIDIA 6GB+: `:q5_k_m`
    - NVIDIA <6GB or CPU: `:q4_0`
- Falls back to direct `ollama pull` if the script fails
**Response**
- Success: Model ready for use
- Failure: Error message returned to user
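The pull-with-fallback flow above can be sketched in shell (the `pull_model` helper is hypothetical, not the backend's actual code; the pull command is passed as a parameter so the logic can be exercised without the real script):

```bash
# Hedged sketch of the flow: try the pull script, fall back to `ollama pull`.
# $1 is the pull command (./ollama_pull_and_run.sh in the real application).
pull_model() {
  cmd="$1"; model="$2"
  if "$cmd" "$model"; then
    echo "ready: $model"
  else
    echo "script failed, trying direct pull" >&2
    ollama pull "$model"
  fi
}

pull_model true "llama3.1:8b"   # `true` stands in for ./ollama_pull_and_run.sh
```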
The script automatically detects hardware and applies optimal quantization:
| Hardware | Detection | Quantization |
|---|---|---|
| Apple Silicon | `sysctl -n machdep.cpu.brand_string` | Native (no suffix) |
| NVIDIA GPU 6GB+ | `nvidia-smi` VRAM check | `:q5_k_m` |
| NVIDIA GPU <6GB | `nvidia-smi` VRAM check | `:q4_0` |
| CPU Only | No GPU detected | `:q4_0` |
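The selection logic in the table can be sketched as a small shell function (the `pick_quant` helper is hypothetical, not the script's actual code; the thresholds and suffixes follow the table):

```bash
# pick_quant CPU_BRAND VRAM_MB: print the quantization suffix per the table.
pick_quant() {
  brand="$1"; vram_mb="$2"
  case "$brand" in
    *"Apple M"*) echo ""; return ;;  # Apple Silicon: native FP16, no suffix
  esac
  if [ "$vram_mb" -ge 6144 ] 2>/dev/null; then
    echo ":q5_k_m"                   # NVIDIA GPU with 6GB+ VRAM
  else
    echo ":q4_0"                     # small GPU or CPU only
  fi
}

pick_quant "Apple M2" 0       # prints nothing: native
pick_quant "Intel" 8192       # prints :q5_k_m
pick_quant "Intel" 4096       # prints :q4_0
```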
These models are used as-is without quantization suffixes:
- `phi3:mini`
- `tinyllama:*`
- `moondream:*`
- `nomic-embed-text`
- `deepseek-r1`
- `glm-4.6`
- `deepseek-v3.1`
- `qwen3-vl`
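A hedged sketch of how such an exception list might be applied before appending a suffix (the `tag_for` helper and `NO_QUANT` variable are illustrative, not taken from the script):

```bash
# Models in NO_QUANT are pulled as-is; wildcard entries like tinyllama:*
# are represented here by matching on the name before the colon.
NO_QUANT="phi3:mini tinyllama moondream nomic-embed-text deepseek-r1 glm-4.6 deepseek-v3.1 qwen3-vl"

tag_for() {
  model="$1"; suffix="$2"
  base="${model%%:*}"                # strip the tag: "tinyllama:1.1b" -> "tinyllama"
  for m in $NO_QUANT; do
    case "$m" in
      "$model"|"$base") echo "$model"; return ;;   # exception: no suffix
    esac
  done
  echo "${model}${suffix}"
}

tag_for "llama3.1:8b" ":q4_0"    # prints llama3.1:8b:q4_0
tag_for "phi3:mini" ":q4_0"      # prints phi3:mini (exception list)
```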
**ollama_install_basic.sh**
- Original basic installer
- Kept for manual use
- Has its own menu system (A-O)

**ollam-linuxmac-optimized.sh**
- Linux/Mac optimized version
- For advanced users
- Manual use only
```bash
# Test with a specific model
./ollama_pull_and_run.sh llama3.1:8b

# Should output:
# [*] OllamaMax Model Pull & Run
# [*] Model requested: llama3.1:8b
# [*] Ollama already installed
# [!] Hardware detection message
# [*] Processing model: llama3.1:8b
# [*] Model ready!

# Run without parameters
./ollama_pull_and_run.sh
# Shows menu with A-O options
# User selects, model gets pulled
```

If the script won't run, make it executable:

```bash
chmod +x ollama_pull_and_run.sh
```

**If Ollama installation fails:**
- Check internet connection
- Ensure curl is installed
- Try manual install: `curl -fsSL https://ollama.com/install.sh | sh`
**If the wrong quantization is selected:**
- Script auto-detects hardware
- Check GPU detection: `nvidia-smi` (Linux) or `sysctl -n machdep.cpu.brand_string` (Mac)
- Override by editing the script's `detect_quant()` function
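To see which detection branch applies on a given machine, the checks from the list above can be combined into one snippet (a sketch using those same commands, not the script's actual `detect_quant()` body):

```bash
# Report which hardware branch the detection logic would take here.
if command -v nvidia-smi >/dev/null 2>&1; then
  # NVIDIA GPU present: show total VRAM, which decides q5_k_m vs q4_0.
  nvidia-smi --query-gpu=memory.total --format=csv,noheader
elif [ "$(uname)" = "Darwin" ]; then
  # macOS: the brand string reveals Apple Silicon ("Apple M...").
  sysctl -n machdep.cpu.brand_string
else
  echo "CPU only"
fi
```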
**If a model pull fails:**
- Check model name spelling
- Ensure model exists in Ollama registry
- Check disk space
- Try direct pull: `ollama pull <model-name>`
```
/home/linux/Projects/Bots/OllamaBots/OllamaMax/
├── ollama_pull_and_run.sh        # PRIMARY - Used by application
├── ollama_install_basic.sh       # Manual use only
├── ollam-linuxmac-optimized.sh   # Manual use only
├── main.go                       # Calls ollama_pull_and_run.sh
├── static/
│   ├── index.html                # Model dropdown UI
│   └── script.js                 # Sends model selection to backend
└── MODELS_ALIGNMENT.md           # Model list documentation
```