Prompt format with thinking enabled (the system prompt follows the <|think|> token):

<|think|>
You are a careful coding assistant. Explain your answer clearly.

The model's reply places internal reasoning in a thought channel before the final answer:

<|channel>thought
[internal reasoning]
<channel|>
[final answer]
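The turn and channel tokens can be strung together programmatically. A minimal shell sketch, assuming the token spellings exactly as they appear in the examples here (the system and user strings are placeholders, not required values):

```shell
# Assemble a single-turn prompt from the turn/channel tokens shown above.
# Token spellings are copied verbatim from the examples; verify them against
# the model's tokenizer config before relying on this.
system="You are a careful coding assistant. Explain your answer clearly."
user="What is the capital of France?"
prompt=$(printf '<bos><|turn>system\n<|think|>%s<turn|>\n<|turn>user\n%s<turn|>\n<|turn>model\n' "$system" "$user")
printf '%s\n' "$prompt"
```

The trailing <|turn>model line leaves the model turn open so generation continues from there.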
A full single-turn conversation with thinking:

<bos><|turn>system
<|think|><turn|>
<|turn>user
What is the capital of France?<turn|>
<|turn>model
<|channel>thought
The user is asking for the capital of France.
The capital of France is Paris.<channel|>The capital of France is Paris.<turn|>

A multi-turn conversation without thinking:

<bos><|turn>user
What is 1+1?<turn|>
<|turn>model
2<turn|>
<|turn>user
What is 1+1?<turn|>
<|turn>model
2<turn|>

Install Unsloth (Linux/macOS, then Windows PowerShell) and launch the studio server:

curl -fsSL https://unsloth.ai/install.sh | sh
irm https://unsloth.ai/install.ps1 | iex
unsloth studio -H 0.0.0.0 -p 8888

Build llama.cpp with CUDA support:

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
-DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
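The configure step above targets CUDA GPUs. On a machine without the CUDA toolkit, the same build should work with the GPU backend disabled — a sketch, not part of the original instructions:

```shell
# CPU-only configure: identical to the step above, but with the CUDA backend off.
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=OFF
```

The subsequent cmake --build step is unchanged.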
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

Run gemma-4-26B-A4B-it:

export LLAMA_CACHE="unsloth/gemma-4-26B-A4B-it-GGUF"
./llama.cpp/llama-cli \
-hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL \
--temp 1.0 \
--top-p 0.95 \
--top-k 64

Run gemma-4-31B-it:

export LLAMA_CACHE="unsloth/gemma-4-31B-it-GGUF"
./llama.cpp/llama-cli \
-hf unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL \
--temp 1.0 \
--top-p 0.95 \
--top-k 64

Run gemma-4-E4B-it:

export LLAMA_CACHE="unsloth/gemma-4-E4B-it-GGUF"
./llama.cpp/llama-cli \
-hf unsloth/gemma-4-E4B-it-GGUF:Q8_0 \
--temp 1.0 \
--top-p 0.95 \
--top-k 64

Run gemma-4-E2B-it:

export LLAMA_CACHE="unsloth/gemma-4-E2B-it-GGUF"
./llama.cpp/llama-cli \
-hf unsloth/gemma-4-E2B-it-GGUF:Q8_0 \
--temp 1.0 \
--top-p 0.95 \
--top-k 64

Download the quantized model and the mmproj projector file:

hf download unsloth/gemma-4-26B-A4B-it-GGUF \
--local-dir unsloth/gemma-4-26B-A4B-it-GGUF \
--include "*mmproj-BF16*" \
--include "*UD-Q4_K_XL*" # Use "*UD-Q2_K_XL*" for Dynamic 2-bit

Run llama-cli with the downloaded files:

./llama.cpp/llama-cli \
--model unsloth/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf \
--mmproj unsloth/gemma-4-26B-A4B-it-GGUF/mmproj-BF16.gguf \
--temp 1.0 \
--top-p 0.95 \
--top-k 64

Or launch llama-server:

./llama.cpp/llama-server \
--model unsloth/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf \
--mmproj unsloth/gemma-4-26B-A4B-it-GGUF/mmproj-BF16.gguf \
--temp 1.0 \
--top-p 0.95 \
--top-k 64 \
--alias "unsloth/gemma-4-26B-A4B-it-GGUF" \
--port 8001 \
--chat-template-kwargs '{"enable_thinking":true}'

Run on Apple Silicon with MLX:

curl -fsSL https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/scripts/install_gemma4_mlx.sh | sh
source ~/.unsloth/unsloth_gemma4_mlx/bin/activate
python -m mlx_vlm.chat --model unsloth/gemma-4-26b-a4b-it-UD-MLX-4bit

Example text prompt with thinking enabled:

System:
<|think|>
You are a precise reasoning assistant.
User:
A train leaves at 8:15 AM and arrives at 11:47 AM. How long was the journey?

Example vision prompt (single image):

[image first]
Extract all text from this receipt. Return line items, total, merchant, and date as JSON.

Example vision prompt (two images):

[image 1]
[image 2]
Compare these two screenshots and tell me which one is more likely to confuse a new user.

Example audio prompt (transcription):

[audio first]
Transcribe the following speech segment in English into English text.
Follow these specific instructions for formatting the answer:
* Only output the transcription, with no newlines.
* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.

Example audio prompt (transcription plus translation):

[audio first]
Transcribe the following speech segment in Spanish, then translate it into English.
When formatting the answer, first output the transcription in Spanish, then one newline, then output the string 'English: ', then the translation in English.

Generic transcription template:

Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text.
Follow these specific instructions for formatting the answer:
* Only output the transcription, with no newlines.
* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.

Generic transcription-plus-translation template:

Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}.
When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}.
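The {LANGUAGE} placeholders in the generic templates can be filled before sending the prompt. A minimal POSIX shell sketch (the variable names are my own):

```shell
# Fill the {LANGUAGE} placeholder in the generic transcription template above.
template='Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text.'
language='French'
# sed replaces every occurrence of the literal placeholder with the language name.
prompt=$(printf '%s' "$template" | sed "s/{LANGUAGE}/$language/g")
printf '%s\n' "$prompt"
# → Transcribe the following speech segment in French into French text.
```

The transcription-plus-translation template can be filled the same way, with one sed substitution each for {SOURCE_LANGUAGE} and {TARGET_LANGUAGE}.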