ai.georgeliu.com/p/running-google-gemma-4-locally-with
2 corrections found
LM Studio 0.4.0 also added an Anthropic-compatible endpoint at POST /v1/messages, which means tools that speak the Anthropic protocol can connect directly without an adapter.
This endpoint was introduced in LM Studio 0.4.1, not 0.4.0. LM Studio’s own release notes say 0.4.0 added the native stateful `/v1/chat` API, while `/v1/messages` arrived in 0.4.1.
Full reasoning
LM Studio's official 0.4.0 release post says that version introduced the new headless llmster daemon, parallel requests, and a stateful `/v1/chat` endpoint. It does not list an Anthropic-compatible `/v1/messages` endpoint as part of 0.4.0.
LM Studio's official API changelog then lists 0.4.1 as the release that added the Anthropic-compatible API: POST /v1/messages. LM Studio's separate Claude Code announcement says the same thing explicitly: "With LM Studio 0.4.1, we're introducing an Anthropic-compatible /v1/messages endpoint."
So the article is off by one release: the Anthropic-compatible endpoint existed by the time the article was published, but it was added in 0.4.1, not 0.4.0.
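To make the corrected claim concrete, here is a minimal sketch of what a request to that endpoint looks like once you are on LM Studio 0.4.1 or later. The base URL assumes LM Studio's default local server port (1234), and the model name is just the one used elsewhere in the article; both are assumptions, and the `curl` line is left commented out because it requires a running server with a loaded model.

```shell
# Sketch of an Anthropic-protocol request to LM Studio's /v1/messages endpoint.
# Assumption: the local server is at its default address, http://localhost:1234.
BASE_URL="http://localhost:1234"

# Anthropic-style request body: model, max_tokens, and a messages array.
BODY='{
  "model": "google/gemma-4-26b-a4b",
  "max_tokens": 256,
  "messages": [
    {"role": "user", "content": "Say hello in one sentence."}
  ]
}'

# With LM Studio 0.4.1+ running, an Anthropic-protocol client would send:
# curl -s "$BASE_URL/v1/messages" \
#   -H "content-type: application/json" \
#   -d "$BODY"

echo "$BODY"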
3 sources
- Introducing LM Studio 0.4.0 | LM Studio Blog
This release introduces parallel requests with continuous batching for high throughput serving, all-new non-GUI deployment option, new stateful REST API... New stateful REST API endpoint: /v1/chat...
- API Changelog | LM Studio Docs
LM Studio 0.4.1 — Anthropic-compatible API. New Anthropic-compatible endpoint: POST /v1/messages.
- Use your LM Studio Models in Claude Code | LM Studio Blog
With LM Studio 0.4.1, we're introducing an Anthropic-compatible /v1/messages endpoint. This means you can use your local models with Claude Code!
The default is 3600 seconds (1 hour).
That is not the default for `lms load`. LM Studio’s docs say models loaded with `lms load` have no TTL by default; the 60-minute default applies to JIT-loaded models, not manually loaded ones.
Full reasoning
LM Studio's TTL docs distinguish between JIT-loaded models and models loaded manually with `lms load`.
- For JIT-loaded models, the docs say the default TTL is 60 minutes.
- But for models loaded with `lms load`, the docs say they do not have a TTL by default and stay in memory until manually unloaded, unless you explicitly pass `--ttl`.
So in the context of `lms load google/gemma-4-26b-a4b --ttl 1800`, saying "The default is 3600 seconds (1 hour)" is incorrect. A 1-hour TTL is something you can set manually with `--ttl 3600`; it is not the default behavior for `lms load`.
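The distinction above can be summarized as a short command sketch. The `--ttl` flag (in seconds) comes straight from the quoted docs; `lms unload` as the manual-unload command is an assumption about the CLI, and the model name is the one from the article.

```shell
# Manually loaded: no TTL by default, stays in memory until unloaded.
lms load google/gemma-4-26b-a4b

# Opting in to auto-eviction requires passing --ttl explicitly (seconds):
lms load google/gemma-4-26b-a4b --ttl 1800   # idle-evict after 30 minutes

# Without a --ttl, eviction only happens when you unload manually:
lms unload google/gemma-4-26b-a4b
```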
1 source
- Idle TTL and Auto-Evict | LM Studio Docs
By default, JIT-loaded models have a TTL of 60 minutes... Set TTL for models loaded with lms: By default, models loaded with lms load do not have a TTL, and will remain loaded in memory until you manually unload them.