Integration guide
Star LLM exposes internal AI models (chat, transcription, rerank) through one gateway. Access is via an MCP token issued by the platform admin — ask for one scoped to the tools you need. Flat internal cost: there is no per-call billing, only per-token rate limits.
1. Authentication
Every request carries your token as a bearer header. Treat it like a password: server-side only, in your secret manager, never in client code or git.
Authorization: Bearer slm_xxxxxxxxxxxxxxxxxxxx
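As a minimal sketch of the server-side handling described above, the snippet below reads the token from the environment and builds the bearer header. The variable name STAR_LLM_TOKEN matches the examples later in this guide; the helper name auth_headers is hypothetical, not part of any Star LLM client.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from a token kept in the environment (hypothetical helper)."""
    token = env.get("STAR_LLM_TOKEN", "").strip()
    if not token:
        # Fail fast instead of sending a request that would come back as 401.
        raise RuntimeError("STAR_LLM_TOKEN is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Passing env explicitly keeps the helper easy to exercise in tests without touching the real process environment.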
2. MCP endpoint (primary)
JSON-RPC 2.0 over a single POST — the same shape as Star Drive's /api/mcp. Discover what your token can do:
curl -X POST https://api.llm.star.sa/api/mcp \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
tools/list returns only the tools your token is scoped for. Call one:
curl -X POST https://api.llm.star.sa/api/mcp \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
"name":"chat.complete",
"arguments":{"messages":[{"role":"user","content":"Summarize: ..."}],
"max_tokens":300,"temperature":0.3}}}'
3. Tools
| Tool | Arguments | Returns |
|---|---|---|
| chat.complete | messages (role/content array), max_tokens?, temperature? | content, usage, model, and clamped if policy adjusted your params |
| rerank | query, candidates ([{id, text}]), top_k? | ranked ([{id, score}], best first) |
| request_audio_upload | filename, content_type? | upload_url (signed PUT, 15 min), gcs_uri |
| transcribe_audio | gcs_uri (from upload step), language? (e.g. "ar") | text, language, duration_s, segments |
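The tool arguments in the table above slot into the same JSON-RPC envelope used by the curl examples. The sketch below builds a rerank payload that way; the helper names tool_call and rerank_call are hypothetical, and the sample query and texts are made up for illustration.

```python
import json


def tool_call(name, arguments, request_id=1):
    """Wrap tool arguments in the JSON-RPC 2.0 envelope used by tools/call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def rerank_call(query, texts_by_id, top_k=None, request_id=1):
    """Build a rerank call; candidates use the [{id, text}] shape from the table."""
    arguments = {
        "query": query,
        "candidates": [{"id": i, "text": t} for i, t in texts_by_id.items()],
    }
    if top_k is not None:  # top_k is optional, so leave it out unless set
        arguments["top_k"] = top_k
    return tool_call("rerank", arguments, request_id)


payload = rerank_call(
    "refund policy",
    {"a": "Refunds take 5 days.", "b": "Office hours are 9 to 5."},
    top_k=1,
)
print(json.dumps(payload, indent=2))
```

The printed payload can be sent as the POST body to /api/mcp with the bearer header from section 1.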
3b. Models currently serving (live)
| Model | Kind | How to use |
|---|---|---|
| Qwen/Qwen2.5-7B-Instruct-AWQ | llm | pass as model in chat.complete |
| BAAI/bge-reranker-v2-m3 | reranker | rerank (optionally pass as model) |
| faster-whisper large-v3 | stt | transcribe_audio |
Your token may be restricted to a subset — tools/list is always the authoritative view for you.
4. Transcription flow (3 steps)
1. tools/call request_audio_upload {"filename":"meeting.wav","content_type":"audio/wav"}
2. HTTP PUT your audio bytes to upload_url (same Content-Type)
3. tools/call transcribe_audio {"gcs_uri": "...", "language": "ar"} # omit language = auto
Your upload is deleted server-side immediately after a successful transcription. Tokens can only transcribe their own uploads.
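The three steps above can be sketched as one function. This is an illustration under assumptions, not a reference client: call_tool and put_bytes are hypothetical callables you supply (for example, thin wrappers around your HTTP client), and the sketch assumes call_tool returns the tool's result as a dict with the keys listed in the Tools table.

```python
def transcribe_file(call_tool, put_bytes, filename, audio,
                    content_type="audio/wav", language=None):
    """Run the flow: request an upload URL, PUT the bytes, then transcribe.

    call_tool(name, arguments) -> dict and put_bytes(url, data, content_type)
    are hypothetical callables injected by the caller so the flow can be
    exercised without a network connection.
    """
    # Step 1: ask the gateway for a signed upload URL.
    upload = call_tool(
        "request_audio_upload",
        {"filename": filename, "content_type": content_type},
    )
    # Step 2: PUT the audio using the same Content-Type that was requested.
    put_bytes(upload["upload_url"], audio, content_type)
    # Step 3: transcribe; leaving language out asks for auto-detection.
    arguments = {"gcs_uri": upload["gcs_uri"]}
    if language is not None:
        arguments["language"] = language
    return call_tool("transcribe_audio", arguments)
```

Since the signed URL is valid for 15 minutes, the PUT should follow the upload request promptly rather than being queued for later.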
5. REST mirrors (if you prefer plain REST)
| Endpoint | Equivalent tool |
|---|---|
| POST https://api.llm.star.sa/v1/chat | chat.complete |
| POST https://api.llm.star.sa/v1/rerank | rerank |
| POST https://api.llm.star.sa/v1/uploads | request_audio_upload |
| POST https://api.llm.star.sa/v1/transcribe | transcribe_audio |
Same bearer token, same JSON bodies as the tool arguments.
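For callers who prefer the REST mirrors, the sketch below builds the equivalent request with the standard library only. The paths are copied from the table above; the helper name rest_request is hypothetical, and the shape of the REST response body is not specified in this guide, so the sketch stops at building the request.

```python
import json
import urllib.request

BASE_URL = "https://api.llm.star.sa"

# Tool name -> REST path, copied from the table above.
REST_PATHS = {
    "chat.complete": "/v1/chat",
    "rerank": "/v1/rerank",
    "request_audio_upload": "/v1/uploads",
    "transcribe_audio": "/v1/transcribe",
}


def rest_request(tool, arguments, token):
    """Build (but do not send) the REST request equivalent to a tool call."""
    return urllib.request.Request(
        BASE_URL + REST_PATHS[tool],
        data=json.dumps(arguments).encode("utf-8"),  # same body as the tool arguments
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen with an explicit timeout, for example urlopen(req, timeout=900) for transcription.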
6. Policies — what to expect
- Clamping, not rejection: if you ask for temperature=1.5 but your policy caps at 0.7, the call runs at 0.7 and the response includes "clamped": {"temperature": 0.7}. Same for max_tokens.
- Rate limits return 429 with a retry-after header (seconds) and a retry_after_s field in the JSON body. Respect it: denied calls don't consume quota.
- Long calls are fine: transcription connections are held up to 15 minutes. Set your client timeout above your longest audio, not below.
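One way to honor these policies in client code is sketched below: wait for the advertised delay on 429, and check whether the result carries a clamped field. The send callable, the three-attempt cap, and the one-second fallback delay are assumptions made for illustration, not gateway behavior.

```python
import time


def call_with_backoff(send, max_attempts=3, sleep=time.sleep):
    """Retry on 429, waiting the number of seconds the gateway asks for.

    send() is a hypothetical callable returning (status, headers, body),
    where headers is a dict with lowercase keys and body is the parsed JSON.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        # Prefer the header, then the JSON field, then a 1 s default (assumption).
        delay = headers.get("retry-after") or (body or {}).get("retry_after_s") or 1
        sleep(float(delay))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


def clamped_params(result):
    """Return the parameters the policy adjusted, or {} if none were."""
    return result.get("clamped") or {}
```

Logging the output of clamped_params is a cheap way to notice when a requested temperature or max_tokens was silently lowered.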
7. Errors
| Status | Meaning | What to do |
|---|---|---|
| 401 | missing / unknown / revoked / expired token | check the header; ask admin to reissue |
| 403 | token not scoped for that tool | ask admin to add the scope |
| 429 | rate / daily / concurrency limit | back off per retry-after |
| 502 | model server error | retry once; report if persistent |
| 503 | GPU VM down (gateway healthy) | retry with backoff |
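The table above can be turned into a small retry decision, sketched here. The single retry for 502 follows the table; the cap of three retries for 429 and 503 is an assumption, and should_retry is a hypothetical helper name.

```python
def should_retry(status, retries_so_far, max_retries=3):
    """Decide whether to retry, following the error table.

    max_retries applies to 429 and 503 and is an illustrative default,
    not a limit documented by the gateway.
    """
    if status == 502:
        # Model server error: retry once, then report if it persists.
        return retries_so_far < 1
    if status in (429, 503):
        # Rate limited or GPU VM down: keep backing off up to the cap.
        return retries_so_far < max_retries
    # 401 and 403 need an admin action, so retrying will not help.
    return False
```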
8. Node.js example
async function mcpCall(name, args) {
  const res = await fetch("https://api.llm.star.sa/api/mcp", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.STAR_LLM_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call",
      params: { name, arguments: args } }),
    // transcription can take minutes, so set an explicit 15-minute timeout
    signal: AbortSignal.timeout(15 * 60_000),
  });
  if (res.status === 429) throw new Error(`rate limited, retry after ${res.headers.get("retry-after")}s`);
  // surface 401/403/502/503 before trying to parse a JSON-RPC body
  if (!res.ok) throw new Error(`gateway returned HTTP ${res.status}`);
  const body = await res.json();
  if (body.error) throw new Error(body.error.message ?? JSON.stringify(body.error));
  return body.result.structuredContent;
}
9. Python example
import os
import httpx

def mcp_call(name: str, args: dict):
    r = httpx.post(
        "https://api.llm.star.sa/api/mcp",
        headers={"Authorization": f"Bearer {os.environ['STAR_LLM_TOKEN']}"},
        json={"jsonrpc": "2.0", "id": 1, "method": "tools/call",
              "params": {"name": name, "arguments": args}},
        timeout=httpx.Timeout(900.0, connect=10.0),  # long audio is normal
    )
    if r.status_code == 429:
        raise RuntimeError(f"rate limited, retry after {r.headers.get('retry-after')}s")
    r.raise_for_status()  # surface 401/403/502/503 as httpx.HTTPStatusError
    body = r.json()
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]["structuredContent"]
Need a token, higher limits, or another tool scope? Contact the platform admin: these are dashboard toggles, not deployments.