Integration guide

Star LLM exposes internal AI models (chat, transcription, rerank) through one gateway. Access is via an MCP token issued by the platform admin; ask for one scoped to the tools you need. Cost is flat and internal: there is no per-call billing, only rate limits enforced on each access token.

1. Authentication

Every request carries your token as a bearer header. Treat it like a password: server-side only, in your secret manager, never in client code or git.

Authorization: Bearer slm_xxxxxxxxxxxxxxxxxxxx
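A minimal sketch of building that header server-side, assuming the token lives in an environment variable named STAR_LLM_TOKEN (the name used in the examples later in this guide). The helper name auth_headers is illustrative, not part of any Star LLM SDK.

```python
import os

def auth_headers(env=os.environ):
    """Build request headers from the STAR_LLM_TOKEN environment variable."""
    token = env.get("STAR_LLM_TOKEN", "").strip()
    if not token:
        # Fail early instead of sending an empty bearer header (which would yield 401).
        raise RuntimeError("STAR_LLM_TOKEN is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Reading the token at call time, rather than hard-coding it, keeps it out of source control and lets the secret manager rotate it without a code change.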

2. MCP endpoint (primary)

JSON-RPC 2.0 over a single POST — the same shape as Star Drive's /api/mcp. Discover what your token can do:

curl -X POST https://api.llm.star.sa/api/mcp \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

tools/list returns only the tools your token is scoped for. Call one:

curl -X POST https://api.llm.star.sa/api/mcp \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
        "name":"chat.complete",
        "arguments":{"messages":[{"role":"user","content":"Summarize: ..."}],
                     "max_tokens":300,"temperature":0.3}}}'
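The two curl calls above share one JSON-RPC envelope. The sketch below builds that envelope and pulls tool names out of a tools/list response. It assumes the response follows the usual MCP shape (a result.tools array whose entries have a name field), which this guide does not spell out; rpc_request and tool_names are hypothetical helper names.

```python
import itertools

# Monotonic request ids, so responses can be matched to requests.
_ids = itertools.count(1)

def rpc_request(method, params=None):
    """Return a JSON-RPC 2.0 request body as a dict, with a fresh id."""
    body = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        body["params"] = params
    return body

def tool_names(list_response):
    """Extract tool names from a tools/list response (assumed MCP shape)."""
    if "error" in list_response:
        raise RuntimeError(list_response["error"])
    return [tool["name"] for tool in list_response["result"]["tools"]]
```

For example, rpc_request("tools/call", {"name": "chat.complete", "arguments": {...}}) produces the body sent in the second curl call.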

3. Tools

| Tool | Arguments | Returns |
|---|---|---|
| chat.complete | messages (role/content array), max_tokens?, temperature? | content, usage, model, and clamped if policy adjusted your params |
| rerank | query, candidates ([{id, text}]), top_k? | ranked ([{id, score}], best first) |
| request_audio_upload | filename, content_type? | upload_url (signed PUT, 15 min), gcs_uri |
| transcribe_audio | gcs_uri (from upload step), language? (e.g. "ar") | text, language, duration_s, segments |
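As an illustration of the rerank row, the sketch below builds the arguments from a list of texts and maps the ranked ids back to those texts. Using the list index (as a string) for id is an assumption made for this example; the guide does not state which id types are accepted. Both function names are hypothetical.

```python
def rerank_arguments(query, texts, top_k=None):
    """Build arguments for the rerank tool from plain strings."""
    # Assumption: ids are caller-chosen; here each id is the text's list index.
    candidates = [{"id": str(i), "text": text} for i, text in enumerate(texts)]
    args = {"query": query, "candidates": candidates}
    if top_k is not None:
        args["top_k"] = top_k
    return args

def texts_in_ranked_order(texts, result):
    """Pair each ranked id with its original text, keeping the server's order."""
    return [(texts[int(item["id"])], item["score"]) for item in result["ranked"]]
```

Because ranked comes back best first, the first pair returned by texts_in_ranked_order is the top match.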

3b. Models currently serving (live)

| Model | Kind | How to use |
|---|---|---|
| Qwen/Qwen2.5-7B-Instruct-AWQ | llm | pass as model in chat.complete |
| BAAI/bge-reranker-v2-m3 | reranker | rerank (optionally pass as model) |
| faster-whisper large-v3 | stt | transcribe_audio |

Your token may be restricted to a subset — tools/list is always the authoritative view for you.
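A short sketch of selecting a model in chat.complete and noticing when policy adjusted the request. The structure of the clamped field is not documented here, so the check below only tests whether it is present and non-empty; chat_arguments and was_clamped are hypothetical names.

```python
def chat_arguments(prompt, model=None, max_tokens=300, temperature=0.3):
    """Build chat.complete arguments for a single user message."""
    args = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if model is not None:
        args["model"] = model  # e.g. "Qwen/Qwen2.5-7B-Instruct-AWQ"
    return args

def was_clamped(result):
    """True if the result carries a non-empty clamped field (shape assumed unknown)."""
    return bool(result.get("clamped"))
```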

4. Transcription flow (3 steps)

1. tools/call request_audio_upload {"filename":"meeting.wav","content_type":"audio/wav"}
2. HTTP PUT your audio bytes to upload_url (same Content-Type)
3. tools/call transcribe_audio {"gcs_uri": "...", "language": "ar"}   # omit language = auto

Your upload is deleted server-side immediately after a successful transcription. Tokens can only transcribe their own uploads.
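The three steps can be sketched as one function. It assumes call_tool(name, args) is a helper that performs a tools/call and returns the tool's result as a dict (for instance the mcp_call function from the Python example in this guide); transcribe_bytes and put_bytes are illustrative names, and the 300-second upload timeout is an arbitrary choice.

```python
import urllib.request

def put_bytes(upload_url, data, content_type):
    """PUT raw bytes to the signed URL; urlopen raises on 4xx/5xx responses."""
    req = urllib.request.Request(
        upload_url, data=data, method="PUT",
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(req, timeout=300):
        pass

def transcribe_bytes(call_tool, filename, data, content_type="audio/wav",
                     language=None, put=put_bytes):
    """Run the upload-then-transcribe flow and return the transcription result."""
    # Step 1: ask the gateway for a signed upload URL.
    upload = call_tool("request_audio_upload",
                       {"filename": filename, "content_type": content_type})
    # Step 2: upload the audio with the same Content-Type.
    put(upload["upload_url"], data, content_type)
    # Step 3: transcribe; omitting language lets the server auto-detect it.
    args = {"gcs_uri": upload["gcs_uri"]}
    if language is not None:
        args["language"] = language
    return call_tool("transcribe_audio", args)
```

Since the signed URL is valid for 15 minutes, the upload should start soon after step 1. Passing put as a parameter makes it easy to swap in a streaming uploader for large files.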

5. REST mirrors (if you prefer plain REST)

| Endpoint | Equivalent tool |
|---|---|
| POST https://api.llm.star.sa/v1/chat | chat.complete |
| POST https://api.llm.star.sa/v1/rerank | rerank |
| POST https://api.llm.star.sa/v1/uploads | request_audio_upload |
| POST https://api.llm.star.sa/v1/transcribe | transcribe_audio |

Same bearer token, same JSON bodies as the tool arguments.
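A minimal standard-library sketch of a REST call, built from the endpoint table above. The response body format for the REST mirrors is not documented in this guide, so the sketch stops at constructing the request; rest_request and REST_PATHS are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "https://api.llm.star.sa"

# Tool name -> REST path, taken from the endpoint table.
REST_PATHS = {
    "chat.complete": "/v1/chat",
    "rerank": "/v1/rerank",
    "request_audio_upload": "/v1/uploads",
    "transcribe_audio": "/v1/transcribe",
}

def rest_request(tool, arguments, token):
    """Build a POST request whose JSON body is the tool's arguments."""
    body = json.dumps(arguments).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + REST_PATHS[tool], data=body, method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The request can then be sent with urllib.request.urlopen(req, timeout=...), using a generous timeout for transcription.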

6. Policies — what to expect

7. Errors

| Status | Meaning | What to do |
|---|---|---|
| 401 | missing / unknown / revoked / expired token | check the header; ask admin to reissue |
| 403 | token not scoped for that tool | ask admin to add the scope |
| 429 | rate / daily / concurrency limit | back off per retry-after |
| 502 | model server error | retry once; report if persistent |
| 503 | GPU VM down (gateway healthy) | retry with backoff |
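One way to encode the table above as a retry policy is sketched below. The exponential schedule and the 60-second cap are assumptions chosen for illustration, as is treating retry-after as a number of seconds; retry_delay is a hypothetical helper.

```python
def _seconds(value):
    """Parse a retry-after value given in seconds; return None if unusable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def retry_delay(status, attempt, retry_after=None):
    """Return seconds to wait before retry number attempt+1, or None to give up.

    attempt counts retries already made for this request (0 on first failure).
    """
    backoff = min(2 ** attempt, 60)  # assumed schedule: 1, 2, 4, ... capped at 60 s
    if status == 429:
        hinted = _seconds(retry_after)
        return hinted if hinted is not None else backoff
    if status == 502:
        return 1.0 if attempt == 0 else None  # retry once, then report
    if status == 503:
        return backoff
    return None  # 401/403 and anything else: retrying will not help
```

A caller would sleep for the returned number of seconds and try again, or surface the error when the function returns None.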

8. Node.js example

// Call one MCP tool and return its structured result.
async function mcpCall(name, args) {
  const res = await fetch("https://api.llm.star.sa/api/mcp", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.STAR_LLM_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call",
                           params: { name, arguments: args } }),
    // transcribe can take minutes: do NOT rely on default fetch timeouts
    signal: AbortSignal.timeout(15 * 60_000),
  });
  if (res.status === 429) throw new Error(`rate limited, retry after ${res.headers.get("retry-after")}s`);
  // 502/503 responses may not carry a JSON body, so check the status before parsing
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const body = await res.json();
  if (body.error) throw new Error(body.error.message ?? JSON.stringify(body.error));
  return body.result.structuredContent;
}

9. Python example

import os

import httpx

def mcp_call(name: str, args: dict):
    """Call one MCP tool and return its structured result."""
    r = httpx.post(
        "https://api.llm.star.sa/api/mcp",
        headers={"Authorization": f"Bearer {os.environ['STAR_LLM_TOKEN']}"},
        json={"jsonrpc": "2.0", "id": 1, "method": "tools/call",
              "params": {"name": name, "arguments": args}},
        timeout=httpx.Timeout(900.0, connect=10.0),  # long audio is normal
    )
    if r.status_code == 429:
        # .get avoids a KeyError if the header is missing
        raise RuntimeError(f"rate limited, retry after {r.headers.get('retry-after')}s")
    r.raise_for_status()  # surface 401/403/502/503 before parsing JSON
    body = r.json()
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]["structuredContent"]

Need a token, higher limits, or another tool scope? Contact the platform admin — these are dashboard toggles, not deployments.