How do I use the BazaarLink API?

Use any OpenAI-compatible SDK. Set the base URL to https://bazaarlink.ai/api/v1 and authenticate with your BazaarLink API key (sk-bl-...). Model IDs include the provider prefix, e.g. openai/gpt-4o, anthropic/claude-sonnet-4.6, deepseek/deepseek-chat.

Which SDKs does BazaarLink support?

BazaarLink is compatible with all OpenAI SDKs, including the Python openai package, the TypeScript openai package, LangChain, LlamaIndex, and the Vercel AI SDK. Changing base_url is all that is needed.

How do I use embeddings on BazaarLink?

Use POST /api/v1/embeddings with any supported embedding model like openai/text-embedding-3-small. The request format is identical to OpenAI's embeddings API. BazaarLink supports 20+ embedding models.

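What is BazaarLink's Worker Network?

The Worker Network is currently suspended. It previously let GPU owners connect their own GPUs to serve inference requests and earn rewards, while other users could call community-hosted models at lower rates.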

Does BazaarLink support tool calling and structured output?

Yes. BazaarLink fully supports OpenAI-compatible tool calling (function calling), structured outputs (JSON mode), streaming, and prompt caching across supported models.

Is BazaarLink's chat completions API OpenAI-compatible?

Yes. Point any OpenAI SDK at https://bazaarlink.ai/api/v1 and pass your BazaarLink API key as OPENAI_API_KEY. All standard parameters (messages, temperature, stream, tools, response_format, tool_choice) are supported across 300+ models.

How do I stream responses from the chat completions endpoint?

Pass stream: true in your request body. BazaarLink returns a Server-Sent Events stream identical to OpenAI's format — each chunk is a delta with choices[0].delta.content. Works with the official OpenAI SDK's .stream() helpers.

Can I call multiple models with fallback if one fails?

Yes. Use BazaarLink's provider routing: pass models: ['anthropic/claude-sonnet-4.6', 'openai/gpt-5.2'] in your request and BazaarLink will try them in order. You can also use reserved model IDs like auto:free to route to any available free model.

Does the API support tool calling / function calling?

Yes. Pass a tools array with function definitions in the standard OpenAI format. BazaarLink translates to Anthropic's tool_use / Google's functionCalls automatically when routing to those providers. Works with parallel tool calls.

What are the rate limits for the API?

Free tier has a shared rate limit suitable for development. Higher limits are available per-key via the Subscription API — see /docs/subscription for plan quotas and upgrade flow. Rate limit headers (X-RateLimit-*) are returned on every response.


API Reference

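Chat Completions

The primary endpoint. Compatible with the OpenAI Chat Completions API.

POST /api/v1/chat/completions

Request body

model (required)

string

Model ID, e.g. "openai/gpt-4o" or "anthropic/claude-3.5-sonnet"

messages (required)

Message[]

Array of message objects, each with a role and content

stream

boolean

If true, returns a Server-Sent Events stream. Default: false

temperature

number

Sampling temperature, 0 to 2. Higher values are more random. Default: 1

max_tokens

integer

Maximum number of tokens to generate

max_completion_tokens

integer

Alias for max_tokens (for OpenAI o-series compatibility). Both are accepted; whichever is provided is used

top_p

number

Nucleus sampling probability mass. Default: 1

top_k

integer

Limits the number of candidate tokens. 0 disables the limit (all tokens are considered). Default: 0

frequency_penalty

number

Penalizes repeated tokens. Range: [-2, 2]. Default: 0

presence_penalty

number

Penalizes tokens based on whether they have already appeared. Range: [-2, 2]. Default: 0

repetition_penalty

number

Reduces the likelihood of repeating tokens from the input. Range: (0, 2]. Default: 1

min_p

number

Minimum probability for a token to be considered, relative to the most likely token. Range: [0, 1]. Default: 0

top_a

number

Dynamic top-p based on the probability of the most likely token. Range: [0, 1]. Default: 0

seed

integer

Integer random seed for deterministic sampling. Determinism is not guaranteed for some models

n

integer

Number of completions to generate. Default: 1

user

string

End-user identifier, used for monitoring and abuse detection. Has no effect on billing

stop

string | string[]

Stop sequences; generation stops when one is encountered

logit_bias

object

Maps token IDs to bias values in [-100, 100], which are added to the logits before sampling

logprobs

boolean

Returns the log probability of each output token

top_logprobs

integer

Returns the N most likely candidate tokens at each position (requires logprobs: true). Range: 0 to 20

tools

Tool[]

List of tools (functions) the model may call

tool_choice

string | object

Controls tool use: "auto", "none", "required", or a specific tool

parallel_tool_calls

boolean

Enables parallel tool calls. Default: true

response_format

object

Forces structured JSON output. See the Structured outputs section

transforms

string[]

Message transforms to apply, e.g. ["middle-out"]. If omitted, applied automatically for models with ≤8k context

models

string[]

Fallback model list. BazaarLink tries the models in order and switches automatically when the primary model fails

route

string

Set to "fallback" to enable fallback routing through the models array

provider

object

Provider routing preferences: order, only, ignore, sort, allow_fallbacks

debug

object

Debug options. echo_upstream_body: true returns the transformed request body as the first SSE chunk (streaming mode only)

Request schema (TypeScript)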

typescript

type Request = {
  // Required
  model: string;                    // "provider/model-name"
  messages: Message[];

  // Common
  stream?: boolean;                 // Default: false
  temperature?: number;             // Range: [0, 2], default: 1
  max_tokens?: number;              // Range: [1, context_length)
  n?: number;                       // Default: 1
  seed?: number;                    // Integer; deterministic sampling
  stop?: string | string[];

  // Sampling
  top_p?: number;                   // Range: (0, 1]
  top_k?: number;                   // Integer; default: 0 (disabled)
  frequency_penalty?: number;       // Range: [-2, 2]
  presence_penalty?: number;        // Range: [-2, 2]
  repetition_penalty?: number;      // Range: (0, 2], default: 1
  min_p?: number;                   // Range: [0, 1]
  top_a?: number;                   // Range: [0, 1]

  // Logprobs
  logit_bias?: Record<number, number>;  // Token ID → bias [-100, 100]
  logprobs?: boolean;
  top_logprobs?: number;            // Range: [0, 20], requires logprobs: true

  // Tools & output
  tools?: Tool[];
  tool_choice?: ToolChoice;
  parallel_tool_calls?: boolean;    // Default: true
  response_format?: ResponseFormat;

  // BazaarLink-only
  transforms?: string[];            // e.g. ["middle-out"]
  models?: string[];                // Fallback model list
  route?: "fallback";
  provider?: ProviderPreferences;
  debug?: {
    echo_upstream_body?: boolean;   // Streaming only
  };
};

type Message =
  | { role: "system" | "user" | "assistant"; content: string | ContentPart[] }
  | { role: "tool"; content: string; tool_call_id: string };

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: string } };

type Tool = {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters: object;  // JSON Schema
  };
};

type ToolChoice =
  | "none" | "auto" | "required"
  | { type: "function"; function: { name: string } };

type ResponseFormat =
  | { type: "json_object" }
  | { type: "json_schema"; json_schema: { name: string; strict?: boolean; schema: object } };

type ProviderPreferences = {
  order?: string[];
  only?: string[];
  ignore?: string[];
  allow_fallbacks?: boolean;
  sort?: "price" | "latency" | "throughput";
};

Example request

bash

curl https://bazaarlink.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $BAZAARLINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'

Response

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1740000000,
  "model": "openai/gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 74,
    "total_tokens": 102,
    "cost": 0.0006480,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}

Response schema (TypeScript)

typescript

type Response = {
  id: string;
  object: "chat.completion" | "chat.completion.chunk";
  created: number;                 // Unix timestamp
  model: string;
  choices: (NonStreamingChoice | StreamingChoice)[];
  usage?: ResponseUsage;
  cost?: number;                   // Total cost in USD
};

type NonStreamingChoice = {
  index: number;
  finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | null;
  native_finish_reason: string | null;  // Provider's original finish reason
  message: {
    role: "assistant";
    content: string | null;
    tool_calls?: ToolCall[];
  };
};

type StreamingChoice = {
  index: number;
  finish_reason: string | null;
  native_finish_reason: string | null;  // Provider's original finish reason
  delta: {
    role?: string;
    content?: string | null;
    tool_calls?: ToolCall[];
  };
};

type ResponseUsage = {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cost: number;                      // Total cost for this request in USD
  prompt_tokens_details?: {
    cached_tokens: number;           // Tokens served from prompt cache (reduced cost)
    cache_write_tokens?: number;     // Tokens written to cache in this request
    audio_tokens?: number;
  };
  completion_tokens_details?: {
    reasoning_tokens?: number;       // Thinking/reasoning tokens (e.g. o3, Qwen3, DeepSeek R1)
    image_tokens?: number;
  };
};

type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
};

Responses API

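An endpoint compatible with the OpenAI Responses API format, supporting stateless multi-turn conversations, tool calling, and multimodal input. It suits agent frameworks that call client.responses.create() from the OpenAI Python SDK ≥ 1.x.

POST /api/v1/responses

Note

Uses the same authentication and model routing logic as Chat Completions.

Request body

model (required)

string

Model ID, e.g. "openai/gpt-4o" or "anthropic/claude-3.5-sonnet"

input (required)

string | InputItem[]

User input: a plain string (single message) or an array of input items (multi-turn or multimodal conversation).

instructions

string

System-level instructions, equivalent to a system-role message. Must be resent with every request.

stream

boolean

If true, returns a stream of Responses API SSE events, including the response.created, response.output_text.delta, and response.completed event types.

max_output_tokens

integer

Maximum number of output tokens (for o-series reasoning models this includes reasoning tokens).

temperature

number

Sampling temperature, 0 to 2. Higher values are more random. Default: 1

top_p

number

Nucleus sampling probability mass. Default: 1

tools

Tool[]

Tool (function) definitions, using the same JSON Schema format as Chat Completions. Built-in tools (web_search, file_search, computer_use) are not supported.

tool_choice

string | object

Controls tool use: "auto", "none", "required", or a specific tool

parallel_tool_calls

boolean

Enables parallel tool calls. Default: true

response_format

object

Forces structured JSON output. See the Structured outputs section

models

string[]

Fallback model list. BazaarLink tries the models in order and switches automatically when the primary model fails

transforms

string[]

Message transforms to apply, e.g. ["middle-out"]. If omitted, applied automatically for models with ≤8k context

previous_response_id

string

Not supported by this implementation. Use stateless mode: include the full conversation history in the input array.

provider

object

Provider routing preferences: order, only, ignore, sort, allow_fallbacks

Request schema (TypeScript)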

typescript

type ResponsesRequest = {
  model: string;                    // "provider/model-name"
  input: string | InputItem[];      // string or multi-turn array

  // Optional
  instructions?: string;            // System-level message
  stream?: boolean;                 // Default: false
  max_output_tokens?: number;
  temperature?: number;             // Range: [0, 2], default: 1
  top_p?: number;
  tools?: Tool[];
  tool_choice?: "auto" | "none" | "required" | object;
  parallel_tool_calls?: boolean;    // Default: true
  previous_response_id?: string;    // Not supported — use full input array
  provider?: ProviderPreferences;   // Same as Chat Completions
};

type InputItem =
  | { type?: "message"; role: "user" | "assistant" | "system" | "developer"; content: string | ContentBlock[] }
  | { type: "function_call_output"; call_id: string; output: string }   // tool result
  | { type: "function_call"; call_id: string; name: string; arguments: string };

type ContentBlock =
  | { type: "input_text"; text: string }
  | { type: "input_image"; image_url: string; detail?: "auto" | "low" | "high" };

Example request

bash

curl https://bazaarlink.ai/api/v1/responses \
  -H "Authorization: Bearer $BAZAARLINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "instructions": "You are a helpful assistant.",
    "input": "What is the capital of Taiwan?"
  }'

Response format

typescript

// Non-streaming response object
type ResponsesResponse = {
  id: string;             // "resp_..."
  object: "response";
  created_at: number;
  completed_at: number;
  status: "completed" | "failed" | "incomplete";
  model: string;
  output: OutputItem[];
  usage: {
    input_tokens: number;   // equivalent to prompt_tokens
    output_tokens: number;  // equivalent to completion_tokens
    total_tokens: number;
    cost?: number;          // actual cost in credits
  } | null;
  error: null | { code: string; message: string };
};

type OutputItem =
  | {
      type: "message";
      id: string;
      role: "assistant";
      status: "completed";
      content: Array<{ type: "output_text"; text: string; annotations: [] }>;
    }
  | { type: "function_call"; id: string; call_id: string; name: string; arguments: string; status: "completed" };

Migrating from Chat Completions

Replace messages with input (a string or an array), use instructions in place of the system-role message, and read the reply from output[0].content[0].text (previously choices[0].message.content).

python

# Chat Completions (before)
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user",   "content": "Hello"},
    ]
)
text = response.choices[0].message.content

# Responses API (after)
response = client.responses.create(
    model="openai/gpt-4o-mini",
    instructions="You are helpful.",
    input="Hello"
)
text = response.output[0].content[0].text
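
When migrating an existing code base, the same mapping can be applied mechanically. The sketch below works on plain dicts and assumes string-only message content; `chat_to_responses` and `output_text` are hypothetical helper names, not part of any SDK. `output_text` walks every output item rather than indexing `output[0]`, because the first item may be a function_call with no content.

```python
def chat_to_responses(messages: list[dict]) -> dict:
    """Split chat messages into Responses API `instructions` and `input`."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] in ("user", "assistant")
    ]
    request: dict = {}
    if system_parts:
        request["instructions"] = "\n\n".join(system_parts)
    # A single user turn can be sent as a plain string; otherwise send the array.
    if len(turns) == 1 and turns[0]["role"] == "user":
        request["input"] = turns[0]["content"]
    else:
        request["input"] = turns
    return request


def output_text(response: dict) -> str:
    """Concatenate all output_text parts from a Responses API response dict."""
    chunks = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part["text"])
    return "".join(chunks)
```

Tool-role messages are deliberately left out of this sketch; they would map to function_call_output items.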

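Limitations

previous_response_id is accepted but ignored. Use stateless mode (pass the full input array).
Built-in tools (web_search_preview, file_search, computer_use_preview) are not supported.
background: true (asynchronous execution) is not supported.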
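
Since previous_response_id is ignored, the client has to carry the conversation itself. A minimal sketch of stateless multi-turn follows, assuming responses are handled as plain dicts shaped like ResponsesResponse; `next_turn_input` is a hypothetical helper name.

```python
def next_turn_input(history: list[dict], response: dict, user_message: str) -> list[dict]:
    """Return the full input array for the next request in stateless mode."""
    items = list(history)  # copy so the caller's history is not mutated
    for item in response.get("output", []):
        if item["type"] == "message":
            text = "".join(
                part["text"] for part in item["content"] if part["type"] == "output_text"
            )
            items.append({"role": "assistant", "content": text})
        elif item["type"] == "function_call":
            # Echo the call back so a later function_call_output can reference it.
            items.append({
                "type": "function_call",
                "call_id": item["call_id"],
                "name": item["name"],
                "arguments": item["arguments"],
            })
    items.append({"role": "user", "content": user_message})
    return items
```

The returned list is passed as `input` on the next request, and it grows by one assistant turn and one user turn per exchange.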

Models

Lists all available models with their pricing and capability information. This endpoint does not require authentication.

GET /api/v1/models

bash

curl https://bazaarlink.ai/api/v1/models

Response

json

{
  "data": [
    {
      "id": "openai/gpt-4.1",
      "name": "GPT 4.1",
      "context_length": 1047576,
      "modality": "text+image+file->text",
      "pricing": {
        "prompt": "2.00",
        "completion": "8.00"
      }
    }
  ]
}

typescript

// /v1/models — Response Schema
type ModelsResponse = {
  data: Model[];
};

type Model = {
  id: string;                    // Model ID (e.g. "openai/gpt-4.1")
  name: string;                  // Human-readable name
  context_length: number | null; // Max context window in tokens
  modality: string | null;       // e.g. "text->text", "text+image->text"
  pricing: {
    prompt: string;              // Input price per 1M tokens (USD)
    completion: string;          // Output price per 1M tokens (USD)
  };
  description?: string | null;   // Model description
  top_provider?: {
    max_completion_tokens?: number;
  };
  supported_parameters?: string[]; // e.g. ["tools", "response_format", "reasoning"]
};
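
Because pricing.prompt and pricing.completion are strings giving USD per 1M tokens, a request's cost can be estimated from its usage block. The helper below is a sketch under that reading of the schema; `estimate_cost` is a hypothetical name, and it assumes no cache or reasoning-token adjustments apply.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def estimate_cost(pricing: dict, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from a model's pricing entry."""
    prompt_price = Decimal(pricing["prompt"])
    # Assumption: models without an output price (e.g. embeddings) cost nothing on output.
    completion_price = Decimal(pricing.get("completion") or "0")
    total = prompt_price * prompt_tokens + completion_price * completion_tokens
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    gpt41 = {"prompt": "2.00", "completion": "8.00"}
    print(estimate_cost(gpt41, prompt_tokens=28, completion_tokens=74))
```

With the openai/gpt-4.1 prices and the token counts from the example chat completion response (28 prompt, 74 completion), this works out to 0.000648 USD, which agrees with the usage.cost value shown there.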

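Available models (357)

The models currently available on BazaarLink, loaded dynamically from the database. Each entry lists the model ID, the context window, the input/output price in USD per 1M tokens, and the modality where available: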

OpenAI

openai/gpt-5.3-codex · 400K ctx · $1.22/$9.80 · text+image->text
openai/text-embedding-3-small · 8K ctx · $0.02 (input only) · text->embeddings
openai/text-embedding-ada-002 · 8K ctx · $0.10 (input only) · text->embeddings
openai/text-embedding-3-large · 8K ctx · $0.13 (input only) · text->embeddings
openai/gpt-4.1 · 1048K ctx · $2.00/$8.00 · text+image+file->text
openai/gpt-5 · 400K ctx · $1.25/$10.00 · text+image+file->text
openai/gpt-4o-search-preview · $2.50/$10.00
openai/gpt-5-chat · 128K ctx · $1.25/$10.00 · text+image+file->text
openai/gpt-5.1-codex-mini · 400K ctx · $0.25/$2.00 · text+image->text
openai/gpt-5.1 · 400K ctx · $1.25/$10.00 · text+image+file->text
openai/gpt-5.4-mini · 400K ctx · $0.75/$4.50 · text+image+file->text
openai/gpt-5.4-nano · 400K ctx · $0.20/$1.25 · text+image+file->text
openai/gpt-4-0314 · $30.00/$60.00
openai/gpt-4-turbo · $10.00/$30.00
openai/gpt-4o-2024-05-13 · $5.00/$15.00
openai/gpt-4o · $2.50/$10.00
openai/gpt-4o-mini-2024-07-18 · $0.15/$0.60
openai/gpt-4o-mini · $0.15/$0.60
openai/gpt-4o-mini-search-preview · $0.15/$0.60
openai/gpt-5.1-chat · 128K ctx · $1.25/$10.00 · text+image+file->text
openai/gpt-5-mini · 400K ctx · $0.25/$2.00 · text+image+file->text
openai/gpt-5-codex · 400K ctx · $1.25/$10.00 · text+image->text
openai/gpt-5.1-codex · 400K ctx · $1.25/$10.00 · text+image->text
openai/gpt-5.2-codex · 400K ctx · $1.75/$14.00 · text+image->text
openai/gpt-5.2-chat · 128K ctx · $1.75/$14.00 · text+image+file->text
openai/gpt-3.5-turbo · $0.50/$1.50
openai/gpt-3.5-turbo-0613 · $1.00/$2.00
openai/gpt-3.5-turbo-16k · $3.00/$4.00
openai/gpt-4 · $30.00/$60.00
openai/gpt-3.5-turbo-instruct · $1.50/$2.00
openai/gpt-4-turbo-preview · $10.00/$30.00
openai/gpt-4o-2024-11-20 · $2.50/$10.00
openai/gpt-4o-2024-08-06 · $2.50/$10.00
openai/gpt-4o:extended · $6.00/$18.00
openai/gpt-oss-120b:exacto · $0.04/$0.19
openai/gpt-oss-120b · $0.04/$0.19
openai/gpt-oss-safeguard-20b · $0.07/$0.30
openai/gpt-oss-20b · $0.03/$0.14
openai/gpt-5.1-codex-max · 400K ctx · $1.25/$10.00 · text+image->text
openai/gpt-5.2-pro · 400K ctx · $21.00/$168.00 · text+image+file->text
openai/o1-pro · 200K ctx · $150.00/$600.00 · text+image+file->text
openai/o3-mini-high · 200K ctx · $1.10/$4.40 · text+file->text
openai/o3-pro · 200K ctx · $20.00/$80.00 · text+image+file->text
openai/o3-mini · 200K ctx · $1.10/$4.40 · text+file->text
openai/o4-mini · 200K ctx · $1.10/$4.40 · text+image+file->text
openai/gpt-4-1106-preview · 128K ctx · $10.00/$30.00 · text->text
openai/gpt-4.1-mini · 1048K ctx · $0.40/$1.60 · text+image+file->text
openai/gpt-4.1-nano · 1048K ctx · $0.10/$0.40 · text+image+file->text
openai/gpt-5-pro · 400K ctx · $15.00/$120.00 · text+image+file->text
openai/gpt-5-nano · 400K ctx · $0.05/$0.40 · text+image+file->text
openai/gpt-5.2 · 400K ctx · $1.75/$14.00 · text+image+file->text
openai/o1 · 200K ctx · $15.00/$60.00 · text+image+file->text
openai/o3 · 200K ctx · $2.00/$8.00 · text+image+file->text
openai/o3-deep-research · 200K ctx · $10.00/$40.00 · text+image+file->text
openai/o4-mini-high · 200K ctx · $1.10/$4.40 · text+image+file->text
openai/o4-mini-deep-research · 200K ctx · $2.00/$8.00 · text+image+file->text
openai/gpt-5.3-chat · 128K ctx · $1.75/$14.00 · text+image+file->text
openai/gpt-5.4 · 1050K ctx · $2.50/$15.00 · text+image+file->text
openai/gpt-5.4-pro · 1050K ctx · $30.00/$180.00 · text+image+file->text

Qwen

qwen/qwen3.5-35b-a3b · 262K ctx · $0.16/$1.30 · text+image+video->text
qwen/qwen3.5-27b · 262K ctx · $0.20/$1.56 · text+image+video->text
qwen/qwen3.5-122b-a10b · 262K ctx · $0.26/$2.08 · text+image+video->text
qwen/qwen3.5-flash-02-23 · 1000K ctx · $0.10/$0.40 · text+image+video->text
qwen/qwen-plus-2025-07-28 · $0.40/$1.20
qwen/qwen3-embedding-4b · $0.02 (input only)
qwen/qwen3-embedding-8b · $0.01 (input only)
qwen/qwen3-vl-235b-a22b-thinking · $0.26/$2.60
qwen/qwen2.5-vl-72b-instruct · $0.80/$0.80
qwen/qwen3-30b-a3b-thinking-2507 · $0.05/$0.34
qwen/qwen3-coder-30b-a3b-instruct · $0.07/$0.27
qwen/qwen3-vl-30b-a3b-instruct · $0.13/$0.52
qwen/qwen3-vl-32b-instruct · $0.10/$0.42
qwen/qwen3.5-plus-02-15 · $0.40/$2.40
qwen/qwen3.5-397b-a17b · $0.55/$3.50
qwen/qwen3-235b-a22b-thinking-2507 · $0.15/$1.50
qwen/qwen-2.5-72b-instruct · $0.12/$0.39
qwen/qwen-2.5-7b-instruct · $0.04/$0.10
qwen/qwen-2.5-vl-7b-instruct · $0.20/$0.20
qwen/qwen-2.5-coder-32b-instruct · $0.20/$0.20
qwen/qwen-plus · $0.40/$1.20
qwen/qwen-max · $1.60/$6.40
qwen/qwen-turbo · $0.05/$0.20
qwen/qwen-plus-2025-07-28:thinking · $0.40/$1.20
qwen/qwen-vl-max · $0.80/$3.20
qwen/qwen2.5-vl-32b-instruct · $0.20/$0.60
qwen/qwen-vl-plus · $0.21/$0.63
qwen/qwen2.5-coder-7b-instruct · $0.03/$0.09
qwen/qwen3-235b-a22b · $0.46/$1.82
qwen/qwen3-14b · $0.06/$0.24
qwen/qwen3-235b-a22b-2507 · $0.07/$0.10
qwen/qwen3-30b-a3b · $0.08/$0.28
qwen/qwen3-vl-30b-a3b-thinking · $0.13/$1.56
qwen/qwen3-32b · $0.08/$0.24
qwen/qwen3-30b-a3b-instruct-2507 · $0.09/$0.30
qwen/qwen3-coder · $0.22/$1.00
qwen/qwen3-8b · $0.05/$0.40
qwen/qwen3-coder-next · $0.12/$0.75
qwen/qwen3-coder-flash · $0.30/$1.50
qwen/qwen3-coder:exacto · $0.22/$1.80
qwen/qwen3-coder-plus · $1.00/$5.00
qwen/qwen3-next-80b-a3b-instruct · $0.09/$1.10
qwen/qwen3-max · $1.20/$6.00
qwen/qwen3-max-thinking · $1.20/$6.00
qwen/qwen3-vl-235b-a22b-instruct · $0.20/$0.88
qwen/qwen3-next-80b-a3b-thinking · $0.15/$1.20
qwen/qwen3-vl-8b-thinking · $0.12/$1.36
qwen/qwen3-vl-8b-instruct · $0.08/$0.50
qwen/qwq-32b · $0.15/$0.40
qwen/qwen3.5-9b · 262K ctx · $0.10/$0.15 · text+image+video->text
qwen/qwen3.6-plus · 1000K ctx · $0.33/$1.95 · text+image+video->text

Mistral

mistralai/mistral-7b-instruct-v0.2 · $0.20/$0.20
mistralai/mistral-small-3.1-24b-instruct · $0.35/$0.56
mistralai/mistral-small-3.2-24b-instruct · $0.06/$0.18
mistralai/mixtral-8x7b-instruct · $0.54/$0.54
mistralai/voxtral-small-24b-2507 · $0.10/$0.30
mistralai/mixtral-8x22b-instruct · 66K ctx · $2.00/$6.00 · text->text
mistralai/pixtral-large-2411 · 131K ctx · $2.00/$6.00 · text+image->text
mistralai/mistral-small-24b-instruct-2501 · $0.05/$0.08
mistralai/mistral-small-creative · $0.10/$0.30
mistralai/mistral-embed-2312 · $0.10 (input only)
mistralai/codestral-embed-2505 · $0.15 (input only)
mistralai/codestral-2508 · $0.30/$0.90
mistralai/devstral-2512 · $0.40/$2.00
mistralai/devstral-medium · $0.40/$2.00
mistralai/ministral-3b-2512 · $0.10/$0.10
mistralai/ministral-8b-2512 · $0.15/$0.15
mistralai/ministral-14b-2512 · $0.20/$0.20
mistralai/mistral-7b-instruct · $0.20/$0.20
mistralai/mistral-7b-instruct-v0.1 · $0.11/$0.19
mistralai/mistral-large · $2.00/$6.00
mistralai/mistral-large-2407 · $2.00/$6.00
mistralai/mistral-7b-instruct-v0.3 · $0.20/$0.20
mistralai/mistral-medium-3 · $0.40/$2.00
mistralai/mistral-large-2411 · $2.00/$6.00
mistralai/mistral-large-2512 · $0.50/$1.50
mistralai/mistral-nemo · $0.02/$0.04
mistralai/mistral-saba · $0.20/$0.60
mistralai/mistral-medium-3.1 · 131K ctx · $0.40/$2.00 · text+image->text
mistralai/mistral-small-2603 · 262K ctx · $0.15/$0.60 · text+image->text
mistralai/devstral-small · 131K ctx · $0.10/$0.30 · text->text

Google

google/gemini-3.1-flash-lite-preview · 1049K ctx · $0.25/$1.50 · text+image+file+audio+video->text
google/gemini-3.1-pro-preview · $2.00/$12.00
google/gemini-3.1-pro-preview-customtools · 1049K ctx · $2.00/$12.00 · text+image+file+audio+video->text
google/gemini-2.5-pro-preview-05-06 · $1.25/$10.00
google/gemini-embedding-001 · $0.15 (input only)
google/gemini-2.0-flash-001 · $0.10/$0.40
google/gemini-2.0-flash-lite-001 · $0.07/$0.30
google/gemini-2.5-flash · $0.30/$2.50
google/gemini-2.5-flash-lite-preview-09-2025 · $0.10/$0.40
google/gemini-2.5-flash-lite · $0.10/$0.40
google/gemini-2.5-pro · $1.25/$10.00
google/gemini-2.5-pro-preview · $1.25/$10.00
google/gemini-3-pro-preview · $2.00/$12.00
google/gemini-3-flash-preview · $0.50/$3.00
google/gemma-2-27b-it · $0.65/$0.65
google/gemma-2-9b-it · $0.03/$0.09
google/gemma-3-12b-it · $0.04/$0.13
google/gemma-3-27b-it · $0.04/$0.15
google/gemma-3-4b-it · $0.04/$0.08
google/gemma-3n-e4b-it · 33K ctx · $0.02/$0.04 · text->text
google/gemma-4-31b-it · 262K ctx · $0.14/$0.40 · text+image+video->text
google/gemma-4-26b-a4b-it · 262K ctx · $0.13/$0.40 · text+image+video->text

Anthropic

anthropic/claude-sonnet-4.6 · 1000K ctx · $3.00/$15.00 · text+image->text
anthropic/claude-opus-4.6 · $5.00/$25.00
anthropic/claude-3.5-sonnet · $6.00/$30.00
anthropic/claude-3.5-haiku · $0.80/$4.00
anthropic/claude-3.7-sonnet · $3.00/$15.00
anthropic/claude-haiku-4.5 · $1.00/$5.00
anthropic/claude-opus-4 · $15.00/$75.00
anthropic/claude-opus-4.5 · $5.00/$25.00
anthropic/claude-sonnet-4.5 · $3.00/$15.00
anthropic/claude-sonnet-4 · $3.00/$15.00
anthropic/claude-3.7-sonnet:thinking · 200K ctx · $3.00/$15.00 · text+image+file->text
anthropic/claude-opus-4.6-fast · 1000K ctx · $30.00/$150.00 · text+image->text
anthropic/claude-opus-4.1 · $15.00/$75.00
anthropic/claude-3-haiku · $0.25/$1.25
anthropic/claude-opus-4.7 · 1000K ctx · $5.00/$25.00 · text+image->text

Streaming

Set stream: true to receive a Server-Sent Events (SSE) stream. Each event contains one chunk of the response.

python

from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Count to 10 slowly."}],
    stream=True,
)

for chunk in stream:
    # The final usage chunk may carry an empty choices array
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

SSE format

bash

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"},"index":0}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":10,"completion_tokens":4,"total_tokens":14}}

data: [DONE]

Usage in streams

When streaming, usage data is returned in the final chunk before the [DONE] message, with an empty choices array.
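
For clients that read the SSE stream without an SDK, the chunks can be reassembled with the standard library alone. The following is a minimal sketch, assuming each event arrives as a single `data:` line as in the format shown above; `collect_stream` is a hypothetical helper name.

```python
import json


def collect_stream(lines):
    """Accumulate text and usage from raw SSE lines of a chat completion stream."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # The usage chunk may have an empty choices array, so iterate instead of indexing.
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage
```

Iterating over `choices` rather than reading `choices[0]` keeps the parser safe whether usage arrives alongside the last delta or in a separate chunk.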

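Embeddings

Generates text embeddings compatible with the OpenAI Embeddings API.

POST /api/v1/embeddings

Note

Not all upstream providers support embeddings. If your configured provider does not support the requested model, BazaarLink automatically fails over to the next available provider.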

python

from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

response = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog",
)

print(response.data[0].embedding)  # 1536-dimensional vector
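
The same call can be made without an SDK. The sketch below only builds the HTTP request, using the standard library; `build_embeddings_request` is a hypothetical helper name, and passing a list of strings as `input` assumes the endpoint accepts batched input the way OpenAI's embeddings API does.

```python
import json
import urllib.request

BASE_URL = "https://bazaarlink.ai/api/v1"


def build_embeddings_request(api_key: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for the embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON body of the response.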

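Parameters

Sampling parameters shape the token generation process. BazaarLink passes supported parameters through to the upstream provider; unsupported parameters are silently ignored.

Sampling parameters

temperature

number

Sampling temperature, 0 to 2. Higher values are more random. Default: 1

top_p

number

Nucleus sampling probability mass. Default: 1

top_k

integer

Limits the number of candidate tokens. 0 disables the limit (all tokens are considered). Default: 0

frequency_penalty

number

Penalizes repeated tokens. Range: [-2, 2]. Default: 0

presence_penalty

number

Penalizes tokens based on whether they have already appeared. Range: [-2, 2]. Default: 0

repetition_penalty

number

Reduces the likelihood of repeating tokens from the input. Range: (0, 2]. Default: 1

min_p

number

Minimum probability for a token to be considered, relative to the most likely token. Range: [0, 1]. Default: 0

top_a

number

Dynamic top-p based on the probability of the most likely token. Range: [0, 1]. Default: 0

seed

integer

Integer random seed for deterministic sampling. Determinism is not guaranteed for some models

max_tokens

integer

Maximum number of tokens to generate

n

integer

Number of completions to generate. Default: 1

logit_bias

object

Maps token IDs to bias values in [-100, 100], which are added to the logits before sampling

logprobs

boolean

Returns the log probability of each output token

top_logprobs

integer

Returns the N most likely candidate tokens at each position (requires logprobs: true). Range: 0 to 20

response_format

object

Forces structured JSON output. See the Structured outputs section

stop

string | string[]

Stop sequences; generation stops when one is encountered

tools

Tool[]

List of tools (functions) the model may call

tool_choice

string | object

Controls tool use: "auto", "none", "required", or a specific tool

parallel_tool_calls

boolean

Enables parallel tool calls. Default: true

BazaarLink-specific parameters

transforms

string[]

Message transforms to apply, e.g. ["middle-out"]. If omitted, applied automatically for models with ≤8k context

models

string[]

Fallback model list. BazaarLink tries the models in order and switches automatically when the primary model fails

route

string

Set to "fallback" to enable fallback routing through the models array

provider

object

Provider routing preferences: order, only, ignore, sort, allow_fallbacks

debug

object

Debug options. echo_upstream_body: true returns the transformed request body as the first SSE chunk (streaming mode only)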

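Tool calling

Tool calling (also known as function calling) lets the model call external functions that you define. The model decides when to call a tool and generates structured arguments; your code executes the function and returns the result to continue the conversation.

Supported models

Most frontier models support tool calling. To check a specific model, look for "tools" in the supported_parameters field returned by /api/v1/models.

Defining tools

Each tool is a JSON object describing a function the model can call. The parameters field uses JSON Schema.

name (required)

string

Function name (a-z, A-Z, 0-9, underscores, hyphens)

description (required)

string

A clear description of when and how the function should be used

parameters (required)

object

JSON Schema object defining the function's parameters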
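
Catching a malformed definition locally is cheaper than discovering it from an upstream error. The builder below is a sketch of such a check: `make_tool` is a hypothetical helper name, the name pattern follows the character set listed above, and the requirement that the top-level schema has type "object" is an assumption carried over from OpenAI's format.

```python
import re

# Allowed characters for a tool name, per the field description above.
TOOL_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Build a tool definition, rejecting names outside the allowed character set."""
    if not TOOL_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid tool name: {name!r}")
    if parameters.get("type") != "object":
        raise ValueError("parameters must be a JSON Schema of type 'object'")
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }
```

The returned dict can be placed directly in the `tools` array of a request.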

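tool_choice options

Value | Behavior
"auto" | The model decides whether to call a tool (default)
"none" | The model will not call any tool
"required" | The model must call at least one tool
{"type": "function", "function": {"name": "get_weather"}} | The model must call the specified function

Full flow

Tool calling is a multi-turn flow: (1) send a request with tools → (2) the model returns tool_calls → (3) execute the function → (4) send the result back → (5) the model generates the final response.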

python

import json
from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

# Step 1: Define tools and send request
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Taipei?"}],
    tools=tools,
    tool_choice="auto",
)

# Step 2: Check for tool calls
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Step 3: Execute your function
    result = {"temperature": 28, "unit": "celsius", "condition": "Partly cloudy"}

    # Step 4: Send result back
    final = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=[
            {"role": "user", "content": "What's the weather in Taipei?"},
            message,
            {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)},
        ],
        tools=tools,
    )

    # Step 5: Get final response
    print(final.choices[0].message.content)
    # "The weather in Taipei is 28°C and partly cloudy."

Parallel tool calls

Some models can call multiple tools in a single response. Handle each tool call and return all the results:

python

# Model may return multiple tool_calls
if message.tool_calls:
    messages = [
        {"role": "user", "content": "Weather and time in Tokyo?"},
        message,
    ]

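    for tool_call in message.tool_calls:
        # Execute each function
        if tool_call.function.name == "get_weather":
            result = {"temperature": 22, "condition": "Clear"}
        elif tool_call.function.name == "get_time":
            result = {"time": "2026-02-23T15:30:00+09:00"}
        else:
            # Report unknown tools instead of reusing a stale result
            result = {"error": f"Unknown tool: {tool_call.function.name}"}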

        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        })

    # Send all results back at once
    final = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=messages,
        tools=tools,
    )
    print(final.choices[0].message.content)

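Structured outputs

Forces the model to return valid JSON that conforms to a schema. This is essential for building reliable applications that parse model output programmatically.

Method 1: response_format (JSON Schema)

Use response_format with type "json_schema" to enforce strict JSON Schema compliance:

type (required)

string

Must be "json_schema"

json_schema.name (required)

string

Name of the schema (used for caching)

json_schema.strict

boolean

When true, guarantees full conformance to the schema

json_schema.schema (required)

object

The JSON Schema definition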

python

response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "Review the movie Inception"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "movie_review",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "rating": {"type": "integer", "description": "Rating 1-10"},
                    "summary": {"type": "string"},
                    "pros": {"type": "array", "items": {"type": "string"}},
                    "cons": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["title", "rating", "summary", "pros", "cons"],
                "additionalProperties": False,
            },
        },
    },
)

import json
review = json.loads(response.choices[0].message.content)
print(review["title"])    # "Inception"
print(review["rating"])   # 9

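Tips

Use clear, descriptive property names; the model uses them as context.
Add descriptions to schema properties to guide the model.
Set strict: true to guarantee schema conformance (may slightly increase latency).
Keep schemas simple; deeply nested schemas can reduce output quality.
Test with different models; some handle complex schemas better than others.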
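
When strict mode is off, or a model handles the schema poorly, it helps to verify the output before using it. The following is a minimal sketch using only the standard library; `parse_structured` is a hypothetical helper name, and it checks required top-level keys only rather than performing full JSON Schema validation.

```python
import json


def parse_structured(content: str, required: list[str]) -> dict:
    """Parse model output as JSON and confirm the required keys are present."""
    data = json.loads(content)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data
```

Because json.JSONDecodeError is a subclass of ValueError, a single `except ValueError` around the call covers both malformed JSON and missing keys.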

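Assistant prefill

Steer the model toward a particular response by appending a partial assistant message to the end of the messages array. The model continues from where you left off.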

python

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is"},
    ],
)

# Model continues: " Paris, known for the Eiffel Tower..."
print(response.choices[0].message.content)

How it works

BazaarLink passes the messages directly to the upstream provider. Assistant prefill works with any supported model, including Anthropic Claude and most OpenAI models.

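Model routing

BazaarLink uses the provider/model-name format to route requests to the correct upstream provider. This gives you access to 300+ models through a single API endpoint.

Model ID format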

bash

{provider}/{model-name}

# Examples:
openai/gpt-4.1
anthropic/claude-sonnet-4.6
google/gemini-2.5-flash
deepseek/deepseek-chat
meta-llama/llama-4-maverick

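Routing priority

When you send a request, BazaarLink resolves the upstream provider in the following order:

1. Exact match: look for a model route that matches the full model ID
2. Provider wildcard: fall back to a provider/* route (e.g. openai/*)
3. Global wildcard: fall back to the * wildcard route
4. Default provider: use the provider key marked as default
5. Environment fallback: use the configured API key as a last resort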
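
The resolution order above can be pictured as a lookup that tries progressively broader patterns. The code below is an illustration only, not BazaarLink's implementation: `resolve_route`, the `routes` mapping, and the two fallback arguments are all hypothetical.

```python
def resolve_route(model_id, routes, default_provider=None, env_fallback=None):
    """Return the upstream for model_id by trying progressively broader matches."""
    provider = model_id.split("/", 1)[0]
    # Exact match, then provider wildcard, then global wildcard.
    for pattern in (model_id, f"{provider}/*", "*"):
        if pattern in routes:
            return routes[pattern]
    # Nothing matched: use the default provider key, then the environment key.
    if default_provider is not None:
        return default_provider
    if env_fallback is not None:
        return env_fallback
    raise LookupError(f"No route for {model_id!r}")
```

For example, with routes for "openai/gpt-4.1" and "openai/*" configured, a request for openai/gpt-4o would resolve through the provider wildcard.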

Browse all available models on the Models page.

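Auto routing

The auto router analyzes your request and picks the best-suited model, so you do not need to specify one manually. Two modes are available:

auto (paid mode): routes to high-performance models and bills at the resolved model's rate.
auto:free (free mode): routes to free models at no charge, subject to RPM and daily request limits.

How to use

Set model to "auto" (paid) or "auto:free" (free) to enable auto routing: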

python

from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

# Paid auto routing — uses premium models, charges credits
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
print(f"Model used: {response.model}")  # e.g. anthropic/claude-4.6-opus

# Free auto routing — uses free models, no credits needed
response = client.chat.completions.create(
    model="auto:free",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(f"Model used: {response.model}")  # e.g. openai/gpt-5-nano
print(f"Cost: {response.usage.cost}")   # 0

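How It Works

Task classification: analyzes your prompt to determine the task type
Model selection: picks the best model for that task from the corresponding pool (paid or free)
Request forwarding: the request is transparently forwarded to the selected model
Response tracking: the resolved model is returned in the response body and in the X-Auto-Resolved-Model header

Classification Logic

For each prompt, the router computes weighted scores for all categories at once (English and Chinese keywords plus structural features) and picks the category with the highest score: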

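Rule type	Category	Signal	Example
Hard rule	Tool calling	Request includes a tools parameter	Any request with function calling
Hard rule	Minimal Q&A	Very short prompt with no signals	"Hi", "OK", "Thanks"
Weighted score	Math / reasoning	Math/logic keywords (1.3x weight)	"Prove that √2 is irrational", "Solve the equation"
Weighted score	Code generation	Code keywords, file extensions, code blocks	"Write a Python sort function"
Weighted score	In-depth analysis	Analysis/comparison keywords (1.1x weight)	"Analyze the impact of AI on employment"
Weighted score	Creative writing	Creative/narrative keywords (1.1x weight)	"Write a short story about time travel"
Weighted score	Step-by-step tutorial	Tutorial/installation/setup keywords	"How do I create a Node.js project?"
Fallback	Complex multi-turn conversation	≥6 messages + long prompt (when no keyword matches)	Multi-turn conversation with context
Fallback	Simple Q&A	Short prompt (when no keyword matches)	"What is the capital of France?"
Fallback	General task	Default fallback	All other tasks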
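To make the scoring idea concrete, here is a rough sketch of a weighted keyword classifier. Only the 1.3x and 1.1x multipliers and the "≥6 messages" rule come from the table; the keyword lists, the category names, and the length thresholds (5, 80, and 200 characters) are assumptions made up for illustration.

```python
# Hypothetical keyword lists; the router's real keywords are not documented here.
KEYWORDS = {
    "math": ["prove", "solve", "equation", "證明"],
    "code": ["python", "function", "```", ".py"],
    "analysis": ["analyze", "compare", "impact"],
    "creative": ["story", "poem", "narrative"],
    "tutorial": ["how to", "install", "set up"],
}
WEIGHTS = {"math": 1.3, "analysis": 1.1, "creative": 1.1}  # multipliers from the table


def classify(messages, tools=None):
    """Return a category name for the last message in the conversation."""
    if tools:
        return "tool_calling"                  # hard rule: tools parameter present
    prompt = messages[-1]["content"].lower()
    scores = {
        cat: WEIGHTS.get(cat, 1.0) * sum(kw in prompt for kw in kws)
        for cat, kws in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] > 0:
        return best                            # weighted scoring picked a winner
    if len(prompt) <= 5:
        return "minimal"                       # hard rule: very short, no signals
    if len(messages) >= 6 and len(prompt) > 200:
        return "complex_conversation"          # fallback
    return "simple_qa" if len(prompt) < 80 else "general"


print(classify([{"role": "user", "content": "Solve this with a function"}]))  # math
```

In the last call both "solve" and "function" match once, and the 1.3x multiplier is what tips the result toward the math category.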

auto vs auto:free

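Both modes use the same classification logic; the difference is the model pool:

auto: selects from high-performance models (GPT-5, Claude Opus, Gemini Pro, etc.). Billed at the resolved model's rate.
auto:free: selects from free models (DeepSeek, Gemini Flash Lite, GPT-5 Nano, etc.). No charge, but subject to RPM and daily request limits.
Fallback: if auto:free hits its rate limit and the user has a balance and fallback enabled, the request is automatically rerouted through paid auto.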

Response Header

When model="auto" or "auto:free" is used, the response includes an X-Auto-Resolved-Model header showing the actual model selected. The response body's model field also reflects the resolved model.
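One way to read the header from Python is the OpenAI SDK's with_raw_response accessor, which exposes the HTTP headers alongside the parsed completion. The ask_auto helper below is a hypothetical wrapper; it assumes client is an openai.OpenAI instance configured with the BazaarLink base URL.

```python
def ask_auto(client, prompt, model="auto"):
    """Send one prompt through auto routing and report which model answered.

    `client` is assumed to be an openai.OpenAI instance pointed at BazaarLink.
    """
    raw = client.chat.completions.with_raw_response.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    completion = raw.parse()  # the usual ChatCompletion object
    # Prefer the header; fall back to the body's model field if it is absent.
    resolved = raw.headers.get("X-Auto-Resolved-Model") or completion.model
    return resolved, completion.choices[0].message.content
```

Calling ask_auto(client, "What is 2+2?", model="auto:free") would return the resolved model ID together with the answer text.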

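Use Cases

General-purpose apps: when you don't know what kinds of requests users will send
Cost optimization: use auto:free for development and testing, auto for production
Quality first: ensures complex requests are routed to capable models
Free products: offer AI features to users without requiring them to top up

Limitations

Requires the messages format (not a raw prompt string)
Both auto and auto:free support streaming requests
All standard BazaarLink features are supported (tool calling, response_format, etc.)
auto:free has RPM and daily request limits that vary by user tier

Auto Routing

Auto routing is live. Set model to "auto" (paid) or "auto:free" (free) to enable it. The model actually used is returned in the X-Auto-Resolved-Model response header and in the model field of the response body.

Failover

When a provider fails or returns an error, BazaarLink can automatically retry your request with an alternative model. This provides high availability without any changes to your code.

Fully Implemented

The `body.models[]` fallback model list is fully supported. The request tries each model in the list in order until one succeeds or all have failed.

How It Works

Your request is sent to the primary model.
If the primary model fails (5xx error, timeout, or rate limit), BazaarLink automatically retries with the next model in the list.
This continues until a model succeeds or every model has been tried.
The response includes a header indicating which model actually served the request.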
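The steps above amount to a simple loop over the model list. The sketch below simulates that loop locally; UpstreamError, complete_with_fallback, and fake_call are hypothetical names, and the real logic runs on BazaarLink's side.

```python
class UpstreamError(Exception):
    """Hypothetical error standing in for a 5xx, timeout, or rate limit."""


def complete_with_fallback(models, call_model):
    """Try each model in order; return (model_used, result) from the first success."""
    errors = {}
    for model in models:
        try:
            return model, call_model(model)
        except UpstreamError as exc:
            errors[model] = str(exc)  # remember why this model failed, then move on
    raise RuntimeError(f"all models failed: {errors}")


def fake_call(model):
    # Simulated upstream: the first model is down, the second answers.
    if model == "anthropic/claude-sonnet-4.6":
        raise UpstreamError("503 from upstream")
    return f"answer from {model}"


used, result = complete_with_fallback(
    ["anthropic/claude-sonnet-4.6", "openai/gpt-4.1"], fake_call
)
print(used)  # openai/gpt-4.1
```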

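Best Practices

Order models by preference: the first model is always tried first.
Mix providers for maximum resilience (e.g. OpenAI → Anthropic → Google).
Use models with similar capabilities for consistent results.
Set reasonable timeouts to avoid waiting too long before a fallback triggers.
Monitor the X-Fallback-Used header to track provider reliability.

Model Variants

Append a suffix to any model ID to change routing behaviour. BazaarLink supports 7 variant types.

Multi-Provider

Model variants are now supported. Append a suffix such as :free, :nitro, or :floor to any model ID to use one. Depending on what the upstream provider supports, BazaarLink either passes the suffix through natively or handles variant routing locally.

Variant Types

There are two kinds of variants: independent model IDs (the suffixed model is a separate endpoint) and routing shortcuts (the suffix changes how BazaarLink selects a provider, not the model itself).

Independent Model IDs

These variants exist as separate models, each with its own pricing and capabilities. BazaarLink tries the full model ID (including the suffix) first and falls back to the base model if nothing matches.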

Suffix	Description	Example
:free	Free-tier version (rate-limited)	deepseek/deepseek-r1:free
:extended	Extended context window	anthropic/claude-sonnet-4.5:extended
:thinking	Extended reasoning / chain-of-thought	deepseek/deepseek-r1:thinking
:exacto	Curated providers for tool-calling accuracy	moonshotai/kimi-k2-0905:exacto

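Routing Shortcuts

These suffixes change provider selection without changing the model's identity. The suffix is stripped before route matching.

Suffix	Equivalent	Behaviour
:nitro	provider.sort="throughput"	Prioritise highest throughput providers
:floor	provider.sort="price"	Sort candidates by price ASC (cheapest first)
:online	plugins: { web: {} }	Enable real-time web search

Multi-Provider Behaviour

For upstream providers that support variants, the suffix is passed through unchanged. For direct providers (such as direct OpenAI or Fireworks), the suffix is stripped and BazaarLink handles the routing locally.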
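A rough sketch of how the two variant kinds could be told apart when a model ID arrives. The parse_variant function is hypothetical; the suffix names and their equivalents are taken from the two tables above.

```python
ROUTING_SHORTCUTS = {
    "nitro": {"provider": {"sort": "throughput"}},
    "floor": {"provider": {"sort": "price"}},
    "online": {"plugins": {"web": {}}},
}
INDEPENDENT_VARIANTS = {"free", "extended", "thinking", "exacto"}


def parse_variant(model_id):
    """Split a model ID into (route lookup candidates, extra request options)."""
    base, sep, suffix = model_id.rpartition(":")
    if not sep:
        return [model_id], {}                   # no suffix at all
    if suffix in ROUTING_SHORTCUTS:
        return [base], ROUTING_SHORTCUTS[suffix]  # suffix stripped before matching
    if suffix in INDEPENDENT_VARIANTS:
        return [model_id, base], {}             # full ID first, then the base model
    return [model_id], {}                       # unknown suffix: leave untouched


print(parse_variant("openai/gpt-4o:nitro"))
print(parse_variant("deepseek/deepseek-r1:free"))
```

The first element of the result lists the model IDs to try against the route table in order; the second holds the options a routing shortcut expands to.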

Examples

json

// Independent variant — use free tier
{
  "model": "deepseek/deepseek-r1:free",
  "messages": [{"role": "user", "content": "Hello"}]
}

// Routing shortcut — cheapest provider first
{
  "model": "meta-llama/llama-4-maverick:floor",
  "messages": [{"role": "user", "content": "Hello"}]
}

// Routing shortcut — highest throughput
{
  "model": "openai/gpt-4o:nitro",
  "messages": [{"role": "user", "content": "Hello"}]
}

// Web search
{
  "model": "openai/gpt-4o:online",
  "messages": [{"role": "user", "content": "What happened today?"}]
}

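Message Transforms

Automatically transform messages to fit a model's context limit. When your messages exceed the model's context window, the transform compresses the conversation by removing messages from its middle.

Auto

middle-out is applied automatically by default to models with a context window of ≤ 8,192 tokens. To disable it, pass `transforms: []`; to enable it for any model, pass `transforms: ["middle-out"]`.

Usage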

json

// Enable middle-out on any model
{
  "model": "openai/gpt-4.1",
  "transforms": ["middle-out"],
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    ... // long conversation — middle will be trimmed to fit context
  ]
}

// Disable auto-trimming for small-context models
{ "transforms": [] }

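Transform Types

Transform	Description
middle-out	Removes messages from the middle first, keeping the beginning (system prompt, context) and the end (most recent messages)

Default Behaviour

middle-out is enabled by default for models with a context of ≤ 8k tokens. Models with larger contexts apply it only when you explicitly pass `transforms: ["middle-out"]`. Anthropic Claude models always enforce a 1,000-message cap, regardless of the transforms setting.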
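The idea behind middle-out can be sketched in a few lines. This is an illustration only: count_tokens is a crude word counter standing in for a real tokenizer, and BazaarLink's actual trimming strategy may differ in which middle messages it removes.

```python
def count_tokens(message):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(message["content"].split())


def middle_out(messages, max_tokens):
    """Drop messages from the middle until the conversation fits in max_tokens."""
    kept = list(messages)
    while len(kept) > 2 and sum(map(count_tokens, kept)) > max_tokens:
        del kept[len(kept) // 2]  # remove the middle-most message
    return kept


conversation = [
    {"role": "system", "content": "a b c"},
    {"role": "user", "content": "d e f"},
    {"role": "assistant", "content": "g h i"},
    {"role": "user", "content": "j k l"},
    {"role": "user", "content": "m n o"},
]
trimmed = middle_out(conversation, max_tokens=9)
print([m["content"] for m in trimmed])  # ['a b c', 'd e f', 'm n o']
```

The first and last messages always survive, which matches the goal of keeping the system prompt and the most recent turn.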

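Zero Completion Insurance BETA

Billing protection for requests that fail completely (the upstream connection could not be established). You are not charged when a stream fails before it ever starts.

Beta

Coverage is partially implemented. A stream that fails midway after starting is still charged a 10% minimum fee, and a model that returns empty content (0 output tokens) is still billed for input tokens.

Covered Scenarios

Upstream refuses the connection or returns an empty body: full refund
Stream fails before it ever starts: full refund

Scenarios Not Covered

Stream interrupted after it has started: charged a minimum fee of 10% of the reserved amount.
Model returns 0 output tokens (empty content): still billed for input tokens.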
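The covered and uncovered scenarios can be summarised as a small decision function. This is a sketch under assumptions: insured_charge and its parameters are hypothetical, and it reads the "10% minimum fee" as a flat 10% of the reserved amount, which the text does not spell out.

```python
def insured_charge(reserved, input_cost, output_cost,
                   stream_started, completed, output_tokens):
    """Return the amount billed for one request under the rules listed above."""
    if not stream_started:
        return 0.0                   # connection refused / empty body: full refund
    if not completed:
        return 0.10 * reserved       # interrupted mid-stream: assumed flat 10% fee
    if output_tokens == 0:
        return input_cost            # empty completion: input tokens still billed
    return input_cost + output_cost  # normal request


print(insured_charge(2.0, 0.5, 1.0, stream_started=False, completed=False, output_tokens=0))  # 0.0
```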

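Guardrails BETA

Add guardrails to your API requests to filter harmful content, enforce compliance policies, and protect your application.

Beta

Guardrails are a planned feature. For now, content filtering is handled by each upstream model provider's built-in safety system.

Planned Features

Guardrail	Description
Content filtering	Blocks harmful, toxic, or inappropriate content in inputs and outputs
PII detection	Detects and redacts personally identifiable information
Topic restriction	Restricts model responses to approved topics
Output validation	Validates model output against custom rules before it is returned

Current Behaviour

Every upstream provider has its own content safety system. Model responses that trigger content filtering return finish_reason: "content_filter". Custom guardrail configuration will be available in a future update.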

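Zero Data Retention

By default, BazaarLink does not store the content of your messages. This page explains how your data is handled and is intended for applications that process sensitive data.

Current Data Handling

Message content: not stored by default; processed in memory and discarded immediately
Billing metadata: token counts, timestamps, and model IDs (retained for 90 days)
Usage logs: request statistics, without message content
Upstream forwarding: messages are forwarded to upstream providers and are subject to their privacy policies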

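Prompt Caching

Prompt caching reuses previously computed prompt tokens, significantly reducing cost and latency. It is especially useful for applications that repeat a large system prompt.

Note

BazaarLink tracks cache savings automatically and reflects them in your bill. The `cached_tokens` field in the response shows the number of tokens served from cache, and the `cacheDiscount` field shows the amount saved on this request.

How It Works

Caching is handled automatically by each model provider on the backend; no extra setup is needed. BazaarLink transparently proxies cache-related parameters and reports the result in the usage response. For models that support caching, tokens read from cache when the same prefix recurs are typically billed at 10–50% of the normal rate.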

python

response = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet",
    messages=[
        {"role": "system", "content": "You are an expert..."},  # Long system prompt cached
        {"role": "user", "content": "Question here"},
    ],
)

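# Check cache usage in the response
usage = response.usage
details = usage.prompt_tokens_details  # may be None if no cache data is reported
cached = details.cached_tokens if details and details.cached_tokens else 0
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Cached tokens: {cached}")
if usage.prompt_tokens:
    # Share of the prompt served from cache (not the same as the money saved)
    print(f"Cache hit ratio: {cached / usage.prompt_tokens * 100:.1f}%")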
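To turn the hit ratio into an estimated cost, the discounted read rate has to be applied to the cached portion only. The sketch below assumes prompt_tokens already includes the cached tokens (as in OpenAI-style usage objects); the prompt_cost helper, the $3 per million price, and the 0.1 default ratio are illustrative, with the ratio chosen from the 10–50% range mentioned above.

```python
def prompt_cost(prompt_tokens, cached_tokens, price_per_million, cache_read_ratio=0.1):
    """Estimate prompt cost when cached tokens are billed at a reduced rate.

    cache_read_ratio is the fraction of the normal rate charged for cached reads;
    the real value depends on the provider.
    """
    uncached = prompt_tokens - cached_tokens
    billable = uncached + cached_tokens * cache_read_ratio
    return billable * price_per_million / 1_000_000


full = prompt_cost(10_000, 0, price_per_million=3.0)      # no cache hit
warm = prompt_cost(10_000, 8_000, price_per_million=3.0)  # 8k tokens read from cache
print(f"{full:.4f} -> {warm:.4f}")  # 0.0300 -> 0.0084
```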

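Reasoning Tokens

Reasoning models (such as DeepSeek R1 and the o1 series) think internally before producing their final answer. The tokens consumed by this thinking are called reasoning tokens and are accounted for separately in billing.

Note

BazaarLink reports reasoning tokens in the response's `usage.completion_tokens_details.reasoning_tokens` field and itemizes them separately in billing.

Reading Reasoning Tokens from the Response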

python

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Solve: if f(x) = x^2 + 3x, what is f(5)?"}],
)

# Read reasoning tokens from usage
usage = response.usage
print(f"Completion tokens: {usage.completion_tokens}")
details = usage.completion_tokens_details  # None if the provider reports no breakdown
if details is not None and details.reasoning_tokens is not None:
    print(f"Reasoning tokens: {details.reasoning_tokens}")
    # Visible answer tokens = total completion tokens minus reasoning tokens
    print(f"Output tokens: {usage.completion_tokens - details.reasoning_tokens}")

typescript

const response = await client.chat.completions.create({
  model: "openai/o3-mini",
  messages: [{ role: "user", content: "Prove that sqrt(2) is irrational." }],
  // @ts-ignore - BazaarLink extension
  reasoning_effort: "high",  // low | medium | high
});

const usage = response.usage;
console.log("Reasoning tokens:", usage?.completion_tokens_details?.reasoning_tokens);

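Thinking Mode Control

Some models let you toggle a "thinking" mode. In thinking mode the model generates internal reasoning tokens before its final answer, improving output quality at the cost of more tokens.

Model family	Parameter	Default
qwen3-*	enable_thinking: boolean	false (platform default)
openai/o1, o3, o4-mini	reasoning_effort: "low" | "medium" | "high"	medium
deepseek/deepseek-r1	n/a	Always on (cannot be disabled)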

python

# Qwen3: explicitly enable thinking mode
response = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[{"role": "user", "content": "Prove the Pythagorean theorem"}],
    extra_body={"enable_thinking": True},  # opt-in to thinking
)

# usage.completion_tokens_details.reasoning_tokens shows thinking token count

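Unified reasoning Object (New Format)

BazaarLink also supports a unified reasoning object that gives all model families a single, consistent API:

Field	Value	Applicable models / notes
reasoning.effort	"xhigh" | "high" | "medium" | "low" | "none"	OpenAI o-series, Grok
reasoning.max_tokens	integer	Anthropic Claude, Gemini
reasoning.exclude	boolean	Hides thinking content from the response (the model still reasons)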

typescript

// Claude extended thinking — specify thinking budget in tokens
const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5",
  messages: [{ role: "user", content: "Prove the Pythagorean theorem" }],
  // @ts-ignore - BazaarLink extension
  reasoning: { max_tokens: 5000 },
});

// OpenAI o3 — specify effort level
const response2 = await client.chat.completions.create({
  model: "openai/o3",
  messages: [{ role: "user", content: "Solve this math problem..." }],
  // @ts-ignore - BazaarLink extension
  reasoning: { effort: "high" },
});

// Hide thinking content from response (model still thinks)
const response3 = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5",
  messages: [{ role: "user", content: "What is 2+2?" }],
  // @ts-ignore - BazaarLink extension
  reasoning: { max_tokens: 2000, exclude: true },
});

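Billing Notes

Thinking tokens are billed as completion tokens. Some providers charge a higher rate when thinking mode is on: Qwen3 is billed at 2x the standard rate with thinking enabled. BazaarLink defaults Qwen3 to enable_thinking=false to avoid unexpected charges.

Latency and Performance

Response latency is critical to the user experience of an AI API. Below are the key factors that affect latency in BazaarLink's architecture, along with optimization tips.

Note

BazaarLink records `latencyMs` (end-to-end latency) and `throughput` (tokens/second) for every request; both are visible in the usage logs.

Factors That Affect Latency

Model size: larger models (70B+) usually generate more slowly
Provider load: load varies across providers and times of day
Token count: the more tokens are generated (bounded by max_tokens), the longer completion takes
Streaming vs non-streaming: streaming (stream: true) delivers the first token sooner
Context length: very long contexts increase preprocessing time

Optimization Tips

Prefer streaming (stream: true) to improve perceived latency
Use the :nitro variant to select high-throughput providers
Choose smaller models (flash/mini/haiku) for latency-sensitive scenarios
Use provider.sort: "latency" to automatically select the lowest-latency provider
Enable prompt caching to reduce latency on repeated requests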

python

import time

# Measure time to first token with streaming
start = time.time()
first_token_time = None

stream = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",  # Fast model
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

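for chunk in stream:
    if first_token_time is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_time = time.time() - start

# Guard against streams that ended without producing any content
if first_token_time is not None:
    print(f"Time to first token: {first_token_time:.3f}s")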

# Check latency in usage logs via /api/v1/usage
# Each log entry includes: latency_ms, throughput (tokens/sec)

python

# Use provider.sort for automatic latency optimization
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "provider": {
            "sort": "latency",  # Always pick lowest-latency provider
        }
    },
)

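Uptime Optimization

BazaarLink maximizes API availability through several layers of protection, including automatic failover, circuit breakers, and provider health monitoring.

Note

BazaarLink tracks the availability of every upstream provider. When a provider's error rate exceeds a threshold, the circuit breaker trips automatically and requests are routed to the next available provider.

Availability Mechanisms

Circuit breaker: automatically detects and isolates failing providers
Automatic failover: switches seamlessly to a backup provider with no code changes
Provider health monitoring: continuously tracks each provider's error rate and latency
Retry logic: transient errors (5xx) are retried automatically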
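For readers unfamiliar with the pattern, an error-rate circuit breaker can be sketched as follows. This is a simplified illustration with made-up defaults (a 50% threshold over the last 10 calls) and no cool-down or half-open state; it is not BazaarLink's implementation.

```python
class CircuitBreaker:
    """Opens when the error rate over the last `window` calls exceeds `threshold`."""

    def __init__(self, threshold=0.5, window=10):
        self.threshold = threshold
        self.window = window
        self.results = []  # True = success, False = failure

    def record(self, ok):
        # Keep only the most recent `window` outcomes
        self.results = (self.results + [ok])[-self.window:]

    @property
    def is_open(self):
        if len(self.results) < self.window:
            return False  # not enough data to judge yet
        failures = self.results.count(False)
        return failures / len(self.results) > self.threshold


def pick_provider(providers, breakers):
    """Return the first provider whose circuit is closed, or None."""
    return next((p for p in providers if not breakers[p].is_open), None)


breakers = {"a": CircuitBreaker(window=4), "b": CircuitBreaker(window=4)}
for ok in [False, False, False, True]:
    breakers["a"].record(ok)
print(pick_provider(["a", "b"], breakers))  # b
```

Provider "a" has failed 3 of its last 4 calls, so its circuit is open and traffic moves to "b" until "a" records enough successes again.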

Circuit Breaker Configuration

python

# BazaarLink handles failover automatically — no code changes needed.
# Configure fallback models for maximum resilience:

response = client.chat.completions.create(
    model="openai/gpt-4o",       # Primary model
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "models": [              # Fallback chain
            "openai/gpt-4o",
            "anthropic/claude-3.5-sonnet",
            "google/gemini-2.0-flash-001",
        ],
        "route": "fallback",     # Enable fallback routing
    },
)

# Check if failover was used (in usage logs)
# "is_failover": true indicates the primary provider was bypassed

bash

# Check provider health (admin only)
GET https://bazaarlink.ai/api/admin/provider-health
Authorization: Bearer sk-bl-ADMIN_KEY

# Response
{
  "providers": [
    {
      "id": "provider-1",
      "name": "Anthropic",
      "status": "healthy",
      "error_rate": 0.002,
      "avg_latency_ms": 145,
      "circuit_open": false
    }
  ]
}

Error Codes

BazaarLink uses standard HTTP status codes. Error responses follow the OpenAI format:

json

{
  "error": {
    "message": "Invalid or disabled API key.",
    "type": "invalid_request_error",
    "code": 401
  }
}

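Code	Name	Description
400	Bad Request	Malformed request, empty messages array, or missing required field
401	Unauthorized	API key is missing, invalid, or disabled
402	Payment Required	Insufficient account credits, per-key spending limit reached, or weekly / monthly budget limit reached
403	Forbidden	Account is disabled or lacks permission for this operation
413	Payload Too Large	Request body exceeds 10 MB; reduce the content or split the request
429	Too Many Requests	Rate limit exceeded; check the Retry-After header before retrying
500	Internal Server Error	Internal BazaarLink error
502	Bad Gateway	All upstream providers failed; failover was attempted
503	Service Unavailable	No upstream provider is configured for this model; contact the administrator

Error Handling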

python

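from openai import OpenAI, APIConnectionError, APIStatusError, RateLimitError

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

try:
    response = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except RateLimitError:
    print("Rate limited; waiting before retry...")
except APIStatusError as e:
    # Any other non-2xx HTTP response (400, 401, 402, 5xx, ...)
    print(f"API error {e.status_code}: {e.message}")
except APIConnectionError as e:
    # Network failure or timeout: there is no HTTP status code to report
    print(f"Connection error: {e}")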
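When retrying by hand, the wait time can be derived from the status code and the Retry-After header. The helper below is a sketch: retry_delay, the set of retryable statuses, and the backoff constants are assumptions, and 503 is left out because the table describes it as a configuration problem rather than a transient one.

```python
import random

RETRYABLE = {429, 500, 502}  # assumption: statuses worth retrying


def retry_delay(status, headers, attempt, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)  # delta-seconds form of the header
        except ValueError:
            pass                       # HTTP-date form: fall through to backoff
    # Exponential backoff with jitter, capped at `cap` seconds
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)


print(retry_delay(429, {"Retry-After": "7"}, attempt=0))  # 7.0
print(retry_delay(401, {}, attempt=0))                    # None
```

Note that the plain dict lookup is case-sensitive; header objects from HTTP libraries such as httpx are case-insensitive, so the same call works there regardless of casing.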

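Streaming Error Format

Errors that occur before any tokens have been streamed are returned as a standard HTTP error response (JSON body).

Errors that occur mid-stream are sent as an SSE event with finish_reason set to "error". Parse the error field that accompanies the delta.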

typescript

// Error chunk sent mid-stream (finish_reason: "error")
type MidStreamError = {
  choices: [
    {
      index: 0;
      finish_reason: "error";
      delta: { content: "" };
      native_finish_reason: null;
      error: {
        code: number;
        message: string;
        metadata?: {
          provider_name?: string;
          raw?: unknown;
        };
      };
    }
  ];
};
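A consumer of the stream can check each chunk for this shape before appending content. The sketch below works on chunks already parsed into dicts that follow the type above; StreamError and collect_stream are hypothetical names.

```python
class StreamError(Exception):
    """Raised when a chunk arrives with finish_reason == "error"."""

    def __init__(self, code, message, provider=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.provider = provider


def collect_stream(chunks):
    """Concatenate delta content from parsed SSE chunks; raise on an error chunk."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("finish_reason") == "error":
                err = choice.get("error") or {}
                meta = err.get("metadata") or {}
                raise StreamError(err.get("code"), err.get("message"),
                                  meta.get("provider_name"))
            parts.append((choice.get("delta") or {}).get("content") or "")
    return "".join(parts)


ok_chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hel"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "lo"}, "finish_reason": "stop"}]},
]
print(collect_stream(ok_chunks))  # Hello
```

In this sketch the text received before the error is discarded when the exception is raised; an application that wants to show partial output would need to keep it.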

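Debugging

Set debug.echo_upstream_body: true to inspect the request body actually sent to the upstream provider. The transformed request is returned as the first SSE chunk. This is for development and debugging only; do not use it in production.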

json

// Request with debug enabled (streaming only)
{
  "model": "openai/gpt-4.1",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": true,
  "debug": { "echo_upstream_body": true }
}