
feat(zai): adopt tier framework for plan-aware rate limiting#83

Open
Societus wants to merge 7 commits into repowise-dev:main from Societus:feat/zai-adopts-tier-framework

Conversation


@Societus Societus commented Apr 14, 2026

Summary

Wire Z.AI provider into the generic tier framework from #82. Adds plan-aware rate limiting based on Z.AI subscription tier (Lite/Pro/Max) with environment variable configuration.

Depends on: #82 (generic tier framework -- merge that first)

Changes

Z.AI Provider (zai.py)

  • Define RATE_LIMIT_TIERS with Lite/Pro/Max configs derived from Z.AI support guidance (April 2026)
  • Use resolve_rate_limiter() from BaseProvider in constructor
  • Add tier parameter to constructor and docstring
  • Bump retry budget: 5 retries / 30s max wait (from 3/4s) for Z.AI load-shedding tolerance

Rate Limiter (rate_limiter.py)

  • Update PROVIDER_DEFAULTS["zai"] to conservative Lite-tier default (10 RPM / 50K TPM)

CLI Helpers (helpers.py)

  • Add ZAI_TIER env var reading in both explicit and auto-detect provider resolution paths

LiteLLM Provider (litellm.py) -- behavioral change

  • When api_base is set without an api_key, auto-adds openai/ prefix to the model name for LiteLLM proxy routing
  • Injects sk-dummy as a placeholder API key when api_base is set without api_key (LiteLLM/OpenAI SDK requires a non-empty key even for unauthenticated local proxies)
  • Downstream impact: Users of LiteLLM-proxy with unauthenticated local endpoints will no longer need to manually add openai/ prefix or pass a dummy key. Existing configs with explicit keys are unaffected.

Tests (test_zai_provider.py, test_litellm_provider.py)

  • 13 new Z.AI tier tests: tier creation, per-tier limits (lite/pro/max), case-insensitive matching, tier precedence over explicit limiter, invalid tier error, no-tier edge case, explicit limiter without tier, tier stored, and cross-provider empty tiers check
  • 3 new LiteLLM tests: api_base adds openai/ prefix, prefix deduplication, and dummy key injection

Rate Limit Context

Z.AI support provided the following guidance (April 2026):

  • Lite: 2-3 concurrent, lower tolerance -> 10 RPM / 50K TPM
  • Pro: 5-8 concurrent, moderate tolerance -> 30 RPM / 150K TPM
  • Max: 10-15 concurrent, highest tolerance -> 60 RPM / 300K TPM

Key facts:

  • Limits are aggregate across all models (not per-model)
  • Advanced models (GLM-5 family) consume 2-3x quota per prompt due to reasoning tokens
  • Limits are dynamic and load-dependent; these are conservative estimates
  • Ref: https://docs.z.ai/devpack/usage-policy
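Expressed as the kind of mapping the tier framework consumes (field names are assumptions; the RPM/TPM values are the ones from the guidance above):

```python
# Per-tier limits derived from Z.AI support guidance (April 2026)
RATE_LIMIT_TIERS = {
    "lite": {"rpm": 10, "tpm": 50_000},   # 2-3 concurrent, lower tolerance
    "pro":  {"rpm": 30, "tpm": 150_000},  # 5-8 concurrent, moderate tolerance
    "max":  {"rpm": 60, "tpm": 300_000},  # 10-15 concurrent, highest tolerance
}
```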

Configuration

# Required
export ZAI_API_KEY=***
# Optional -- defaults to conservative (Lite-equivalent)
export ZAI_TIER="pro"           # lite | pro | max
export ZAI_PLAN="coding"        # coding | general
export ZAI_THINKING="disabled"  # enabled | disabled
export ZAI_BASE_URL="..."       # override plan-based URL

Test Plan

uv run pytest tests/unit/test_providers/test_zai_provider.py -v
# 34 passed (21 existing + 13 new tier tests)

uv run pytest tests/unit/test_providers/test_litellm_provider.py -v
# 10 passed (7 existing + 3 new api_base tests)

PR Stack

| # | PR | Description | Status |
|---|----|-------------|--------|
| 1 | #82 -- Generic tier framework | BaseProvider + resolve_rate_limiter() | Ready for review |
| 2 | #83 -- Z.AI adopts the framework (this PR) | RATE_LIMIT_TIERS + ZAI_TIER env var | Depends on #82 |
| 3 | #84 -- MiniMax provider | New provider using the framework | Depends on #82 |

Related

vinit13792 and others added 5 commits April 13, 2026 12:29
- Add litellm to interactive provider selection menu
- Support LITELLM_BASE_URL for local proxy deployments (no API key required)
- Auto-add openai/ prefix when using api_base for proper LiteLLM routing
- Add dummy API key for local proxies (OpenAI SDK requirement)
- Add validation and tests for litellm provider configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… false positives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add first-class support for Z.AI with OpenAI-compatible API.

- New ZAIProvider with thinking disabled by default for GLM-5 family
- Plan selection: 'coding' (subscription) or 'general' (pay-as-you-go)
- Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
- Rate limit defaults and auto-detection in CLI helpers

Closes repowise-dev#68
Add RATE_LIMIT_TIERS class attribute and resolve_rate_limiter() static
method to BaseProvider. Any provider with subscription tiers can define
RATE_LIMIT_TIERS and pass tier + tiers to resolve_rate_limiter() to get
automatic tier-aware rate limiter creation.

Precedence: tier > explicit rate_limiter > None.
Tier matching is case-insensitive. Invalid tiers raise ValueError.

This is a provider-agnostic foundation -- no provider-specific code.
Providers adopt it by defining RATE_LIMIT_TIERS and calling
resolve_rate_limiter() in their constructor.

Ref: repowise-dev#68
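A self-contained sketch of the precedence rule described above (the real resolve_rate_limiter presumably builds a RateLimiter object from the tier config; names follow the commit message, details are assumptions):

```python
def resolve_rate_limiter(tier, tiers, rate_limiter=None):
    """Precedence: tier > explicit rate_limiter > None."""
    if tier is not None:
        key = tier.lower()  # tier matching is case-insensitive
        if key not in tiers:
            raise ValueError(f"Unknown tier {tier!r}; expected one of {sorted(tiers)}")
        # The real framework would construct a RateLimiter from this config;
        # returning the config dict keeps the sketch dependency-free.
        return tiers[key]
    return rate_limiter  # may be None: no rate limiting
```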
Wire Z.AI provider into the BaseProvider tier framework (from PR #NN).

Changes:
- Define RATE_LIMIT_TIERS on ZAIProvider with Lite/Pro/Max configs
  derived from Z.AI support guidance (April 2026)
- Use resolve_rate_limiter() in constructor (tier > explicit > none)
- Add ZAI_TIER env var support in CLI helpers
- Add ZAI_TIER_DEFAULTS to rate_limiter.py for reference
- Update PROVIDER_DEFAULTS['zai'] to conservative Lite-tier default
- Bump retry budget: 5 retries / 30s max wait (from 3/4s) for Z.AI
  load-shedding tolerance
- Add tier parameter to constructor and docstring

Rate limit context:
- Z.AI concurrency limits are aggregate, dynamic, and load-dependent
- Advanced models (GLM-5 family) consume 2-3x quota per prompt
- Conservative defaults: Lite 10 RPM, Pro 30 RPM, Max 60 RPM
- Ref: https://docs.z.ai/devpack/usage-policy

Depends on: feat/generic-tier-framework
Supersedes: repowise-dev#80 (deprecates monolithic PR in favor of layered approach)
Ref: repowise-dev#68
Societus added a commit to Societus/repowise that referenced this pull request Apr 14, 2026
Add MiniMax as a built-in provider using the generic tier framework (repowise-dev#82).

MiniMax is an OpenAI-compatible API provider with the M2.x model family
(M2.7, M2.5, M2.1, M2) and published token plan rate tiers.

Changes:
- New MiniMaxProvider with RATE_LIMIT_TIERS (starter/plus/max/ultra)
  derived from published 5-hour rolling window limits
- Uses resolve_rate_limiter() from BaseProvider for tier resolution
- reasoning_split=True by default to separate thinking from content
- Bumped retry budget: 5 retries / 30s max for load-shedding tolerance
- Registered in provider registry with openai package dependency hint
- Conservative PROVIDER_DEFAULTS (Starter-tier: 5 RPM / 25K TPM)
- CLI env vars: MINIMAX_API_KEY, MINIMAX_BASE_URL,
  MINIMAX_REASONING_SPLIT, MINIMAX_TIER
- 30 unit tests (constructor, tiers, generate, stream_chat, registry)

Rate limit tiers (from https://platform.minimax.io/docs/token-plan/intro):
  Starter:  1,500 req/5hrs  ->  5 RPM /  25K TPM
  Plus:     4,500 req/5hrs  -> 15 RPM /  75K TPM
  Max:     15,000 req/5hrs  -> 50 RPM / 250K TPM
  Ultra:   30,000 req/5hrs  -> 100 RPM / 500K TPM

Highspeed variants (e.g., MiniMax-M2.7-highspeed) share the same rate
limits as their base plan -- the difference is faster inference, not quota.
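The RPM figures above follow from spreading each 5-hour budget evenly over 300 minutes (a sanity-check sketch, not project code):

```python
def window_to_rpm(requests_per_5h: int) -> int:
    """Divide a 5-hour rolling request budget evenly across 5 * 60 = 300 minutes."""
    return requests_per_5h // (5 * 60)
```

For example, `window_to_rpm(1_500)` gives 5 (Starter) and `window_to_rpm(30_000)` gives 100 (Ultra).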

This provider is structurally identical to Z.AI (repowise-dev#83) and was trivial
to implement because both use the generic tier framework. The framework
eliminated all per-provider boilerplate for tier resolution.

Depends on: repowise-dev#82 (generic tier framework)
Ref: repowise-dev#68
@swati510 (Collaborator) left a comment


Nice work on the generic tier framework, it's the right shape. Two things worth fixing before merge:

  1. ZAI_TIER_DEFAULTS in packages/core/src/repowise/core/rate_limiter.py duplicates the same values as ZAIProvider.RATE_LIMIT_TIERS. Nothing imports ZAI_TIER_DEFAULTS since resolve_rate_limiter reads from the class attribute. It's dead code waiting to drift. Drop it or have the provider read from it, one source of truth.

  2. The base URL normalization in zai.py concerns me. _PLAN_BASE_URLS values end with /v4, then init force-appends /v1, so the client hits https://api.z.ai/api/coding/paas/v4/v1/chat/completions. Has this been tested against the live API? Z.AI's OpenAI-compatible endpoint is /paas/v4 as-is, the SDK adds /chat/completions itself.

# Normalize base URL for OpenAI SDK
effective_base_url = effective_base_url.rstrip("/")
if not effective_base_url.endswith("/v1"):
    effective_base_url += "/v1"
Collaborator

Confirm this works against the live Z.AI API. Their OpenAI-compatible endpoint is /v4 as-is, the SDK tacks on /chat/completions. Adding /v1 here produces /v4/v1/chat/completions which I'd expect to 404.

ZAI_TIER_DEFAULTS in rate_limiter.py duplicated the same values as
ZAIProvider.RATE_LIMIT_TIERS and nothing imported it. Single source of
truth lives on the provider class.

The /v1 suffix normalization produced /v4/v1/chat/completions which
404s against Z.AI's live API. Their endpoint is /paas/v4 as-is; the
OpenAI SDK appends /chat/completions itself.

Tested against live Z.AI API:
  /v4/chat/completions    → 200
  /v4/v1/chat/completions → 404

Addresses review feedback from @swati510 on repowise-dev#83.
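The resulting normalization is just trailing-slash cleanup, with no version suffix appended (a sketch of the fixed behavior; the function name is illustrative and the URL is the coding-plan endpoint from the commit above):

```python
def normalize_zai_base_url(base_url: str) -> str:
    """Z.AI's OpenAI-compatible endpoint is /paas/v4 as-is; the OpenAI SDK
    appends /chat/completions itself, so no /v1 suffix is added."""
    return base_url.rstrip("/")
```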
@RaghavChamadiya (Collaborator) commented

Consolidating the open feedback so there's a single checklist to work from before this can land. The framework shape is good and worth keeping.

Blockers:

  1. Base URL bug. The values in _PLAN_BASE_URLS already end with /v4 (Z.AI's OpenAI-compatible endpoint), but __init__ then unconditionally appends /v1, so the client ends up calling https://api.z.ai/api/coding/paas/v4/v1/chat/completions which is the wrong path. Either skip the /v1 append when the configured base already ends in /v4, or drop the append entirely and document that callers pass full paths. Worth a smoke test against the live API to confirm the corrected path actually works for both the lite and coding plans.

  2. ZAI_TIER_DEFAULTS in packages/core/src/repowise/core/rate_limiter.py duplicates the same values as ZAIProvider.RATE_LIMIT_TIERS and nothing imports it (resolve_rate_limiter reads off the class attribute). Either drop ZAI_TIER_DEFAULTS or have the provider read from it so there's a single source of truth.

  3. Trailing-newline at EOF on zai.py, test_zai_provider.py, and test_litellm_provider.py. Will fail ruff format --check.

  4. The PR also bundles a non-trivial LiteLLM behavioral change (auto openai/ prefix and sk-dummy injection when api_base is set without a key). It's a useful fix but please call it out in the PR description so reviewers and downstream LiteLLM-proxy users know to look for it.

Once 1 and 2 are sorted I'll re-review and we can merge. #84 is queued behind this one.

- Update base URL tests: no longer expect /v1 suffix (code was already
  correct, tests were stale from before the fix)
- Add missing trailing newlines on zai.py, test_zai_provider.py,
  test_litellm_provider.py (fixes ruff format --check)
- Update PR description to document LiteLLM behavioral change
  (openai/ prefix + sk-dummy injection for unauthenticated proxies)
- Smoke-tested both coding/general plan endpoints against live Z.AI API
  (confirmed 401, not 404)
@Societus (Author) commented

All four review items addressed:

  1. Base URL bug -- Code was already correct (no /v1 append), but tests were stale. Updated all 6 base URL assertions to match. Smoke-tested both coding and general plan endpoints against the live Z.AI API -- both return 401 (auth), not 404, confirming the paths are correct.

  2. ZAI_TIER_DEFAULTS dead code -- Already absent from rate_limiter.py. Single source of truth is ZAIProvider.RATE_LIMIT_TIERS.

  3. Trailing newlines -- Added missing EOF newlines on zai.py, test_zai_provider.py, and test_litellm_provider.py.

  4. LiteLLM behavioral change -- Added a dedicated section to the PR description documenting the openai/ prefix auto-addition and sk-dummy injection for unauthenticated local proxies, with downstream impact notes.

All 54 tests pass (34 Z.AI + 12 LiteLLM + 8 generic tier framework).

