prune.
Private beta

LLM cost infrastructure.

One import change. Every LLM call protected, optimized, and receipted from day one.


AUTO MODE

Every response is a receipt

Saved about 340 tokens on a JSON workload through template optimization, schema compression, and an output cap.

Full receipt (for your logs)

{
  "choices": [{ "message": { "content": "{ ... }" } }],
  "usage": { "prompt_tokens": 684, "completion_tokens": 412 },
  "prune_metadata": {
    "cache_hit": false,
    "tokens_saved": 340,
    "template_opt_saved": 142,
    "compressed_tokens_saved": 10,
    "schema_compression_applied": true,
    "schema_compression_tokens_saved": 18,
    "suggested_max_tokens": 512,
    "optimizations_applied": [
      "template_opt",
      "schema_compress",
      "output_cap:512"
    ]
  }
}
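Because the receipt is plain JSON, it can be logged with a few lines of code. The sketch below parses the example receipt above and prints a one-line summary. The helper name `summarize_receipt` is hypothetical, and the field names are taken only from this example, so treat them as illustrative rather than a documented schema.

```python
import json

# Example receipt copied from this page; the message content is elided in the source.
RECEIPT = """
{
  "choices": [{ "message": { "content": "{ ... }" } }],
  "usage": { "prompt_tokens": 684, "completion_tokens": 412 },
  "prune_metadata": {
    "cache_hit": false,
    "tokens_saved": 340,
    "template_opt_saved": 142,
    "compressed_tokens_saved": 10,
    "schema_compression_applied": true,
    "schema_compression_tokens_saved": 18,
    "suggested_max_tokens": 512,
    "optimizations_applied": ["template_opt", "schema_compress", "output_cap:512"]
  }
}
"""


def summarize_receipt(raw: str) -> str:
    """Return a one-line log summary of a receipt (hypothetical helper)."""
    receipt = json.loads(raw)
    usage = receipt["usage"]
    # Treat the metadata block as optional so plain responses still parse.
    meta = receipt.get("prune_metadata", {})
    used = usage["prompt_tokens"] + usage["completion_tokens"]
    saved = meta.get("tokens_saved", 0)
    applied = ", ".join(meta.get("optimizations_applied", [])) or "none"
    return (
        f"used={used} saved={saved} "
        f"cache_hit={meta.get('cache_hit', False)} applied={applied}"
    )


if __name__ == "__main__":
    print(summarize_receipt(RECEIPT))
```

Reading `prune_metadata` with `.get` keeps the same logging code usable for responses that carry no metadata.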

Works with

OpenAI, Anthropic, Gemini, Bedrock

Under the hood

One proxy. Four layers.

ALWAYS ON

Shield

Encrypted vault, rate limits, and spend caps enforced before any LLM call.
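To make "enforced before any LLM call" concrete, here is a minimal sketch of a pre-request guard that checks a rate limit and a spend cap. It is not the product's implementation: the `Shield` class, its fields, and the sliding 60-second window are all assumptions chosen for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shield:
    """Hypothetical pre-request guard: rate limit plus spend cap."""

    spend_cap_usd: float
    max_requests_per_minute: int
    spent_usd: float = 0.0
    _timestamps: List[float] = field(default_factory=list)

    def check(self, estimated_cost_usd: float, now: Optional[float] = None) -> None:
        """Raise before the upstream call if either limit would be exceeded."""
        now = time.monotonic() if now is None else now
        # Keep only the requests admitted in the last 60 seconds.
        self._timestamps = [t for t in self._timestamps if now - t < 60.0]
        if len(self._timestamps) >= self.max_requests_per_minute:
            raise RuntimeError("rate limit exceeded")
        if self.spent_usd + estimated_cost_usd > self.spend_cap_usd:
            raise RuntimeError("spend cap exceeded")
        self._timestamps.append(now)

    def record(self, actual_cost_usd: float) -> None:
        """Add the real cost once the upstream call has completed."""
        self.spent_usd += actual_cost_usd
```

A caller would run `check` with a cost estimate, make the upstream call only if nothing was raised, then pass the real cost to `record`.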

SAVINGS

Cache

Exact, semantic, and template-aware reuse. Zero upstream cost on cache hits.
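The simplest of the three reuse modes, exact matching, can be sketched as a lookup keyed by a hash of the canonicalized request. The `ExactCache` class and the `call_upstream` callable below are hypothetical; semantic and template-aware reuse would need more machinery (for example embeddings or template parsing) that is not shown here.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


class ExactCache:
    """Hypothetical in-memory exact-match cache for LLM requests."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    @staticmethod
    def key(model: str, messages: List[dict], **params: Any) -> str:
        # Sorted keys and fixed separators give a stable serialization,
        # so logically identical requests hash to the same key.
        payload = json.dumps(
            {"model": model, "messages": messages, "params": params},
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(
        self,
        call_upstream: Callable[..., Any],
        model: str,
        messages: List[dict],
        **params: Any,
    ) -> Tuple[Any, bool]:
        """Return (response, cache_hit); upstream is called only on a miss."""
        k = self.key(model, messages, **params)
        if k in self._store:
            return self._store[k], True
        response = call_upstream(model=model, messages=messages, **params)
        self._store[k] = response
        return response, False
```

On a hit the upstream callable is never invoked, which is where the "zero upstream cost" comes from in this sketch.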

SAVINGS

Compression

Template hoist, schema compression, and variable trimming: smaller prompts, same results.
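The page does not say how schema compression works. One simple, lossless form is to serialize a JSON schema without the whitespace that pretty-printing adds, which shrinks the prompt while leaving the schema's meaning untouched. The sketch below shows only that idea; the function name `compact_schema` and the sample schema are hypothetical, and size is measured in characters because real token counts depend on the tokenizer.

```python
import json


def compact_schema(schema: dict) -> str:
    """Serialize a schema with no insignificant whitespace (hypothetical helper)."""
    return json.dumps(schema, separators=(",", ":"), ensure_ascii=False)


# Illustrative schema, not taken from the product.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name"],
}

if __name__ == "__main__":
    pretty = json.dumps(SCHEMA, indent=2)
    compact = compact_schema(SCHEMA)
    print(len(pretty), "characters pretty-printed")
    print(len(compact), "characters compact")
```

Parsing the compact string yields the same object as the original, so the model sees an equivalent schema in fewer characters.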

INSIGHTS

Signals

Output caps from real p95 lengths, JSON repair, spend leak detection, and pre-request savings estimates.
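An output cap derived from p95 lengths can be sketched as follows: take the 95th percentile of observed completion lengths, add some headroom, and round up to a convenient step. The nearest-rank percentile, the 10% headroom, and the 64-token step are all assumptions for illustration; the page does not specify how the cap is computed.

```python
import math
from typing import Sequence


def p95(values: Sequence[int]) -> int:
    """95th percentile by the nearest-rank method."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def suggest_max_tokens(
    completion_lengths: Sequence[int],
    headroom: float = 1.1,  # assumed safety margin above p95
    step: int = 64,         # assumed rounding granularity
) -> int:
    """Suggest an output cap from observed completion token counts."""
    target = p95(completion_lengths) * headroom
    return int(math.ceil(target / step) * step)
```

Using p95 rather than the maximum keeps a single unusually long completion from inflating the cap, at the cost of truncating roughly the longest 5% of responses unless the headroom absorbs them.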