Infrintia Overview
Infrintia is a decentralized GPU compute marketplace that connects users who need inference compute with hosts who have idle GPUs. Submit a job, and Infrintia's broker automatically routes it to the cheapest available host — results stream back in real time.
How It Works
User submits job → Broker finds cheapest host → Host runs inference → Results stream back
- Users submit inference jobs through the API or Python SDK, specifying a model and input
- The Broker matches the job to the cheapest available host based on price, capacity, and model availability
- Hosts pick up jobs, run inference on their GPUs, and stream results back token-by-token
- Credits are reserved upfront and settled after job completion
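The routing rule in the list above amounts to a filter-then-minimize step: drop hosts that lack the model or spare capacity, then take the cheapest one that remains. This is a minimal illustration, not Infrintia's broker code. The `Host` fields, the per-1K-token price unit, the tie-breaking order, and the model names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Host:
    host_id: str
    price_per_1k_tokens: float  # hypothetical pricing unit
    free_slots: int             # remaining concurrent-job capacity
    models: frozenset[str]      # models this host has available

def select_host(hosts: list[Host], model: str) -> Optional[Host]:
    """Return the cheapest host that serves `model` and has spare capacity, or None."""
    candidates = [h for h in hosts if model in h.models and h.free_slots > 0]
    if not candidates:
        return None
    # Lowest price wins; ties go to the host with more free slots, then to the lower ID.
    return min(candidates, key=lambda h: (h.price_per_1k_tokens, -h.free_slots, h.host_id))

hosts = [
    Host("a", 0.50, 2, frozenset({"llama-3-8b"})),
    Host("b", 0.30, 0, frozenset({"llama-3-8b"})),  # cheapest, but has no free capacity
    Host("c", 0.40, 1, frozenset({"llama-3-8b", "mistral-7b"})),
]
chosen = select_host(hosts, "llama-3-8b")  # host "c"
```

Host `b` is cheapest but full, so the job goes to `c`. Returning `None` leaves it to the caller to decide whether to queue the job or report that no host is available.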
Key Features
| Feature | Description |
| --- | --- |
| Credit Reservation | Credits are held before the job runs and settled when it completes, preventing overspend |
| Multi-Backend Support | Hosts can run HuggingFace Transformers, LangChain pipelines, or custom Worker backends |
| Live Token Streaming | Results stream back token-by-token via Server-Sent Events (SSE) for real-time output |
| 12% Platform Fee | A transparent 12% fee is taken from each job's cost; the remaining 88% goes to the host |
| Auto-Routing | The broker automatically selects the cheapest host that supports the requested model |
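On the client side, live token streaming means reading an SSE stream and pulling out each event's `data` payload. The sketch below parses that subset of the SSE format from an iterable of text lines. It assumes, hypothetically, that each event carries one plain-text token; Infrintia's actual events may wrap tokens in JSON or use other fields.

```python
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a stream of lines.

    Handles `data:` fields (multi-line data is joined with "\n"), `:` comment
    lines, and blank-line event boundaries; `event`, `id`, and `retry` are ignored.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the pending event, if it carried any data.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips a single leading space from the value
        if field == "data":
            buffer.append(value)
    # An event left unterminated at end of stream is discarded, as SSE specifies.

stream = ["data: Hello\n", "\n", ": keep-alive\n", "data: , world\n", "\n"]
text = "".join(iter_sse_data(stream))  # "Hello, world"
```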
Architecture
Components
- API Server — REST API for users and hosts, handles auth, job submission, and streaming
- Broker — Matches submitted jobs to available hosts based on price and capacity
- Host Agent — Runs on GPU machines, registers with the platform, picks up and executes jobs
- Credit System — Manages user balances, reservations, and settlements
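The Host Agent's work cycle can be pictured as poll, run, stream, report. In this sketch the network calls are injected as plain functions so the loop stays self-contained and testable. `fetch_next_job`, `run_inference`, `send_token`, `complete_job`, the `job_id` field, and the bounded `max_polls` are hypothetical stand-ins, not the Host Agent's real interface.

```python
from typing import Callable, Iterator, Optional

def run_host_agent(
    fetch_next_job: Callable[[], Optional[dict]],   # hypothetical wrapper around POST /hosts/next-job
    run_inference: Callable[[dict], Iterator[str]],  # backend call (HuggingFace, LangChain, Worker, ...)
    send_token: Callable[[str, str], None],          # hypothetical: streams one token for a job ID
    complete_job: Callable[[str], None],             # hypothetical: reports completion so credits settle
    max_polls: int,
) -> int:
    """Poll for jobs, run each one, stream its tokens back; return jobs completed."""
    completed = 0
    for _ in range(max_polls):
        job = fetch_next_job()
        if job is None:
            continue  # nothing assigned; a real agent would sleep or back off here
        for token in run_inference(job):
            send_token(job["job_id"], token)
        complete_job(job["job_id"])
        completed += 1
    return completed

# Usage with in-memory fakes instead of HTTP calls:
pending = iter([{"job_id": "j1", "input": "hi"}, None, {"job_id": "j2", "input": "yo"}])
sent: list[tuple[str, str]] = []
done: list[str] = []
count = run_host_agent(
    fetch_next_job=lambda: next(pending),
    run_inference=lambda job: iter(job["input"]),  # "tokens" are just characters here
    send_token=lambda job_id, tok: sent.append((job_id, tok)),
    complete_job=done.append,
    max_polls=3,
)
```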
Job Lifecycle
- User calls `POST /run/model` with a model name and input
- Broker reserves credits from the user's balance
- Broker assigns the job to the cheapest available host
- Host pulls the job via `POST /hosts/next-job`
- Host streams results back via the API
- On completion, credits are settled and the host is paid
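The reserve-then-settle steps and the 12% platform fee can be modelled with a small in-memory ledger. This is a sketch under stated assumptions: that the reservation is an upper bound on the job's cost, that any unused hold is refunded at settlement, and that the fee is taken from the actual cost and rounded to 0.01 credits. `CreditLedger`, `reserve`, and `settle` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

PLATFORM_FEE = Decimal("0.12")  # 12% platform fee; the remainder goes to the host

class InsufficientCredits(Exception):
    pass

class CreditLedger:
    """Minimal sketch of reserve-then-settle accounting for one user."""

    def __init__(self, user_balance: Decimal):
        self.user_balance = user_balance            # spendable credits
        self.reserved: dict[str, Decimal] = {}      # job_id -> credits on hold
        self.host_earnings: dict[str, Decimal] = {}
        self.platform_revenue = Decimal("0")

    def reserve(self, job_id: str, max_cost: Decimal) -> None:
        """Hold credits before the job runs; refuse if the balance can't cover it."""
        if max_cost > self.user_balance:
            raise InsufficientCredits(job_id)
        self.user_balance -= max_cost
        self.reserved[job_id] = max_cost

    def settle(self, job_id: str, host_id: str, actual_cost: Decimal) -> None:
        """Charge the actual cost (capped at the hold), refund the rest, split the fee."""
        held = self.reserved.pop(job_id)
        charged = min(actual_cost, held)
        self.user_balance += held - charged
        fee = (charged * PLATFORM_FEE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        self.platform_revenue += fee
        self.host_earnings[host_id] = self.host_earnings.get(host_id, Decimal("0")) + charged - fee

ledger = CreditLedger(Decimal("10.00"))
ledger.reserve("job-1", Decimal("2.00"))           # balance drops to 8.00 while the job runs
ledger.settle("job-1", "host-c", Decimal("1.50"))  # 0.50 refunded, 0.18 fee, 1.32 to the host
```

With a 10-credit balance, reserving 2 credits and settling at an actual cost of 1.50 leaves the user with 8.50; the 12% fee on 1.50 is 0.18, so the host receives 1.32.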
Supported Backends
| Backend | Description |
| --- | --- |
| HuggingFace | Run any HuggingFace Transformers model directly |
| LangChain | Execute LangChain chains and agents |
| Worker | Custom inference backend: bring your own model server |
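One way a host could plug in different backends is a shared streaming interface plus a registry keyed by backend name. The `Backend` protocol, `EchoWorker`, `BACKENDS`, and `run_job` below are hypothetical. The toy Worker echoes its prompt instead of calling a real model server, and HuggingFace and LangChain adapters are left out so the sketch has no third-party dependencies.

```python
from typing import Iterator, Protocol

class Backend(Protocol):
    """Hypothetical interface: stream output tokens for a model and prompt."""
    def stream(self, model: str, prompt: str) -> Iterator[str]: ...

class EchoWorker:
    """Toy custom Worker backend that echoes the prompt word by word.
    A real Worker would forward the request to your own model server."""
    def stream(self, model: str, prompt: str) -> Iterator[str]:
        for word in prompt.split():
            yield word + " "

# HuggingFace and LangChain adapters would be registered alongside; omitted to avoid dependencies.
BACKENDS: dict[str, Backend] = {"worker": EchoWorker()}

def run_job(backend_name: str, model: str, prompt: str) -> Iterator[str]:
    """Look up the backend by name and return its token stream."""
    try:
        backend = BACKENDS[backend_name]
    except KeyError:
        raise ValueError(f"unsupported backend: {backend_name!r}") from None
    return backend.stream(model, prompt)

output = "".join(run_job("worker", "my-model", "hello GPU world"))  # "hello GPU world "
```

Because every backend exposes the same `stream` method, the code that forwards tokens to the API does not need to know which backend produced them.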
Next Steps