Hosting Guide
Earn credits by registering your GPU as a host on the Infrintia compute marketplace. When users submit inference jobs, the broker routes them to available hosts — you run the computation and get paid.
Requirements
- A machine with one or more NVIDIA GPUs (CUDA-capable)
- Python 3.9+
- Stable internet connection
- Sufficient VRAM for the models you want to serve
Register as a Host
1. Install the Infrintia host agent
pip install infrintia[host]
2. Register your machine
from infrintia.host import HostAgent
agent = HostAgent(
api_url="https://api.infrintia.crossgl.net",
gpu="NVIDIA A100 80GB",
models=["meta-llama/Llama-3-8B-Instruct"],
price_per_token=0.00012,
backend="huggingface"
)
registration = agent.register()
print(f"Host ID: {registration['host_id']}")
print(f"API Key: {registration['api_key']}")
The registration response includes a host_id and api_key — store the API key securely. You'll need it for all subsequent host operations.
3. Start serving
agent.serve()
This starts an event loop that:
- Sends heartbeats every 30 seconds
- Polls for new jobs via
POST /hosts/next-job - Runs inference on assigned jobs
- Streams results back to the platform
Auto-Detection
The host agent automatically detects:
- GPU model and VRAM via
nvidia-smi - Available models based on installed backends and cached weights
- System resources (CPU, RAM, disk) for capacity reporting
You can override any auto-detected value in the HostAgent constructor.
Backends
Choose the backend that matches your setup:
HuggingFace (default)
Runs models using HuggingFace Transformers. Models are downloaded and cached automatically.
agent = HostAgent(
api_url="https://api.infrintia.crossgl.net",
backend="huggingface",
models=["meta-llama/Llama-3-8B-Instruct"],
price_per_token=0.00012
)
LangChain
Execute LangChain chains and agents.
agent = HostAgent(
api_url="https://api.infrintia.crossgl.net",
backend="langchain",
models=["custom-chain-v1"],
price_per_token=0.0002
)
Worker (Custom)
Point to your own inference server. Useful for custom models, quantized weights, or vLLM/TGI backends.
agent = HostAgent(
api_url="https://api.infrintia.crossgl.net",
backend="worker",
endpoint_url="http://localhost:8000/generate",
models=["my-custom-model"],
price_per_token=0.00015
)
Earnings
- Hosts are paid per token generated, at the
price_per_tokenthey set during registration - The platform takes a 12% fee; the remaining 88% is credited to the host
- Earnings accumulate in the host's credit balance and can be withdrawn
Example
If you set price_per_token=0.00012 and generate 1,000,000 tokens:
- Gross: 120.00 credits
- Platform fee (12%): 14.40 credits
- Net earnings: 105.60 credits
Monitoring
Check your host's status and earnings:
from infrintia import ComputeClient
client = ComputeClient(
base_url="https://api.infrintia.crossgl.net",
api_key="hk_live_xxxxxxxx"
)
hosts = client.list_hosts()
for h in hosts:
if h["host_id"] == "host_xyz":
print(f"Status: {h['status']}")
print(f"Jobs completed: {h['total_jobs_completed']}")
Best Practices
- Set competitive pricing — check
GET /hoststo see current market rates - Keep your host online — missed heartbeats mark your host as inactive
- Use fast storage — model loading speed affects job pickup latency
- Monitor GPU utilization — the heartbeat reports utilization to the broker for smart routing