Getting Started with Infrintia

This guide walks you through installing the Python SDK, creating a user account, submitting your first inference job, and streaming results.


Prerequisites

  • Python 3.9+
  • An internet connection

Installation

Install the Infrintia Python SDK:

pip install infrintia

Create a User Account

from infrintia import ComputeClient

client = ComputeClient(base_url="https://api.infrintia.crossgl.net")

user = client.create_user(
    username="alice",
    email="alice@example.com",
    password="secure-password-123"
)

print(f"User created: {user['username']}")
print(f"Starting credits: {user['credits']}")

Submit an Inference Job

Once you have an account, authenticate and submit a job:

from infrintia import ComputeClient

client = ComputeClient(
    base_url="https://api.infrintia.crossgl.net",
    username="alice",
    password="secure-password-123"
)

job = client.run_model(
    model="meta-llama/Llama-3-8B-Instruct",
    input_text="Explain quantum computing in one paragraph.",
    max_tokens=256,
    temperature=0.7
)

print(f"Job ID: {job['job_id']}")
print(f"Status: {job['status']}")
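The snippet above hardcodes the password for brevity. A minimal sketch of loading credentials from the environment instead is shown below. The variable names `INFRINTIA_USERNAME` and `INFRINTIA_PASSWORD` and the helper `load_credentials` are hypothetical, chosen for this example; they are not defined by the SDK.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from environment variables.

    INFRINTIA_USERNAME and INFRINTIA_PASSWORD are hypothetical names
    used for this sketch; the SDK itself does not read them.
    """
    try:
        return env["INFRINTIA_USERNAME"], env["INFRINTIA_PASSWORD"]
    except KeyError as exc:
        # exc.args[0] is the name of the missing variable
        raise RuntimeError(f"Missing environment variable: {exc.args[0]}") from exc
```

The returned pair can then be passed to `ComputeClient` as the `username` and `password` arguments in place of the literal strings.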

Stream Results

For real-time token streaming:

for token in client.stream_job(job["job_id"]):
    print(token, end="", flush=True)

print()  # newline after streaming completes

The stream yields tokens as they are generated by the host GPU. If the job is still queued, the stream will wait until generation begins.
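If you need the full response text as well as the live output, the loop above can be wrapped in a small helper. This is a sketch that assumes each yielded token is a plain string; `collect_stream` is a hypothetical helper, not part of the SDK.

```python
from typing import Iterable


def collect_stream(tokens: Iterable[str], echo: bool = True) -> str:
    """Consume a token stream, optionally echoing it, and return the joined text.

    Assumes every token is a str, as the print-based loop in this guide suggests.
    """
    parts = []
    for token in tokens:
        if echo:
            print(token, end="", flush=True)
        parts.append(token)
    if echo:
        print()  # newline after streaming completes
    return "".join(parts)
```

For example, `text = collect_stream(client.stream_job(job["job_id"]))` would print tokens as they arrive and leave the complete output in `text`.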


Check Job Status

Fetch the job record to see its current status, the credits it has used, and the host that ran it:

status = client.get_job(job["job_id"])

print(f"Status: {status['status']}")
print(f"Credits used: {status['credits_used']}")
print(f"Host: {status['host_id']}")
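When you are not streaming, you may want to poll until the job finishes. The sketch below takes the lookup function as an argument, so it works with `client.get_job` or a stand-in. The terminal status names (`completed`, `failed`, `cancelled`) are assumptions; this guide does not list the actual values, so adjust them to what the API returns.

```python
import time

# Assumed terminal status names; the guide does not list the actual values.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(get_job, job_id, poll_interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_job(job_id) until the job reaches a terminal status.

    get_job is any callable returning a dict with a 'status' key,
    such as client.get_job. Raises TimeoutError once the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        record = get_job(job_id)
        if record["status"] in TERMINAL_STATUSES:
            return record
        if clock() >= deadline:
            raise TimeoutError(
                f"Job {job_id} still {record['status']!r} after {timeout}s"
            )
        sleep(poll_interval)
```

With the client from the earlier sections, this might be called as `final = wait_for_job(client.get_job, job["job_id"])`.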

List Available Models

List the available models along with each model's minimum price per token:

models = client.list_models()

for model in models:
    print(f"{model['name']}: {model['min_price_per_token']} credits/token")
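The same listing can be used to choose a model programmatically. The sketch below picks the entry with the lowest `min_price_per_token`, assuming that field is numeric or convertible with `float()`; `cheapest_model` is a hypothetical helper, not part of the SDK.

```python
def cheapest_model(models):
    """Return the model dict with the lowest min_price_per_token.

    Assumes each entry has 'name' and a numeric-like 'min_price_per_token',
    matching the fields printed in the listing loop.
    """
    if not models:
        raise ValueError("No models available")
    return min(models, key=lambda m: float(m["min_price_per_token"]))
```

For example, `cheapest_model(client.list_models())["name"]` could be passed as the `model` argument to `run_model`.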

Full Example

from infrintia import ComputeClient

client = ComputeClient(
    base_url="https://api.infrintia.crossgl.net",
    username="alice",
    password="secure-password-123"
)

# Submit a job
job = client.run_model(
    model="meta-llama/Llama-3-8B-Instruct",
    input_text="Write a haiku about GPUs.",
    max_tokens=64
)

# Stream the result
print(f"Job {job['job_id']}:")
for token in client.stream_job(job["job_id"]):
    print(token, end="", flush=True)
print()

# Check final status
final = client.get_job(job["job_id"])
print(f"\nCredits used: {final['credits_used']}")

Next Steps