Access Requirements
A primary access credential is required to run analyses through a hosted inference service.
Typical workflow:
- Create an account with your chosen service
- Generate an access credential
- Store it securely and pass it to your integration
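One way to keep the credential out of source code is to read it from an environment variable at runtime; a minimal sketch (the variable name `ANALYSIS_API_KEY` is illustrative, not part of any integration):

```python
import os

# Illustrative variable name; pick whatever your deployment expects.
api_key = os.environ.get("ANALYSIS_API_KEY", "")
if not api_key:
    print("Set ANALYSIS_API_KEY before running an analysis")
```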
Illustrative usage:
```python
result = await run_analysis(
    query="Your analysis query",
    api_key="your-access-key",
)
```
Some deployments support optional live research or fresh-data retrieval. When available, an additional credential can unlock those capabilities.
```python
result = await run_analysis(
    query="Analyze recent developments in AI regulation",
    api_key="your-access-key",
    research_key="your-research-key",
)
```
Some environments support optional image generation for report artwork. When available, a separate credential can enable that feature.
```python
result = await run_analysis(
    query="Analyze the future of sustainable architecture",
    api_key="your-access-key",
    image_key="your-image-key",
)
```
Access Summary
| Credential | Required? | Purpose | Illustrative Name |
|---|---|---|---|
| Primary access credential | Required | Model access | api_key |
| Research credential | Optional | Live research features | research_key |
| Image credential | Optional | Image generation | image_key |
Available Models
Identifier: openai/gpt-oss-20b
| Description | Compact open-weight Mixture of Experts (MoE) model optimized for cost-efficient deployment |
|---|---|
| Size | 21 billion total parameters, 3.6 billion active per token (32 experts, Top-4 routing) |
| Architecture | MoE with 24 layers, Grouped Query Attention, RMSNorm |
| Context Window | 128K tokens |
| Speed | High-throughput hosted inference |
| License | Apache 2.0 (fully open for commercial use) |
Hardware Requirements: Runs on high-end consumer GPUs with at least 16-20 GB of VRAM (e.g., NVIDIA RTX 4090/5090); MXFP4 quantization enables fast, efficient local inference. 24+ GB of VRAM is recommended for optimal performance.
Best For: Cost-efficient agentic workflows, tool calling, web browsing, code execution.
Identifier: openai/gpt-oss-120b
| Description | Larger open-weight MoE model for complex tasks |
|---|---|
| Size | 120 billion total parameters |
| Context Window | 128K tokens |
| License | Apache 2.0 (fully open for commercial use) |
Hardware Requirements: Requires a single 80GB H100 GPU (typically accessed via data center or cloud).
Best For: Complex reasoning, advanced code generation, research tasks.
Why These Models?
This program relies extensively on structured data output: the ability of LLMs to return responses in precise, validated formats (Pydantic models). OpenAI trained the GPT-OSS models specifically to handle structured output, making them well suited to this application, where every response must conform to a specific schema.
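As an illustration of what schema-conforming output means in practice, a hedged sketch: a Pydantic model defines the expected shape, and the model's JSON response is validated against it before use. The `ReportSection` schema and the sample JSON are invented for this example, not taken from the source:

```python
from pydantic import BaseModel, ValidationError


class ReportSection(BaseModel):
    # Hypothetical schema an analysis step might require.
    title: str
    confidence: float
    findings: list[str]


raw = '{"title": "Market overview", "confidence": 0.82, "findings": ["demand up"]}'

try:
    section = ReportSection.model_validate_json(raw)
except ValidationError:
    # A malformed response is rejected instead of silently propagating.
    section = None
```

Validation failures surface immediately at the parse step, which is the property the pipeline depends on.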
Error Handling
Cost Safeguards
The system tracks costs continuously and enforces limits:
```python
# Internal cost checking (from source)
current_cost = await cost_tracker.get_total_cost()
if current_cost > max_cost:
    raise CostFailsafeError(
        message=f"Cost limit exceeded: ${current_cost:.4f} > ${max_cost}",
        current_cost=current_cost,
        cost_limit=max_cost,
    )
```
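A caller can treat this failsafe as a recoverable condition. A minimal sketch; the exception class is reimplemented here for illustration, since the source's actual definition is not shown, and the check is rendered synchronously:

```python
class CostFailsafeError(Exception):
    """Raised when accumulated spend exceeds the configured limit."""

    def __init__(self, message: str, current_cost: float, cost_limit: float):
        super().__init__(message)
        self.current_cost = current_cost
        self.cost_limit = cost_limit


def check_budget(current_cost: float, max_cost: float) -> None:
    # Mirrors the internal check shown above, in synchronous form.
    if current_cost > max_cost:
        raise CostFailsafeError(
            message=f"Cost limit exceeded: ${current_cost:.4f} > ${max_cost}",
            current_cost=current_cost,
            cost_limit=max_cost,
        )


try:
    check_budget(current_cost=1.2345, max_cost=1.0)
except CostFailsafeError as err:
    overage = err.current_cost - err.cost_limit
    message = str(err)
```

Carrying the numbers on the exception lets the caller log the overage or retry with a higher limit.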
Task Evaluation Recovery
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#fff', 'primaryBorderColor': '#4a4a6a', 'lineColor': '#6c63ff', 'fontFamily': 'JetBrains Mono, monospace'}}}%%
flowchart TD
A["Task Completes"] --> B{"Status
Check"}
B -->|"Error"| C["Automatic Rejection
no LLM call"]
B -->|"Success"| D["LLM Evaluation"]
D --> E{"Quality
Assessment"}
E -->|"Accepted"| F["Write to
Analysis History"]
E -->|"Rejected"| G["Log Reason
Don't Write"]
C --> H(["Continue Pipeline"])
F --> H
G --> H
style A fill:#16213e,stroke:#4a4a6a,color:#fff
style B fill:#0f3460,stroke:#e94560,stroke-width:2px,color:#fff
style C fill:#6b2737,stroke:#e94560,color:#fff
style D fill:#16213e,stroke:#4a4a6a,color:#fff
style E fill:#0f3460,stroke:#e94560,stroke-width:2px,color:#fff
style F fill:#0d7377,stroke:#14ffec,stroke-width:2px,color:#fff
style G fill:#6b2737,stroke:#e94560,color:#fff
style H fill:#4a4a6a,stroke:#6c63ff,color:#fff
```
Only accepted tasks inform subsequent analysis, preventing error propagation.
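The recovery flow can be sketched in code; everything here (the task shape, the stand-in quality check, the history container) is a simplified stand-in for the source's actual types:

```python
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    status: str  # "success" or "error"
    output: str = ""


@dataclass
class AnalysisHistory:
    accepted: list[str] = field(default_factory=list)
    rejected_reasons: list[str] = field(default_factory=list)


def evaluate(task: TaskResult, history: AnalysisHistory) -> None:
    # Errored tasks are rejected automatically, with no LLM call.
    if task.status == "error":
        history.rejected_reasons.append("task errored before evaluation")
        return
    # Stand-in for the LLM quality assessment: accept non-empty output.
    if task.output.strip():
        history.accepted.append(task.output)
    else:
        history.rejected_reasons.append("empty output")


history = AnalysisHistory()
evaluate(TaskResult(status="error"), history)
evaluate(TaskResult(status="success", output="key finding"), history)
```

Rejected tasks leave only a logged reason, so subsequent steps read a history containing accepted results alone.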
Troubleshooting
Analysis fails or errors out
Possible causes:
- Invalid or missing access credential
- Cost limit set too low
- Content filter blocking query
Solutions:
- Verify API key is valid and has credits
- Increase the `cost_limit` parameter
- Review the query for content policy issues
Analysis is slow or times out
Possible causes:
- Complex query with many iterations
- Network issues
- API rate limiting
Solutions:
- Reduce `max_iterations`
- Use `quick` mode for testing
- Check your provider's service status page
Results are lower quality than expected
Possible causes:
- Using 20B model for complex task
- Insufficient context
- Too few iterations
Solutions:
- Upgrade to `gpt-oss-120b`
- Provide more focused context
- Increase `max_iterations` and use `thorough` mode
Code execution fails
Possible causes:
- Safety violations in generated code
- Runtime errors in calculations
- Timeout during execution
Solutions:
- Check error details in output
- Simplify the computational request
- Increase `python_tool_timeout`
Concurrent runs degrade performance
Symptoms:
- Slow execution
- Timeout errors
- Partial results
Solutions:
- Use shared state manager for concurrent runs
- Reduce concurrent analysis count
- Add delays between requests
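One way to combine the last two suggestions is to cap parallel runs with a semaphore and space out requests; a hedged sketch with a stubbed analysis call (the delay and concurrency limit are arbitrary values, and `run_one` stands in for the real call):

```python
import asyncio


async def run_one(sem: asyncio.Semaphore, query: str) -> str:
    async with sem:
        # Stub for the real analysis call; the sleep also spaces out requests.
        await asyncio.sleep(0.01)
        return f"done: {query}"


async def run_all(queries: list[str], limit: int = 2) -> list[str]:
    sem = asyncio.Semaphore(limit)  # at most `limit` analyses in flight
    return await asyncio.gather(*(run_one(sem, q) for q in queries))


results = asyncio.run(run_all(["a", "b", "c"]))
```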