Required API Keys
- Service: Groq (ultra-fast LLM inference)
- Website: groq.com
- Purpose: Access to GPT-OSS 20B and 120B models via Groq's Language Processing Units (LPUs)
Requirements:
- Register for a Groq account
- Obtain a paid API key
- Pass it to the system via the `groq_api_key` parameter
Pricing (as of October 2025):
| Model | Input (per M tokens) | Cached (per M tokens) | Output (per M tokens) |
|---|---|---|---|
| GPT-OSS 20B | $0.075 | $0.037 | $0.300 |
| GPT-OSS 120B | $0.150 | $0.075 | $0.600 |
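To make the table concrete, here is a quick back-of-the-envelope estimate; the token counts below are invented for illustration, not measured:

```python
# Back-of-the-envelope cost estimate for GPT-OSS 20B at the
# October 2025 rates above (dollars per million tokens).
INPUT_RATE = 0.075 / 1_000_000
OUTPUT_RATE = 0.300 / 1_000_000

input_tokens = 50_000   # illustrative values, not measured
output_tokens = 10_000

cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"Estimated cost: ${cost:.4f}")  # roughly $0.007 for this example
```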
Usage:

```python
result = await run_analysis(
    query="Your analysis query",
    groq_api_key="gsk_xxxxxxxxxxxx",
)
```
- Service: Tavily (AI-optimized search API)
- Website: tavily.com
- Purpose: Real-time news search capabilities for current-events analysis
Requirements:
- Register for a Tavily account
- Obtain an API key
- Pass it to the system via the `tavily_api_key` parameter
Usage:

```python
result = await run_analysis(
    query="Analyze recent developments in AI regulation",
    groq_api_key="gsk_xxxxxxxxxxxx",
    tavily_api_key="tvly-xxxxxxxxxxxx",
)
```
Benefit: When provided, the `news_api_client` tool becomes available, allowing the system to search for and incorporate recent news articles into its analysis.
- Service: Replicate (ML model hosting)
- Website: replicate.com
- Purpose: Generate AI images to accompany analysis reports
Requirements:
- Register for a Replicate account
- Obtain an API key
- Pass it to the system via the `replicate_api_key` parameter
Usage:

```python
result = await run_analysis(
    query="Analyze the future of sustainable architecture",
    groq_api_key="gsk_xxxxxxxxxxxx",
    replicate_api_key="r8_xxxxxxxxxxxx",
)
```
Benefit: When provided, the system can automatically generate relevant AI images to enhance the final HTML report output.
API Key Summary
| API Key | Required? | Purpose | Parameter Name |
|---|---|---|---|
| Groq | Required | LLM access | `groq_api_key` |
| Tavily | Optional | News search | `tavily_api_key` |
| Replicate | Optional | Image generation | `replicate_api_key` |
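In practice you will likely load these keys from the environment rather than hard-coding them. A minimal sketch, assuming `run_analysis` is importable from the package, that the environment variable names shown are your own convention, and that the optional parameters accept `None` when unset:

```python
import asyncio
import os

# from your_package import run_analysis  # import path depends on the install

async def main():
    result = await run_analysis(
        query="Analyze recent developments in AI regulation",
        groq_api_key=os.environ["GROQ_API_KEY"],               # required
        tavily_api_key=os.environ.get("TAVILY_API_KEY"),       # optional: news search
        replicate_api_key=os.environ.get("REPLICATE_API_KEY"), # optional: images
    )
    print(result)

asyncio.run(main())
```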
Available Models
GPT-OSS 20B

Identifier: `openai/gpt-oss-20b`

| Property | Details |
|---|---|
| Description | OpenAI's compact open-weight Mixture of Experts (MoE) model optimized for cost-efficient deployment |
| Size | 21 billion total parameters, 3.6 billion active per token (32 experts, Top-4 routing) |
| Architecture | MoE with 24 layers, Grouped Query Attention, RMSNorm |
| Context Window | 128K tokens |
| Speed | 1000+ tokens/second on Groq infrastructure |
| License | Apache 2.0 (fully open for commercial use) |
Hardware Requirements: Runs on high-end consumer GPUs with 16-20 GB of VRAM (e.g., NVIDIA RTX 4090/5090); MXFP4 quantization enables fast, efficient local inference. 24+ GB of VRAM is recommended for optimal performance.
Best For: Cost-efficient agentic workflows, tool calling, web browsing, code execution.
GPT-OSS 120B

Identifier: `openai/gpt-oss-120b`

| Property | Details |
|---|---|
| Description | OpenAI's larger open-weight MoE model for complex tasks |
| Size | 120 billion total parameters |
| Context Window | 128K tokens |
| License | Apache 2.0 (fully open for commercial use) |
Hardware Requirements: Requires a single 80GB H100 GPU (typically accessed via data center or cloud).
Best For: Complex reasoning, advanced code generation, research tasks.
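The Troubleshooting section below suggests switching between the two models for harder tasks. Assuming the system exposes a model-selection parameter on `run_analysis` (the name `model` here is a guess; check the actual signature), that might look like:

```python
# Hypothetical model selection -- the `model` parameter name is an
# assumption, not confirmed by this guide.
result = await run_analysis(
    query="Survey the research landscape for solid-state batteries",
    groq_api_key="gsk_xxxxxxxxxxxx",
    model="openai/gpt-oss-120b",  # larger model for complex reasoning
)
```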
Why These Models?
This program relies extensively on structured output: the ability of LLMs to return responses in precise, validated formats (Pydantic models). OpenAI specifically trained the GPT-OSS models to handle structured data, making them well suited to this application, where every response must conform to a specific schema.
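For readers new to the pattern, here is a minimal sketch of schema-validated LLM output with Pydantic; the schema is illustrative, not one of the system's actual response models:

```python
from pydantic import BaseModel, Field

# Illustrative schema -- not one of the system's actual response models.
class TaskEvaluation(BaseModel):
    accepted: bool = Field(description="Whether the task output passes review")
    reasoning: str = Field(description="Short justification for the verdict")
    confidence: float = Field(ge=0.0, le=1.0)

# The LLM is prompted to emit JSON matching the schema; validation
# raises on anything that does not conform, so malformed responses
# never flow downstream.
raw = '{"accepted": true, "reasoning": "Output answers the query.", "confidence": 0.9}'
evaluation = TaskEvaluation.model_validate_json(raw)
print(evaluation.accepted)  # True
```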
Error Handling
Cost Safeguards
The system tracks costs continuously and enforces limits:
```python
# Internal cost checking (from source)
current_cost = await cost_tracker.get_total_cost()
if current_cost > max_cost:
    raise CostFailsafeError(
        message=f"Cost limit exceeded: ${current_cost:.4f} > ${max_cost}",
        current_cost=current_cost,
        cost_limit=max_cost,
    )
```
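From the caller's side, the failsafe surfaces as an exception that can be caught. A sketch, assuming `CostFailsafeError` is importable from the package and that it exposes the `current_cost` and `cost_limit` values shown above as attributes:

```python
try:
    result = await run_analysis(
        query="Your analysis query",
        groq_api_key="gsk_xxxxxxxxxxxx",
        cost_limit=1.00,  # cap spend for this run (see Troubleshooting)
    )
except CostFailsafeError as exc:
    # Assumed: attributes mirror the keyword arguments in the raise above.
    print(f"Aborted at ${exc.current_cost:.4f} (limit: ${exc.cost_limit})")
```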
Task Evaluation Recovery
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#fff', 'primaryBorderColor': '#4a4a6a', 'lineColor': '#6c63ff', 'fontFamily': 'JetBrains Mono, monospace'}}}%%
flowchart TD
    A["Task Completes"] --> B{"Status Check"}
    B -->|"Error"| C["Automatic Rejection<br/>(no LLM call)"]
    B -->|"Success"| D["LLM Evaluation"]
    D --> E{"Quality Assessment"}
    E -->|"Accepted"| F["Write to Analysis History"]
    E -->|"Rejected"| G["Log Reason,<br/>Don't Write"]
    C --> H(["Continue Pipeline"])
    F --> H
    G --> H

    style A fill:#16213e,stroke:#4a4a6a,color:#fff
    style B fill:#0f3460,stroke:#e94560,stroke-width:2px,color:#fff
    style C fill:#6b2737,stroke:#e94560,color:#fff
    style D fill:#16213e,stroke:#4a4a6a,color:#fff
    style E fill:#0f3460,stroke:#e94560,stroke-width:2px,color:#fff
    style F fill:#0d7377,stroke:#14ffec,stroke-width:2px,color:#fff
    style G fill:#6b2737,stroke:#e94560,color:#fff
    style H fill:#4a4a6a,stroke:#6c63ff,color:#fff
```
Only accepted tasks inform subsequent analysis, preventing error propagation.
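Expressed as code, the gate in the diagram reduces to roughly the following control flow (a sketch; the type names and the evaluator interface are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class TaskResult:
    status: str        # "success" or "error"
    output: str = ""

@dataclass
class AnalysisHistory:
    accepted: list = field(default_factory=list)

async def gate_task(result: TaskResult, evaluator, history: AnalysisHistory) -> None:
    """Mirror the flowchart: only accepted tasks enter the analysis history."""
    if result.status == "error":
        # Automatic rejection -- no LLM call is spent on failed tasks.
        print("rejected automatically: task errored")
        return
    evaluation = await evaluator.assess(result)  # LLM quality assessment
    if evaluation.accepted:
        history.accepted.append(result)          # informs subsequent analysis
    else:
        print(f"rejected: {evaluation.reasoning} (not written to history)")
```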
Troubleshooting
Analysis fails or errors immediately

Possible causes:
- Invalid or missing Groq API key
- Cost limit set too low
- Content filter blocking the query

Solutions:
- Verify the API key is valid and has credits
- Increase the `cost_limit` parameter
- Review the query for content policy issues
Analysis is slow or times out

Possible causes:
- Complex query with many iterations
- Network issues
- API rate limiting

Solutions (see the sketch below this list):
- Reduce `max_iterations`
- Use `quick` mode for testing
- Check Groq service status
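For instance, a trimmed-down test run might look like the following; the `mode` parameter name is an assumption based on this guide's mention of quick and thorough modes:

```python
# Faster, cheaper run for debugging -- fewer iterations, quick mode.
result = await run_analysis(
    query="Your analysis query",
    groq_api_key="gsk_xxxxxxxxxxxx",
    max_iterations=3,   # reduced for testing
    mode="quick",       # assumed parameter name for quick/thorough modes
)
```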
Analysis quality is poor

Possible causes:
- Using the 20B model for a complex task
- Insufficient context
- Too few iterations

Solutions:
- Upgrade to `openai/gpt-oss-120b`
- Provide more focused context
- Increase `max_iterations` and use `thorough` mode
Python tool execution fails

Possible causes:
- Safety violations in generated code
- Runtime errors in calculations
- Timeout during execution

Solutions:
- Check the error details in the output
- Simplify the computational request
- Increase `python_tool_timeout`
Concurrent runs perform poorly

Symptoms:
- Slow execution
- Timeout errors
- Partial results

Solutions (a throttling sketch follows this list):
- Use a shared state manager for concurrent runs
- Reduce the concurrent analysis count
- Add delays between requests
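As an illustration of the last two points, here is a sketch that throttles concurrent analyses with a semaphore and spaces out requests; the shared-state-manager API is not shown, since its interface isn't documented in this section:

```python
import asyncio

async def run_many(queries, groq_api_key, max_concurrent=2, delay=1.0):
    """Throttle concurrent analyses to avoid rate limits and timeouts."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(query):
        async with semaphore:
            result = await run_analysis(query=query, groq_api_key=groq_api_key)
            await asyncio.sleep(delay)  # space out successive requests
            return result

    return await asyncio.gather(*(run_one(q) for q in queries))
```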