
Advanced AI Chain of Thought Analysis

A Python pipeline that breaks complex problems into manageable analytical tasks, executes them using specialized tools, and synthesizes comprehensive answers.

What This System Does

The Chain of Thought Analysis System mimics human-like reasoning by:

  • Analyzing query complexity before planning
  • Generating multiple candidate plans for solving problems
  • Evaluating and selecting the best plan
  • Executing tasks sequentially with immediate quality evaluation
  • Iterating when results don't meet quality thresholds
  • Synthesizing findings into coherent final answers

Design Philosophy

Helping Smaller Models Punch Above Their Weight

This system is designed to help smaller language models deliver analysis far more sophisticated than their size alone would suggest. By breaking complex problems into manageable pieces, structuring the reasoning process, and enforcing quality checks at each step, the system enables models like GPT-OSS 20B to perform analysis that would normally require much larger models.

đź’ˇ

Tip

No single run should be relied upon for critical decisions when using smaller models. The program is async-safe, so you can run multiple analyses simultaneously and compare results. Convergent answers indicate high confidence; divergent answers suggest the problem may need reformulation or a larger model.
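
For example, a minimal ensemble sketch using asyncio.gather and the run_analysis entry point shown in the Quick Start below (the query and run count here are illustrative):

Python
import asyncio
from agent import run_analysis

async def ensemble(query: str, key: str, runs: int = 3):
    # Launch several independent analyses of the same query concurrently.
    results = await asyncio.gather(*(
        run_analysis(query=query, groq_api_key=key, params={"mode": "balanced"})
        for _ in range(runs)
    ))
    # Compare the final answers; convergence suggests higher confidence.
    for i, result in enumerate(results, start=1):
        print(f"Run {i}: {result.final_answer}")
    return results

asyncio.run(ensemble("What factors affect solar panel efficiency?", "your-groq-api-key"))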

Controlled Reasoning

Models like GPT-OSS 20B and 120B include their own internal reasoning capabilities. This program disables that internal reasoning as much as possible because it provides its own structured reasoning framework. The goal is to shift the reasoning process onto the program's architecture, where it can be controlled, monitored, and quality-checked at each step.

Three Core Components

| Component | Description |
|---|---|
| The Program | Python application that orchestrates planning, execution, quality control, and output generation |
| Large Language Models | GPT-OSS 20B or 120B perform the actual reasoning, planning, and analysis work |
| API Platform (Groq) | Provides ultra-fast inference via Language Processing Units (LPUs) |

[Figure: Three Core Components Architecture]

How It Works

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#fff', 'primaryBorderColor': '#4a4a6a', 'lineColor': '#6c63ff', 'secondaryColor': '#16213e', 'tertiaryColor': '#0f3460', 'fontFamily': 'JetBrains Mono, monospace'}}}%%
flowchart TD
    subgraph INPUT["📥 INPUT"]
        A[/"User Query"/]
    end

    subgraph ANALYSIS["🔍 ANALYSIS"]
        B["1. Complexity Analysis"]
        C["2. Plan Generation"]
        D["3. Plan Evaluation & Selection"]
    end

    subgraph EXECUTION["⚡ EXECUTION"]
        E["4. Task Execution"]
        F{"5. Iteration
Decision"} end subgraph OUTPUT["📤 OUTPUT"] G["6. Synthesis"] H["7. Focused Answer"] I[/"Final Result"/] end A --> B B --> C C --> D D --> E E --> F F -->|"Continue"| B F -->|"Done"| G G --> H H --> I style A fill:#4a4a6a,stroke:#6c63ff,stroke-width:2px,color:#fff style I fill:#4a4a6a,stroke:#6c63ff,stroke-width:2px,color:#fff style F fill:#0f3460,stroke:#e94560,stroke-width:2px,color:#fff style B fill:#1a1a2e,stroke:#4a4a6a,color:#fff style C fill:#1a1a2e,stroke:#4a4a6a,color:#fff style D fill:#1a1a2e,stroke:#4a4a6a,color:#fff style E fill:#1a1a2e,stroke:#4a4a6a,color:#fff style G fill:#1a1a2e,stroke:#4a4a6a,color:#fff style H fill:#1a1a2e,stroke:#4a4a6a,color:#fff

Pipeline Flow Overview

Each task is evaluated immediately after completion. Only accepted tasks write their results to the analysis history, so subsequent tasks build only on results that passed the quality check.
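
Conceptually, the loop behaves like the sketch below; the function and threshold names are invented for illustration and are not the program's actual internals:

Python
QUALITY_THRESHOLD = 0.7  # hypothetical score cutoff for accepting a task result

def run_tasks(tasks, execute, evaluate):
    history = []   # only accepted results are appended here
    rejected = []  # rejected tasks are retried on the next iteration
    for task in tasks:
        result = execute(task, history)  # each task sees accepted results only
        score = evaluate(task, result)   # immediate quality evaluation
        if score >= QUALITY_THRESHOLD:
            history.append(result)       # accepted: visible to subsequent tasks
        else:
            rejected.append(task)
    return history, rejected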

Input Limits

| Input | Limit | Description |
|---|---|---|
| Query | 1,200 characters | The question or task you want analyzed |
| Context | 36,000 characters | Background information, data, or documents |

These are character limits, not token limits.
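
Because the limits are plain character counts, you can validate inputs locally before spending any API calls; a minimal guard:

Python
MAX_QUERY_CHARS = 1_200
MAX_CONTEXT_CHARS = 36_000

def check_inputs(query: str, context: str = "") -> None:
    # len() counts characters, which is exactly what the limits measure.
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"Query is {len(query):,} chars; the limit is {MAX_QUERY_CHARS:,}.")
    if len(context) > MAX_CONTEXT_CHARS:
        raise ValueError(f"Context is {len(context):,} chars; the limit is {MAX_CONTEXT_CHARS:,}.")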

⚠️ Warning

The context gets attached to many API calls. Because this program breaks problems into multiple tasks—each requiring its own API call with the context—a very large context would multiply costs significantly and slow down execution.

Best Practice for Context

Rather than feeding in massive documents and hoping the LLM finds what's relevant, pre-process and focus your context:

  1. Extract relevant sections that directly relate to your query
  2. Summarize background information to its essential points
  3. Remove redundant or tangential content
  4. Use another tool (or this program itself) to distill large documents first (see the sketch below)
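
One way to apply step 4 is a two-pass pattern: a cheap "quick" run distills the raw document, and the distilled text becomes the context for the real analysis. Here is a sketch using the run_analysis entry point from the Quick Start below; the prompt wording and truncation are illustrative:

Python
import asyncio
from agent import run_analysis

async def distill_then_analyze(question: str, raw_document: str, key: str):
    # First pass: a cheap "quick" run that keeps only the material
    # relevant to the question.
    distilled = await run_analysis(
        query=f"Summarize only the material relevant to: {question}",
        context=raw_document[:36_000],  # stay within the context limit
        groq_api_key=key,
        params={"mode": "quick"},
    )
    # Second pass: the real analysis over the focused context.
    return await run_analysis(
        query=question,
        context=distilled.final_answer,
        groq_api_key=key,
        params={"mode": "thorough"},
    )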

Quick Start

Simple One-Shot Analysis

Python
import asyncio
from agent import run_analysis

async def main():
    result = await run_analysis(
        query="What are the key factors affecting renewable energy adoption?",
        context="Focus on economic and policy factors in the United States.",
        groq_api_key="your-groq-api-key",
        params={
            "mode": "balanced",
            "max_iterations": 3,
        }
    )
    
    print(f"Final Answer: {result.final_answer}")
    print(f"Execution Time: {result.execution_time:.2f}s")
    print(f"Success: {result.success}")

asyncio.run(main())

With More Control

Python
import asyncio
from agent import create_agent

async def main():
    agent = await create_agent(
        agent_id="myagnt",
        groq_api_key="your-groq-api-key",
        params={
            "model": "openai/gpt-oss-120b",
            "mode": "thorough",
            "cost_limit": 0.50,
        }
    )
    
    try:
        result = await agent.analyze(
            query="Analyze the potential impact of quantum computing on cryptography.",
            context="Consider both near-term and long-term implications."
        )
        
        print(f"Success: {result.success}")
        print(f"Final Answer: {result.final_answer}")
        
    finally:
        await agent.cleanup()

asyncio.run(main())

Example Output

After running an analysis, you receive a MainAnalysisOutput object:

Python
result = await run_analysis(
    query="What factors affect solar panel efficiency?",
    groq_api_key=key,
    params={"focused_answer_type": "number"}
)

# Access different answer layers
full_synthesis = result.synthesis_display.answer      # Comprehensive analysis
final_answer = result.final_answer_display.answer     # Refined, user-ready response
focused_value = result.focused_answer_display.answer_value  # Constrained answer

# Metadata
print(f"Title: {result.title}")
print(f"Execution time: {result.execution_time:.2f}s")
print(f"Tasks executed: {result.task_count}")
print(f"Iterations: {result.total_iterations}")

# Cost tracking
print(f"Total cost: ${result.cost_snapshot['costs']['total_cost']:.4f}")

Available Models

| Model | Best For | Cost (per M tokens) |
|---|---|---|
| GPT-OSS 20B | Cost-efficient analysis, faster execution, simpler queries | $0.075 input / $0.30 output |
| GPT-OSS 120B | Complex reasoning, research tasks, accuracy-critical work | $0.15 input / $0.60 output |

Both models have 128K token context windows and are available under the Apache 2.0 license.

Execution Modes

| Mode | Iterations | Best For |
|---|---|---|
| quick | 1-2 | Simple questions, fast responses |
| balanced | 2-3 | Most general analysis tasks |
| thorough | 3-4 | Complex problems, important decisions |
| research | 4-5 | Deep research, comprehensive analysis |
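
The mode is just another params entry; for example, requesting the deepest setting for an open-ended research question (the query text is illustrative):

Python
result = await run_analysis(
    query="Survey the main approaches to post-quantum cryptography.",
    groq_api_key="your-groq-api-key",
    params={"mode": "research"},  # allows 4-5 iterations for deep analysis
)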

Requirements

  • Groq API Key (required) — For LLM access
  • Tavily API Key (optional) — Enables real-time news search
  • Replicate API Key (optional) — Enables AI image generation for reports

Documentation Pages

| Page | Description |
|---|---|
| Quickstart | Entry points, first analysis walkthrough |
| Settings | All 17 user-configurable parameters |
| Pipeline | The 7 pipeline stages in detail |
| Tools | Reasoning, Knowledge, and Python tools |
| Outputs | Understanding analysis results |
| Concurrent | Async usage and ensemble analysis |
| Reference | API keys, environment variables, error handling |