Overview

TogetherLLMService provides access to Together AI’s language models, including Meta’s Llama 3.1 and 3.2 models, through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management.

Installation

To use TogetherLLMService, install the required dependencies:

pip install pipecat-ai[together]

You’ll also need to set up your Together API key as an environment variable: TOGETHER_API_KEY

Configuration

Constructor Parameters

api_key
str
required

Your Together AI API key

base_url
str
default:
"https://api.together.xyz/v1"

Together AI API endpoint

model
str
default:
"meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"

Model identifier

Input Parameters

Inherits all input parameters from OpenAILLMService:

extra
Optional[Dict[str, Any]]

Additional parameters to pass to the model

frequency_penalty
Optional[float]

Reduces likelihood of repeating tokens based on their frequency. Range: [-2.0, 2.0]

max_completion_tokens
Optional[int]

Maximum number of tokens in the completion. Must be greater than or equal to 1

max_tokens
Optional[int]

Maximum number of tokens to generate. Must be greater than or equal to 1

presence_penalty
Optional[float]

Reduces likelihood of repeating any tokens that have appeared. Range: [-2.0, 2.0]

seed
Optional[int]

Random seed for deterministic generation. Must be greater than or equal to 0

temperature
Optional[float]

Controls randomness in the output. Range: [0.0, 2.0]

top_p
Optional[float]

Controls diversity via nucleus sampling. Range: [0.0, 1.0]

Input Frames

OpenAILLMContextFrame
Frame

Contains OpenAI-specific conversation context

LLMMessagesFrame
Frame

Contains conversation messages

VisionImageRawFrame
Frame

Contains image for vision model processing

LLMUpdateSettingsFrame
Frame

Updates model settings

Output Frames

TextFrame
Frame

Contains generated text chunks

FunctionCallInProgressFrame
Frame

Indicates start of function call

FunctionCallResultFrame
Frame

Contains function call results

Usage Example

from pipecat.services.together import TogetherLLMService
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from openai.types.chat import ChatCompletionToolParam
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Configure service
llm = TogetherLLMService(
    api_key="your-together-api-key",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
)

# Define tools for function calling
tools = [
    ChatCompletionToolParam(
        type="function",
        function={
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use"
                    }
                },
                "required": ["location", "format"]
            }
        }
    )
]

# Create context with system message and tools
context = OpenAILLMContext(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant in a voice conversation. Keep responses concise."
        }
    ],
    tools=tools
)

# Register function handlers
async def fetch_weather(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "nice", "temperature": "75"})

llm.register_function(None, fetch_weather)

# Create context aggregator for message handling
context_aggregator = llm.create_context_aggregator(context)

# Set up pipeline
pipeline = Pipeline([
    transport.input(),
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

# Create and configure task
task = PipelineTask(
    pipeline,
    PipelineParams(
        allow_interruptions=True,
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
)

Methods

See the LLM base class methods for additional functionality.

Function Calling

Supports OpenAI-compatible function calling:

# Define tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            }
        }
    }
}]

# Configure context with tools
context = OpenAILLMContext(
    messages=[],
    tools=tools
)

# Register function handler
@service.function("get_weather")
async def handle_weather(location: str):
    return {"temperature": 72, "condition": "sunny"}

Available Models

Together AI provides access to various models, including:

Model NameDescription
meta-llama/Meta-Llama-3.1-8B-Instruct-TurboLlama 3.1 8B instruct model optimized for speed
meta-llama/Meta-Llama-3.1-34B-InstructLlama 3.1 34B instruct model
meta-llama/Meta-Llama-3.1-70B-InstructLlama 3.1 70B instruct model
meta-llama/Meta-Llama-3.1-405B-InstructLlama 3.1 405B instruct model
meta-llama/Llama-3.2-3B-Instruct-TurboLlama 3.2 3B instruct model optimized for speed
meta-llama/Llama-3.2-11B-Vision-Instruct-TurboLlama 3.2 11B vision & instruct model
meta-llama/Llama-3.2-90B-Vision-Instruct-TurboLlama 3.2 90B vision & instruct model

See Together’s docs for a complete list of supported models.

Frame Flow

Inherits the OpenAI LLM Service frame flow:

Metrics Support

The service collects the same metrics as OpenAILLMService:

  • Token usage (prompt and completion)
  • Processing duration
  • Time to First Byte (TTFB)
  • Function call metrics

Notes

  • OpenAI-compatible interface
  • Supports streaming responses
  • Handles function calling
  • Manages conversation context
  • Includes token usage tracking
  • Thread-safe processing
  • Automatic error handling
  • Inherits OpenAI service features