The OpenAI API has become the foundation for countless AI applications, from chatbots to code assistants to creative tools. But with the rapid evolution of models—from GPT-4 to GPT-4 Turbo to the recent GPT-4o—and the introduction of the Assistants API, understanding the full landscape can be overwhelming.
In this comprehensive guide, I’ll walk you through everything you need to build production applications with the OpenAI API, including practical code examples and best practices I’ve learned from deploying these systems at scale.
What You’ll Learn
- Understanding the GPT-4 model family and when to use each
- Chat Completions API for conversational applications
- Function Calling for tool use and structured outputs
- The Assistants API for stateful AI agents
- Vision capabilities with GPT-4o
- Production best practices: error handling, rate limits, and cost optimization
Table of Contents
- The GPT-4 Model Family
- Getting Started
- Chat Completions API
- Function Calling
- Vision Capabilities
- The Assistants API
- Streaming Responses
- Production Best Practices
- Cost Optimization
The GPT-4 Model Family
As of September 2024, OpenAI offers several GPT-4 variants, each optimized for different use cases:
| Model | Context Window | Best For | Input / Output (per 1M tokens) |
|---|---|---|---|
| gpt-4o | 128K tokens | Best overall, vision, fast | $5 / $15 |
| gpt-4o-mini | 128K tokens | Cost-effective, fast | $0.15 / $0.60 |
| gpt-4-turbo | 128K tokens | Complex reasoning, legacy | $10 / $30 |
| gpt-4 | 8K tokens | Legacy applications | $30 / $60 |
💡 Recommendation
For most new applications, gpt-4o is the default choice: it offers the strongest price-to-performance ratio, includes vision capabilities, and is optimized for speed. Use gpt-4o-mini for high-volume, cost-sensitive applications where top-tier quality isn’t critical.
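To make the pricing table concrete, here’s a small sketch that estimates a request’s cost from its token counts, using the per-million-token rates above (prices change, so treat these numbers as a snapshot):
# Per-1M-token prices from the table above (snapshot; check the pricing page)
PRICES = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4": (30.00, 60.00),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate a request's cost in USD from its token counts."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000

# A 2,000-token prompt with a 500-token reply on gpt-4o: ~$0.0175
print(f"${estimate_cost('gpt-4o', 2000, 500):.4f}")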
Getting Started
Installation
# Install the OpenAI Python SDK
pip install openai
# Async support (AsyncOpenAI) is built into the SDK; httpx is installed as a dependency
Authentication
from openai import OpenAI
import os
# Initialize the client
# The SDK will automatically use OPENAI_API_KEY environment variable
client = OpenAI()
# Or explicitly pass the key
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
# For organization-specific billing
client = OpenAI(
api_key=os.environ.get("OPENAI_API_KEY"),
organization=os.environ.get("OPENAI_ORG_ID"),
)
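If you serve concurrent requests (for example, from a web backend), the SDK also ships an async client with the same interface; a minimal sketch:
from openai import AsyncOpenAI
import asyncio

# AsyncOpenAI reads OPENAI_API_KEY the same way the sync client does
async_client = AsyncOpenAI()

async def main():
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())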
Chat Completions API
The Chat Completions API is the core interface for interacting with GPT models:
Basic Usage
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are a helpful assistant specializing in Python programming."
},
{
"role": "user",
"content": "How do I read a JSON file in Python?"
}
],
temperature=0.7,
max_tokens=500,
)
# Extract the response
answer = response.choices[0].message.content
print(answer)
# Access usage statistics
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
Multi-Turn Conversations
class Conversation:
"""Manage multi-turn conversations with GPT-4o"""
def __init__(self, system_prompt: str, model: str = "gpt-4o"):
self.client = OpenAI()
self.model = model
self.messages = [
{"role": "system", "content": system_prompt}
]
def chat(self, user_message: str) -> str:
# Add user message to history
self.messages.append({"role": "user", "content": user_message})
# Get response
response = self.client.chat.completions.create(
model=self.model,
messages=self.messages,
temperature=0.7,
)
assistant_message = response.choices[0].message.content
# Add assistant response to history
self.messages.append({"role": "assistant", "content": assistant_message})
return assistant_message
def clear_history(self):
"""Keep only the system prompt"""
self.messages = [self.messages[0]]
# Usage
conv = Conversation("You are a helpful coding assistant.")
print(conv.chat("What is a decorator in Python?"))
print(conv.chat("Can you show me an example?")) # Model remembers context
Function Calling
Function calling enables GPT models to invoke external functions, making them ideal for building agents and tool-using applications:
import json
from openai import OpenAI
client = OpenAI()
# Define available functions
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit"
}
},
"required": ["location"]
}
}
},
{
"type": "function",
"function": {
"name": "search_database",
"description": "Search the product database",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query"
},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "books"],
"description": "Product category to filter by"
}
},
"required": ["query"]
}
}
}
]
# Actual function implementations
def get_weather(location: str, unit: str = "fahrenheit") -> dict:
"""Mock weather API"""
return {
"location": location,
"temperature": 72 if unit == "fahrenheit" else 22,
"unit": unit,
"conditions": "sunny"
}
def search_database(query: str, category: str | None = None) -> dict:
"""Mock database search"""
return {
"results": [
{"name": f"{query} Product 1", "price": 29.99},
{"name": f"{query} Product 2", "price": 49.99},
],
"total": 2
}
# Function dispatcher
available_functions = {
"get_weather": get_weather,
"search_database": search_database,
}
def run_conversation(user_message: str):
messages = [{"role": "user", "content": user_message}]
# First API call
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools,
tool_choice="auto", # Let the model decide
)
response_message = response.choices[0].message
# Check if the model wants to call functions
if response_message.tool_calls:
messages.append(response_message)
# Execute each function call
for tool_call in response_message.tool_calls:
function_name = tool_call.function.name
function_args = json.loads(tool_call.function.arguments)
# Call the function
function_response = available_functions[function_name](**function_args)
# Add function response to messages
messages.append({
"tool_call_id": tool_call.id,
"role": "tool",
"name": function_name,
"content": json.dumps(function_response),
})
# Get final response with function results
second_response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
)
return second_response.choices[0].message.content
return response_message.content
# Usage
print(run_conversation("What's the weather in New York?"))
print(run_conversation("Search for laptops in electronics"))
Vision Capabilities
GPT-4o includes powerful vision capabilities, allowing you to analyze images:
from openai import OpenAI
import base64
client = OpenAI()
# Method 1: URL-based image
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image? Describe it in detail."},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg",
"detail": "high" # or "low" for faster/cheaper processing
}
}
]
}
],
max_tokens=500,
)
print(response.choices[0].message.content)
# Method 2: Base64-encoded image (for local files)
def analyze_local_image(image_path: str, prompt: str) -> str:
with open(image_path, "rb") as image_file:
base64_image = base64.b64encode(image_file.read()).decode('utf-8')
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}"
}
}
]
}
],
max_tokens=500,
)
return response.choices[0].message.content
# Usage
result = analyze_local_image("screenshot.png", "Extract all text from this screenshot")
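You can also send several images in one message, which is useful for comparisons (the URLs here are placeholders):
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What changed between these two screenshots?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/before.png"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/after.png"}},
            ]
        }
    ],
    max_tokens=500,
)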
The Assistants API
The Assistants API enables you to build stateful AI assistants with persistent threads, file handling, and built-in tools:
from openai import OpenAI
import time
client = OpenAI()
# Step 1: Create an Assistant
assistant = client.beta.assistants.create(
name="Data Analyst",
instructions="""You are a data analyst assistant.
You can analyze CSV files, generate insights, and create visualizations.
Always explain your analysis clearly.""",
model="gpt-4o",
tools=[
{"type": "code_interpreter"}, # Can execute Python code
{"type": "file_search"}, # Can search through files
]
)
print(f"Created assistant: {assistant.id}")
# Step 2: Create a Thread (represents a conversation)
thread = client.beta.threads.create()
print(f"Created thread: {thread.id}")
# Step 3: Add a message to the thread
message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="Analyze the sales data and identify the top 3 products by revenue."
)
# Step 4: Run the assistant
run = client.beta.threads.runs.create(
thread_id=thread.id,
assistant_id=assistant.id,
)
# Step 5: Wait for completion (polling)
def wait_for_run(thread_id: str, run_id: str) -> str:
while True:
run_status = client.beta.threads.runs.retrieve(
thread_id=thread_id,
run_id=run_id
)
if run_status.status == "completed":
# Get the assistant's response
messages = client.beta.threads.messages.list(thread_id=thread_id)
return messages.data[0].content[0].text.value
elif run_status.status == "failed":
raise Exception(f"Run failed: {run_status.last_error}")
elif run_status.status in ["queued", "in_progress"]:
time.sleep(1)
else:
raise Exception(f"Unexpected status: {run_status.status}")
response = wait_for_run(thread.id, run.id)
print(response)
# Continue the conversation in the same thread
client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="Now create a bar chart of these top products."
)
run2 = client.beta.threads.runs.create(
thread_id=thread.id,
assistant_id=assistant.id,
)
response2 = wait_for_run(thread.id, run2.id)
print(response2)
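When the code interpreter draws a chart, the assistant’s reply contains an image_file content block rather than plain text. A hedged sketch of downloading it (assuming the generated image is in the latest message):
# Fetch the latest message and save any generated images
messages = client.beta.threads.messages.list(thread_id=thread.id)
for block in messages.data[0].content:
    if block.type == "image_file":
        image_data = client.files.content(block.image_file.file_id)
        with open("chart.png", "wb") as f:
            f.write(image_data.read())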
File Upload with Assistants
# Upload a file
file = client.files.create(
file=open("sales_data.csv", "rb"),
purpose="assistants"
)
# Create a message with the file
message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="Analyze this sales data file",
attachments=[
{
"file_id": file.id,
"tools": [{"type": "code_interpreter"}]
}
]
)
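Assistants, threads, and uploaded files persist server-side until deleted, so clean up anything you created for a one-off experiment:
# Remove the resources created above
client.files.delete(file.id)
client.beta.threads.delete(thread.id)
client.beta.assistants.delete(assistant.id)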
Streaming Responses
For real-time applications, streaming provides a better user experience:
from openai import OpenAI
client = OpenAI()
# Streaming chat completions
def stream_response(user_message: str):
stream = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": user_message}],
stream=True,
)
full_response = ""
    for chunk in stream:
        # Some chunks (e.g., a final usage chunk) can arrive with an empty choices list
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            print(content, end="", flush=True)
            full_response += content
print() # New line at the end
return full_response
# Async streaming (for web applications)
async def async_stream_response(user_message: str):
from openai import AsyncOpenAI
async_client = AsyncOpenAI()
stream = await async_client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": user_message}],
stream=True,
)
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
# FastAPI example
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
app = FastAPI()
@app.get("/stream")
async def stream_endpoint(prompt: str):
async def generate():
async for chunk in async_stream_response(prompt):
yield f"data: {chunk}\n\n"
yield "data: [DONE]\n\n"
return StreamingResponse(generate(), media_type="text/event-stream")
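Streamed responses don’t include token usage by default. If you need usage for logging or cost tracking, request a final usage chunk (this last chunk arrives with an empty choices list, which is why the loops above check chunk.choices first):
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},  # append a final chunk carrying usage
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:  # only set on the last chunk
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")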
Production Best Practices
Error Handling and Retries
from openai import OpenAI, RateLimitError, APIError, APIConnectionError
import time
from functools import wraps
def retry_with_exponential_backoff(
max_retries: int = 3,
base_delay: float = 1.0,
max_delay: float = 60.0,
):
"""Decorator for retrying OpenAI API calls with exponential backoff"""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
retries = 0
delay = base_delay
while retries <= max_retries:
try:
return func(*args, **kwargs)
except RateLimitError as e:
retries += 1
if retries > max_retries:
raise
                    # Honor the Retry-After header if the server sent one
                    retry_after = e.response.headers.get("retry-after")
                    wait_time = float(retry_after) if retry_after else delay
print(f"Rate limited. Retrying in {wait_time}s...")
time.sleep(wait_time)
delay = min(delay * 2, max_delay)
except APIConnectionError as e:
retries += 1
if retries > max_retries:
raise
print(f"Connection error. Retrying in {delay}s...")
time.sleep(delay)
delay = min(delay * 2, max_delay)
except APIError as e:
                    # Don't retry client errors (4xx); only status errors carry a status code
                    status_code = getattr(e, "status_code", None)
                    if status_code and 400 <= status_code < 500:
raise
retries += 1
if retries > max_retries:
raise
print(f"API error. Retrying in {delay}s...")
time.sleep(delay)
delay = min(delay * 2, max_delay)
return wrapper
return decorator
# Usage
@retry_with_exponential_backoff(max_retries=3)
def safe_completion(messages: list) -> str:
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
)
return response.choices[0].message.content
Timeout Configuration
from openai import OpenAI
import httpx
# Configure timeouts
client = OpenAI(
timeout=httpx.Timeout(
connect=5.0, # Connection timeout
read=30.0, # Read timeout
write=10.0, # Write timeout
pool=10.0, # Pool timeout
),
max_retries=2, # Built-in retry support
)
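You can also override these settings for a single call with with_options, which returns a configured copy of the client without mutating the original:
# Give one slow, important call a longer timeout and more retries
response = client.with_options(timeout=60.0, max_retries=5).chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this long document..."}],
)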
Cost Optimization
| Strategy | Impact | Implementation |
|---|---|---|
| Use gpt-4o-mini | ~30x cheaper than gpt-4o | For simpler tasks, classification, extraction |
| Response caching | 100% on cache hits | Cache responses for repeated queries (see below) |
| Limit max_tokens | Variable | Set appropriate limits per use case |
| Batch processing | 50% discount | Use Batch API for non-time-sensitive tasks |
| Compress prompts | 10-30% reduction | Remove redundant text, use abbreviations |
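Limiting max_tokens and compressing prompts both start with measuring them. A sketch using the tiktoken library (gpt-4o uses the o200k_base encoding; exact chat accounting adds a few tokens of per-message overhead):
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens the way the model's tokenizer would."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("o200k_base")  # fallback for newer models
    return len(encoding.encode(text))

print(count_tokens("You are a helpful assistant specializing in Python programming."))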
Implementing a Response Cache
import hashlib
import json
from typing import Optional
import redis
class ResponseCache:
    """Exact-match cache for OpenAI responses, keyed on a hash of the request.
    A true semantic cache would also match similar queries via embeddings;
    this simpler version only hits when the request is identical."""
def __init__(self, redis_client: redis.Redis, ttl: int = 3600):
self.redis = redis_client
self.ttl = ttl
def _generate_key(self, model: str, messages: list) -> str:
"""Generate a cache key from the request"""
content = json.dumps({"model": model, "messages": messages}, sort_keys=True)
return f"openai:cache:{hashlib.sha256(content.encode()).hexdigest()}"
def get(self, model: str, messages: list) -> Optional[str]:
"""Get cached response if available"""
key = self._generate_key(model, messages)
cached = self.redis.get(key)
return cached.decode() if cached else None
def set(self, model: str, messages: list, response: str):
"""Cache a response"""
key = self._generate_key(model, messages)
self.redis.setex(key, self.ttl, response)
# Usage with OpenAI
def cached_completion(
client: OpenAI,
    cache: ResponseCache,
model: str,
messages: list
) -> str:
# Check cache first
cached = cache.get(model, messages)
if cached:
return cached
# Make API call
response = client.chat.completions.create(
model=model,
messages=messages,
)
result = response.choices[0].message.content
# Cache the result
cache.set(model, messages, result)
return result
Key Takeaways
- gpt-4o is the best general-purpose model with vision support
- gpt-4o-mini offers excellent value for high-volume applications
- Function calling enables building powerful tool-using agents
- The Assistants API simplifies building stateful applications
- Always implement error handling and retries in production
- Use caching and batching to optimize costs
- Stream responses for better user experience
References
- OpenAI API Documentation
- OpenAI Models Overview
- Function Calling Guide
- Assistants API Overview
- OpenAI Pricing
- OpenAI Cookbook
The OpenAI API continues to evolve rapidly. Stay updated with the official documentation and changelog for the latest features and improvements. With these fundamentals in place, you’re ready to build powerful AI applications.
Building something interesting with the OpenAI API? I’d love to hear about it—connect with me on LinkedIn to share your projects.