How to Build Your First AI Agent in Python (Step-by-Step)

Building your first AI agent in Python is less mysterious than it looks. You do not need a giant framework, a complicated planner, or ten specialized sub-agents. You need a clear goal, a few safe tools, a loop that lets the model decide what to do next, and enough logging to understand what happened.

This tutorial walks through a small business-friendly agent: a lead research assistant. Given a company name and website, the agent gathers available context, classifies fit, drafts discovery questions, and returns a structured summary. The same pattern can be adapted for support triage, invoice review, customer onboarding, internal knowledge search, or any AI agent workflow that combines reasoning with tool use.

The code examples use Python because it is widely used for AI projects, easy to read, and friendly to quick automation scripts. The architecture is intentionally simple so you can understand every moving part before adopting a larger framework.

What You Will Build

The agent will complete a narrow workflow:

Accept a company name, website, and target customer profile.
Use a tool to fetch or simulate company research.
Analyze the company against the target profile.
Produce a structured output with fit score, reasoning, risks, discovery questions, and a suggested next action.

In a production system, the research tool might call a search API, CRM, enrichment provider, or internal database. In this tutorial, the tool is a small Python function so you can run the agent locally without building a full integration layer.

This is not a toy chatbot. It is a miniature version of how business AI agents work: they receive a goal, use tools, follow rules, and return an output that fits a workflow.

Project Setup

Create a new folder and virtual environment:

mkdir first-python-ai-agent
cd first-python-ai-agent
python -m venv .venv

Activate the environment:

# macOS or Linux
source .venv/bin/activate

# Windows PowerShell
.venv\Scripts\Activate.ps1

Install the packages:

pip install openai python-dotenv pydantic rich

Create a .env file:

OPENAI_API_KEY=your_api_key_here

The tutorial uses the OpenAI Python client because it is familiar and can parse model output directly into a structured schema. If your organization uses another model provider, the surrounding agent design is the same: send instructions, provide context, call tools, validate output, and log the result.
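
A missing key otherwise shows up as a bare KeyError when agent.py starts. A minimal sketch of a friendlier startup check follows; the helper name require_env is an assumption made for illustration, not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

In agent.py you could call require_env("OPENAI_API_KEY") after load_dotenv() and pass the result to the client.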

Create this file structure:

first-python-ai-agent/
  .env
  agent.py
  tools.py
  models.py

Define the Output Model

Business agents should produce predictable outputs. Free-form prose is useful for humans, but structured outputs are easier to validate, store, compare, and hand to another system.

Create models.py:

from pydantic import BaseModel, Field


class LeadAssessment(BaseModel):
    company: str
    fit_score: int = Field(ge=1, le=10)
    summary: str
    likely_pain_points: list[str]
    risks_or_unknowns: list[str]
    discovery_questions: list[str]
    suggested_next_action: str

This schema is small but useful. It forces the agent to produce a fit score, list the pain points, surface unknowns, and recommend a next action. Those fields can be displayed in a dashboard or copied into a CRM note.
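
To see what the schema buys you, it can help to feed it a bad record. The sketch below repeats the model so it runs on its own and assumes Pydantic v2 (the same version that provides model_dump); the try_parse helper is a hypothetical name used only for this example.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class LeadAssessment(BaseModel):
    # Same fields as models.py, repeated so this snippet is self-contained.
    company: str
    fit_score: int = Field(ge=1, le=10)
    summary: str
    likely_pain_points: list[str]
    risks_or_unknowns: list[str]
    discovery_questions: list[str]
    suggested_next_action: str


def try_parse(raw: dict) -> tuple[Optional[LeadAssessment], list[str]]:
    """Validate a raw dict; return the model, or the names of the failing fields."""
    try:
        return LeadAssessment.model_validate(raw), []
    except ValidationError as exc:
        return None, [".".join(str(part) for part in err["loc"]) for err in exc.errors()]


bad_record = {
    "company": "Example SaaS Co",
    "fit_score": 14,  # outside the allowed 1-10 range
    "summary": "Looks promising.",
    "likely_pain_points": ["Manual ticket triage"],
    "risks_or_unknowns": ["Team size unknown"],
    "discovery_questions": ["How are tickets routed today?"],
    "suggested_next_action": "Draft an intro email for rep review.",
}
assessment, failed_fields = try_parse(bad_record)
```

Here assessment is None and failed_fields names fit_score, so the bad score never reaches a dashboard or CRM note.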

Create a Safe Tool

Tools are what turn a model response into an agent workflow. A tool can search a database, fetch a URL, create a ticket, draft an email, or update a spreadsheet. Start with read-only tools. Read-only tools are easier to trust and debug.

Create tools.py:

def research_company(company: str, website: str) -> dict:
    """Return lightweight research notes for a company.

    Replace this function with a real search, CRM, or enrichment API when
    you are ready for production use.
    """
    sample_notes = {
        "company": company,
        "website": website,
        "signals": [
            "Publishes frequent product updates",
            "Has a visible sales or demo request motion",
            "Uses multiple customer-facing software tools",
            "Likely has repetitive support and operations workflows",
        ],
        "possible_use_cases": [
            "AI lead qualification",
            "Support ticket triage",
            "Meeting brief generation",
            "CRM follow-up drafting",
        ],
    }
    return sample_notes

The function is simple, but the boundary is important. The agent does not browse the entire internet or write to business systems. It receives a safe tool with a narrow job.
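
One way to keep that boundary narrow once the tool touches real URLs is to check its arguments before doing any work. The following is a sketch under assumptions: validate_website is a hypothetical helper, and the rule of accepting only absolute http or https URLs is an illustrative policy, not a requirement stated above.

```python
from urllib.parse import urlparse


def validate_website(website: str) -> str:
    """Accept only absolute http(s) URLs that include a host; return the cleaned URL."""
    parsed = urlparse(website.strip())
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme or 'none'}")
    if not parsed.netloc:
        raise ValueError("URL is missing a host name")
    return parsed.geturl()
```

Calling this at the top of research_company rejects inputs such as file paths or bare domains before they reach a fetcher.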

Write the Agent

Create agent.py:

import json
import os
from dotenv import load_dotenv
from openai import OpenAI
from rich import print

from models import LeadAssessment
from tools import research_company

load_dotenv()

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


SYSTEM_PROMPT = """
You are a careful AI sales operations agent.
Your job is to assess whether a company is a good fit for AI automation services.

Rules:
- Use the provided research notes only.
- Do not invent private facts.
- Be specific about uncertainty.
- Recommend a next action that a human sales rep can approve.
- Return concise, practical business language.
"""


def assess_lead(company: str, website: str, target_profile: str) -> LeadAssessment:
    research = research_company(company, website)

    user_message = {
        "company": company,
        "website": website,
        "target_profile": target_profile,
        "research_notes": research,
    }

    response = client.responses.parse(
        model="gpt-4.1-mini",
        input=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": f"Assess this lead and return the required structure: {json.dumps(user_message)}",
            },
        ],
        text_format=LeadAssessment,
    )

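    assessment = response.output_parsed
    if assessment is None:
        # The model can refuse or return output that does not parse.
        raise RuntimeError("Model returned no parsed LeadAssessment.")
    return assessment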


if __name__ == "__main__":
    result = assess_lead(
        company="Example SaaS Co",
        website="https://example.com",
        target_profile="B2B companies with support, sales, or operations teams that handle repetitive knowledge work.",
    )
    print(result.model_dump())

Run it:

python agent.py

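You should receive a structured assessment. The exact wording will vary, but the output should match the LeadAssessment schema. If Python reports that client.responses has no parse method, upgrade the openai package; the helper is only available in recent releases of the SDK.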

Add Tool Calling Logic

The previous version calls the tool directly before the model runs. That is often enough for a first workflow. In a more agentic design, the model can decide when to request a tool. The surrounding program still controls which tools exist and whether the request is allowed.

Here is a simplified tool dispatcher, the part of a tool-calling loop that maps a requested tool name to a Python function:

TOOLS = {
    "research_company": research_company,
}


def run_tool(name: str, arguments: dict) -> dict:
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**arguments)

In production, you would add permission checks, argument validation, timeouts, retries, and logging. A tool call should never be an invisible side effect. Even a read-only call should be recorded so you can debug bad outputs later.
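
A minimal sketch of three of those safeguards (an allowlist, argument checking, and a call log) might look like the following. The names run_tool_checked and TOOL_LOG are assumptions for illustration, research_company is a stand-in for the function in tools.py, and timeouts and retries are left out to keep the example short.

```python
import inspect
import time


def research_company(company: str, website: str) -> dict:
    # Stand-in for the function in tools.py so this sketch runs on its own.
    return {"company": company, "website": website, "signals": []}


TOOLS = {"research_company": research_company}
TOOL_LOG: list[dict] = []  # hypothetical in-memory log; use real logging in production


def run_tool_checked(name: str, arguments: dict) -> dict:
    """Run an allowlisted tool after checking its arguments, and record the call."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    tool = TOOLS[name]
    try:
        # bind() raises TypeError on missing or unexpected arguments.
        inspect.signature(tool).bind(**arguments)
    except TypeError as exc:
        raise ValueError(f"Bad arguments for {name}: {exc}") from exc

    started = time.perf_counter()
    result = tool(**arguments)
    TOOL_LOG.append(
        {
            "tool": name,
            "arguments": arguments,
            "seconds": round(time.perf_counter() - started, 4),
        }
    )
    return result
```

Checking the arguments against the function signature means a malformed model request fails with a readable error instead of a traceback from inside the tool.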

Add Memory for the Task

Agents need memory for the task they are performing. For a small script, memory can be a list of events. For a production system, memory might live in a database or trace store.

Add this helper:

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentEvent:
    type: str
    payload: dict
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class TaskMemory:
    def __init__(self) -> None:
        self.events: list[AgentEvent] = []

    def add(self, event_type: str, payload: dict) -> None:
        self.events.append(AgentEvent(type=event_type, payload=payload))

    def as_dict(self) -> list[dict]:
        return [event.__dict__ for event in self.events]

Then log the workflow inside assess_lead, where company and website are already defined:

memory = TaskMemory()
memory.add("task_started", {"company": company, "website": website})
research = research_company(company, website)
memory.add("research_completed", research)

This looks basic, but it creates an audit trail. When the agent produces a strange fit score, you can inspect which research notes it saw.
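
To keep that audit trail after the script exits, the events can be serialized. Here is one possible sketch that writes them as JSON Lines; the events_to_jsonl helper is a hypothetical name, and AgentEvent is repeated so the snippet runs on its own.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentEvent:
    type: str
    payload: dict
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def events_to_jsonl(events: list[AgentEvent]) -> str:
    """Serialize events as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(event), sort_keys=True) for event in events)
```

Appending the returned text to a file gives you a trace you can search later with ordinary command-line tools.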

Add Business Rules

AI agents should not rely on the model for every rule. Use code for deterministic checks. For example, if your sales team only works with B2B companies above a certain size, enforce that outside the model when you have the data.

Here is a simple validation function:

def validate_assessment(assessment: LeadAssessment) -> list[str]:
    warnings = []
    if assessment.fit_score >= 8 and len(assessment.risks_or_unknowns) == 0:
        warnings.append("High fit score should still include at least one risk or unknown.")
    if len(assessment.discovery_questions) < 3:
        warnings.append("Include at least three discovery questions.")
    return warnings

Call it after the model returns:

warnings = validate_assessment(result)
if warnings:
    print({"validation_warnings": warnings})

Business rules are one of the easiest ways to make an AI agent more reliable. They also make the system easier to explain to stakeholders.
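
Rules can also run before the model is called. The sketch below is an assumption-heavy illustration of the company-size idea: the CompanyFacts fields and the MIN_EMPLOYEES threshold are hypothetical and would come from your own CRM or enrichment data.

```python
from dataclasses import dataclass
from typing import Optional

MIN_EMPLOYEES = 20  # hypothetical threshold; set this from your own sales criteria


@dataclass
class CompanyFacts:
    # Hypothetical fields; None means the value is unknown.
    is_b2b: Optional[bool] = None
    employee_count: Optional[int] = None


def precheck(facts: CompanyFacts) -> Optional[str]:
    """Return a rejection reason, or None when the lead may go on to the model."""
    if facts.is_b2b is False:
        return "Not a B2B company."
    if facts.employee_count is not None and facts.employee_count < MIN_EMPLOYEES:
        return f"Fewer than {MIN_EMPLOYEES} employees."
    # Unknown data never blocks a lead; the model can list it as a risk instead.
    return None
```

A lead that fails the precheck can be skipped without spending a model call, and the reason is a plain string you can show to the sales team.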

Add Human Approval

For a first production workflow, avoid direct external actions. Instead of sending an email, create a draft. Instead of changing a CRM opportunity stage, suggest the update. Instead of approving an invoice, prepare an approval packet.

You can represent approval in code:

def request_human_approval(assessment: LeadAssessment) -> bool:
    print("\nSuggested next action:")
    print(assessment.suggested_next_action)
    answer = input("\nApprove this recommendation? [y/N] ")
    return answer.lower().strip() == "y"

Then:

if request_human_approval(result):
    print("Approved. In production, this is where you would create a CRM note or draft email.")
else:
    print("Not approved. Add feedback and improve the workflow.")

This pattern matters most for small businesses adopting AI automation. You can get value quickly without giving the agent unsafe authority.
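
If you want the approval itself to be auditable and testable, one option is to return a record instead of a bare boolean and to make the prompt function replaceable. This is a sketch, not a prescribed design; the decide function and its record fields are assumptions.

```python
from datetime import datetime, timezone
from typing import Callable


def decide(action: str, ask: Callable[[str], str] = input) -> dict:
    """Ask a reviewer to approve an action and return an auditable record."""
    answer = ask(f"Suggested next action: {action}\nApprove? [y/N] ")
    return {
        "action": action,
        "approved": answer.strip().lower() == "y",
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }
```

Passing a fake ask function, such as lambda prompt: "y", lets you exercise the approval path in tests without typing at a terminal.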

Improve the Prompt

A good prompt for an agent is not a clever sentence. It is an operating procedure. It should include the role, goal, constraints, output expectations, and escalation rules.

For this lead agent, you might improve the system prompt:

You are a sales operations agent for an AI automation consultancy.
Assess whether the company matches the target customer profile.

Use these rules:
- Score 1-3 for poor fit, 4-6 for possible fit, 7-10 for strong fit.
- Mention uncertainty when research notes are thin.
- Prefer practical operational pain points over generic AI claims.
- Do not recommend immediate outreach if the company appears outside the target profile.
- Always include questions a human rep can ask on a discovery call.

The more your workflow depends on business nuance, the more examples you should include. Save good human-written outputs and use them as reference cases.
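
One simple way to use saved reference cases is to append them to the system prompt as labeled examples. The sketch below assumes each case is stored as a plain dict; build_system_prompt and BASE_PROMPT are hypothetical names.

```python
import json

BASE_PROMPT = "You are a sales operations agent for an AI automation consultancy."


def build_system_prompt(base: str, reference_cases: list[dict]) -> str:
    """Append saved human-written assessments to the prompt as labeled examples."""
    if not reference_cases:
        return base
    parts = [base, "", "Reference cases written by human reps:"]
    for number, case in enumerate(reference_cases, start=1):
        parts.append(f"Example {number}:")
        parts.append(json.dumps(case, indent=2, sort_keys=True))
    return "\n".join(parts)
```

Keep the number of cases small; every example is sent with each request and adds to cost and latency.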

Test with Examples

Do not test your agent only with one friendly input. Create a small test set:

EXAMPLES = [
    {
        "company": "Example SaaS Co",
        "website": "https://example.com",
        "target_profile": "B2B SaaS with sales and support teams",
    },
    {
        "company": "Local Bakery",
        "website": "https://localbakery.example",
        "target_profile": "B2B SaaS with sales and support teams",
    },
    {
        "company": "Regional IT Services Firm",
        "website": "https://itservices.example",
        "target_profile": "Service businesses with repetitive support workflows",
    },
]

for example in EXAMPLES:
    result = assess_lead(**example)
    print(result.model_dump())

Look for patterns. Does the agent over-score every company? Does it invent details? Does it ask useful discovery questions? Does it handle weak research notes honestly? These observations are more valuable than a single perfect demo.
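
Some of these patterns can be checked with a few lines of code. As a sketch, the helper below summarizes fit scores across a test set; summarize_scores is a hypothetical name, the input is assumed to be the list of result.model_dump() dicts, and the threshold of 8 for a "high" score is an illustrative choice.

```python
from statistics import mean


def summarize_scores(results: list[dict], high: int = 8) -> dict:
    """Summarize fit scores across a test set to help spot over-scoring."""
    scores = [result["fit_score"] for result in results]
    if not scores:
        return {"count": 0, "mean": None, "share_high": None}
    return {
        "count": len(scores),
        "mean": round(mean(scores), 2),
        "share_high": round(sum(score >= high for score in scores) / len(scores), 2),
    }
```

If a deliberately poor-fit input such as the bakery still pushes share_high close to 1.0, the scoring rubric in the prompt probably needs work.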

Production Checklist

Before using an agent in a real business workflow, cover the basics:

Use environment variables for secrets.
Log model inputs, outputs, tool calls, and validation warnings.
Keep tools narrow and permissioned.
Separate read, draft, and write actions.
Add human approval for external or sensitive actions.
Validate structured outputs.
Test against realistic examples.
Track cost, latency, and correction rate.
Document who owns the agent.

This checklist is intentionally practical. Most agent failures happen because teams skip ordinary software discipline, not because the model is incapable.
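
Tracking cost, latency, and correction rate does not require special tooling at first. The following is a minimal in-memory sketch; RunStats is a hypothetical class, and it assumes you measure the duration and estimate the cost of each run yourself.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    runs: int = 0
    corrections: int = 0
    total_seconds: float = 0.0
    total_cost_usd: float = 0.0

    def record(self, seconds: float, cost_usd: float, corrected: bool) -> None:
        """Record one agent run; corrected means a human changed the output."""
        self.runs += 1
        self.corrections += int(corrected)
        self.total_seconds += seconds
        self.total_cost_usd += cost_usd

    def summary(self) -> dict:
        """Return per-run averages, or only the run count when nothing is recorded."""
        if self.runs == 0:
            return {"runs": 0}
        return {
            "runs": self.runs,
            "avg_seconds": self.total_seconds / self.runs,
            "avg_cost_usd": self.total_cost_usd / self.runs,
            "correction_rate": self.corrections / self.runs,
        }
```

A rising correction rate is often the earliest sign that the prompt, the tools, or the input data have drifted.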

Troubleshooting Common Problems

If your agent returns vague answers, the prompt probably does not include enough business context or examples. Add a target customer profile, a scoring rubric, and two or three sample outputs that show the level of specificity you expect. Models often become more useful when they can see what "good" looks like.

If the agent invents facts, tighten the rules and improve retrieval. Tell the agent to use only provided research notes and to list unknowns instead of guessing. In production, require source citations for claims that affect business decisions.

If outputs are inconsistent, use structured schemas, deterministic validation, and a lower temperature setting when your provider supports it. You can also split the workflow into smaller steps: research summary first, fit scoring second, draft recommendation third. Smaller steps are easier to evaluate.

If the agent is too expensive, inspect the trace. You may be sending too much context, using a larger model than needed, or repeating the same research on every run. Cache stable information, summarize long records before sending them to the model, and reserve premium models for high-value or ambiguous cases.
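
For research that rarely changes, an in-process cache is often the cheapest fix. The sketch below uses functools.lru_cache from the standard library; cached_research is a stand-in for a slow research call, and the CALLS counter exists only to show that the second lookup skips the work.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counter used only to demonstrate that the cache works


@lru_cache(maxsize=256)
def cached_research(company: str, website: str) -> tuple:
    """Stand-in for a slow research call; results are cached per argument pair."""
    CALLS["count"] += 1
    # Return an immutable tuple: cached values are shared between callers.
    return (f"{company} publishes frequent product updates",)
```

This cache lives only as long as the process and never expires entries by age, so a long-running service would need a store with explicit expiry instead.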

Where to Go Next

Once this first agent works, you can extend it in several directions. Add a real search API. Connect it to a CRM. Store assessments in a database. Build a small web dashboard. Add retrieval over your sales playbook. Compare model outputs against a saved evaluation set. Add a second tool that drafts a follow-up email after approval.

You can also decide whether a no-code AI agent platform is enough. If the workflow is mostly linear and uses standard apps, no-code may be faster. If you need custom logic, strict validation, or deeper integrations, Python gives you more control.

The most important lesson is that an AI agent is a workflow, not a prompt. Start small, make the tools safe, structure the output, and measure whether the agent helps the business. That foundation will serve you better than jumping directly into complex multi-agent orchestration.