Function Calling and Tool Use

An LLM cannot do anything on its own. All it does is generate text, and that is the full extent of its ability. By itself it cannot check the weather, query a database, send an email, run code, or read a file. Every "AI agent" you have seen pairs an LLM, which decides which function to call with which arguments and emits that decision as JSON, with application code that actually executes the call. The model is the brain that reasons and decides, tools are the hands that act, and function calling is the nervous system that connects the two.

Type: Build
Language: Python
Prerequisites: Phase 11 Lesson 03 (Structured Outputs)
Estimated time: about 75 minutes
Related: Phase 11 · 14 (Model Context Protocol). If a tool must be shared across multiple hosts, move from inline function calling to an MCP server. This lesson covers the inline approach; the MCP lesson covers the protocol approach.

Learning Objectives

Implement a function calling loop that defines tool schemas, parses the model's tool-call JSON, executes the function, and returns the result.
Design tool schemas with clear descriptions and typed parameters so the model can invoke them reliably.
Build a multi-turn agent loop that chains several function calls to answer a complex query.
Handle function calling edge cases such as parallel tool calls, error propagation, and infinite tool loops.

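The Problem

You have built a chatbot. A user asks: "What's the weather in Tokyo right now?"

The model answers: "I don't have access to real-time weather data. Guessing from the season, though, Tokyo is probably around 15 degrees Celsius..."

This is a hallucination wearing a disclaimer. The model does not know the actual weather and never will. Weather changes hourly, and the model's training data stopped months ago.

To get the right answer, something has to call the OpenWeatherMap API, fetch the current temperature, and return the real number. The model cannot call an API; your code does that. The missing piece is a structured protocol that lets the model say "call the weather API with these arguments," plus a flow in which your code executes that call and hands the result back to the model.

That is function calling. The model outputs structured JSON describing which function to run with which arguments. The application runs the function. The result goes back into the conversation. The model uses that result to produce the final answer.

Without function calling, an LLM is an encyclopedia. With function calling, an LLM becomes an agent.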

Pre-test

2 questions · Check how much you already know before starting this lesson

1. Can an LLM actually execute functions or access external systems directly?

2. In function calling, what is a tool schema?


Concepts

The Function Calling Loop

Every tool-use interaction follows the same five-step loop.

sequenceDiagram
    participant U as User
    participant A as Application
    participant M as Model
    participant T as Tool

    U->>A: "What's the weather in Tokyo?"
    A->>M: messages + tool definitions
    M->>A: tool_call: get_weather(city="Tokyo")
    A->>T: Execute get_weather("Tokyo")
    T->>A: {"temp": 18, "condition": "cloudy"}
    A->>M: tool_result + conversation
    M->>A: "It's 18C and cloudy in Tokyo."
    A->>U: Final response

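Step 1: The user sends a message.
Step 2: The model receives the message together with the tool definitions, which are JSON Schemas describing the available functions.
Step 3: Instead of a plain text response, the model outputs a tool call: a structured JSON object containing the function name and arguments.
Step 4: Your code executes the function and collects the result.
Step 5: The result goes back to the model, which uses the real data to produce the final answer.

The model never executes anything itself. It only decides what to call and with which arguments. The executor is your code.

Tool Definitions: The JSON Schema Contract

Each tool is defined by a JSON Schema that tells the model what the function does, which arguments it takes, and what type each argument has.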

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a city. Returns temperature in Celsius and conditions.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "City name, e.g. 'Tokyo' or 'San Francisco'"
        },
        "units": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "Temperature units"
        }
      },
      "required": ["city"]
    }
  }
}

The description field is critical. The model reads it to decide when and how to use the tool. A vague description such as "gets weather" yields worse tool selection than a specific one such as "Get current weather for a city. Returns temperature in Celsius and conditions." The description is, in effect, the prompt for tool selection.

Provider Comparison

All major providers support function calling, but their API surfaces differ slightly.

Provider	API Parameter	Tool Call Format	Parallel Calls	Forced Calling
OpenAI (GPT-5, o4)	`tools`	`tool_calls[].function`	Yes (multiple per turn)	`tool_choice="required"`
Anthropic (Claude 4.6/4.7)	`tools`	`content[].type="tool_use"`	Yes (multiple blocks)	`tool_choice={"type":"any"}`
Google (Gemini 3)	`function_declarations`	`functionCall`	Yes	`function_calling_config`
Open-weight (Llama 4, Qwen3, DeepSeek-V3)	Native `tools` for Llama 4; Hermes or ChatML for the rest	Mixed	Model-dependent	Prompt-based, or `tool_choice` where supported

As of 2026, the three closed providers have converged on nearly identical JSON Schema-based formats. Llama 4 ships a native tools field that matches the OpenAI format. Open-weight fine-tuned models still vary in format; among them, NousResearch's Hermes format is the most common in third-party fine-tunes. For tools that must be shared across hosts, prefer MCP (Phase 11 · 14) over inline function calling: build one server and every client can use the same tools.

Tool Choice Modes: Auto, Required, Specific

You can control when the model uses a tool.

Auto (default): The model decides whether to call a tool or answer directly in text. It answers "What is 2+2?" right away and calls a tool for "What's the weather right now?"

Required: The model must call at least one tool. Use this when you already know the user's intent needs a tool. It prevents the model from guessing instead of looking up real data.

Specific function: Forces a call to one particular function. tool_choice={"type":"function", "function": {"name": "get_weather"}} guarantees that the weather tool is called regardless of the query. Use this in routing scenarios where upstream logic has already decided which tool to use.
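
The three modes can be sketched as a small helper that returns the `tool_choice` value in the OpenAI-style format shown above. `build_tool_choice` and its `mode` names are hypothetical, introduced only for illustration; other providers use different shapes, as the comparison table notes.

```python
def build_tool_choice(mode, function_name=None):
    """Return a tool_choice value in the OpenAI-style format used in this lesson.

    This helper is hypothetical and not part of any SDK.
    """
    if mode == "auto":
        return "auto"        # model decides: text answer or tool call
    if mode == "required":
        return "required"    # model must call at least one tool
    if mode == "specific":
        if not function_name:
            raise ValueError("function_name is required for mode='specific'")
        return {"type": "function", "function": {"name": function_name}}
    raise ValueError(f"Unknown mode: {mode}")


# A router that already knows the user's intent can force the weather tool.
print(build_tool_choice("auto"))
print(build_tool_choice("specific", "get_weather"))
```

Keeping the mode decision in one place makes it easier to switch from auto to a forced call once upstream routing logic exists.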

Parallel Function Calls

GPT-4o and Claude can call several functions in a single turn. If the user asks "What's the weather in Tokyo and New York?", the model emits two tool calls at once.

[
  {"name": "get_weather", "arguments": {"city": "Tokyo"}},
  {"name": "get_weather", "arguments": {"city": "New York"}}
]

Your code executes both calls, concurrently if possible. Return both results together and the model produces a combined answer in a single response. Round trips drop from two to one. For an agent that makes 5 to 10 tool calls per query, parallel calls alone can cut latency by 60 to 80%.
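
As a minimal sketch of concurrent execution, the example below runs two tool calls on a thread pool and returns the results in call order. `slow_get_weather` and `execute_parallel` are hypothetical names, and the 0.2-second sleep stands in for a real network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_get_weather(city):
    """Hypothetical stand-in for a network call that takes about 0.2 s."""
    time.sleep(0.2)
    return {"city": city, "temp_c": 18}


def execute_parallel(tool_calls, tools):
    """Run all tool calls concurrently and return results in call order."""
    def run_one(call):
        return tools[call["name"]](**call["arguments"])

    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        # pool.map preserves input order, so results line up with tool_calls.
        return list(pool.map(run_one, tool_calls))


calls = [
    {"name": "get_weather", "arguments": {"city": "Tokyo"}},
    {"name": "get_weather", "arguments": {"city": "New York"}},
]
start = time.perf_counter()
results = execute_parallel(calls, {"get_weather": slow_get_weather})
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")  # roughly one call's latency rather than two
```

Threads suit I/O-bound tools such as HTTP requests; CPU-bound tools would need a different strategy.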

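Structured Outputs vs. Function Calling

Lesson 03 covered structured outputs. Function calling uses the same JSON Schema mechanism but serves a different purpose.

Structured outputs: Force the model to produce data in a specific shape. The output itself is the final product. Extracting product information from text as {name, price, in_stock} is one example.

Function calling: The model declares its intent to perform an action. The output is only an intermediate step. For example, get_weather(city="Tokyo") is a request for action, not a final answer.

If the goal is data extraction, use structured outputs. If the model needs to interact with external systems, use function calling.

Security: The Non-Negotiable Rules

Function calling is the most dangerous capability you can give an LLM, because the model chooses what gets executed. If the tool set includes database queries, the model composes those queries. If it includes shell commands, the model writes the commands.

Rule 1: Never pass model-generated SQL straight to the database. The model can, and does, produce DROP TABLE statements, UNION injections, and queries that return every row. Always parameterize, always validate, and always restrict to allowlisted operations.

Rule 2: Use a function allowlist. The model should be able to call only the functions you have explicitly defined. Never build a generic tool that "runs any function by name." Even if you have 50 internal functions, expose only the 5 the user needs.

Rule 3: Validate arguments. The model may put "; DROP TABLE users; --" into a city name. Validate every argument against its expected type, range, and format before execution.

Rule 4: Sanitize tool results. If a tool returns sensitive data (API keys, personally identifiable information (PII), internal error messages, and so on), filter it out before returning the result to the model. The model may include tool results verbatim in its response.

Rule 5: Rate-limit tool calls. A model stuck in a loop can call tools hundreds of times. Set a maximum; 10 to 20 calls per conversation is reasonable. Infinite loops must be cut off.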
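
Rules 1 through 3 can be sketched together with the standard-library sqlite3 module. The table, the allowlists, and `safe_query` are hypothetical; the point is that identifiers come only from allowlists and the model-supplied value is always a bound parameter.

```python
import sqlite3

ALLOWED_TABLES = {"users"}            # Rule 2: explicit allowlist
ALLOWED_COLUMNS = {"name", "role"}
ALLOWED_OPERATORS = {"=", "!="}


def safe_query(conn, table, column, operator, value):
    """Build a query from validated parts; the value is always a bound parameter."""
    if table not in ALLOWED_TABLES:
        return {"error": True, "message": f"Table '{table}' is not allowed.", "code": "FORBIDDEN"}
    if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
        return {"error": True, "message": "Column or operator is not allowed.", "code": "FORBIDDEN"}
    if not isinstance(value, str) or len(value) > 100:   # Rule 3: type and size
        return {"error": True, "message": "Value must be a string of at most 100 characters.",
                "code": "INVALID_ARGUMENT"}
    # Identifiers were checked against the allowlists; only the value is untrusted.
    sql = f"SELECT name, role FROM {table} WHERE {column} {operator} ?"
    rows = conn.execute(sql, (value,)).fetchall()        # Rule 1: parameterized
    return {"rows": [{"name": n, "role": r} for n, r in rows]}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("Alice", "admin"), ("Bob", "user")])

print(safe_query(conn, "users", "role", "=", "admin"))
print(safe_query(conn, "users", "role", "=", "'; DROP TABLE users; --"))  # treated as plain text
print(safe_query(conn, "secrets", "role", "=", "admin"))                  # rejected by allowlist
```

The injection attempt matches no rows because the driver treats it as a literal string, and the table is still intact afterward.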

Error Handling

Tools fail. APIs time out, databases go down, and files turn out not to exist. The model needs to know that a tool failed and why.

Return errors as structured tool results, not as exceptions.

{
  "error": true,
  "message": "City 'Toky' not found. Did you mean 'Tokyo'?",
  "code": "CITY_NOT_FOUND"
}

The model reads this message, adjusts the arguments, and tries again. Models are good at self-correcting from structured error messages. Recovery is hard from an empty response or a generic error such as "something went wrong."
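
One way to apply this, sketched under assumptions, is a wrapper that converts any exception raised by a tool into a structured result. `safe_call`, `flaky_tool`, and the error codes are hypothetical names chosen for this example.

```python
import json


def safe_call(func, arguments):
    """Run a tool and convert any exception into a structured error result."""
    try:
        return func(**arguments)
    except TypeError as e:
        # Usually a missing or unexpected argument name.
        return {"error": True, "message": f"Invalid arguments: {e}", "code": "INVALID_ARGUMENTS"}
    except TimeoutError:
        return {"error": True, "message": "The tool timed out. Try again later.", "code": "TIMEOUT"}
    except Exception as e:
        # Keep the message short: no stack traces, no secrets.
        return {"error": True, "message": f"{type(e).__name__}: {e}", "code": "TOOL_FAILED"}


def flaky_tool(city):
    """Hypothetical tool that always times out."""
    raise TimeoutError("upstream took too long")


result = safe_call(flaky_tool, {"city": "Tokyo"})
print(json.dumps(result))
```

Because the result is ordinary JSON, it can be sent back to the model through the same path as a successful tool result.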

MCP: Model Context Protocol

MCP is an open standard for tool interoperability created by Anthropic. Instead of every application defining its own tools, MCP provides a universal protocol. MCP servers provide the tools, and clients such as Claude Code, Cursor, or your own application consume them as MCP clients.

A single MCP server can expose tools to any compatible client. A Postgres MCP server gives every compatible agent database access. A GitHub MCP server provides repository access. Define a tool once and reuse it anywhere.

MCP is to function calling what HTTP is to networking. It standardizes the transport layer and turns tools into portable resources.

Build It

Step 1: Define the Tool Registry

Create a registry that stores each tool's definition together with its implementation. Every tool has a JSON Schema definition that the model sees and a Python function that your code actually runs.

import json
import math
import time
import hashlib


TOOL_REGISTRY = {}


def register_tool(name, description, parameters, function):
    TOOL_REGISTRY[name] = {
        "definition": {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": parameters,
            },
        },
        "function": function,
    }

Step 2: Implement Five Tools

Build a calculator, a weather lookup, a simulated web search, a file reader, and a code runner, in that order.

def calculator(expression, precision=2):
    allowed = set("0123456789+-*/.() ")
    if not all(c in allowed for c in expression):
        return {"error": True, "message": f"Invalid characters in expression: {expression}"}
    try:
        result = eval(expression, {"__builtins__": {}}, {"math": math})
        return {"result": round(float(result), precision), "expression": expression}
    except Exception as e:
        return {"error": True, "message": str(e)}


WEATHER_DB = {
    "tokyo": {"temp_c": 18, "condition": "cloudy", "humidity": 72, "wind_kph": 14},
    "new york": {"temp_c": 22, "condition": "sunny", "humidity": 45, "wind_kph": 8},
    "london": {"temp_c": 12, "condition": "rainy", "humidity": 88, "wind_kph": 22},
    "san francisco": {"temp_c": 16, "condition": "foggy", "humidity": 80, "wind_kph": 18},
    "sydney": {"temp_c": 25, "condition": "sunny", "humidity": 55, "wind_kph": 10},
}


def get_weather(city, units="celsius"):
    key = city.lower().strip()
    if key not in WEATHER_DB:
        suggestions = [c for c in WEATHER_DB if c.startswith(key[:3])]
        return {
            "error": True,
            "message": f"City '{city}' not found.",
            "suggestions": suggestions,
            "code": "CITY_NOT_FOUND",
        }
    data = WEATHER_DB[key].copy()
    if units == "fahrenheit":
        data["temp_f"] = round(data["temp_c"] * 9 / 5 + 32, 1)
        del data["temp_c"]
    data["city"] = city
    return data


SEARCH_DB = {
    "python function calling": [
        {"title": "OpenAI Function Calling Guide", "url": "https://platform.openai.com/docs/guides/function-calling", "snippet": "Learn how to connect LLMs to external tools."},
        {"title": "Anthropic Tool Use", "url": "https://docs.anthropic.com/en/docs/tool-use", "snippet": "Claude can interact with external tools and APIs."},
    ],
    "MCP protocol": [
        {"title": "Model Context Protocol", "url": "https://modelcontextprotocol.io", "snippet": "An open standard for connecting AI models to data sources."},
    ],
    "weather API": [
        {"title": "OpenWeatherMap API", "url": "https://openweathermap.org/api", "snippet": "Free weather API with current, forecast, and historical data."},
    ],
}


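def web_search(query, max_results=3):
    key = query.lower().strip()
    for db_key, results in SEARCH_DB.items():
        # Compare case-insensitively: keys such as "MCP protocol" contain capitals,
        # so they would never match the lowercased query otherwise.
        db_key_lower = db_key.lower()
        if db_key_lower in key or key in db_key_lower:
            return {"query": query, "results": results[:max_results], "total": len(results)}
    return {"query": query, "results": [], "total": 0}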


FILE_SYSTEM = {
    "data/config.json": '{"model": "gpt-4o", "temperature": 0.7, "max_tokens": 4096}',
    "data/users.csv": "name,email,role\nAlice,alice@example.com,admin\nBob,bob@example.com,user",
    "README.md": "# My Project\nA tool-use agent built from scratch.",
}


def read_file(path):
    if ".." in path or path.startswith("/"):
        return {"error": True, "message": "Path traversal not allowed.", "code": "FORBIDDEN"}
    if path not in FILE_SYSTEM:
        available = list(FILE_SYSTEM.keys())
        return {"error": True, "message": f"File '{path}' not found.", "available_files": available, "code": "NOT_FOUND"}
    content = FILE_SYSTEM[path]
    return {"path": path, "content": content, "size_bytes": len(content), "lines": content.count("\n") + 1}


def run_code(code, language="python"):
    if language != "python":
        return {"error": True, "message": f"Language '{language}' not supported. Only 'python' is available."}
    forbidden = ["import os", "import sys", "import subprocess", "exec(", "eval(", "__import__", "open("]
    for pattern in forbidden:
        if pattern in code:
            return {"error": True, "message": f"Forbidden operation: {pattern}", "code": "SECURITY_VIOLATION"}
    try:
        local_vars = {}
        exec(code, {"__builtins__": {"print": print, "range": range, "len": len, "str": str, "int": int, "float": float, "list": list, "dict": dict, "sum": sum, "min": min, "max": max, "abs": abs, "round": round, "sorted": sorted, "enumerate": enumerate, "zip": zip, "map": map, "filter": filter, "math": math}}, local_vars)
        result = local_vars.get("result", None)
        return {"success": True, "result": result, "variables": {k: str(v) for k, v in local_vars.items() if not k.startswith("_")}}
    except Exception as e:
        return {"error": True, "message": f"{type(e).__name__}: {e}"}

Step 3: Register All Tools

def register_all_tools():
    register_tool(
        "calculator", "Evaluate a mathematical expression. Supports +, -, *, /, parentheses, and decimals. Returns the numeric result.",
        {"type": "object", "properties": {"expression": {"type": "string", "description": "Math expression, e.g. '(10 + 5) * 3'"}, "precision": {"type": "integer", "description": "Decimal places in result", "default": 2}}, "required": ["expression"]},
        calculator,
    )
    register_tool(
        "get_weather", "Get current weather for a city. Returns temperature, condition, humidity, and wind speed.",
        {"type": "object", "properties": {"city": {"type": "string", "description": "City name, e.g. 'Tokyo' or 'San Francisco'"}, "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature units, defaults to celsius"}}, "required": ["city"]},
        get_weather,
    )
    register_tool(
        "web_search", "Search the web for information. Returns a list of results with title, URL, and snippet.",
        {"type": "object", "properties": {"query": {"type": "string", "description": "Search query"}, "max_results": {"type": "integer", "description": "Maximum results to return", "default": 3}}, "required": ["query"]},
        web_search,
    )
    register_tool(
        "read_file", "Read the contents of a file. Returns the file content, size, and line count.",
        {"type": "object", "properties": {"path": {"type": "string", "description": "Relative file path, e.g. 'data/config.json'"}}, "required": ["path"]},
        read_file,
    )
    register_tool(
        "run_code", "Execute Python code in a sandboxed environment. Set a 'result' variable to return output.",
        {"type": "object", "properties": {"code": {"type": "string", "description": "Python code to execute"}, "language": {"type": "string", "enum": ["python"], "description": "Programming language"}}, "required": ["code"]},
        run_code,
    )

Step 4: Build the Function Calling Loop

This is the core engine. It simulates the model deciding which tool to call, executes the tool, and feeds the result back to the model.

def simulate_model_decision(user_message, tools, conversation_history):
    msg = user_message.lower()

    if any(word in msg for word in ["weather", "temperature", "forecast"]):
        cities = []
        for city in WEATHER_DB:
            if city in msg:
                cities.append(city)
        if not cities:
            for word in msg.split():
                if word.capitalize() in [c.title() for c in WEATHER_DB]:
                    cities.append(word)
        if not cities:
            cities = ["tokyo"]
        calls = []
        for city in cities:
            calls.append({"name": "get_weather", "arguments": {"city": city.title()}})
        return calls

    if any(word in msg for word in ["calculate", "compute", "math", "what is", "how much"]):
        # Keep only the characters the calculator accepts, so that
        # "calculate (100 + 250) * 0.15" becomes "(100 + 250) * 0.15"
        # rather than a lone operator token such as "+".
        expr = "".join(c for c in msg if c in "0123456789+-*/.() ").strip()
        if any(c.isdigit() for c in expr):
            return [{"name": "calculator", "arguments": {"expression": expr}}]
        return [{"name": "calculator", "arguments": {"expression": "0"}}]

    if any(word in msg for word in ["search", "find", "look up", "google"]):
        query = msg.replace("search for", "").replace("look up", "").replace("find", "").strip()
        return [{"name": "web_search", "arguments": {"query": query}}]

    if any(word in msg for word in ["read", "file", "open", "cat", "show"]):
        for path in FILE_SYSTEM:
            if path.split("/")[-1].split(".")[0] in msg:
                return [{"name": "read_file", "arguments": {"path": path}}]
        return [{"name": "read_file", "arguments": {"path": "README.md"}}]

    if any(word in msg for word in ["run", "execute", "code", "python"]):
        return [{"name": "run_code", "arguments": {"code": "result = 'Hello from the sandbox!'", "language": "python"}}]

    return []

def execute_tool_call(tool_call):
    name = tool_call["name"]
    args = tool_call["arguments"]

    if name not in TOOL_REGISTRY:
        return {"error": True, "message": f"Unknown tool: {name}", "code": "UNKNOWN_TOOL"}

    tool = TOOL_REGISTRY[name]
    func = tool["function"]
    start = time.time()

    try:
        result = func(**args)
    except TypeError as e:
        result = {"error": True, "message": f"Invalid arguments: {e}"}

    elapsed_ms = round((time.time() - start) * 1000, 2)
    return {"tool": name, "result": result, "execution_time_ms": elapsed_ms}


def run_function_calling_loop(user_message, max_iterations=5):
    conversation = [{"role": "user", "content": user_message}]
    tool_definitions = [t["definition"] for t in TOOL_REGISTRY.values()]
    all_tool_results = []

    for iteration in range(max_iterations):
        tool_calls = simulate_model_decision(user_message, tool_definitions, conversation)

        if not tool_calls:
            break

        results = []
        for call in tool_calls:
            result = execute_tool_call(call)
            results.append(result)

        conversation.append({"role": "assistant", "content": None, "tool_calls": tool_calls})

        for result in results:
            conversation.append({"role": "tool", "content": json.dumps(result["result"]), "tool_name": result["tool"]})

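        all_tool_results.extend(results)
        # The simulated model cannot read tool results, so stop after one round.
        # With a real model, keep looping until it returns no tool calls.
        break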

    return {"conversation": conversation, "tool_results": all_tool_results, "iterations": iteration + 1 if tool_calls else 0}
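
The simulated loop above stops after one round. As a rough sketch of the multi-turn shape, the example below uses a scripted stand-in for the model that looks at the tool results already in the conversation before deciding the next step. `scripted_model`, `TOOLS`, and `agent_loop` are hypothetical and deliberately simpler than the registry built in this lesson.

```python
import json


def scripted_model(conversation):
    """Hypothetical stand-in for an LLM: picks the next step from the history."""
    tool_messages = [m for m in conversation if m["role"] == "tool"]
    if len(tool_messages) == 0:
        return {"tool_calls": [{"name": "read_file", "arguments": {"path": "data/config.json"}}]}
    if len(tool_messages) == 1:
        model = json.loads(tool_messages[0]["content"])["model"]
        return {"tool_calls": [{"name": "web_search", "arguments": {"query": f"{model} pricing"}}]}
    return {"content": "Done: read the config, then searched for pricing."}


TOOLS = {
    "read_file": lambda path: {"model": "gpt-4o"},
    "web_search": lambda query: {"query": query, "results": []},
}


def agent_loop(user_message, model, tools, max_iterations=10):
    conversation = [{"role": "user", "content": user_message}]
    for iteration in range(1, max_iterations + 1):
        decision = model(conversation)
        if "tool_calls" not in decision:      # the model produced a final answer
            return {"answer": decision["content"], "iterations": iteration,
                    "conversation": conversation}
        conversation.append({"role": "assistant", "content": None,
                             "tool_calls": decision["tool_calls"]})
        for call in decision["tool_calls"]:
            result = tools[call["name"]](**call["arguments"])
            conversation.append({"role": "tool", "tool_name": call["name"],
                                 "content": json.dumps(result)})
    # Guard against infinite tool loops.
    return {"answer": None, "iterations": max_iterations,
            "conversation": conversation, "error": "max_iterations reached"}


outcome = agent_loop("Which model is configured, and what does it cost?", scripted_model, TOOLS)
print(outcome["answer"], outcome["iterations"])
```

The second tool call depends on the first tool's result, which is what distinguishes a chained loop from a single round of parallel calls.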

Step 5: Validate Arguments

Build a validator that checks, before execution, whether the tool-call arguments match the JSON Schema.

def validate_tool_arguments(tool_name, arguments):
    if tool_name not in TOOL_REGISTRY:
        return [f"Unknown tool: {tool_name}"]

    schema = TOOL_REGISTRY[tool_name]["definition"]["function"]["parameters"]
    errors = []

    if not isinstance(arguments, dict):
        return [f"Arguments must be an object, got {type(arguments).__name__}"]

    for required_field in schema.get("required", []):
        if required_field not in arguments:
            errors.append(f"Missing required argument: {required_field}")

    properties = schema.get("properties", {})
    for arg_name, arg_value in arguments.items():
        if arg_name not in properties:
            errors.append(f"Unknown argument: {arg_name}")
            continue

        prop_schema = properties[arg_name]
        expected_type = prop_schema.get("type")

        type_checks = {"string": str, "integer": int, "number": (int, float), "boolean": bool, "array": list, "object": dict}
        if expected_type in type_checks:
            if not isinstance(arg_value, type_checks[expected_type]):
                errors.append(f"Argument '{arg_name}': expected {expected_type}, got {type(arg_value).__name__}")

        if "enum" in prop_schema and arg_value not in prop_schema["enum"]:
            errors.append(f"Argument '{arg_name}': '{arg_value}' not in {prop_schema['enum']}")

    return errors
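
One subtlety worth knowing when checking JSON Schema types with isinstance: in Python, bool is a subclass of int, so True passes an integer check. The helper below is a hypothetical, stricter variant that excludes booleans from "integer" and "number"; whether that strictness is needed depends on the tools.

```python
def check_type(value, expected_type):
    """Return True if value matches a JSON Schema primitive type name."""
    if expected_type == "boolean":
        return isinstance(value, bool)
    if expected_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if expected_type == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    mapping = {"string": str, "array": list, "object": dict}
    if expected_type in mapping:
        return isinstance(value, mapping[expected_type])
    return True  # unknown type names are not checked in this sketch


print(isinstance(True, int))          # True
print(check_type(True, "integer"))    # False
print(check_type(3, "integer"))       # True
```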

Step 6: Run the Demo

def run_demo():
    register_all_tools()

    print("=" * 60)
    print("  Function Calling & Tool Use Demo")
    print("=" * 60)

    print("\n--- Registered Tools ---")
    for name, tool in TOOL_REGISTRY.items():
        desc = tool["definition"]["function"]["description"][:60]
        params = list(tool["definition"]["function"]["parameters"].get("properties", {}).keys())
        print(f"  {name}: {desc}...")
        print(f"    params: {params}")

    print(f"\n--- Argument Validation ---")
    validation_tests = [
        ("get_weather", {"city": "Tokyo"}, "Valid call"),
        ("get_weather", {}, "Missing required arg"),
        ("get_weather", {"city": "Tokyo", "units": "kelvin"}, "Invalid enum value"),
        ("calculator", {"expression": 123}, "Wrong type (int for string)"),
        ("unknown_tool", {"x": 1}, "Unknown tool"),
    ]
    for tool_name, args, label in validation_tests:
        errors = validate_tool_arguments(tool_name, args)
        status = "VALID" if not errors else f"ERRORS: {errors}"
        print(f"  {label}: {status}")

    print(f"\n--- Tool Execution ---")
    direct_tests = [
        {"name": "calculator", "arguments": {"expression": "(10 + 5) * 3 / 2"}},
        {"name": "get_weather", "arguments": {"city": "Tokyo"}},
        {"name": "get_weather", "arguments": {"city": "Mars"}},
        {"name": "web_search", "arguments": {"query": "python function calling"}},
        {"name": "read_file", "arguments": {"path": "data/config.json"}},
        {"name": "read_file", "arguments": {"path": "../etc/passwd"}},
        {"name": "run_code", "arguments": {"code": "result = sum(range(1, 101))"}},
        {"name": "run_code", "arguments": {"code": "import os; os.system('rm -rf /')"}},
    ]
    for call in direct_tests:
        result = execute_tool_call(call)
        print(f"\n  {call['name']}({json.dumps(call['arguments'])})")
        print(f"    -> {json.dumps(result['result'], indent=None)[:100]}")
        print(f"    time: {result['execution_time_ms']}ms")

    print(f"\n--- Full Function Calling Loop ---")
    test_queries = [
        "What's the weather in Tokyo?",
        "Calculate (100 + 250) * 0.15",
        "Search for MCP protocol",
        "Read the config file",
        "Run some Python code",
        "Tell me a joke",
    ]
    for query in test_queries:
        print(f"\n  User: {query}")
        result = run_function_calling_loop(query)
        if result["tool_results"]:
            for tr in result["tool_results"]:
                print(f"    Tool: {tr['tool']} ({tr['execution_time_ms']}ms)")
                print(f"    Result: {json.dumps(tr['result'], indent=None)[:90]}")
        else:
            print(f"    [No tool called -- direct response]")
        print(f"    Iterations: {result['iterations']}")

    print(f"\n--- Parallel Tool Calls ---")
    multi_city_query = "What's the weather in tokyo and london?"
    print(f"  User: {multi_city_query}")
    result = run_function_calling_loop(multi_city_query)
    print(f"  Tool calls made: {len(result['tool_results'])}")
    for tr in result["tool_results"]:
        city = tr["result"].get("city", "unknown")
        temp = tr["result"].get("temp_c", "N/A")
        print(f"    {city}: {temp}C, {tr['result'].get('condition', 'N/A')}")

    print(f"\n--- Security Checks ---")
    security_tests = [
        ("read_file", {"path": "../../etc/passwd"}),
        ("run_code", {"code": "import subprocess; subprocess.run(['ls'])"}),
        ("calculator", {"expression": "__import__('os').system('ls')"}),
    ]
    for tool_name, args in security_tests:
        result = execute_tool_call({"name": tool_name, "arguments": args})
        blocked = result["result"].get("error", False)
        print(f"  {tool_name}({list(args.values())[0][:40]}): {'BLOCKED' if blocked else 'ALLOWED'}")

Use It

OpenAI Function Calling

# from openai import OpenAI
#
# client = OpenAI()
#
# tools = [{
#     "type": "function",
#     "function": {
#         "name": "get_weather",
#         "description": "Get current weather for a city",
#         "parameters": {
#             "type": "object",
#             "properties": {
#                 "city": {"type": "string"},
#                 "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
#             },
#             "required": ["city"]
#         }
#     }
# }]
#
# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "Weather in Tokyo?"}],
#     tools=tools,
#     tool_choice="auto",
# )
#
# tool_call = response.choices[0].message.tool_calls[0]
# args = json.loads(tool_call.function.arguments)
# result = get_weather(**args)
#
# final = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[
#         {"role": "user", "content": "Weather in Tokyo?"},
#         response.choices[0].message,
#         {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)},
#     ],
# )
# print(final.choices[0].message.content)

OpenAI returns tool calls in response.choices[0].message.tool_calls. Each call carries an id that you must include when returning the result; the model uses this ID to match results to calls. GPT-4o can return several tool calls in a single response, so your code must iterate over all of them and execute each one.
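
Iterating over every tool call might look like the sketch below. It uses plain dicts as a stand-in for the SDK's response objects, so the field access syntax differs from the real client; `build_tool_messages`, `AVAILABLE`, and the local `get_weather` are hypothetical.

```python
import json


def get_weather(city, units="celsius"):
    """Hypothetical local tool implementation."""
    return {"city": city, "temp_c": 18, "condition": "cloudy"}


AVAILABLE = {"get_weather": get_weather}


def build_tool_messages(tool_calls):
    """Execute every tool call and return one role='tool' message per call."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        try:
            args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
            result = AVAILABLE[name](**args)
        except (json.JSONDecodeError, KeyError, TypeError) as e:
            result = {"error": True, "message": f"{type(e).__name__}: {e}"}
        # tool_call_id ties each result back to the call that requested it.
        messages.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})
    return messages


# Dict stand-in for response.choices[0].message.tool_calls (the SDK returns objects).
fake_tool_calls = [
    {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
    {"id": "call_2", "function": {"name": "get_weather", "arguments": '{"city": "New York"}'}},
]
for message in build_tool_messages(fake_tool_calls):
    print(message)
```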

Anthropic Tool Use

# import anthropic
#
# client = anthropic.Anthropic()
#
# response = client.messages.create(
#     model="claude-sonnet-4-20250514",
#     max_tokens=1024,
#     tools=[{
#         "name": "get_weather",
#         "description": "Get current weather for a city",
#         "input_schema": {
#             "type": "object",
#             "properties": {
#                 "city": {"type": "string"},
#                 "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
#             },
#             "required": ["city"]
#         }
#     }],
#     messages=[{"role": "user", "content": "Weather in Tokyo?"}],
# )
#
# tool_block = next(b for b in response.content if b.type == "tool_use")
# result = get_weather(**tool_block.input)
#
# final = client.messages.create(
#     model="claude-sonnet-4-20250514",
#     max_tokens=1024,
#     tools=[...],
#     messages=[
#         {"role": "user", "content": "Weather in Tokyo?"},
#         {"role": "assistant", "content": response.content},
#         {"role": "user", "content": [{"type": "tool_result", "tool_use_id": tool_block.id, "content": json.dumps(result)}]},
#     ],
# )

Anthropic returns tool calls as content blocks with type: "tool_use". Tool results go back in a user message as blocks with type: "tool_result". One key difference to remember: Anthropic uses the input_schema key for the tool's parameter definition, while OpenAI uses parameters.
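
Because both formats carry the same JSON Schema, a definition written in the OpenAI-style shape can be converted mechanically. The converter below is a hypothetical helper covering only the fields shown in this lesson.

```python
def openai_to_anthropic_tool(openai_tool):
    """Convert an OpenAI-style tool definition into the Anthropic-style shape."""
    fn = openai_tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],   # same JSON Schema, different key
    }


openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
print(openai_to_anthropic_tool(openai_tool))
```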

MCP Integration

# MCP servers expose tools over a standardized protocol.
# Any MCP-compatible client can discover and call these tools.
#
# Example: connecting to a Postgres MCP server
#
# from mcp import ClientSession, StdioServerParameters
# from mcp.client.stdio import stdio_client
#
# server_params = StdioServerParameters(
#     command="npx",
#     args=["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"],
# )
#
# async with stdio_client(server_params) as (read, write):
#     async with ClientSession(read, write) as session:
#         await session.initialize()
#         tools = await session.list_tools()
#         result = await session.call_tool("query", {"sql": "SELECT count(*) FROM users"})

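MCP decouples tool implementation from tool use. The Postgres server knows SQL; the GitHub server knows the GitHub API. Your agent only discovers and calls tools, with no need to write provider-specific code for each integration.

Ship It

This lesson produces outputs/prompt-tool-designer.md, a reusable prompt template: describe in natural language what a tool should do, and it generates a complete JSON Schema tool definition with descriptions, types, and constraints.

It also produces outputs/skill-function-calling-patterns.md, a decision framework for implementing function calling in production that covers tool design, error handling, security, and provider-specific patterns.

Exercises

Add a database query as a sixth tool. Implement a fake SQL tool backed by in-memory tables. The tool accepts only a table name and filter conditions, not raw SQL. Validate that the table name is on the allowlist and that filter operators are limited to =, >, <, >=, and <=, then return the matching rows as JSON. (Difficulty: easy)
Implement retry with error feedback. When a tool call fails (for example, city not found), feed the error message back into the model decision function so it can correct its arguments. Track how many times each call is retried and cap retries at 3 per tool call. (Difficulty: medium)
Build a multi-step agent. Some queries require chaining tool calls, such as "Read the config file, tell me which model is configured, then search the web for that model's pricing." Pass the accumulated results to each subsequent decision step and loop until the model decides no more tools are needed. Cap the loop at 10 iterations to prevent infinite loops. (Difficulty: hard)
Measure tool selection accuracy. Create 30 test queries with expected tool names, run the decision function on all of them, and measure the fraction that selected the correct tool. Analyze which queries cause the most confusion between tools. (Difficulty: medium)
Implement tool-call caching. If the same tool is called again with identical arguments within 60 seconds, return the cached result instead of re-executing. Use a dictionary keyed by (tool_name, frozenset(args.items())) and measure the cache hit rate over a 20-query conversation. (Difficulty: medium)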

Key Terms

Term	Common description	What it actually means
Function Calling	"Tool use"	The model outputs structured JSON stating which function to call with which arguments. Your code executes it, not the model.
Tool Definition	"Function schema"	A JSON Schema object describing a tool's name, purpose, parameters, and types. The model reads it to decide when and how to use the tool.
Tool Choice	"Calling mode"	A setting that controls whether the model must call a tool (required), may call one (auto), or must call a specific tool (named).
Parallel Calling	"Multi-tool"	The model emits several tool calls in one turn to reduce round trips. Both GPT-4o and Claude support it.
Tool Result	"Function output"	The return value of an executed tool, passed back as a message so the model can use real data in its response.
Argument Validation	"Input checking"	Checking, before execution, that model-generated arguments match the expected types, ranges, and constraints.
MCP (Model Context Protocol)	"Tool protocol"	Anthropic's open standard for exposing tools through servers so that compatible clients can discover and call them.
Agent Loop	"ReAct loop"	A cycle in which the model chooses a tool, code executes it, and the result goes back to the model, repeated until enough information has been gathered.
Tool Poisoning	"Prompt injection through tools"	An attack in which instructions that manipulate the model's behavior are embedded in tool results. All tool output must be sanitized.
Rate Limiting	"Call budget"	A cap on tool calls per conversation that prevents infinite loops and runaway API costs.

Further Reading

OpenAI Function Calling Guide: the official reference covering tool use with GPT-4o, parallel calls, forced calls, and structured arguments.
Anthropic Tool Use Guide: explains Claude's input_schema, multi-tool responses, and tool_choice settings.
Model Context Protocol Specification: the open standard specification for tool interoperability across AI applications.
Schick et al., 2023, "Toolformer: Language Models Can Teach Themselves to Use Tools": the foundational paper in which an LLM learns by itself when and how to call external tools.
Patil et al., 2023, "Gorilla: Large Language Model Connected with Massive APIs": fine-tunes an LLM for accurate calls and reduced hallucination across 1,645 APIs.
Berkeley Function Calling Leaderboard: a live benchmark comparing the function calling accuracy of GPT-4o, Claude, Gemini, and open models.
Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (ICLR 2023): proposes the Thought-Action-Observation loop that wraps every tool call. Phase 14 picks up where this lesson ends.
Anthropic, "Building effective agents" (Dec 2024): describes five patterns composed from the single tool-use primitive: prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer.


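Review Questions

3 questions · Answer all of them correctly to mark the lesson complete

1. What is the standard pattern for a multi-turn function calling loop?

2. How do you prevent an infinite tool calling loop?

3. Why do clear parameter names and descriptions matter in a tool schema?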
