Agent skill manual for building Model Context Protocol servers, defining tools, and writing evaluation suites Claude Skills can rely on.

Source: Content adapted from anthropics/skills (MIT).

Overview

To create high-quality MCP (Model Context Protocol) servers that enable LLMs to effectively interact with external services, use this skill. An MCP server provides tools that allow LLMs to access external services and APIs. The quality of an MCP server is measured by how well it enables LLMs to accomplish real-world tasks using the tools provided.

Process

High-Level Workflow

Creating a high-quality MCP server involves four main phases:

Phase 1: Deep Research and Planning

1.1 Understand Agent-Centric Design Principles

Before diving into implementation, understand how to design tools for AI agents by reviewing these principles:

Build for Workflows, Not Just API Endpoints:

Don't simply wrap existing API endpoints - build thoughtful, high-impact workflow tools
Consolidate related operations (e.g., schedule_event that both checks availability and creates event)
Focus on tools that enable complete tasks, not just individual API calls
Consider what workflows agents actually need to accomplish
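As a rough illustration of consolidating related operations, the sketch below folds the availability check and the event creation into one schedule_event call. The in-memory CALENDAR list and the Event dataclass are hypothetical stand-ins for a real calendar API, not part of any SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    title: str
    start: datetime
    end: datetime


# Hypothetical in-memory calendar standing in for a real calendar API.
CALENDAR: list[Event] = []


def schedule_event(title: str, start: datetime, duration_minutes: int) -> str:
    """Check availability and create the event in a single tool call."""
    end = start + timedelta(minutes=duration_minutes)

    # Step 1: availability check (a thin API wrapper would expose this separately).
    for existing in CALENDAR:
        if start < existing.end and existing.start < end:
            return (
                f"Error: '{existing.title}' already occupies "
                f"{existing.start:%H:%M}-{existing.end:%H:%M}. "
                f"Try a start time at or after {existing.end:%H:%M}."
            )

    # Step 2: create the event only when the slot is free.
    CALENDAR.append(Event(title, start, end))
    return f"Scheduled '{title}' from {start:%Y-%m-%d %H:%M} to {end:%H:%M}."
```

With this shape the agent completes the task in one call, and a conflict comes back with a concrete suggestion instead of requiring a second lookup.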

Optimize for Limited Context:

Agents have constrained context windows - make every token count
Return high-signal information, not exhaustive data dumps
Provide "concise" vs "detailed" response format options
Default to human-readable identifiers over technical codes (names over IDs)
Consider the agent's context budget as a scarce resource
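One way to offer "concise" and "detailed" options is a single formatter with a response_format parameter. This is a minimal sketch; the ticket fields (title, assignee_name, status) and the function name are assumptions chosen for illustration.

```python
import json
from typing import Any


def format_ticket(ticket: dict[str, Any], response_format: str = "concise") -> str:
    """Render a ticket either as a one-line summary or as full JSON."""
    if response_format == "concise":
        # Names over IDs: lead with what a human would call the ticket.
        return (
            f"{ticket['title']} "
            f"(assignee: {ticket['assignee_name']}, status: {ticket['status']})"
        )
    if response_format == "detailed":
        # Full record for the cases where the agent needs every field.
        return json.dumps(ticket, indent=2, sort_keys=True)
    raise ValueError(
        "response_format must be 'concise' or 'detailed', got " + repr(response_format)
    )
```

Defaulting to the concise form keeps the common case cheap in tokens while leaving the full record one parameter away.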

Design Actionable Error Messages:

Error messages should guide agents toward correct usage patterns
Suggest specific next steps: "Try using filter='active_only' to reduce results"
Make errors educational, not just diagnostic
Help agents learn proper tool usage through clear feedback
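The sketch below shows what an educational error might look like in practice: each failure names the problem and proposes a next step. The list_projects tool, its status_filter parameter, and the MAX_RESULTS cap are hypothetical.

```python
MAX_RESULTS = 50  # hypothetical cap on rows returned to the agent


def list_projects(projects: list[dict], status_filter: str = "all") -> str:
    """Return project names, or an error that tells the agent what to try next."""
    valid = ("all", "active_only")
    if status_filter not in valid:
        return (
            f"Error: status_filter={status_filter!r} is not recognized. "
            f"Use one of: {', '.join(valid)}."
        )

    rows = [p for p in projects if status_filter == "all" or p["active"]]
    if len(rows) > MAX_RESULTS:
        if status_filter == "all":
            hint = "Try using status_filter='active_only' to reduce results."
        else:
            hint = "Ask the user which projects matter and look them up individually."
        return (
            f"Error: {len(rows)} projects matched, which exceeds the limit "
            f"of {MAX_RESULTS}. {hint}"
        )

    return "\n".join(p["name"] for p in rows)
```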

Follow Natural Task Subdivisions:

Tool names should reflect how humans think about tasks
Group related tools with consistent prefixes for discoverability
Design tools around natural workflows, not just API structure

Use Evaluation-Driven Development:

Create realistic evaluation scenarios early
Let agent feedback drive tool improvements
Prototype quickly and iterate based on actual agent performance

1.3 Study MCP Protocol Documentation

Fetch the latest MCP protocol documentation:

Use WebFetch to load: https://modelcontextprotocol.io/llms-full.txt

This comprehensive document contains the complete MCP specification and guidelines.

1.4 Study Framework Documentation

Load and read the following reference files:

MCP Best Practices - Core guidelines for all MCP servers

For Python implementations, also load:

Python SDK Documentation: Use WebFetch to load https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md
Python Implementation Guide - Python-specific best practices and examples

For Node/TypeScript implementations, also load:

TypeScript SDK Documentation: Use WebFetch to load https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md
TypeScript Implementation Guide - Node/TypeScript-specific best practices and examples

1.5 Exhaustively Study API Documentation

To integrate a service, read through ALL available API documentation:

Official API reference documentation
Authentication and authorization requirements
Rate limiting and pagination patterns
Error responses and status codes
Available endpoints and their parameters
Data models and schemas

To gather comprehensive information, use web search and the WebFetch tool as needed.

1.6 Create a Comprehensive Implementation Plan

Based on your research, create a detailed plan that includes:

Tool Selection:

List the most valuable endpoints/operations to implement
Prioritize tools that enable the most common and important use cases
Consider which tools work together to enable complex workflows

Shared Utilities and Helpers:

Identify common API request patterns
Plan pagination helpers
Design filtering and formatting utilities
Plan error handling strategies

Input/Output Design:

Define input validation models (Pydantic for Python, Zod for TypeScript)
Design consistent response formats (e.g., JSON or Markdown), and configurable levels of detail (e.g., Detailed or Concise)
Plan for large-scale usage (thousands of users/resources)
Implement response size limits and truncation strategies (e.g., a cap of 25,000 tokens)
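A truncation strategy could look like the following sketch. It assumes the limit is enforced in characters through a module-level CHARACTER_LIMIT constant (the constant name appears later in this guide; the value and the wording of the notice are illustrative).

```python
CHARACTER_LIMIT = 25_000  # assumed value, measured in characters


def truncate_response(text: str, limit: int = CHARACTER_LIMIT) -> str:
    """Cut an oversized response at a line boundary and explain how to get the rest."""
    if len(text) <= limit:
        return text

    notice = (
        f"\n[Truncated at {limit} characters. "
        "Narrow the query with filters or request the next page.]"
    )
    # Reserve room for the notice so the final string stays within the limit
    # (for limits shorter than the notice itself, only the notice is returned).
    keep = max(limit - len(notice), 0)
    # Prefer cutting at the last complete line inside the budget.
    cut = text.rfind("\n", 0, keep)
    if cut <= 0:
        cut = keep
    return text[:cut] + notice
```

Ending the output with an instruction rather than a silent cut gives the agent a way to recover the missing data.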

Error Handling Strategy:

Plan graceful failure modes
Design clear, actionable, LLM-friendly error messages in natural language that prompt further action
Consider rate limiting and timeout scenarios
Handle authentication and authorization errors
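The failure modes listed above can be planned as a single mapping from HTTP status to guidance. This is a sketch under the assumption that the upstream service uses conventional status codes; describe_api_error and its message wording are illustrative.

```python
from typing import Optional


def describe_api_error(status: int, retry_after: Optional[int] = None) -> str:
    """Translate an HTTP status code into a message the agent can act on."""
    if status == 401:
        return (
            "Error: authentication failed. "
            "Check that the API token is set and has not expired."
        )
    if status == 403:
        return (
            "Error: permission denied. "
            "The token lacks access to this resource; ask the user to grant access."
        )
    if status == 404:
        return (
            "Error: resource not found. "
            "Verify the identifier, or use a search tool to look it up by name."
        )
    if status == 429:
        wait = f"{retry_after} seconds" if retry_after is not None else "a short while"
        return (
            f"Error: rate limit exceeded. Wait {wait} before retrying, "
            "or request fewer items per call."
        )
    if 500 <= status < 600:
        return (
            f"Error: the service failed (HTTP {status}). "
            "Retry once; if it persists, report the outage to the user."
        )
    return f"Error: unexpected HTTP status {status}."
```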

Phase 2: Implementation

Now that you have a comprehensive plan, begin implementation following language-specific best practices.

2.1 Set Up Project Structure

For Python:

Create a single .py file or organize into modules if complex (see Python Guide)
Use the MCP Python SDK for tool registration
Define Pydantic models for input validation

For Node/TypeScript:

Create proper project structure (see TypeScript Guide)
Set up package.json and tsconfig.json
Use MCP TypeScript SDK
Define Zod schemas for input validation

2.2 Implement Core Infrastructure First

To begin implementation, create shared utilities before implementing tools:

API request helper functions
Error handling utilities
Response formatting functions (JSON and Markdown)
Pagination helpers
Authentication/token management
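As one example of a shared utility, a pagination helper might look like this sketch. The limit/offset parameters and the keys of the returned dictionary (has_more, next_offset, and so on) are assumptions, not a format mandated by MCP.

```python
from typing import Any


def paginate(items: list[Any], limit: int = 20, offset: int = 0) -> dict[str, Any]:
    """Slice a result list and report whether more results remain."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")

    page = items[offset : offset + limit]
    next_offset = offset + len(page)
    has_more = next_offset < len(items)
    return {
        "items": page,
        "count": len(page),
        "total": len(items),
        "offset": offset,
        "has_more": has_more,
        # Only offer a next offset when there is something left to fetch.
        "next_offset": next_offset if has_more else None,
    }
```

Because every tool calls the same helper, the agent sees the same pagination fields everywhere and can learn the pattern once.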

2.3 Implement Tools Systematically

For each tool in the plan:

Define Input Schema:

Use Pydantic (Python) or Zod (TypeScript) for validation
Include proper constraints (min/max length, regex patterns, min/max values, ranges)
Provide clear, descriptive field descriptions
Include diverse examples in field descriptions
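In a real server these constraints would be declared with Pydantic or Zod. The dependency-free sketch below only illustrates the kind of JSON Schema such a model expresses (length limits, a regex pattern, a numeric range, descriptions with examples), plus a deliberately small hand-rolled checker. The search_issues field names are hypothetical, and the checker is not a full JSON Schema validator.

```python
import re
from typing import Any

# Hypothetical input schema for a "search_issues" tool.
SEARCH_ISSUES_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "minLength": 2,
            "maxLength": 200,
            "description": "Text to search for, e.g. 'billing outage' or 'Q3 roadmap'.",
        },
        "project_key": {
            "type": "string",
            "pattern": "^[A-Z]{2,10}$",
            "description": "Upper-case project key, e.g. 'OPS' or 'WEB'.",
        },
        "limit": {
            "type": "integer",
            "minimum": 1,
            "maximum": 100,
            "description": "Maximum results to return, e.g. 10 or 50.",
        },
    },
    "required": ["query"],
    "additionalProperties": False,
}


def validate_arguments(
    args: dict[str, Any], schema: dict[str, Any] = SEARCH_ISSUES_SCHEMA
) -> list[str]:
    """Return a list of problems; an empty list means the arguments are valid."""
    problems: list[str] = []
    properties = schema["properties"]
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"'{name}' is required. {properties[name]['description']}")
    for name, value in args.items():
        rule = properties.get(name)
        if rule is None:
            problems.append(f"'{name}' is not a known field.")
        elif rule["type"] == "string":
            if not isinstance(value, str):
                problems.append(f"'{name}' must be a string.")
            elif not rule.get("minLength", 0) <= len(value) <= rule.get("maxLength", len(value)):
                problems.append(f"'{name}' has an invalid length. {rule['description']}")
            elif "pattern" in rule and not re.search(rule["pattern"], value):
                problems.append(f"'{name}' has an invalid format. {rule['description']}")
        elif rule["type"] == "integer":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"'{name}' must be an integer.")
            elif not rule.get("minimum", value) <= value <= rule.get("maximum", value):
                problems.append(f"'{name}' is out of range. {rule['description']}")
    return problems
```

Echoing the field description in each problem message reuses the examples from the schema as guidance for the next attempt.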

Write Comprehensive Docstrings/Descriptions:

One-line summary of what the tool does
Detailed explanation of purpose and functionality
Explicit parameter types with examples
Complete return type schema
Usage examples (when to use, when not to use)
Error handling documentation that outlines how to proceed after specific errors

Implement Tool Logic:

Use shared utilities to avoid code duplication
Follow async/await patterns for all I/O
Implement proper error handling
Support multiple response formats (JSON and Markdown)
Respect pagination parameters
Check character limits and truncate appropriately

Add Tool Annotations:

readOnlyHint: true (for read-only operations)
destructiveHint: false (for non-destructive operations)
idempotentHint: true (if repeated calls have same effect)
openWorldHint: true (if interacting with external systems)
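The four hints above can be kept consistent with a small builder. This sketch only assembles a plain dictionary using the hint names listed; how the dictionary is attached to a tool depends on the SDK in use, and the tool_annotations helper itself is hypothetical.

```python
def tool_annotations(
    *,
    read_only: bool,
    destructive: bool = False,
    idempotent: bool = False,
    open_world: bool = True,
) -> dict[str, bool]:
    """Build the annotation hints for one tool as a plain dictionary."""
    if read_only and destructive:
        # A tool that never writes cannot also destroy data.
        raise ValueError("a read-only tool cannot be marked destructive")
    return {
        "readOnlyHint": read_only,
        "destructiveHint": destructive,
        "idempotentHint": idempotent,
        "openWorldHint": open_world,
    }


# A search tool: reads from an external service, safe to repeat.
SEARCH_ANNOTATIONS = tool_annotations(read_only=True, idempotent=True)

# A delete tool: modifies data destructively; repeating it has no further effect.
DELETE_ANNOTATIONS = tool_annotations(read_only=False, destructive=True, idempotent=True)
```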

2.4 Follow Language-Specific Best Practices

At this point, load the appropriate language guide:

For Python: Load Python Implementation Guide and ensure the following:

Using MCP Python SDK with proper tool registration
Pydantic v2 models with model_config
Type hints throughout
Async/await for all I/O operations
Proper imports organization
Module-level constants (CHARACTER_LIMIT, API_BASE_URL)

For Node/TypeScript: Load TypeScript Implementation Guide and ensure the following:

Using server.registerTool properly
Zod schemas with .strict()
TypeScript strict mode enabled
No any types - use proper types
Explicit Promise<T> return types
Build process configured (npm run build)

Phase 3: Review and Refine

After initial implementation:

3.1 Code Quality Review

To ensure quality, review the code for:

DRY Principle: No duplicated code between tools
Composability: Shared logic extracted into functions
Consistency: Similar operations return similar formats
Error Handling: All external calls have error handling
Type Safety: Full type coverage (Python type hints, TypeScript types)
Documentation: Every tool has comprehensive docstrings/descriptions

3.2 Test and Build

Important: MCP servers are long-running processes that wait for requests over stdio or SSE/HTTP. Running one directly in your main process (e.g., python server.py or node dist/index.js) will block that process indefinitely.

Safe ways to test the server:

Use the evaluation harness (see Phase 4) - recommended approach
Run the server in tmux to keep it outside your main process
Use a timeout when testing: timeout 5s python server.py
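The timeout approach can also be scripted. The sketch below mirrors timeout 5s python server.py using only the standard library: for a long-running server, still being alive when the timeout expires is the healthy outcome, while an early exit usually means a crash at startup. The smoke_test name and its return strings are illustrative.

```python
import subprocess
import sys


def smoke_test(command: list[str], seconds: float = 5.0) -> str:
    """Start a server, wait briefly, and report whether it stayed up."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,  # keep stdin open so a stdio server does not see EOF
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Expected path for a healthy server: stop it and reap the process.
        proc.kill()
        proc.communicate()
        return "ok: still running after timeout"

    # The process ended on its own, which a server should not do.
    _, stderr = proc.communicate()
    return f"exited early with code {proc.returncode}: {stderr.strip()}"
```

A call such as smoke_test([sys.executable, "server.py"]) returns within the timeout either way, so the calling process never hangs.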

For Python:

Verify Python syntax: python -m py_compile your_server.py
Check imports work correctly by reviewing the file
To manually test: Run server in tmux, then test with evaluation harness in main process
Or use the evaluation harness directly (it manages the server for stdio transport)
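The syntax check can be run from Python as well as from the shell. This sketch wraps the standard library's py_compile module; check_syntax is a hypothetical helper name, and the bytecode is written to a scratch directory so nothing is left next to the source file.

```python
import py_compile
import tempfile
from pathlib import Path
from typing import Optional


def check_syntax(path: str) -> Optional[str]:
    """Return None if the file compiles, otherwise the compiler's error message."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            py_compile.compile(
                path,
                cfile=str(Path(scratch) / "check.pyc"),  # keep bytecode out of the project
                doraise=True,  # raise PyCompileError instead of printing to stderr
            )
        except py_compile.PyCompileError as exc:
            return exc.msg
    return None
```

Like python -m py_compile, this only catches syntax errors; it does not import the module, so missing dependencies still have to be checked separately.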

For Node/TypeScript:

Run npm run build and ensure it completes without errors
Verify dist/index.js is created
To manually test: Run server in tmux, then test with evaluation harness in main process
Or use the evaluation harness directly (it manages the server for stdio transport)

3.3 Use Quality Checklist

To verify implementation quality, load the appropriate checklist from the language-specific guide:

Python: see "Quality Checklist" in Python Guide
Node/TypeScript: see "Quality Checklist" in TypeScript Guide

Phase 4: Create Evaluations

To create evaluation questions, follow these steps:

Tool Inspection: List available tools and understand their capabilities
Content Exploration: Use READ-ONLY operations to explore available data
Question Generation: Create 10 complex, realistic questions
Answer Verification: Solve each question yourself to verify answers

4.3 Evaluation Requirements

Each question must be:

Independent: Not dependent on other questions
Read-only: Only non-destructive operations required
Complex: Requiring multiple tool calls and deep exploration
Realistic: Based on real use cases humans would care about
Verifiable: Single, clear answer that can be verified by string comparison
Stable: Answer won't change over time
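Verification by string comparison might be sketched as follows. Ignoring case and collapsing whitespace before comparing is an assumption made here for robustness; the actual evaluation scripts may compare answers differently.

```python
def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so trivial differences do not fail a match."""
    return " ".join(text.split()).casefold()


def answers_match(expected: str, actual: str) -> bool:
    """Compare an agent's answer against the reference answer."""
    return normalize(expected) == normalize(actual)


def accuracy(results: list[tuple[str, str]]) -> float:
    """Fraction of (expected, actual) pairs that match; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(answers_match(e, a) for e, a in results) / len(results)
```

This is also why each question needs a single, stable answer: anything that admits several phrasings or changes over time cannot be scored this way.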

4.4 Output Format

Create an XML file with this structure:

<evaluation>
  <qa_pair>
    <question>Find discussions about AI model launches with animal codenames. One model needed a specific safety designation that uses the format ASL-X. What number X was being determined for the model named after a spotted wild cat?</question>
    <answer>3</answer>
  </qa_pair>
<!-- More qa_pairs... -->
</evaluation>
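A file in this format can be read with the standard library. The loader below is a minimal sketch that assumes exactly the element names shown above (evaluation, qa_pair, question, answer); load_qa_pairs is a hypothetical helper, not part of the provided scripts.

```python
import xml.etree.ElementTree as ET


def load_qa_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Parse an evaluation document into (question, answer) pairs."""
    root = ET.fromstring(xml_text)
    if root.tag != "evaluation":
        raise ValueError(f"expected <evaluation> root, found <{root.tag}>")

    pairs: list[tuple[str, str]] = []
    for index, pair in enumerate(root.findall("qa_pair"), start=1):
        question = (pair.findtext("question") or "").strip()
        answer = (pair.findtext("answer") or "").strip()
        if not question or not answer:
            raise ValueError(f"qa_pair {index} needs both a question and an answer")
        pairs.append((question, answer))
    return pairs
```

Validating the file on load catches empty or malformed pairs before any evaluation time is spent on them.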

Reference Files

Documentation Library

Load these resources as needed during development:

Core MCP Documentation (Load First)

MCP Protocol: Fetch from https://modelcontextprotocol.io/llms-full.txt - Complete MCP specification
MCP Best Practices - Universal MCP guidelines including:
- Server and tool naming conventions
- Response format guidelines (JSON vs Markdown)
- Pagination best practices
- Character limits and truncation strategies
- Tool development guidelines
- Security and error handling standards

SDK Documentation (Load During Phase 1/2)

Python SDK: Fetch from https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md
TypeScript SDK: Fetch from https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md

Language-Specific Implementation Guides (Load During Phase 2)

Python Implementation Guide - Complete Python/FastMCP guide with:
- Server initialization patterns
- Pydantic model examples
- Tool registration with @mcp.tool
- Complete working examples
- Quality checklist
TypeScript Implementation Guide - Complete TypeScript guide with:
- Project structure
- Zod schema patterns
- Tool registration with server.registerTool
- Complete working examples
- Quality checklist

Evaluation Guide (Load During Phase 4)

Evaluation Guide - Complete evaluation creation guide with:
- Question creation guidelines
- Answer verification strategies
- XML format specifications
- Example questions and answers
- Running an evaluation with the provided scripts

"""Lightweight connection handling for MCP servers."""

from abc import ABC, abstractmethod
from contextlib import AsyncExitStack
from typing import Any

from mcp import ClientSession, StdioServerParameters
from mcp.client.sse import sse_client
from mcp.client.stdio import stdio_client
from mcp.client.streamable_http import streamablehttp_client


class MCPConnection(ABC):
    """Base class for MCP server connections."""

    def __init__(self):
        self.session = None
        self._stack = None

    @abstractmethod
    def _create_context(self):
        """Create the connection context based on connection type."""

    async def __aenter__(self):
        """Initialize MCP server connection."""
        self._stack = AsyncExitStack()
        await self._stack.__aenter__()

        try:
            ctx = self._create_context()
            result = await self._stack.enter_async_context(ctx)

            if len(result) == 2:
                read, write = result
            elif len(result) == 3:
                read, write, _ = result
            else:
                raise ValueError(f"Unexpected context result: {result}")

            session_ctx = ClientSession(read, write)
            self.session = await self._stack.enter_async_context(session_ctx)
            await self.session.initialize()
            return self
        except BaseException:
            await self._stack.__aexit__(None, None, None)
            raise

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        """Clean up MCP server connection resources."""
        if self._stack:
            await self._stack.__aexit__(exc_type, exc_val, exc_tb)
        self.session = None
        self._stack = None

    async def list_tools(self) -> list[dict[str, Any]]:
        """Retrieve available tools from the MCP server."""
        response = await self.session.list_tools()
        return [
            {
                "name": tool.name,
                "description": tool.description,
                "input_schema": tool.inputSchema,
            }
            for tool in response.tools
        ]

    async def call_tool(self, tool_name: str, arguments: dict[str, Any]) -> Any:
        """Call a tool on the MCP server with provided arguments."""
        result = await self.session.call_tool(tool_name, arguments=arguments)
        return result.content


class MCPConnectionStdio(MCPConnection):
    """MCP connection using standard input/output."""

    def __init__(self, command: str, args: list[str] | None = None, env: dict[str, str] | None = None):
        super().__init__()
        self.command = command
        self.args = args or []
        self.env = env

    def _create_context(self):
        return stdio_client(
            StdioServerParameters(command=self.command, args=self.args, env=self.env)
        )


class MCPConnectionSSE(MCPConnection):
    """MCP connection using Server-Sent Events."""

    def __init__(self, url: str, headers: dict[str, str] | None = None):
        super().__init__()
        self.url = url
        self.headers = headers or {}

    def _create_context(self):
        return sse_client(url=self.url, headers=self.headers)


class MCPConnectionHTTP(MCPConnection):
    """MCP connection using Streamable HTTP."""

    def __init__(self, url: str, headers: dict[str, str] | None = None):
        super().__init__()
        self.url = url
        self.headers = headers or {}

    def _create_context(self):
        return streamablehttp_client(url=self.url, headers=self.headers)


def create_connection(
    transport: str,
    command: str | None = None,
    args: list[str] | None = None,
    env: dict[str, str] | None = None,
    url: str | None = None,
    headers: dict[str, str] | None = None,
) -> MCPConnection:
    """Factory function to create the appropriate MCP connection.

    Args:
        transport: Connection type ("stdio", "sse", or "http")
        command: Command to run (stdio only)
        args: Command arguments (stdio only)
        env: Environment variables (stdio only)
        url: Server URL (sse and http only)
        headers: HTTP headers (sse and http only)

    Returns:
        MCPConnection instance
    """
    transport = transport.lower()

    if transport == "stdio":
        if not command:
            raise ValueError("Command is required for stdio transport")
        return MCPConnectionStdio(command=command, args=args, env=env)

    elif transport == "sse":
        if not url:
            raise ValueError("URL is required for sse transport")
        return MCPConnectionSSE(url=url, headers=headers)

    elif transport in ["http", "streamable_http", "streamable-http"]:
        if not url:
            raise ValueError("URL is required for http transport")
        return MCPConnectionHTTP(url=url, headers=headers)

    else:
        raise ValueError(f"Unsupported transport type: {transport}. Use 'stdio', 'sse', or 'http'")

anthropic>=0.39.0
mcp>=1.1.0
