概述
在本教程中,您将学习如何使用 LangChain 代理 构建一个可以回答 SQL 数据库相关问题的代理。 从高层次来看,代理将:构建 SQL 数据库的问答系统需要执行模型生成的 SQL 查询。这样做存在固有风险。请确保您的数据库连接权限始终尽可能窄地限制在代理的需求范围内。这将减轻(虽然不能完全消除)构建模型驱动系统的风险。
概念
我们将涵盖以下概念:设置
安装
pip install langchain langgraph langchain-community
LangSmith
设置 LangSmith 以检查链或代理内部发生的情况。然后设置以下环境变量:export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="..."
1. 选择 LLM
选择支持工具调用的模型:- OpenAI
- Anthropic
- Azure
- Google Gemini
- AWS Bedrock
- HuggingFace
- OpenRouter
👉 Read the OpenAI chat model integration docs
pip install -U "langchain[openai]"
import os
from langchain.chat_models import init_chat_model
os.environ["OPENAI_API_KEY"] = "sk-..."
model = init_chat_model("gpt-5.2")
👉 Read the Anthropic chat model integration docs
pip install -U "langchain[anthropic]"
import os
from langchain.chat_models import init_chat_model
os.environ["ANTHROPIC_API_KEY"] = "sk-..."
model = init_chat_model("claude-sonnet-4-6")
👉 Read the Azure chat model integration docs
pip install -U "langchain[openai]"
import os
from langchain.chat_models import init_chat_model
os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["AZURE_OPENAI_ENDPOINT"] = "..."
os.environ["OPENAI_API_VERSION"] = "2025-03-01-preview"
model = init_chat_model(
"azure_openai:gpt-5.2",
azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
)
👉 Read the Google GenAI chat model integration docs
pip install -U "langchain[google-genai]"
import os
from langchain.chat_models import init_chat_model
os.environ["GOOGLE_API_KEY"] = "..."
model = init_chat_model("google_genai:gemini-2.5-flash-lite")
👉 Read the AWS Bedrock chat model integration docs
pip install -U "langchain[aws]"
from langchain.chat_models import init_chat_model
# Follow the steps here to configure your credentials:
# https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html
model = init_chat_model(
"anthropic.claude-3-5-sonnet-20240620-v1:0",
model_provider="bedrock_converse",
)
👉 Read the HuggingFace chat model integration docs
pip install -U "langchain[huggingface]"
import os
from langchain.chat_models import init_chat_model
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."
model = init_chat_model(
"microsoft/Phi-3-mini-4k-instruct",
model_provider="huggingface",
temperature=0.7,
max_tokens=1024,
)
👉 Read the OpenRouter chat model integration docs
pip install -U "langchain-openrouter"
import os
from langchain.chat_models import init_chat_model
os.environ["OPENROUTER_API_KEY"] = "sk-..."
model = init_chat_model(
"auto",
model_provider="openrouter",
)
2. 配置数据库
您将为本教程创建一个 SQLite 数据库。SQLite 是一个轻量级数据库,易于设置和使用。我们将加载chinook 数据库,这是一个代表数字媒体商店的示例数据库。
为方便起见,我们已将数据库(Chinook.db)托管在公共 GCS 存储桶上。
import requests, pathlib
url = "https://storage.googleapis.com/benchmarks-artifacts/chinook/Chinook.db"
local_path = pathlib.Path("Chinook.db")
if local_path.exists():
print(f"{local_path} already exists, skipping download.")
else:
response = requests.get(url)
if response.status_code == 200:
local_path.write_bytes(response.content)
print(f"File downloaded and saved as {local_path}")
else:
print(f"Failed to download the file. Status code: {response.status_code}")
langchain_community 包中方便的 SQL 数据库包装器来与数据库交互。该包装器提供了执行 SQL 查询和获取结果的简单接口:
from langchain_community.utilities import SQLDatabase
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
print(f"Dialect: {db.dialect}")
print(f"Available tables: {db.get_usable_table_names()}")
print(f'Sample output: {db.run("SELECT * FROM Artist LIMIT 5;")}')
Dialect: sqlite
Available tables: ['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']
Sample output: [(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains')]
3. 添加数据库交互工具
使用langchain_community 包中可用的 SQLDatabaseToolkit 来与数据库交互。该工具包提供了执行 SQL 查询和获取结果的简单接口:
from langchain_community.agent_toolkits import SQLDatabaseToolkit
toolkit = SQLDatabaseToolkit(db=db, llm=model)
tools = toolkit.get_tools()
for tool in tools:
print(f"{tool.name}: {tool.description}\n")
sql_db_query: Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.
sql_db_schema: Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3
sql_db_list_tables: Input is an empty string, output is a comma-separated list of tables in the database.
sql_db_query_checker: Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!
4. 使用 create_agent
使用 create_agent 以最少的代码构建一个 ReAct 代理。代理将解释请求并生成 SQL 命令,工具将执行该命令。如果命令有错误,错误消息将返回给模型。模型可以检查原始请求和新的错误消息,然后生成新命令。这个过程可以持续到 LLM 成功生成命令或达到结束计数。这种为模型提供反馈的模式(在本例中是错误消息)非常强大。
使用描述性系统提示初始化代理来自定义其行为:
system_prompt = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.
You MUST double check your query before executing it. If you get an error while
executing a query, rewrite the query and try again.
DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
database.
To start you should ALWAYS look at the tables in the database to see what you
can query. Do NOT skip this step.
Then you should query the schema of the most relevant tables.
""".format(
dialect=db.dialect,
top_k=5,
)
from langchain.agents import create_agent
agent = create_agent(
model,
tools,
system_prompt=system_prompt,
)
5. 运行代理
对示例查询运行代理并观察其行为:question = "Which genre on average has the longest tracks?"
for step in agent.stream(
{"messages": [{"role": "user", "content": question}]},
stream_mode="values",
):
step["messages"][-1].pretty_print()
================================ Human Message =================================
Which genre on average has the longest tracks?
================================== Ai Message ==================================
Tool Calls:
sql_db_list_tables (call_BQsWg8P65apHc8BTJ1NPDvnM)
Call ID: call_BQsWg8P65apHc8BTJ1NPDvnM
Args:
================================= Tool Message =================================
Name: sql_db_list_tables
Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
================================== Ai Message ==================================
Tool Calls:
sql_db_schema (call_i89tjKECFSeERbuACYm4w0cU)
Call ID: call_i89tjKECFSeERbuACYm4w0cU
Args:
table_names: Track, Genre
================================= Tool Message =================================
Name: sql_db_schema
CREATE TABLE "Genre" (
"GenreId" INTEGER NOT NULL,
"Name" NVARCHAR(120),
PRIMARY KEY ("GenreId")
)
/*
3 rows from Genre table:
GenreId Name
1 Rock
2 Jazz
3 Metal
*/
CREATE TABLE "Track" (
"TrackId" INTEGER NOT NULL,
"Name" NVARCHAR(200) NOT NULL,
"AlbumId" INTEGER,
"MediaTypeId" INTEGER NOT NULL,
"GenreId" INTEGER,
"Composer" NVARCHAR(220),
"Milliseconds" INTEGER NOT NULL,
"Bytes" INTEGER,
"UnitPrice" NUMERIC(10, 2) NOT NULL,
PRIMARY KEY ("TrackId"),
FOREIGN KEY("MediaTypeId") REFERENCES "MediaType" ("MediaTypeId"),
FOREIGN KEY("GenreId") REFERENCES "Genre" ("GenreId"),
FOREIGN KEY("AlbumId") REFERENCES "Album" ("AlbumId")
)
/*
3 rows from Track table:
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99
2 Balls to the Wall 2 2 1 U. Dirkschneider, W. Hoffmann, H. Frank, P. Baltes, S. Kaufmann, G. Hoffmann 342562 5510424 0.99
3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman 230619 3990994 0.99
*/
================================== Ai Message ==================================
Tool Calls:
sql_db_query_checker (call_G64yYm6R6UauiVPCXJZMA49b)
Call ID: call_G64yYm6R6UauiVPCXJZMA49b
Args:
query: SELECT Genre.Name, AVG(Track.Milliseconds) AS AverageLength FROM Track INNER JOIN Genre ON Track.GenreId = Genre.GenreId GROUP BY Genre.Name ORDER BY AverageLength DESC LIMIT 5;
================================= Tool Message =================================
Name: sql_db_query_checker
SELECT Genre.Name, AVG(Track.Milliseconds) AS AverageLength FROM Track INNER JOIN Genre ON Track.GenreId = Genre.GenreId GROUP BY Genre.Name ORDER BY AverageLength DESC LIMIT 5;
================================== Ai Message ==================================
Tool Calls:
sql_db_query (call_AnO3SrhD0ODJBxh6dHMwvHwZ)
Call ID: call_AnO3SrhD0ODJBxh6dHMwvHwZ
Args:
query: SELECT Genre.Name, AVG(Track.Milliseconds) AS AverageLength FROM Track INNER JOIN Genre ON Track.GenreId = Genre.GenreId GROUP BY Genre.Name ORDER BY AverageLength DESC LIMIT 5;
================================= Tool Message =================================
Name: sql_db_query
[('Sci Fi & Fantasy', 2911783.0384615385), ('Science Fiction', 2625549.076923077), ('Drama', 2575283.78125), ('TV Shows', 2145041.0215053763), ('Comedy', 1585263.705882353)]
================================== Ai Message ==================================
On average, the genre with the longest tracks is "Sci Fi & Fantasy" with an average track length of approximately 2,911,783 milliseconds. This is followed by "Science Fiction," "Drama," "TV Shows," and "Comedy."
您可以在 LangSmith trace 中检查上述运行的所有方面,包括执行的步骤、调用工具、LLM 看到的提示等。
(可选)使用 Studio
Studio 还提供了一个”客户端”循环以及内存,以便您可以将其作为聊天界面运行并查询数据库。您可以问诸如”告诉我数据库的模式”或”显示前5位客户的发票”等问题。您将看到生成的 SQL 命令及结果输出。以下是有关如何开始使用的详细信息。在 Studio 中运行您的代理
在 Studio 中运行您的代理
除了前面提到的包之外,您还需要:在您要运行的目录中,您需要一个包含以下内容的 创建一个文件
pip install -U langgraph-cli[inmem]>=0.4.0
langgraph.json 文件:{
"dependencies": ["."],
"graphs": {
"agent": "./sql_agent.py:agent",
"graph": "./sql_agent_langgraph.py:graph"
},
"env": ".env"
}
sql_agent.py 并插入以下内容:# 用于 studio 的 sql_agent.py
import pathlib
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
import requests
# 初始化 LLM
model = init_chat_model("gpt-4.1")
# 获取数据库并本地存储
url = "https://storage.googleapis.com/benchmarks-artifacts/chinook/Chinook.db"
local_path = pathlib.Path("Chinook.db")
if local_path.exists():
print(f"{local_path} already exists, skipping download.")
else:
response = requests.get(url)
if response.status_code == 200:
local_path.write_bytes(response.content)
print(f"File downloaded and saved as {local_path}")
else:
print(f"Failed to download the file. Status code: {response.status_code}")
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
# 创建工具
toolkit = SQLDatabaseToolkit(db=db, llm=model)
tools = toolkit.get_tools()
for tool in tools:
print(f"{tool.name}: {tool.description}\n")
# 使用 create_agent
system_prompt = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.
You MUST double check your query before executing it. If you get an error while
executing a query, rewrite the query and try again.
DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
database.
To start you should ALWAYS look at the tables in the database to see what you
can query. Do NOT skip this step.
Then you should query the schema of the most relevant tables.
""".format(
dialect=db.dialect,
top_k=5,
)
agent = create_agent(
model,
tools,
system_prompt=system_prompt,
)
6. 实现人工介入审查
在执行 SQL 查询之前检查代理的查询以防止任何意外操作或低效率可能是明智的。 LangChain 代理支持内置的人工介入中间件,以添加工具调用的监督。让我们配置代理在调用sql_db_query 工具时暂停以供人工审查:
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import InMemorySaver
agent = create_agent(
model,
tools,
system_prompt=system_prompt,
middleware=[
HumanInTheLoopMiddleware(
interrupt_on={"sql_db_query": True},
description_prefix="Tool execution pending approval",
),
],
checkpointer=InMemorySaver(),
)
sql_db_query 工具之前暂停以供审查:
question = "Which genre on average has the longest tracks?"
config = {"configurable": {"thread_id": "1"}}
for step in agent.stream(
{"messages": [{"role": "user", "content": question}]},
config,
stream_mode="values",
):
if "__interrupt__" in step:
print("INTERRUPTED:")
interrupt = step["__interrupt__"][0]
for request in interrupt.value["action_requests"]:
print(request["description"])
elif "messages" in step:
step["messages"][-1].pretty_print()
else:
pass
...
INTERRUPTED:
Tool execution pending approval
Tool: sql_db_query
Args: {'query': 'SELECT g.Name AS Genre, AVG(t.Milliseconds) AS AvgTrackLength FROM Track t JOIN Genre g ON t.GenreId = g.GenreId GROUP BY g.Name ORDER BY AvgTrackLength DESC LIMIT 1;'}
from langgraph.types import Command
for step in agent.stream(
Command(resume={"decisions": [{"type": "approve"}]}),
config,
stream_mode="values",
):
if "messages" in step:
step["messages"][-1].pretty_print()
elif "__interrupt__" in step:
print("INTERRUPTED:")
interrupt = step["__interrupt__"][0]
for request in interrupt.value["action_requests"]:
print(request["description"])
else:
pass
================================== Ai Message ==================================
Tool Calls:
sql_db_query (call_7oz86Epg7lYRqi9rQHbZPS1U)
Call ID: call_7oz86Epg7lYRqi9rQHbZPS1U
Args:
query: SELECT Genre.Name, AVG(Track.Milliseconds) AS AvgDuration FROM Track JOIN Genre ON Track.GenreId = Genre.GenreId GROUP BY Genre.Name ORDER BY AvgDuration DESC LIMIT 5;
================================= Tool Message =================================
Name: sql_db_query
[('Sci Fi & Fantasy', 2911783.0384615385), ('Science Fiction', 2625549.076923077), ('Drama', 2575283.78125), ('TV Shows', 2145041.0215053763), ('Comedy', 1585263.705882353)]
================================== Ai Message ==================================
The genre with the longest average track length is "Sci Fi & Fantasy" with an average duration of about 2,911,783 milliseconds, followed by "Science Fiction" and "Drama."
后续步骤
要获得更深入的定制,请查看本教程,了解如何使用 LangGraph 原语直接实现 SQL 代理。通过 MCP 将这些文档连接到 Claude、VSCode 等,获取实时答案。

