Overview

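In this tutorial, we will build a retrieval agent using LangGraph. LangChain offers built-in agent implementations, which are implemented using LangGraph primitives. If deeper customization is required, agents can be implemented directly in LangGraph. This guide demonstrates an example implementation of a retrieval agent. Retrieval agents are useful when you want an LLM to decide whether to retrieve context from a vector store or respond to the user directly.
By the end of the tutorial, we will have done the following:
  1. Fetch and preprocess the documents that will be used for retrieval.
  2. Index those documents for semantic search and create a retriever tool for the agent.
  3. Build an agentic RAG system that can decide when to use the retriever tool.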
[Figure: Hybrid RAG]

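Concepts

This tutorial covers retrieval (document loaders, text splitters, embeddings, and vector stores) and the LangGraph graph API (state, nodes, edges, and conditional edges).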

Setup

Let's download the required packages and set our API keys:
pip install -U langgraph "langchain[openai]" langchain-community langchain-text-splitters bs4
import getpass
import os


def _set_env(key: str):
    if key not in os.environ:
        os.environ[key] = getpass.getpass(f"{key}:")


_set_env("OPENAI_API_KEY")
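Sign up for LangSmith to quickly spot issues and improve the performance of your LangGraph projects. LangSmith lets you use trace data to debug, test, and monitor LLM applications built with LangGraph.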

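1. Preprocess documents

  1. Fetch the documents to use in our RAG system. We will use three of the most recent pages from Lilian Weng's excellent blog. We start by fetching the content of the pages with the WebBaseLoader utility: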
from langchain_community.document_loaders import WebBaseLoader

urls = [
    "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/",
    "https://lilianweng.github.io/posts/2024-07-07-hallucination/",
    "https://lilianweng.github.io/posts/2024-04-12-diffusion-video/",
]

docs = [WebBaseLoader(url).load() for url in urls]
docs[0][0].page_content.strip()[:1000]
  2. Split the fetched documents into smaller chunks for indexing into our vector store:
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=100, chunk_overlap=50
)
doc_splits = text_splitter.split_documents(docs_list)
doc_splits[0].page_content.strip()
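To make the chunk_size and chunk_overlap parameters above more concrete, here is a minimal sketch of fixed-size windows that share tokens with their neighbors. The helper sliding_chunks and the toy token list are hypothetical and only illustrate the idea of overlap; RecursiveCharacterTextSplitter splits on separators recursively and measures length with tiktoken, so its actual chunk boundaries will differ.

```python
def sliding_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap.

    Hypothetical helper for illustration only; not the library's algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Toy input: ten placeholder tokens instead of real tiktoken tokens.
tokens = [f"t{i}" for i in range(10)]
chunks = sliding_chunks(tokens, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

With an overlap of half the chunk size, as in the settings above, every token away from the document edges appears in two chunks, which raises the chance that a sentence cut at a boundary is still retrievable in full from a neighboring chunk.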

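2. Create a retriever tool

Now that we have our split documents, we can index them into a vector store that we'll use for semantic search.
  1. Use an in-memory vector store and OpenAI embeddings: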
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

vectorstore = InMemoryVectorStore.from_documents(
    documents=doc_splits, embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
  2. Create a retriever tool using the @tool decorator:
from langchain.tools import tool

@tool
def retrieve_blog_posts(query: str) -> str:
    """Search and return information about Lilian Weng blog posts."""
    docs = retriever.invoke(query)
    return "\n\n".join([doc.page_content for doc in docs])

retriever_tool = retrieve_blog_posts
  3. Test the tool:
retriever_tool.invoke({"query": "types of reward hacking"})
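As a rough picture of what semantic search over an index involves, the sketch below ranks a few hand-written toy vectors by cosine similarity to a query vector. Everything here is an assumption made for illustration: the three-dimensional vectors, the toy_index contents, and the helper names. Real embeddings come from OpenAIEmbeddings and have far more dimensions, and the vector store's internal ranking may differ in detail.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs standing in for indexed chunks.
toy_index = [
    ("reward hacking taxonomy", [0.9, 0.1, 0.0]),
    ("hallucination detection", [0.1, 0.9, 0.0]),
    ("video diffusion models", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.2, 0.0]  # pretend embedding of a reward hacking question

results = top_k(toy_query, toy_index, k=2)
print(results)
```

The retriever tool plays the same role for the agent: it takes a query string, finds the closest chunks, and returns their text joined into one string.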

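3. Generate query

Now we will start building the components (nodes) for our agentic RAG graph. Note that the components operate on MessagesState, a graph state that contains a messages key holding a list of chat messages.
  1. Build a generate_query_or_respond node. It calls an LLM to generate a response based on the current graph state (the list of messages). Given the input messages, it decides either to retrieve using the retriever tool or to respond directly to the user. Note that we give the chat model access to the retriever_tool we created earlier via .bind_tools: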
from langgraph.graph import MessagesState
from langchain.chat_models import init_chat_model

response_model = init_chat_model("gpt-4.1", temperature=0)


def generate_query_or_respond(state: MessagesState):
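    """Call the model to generate a response based on the current state. Given
    the question, it will decide to retrieve using the retriever tool, or respond directly to the user.
    """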
    response = (
        response_model
        .bind_tools([retriever_tool]).invoke(state["messages"])
    )
    return {"messages": [response]}
  2. Try it on a random input:
input = {"messages": [{"role": "user", "content": "hello!"}]}
generate_query_or_respond(input)["messages"][-1].pretty_print()
Output:
================================== Ai Message ==================================

Hello! How can I help you today?
  3. Ask a question that requires semantic search:
input = {
    "messages": [
        {
            "role": "user",
            "content": "What does Lilian Weng say about types of reward hacking?",
        }
    ]
}
generate_query_or_respond(input)["messages"][-1].pretty_print()
Output:
================================== Ai Message ==================================
Tool Calls:
  retrieve_blog_posts (call_tYQxgfIlnQUDMdtAhdbXNwIM)
 Call ID: call_tYQxgfIlnQUDMdtAhdbXNwIM
  Args:
    query: types of reward hacking
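The two outputs above differ in one respect: whether the assistant message carries tool calls. A minimal sketch of inspecting that decision follows. It uses plain dictionaries shaped like the message dicts used elsewhere in this tutorial, and describe_decision is a hypothetical helper; real responses are message objects that expose tool calls as an attribute rather than a dictionary key.

```python
def describe_decision(ai_message):
    """Summarize whether an assistant message requests retrieval or answers directly.

    Assumes a plain dict with an optional "tool_calls" list (illustration only).
    """
    tool_calls = ai_message.get("tool_calls") or []
    if tool_calls:
        first_call = tool_calls[0]
        return f"retrieve: {first_call['args']['query']}"
    return f"respond: {ai_message['content']}"


# Stand-ins for the two responses shown above.
direct = {"role": "assistant", "content": "Hello! How can I help you today?"}
with_tool_call = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "1",
            "name": "retrieve_blog_posts",
            "args": {"query": "types of reward hacking"},
        }
    ],
}

print(describe_decision(direct))
print(describe_decision(with_tool_call))
```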

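4. Grade documents

  1. Add a conditional edge, grade_documents, to determine whether the retrieved documents are relevant to the question. We will use a model with a structured output schema, GradeDocuments, for document grading. Based on the grading decision, the grade_documents function returns the name of the node to go to (generate_answer or rewrite_question):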
from pydantic import BaseModel, Field
from typing import Literal

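GRADE_PROMPT = (
    "You are a grader assessing the relevance of a retrieved document to a user question. \n "
    "Here is the retrieved document: \n\n {context} \n\n"
    "Here is the user question: {question} \n"
    "If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n"
    "Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question."
)


class GradeDocuments(BaseModel):
    """Grade documents using a binary score for relevance check."""

    binary_score: str = Field(
        description="Relevance score: 'yes' if relevant, or 'no' if not relevant"
    )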


grader_model = init_chat_model("gpt-4.1", temperature=0)


def grade_documents(
    state: MessagesState,
) -> Literal["generate_answer", "rewrite_question"]:
    """Determine whether the retrieved documents are relevant to the question."""
    question = state["messages"][0].content
    context = state["messages"][-1].content

    prompt = GRADE_PROMPT.format(question=question, context=context)
    response = (
        grader_model
        .with_structured_output(GradeDocuments).invoke(
            [{"role": "user", "content": prompt}]
        )
    )
    score = response.binary_score

    if score == "yes":
        return "generate_answer"
    else:
        return "rewrite_question"
  2. Run this function with an irrelevant document in the tool response:
from langchain_core.messages import convert_to_messages

input = {
    "messages": convert_to_messages(
        [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            },
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "1",
                        "name": "retrieve_blog_posts",
                        "args": {"query": "types of reward hacking"},
                    }
                ],
            },
            {"role": "tool", "content": "meow", "tool_call_id": "1"},
        ]
    )
}
grade_documents(input)
  3. Confirm that a relevant document is classified as relevant:
input = {
    "messages": convert_to_messages(
        [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            },
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "1",
                        "name": "retrieve_blog_posts",
                        "args": {"query": "types of reward hacking"},
                    }
                ],
            },
            {
                "role": "tool",
                "content": "reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering",
                "tool_call_id": "1",
            },
        ]
    )
}
grade_documents(input)
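The mapping from a grade to a node name can also be exercised without calling a model. The sketch below swaps the LLM grader for a crude keyword check so the routing can be tested offline. Both keyword_grader and route_from_score are hypothetical helpers, and the keyword heuristic is only a stand-in; it is not how the grader model judges relevance.

```python
def keyword_grader(question, context):
    """Hypothetical stand-in for the LLM grader.

    Returns 'yes' if any longer word from the question appears in the context.
    """
    question_words = {word.strip("?.,!").lower() for word in question.split()}
    keywords = {word for word in question_words if len(word) > 4}
    context_lower = context.lower()
    return "yes" if any(word in context_lower for word in keywords) else "no"


def route_from_score(score):
    """Map a binary relevance score to the name of the next node."""
    return "generate_answer" if score == "yes" else "rewrite_question"


question = "What does Lilian Weng say about types of reward hacking?"
irrelevant = "meow"
relevant = (
    "reward hacking can be categorized into two types: "
    "environment or goal misspecification, and reward tampering"
)

print(route_from_score(keyword_grader(question, irrelevant)))
print(route_from_score(keyword_grader(question, relevant)))
```

Keeping the score-to-node mapping in a small pure function like this makes it easy to unit test the routing separately from the model call.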

5. Rewrite question

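  1. Build the rewrite_question node. The retriever tool can return irrelevant documents, which indicates that the original user question needs to be improved. To do so, we call the rewrite_question node: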
from langchain.messages import HumanMessage

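REWRITE_PROMPT = (
    "Look at the input and try to reason about the underlying semantic intent / meaning.\n"
    "Here is the initial question:"
    "\n ------- \n"
    "{question}"
    "\n ------- \n"
    "Formulate an improved question:"
)


def rewrite_question(state: MessagesState):
    """Rewrite the original user question."""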
    messages = state["messages"]
    question = messages[0].content
    prompt = REWRITE_PROMPT.format(question=question)
    response = response_model.invoke([{"role": "user", "content": prompt}])
    return {"messages": [HumanMessage(content=response.content)]}
  2. Try it out:
input = {
    "messages": convert_to_messages(
        [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            },
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "1",
                        "name": "retrieve_blog_posts",
                        "args": {"query": "types of reward hacking"},
                    }
                ],
            },
            {"role": "tool", "content": "meow", "tool_call_id": "1"},
        ]
    )
}

response = rewrite_question(input)
print(response["messages"][-1].content)
Output:
What are the different types of reward hacking described by Lilian Weng, and how does she explain them?

6. Generate an answer

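  1. Build the generate_answer node. If we pass the grader check, we can generate the final answer based on the original question and the retrieved context:
GENERATE_PROMPT = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \n"
    "Context: {context}"
)


def generate_answer(state: MessagesState):
    """Generate an answer."""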
    question = state["messages"][0].content
    context = state["messages"][-1].content
    prompt = GENERATE_PROMPT.format(question=question, context=context)
    response = response_model.invoke([{"role": "user", "content": prompt}])
    return {"messages": [response]}
  2. Try it out:
input = {
    "messages": convert_to_messages(
        [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            },
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "1",
                        "name": "retrieve_blog_posts",
                        "args": {"query": "types of reward hacking"},
                    }
                ],
            },
            {
                "role": "tool",
                "content": "reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering",
                "tool_call_id": "1",
            },
        ]
    )
}

response = generate_answer(input)
response["messages"][-1].pretty_print()
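Output:
================================== Ai Message ==================================

Lilian Weng categorizes reward hacking into two types: environment or goal misspecification, and reward tampering. She considers reward hacking a broad concept that includes both of these categories. Reward hacking occurs when an agent exploits flaws or ambiguities in the reward function to achieve high rewards without performing the intended behaviors.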

7. Assemble the graph

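Now we'll assemble all the nodes and edges into a complete graph:
  • Start with generate_query_or_respond and determine whether we need to call retriever_tool
  • Route to the next step using tools_condition:
    • If generate_query_or_respond returned tool_calls, call retriever_tool to retrieve context
    • Otherwise, respond directly to the user
  • Grade the retrieved document content for relevance to the question (grade_documents) and route to the next step:
    • If not relevant, rewrite the question using rewrite_question and then call generate_query_or_respond again
    • If relevant, proceed to generate_answer and generate the final response using the ToolMessage that holds the retrieved document context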
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode, tools_condition

workflow = StateGraph(MessagesState)

# Define the nodes we will cycle between
workflow.add_node(generate_query_or_respond)
workflow.add_node("retrieve", ToolNode([retriever_tool]))
workflow.add_node(rewrite_question)
workflow.add_node(generate_answer)

workflow.add_edge(START, "generate_query_or_respond")

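# Decide whether to retrieve
workflow.add_conditional_edges(
    "generate_query_or_respond",
    # Assess the LLM decision (call the `retriever_tool` tool or respond to the user)
    tools_condition,
    {
        # Translate the condition outputs to nodes in our graph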
        "tools": "retrieve",
        END: END,
    },
)

# Edges taken after the `retrieve` node is called.
workflow.add_conditional_edges(
    "retrieve",
    # Route on the document relevance grade
    grade_documents,
)
workflow.add_edge("generate_answer", END)
workflow.add_edge("rewrite_question", "generate_query_or_respond")

# Compile
graph = workflow.compile()
Visualize the graph:
from IPython.display import Image, display

display(Image(graph.get_graph().draw_mermaid_png()))
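To trace the routing without calling any model, the control flow of the graph can be approximated as a plain Python loop. This is a simplified sketch under stated assumptions: run_agentic_rag, the stub callables, TOY_CORPUS, and the max_rewrites cap are all hypothetical, state is reduced to a single question string, and the compiled graph above has no such rewrite cap of its own.

```python
def run_agentic_rag(question, decide, retrieve, grade, rewrite, answer, max_rewrites=2):
    """Walk the routing in plain Python; every callable is a caller-supplied stub."""
    trace = []  # names of the nodes visited, in order
    current_question = question
    rewrites = 0
    while True:
        trace.append("generate_query_or_respond")
        action, payload = decide(current_question)
        if action == "respond":
            # No tool call: the reply goes straight to the user.
            return trace, payload
        trace.append("retrieve")
        context = retrieve(payload)
        if grade(current_question, context) == "yes":
            trace.append("generate_answer")
            return trace, answer(current_question, context)
        if rewrites == max_rewrites:
            # Assumption of this sketch: give up instead of looping forever.
            return trace, None
        trace.append("rewrite_question")
        current_question = rewrite(current_question)
        rewrites += 1


# Hypothetical stubs standing in for the LLM calls and the retriever.
TOY_CORPUS = {
    "types of reward hacking": (
        "reward hacking can be categorized into two types: "
        "environment or goal misspecification, and reward tampering"
    )
}

def decide(question):
    if "reward" in question.lower():
        return ("retrieve", question)
    return ("respond", "Hello! How can I help you today?")

def retrieve(query):
    return TOY_CORPUS.get(query, "meow")

def grade(question, context):
    return "yes" if "reward hacking" in context else "no"

def rewrite(question):
    return "types of reward hacking"

def answer(question, context):
    return f"Based on the context: {context}"


trace, result = run_agentic_rag(
    "What about reward stuff?", decide, retrieve, grade, rewrite, answer
)
print(trace)
print(result)
```

In this toy run the first retrieval returns an irrelevant document, so the loop passes through rewrite_question once before reaching generate_answer, which mirrors the cycle between rewrite_question and generate_query_or_respond in the graph.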

8. Run the agentic RAG

Now let's test the complete graph by running it with a question:
for chunk in graph.stream(
    {
        "messages": [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            }
        ]
    }
):
    for node, update in chunk.items():
        print("Update from node", node)
        update["messages"][-1].pretty_print()
        print("\n\n")
Output:
Update from node generate_query_or_respond
================================== Ai Message ==================================
Tool Calls:
  retrieve_blog_posts (call_NYu2vq4km9nNNEFqJwefWKu1)
 Call ID: call_NYu2vq4km9nNNEFqJwefWKu1
  Args:
    query: types of reward hacking



Update from node retrieve
================================= Tool Message =================================
Name: retrieve_blog_posts

(Note: Some work defines reward tampering as a distinct category of misalignment behavior from reward hacking. But I consider reward hacking as a broader concept here.)
At a high level, reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering.

Why does Reward Hacking Exist?#

Pan et al. (2022) investigated reward hacking as a function of agent capabilities, including (1) model size, (2) action space resolution, (3) observation space noise, and (4) training time. They also proposed a taxonomy of three types of misspecified proxy rewards:

Let's Define Reward Hacking#
Reward shaping in RL is challenging. Reward hacking occurs when an RL agent exploits flaws or ambiguities in the reward function to obtain high rewards without genuinely learning the intended behaviors or completing the task as designed. In recent years, several related concepts have been proposed, all referring to some form of reward hacking:



Update from node generate_answer
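================================== Ai Message ==================================

Lilian Weng categorizes reward hacking into two types: environment or goal misspecification, and reward tampering. She considers reward hacking a broad concept that includes both of these categories. Reward hacking occurs when an agent exploits flaws or ambiguities in the reward function to achieve high rewards without performing the intended behaviors.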
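Each streamed chunk in the loop above maps a node name to that node's state update. If only the final answer is needed, the updates can be filtered by node name. The sketch below shows the idea on plain dictionaries; final_answer_from_updates and fake_chunks are hypothetical, and real updates contain message objects whose text is read from a content attribute rather than a dictionary key.

```python
def final_answer_from_updates(chunks, answer_node="generate_answer"):
    """Return the content of the last message emitted by answer_node, or None."""
    final = None
    for chunk in chunks:
        for node, update in chunk.items():
            if node == answer_node:
                final = update["messages"][-1]["content"]
    return final


# Dict-based stand-ins for the three updates printed above.
fake_chunks = [
    {"generate_query_or_respond": {"messages": [{"role": "assistant", "content": ""}]}},
    {"retrieve": {"messages": [{"role": "tool", "content": "retrieved context"}]}},
    {"generate_answer": {"messages": [{"role": "assistant", "content": "final answer"}]}},
]

print(final_answer_from_updates(fake_chunks))
```

Note that when the model answers directly without retrieval, the reply arrives from generate_query_or_respond instead, so a caller handling both cases would need to check that node as well.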