Implementing Anthropic's Contextual Retrieval with Async Processing

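This post, from the Instructor website, shows how to implement Anthropic's contextual retrieval technique with asynchronous processing. It explains how contextual retrieval keeps the context each chunk needs inside a RAG (Retrieval-Augmented Generation) system, and how that makes retrieval more accurate.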

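Background: The Context Problem in RAG
In a traditional RAG system, a document is split into chunks, and those chunks lose the surrounding context.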

Imagine your knowledge base contains embedded financial information, such as SEC filings, and you receive this question: "What was the revenue growth for ACME Corp in Q2 2023?"

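The relevant chunk might contain the text "The company's revenue grew by 3% over the previous quarter." On its own, however, that chunk names neither the company nor the time period. It therefore cannot be reliably matched to the question.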

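Anthropic's Solution: Contextual Retrieval
Contextual retrieval solves this by prepending chunk-specific explanatory context to each chunk before the chunk is embedded.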

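For example, the original chunk "The company's revenue grew by 3% over the previous quarter" is contextualized by putting the source filing, the company, the period, and the previous quarter's revenue in front of the original sentence.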

Anthropic's example:

original_chunk = "The company's revenue grew by 3% over the previous quarter."

contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."
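
This is a crude illustration of why the added context matters for keyword-based retrieval such as BM25. It scores the question from the background section against each chunk by counting shared terms. The tokenizer and `overlap_score` are hypothetical simplifications chosen for clarity, not BM25 itself.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercase and keep runs of letters/digits; deliberately crude.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, chunk: str) -> int:
    # Number of distinct query terms that also appear in the chunk.
    return len(tokenize(query) & tokenize(chunk))


query = "What was the revenue growth for ACME Corp in Q2 2023?"
original_chunk = "The company's revenue grew by 3% over the previous quarter."
contextualized_chunk = (
    "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; "
    "the previous quarter's revenue was $314 million. "
    "The company's revenue grew by 3% over the previous quarter."
)

print(overlap_score(query, original_chunk))        # prints 2 ("the", "revenue")
print(overlap_score(query, contextualized_chunk))  # prints 8
```

The bare chunk shares only generic words with the question. The contextualized chunk also matches "acme", "corp", "q2" and "2023", which are the terms that actually identify the answer.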

Implementing Contextual Retrieval
Anthropic uses Claude to generate the context, with the following prompt:

<document> 
{{WHOLE_DOCUMENT}} 
</document> 
Here is the chunk we want to situate within the whole document 
<chunk> 
{{CHUNK_CONTENT}} 
</chunk> 
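Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else.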
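
Here is a minimal sketch of filling this template with plain string replacement. The template is split into a document part and a chunk part. That split is an assumption made here, for a practical reason: the document part is identical for every chunk of the same document, so it is the natural prefix to cache. The Instructor code later marks that same block with `cache_control`. `DOC_PART`, `CHUNK_PART` and `build_prompt_parts` are hypothetical names.

```python
DOC_PART = "<document>\n{{WHOLE_DOCUMENT}}\n</document>"
CHUNK_PART = (
    "Here is the chunk we want to situate within the whole document\n"
    "<chunk>\n{{CHUNK_CONTENT}}\n</chunk>\n"
    "Please give a short succinct context to situate this chunk within the overall "
    "document for the purposes of improving search retrieval of the chunk.\n"
    "Answer only with the succinct context and nothing else."
)


def build_prompt_parts(doc: str, chunk: str) -> tuple[str, str]:
    # Fill each part separately, so text inside `doc` can never be mistaken
    # for the chunk placeholder (and vice versa).
    doc_part = DOC_PART.replace("{{WHOLE_DOCUMENT}}", doc)
    chunk_part = CHUNK_PART.replace("{{CHUNK_CONTENT}}", chunk)
    return doc_part, chunk_part


doc_part, chunk_part = build_prompt_parts(
    "Full filing text...", "The company's revenue grew by 3%..."
)
print(doc_part)
print(chunk_part)
```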

Performance Improvements
Anthropic reports significant improvements:

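Contextual embeddings reduced the top-20-chunk retrieval failure rate by 35% (5.7% → 3.7%).
Combining contextual embeddings with contextual BM25 reduced the failure rate by 49% (5.7% → 2.9%).
Adding reranking on top of both brought the reduction to 67% (5.7% → 1.9%).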
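
All three figures are relative reductions measured against the same 5.7% baseline. They are not percentage-point drops, and they are not step-by-step reductions. A quick check follows. `relative_reduction` is just a helper named for this illustration.

```python
def relative_reduction(baseline: float, new: float) -> float:
    # Fraction of the baseline failure rate that was eliminated.
    return (baseline - new) / baseline


baseline = 5.7  # top-20 retrieval failure rate (%) without contextualization
for label, rate in [
    ("contextual embeddings", 3.7),
    ("+ contextual BM25", 2.9),
    ("+ reranking", 1.9),
]:
    print(f"{label}: {relative_reduction(baseline, rate):.0%}")
# prints 35%, 49% and 67% respectively
```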

Instructor's Async Implementation
We can implement Anthropic's technique with async processing to make it more efficient:

from instructor import AsyncInstructor, Mode, patch
from anthropic import AsyncAnthropic
from pydantic import BaseModel, Field
import asyncio
from typing import List, Dict

class SituatedContext(BaseModel):
    title: str = Field(..., description="The title of the document.")
    context: str = Field(..., description="The context to situate the chunk within the document.")

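# Wrap the Anthropic async client so create() accepts response_model and context.
# Assumes an anthropic SDK version that still exposes the prompt-caching beta
# namespace; newer versions may accept cache_control on messages.create directly.
client = AsyncInstructor(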
    create=patch(
        create=AsyncAnthropic().beta.prompt_caching.messages.create,
        mode=Mode.ANTHROPIC_TOOLS,
    ),
    mode=Mode.ANTHROPIC_TOOLS,
)

async def situate_context(doc: str, chunk: str) -> str:
    """Ask Claude for a short context that situates `chunk` within `doc`."""
    response = await client.chat.completions.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        temperature=0.0,
        messages=[
            {
                "role": "user",
                "content": [
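                    # The whole document sits in its own block, marked for prompt caching
                    # so calls for other chunks of the same document can reuse it.
                    {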
                        "type": "text",
                        "text": "<document>{{doc}}</document>",
                        "cache_control": {"type": "ephemeral"},
                    },
                    {
                        "type": "text",
                        "text": "Here is the chunk we want to situate within the whole document\n<chunk>{{chunk}}</chunk>\nPlease give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk.\nAnswer only with the succinct context and nothing else.",
                    },
                ],
            }
        ],
        response_model=SituatedContext,
        # Instructor renders the {{doc}} and {{chunk}} placeholders above with Jinja from this dict.
        context={"doc": doc, "chunk": chunk},
    )
    return response.context

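def chunking_function(doc: str) -> List[str]:
    # Fixed-size character windows, with overlap between consecutive chunks.
    chunk_size = 1000
    overlap = 200
    chunks = []
    start = 0
    while start < len(doc):
        end = start + chunk_size
        chunks.append(doc[start:end])
        if end >= len(doc):
            # Stop once the window reaches the end; otherwise the next
            # iteration would emit a chunk fully contained in this one.
            break
        start += chunk_size - overlap
    return chunks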

async def process_chunk(doc: str, chunk: str) -> Dict[str, str]:
    context = await situate_context(doc, chunk)
    return {
        "chunk": chunk,
        "context": context
    }

async def process(doc: str) -> List[Dict[str, str]]:
    chunks = chunking_function(doc)
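    # One coroutine per chunk; asyncio.gather runs them concurrently
    # and returns the results in the same order as `chunks`.
    tasks = [process_chunk(doc, chunk) for chunk in chunks]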
    results = await asyncio.gather(*tasks)
    return results

# Example usage
async def main():
    document = "Your full document text here..."
    processed_chunks = await process(document)
    for i, item in enumerate(processed_chunks):
        print(f"Chunk {i + 1}:")
        print(f"Text: {item['chunk'][:50]}...")
        print(f"Context: {item['context']}")
        print()

if __name__ == "__main__":
    asyncio.run(main())
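
`process()` returns each chunk and its generated context separately. Contextual embeddings and contextual BM25 both need the context prepended to the chunk before indexing. After searching, the two result lists then have to be merged. The sketch below assumes hypothetical chunk IDs. It uses reciprocal rank fusion, with the commonly used constant k = 60, as the merging step. That technique is swapped in for illustration and is not necessarily the one Anthropic used. `to_index_text` and `reciprocal_rank_fusion` are hypothetical helpers.

```python
from collections import defaultdict


def to_index_text(item: dict[str, str]) -> str:
    # Prepend the generated context to the raw chunk, as in Anthropic's example.
    return f"{item['context']}\n\n{item['chunk']}"


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking lists chunk IDs, best first. A chunk's fused score is the
    # sum of 1 / (k + rank) over every ranking it appears in.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


item = {
    "chunk": "The company's revenue grew by 3% over the previous quarter.",
    "context": "This chunk is from an SEC filing on ACME corp's performance in Q2 2023.",
}
print(to_index_text(item))

embedding_hits = ["c2", "c1", "c3"]  # hypothetical IDs from a vector search
bm25_hits = ["c1", "c4", "c2"]       # hypothetical IDs from a BM25 search
print(reciprocal_rank_fusion([embedding_hits, bm25_hits]))  # ['c1', 'c2', 'c4', 'c3']
```

Chunks that rank well in both lists rise to the top ("c1" beats "c2" because its two ranks are 1 and 2 rather than 1 and 3). A chunk found by only one retriever still survives the merge.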

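Key Features of This Implementation

Async processing: uses asyncio to process chunks concurrently.
Structured output: uses Pydantic models for type-safe responses.
Prompt caching: uses Anthropic's prompt caching to avoid re-processing the full document for every chunk.
Chunking: implements a basic fixed-size chunking strategy with overlap.
Jinja2 templating: uses Jinja2 templates to inject variables into the prompt.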
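
`process()` hands every chunk to `asyncio.gather` at once. For a long document, that means many simultaneous API requests, which may run into rate limits. Below is a minimal sketch of capping concurrency with `asyncio.Semaphore`. `gather_bounded` and `fake_situate` are hypothetical names. `fake_situate` sleeps instead of calling the API and stands in for `situate_context`.

```python
import asyncio
from collections.abc import Awaitable
from typing import TypeVar

T = TypeVar("T")


async def gather_bounded(coros: list[Awaitable[T]], limit: int) -> list[T]:
    # Run the awaitables concurrently, but never more than `limit` at a time.
    semaphore = asyncio.Semaphore(limit)

    async def run(coro: Awaitable[T]) -> T:
        async with semaphore:
            return await coro

    # gather preserves the input order in its results.
    return await asyncio.gather(*(run(c) for c in coros))


in_flight = 0
peak = 0


async def fake_situate(i: int) -> str:
    # Hypothetical stand-in for situate_context(); tracks how many run at once.
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return f"context {i}"


results = asyncio.run(gather_bounded([fake_situate(i) for i in range(10)], limit=3))
print(results[:3], peak)
```

In `process()`, the equivalent change would be to `return await gather_bounded(tasks, limit=...)`, with a limit chosen to fit your API rate limits.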

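Considerations from Anthropic's Article
Anthropic mentions several implementation considerations:

Chunk boundaries: experiment with chunk size, chunk boundaries, and overlap.
Embedding model: they found Gemini and Voyage embeddings to be effective.
Custom contextualizer prompts: consider prompts tailored to your domain.
Number of chunks: they found that passing the top 20 chunks worked best.
Evaluation: always evaluate against your specific use case.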
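
The fixed 1000-character windows in `chunking_function` can cut a sentence in half. One boundary strategy you might experiment with instead is to pack whole paragraphs into chunks. This sketch assumes paragraphs are separated by blank lines and adds no overlap. `paragraph_chunks` is a hypothetical helper, not part of the Instructor code.

```python
def paragraph_chunks(doc: str, max_chars: int = 1000) -> list[str]:
    # Split on blank lines, then greedily pack whole paragraphs into chunks of
    # at most max_chars. A single oversized paragraph becomes its own chunk.
    paragraphs = [p.strip() for p in doc.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


doc = "Q2 revenue rose 3%.\n\nMargins were flat.\n\nGuidance was raised."
print(paragraph_chunks(doc, max_chars=40))
# ['Q2 revenue rose 3%.\n\nMargins were flat.', 'Guidance was raised.']
```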

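Further Enhancements
Building on Anthropic's recommendations:

Adjust chunk size dynamically based on content complexity.
Integrate with a vector database for efficient storage and retrieval.
Add error handling and retry mechanisms.
Experiment with different embedding models and prompts.
Add a reranking step to further improve performance.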
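
For the error-handling item, here is a minimal sketch of retrying an async call with exponential backoff and jitter. `with_retries` and `flaky_call` are hypothetical names. The helper catches every `Exception` only to keep the sketch short. Real code would retry only transient failures such as rate limits or connection errors. The helper takes a zero-argument factory rather than a coroutine, because a coroutine object cannot be awaited twice. The call site would look like `await with_retries(lambda: situate_context(doc, chunk))`.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await make_call()  # fresh awaitable on every attempt
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff plus a little random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")


calls = 0


async def flaky_call() -> str:
    # Hypothetical stand-in for situate_context(): fails twice, then succeeds.
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = asyncio.run(with_retries(flaky_call, base_delay=0.01))
print(result, calls)  # ok 3
```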

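This implementation is a starting point for applying Anthropic's contextual retrieval technique, with async processing to make it more efficient.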