A Simplified Walkthrough of the Official Qdrant Quickstart and Tutorials


Notes:

  • First published: 2024-08-28
  • Official Qdrant documentation: https://qdrant.tech/documentation/

About

These are simplified notes taken while reading a small part of the official Qdrant documentation; for more detail, please read the official docs.

Deploying Qdrant locally with Docker

docker pull qdrant/qdrant
docker run -d -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

With the default configuration, all data is stored in ./qdrant_storage.

Quickstart

Install the qdrant-client package (Python):

pip install qdrant-client

Initialize the client:

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

All vector data is stored in Qdrant collections. Create a collection named test_collection that uses the dot product as the distance metric for comparing vectors:

from qdrant_client.models import Distance, VectorParams

client.create_collection(
    collection_name="test_collection",
    vectors_config=VectorParams(size=4, distance=Distance.DOT),
)

Add vectors with payloads. A payload is data associated with a vector:

from qdrant_client.models import PointStruct

operation_info = client.upsert(
    collection_name="test_collection",
    wait=True,
    points=[
        PointStruct(id=1, vector=[0.05, 0.61, 0.76, 0.74], payload={"city": "Berlin"}),
        PointStruct(id=2, vector=[0.19, 0.81, 0.75, 0.11], payload={"city": "London"}),
        PointStruct(id=3, vector=[0.36, 0.55, 0.47, 0.94], payload={"city": "Moscow"}),
        PointStruct(id=4, vector=[0.18, 0.01, 0.85, 0.80], payload={"city": "New York"}),
        PointStruct(id=5, vector=[0.24, 0.18, 0.22, 0.44], payload={"city": "Beijing"}),
        PointStruct(id=6, vector=[0.35, 0.08, 0.11, 0.44], payload={"city": "Mumbai"}),
    ],
)

print(operation_info)

Run a query:

search_result = client.query_points(
    collection_name="test_collection",
    query=[0.2, 0.1, 0.9, 0.7],
    limit=3,
).points

print(search_result)

Output:

[{"id": 4,"version": 0,"score": 1.362,"payload": null,"vector": null},{"id": 1,"version": 0,"score": 1.273,"payload": null,"vector": null},{"id": 3,"version": 0,"score": 1.208,"payload": null,"vector": null}
]

Add a filter:

from qdrant_client.models import Filter, FieldCondition, MatchValue

search_result = client.query_points(
    collection_name="test_collection",
    query=[0.2, 0.1, 0.9, 0.7],
    query_filter=Filter(
        must=[FieldCondition(key="city", match=MatchValue(value="London"))]
    ),
    with_payload=True,
    limit=3,
).points

print(search_result)

Output:

[{"id": 2,"version": 0,"score": 0.871,"payload": {"city": "London"},"vector": null}
]

Tutorials

Getting started with semantic search

Install the dependencies:

pip install sentence-transformers

Import the modules:

from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

Use the all-MiniLM-L6-v2 encoder as the embedding model (an embedding model converts raw data into embeddings):

encoder = SentenceTransformer("all-MiniLM-L6-v2")

Add the dataset:

documents = [{"name": "The Time Machine","description": "A man travels through time and witnesses the evolution of humanity.","author": "H.G. Wells","year": 1895,},{"name": "Ender's Game","description": "A young boy is trained to become a military leader in a war against an alien race.","author": "Orson Scott Card","year": 1985,},{"name": "Brave New World","description": "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.","author": "Aldous Huxley","year": 1932,},{"name": "The Hitchhiker's Guide to the Galaxy","description": "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.","author": "Douglas Adams","year": 1979,},{"name": "Dune","description": "A desert planet is the site of political intrigue and power struggles.","author": "Frank Herbert","year": 1965,},{"name": "Foundation","description": "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.","author": "Isaac Asimov","year": 1951,},{"name": "Snow Crash","description": "A futuristic world where the internet has evolved into a virtual reality metaverse.","author": "Neal Stephenson","year": 1992,},{"name": "Neuromancer","description": "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.","author": "William Gibson","year": 1984,},{"name": "The War of the Worlds","description": "A Martian invasion of Earth throws humanity into chaos.","author": "H.G. Wells","year": 1898,},{"name": "The Hunger Games","description": "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.","author": "Suzanne Collins","year": 2008,},{"name": "The Andromeda Strain","description": "A deadly virus from outer space threatens to wipe out humanity.","author": "Michael Crichton","year": 1969,},{"name": "The Left Hand of Darkness","description": "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.","author": "Ursula K. Le Guin","year": 1969,},{"name": "The Three-Body Problem","description": "Humans encounter an alien civilization that lives in a dying system.","author": "Liu Cixin","year": 2008,},
]

Store the embedding data in memory:

client = QdrantClient(":memory:")

Create a collection:

client.create_collection(
    collection_name="my_books",
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(),  # Vector size is defined by used model
        distance=models.Distance.COSINE,
    ),
)

Upload the data:

client.upload_points(
    collection_name="my_books",
    points=[
        models.PointStruct(
            id=idx, vector=encoder.encode(doc["description"]).tolist(), payload=doc
        )
        for idx, doc in enumerate(documents)
    ],
)

Ask a question:

hits = client.query_points(
    collection_name="my_books",
    query=encoder.encode("alien invasion").tolist(),
    limit=3,
).points

for hit in hits:
    print(hit.payload, "score:", hit.score)

Output:

{'name': 'The War of the Worlds', 'description': 'A Martian invasion of Earth throws humanity into chaos.', 'author': 'H.G. Wells', 'year': 1898} score: 0.570093257022374
{'name': "The Hitchhiker's Guide to the Galaxy", 'description': 'A comedic science fiction series following the misadventures of an unwitting human and his alien friend.', 'author': 'Douglas Adams', 'year': 1979} score: 0.5040468703143637
{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216

Narrow down the query with a filter:

hits = client.query_points(
    collection_name="my_books",
    query=encoder.encode("alien invasion").tolist(),
    query_filter=models.Filter(
        must=[models.FieldCondition(key="year", range=models.Range(gte=2000))]
    ),
    limit=1,
).points

for hit in hits:
    print(hit.payload, "score:", hit.score)

Output:

{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216
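
As an optional aside (not part of the official tutorial), several conditions can be combined in one must clause, for example an exact author match together with a year range; a minimal sketch:

hits = client.query_points(
    collection_name="my_books",
    query=encoder.encode("alien invasion").tolist(),
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="author", match=models.MatchValue(value="H.G. Wells")),
            models.FieldCondition(key="year", range=models.Range(lt=1900)),
        ]
    ),
    limit=3,
).points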

A simple neural search

Download the sample dataset:

wget https://storage.googleapis.com/generall-shared-data/startups_demo.json

Install SentenceTransformer and the other dependencies:

pip install sentence-transformers numpy pandas tqdm

Import the modules:

from sentence_transformers import SentenceTransformer
import numpy as np
import json
import pandas as pd
from tqdm.notebook import tqdm

Create the sentence encoder:

model = SentenceTransformer(
    "all-MiniLM-L6-v2", device="cuda"
)  # or device="cpu" if you don't have a GPU

Read the data:

df = pd.read_json("./startups_demo.json", lines=True)

Create an embedding vector for each description. Internally, encode splits the input into batches, which speeds up processing.

vectors = model.encode(
    [row.alt + ". " + row.description for row in df.itertuples()],
    show_progress_bar=True,
)
vectors.shape
# > (40474, 384)

Save the vectors as an .npy file:

np.save("startup_vectors.npy", vectors, allow_pickle=False)

Start the Docker service:

docker pull qdrant/qdrant
docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

Create the Qdrant client:

# Import client library
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient("http://localhost:6333")

Create the collection; 384 is the output dimension of the embedding model (all-MiniLM-L6-v2).

if not client.collection_exists("startups"):
    client.create_collection(
        collection_name="startups",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )
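
Optionally, instead of hard-coding 384, the vector size can be read from the encoder itself; a small sketch, assuming the SentenceTransformer object from the earlier step is still in scope as model:

if not client.collection_exists("startups"):
    client.create_collection(
        collection_name="startups",
        vectors_config=VectorParams(
            size=model.get_sentence_embedding_dimension(),  # 384 for all-MiniLM-L6-v2
            distance=Distance.COSINE,
        ),
    )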

Load the data:

fd = open("./startups_demo.json")

# payload is now an iterator over startup data
payload = map(json.loads, fd)

# Load all vectors into memory, numpy array works as iterable for itself.
# Other option would be to use Mmap, if you don't want to load all data into RAM
vectors = np.load("./startup_vectors.npy")

Upload the data to Qdrant:

client.upload_collection(
    collection_name="startups",
    vectors=vectors,
    payload=payload,
    ids=None,  # Vector ids will be assigned automatically
    batch_size=256,  # How many vectors will be uploaded in a single request?
)

Create a neural_searcher.py file:

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize the encoder model
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
        # Initialize the Qdrant client
        self.qdrant_client = QdrantClient("http://localhost:6333")

    def search(self, text: str):
        # Convert the text query into a vector
        vector = self.model.encode(text).tolist()

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            query_filter=None,  # If you don't want any filters for now
            limit=5,  # 5 of the closest results is enough
        )
        # `search_result` contains found vector ids with similarity scores along with the stored payload
        # In this function you are interested in the payload only
        payloads = [hit.payload for hit in search_result]
        return payloads
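
Before wiring up the API, the class can be tried directly (an illustrative snippet; the query text is arbitrary):

from neural_searcher import NeuralSearcher

searcher = NeuralSearcher(collection_name="startups")
print(searcher.search(text="artificial intelligence for healthcare"))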

Deploy with FastAPI. First install the dependencies:

pip install fastapi uvicorn

The service code extends NeuralSearcher with a Berlin-filtered search and exposes both methods via FastAPI:

from qdrant_client import QdrantClient
from qdrant_client.models import Filter
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize the encoder model
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
        # Initialize the Qdrant client
        self.qdrant_client = QdrantClient("http://localhost:6333")

    def search(self, text: str):
        # Convert the text query into a vector
        vector = self.model.encode(text).tolist()

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            query_filter=None,  # If you don't want any filters for now
            limit=5,  # 5 of the closest results is enough
        )
        # `search_result` contains found vector ids with similarity scores along with the stored payload
        # In this function you are interested in the payload only
        payloads = [hit.payload for hit in search_result]
        return payloads

    def search_in_berlin(self, text: str):
        # Convert the text query into a vector
        vector = self.model.encode(text).tolist()

        city_of_interest = "Berlin"
        # Define a filter for cities
        city_filter = Filter(
            **{
                "must": [
                    {
                        "key": "city",  # Store city information in a field of the same name
                        "match": {  # This condition checks if the payload field has the requested value
                            "value": city_of_interest
                        },
                    }
                ]
            }
        )

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.query_points(
            collection_name=self.collection_name,
            query=vector,
            query_filter=city_filter,
            limit=5,
        ).points
        # `search_result` contains found vector ids with similarity scores along with the stored payload
        # In this function you are interested in the payload only
        payloads = [hit.payload for hit in search_result]
        return payloads
from fastapi import FastAPI

app = FastAPI()

# Create a neural searcher instance
neural_searcher = NeuralSearcher(collection_name="startups")


@app.get("/api/search")
def search_startup(q: str):
    return {"result": neural_searcher.search(text=q)}


@app.get("/api/search_in_berlin")
def search_startup_filter(q: str):
    return {"result": neural_searcher.search_in_berlin(text=q)}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8001)
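
Once the code above is saved to a file (the original does not name it; service.py is just an assumption) and started with python service.py, uvicorn serves it on port 8001 as configured in the __main__ block. The endpoints can then be exercised from Python, for example with the requests library (a minimal sketch; the query string is arbitrary):

import requests

resp = requests.get("http://localhost:8001/api/search", params={"q": "biotech"})
print(resp.json()["result"])

resp = requests.get("http://localhost:8001/api/search_in_berlin", params={"q": "biotech"})
print(resp.json()["result"])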

If you are running this inside a Jupyter notebook, you also need to add:

import nest_asyncio
nest_asyncio.apply()

Install nest_asyncio:

pip install nest_asyncio

Using Qdrant asynchronously

Qdrant natively supports async:

import asyncio

import qdrant_client
from qdrant_client import models


async def main():
    client = qdrant_client.AsyncQdrantClient("localhost")

    # Create a collection
    await client.create_collection(
        collection_name="my_collection",
        vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
    )

    # Insert a vector
    await client.upsert(
        collection_name="my_collection",
        points=[
            models.PointStruct(
                id="5c56c793-69f3-4fbf-87e6-c4bf54c28c26",
                payload={"color": "red"},
                vector=[0.9, 0.1, 0.1, 0.5],
            ),
        ],
    )

    # Search for nearest neighbors (await the call before accessing .points)
    points = (
        await client.query_points(
            collection_name="my_collection",
            query=[0.9, 0.1, 0.1, 0.5],
            limit=2,
        )
    ).points

    # Your async code using AsyncQdrantClient might be put here
    # ...


asyncio.run(main())
