A Hands-On Introduction to Triton

2024-04-16 10:28
Tags: hands-on, getting started, triton

This article is a hands-on introduction to Triton; hopefully it serves as a useful reference for developers working through the same setup.

This article walks through building a tritonserver service from the official images and a PyTorch model in TorchScript format.

1. Environment Setup

  • 1.1. Download the tritonserver image: Triton Inference Server | NVIDIA NGC

    • a. Note: the host's NVIDIA driver version must be compatible with the tritonserver image you pull; otherwise the server will fail to start later.
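      A quick way to check the host driver version before choosing an image tag (assuming the NVIDIA driver is already installed on the host):
      nvidia-smi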
  • 1.2. Next, pull the official PyTorch image to serve as the inference client and to run some preprocessing (alternatively, you can pull the tritonserver client SDK image directly).

    • a. docker pull pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel (no download link is provided here because Docker Hub is not accessible).
    • b. tritonserver client SDK image: Triton Inference Server | NVIDIA NGC
      # nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
      # docker pull nvcr.io/nvidia/tritonserver:23.04-py3-sdk
  • 1.3. Then create a client container from the official PyTorch image.

    • a. Create a shared directory on the host: D:\chinasoft\shumei\triton\demo_first\pytorch_container\workspace
    • b. docker run -dt --name pytorch200_cu117_dev --restart=always --gpus all --network=host --shm-size 4G -v /D/chinasoft/shumei/triton/demo_first/pytorch_container/workspace:/workspace -w /workspace pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel /bin/bash
    • c. Enter the container: docker exec -it pytorch200_cu117_dev bash
    • d. pip install datasets transformers -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
    • e. pip install tritonclient[all] -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
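
As a quick sanity check that the client environment is ready, you can try importing the Triton client package inside the container:

python -c "import tritonclient.http; print('tritonclient http client OK')"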

2. Model Preparation

  • 2.1. This article uses the resnet50 model on the PyTorch backend for image classification, so we first download the resnet50 model and convert it to TorchScript format. The conversion code (resnet50_convert_torchscript.py) is shown below:
import torch
import torchvision.models as models

# Load the pretrained resnet50 and switch to inference mode
resnet50 = models.resnet50(pretrained=True)
resnet50.eval()

# Trace the model with a dummy input to produce a TorchScript module
image = torch.randn(1, 3, 224, 224)
resnet50_traced = torch.jit.trace(resnet50, image)
resnet50(image)  # sanity-check forward pass on the eager model

# resnet50_traced.save('/workspace/model/resnet50/model.pt')
torch.jit.save(resnet50_traced, "/workspace/model/resnet50/model.pt")
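
Before wiring the model into Triton, it is worth a quick check that the saved TorchScript file loads and produces the expected output shape; a minimal sketch, using the path from the conversion script above:

import torch

# Reload the traced model and run a dummy forward pass
model = torch.jit.load("/workspace/model/resnet50/model.pt")
model.eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # expected: torch.Size([1, 1000])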
  • 2.2. Finally, clone the Triton Server repository.

    git clone -b r23.04 https://github.com/triton-inference-server/server.git

    Configuration examples for several common backends live under the server/docs/examples directory.

tree docs/examples -L 2
docs/examples
|-- README.md
|-- fetch_models.sh
|-- jetson
|   |-- README.md
|   `-- concurrency_and_dynamic_batching
`-- model_repository
    |-- densenet_onnx
    |-- inception_graphdef
    |-- simple
    |-- simple_dyna_sequence
    |-- simple_identity
    |-- simple_int8
    |-- simple_sequence
    `-- simple_string

11 directories, 3 files
  • 2.3. Clone the Triton tutorials repository, which contains tutorials and samples for Triton; this article follows the example under Quick_Deploy/PyTorch to deploy a PyTorch model.

    git clone https://github.com/triton-inference-server/tutorials.git

3. Development Walkthrough

  • 3.1. First, build a model repository on the host; its directory structure looks like this:
model_repository/
`-- resnet50
    |-- 1
    |   `-- model.pt
    `-- config.pbtxt

Here, config.pbtxt is the model configuration file; 1 is the model version number; resnet50 is the model name, which must stay consistent with the name field in config.pbtxt; model.pt is the model weights file (the converted model from above).
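The layout can be created with a few commands; a sketch, with illustrative paths assuming model.pt was saved under /workspace/model/resnet50 as above — adjust to your own shared directory:

mkdir -p model_repository/resnet50/1
cp /workspace/model/resnet50/model.pt model_repository/resnet50/1/model.pt
# config.pbtxt sits next to the version directory (written in the next step)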

  • 3.2. Edit the config.pbtxt file with the following content:
name: "resnet50"
platform: "pytorch_libtorch"
max_batch_size : 0
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1, 1000, 1, 1 ]
    reshape { shape: [ 1, 1000 ] }
  }
]

The important fields are:

  • name: the model name
  • platform: the backend used to serve the model, e.g. pytorch_libtorch, onnxruntime_onnx, tensorrt_plan
  • max_batch_size: the maximum batch size supported in batched inference; 0 means Triton-managed batching is disabled and dims describe the full tensor shapes
  • input: the model's input configuration
  • output: the model's output configuration

Once the model repository is in place, the next step is to start the Triton inference server.

4. Start the tritonserver Inference Service

There are two ways to start the inference service: launch it directly with a docker run command, or start a container and invoke the command manually inside it.

Here we launch it directly with docker:

docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /D/chinasoft/shumei/triton/demo_first/model_repository:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models

Parameter notes:

  • -p: map ports between the host and the container
  • -v: mount host storage into the container; here the model repository is mounted in
  • --model-repository: the path to the Triton model repository inside the container
  • Note that the model_repository path must be correct and the model files properly configured; see the Model Preparation section.
(base) PS C:\Users\lenovo> docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /D/chinasoft/shumei/triton/demo_first/model_repository:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.12 (build 50109463)
Triton Server Version 2.29.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 516.94 which has support for CUDA 11.7.  This container
  was built with CUDA 11.8 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0804 01:46:15.003883 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x304800000' with size 268435456
I0804 01:46:15.004050 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0804 01:46:15.322720 1 model_lifecycle.cc:459] loading: resnet50:1
I0804 01:46:17.472054 1 libtorch.cc:1985] TRITONBACKEND_Initialize: pytorch
I0804 01:46:17.472105 1 libtorch.cc:1995] Triton TRITONBACKEND API version: 1.10
I0804 01:46:17.472587 1 libtorch.cc:2001] 'pytorch' TRITONBACKEND API version: 1.10
I0804 01:46:17.472634 1 libtorch.cc:2034] TRITONBACKEND_ModelInitialize: resnet50 (version 1)
W0804 01:46:17.473291 1 libtorch.cc:284] skipping model configuration auto-complete for 'resnet50': not supported for pytorch backend
I0804 01:46:17.473618 1 libtorch.cc:313] Optimized execution is enabled for model instance 'resnet50'
I0804 01:46:17.473624 1 libtorch.cc:332] Cache Cleaning is disabled for model instance 'resnet50'
I0804 01:46:17.473626 1 libtorch.cc:349] Inference Mode is disabled for model instance 'resnet50'
I0804 01:46:17.473640 1 libtorch.cc:444] NvFuser is not specified for model instance 'resnet50'
I0804 01:46:17.473699 1 libtorch.cc:2078] TRITONBACKEND_ModelInstanceInitialize: resnet50 (GPU device 0)
I0804 01:46:22.750763 1 model_lifecycle.cc:694] successfully loaded 'resnet50' version 1
I0804 01:46:22.750870 1 server.cc:563]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0804 01:46:22.750917 1 server.cc:590]
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                    | Config                                                                                                                                                        |
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0804 01:46:22.750948 1 server.cc:633]
+----------+---------+--------+
| Model    | Version | Status |
+----------+---------+--------+
| resnet50 | 1       | READY  |
+----------+---------+--------+

I0804 01:46:22.810861 1 metrics.cc:864] Collecting metrics for GPU 0: NVIDIA GeForce GTX 1650
I0804 01:46:22.811494 1 metrics.cc:757] Collecting CPU metrics
I0804 01:46:22.811657 1 tritonserver.cc:2264]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                               |
| server_version                   | 2.29.0                                                                                                                                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                              |
| model_control_mode               | MODE_NONE                                                                                                                                                                                            |
| strict_model_config              | 0                                                                                                                                                                                                    |
| rate_limit                       | OFF                                                                                                                                                                                                  |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                            |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                             |
| response_cache_byte_size         | 0                                                                                                                                                                                                    |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                  |
| strict_readiness                 | 1                                                                                                                                                                                                    |
| exit_timeout                     | 30                                                                                                                                                                                                   |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0804 01:46:22.813086 1 grpc_server.cc:4819] Started GRPCInferenceService at 0.0.0.0:8001
I0804 01:46:22.813243 1 http_server.cc:3477] Started HTTPService at 0.0.0.0:8000
I0804 01:46:22.890915 1 http_server.cc:184] Started Metrics Service at 0.0.0.0:8002
W0804 01:46:23.822499 1 metrics.cc:603] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0804 01:46:24.822769 1 metrics.cc:603] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0804 01:46:25.831221 1 metrics.cc:603] Unable to get power limit for GPU 0. Status:Success, value:0.000000

You can see that the resnet50 model is in the READY state, but the GPU is not being used, because (as the warning above indicates) my host driver version does not match the CUDA version the image was built for:

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 516.94 which has support for CUDA 11.7.  This container
  was built with CUDA 11.8 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
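
Before sending real requests, you can confirm from the client container that the server and the model are ready; a minimal sketch using the tritonclient HTTP API:

import tritonclient.http as httpclient

# Ports 8000/8001/8002 expose HTTP, gRPC, and metrics respectively
client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.is_server_live())            # server process is up
print(client.is_server_ready())           # all models finished loading
print(client.is_model_ready("resnet50"))  # the resnet50 model specifically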

5. Send an Inference Request

  • 5.1. First, create the client script client.py and place it in the client container's /workspace directory (the shared directory created in section 1.3):
import numpy as np
from torchvision import transforms
from PIL import Image
import tritonclient.http as httpclient
from tritonclient.utils import triton_to_np_dtype

# Image preprocessing function
def rn50_preprocess(img_path="img1.jpg"):
    img = Image.open(img_path)
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    return preprocess(img).numpy()

transformed_img = rn50_preprocess()

# Set up the client connection to the Triton server
client = httpclient.InferenceServerClient(url="localhost:8000")

# Specify the input and output of the resnet50 model
inputs = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
inputs.set_data_from_numpy(transformed_img, binary_data=True)
# class_count requests the top-K results; if unset (default 0), a raw 1000-dim vector is returned
outputs = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)

# Send an inference request to the Triton server
results = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
inference_output = results.as_numpy('output__0')
print(inference_output[:5])
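
As a side note, the same request can be sent over gRPC on port 8001; a minimal variant sketch, reusing transformed_img from client.py (the gRPC client ships with tritonclient[all]):

import tritonclient.grpc as grpcclient

# The gRPC client mirrors the HTTP API but targets port 8001
client = grpcclient.InferenceServerClient(url="localhost:8001")
inputs = grpcclient.InferInput("input__0", transformed_img.shape, "FP32")
inputs.set_data_from_numpy(transformed_img)
outputs = grpcclient.InferRequestedOutput("output__0", class_count=1000)
results = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
print(results.as_numpy("output__0")[:5])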
  • 5.2. Enter the client container: docker exec -it pytorch200_cu117_dev bash
  • 5.3. Download the image used for the inference request in advance:
    wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
  • 5.4. Run the client script to send the request:
python client.py
[b'12.474869:90' b'11.527128:92' b'9.659309:14' b'8.408504:136' b'8.216769:11']

The output format is <confidence_score>:<classification_index>.
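
Each entry packs a confidence score and a class index into one byte string; a small self-contained sketch parsing the values above:

# Parse entries like b'12.474869:90' into (confidence, class_index) pairs
raw = [b'12.474869:90', b'11.527128:92', b'9.659309:14', b'8.408504:136', b'8.216769:11']
for entry in raw:
    score, class_idx = entry.decode().split(":")
    print(f"class {class_idx}: confidence {float(score):.4f}")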

That wraps up this hands-on introduction to Triton; hopefully it is useful to developers working through the same setup.


