测试大语言模型在嵌入式设备部署的可能性——模型TinyLlama-1.1B-Chat-v1.0

本文主要是介绍测试大语言模型在嵌入式设备部署的可能性——模型TinyLlama-1.1B-Chat-v1.0,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

测试模型TinyLlama-1.1B-Chat-v1.0修改推理参数,观察参数变化与推理时间变化之间的关系。
本地环境:

处理器 Intel® Core™ i5-8400 CPU @ 2.80GHz 2.80 GHz
机带 RAM 16.0 GB (15.9 GB 可用)
集显 Intel® UHD Graphics 630
独显 NVIDIA GeForce GTX 1050

主要测试修改:

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)

源代码来源(镜像):https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0

'''
https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0
测试tinyLlama 1.1B效果不错,比Qwen1.8B经过量化的都好很多
'''# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerateimport os
from datetime import datetime
import torchos.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
from transformers import pipeline'''
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")# We use the tokenizer's chat template to format each message - see https://hf-mirror.com/docs/transformers/main/en/chat_templating
messages = [{"role": "system","content": "You are a friendly chatbot who always responds in the style of a pirate",},# {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},{"role": "user", "content": "你叫什么名字?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
'''# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
def load_pipeline():pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16,device_map="auto")return pipedef generate_text(content, length=20):"""根据给定的prompt生成文本"""messages = [{"role": "提示","content": "这是个友好的聊天机器人...",},# {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},{"role": "user", "content": content},]prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)datetime1 = datetime.now()outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)print(outputs[0]["generated_text"])datetime2 = datetime.now()time12_interval = datetime2 - datetime1print("时间间隔", time12_interval)if False:outputs = pipe(prompt, max_new_tokens=32, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)print(outputs[0]["generated_text"])datetime3 = datetime.now()time23_interval = datetime3 - datetime2print("时间间隔2", time23_interval)outputs = pipe(prompt, max_new_tokens=32, do_sample=False, top_k=50)print(outputs[0]["generated_text"])datetime4 = datetime.now()time34_interval = datetime4 - datetime3print("时间间隔3", time34_interval)outputs = pipe(prompt, max_new_tokens=32, do_sample=True, temperature=0.7, top_k=30, top_p=0.95)print(outputs[0]["generated_text"])datetime5 = datetime.now()time45_interval = datetime5 - datetime4print("时间间隔4", time45_interval)outputs = pipe(prompt, max_new_tokens=32, do_sample=False, top_k=30)print(outputs[0]["generated_text"])datetime6 = datetime.now()time56_interval = datetime6 - datetime5print("时间间隔5", time56_interval)outputs = pipe(prompt, max_new_tokens=12, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)print(outputs[0]["generated_text"])datetime7 = datetime.now()time67_interval = datetime7 - datetime6print("时间间隔6", time67_interval)'''结论:修改top_p不会显著降低推理时间,并且中英文相同的问题,中文问题推理时间是英文的两倍do_sample修改成False基本不会降低推理时间只有max_new_tokens才能显著降低推理时间,但是max_new_tokens与推理时间不是呈线性关系比如max_new_tokens=256,推理时间2分钟当max_new_tokens=32的时候,推理时间才会变成约1分钟因此,不如将max_new_tokens设置大些用于获取比较完整的答案'''return outputsif __name__ == "__main__":'''main function'''global pipepipe = load_pipeline()# print('load pipe ok')while True:prompt = input("请输入一个提示(或输入'exit'退出):")if prompt.lower() == 'exit':breaktry:generated_text = generate_text(prompt)print("生成的文本:")print(generated_text[0]["generated_text"])except Exception as e:print("发生错误:", e)
请输入一个提示(或输入'exit'退出):如何开门?
<|user|>
如何开门?</s>
<|assistant|>
Certainly! Opening a door is a simple process that involves several steps. Here are the general steps to follow to open a door:1. Turn off the lock: Turn off the lock with the key by pressing the "lock" button.2. Press the handle: Use the handle to push the door open. If the door is mechanical, you may need to turn a knob or pull the door handle to activate the door.3. Release the latch: Once the door is open, release the latch by pulling it backward.4. Slide the door: Slide the door forward by pushing it against the wall with your feet or using a push bar.5. Close the door: Once the door is open, close it by pressing the lock button or pulling the handle backward.6. Use a second key: If the lock has a second key, make sure it is properly inserted and then turn it to the correct position to unlock the door.Remember to always double-check the locks before opening a door, as some locks can be tricky to open. If you're unsure about the correct procedure for opening a door,
时间间隔 0:04:23.561065
生成的文本:
<|user|>
如何开门?</s>
<|assistant|>
Certainly! Opening a door is a simple process that involves several steps. Here are the general steps to follow to open a door:1. Turn off the lock: Turn off the lock with the key by pressing the "lock" button.2. Press the handle: Use the handle to push the door open. If the door is mechanical, you may need to turn a knob or pull the door handle to activate the door.3. Release the latch: Once the door is open, release the latch by pulling it backward.4. Slide the door: Slide the door forward by pushing it against the wall with your feet or using a push bar.5. Close the door: Once the door is open, close it by pressing the lock button or pulling the handle backward.6. Use a second key: If the lock has a second key, make sure it is properly inserted and then turn it to the correct position to unlock the door.Remember to always double-check the locks before opening a door, as some locks can be tricky to open. If you're unsure about the correct procedure for opening a door,
请输入一个提示(或输入'exit'退出):

这篇关于测试大语言模型在嵌入式设备部署的可能性——模型TinyLlama-1.1B-Chat-v1.0的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/924985

相关文章

Go语言中泄漏缓冲区的问题解决

《Go语言中泄漏缓冲区的问题解决》缓冲区是一种常见的数据结构,常被用于在不同的并发单元之间传递数据,然而,若缓冲区使用不当,就可能引发泄漏缓冲区问题,本文就来介绍一下问题的解决,感兴趣的可以了解一下... 目录引言泄漏缓冲区的基本概念代码示例:泄漏缓冲区的产生项目场景:Web 服务器中的请求缓冲场景描述代码

Go语言如何判断两张图片的相似度

《Go语言如何判断两张图片的相似度》这篇文章主要为大家详细介绍了Go语言如何中实现判断两张图片的相似度的两种方法,文中的示例代码讲解详细,感兴趣的小伙伴可以跟随小编一起学习一下... 在介绍技术细节前,我们先来看看图片对比在哪些场景下可以用得到:图片去重:自动删除重复图片,为存储空间"瘦身"。想象你是一个

Go语言中Recover机制的使用

《Go语言中Recover机制的使用》Go语言的recover机制通过defer函数捕获panic,实现异常恢复与程序稳定性,具有一定的参考价值,感兴趣的可以了解一下... 目录引言Recover 的基本概念基本代码示例简单的 Recover 示例嵌套函数中的 Recover项目场景中的应用Web 服务器中

详解如何使用Python从零开始构建文本统计模型

《详解如何使用Python从零开始构建文本统计模型》在自然语言处理领域,词汇表构建是文本预处理的关键环节,本文通过Python代码实践,演示如何从原始文本中提取多尺度特征,并通过动态调整机制构建更精确... 目录一、项目背景与核心思想二、核心代码解析1. 数据加载与预处理2. 多尺度字符统计3. 统计结果可

SpringBoot整合Sa-Token实现RBAC权限模型的过程解析

《SpringBoot整合Sa-Token实现RBAC权限模型的过程解析》:本文主要介绍SpringBoot整合Sa-Token实现RBAC权限模型的过程解析,本文给大家介绍的非常详细,对大家的学... 目录前言一、基础概念1.1 RBAC模型核心概念1.2 Sa-Token核心功能1.3 环境准备二、表结

python多线程并发测试过程

《python多线程并发测试过程》:本文主要介绍python多线程并发测试过程,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录一、并发与并行?二、同步与异步的概念?三、线程与进程的区别?需求1:多线程执行不同任务需求2:多线程执行相同任务总结一、并发与并行?1、

Go语言中使用JWT进行身份验证的几种方式

《Go语言中使用JWT进行身份验证的几种方式》本文主要介绍了Go语言中使用JWT进行身份验证的几种方式,包括dgrijalva/jwt-go、golang-jwt/jwt、lestrrat-go/jw... 目录简介1. github.com/dgrijalva/jwt-go安装:使用示例:解释:2. gi

Go 语言中的 Struct Tag 的用法详解

《Go语言中的StructTag的用法详解》在Go语言中,结构体字段标签(StructTag)是一种用于给字段添加元信息(metadata)的机制,常用于序列化(如JSON、XML)、ORM映... 目录一、结构体标签的基本语法二、json:"token"的具体含义三、常见的标签格式变体四、使用示例五、使用

Web技术与Nginx网站环境部署教程

《Web技术与Nginx网站环境部署教程》:本文主要介绍Web技术与Nginx网站环境部署教程,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录一、Web基础1.域名系统DNS2.Hosts文件3.DNS4.域名注册二.网页与html1.网页概述2.HTML概述3.

Go语言使用slices包轻松实现排序功能

《Go语言使用slices包轻松实现排序功能》在Go语言开发中,对数据进行排序是常见的需求,Go1.18版本引入的slices包提供了简洁高效的排序解决方案,支持内置类型和用户自定义类型的排序操作,本... 目录一、内置类型排序:字符串与整数的应用1. 字符串切片排序2. 整数切片排序二、检查切片排序状态: