Notes on Setting Up Qwen 0.5B on Ubuntu 20.04

2024-06-11 01:44


Environment

Ubuntu 20.04
NVIDIA-SMI 545.29.06
CUDA 11.4
Python 3.10
PyTorch 1.11.0

Setup

Python environment

Create a virtual environment

conda create --name qewn python==3.10

Install modelscope and transformers first

pip install modelscope
pip install transformers

Install PyTorch

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3
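
After the install, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch, not part of the original notes: the function name describe_environment is made up for illustration, and it only reports what it finds (it does not fail if torch is missing).

```python
import importlib.util
import platform


def describe_environment():
    """Collect Python / PyTorch / CUDA info without failing if torch is absent."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": False,
    }
    # Only import torch when it is installed in the active environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated conda environment; the torch_cuda entry shows the toolkit version the PyTorch build targets, which may differ from the system CUDA version.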

Download the model

Create a Python file

gedit download.py

Paste the following into it

from modelscope.hub.file_download import model_file_download

# Download a single GGUF file from the ModelScope hub into cache_dir
model_dir = model_file_download(
    model_id='qwen/Qwen1.5-0.5B-Chat-GGUF',
    file_path='qwen1_5-0_5b-chat-q5_k_m.gguf',
    revision='master',
    cache_dir='path/to/local/dir',
)

Run the Python file to start the download

python download.py
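
Before moving on, a quick sanity check on the downloaded file can catch a truncated or wrong download. This sketch is an addition, not something from the original notes. It relies on the fact that a GGUF file begins with the four magic bytes GGUF followed by a little-endian 32-bit format version; the helper name read_gguf_version is hypothetical.

```python
import struct
import sys


def read_gguf_version(path):
    """Return the GGUF format version, or raise ValueError if the magic is wrong."""
    with open(path, "rb") as f:
        header = f.read(8)
    # The first 4 bytes are the magic, the next 4 the version (uint32, little-endian).
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    (version,) = struct.unpack("<I", header[4:8])
    return version


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the downloaded .gguf file as the first argument.
    print("GGUF version:", read_gguf_version(sys.argv[1]))
```

Save it under any name and pass it the path to the .gguf file that download.py wrote under cache_dir.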

Download llama.cpp

Clone the llama.cpp project with git

git clone https://github.com/ggerganov/llama.cpp

Once the clone finishes, enter the llama.cpp directory and build the project

cd llama.cpp
make -j
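
Model download

Download the model from the ModelScope community:
https://www.modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GGUF/files
I downloaded qwen1_5-0_5b-chat-q5_k_m.gguf.
Run it from a terminal, replacing the model path with the location of your own model. (The -cml flag in the official command is not listed in the help output and interfered with the run, so I removed it.)
Official command: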

./main -m /path/to/local/dir/qwen/Qwen1.5-0.5B-Chat-GGUF/qwen1_5-0_5b-chat-q5_k_m.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt

What I ran:

./main -m /path/to/local/dir/qwen/Qwen1.5-0.5B-Chat-GGUF/qwen1_5-0_5b-chat-q5_k_m.gguf -n 512 --color -i -f prompts/chat-with-qwen.txt
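
Checking whether a flag such as -cml is listed in the help output can also be done programmatically. The following is a rough sketch under stated assumptions, not the author's method: the function names flag_in_help and build_command are invented for illustration, the fixed arguments mirror the command above, and it assumes the binary prints its help when called with --help (to stdout or stderr).

```python
import re
import subprocess


def flag_in_help(help_text, flag):
    """Return True if `flag` appears as a standalone option token in the help text."""
    # Reject matches that are part of a longer option, e.g. "-c" inside "-co" or "--color".
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def build_command(binary, model_path, prompt_file, optional_flags=()):
    """Build the argument list, keeping only optional flags that the help lists."""
    result = subprocess.run([binary, "--help"], capture_output=True, text=True)
    help_text = result.stdout + result.stderr
    cmd = [binary, "-m", model_path, "-n", "512", "--color", "-i"]
    cmd += [flag for flag in optional_flags if flag_in_help(help_text, flag)]
    cmd += ["-f", prompt_file]
    return cmd
```

For example, build_command("./main", model_path, "prompts/chat-with-qwen.txt", ["-cml"]) would drop -cml on a build whose help does not mention it; the resulting list can then be passed to subprocess.run.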

Help output:

usage: ./main [options]

general:

  -h,    --help, --usage          print usage and exit
         --version                show version and build info
  -v,    --verbose                print verbose information
         --verbosity N            set specific verbosity level (default: 0)
         --verbose-prompt         print a verbose prompt before generation (default: false)
         --no-display-prompt      don't print prompt at generation (default: false)
  -co,   --color                  colorise output to distinguish prompt and user input from generations (default: false)
  -s,    --seed SEED              RNG seed (default: -1, use random seed for < 0)
  -t,    --threads N              number of threads to use during generation (default: 8)
  -tb,   --threads-batch N        number of threads to use during batch and prompt processing (default: same as --threads)
  -td,   --threads-draft N        number of threads to use during generation (default: same as --threads)
  -tbd,  --threads-batch-draft N  number of threads to use during batch and prompt processing (default: same as --threads-draft)
         --draft N                number of tokens to draft for speculative decoding (default: 5)
  -ps,   --p-split N              speculative decoding split probability (default: 0.1)
  -lcs,  --lookup-cache-static FNAME
                                  path to static lookup cache to use for lookup decoding (not updated by generation)
  -lcd,  --lookup-cache-dynamic FNAME
                                  path to dynamic lookup cache to use for lookup decoding (updated by generation)
  -c,    --ctx-size N             size of the prompt context (default: 0, 0 = loaded from model)
  -n,    --predict N              number of tokens to predict (default: -1, -1 = infinity, -2 = until context filled)
  -b,    --batch-size N           logical maximum batch size (default: 2048)
  -ub,   --ubatch-size N          physical maximum batch size (default: 512)
         --keep N                 number of tokens to keep from the initial prompt (default: 0, -1 = all)
         --chunks N               max number of chunks to process (default: -1, -1 = all)
  -fa,   --flash-attn             enable Flash Attention (default: disabled)
  -p,    --prompt PROMPT          prompt to start generation with (default: '')
  -f,    --file FNAME             a file containing the prompt (default: none)
         --in-file FNAME          an input file (repeat to specify multiple files)
  -bf,   --binary-file FNAME      binary file containing the prompt (default: none)
  -e,    --escape                 process escapes sequences (\n, \r, \t, \', \", \\) (default: true)
         --no-escape              do not process escape sequences
  -ptc,  --print-token-count N    print token count every N tokens (default: -1)
         --prompt-cache FNAME     file to cache prompt state for faster startup (default: none)
         --prompt-cache-all       if specified, saves user input and generations to cache as well
                                  not supported with --interactive or other interactive options
         --prompt-cache-ro        if specified, uses the prompt cache but does not update it
  -r,    --reverse-prompt PROMPT  halt generation at PROMPT, return control in interactive mode
                                  can be specified more than once for multiple prompts
  -sp,   --special                special tokens output enabled (default: false)
  -cnv,  --conversation           run in conversation mode (does not print special tokens and suffix/prefix) (default: false)
  -i,    --interactive            run in interactive mode (default: false)
  -if,   --interactive-first      run in interactive mode and wait for input right away (default: false)
  -mli,  --multiline-input        allows you to write or paste multiple lines without ending each in '\'
         --in-prefix-bos          prefix BOS to user inputs, preceding the `--in-prefix` string
         --in-prefix STRING       string to prefix user inputs with (default: empty)
         --in-suffix STRING       string to suffix after user inputs with (default: empty)

sampling:

         --samplers SAMPLERS      samplers that will be used for generation in the order, separated by ';'
                                  (default: top_k;tfs_z;typical_p;top_p;min_p;temperature)
         --sampling-seq SEQUENCE  simplified sequence for samplers that will be used (default: kfypmt)
         --ignore-eos             ignore end of stream token and continue generating (implies --logit-bias EOS-inf)
         --penalize-nl            penalize newline tokens (default: false)
         --temp N                 temperature (default: 0.8)
         --top-k N                top-k sampling (default: 40, 0 = disabled)
         --top-p N                top-p sampling (default: 0.9, 1.0 = disabled)
         --min-p N                min-p sampling (default: 0.1, 0.0 = disabled)
         --tfs N                  tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
         --typical N              locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
         --repeat-last-n N        last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
         --repeat-penalty N       penalize repeat sequence of tokens (default: 1.0, 1.0 = disabled)
         --presence-penalty N     repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
         --frequency-penalty N    repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
         --dynatemp-range N       dynamic temperature range (default: 0.0, 0.0 = disabled)
         --dynatemp-exp N         dynamic temperature exponent (default: 1.0)
         --mirostat N             use Mirostat sampling.
                                  Top K, Nucleus, Tail Free and Locally Typical samplers are ignored if used.
                                  (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
         --mirostat-lr N          Mirostat learning rate, parameter eta (default: 0.1)
         --mirostat-ent N         Mirostat target entropy, parameter tau (default: 5.0)
         -l TOKEN_ID(+/-)BIAS     modifies the likelihood of token appearing in the completion,
                                  i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello',
                                  or `--logit-bias 15043-1` to decrease likelihood of token ' Hello'
         --cfg-negative-prompt PROMPT
                                  negative prompt to use for guidance (default: '')
         --cfg-negative-prompt-file FNAME
                                  negative prompt file to use for guidance
         --cfg-scale N            strength of guidance (default: 1.0, 1.0 = disable)

grammar:

         --grammar GRAMMAR        BNF-like grammar to constrain generations (see samples in grammars/ dir) (default: '')
         --grammar-file FNAME     file to read grammar from
  -j,    --json-schema SCHEMA     JSON schema to constrain generations (https://json-schema.org/), e.g. `{}` for any JSON object
                                  For schemas w/ external $refs, use --grammar + example/json_schema_to_grammar.py instead

embedding:

         --pooling {none,mean,cls}
                                  pooling type for embeddings, use model default if unspecified

context hacking:

         --rope-scaling {none,linear,yarn}
                                  RoPE frequency scaling method, defaults to linear unless specified by the model
         --rope-scale N           RoPE context scaling factor, expands context by a factor of N
         --rope-freq-base N       RoPE base frequency, used by NTK-aware scaling (default: loaded from model)
         --rope-freq-scale N      RoPE frequency scaling factor, expands context by a factor of 1/N
         --yarn-orig-ctx N        YaRN: original context size of model (default: 0 = model training context size)
         --yarn-ext-factor N      YaRN: extrapolation mix factor (default: -1.0, 0.0 = full interpolation)
         --yarn-attn-factor N     YaRN: scale sqrt(t) or attention magnitude (default: 1.0)
         --yarn-beta-slow N       YaRN: high correction dim or alpha (default: 1.0)
         --yarn-beta-fast N       YaRN: low correction dim or beta (default: 32.0)
  -gan,  --grp-attn-n N           group-attention factor (default: 1)
  -gaw,  --grp-attn-w N           group-attention width (default: 512.0)
  -dkvc, --dump-kv-cache          verbose print of the KV cache
  -nkvo, --no-kv-offload          disable KV offload
  -ctk,  --cache-type-k TYPE      KV cache data type for K (default: f16)
  -ctv,  --cache-type-v TYPE      KV cache data type for V (default: f16)

perplexity:

         --all-logits             return logits for all tokens in the batch (default: false)
         --hellaswag              compute HellaSwag score over random tasks from datafile supplied with -f
         --hellaswag-tasks N      number of tasks to use when computing the HellaSwag score (default: 400)
         --winogrande             compute Winogrande score over random tasks from datafile supplied with -f
         --winogrande-tasks N     number of tasks to use when computing the Winogrande score (default: 0)
         --multiple-choice        compute multiple choice score over random tasks from datafile supplied with -f
         --multiple-choice-tasks N
                                  number of tasks to use when computing the multiple choice score (default: 0)
         --kl-divergence          computes KL-divergence to logits provided via --kl-divergence-base
         --ppl-stride N           stride for perplexity calculation (default: 0)
         --ppl-output-type {0,1}  output type for perplexity calculation (default: 0)

parallel:

  -dt,   --defrag-thold N         KV cache defragmentation threshold (default: -1.0, < 0 - disabled)
  -np,   --parallel N             number of parallel sequences to decode (default: 1)
  -ns,   --sequences N            number of sequences to decode (default: 1)
  -cb,   --cont-batching          enable continuous batching (a.k.a dynamic batching) (default: enabled)

multi-modality:

         --mmproj FILE            path to a multimodal projector file for LLaVA. see examples/llava/README.md
         --image FILE             path to an image file. use with multimodal models. Specify multiple times for batching

backend:

         --rpc SERVERS            comma separated list of RPC servers
         --mlock                  force system to keep model in RAM rather than swapping or compressing
         --no-mmap                do not memory-map model (slower load but may reduce pageouts if not using mlock)
         --numa TYPE              attempt optimizations that help on some NUMA systems
                                  - distribute: spread execution evenly over all nodes
                                  - isolate: only spawn threads on CPUs on the node that execution started on
                                  - numactl: use the CPU map provided by numactl
                                  if run without this previously, it is recommended to drop the system page cache before using this
                                  see https://github.com/ggerganov/llama.cpp/issues/1437

model:

         --check-tensors          check model tensor data for invalid values (default: false)
         --override-kv KEY=TYPE:VALUE
                                  advanced option to override model metadata by key. may be specified multiple times.
                                  types: int, float, bool, str. example: --override-kv tokenizer.ggml.add_bos_token=bool:false
         --lora FNAME             apply LoRA adapter (implies --no-mmap)
         --lora-scaled FNAME S    apply LoRA adapter with user defined scaling S (implies --no-mmap)
         --lora-base FNAME        optional model to use as a base for the layers modified by the LoRA adapter
         --control-vector FNAME   add a control vector
         --control-vector-scaled FNAME SCALE
                                  add a control vector with user defined scaling SCALE
         --control-vector-layer-range START END
                                  layer range to apply the control vector(s) to, start and end inclusive
  -m,    --model FNAME            model path (default: models/$filename with filename from --hf-file
                                  or --model-url if set, otherwise models/7B/ggml-model-f16.gguf)
  -md,   --model-draft FNAME      draft model for speculative decoding (default: unused)
  -mu,   --model-url MODEL_URL    model download url (default: unused)
  -hfr,  --hf-repo REPO           Hugging Face model repository (default: unused)
  -hff,  --hf-file FILE           Hugging Face model file (default: unused)

retrieval:

         --context-file FNAME     file to load context from (repeat to specify multiple files)
         --chunk-size N           minimum length of embedded text chunks (default: 64)
         --chunk-separator STRING separator between chunks (default: '')

passkey:

         --junk N                 number of times to repeat the junk text (default: 250)
         --pos N                  position of the passkey in the junk text (default: -1)

imatrix:

  -o,    --output FNAME           output file (default: 'imatrix.dat')
         --output-frequency N     output the imatrix every N iterations (default: 10)
         --save-frequency N       save an imatrix copy every N iterations (default: 0)
         --process-output         collect data for the output tensor (default: false)
         --no-ppl                 do not compute perplexity (default: true)
         --chunk N                start processing the input from chunk N (default: 0)

bench:

  -pps                            is the prompt shared across parallel sequences (default: false)
  -npp n0,n1,...                  number of prompt tokens
  -ntg n0,n1,...                  number of text generation tokens
  -npl n0,n1,...                  number of parallel prompts

server:

         --host HOST              ip address to listen (default: 127.0.0.1)
         --port PORT              port to listen (default: 8080)
         --path PATH              path to serve static files from (default: )
         --embedding(s)           enable embedding endpoint (default: disabled)
         --api-key KEY            API key to use for authentication (default: none)
         --api-key-file FNAME     path to file containing API keys (default: none)
         --ssl-key-file FNAME     path to file a PEM-encoded SSL private key
         --ssl-cert-file FNAME    path to file a PEM-encoded SSL certificate
         --timeout N              server read/write timeout in seconds (default: 600)
         --threads-http N         number of threads used to process HTTP requests (default: -1)
         --system-prompt-file FNAME
                                  set a file to load a system prompt (initial prompt of all slots), this is useful for chat applications
         --log-format {text,json} log output format: json or text (default: json)
         --metrics                enable prometheus compatible metrics endpoint (default: disabled)
         --no-slots               disables slots monitoring endpoint (default: enabled)
         --slot-save-path PATH    path to save slot kv cache (default: disabled)
         --chat-template JINJA_TEMPLATE
                                  set custom jinja chat template (default: template taken from model's metadata)
                                  only commonly used templates are accepted:
                                  https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
  -sps,  --slot-prompt-similarity SIMILARITY
                                  how much the prompt of a request must match the prompt of a slot in order to use that slot (default: 0.50, 0.0 = disabled)

logging:

         --simple-io              use basic IO for better compatibility in subprocesses and limited consoles
  -ld,   --logdir LOGDIR          path under which to save YAML logs (no logging if unset)
         --log-test               Run simple logging test
         --log-disable            Disable trace logs
         --log-enable             Enable trace logs
         --log-file FNAME         Specify a log filename (without extension)
         --log-new                Create a separate new log file on start. Each log file will have unique name: "<name>.<ID>.log"
         --log-append             Don't truncate the old log file.

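Reference

(Qwen) Tongyi Qianwen LLM installation and deployment tutorial, latest for 2024 (original title: (Qwen)通义千问大模型安装部署教程2024最新)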
