AGI | No GPU? No Problem! A Tutorial on Running Large Models Locally with Ollama

2024-03-26 11:36

This article shows how to run large language models locally without a GPU: install Docker and Ollama, optionally set up Anaconda with a dedicated virtual environment, then start an API service and test it to make sure the model runs efficiently and stably. Ollama's local deployment gives users without GPU resources a convenient way to run large models.

Contents

1. Implementation Steps

Install Docker (optional)

Install Ollama

Install Anaconda and create a virtual environment (optional)

2. API Service

3. Testing


1. Implementation Steps

Linux is the recommended system; on Windows, use WSL2 (WSL2 virtualizes a complete Linux kernel, so it is effectively equivalent to Linux).

Install Docker (optional)

# Update package sources
yum -y update
yum install -y yum-utils
# Add the Docker repository
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Install Docker
yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Start Docker
systemctl start docker
# Enable start on boot
systemctl enable docker
# Verify the installation
docker --version
# Docker version 25.0.1, build 29cf629

Install Ollama

# Start the Ollama container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Load a model; here qwen:7b is used as the example
docker exec -itd ollama ollama run qwen:7b
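
Before moving on, it is worth confirming that the container is actually serving requests. Below is a minimal sketch (not part of the original tutorial) that calls Ollama's native generate endpoint on the mapped port 11434; it assumes the requests package is installed and the qwen:7b pull has finished.

# Sanity check against Ollama's native API (port 11434, as mapped above).
# Assumes: pip install requests; the qwen:7b model is fully pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen:7b", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=300,  # CPU-only generation can be slow
)
resp.raise_for_status()
print(resp.json()["response"])  # the generated text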

Install Anaconda and create a virtual environment (optional)

# Enter the installation directory
cd /opt
# Download Anaconda (install wget first if it is missing)
wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
# Install Anaconda
bash Anaconda3-2023.09-0-Linux-x86_64.sh
# Create the "ollama" virtual environment
conda create -n ollama python=3.10
# Activate the virtual environment
conda activate ollama
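
To confirm that the new environment is the one actually in use, a quick interpreter check is enough (a trivial sketch; it assumes the ollama environment created above is active).

# Run inside the activated "ollama" environment.
import sys

print(sys.version)     # should report 3.10.x
print(sys.executable)  # should point into .../anaconda3/envs/ollama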

2. API Service

Ollama itself provides an API service, but its streaming handling has some issues (the Python client is fine), so here we use an api_demo to align with the ChatGPT API.

Code source: LLaMA-Factory/src/api_demo.py

# Install dependencies (uvicorn is required to run the server below)
pip install ollama sse_starlette fastapi uvicorn
# Create the api_demo.py file and paste in the code that follows
touch api_demo.py
vi api_demo.py
# Start the API service
python api_demo.py

import asyncio
import json
import os
import time
from enum import Enum, unique
from typing import Any, Dict, List, Optional, Sequence

import ollama
import uvicorn
from fastapi import FastAPI, HTTPException, status
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field
from sse_starlette.sse import EventSourceResponse
from typing_extensions import Literal


@unique
class Role(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"
    FUNCTION = "function"
    TOOL = "tool"
    OBSERVATION = "observation"


@unique
class Finish(str, Enum):
    STOP = "stop"
    LENGTH = "length"
    TOOL = "tool_calls"


class ModelCard(BaseModel):
    id: str
    object: Literal["model"] = "model"
    created: int = Field(default_factory=lambda: int(time.time()))
    owned_by: Literal["owner"] = "owner"


class ModelList(BaseModel):
    object: Literal["list"] = "list"
    data: List[ModelCard] = []


class Function(BaseModel):
    name: str
    arguments: str


class FunctionCall(BaseModel):
    id: Literal["call_default"] = "call_default"
    type: Literal["function"] = "function"
    function: Function


class ChatMessage(BaseModel):
    role: Role
    content: str


class ChatCompletionMessage(BaseModel):
    role: Optional[Role] = None
    content: Optional[str] = None
    tool_calls: Optional[List[FunctionCall]] = None


class ChatCompletionRequest(BaseModel):
    model: str
    messages: List[ChatMessage]
    tools: Optional[list] = []
    do_sample: bool = True
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    n: int = 1
    max_tokens: Optional[int] = None
    stream: bool = False


class ChatCompletionResponseChoice(BaseModel):
    index: int
    message: ChatCompletionMessage
    finish_reason: Finish


class ChatCompletionResponseStreamChoice(BaseModel):
    index: int
    delta: ChatCompletionMessage
    finish_reason: Optional[Finish] = None


class ChatCompletionResponseUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


class ChatCompletionResponse(BaseModel):
    id: Literal["chatcmpl-default"] = "chatcmpl-default"
    object: Literal["chat.completion"] = "chat.completion"
    created: int = Field(default_factory=lambda: int(time.time()))
    model: str
    choices: List[ChatCompletionResponseChoice]
    usage: ChatCompletionResponseUsage


class ChatCompletionStreamResponse(BaseModel):
    id: Literal["chatcmpl-default"] = "chatcmpl-default"
    object: Literal["chat.completion.chunk"] = "chat.completion.chunk"
    created: int = Field(default_factory=lambda: int(time.time()))
    model: str
    choices: List[ChatCompletionResponseStreamChoice]


# Score-evaluation models kept from the original api_demo; they are unused here.
class ScoreEvaluationRequest(BaseModel):
    model: str
    messages: List[str]
    max_length: Optional[int] = None


class ScoreEvaluationResponse(BaseModel):
    id: Literal["scoreeval-default"] = "scoreeval-default"
    object: Literal["score.evaluation"] = "score.evaluation"
    model: str
    scores: List[float]


def dictify(data: "BaseModel") -> Dict[str, Any]:
    try:  # pydantic v2
        return data.model_dump(exclude_unset=True)
    except AttributeError:  # pydantic v1
        return data.dict(exclude_unset=True)


def jsonify(data: "BaseModel") -> str:
    try:  # pydantic v2
        return json.dumps(data.model_dump(exclude_unset=True), ensure_ascii=False)
    except AttributeError:  # pydantic v1
        return data.json(exclude_unset=True, ensure_ascii=False)


def create_app() -> "FastAPI":
    app = FastAPI()

    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

    # Limit concurrent generations; CPU inference is easily saturated.
    semaphore = asyncio.Semaphore(int(os.environ.get("MAX_CONCURRENT", 1)))

    @app.get("/v1/models", response_model=ModelList)
    async def list_models():
        model_card = ModelCard(id="gpt-3.5-turbo")
        return ModelList(data=[model_card])

    @app.post("/v1/chat/completions", response_model=ChatCompletionResponse, status_code=status.HTTP_200_OK)
    async def create_chat_completion(request: ChatCompletionRequest):
        if len(request.messages) == 0 or request.messages[-1].role not in [Role.USER, Role.TOOL]:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid length")

        messages = [dictify(message) for message in request.messages]
        if len(messages) and messages[0]["role"] == Role.SYSTEM:
            system = messages.pop(0)["content"]
        else:
            system = None

        # Conversations must alternate user/assistant and end with a user turn.
        if len(messages) % 2 == 0:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Only supports u/a/u/a/u...")

        for i in range(len(messages)):
            if i % 2 == 0 and messages[i]["role"] not in [Role.USER, Role.TOOL]:
                raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")
            elif i % 2 == 1 and messages[i]["role"] not in [Role.ASSISTANT, Role.FUNCTION]:
                raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")
            elif messages[i]["role"] == Role.TOOL:
                messages[i]["role"] = Role.OBSERVATION

        tool_list = request.tools
        if len(tool_list):
            try:
                tools = json.dumps([tool_list[0]["function"]], ensure_ascii=False)
            except Exception:
                raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid tools")
        else:
            tools = ""

        async with semaphore:
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(None, chat_completion, messages, system, tools, request)

    def chat_completion(messages: Sequence[Dict[str, str]], system: str, tools: str, request: ChatCompletionRequest):
        # Note: system and tools are validated above but not forwarded to ollama in this demo.
        if request.stream:
            generate = stream_chat_completion(messages, system, tools, request)
            return EventSourceResponse(generate, media_type="text/event-stream")

        responses = ollama.chat(
            model=request.model,
            messages=messages,
            options={"top_p": request.top_p, "temperature": request.temperature},
        )

        result = responses["message"]["content"]
        response_message = ChatCompletionMessage(role=Role.ASSISTANT, content=result)
        finish_reason = Finish.STOP if responses.get("done", False) else Finish.LENGTH
        choices = [ChatCompletionResponseChoice(index=0, message=response_message, finish_reason=finish_reason)]

        # Token counts are not tracked in this demo, so placeholders are returned.
        prompt_length, response_length = -1, -1
        usage = ChatCompletionResponseUsage(
            prompt_tokens=prompt_length,
            completion_tokens=response_length,
            total_tokens=prompt_length + response_length,
        )

        return ChatCompletionResponse(model=request.model, choices=choices, usage=usage)

    def stream_chat_completion(messages: Sequence[Dict[str, str]], system: str, tools: str, request: ChatCompletionRequest):
        # First chunk carries only the assistant role, mirroring OpenAI's stream format.
        choice_data = ChatCompletionResponseStreamChoice(
            index=0, delta=ChatCompletionMessage(role=Role.ASSISTANT, content=""), finish_reason=None
        )
        chunk = ChatCompletionStreamResponse(model=request.model, choices=[choice_data])
        yield jsonify(chunk)

        for new_text in ollama.chat(
            model=request.model,
            messages=messages,
            stream=True,
            options={"top_p": request.top_p, "temperature": request.temperature},
        ):
            if len(new_text) == 0:
                continue
            choice_data = ChatCompletionResponseStreamChoice(
                index=0, delta=ChatCompletionMessage(content=new_text["message"]["content"]), finish_reason=None
            )
            chunk = ChatCompletionStreamResponse(model=request.model, choices=[choice_data])
            yield jsonify(chunk)

        # Final empty delta with a stop reason, then the OpenAI-style sentinel.
        choice_data = ChatCompletionResponseStreamChoice(index=0, delta=ChatCompletionMessage(), finish_reason=Finish.STOP)
        chunk = ChatCompletionStreamResponse(model=request.model, choices=[choice_data])
        yield jsonify(chunk)
        yield "[DONE]"

    return app


if __name__ == "__main__":
    app = create_app()
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("API_PORT", 8000)), workers=1)

3. Testing

curl --location 'http://127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{"model": "qwen:7b","messages": [{"role": "user", "content": "What is the OpenAI mission?"}],"stream": true,"temperature": 0.7,"top_p": 1
}'
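
Because the demo mirrors the OpenAI schema, any OpenAI-compatible client can talk to it as well. The following sketch assumes pip install openai; the api_key value is arbitrary since the demo performs no authentication.

# Query the local api_demo server through the official openai client.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="qwen:7b",
    messages=[{"role": "user", "content": "What is the OpenAI mission?"}],
    stream=True,
    temperature=0.7,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)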

In testing, generation speed was around 8 tokens/s.

That's all for this installment; feel free to leave a comment if you have any questions.

Author: Xu Hui | Backend Development Engineer

For more AI tips, follow the "神州数码云基地" WeChat official account and reply "AI与数字化转型" to join the community chat.

Copyright notice: This article was written and edited by the Digital China (神州数码) Wuhan Cloud Base team based on hands-on practice. Please credit the source when reposting.
