AGI | No GPU? No Problem! A Tutorial on Running Large Models Locally with Ollama

2024-03-26 11:36


This article shows how to run a large language model locally without a GPU: install Docker, Ollama, and Anaconda, create a virtual environment, then start an API service and test it to confirm the model runs efficiently and stably. Ollama's local deployment offers a convenient way to run large models for users without GPU resources.

Contents

1. Implementation Steps

Install Docker (optional)

Install Ollama

Install Anaconda and Create a Virtual Environment (optional)

2. API Service

3. Testing


1. Implementation Steps

Linux is the recommended OS. On Windows, use WSL2 (WSL2 virtualizes a complete Linux kernel, so it is effectively equivalent to Linux).

Install Docker (optional)

# Update package sources
yum -y update
yum install -y yum-utils
# Add the Docker repository
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Install Docker
yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Start Docker
systemctl start docker
# Enable Docker at boot
systemctl enable docker
# Verify the installation
docker --version
# Docker version 25.0.1, build 29cf629

Install Ollama

# Start ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Load a model, using qwen:7b as the example here
docker exec -itd ollama ollama run qwen:7b
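
Before moving on, it is worth confirming the model actually answers. Here is a minimal sketch using the ollama Python client (assuming `pip install ollama`; by default the client talks to the localhost:11434 endpoint the container just exposed):

import ollama

# One-shot sanity check against the qwen:7b model loaded above.
resp = ollama.chat(
    model="qwen:7b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp["message"]["content"])

If this prints a greeting, the container and model are working.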

Install Anaconda and Create a Virtual Environment (optional)

# Go to the installation directory
cd /opt
# Download Anaconda (install wget first if it is not available)
wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
# Install Anaconda
bash Anaconda3-2023.09-0-Linux-x86_64.sh
# Create the ollama virtual environment
conda create -n ollama python=3.10
# Activate the virtual environment
conda activate ollama

2. API Service

Ollama provides an API service of its own, but its streaming output is somewhat problematic; the Python client works fine. So here we use an api_demo that aligns the service with the ChatGPT-style API.
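
For context, this is the streaming interface of the ollama Python client that the demo below wraps into OpenAI-style SSE events (a minimal sketch, assuming the qwen:7b model loaded earlier):

import ollama

# With stream=True, ollama.chat yields chunks; each chunk's
# message.content carries the newly generated text.
for chunk in ollama.chat(
    model="qwen:7b",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
):
    print(chunk["message"]["content"], end="", flush=True)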

Code source: LLaMA-Factory/src/api_demo.py

# Install dependencies (uvicorn is needed to serve the app)
pip install ollama sse_starlette fastapi uvicorn
# Create the api_demo.py file and paste in the code below
touch api_demo.py
vi api_demo.py
# Run the service
python api_demo.py

import asyncio
import json
import os
import time
from enum import Enum, unique
from typing import Any, Dict, List, Optional, Sequence

import ollama
import uvicorn
from fastapi import FastAPI, HTTPException, status
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field
from sse_starlette.sse import EventSourceResponse
from typing_extensions import Literal


@unique
class Role(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"
    FUNCTION = "function"
    TOOL = "tool"
    OBSERVATION = "observation"


@unique
class Finish(str, Enum):
    STOP = "stop"
    LENGTH = "length"
    TOOL = "tool_calls"


class ModelCard(BaseModel):
    id: str
    object: Literal["model"] = "model"
    created: int = Field(default_factory=lambda: int(time.time()))
    owned_by: Literal["owner"] = "owner"


class ModelList(BaseModel):
    object: Literal["list"] = "list"
    data: List[ModelCard] = []


class Function(BaseModel):
    name: str
    arguments: str


class FunctionCall(BaseModel):
    id: Literal["call_default"] = "call_default"
    type: Literal["function"] = "function"
    function: Function


class ChatMessage(BaseModel):
    role: Role
    content: str


class ChatCompletionMessage(BaseModel):
    role: Optional[Role] = None
    content: Optional[str] = None
    tool_calls: Optional[List[FunctionCall]] = None


class ChatCompletionRequest(BaseModel):
    model: str
    messages: List[ChatMessage]
    tools: Optional[list] = []
    do_sample: bool = True
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    n: int = 1
    max_tokens: Optional[int] = None
    stream: bool = False


class ChatCompletionResponseChoice(BaseModel):
    index: int
    message: ChatCompletionMessage
    finish_reason: Finish


class ChatCompletionResponseStreamChoice(BaseModel):
    index: int
    delta: ChatCompletionMessage
    finish_reason: Optional[Finish] = None


class ChatCompletionResponseUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


class ChatCompletionResponse(BaseModel):
    id: Literal["chatcmpl-default"] = "chatcmpl-default"
    object: Literal["chat.completion"] = "chat.completion"
    created: int = Field(default_factory=lambda: int(time.time()))
    model: str
    choices: List[ChatCompletionResponseChoice]
    usage: ChatCompletionResponseUsage


class ChatCompletionStreamResponse(BaseModel):
    id: Literal["chatcmpl-default"] = "chatcmpl-default"
    object: Literal["chat.completion.chunk"] = "chat.completion.chunk"
    created: int = Field(default_factory=lambda: int(time.time()))
    model: str
    choices: List[ChatCompletionResponseStreamChoice]


class ScoreEvaluationRequest(BaseModel):
    model: str
    messages: List[str]
    max_length: Optional[int] = None


class ScoreEvaluationResponse(BaseModel):
    id: Literal["scoreeval-default"] = "scoreeval-default"
    object: Literal["score.evaluation"] = "score.evaluation"
    model: str
    scores: List[float]


def dictify(data: "BaseModel") -> Dict[str, Any]:
    try:  # pydantic v2
        return data.model_dump(exclude_unset=True)
    except AttributeError:  # pydantic v1
        return data.dict(exclude_unset=True)


def jsonify(data: "BaseModel") -> str:
    try:  # pydantic v2
        return json.dumps(data.model_dump(exclude_unset=True), ensure_ascii=False)
    except AttributeError:  # pydantic v1
        return data.json(exclude_unset=True, ensure_ascii=False)


def create_app() -> "FastAPI":
    app = FastAPI()
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )
    semaphore = asyncio.Semaphore(int(os.environ.get("MAX_CONCURRENT", 1)))

    @app.get("/v1/models", response_model=ModelList)
    async def list_models():
        model_card = ModelCard(id="gpt-3.5-turbo")
        return ModelList(data=[model_card])

    @app.post("/v1/chat/completions", response_model=ChatCompletionResponse, status_code=status.HTTP_200_OK)
    async def create_chat_completion(request: ChatCompletionRequest):
        if len(request.messages) == 0 or request.messages[-1].role not in [Role.USER, Role.TOOL]:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid length")

        messages = [dictify(message) for message in request.messages]
        if len(messages) and messages[0]["role"] == Role.SYSTEM:
            system = messages.pop(0)["content"]
        else:
            system = None

        if len(messages) % 2 == 0:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Only supports u/a/u/a/u...")

        for i in range(len(messages)):
            if i % 2 == 0 and messages[i]["role"] not in [Role.USER, Role.TOOL]:
                raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")
            elif i % 2 == 1 and messages[i]["role"] not in [Role.ASSISTANT, Role.FUNCTION]:
                raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")
            elif messages[i]["role"] == Role.TOOL:
                messages[i]["role"] = Role.OBSERVATION

        tool_list = request.tools
        if len(tool_list):
            try:
                tools = json.dumps([tool_list[0]["function"]], ensure_ascii=False)
            except Exception:
                raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid tools")
        else:
            tools = ""

        async with semaphore:
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(None, chat_completion, messages, system, tools, request)

    def chat_completion(messages: Sequence[Dict[str, str]], system: str, tools: str, request: ChatCompletionRequest):
        if request.stream:
            generate = stream_chat_completion(messages, system, tools, request)
            return EventSourceResponse(generate, media_type="text/event-stream")

        responses = ollama.chat(
            model=request.model,
            messages=messages,
            options={"top_p": request.top_p, "temperature": request.temperature},
        )

        choices = []
        result = responses["message"]["content"]
        response_message = ChatCompletionMessage(role=Role.ASSISTANT, content=result)
        finish_reason = Finish.STOP if responses.get("done", False) else Finish.LENGTH
        choices.append(ChatCompletionResponseChoice(index=0, message=response_message, finish_reason=finish_reason))

        # token counts are not exposed here, so placeholders are used
        prompt_length = -1
        response_length = -1
        usage = ChatCompletionResponseUsage(
            prompt_tokens=prompt_length,
            completion_tokens=response_length,
            total_tokens=prompt_length + response_length,
        )

        return ChatCompletionResponse(model=request.model, choices=choices, usage=usage)

    def stream_chat_completion(messages: Sequence[Dict[str, str]], system: str, tools: str, request: ChatCompletionRequest):
        choice_data = ChatCompletionResponseStreamChoice(
            index=0, delta=ChatCompletionMessage(role=Role.ASSISTANT, content=""), finish_reason=None
        )
        chunk = ChatCompletionStreamResponse(model=request.model, choices=[choice_data])
        yield jsonify(chunk)

        for new_text in ollama.chat(
            model=request.model,
            messages=messages,
            stream=True,
            options={"top_p": request.top_p, "temperature": request.temperature},
        ):
            if len(new_text) == 0:
                continue
            choice_data = ChatCompletionResponseStreamChoice(
                index=0, delta=ChatCompletionMessage(content=new_text["message"]["content"]), finish_reason=None
            )
            chunk = ChatCompletionStreamResponse(model=request.model, choices=[choice_data])
            yield jsonify(chunk)

        choice_data = ChatCompletionResponseStreamChoice(index=0, delta=ChatCompletionMessage(), finish_reason=Finish.STOP)
        chunk = ChatCompletionStreamResponse(model=request.model, choices=[choice_data])
        yield jsonify(chunk)
        yield "[DONE]"

    return app


if __name__ == "__main__":
    app = create_app()
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("API_PORT", 8000)), workers=1)

3. Testing

curl --location 'http://127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen:7b",
    "messages": [{"role": "user", "content": "What is the OpenAI mission?"}],
    "stream": true,
    "temperature": 0.7,
    "top_p": 1
}'
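
Because the demo mirrors the OpenAI response schema, the endpoint should in principle also be usable from the official openai Python client. A sketch, assuming `pip install openai` (v1.x) and the service on port 8000; the api_key value is an arbitrary placeholder since the demo does no authentication:

from openai import OpenAI

# Point the client at the local demo service instead of api.openai.com.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="none")

stream = client.chat.completions.create(
    model="qwen:7b",
    messages=[{"role": "user", "content": "What is the OpenAI mission?"}],
    stream=True,
    temperature=0.7,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)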

In testing, generation speed comes out to roughly 8 tokens/s.
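
A figure like that can be reproduced from the timing metadata ollama attaches to a completed non-streaming response; a sketch, assuming the same dict-style responses the api_demo above relies on:

import ollama

resp = ollama.chat(
    model="qwen:7b",
    messages=[{"role": "user", "content": "Explain Docker in one paragraph."}],
)
# eval_count = generated tokens, eval_duration = generation time in nanoseconds
print(f"{resp['eval_count'] / (resp['eval_duration'] / 1e9):.1f} tokens/s")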

That wraps up this installment. If you have questions, feel free to leave a comment and discuss.

Author: Xu Hui | Backend Development Engineer

For more AI tips, follow the "神州数码云基地" WeChat official account and reply "AI与数字化转型" to join the community group.

Copyright notice: This article was compiled from hands-on practice by the 神州数码武汉云基地 team. Please credit the source when reposting.
