ubuntu18.04 下slowfast网络环境安装及模型测试( python3.9)

2023-10-24 12:50

本文主要是介绍ubuntu18.04 下slowfast网络环境安装及模型测试( python3.9),希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

用pip 安装建议用国内源,如 pip install xxx -i https://pypi.tuna.tsinghua.edu.cn/simple

目录

1.conda env 环境创建

2. install pytorch 

3. install fvcore

4. install simplejson

5. gcc版本查看

6. PyAV

7.ffmpeg with PyAV

8. PyYaml , tqdm

9.iopath

10. psutil

11. opencv

12. tensorboard

13. moviepy

14. PyTorchVideo

15. Detectron2

16. FairScale

17. SlowFast

运行Demo测试模型

安装过程中遇到的一些errors

error0 

         error1

error2

error3

error4

error5

error6

error7


1.conda env 环境创建

conda create -n py39 python=3.9

2. install pytorch 

先查看cuda版本 , 再对应pytorch版本

查看系统nvidia驱动版本支持最高cuda版本

查看当前cuda版本

根据对应cuda版本安装pytorch torchvision

source activate py39
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch

3. install fvcore

pip install git+https://github.com/facebookresearch/fvcore

4. install simplejson

pip install simplejson 

5. gcc版本查看

gcc -v



版本是 7.5.0

6. PyAV

conda install av -c conda-forge

7.ffmpeg with PyAV

pip install av

8. PyYaml , tqdm

pip list fvcore

9.iopath

pip install -U iopath

10. psutil

pip install psutil

11. opencv

pip install opencv-python

12. tensorboard

查看是否安装tensorboard:

conda list tensorboard


没有安装tensorboard

pip install tensorboard

13. moviepy

pip install moviepy

14. PyTorchVideo

pip install pytorchvideo

15. Detectron2

git clone https://github.com/facebookresearch/detectron2 detectron2_repo

pip install -e detectron2_repo

16. FairScale

pip install git+https://github.com/facebookresearch/fairscale

17. SlowFast

git clone https://github.com/facebookresearch/SlowFast.git


cd SlowFast
python setup.py build develop

运行Demo测试模型

python3 tools/run_net.py --cfg demo/AVA/SLOWFAST_32x2_R101_50_50.yaml

安装过程中遇到的一些errors

error0 

not find PIL 

解决办法:将setup.py 中的 PIL 更改为 Pillow 

error1

from pytorchvideo.layers.distributed import ( # noqa
ImportError: cannot import name 'cat_all_gather' from 'pytorchvideo.layers.distributed' (/home/cxgk/anaconda3/envs/sf/lib/python3.9/site-packages/pytorchvideo/layers/distributed.py)

解决方式:

方式一:将pytorchvideo/pytorchvideo at main · facebookresearch/pytorchvideo · GitHub文件下内容复制到虚拟环境所对应的文件下,这里是:/home/cxgk/anaconda3/envs/sf/lib/python3.9/site-packages/pytorchvideo/

方式二:
layers/distributed.py添加如下内容

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved."""Distributed helpers."""import torch
import torch.distributed as dist
from torch._C._distributed_c10d import ProcessGroup
from torch.autograd.function import Function_LOCAL_PROCESS_GROUP = Nonedef get_world_size() -> int:"""Simple wrapper for correctly getting worldsize in both distributed/ non-distributed settings"""return (torch.distributed.get_world_size()if torch.distributed.is_available() and torch.distributed.is_initialized()else 1)def cat_all_gather(tensors, local=False):"""Performs the concatenated all_reduce operation on the provided tensors."""if local:gather_sz = get_local_size()else:gather_sz = torch.distributed.get_world_size()tensors_gather = [torch.ones_like(tensors) for _ in range(gather_sz)]torch.distributed.all_gather(tensors_gather,tensors,async_op=False,group=_LOCAL_PROCESS_GROUP if local else None,)output = torch.cat(tensors_gather, dim=0)return outputdef init_distributed_training(cfg):"""Initialize variables needed for distributed training."""if cfg.NUM_GPUS <= 1:returnnum_gpus_per_machine = cfg.NUM_GPUSnum_machines = dist.get_world_size() // num_gpus_per_machinefor i in range(num_machines):ranks_on_i = list(range(i * num_gpus_per_machine, (i + 1) * num_gpus_per_machine))pg = dist.new_group(ranks_on_i)if i == cfg.SHARD_ID:global _LOCAL_PROCESS_GROUP_LOCAL_PROCESS_GROUP = pgdef get_local_size() -> int:"""Returns:The size of the per-machine process group,i.e. the number of processes per machine."""if not dist.is_available():return 1if not dist.is_initialized():return 1return dist.get_world_size(group=_LOCAL_PROCESS_GROUP)def get_local_rank() -> int:"""Returns:The rank of the current process within the local (per-machine) process group."""if not dist.is_available():return 0if not dist.is_initialized():return 0assert _LOCAL_PROCESS_GROUP is not Nonereturn dist.get_rank(group=_LOCAL_PROCESS_GROUP)def get_local_process_group() -> ProcessGroup:assert _LOCAL_PROCESS_GROUP is not Nonereturn _LOCAL_PROCESS_GROUPclass GroupGather(Function):"""GroupGather performs all gather on each of the local process/ GPU groups."""@staticmethoddef forward(ctx, input, num_sync_devices, num_groups):"""Perform forwarding, gathering the stats across different process/ GPUgroup."""ctx.num_sync_devices = num_sync_devicesctx.num_groups = num_groupsinput_list = [torch.zeros_like(input) for k in range(get_local_size())]dist.all_gather(input_list, input, async_op=False, group=get_local_process_group())inputs = torch.stack(input_list, dim=0)if num_groups > 1:rank = get_local_rank()group_idx = rank // num_sync_devicesinputs = inputs[group_idx * num_sync_devices : (group_idx + 1) * num_sync_devices]inputs = torch.sum(inputs, dim=0)return inputs@staticmethoddef backward(ctx, grad_output):"""Perform backwarding, gathering the gradients across different process/ GPUgroup."""grad_output_list = [torch.zeros_like(grad_output) for k in range(get_local_size())]dist.all_gather(grad_output_list,grad_output,async_op=False,group=get_local_process_group(),)grads = torch.stack(grad_output_list, dim=0)if ctx.num_groups > 1:rank = get_local_rank()group_idx = rank // ctx.num_sync_devicesgrads = grads[group_idx* ctx.num_sync_devices : (group_idx + 1)* ctx.num_sync_devices]grads = torch.sum(grads, dim=0)return grads, None, None

error2

from scipy.ndimage import gaussian_filter

ModuleNotFoundError: No module named 'scipy'

解决方法:

pip install scipy

error3

from av._core import time_base, library_versions

ImportError: /home/cxgk/anaconda3/envs/sf/lib/python3.9/site-packages/av/../../.././libgnutls.so.30: symbol mpn_copyi version HOGWEED_6 not defined in file libhogweed.so.6 with link time reference
 

解决方法:

先移处av包

使用 pip安装


pip install av


error4

File "/media/cxgk/Linux/work/SlowFast/slowfast/models/losses.py", line 11, in
from pytorchvideo.losses.soft_target_cross_entropy import (
ModuleNotFoundError: No module named 'pytorchvideo.losses'

解决办法:

打开"/home/cxgk/anaconda3/envs/sf/lib/python3.9/site-packages/pytorchvideo/losses",在文件夹下新建 soft_target_cross_entropy.py, 并打开添加如下代码:

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorchvideo.layers.utils import set_attributes
from pytorchvideo.transforms.functional import convert_to_one_hotclass SoftTargetCrossEntropyLoss(nn.Module):"""Adapted from Classy Vision: ./classy_vision/losses/soft_target_cross_entropy_loss.py.This allows the targets for the cross entropy loss to be multi-label."""def __init__(self,ignore_index: int = -100,reduction: str = "mean",normalize_targets: bool = True,) -> None:"""Args:ignore_index (int): sample should be ignored for loss if the class is this value.reduction (str): specifies reduction to apply to the output.normalize_targets (bool): whether the targets should be normalized to a sum of 1based on the total count of positive targets for a given sample."""super().__init__()set_attributes(self, locals())assert isinstance(self.normalize_targets, bool)if self.reduction not in ["mean", "none"]:raise NotImplementedError('reduction type "{}" not implemented'.format(self.reduction))self.eps = torch.finfo(torch.float32).epsdef forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:"""Args:input (torch.Tensor): the shape of the tensor is N x C, where N is the number ofsamples and C is the number of classes. The tensor is raw input withoutsoftmax/sigmoid.target (torch.Tensor): the shape of the tensor is N x C or N. If the shape is N, wewill convert the target to one hot vectors."""# Check if targets are inputted as class integersif target.ndim == 1:assert (input.shape[0] == target.shape[0]), "SoftTargetCrossEntropyLoss requires input and target to have same batch size!"target = convert_to_one_hot(target.view(-1, 1), input.shape[1])assert input.shape == target.shape, ("SoftTargetCrossEntropyLoss requires input and target to be same "f"shape: {input.shape} != {target.shape}")# Samples where the targets are ignore_index do not contribute to the lossN, C = target.shapevalid_mask = torch.ones((N, 1), dtype=torch.float).to(input.device)if 0 <= self.ignore_index <= C - 1:drop_idx = target[:, self.ignore_idx] > 0valid_mask[drop_idx] = 0valid_targets = target.float() * valid_maskif self.normalize_targets:valid_targets /= self.eps + valid_targets.sum(dim=1, keepdim=True)per_sample_per_target_loss = -valid_targets * F.log_softmax(input, -1)per_sample_loss = torch.sum(per_sample_per_target_loss, -1)# Perform reductionif self.reduction == "mean":# Normalize based on the number of samples with > 0 non-ignored targetsloss = per_sample_loss.sum() / torch.sum((torch.sum(valid_mask, -1) > 0)).clamp(min=1)elif self.reduction == "none":loss = per_sample_lossreturn 

error5

from sklearn.metrics import confusion_matrix

ModuleNotFoundError: No module named 'sklearn'

解决办法:

pip install scikit-learn

error6

raise KeyError("Non-existent config key: {}".format(full_key))

KeyError: 'Non-existent config key: TENSORBOARD.MODEL_VIS.TOPK'

解决方法:

注释掉如下三行:

TENSORBOARD

MODEL_VIS

TOPK

error7

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.94 GiB total capacity; 2.83 GiB already allocated; 25.44 MiB free; 2.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

解决方法:

将yaml里的帧数改小:

DATA:
NUM_FRAMES: 16

Reference:

https://github.com/facebookresearch/pytorchvideo/blob/main/pytorchvideo

这篇关于ubuntu18.04 下slowfast网络环境安装及模型测试( python3.9)的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/275294

相关文章

Linux系统中查询JDK安装目录的几种常用方法

《Linux系统中查询JDK安装目录的几种常用方法》:本文主要介绍Linux系统中查询JDK安装目录的几种常用方法,方法分别是通过update-alternatives、Java命令、环境变量及目... 目录方法 1:通过update-alternatives查询(推荐)方法 2:检查所有已安装的 JDK方

SQL Server安装时候没有中文选项的解决方法

《SQLServer安装时候没有中文选项的解决方法》用户安装SQLServer时界面全英文,无中文选项,通过修改安装设置中的国家或地区为中文中国,重启安装程序后界面恢复中文,解决了问题,对SQLSe... 你是不是在安装SQL Server时候发现安装界面和别人不同,并且无论如何都没有中文选项?这个问题也

2025版mysql8.0.41 winx64 手动安装详细教程

《2025版mysql8.0.41winx64手动安装详细教程》本文指导Windows系统下MySQL安装配置,包含解压、设置环境变量、my.ini配置、初始化密码获取、服务安装与手动启动等步骤,... 目录一、下载安装包二、配置环境变量三、安装配置四、启动 mysql 服务,修改密码一、下载安装包安装地

Redis MCP 安装与配置指南

《RedisMCP安装与配置指南》本文将详细介绍如何安装和配置RedisMCP,包括快速启动、源码安装、Docker安装、以及相关的配置参数和环境变量设置,感兴趣的朋友一起看看吧... 目录一、Redis MCP 简介二、安www.chinasem.cn装 Redis MCP 服务2.1 快速启动(推荐)2.

在macOS上安装jenv管理JDK版本的详细步骤

《在macOS上安装jenv管理JDK版本的详细步骤》jEnv是一个命令行工具,正如它的官网所宣称的那样,它是来让你忘记怎么配置JAVA_HOME环境变量的神队友,:本文主要介绍在macOS上安装... 目录前言安装 jenv添加 JDK 版本到 jenv切换 JDK 版本总结前言China编程在开发 Java

Linux下在线安装启动VNC教程

《Linux下在线安装启动VNC教程》本文指导在CentOS7上在线安装VNC,包含安装、配置密码、启动/停止、清理重启步骤及注意事项,强调需安装VNC桌面以避免黑屏,并解决端口冲突和目录权限问题... 目录描述安装VNC安装 VNC 桌面可能遇到的问题总结描js述linux中的VNC就类似于Window

Go语言编译环境设置教程

《Go语言编译环境设置教程》Go语言支持高并发(goroutine)、自动垃圾回收,编译为跨平台二进制文件,云原生兼容且社区活跃,开发便捷,内置测试与vet工具辅助检测错误,依赖模块化管理,提升开发效... 目录Go语言优势下载 Go  配置编译环境配置 GOPROXYIDE 设置(VS Code)一些基本

虚拟机Centos7安装MySQL数据库实践

《虚拟机Centos7安装MySQL数据库实践》用户分享在虚拟机安装MySQL的全过程及常见问题解决方案,包括处理GPG密钥、修改密码策略、配置远程访问权限及防火墙设置,最终通过关闭防火墙和停止Net... 目录安装mysql数据库下载wget命令下载MySQL安装包安装MySQL安装MySQL服务安装完成

Windows环境下解决Matplotlib中文字体显示问题的详细教程

《Windows环境下解决Matplotlib中文字体显示问题的详细教程》本文详细介绍了在Windows下解决Matplotlib中文显示问题的方法,包括安装字体、更新缓存、配置文件设置及编码調整,并... 目录引言问题分析解决方案详解1. 检查系统已安装字体2. 手动添加中文字体(以SimHei为例)步骤

JAVA中安装多个JDK的方法

《JAVA中安装多个JDK的方法》文章介绍了在Windows系统上安装多个JDK版本的方法,包括下载、安装路径修改、环境变量配置(JAVA_HOME和Path),并说明如何通过调整JAVA_HOME在... 首先去oracle官网下载好两个版本不同的jdk(需要登录Oracle账号,没有可以免费注册)下载完