Configuring a CUDA 8.0 Project in VS2010

2024-06-22 06:48
Tags: configuration, project, vs2010, cuda8.0

This article walks through setting up a CUDA 8.0 project in Visual Studio 2010, from creating the project to building and running a sample kernel.

1. Open VS2010, create a Win32 Console Application project named cuda_test, and click OK.

2. Check "Empty project".

3. Right-click the "Source Files" folder and choose Add > New Item.

4. In the left pane select NVIDIA CUDA 8.0, then pick the CUDA C/C++ File template in the middle pane and name the file test.

5. Right-click the cuda_test project and choose Build Customizations.

6. Check CUDA 8.0.

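7. Right-click the test.cu file you just created and open its properties. On the left choose Configuration Properties > General; on the right, open the Item Type drop-down and select CUDA C/C++.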

8. Add the CUDA code to test.cu (see the appendix at the end of this article).

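9. Right-click the cuda_test project and go to Properties > Configuration Properties > CUDA C/C++ > Common > Additional Include Directories, then add this entry:
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\common\inc
This lets the project find the headers it needs:

<helper_string.h>
<helper_cuda.h>
<helper_functions.h>

The path above is where the CUDA samples are installed by default. Adjust it if you installed them elsewhere.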

10. Under Linker > Input > Additional Dependencies, open the drop-down, click Edit, and add an entry for cudart.lib. Without it the project fails to build.

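11. Build and run the project. If the console output ends with "Test PASSED" followed by "Done", the project is configured correctly.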

Appendix: test.cu
(taken from the vectorAdd example in the CUDA samples)

/**
 * Copyright 1993-2015 NVIDIA Corporation.  All rights reserved.
 *
 * Please refer to the NVIDIA end user license agreement (EULA) associated
 * with this source code for terms and conditions that govern your use of
 * this software. Any use, reproduction, disclosure, or distribution of
 * this software and related documentation outside the terms of the EULA
 * is strictly prohibited.
 *
 */

/**
 * Vector addition: C = A + B.
 *
 * This sample is a very basic sample that implements element by element
 * vector addition. It is the same as the sample illustrating Chapter 2
 * of the programming guide with some additions like error checking.
 */

#include <stdio.h>

// For the CUDA runtime routines (prefixed with "cuda_")
#include <cuda_runtime.h>

#include <helper_cuda.h>

/**
 * CUDA Kernel Device code
 *
 * Computes the vector addition of A and B into C. The 3 vectors have the same
 * number of elements numElements.
 */
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}

/**
 * Host main routine
 */
int
main(void)
{
    // Error code to check return values for CUDA calls
    cudaError_t err = cudaSuccess;

    // Print the vector length to be used, and compute its size
    int numElements = 50000;
    size_t size = numElements * sizeof(float);
    printf("[Vector addition of %d elements]\n", numElements);

    // Allocate the host input vector A
    float *h_A = (float *)malloc(size);

    // Allocate the host input vector B
    float *h_B = (float *)malloc(size);

    // Allocate the host output vector C
    float *h_C = (float *)malloc(size);

    // Verify that allocations succeeded
    if (h_A == NULL || h_B == NULL || h_C == NULL)
    {
        fprintf(stderr, "Failed to allocate host vectors!\n");
        exit(EXIT_FAILURE);
    }

    // Initialize the host input vectors
    for (int i = 0; i < numElements; ++i)
    {
        h_A[i] = rand()/(float)RAND_MAX;
        h_B[i] = rand()/(float)RAND_MAX;
    }

    // Allocate the device input vector A
    float *d_A = NULL;
    err = cudaMalloc((void **)&d_A, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device input vector B
    float *d_B = NULL;
    err = cudaMalloc((void **)&d_B, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector B (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device output vector C
    float *d_C = NULL;
    err = cudaMalloc((void **)&d_C, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector C (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the host input vectors A and B in host memory to the device input vectors in
    // device memory
    printf("Copy input data from the host memory to the CUDA device\n");
    err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector A from host to device (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector B from host to device (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Launch the Vector Add CUDA Kernel
    int threadsPerBlock = 256;
    int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock;
    printf("CUDA kernel launch with %d blocks of %d threads\n", blocksPerGrid, threadsPerBlock);
    vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
    err = cudaGetLastError();

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the device result vector in device memory to the host result vector
    // in host memory.
    printf("Copy output data from the CUDA device to the host memory\n");
    err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector C from device to host (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Verify that the result vector is correct
    for (int i = 0; i < numElements; ++i)
    {
        if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5)
        {
            fprintf(stderr, "Result verification failed at element %d!\n", i);
            exit(EXIT_FAILURE);
        }
    }

    printf("Test PASSED\n");

    // Free device global memory
    err = cudaFree(d_A);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_B);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector B (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_C);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector C (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Free host memory
    free(h_A);
    free(h_B);
    free(h_C);

    printf("Done\n");
    return 0;
}
