Matrix Factorization Examples for Recommendation Algorithms

2024-06-23 22:08

This post works through a matrix factorization example for recommendation algorithms; I hope it offers a useful reference for developers tackling similar problems.

The matrix factorization experiments reuse the dataset from the previous article on collaborative filtering.
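For reference, the script below reads that file with Reader(line_format='user item rating timestamp', sep=','), so each line of python_data.txt is assumed to hold a comma-separated user id, item id, rating and timestamp. Purely as an illustration of that layout (these values are invented, not taken from the real data):

1,101,5,1303696000
1,103,4,1303696552
2,101,2,1303696765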

Background and tools used

Python's surprise library
k-fold cross-validation
SVD
SVDpp
NMF
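A quick note on what the three models estimate. With biased=True, surprise's SVD predicts a rating as r̂(u,i) = μ + b_u + b_i + q_i·p_u, i.e. the global mean plus user and item biases plus the dot product of the latent factor vectors; with biased=False it reduces to the plain factorization r̂(u,i) = q_i·p_u. SVDpp extends the biased model with an implicit-feedback term built from the set of items each user has rated, and NMF learns the same kind of factorization while constraining the factors to stay non-negative. Every configuration below is scored with 5-fold cross-validation; as a minimal sketch of that pattern (the 20-factor setting here is just an arbitrary example, not one of the settings used in the experiments):

from surprise import SVD, Dataset, Reader
from surprise.model_selection import cross_validate

# Load the ratings file and cross-validate a single configuration.
reader = Reader(line_format='user item rating timestamp', sep=',')
data = Dataset.load_from_file('./python_data.txt', reader=reader)

algo = SVD(n_factors=20, biased=True)   # arbitrary example configuration
scores = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
print(scores['test_rmse'].mean(), scores['test_mae'].mean())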

Algorithms and result visualization

# All of the recommendation algorithms mentioned above are available in surprise.
from surprise import SVD, SVDpp, NMF
from surprise import Reader, Dataset
from surprise.model_selection import cross_validate
from pandas import DataFrame
import os
import numpy as np
import pandas as pd

################## SVD_noBiased
## Path to the ratings file
file_path = os.path.expanduser('./python_data.txt')
## File format: comma-separated user, item, rating, timestamp
reader = Reader(line_format='user item rating timestamp', sep=',')
## Load the dataset from the file
data = Dataset.load_from_file(file_path, reader=reader)

# Run 5-fold cross-validation on the dataset and print results.
perf1 = cross_validate(SVD(n_factors=1,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf2 = cross_validate(SVD(n_factors=3,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf3 = cross_validate(SVD(n_factors=5,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf4 = cross_validate(SVD(n_factors=7,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf5 = cross_validate(SVD(n_factors=9,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf6 = cross_validate(SVD(n_factors=11,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf7 = cross_validate(SVD(n_factors=13,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf8 = cross_validate(SVD(n_factors=15,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf9 = cross_validate(SVD(n_factors=17,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf10= cross_validate(SVD(n_factors=19,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf11= cross_validate(SVD(n_factors=21,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf12= cross_validate(SVD(n_factors=23,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf13= cross_validate(SVD(n_factors=25,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf14= cross_validate(SVD(n_factors=27,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf15 = cross_validate(SVD(n_factors=29,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

# Collect the averaged metrics for each factor count.
perf_result = []
for i in range(1, 16):
    perf_result.append('perf' + str(i))
MAE, RMSE, FIT_TIME, TEST_TIME = [], [], [], []
for perf in perf_result:
    res = DataFrame(eval(perf))
    # Select columns by name; positional iloc indexing relied on pandas
    # sorting the dict keys alphabetically and picks the wrong columns on newer versions.
    MAE.append(round(np.mean(res['test_mae']), 4))
    RMSE.append(round(np.mean(res['test_rmse']), 4))
    FIT_TIME.append(round(np.mean(res['fit_time']), 4))
    TEST_TIME.append(round(np.mean(res['test_time']), 4))
MAE = DataFrame(MAE, columns=['MAE'])
RMSE = DataFrame(RMSE, columns=['RMSE'])
FIT_TIME = DataFrame(FIT_TIME, columns=['FIT_TIME'])
TEST_TIME = DataFrame(TEST_TIME, columns=['TEST_TIME'])
Factors = DataFrame([1,3,5,7,9,11,13,15,17,19,21,23,25,27,29], columns=['Factors'])
SVD_noBaised_result = pd.concat([Factors, MAE, RMSE, FIT_TIME, TEST_TIME], axis=1)
SVD_noBaised_result.to_csv('./result_data/SVD_noBaised_result.csv', header=True, encoding='utf-8')
##################SVD_biased
# Run 5-fold cross-validation and print results.
perf01 = cross_validate(SVD(n_factors=1,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf02 = cross_validate(SVD(n_factors=3,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf03 = cross_validate(SVD(n_factors=5,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf04 = cross_validate(SVD(n_factors=7,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf05 = cross_validate(SVD(n_factors=9,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf06 = cross_validate(SVD(n_factors=11,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf07 = cross_validate(SVD(n_factors=13,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf08 = cross_validate(SVD(n_factors=15,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf09 = cross_validate(SVD(n_factors=17,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf010= cross_validate(SVD(n_factors=19,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf011= cross_validate(SVD(n_factors=21,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf012= cross_validate(SVD(n_factors=23,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf013= cross_validate(SVD(n_factors=25,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf014= cross_validate(SVD(n_factors=27,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf015 = cross_validate(SVD(n_factors=29,biased=True),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

perf_result1 = []
for i in range(1, 16):
    perf_result1.append('perf0' + str(i))
MAE, RMSE, FIT_TIME, TEST_TIME = [], [], [], []
for perf in perf_result1:
    res = DataFrame(eval(perf))
    MAE.append(round(np.mean(res['test_mae']), 4))
    RMSE.append(round(np.mean(res['test_rmse']), 4))
    FIT_TIME.append(round(np.mean(res['fit_time']), 4))
    TEST_TIME.append(round(np.mean(res['test_time']), 4))
MAE = DataFrame(MAE, columns=['MAE'])
RMSE = DataFrame(RMSE, columns=['RMSE'])
FIT_TIME = DataFrame(FIT_TIME, columns=['FIT_TIME'])
TEST_TIME = DataFrame(TEST_TIME, columns=['TEST_TIME'])
Factors = DataFrame([1,3,5,7,9,11,13,15,17,19,21,23,25,27,29], columns=['Factors'])
SVD_baised_result = pd.concat([Factors, MAE, RMSE, FIT_TIME, TEST_TIME], axis=1)
SVD_baised_result.to_csv('./result_data/SVD_baised_result.csv', header=True, encoding='utf-8')

############## SVD++
# Run 5-fold cross-validation and print results.
perf001 = cross_validate(SVDpp(n_factors=1),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf002 = cross_validate(SVDpp(n_factors=3),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf003 = cross_validate(SVDpp(n_factors=5),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf004 = cross_validate(SVDpp(n_factors=7),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf005 = cross_validate(SVDpp(n_factors=9),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf006 = cross_validate(SVDpp(n_factors=11),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf007 = cross_validate(SVDpp(n_factors=13),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf008 = cross_validate(SVDpp(n_factors=15),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf009 = cross_validate(SVDpp(n_factors=17),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0010= cross_validate(SVDpp(n_factors=19),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0011= cross_validate(SVDpp(n_factors=21),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0012= cross_validate(SVDpp(n_factors=23),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0013= cross_validate(SVDpp(n_factors=25),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0014= cross_validate(SVDpp(n_factors=27),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0015 = cross_validate(SVDpp(n_factors=29),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

perf_result2 = []
for i in range(1, 16):
    perf_result2.append('perf00' + str(i))
MAE, RMSE, FIT_TIME, TEST_TIME = [], [], [], []
for perf in perf_result2:
    res = DataFrame(eval(perf))
    MAE.append(round(np.mean(res['test_mae']), 4))
    RMSE.append(round(np.mean(res['test_rmse']), 4))
    FIT_TIME.append(round(np.mean(res['fit_time']), 4))
    TEST_TIME.append(round(np.mean(res['test_time']), 4))
MAE = DataFrame(MAE, columns=['MAE'])
RMSE = DataFrame(RMSE, columns=['RMSE'])
FIT_TIME = DataFrame(FIT_TIME, columns=['FIT_TIME'])
TEST_TIME = DataFrame(TEST_TIME, columns=['TEST_TIME'])
Factors = DataFrame([1,3,5,7,9,11,13,15,17,19,21,23,25,27,29], columns=['Factors'])
SVDpp_result = pd.concat([Factors, MAE, RMSE, FIT_TIME, TEST_TIME], axis=1)
SVDpp_result.to_csv('./result_data/SVDpp_result.csv', header=True, encoding='utf-8')

######## NMF
# Run 5-fold cross-validation and print results.
perf0001 = cross_validate(NMF(n_factors=1,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0002 = cross_validate(NMF(n_factors=3,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0003 = cross_validate(NMF(n_factors=5,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0004 = cross_validate(NMF(n_factors=7,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0005 = cross_validate(NMF(n_factors=9,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0006 = cross_validate(NMF(n_factors=11,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0007 = cross_validate(NMF(n_factors=13,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0008 = cross_validate(NMF(n_factors=15,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf0009 = cross_validate(NMF(n_factors=17,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf00010= cross_validate(NMF(n_factors=19,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf00011= cross_validate(NMF(n_factors=21,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf00012= cross_validate(NMF(n_factors=23,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf00013= cross_validate(NMF(n_factors=25,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf00014= cross_validate(NMF(n_factors=27,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
perf00015 = cross_validate(NMF(n_factors=29,biased=False),data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

perf_result3 = []
for i in range(1, 16):
    perf_result3.append('perf000' + str(i))
MAE, RMSE, FIT_TIME, TEST_TIME = [], [], [], []
for perf in perf_result3:
    res = DataFrame(eval(perf))
    MAE.append(round(np.mean(res['test_mae']), 4))
    RMSE.append(round(np.mean(res['test_rmse']), 4))
    FIT_TIME.append(round(np.mean(res['fit_time']), 4))
    TEST_TIME.append(round(np.mean(res['test_time']), 4))
MAE = DataFrame(MAE, columns=['MAE'])
RMSE = DataFrame(RMSE, columns=['RMSE'])
FIT_TIME = DataFrame(FIT_TIME, columns=['FIT_TIME'])
TEST_TIME = DataFrame(TEST_TIME, columns=['TEST_TIME'])
Factors = DataFrame([1,3,5,7,9,11,13,15,17,19,21,23,25,27,29], columns=['Factors'])
NMF_result = pd.concat([Factors, MAE, RMSE, FIT_TIME, TEST_TIME], axis=1)
NMF_result.to_csv('./result_data/NMF_result.csv', header=True, encoding='utf-8')
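The script above runs cross_validate once per factor count and then gathers the metrics back through eval(), which is why the same block appears four times. Purely as a point of comparison, here is a more compact sketch of the first block (unbiased SVD) that loops over the factor list and reads the scores by key; it reuses the imports and the data object defined above, writes the same CSV layout, and the variable name SVD_noBaised_loop exists only for this sketch:

factor_grid = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29]
rows = []
for k in factor_grid:
    # cross_validate returns a dict with 'test_rmse', 'test_mae', 'fit_time', 'test_time'.
    perf = cross_validate(SVD(n_factors=k, biased=False), data,
                          measures=['RMSE', 'MAE'], cv=5)
    rows.append([k,
                 round(np.mean(perf['test_mae']), 4),
                 round(np.mean(perf['test_rmse']), 4),
                 round(np.mean(perf['fit_time']), 4),
                 round(np.mean(perf['test_time']), 4)])
SVD_noBaised_loop = pd.DataFrame(rows, columns=['Factors', 'MAE', 'RMSE', 'FIT_TIME', 'TEST_TIME'])
SVD_noBaised_loop.to_csv('./result_data/SVD_noBaised_result.csv', header=True, encoding='utf-8')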
#########################################################################################################################
#################################### Visualizing the SVD results (the code from here on is R)

# Required packages for the plots below.
library(data.table)
library(dplyr)        # filter()
library(ggplot2)
library(ggthemr)      # define_palette(), ggthemr()
library(easyGgplot2)  # ggplot2.multiplot()

# Read the CSV files exported by the Python script above (adjust names/paths to your local copies).
SVD_noBaised_result <- read.csv('SVD_noBaised_result1.csv',encoding = 'utf-8')
SVD_baised_result <- read.csv('SVD_baised_result1.csv',encoding = 'utf-8')
SVDpp_result <- read.csv('SVDpp_result1.csv',encoding = 'utf-8')
NMF_result <- read.csv('NMF_result1.csv',encoding = 'utf-8')

SVD_noBiased_result <- as.data.table(SVD_noBaised_result)
SVD_biased_result <- as.data.table(SVD_baised_result)
SVDpp_result <- as.data.table(SVDpp_result)
NMF_result <- as.data.table(NMF_result)

# Label each result table with the model it came from and stack them.
SVD_noBiased_result <- SVD_noBiased_result[,SVD_class:='SVD_noBiased']
SVD_biased_result <- SVD_biased_result[,SVD_class:='SVD_biased']
SVDpp_result <- SVDpp_result[,SVD_class:='SVD++']
NMF_result <- NMF_result[,SVD_class:='NMF']
merge_SVD_result <- rbind(SVD_noBiased_result,SVD_biased_result,SVDpp_result,NMF_result)
merge_SVD_result <- merge_SVD_result[,-1]   # drop the index column written by to_csv

# Plot the results with a custom colour palette.
colour <- c('#34495e','#3498db','#2ecc71','#f1c40f','#e74c3c','#9b59b6','#1abc9c')
mycol <- define_palette(swatch = colour,gradient = c(lower=colour[1L],upper=colour[2L]))
ggthemr(mycol)

p01 <- ggplot(data= merge_SVD_result, aes(x=Factors, y= MAE,group=SVD_class, shape=SVD_class, color=SVD_class)) + geom_line() + geom_point() + theme(legend.position="top",axis.text.x = element_text(angle = 50, hjust = 0.5, vjust = 0.5),text = element_text(color = "black", size = 12))
p02 <- ggplot(data= merge_SVD_result, aes(x=Factors, y= RMSE,group=SVD_class, shape=SVD_class, color=SVD_class)) + geom_line() + geom_point() + theme(legend.position="top",axis.text.x = element_text(angle = 50, hjust = 0.5, vjust = 0.5),text = element_text(color = "black", size = 12))
p1 <- ggplot(data= merge_SVD_result, aes(x=Factors, y= FIT_TIME,group=SVD_class, shape=SVD_class, color=SVD_class)) + geom_line() + geom_point() + theme(legend.position="top",axis.text.x = element_text(angle = 50, hjust = 0.5, vjust = 0.5),text = element_text(color = "black", size = 12))
p2 <- ggplot(data= merge_SVD_result, aes(x=Factors, y= TEST_TIME,group=SVD_class, shape=SVD_class, color=SVD_class)) + geom_line() + geom_point() + theme(legend.position="top",axis.text.x = element_text(angle = 50, hjust = 0.5, vjust = 0.5),text = element_text(color = "black", size = 12))
p3 <- ggplot(data= filter(merge_SVD_result,SVD_class!='SVD++'), aes(x=Factors, y= FIT_TIME,group=SVD_class, shape=SVD_class, color=SVD_class)) + geom_line() + geom_point() + theme(legend.position="none",axis.text.x = element_text(angle = 50, hjust = 0.5, vjust = 0.5),text = element_text(color = "black", size = 12))
p4 <- ggplot(data= filter(merge_SVD_result,SVD_class!='SVD++'), aes(x=Factors, y= TEST_TIME,group=SVD_class, shape=SVD_class, color=SVD_class)) + geom_line() + geom_point() + theme(legend.position="none",axis.text.x = element_text(angle = 50, hjust = 0.5, vjust = 0.5),text = element_text(color = "black", size = 12))

x11()
ggplot2.multiplot(p01,p02,cols = 2)
x11()
ggplot2.multiplot(p1,p2,p3,p4,cols = 2)

[Figure: MAE and RMSE versus the number of latent factors for SVD_noBiased, SVD_biased, SVD++ and NMF]

[Figure: fit time and test time versus the number of latent factors, shown with and without SVD++]
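The plots above are produced in R. If you would rather stay in Python, a rough equivalent of the MAE/RMSE comparison can be sketched with pandas and matplotlib, reading the CSV files exported earlier; the styling will of course differ from the ggplot2 figures:

import pandas as pd
import matplotlib.pyplot as plt

# Read the four result files exported by the script above and compare the models.
files = {'SVD_noBiased': './result_data/SVD_noBaised_result.csv',
         'SVD_biased':   './result_data/SVD_baised_result.csv',
         'SVD++':        './result_data/SVDpp_result.csv',
         'NMF':          './result_data/NMF_result.csv'}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for label, path in files.items():
    df = pd.read_csv(path)
    axes[0].plot(df['Factors'], df['MAE'], marker='o', label=label)
    axes[1].plot(df['Factors'], df['RMSE'], marker='o', label=label)
axes[0].set(xlabel='Factors', ylabel='MAE')
axes[1].set(xlabel='Factors', ylabel='RMSE')
axes[0].legend(loc='upper right')
plt.tight_layout()
plt.show()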

That concludes this walkthrough of matrix factorization for recommendation; I hope it proves helpful.



