Getting a decoupled front-end/back-end project indexed by crawlers: the simplest approach for spiders is to create a sitemap XML


2024年5月13日11:36:01

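Many projects today are built with Vue, React, or Angular, but Baidu's crawler handles such projects poorly. The usual workaround is server-side rendering (SSR) or static site generation (SSG). Some frameworks, however, support SSR and SSG poorly, so what can you do if you want a solution that leaves the project itself untouched?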

Reference: https://blog.csdn.net/andy_68147772/article/details/135118183

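Many years ago I took over an AngularJS project that needed to be indexed by crawlers. I struggled with it for a long time because SSR support was poor back then, so I came up with a different approach.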

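A crawler first fetches http://127.0.0.1/robots.txt (with your site's domain in place of 127.0.0.1) to read the crawl rules, and that file can specify the addresses of sitemap XML files.
For example: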

User-agent: *
Allow: /
Sitemap: https://www.xxx.cn/express.xml
Sitemap: https://www.xxx.cn/lastest.xml
Sitemap: https://www.xxx.cn/sitemap1.xml
Sitemap: https://www.xxx.cn/sitemap2.xml
Sitemap: https://www.xxx.cn/sitemap3.xml
Sitemap: https://www.xxx.cn/sitemap4.xml
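
To make the crawler's first step concrete, here is a minimal sketch of reading the `Sitemap:` entries out of a robots.txt like the one above. It uses Python's standard `urllib.robotparser`. The function name `sitemaps_from_robots` and the shortened sample text are assumptions for illustration, and real crawlers such as Baidu's may process the file differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the robots.txt example, used as sample input.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://www.xxx.cn/express.xml
Sitemap: https://www.xxx.cn/sitemap1.xml
"""


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []


if __name__ == "__main__":
    for url in sitemaps_from_robots(ROBOTS_TXT):
        print(url)
```

Running this against your own robots.txt is a quick way to check that every sitemap you expect is actually declared.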

Sitemap specification:
https://www.sitemaps.org/protocol.html#index

If robots.txt lists a sitemap address, the crawler goes on to read that sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.xxx.com/page1</loc>
    <lastmod>2023-01-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://www.xxx.com/page2</loc>
    <lastmod>2023-01-02</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.6</priority>
  </url>
  <!-- more URLs... -->
</urlset>
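
As a rough illustration of what a crawler does with such a file, the sketch below extracts each `<loc>` and `<lastmod>` from a sitemap using Python's standard `xml.etree.ElementTree`. The function name `urls_from_sitemap` and the trimmed sample document are assumptions, not part of any particular crawler.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol; every element lives in it.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Trimmed version of the sitemap example, used as sample input.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.xxx.com/page1</loc><lastmod>2023-01-01</lastmod></url>
  <url><loc>https://www.xxx.com/page2</loc><lastmod>2023-01-02</lastmod></url>
</urlset>
"""


def urls_from_sitemap(xml_text: str) -> list[tuple[str, str]]:
    """Return (loc, lastmod) pairs from a sitemap document (hypothetical helper)."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    pairs = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        pairs.append((loc.strip(), lastmod.strip()))
    return pairs


if __name__ == "__main__":
    for loc, lastmod in urls_from_sitemap(SITEMAP_XML):
        print(loc, lastmod)
```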

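So you can put the GET endpoints that return each page's content directly into the sitemap and let the spider fetch the JSON those endpoints return. No rendering is needed, which gives crawlers access to the content without SSR or SSG.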
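
A minimal sketch of this idea follows: it builds a sitemap whose `<loc>` entries point at JSON content endpoints instead of rendered pages. The base URL `https://www.xxx.cn/api/article`, the article IDs, and the function name `build_sitemap` are all hypothetical. In a real project the entries would come from your database, and the output would be served at one of the sitemap URLs listed in robots.txt.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Hypothetical GET endpoint that returns one article's content as JSON.
API_BASE = "https://www.xxx.cn/api/article"


def build_sitemap(entries) -> str:
    """Build sitemap XML from (article_id, last_modified) pairs (hypothetical helper)."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_XMLNS})
    for article_id, last_modified in entries:
        url = ET.SubElement(urlset, "url")
        # Point the crawler at the JSON endpoint, not at a rendered page.
        ET.SubElement(url, "loc").text = f"{API_BASE}/{article_id}"
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    ET.indent(urlset)  # pretty-print; requires Python 3.9+
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    sample = [(1, date(2024, 5, 13)), (2, date(2024, 5, 12))]
    print(build_sitemap(sample))
```

The sitemap protocol limits a single file to 50,000 URLs, so a large site would split the output into several files, much like the sitemap1.xml to sitemap4.xml entries in the robots.txt example.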

Sitemap generation tools with Laravel support