Day 1 Check-in: Crawling Asynchronously Loaded Data from 51job

2024-01-18 09:30


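Site (home page): 51job. (The link in the original post only shows the text "滑动验证页面", i.e. a slider-verification page.)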

As shown in Figure 1:


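Next, check whether the data we need appears in the page source. It turns out the data sits inside a script written in JavaScript rather than in the HTML markup, so it cannot be read off the page at the original URL directly, as shown in the figure below: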
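One way to confirm where a value lives, in ordinary markup or only inside a `<script>`, is to split the raw HTML with the standard-library `html.parser`. The sketch below is illustrative only: `where_is` and `_TextCollector` are hypothetical names, and the sample HTML (including `window.__DATA__`) is made up rather than taken from 51job.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text inside <script> tags separately from all other text."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_text = []
        self.markup_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Route each text chunk to the script bucket or the markup bucket
        (self.script_text if self.in_script else self.markup_text).append(data)


def where_is(html, needle):
    """Return 'markup', 'script', or None depending on where needle appears."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    if needle in "".join(parser.markup_text):
        return "markup"
    if needle in "".join(parser.script_text):
        return "script"
    return None


# Hypothetical page: the list container is empty, the data lives in a script
sample = ('<html><body><div id="list"></div>'
          '<script>window.__DATA__ = {"job_name":"python工程师"};</script>'
          '</body></html>')
print(where_is(sample, "python工程师"))  # script
```

A result of `script` means a plain HTML selector will find nothing, and the text has to be pulled out of the script source instead, which is what the regex in the code further down does.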


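How do we find the data we need?

Step 1: As shown in the figure, clear the Network panel in the browser's developer tools, then reload the page.

Step 2: As shown in the figure below, check whether the response is JSON (paste it into an online JSON parser and look for a "collection", i.e. array, structure). It does contain JSON, and that JSON is the data we need.

Step 3: As shown in the figure below, open the copied link in the browser; that link is the target URL.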
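The check in step 2 can also be done locally: run `json.loads` on the body copied from the Network panel and look at the top-level type. A minimal sketch; `describe_body` is a hypothetical helper, and the sample bodies are made up, not real 51job responses.

```python
import json


def describe_body(text):
    """Classify a response body: 'not json', 'array of N', or 'object with keys ...'."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return "not json"
    if isinstance(value, list):
        return f"array of {len(value)}"
    if isinstance(value, dict):
        return "object with keys " + ", ".join(sorted(value))
    return type(value).__name__  # a bare scalar such as int or str


print(describe_body('[{"job_name": "a"}, {"job_name": "b"}]'))  # array of 2
print(describe_body("<html></html>"))  # not json
```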


Fetch the page source that contains the data:

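```python
import json
import re
import urllib.error
import urllib.request  # build the request for a URL and fetch the page data


def main():
    url = "https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,2.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare="
    # Save the crawled page to result.txt so it can be re-parsed without re-requesting
    html = askURL(url)
    with open("result.txt", "w", encoding="utf-8") as f:
        f.write(html)
    with open("result.txt", "r", encoding="utf-8") as f:
        content = f.read()  # read the whole file as one string
    # Regex: the job list is the JSON array between "engine_jds": and ,"jobid_count"
    data = re.findall(r"\"engine_jds\":(.+?),\"jobid_count\"", content)
    if not data:
        print("engine_jds not found; the page may have changed or returned a verification page")
        return
    print(data[0])
    jsonObj = json.loads(data[0])
    for item in jsonObj:
        print(item["job_name"] + ":" + item["company_name"])
    # To do: crawl at intervals
    # To do: use a proxy


def askURL(url):
    head = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
    }
    request = urllib.request.Request(url, headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("gbk")  # decode the GBK-encoded page
    except urllib.error.URLError as e:  # HTTPError is a subclass, so it is caught too
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


if __name__ == "__main__":
    main()
```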
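The code ends with two to-do notes: crawling at intervals and using a proxy. One possible shape for both is sketched below, under stated assumptions: it assumes the last number before `.html` in the search URL is the page number (inferred from the URL's shape, not confirmed), it omits the original query string for brevity, and `page_url`, `make_opener`, `crawl` and the proxy address are hypothetical. The fetch function is passed in so the pacing logic can be tried without network access.

```python
import random
import time
import urllib.request

# Hypothetical template: assumes the last number before ".html" is the page index
TEMPLATE = "https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,{page}.html"


def page_url(page):
    return TEMPLATE.format(page=page)


def make_opener(proxy=None):
    """Build a urllib opener; route http and https through `proxy` when given."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    return urllib.request.build_opener(*handlers)


def crawl(pages, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Fetch each page in turn, sleeping a random interval between requests."""
    results = {}
    for i, page in enumerate(pages):
        if i:  # no wait before the first request
            sleep(random.uniform(min_delay, max_delay))
        results[page] = fetch(page_url(page))
    return results


# Offline usage: a fake fetcher and a recording "sleep" stand in for real I/O
delays = []
result = crawl([1, 2, 3], fetch=lambda url: len(url), sleep=delays.append)
print(len(delays))  # 2
```

For real requests, `fetch` could be something like `lambda u: make_opener(proxy).open(urllib.request.Request(u, headers=head)).read().decode("gbk")`, reusing the `head` dictionary from `askURL`. A random delay rather than a fixed one is a common choice to make the request pattern less regular.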

Partial output of print(data[0]) (the middle portion has been cut, so the excerpt below is not valid JSON as a whole):

[{"type":"engine_jds","jt":"0_0","tags":[],"ad_track":"","jobid":"138284904","coid":"6831150","effect":"1","is_special_job":"","job_href":"https:\\/\\/jobs.51job.com\\/beijing-cyq\\/138284904.html?s=sou_sou_soulb&t=0_0","job_name":"python工程师","job_title":"python工程师","company_href":"https:\\/\\/jobs.51job.com\\/all\\/coAmJQPARkBToDZwZgAWQ.html","company_name":"深圳市览众科技股份有限公司","providesalary_text":"0.6-1万\\/月","workarea":"010500","workarea_text":"北京-朝阳区","updatedate":"03-15","iscommunicate":"","companytype_text":"民营公司","degreefrom":"5","workyear":"3","issuedate":"2022-03-15 09:15:41","isFromXyz":"","isIntern":"","jobwelf":"五险一金 年终奖金 定期体检 员工旅游 交通补贴 餐饮补贴 专业培训 绩效奖金 股票期权 弹性工作","jobwelf_list":["五险一金","年终奖金","定期体检","员工旅游","交通补贴","餐饮补贴","专业培训","绩效奖金","股票期权","弹性工作"],"isdiffcity":"","attribute_text":["北京-朝阳区","1年经验","大专","招2人"],"companysize_text":"50-150人","companyind_text":"计算机软件","adid":""},{"type":"engine_jds","jt":"0_0","tags":[],"ad_track":"","jobid":"131628693","coid":"5866100","effect":"1","is_special_job":"","job_href":"https:\\/\\/jobs.51job.com\\/dongguan\\/131628693.html?s=sou_sou_soulb&t=0_0","job_name":"Python开发工程师(初级)","job_title":"Python开发工程师(初级)","company_href":"https:\\/\\/jobs.51job.com\\/all\\/co5866100.html","company_name":"广州市正成信息科技有限公司","providesalary_text":"5-8千\\/月","workarea":"030800","workarea_text":"东莞","updatedate":"03-15","iscommunicate":"","companytype_text":"民营公司","degreefrom":"6","workyear":"3","issuedate":"2022-03-15 09:11:19","isFromXyz":"","isIntern":"","jobwelf":"五险一金 补充医疗保险 员工旅游 专业培训 绩效奖金 弹性工作","jobwelf_list":["五险一金","补充医疗保险","员工旅游","专业培训","绩效奖金","弹性工作"],"isdiffcity":"","attribute_text":["东莞","1年经验","本科","招若干人"],"companysize_text":"50-150人","companyind_text":"电子技术\\/半导体\\/集成电路","adid":""},{"type":"engine_jds","jt":"0_0","tags":{"type":"engine_jds","jt":"0_0","tags":[],"ad_track":"","jobid":"138822037","coid":"4970051","effect":"1","is_special_job":"","job_href":"https:\\/\\/jobs.51job.com\\/hangzhou-yhq\\/138822037.html?s=sou_sou_soulb&t=0_0","job_name":"后端Python开发工程师","job_title":"后端Python开发工程师","company_href":"https:\\/\\/jobs.51job.com\\/all\\/co4970051.html","company_name":"浙江禾川科技股份有限公司","providesalary_text":"1.4-2.2万\\/月","workarea":"080207","workarea_text":"杭州-余杭区","updatedate":"03-15","iscommunicate":"","companytype_text":"民营公司","degreefrom":"6","workyear":"4","issuedate":"2022-03-15 04:00:38","isFromXyz":"","isIntern":"","jobwelf":"","jobwelf_list":[""],"isdiffcity":"","attribute_text":["杭州-余杭区","2年经验","本科","招2人"],"companysize_text":"500-1000人","companyind_text":"仪器仪表\\/工业自动化","adid":""},{"type":"engine_jds","jt":"0_0","tags":[],"ad_track":"","jobid":"138817451","coid":"2486667","effect":"1","is_special_job":"","job_href":"https:\\/\\/jobs.51job.com\\/tianjin-xqq\\/138817451.html?s=sou_sou_soulb&t=0_0","job_name":"Python开发工程师","job_title":"Python开发工程师","company_href":"https:\\/\\/jobs.51job.com\\/all\\/co2486667.html","company_name":"北京国遥新天地信息技术股份有限公司","providesalary_text":"1-1.5万\\/月","workarea":"050800","workarea_text":"天津-西青区","updatedate":"03-15","iscommunicate":"","companytype_text":"民营公司","degreefrom":"5","workyear":"4","issuedate":"2022-03-15 04:00:38","isFromXyz":"","isIntern":"","jobwelf":"五险一金 年终奖金 员工旅游 定期体检","jobwelf_list":["五险一金","年终奖金","员工旅游","定期体检"],"isdiffcity":"","attribute_text":["天津-西青区","2年经验","大专","招1人"],"companysize_text":"500-1000人","companyind_text":"计算机软件","adid":""},(Python\\/GUI)","job_title":"软件工程师desal\/co2793941.html","company_na资","degreefrom":"6","workyear":"4","issuedate":"2022-03-15 17:37:42","isFromXyz":"","isIntern":"","jobwelf":"五险一金 交通补贴 餐饮补贴 绩效奖金 年终奖金 定期体检 商业保险 生日礼金 过节费","jobwelf_list":["五险一金","交通补贴","餐饮补贴","绩效奖金","年终奖金","定期体检","商业保险","生日礼金","过节费"],"isdiffcity":"","attribute_text":["成都-高新区","2年经验","本科","招1人"],"companysize_text":"1000-5000人","companyind_text":"通信\\/电信\\/网络设备","adid":""}]
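Each record in the array is a flat object, so a few columns can be pulled out and written as CSV with only the standard library. A sketch under assumptions: the field names (`job_name`, `company_name`, `providesalary_text`, `workarea_text`) are taken from the output above, `to_csv` is a hypothetical helper, and the two sample records are trimmed copies of entries in that output.

```python
import csv
import io
import json

FIELDS = ["job_name", "company_name", "providesalary_text", "workarea_text"]


def to_csv(records, fields=FIELDS):
    """Write the chosen fields of each job record to a CSV string."""
    buf = io.StringIO()
    # extrasaction="ignore" drops every key not listed in fields
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# The JSON escape \/ decodes to a plain slash
records = json.loads(
    '[{"job_name":"python工程师","company_name":"深圳市览众科技股份有限公司",'
    '"providesalary_text":"0.6-1万\\/月","workarea_text":"北京-朝阳区","jobid":"138284904"},'
    '{"job_name":"Python开发工程师","company_name":"北京国遥新天地信息技术股份有限公司",'
    '"providesalary_text":"1-1.5万\\/月","workarea_text":"天津-西青区","jobid":"138817451"}]'
)
print(to_csv(records))
```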

Reading the JSON in this raw form is inconvenient, so paste it into an online JSON parser (the one used here is the SO JSON online tool, "SO JSON在线工具"):
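If you would rather not paste data into a website, the standard `json` module produces a similarly readable, indented view. A small sketch; `pretty` is a hypothetical helper name. `ensure_ascii=False` keeps Chinese text readable instead of turning it into `\uXXXX` escapes.

```python
import json


def pretty(raw):
    """Re-indent a JSON string for reading; non-ASCII characters are kept as-is."""
    return json.dumps(json.loads(raw), ensure_ascii=False, indent=2)


print(pretty('[{"job_name":"python工程师","workarea_text":"北京-朝阳区"}]'))
```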

Option 1:

Option 2: click "JSON online parsing" (json在线解析), to the right of "JSON parsing" (json解析); the result is easier to read.

A small tip for copying the JSON:

Ctrl+A: select all

Ctrl+C: copy

Ctrl+V: paste


Final output of the code:




http://www.chinasem.cn/article/618629
