Setting up the storage files for job listings scraped from 51job and Zhaopin (智联招聘) with Scrapy

This article describes how to set up the files that store job listings scraped from 51job and Zhaopin (智联招聘) with Scrapy.

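Two files, the items file and the pipeline file, are enough to store the scraped data. Note that the spider file must also contain the corresponding code; see the 51job and Zhaopin spider files for details.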
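The spider-side code is not shown in this article. As a rough sketch of what it might involve, the helper below maps values already extracted from one listing row onto the seven field names used by the item. The names `JOB_FIELDS` and `build_job_record` and the sample values are assumptions made for illustration, not part of the original project; in an actual spider the same keys would be assigned on a `JobspiderItem`, which is then yielded from the parse callback.

```python
# Field names match those declared in JobspiderItem (items.py).
JOB_FIELDS = (
    "job_name", "fan_kui_lv", "job_company_name",
    "job_salary", "job_place", "job_type", "job_time",
)


def build_job_record(values):
    """Map extracted strings onto the item's field names.

    Hypothetical helper: `values` holds one string (or None) per field,
    in the order of JOB_FIELDS. Missing values become empty strings so
    that a pipeline can always join them into a line of text.
    """
    if len(values) != len(JOB_FIELDS):
        raise ValueError(
            "expected %d values, got %d" % (len(JOB_FIELDS), len(values))
        )
    return {
        name: (value or "").strip()
        for name, value in zip(JOB_FIELDS, values)
    }


# Made-up sample row, for illustration only.
record = build_job_record(
    [" Python Developer ", None, "Example Co.", "10-15k",
     "Beijing", "full-time", "05-20"]
)
print(record["job_name"])
```

Normalising every field to a string at this stage matters because the CSV pipeline joins the values with `",".join(...)`, which fails on `None`.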

1. First, set up the items file

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class JobspiderItem(scrapy.Item):
    # define the fields for your item here like:
    job_name = scrapy.Field()
    fan_kui_lv = scrapy.Field()
    job_company_name = scrapy.Field()
    job_salary = scrapy.Field()
    job_place = scrapy.Field()
    job_type = scrapy.Field()
    job_time = scrapy.Field()

2. Set up the pipeline file

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html

# A pipeline receives the item data returned by the spider.


class JobspiderPipeline(object):
    def process_item(self, item, spider):
        return item


class TocsvPipeline(object):
    def process_item(self, item, spider):
        # Append one comma-separated line per item to job.csv.
        with open("job.csv", "a", encoding="gb18030") as f:
            job_name = item['job_name']
            fan_kui_lv = item['fan_kui_lv']
            job_company_name = item['job_company_name']
            job_salary = item['job_salary']
            job_place = item['job_place']
            job_type = item['job_type']
            job_time = item['job_time']
            job_info = [job_name, fan_kui_lv, job_company_name,
                        job_salary, job_place, job_type, job_time]
            # Join the fields first, then add the newline, so that no
            # trailing comma is written before the line break.
            f.write(",".join(job_info) + "\n")
        # Pass the item on to the next pipeline
        return item
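Joining fields by hand produces a broken row whenever a value itself contains a comma, which is common in company names and job titles. One possible alternative, sketched below, uses Python's standard `csv` module together with Scrapy's `open_spider` and `close_spider` pipeline hooks so that the file is opened only once per crawl. The class name `CsvExportPipeline` and its `path` argument are assumptions for illustration and do not appear in the original project.

```python
import csv


class CsvExportPipeline:
    """Hypothetical variant of TocsvPipeline built on the csv module."""

    fields = [
        "job_name", "fan_kui_lv", "job_company_name",
        "job_salary", "job_place", "job_type", "job_time",
    ]

    def __init__(self, path="job.csv"):
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # newline="" lets the csv module control the line endings.
        self.file = open(self.path, "a", encoding="gb18030", newline="")
        self.writer = csv.writer(self.file)

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # Values containing commas or quotes are quoted automatically;
        # fields that were never set are written as empty strings.
        self.writer.writerow([item.get(name, "") for name in self.fields])
        # Pass the item on to the next pipeline.
        return item
```

As with the pipelines above, this class would only run once it is listed in the `ITEM_PIPELINES` setting in the project's settings file, keyed by its dotted import path and given an integer priority such as 300.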
