Python Crawler: Recognizing Captchas with the Baidu OCR Image-Recognition API and Scraping Site Data with Python and requests (Part 1)


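First, register a Baidu account

Apply for the Baidu OCR API

Both of these steps are easy to look up and complete on your own, so I won't go through them here.

Code for local and remote image recognition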

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2019/4/10 17:35
# @Author  : Hanxiaoshun@天谕传说
# @Site    :
# @File    : SpiderStationInfo.py
# @Software: PyCharm
import os
import random

from aip import AipOcr  # pip install baidu-aip

# Your APP_ID, API key (AK) and secret key (SK); apply for your own.
APP_ID = 'xxxxxx'
API_KEY = 'xxxxxxxxxxxxxxx'
SECRET_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'

# Create the client used to call the Baidu OCR service
client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

rootBase = "./capt"
CAPT_URL = "http://xx.xx.com/validatecodeservlet.do"
# The real site's address is withheld here; if you need it, leave a comment
# with your email address and I will send it to you.


def get_file_content(filePath):
    """
    Read a local image file as bytes.
    :param filePath: path to the image file
    :return: the raw file content
    """
    with open(filePath, 'rb') as fp:
        return fp.read()


def localutils(rootBase):
    """
    Run OCR on every image in a local directory.
    :param rootBase: directory containing the captcha images
    :return: None (recognized values are printed)
    """
    for filePath in os.listdir(rootBase):
        print(filePath)
        # Read the image and send it to the OCR service
        image = get_file_content(os.path.join(rootBase, filePath))
        result = client.basicGeneral(image)
        # An error response has no 'words_result' key, so fall back to []
        words = result.get('words_result', [])
        if words:
            value = words[0]['words'].strip().replace(' ', '')
            print(value)


def remoteutils(url):
    """
    Run OCR on a remote image in real time and return the recognized text.
    :param url: URL of the image
    :return: the recognized text, or 0 if nothing was recognized
    """
    result = client.basicGeneralUrl(url)
    print(result)
    words = result.get('words_result', [])
    if words:
        value = words[0]['words'].strip().replace(' ', '')
        print(value)
        return value
    return 0
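OCR on captchas is unreliable, so in practice the recognized value usually needs to be cleaned up and the request retried when the result looks wrong. The following is only a minimal sketch of that idea, not part of the original script: the helper names extract_captcha and recognize_with_retry, the fetch_image hook, and the default length of 4 characters are all assumptions made for illustration.

```python
import re


def extract_captcha(result, expected_len=4):
    """Return the cleaned captcha text from an OCR response dict, or None.

    expected_len is an assumption (many captchas are 4 characters long);
    adjust it for the target site.
    """
    words = result.get("words_result") or []
    if not words:
        return None  # error response, or nothing was recognized
    # Keep only ASCII letters and digits; OCR often adds spaces or punctuation.
    text = re.sub(r"[^0-9A-Za-z]", "", words[0].get("words", ""))
    return text if len(text) == expected_len else None


def recognize_with_retry(fetch_image, ocr, attempts=3):
    """Fetch a fresh captcha and OCR it until a plausible value comes back.

    fetch_image: callable returning the captcha image bytes (hypothetical hook)
    ocr: callable taking image bytes and returning an OCR response dict,
         for example client.basicGeneral
    """
    for _ in range(attempts):
        value = extract_captcha(ocr(fetch_image()))
        if value is not None:
            return value
    return None
```

With the client defined above and a requests.Session named session (an assumption), a call could look like recognize_with_retry(lambda: session.get(CAPT_URL).content, client.basicGeneral). Downloading the image through the same session that later submits the form is likely to matter, because sites typically tie the captcha to the session cookie.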

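A final note: this code is far from polished. If it strikes you as crude, please don't hesitate to point out improvements; I am glad to learn from them.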
