Python Web Scraping: Parsing Page Results with BeautifulSoup

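BeautifulSoup makes it easy to parse the pages a Python crawler fetches. The example below scrapes job postings from the BOSS Zhipin site (zhipin.com): job title, salary, location, company name, company funding stage, and so on. It shows how BeautifulSoup is used in practice.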
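Before the full crawler, here is a minimal sketch of the BeautifulSoup calls it relies on: `find`, `find_all`, and `get_text`. The HTML fragment is hypothetical, only loosely modeled on a job listing, and the standard-library `html.parser` is used here instead of `lxml` so the snippet runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not copied from the real site.
html = """
<div class="job-list">
  <ul>
    <li><div class="job-title">Backend Developer</div><span class="red">15-30K</span></li>
    <li><div class="job-title">Data Engineer</div><span class="red">20-40K</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

jobs = []
# find() returns the first match; find_all() returns every match under that node.
for li in soup.find(class_="job-list").find_all("li"):
    title = li.find(class_="job-title").get_text(strip=True)
    salary = li.find(class_="red").get_text(strip=True)
    jobs.append((title, salary))

print(jobs)
```

The same pattern (locate a container by class, loop over its `li` children, pull text out of each child) is what the crawler applies to the live pages.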


import requests
from bs4 import BeautifulSoup
from middlewares import get_random_proxy, get_random_agent  # local helper module, not shown here
import time

class Boss_Spider(object):
    def __init__(self, page=3):
        self.proxies = []
        self.verify_pro = []
        self.page = page
        self.headers = {}

    # Step 1: collect every job-category link from the home page
    def Parse_pre(self):
        base_url = 'https://www.zhipin.com/'
        headers = get_random_agent()
        proxy = get_random_proxy()
        time.sleep(1)  # throttle requests
        resp = requests.get(base_url, headers=headers, proxies=proxy)
        if resp.status_code == 200:
            soup = BeautifulSoup(resp.text, 'lxml')
            for job_menu in soup.find_all(class_='menu-sub'):
                for li in job_menu.find_all('li'):
                    job_type = li.find('h5').get_text()
                    for job_list in li.find_all('a'):
                        job_sub = job_list.get_text()
                        job_uri = job_list['href']
                        # Visit result pages 1..self.page of each sub-category
                        for i in range(1, self.page + 1):
                            # Join with exactly one slash between host and path
                            job_url = base_url.rstrip('/') + '/' + job_uri.lstrip('/') + '?page=%d&ka=page-%d' % (i, i)
                            meta = {
                                'job_type': job_type,
                                'job_sub': job_sub,
                            }
                            # Parse_index fetches and parses the page itself
                            self.Parse_index(meta=meta, url=job_url)
    # Step 2: scrape the job entries on one listing page
    def Parse_index(self, meta, url):
        headers = get_random_agent()
        proxy = get_random_proxy()
        time.sleep(1)  # throttle requests
        resp = requests.get(url, headers=headers, proxies=proxy)
        if resp.status_code == 200:
            soup = BeautifulSoup(resp.text, 'lxml')
            job_list = soup.find(class_='job-list')
            if job_list is None:
                # Layout changed or the request was blocked; nothing to parse
                return
            for li in job_list.find_all('li'):
                print('###########')
                position = li.find(class_='job-title').get_text()
                salary = li.find(class_='red').get_text()
                # Location, experience and education, concatenated into one string
                add = li.find('p').get_text()
                need = li.find('p').find('em').get_text()
                company_name = li.find(class_='company-text').find('a').get_text()
                # Industry, funding stage and company size, as plain text
                tag = li.find(class_='company-text').find('p').get_text()
                print(position, "$$$", salary, "$$$", add, "$$$", need, "$$$", company_name, "$$$", tag)

if __name__ == '__main__':
    b = Boss_Spider()
    b.Parse_pre()
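The listing imports `get_random_agent` and `get_random_proxy` from a `middlewares` module that the article does not show. The following is only a guess at what such a module might look like, inferred from how the return values are passed to `requests.get` (as `headers=` and `proxies=`). The User-Agent strings and proxy addresses are placeholders, not values from the original project.

```python
import random

# Placeholder values (assumptions): replace with real User-Agent strings
# and with proxies you are actually permitted to use.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]
PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]


def get_random_agent():
    """Return a headers dict carrying a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def get_random_proxy():
    """Return a proxies dict in the scheme-to-URL shape that requests accepts."""
    proxy = random.choice(PROXIES)
    return {"http": proxy, "https": proxy}
```

Saved as `middlewares.py` next to the crawler, this would satisfy the import; a real project would more likely load the proxy pool from a file or a service and check that each proxy is alive before use.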

Running the script prints output like the following:
后端开发 $$$ 15-30K $$$ 北京朝阳区朝外3-5年本科 $$$ $$$ 米花互动 $$$ 游戏不需要融资20-99人
###########
后端开发工程师 $$$ 35-55K $$$ 北京朝阳区望京经验不限本科 $$$ $$$ 云账户 $$$ 移动互联网C轮100-499人
###########
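In the output above, the third field runs location, experience, and education together, and the fourth field is empty. One plausible explanation, assumed here rather than confirmed against the live page, is that the `<p>` element separates its fields with empty `<em>` tags: `get_text()` then concatenates the text nodes, and the first `<em>` has no text of its own. The sketch below uses a hypothetical fragment with that structure to show how `stripped_strings` or a `get_text` separator keeps the fields apart.

```python
from bs4 import BeautifulSoup

# Assumed structure (hypothetical): fields separated by empty <em> tags.
html = '<p>Beijing Chaoyang<em class="vline"></em>3-5 years<em class="vline"></em>Bachelor</p>'
p = BeautifulSoup(html, "html.parser").find("p")

merged = p.get_text()                    # text nodes concatenated with no separator
empty = p.find("em").get_text()          # the <em> itself contains no text
fields = list(p.stripped_strings)        # one entry per non-empty text node
joined = p.get_text(" | ", strip=True)   # same nodes, joined with a visible separator

print(repr(merged))
print(repr(empty))
print(fields)
print(joined)
```

With `fields` in hand, location, experience, and education could be stored as separate columns instead of one merged string.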

