[Python] Scraping letter URL IDs from the 首都之窗 (Capital Window) citizen letters page, 2020.2.13

shengge0 2020-02-13

Someone pointed out that I forgot to post how the URL IDs are scraped. Each letter's detail page has a URL like this:

http://www.beijing.gov.cn/hudong/hdjl/com.web.consult.consultDetail.flow?originalId=AH20021300174

AH20021300174, the value of the originalId parameter, is what we want to scrape.
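To make clear which part of the URL is the ID, here is a minimal sketch that pulls the originalId value out of a detail URL with the standard library. The helper name `extract_original_id` is made up for illustration and is not part of the scraper itself.

```python
from urllib.parse import urlparse, parse_qs


def extract_original_id(detail_url):
    """Return the originalId query parameter of a detail URL, or None if absent."""
    # parse_qs maps each query parameter name to a list of its values
    query = parse_qs(urlparse(detail_url).query)
    values = query.get("originalId")
    return values[0] if values else None


url = "http://www.beijing.gov.cn/hudong/hdjl/com.web.consult.consultDetail.flow?originalId=AH20021300174"
print(extract_original_id(url))  # prints AH20021300174
```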

The current code is as follows:

import json

import requests

# Endpoint that returns the paginated letter list as JSON
url = "http://www.beijing.gov.cn/hudong/hdjl/com.web.search.mailList.mailList.biz.ext"

# Request headers copied from the browser. Content-Length is left out so that
# requests sets it from the actual body instead of a hard-coded value.
kv = {
    'Host': 'www.beijing.gov.cn',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'text/json',
    'X-Requested-With': 'XMLHttpRequest',
    'Origin': 'http://www.beijing.gov.cn',
    'Connection': 'keep-alive',
    'Referer': 'http://www.beijing.gov.cn/hudong/hdjl/',
}

PAGE_SIZE = 6   # letters returned per request
TOTAL = 5584    # upper bound on the offset, as used in the original loop


def page(begin):
    """Request one page starting at offset `begin` and print each original_id."""
    query = {
        'PageCond/begin': begin,
        'PageCond/isCount': 'true',
        'PageCond/length': PAGE_SIZE,
    }
    datas = json.dumps(query)
    r = requests.post(url, data=datas, headers=kv)
    print(r.status_code)
    print(r.text)
    js = json.loads(r.text)
    for j in js["mailList"]:
        print(j)
        print(j.get("original_id"))


def href():
    """Visit every page by stepping the offset in increments of PAGE_SIZE."""
    for begin in range(0, TOTAL, PAGE_SIZE):
        page(begin)


if __name__ == "__main__":
    href()
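The script above only prints the IDs. As a rough sketch of how they could be collected and turned back into detail-page URLs, the helpers below work on an already parsed response, so no network access is needed to try them. The function names and the `sample` payload are hypothetical; the only things taken from the code above are the `mailList` key, the `original_id` field, the page size of 6, and the detail URL format.

```python
DETAIL_URL = "http://www.beijing.gov.cn/hudong/hdjl/com.web.consult.consultDetail.flow"


def page_offsets(total, page_size=6):
    """Return the PageCond/begin offsets needed to cover `total` letters."""
    return list(range(0, total, page_size))


def extract_ids(payload):
    """Collect the non-empty original_id values from one parsed JSON response."""
    return [m["original_id"] for m in payload.get("mailList", []) if m.get("original_id")]


def detail_url(original_id):
    """Build the detail-page URL for a single letter ID."""
    return f"{DETAIL_URL}?originalId={original_id}"


# Made-up payload that only imitates the shape of the real response
sample = {"mailList": [{"original_id": "AH20021300174"}, {"original_id": None}]}

ids = extract_ids(sample)
print(page_offsets(20))              # prints [0, 6, 12, 18]
print([detail_url(i) for i in ids])
```

In the real scraper, `extract_ids` would be called on `json.loads(r.text)` inside `page`, and the returned IDs appended to a list or written to a file instead of printed.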
