kikaylee 2020-05-05
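I. The urllib library
1. About urllib
urllib is Python's built-in HTTP request library.
It includes: urllib.request (the request module)
urllib.error (the exception-handling module)
urllib.parse (the URL-parsing module)
urllib.robotparser (the robots.txt-parsing module)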
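As a rough illustration of two of these modules, the sketch below uses urllib.parse to split and build URLs and urllib.robotparser to evaluate robots.txt rules. Nothing here touches the network. The robots.txt rules and the example.com URLs are made up for the example.

```python
from urllib.parse import urlparse, urlencode, parse_qs
from urllib.robotparser import RobotFileParser

# Split a URL into its components.
parts = urlparse("http://httpbin.org/get?name=zhaofan&age=23")
print(parts.scheme, parts.netloc, parts.path)  # http httpbin.org /get
print(parse_qs(parts.query))                   # {'name': ['zhaofan'], 'age': ['23']}

# Build a query string from a dict.
query = urlencode({"name": "zhaofan", "age": 23})
print(query)                                   # name=zhaofan&age=23

# Evaluate robots.txt rules supplied as a list of lines.
# These rules are hypothetical and only serve the illustration.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
blocked = robots.can_fetch("*", "http://example.com/private/page")
allowed = robots.can_fetch("*", "http://example.com/public/page")
print(blocked, allowed)                        # False True
```

In real use, RobotFileParser is usually pointed at a site's robots.txt with set_url() and read() instead of being fed lines by hand.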
II. The Requests library
1. Basic usage
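import requests
url = "http://httpbin.org/get"  # example URL; replace with the page you want to fetch
response = requests.get(url)
print(type(response))          # the Response class
print(response.status_code)    # HTTP status code
print(response.cookies)        # cookies set by the server
print(response.text)           # body decoded to str
print(response.content)        # raw body as bytes
print(response.content.decode("utf-8"))
Note:
Using response.text directly often produces garbled characters, so a common approach is to take response.content, which returns the body as raw bytes, and convert it with decode("utf-8").
The garbling can also be avoided by setting the encoding before reading the text:
response = requests.get(url)
response.encoding = "utf-8"
print(response.text)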
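To see why the garbling happens, the sketch below reproduces it with plain bytes and no network access. It assumes the usual cause: the server sends UTF-8 bytes, but the text is decoded as ISO-8859-1, which (as far as I know) is what Requests falls back to for text responses whose Content-Type header names no charset. The sample string is made up.

```python
# Bytes as a server might send them for a UTF-8 page (sample text is illustrative).
raw = "你好, requests".encode("utf-8")

# Decoding with the wrong charset does not raise; it silently yields mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the right charset recovers the original text.
correct = raw.decode("utf-8")

print(garbled == correct)  # False
print(correct)             # 你好, requests

# ISO-8859-1 maps every byte to one character, so the damage is reversible.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired == correct)  # True
```

With a real response, setting response.encoding = response.apparent_encoding is another option when the page's charset is not known in advance, at the cost of a detection pass over the body.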
2. Requests
(1) Basic GET request
(2) GET request with parameters
Parameters can be written directly in the URL, in the form get?key=val:
response = requests.get("http://httpbin.org/get?name=zhaofan&age=23")
print(response.text)
Parameters can also be passed with the params keyword:
data = {
"name":"zhaofan",
"age":22
}
response = requests.get("http://httpbin.org/get",params=data)
print(response.url)
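print(response.text)
The response body can also be parsed as JSON:
import json
import requests
response = requests.get("http://httpbin.org/get")
print(response.json())            # parsed by Requests
print(json.loads(response.text))  # equivalent, using the json module directly
To find your browser's user agent, enter chrome://version in Google Chrome's address bar, then add that user agent to the request headers: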
import requests
headers = {
"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}
response = requests.get("https://www.zhihu.com",headers=headers)
print(response.text)
Add form data with the data parameter:
import requests
data = {
"name":"zhaofan",
"age":23
}
response = requests.post("http://httpbin.org/post",data=data)
print(response.text)
Many attributes are available on the response object:
import requests
response = requests.get("http://www.baidu.com")
print(response.status_code)
print(response.headers)
print(response.cookies)
print(response.url)
print(response.history)
Checking status codes:
202: accepted
404: not_found
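The mapping from numeric codes to names can be sketched without any network access. The example below uses the standard library's http.HTTPStatus instead of Requests' own lookup table, so the names are only expected to match for common codes like the two listed above. The helper functions describe and is_success are hypothetical names introduced for illustration.

```python
from http import HTTPStatus


def describe(status_code):
    """Return a lowercase name such as 'not_found' for a numeric status code."""
    try:
        return HTTPStatus(status_code).name.lower()
    except ValueError:
        # The code is not one that HTTPStatus knows about.
        return "unknown"


def is_success(status_code):
    """Return True for 2xx codes, the range that signals success."""
    return 200 <= status_code < 300


print(describe(202), is_success(202))  # accepted True
print(describe(404), is_success(404))  # not_found False
```

With a real response, the same check is commonly written as response.status_code == requests.codes.not_found, or delegated to response.raise_for_status(), which raises an exception for 4xx and 5xx responses.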