V. Requests with parameters
1. Plain parameters
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get("http://httpbin.org/get", params=payload)
>>> r.url
u'http://httpbin.org/get?key2=value2&key1=value1'
>>> r.text
u'{\n "url": "http://httpbin.org/get?key2=value2&key1=value1",\n }'
2. Parameters containing Chinese characters
>>> payload = {'key1': 'value1', 'key2': u'中文'}
>>> r = requests.get("http://httpbin.org/get", params=payload)
>>> r.url
u'http://httpbin.org/get?key2=%E4%B8%AD%E6%96%87&key1=value1'
>>> r.text
u'{\n "url": "http://httpbin.org/get?key2=%E4%B8%AD%E6%96%87&key1=value1",\n }'
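To check how requests will encode a query string without sending anything, one option is to prepare the request locally. A minimal sketch, assuming Python 3.7 or later (where dicts keep insertion order, so the parameters appear in the order given, unlike the Python 2 output above):

```python
import requests
from urllib.parse import unquote

payload = {'key1': 'value1', 'key2': '中文'}

# Build the request locally; prepare() encodes the URL but sends nothing.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()

print(prepared.url)           # non-ASCII values are UTF-8 percent-encoded
print(unquote(prepared.url))  # decode the URL back to readable text
```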
3. JSON body
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> headers = {'content-type': 'application/json'}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)
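Releases of requests from 2.4.2 onward also accept a json= argument that serializes the payload and sets the Content-Type header for you. A sketch that prepares the request locally rather than posting to the placeholder URL above:

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# json= replaces the manual json.dumps() call and the explicit header.
prepared = requests.Request('POST', url, json=payload).prepare()

print(prepared.headers['Content-Type'])  # set automatically by requests
print(json.loads(prepared.body))         # the body round-trips to the payload
```

To actually send it, the equivalent call would be requests.post(url, json=payload).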
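VI. Working with files
1. Downloading a file
import requests
from io import BytesIO
from PIL import Image

r = requests.get('http://img4.cache.netease.com/travel/2013/4/7/2013040718512699794.jpg')
# r.content holds the raw bytes, so wrap it in BytesIO rather than StringIO
i = Image.open(BytesIO(r.content))
i.save('1.jpg')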
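Reading r.content holds the whole file in memory, which is fine for a small image. For larger downloads, a common alternative is to stream the body to disk in chunks. A sketch; the helper names write_chunks and download are made up for illustration, and using the response in a with statement assumes requests 2.18 or later:

```python
import requests

def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the byte count."""
    total = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                total += len(chunk)
    return total

def download(url, path, chunk_size=8192):
    """Stream url to path without holding the whole body in memory."""
    with requests.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        return write_chunks(r.iter_content(chunk_size=chunk_size), path)
```

A call such as download(image_url, '1.jpg') would then save the file, where image_url stands for the address used above.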
2. Uploading a file
>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
>>> r.text
3. Specifying the filename when uploading
>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'))}
>>> r = requests.post(url, files=files)
>>> r.text
4. Sending a string as a file
>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
>>> r = requests.post(url, files=files)
>>> r.text
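Each tuple in files may also carry a content type as a third element. The sketch below prepares the multipart request locally (nothing is sent) so the generated header and body can be inspected; the text/csv content type is an illustrative choice, not something the original example specifies:

```python
import requests

files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n', 'text/csv')}
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

# requests generates the multipart boundary and the Content-Type header itself.
print(prepared.headers['Content-Type'])
# The body is bytes; the part carries the filename and content type given above.
print(prepared.body.decode('utf-8'))
```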
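5. Streaming upload: a large file can be sent without loading it all into memory
# open in binary mode so the bytes are sent unchanged
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)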
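A generator can be passed as data in the same way; because the total length is not known in advance, requests then marks the body for chunked transfer encoding. A sketch, where read_in_chunks is a hypothetical helper and the request is only prepared locally, not sent:

```python
import io
import requests

def read_in_chunks(fileobj, chunk_size=8192):
    """Yield successive chunks from a binary file object until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

# BytesIO stands in for open('massive-body', 'rb') so the sketch runs anywhere.
body = io.BytesIO(b'x' * 20000)
prepared = requests.Request('POST', 'http://some.url/streamed',
                            data=read_in_chunks(body)).prepare()
print(prepared.headers.get('Transfer-Encoding'))
```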
6. Streaming download
r = requests.post('https://stream.twitter.com/1/statuses/filter.json',
                  data={'track': 'requests'}, auth=('username', 'password'),
                  stream=True)
for line in r.iter_lines():
    if line:  # filter out keep-alive new lines
        print(json.loads(line))
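The loop above can be wrapped in a small generator so that the parsing logic is testable without a live stream. A sketch; parse_json_lines is a hypothetical helper, and it accepts any iterable of byte lines such as the one returned by r.iter_lines():

```python
import json

def parse_json_lines(lines):
    """Yield one decoded JSON object per non-empty line."""
    for line in lines:
        if not line:  # skip keep-alive blank lines
            continue
        yield json.loads(line.decode('utf-8'))

# Simulated stream: two JSON documents separated by a keep-alive line.
sample = [b'{"id": 1}', b'', b'{"id": 2}']
for obj in parse_json_lines(sample):
    print(obj['id'])
```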
VII. Proxy settings
1. The proxies parameter
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
requests.get("http://example.org", proxies=proxies)
>>> r = requests.get('http://ifconfig.me/ip')
>>> r.text
u'116.226.xx.xxx\n'
>>> proxies = {
... "http": "http://175.136.xxx.xx",
... }
>>> r = requests.get("http://ifconfig.me/ip", proxies=proxies)
>>> r.text
u'175.136.xxx.xx\n'
2. Environment variables
$ export HTTP_PROXY="http://10.10.1.10:3128"
$ export HTTPS_PROXY="http://10.10.1.10:1080"
$ python
>>> import requests
>>> requests.get("http://example.org")
3. If the proxy uses HTTP Basic Auth, put the credentials in the proxy URL:
proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}
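If the username or password contains characters such as @ or :, they need to be percent-encoded before being placed in the proxy URL. A sketch using only the standard library; build_proxy_url is a hypothetical helper and the credentials are placeholders:

```python
from urllib.parse import quote

def build_proxy_url(user, password, host, port, scheme='http'):
    """Return a proxy URL with percent-encoded Basic Auth credentials."""
    return '{}://{}:{}@{}:{}/'.format(
        scheme, quote(user, safe=''), quote(password, safe=''), host, port)

proxies = {
    'http': build_proxy_url('user', 'p@ss:word', '10.10.1.10', 3128),
}
print(proxies['http'])  # http://user:p%40ss%3Aword@10.10.1.10:3128/
```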
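VIII. Session
To keep state (such as cookies) across multiple requests, use the Session object provided by requests.
>>> s = requests.Session()
>>> # set a cookie first so that there is state to carry over
>>> r = s.get("http://httpbin.org/cookies/set/sessioncookie/123456789")
>>> r = s.get("http://httpbin.org/cookies")
>>> r.text
u'{\n "cookies": {\n "sessioncookie": "123456789"\n }\n}'
In addition, a Session supports keep-alive: requests made within the same session automatically reuse connections. Note that a connection is returned to the pool for reuse only after the entire response body has been read, so take care when streaming responses.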
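Besides cookies set by the server, a Session also carries default headers and any cookies added by hand. The sketch below inspects what a session would send by preparing a request locally, so it runs without network access; the x-test header is an arbitrary example:

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})         # sent with every request made through s
s.cookies.set('sessioncookie', '123456789')  # stored in the session's cookie jar

# prepare_request() merges the session state into the outgoing request.
prepared = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(prepared.headers['x-test'])
print(prepared.headers['Cookie'])

s.close()  # release pooled connections once the session is no longer needed
```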
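IX. Exceptions
1. A network problem (DNS failure, refused connection, and so on) raises ConnectionError.
2. A rare invalid HTTP response raises HTTPError. Ordinary error statuses such as 404 are parsed normally and do not raise on their own.
3. A request that times out raises Timeout.
4. Exceeding the maximum number of redirects (301 and the like) raises TooManyRedirects.
5. All exceptions that requests raises explicitly inherit from requests.exceptions.RequestException.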
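The hierarchy above can be handled with a single except clause plus a classifier. A sketch; classify and fetch are hypothetical helpers, and calling raise_for_status() to turn 4xx/5xx responses into HTTPError is a choice made here, not something requests does by default:

```python
import requests
from requests import exceptions as rex

def classify(exc):
    """Map a requests exception to a short label."""
    # ConnectTimeout inherits from both ConnectionError and Timeout,
    # so Timeout is tested first.
    if isinstance(exc, rex.Timeout):
        return 'timeout'
    if isinstance(exc, rex.TooManyRedirects):
        return 'too many redirects'
    if isinstance(exc, rex.HTTPError):
        return 'http error'
    if isinstance(exc, rex.ConnectionError):
        return 'connection error'
    return 'other request error'

def fetch(url, timeout=5):
    """Return the response text, or None after printing the failure type."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # raise HTTPError for 4xx/5xx status codes
        return r.text
    except rex.RequestException as exc:
        print('{}: {}'.format(classify(exc), exc))
        return None
```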