python - How can I tell what kind of anti-scraping measure a website uses?
Problem description
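URL: https://www.nvshens.com/g/22377/. When I open this site directly in a browser, I can right-click an image and download it. But when my crawler requests the image directly, what comes back is already the blocked version. I changed the headers and set up an IP proxy, and it still doesn't work. Yet judging from the packet capture, the data isn't loaded dynamically either! Any help would be appreciated.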
Answers
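Answer 1: The girl is quite pretty, ha. Right-clicking does open the image, but after a refresh it turns into the hotlink-protection image. Hotlink protection generally works by having the server check the Referer field in the request headers, which is why you no longer get the original image after refreshing (the Referer changes on refresh).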
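To make the idea concrete, here is a minimal sketch of the kind of server-side check described above. It is only an illustration: the real server's rules are not known, and the names `ALLOWED_REFERER_HOSTS` and `is_hotlink`, as well as the choice to treat a missing Referer as a hotlink, are assumptions.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; the real server's rules are not known.
ALLOWED_REFERER_HOSTS = {"www.nvshens.com"}


def is_hotlink(referer):
    """Return True if the request looks like a hotlink (missing or foreign Referer)."""
    if not referer:
        # Assumption: a request without any Referer is treated as a hotlink.
        return True
    host = urlparse(referer).hostname
    return host not in ALLOWED_REFERER_HOSTS


# A request that names the gallery page as its Referer passes the check.
print(is_hotlink("https://www.nvshens.com/g/22377/"))  # False
# A request with no Referer, or one from another site, does not.
print(is_hotlink(None))                                # True
print(is_hotlink("https://example.com/page"))          # True
```

If the server behaves roughly like this, a crawler only needs to send a Referer that the check accepts, which is what the snippet in this answer does.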
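import requests

img_url = 'https://t1.onvshen.com:85/gallery/21501/22377/s/003.jpg'
# Send the gallery page as the Referer so the server returns the original image
r = requests.get(img_url, headers={'Referer': 'https://www.nvshens.com/g/22377/'}).content
with open('00.jpg', 'wb') as f:
    f.write(r)

Answer 2: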
When fetching the image, capture the traffic and check whether you left out any parameters.
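One way to do that comparison is to copy the request headers from the browser's capture and diff them against what the script sends. The sketch below is only an illustration: `missing_headers` is a hypothetical helper, and the header values are placeholders to be replaced with the ones from your own capture.

```python
def missing_headers(browser_headers, script_headers):
    """Return header names the browser sent but the script did not.

    Header names are compared case-insensitively.
    """
    sent = {name.lower() for name in script_headers}
    return sorted(name for name in browser_headers if name.lower() not in sent)


# Placeholder values; paste the real ones from your own packet capture.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (placeholder)",
    "Accept": "image/webp,*/*",
    "Referer": "https://www.nvshens.com/g/22377/",
}
script_headers = {"user-agent": "Mozilla/5.0 (placeholder)"}

print(missing_headers(browser_headers, script_headers))  # ['Accept', 'Referer']
```

Any name in the result is a candidate for the parameter the server is checking.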
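Answer 3: I was so caught up in the site's content that I almost forgot the actual question. You can set up all of your request information according to

and then try again.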
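Answer 4: Given how this site is designed, the Referer should be a separate page for each image, which better mimics how a person browses, rather than one single Referer for everything. Below is complete, runnable code that grabs all the images from the 18 pages.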
# Putting it all together
import os
import shutil

import requests
from lxml import etree  # used to extract content with XPath

headers = {
    'Connection': 'close',  # one way to cover tracks
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2900.1 Iron Safari/537.36',
}

def url_guess_src_large(u):
    # Drop the '/s/' path segment and wrap the result in the img.html viewer URL
    return 'https://www.nvshens.com/img.html?img=' + '/'.join(u.split('/s/'))

# Download function
def get_img_using_requests(url, fn):
    # Use a per-image Referer instead of the single gallery URL
    # 'https://www.nvshens.com/g/22377/'
    img_headers = dict(headers, Referer=url_guess_src_large(url))
    print(img_headers)
    response = requests.get(url, headers=img_headers, stream=True)
    with open(fn, 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)
    del response

url_ = 'https://www.nvshens.com/g/22377/{p}.html'
for i in range(1, 18 + 1):
    url = url_.format(p=i)
    html = requests.get(url, headers=headers).content.decode('utf-8')
    selector = etree.HTML(html)
    xpath = '//*[@id="hgallery"]/img/@src'
    content = [x for x in selector.xpath(xpath)]
    urls_2get = [url_guess_src_large(x) for x in content]
    filenames = [os.path.split(x)[0].split('/gallery/')[1].replace('/', '_') + '_' + os.path.split(x)[1] for x in urls_2get]
    for j, x in enumerate(content):
        get_img_using_requests(x, filenames[j])
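To see what the two string transformations in the code above produce without touching the network, here is a small offline sketch. It applies them to the image URL quoted in Answer 1. `filename_for` is a hypothetical helper name that simply wraps the answer's filename expression; whether the site's pages really yield URLs of this shape is an assumption.

```python
import os


def url_guess_src_large(u):
    # Same transformation as in the answer: drop the '/s/' path segment
    # and wrap the result in the img.html viewer URL.
    return 'https://www.nvshens.com/img.html?img=' + '/'.join(u.split('/s/'))


def filename_for(viewer_url):
    # Same expression as in the answer, pulled into a helper (hypothetical name):
    # gallery path with '/' replaced by '_', followed by the image file name.
    head, tail = os.path.split(viewer_url)
    return head.split('/gallery/')[1].replace('/', '_') + '_' + tail


src = 'https://t1.onvshen.com:85/gallery/21501/22377/s/003.jpg'
referer = url_guess_src_large(src)

print(referer)
# https://www.nvshens.com/img.html?img=https://t1.onvshen.com:85/gallery/21501/22377/003.jpg
print(filename_for(referer))
# 21501_22377_003.jpg
```

Each image therefore gets its own Referer (the viewer page for that image) and a file name that encodes the gallery path.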