

How to use the wget command

#1 · Posted 2010-07-02 13:26
As the title says.

For example, I want to download the files under

http://mirrors.kernel.org/opensuse/distribution/11.2/iso/

How do I get them?

#2 · Posted 2010-07-02 13:53

#3 · Posted 2010-07-02 14:07
Reply to #1 wazhl:
wget usage tips

2007-10-14, Toy, posted in Tips

wget is a command-line download tool. For those of us who use Linux, it is something we use almost every day. Here are a few handy wget tips that can make your use of wget more efficient and flexible.

    * $ wget -r -np -nd http://example.com/packages/

This command downloads every file in the packages directory on http://example.com. Here, -np tells wget not to ascend into the parent directory, and -nd tells it not to recreate the remote directory structure locally.

    * $ wget -r -np -nd --accept=iso http://example.com/centos-5/i386/

Similar to the previous command, but with an extra --accept=iso option, which tells wget to download only the files under the i386 directory whose extension is iso. You can also specify several extensions at once, separated by commas.
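For instance, a small sketch of accepting more than one extension (the host, path, and extension list are placeholders, not taken from this thread):

    # download both .iso and .md5 files from the directory
    $ wget -r -np -nd --accept=iso,md5 http://example.com/centos-5/i386/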

    * $ wget -i filename.txt

This command is commonly used for batch downloads: put the addresses of all the files you want into filename.txt, and wget will download every one of them for you.
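A minimal sketch of such a list file (the file name and URLs are placeholders):

    # filename.txt holds one URL per line, for example:
    #   http://example.com/a.iso
    #   http://example.com/b.iso
    $ wget -i filename.txt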

    * $ wget -c http://example.com/really-big-file.iso

The -c option used here resumes a partially downloaded file instead of starting over.
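On an unreliable connection, -c is often combined with wget's retry and timeout options; a sketch, with the URL and the particular values purely illustrative:

    # -t 0: retry indefinitely; -T 30: give up on a stalled connection after 30 seconds
    $ wget -c -t 0 -T 30 http://example.com/really-big-file.iso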

    * $ wget -m -k (-H) http://www.example.com/

This command mirrors a website, and wget converts the links so the local copy is browsable. If the site's images live on a different host, add the -H option.
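Since -H by itself lets wget follow links to any host, it is usually wise to bound the crawl with wget's -D/--domains option; a sketch with placeholder domains:

    # mirror the site, but only span to the listed domains
    $ wget -m -k -H -D example.com,images.example.com http://www.example.com/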
一路征程一路笑 (this user has been deleted)
#4 · Posted 2010-07-02 14:35
Note: the author has been banned or deleted; the content was automatically hidden.

#5 · Posted 2010-07-02 15:31
Dude, don't you know how to use man?!

#6 · Posted 2010-07-02 21:58
The poster at #3 is impressive... all I know is wget url.

#7 · Posted 2010-07-03 17:54
Last edited by expresss on 2010-07-05 08:09

Why can't I manage to download the files from the URL the OP posted?
wget -r -np -nd --accept=md5 http://mirrors.kernel.org/opensuse/distribution/11.2/iso/
This does not actually fetch the md5 files; all I end up with is a robots.txt file. Switching to other extensions makes no difference.
Is there something wrong with these options? I have tried several times and it fails every time; it never downloads files of the specified type.

#8 · Posted 2010-07-05 11:58
Reply to #7 expresss:
9.1 Robot Exclusion

It is extremely easy to make Wget wander aimlessly around a web site, sucking all the available data in progress. ‘wget -r site’, and you're set. Great? Not for the server admin.

As long as Wget is only retrieving static pages, and doing it at a reasonable rate (see the ‘--wait’ option), there's not much of a problem. The trouble is that Wget can't tell the difference between the smallest static page and the most demanding CGI. A site I know has a section handled by a CGI Perl script that converts Info files to html on the fly. The script is slow, but works well enough for human users viewing an occasional Info file. However, when someone's recursive Wget download stumbles upon the index page that links to all the Info files through the script, the system is brought to its knees without providing anything useful to the user (This task of converting Info files could be done locally and access to Info documentation for all installed GNU software on a system is available from the info command).

To avoid this kind of accident, as well as to preserve privacy for documents that need to be protected from well-behaved robots, the concept of robot exclusion was invented. The idea is that the server administrators and document authors can specify which portions of the site they wish to protect from robots and those they will permit access.

The most popular mechanism, and the de facto standard supported by all the major robots, is the “Robots Exclusion Standard” (RES) written by Martijn Koster et al. in 1994. It specifies the format of a text file containing directives that instruct the robots which URL paths to avoid. To be found by the robots, the specifications must be placed in /robots.txt in the server root, which the robots are expected to download and parse.

Although Wget is not a web robot in the strictest sense of the word, it can download large parts of the site without the user's intervention to download an individual page. Because of that, Wget honors RES when downloading recursively. For instance, when you issue:

     wget -r http://www.server.com/

First the index of ‘www.server.com’ will be downloaded. If Wget finds that it wants to download more documents from that server, it will request ‘http://www.server.com/robots.txt’ and, if found, use it for further downloads. robots.txt is loaded only once per each server.

Until version 1.8, Wget supported the first version of the standard, written by Martijn Koster in 1994 and available at http://www.robotstxt.org/wc/norobots.html. As of version 1.8, Wget has supported the additional directives specified in the internet draft ‘<draft-koster-robots-00.txt>’ titled “A Method for Web Robots Control”. The draft, which has as far as I know never made to an rfc, is available at http://www.robotstxt.org/wc/norobots-rfc.txt.

This manual no longer includes the text of the Robot Exclusion Standard.

The second, less known mechanism, enables the author of an individual document to specify whether they want the links from the file to be followed by a robot. This is achieved using the META tag, like this:

     <meta name="robots" content="nofollow">

This is explained in some detail at http://www.robotstxt.org/wc/meta-user.html. Wget supports this method of robot exclusion in addition to the usual /robots.txt exclusion.

If you know what you are doing and really really wish to turn off the robot exclusion, set the robots variable to ‘off’ in your .wgetrc. You can achieve the same effect from the command line using the -e switch, e.g. ‘wget -e robots=off url...’.
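To make that concrete, a minimal sketch of the two equivalent ways described above to switch robot exclusion off (the URL is a placeholder):

    # one-off, on the command line:
    $ wget -e robots=off -r -np http://example.com/some/dir/

    # or permanently, by adding this line to ~/.wgetrc:
    #   robots = off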

#9 · Posted 2010-07-05 12:01
Last edited by gamester88 on 2010-07-05 12:02

Reply to #7 expresss:

It is because of the robots.txt file that the options above have no effect, so:
    [gamester88@gamester88 iso]$ mkdir iso
    [gamester88@gamester88 iso]$ cd iso
    [gamester88@gamester88 iso]$ ls
    [gamester88@gamester88 iso]$ wget -e robots=off -r -np -nd --accept=md5 http://mirrors.kernel.org/opensuse/distribution/11.2/iso/
    [gamester88@gamester88 iso]$ ls
    openSUSE-11.2-Addon-Lang-i586.iso.md5
    openSUSE-11.2-DVD-x86_64.iso.md5
    openSUSE-11.2-KDE4-LiveCD-x86_64.iso.md5
    openSUSE-11.2-Addon-Lang-x86_64.iso.md5
    openSUSE-11.2-GNOME-LiveCD-i686.iso.md5
    openSUSE-11.2-NET-i586.iso.md5
    openSUSE-11.2-Addon-NonOss-BiArch-i586-x86_64.iso.md5
    openSUSE-11.2-GNOME-LiveCD-x86_64.iso.md5
    openSUSE-11.2-NET-x86_64.iso.md5
    openSUSE-11.2-DVD-i586.iso.md5
    openSUSE-11.2-KDE4-LiveCD-i686.iso.md5

#10 · Posted 2010-07-06 21:22
Last edited by expresss on 2010-07-07 09:16

Reply to #9 gamester88:

Thank you for the very helpful reply, I really appreciate it. It looks like to learn Linux properly you really do have to get your English up to scratch, heh.
Thanks again for the patient explanation. I think I understand now: because robots.txt contains disallow:/, crawling the whole directory tree is not allowed; with -e robots=off, wget ignores the robots rules, in other words it bypasses the restrictions in robots.txt.
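For reference, a "deny everything" robots.txt generally looks like the snippet below, and wget itself can be used to peek at a server's copy; the exact contents of this mirror's file are an assumption here, not quoted from it:

    # print the server's robots.txt to stdout
    $ wget -q -O - http://mirrors.kernel.org/robots.txt

    # a robots.txt that disallows everything typically reads:
    #   User-agent: *
    #   Disallow: /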