

Chinaunix


[Hadoop&HBase] Large-Scale Data Sorting Algorithm Based on Hadoop --- 韓旭紅's Group, First Report

Posted on 2011-12-23 02:39
<DIV>
<P style="TEXT-ALIGN: center" align=center><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt"><FONT face=宋體>Large-Scale Data Sorting Algorithm Based on <SPAN lang=EN-US>Hadoop</SPAN></FONT></SPAN></B></P>
<P style="TEXT-ALIGN: center" align=center><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt"><FONT size=5 face=宋體>(First Report)</FONT></SPAN></B></P>
<P><FONT face=宋體><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt" lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</SPAN></SPAN></B><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 15pt" lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><FONT size=4>-------2011.9.11</FONT></SPAN></B></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp; </SPAN></SPAN>Group members:</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN>Group leader: 韓旭紅<SPAN lang=EN-US> 1091000161</SPAN></FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;</SPAN></SPAN>Members: 李巍<SPAN lang=EN-US> 1091000167&nbsp;&nbsp; </SPAN></FONT></FONT><FONT size=3><FONT face=宋體>李越<SPAN lang=EN-US> 1091000169</SPAN></FONT></FONT><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><SPAN style="mso-spacerun: yes">&nbsp;</SPAN></SPAN>閆悅<SPAN lang=EN-US> 1091000178</SPAN></FONT></FONT></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圓" lang=EN-US><SPAN style="mso-list: Ignore">I.<SPAN style="FONT: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN></SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">Introduction</SPAN></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-no-proof: yes" lang=EN-US><SPAN style="FONT-SIZE: 22pt"><FONT face=宋體><a href="http://blog.chinaunix.nethttp://blog.chinaunix.net/attachment/201109/12/24677087_1315835170vdlC.gif" target="_blank"><IMG style="WIDTH: 273px; HEIGHT: 156px" border=0 src="http://blog.chinaunix.nethttp://blog.chinaunix.net/attachment/201109/12/24677087_1315835170vdlC.gif" width=373 .load="imgResize(this, 650);" height=207 ;></A></FONT></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">Figure</SPAN><FONT face=Cambria> <SPAN lang=EN-US><SPAN style="mso-no-proof: yes">1</SPAN></SPAN><SPAN lang=EN-US> Hadoop</SPAN></FONT></FONT></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><FONT face=Cambria><SPAN lang=EN-US>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></FONT></FONT><FONT face=宋體><SPAN style="FONT-SIZE: 10.5pt" lang=EN-US>Hadoop is a <A href="http://baike.baidu.com/view/991489.htm" target=_blank><SPAN style="COLOR: windowtext; FONT-SIZE: 12pt; TEXT-DECORATION: none; text-underline: none" lang=EN-US>distributed system</SPAN></A> infrastructure developed by the Apache Foundation. It lets users write distributed programs without knowing the low-level details of distribution, and makes full use of a cluster's power for high-speed computation and storage. Hadoop implements a <A href="http://baike.baidu.com/view/771589.htm" target=_blank><SPAN style="COLOR: windowtext; FONT-SIZE: 12pt; TEXT-DECORATION: none; text-underline: none" lang=EN-US>distributed file system</SPAN></A>, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It also provides high-throughput access to <A href="http://baike.baidu.com/view/330120.htm" target=_blank><SPAN style="COLOR: windowtext; FONT-SIZE: 12pt; TEXT-DECORATION: none; text-underline: none" lang=EN-US>application</SPAN></A> data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream.</SPAN></FONT></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圓" lang=EN-US><SPAN style="mso-list: Ignore">II.<SPAN style="FONT: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN></SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold" lang=EN-US>Hadoop</SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">&nbsp;Architecture<SPAN lang=EN-US></SPAN></SPAN></P>
<P style="PAGE-BREAK-AFTER: avoid; TEXT-ALIGN: left; TEXT-INDENT: 0cm; MARGIN: 0cm 0cm 0pt 53.25pt; mso-pagination: widow-orphan; mso-char-indent-count: 0" class=MsoListParagraph align=left><SPAN style="mso-font-kerning: 0pt; mso-no-proof: yes" lang=EN-US><SPAN style="FONT-SIZE: 22pt"><FONT face=宋體>&nbsp;&nbsp;&nbsp; <a href="http://blog.chinaunix.nethttp://blog.chinaunix.net/attachment/201109/12/24677087_1315835558fFHw.jpg" target="_blank"><IMG border=0 src="http://blog.chinaunix.nethttp://blog.chinaunix.net/attachment/201109/12/24677087_1315835558fFHw.jpg" .load="imgResize(this, 650);" ;></A></FONT></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">Figure</SPAN><FONT face=Cambria> <SPAN lang=EN-US><SPAN style="mso-no-proof: yes">2</SPAN></SPAN><SPAN lang=EN-US> Hadoop</SPAN></FONT><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">&nbsp;architecture</SPAN><SPAN style="FONT-FAMILY: 宋體; FONT-SIZE: 12pt; mso-bidi-font-family: 宋體; mso-font-kerning: 0pt" lang=EN-US></SPAN></FONT></P>
<P style="MARGIN-LEFT: 53.25pt"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold" lang=EN-US>&nbsp;</SPAN></P>
<P style="TEXT-ALIGN: left; LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white; mso-pagination: widow-orphan" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋體; LETTER-SPACING: 0.4pt; FONT-SIZE: 12pt; mso-bidi-font-family: 宋體; mso-font-kerning: 0pt">  <SPAN lang=EN-US><SPAN style="FONT-FAMILY: 宋體; FONT-SIZE: 12pt; mso-bidi-font-family: 宋體; mso-font-kerning: 0pt; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"> </SPAN></SPAN></SPAN><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes"><SPAN style="FONT-FAMILY: 宋體; FONT-SIZE: 12pt; mso-bidi-font-family: 宋體; mso-font-kerning: 0pt" lang=EN-US>Hadoop </SPAN><SPAN style="FONT-FAMILY: 宋體; FONT-SIZE: 12pt; mso-bidi-font-family: 宋體; mso-font-kerning: 0pt">is made up of many components. At the bottom is <SPAN lang=EN-US>HDFS</SPAN>, which stores files across all the storage nodes of the<SPAN lang=EN-US> Hadoop </SPAN>cluster. The layer above <SPAN lang=EN-US>HDFS</SPAN> is the<SPAN lang=EN-US> MapReduce </SPAN>engine, which consists of<SPAN lang=EN-US> JobTrackers </SPAN>and<SPAN lang=EN-US> TaskTrackers</SPAN>.</SPAN><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US></SPAN></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圓" lang=EN-US><SPAN style="mso-list: Ignore">III.<SPAN style="FONT: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN></SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">Distributed Computing Model<SPAN lang=EN-US></SPAN></SPAN></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"></SPAN></SPAN></FONT></FONT>&nbsp;</P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes"></SPAN></SPAN>A <SPAN lang=EN-US>Hadoop</SPAN> cluster typically consists of tens, hundreds, or even thousands of <SPAN lang=EN-US>low-cost</SPAN> computers. Every job we run has to be split into tasks that are distributed to these machines and executed, with the intermediate data sorted and the results finally aggregated. Along the way the framework also handles node discovery, task retries, replacement of failed nodes, and other maintenance and error handling.</FONT></FONT></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3><FONT face=宋體>In that sense <SPAN lang=EN-US>Hadoop</SPAN> is a computing model: a distributed computing model.<STRONG><SPAN style="FONT-FAMILY: 宋體; FONT-WEIGHT: normal; mso-bidi-font-family: 宋體" lang=EN-US></SPAN></STRONG></FONT></FONT></P>
<P><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">IV. Large-Scale Data Sorting Algorithm on <SPAN lang=EN-US>Hadoop</SPAN><SPAN lang=EN-US></SPAN></SPAN></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋體>The most intuitive way to sort a large amount of data with <SPAN lang=EN-US>Hadoop</SPAN> is to feed the whole file to <SPAN lang=EN-US>map</SPAN>, have <SPAN lang=EN-US>map</SPAN> pass every record through unchanged to a single <SPAN lang=EN-US>reduce</SPAN>, let <SPAN lang=EN-US>Hadoop</SPAN>'s own <SPAN lang=EN-US>shuffle</SPAN> mechanism sort all the data, and then have <SPAN lang=EN-US>reduce</SPAN> write it out directly.</FONT></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋體>However, with a single reduce this is no different from sorting on one machine and gains nothing from distributed computing across many machines, so this approach is not workable.</FONT></P>
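<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋體>Given <SPAN lang=EN-US>Hadoop</SPAN>'s divide-and-conquer computing model, we can borrow the idea behind quicksort. As a brief reminder, the basic step of quicksort is to first pick one element from the data as the pivot, then put everything greater than the pivot on one side and everything smaller on the other.</FONT></P>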
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋體>Now suppose we have <SPAN lang=EN-US>N</SPAN> pivots (we will call them markers here). They divide all the data into <SPAN lang=EN-US>N+1</SPAN> <SPAN lang=EN-US>parts</SPAN>. We hand these <SPAN lang=EN-US>N+1</SPAN> <SPAN lang=EN-US>parts</SPAN> to <SPAN lang=EN-US>reduce</SPAN>, let <SPAN lang=EN-US>Hadoop</SPAN> sort each one automatically, and get <SPAN lang=EN-US>N+1</SPAN> output files that are each sorted internally. Concatenating these <SPAN lang=EN-US>N+1</SPAN> files end to end gives one fully sorted file, and the job is done.</FONT></P>
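<P>The split into N+1 parts can be sketched in a few lines. This is a minimal single-machine illustration, not Hadoop code; the names <code>part_index</code> and <code>split_into_parts</code> and the sample values are assumptions made up for the example. A record equal to a marker is placed in the part to the right of that marker.</P>

```python
from bisect import bisect_right


def part_index(markers, record):
    """Return which of the len(markers) + 1 parts a record falls into.

    `markers` must be sorted in ascending order. The result is the number
    of markers that are <= record, so part 0 holds the smallest records.
    """
    return bisect_right(markers, record)


def split_into_parts(records, markers):
    """Distribute records into len(markers) + 1 unsorted parts."""
    parts = [[] for _ in range(len(markers) + 1)]
    for record in records:
        parts[part_index(markers, record)].append(record)
    return parts


# Hypothetical data: N = 2 markers give N + 1 = 3 parts.
markers = [10, 20]
records = [25, 3, 10, 17, 42, 8]
parts = split_into_parts(records, markers)

# Sorting each part and concatenating them yields the fully sorted data.
merged = [r for part in parts for r in sorted(part)]
```

<P>Because every record in part i is smaller than every record in part i+1, the parts can be sorted independently and simply concatenated.</P>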
<P><FONT size=3 face=宋體>From this we can summarize the steps for sorting a large amount of data with <SPAN lang=EN-US>Hadoop</SPAN>:</FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>1</SPAN>)<SPAN lang=EN-US>&nbsp; </SPAN>Sample the data to be sorted;</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>2</SPAN>)<SPAN lang=EN-US>&nbsp; </SPAN>Sort the sample and derive the markers (split points) from it;</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>3</SPAN>)<SPAN lang=EN-US>&nbsp; Map</SPAN> determines, for each input record, which two markers it falls between, and sends the record to the <SPAN lang=EN-US>reduce</SPAN> whose <SPAN lang=EN-US>ID</SPAN> matches that interval;</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>4</SPAN>)<SPAN lang=EN-US>&nbsp; Reduce</SPAN> writes out the data it receives as is.</FONT></FONT></P>
<P><FONT size=3 face=宋體>Sorting a set of <SPAN lang=EN-US>URLs</SPAN> serves as the example here:</FONT></P>
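<P>The four steps above can be simulated on one machine as follows. This is a sketch under stated assumptions, not the group's implementation: the function names, the default number of reduces, the sample size, and the fixed random seed are all hypothetical, and the per-reduce <code>sorted()</code> call merely stands in for the sort that Hadoop's shuffle performs.</P>

```python
import random
from bisect import bisect_right


def choose_markers(records, num_reducers, sample_size, seed=0):
    """Steps 1 and 2: sample the data, sort the sample, pick the markers."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    if not sample:
        return []
    # Take num_reducers - 1 evenly spaced elements of the sorted sample.
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]


def map_phase(records, markers):
    """Step 3: tag each record with the ID of the interval it falls into."""
    for record in records:
        yield bisect_right(markers, record), record


def reduce_phase(pairs, num_reducers):
    """Step 4: each reduce outputs what it received, in sorted order."""
    buckets = [[] for _ in range(num_reducers)]
    for reducer_id, record in pairs:
        buckets[reducer_id].append(record)
    # sorted() stands in for the sort done by the shuffle.
    return [sorted(bucket) for bucket in buckets]


def distributed_sort(records, num_reducers=4, sample_size=100):
    """Run all four steps and concatenate the reduce outputs in ID order."""
    markers = choose_markers(records, num_reducers, sample_size)
    outputs = reduce_phase(map_phase(records, markers), num_reducers)
    return [record for output in outputs for record in output]


# Hypothetical usage with a handful of URLs.
urls = [
    "http://www.chinaunix.net",
    "http://hadoop.apache.org",
    "http://baike.baidu.com",
]
print(distributed_sort(urls, num_reducers=2, sample_size=3))
```

<P>Only the sample is sorted up front; the full data set is touched once by map and once by reduce, which is what lets the work spread over many machines.</P>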
<P style="PAGE-BREAK-AFTER: avoid"><SPAN lang=EN-US><FONT size=3 face=宋體>&nbsp;</FONT></P>
<P style="PAGE-BREAK-AFTER: avoid" align=center><SPAN style="COLOR: blue; TEXT-DECORATION: none; mso-no-proof: yes; text-underline: none"><SPAN style="FONT-SIZE: 22pt"><FONT face=宋體><a href="http://blog.chinaunix.nethttp://blog.chinaunix.net/attachment/201109/12/24677087_1315835600685U.jpg" target="_blank"><IMG border=0 src="http://blog.chinaunix.nethttp://blog.chinaunix.net/attachment/201109/12/24677087_1315835600685U.jpg" .load="imgResize(this, 650);" ;></A></FONT></SPAN></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">Figure</SPAN><FONT face=Cambria> <SPAN lang=EN-US><SPAN style="mso-no-proof: yes">3</SPAN></SPAN><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp; </SPAN><SPAN style="mso-spacerun: yes">&nbsp;</SPAN>URL</SPAN></FONT><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">&nbsp;sorting</SPAN></FONT></P>
<P><FONT size=3 face=宋體>One small problem remains: how do we send a record to the <SPAN lang=EN-US>reduce</SPAN> with a specific <SPAN lang=EN-US>ID</SPAN>? <SPAN lang=EN-US>Hadoop</SPAN> provides several partitioning algorithms. They use the <SPAN lang=EN-US>key</SPAN> of each <SPAN lang=EN-US>map</SPAN> output record to decide which <SPAN lang=EN-US>reduce</SPAN> the record goes to (the sorting on the <SPAN lang=EN-US>reduce</SPAN> side also depends on the <SPAN lang=EN-US>key</SPAN>). So to route a record to a particular <SPAN lang=EN-US>reduce</SPAN>, it is enough to emit it with a suitable <SPAN lang=EN-US>key</SPAN> (in the example above, the <SPAN lang=EN-US>reduce ID</SPAN> followed by the <SPAN lang=EN-US>URL</SPAN>), and the record ends up where it belongs.</FONT></P>
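<P>A rough illustration of routing by key, written as plain Python rather than against the Hadoop API. The key layout (zero-padded ID, a tab, then the URL), the function names, and the marker values are assumptions chosen for the example; in Hadoop's Java API the corresponding hook is a <code>Partitioner</code> whose <code>getPartition</code> method receives the key, the value, and the number of partitions.</P>

```python
from bisect import bisect_right


def make_key(url, markers):
    """Build a map output key of the form '<reduce ID>\\t<url>'.

    The ID is zero-padded so that it can be parsed back unambiguously.
    """
    reducer_id = bisect_right(markers, url)
    return f"{reducer_id:04d}\t{url}"


def partition(key, num_reducers):
    """Pick the reduce for a key by reading the ID prefix back out."""
    reducer_id = int(key.split("\t", 1)[0])
    return reducer_id % num_reducers


# Hypothetical markers that split the URL space into three intervals.
markers = ["http://g", "http://n"]
keys = [
    make_key(u, markers)
    for u in (
        "http://baidu.com",
        "http://hadoop.apache.org",
        "http://www.chinaunix.net",
    )
]
targets = [partition(k, num_reducers=3) for k in keys]
```

<P>All keys that reach one reduce share the same ID prefix, so sorting by key inside that reduce orders the records by URL.</P>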
<P><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">V. Notes<SPAN lang=EN-US></SPAN></SPAN></P>
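<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>1) </SPAN>The markers (split points) should divide the data as evenly as possible. This matches the emphasis that many quicksort variants place on pivot selection.</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>2) HDFS</SPAN> is a file system whose read and write performance are very asymmetric. Take advantage of its strong read performance wherever possible, and reduce the reliance on file writes and <SPAN lang=EN-US>shuffle</SPAN> operations. For example, when the way the data is processed depends on statistics computed over that data, splitting the statistics and the processing into two <SPAN lang=EN-US>map-reduce</SPAN> rounds is much faster than doing both the merging of the statistics and the processing inside a single <SPAN lang=EN-US>reduce</SPAN>.</FONT></FONT></P>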
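<P>To see why even markers matter, one can count how many records each part receives. The sketch below is illustrative only: <code>part_sizes</code>, <code>skew</code>, and both marker sets are hypothetical, and skew is defined here as the largest part divided by the ideal even share.</P>

```python
from bisect import bisect_right
from collections import Counter


def part_sizes(records, markers):
    """Count how many records land in each of the len(markers) + 1 parts."""
    counts = Counter(bisect_right(markers, r) for r in records)
    return [counts.get(i, 0) for i in range(len(markers) + 1)]


def skew(sizes):
    """Ratio of the largest part to the ideal even share (1.0 is perfect)."""
    return max(sizes) / (sum(sizes) / len(sizes))


records = list(range(1000))
even_markers = [250, 500, 750]  # close to the quartiles of the data
poor_markers = [10, 20, 30]     # bunched up at the low end

even_sizes = part_sizes(records, even_markers)
poor_sizes = part_sizes(records, poor_markers)
print(even_sizes, skew(even_sizes))
print(poor_sizes, skew(poor_sizes))
```

<P>With the bunched-up markers almost all records go to the last reduce, so the job runs only about as fast as that one reduce.</P>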
<P><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">VI. Summary</SPAN><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt" lang=EN-US></SPAN></B></P>
<P style="TEXT-INDENT: 21pt; mso-char-indent-count: 2.0"><FONT face=宋體><SPAN class=apple-style-span><SPAN style="COLOR: black; FONT-SIZE: 10.5pt; mso-ascii-font-family: Simsun; mso-hansi-font-family: Simsun" lang=EN-US>Hadoop is in effect a data-driven computing model. By combining MapReduce with HDFS, it runs tasks on the compute nodes where the data is stored, which makes full use of each node's storage and compute resources and also greatly reduces the cost of moving data over the network.</SPAN></SPAN></FONT></P></DIV>