
Chinaunix

Title: A Large-Scale Data Sorting Algorithm Based on Hadoop --- Group of 韓旭紅 (Han Xuhong), First Report

Author: xuyuanchao_cnu    Time: 2011-12-23 02:39
<DIV>
<P style="TEXT-ALIGN: center" align=center><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt"><FONT face=宋體>A Large-Scale Data Sorting Algorithm Based on <SPAN lang=EN-US>Hadoop</SPAN></FONT></SPAN></B></P>
<P style="TEXT-ALIGN: center" align=center><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt"><FONT size=5 face=宋體>(First Report)</FONT></SPAN></B></P>
<P><FONT face=宋體><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt" lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</SPAN></SPAN></B><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 15pt" lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><FONT size=4>-------2011.9.11</FONT></SPAN></B></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp; </SPAN></SPAN>Group members:</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN>Group leader: 韓旭紅 (Han Xuhong)<SPAN lang=EN-US> 1091000161</SPAN></FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;</SPAN></SPAN>Members: 李巍 (Li Wei)<SPAN lang=EN-US> 1091000167&nbsp;&nbsp; </SPAN></FONT></FONT><FONT size=3><FONT face=宋體>李越 (Li Yue)<SPAN lang=EN-US> 1091000169</SPAN></FONT></FONT><FONT size=3><FONT face=宋體><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><SPAN style="mso-spacerun: yes">&nbsp;</SPAN></SPAN>閆悅 (Yan Yue)<SPAN lang=EN-US> 1091000178</SPAN></FONT></FONT></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圓" lang=EN-US><SPAN style="mso-list: Ignore">1.<SPAN style="FONT: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN></SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">Introduction</SPAN></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-no-proof: yes" lang=EN-US><SPAN style="FONT-SIZE: 22pt"><FONT face=宋體><a href="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835170vdlC.gif" target="_blank"><IMG style="WIDTH: 273px; HEIGHT: 156px" border=0 src="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835170vdlC.gif" width=373 height=207></A></FONT></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">Figure</SPAN><FONT face=Cambria> <SPAN lang=EN-US><SPAN style="mso-no-proof: yes">1</SPAN></SPAN><SPAN lang=EN-US> Hadoop</SPAN></FONT></FONT></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><FONT face=Cambria><SPAN lang=EN-US>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></FONT></FONT><FONT face=宋體><SPAN style="FONT-SIZE: 10.5pt" lang=EN-US>Hadoop</SPAN><SPAN style="FONT-SIZE: 10.5pt"> is a <SPAN lang=EN-US><A href="http://baike.baidu.com/view/991489.htm" target=_blank><SPAN style="COLOR: windowtext; FONT-SIZE: 12pt; TEXT-DECORATION: none; text-underline: none" lang=EN-US><SPAN lang=EN-US>distributed system</SPAN></SPAN></A></SPAN> infrastructure developed by the <SPAN lang=EN-US>Apache</SPAN> Software Foundation. It lets users develop distributed programs without knowing the low-level details of distribution, making full use of a cluster's power for high-speed computation and storage. <SPAN lang=EN-US>Hadoop</SPAN> implements a <SPAN lang=EN-US><A href="http://baike.baidu.com/view/771589.htm" target=_blank><SPAN style="COLOR: windowtext; FONT-SIZE: 12pt; TEXT-DECORATION: none; text-underline: none" lang=EN-US><SPAN lang=EN-US>distributed file system</SPAN></SPAN></A></SPAN> called <SPAN lang=EN-US>HDFS</SPAN>. <SPAN lang=EN-US>HDFS</SPAN> is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to <SPAN lang=EN-US><A href="http://baike.baidu.com/view/330120.htm" target=_blank><SPAN style="COLOR: windowtext; FONT-SIZE: 12pt; TEXT-DECORATION: none; text-underline: none" lang=EN-US><SPAN lang=EN-US>application</SPAN></SPAN></A></SPAN> data, which makes it well suited to applications with very large data sets. <SPAN lang=EN-US>HDFS</SPAN> relaxes some <SPAN lang=EN-US>POSIX</SPAN> requirements so that data in the file system can be accessed as a stream.</SPAN></FONT></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圓" lang=EN-US><SPAN style="mso-list: Ignore">2.<SPAN style="FONT: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN></SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold" lang=EN-US>Hadoop</SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold"> Architecture</SPAN></P>
<P style="PAGE-BREAK-AFTER: avoid; TEXT-ALIGN: left; TEXT-INDENT: 0cm; MARGIN: 0cm 0cm 0pt 53.25pt; mso-pagination: widow-orphan; mso-char-indent-count: 0" class=MsoListParagraph align=left><SPAN style="mso-font-kerning: 0pt; mso-no-proof: yes" lang=EN-US><SPAN style="FONT-SIZE: 22pt"><FONT face=宋體>&nbsp;&nbsp;&nbsp; <a href="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835558fFHw.jpg" target="_blank"><IMG border=0 src="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835558fFHw.jpg"></A></FONT></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">Figure</SPAN><FONT face=Cambria> <SPAN lang=EN-US><SPAN style="mso-no-proof: yes">2</SPAN></SPAN><SPAN lang=EN-US> Hadoop</SPAN></FONT><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin"> architecture</SPAN></FONT></P>
<P style="MARGIN-LEFT: 53.25pt"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold" lang=EN-US>&nbsp;</SPAN></P>
<P style="TEXT-ALIGN: left; LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white; mso-pagination: widow-orphan" class=MsoNormal align=left><FONT size=3><FONT face=宋體><SPAN style="FONT-FAMILY: 宋體; FONT-SIZE: 12pt; mso-bidi-font-family: 宋體; mso-font-kerning: 0pt" lang=EN-US>Hadoop</SPAN><SPAN style="FONT-FAMILY: 宋體; FONT-SIZE: 12pt; mso-bidi-font-family: 宋體; mso-font-kerning: 0pt"> is made up of several layers. At the bottom is <SPAN lang=EN-US>HDFS</SPAN>, which stores files across all the storage nodes of a <SPAN lang=EN-US>Hadoop</SPAN> cluster. Above <SPAN lang=EN-US>HDFS</SPAN> sits the <SPAN lang=EN-US>MapReduce</SPAN> engine, which consists of a <SPAN lang=EN-US>JobTracker</SPAN>, the master that accepts jobs and schedules their tasks, and the <SPAN lang=EN-US>TaskTrackers</SPAN>, which run those tasks on the worker nodes.</SPAN></FONT></FONT></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圓" lang=EN-US><SPAN style="mso-list: Ignore">3.<SPAN style="FONT: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN></SPAN><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">Distributed Computing Model</SPAN></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3><FONT face=宋體>A <SPAN lang=EN-US>Hadoop</SPAN> cluster usually consists of dozens, or even hundreds to thousands, of low-cost computers. Every job we run has to be split into tasks that are distributed to these machines and executed; the intermediate data must be sorted and the results finally aggregated. Along the way the system must also handle node discovery, task retries, replacement of failed nodes, and many other maintenance and failure-handling chores.</FONT></FONT></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3><FONT face=宋體>In this sense <SPAN lang=EN-US>Hadoop</SPAN> is a computing model, specifically a distributed computing model: it takes care of all of the above so that the programmer only has to supply the map and reduce logic.</FONT></FONT></P>
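<P>As a rough illustration of the data flow that Hadoop manages, here is a minimal single-process Python sketch of one map, shuffle/sort, reduce pass. It is a toy under stated assumptions: run_mapreduce, hash_partition and their parameters are hypothetical names, the "reducers" are ordinary lists in one process, and none of the distribution, retry, or fault-tolerance work described above is modeled.</P>

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_partition(key, num_reducers):
    """Pick a reducer from the key's hash, similar in spirit to Hadoop's default."""
    return hash(key) % num_reducers


def run_mapreduce(records, mapper, reducer, num_reducers=1, partitioner=hash_partition):
    """Toy, in-memory imitation of one MapReduce job (hypothetical helper).

    mapper(record)        -> iterable of (key, value) pairs
    reducer(key, values)  -> iterable of output records
    partitioner(key, n)   -> reducer index in range(n)
    Returns one output list per reducer, like a job's part files.
    """
    # Map phase: each record may emit any number of key/value pairs,
    # and each pair is assigned to one reducer by the partitioner.
    buckets = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            buckets[partitioner(key, num_reducers)].append((key, value))

    # Shuffle/sort + reduce phase: each reducer sees its pairs sorted by key,
    # with all values of the same key grouped together.
    outputs = []
    for r in range(num_reducers):
        pairs = sorted(buckets[r], key=itemgetter(0))
        out = []
        for key, group in groupby(pairs, key=itemgetter(0)):
            out.extend(reducer(key, [value for _, value in group]))
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Usage: word count over two "lines" with two reducers.
    parts = run_mapreduce(
        ["a b a", "b c"],
        mapper=lambda line: ((word, 1) for word in line.split()),
        reducer=lambda word, counts: [(word, sum(counts))],
        num_reducers=2,
    )
    print(sorted(pair for part in parts for pair in part))  # [('a', 2), ('b', 2), ('c', 1)]
```

<P>Which reducer a given word lands on depends on the hash, but each reducer's output is always in key order, which is the property the sorting algorithm below builds on.</P>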
<P><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">4. Large-Scale Data Sorting with <SPAN lang=EN-US>Hadoop</SPAN></SPAN></P>
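<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋體>The most straightforward way to sort a large amount of data with <SPAN lang=EN-US>Hadoop</SPAN> is to feed the whole file to <SPAN lang=EN-US>map</SPAN>, have <SPAN lang=EN-US>map</SPAN> pass every record through unchanged to a single <SPAN lang=EN-US>reduce</SPAN>, let <SPAN lang=EN-US>Hadoop</SPAN>'s own <SPAN lang=EN-US>shuffle</SPAN> mechanism sort all of the data, and have the <SPAN lang=EN-US>reduce</SPAN> write it out directly.</FONT></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋體>However, this is no different from sorting on a single machine: the one <SPAN lang=EN-US>reduce</SPAN> must receive and sort every record by itself, so the job gains nothing from having many machines. This approach is therefore not acceptable.</FONT></P>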
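<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋體><SPAN lang=EN-US>Hadoop</SPAN>'s divide-and-conquer computing model suggests borrowing the idea behind quicksort, so let us briefly recall how quicksort works. Its basic step is to choose one element of the data as a pivot, then place everything larger than the pivot on one side and everything smaller on the other; the two sides can then be sorted independently.</FONT></P>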
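<P>The quicksort step just described can be sketched in a few lines of Python. This is the textbook out-of-place version with a three-way split (smaller, equal, larger), chosen here only so that duplicate values cannot cause endless recursion. The point to notice is that once the data is split around the pivot, each side can be sorted on its own and the results simply joined, which is the property the splitter-based approach builds on.</P>

```python
def quicksort(data):
    """Out-of-place quicksort with a three-way split around a pivot."""
    if len(data) <= 1:
        return list(data)
    pivot = data[len(data) // 2]                # choose a pivot (the middle element)
    smaller = [x for x in data if x < pivot]    # goes on one side of the pivot
    equal = [x for x in data if x == pivot]
    larger = [x for x in data if x > pivot]     # goes on the other side
    # Every element of `smaller` precedes every element of `larger`, so the two
    # sides can be sorted independently and then simply joined together.
    return quicksort(smaller) + equal + quicksort(larger)


print(quicksort([5, 3, 8, 1, 3, 9, 2]))  # [1, 2, 3, 3, 5, 8, 9]
```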
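<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋體>Now suppose we have <SPAN lang=EN-US>N</SPAN> pivots (here we call them splitters). They divide all of the data into <SPAN lang=EN-US>N+1</SPAN> parts. We hand these <SPAN lang=EN-US>N+1</SPAN> parts to <SPAN lang=EN-US>reduce</SPAN> tasks, <SPAN lang=EN-US>Hadoop</SPAN> sorts each part automatically, and the job outputs <SPAN lang=EN-US>N+1</SPAN> files, each sorted internally. Since every record in one part is smaller than every record in the next, joining the <SPAN lang=EN-US>N+1</SPAN> files end to end yields a single fully sorted file, and the job is done.</FONT></P>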
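<P>Why joining the reducer outputs end to end gives a globally sorted result can be stated in a few lines. The notation is ours, not the report's: the splitters are s_1 &lt; s_2 &lt; ... &lt; s_N, and p(x) is the index of the part (and hence of the reduce) that record x is sent to.</P>

```latex
\begin{aligned}
p(x) &= \bigl|\{\, i : s_i \le x \,\}\bigr| \in \{0, 1, \dots, N\},
  \qquad P_j = \{\, x : p(x) = j \,\},\quad j = 0, \dots, N \\
x \le y &\;\Longrightarrow\; p(x) \le p(y)
  \qquad\text{hence}\qquad x \in P_j,\ y \in P_k,\ j < k \;\Longrightarrow\; x < y \\
\text{output} &= \operatorname{sort}(P_0) \,\Vert\, \operatorname{sort}(P_1) \,\Vert\, \cdots \,\Vert\, \operatorname{sort}(P_N) \\
\text{e.g. } N = 2,\ s_1 = 10,\ s_2 = 20 &:\quad p(5) = 0,\quad p(10) = 1,\quad p(15) = 1,\quad p(25) = 2
\end{aligned}
```

<P>Under this rule a record equal to a splitter goes to the part on its right (as 10 does above); any tie rule works as long as every map applies the same one.</P>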
<P><FONT size=3 face=宋體>From this we can summarize the steps for sorting a large amount of data with <SPAN lang=EN-US>Hadoop</SPAN>:</FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>1</SPAN>)<SPAN lang=EN-US>&nbsp; </SPAN>Sample the data to be sorted;</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>2</SPAN>)<SPAN lang=EN-US>&nbsp; </SPAN>Sort the sample and pick the splitters from it;</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>3</SPAN>)<SPAN lang=EN-US>&nbsp; Map</SPAN> determines, for each input record, which two adjacent splitters it falls between, and sends the record to the <SPAN lang=EN-US>reduce</SPAN> whose <SPAN lang=EN-US>ID</SPAN> corresponds to that interval;</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>4</SPAN>)<SPAN lang=EN-US>&nbsp; Reduce</SPAN> writes out the data it receives, which the shuffle has already sorted, without further processing.</FONT></FONT></P>
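<P>The four steps can be sketched in one Python process as below. This only imitates the data flow under stated assumptions: the function names, the fixed random seed, the sample size, and the choice of evenly spaced sample elements as splitters are hypothetical, the reducers are plain lists, and the explicit sorted() call stands in for the sorting that Hadoop's shuffle performs. For a real job, Hadoop ships InputSampler and TotalOrderPartitioner (package org.apache.hadoop.mapreduce.lib.partition), which implement this sample-then-partition pattern.</P>

```python
import bisect
import random


def choose_splitters(records, num_reducers, sample_size, seed=0):
    """Steps 1-2: sample the input, sort the sample, keep num_reducers - 1 splitters."""
    rng = random.Random(seed)  # fixed seed: a hypothetical choice, for repeatability
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    # Evenly spaced ranks of the sorted sample become the splitters.
    return [sample[i * len(sample) // num_reducers] for i in range(1, num_reducers)]


def map_phase(records, splitters):
    """Step 3: send each record to the reduce whose interval contains it."""
    partitions = [[] for _ in range(len(splitters) + 1)]
    for record in records:
        # bisect_right counts the splitters <= record, i.e. the interval ID.
        partitions[bisect.bisect_right(splitters, record)].append(record)
    return partitions


def reduce_phase(partitions):
    """Step 4: each reduce outputs its (shuffle-sorted) input unchanged."""
    return [sorted(part) for part in partitions]  # sorted() stands in for the shuffle


def sample_sort(records, num_reducers=4, sample_size=100):
    if not records:
        return []
    splitters = choose_splitters(records, num_reducers, sample_size)
    outputs = reduce_phase(map_phase(records, splitters))
    # Joining part 0, part 1, ... end to end gives the globally sorted data.
    return [x for part in outputs for x in part]


if __name__ == "__main__":
    urls = ["http://c.example/", "http://a.example/x", "http://b.example/",
            "http://a.example/", "http://d.example/"]
    print(sample_sort(urls, num_reducers=2, sample_size=3))
    # ['http://a.example/', 'http://a.example/x', 'http://b.example/', 'http://c.example/', 'http://d.example/']
```

<P>Because each part is sorted and a larger record never lands in an earlier part, the concatenation is sorted whatever splitters the sample yields; a poor sample only makes the reducers' loads uneven.</P>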
<P><FONT size=3 face=宋體>As an example, consider sorting a set of <SPAN lang=EN-US>URLs</SPAN>:</FONT></P>
<P style="PAGE-BREAK-AFTER: avoid" align=center><SPAN style="COLOR: blue; TEXT-DECORATION: none; mso-no-proof: yes; text-underline: none"><SPAN style="FONT-SIZE: 22pt"><FONT face=宋體><a href="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835600685U.jpg" target="_blank"><IMG border=0 src="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835600685U.jpg"></A></FONT></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">Figure</SPAN><FONT face=Cambria> <SPAN lang=EN-US><SPAN style="mso-no-proof: yes">3</SPAN></SPAN><SPAN lang=EN-US> URL</SPAN></FONT><SPAN style="FONT-FAMILY: 黑體; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin"> sorting</SPAN></FONT></P>
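<P><FONT size=3 face=宋體>One small problem remains: how do we send a record to the <SPAN lang=EN-US>reduce</SPAN> with a given <SPAN lang=EN-US>ID</SPAN>? <SPAN lang=EN-US>Hadoop</SPAN> provides several partitioning algorithms (partitioners), which use the <SPAN lang=EN-US>key</SPAN> of each <SPAN lang=EN-US>map</SPAN> output record to decide which <SPAN lang=EN-US>reduce</SPAN> it should go to (the sorting on the <SPAN lang=EN-US>reduce</SPAN> side is also by <SPAN lang=EN-US>key</SPAN>). Therefore, to send a record to a particular <SPAN lang=EN-US>reduce</SPAN>, we only need to emit it with a suitable <SPAN lang=EN-US>key</SPAN> (in the example above, the <SPAN lang=EN-US>reduce ID + url</SPAN>) and use a partitioner that routes on that <SPAN lang=EN-US>ID</SPAN>; every record then goes exactly where it should. Note that <SPAN lang=EN-US>Hadoop</SPAN>'s default hash partitioner hashes the whole <SPAN lang=EN-US>key</SPAN>, so on its own it would not honor the <SPAN lang=EN-US>ID</SPAN> prefix.</FONT></P>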
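<P>A sketch of the key trick, again in plain Python rather than Hadoop's Java API: the map output key is the reduce ID followed by the URL, and a partitioner reads the ID back out of the key. In Hadoop this logic would live in the getPartition(key, value, numPartitions) method of a custom Partitioner subclass. The key layout (a zero-padded five-digit ID, a tab, then the URL) and all function names here are hypothetical. Zero-padding keeps the text order of the keys consistent with the numeric order of the IDs, and since every key arriving at one reduce carries the same ID, sorting by key there amounts to sorting by URL.</P>

```python
ID_WIDTH = 5  # hypothetical: room for reduce IDs 0..99999


def make_key(reduce_id, url):
    """Map output key: zero-padded reduce ID, a tab, then the URL."""
    return f"{reduce_id:0{ID_WIDTH}d}\t{url}"


def id_prefix_partition(key, num_partitions):
    """Route a record to the reduce named by its key's ID prefix
    (the role getPartition() plays in a custom Hadoop Partitioner)."""
    reduce_id = int(key.split("\t", 1)[0])
    # IDs are expected to be < num_partitions already; modulo just keeps the result in range.
    return reduce_id % num_partitions


def url_from_key(key):
    """What the reduce finally writes: the URL without the routing prefix."""
    return key.split("\t", 1)[1]


if __name__ == "__main__":
    keys = [make_key(1, "http://b.example/"),
            make_key(0, "http://z.example/"),
            make_key(1, "http://a.example/")]
    print([id_prefix_partition(k, 2) for k in keys])  # [1, 0, 1]
    reduce_1 = sorted(k for k in keys if id_prefix_partition(k, 2) == 1)
    print([url_from_key(k) for k in reduce_1])  # ['http://a.example/', 'http://b.example/']
```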
<P><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">5. Points to Note</SPAN></P>
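<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>1) </SPAN>The splitters should divide the data as evenly as possible, so that every <SPAN lang=EN-US>reduce</SPAN> receives a similar share and no single one becomes the bottleneck. This is the same point stressed by the many quicksort variants that focus on how the pivot is chosen.</FONT></FONT></P>
<P><FONT size=3><FONT face=宋體><SPAN lang=EN-US>2) HDFS</SPAN> is a file system whose read and write performance are highly asymmetric. Jobs should make the most of its strong read performance and rely as little as possible on writing files and on <SPAN lang=EN-US>shuffle</SPAN> operations. For example, when the way data is processed depends on statistics about that data, splitting the work into two rounds of <SPAN lang=EN-US>map-reduce</SPAN> (one to compute the statistics, one to process the data) is much faster than merging the statistics and processing the data in a single <SPAN lang=EN-US>reduce</SPAN>.</FONT></FONT></P>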
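<P>To see why note 1) matters, the sketch below counts how many records each of four reduces would receive under two choices of splitters taken from the same sorted sample: evenly spaced sample ranks versus, deliberately badly, the three smallest sampled values. The data set (the integers 0 to 9,999 in shuffled order), the sample size of 500, the seed, and the function names are all hypothetical choices for illustration.</P>

```python
import bisect
import random


def quantile_splitters(sorted_sample, num_reducers):
    """Splitters at evenly spaced ranks of the sorted sample (balanced ranges)."""
    n = len(sorted_sample)
    return [sorted_sample[(i * n) // num_reducers] for i in range(1, num_reducers)]


def reducer_loads(records, splitters):
    """Number of records each reduce receives under the given splitters."""
    loads = [0] * (len(splitters) + 1)
    for x in records:
        loads[bisect.bisect_right(splitters, x)] += 1
    return loads


rng = random.Random(7)  # fixed seed so the demo is repeatable
records = list(range(10_000))
rng.shuffle(records)
sample = sorted(rng.sample(records, 500))

good = reducer_loads(records, quantile_splitters(sample, 4))
bad = reducer_loads(records, sample[:3])  # deliberately poor: the 3 smallest sampled values
print("evenly spaced splitters:", good)
print("skewed splitters:       ", bad)
```

<P>With evenly spaced splitters each load should come out close to 2,500, while the skewed splitters leave almost all 10,000 records on the last reduce, which then does nearly all of the work by itself.</P>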
<P><SPAN style="FONT-FAMILY: 幼圓; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">6. Summary</SPAN></P>
<P style="TEXT-INDENT: 21pt; mso-char-indent-count: 2.0"><SPAN class=apple-style-span><SPAN style="FONT-FAMILY: 'Simsun','serif'; COLOR: black; FONT-SIZE: 10.5pt" lang=EN-US>Hadoop</SPAN></SPAN><FONT face=宋體><SPAN class=apple-style-span><SPAN style="COLOR: black; FONT-SIZE: 10.5pt; mso-ascii-font-family: Simsun; mso-hansi-font-family: Simsun"> is essentially a data-driven computing model. By combining </SPAN></SPAN><SPAN class=apple-style-span><SPAN style="FONT-FAMILY: 'Simsun','serif'; COLOR: black; FONT-SIZE: 10.5pt" lang=EN-US>MapReduce</SPAN></SPAN><SPAN class=apple-style-span><SPAN style="COLOR: black; FONT-SIZE: 10.5pt; mso-ascii-font-family: Simsun; mso-hansi-font-family: Simsun"> with </SPAN></SPAN><SPAN class=apple-style-span><SPAN style="FONT-FAMILY: 'Simsun','serif'; COLOR: black; FONT-SIZE: 10.5pt" lang=EN-US>HDFS</SPAN></SPAN><SPAN class=apple-style-span><SPAN style="COLOR: black; FONT-SIZE: 10.5pt; mso-ascii-font-family: Simsun; mso-hansi-font-family: Simsun">, it runs tasks on the compute nodes where the data is stored, making full use of those nodes' storage and computing resources while greatly reducing the cost of moving data over the network.</SPAN></SPAN></FONT></P></DIV>



