Category Archives: hadoop

Resource Scheduler, Calculator, Short-Circuit in Hadoop YARN and HDFS

To carry out next year's plan, I surveyed research topics and technologies in Hadoop YARN and HDFS, and made the following notes: Since Hadoop YARN was proposed, this new generation of technology has been continuously discussed. For knowing the … Continue reading

Posted in hadoop, Java, Programming, Academic Research (學術研究), Program Design (程式設計) | Leave a comment

Process SequenceFile without Enabling Hadoop Platform

Recently I needed to read Hadoop’s SequenceFile format without running a Hadoop platform. However, most examples show how to read and write a SequenceFile on a running Hadoop platform. How do I read such files without Hadoop? There’s a tricky solution in this case. 1. Download … Continue reading

Posted in hadoop, Java, Programming, Program Design (程式設計) | Leave a comment

XPath, MapReduce and SAX

SAX Create XPath XPath4Sax; SPEX: XPath Evaluation against XML Streams; Apache MRQL: a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark.

Posted in hadoop, Java, XML, Program Design (程式設計), Cloud Computing (雲端運算) | Leave a comment

A Rewarding Hadoop Taiwan 2013

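I attended the 2013 Hadoop Taiwan Conference and came away with a lot. (These are informal, hand-written notes, so please don't fuss over the writing format.) Industry has moved a big step further ahead of academia, which means that if we want to publish papers on cloud computing or develop related technology, we need to pay close attention to tools of this kind. With the arrival of the Big Data era, cloud computing now emphasizes "real-time" processing rather than "batch" processing. The Hadoop map/reduce we have learned so far covers only the very basics, and it probably falls short of what real-time processing requires (Hive/Pig are no exception). Google was the first to see how serious this was. Since 2009 it has successively published Google Caffeine (for indexing) and "Pregel", a graph database that can map the relationships among large volumes of web information, and in July 2010 it published Google Dremel (for real-time analysis), which is billed as fully overcoming Hadoop's shortcomings in real-time processing. Google's report states explicitly that "data that used to require multiple queries under MapReduce can be processed by Dremel at once, greatly reducing computation time", so it is designed for real-time queries. I only learned that such a powerful project was available from a talk at Hadoop Taiwan. Apache has followed the same concept and proposed the Drill platform. To make real-time processing fast enough, a message queue system is also brought in, for example: Apache Kafka: The message queue system for … Continue reading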

Posted in Big Data, cloud computing, hadoop, Program Design (程式設計), Computer Science and Information Engineering (資工), Information Security (資訊安全), Software (軟體) | Leave a comment

Differences between Hadoop versions

Hadoop 2.x is developed from hadoop-1.x and adds these significant features over hadoop-1.x: HDFS HA for the NameNode (manual failover); NextGen MapReduce (YARN); HDFS Federation; Performance; Wire-compatibility for both HDFS and YARN/MapReduce (using protobufs). Hadoop 0.23.x is a trunk which contains: HDFS … Continue reading

Posted in cloud computing, hadoop, Program Design (程式設計), Cloud Computing (雲端運算) | 18 Comments

[Hadoop] Hadoop Installation and Hands-on Hadoop at the National Center for High-performance Computing (NCHC)

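While teaching a course at a university of science and technology in central Taiwan, I covered Hadoop, so I am releasing some of the teaching materials with corrections. The single-node Hadoop installation guide differs somewhat from the single-node installation tutorial provided by NCHC; the differences lie in how Hadoop 0.22.x is started and how JDK 1.7 is installed. Click here to view. The hands-on guide for Hadoop at NCHC covers how to use the Hadoop cluster at the National Center for High-performance Computing, and I am releasing that material as well. Click here to view.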

Posted in cloud computing, hadoop, Java, Program Design (程式設計) | Leave a comment