Hadoop at Facebook .. @ 克理斯在 Baning~ | PChome Online 個人新聞台

2008-10-06 01:01:20



Pictured above is Facebook founder Mark Zuckerberg, just 24 years old (born May 14, 1984).

He adopted Hadoop, an Apache project, as the framework for all distributed parallel processing on the back end.

Original source:

An Engineering @ Facebook article: http://www.facebook.com/note.php?note_id=16121578919

Hadoop

By Joydeep Sen Sarma. Note, June 4, 2008, 22:33

With tens of millions of users and more than a billion page views every day, Facebook ends up accumulating massive amounts of data. One of the challenges that we have faced since the early days is developing a scalable way of storing and processing all these bytes, since using this historical data is a very big part of how we can improve the user experience on Facebook. This can only be done by empowering our engineers and analysts with easy-to-use tools to mine and manipulate large data sets.

About a year ago we began playing around with an open source project called Hadoop. Hadoop provides a framework for large-scale parallel processing using a distributed file system and the map-reduce programming paradigm. Our hesitant first steps of importing some interesting data sets into a relatively small Hadoop cluster were quickly rewarded as developers latched on to the map-reduce programming model and started doing interesting projects that were previously impossible due to their massive computational requirements. Some of these early projects have matured into publicly released features (like the Facebook Lexicon) or are being used in the background to improve user experience on Facebook (by improving the relevance of search results, for example).
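To make the map-reduce paradigm mentioned above concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API and not Facebook's code; the function names (`map_words`, `reduce_counts`, `run_map_reduce`) are hypothetical and only illustrate the three phases: map each record to key/value pairs, group values by key, then reduce each group.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_map_reduce(records, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    lines = ["hadoop at facebook", "Facebook uses Hadoop"]
    print(run_map_reduce(lines, map_words, reduce_counts))
```

In a real cluster the map and reduce calls would run in parallel on many machines, with the framework handling the grouping step between them; the sketch only shows the shape of the programming model.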

We have come a long way from those initial days. Facebook has multiple Hadoop clusters deployed now, the biggest having about 2,500 CPU cores and 1 petabyte of disk space. We are loading over 250 gigabytes of compressed data (over 2 terabytes uncompressed) into the Hadoop file system every day and have hundreds of jobs running each day against these data sets. The list of projects that are using this infrastructure has proliferated, from those generating mundane statistics about site usage to others being used to fight spam and determine application quality. An amazingly large fraction of our engineers have run Hadoop jobs at some point (which is also a great testament to the quality of technical talent here at Facebook).
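The figures in the paragraph above allow a rough back-of-the-envelope calculation. The sketch below is only an estimate: the post says "over" and "about", decimal units are assumed, and the replication factor of 3 is an assumption (it is the HDFS default, but the post does not say what Facebook used). It also assumes, purely for illustration, that all daily data lands on the largest cluster and that nothing else consumes disk.

```python
# Figures quoted in the post; all are approximate ("over", "about").
COMPRESSED_GB_PER_DAY = 250
UNCOMPRESSED_GB_PER_DAY = 2000   # "over 2 terabytes", taking 1 TB = 1000 GB
CLUSTER_DISK_GB = 1_000_000      # "1 PetaByte", taking 1 PB = 1,000,000 GB

# ASSUMPTION: three copies of every block (HDFS default); not stated in the post.
REPLICATION_FACTOR = 3


def compression_ratio(uncompressed_gb: float, compressed_gb: float) -> float:
    """How many times smaller the data is after compression."""
    return uncompressed_gb / compressed_gb


def days_until_full(disk_gb: float, daily_gb: float, replication: int) -> float:
    """Days of raw ingest the disk could hold, ignoring all other disk usage."""
    return disk_gb / (daily_gb * replication)


if __name__ == "__main__":
    ratio = compression_ratio(UNCOMPRESSED_GB_PER_DAY, COMPRESSED_GB_PER_DAY)
    days = days_until_full(CLUSTER_DISK_GB, COMPRESSED_GB_PER_DAY, REPLICATION_FACTOR)
    print(f"compression ratio: about {ratio:.0f}x")
    print(f"days of ingest that fit: about {days:.0f}")
```

Under these assumptions the daily load compresses roughly 8 to 1. The capacity figure is an upper bound on ingest days, since job output and intermediate data are not counted.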

The rapid adoption of Hadoop at Facebook has been aided by a couple of key decisions. First, developers are free to write map-reduce programs in the language of their choice. Second, we have embraced SQL as a familiar paradigm to address and operate on large data sets. Most data stored in Hadoop's file system is published as tables. Developers can explore the schemas and data of these tables much like they would with a good old database. When they want to operate on these data sets, they can use a small subset of SQL to specify the required data set. Operations on data sets can be written as map and reduce scripts, using standard query operators (like joins and group-bys), or as a mix of the two. Over time, we have added classic data warehouse features like partitioning, sampling and indexing to this environment. This in-house data warehousing layer over Hadoop is called Hive and we are looking forward to releasing an open source version of this project in the near future.
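The paragraph above says the same operation can be written either with query operators such as group-by or as map and reduce scripts. As an illustration, a generic SQL statement like `SELECT page, COUNT(*) FROM page_views GROUP BY page` could be expressed as the pair of scripts sketched below. The table `page_views`, its columns, and the function names are hypothetical. The line format (tab-separated key and value, with reducer input sorted by key) follows the Hadoop Streaming convention; the post does not say which mechanism Facebook actually used, so treat this as one plausible shape, not their implementation.

```python
from itertools import groupby
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Map script body: read 'user_id<TAB>page' rows, emit 'page<TAB>1'."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed rows
        _user_id, page = fields
        yield f"{page}\t1"


def reduce_lines(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reduce script body: input arrives sorted by key; sum counts per page."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for page, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _page, count in group)
        yield f"{page}\t{total}"


if __name__ == "__main__":
    # Local dry run on made-up rows; sorted() stands in for the
    # framework's sort between the map and reduce phases.
    rows = ["u1\t/home", "u2\t/home", "u1\t/about"]
    for out in reduce_lines(sorted(map_lines(rows))):
        print(out)
```

Because each script only reads and writes lines of text, the same pattern works in any language, which fits the first decision described above.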

At Facebook, it is incredibly important that we use the information generated by and from our users to make decisions about improvements to the product. Hadoop has enabled us to make better use of the data at our disposal. So we'd like to take this opportunity to say "thank you" to all the people who have contributed to this awesome open-source project.

Joydeep is a Facebook Engineer


#hadoop #facebook

Site owner: 克理斯在 Internet!
