Hadoop in Practice


Document information

Date posted: 11/07/2018, 16:26

Hadoop in Practice
ALEX HOLMES
MANNING
SHELTER ISLAND

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: orders@manning.com

©2012 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964

Development editor: Cynthia Kane
Copyeditors: Bob Herbtsman, Tara Walsh
Proofreader: Katie Tennant
Typesetter: Gordan Salinovic
Illustrator: Martin Murtonen
Cover designer: Marija Tudor

ISBN 9781617290237
Printed in the United States of America
10 – MAL – 17 16 15 14 13 12

To Michal, Marie, Oliver, Ollie, Mish, and Anch

brief contents

PART 1 BACKGROUND AND FUNDAMENTALS
1 ■ Hadoop in a heartbeat

PART 2 DATA LOGISTICS
2 ■ Moving data in and out of Hadoop
3 ■ Data serialization—working with text and beyond

PART 3 BIG DATA PATTERNS
4 ■ Applying MapReduce patterns to big data
5 ■ Streamlining HDFS for big data
6 ■ Diagnosing and tuning performance problems

PART 4 DATA SCIENCE
7 ■ Utilizing data structures and algorithms
8 ■ Integrating R and Hadoop for statistics and more
9 ■ Predictive analytics with Mahout

PART 5 TAMING THE ELEPHANT
10 ■ Hacking with Hive
11 ■ Programming pipelines with Pig
12 ■ Crunch and other technologies
13 ■ Testing and debugging

contents

preface
acknowledgments
about this book

PART 1 BACKGROUND AND FUNDAMENTALS

1 Hadoop in a heartbeat
1.1 What is Hadoop?
1.2 Running Hadoop
1.3 Chapter summary

PART 2 DATA LOGISTICS

2 Moving data in and out of Hadoop
2.1 Key elements of ingress and egress
2.2 Moving data into Hadoop
TECHNIQUE 1 Pushing system log messages into HDFS with Flume
TECHNIQUE 2 An automated mechanism to copy files into HDFS
TECHNIQUE 3 Scheduling regular ingress activities with Oozie
TECHNIQUE 4 Database ingress with MapReduce
TECHNIQUE 5 Using Sqoop to import data from MySQL

...

Hadoop in a heartbeat

This chapter covers
■ Understanding the Hadoop ecosystem
■ Downloading and installing Hadoop
■ Running a MapReduce job

We live in the age...

6 Diagnosing and tuning performance problems
6.1 Measuring MapReduce and your environment
6.2 Determining the cause of your performance woes
TECHNIQUE 28 Investigating spikes in input...