O’Reilly: Parallel R


Parallel R

Q. Ethan McCallum and Stephen Weston

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Parallel R
by Q. Ethan McCallum and Stephen Weston

Copyright © 2012 Q. Ethan McCallum and Stephen Weston. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editors: Mike Loukides and Meghan Blanchette
Production Editor: Kristen Borg
Proofreader: O’Reilly Production Services
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Revision History for the First Edition:
2011-10-21 First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449309923 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Parallel R, the image of a rabbit, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-30992-3

[LSI] 1319202138

Table of Contents

Preface

Getting Started
  Why R?
  Why Not R?
  The Solution: Parallel Execution
  A Road Map for This Book
    What We’ll Cover
    Looking Forward…
    What We’ll Assume You Already Know
    In a Hurry?
      snow
      multicore
      parallel
      R+Hadoop
      RHIPE
      Segue
      Summary

snow
  Quick Look
  How It Works
  Setting Up
  Working with It
  Creating Clusters with makeCluster
  Parallel K-Means
  Initializing Workers
  Load Balancing with clusterApplyLB
  Task Chunking with parLapply
  Vectorizing with clusterSplit
  Load Balancing Redux
  Functions and Environments
  Random Number Generation
  snow Configuration
  Installing Rmpi
  Executing snow Programs on a Cluster with Rmpi
  Executing snow Programs with a Batch Queueing System
  Troubleshooting snow Programs
  When It Works…
  …And When It Doesn’t
  The Wrap-up

multicore
  Quick Look
  How It Works
  Setting Up
  Working with It
  The mclapply Function
  The mc.cores Option
  The mc.set.seed Option
  Load Balancing with mclapply
  The pvec Function
  The parallel and collect Functions
  Using collect Options
  Parallel Random Number Generation
  The Low-Level API
  When It Works…
  …And When It Doesn’t
  The Wrap-up

parallel
  Quick Look
  How It Works
  Setting Up
  Working with It
  Getting Started
  Creating Clusters with makeCluster
  Parallel Random Number Generation
  Summary of Differences
  When It Works…
  …And When It Doesn’t
  The Wrap-up

A Primer on MapReduce and Hadoop
  Hadoop at Cruising Altitude
  A MapReduce Primer
  Thinking in MapReduce: Some Pseudocode Examples
  Calculate Average Call Length for Each Date
  Number of Calls by Each User, on Each Date
  Run a Special Algorithm on Each Record
  Binary and Whole-File Data: SequenceFiles
  No Cluster? No Problem! Look to the Clouds…
  The Wrap-up

R+Hadoop
  Quick Look
  How It Works
  Setting Up
  Working with It
  Simple Hadoop Streaming (All Text)
  Streaming, Redux: Indirectly Working with Binary Data
  The Java API: Binary Input and Output
  Processing Related Groups (the Full Map and Reduce Phases)
  When It Works…
  …And When It Doesn’t
  The Wrap-up

RHIPE
  Quick Look
  How It Works
  Setting Up
  Working with It
  Phone Call Records, Redux
  Tweet Brevity
  More Complex Tweet Analysis
  When It Works…
  …And When It Doesn’t
  The Wrap-up

Segue
  Quick Look
  How It Works
  Setting Up
  Working with It
  Model Testing: Parameter Sweep
  When It Works…
  …And When It Doesn’t
  The Wrap-up

New and Upcoming
  doRedis
  RevoScale R and RevoConnectR (RHadoop)
  cloudNumbers.com

Preface

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
  Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
  Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
  Shows commands or other text that should be typed literally by the user.

Constant width italic
  Shows text that should be replaced with user-supplied values or by values determined by context.

This icon signifies a tip, suggestion, or general note.

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Parallel R by Q. Ethan McCallum and Stephen Weston (O’Reilly). Copyright 2012 Q. Ethan McCallum and Stephen Weston, 978-1-449-30992-3.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

Safari® Books Online

Safari Books Online is an on-demand digital library that lets you easily search over 7,500 technology and creative reference books and videos to find the answers you need quickly.

With a subscription, you can read any page and watch any video from our library online. Read books on your cell phone and mobile devices. Access new titles before they are available for print, and get exclusive access to manuscripts in development and post feedback for the authors. Copy and paste code samples, organize your favorites, download chapters, bookmark key sections, create notes, print out pages, and benefit from tons of other time-saving features.

O’Reilly Media has uploaded this book to the Safari Books Online service. To have full digital access to this book and others on similar topics from O’Reilly and other publishers, sign up for free at http://my.safaribooksonline.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://oreilly.com/catalog/0636920021421

To comment or ask technical questions about this book, send email to:

bookquestions@oreilly.com

More Complex Tweet Analysis

Situation: You need to pass complex data types between the Map and Reduce stages; simple strings and numeric types will not suffice.

The code: In the previous example, you needed to pass just the author’s name and tweet length from the Mappers to the Reducers. That was easy: the code just passed the name (a string) and length (a number) to rhcollect() as output key and output value, respectively.

A tweet is a rich data object, though, so it’s not unlikely that you’d want to extract even more information. Let’s say that, this time around, you’ve written a custom analysis function that wants a data.frame of the tweet text, user mentions within the tweet, number of retweets, and so on. One option would be to call paste() to concatenate those values into a delimited string in the Map phase, then call strsplit() to unpack that string in the Reduce phase. (This is, in effect, what you have to do for R+Hadoop.)

You could still do that with RHIPE, but there’s no reason to. Remember when I said that RHIPE can read and write special SequenceFiles that hold native R objects? It also uses those to transfer data between the Map and Reduce phases. In the Map task, then, you can pass a data.frame, a list, or pretty much any other native R object to rhcollect(). You’ll get the same object back in a Reduce task without any translation effort on your part.§ This is one key strength of RHIPE over R+Hadoop: you’re talking native R the whole time.

Example 7-8 demonstrates those ideas in code.

Example 7-8. Passing complex values

## setup.block and config.list are the same as in the previous example,
## so we omit them here
map.block
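The two approaches contrasted above can be sketched in plain R, with no Hadoop or RHIPE installation needed. This is a minimal illustration under stated assumptions, not the book’s Example 7-8: the tweet fields (text, mentions, retweets) and the helpers encode_tweet() and decode_tweet() are hypothetical, and base R’s serialize()/unserialize() merely stands in for the native-object transfer that RHIPE performs between phases (RHIPE’s own serialization format may differ).

```r
# Sketch only: the tweet fields and helper names below are hypothetical,
# chosen to mirror the tweet data.frame described in the text.
tweet <- list(
  text     = "Parallel R is fun @bob @carol",
  mentions = c("bob", "carol"),
  retweets = 3L
)

## Approach 1: flatten to a delimited string (in effect what R+Hadoop requires).
# Map side: paste() the fields together; the vector of mentions needs its
# own inner delimiter.
encode_tweet <- function(tw, sep = "\t", inner = ",") {
  paste(tw$text, paste(tw$mentions, collapse = inner), tw$retweets, sep = sep)
}

# Reduce side: strsplit() the string apart and rebuild a one-row data.frame.
# Types must be restored by hand (as.integer), and the scheme breaks if the
# tweet text contains a tab or a mention contains a comma.
decode_tweet <- function(line, sep = "\t", inner = ",") {
  parts <- strsplit(line, sep, fixed = TRUE)[[1]]
  data.frame(
    text     = parts[1],
    mentions = I(list(strsplit(parts[2], inner, fixed = TRUE)[[1]])),
    retweets = as.integer(parts[3]),
    stringsAsFactors = FALSE
  )
}

decoded <- decode_tweet(encode_tweet(tweet))
str(decoded)

## Approach 2: keep the native R object. serialize()/unserialize() is only an
## analogy for the object round trip that RHIPE handles for you.
tweet_df <- data.frame(
  text     = tweet$text,
  mentions = I(list(tweet$mentions)),
  retweets = tweet$retweets,
  stringsAsFactors = FALSE
)
restored <- unserialize(serialize(tweet_df, connection = NULL))
identical(restored, tweet_df)  # TRUE: same object, no hand-written parsing
```

The first approach has to restore every type by hand and fails as soon as a field contains one of the delimiters; the second returns the identical data.frame, list column included. In a RHIPE Map block, the second approach corresponds to passing a data.frame such as tweet_df directly as the value argument of rhcollect() (the names here are illustrative only).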

Posted: 18/04/2017, 10:29

