Title

Data-intensive systems: principles and fundamentals using Hadoop and Spark

Author

Tomasz Wiktorski.

Subject

Apache Hadoop; Spark (Electronic resource : Apache Software Foundation); Big data; Databases

Classification

QA76.9.D32 W55 2019

Library

Center and Library of Islamic Studies in European Languages

Location

Province: Qom, City: Qom

Library contact: 025-32910706

INTERNATIONAL STANDARD BOOK NUMBER

Number (ISBN)

3030046036

Number (ISBN)

3030046044

Number (ISBN)

9783030046033

Number (ISBN)

9783030046040

Erroneous ISBN

3030046028

Erroneous ISBN

9783030046026

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper

Data-intensive systems :

General Material Designation

[Book]

Other Title Information

principles and fundamentals using Hadoop and Spark /

First Statement of Responsibility

Tomasz Wiktorski.

PUBLICATION, DISTRIBUTION, ETC.

Place of Publication, Distribution, etc.

Cham, Switzerland :

Name of Publisher, Distributor, etc.

Springer,

Date of Publication, Distribution, etc.

[2019]

PHYSICAL DESCRIPTION

Specific Material Designation and Extent of Item

1 online resource

SERIES

Series Title

Advanced information and knowledge processing

INTERNAL BIBLIOGRAPHIES/INDEXES NOTE

Text of Note

Includes bibliographical references.

CONTENTS NOTE

Text of Note

Intro; Contents; List of Figures; List of Listings; 1 Preface; 1.1 Conventions Used in this Book; 1.2 Listed Code; 1.3 Terminology; 1.4 Examples and Exercises; 2 Introduction; 2.1 Growing Datasets; 2.2 Hardware Trends; 2.3 The V's of Big Data; 2.4 NOSQL; 2.5 Data as the Fourth Paradigm of Science; 2.6 Example Applications; 2.6.1 Data Hub; 2.6.2 Search and Recommendations; 2.6.3 Retail Optimization; 2.6.4 Healthcare; 2.6.5 Internet of Things; 2.7 Main Tools; 2.7.1 Hadoop; 2.7.2 Spark; 2.8 Exercises; References; 3 Hadoop 101 and Reference Scenario; 3.1 Reference Scenario; 3.2 Hadoop Setup

Text of Note

3.3 Analyzing Unstructured Data; 3.4 Analyzing Structured Data; 3.5 Exercises; 4 Functional Abstraction; 4.1 Functional Programming Overview; 4.2 Functional Abstraction for Data Processing; 4.3 Functional Abstraction and Parallelism; 4.4 Lambda Architecture; 4.5 Exercises; Reference; 5 Introduction to MapReduce; 5.1 Reference Code; 5.2 Map Phase; 5.3 Combine Phase; 5.4 Shuffle Phase; 5.5 Reduce Phase; 5.6 Embarrassingly Parallel Problems; 5.7 Running MapReduce Programs; 5.8 Exercises; 6 Hadoop Architecture; 6.1 Architecture Overview; 6.2 Data Handling; 6.2.1 HDFS Architecture; 6.2.2 Read Flow

Text of Note

6.2.3 Write Flow; 6.2.4 HDFS Failovers; 6.3 Job Handling; 6.3.1 Job Flow; 6.3.2 Data Locality; 6.3.3 Job and Task Failures; 6.4 Exercises; 7 MapReduce Algorithms and Patterns; 7.1 Counting, Summing, and Averaging; 7.2 Search Assist; 7.3 Random Sampling; 7.4 Multiline Input; 7.5 Inverted Index; 7.6 Exercises; References; 8 NOSQL Databases; 8.1 NOSQL Overview and Examples; 8.1.1 CAP and PACELC Theorem; 8.2 HBase Overview; 8.3 Data Model; 8.4 Architecture; 8.4.1 Regions; 8.4.2 HFile, HLog, and Memstore; 8.4.3 Region Server Failover; 8.5 MapReduce and HBase; 8.5.1 Loading Data

Text of Note

8.5.2 Running Queries; 8.6 Exercises; References; 9 Spark; 9.1 Motivation; 9.2 Data Model; 9.2.1 Resilient Distributed Datasets and DataFrames; 9.2.2 Other Data Structures; 9.3 Programming Model; 9.3.1 Data Ingestion; 9.3.2 Basic Actions - Count, Take, and Collect; 9.3.3 Basic Transformations - Filter, Map, and reduceByKey; 9.3.4 Other Operations - flatMap and Reduce; 9.4 Architecture; 9.5 SparkSQL; 9.6 Exercises

SUMMARY OR ABSTRACT

Text of Note

Data-intensive systems are a technological building block supporting Big Data and Data Science applications. This book familiarizes readers with core concepts that they should be aware of before continuing with independent work and the more advanced technical reference literature that dominates the current landscape. The material in the book is structured following a problem-based approach. This means that the content in the chapters is focused on developing solutions to simplified, but still realistic problems using data-intensive technologies and approaches. The reader follows one reference scenario through the whole book, which uses an open Apache dataset. The origins of this volume are in lectures from a master's course in Data-intensive Systems, given at the University of Stavanger. Some chapters were also a base for guest lectures at Purdue University and Lodz University of Technology.

ACQUISITION INFORMATION NOTE

Source for Acquisition/Subscription Address

Springer Nature

Stock Number

com.springer.onix.9783030046033

OTHER EDITION IN ANOTHER MEDIUM

Title

Data-intensive systems.

International Standard Book Number

9783030046026

TITLE USED AS SUBJECT

Apache Hadoop.

Spark (Electronic resource : Apache Software Foundation)

TOPICAL NAME USED AS SUBJECT

Big data.

Databases.

SUBJECT CATEGORY (Provisional)

COM021000

UMT

DEWEY DECIMAL CLASSIFICATION

Number

005

Edition

LIBRARY OF CONGRESS CLASSIFICATION

Class number

QA76.9.D32

Book number

W55

2019

PERSONAL NAME - PRIMARY RESPONSIBILITY

Wiktorski, Tomasz

ORIGINATING SOURCE

Date of Transaction

20200823080735.0

Cataloguing Rules (Descriptive Conventions)

ELECTRONIC LOCATION AND ACCESS

Electronic name

[Book]
