Social media data mining and analytics / (Record no. 47558)

000 -LEADER
fixed length control field	08548cam a22004455i 4500
001 - CONTROL NUMBER
control field	47558
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20230309155006.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	140813s2019 inua 000 0 eng
010 ## - LIBRARY OF CONGRESS CONTROL NUMBER
LC control number	2014948538
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781118824856 (pbk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	1118824857
040 ## - CATALOGING SOURCE
Original cataloging agency	DLC
Transcribing agency	DLC
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title	eng.
042 ## - AUTHENTICATION CODE
Authentication code	pcc
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	658.83402856312
100 1# - MAIN ENTRY--PERSONAL NAME
Preferred name for the person	Szabó, Gábor,
Relator term	author.
245 10 - TITLE STATEMENT
Title	Social media data mining and analytics /
Statement of responsibility, etc	Gabor Szabo, Gungor Polatkan, Oscar Boykin, Antonios Chalkiopoulos.
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of Publication	Indianapolis, IN
Name of Publisher	John Wiley & Sons
Date of Publication	2019
264 #1 - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc	Indianapolis, IN :
Name of publisher, distributor, etc	John Wiley & Sons,
Date of publication, distribution, etc	[2019]
300 ## - PHYSICAL DESCRIPTION
Extent	xxxv, 316 pages :
Other physical details	illustrations ;
Dimensions	24 cm
336 ## - CONTENT TYPE
Content type term	text
Content type code	txt
Source	rdacontent
337 ## - MEDIA TYPE
Media type term	unmediated
Media type code	n
Source	rdamedia
338 ## - CARRIER TYPE
Carrier type term	volume
Carrier type code	nc
Source	rdacarrier
500 ## - GENERAL NOTE
General note	Includes index.
About the Authors
GABOR SZABO, PhD, is a Senior Staff Software Engineer at Tesla and a former data scientist at Twitter, where he focused on predicting user behavior and content popularity in crowdsourced online services and on modeling large-scale content dynamics. He also authored the PyCascading data processing library.
GUNGOR POLATKAN, PhD, is a Tech Lead/Engineering Manager who designs and implements end-to-end offline/online machine learning and artificial intelligence pipelines for the LinkedIn Learning relevance backend. He was previously a machine learning scientist at Twitter, where he worked on topics such as ad targeting and user modeling.
P. OSCAR BOYKIN, PhD, is a software engineer at Stripe, where he works on machine learning infrastructure. He was previously a Senior Staff Engineer at Twitter, where he worked on data infrastructure problems. He is a coauthor of the Scala big-data libraries Algebird, Scalding, and Summingbird.
ANTONIOS CHALKIOPOULOS, MSc, is a distributed systems specialist. A systems engineer who has delivered fast/big data projects in media, betting, and finance, he now leads the work on the Lenses data streaming platform as co-founder and CEO at https://lenses.stream.
505 ## - CONTENTS
Formatted contents note	Table of contents
Introduction xvii
Chapter 1 Users: The Who of Social Media 1
  Measuring Variations in User Behavior in Wikipedia 2
  The Diversity of User Activities 3
  The Origin of the User Activity Distribution 12
  The Consequences of the Power Law 20
  The Long Tail in Human Activities 25
  Long Tails Everywhere: The 80/20 Rule (p/q Rule) 28
  Online Behavior on Twitter 32
  Retrieving Tweets for Users 33
  Logarithmic Binning 36
  User Activities on Twitter 37
  Summary 39
Chapter 2 Networks: The How of Social Media 41
  Types and Properties of Social Networks 42
  When Users Create the Connections: Explicit Networks 43
  Directed Versus Undirected Graphs 45
  Node and Edge Properties 45
  Weighted Graphs 46
  Creating Graphs from Activities: Implicit Networks 48
  Visualizing Networks 51
  Degrees: The Winner Takes All 55
  Counting the Number of Connections 57
  The Long Tail in User Connections 58
  Beyond the Idealized Network Model 62
  Capturing Correlations: Triangles, Clustering, and Assortativity 64
  Local Triangles and Clustering 64
  Assortativity 70
  Summary 75
Chapter 3 Temporal Processes: The When of Social Media 77
  What Traditional Models Tell You About Events in Time 77
  When Events Happen Uniformly in Time 79
  Inter-Event Times 81
  Comparing to a Memoryless Process 86
  Autocorrelations 89
  Deviations from Memorylessness 91
  Periodicities in Time in User Activities 93
  Bursty Activities of Individuals 99
  Correlations and Bursts 105
  Reservoir Sampling 106
  Forecasting Metrics in Time 110
  Finding Trends 112
  Finding Seasonality 115
  Forecasting Time Series with ARIMA 117
  The Autoregressive Part (“AR”) 118
  The Moving Average Part (“MA”) 119
  The Full ARIMA(p, d, q) Model 119
  Summary 121
Chapter 4 Content: The What of Social Media 123
  Defining Content: Focus on Text and Unstructured Data 123
  Creating Features from Text: The Basics of Natural Language Processing 125
  The Basic Statistics of Term Occurrences in Text 128
  Using Content Features to Identify Topics 129
  The Popularity of Topics 138
  How Diverse Are Individual Users’ Interests? 141
  Extracting Low-Dimensional Information from High-Dimensional Text 144
  Topic Modeling 145
  Unsupervised Topic Modeling 147
  Supervised Topic Modeling 155
  Relational Topic Modeling 162
  Summary 169
Chapter 5 Processing Large Datasets 171
  MapReduce: Structuring Parallel and Sequential Operations 172
  Counting Words 174
  Skew: The Curse of the Last Reducer 177
  Multi-Stage MapReduce Flows 179
  Fan-Out 180
  Merging Data Streams 181
  Joining Two Data Sources 183
  Joining Against Small Datasets 186
  Models of Large-Scale MapReduce 187
  Patterns in MapReduce Programming 188
  Static MapReduce Jobs 188
  Iterative MapReduce Jobs 195
  PageRank for Ranking in Graphs 195
  K-means Clustering 199
  Incremental MapReduce Jobs 203
  Temporal MapReduce Jobs 204
  Rollups and Data Cubing 205
  Expanding Rollup Jobs 211
  Challenges with Processing Long-Tailed Social Media Data 212
  Sampling and Approximations: Getting Results with Less Computation 214
  HyperLogLog 217
  HyperLogLog Example 219
  HyperLogLog on the Stack Exchange Dataset 221
  Performance of HLL on Large Datasets 222
  Bloom Filters 223
  A Bloom Filter Example 226
  Bloom Filter as Pre-Computed Membership Knowledge 228
  Bloom Filters on Large Social Datasets 229
  Count-Min Sketch 231
  Count-Min Sketch: Heavy Hitters Example 233
  Count-Min Sketch: Top Percentage Example 235
  Aggregating Approximate Data Structures 235
  Summary of Approximations 236
  Executing on a Hadoop Cluster (Amazon EC2) 237
  Installing a CDH Cluster on Amazon EC2 237
  Providing IAM Access to Collaborators 241
  Adding On-Demand Cluster Capabilities 242
  Summary 243
Chapter 6 Learn, Map, and Recommend 245
  Social Media Services Online 246
  Search Engines 246
  Content Engagement 246
  Interactions with the Real World 248
  Interactions with People 249
  Problem Formulation 251
  Learning and Mapping 253
  Matrix Factorization 255
  Learning, Training 257
  Under- and Overfitting 257
  Regularizing in Matrix Factorization 259
  Non-Negative Matrix Factorization and Sparsity 260
  Demonstration on Movie Ratings 261
  Interpreting the Learned Stereotypes 265
  Exploratory Analysis 269
  Prediction and Recommendation 274
  Evaluation 277
  Overview of Methodologies 278
  Nearest Neighbor-Based Approaches 278
  Approaches Based on Supervised Learning 280
  Predicting Movie Ratings with Logistic Regression 280
  Common Issues with Features 288
  Domain-Specific Applications 289
  Summary 290
Chapter 7 Conclusions 293
  The Surprising Stability of Human Interaction Patterns 293
  Averages, Standard Deviations, and Sampling 296
  Removing Outliers 303
Index 309
520 ## - SUMMARY, ETC.
Summary, etc	"Social media is a rich source of big data, so much so that 90% of Fortune 500 companies are investing in big data initiatives to help them predict consumer behavior. Knowing the most effective ways to mine social media data can help you acquire information that generates amazing business results. Social media is unstructured, dynamic, and future-oriented. Effective, insightful data mining requires new analytical tools and techniques. Written by experts at social networking companies, Social Media Data Mining and Analytics provides a hands-on course that teaches you how to use state-of-the-art tools and sophisticated data mining techniques specifically geared to social media. It digs deeply into the mechanics of collecting and applying social media data to understand customers, define trends, and make predictions that can improve analytics for growth and sales."--Amazon.com
526 ## - STUDY PROGRAM INFORMATION NOTE
--	600-699
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Consumer profiling
General subdivision	Data processing.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Consumer behavior
General subdivision	Forecasting.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Business planning
General subdivision	Data processing.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Social media
General subdivision	Data processing.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Data mining.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Polatkan, Gungor,
Relator term	author.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Boykin, Oscar,
Relator term	author.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Chalkiopoulos, Antonios,
Relator term	author.
906 ## - LOCAL DATA ELEMENT F, LDF (RLIN)
a	0
b	ibc
c	origres
d	2
e	ncip
f	20
g	y-gencatlg
942 ## - ADDED ENTRY ELEMENTS
Source of classification or shelving scheme
Item type	BOOK

Holdings
Permanent location	COLLEGE LIBRARY
Current location	COLLEGE LIBRARY
Shelving location	SUBJECT REFERENCE
Date acquired	2019-09-02
Cost, normal purchase price	3911.00
Inventory number	49704
Full call number	658.83402856312 Sz12 2019
Barcode	CITU-CL-49704
Date last seen	2019-10-24
Price effective from	2019-09-02
Item type	BOOK