IEEE International Conference on Computer Communications, 10-15 April 2016, San Francisco, CA, USA

Targeted Viral Marketing in Billion-scale Networks
Hung T Nguyen (1), My T Thai (2) and Thang N Dinh (1)
(1) CS Dept., Virginia Commonwealth University, Richmond, VA 23284
(2) CISE Dept., University of Florida, Gainesville, FL 32611
Thang N Dinh tndinh@vcu.edu

Introduction: Viral Marketing
- Marketing via the "word-of-mouth" effect
- Influence Maximization: find a small set of users (seeds) that influences most of the network

Intro.: Viral Marketing Examples
- ALS Ice Bucket Challenge: 2.4 M videos uploaded on Facebook; $98.2 M donated to the ALS Association
- Toys"R"Us #PlayItForward: $35.5 donation
- Always #LikeAGirl (YouTube): ~60 million views

Intro.: Targeted Viral Marketing
What's wrong with choosing Mr President to advertise shampoo?

Intro.: Targeted Viral Marketing
- Targeted Marketing: focus on customers with certain traits (e.g., "Age: 18-30, Like: Music"; "Tech hobbyists, Age: 25-50")
- Targeted Viral Marketing: seeding strategies to influence customers with certain traits

Targeted Viral Marketing Problem
Real-world data: social networks such as Twitter, Stackexchange, etc.
- User relationships: who follows whom?
- User attributes: geo-location
- User-generated content: tweets, posts, etc.
Targeted Viral Marketing:
- A company has a budget B to incentivize users
- It hopes to trigger a large cascade of adoption
- Whom should it target for "3d printing", "android", etc.?
Targeted Viral Marketing (TVM)
Input: a graph $G = (V, E, w)$, a budget $B$, and a propagation model. Each node $u$ has a cost $c(u)$ and a relevance score $b(u)$.
Output: a seed set $S$ of total cost $c(S) = \sum_{u \in S} c(u) \le B$ that maximizes the expected total relevance of the influenced users (influence spread), i.e., $\max_{S \subseteq V,\ c(S) \le B} \mathbb{E}\big[\sum_{v \text{ influenced by } S} b(v)\big]$.

Related Work: Influence Maximization
Methods that return a $(1 - 1/e - \epsilon)$-approximation with probability $1 - n^{-1}$:
- Greedy (KDD'03): $O(kmn\epsilon^{-3})$; the original greedy algorithm.
- CELF (KDD'07): $O(kmn\epsilon^{-3})$; lazy-forward evaluation, up to 700 times faster than Greedy.
- TIM/TIM+ (SIGMOD'14): $O\big((m + n)(\ln\binom{n}{k} + \ln 2n)\,\epsilon^{-2}\big)$; up to 1000 times faster than CELF.
- IMM (SIGMOD'15): $O\big((m + n)(\ln\binom{n}{k} + \ln 2n)\,\epsilon^{-2}\big)$; up to 100 times faster than TIM/TIM+.
- SSA/D-SSA (to appear, ACM SIGMOD'16): near-linear time with a guaranteed minimum number of samples, and sub-linear time on dense graphs; up to 1000 times faster than IMM for InfMax.

Related Work
- Nguyen et al. JSAC'13: budgeted influence maximization; not scalable and does not consider users' relevance.
- Topic-aware influence (Barbieri et al. KAIS 2013, Barbieri et al. EDBT 2014, Chen et al. VLDB 2015): no theoretical guarantees on solution quality.

Cascading Models
Cascading models describe how a cascade spreads through the network. Popular models:
- Linear Threshold
- Independent Cascade (or Bayesian Network)
- SI/SIS, SIR, SIRS, SEIRS, …
- Load shedding, DC/AC power flow models

General Framework
Goal: for $\max_{S \in \Omega} f(S)$, find an $(\alpha - \epsilon)$-approximate solution $S_{\mathcal{A}}$, i.e., $f(S_{\mathcal{A}}) \ge (\alpha - \epsilon)\,OPT_f$, with probability $1 - \delta$.
- RIS sampling: a sample generator produces a set $\mathcal{T}$ of $T = \theta(\epsilon, \delta)$ samples, giving an estimate $\hat{f}_T(S) \approx f(S)$ w.h.p. (bounding techniques).
- An $\alpha$-approximation algorithm $\mathcal{A}$ (e.g., greedy for Max-Coverage, $\alpha = 1 - 1/e$) solves $\max_{S \in \Omega} \hat{f}_T(S)$ and returns $S_{\mathcal{A}} \in \Omega$ with $\hat{f}_T(S_{\mathcal{A}}) \ge \alpha \cdot OPT_{\hat{f}_T}$.
Difficulty: it is hard to obtain the $(\alpha - \epsilon)OPT$ multiplicative error. How many samples, $\theta(\epsilon, \delta) = ???$, are needed, and how can we achieve the minimum number of samples?
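To make the TVM objective concrete, here is a minimal sketch of a plain Monte Carlo estimate of the expected relevance reached by a seed set. It is not the authors' method: it assumes the Independent Cascade model, a graph stored as a dict of out-edges `(neighbor, probability)`, and a `relevance` dict holding $b(u)$; `simulate_ic` and `estimate_benefit` are hypothetical helper names.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the set of activated nodes.

    graph: dict mapping node -> list of (neighbor, probability) out-edges.
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # A newly activated node gets one chance to activate each inactive neighbor.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_benefit(graph, relevance, seeds, runs=10000, seed=0):
    """Monte Carlo estimate of E[sum of b(v) over influenced nodes v]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        active = simulate_ic(graph, seeds, rng)
        # Nodes without a relevance score contribute 0.
        total += sum(relevance.get(v, 0.0) for v in active)
    return total / runs
```

Greedy and CELF rely on repeated simulations of this kind to evaluate marginal gains, which is what the sampling framework is designed to avoid.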
RIS Sampling (Borgs et al. '14)
Generate a hypergraph $\mathcal{H}$ with hyperedges:
- Select a random node $u \in V$ and a random graph sample $g$.
- Hyperedge $\mathcal{E} = \{$nodes that can reach $u$ in $g\}$.
- Note: instead of generating $g$, we can use a reverse BFS from $u$.
Example, assuming the Independent Cascade model (figure: three nodes $a$, $b$, $c$ with edge probabilities 0.6, 0.2, 0.3):
- $u = a$: $\mathcal{E}_1 = \{a, b\}$
- $u = b$: $\mathcal{E}_2 = \{b, a, c\}$
- $u = c$: $\mathcal{E}_3 = \{c, a\}$
- $\mathcal{H} = (V, \{\mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3\})$

RIS Sampling (cont.)
Observation:
- Influential nodes appear more often in the hyperedges.
- An influential seed set is one that covers most hyperedges.
RIS framework (Borgs et al., Tang et al. 2014):
- Generate multiple hyperedges.
- Find the seed set that covers the most hyperedges using the greedy algorithm for Max-Coverage.

Number of Samples (Threshold)
Expected time complexity = #hyperedges $[m_{\mathcal{H}}]$ × time to generate one hyperedge [EPT]; the number of hyperedges decides the running time.
A. How many hyperedges are sufficient? Tang et al. '14: $\theta \ge (8 + 2\epsilon)\, n \, \dfrac{\ln\binom{n}{k} + \ln 2n}{OPT_k \, \epsilon^2}$ hyperedges give a $(1 - 1/e - \epsilon)$-approximation with probability $1 - n^{-1}$, but $OPT_k$ is unknown in advance.
B. Can we generate just a little more than $\theta$ hyperedges?
- TIM: lower-bounds OPT by KPT ≤ OPT.
- TIM+: tightens the lower bound with KPT+ ∈ [KPT, OPT].
- Both rely on highly sophisticated estimation and give no guarantees on the number of samples.

BCT Algorithm
- Effective stopping conditions to generate "just enough" samples
- Importance sampling to guarantee an almost linear number of samples
- Provable bounded errors and high confidence

Provable Guarantees

Experiments: Datasets

Results: Benefit comparison
BCT achieves the best benefit with the same budget!
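The RIS steps (reverse-BFS hyperedges, then greedy Max-Coverage) can be sketched in a few dozen lines. This is a generic illustration under stated assumptions, not BCT: it assumes Independent Cascade, unit costs (pick $k$ seeds instead of respecting a budget $B$), and, as a labeled assumption, draws each root $u$ with probability proportional to its relevance $b(u)$, so that covered fraction × $\sum_u b(u)$ estimates the expected relevance reached. `reverse_graph`, `random_hyperedge`, `greedy_max_coverage` and `ris_select` are hypothetical names.

```python
import random
from collections import deque


def reverse_graph(graph):
    """Turn out-edge lists {u: [(v, p), ...]} into in-edge lists {v: [(u, p), ...]}."""
    rev = {}
    for u, edges in graph.items():
        for v, p in edges:
            rev.setdefault(v, []).append((u, p))
    return rev


def random_hyperedge(rev, nodes, weights, rng):
    """One hyperedge: nodes that reach a random root u in a sampled graph g.

    Edges are flipped lazily during a reverse BFS from u instead of sampling
    all of g first. Assumption: root drawn with probability proportional to b(u).
    """
    root = rng.choices(nodes, weights=weights, k=1)[0]
    reached = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, p in rev.get(v, []):
            if u not in reached and rng.random() < p:
                reached.add(u)
                queue.append(u)
    return reached


def greedy_max_coverage(hyperedges, k):
    """(1 - 1/e) greedy: repeatedly pick the node covering most uncovered hyperedges."""
    index = {}  # node -> ids of hyperedges containing it
    for i, he in enumerate(hyperedges):
        for u in he:
            index.setdefault(u, []).append(i)
    covered = [False] * len(hyperedges)
    seeds = []
    for _ in range(k):
        best, best_gain = None, 0
        for u, ids in index.items():
            gain = sum(1 for i in ids if not covered[i])
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:  # nothing left to cover
            break
        seeds.append(best)
        for i in index[best]:
            covered[i] = True
    return seeds, sum(covered)


def ris_select(graph, relevance, k, num_samples, seed=0):
    """Pick k seeds from num_samples hyperedges and estimate their benefit."""
    rng = random.Random(seed)
    rev = reverse_graph(graph)
    nodes = list(relevance)
    weights = [relevance[u] for u in nodes]
    hyperedges = [random_hyperedge(rev, nodes, weights, rng) for _ in range(num_samples)]
    seeds, n_covered = greedy_max_coverage(hyperedges, k)
    # Covered fraction times total relevance estimates the expected relevance reached.
    return seeds, sum(weights) * n_covered / num_samples
```

For example, on a graph where `a` reaches `b` and `c` with probability 1 and every relevance score is 1, every hyperedge contains `a`, so `ris_select(graph, relevance, k=1, num_samples=200)` returns `['a']` with an estimate of 3.0. The slide's stopping conditions and budgeted selection are not reproduced here.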
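The threshold question in part A can be made concrete by evaluating the Tang et al. '14 bound directly. A small sketch, assuming the $(8 + 2\epsilon)$ constant as stated in that analysis and that any lower bound on $OPT_k$ (such as TIM's KPT) may be passed in its place; `log_binom` and `rr_threshold` are hypothetical names.

```python
import math


def log_binom(n, k):
    """ln C(n, k) via log-gamma, so large n does not overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def rr_threshold(n, k, eps, opt_lower):
    """Number of hyperedges from the Tang et al. '14 bound.

    theta = (8 + 2*eps) * n * (ln C(n, k) + ln 2n) / (OPT_k * eps^2).
    OPT_k is unknown, so a lower bound on it is passed instead; a smaller
    lower bound only makes theta larger, so the count stays sufficient.
    """
    if not 0 < opt_lower <= n:
        raise ValueError("opt_lower must lie in (0, n]")
    ln_terms = log_binom(n, k) + math.log(2 * n)
    return math.ceil((8 + 2 * eps) * n * ln_terms / (opt_lower * eps ** 2))
```

Because $\theta$ scales with $1/OPT_k$, a loose lower bound inflates the sample count proportionally, which is why TIM+ works to tighten KPT.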
Results: Quality & Running time

Results: Running time on Twitter

Seeding Quality
Twitter: 40 million nodes, 1.5 billion edges, 106 million tweets

Experiments (cont.)
- 300 times faster than the TIM+-based method
- More practical solutions

Summary
- Investigated the Targeted Viral Marketing problem on real-world data
- Scalable algorithm that handles billion-scale networks
- Provable performance guarantee with high confidence, matching theoretically derived thresholds on the number of samples
Future work:
- Dynamic/correlated probabilistic networks
- Distributed/parallel and/or GPU-based implementation

THANK YOU FOR LISTENING!
Question & Answer