Document information
NEURAL TURING MACHINE
Nguyen Hung Son
Based on the PhD thesis of Dr Karol Kurach, Warsaw University / Google
CuuDuongThanCong.com https://fb.com/tailieudientucntt

Agenda
● Introduction to Deep Neural Architectures
● Neural Random-Access Machines
● Hierarchical Attentive Memory
● Applications: Smart Reply

A primer on Deep Learning

How intelligent are Neural Networks? Two main criticisms

CRITICISM: Neural networks with fixed-size inputs are seemingly unable to solve problems with variable-size inputs, such as translating a sentence or recognizing handwritten text.
SOLUTION: Recurrent Neural Networks (RNN).

CRITICISM: Neural networks seem unable to bind values to specific locations in data structures. This ability to write to and read from memory is critical in the two information-processing systems we have available to study: brains and computers.
SOLUTION: Neural Turing Machine (NTM): give a neural network an external memory and the capacity to learn how to use it.

Deep Learning
Big Data + Big Deep Model = Success Guaranteed
State of the art in:
● computer vision,
● speech recognition,
● machine translation, ...
Enabled by:
– new techniques (e.g., initialization, pretraining)
– computing power (GPU, FPGA, TPU, ...)
– big datasets

Recurrent Neural Networks
➢ Neural networks with cycles
➢ Process inputs of variable length
➢ Preserve state between timesteps

Recurrent Neural Networks
Figure 2.3: A Recurrent Neural Network is a very deep feedforward neural network that has a layer for each timestep. Its weights are shared across time.

The Recurrent Neural Network (RNN), the central object of study here, is a dynamical system that maps sequences to sequences. It is parameterized by weight matrices W_hv, W_hh, W_oh, bias vectors b_h, b_o, and an initial state h_0, whose concatenation θ = [W_hv, W_hh, W_oh, b_h, b_o, h_0] completely describes the RNN (fig. 2.3). Given an input sequence (v_1, ..., v_T) (which we denote by v_1^T), the RNN computes a sequence of hidden states h_1^T and a sequence of outputs z_1^T by the following algorithm:

1: for t from 1 to T do
2:   u_t ← W_hv v_t + W_hh h_{t-1} + b_h
3:   h_t ← e(u_t)
4:   o_t ← W_oh h_t + b_o
5:   z_t ← g(o_t)
6: end for

Applications: Smart Reply

Quality
● How do we ensure that the response options are always high quality in content and language?
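The RNN forward pass above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming tanh for the hidden nonlinearity e and softmax for the output nonlinearity g (the slides leave both unspecified):

```python
import numpy as np

def rnn_forward(v_seq, W_hv, W_hh, W_oh, b_h, b_o, h0):
    """Run the RNN over an input sequence, returning hidden states and outputs."""
    def softmax(x):
        e = np.exp(x - x.max())  # shift for numerical stability
        return e / e.sum()

    h = h0
    hs, zs = [], []
    for v_t in v_seq:                        # for t from 1 to T
        u_t = W_hv @ v_t + W_hh @ h + b_h    # u_t = W_hv v_t + W_hh h_{t-1} + b_h
        h = np.tanh(u_t)                     # h_t = e(u_t), here e = tanh
        o_t = W_oh @ h + b_o                 # o_t = W_oh h_t + b_o
        zs.append(softmax(o_t))              # z_t = g(o_t), here g = softmax
        hs.append(h)
    return hs, zs
```

Note that the same weight matrices are applied at every timestep, which is exactly the weight sharing across time that the figure caption describes.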
○ Avoid incorrect grammar, mechanics, and misspellings (e.g., "your the best")
○ Avoid inappropriate, offensive responses (e.g., "Leave me alone")
○ Deal with wide variability and informal language (e.g., "got it thx")
● Restricting the model vocabulary is not sufficient!
Solution: Restrict to a fixed set of valid responses, derived automatically from data.

Scalability
● How do we scale costly LSTM computation to the requirements of an email delivery pipeline?
Solution: Perform an approximate search over the set of valid responses.

Diversity
● How can we select a semantically diverse set of suggestions?
Redundant responses to "Can you join tomorrow's meeting?":
○ Yes, I'll be there
○ Yes, I will be there
○ I'll be there
Responses with diversity (more useful):
○ Sure, I'll be there
○ Yes, I can
○ Sorry, I won't be able to make it tomorrow
Solution: Learn the semantic intents of responses, then use these to filter out redundant suggestions.

Diversity
Our approach to diversity is based on two heuristics:
● Cluster-based diversity: don't show suggestions of the same intent.
● Forced positives/negatives: if there is an affirmative suggestion, also force a negative one (and vice versa).
Product decision: offer a positive/negative choice, even if the latter is rare.

Cluster-based diversity
"We're waiting for you, are you going to be here soon?"
Intent cluster "on my way": On my way / I am on my way / On my way! / I'm on my way
Intent cluster "already here": I'm here / I am here / I'm here! / Already here! / Yes, I am here / Yes, I'm here
Intent cluster "there soon": I'll be there in a few minutes / Will be there shortly / Be there in a few!

Diversity results
● Removing diversity reduces the click-through rate by 7.5% relative.

Results

Deployment & coverage
● Deployed in Inbox by Gmail
● Used to assist with more than 10% of all mobile replies

Unique cluster and suggestion usage

Most frequently used clusters

Ranking experiments

Examples

Conclusions
● Sequence-to-sequence produces plausible email replies in many common scenarios when trained on an email corpus.
● Smart Reply is deployed in Inbox by Gmail and generates more than 10% of mobile replies.
● RNNs show promise not only for assisted communication, but also for other applications where a conversation model is needed, such as virtual assistants.

Vanilla RNN
➢ Basic version of RNN
➢ State: a vector h

Learning
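The two diversity heuristics above can be sketched as a post-processing step over the model's ranked candidates. This is an illustrative sketch, not the paper's implementation: the names `diversify`, `cluster_of`, and `sentiment_of` are hypothetical, standing in for a precomputed response-to-intent-cluster mapping and a cluster sentiment label.

```python
def diversify(ranked, cluster_of, sentiment_of, k=3):
    """Select up to k suggestions: at most one per intent cluster, then
    force a negative option if all picks are positive (and vice versa).

    ranked:       candidate responses, best-first (model score order)
    cluster_of:   response -> intent-cluster label
    sentiment_of: cluster  -> "positive" / "negative" / "neutral"
    """
    picks, used = [], set()
    for r in ranked:
        c = cluster_of[r]
        if c not in used:                  # cluster-based diversity: one per intent
            picks.append(r)
            used.add(c)
        if len(picks) == k:
            break

    def best_with(sentiment):
        # highest-ranked response whose cluster carries the wanted sentiment
        return next((r for r in ranked
                     if sentiment_of[cluster_of[r]] == sentiment), None)

    sentiments = {sentiment_of[cluster_of[r]] for r in picks}
    for have, want in (("positive", "negative"), ("negative", "positive")):
        if have in sentiments and want not in sentiments:
            forced = best_with(want)       # forced positive/negative heuristic
            if forced is not None:
                if len(picks) < k:
                    picks.append(forced)
                else:
                    picks[-1] = forced     # replace the lowest-ranked pick
    return picks
```

The design choice this mirrors: diversity is enforced after ranking, so the scoring model stays simple and the product-level rule (always offer a positive/negative choice) is applied deterministically.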
Date posted: 15/12/2021, 17:09