[NLP] Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue Review

[Byeongchang Kim, Jaewoo Ahn, Gunhee Kim, Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue. In ICLR, 2020]

Published as a conference paper at ICLR 2020.


I had a hard time understanding the equations, but I will review the paper as best I can. I hope this works as another milestone for future studies. :)

[Abstract]
The paper introduces a better model for an NLP subproblem called knowledge selection. Briefly, the task is to generate answers in a dialogue between an inquirer and an answerer. It is not plain dialogue generation, however: the answers (not the inquiries) are generated based on a piece of knowledge selected from a pool of candidates. A benchmark dataset is Wizard of Wikipedia (Dinan et al., 2019) from Facebook's ParlAI.

[Introduction]
The paper presents the advantages of its new model, the Sequential Knowledge Transformer (SKT), in three aspects.

  1. Dealing with the diversity in knowledge selection in conversation (one-to-many relations).
  2. Better leveraging the response information by keeping track of the prior and posterior distributions over knowledge.
  3. Working even when the knowledge selection labels for previous turns are not available, because the latent model can infer the previous knowledge selections (this part is not explained thoroughly in the paper, or maybe I just missed it).

[2. Motivation]
[2.1 Baseline knowledge selection]
The baseline knowledge selection flow used in WoW (Dinan et al., 2019) is as follows:
1. Choose a topic.
2. Given the apprentice's (inquirer's) utterance and the wizard's previous utterance, use an information retrieval (IR) system to retrieve relevant knowledge (called the knowledge pool in the paper). From the retrieved knowledge, select a single relevant sentence and use it to generate an answer (knowledge selection and response generation).
3. Repeat until the conversation reaches a minimum number of turns.
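To make step 2 concrete, here is a toy sketch of the retrieve-then-select idea. It is only an illustration under stated assumptions: the word-overlap scorer and the names `tokenize`, `retrieve`, and `select_one` are hypothetical stand-ins for the real IR system, and the top-ranked sentence is simply taken as the "selected" one (in WoW the wizard makes that choice, and a model has to learn it).

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real preprocessing."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Hypothetical IR step: rank sentences by word overlap with the query."""
    q = tokenize(query)
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def select_one(apprentice_utt: str, wizard_prev_utt: str, documents: list[str]) -> str:
    # The query uses the apprentice's utterance and the wizard's previous one.
    pool = retrieve(apprentice_utt + " " + wizard_prev_utt, documents)
    # Baseline behaviour: exactly one sentence is picked per turn.
    return pool[0]
```

Note that `select_one` always returns a single sentence for a given context, which is exactly the one-to-one assumption the paper criticizes.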

This pipeline has shortcomings because of the multimodality of the problem (as the paper calls it): the relation between the dialogue context and the selected knowledge sentence is not restricted to one-to-one. Several knowledge sentences can be appropriate for the same context, each leading to a different valid response.

Therefore, the paper proposes a sequential latent variable to capture this diversity and to sequentially “track the topic flow of knowledge in the multi-turn dialogue”.

[3. Approach]
This is the difficult part.

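The important counters (indices) are for sentences, words, and knowledge: $t$ indexes the dialogue turn, $n$ indexes the word position within a sentence, and $l$ indexes the knowledge sentences in the pool.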

Sentence Encoder
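The paper uses BERT (Devlin et al., 2019) as a sentence encoder, turning the utterance $x^t$ into an embedding $h^t_x$ by average pooling over time steps. Knowledge sentences are encoded the same way, giving $h^{t,l}_k$ for the $l$-th knowledge sentence at turn $t$.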
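Average pooling over time steps is just a masked mean of the token vectors. A minimal sketch, assuming the BERT token outputs are already available as plain lists of floats and that padding positions are marked with 0 in a mask; `average_pool` is an illustrative name, not the paper's code.

```python
def average_pool(hidden_states: list[list[float]], mask: list[int]) -> list[float]:
    """Average the token vectors whose mask entry is 1; padding (0) is ignored."""
    kept = [vec for vec, m in zip(hidden_states, mask) if m]
    if not kept:
        raise ValueError("at least one non-padding token is required")
    dim = len(kept[0])
    # Mean over the time axis, one coordinate at a time.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```

Masking matters because padded positions would otherwise drag the sentence embedding toward whatever the padding vectors look like.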

Sequential Knowledge Selection
Sequential knowledge selection, the key idea of this paper, differs from previous work in two crucial ways: 1) it regards knowledge selection as a sequential decision process, and 2) it models the selections as latent variables. This enables joint inference of knowledge selection and response generation over multiple turns.
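To illustrate what "knowledge selection as a sequential decision process" means, the toy loop below lets each turn's choice depend on a summary of earlier choices. It is a sketch under loud assumptions: the history summary is a hypothetical exponential moving average (the paper uses a learned recurrent state instead), scoring is a plain dot product, and no latent-variable training happens here.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_sequentially(
    turn_queries: list[list[float]],
    knowledge_pools: list[list[list[float]]],
    decay: float = 0.5,  # hypothetical smoothing factor for the history state
) -> list[int]:
    """Pick one knowledge index per turn, conditioning on past selections."""
    state = [0.0] * len(turn_queries[0])
    choices = []
    for query, pool in zip(turn_queries, knowledge_pools):
        # Combine the current turn's query with the selection history.
        combined = [q + s for q, s in zip(query, state)]
        best = max(range(len(pool)), key=lambda idx: dot(combined, pool[idx]))
        choices.append(best)
        # Fold the chosen knowledge embedding into the history state.
        state = [decay * s + (1 - decay) * k for s, k in zip(state, pool[best])]
    return choices
```

With the same second-turn query, the selection can differ depending on what was chosen in the first turn, which is the point of making the process sequential.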

Latent variable
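The latent variable is $k^t$, the knowledge sentence selected at turn $t$, which can be viewed as a one-hot indicator over the knowledge pool. Its approximate posterior is $q_\phi(k^t \mid x^{\leq t}, y^{\leq t}, k^{<t})$: it conditions on the utterances and responses up to turn $t$ and on the previous selections $k^{<t}$.
The detailed derivation is in the Appendix of the paper.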

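Model components: $p_\theta$, $q_\phi$, and $\pi_\theta$ are the decoder, the approximate posterior over knowledge, and the prior over knowledge, respectively. The prior $\pi_\theta(k^t \mid x^{\leq t}, y^{<t}, k^{<t})$ differs from the posterior in that it does not see the current response $y^t$.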
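For a categorical latent variable, one turn's contribution to a variational lower bound (ELBO) can be computed exactly: the expected log-likelihood of the response under the posterior $q_\phi$, minus the KL divergence from the posterior to the prior $\pi_\theta$. A minimal sketch, assuming the decoder log-likelihoods $\log p_\theta(y^t \mid \dots, k^t = l)$ for every candidate $l$ are already given; `turn_elbo` is a hypothetical name. Enumerating all candidates is only for illustration; in practice the expectation would typically be estimated with samples.

```python
import math


def turn_elbo(posterior: list[float], prior: list[float], log_likelihoods: list[float]) -> float:
    """E_q[log p(y | k)] - KL(q || pi) for one turn, with k categorical over the pool."""
    expected_ll = sum(q * ll for q, ll in zip(posterior, log_likelihoods))
    # Terms with q = 0 contribute nothing to the KL divergence.
    kl = sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0)
    return expected_ll - kl
```

The KL term is what ties the two distributions together: the posterior may use the response to pick good knowledge, but it is penalized for drifting away from the prior, which is what has to work alone at test time.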

Usage of Attention mechanism
The paper uses an attention mechanism over the current knowledge pool $\{h^{t,l}_k\}$.
The attention queries are built from $h^{t-1,s}$, $d^{t-1}$, $d^t_{xy}$, and $h^t_x$.
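The attention itself can be sketched as a softmax over dot-product scores between a query vector and each knowledge embedding $h^{t,l}_k$. A minimal sketch: how the query is built from $h^{t-1,s}$, $d^{t-1}$, $d^t_{xy}$, and $h^t_x$ is not modeled here, and the query is simply assumed to be given.

```python
import math


def attention(query: list[float], keys: list[list[float]]) -> list[float]:
    """softmax(query . h_k^{t,l}) over the knowledge sentences l in the pool."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

The output is a probability distribution over the knowledge pool, so it can serve directly as the categorical distribution over $k^t$.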

Finally, the model samples the knowledge $k^t_s$ from the attention distribution in Eq. (6) and passes it to the decoder. At test time, Eq. (5) is used instead, since the response is not available then.
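The train/test difference can be sketched as follows: during training an index is sampled from the posterior (here with the Gumbel-max trick), while at test time the most probable index under the prior $\pi_\theta$ is taken, because the prior does not need the response. This assumes both distributions are already computed; training a network through the sample would need a differentiable relaxation such as Gumbel-softmax, which this sketch omits. All function names are hypothetical.

```python
import math
import random


def gumbel_noise(rng: random.Random) -> float:
    """Standard Gumbel(0, 1) noise: -log(-log(u)) with u ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # rng.random() is in [0, 1); avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(probs: list[float], rng: random.Random) -> int:
    """Sample an index from Categorical(probs) as argmax(log p_l + g_l)."""
    scores = [math.log(p) + gumbel_noise(rng) if p > 0 else -math.inf for p in probs]
    return max(range(len(probs)), key=scores.__getitem__)


def choose_knowledge(posterior: list[float], prior: list[float], training: bool, rng: random.Random) -> int:
    """Training: sample k_s^t from the posterior. Test: argmax of the prior."""
    if training:
        return gumbel_max_sample(posterior, rng)
    return max(range(len(prior)), key=prior.__getitem__)
```

Sampling (rather than always taking the argmax) during training keeps several plausible knowledge sentences in play, which fits the one-to-many nature of the problem.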

Decoding with Copy Mechanism
Concatenate the BERT encodings of the inquiry and the selected knowledge, $H^t = [H^t_x; H^t_{k_s}]$, and feed the result into the decoder $p_\theta$, which can copy tokens from this input.
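A copy mechanism lets the decoder either generate a word from the vocabulary or copy a token from the concatenated input $H^t$. The sketch below uses a pointer-generator-style mixture, $p(w) = p_{gen} P_{vocab}(w) + (1 - p_{gen}) \sum_{i: w_i = w} a_i$. This exact form is my assumption for illustration, not necessarily the paper's formulation, and the gate `p_gen` and the copy attention weights `a_i` are taken as given.

```python
def copy_mixture(
    vocab_probs: dict[str, float],
    copy_attention: list[float],
    source_tokens: list[str],
    p_gen: float,
) -> dict[str, float]:
    """Mix the generation distribution with attention mass on source tokens."""
    mixed = {w: p_gen * p for w, p in vocab_probs.items()}
    for token, attn in zip(source_tokens, copy_attention):
        # Source tokens (from the inquiry or the selected knowledge) can be
        # copied even if they are missing from the generation vocabulary.
        mixed[token] = mixed.get(token, 0.0) + (1 - p_gen) * attn
    return mixed
```

For example, a rare entity name that appears only in the selected knowledge sentence can still receive probability mass through the copy term.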

At each step, pick $\arg\max_{w \in \nu} p_{t,n}(w)$ over the vocabulary $\nu$ as the next word, and continue until the EOS token is generated.
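The greedy decoding loop can be sketched as below. `step_fn` is a hypothetical callback that returns the next-word distribution $p_{t,n}(\cdot)$ as a dictionary, given the words generated so far; `max_len` is an added safety cap so the loop terminates even if EOS never wins.

```python
from typing import Callable

StepFn = Callable[[list[str]], dict[str, float]]


def greedy_decode(step_fn: StepFn, eos: str = "<eos>", max_len: int = 50) -> list[str]:
    """Pick argmax_w p_{t,n}(w) at every position n until EOS (or max_len)."""
    output: list[str] = []
    for _ in range(max_len):
        probs = step_fn(list(output))  # next-word distribution at position n
        word = max(probs, key=probs.get)  # argmax over the vocabulary
        if word == eos:
            break
        output.append(word)
    return output
```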

[Results and Conclusion]

SKT achieves state-of-the-art performance in knowledge selection and also performs better in response generation.

Other papers to read

About sentence encoding
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. Universal Sentence Encoder. arXiv:1803.11175, 2018.
Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation

About sequential latent variable models
Sequential Neural Models with Stochastic Layers
Latent Alignment and Variational Attention
A Recurrent Latent Variable Model

About posterior attention models
