[NLP] Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue Review

[Byeongchang Kim, Jaewoo Ahn, Gunhee Kim, Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue. In ICLR, 2020]

Published as a conference paper at ICLR 2020.


I had a hard time understanding the equations, but I will review the paper as best I can. I hope this works as another milestone for future studies. : )

[Abstract]
The paper introduces a better model for an NLP subproblem called knowledge selection. Briefly, knowledge selection arises when generating answers in a dialogue between an inquirer and an answerer: the task is not simple dialogue generation, but generation of the answers (not the inquiries) grounded on selected knowledge. The benchmark dataset is Wizard of Wikipedia (Dinan et al., 2019) from Facebook's ParlAI.

[Introduction]
The paper introduces the advantages of its new model, the Sequential Knowledge Transformer (SKT), in three aspects.

  1. Dealing with the diversity of knowledge selection in conversation (one-to-many relations).
  2. Better leveraging the response information by keeping track of the prior and posterior distributions over knowledge.
  3. Working even when the knowledge selection labels for previous dialogue turns are not available, because the latent model can infer the prior knowledge selections (this part is not explained thoroughly in the paper, or maybe I just missed it).

[2. Motivation]
[2.1 Baseline knowledge selection]
The baseline knowledge selection flow used in WoW (Dinan et al., 2019) was as follows.
1. Choose topics.
2. Given an apprentice's (inquirer's) utterance and the wizard's previous utterance, use an Information Retrieval system to retrieve the relevant knowledge (called the knowledge pool in the paper). From the retrieved knowledge, select a single relevant sentence and use it to generate an answer (knowledge selection and response generation; a rough sketch follows the list).
3. Repeat the conversation until a minimum number of turns is reached.
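
The following is a minimal sketch of this baseline pipeline. All of the names here (ir_system, selector, generator, and their methods) are hypothetical placeholders for illustration, not the actual ParlAI/WoW code.

```python
# Hypothetical sketch of the WoW baseline flow; objects and method names
# are placeholders, not the real ParlAI API.

def wizard_turn(apprentice_utterance, wizard_prev_utterance,
                ir_system, selector, generator):
    # 1. Retrieve the relevant knowledge pool with an IR system.
    knowledge_pool = ir_system.retrieve(apprentice_utterance, wizard_prev_utterance)
    # 2. Knowledge selection: pick a single relevant sentence from the pool.
    selected = selector.select(knowledge_pool, apprentice_utterance)
    # 3. Response generation grounded on the selected sentence.
    return generator.generate(apprentice_utterance, selected)
```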

This pipeline has its shortcomings in terms of the multimodality of the problem (as the paper refers to it): the relation between the generated response and the selected knowledge sentence is not restricted to a one-to-one mapping.

Therefore, the paper proposes a sequential latent variable to bring such diversity into the model, and also to sequentially “track the topic flow of knowledge in the multi-turn dialogue”.

[3. Approach]
This is the difficult part.

The important counters (indices) run over the dialogue turns, the words within a sentence, and the sentences in the knowledge pool.

Sentence Encoder
The paper uses BERT (Devlin et al., 2019) as a sentence encoder, turning $X^t$ into an embedding $h^t_x$ (using average pooling over time steps).
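
For reference, here is a minimal sketch of such a sentence encoder with the Hugging Face transformers library; the checkpoint name and the masked average pooling are my assumptions, since the paper only states that BERT with average pooling over time steps is used.

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumed checkpoint; the paper only says BERT is used as the sentence encoder.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def encode_sentence(text: str) -> torch.Tensor:
    """Return a sentence embedding h_x by average pooling over time steps."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state          # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1).float()  # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (1, 768)
```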

Sequential Knowledge Selection
The Sequential Knowledge Selection, which is the key idea of this paper, has two crucial distinctions from previous works: 1) it regards knowledge selection as a sequential decision process, and 2) it models the selection process with latent variables. This results in joint inference over multiple turns of knowledge selection and response generation.

Latent variable
The knowledge selection is treated as a latent indicator variable, with $q_\phi$ as its (approximate posterior) inference distribution; the latent variable in the model is distributed as $p_\theta(k^t \mid x^{\leq t}, y^{\leq t}, k^{<t})$.
The detailed derivation is in the Appendix of the paper.

Parameters: $p_\theta$, $\pi_\theta$, $q_\phi$ (the decoder, the prior distribution over knowledge, and the approximate posterior, respectively).
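
For orientation, the per-turn variational lower bound for this kind of sequential latent model takes roughly the following shape (a generic ELBO sketch written in the notation above, not the paper's exact equation; see the Appendix of the paper for the actual derivation):

```latex
\log p_\theta(y^t \mid x^{\le t}, y^{<t}) \;\ge\;
\mathbb{E}_{q_\phi(k^t \mid x^{\le t}, y^{\le t}, k^{<t})}
  \left[ \log p_\theta(y^t \mid x^{\le t}, y^{<t}, k^{\le t}) \right]
- D_{\mathrm{KL}}\!\left( q_\phi(k^t \mid x^{\le t}, y^{\le t}, k^{<t})
  \,\big\|\, \pi_\theta(k^t \mid x^{\le t}, y^{<t}, k^{<t}) \right)
```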

Usage of Attention mechanism
The paper uses an attention mechanism over the current knowledge pool $\{h^{k,l}_k\}$, conditioned on $h^{t-1,s}$, $d^{t-1}$, $d^t_{xy}$, and $h^t_x$.

Finally, the model samples the knowledge $k^t_s$ from the attention distribution in Eq. (6) and passes it to the decoder. At test time, Eq. (5) is used instead.
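
A rough sketch of this selection step is below. The construction of the prior and posterior queries from $h^{t-1,s}$, $d^{t-1}$, $d^t_{xy}$, and $h^t_x$ is simplified here into plain dot-product attention, and the variable names are mine, so treat it only as an illustration of sampling from the posterior (Eq. (6)) during training and falling back to the prior (Eq. (5)) at test time.

```python
import torch
import torch.nn.functional as F

def knowledge_attention(query, knowledge_embs):
    # query: (hidden,), knowledge_embs: (num_k, hidden)
    logits = knowledge_embs @ query      # dot-product attention scores
    return F.softmax(logits, dim=-1)     # distribution over the knowledge pool

def select_knowledge(prior_query, posterior_query, knowledge_embs, training):
    if training:
        # Training: sample k_s^t from the posterior attention distribution (cf. Eq. (6)).
        probs = knowledge_attention(posterior_query, knowledge_embs)
        idx = torch.distributions.Categorical(probs).sample()
    else:
        # Test time: the response is unknown, so the prior is used instead (cf. Eq. (5));
        # here the most probable sentence is taken.
        probs = knowledge_attention(prior_query, knowledge_embs)
        idx = probs.argmax()
    return knowledge_embs[idx], idx
```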

Decoding with Copy Mechanism
Concatenate the sentence embedding of the inquiry and the selected knowledge ($[H^t_x; H^t_{k_s}]$) and feed it into the decoder $p_\theta$.

Find $\operatorname{argmax}_{w \in \nu}\, p_{t,n}(w)$ as the next word to be generated, and continue until the EOS token pops out.
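
As a reference for the copy mechanism, below is a generic pointer-generator style sketch of how the word distribution $p_{t,n}(w)$ can mix generating from the vocabulary with copying source/knowledge tokens; this is an assumed standard formulation, not the paper's exact Transformer copy decoder.

```python
import torch

def mixed_word_distribution(p_vocab, copy_attn, src_token_ids, vocab_size, p_gen):
    # p_vocab:       (vocab_size,) decoder softmax over the vocabulary
    # copy_attn:     (src_len,)    attention weights over source/knowledge tokens
    # src_token_ids: (src_len,)    vocabulary ids of those tokens (LongTensor)
    # p_gen:         scalar in [0, 1], probability of generating vs. copying
    p_copy = torch.zeros(vocab_size).scatter_add(0, src_token_ids, copy_attn)
    return p_gen * p_vocab + (1.0 - p_gen) * p_copy

# Greedy decoding then repeatedly picks argmax_w p_{t,n}(w) from this mixture
# and stops once the EOS token is produced.
```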

[Results and Conclusion]

The SKT has achieved the state of the art in knowledge selection and has been shown to perform better in terms of response generation as well.

Other papers to read

About sentence encoding
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. Universal Sentence Encoder. arXiv:1803.11175, 2018.
Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation

About sequential latent variable models
Sequential Neural Models with Stochastic Layers
Latent Alignment and Variational Attention
A Recurrent Latent Variable Model for Sequential Data

About posterior attention models

