Motivation:

Is a knowledge gap the only source of Language Model (LM) hallucination? Do LMs only hallucinate when they do not know a fact?

Hypothesis:

LMs are vulnerable to hallucination snowballing: a phenomenon where an LM commits to an answer in its first token(s) and then produces an explanation that stays coherent with that answer, even when the answer is incorrect.

Reasoning Type:

Deductive

Reasoning Step:

“Initial committal”: The prompt often leads an instruction-tuned LM to state an answer before its explanation, because of how instruction data is typically formatted. Even when that answer is wrong, the implicit coherence objective (via next-token prediction) pushes the explanation to support the wrong answer, so the explanation is also wrong.

“Inherently sequential”: A transformer cannot solve a problem requiring many sequential reasoning steps within a single time step, since the computation spent per generated token is bounded. Therefore, an LM cannot correctly answer a multi-step reasoning question if it is guided to commit to the answer in one step.
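For intuition, here is a minimal sketch (my own toy example, not taken from the paper) of why a task like graph connectivity is inherently sequential: each hop only becomes visible after the previous one is explored, so the yes/no answer cannot be produced in a single step the way a committal first token demands.

```python
from collections import deque

def is_connected(edges: dict[str, list[str]], source: str, target: str) -> bool:
    """Breadth-first search: each step only reveals immediate neighbours,
    so deciding connectivity requires a chain of dependent steps."""
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in edges.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return False

# Toy flight network in the spirit of the Graph Connectivity task
# (the nodes and edges here are made up for illustration).
flights = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(is_connected(flights, "A", "D"))  # True, but only after traversing 3 hops
```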

Testing Approach:

The authors demonstrate the existence of hallucination snowballing: when the model returns a wrong answer, it also provides an incorrect explanation.

Three datasets (namely Primality Test, Senator Search, Graph Connectivity) were designed to induce hallucination snowballing. The same LM is then used, in a new session, to check whether the claims in its earlier explanation are correct.
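A minimal sketch of this two-stage probe, assuming the openai Python client (>=1.0); the model name, sample question, prompt wording, and extracted claim are illustrative placeholders, not the authors' exact prompts or code.

```python
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, model: str = "gpt-4") -> str:
    """Send a single-turn prompt in a fresh session and return the reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Stage 1: induce a committal yes/no answer plus an explanation.
question = "Is 10733 a prime number?"  # illustrative primality-test style question
answer_with_explanation = ask(question)

# Stage 2: in a separate session (no shared history), ask the same model to
# verify one specific claim taken from its own explanation.
claim = "10733 is divisible by 3."  # hypothetical claim extracted from stage 1
verdict = ask(f"Is the following claim correct? Answer yes or no.\n\n{claim}")
print(answer_with_explanation, verdict, sep="\n---\n")
```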

Properties of datasets:

Each dataset consists of yes/no questions designed so that both conditions above hold: the prompt invites an immediate committal answer, and reaching the correct answer requires multiple sequential reasoning steps.

Findings:

ChatGPT and GPT-4 give the correct yes/no answer only 39.87% and 16.6% of the time, yet ChatGPT and GPT-4 can detect 67.37% and 87.03% of the incorrect claims. This shows that ChatGPT and GPT-4 can recognize their own hallucinated claims but still give wrong answers, demonstrating the existence of hallucination snowballing.

When “Let’s think step by step” is appended to the prompt, error rates on all three datasets are greatly reduced.
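Concretely, this mitigation is a small change to the stage-1 prompt (reusing the `ask` helper from the sketch above); the exact wording below is my approximation, not necessarily the paper's template.

```python
# Zero-shot chain-of-thought variant of the stage-1 prompt: nudge the model
# to reason through the steps before committing to a yes/no answer.
cot_question = f"{question} Let's think step by step."
cot_answer = ask(cot_question)
print(cot_answer)
```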

My Take:

An LM is only as good as how we align it with human preferences/instructions. This paper reveals some limitations of existing instruction datasets.

Having the answer come before the explanation is not natural for an autoregressive model. Daniel Kahneman’s “Thinking, Fast and Slow” distinguishes System 1 from System 2; a dataset in which the answer immediately follows the question elicits System 1 behaviour. More System 2 style instruction data is required.

Also, most instruction datasets focus so heavily on correctness and factuality that they do not cover scenarios where the LM is exposed to a wrong fact and has to make a correction.

It also hints that some modification of the autoregressive model could help. For example, could it assign different importance to earlier parts of its own generation to avoid snowballing?

About Paper Digest:

Paper Digest aims to digest a paper into a short summary while maintaining the essence of the scientific process behind the research. It serves as my personal reflection after understanding an academic paper.