AI Q&A: Memory Networks vs Transformer's Self-Attention

⚡️Hudson Ⓜ️endes
9 min read · May 2, 2023

This document records a conversation with ChatGPT comparing Memory Networks (Weston et al., 2015) and the Transformer's Self-Attention (Vaswani et al., 2017).

Prompt: blog header banner, minimalist, artistic, futuristic, high-resolution, clear lighting, picturing a personified wise-looking AI teaching machine learning, in a room with modern zen architecture. by sachin teng and sergey kolesov and ruan jia and heng z. graffiti art, scifi, fantasy, hyper detailed. octane render. concept art. trending on artstation

Q1: How does GPT-4 use the attention mechanism to extract information from previous messages and apply it to its text completion? Illustrate with an example that performs mathematical reasoning.

GPT-4, like its predecessors, is a transformer-based architecture that utilizes the attention mechanism to understand and generate text by capturing dependencies between words and phrases. The attention mechanism allows the model to selectively focus on different parts of the input while making predictions, by assigning different weights to each input token.
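For reference, the weighting described here is the scaled dot-product attention introduced in Vaswani et al. (2017), which the rest of this conversation builds on:

Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) · V

where Q, K, and V are query, key, and value matrices derived from the input tokens, and d_k is the dimension of the key vectors.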

To illustrate how GPT-4 uses the attention mechanism for mathematical reasoning, let’s consider the following example:

User: “Calculate the sum of the first 10 even numbers.”
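For reference, the expected answer can be checked directly: the first 10 even numbers are 2 through 20, and 2 + 4 + … + 20 = 110. A one-line check in Python (purely to verify the arithmetic, not how GPT-4 computes it):

```python
# Verify the expected answer: sum of the first 10 even numbers
print(sum(range(2, 21, 2)))  # 110
```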

To complete this task, GPT-4 first tokenizes the input text into subwords or tokens. Then, it processes these tokens using a series of self-attention layers. In each self-attention layer, the model computes three different sets of vectors: query, key, and value vectors.
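Concretely, the query, key, and value vectors drive a weighted sum over the sequence. Below is a minimal, illustrative NumPy sketch of single-head self-attention; the variable names, dimensions, and random weights are assumptions for demonstration only and do not reflect GPT-4's actual (unpublished) parameters or multi-head layout.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) token embeddings; W_q/W_k/W_v: learned projections."""
    Q = X @ W_q                          # queries
    K = X @ W_k                          # keys
    V = X @ W_v                          # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise token relevance
    weights = softmax(scores, axis=-1)   # each row sums to 1 over the sequence
    return weights @ V                   # weighted mix of value vectors

# Toy usage: 4 tokens, embedding width 8 (arbitrary illustrative sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

Each row of `weights` says how strongly that token attends to every other token in the sequence when its output representation is formed, which is how information from earlier parts of the prompt flows into later predictions.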
