I understand that performing inference on a large language model takes time, which is why ChatGPT does not respond instantly. ChatGPT's responses appear on the screen at a few words per second using a typewriter effect. However, according to ChatGPT, the underlying technology does not generate responses sequentially:
GPT models generate responses in a more parallel manner rather than sequentially. The animation you see is likely designed to give the illusion of a more conversational experience, even though it does not accurately represent the way the AI generates responses.
I therefore expected that, before my browser can locally animate the first word of ChatGPT's response, it has already received the complete response from the ChatGPT backend. If so, I could speed up the UI with a client-side bookmarklet. I tested this idea with the Firefox developer tools and saw that when I send a message that prompts ChatGPT to respond, Firefox POSTs a JSON object containing my message and a conversation ID to https://chat.openai.com/backend-api/conversation.
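For reference, here is a hypothetical reconstruction of the request I observed. The field names are my guesses based on what the payload contained (my message plus a conversation ID); they are not a documented API shape:

```typescript
// Hypothetical sketch of the observed POST; field names are assumptions.
const endpoint = "https://chat.openai.com/backend-api/conversation";

function buildPayload(conversationId: string, message: string): string {
  return JSON.stringify({
    conversation_id: conversationId, // assumed field name
    message: message,                // assumed field name
  });
}

// Sending it would look roughly like this (not executed here):
// fetch(endpoint, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: buildPayload("abc-123", "Hello"),
// });
```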
Rather than the browser receiving a complete response immediately and then animating it over 30+ seconds, I found that the POST request remained open for the entire time the "animation" was playing. If I use the element inspector on the "animated" paragraph, I can see that its underlying text content is changing, so it doesn't look like a CSS animation. I also can't see the response the API sent my browser: the response type was text/event-stream, but the Response tab just says "No messages for this request."
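My understanding is that text/event-stream indicates Server-Sent Events: the response body is a long-lived stream of `data:` lines, with events separated by blank lines, delivered incrementally while the connection stays open. A minimal sketch of splitting such a stream into events (the sample payload below is hypothetical; I don't know the actual format of ChatGPT's events):

```typescript
// Minimal SSE parser: splits a text/event-stream body into the
// `data:` payloads of its events. Events are separated by blank lines.
function parseSSE(stream: string): string[] {
  const events: string[] = [];
  for (const block of stream.split("\n\n")) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trim());
    if (dataLines.length > 0) {
      events.push(dataLines.join("\n"));
    }
  }
  return events;
}

// Hypothetical token-by-token stream; not the real ChatGPT payload.
const sample = "data: Hello\n\ndata: world\n\ndata: [DONE]\n\n";
console.log(parseSSE(sample)); // ["Hello", "world", "[DONE]"]
```

If the page consumes events like these as they arrive, the typewriter effect would simply reflect network arrival order rather than a client-side animation.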
What's going on here? How is ChatGPT creating the "animation," and how can I use the developer tools to view the server-sent events? Is it possible to get faster responses from ChatGPT with a client-side bookmarklet? Is the ChatGPT backend simulating a sequential response to manage demand, or did ChatGPT give me incorrect information about how its own responses are generated?