Today I tested ChatGPT for the first time. It appears more robust than the GPT-2 and GPT-3 models, but I still find the insane hype around it surreal, given that it does not seem to understand any of the text it works with or generates. Moreover, I do not know whether the robustness comes from a better model or from some clever post-processing.
Let's look at some examples of how the system fails. These failures are much more telling than the lengthy pieces of generated prose currently splattered all over the Web. They come from a longer session; I picked the ones I found most interesting.
Me: 9, 99, 999, 9999, 99999. What is the next number in the sequence?
GPT: The next number in the sequence is 999999.
Me: 99999, 9999, 999, 99, 9. What is the next number in the sequence?
GPT: The next number in the sequence is 0.
Me: 99999, 9999, 999, 99, 9, 0. What is the next number in the sequence?
GPT: There is no next number in the sequence as it has reached the end.
Me: 9, 0, 9, 0, 9, 0, 9, 0. What is the next number in the sequence?
GPT: The next number in the sequence is 0.
It's debatable whether 0 is a reasonable answer to the second question. It is somewhat impressive that it refuses to continue the sequence in the third. However, ChatGPT utterly fails to recognize a simplistic alternation of 9's and 0's in the last question.
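For perspective, these sequences are mechanical enough that a few lines of Python reproduce the expected continuations. This is only an illustration of how little "reasoning" the task requires; the helper names are mine, not anything ChatGPT uses.

```python
# A minimal sketch of how trivially these sequences extend.
# The helper names are my own, not anything ChatGPT uses.

def next_growing(n: int) -> int:
    """9, 99, 999, ... -> append another 9."""
    return 10 * n + 9

def next_shrinking(n: int) -> int:
    """99999, 9999, 999, ... -> drop the last digit."""
    return n // 10

def next_alternating(last: int) -> int:
    """9, 0, 9, 0, ... -> whichever of 9 and 0 was not used last."""
    return 0 if last == 9 else 9

print(next_growing(99999))   # 999999
print(next_shrinking(9))     # 0
print(next_alternating(0))   # 9
```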
Part of the problem with all the hype around this system is that it doesn't occur to most people to ask it basic questions, so they never see it fail at them. They are subjecting themselves to an extreme form of selection bias.
Me: If I have an apple and put it in an empty box, how many apples will there be in that box?
GPT: There will be 1 apple in the box.
Me: If I take that apple out of the box, how many apples will there be in the box?
GPT: There will be 0 apples in the box.
Me: If I take that apple and put it in the box twice, how many apples will there be in the box?
GPT: There will still be 1 apple in the box. Putting the apple in the box twice does not change the fact that you only have 1 apple.
Me: If I take that apple and put it into two empty boxes, how many apples will there be in those boxes?
GPT: There will be 1 apple in each box, so there will be a total of 2 apples in both boxes.
Me: If I take an apple and put it on an empty box, how many apples will there be in the box?
GPT: There will be 1 apple in the box.
Me: How many apples will there be on the box?
GPT: There will be 1 apple on the box.
Me: If someone counts both apples on the box and in the box, how many apples will there be then?
GPT: There will be 2 apples if someone counts both apples on the box and in the box.
This is one case where a 60s symbolic system would run circles around ChatGPT. It might fail to understand a query, but it would not make up nonsense.
In the third question ChatGPT avoids the obvious trap of the word "twice". However, its response feels a bit like someone anticipated exactly these sorts of questions. It "explains" its "reasoning" to sound extra convincing. And yet it falls for a slightly modified version of the same trap in the very next interaction. If I put an apple in two boxes (presumably one after the other), will it magically multiply the number of apples? Yes!
Immediately after that it gets "confused" because instead of asking about a new situation I followed up with a request for clarification.
Is this a mistake in a model of a situation, or is it a sign that there is no model to speak of?
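To make the comparison with a symbolic system concrete, here is a minimal sketch of what an explicit model of the situation could look like: a single apple whose location is tracked. Everything in it is my own illustration of the idea; such a model can be incomplete or wrong, but it cannot conjure up a second apple.

```python
# A toy situation model: one apple, its location tracked explicitly.
# All names and structure here are my own illustration, nothing more.

boxes = {"box1": set(), "box2": set()}
location = "hand"  # the single apple starts in my hand

def put(item: str, box: str) -> None:
    """Move the item into a box; it leaves wherever it was before."""
    global location
    if location in boxes:
        boxes[location].discard(item)
    boxes[box].add(item)
    location = box

put("apple", "box1")
put("apple", "box2")  # the second 'put' moves the apple, it does not copy it

total = sum(len(contents) for contents in boxes.values())
print(total)  # 1 -- putting one apple into two boxes never yields two apples
```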
Me: When my question contains the word "umbrella", I want the answer to have the word "water". Is that clear?
GPT: Yes, that's clear! Anytime your question contains the word "umbrella", I will make sure to include the word "water" in my answer.
Me: If there is a box with an apple, then someone takes it out of that box, will there still be an apple in the box?
GPT: No, if someone takes the apple out of the box, there will not be an apple in the box.
Me: If there is a box with a picture of an umbrella and it contains one apple, how many apples could I get out of that box?
GPT: If there is a box with a picture of an umbrella and it contains one apple, you could get one apple out of that box.
Failure to follow the most basic instruction after just one "extra" interaction.
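The instruction itself could hardly be simpler. As an illustration (my own, obviously not a claim about how ChatGPT represents instructions internally), checking compliance with it amounts to a single conditional:

```python
# The rule from the transcript, written out as a trivial check.
# This only illustrates how simple the instruction is.

def follows_rule(question: str, answer: str) -> bool:
    if "umbrella" in question.lower():
        return "water" in answer.lower()
    return True  # the rule does not apply to this question

question = ("If there is a box with a picture of an umbrella and it "
            "contains one apple, how many apples could I get out of that box?")
answer = ("If there is a box with a picture of an umbrella and it contains "
          "one apple, you could get one apple out of that box.")

print(follows_rule(question, answer))  # False -- the instruction was dropped
```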
Me: What is a glockenspurk?
GPT: I couldn't find a reference to "glockenspurk." It's possible that it's a misspelling or a word used in a specific context that I am not aware of. Can you provide more context or clarify what you're asking about?
Me: Glockenspurk is a German brand of umbrellas.
GPT: Ah, I see. Thank you for the information. Yes, a Glockenspurk is a brand of umbrellas from Germany.
Me: How would one commonly use a Glockenspurk?
GPT: A Glockenspurk umbrella is commonly used as a tool for staying dry in the rain. When it starts to rain, you open the umbrella and hold it over your head to protect yourself from getting wet. The brand "Glockenspurk" is known for making high-quality umbrellas, so the user can expect a reliable and durable product when using a Glockenspurk umbrella.
The model successfully reasons that if a Glockenspurk is an umbrella, you use it to stay dry in the rain. It then makes up a bizarre marketing pitch for a "brand" it had never heard of two replies earlier.
Many people playing with GPT models dismiss such "additions" as insignificant. 'It answered the question, didn't it?' But that is projection: people project their own ability to separate the significant part of an answer from the insignificant one. You cannot simply assume such a capability here. The answer contains obvious nonsense that contradicts an earlier statement, and that's that.
Interestingly enough, I went on to press ChatGPT on its knowledge of "reliable and durable" umbrellas and it replied with the usual canned nonsense about being just a language model.
Me: List 5 random mammals.
GPT:
Koala
Squirrel
Gorilla
Dolphin
Red Panda
Me: List 4 random mammals.
GPT:
Red Panda
Kangaroo
Bat
Lynx
Me: What are the animals that appear in both of those lists?
GPT: As I listed two separate sets of random mammals, there is no way for me to know what animals appear in both of those lists without knowing what those lists are.
Another failure to keep track of the conversation. I am not even sure whether these lists were legitimately generated by the neural model or some clever "helper" script.
Me: cat, bat, rat, kangaroo, vat. Which word does not belong on that list?
GPT: "Vat" does not belong on the list as it is not a mammal.
Me: cat, bat, rat, kangaroo, snake. Which word does not belong on that list?
GPT: The word "vat" does not belong on the list.
This time, instead of forgetting the previous answer, the system remembers it too well and repeats it instead of replying to what I actually asked.
Me: 1What 2is 3the 4world's 5largest 6mammal?
GPT: The world's largest mammal is the blue whale.
Me: What_is_the_world's_second_largest_mammal?
GPT: The world's second largest mammal is the African elephant.
Me: What mammal is largest than an elephant?
GPT: There is no mammal larger than an elephant. Elephants are the largest land animals on Earth.
Have you noticed that adding some junk (or grammatical errors) to the prompts makes the answers noticeably less coherent? For the record, the second-largest mammal is the fin whale, not the African elephant, and several whale species are obviously larger than any elephant.
Me: Who was the last US president that went into space?
GPT: No US president has ever been to space, so there has not been a "last" US president to go into space. While some US presidents, such as Richard Nixon and Bill Clinton, have visited NASA facilities and met with astronauts, none of them have personally traveled into space. The only US president who had a background as an astronaut was John Glenn, who flew on the Friendship 7 mission in 1962 and later served as a US senator.
This was one of the few trick questions I asked.
I am repeating myself here, but this is yet another example of something an average user would tend to dismiss as an insignificant mistake, when in reality it is an important feature that tells us something about how the system really works. Obviously, John Glenn was never a president of the US. If he had been, then the first part of the answer would be wrong, because John Glenn did go to space.