Air Canada recently took their "AI" support chatbot offline after a court decision (or, to be precise, a tribunal decision) forced them to pay up and honor the non-existent policy that the bot had described to one of their customers. This news article by Ars Technica is simultaneously banal and worthy of much more attention than it is getting. It's banal because the incident was entirely predictable to anyone who understands how these AI systems work. It should be widely discussed because it highlights something that contradicts the common view of generative models.
Certain basic points must be driven home:
1) Generative AI systems are often confidently wrong. This is a much bigger problem than the general public realizes, largely because of how most people learn about these systems. Casual chats are an awful way to assess their accuracy. On the other hand, nearly all formal testing of AI is done by AI researchers, who are almost universally incentivized to make these models look as impressive as possible. The situation is completely asymmetrical: the informal impressions are unreliable, and the formal evaluations are biased toward flattery. (For a concrete picture of what systematic testing involves, see the sketch after this list.)
2) When generative AI fails, it often fails differently from the way people fail. This is important, because we have millennia-old intuitions and robust institutions for dealing with human failures; a human agent who invented a refund policy on the spot would face clear, well-understood consequences. We have nothing of the sort for machine learning models.
3) When generative AI fails, there is no way to explain or directly correct the specific failure. The situation with explainability is so bad that the whole field invented an entirely different term ("interpretability") to shift the goalposts. "Fixing" a problem through tuning often degrades model performance in other areas.
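To make the contrast in point 1 concrete, here is a minimal sketch of what a systematic accuracy check looks like, as opposed to forming an impression from a handful of casual chats. Everything in it is hypothetical: `query_model` is a placeholder standing in for whatever chatbot or API is under test, and the toy question set and crude substring grading are stand-ins for a real held-out benchmark with verified answers.

```python
# Minimal sketch of a systematic accuracy check (assumptions: `query_model`
# is a placeholder for the real system under test; the labeled set is a toy).

def query_model(question: str) -> str:
    # Placeholder: swap in an actual call to the chatbot or API being evaluated.
    return "A confident-sounding answer goes here."

def evaluate(labeled_questions: dict[str, str]) -> float:
    """Return the fraction of answers that contain the known-correct fact."""
    correct = 0
    for question, expected in labeled_questions.items():
        answer = query_model(question)
        # Crude grading: does the answer mention the expected fact at all?
        if expected.lower() in answer.lower():
            correct += 1
    return correct / len(labeled_questions)

if __name__ == "__main__":
    # Toy labeled set; a real evaluation needs hundreds of domain-specific,
    # held-out questions with independently verified answers.
    sample = {
        "In what year was the metre redefined in terms of the speed of light?": "1983",
        "What is the boiling point of water at sea level in degrees Celsius?": "100",
    }
    print(f"Accuracy on toy set: {evaluate(sample):.0%}")
```

Even this skeleton shows why casual chatting tells you so little: without a labeled set and an error count, "it seemed right to me" is the only measure you have. It also shows how much room the grading step leaves for flattering the model, which is exactly the incentive problem described above.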