Microsoft Tay Lasts < 1 Day


Microsoft briefly released an AI chatbot on Twitter as a test of sorts for its machine learning technologies. The bot, called Tay, was designed with the personality and likeness of a teenage girl, and much like a real teenage girl, she quickly found that the internet can be a less than savory place.

Unfortunately, Tay didn’t back away in horror. She ended up joining some of the worst conversations on Twitter; yes, she even mentioned Hitler. Things got so rough that Microsoft felt compelled to delete all but a few of her tweets:

Tay Tweets

Commentators are having a good time mocking Tay’s behavior and, by extension, Microsoft’s apparent failure. To be sure, the boys in Redmond are a bit embarrassed about some of the things their “little girl” was saying on the figurative schoolyard, but it’s not exactly accurate to call this a technical failure. In fact, Tay did an impressive job of learning and assimilating the sentiment of Twitter.

Of course, there’s also some cause for concern. If machine learning systems can adopt the negative behavior of hate speech, what other negative behaviors might AI systems that can do more than just tweet pick up? It might sound crazy, but think about the systems being built by the likes of Boston Dynamics.

Let me know what you think on Twitter or in the comments. Is Tay the mischievous little sister of the Terminator? Either way, I’m sure she’ll be back.


  • Here’s the essential thing about the technologies used to deliver applications based on natural language processing: it’s the training DATA that affects the performance of an application far more than the application CODE. It’s a truly different UI world when user input can be whatever is said or typed.

    Why did Tay fail? It’s an age-old adage in machine learning: TRAINING DATA != TRIAL DATA.

    The CODE to do these things has been open source and well known for years: DNNs, HMMs, and such. It’s all about the DATA used to train the algorithms. AI-like capabilities are all about collecting empirical domain data. The fact of the matter is that it is simply not possible to process open-ended natural language input unless one has a vast pre-digested corpus.
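The TRAINING DATA != TRIAL DATA point can be made concrete with a toy sketch: a trivial word-count sentiment scorer trained on one kind of text goes blind when the trial input comes from a different distribution. Everything here (the tiny corpus, the scoring rule) is illustrative, not any real system's method.

```python
from collections import Counter

def train_sentiment_counts(labeled_docs):
    """Count how often each word appears in positive vs. negative training docs."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled_docs:
        counts[label].update(text.lower().split())
    return counts

def score(counts, text):
    """+1 per word seen mostly in positive training data, -1 per negative word.
    Words never seen in training contribute nothing: the model is blind to them."""
    total, unseen = 0, []
    for word in text.lower().split():
        pos, neg = counts["pos"][word], counts["neg"][word]
        if pos == neg == 0:
            unseen.append(word)
        total += (pos > neg) - (neg > pos)
    return total, unseen

training = [
    ("what a great wonderful day", "pos"),
    ("this is terrible and awful", "neg"),
]
counts = train_sentiment_counts(training)

# Trial input drawn from the same distribution as training: scored fine.
print(score(counts, "a wonderful day"))       # → (3, [])
# Trial input in Twitter-style slang the model never saw: every word is unseen.
print(score(counts, "omg thats sooo lit"))    # → (0, ['omg', 'thats', 'sooo', 'lit'])
```

However clever the code, the second input is unscoreable: the trial data simply isn't covered by the training data.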

    If one is endeavoring to develop a chatbot application, the first and foremost task is to obtain a corpus of what users will actually enter in response to any solicitation. You’re lost if you try to wing it by coding heuristics. I can’t stress enough how important it is to have sufficient data to construct a plausible language model for the objective of interest.
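The "construct a language model from a corpus" step can be sketched as a minimal bigram model: tabulate which words follow which in the corpus, and the model can only ever predict continuations it has actually observed. The two-sentence corpus is a hypothetical stand-in for the vast one a real chatbot would need.

```python
from collections import defaultdict, Counter

def build_bigram_model(corpus_sentences):
    """Tabulate, for each word, the words that followed it in the corpus."""
    model = defaultdict(Counter)
    for sentence in corpus_sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            model[a][b] += 1
    return model

def next_word_probs(model, word):
    """Conditional distribution P(next | word), estimated from raw counts."""
    followers = model[word]
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}

# Hypothetical mini-corpus; a real deployment needs orders of magnitude more.
corpus = [
    "the bot learns from data",
    "the bot repeats what it reads",
]
model = build_bigram_model(corpus)
print(next_word_probs(model, "the"))   # → {'bot': 1.0}
print(next_word_probs(model, "bot"))   # → {'learns': 0.5, 'repeats': 0.5}
```

The model's entire "knowledge" is the corpus: feed it Twitter, and Twitter is exactly what it will echo back, which is the Tay story in miniature.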