Wednesday, June 6, 2018

The Importance of Training Data

In the movie Arrival, 12 alien ships visit Earth, landing at various locations around the world. Dr. Louise Banks, a linguist, is brought in to help make contact with the aliens who have landed in Montana.

With no experience with their language, Louise instead teaches the aliens, which they call heptapods, English while they teach her their own language. Other countries around the world follow suit. But China seems to view the heptapods with suspicion that turns into outright hostility later on.

Dr. Banks learns that China was using Mahjong to teach/communicate with the heptapods. She points out that using a game in this way changes the nature of the communication - everything becomes a competition with winners and loser. As they said in the movie, paraphrasing something said by Abraham Maslow, "If all I ever gave you was a hammer, everything is a nail."

Training material matters and can drastically affect the outcome. Just look at Norman, the psychopathic AI developed at MIT. As described in an article from BBC:
The psychopathic algorithm was created by a team at the Massachusetts Institute of Technology, as part of an experiment to see what training AI on data from "the dark corners of the net" would do to its world view.

The software was shown images of people dying in gruesome circumstances, culled from a group on the website Reddit.

Then the AI, which can interpret pictures and describe what it sees in text form, was shown inkblot drawings and asked what it saw in them.

Norman's view was unremittingly bleak - it saw dead bodies, blood and destruction in every image.

Alongside Norman, another AI was trained on more normal images of cats, birds and people.

It saw far more cheerful images in the same abstract blots.

The fact that Norman's responses were so much darker illustrates a harsh reality in the new world of machine learning, said Prof Iyad Rahwan, part of the three-person team from MIT's Media Lab which developed Norman.

"Data matters more than the algorithm."
This finding is especially important when you consider how AI has and can be used - for instance, in assessing risk of reoffending among parolees or determining which credit card applications to approve. If the training data is flawed, the outcome will be too. In fact, a book I read last year, Weapons of Math Destruction, discusses how decisions made by AI can reflect the biases of their creators. I highly recommend reading it!

1 comment:

  1. It’s been weird to see AI researchers conclude “sampling matters” as if it’s a new insight!