Tune in to Focus Carolina during morning, noon and evening drive times and on the weekends to hear stories from faculty members at UNC and find out what ignites their passion for their work. Focus Carolina is an exclusive program on 97.9 The Hill WCHL, sponsored by the University of North Carolina at Chapel Hill.

Dr. Mohit Bansal directs Carolina’s natural language processing lab. His team, from the computer science department in the College of Arts and Sciences, is building machines that both understand complex human language and generate it, allowing for more realistic interactions.

“I can define [language processing] in layman terms,” Dr. Bansal said, “how to build machines or artificial agents, intelligence agents that can both understand human language very naturally written or spoken as well as generate human like language.”

Dr. Bansal said it’s easy to introduce undergraduate students to language processing, because devices like Alexa, Google Home and Siri are examples of it. Those devices use both NLU and NLG.

“NLU would be natural language understanding because it has to understand what you said first,” Dr. Bansal said. “And then it also does NLG, which is natural language generation, because it generates a response to what you said.”

One question Dr. Bansal researches is whether these devices should be given a personality in how they interact with people.

“So then there was a debate on whether these sort of conversational agents should be sort of imitating politeness, rudeness, strictness, but there’s a sort of thin boundary between them becoming a therapist, which we don’t want.”

Dr. Bansal’s lab is also working on a version of CliffsNotes for mobile phones.

“This is also one of the main examples of natural language generation,” Dr. Bansal said. “So the idea of automatic document summarization would be to take a very, very long document or even several documents — which is known as multi document summarization — and being able to compress that information into maybe a hundred words so that maybe you could read it on your phone screen because you can’t scroll a hundred pages on your phone screen.”

To summarize a document automatically, a machine needs to handle several tasks.

First, it needs to make sure the information it pulls is relevant to the original document. Then, it needs to avoid repeating information in the summary. The machine also needs to capture the most important information from the document.

One trick that Dr. Bansal’s lab uses is to combine extractive and abstractive summarization.

“Extractive summarization is something which means just choose the sentences from the original document that should go into summary. But abstractive summarization tries to rewrite those sentences. It actually wants to use the space even more efficiently. So it tries to compress and rewrite sentences.”
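
For readers who want to see the extractive half of that idea in concrete terms, here is a minimal sketch in Python. It is not the method used in Dr. Bansal’s lab; it is a simple word-frequency heuristic, and every name in it (`extractive_summary`, `max_overlap`, the small stopword list) is a hypothetical choice made for illustration. It picks whole sentences from the original text, favors sentences whose words are common in the document as a rough stand-in for importance, and skips any sentence that overlaps too much with one already chosen. Abstractive summarization, which rewrites and compresses sentences, generally requires a trained text-generation model and is not shown here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "are", "have", "in", "is", "it", "of", "on", "the", "to"}


def split_sentences(text):
    """Naively split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Return the set of lowercase non-stopword tokens in a sentence."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Word-set overlap between 0.0 (disjoint) and 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extractive_summary(text, max_sentences=2, max_overlap=0.5):
    """Choose up to max_sentences original sentences, avoiding redundancy."""
    sentences = split_sentences(text)
    word_sets = [content_words(s) for s in sentences]

    # Count how many sentences each content word appears in.
    freq = Counter(w for ws in word_sets for w in ws)

    def score(i):
        # Importance proxy: average document frequency of the sentence's words.
        ws = word_sets[i]
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    # Highest score first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(i), i))

    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        if not word_sets[i]:
            continue
        # Skip sentences that largely repeat one already selected.
        if any(jaccard(word_sets[i], word_sets[j]) > max_overlap for j in chosen):
            continue
        chosen.append(i)

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Because the function only copies sentences and never rewrites them, the summary can be no shorter than the sentences it selects, which is the space limitation that abstractive rewriting is meant to address.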