Shalvi Mahajan - Natural Language Understanding and Inference

Automatic Summary

Natural Language Understanding and Inference: Pivotal Parts of Natural Language Processing

Throughout my professional journey as a data scientist at companies like Allianz and Samsung R&D, and currently at SAP in Germany, I have recognized the powerful impact that natural language processing (NLP) can have across various industries in real-time applications.

In today’s article, I am going to delve into two key sub-parts of NLP, namely Natural Language Understanding (NLU) and Natural Language Inference (NLI). I am also going to highlight their importance in conversational AI, machine translation, and bi-directional communication.

However, before I do that, let's quickly refresh our understanding of Natural Language Processing.

NLP, or Natural Language Processing, is a process whereby automatic computational analysis is deployed to understand human language. This innovative technology combines rule-based modeling of human language with statistical machine learning and deep learning models, facilitating various tasks such as speech recognition, part-of-speech tagging, named entity recognition, co-reference resolution, and machine translation to name a few.

Moving towards the core part. What exactly is Natural Language Understanding (NLU)?

NLU is a subset of NLP that uses the syntactic and semantic analysis of text and speech to ascertain the meaning of sentences. NLP at large converts text into structured data, while NLU focuses on the 'reading' aspect: grasping the intention behind the text. In short, NLU forms the core component of NLP.

What, then, is Natural Language Inference?

In the sphere of NLP, Natural Language Inference studies whether a hypothesis can be inferred from a premise when both are text sequences. For example, the hypothesis 'I am tired' can be inferred from the premise 'I do need sleep' (entailment), while 'I am wide awake' would contradict it.

How NLP Approaches Vary from Basic to Advanced

  • Rule-based NLP: This approach uses regular expressions and is based on hand-written rules (see the sketch after this list).
  • Statistical or Stochastic NLP: This is an optimization problem that aims to maximize the probability of output, given the input.
  • Deep learning and neural network methods: These involve recurrent neural networks (RNNs) and LSTMs.
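
To make the contrast concrete, here is a minimal sketch of the rule-based approach: a hand-written regular expression acts as the "model". The pattern and the text are illustrative only, not taken from the talk.

```python
import re

# Rule-based NLP: a hand-written regular expression is the "model".
# Illustrative pattern only; production systems need many more rules.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{2,4})\b")

text = "The invoice dated 12/03/2021 must be paid by 01-04-2021."
for match in DATE_PATTERN.finditer(text):
    day, month, year = match.groups()
    print(f"found date: day={day}, month={month}, year={year}")
```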

Natural Language Understanding in Conversational AI

Conversational AI encompasses technologies such as chatbots and voice assistants that users can interact with. NLU plays a key role in a chatbot conversation by detecting the intent and the sentiment of the user's words, along with extracting key entity attributes.
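
As a hedged illustration of intent detection, here is a minimal sketch using scikit-learn; the intents, utterances, and pipeline choice are my own assumptions for demonstration, not a production NLU stack.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data: known intents with example utterances (hypothetical).
utterances = [
    "i want to go on a vacation", "book me a trip to rome",
    "what is the weather today", "will it rain tomorrow",
]
intents = ["book_trip", "book_trip", "get_weather", "get_weather"]

# TF-IDF features plus a linear classifier stand in for the NLU intent model.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(utterances, intents)

# A new sentence is assigned to one of the known intents.
print(model.predict(["i would like to book a vacation"]))  # likely ['book_trip']
```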

Natural Language Inference & Machine Translation

In machine translation, computational language technology is used to convert one language to another while retaining the original meaning of the input and attempting to generate fluent text in the desired language. The logical relationships between text sequences are determined via natural language inference.

Conclusion

Being intrinsic parts of NLP, Natural Language Understanding and Inference have widespread applications across industries. They help convert the words in a text into vectors or embeddings for further processing and play a significant role in conversational AI. Furthermore, they help language-agnostic models map sentences from different languages into a high-dimensional space to gauge their semantic proximity. Understanding these technologies opens up a world of possibilities in the realm of language translation, interpretation, and communication as a whole.

Do reach out if you have more questions or would like to know more on this topic. I am here to help.


Video Transcription

So I'm Shalvi Mahajan, working as a data scientist at SAP SE Germany, currently located in Munich. Previously I worked at Allianz as a data scientist and at Samsung R&D as a software engineer. At SAP we are working on building and validating various ML use cases so that they can be adopted in various products across different industries. And the topic I'm going to deliver today is natural language understanding and inference, which are actually sub-parts of NLP. So today's agenda would be: a little bit on the topic background, so that someone who does not know the topic in detail can also learn what NLP actually is; NLP approaches from basic to advanced; some of the representation methods and transformation methods; natural language understanding, which is the main topic, and how it is useful in conversational AI; machine translation; natural language inference; and finally the conclusion. To start with, everyone knows what natural language is: it's the method of communication, which could be either speech or text.

NLP provides automatic computational processing of human language. Basically, it combines computational linguistics, the rule-based modeling of human language, with statistical machine learning and deep learning models. A few of the tasks that NLP actually does: speech recognition; POS tagging, that is part-of-speech tagging, which is the grammatical tagging of a text; named entity recognition, which is, for a word in a sentence, whether it is a time or a location or what entity it belongs to; and then co-reference resolution.

Co-reference resolution is like: if I say 'Shalvi is going to the market, she will be late', then 'she' here refers to Shalvi. That is co-reference resolution, and an NLP model should be capable of doing it. Then machine translation: everyone knows machine translation and is familiar with Google Translate and so on. And sentiment analysis, which tells, from a text or a speech, what sentiment a person wants to indicate or what intent they have. Next come the NLP approaches from basic to advanced. Initially we had rule-based NLP, which is regular expressions, something based on just rules. Text normalization is quite common in NLP, which is word tokenization, lemmatization, and stemming. So what exactly is tokenization? It is splitting a sentence into individual words or tokens; for example, in a long sentence you can then remove the stop words like 'is' and 'are' and just keep the useful words, so that we know the meaning of the sentence. Lemmatization is like having a base word: if I say 'she is doing better', then 'better' belongs to 'good', so 'good' is the lemma of 'better'. That's lemmatization. Stemming is like: say 'she is walking'.

From 'walking' I remove the 'ing', and 'walk' is the stem. That's stemming. So basically we just normalize the whole sentence or text. Minimum edit distance is there so that we can compare two sentences, how close they are, by calculating the distance between them, if that's the problem we have to solve; it is calculated using the Levenshtein distance. We could also talk about this distance in a bit more detail, but I think for now let's keep it like this; if you have any questions, feel free to ask in the chat or Q&A. Next comes statistical or stochastic NLP. It is an optimization problem which aims to maximize the probability of the output given the input, like what we do in vision approaches. Then there are the vector methods; I'll explain those in detail on an upcoming slide, but it's basically the bag-of-words and n-gram approach. Some of the embedding methods are word2vec and TF-IDF, which I'll explain. Deep learning and neural network methods are also there, like recurrent neural networks and LSTMs, but for now I think it's not important to touch on these topics.
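
Since the Levenshtein distance just came up, here is a minimal, self-contained sketch of the standard dynamic-programming way to compute it; the example strings are my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum edit distance between a and b: insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (free if equal)
        prev = curr
    return prev[-1]

print(levenshtein("walking", "walk"))    # 3: delete "i", "n", "g"
print(levenshtein("kitten", "sitting"))  # 3: the classic textbook example
```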

Yeah, just a second. Yeah. So the representation methods: bag of words is one of them. What is meant by a representation method? For example, if you have a corpus or a document with lots of text, how would you represent each and every word? One of the methods is bag of words. In this case, let's say we have lots of documents, with lots of vocabulary in each of them. Each row will represent a single observation, and each cell represents the count of the word represented by the column in that observation. So, for example, in document 0001 the word 'biology' appeared 12 times. That's how we count each and every word, after removing the stop words, doing the lemmatization and stemming, and so on. The article class here, for example, is a label: if we were to do some kind of classification or supervised problem, that label would help. So this is one way of representing our text. The second way is embedding matrices; an embedding can be made for each of the words present in the vocabulary.
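
As a quick sketch of that bag-of-words table, here is how it might look with scikit-learn's CountVectorizer; the two toy documents are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two toy "documents"; in the slide's example these would be rows of the table.
docs = [
    "biology studies living organisms and their biology",
    "physics studies matter and energy",
]

vectorizer = CountVectorizer(stop_words="english")  # drops stop words like "and"
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # vocabulary: one column per word (sklearn >= 1.0)
print(X.toarray())                         # rows = documents, cells = word counts
```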

And what exactly is an embedding? It is representing a word in a high-dimensional space; that's an embedding. This can also be represented in the form of a matrix. The transformation methods are what I talked about with embeddings; a few of them are mentioned on this slide just so that you can follow me through the topics on the next slide. What we do is map the words in the text to vectors of numbers. For example, we have TF-IDF, term frequency-inverse document frequency. What it does is basically count how many times the term appeared in that document, and the inverse document frequency is the logarithm of N, the total number of documents, divided by the number of documents containing that term: idf(t) = log(N / n_t). The TF-IDF score is the product of the two. On the right you can see what word vectors look like.
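
To make that formula concrete, here is a minimal sketch computing TF-IDF by hand on a toy corpus; real implementations typically add smoothing, which I omit here.

```python
import math

# Toy corpus: each document is already tokenized and normalized.
docs = [
    ["cat", "sat", "mat"],
    ["cat", "cat", "ran"],
    ["dog", "sat"],
]
N = len(docs)  # total number of documents

def tf_idf(term, doc):
    tf = doc.count(term)                # term frequency in this document
    n_t = sum(term in d for d in docs)  # number of documents containing the term
    return tf * math.log(N / n_t)       # tf * idf, with idf = log(N / n_t)

print(tf_idf("cat", docs[1]))  # 2 * log(3/2) ~ 0.81: frequent but common term
print(tf_idf("dog", docs[2]))  # 1 * log(3/1) ~ 1.10: rare term scores higher
```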

So basically 'king' will be represented somewhere like here, 'man' would be represented here, 'queen' here, and 'woman' here. That shows how different 'king' and 'queen' are from each other when we map them in a high-dimensional space. One-hot encoding is nothing but what I explained for the bag of words: it just records whether the word is present or not, that's all. If it's present, it marks it one, otherwise zero, so it's a kind of binary encoding. Other embedding methods are word2vec and fastText; they do similar things. This slide is just to explain the difference between the continuous bag-of-words and skip-gram variants, but it's not that relevant here. Other than that: since computing such embeddings takes a lot of time and is very expensive, doing it every time for every vocabulary and every corpus is difficult. So it's better to use transfer learning, where a model trained on one task is repurposed on a second related task; it can be reused and tested on that second task, and by this we can avoid training it again and again.
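
Going back to one-hot encoding for a second, here is a tiny sketch over a made-up four-word vocabulary; it also shows why, unlike learned embeddings, one-hot vectors carry no notion of similarity.

```python
import numpy as np

vocab = ["king", "queen", "man", "woman"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    # Binary vector: 1 at the word's position in the vocabulary, 0 elsewhere.
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

print(one_hot("queen"))                    # [0. 1. 0. 0.]
# Every pair of distinct one-hot vectors is orthogonal, so "king" is exactly as
# far from "queen" as from "woman"; learned embeddings fix precisely this.
print(one_hot("king") @ one_hot("queen"))  # 0.0
```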

Now comes the main gist of this session, which is natural language understanding. Natural language understanding is a subset of natural language processing which uses the syntactic and semantic analysis of text and speech to determine the meaning of sentences. So basically, NLP consists of natural language understanding and natural language generation. Natural language understanding is actually the core of NLP in that it covers the reading aspect: what NLP does is turn the text into structured data, while what NLU does is cover the reading aspect, the intent of the text. So I think it's the core component of NLP. For example, entity detection, to know what the words in a sentence refer to, like whether it is a place or a location, what entity it is; or topic classification, like whether an email is spam or not. NLU is also a very, very important part of conversational AI; you'll see that in a second. Natural language generation is just taking the structured data that you got during processing and converting it into output text. Natural language understanding basically establishes a relevant ontology, a data structure which specifies the relationships between words and phrases, so that we know the meaning. For example, with these two sentences, a human can tell the difference between the same word used in both.

For example: 'Alice is swimming against the current' and 'The current version of the report is in the folder'. In the first sentence 'current' is a noun, but in the second sentence it is used as an adjective. A human can make that distinction, and in the case of NLP, NLU focuses on this reading comprehension through grammar and context, which enables the model to determine the intended meaning of the sentence. That's a very, very useful part of an NLP model. As for conversational AI, it refers to the technologies everyone is familiar with, the chatbots and voice assistants which users can talk to, for example Apple TV, Alexa, Google Assistant, etcetera. So how does NLU actually help here? The main parts of a chatbot or a voice assistant are intent classification and entity extraction, and as I discussed on my previous slide, they are also the primary drivers of conversational AI. Intent classification is nothing but knowing the intent of a sentence. For example, an NLU model is fed with labeled data: a list of known intents and example sentences that correspond to these intents. Once we train this model, it should be able to classify a new sentence and decide which intent it should be assigned to.

It's a supervised learning example. And entity extraction is again a similar thing: recognizing the key pieces of information in a given text, like a time, a place, or the name of a person, just to get an idea and a feeling of the additional context and information related to an intent.
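
As a sketch of entity extraction, here is what it might look like with the open-source spaCy library, assuming its small English model is installed; the sentence is invented, and the exact entities found depend on the model.

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Book a flight to Munich for next Monday.")
for ent in doc.ents:
    # Each entity is a key piece of information: a place, a date, a person...
    print(ent.text, ent.label_)
# Typical output (labels depend on the model): Munich GPE / next Monday DATE
```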

And why is it important in conversational AI? It's important because, for example, if you see this example here, a person says 'I want to go on a vacation', and the chatbot says 'Fun, let's do it. Who's going?'. That reply, 'Fun, let's do it. Who's going?', tells us that the chatbot is able to recognize it's something related to a vacation; it's an intent that a good chatbot should be able to get a feeling of. Then the person says 'Just me', and the chatbot answers 'Perfect. Where would you like to go?'. This second part is basically entity extraction: the 'where' shows that a good chatbot should know it should next ask about the location. So, based on the intents from your NLU model, the current state of the conversation, and its trained model, the core component of a chatbot should be able to decide the next course of action, which could be sending a reply back to the user or taking an action like suggesting something, and so on.

One of the good examples right now is the chatbot Meena, which is actually the chatbot from Google. It's an end-to-end neural conversational model that learns to respond sensibly to a given conversational context; what I mean by 'sensibly' I'll discuss on the next slide. The training objective is just to minimize perplexity, which is to maximize the probability on an unseen test set; basically, perplexity is just the average branching factor in predicting the next word. And this is just an overview of the architecture: it's a Transformer sequence-to-sequence architecture with one Evolved Transformer encoder block and 13 Evolved Transformer decoder blocks. The Transformer is a sequence-to-sequence architecture where, overall, the encoder encodes the input into embeddings and the decoder decodes from those embeddings; the Evolved Transformer just has additional convolutional neural network layers at the entry and exit points.
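
Coming back to the training objective for a moment: here is a minimal sketch of how perplexity falls out of per-token probabilities; the probability values are invented for illustration.

```python
import math

def perplexity(token_probs):
    """exp(average negative log-probability the model assigned to each token)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Probabilities a model assigned to the successive tokens of a held-out sentence.
confident = [0.9, 0.8, 0.95, 0.85]  # low perplexity: few plausible next words
uncertain = [0.1, 0.2, 0.05, 0.1]   # high perplexity: large "branching factor"
print(perplexity(confident))  # ~1.15
print(perplexity(uncertain))  # ~10.0
```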

That's a bit more detail, but here the metric used to measure the performance of Meena was basically the Sensibleness and Specificity Average (SSA). And how did that come about? For example, a person says 'I love tennis' and the chatbot says 'That's nice'. It makes sense, yes, so the sensibleness is definitely good, but it is not specific; it's not specific to the context, it could be said in reply to any other comment. A good chatbot should say 'Me too, I can't get enough of Roger Federer', which is very, very specific. The plot on the left side basically compares humans and a variety of chatbots, including Meena; these chatbots are also open source. The sensibleness of a chatbot is the fraction of responses labelled as sensible, whether the reply makes sense or not, and the specificity is the fraction of responses that are marked as specific. The average of these two is the SSA score, and Meena actually performed quite well; it actually beats the current state of the art.

Before going into natural language inference, I just wanted to clarify a bit more about machine translation, because these topics are related. Machine translation is just converting one language into another, preserving the original meaning of the input and producing fluent text in the desired language; for example Google Translate, which everyone uses. Some of the types of machine translation: statistical models, which actually deal with huge volumes of bilingual content. It's very simple, something which Google Translate also used: just translating a corresponding word from the source language to the target language.

Rule-based machine translation just works with the help of regular expressions. In the example shown down below, the input sequence is converted to 'Is it hot?' using a statistical model, doing it word by word. A hybrid version would be a mix of the rule-based and the statistical one, and then there is the neural one, based on a neural network: it also uses a statistical model, but trains it with the help of a neural network. So, the second part of this session is natural language inference. Natural language inference is just studying whether a hypothesis can be inferred from a premise when both are text sequences. As you can see on the right, there are different languages written there, and in a high-dimensional space, if we map one sentence against another, we should know how close the two are, irrespective of the language. So it should be able to determine the logical relationship between a pair of text sequences, and such a relationship usually falls into three types. Entailment: yes, the hypothesis can be inferred from the premise. Contradiction: the negation

of the hypothesis can be inferred from the premise. And neutral is any other case, where neither of the two holds. So now, how do we actually do natural language inference? One of the ideas is using an attention mechanism. What it does, if we look on the left side: if the premise is 'I do need sleep' and the hypothesis is 'I am tired', the first step is to use alignment, or attention, which is a weighted average between the words in one text sequence and the words in the other text sequence, and to see, for instance, that 'I' is well aligned with 'I' and 'sleep' is well aligned with 'tired', but not the rest of the words. That's the first step. Then we have a set of words and their aligned words, so we can do the comparison. How does that work? It basically concatenates the words from one sequence and the aligned words from the other sequence and feeds them into a function. Then comes aggregation: aggregation, as the word suggests, is summing the outputs of these embeddings (because these are all embeddings), then concatenating them, and finally predicting; see the sketch below.
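
Here is a rough numpy sketch of those three steps; the token embeddings are random stand-ins, and the feed-forward networks the real model applies at each step are left out for brevity.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
premise = rng.normal(size=(4, 8))     # "i do need sleep" -> 4 token embeddings
hypothesis = rng.normal(size=(3, 8))  # "i am tired"      -> 3 token embeddings

# 1. Attend: pairwise alignment scores, then softmax-weighted averages so each
#    token gets a soft alignment from the other sequence.
scores = premise @ hypothesis.T              # shape (4, 3)
beta = softmax(scores, axis=1) @ hypothesis  # hypothesis aligned to premise
alpha = softmax(scores, axis=0).T @ premise  # premise aligned to hypothesis

# 2. Compare: concatenate each token with its aligned counterpart.
premise_cmp = np.concatenate([premise, beta], axis=1)
hypothesis_cmp = np.concatenate([hypothesis, alpha], axis=1)

# 3. Aggregate: sum each comparison set, concatenate, and hand the result to a
#    classifier that predicts entailment / contradiction / neutral.
features = np.concatenate([premise_cmp.sum(axis=0), hypothesis_cmp.sum(axis=0)])
print(features.shape)  # (32,): the input to the final prediction layer
```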

So, just a follow-up demo of what I did, basically on the Stanford Natural Language Inference (SNLI) dataset. Here is just a summary of how it would look. As you can see, in the forward part we have the premise and the hypothesis, we create the embeddings out of them, and then use the attend function, the comparison, and the aggregation. And here on the right side we can see what the training loss looks like: the training loss is decreasing, and it found its global minimum somewhere after epoch three. Finally, we create the predict function and predict, and based on the label, whether it's 0, 1, or neither, we say whether it is an entailment, a contradiction, or none of these. So finally, after training on a vocabulary, if I wrote 'he is good' and 'he is bad', it knows it's a contradiction. Basically, natural language inference helps in mapping sentences from multiple languages, which can be processed using language-agnostic multilingual models.

Here in this example it was just English, but we can do it for multiple languages, and it can analyze how closely they are related to each other in a high-dimensional space. So finally, a little bit on the language-agnostic manner. Facebook has made a library called LASER, where it does the same thing: it maps a sentence in any language to a point in a high-dimensional space, with the goal that the same statement in any language should end up in the same neighbourhood, as we've seen. This representation could be seen as a universal language in a semantic vector space. It also covers lots and lots of languages, and its accuracy is quite good. We have observed that with this library, the distance in that space correlates very well with the semantic closeness of the sentences. This is one example of a library for how to use natural language inference.
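
As a minimal sketch of measuring that closeness, here is cosine similarity between sentence embeddings; the vectors are random stand-ins for what an encoder like LASER would produce (its sentence embeddings happen to be 1024-dimensional).

```python
import numpy as np

def cosine_similarity(u, v):
    """Semantic closeness of two sentence embeddings in the shared space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(42)
english = rng.normal(size=1024)                  # "he is good" (stand-in vector)
german = english + 0.05 * rng.normal(size=1024)  # its translation: a nearby point
unrelated = rng.normal(size=1024)                # a semantically unrelated sentence

print(cosine_similarity(english, german))     # close to 1: same neighbourhood
print(cosine_similarity(english, unrelated))  # close to 0: far apart in meaning
```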

So, to summarize what we actually did in this presentation: the words in a text can be converted into vector and embedding form to process the language in a model. Natural language understanding and generation are both subtopics of NLP, and together they constitute NLP. NLU, at least, has a major contribution towards conversational AI; it's a core component. Language-agnostic models map sentences from different languages into a high-dimensional space to see their semantic closeness. And in NLI, the attention model consists of actually three steps to determine the logical relationship between a premise and a hypothesis: attending, comparing, and aggregating. So that's all. I'll now look into the questions; I hope my slides were able to convey the message.

OK, fine: 'I've seen some examples of Google translating sentences from a language with a gender-neutral pronoun to another language that uses he/she, and it would assume "she is cooking", "he is a doctor". Is that a problem with the training data, and could it be fixed by curating other pairs of sentences?' So yeah, that's basically a problem, because currently most of our datasets are biased, and the models are trained on those biased datasets; that's why it creates a problem. And that is something everyone is now focusing on: removing this gender bias from the training datasets and corpora. So hopefully we'll soon see it fixed; Google is also working on this Google Translate component so that the bias shouldn't be there. I don't see any more questions, just good comments; thanks a lot, everyone. 'What would you recommend to entry level?' NLU, I want to say, is not entirely a separate field; it's a part of NLP. So if you want to start with NLP, you can maybe look into some courses; on YouTube you can find some Stanford courses as well. And feel free to contact me.

We can have a discussion, a long discussion, on it. I'm on LinkedIn, and I'm happy to take questions there as well. As for skills: I think Python should be good enough, but other than that it's more of a conceptual thing, and the basics of machine learning would help. So again, Coursera or the Stanford videos would be good. Any other questions? OK, if there are no other questions, feel free to contact me on LinkedIn; I'm happy to take any other questions personally as well. Thanks a lot, everyone, for your attention, and stay safe and take care. Thanks.