Exploring State-of-the-Art NLP Models

Blessing Magabane
5 min read · Oct 30, 2020


Language is the cornerstone of every civilization.

Source: Blessing Magabane.

Author: Blessing Magabane

Natural Language Processing has seen a lot of development in recent years. It is by far the most exciting field in data science right now. This rapid change was driven by several factors, but it was the introduction of BERT (Bidirectional Encoder Representations from Transformers) in 2018 that truly revolutionized the field. BERT models are so advanced they seem like they are from the future.

In the past, conventional NLP models used transformers and other encoding techniques to translate text across languages, perform semantic analysis and classify text. Transformers are useful for those tasks: they encode the sequence, which is the sentence full of text, and the encoding process is linked to a decoding process. The two processes happen linearly, meaning the model moves from left to right or vice versa. This technique works to a certain degree, but the mere fact that it is unidirectional hinders it in the long run, since the model cannot fully incorporate context or any other parameter.

However, in the case of BERT the story is different: the model is bidirectional, so there is no strict direction to follow. The advantage of this feature is that the model gets to understand the context of the sequence. Like standard transformers, a BERT model can be used for classification, semantic analysis and question-answering. Question-answering is particularly useful for chatbots, and with the ever-increasing use of chatbots for customer service, BERT has never been more important.

In this article the focus is on question-answering: a BERT model will be used to answer COVID-19 frequently asked questions.

Datasets:

The COVID-19 FAQ data is obtained from the CDC; it consists of question and answer pairs. The link to the data is shown below,

In this article a pre-trained model will be used to answer some of the questions derived from the CDC. A question will be defined and a context provided, and the pre-trained model will use those two parameters to produce an answer.

BERT Architecture:

Below is the standard architecture of BERT.

Source : “https://medium.com/@shreyasikalra25/predict-movie-reviews-with-bert-88d8b79f5718”

The diagram above shows the BERT architecture; within it there is a transformer. The way the model works is quite complex, but below is a basic breakdown of the components.

The first component is Positional Encoding: a linear function is used to hold the positions of the words, and a sinusoid is incorporated in the positional encoding to improve the attention of the model. The next component is Masking. There are two types of masking under Multi-Head Attention: the padding mask, which has a fixed length, and the Look-Ahead mask, which is used for self-attention and decoding. Scaled Dot-Product Attention is another component, responsible for matrix multiplication and other matrix operations.
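
As a rough illustration of that sinusoidal positional encoding (a generic sketch of the standard Transformer formula, not code from this article), the encoding for a short sequence can be computed like this:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding: even dimensions use sine, odd use cosine,
    # so every position in the sequence gets a unique, smoothly varying vector.
    positions = np.arange(seq_len)[:, np.newaxis]            # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]                  # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

print(positional_encoding(seq_len=4, d_model=8).shape)        # (4, 8)
```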

Another component is Multi-Head Attention: it continues the matrix operations but introduces a new layer to the dimension. The final part of the BERT architecture is the Point-Wise Feed-Forward Network, which applies a transformation and a normalization. An integration of all these components is called a transformer.
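
To make the matrix operations concrete, here is a generic sketch of scaled dot-product attention with an optional mask (standard Transformer arithmetic, not code from this article):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with masked positions
    # pushed to a large negative score so they receive ~zero attention weight.
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)        # (batch, seq, seq)
    if mask is not None:
        scores = np.where(mask == 0, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v

q = k = v = np.random.randn(1, 5, 8)                           # (batch, seq_len, d_k)
print(scaled_dot_product_attention(q, k, v).shape)             # (1, 5, 8)
```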

Model:

The Hugging Face BERT will be used to perform the modeling. Within the BERT framework there are several models that can be used; an ideal model should be able to understand context and be in a position to answer. In this section bert-base-cased is used. Before we can use the bert-base-cased model, a benchmark model is used. Below is a screen print showing the benchmarking process,
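
As a rough sketch of what that benchmarking step might look like, assuming the default Hugging Face question-answering pipeline (the downloaded model, question and context below are illustrative assumptions, not the originals):

```python
from transformers import pipeline

# Default question-answering pipeline; with no model name given, Hugging Face
# loads its default SQuAD fine-tuned checkpoint as the benchmark.
qa_benchmark = pipeline("question-answering")

# Illustrative question and context in the spirit of the CDC FAQ.
context = ("COVID-19 spreads mainly through respiratory droplets produced when "
           "an infected person coughs, sneezes or talks.")
question = "How does COVID-19 spread?"

result = qa_benchmark(question=question, context=context)
print(result["answer"], result["score"])
```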

The answer to the question is shown below,

The model performs quite well on the first attempt, and the score it returns is also quite good.

In the next step we begin using the bert-base-cased model. It is important to note that there are other models out there; however, in this article bert-base-cased is used. Below is a screen print showing the tokenization process,
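
A minimal sketch of the tokenization step, assuming the bert-base-cased tokenizer from Hugging Face (the question and context below are illustrative), might look like this:

```python
from transformers import BertTokenizer

# Load the pre-trained bert-base-cased tokenizer.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

question = "What are the symptoms of COVID-19?"
context = "Common symptoms of COVID-19 include fever, cough and fatigue."

# Question and context are packed into one sequence:
# [CLS] question tokens [SEP] context tokens [SEP]
inputs = tokenizer(question, context, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
```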

Since the model is pre-trained there is no need for training; we jump straight to using it. The next step is defining the questions, along with the context.
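
The actual CDC questions and context are not reproduced here, but as an illustration they could be defined as plain Python objects:

```python
# Illustrative COVID-19 FAQ questions (the article draws its questions from the CDC page).
questions = [
    "How does COVID-19 spread?",
    "What are the symptoms of COVID-19?",
    "How can I protect myself from COVID-19?",
]

# A context passage the model reads in order to extract answers.
context = (
    "COVID-19 spreads mainly through respiratory droplets. Common symptoms "
    "include fever, cough and shortness of breath. You can protect yourself by "
    "washing your hands often, wearing a mask and keeping physical distance."
)
```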

In the final step, we evaluate the questions and context to get the answers,
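
A sketch of that evaluation loop, using the illustrative questions and context defined above and assuming Hugging Face's BertForQuestionAnswering head loaded on top of bert-base-cased (in practice a SQuAD fine-tuned checkpoint gives more sensible spans), could look like this:

```python
import torch
from transformers import BertForQuestionAnswering, BertTokenizer

model_name = "bert-base-cased"  # per the article; a SQuAD fine-tuned BERT is a common alternative
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name)

for question in questions:
    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Most likely start and end positions of the answer span in the sequence.
    start = torch.argmax(outputs.start_logits)
    end = torch.argmax(outputs.end_logits) + 1
    answer = tokenizer.decode(inputs["input_ids"][0][start:end])

# Only the value from the last loop iteration survives, so only the
# last question's answer is printed here.
print(answer)
```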

The BERT model answered the questions correctly, but we only see the answer to the last question asked. The reason we only see the last answer is the way the evaluation is defined. A slight modification can be made to show all the answers.
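
For example, the slight modification could simply be to print inside the loop (or collect the answers in a list) so that every question's answer appears:

```python
# Printing inside the loop shows the answer to every question, not just the last one.
for question in questions:
    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    start = torch.argmax(outputs.start_logits)
    end = torch.argmax(outputs.end_logits) + 1
    print(question, "->", tokenizer.decode(inputs["input_ids"][0][start:end]))
```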

Conclusion:

The answers from the BERT model are impressive; it is without doubt that BERT is one of the best NLP frameworks right now. To improve the readability of the answers, the special characters need to be removed. Perhaps in a follow-up article this process can be included; the idea with this article was to show how easy it is to use BERT.

The code used in this article can be accessed through the following link,

You can follow and contact me on the following platforms,

Twitter: @blessing3ke

LinkedIn: https://www.linkedin.com/in/blessing-magabane


Blessing Magabane

A full-stack data scientist with experience in data engineering and business intelligence.