Sequence annotation, or sequence labelling, is one of the most fascinating topics in machine learning today, and it is usually treated as a problem in its own right. Sequence labelling is a pattern recognition task that entails algorithmically assigning a categorical label to each element of a sequence of observed values.
Sequence labelling can be put into practice using conventional techniques like HMMs and CRFs. Both approaches learn to predict the most likely label sequence for a given input sequence. These are effective techniques, but their adoption has been limited by a few flaws, such as a lack of semantic awareness and an inability to handle longer sequential dependencies. Deep learning techniques like recurrent neural networks address these flaws: they capture local dependencies while also discovering longer-range patterns.
The Google search engine is one example of a real-world use of sequence labelling. When we enter some words in the search box, Google automatically recommends certain phrases or words to complete the query, which simplifies our task.
To overcome the flaws of conventional techniques, we apply modern deep learning techniques like bi-directional LSTMs, simple RNNs, and 1D CNNs to extract the meaning of the sequences and label them.
In this blog, we will take a deep dive into the deep learning algorithms used in sequence labelling, their advantages, and real-world applications of such algorithms.
Table of Contents:
- Conventional methods vs deep learning methods to label sequences
- Deep learning algorithms to label sequences
- Sequence labelling process using deep learning models
- Real-world applications of sequence labelling
- Conclusion
- Conventional methods vs deep learning methods to label sequences
Over the past ten years, one of the key objectives in natural language processing has been sequence tagging. The primary goal of NLP is to transform human language into a formal representation that can be easily manipulated by computers.
Linear statistical models like HMMs and CRFs are the most common sequence labelling models and have demonstrated good performance; nevertheless, these models rely heavily on specialised task resources and hand-crafted features, and the accuracy we obtain with them is often insufficient. We can achieve superior outcomes by adopting deep learning techniques instead. Therefore, to increase accuracy and overcome the shortcomings of the existing methodologies, we train and test our datasets using deep learning techniques.
The major difference between conventional methods and deep learning methods is that deep learning techniques like RNNs process the sequence token by token while memorising previous tokens, whereas conventional methods like HMMs rely on strong independence assumptions between observations.
Conventional models that are used in sequence labelling:
- HMM (Hidden Markov Model):
The HMM is a generative model that assigns a joint probability to the sequence of labels and the sequence of observations. Its parameters are trained to maximise the joint likelihood of the training set.
- CRF (Conditional Random Field):
This is a statistical model that is widely used for pattern recognition and falls under the umbrella of sequence modelling. It is an undirected probabilistic graphical model; unlike the HMM, it models the conditional probability of the labels given the observations directly (the standard factorisations of both models are sketched after this list).
- SVM (Support Vector Machine):
This method is one of the traditional approaches. It separates the data using a hyperplane, and the hyperplane may vary based on the data.
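To make the contrast between the first two models concrete, here is a sketch of their standard factorisations, where x is the observation sequence, y the label sequence, Z(x) a normalisation term, and the f_k are weighted feature functions:

```latex
% HMM: generative; models the joint probability of labels and observations
P(x, y) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}) \, P(x_t \mid y_t)

% Linear-chain CRF: discriminative; models the conditional probability directly,
% through weighted feature functions over adjacent labels and the input
P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t) \Big)
```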
Modern Deep learning methods:
- Long short-term memory (LSTM)
- Bi-directional LSTM-CNN
- 1D Convolutional Neural Network (CNN)
- Simple Recurrent Neural Network (RNN)
- Deep learning algorithms to label sequences
In the previous section, we listed deep learning techniques that can be used in sequence labelling. In this section, let's learn about them in detail.
- Long short-term memory (LSTM):
The LSTM algorithm was developed by Hochreiter and Schmidhuber in 1997. An LSTM carries additional data flow along the time steps; at each step, this additional information is combined with the input connection and the recurrent connection, and the result affects the state being sent to the next time step.
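As a rough illustration, here is a minimal Keras sketch of an LSTM used for sequence labelling; vocab_size, num_labels, and the layer sizes are hypothetical placeholders, not values from any particular dataset:

```python
# A minimal sketch of an LSTM sequence labeller in Keras (TensorFlow).
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, num_labels = 10_000, 9            # hypothetical placeholder sizes

inputs = keras.Input(shape=(None,))           # integer-encoded token sequences
x = layers.Embedding(vocab_size, 128)(inputs) # token index -> dense vector
x = layers.LSTM(64, return_sequences=True)(x) # keep one output per timestep
outputs = layers.Dense(num_labels, activation="softmax")(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Setting return_sequences=True is the key choice for labelling: it makes the LSTM emit an output at every timestep rather than only at the end of the sequence.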
- Bi-Directional LSTM-CNN:
This approach combines two ideas: the bi-directional recurrent neural network (Bi-RNN) and the LSTM. The Bi-LSTM is an improvement on, or a unique variation of, the artificial neural network (ANN). It was developed because the earlier methods were ineffective on longer sequences of data, and a plain RNN also cannot handle such long series, so the Bi-LSTM addresses both of these drawbacks: it is an improvement over the RNN since it can handle longer sequences of data. A Bi-LSTM is made up of three layers: the input layer, the hidden layer, and the output layer. A minimal sketch follows.
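Here is a minimal Keras sketch of a Bi-LSTM sequence labeller, again with hypothetical placeholder sizes:

```python
# A minimal sketch of a Bi-LSTM sequence labeller in Keras.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, num_labels = 10_000, 9

inputs = keras.Input(shape=(None,))            # integer-encoded token sequences
x = layers.Embedding(vocab_size, 128)(inputs)
# Two LSTMs run over the sequence, one forward and one backward;
# their per-timestep outputs are concatenated.
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
outputs = layers.Dense(num_labels, activation="softmax")(x)
model = keras.Model(inputs, outputs)
```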
- 1D Convolutional Neural Network:
A 1-dimensional CNN extracts local 1D patches of timestep vectors and recognises local patterns in a sequence. Because the same transformation is applied to every patch, a pattern learned at one position can be recognised at any other position in the sequence.
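A hedged Keras sketch of this idea, with hypothetical placeholder sizes, might look like the following; padding="same" keeps one output per timestep, which is what sequence labelling needs:

```python
# A minimal sketch of a 1D CNN sequence labeller in Keras. The same filter
# weights slide over every window of timesteps, which is what makes a pattern
# learned at one position recognisable anywhere else in the sequence.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, num_labels = 10_000, 9   # hypothetical placeholder sizes

inputs = keras.Input(shape=(None,))
x = layers.Embedding(vocab_size, 128)(inputs)
x = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu")(x)  # patches of 5 timesteps
outputs = layers.Dense(num_labels, activation="softmax")(x)                 # one label distribution per timestep
model = keras.Model(inputs, outputs)
```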
- Simple RNN:
An RNN simply processes a sequence by iterating through the sequence elements and maintaining a state containing information relative to what it has seen so far.
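The recurrence itself is simple enough to write out directly. Here is a bare NumPy sketch (toy random data, hypothetical sizes) showing the state being updated at every timestep from the current input and the previous state:

```python
# The core SimpleRNN recurrence: state_t = tanh(W.x_t + U.state_{t-1} + b)
import numpy as np

timesteps, input_dim, state_dim = 20, 32, 64
inputs = np.random.random((timesteps, input_dim))   # toy input sequence

W = np.random.random((state_dim, input_dim))        # input weights
U = np.random.random((state_dim, state_dim))        # recurrent weights
b = np.random.random((state_dim,))                  # bias

state = np.zeros((state_dim,))                      # initial state
outputs = []
for x_t in inputs:                                  # iterate over timesteps
    state = np.tanh(np.dot(W, x_t) + np.dot(U, state) + b)
    outputs.append(state)                           # one output per timestep
```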
- Sequence labelling process using deep learning models
In sequence labelling, the input to a deep learning model goes through the following steps (a code sketch of the whole pipeline follows the list):
- It starts with an embedding layer that looks up each word (given as a one-hot index) in the embedding space, turns the word into a dense vector, and feeds it into a multi-layer Bidirectional LSTM model.
- The Bidirectional LSTM is actually two separate neural network layers: one feeds the data from the beginning to the end, and the other from the end to the beginning. The outputs of both layers are joined to create a better representation of the context.
- After the Bidirectional LSTM layers, a ‘softmax’ layer computes the probabilities of the possible classes for each input token.
- To achieve better consistency in the predicted label sequence, the ‘softmax’ probabilities are combined with the transition probabilities from a linear-chain CRF layer. In other words, instead of predicting each label independently, the CRF layer considers the labels of surrounding words as well.
- In the last layer, a prediction is made by combining the scores from the ‘softmax’ layer and the CRF layer and decoding the most likely label sequence.
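Putting the steps together, here is a hedged Keras sketch of the pipeline described above, with hypothetical placeholder sizes. Plain Keras ships no built-in CRF layer, so the CRF step is only indicated by a comment; in practice it would come from an add-on package:

```python
# A sketch of the embedding -> Bi-LSTM -> softmax (-> CRF) labelling pipeline.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim, num_labels = 10_000, 128, 9   # hypothetical placeholder sizes

inputs = keras.Input(shape=(None,))                                   # integer-encoded tokens
x = layers.Embedding(vocab_size, embed_dim, mask_zero=True)(inputs)   # word index -> dense vector
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)   # forward + backward context
outputs = layers.Dense(num_labels, activation="softmax")(x)           # per-token class probabilities
# A linear-chain CRF layer would sit here, combining these scores with learned
# transition probabilities and decoding the best label sequence (e.g. with the
# Viterbi algorithm). Plain Keras has no built-in CRF layer; it typically
# comes from an add-on package.

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```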
- Real-world applications of sequence labelling
- Sentiment analysis of sequences
- Document classification
- News category labelling
- Information extraction
- Word sense disambiguation
- Conclusion
In this blog, we learned about sequence labelling using traditional approaches and modern deep learning methods, which extract information from sequential data and predict labels from the learned representations.
We also saw the steps to follow to predict labels using deep learning methods, as well as the applications of sequence labelling.
References:
[1] François Chollet, Deep Learning with Python, Chapter 6: “Deep learning for text and sequences”
[2] https://www.warse.org/IJATCSE/static/pdf/file/ijatcse150862019.pdf