NLP: Text Classification using Keras

March 5, 2020


Natural language processing covers many different areas of work, but one task is fundamental to it: text classification. It involves assigning a text to one of several predefined categories; for example, the text “basketball” can be assigned to the “sports” category. Once classified, documents become easy to manage and sort.

Text classification is essentially a way of structuring unstructured data. It can organize useful information that arrives in many forms: emails, documents, web pages, chat conversations, social media messages, and reviews on various issues or trends. The data can be classified either by humans or with NLP. Manual classification is both time-consuming and laborious; with machine learning, the task can be performed efficiently.


The text classifier is a combination of three components:


Dataset:

The larger the dataset, the better trained the model will be. As a rule of thumb, if you have 800 categories you should provide at least 100 samples for each category.

Many datasets are freely available through open-source distribution. For example, the IMDB dataset provides movie reviews labeled as positive or negative.


Data preprocessing:

A dataset contains meaningful text along with stopwords, misspellings, and slang, and this noise has to be filtered out. Initially all the words in the dataset are treated equally, but during preprocessing each word is assigned a weight, based for example on how important it is or on how many times it occurs in the document. Common techniques for this include bag-of-words (BOW), TF-IDF, and n-gram models, among others.
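To make the idea of word weights concrete, here is a minimal sketch of one common TF-IDF variant in plain Python. The toy documents, the stopword list, and the function names are assumptions made for illustration; real projects usually rely on a library implementation, and the exact weighting formula differs between libraries.

```python
import math
from collections import Counter

# Hypothetical toy corpus and stopword list, for illustration only.
DOCS = [
    "the team won the basketball game",
    "the election results are in",
    "the basketball season starts today",
]
STOPWORDS = {"the", "are", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def tf_idf(docs):
    """Return one {word: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency times the log of the inverse document frequency.
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


weights = tf_idf(DOCS)
for doc, w in zip(DOCS, weights):
    print(doc, "->", w)
```

Words that occur in fewer documents (such as "team") receive a higher weight than words shared across documents (such as "basketball"), and stopwords are dropped entirely.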

Classification algorithm and strategy:

Several algorithms are available, and you can choose among them according to your requirements. Naive Bayes, support vector machines (SVM), decision trees, random forests, k-nearest neighbours, and deep learning are some examples.


Sentiment Analysis-

Its goal is to identify the polarity of the content, that is, whether a text is positive or negative. The result can also be expressed in binary form, such as like or dislike ratings for movies, brands, or reviews on current affairs.

Topic labeling-

It identifies what a piece of content is about. This analysis can be used to sort customer feedback by topic or to organize news articles according to their subjects.

Language detection-

It classifies a text according to the language it is written in, which is useful for routing purposes.

Why is it important?

According to research, about 80% of data is unstructured. It may contain important information, but noise, slang, and sheer volume make that information hard to use. Text classifiers remove unwanted items from the text, categorize it, and make the data useful.

  • A text classifier improves scalability.
  • In many situations results are needed instantly; text classifiers provide real-time analysis, which helps to perform difficult operations or extract important information immediately.
  • Manual classification is prone to mistakes, whether from limited understanding or knowledge, distraction, boredom, or anything else.

Use Cases:

Text classification can be used for various purposes, including:

  • Social media monitoring
  • Brand monitoring
  • Customer service
  • Voice of customer


There are six steps involved in the workflow process:

  1. Gather data:

As mentioned earlier, you can get data through open-source distribution channels. Below are some sources of datasets. Here, we will use the Reuters dataset.

Topic classification:

  • Reuters news datasets

It is composed of 11,228 newswires from Reuters, labeled over 46 different topics. We load this dataset from Keras. It is returned as two tuples, one for training and one for testing; each tuple holds the feature data (the newswires) and the corresponding labels.

You can import the Reuters dataset from Keras-
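A minimal sketch of the loading step, assuming TensorFlow is installed and the dataset can be downloaded on first use. It relies on `reuters.load_data` from `tensorflow.keras.datasets`; the wrapper name `load_reuters` and the `num_words=10000` vocabulary limit are illustrative choices, not taken from the original article.

```python
def load_reuters(num_words=10000, test_split=0.2):
    """Load the Reuters newswire dataset that ships with Keras.

    TensorFlow is imported inside the function so that this snippet can be
    defined without it; the first call downloads the data.
    """
    from tensorflow.keras.datasets import reuters

    # Two tuples come back: (features, labels) for training and for testing.
    # Each newswire is a list of integer word indices; each label is a topic id.
    (x_train, y_train), (x_test, y_test) = reuters.load_data(
        num_words=num_words, test_split=test_split
    )
    return (x_train, y_train), (x_test, y_test)
```

Calling `(x_train, y_train), (x_test, y_test) = load_reuters()` keeps only the 10,000 most frequent words. With the default 20% test split, the 11,228 newswires are divided into 8,982 training and 2,246 test samples.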

  • 20 NewsGroups-  

It is another dataset, consisting of approximately 20,000 documents across 20 different topics.

Sentiment Analysis:

For sentiment analysis, there are various sources of datasets such as-

  • Amazon Product Reviews
  • IMDB Reviews
  • Twitter Airline Sentiment

  2. Explore dataset:

You need to load your dataset from its source location. We will load the Reuters dataset.

Next, load the training and testing data. In this example, we use Python's len function to count the samples, and we count the number of classes present in the dataset.
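The exploration step can be sketched as a small helper. The function name `explore` and the stand-in data are hypothetical; the stand-in has the same shape as the Reuters data from Keras (each sample is a list of word indices, each label an integer topic id), so the same call should work on the real training split.

```python
from statistics import median


def explore(samples, labels):
    """Summarize a dataset of tokenized samples and integer class labels."""
    return {
        "num_samples": len(samples),
        "num_classes": len(set(labels)),
        "median_words_per_sample": median(len(s) for s in samples),
    }


# Hypothetical stand-in data, not real Reuters newswires.
x_train = [[1, 5, 8, 2], [1, 9], [1, 4, 4, 7, 3, 6]]
y_train = [3, 4, 3]

stats = explore(x_train, y_train)
print(stats)
```

The number of samples and the number of words per sample are also the two quantities used when choosing a model.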

  3. Choose a model:

Next, we have to choose a model. The algorithm below describes how to make that choice.

Algorithm for data preparation and data modeling

1. Calculate the number of samples/number of words per sample ratio.

2. If this ratio is less than 1500, tokenize the text as n-grams and use a simple multi-layer perceptron (MLP) model to classify them (left branch in the flowchart below):

  a. Split the samples into word n-grams; convert the n-grams into vectors.

  b. Score the importance of the vectors and then select the top 20K using the scores.

  c. Build an MLP model.

3. If the ratio is greater than 1500, tokenize the text as sequences and use a sepCNN model to classify them (right branch in the flowchart below):

  a. Split the samples into words; select the top 20K words based on their frequency.

  b. Convert the samples into word sequence vectors.

  c. If the original number of samples/number of words per sample ratio is less than 15K, using a fine-tuned pre-trained embedding with the sepCNN model will likely provide the best results.

4. Measure the model performance with different hyperparameter values to find the best model configuration for the dataset.
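The decision rule in steps 1 to 3 can be written as a short function. This is only a sketch: the function name and the returned dictionary are hypothetical, and because the text does not say what to do when the ratio is exactly 1500, this version assumes that case goes to the sequence branch.

```python
def choose_model(num_samples, words_per_sample):
    """Pick a tokenization scheme and model family from the samples/words ratio."""
    ratio = num_samples / words_per_sample
    if ratio < 1500:
        # Few samples relative to their length: n-grams with an MLP.
        return {"ratio": ratio, "tokenization": "n-gram", "model": "MLP"}
    # Otherwise: word sequences with a sepCNN.
    return {
        "ratio": ratio,
        "tokenization": "sequence",
        "model": "sepCNN",
        # Below 15K, a fine-tuned pre-trained embedding is suggested.
        "pretrained_embedding": ratio < 15000,
    }


print(choose_model(num_samples=9000, words_per_sample=100))
print(choose_model(num_samples=200000, words_per_sample=50))
```

The first call has a ratio of 90 and selects the n-gram/MLP branch; the second has a ratio of 4000 and selects the sequence/sepCNN branch with a pre-trained embedding.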

Let's explain this with a flow chart:

The flow chart above shows the choices that can be made at each step and the options available:

Yellow box - depicts data and model preparation processes.

Grey box - indicates choices that can be considered for each process.

Green box - indicates the recommended choice for each process.

  4. Prepare dataset:

In this step, you clean the text: normalize capitalization, strip extra spaces, handle slang, and eliminate redundancy.

Tokenization- this means breaking the sentences down into words. You can do so by importing a tokenizer, and you should apply the same tokenization to both the training and the test data.
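As an illustration of what a tokenizer does, the sketch below builds a vocabulary from the training texts and encodes every text as a 0/1 vector marking which vocabulary words occur. It is written in plain Python so that each step is visible; the function names, the toy sentences, and the `MAX_WORDS` value are assumptions, and in practice a library tokenizer would normally replace this code.

```python
from collections import Counter


def build_vocab(texts, max_words):
    """Map the most frequent words to integer indices, starting at 1."""
    counts = Counter(w for t in texts for w in t.lower().split())
    most_common = counts.most_common(max_words - 1)
    return {word: i for i, (word, _) in enumerate(most_common, start=1)}


def to_multi_hot(texts, vocab, max_words):
    """Encode each text as a 0/1 vector over the vocabulary."""
    rows = []
    for text in texts:
        row = [0] * max_words
        for word in text.lower().split():
            if word in vocab:  # words outside the vocabulary are dropped
                row[vocab[word]] = 1
        rows.append(row)
    return rows


# Hypothetical toy data, for illustration only.
train_texts = ["Oil prices rise", "Wheat exports fall", "Oil exports rise"]
test_texts = ["Oil prices fall sharply"]

MAX_WORDS = 8
vocab = build_vocab(train_texts, MAX_WORDS)  # fitted on the training data only
x_train = to_multi_hot(train_texts, vocab, MAX_WORDS)
x_test = to_multi_hot(test_texts, vocab, MAX_WORDS)
print(x_train)
print(x_test)
```

The vocabulary is built from the training texts and then reused for the test texts, so both sets are encoded the same way and unseen test words such as "sharply" are ignored.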

  5. Build, train, and evaluate your model:

Building machine learning models with Keras is all about assembling together layers, data-processing building blocks, much like we would assemble Lego bricks. These layers allow us to specify the sequence of transformations we want to perform on our input. As our learning algorithm takes in a single text input and outputs a single classification, we can create a linear stack of layers using the Sequential model API.

You can build a model by the following:
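A minimal sketch of such a linear stack, assuming TensorFlow 2.x with its bundled Keras. The layer sizes, the dropout rate, and the function name `build_mlp` are illustrative assumptions rather than the original article's configuration; the loss assumes integer class labels.

```python
def build_mlp(input_dim, num_classes, units=512, dropout_rate=0.5):
    """Assemble a small multi-layer perceptron with the Sequential API."""
    # Imported here so the function can be defined without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),                  # one vector per text
        layers.Dense(units, activation="relu"),           # hidden layer
        layers.Dropout(dropout_rate),                     # regularization
        layers.Dense(num_classes, activation="softmax"),  # one probability per class
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # labels are integer class ids
        metrics=["accuracy"],
    )
    return model
```

For the Reuters data encoded with a 10,000-word vocabulary, a call such as `build_mlp(input_dim=10000, num_classes=46)` would produce a model with 46 outputs, and `model.summary()` prints the resulting layers.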

  6. Train your model and tune hyperparameters:

Now we train the model and tune its hyperparameters: the number of epochs, the share of the data held out as a validation set, the dropout rate, and the learning rate.
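The training call might look like the sketch below, assuming TensorFlow 2.x, a Keras model with a softmax output, and integer class labels. The function name `train` and all default values are hypothetical starting points to be tuned; the dropout rate is not set here because it belongs to the model's Dropout layer.

```python
def train(model, x_train, y_train, epochs=5, batch_size=32,
          validation_split=0.1, learning_rate=1e-3):
    """Compile and fit a Keras model, returning the per-epoch metrics."""
    # Imported here so the function can be defined without TensorFlow installed.
    from tensorflow import keras

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    history = model.fit(
        x_train, y_train,
        epochs=epochs,                      # passes over the training data
        batch_size=batch_size,
        validation_split=validation_split,  # fraction held out for validation
        verbose=0,
    )
    # A dict of lists, one value per epoch, e.g. history.history["val_loss"].
    return history.history
```

Running this with several values of `epochs`, `learning_rate`, and the model's dropout rate, then comparing the validation metrics, is one simple way to tune the hyperparameters.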

You can plot the accuracy and loss graph using matplotlib.
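A possible plotting helper, assuming matplotlib is installed and that the training history is a dictionary with the keys "accuracy" and "loss" (plus optional "val_accuracy" and "val_loss"), which is what recent Keras versions record when accuracy is tracked as a metric. The function name and output file name are hypothetical.

```python
def plot_history(history, path="training_curves.png"):
    """Plot accuracy and loss per epoch side by side and save the figure."""
    # Imported here so the function can be defined without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, metric in zip(axes, ("accuracy", "loss")):
        ax.plot(history[metric], label="train")
        if "val_" + metric in history:
            ax.plot(history["val_" + metric], label="validation")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

A widening gap between the training and validation curves is a common sign of overfitting, which suggests fewer epochs or a higher dropout rate.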

In this way, we can train the model. 

