Pulling a Web 2.0 application into production: hosting thoughts

November 5, 2008

Faster, cheaper, better: choose any two. Hosting is largely an optimization problem in which every decision involves trade-offs, so knowing your choices becomes very important. In capacity planning you are limited by your slowest component. First make an informed guess about whether the application is CPU, memory, disk IO or bandwidth bound, based on measurements in your load testing lab, which can hint at what your slower components might be. In my view, a site typically needs the following:

  1. Raw bit-pushing capability: how fast can you render the content in the browser? That is what your users care about at the end of the day.

     a) Host your small static content (Flash, JavaScript, CSS, images) as close as possible to the end users, because the request-response time for such content is nearly equal to the network latency between the user and your site. (Hint: buy the services of a CDN that has servers geographically close to your users.)

     b) Larger blobs of content, such as progressive video downloads, can and should be hosted wherever bandwidth is cheapest. Amazon S3 is a good starting point because it requires no minimum commitment.

     c) Ajax requests are typically designed to hide latency from the user, so ideally it shouldn't matter where in the world your application is hosted.

     d) HTML rendering: are your pages cached? You can determine how many caching servers you need by estimating the amount of data your application caches in memory for each user.

  2. Number crunching and backend processing capability: your actual web application, middleware and database. This is where the hardware capacity requirements of different applications really differ. You should run benchmarks with synthesized traffic from a typical user session, replayed concurrently against your load testing servers (hint: Perl WWW::Mechanize or JMeter). However, it's impossible to figure out in advance how your end users are actually going to use the site. They might stress the 5% of the code that is not optimized for performance and bring down your site anyway. Beyond such hints, load testing doesn't yield much useful information, simply because it's nearly impossible to recreate real-world situations in a lab (including abuse and creative uses of your web application). Estimate how much data processing you do on the stats and data collected by your site, and how you feed the results of that processing to your frontend application. Which parts are synchronous (real-time), which are near-real-time (batch processing approaching real time, hidden behind Ajax or Flash animations and the like) and which are truly batch oriented?

  3. Setting up a new site is then more about starting with a reasonably sized initial capacity and being able to react with further capital expenditure, by monitoring bandwidth, CPU, memory and disk IO usage for each separated component of the application, by class (bit pushing/caching or number crunching). If you have a reasonable budget for capacity, start with 4-20 servers (real, or in the cloud at Amazon EC2 or another VPS-based cloud provider) with 2-4 instances of each component of your web application (outsource the things you don't want to worry about, such as e-mail, DNS and CDN), and get a good-quality hardware load balancer (or buy shared access to one). Make sure you don't constrain your ability to add machines and switching capacity without major physical layout changes. (Hint: buy larger switches than you need.)

  4. Long-term goals for the operations of a web application are:

     a) Bandwidth costs per megabit should decline as you use more and more of it, tending towards a very low (nearly zero) marginal cost.

     b) The cost (setup plus rental, or amortization) of adding physical machines of the standard chosen configuration, and of switching and load balancing, should increase linearly.

     c) Geographical scale-up by being able to replicate your first datacenter node across the globe.

     d) No single points of failure: at least two geographical sites, and redundancy in the access links for bandwidth at each datacenter node, load balancing, network switching, storage (multi-pathing) and your application components.

  5. Start small, choose wisely and tend towards flexibility (aim for lower capex with no lock-in, even if it means higher opex initially), because you will have to live with the limitations created by your initial decisions about the production hosting environment for a long time to come, or else face a painful and costly migration to another production environment.
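
The caching estimate in point 1(d) can be turned into a back-of-the-envelope calculation. The sketch below is only an illustration: the function name, the 30% memory headroom and the example figures (200,000 concurrent users, 50 KiB cached per user, 8 GiB servers) are assumptions, not measurements from any real site.

```python
import math

def cache_servers_needed(concurrent_users, cached_bytes_per_user,
                         ram_bytes_per_server, headroom=0.3, min_servers=2):
    """Rough count of cache servers needed to keep per-user data in memory."""
    working_set = concurrent_users * cached_bytes_per_user
    # Keep part of each server's RAM free for the OS, overhead and growth.
    usable = ram_bytes_per_server * (1.0 - headroom)
    needed = math.ceil(working_set / usable)
    # At least two instances, so a cache node is not a single point of failure.
    return max(needed, min_servers)

KIB, GIB = 1024, 1024 ** 3
# Hypothetical inputs: 200,000 concurrent users, 50 KiB each, 8 GiB servers.
servers = cache_servers_needed(200_000, 50 * KIB, 8 * GIB)
print(servers)
```

Replace the inputs with figures from your own monitoring; the point is only that the server count follows from the per-user cache size, not from guesswork.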

Latest Blogs
June 23, 2022

Top 7 visualisation tools for data scientists in 2022

The growth of the internet and allied services generates more raw, unfiltered data every year. To benefit from this massive amount of data, you need to sort it and make sense of it, and data visualisation is a key technique for doing so.

Thanks to a wide range of software, this task is no longer a challenge. These tools also help create reports that non-technical people can understand.

Read on to learn more about the tools that can help in this regard.

7 Data Visualisation Tools to Know About in 2022

Following are some data visualisation tools that you should know about:

  1. Microsoft Power BI

Power BI, Microsoft’s cloud-based data analytics suite, evolved from an earlier set of Excel plug-ins. Work on it began in 2010, and it was released to the general public as a standalone product in 2015. Unlike many visualization tools, Power BI integrates data modelling as a feature. You can easily make interactive visual reports and dashboards. It can import data from multiple sources such as Excel, text files and SQL servers, and from websites such as Facebook (Insights) and Google Analytics. Power BI has an impressive range of customizable visualizations, such as filled maps and heat maps, as well as other visuals such as influencer charts. A free version is also available.

  2. Plotly

Plotly is an open-source graphing library that is widely used from Python. It simplifies the process of creating graphics, charts and dashboards. Through its APIs and the companion Dash framework, Plotly allows the development of web apps without requiring knowledge of JavaScript, CSS or HTML. However, Plotly's support documentation is relatively limited.

  3. Tableau

Tableau requires no coding knowledge. It can handle large amounts of data through a simple drag-and-drop interface, and it is useful for data analysts who build dashboards for non-technical staff. However, Tableau has certain drawbacks: it is less suited to exploratory data analysis, and it is not suitable for machine learning and artificial intelligence tasks or for data pre-processing.

  4. D3.js

D3.js (D3 stands for Data-Driven Documents) is an open-source JavaScript library that works with SVG, HTML5 and CSS. It simplifies the development of interactive web visualizations. D3 can produce rich visual outputs such as diagrams, charts and product roadmaps, and its web dashboards work in all modern browsers. Moreover, it handles nuanced reporting very well. However, D3 cannot be used for other data analytics tasks such as data cleaning.

  5. Qlikview

Qlikview generates real-time, custom dashboards that display analytics visualizations. It is mainly a business intelligence tool for making interactive pie charts, tables, graphs and more.

Further, Qlikview integrates with other analytics tools in its ecosystem and includes an ETL (extract, transform, load) script editor, which allows you to pull data easily from different sources. These sources include relational databases, Excel spreadsheets, text files, web services and business applications such as SAP or Salesforce. It also allows data sharing for team collaboration.

  6. Grafana

Grafana helps visualize real-time metrics through its interactive dashboards. It integrates with many different data sources to give smooth, clean visuals that are easy to understand. Its alert functions and plug-in extensions allow you to build very complex monitoring dashboards. It is extremely helpful in DevOps environments.

Grafana dashboards are easy for non-technical users to read, but setting up and maintaining the backend requires some technical knowledge. It is free and open-source. The paid enterprise version adds options such as PDF export and usage insights, and includes several auditing tools.

  7. Datawrapper

Datawrapper is a popular charting, mapping and table tool that requires no coding knowledge. It allows custom layouts through a visual interface and can import data from many sources such as websites, PDFs, Excel, Google spreadsheets and CSVs. Additionally, it is easy to use.

To sum up, these are some notable data visualization tools that you can easily access, and many more are available on the market. If you need services related to data storage, GPUs and other cloud infrastructure, get in touch with E2E Networks for a comprehensive solution.

June 24, 2022

Sentiment Analysis: Analysis, Applications & Tools

Sentiment analysis is a natural language processing (NLP) technique for determining whether data is positive, negative or neutral. It is frequently applied to textual data to help organizations track brand and product sentiment in customer feedback and better understand customer needs.

Here we discuss what sentiment analysis is, how to conduct it, where it is applied and which tools you can use.

Table of Contents:

  1. What is Sentiment Analysis?
  2. How to conduct sentiment analysis?
  3. Application of Sentiment Analysis
  4. Conclusion

What is Sentiment Analysis?

Sentiment analysis is a form of text mining that recognizes and extracts subjective information from source material, allowing a company to gauge the social sentiment around its service, brand and product while monitoring online conversations. In most cases, however, social media stream analysis is limited to count-based metrics and basic sentiment analysis. This only scratches the surface and misses the high-value insights that are waiting to be found. So what can a company do to capture them?

In sentiment analysis, you can examine text at varying degrees of depth, depending on your objectives. You might, for example, use the average emotional tone of a set of reviews to estimate what proportion of people liked your new apparel line. If you want to discover what customers like and dislike about a certain garment and why, or whether they compare it with similar goods from other companies, you need to examine each sentence of each review for specific aspects and keyword usage. Depending on the scale, two forms of analysis can be used: coarse-grained and fine-grained. Coarse-grained analysis assigns a sentiment at the document or sentence level. Fine-grained analysis extracts a sentiment from each part of a sentence.
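
To make the coarse-grained versus fine-grained distinction concrete, here is a minimal lexicon-based sketch. It is only an illustration: the six-word lexicon, the function names and the clause-splitting rule are assumptions, and production tools rely on far larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon; real systems use much larger ones or trained models.
LEXICON = {"love": 1, "great": 1, "comfortable": 1,
           "hate": -1, "poor": -1, "tight": -1}

def score(text):
    """Sum the lexicon weights of the words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def label(value):
    """Map a numeric score to a sentiment tag."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"

def coarse_grained(review):
    """One label for the whole document."""
    return label(score(review))

def fine_grained(review):
    """One label per clause, split on punctuation and the word 'but'."""
    parts = re.split(r"[.!?;,]+|\bbut\b", review, flags=re.IGNORECASE)
    clauses = [part.strip() for part in parts if part.strip()]
    return [(clause, label(score(clause))) for clause in clauses]

review = "I love the fabric, but the fit is tight. It arrived on Monday."
print(coarse_grained(review))
for clause, tag in fine_grained(review):
    print(tag, "-", clause)
```

In this example the document-level label hides the fact that the review contains one positive and one negative opinion, which is exactly the detail the finer-grained pass recovers.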

How to conduct sentiment analysis? 

Sentiment analysis methods and technologies enable you to examine your operations from the perspective of your customers. But how can you get such information out of user-generated data? 

To begin, compile all relevant brand mentions into a single dataset. Consider your selection criteria: should the mentions be restricted to a time period, a single language or a specific region, for example? The data must then be prepared for analysis, which includes reading it, removing non-textual content, correcting grammar errors and typos, and removing irrelevant items such as information about the reviewers. Once the data has been prepared, we can evaluate it and extract sentiment. Because tens or even hundreds of thousands of mentions may need to be analyzed, the practical approach is to automate this time-consuming task with software, for instance with commercially available tools and APIs. Customer experience software typically gathers input from a variety of sources, provides real-time notifications on mentions, analyzes text and visualizes the results.
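
The preparation step described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the function name and the specific cleaning rules (HTML tags, URLs and @handles are treated as non-textual or reviewer information) are choices made for the example, and typo or grammar correction is left out because it needs a spell checker.

```python
import html
import re

def prepare(raw):
    """Normalize one raw mention before sentiment scoring."""
    text = html.unescape(raw)                         # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)              # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)         # drop URLs
    text = re.sub(r"@\w+", " ", text)                 # drop user handles
    text = re.sub(r"[^A-Za-z0-9'.,!? ]+", " ", text)  # drop remaining symbols
    return re.sub(r"\s+", " ", text).strip().lower()  # collapse whitespace

mention = "<p>Loved it!!</p> @shop https://example.com &amp; more"
print(prepare(mention))
```

Punctuation such as exclamation marks is kept on purpose, since many sentiment tools use it as a signal of intensity.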

Sentiment analysis is a feature of text analysis platforms and tools, where it is combined with AI software that analyzes text data to help you quickly discover how people feel about your brand, product or service. These solutions work by automatically identifying the emotion, tone and urgency in online conversations and assigning them a positive, negative or neutral tag, allowing you to prioritize customer inquiries. Brandwatch, Lexalytics, Social Searcher, MeaningCloud, Talkwalker, Quick Search, and Rosette are just a handful of the sentiment analysis tools available.

Application of Sentiment Analysis

Customers contact organizations through so many channels that it is difficult for employees to stay on top of everything. With sentiment analysis software, however, you can automatically sort your data as it enters your help desk. Let's look at some of the most common sentiment analysis applications:

  1. Social media monitoring: Because they are unsolicited, social media posts can contain some of the most candid opinions about your products, services and business. With sentiment analysis tools you can sift through all of that data in minutes, analyzing individual emotions and general public sentiment on every social site. More advanced tools go beyond simple dictionary definitions: they attempt to detect sarcasm, interpret popular chat acronyms (lol, ROFL, etc.) and correct for common errors such as misspelled and misused words.

  2. Customer support: Customer service management poses numerous challenges because of the enormous volume of requests, the variety of topics and the many departments within a firm, not to mention the urgency of each particular request. Sentiment analysis using natural language understanding (NLU) scans ordinary human language for meaning, emotion, tone and more, much as a person would, in order to understand customer needs. You can automatically triage customer service requests, online chats, phone calls and emails by emotion so that important concerns are handled first.

  3. Brand monitoring and reputation management: One of the most common uses of sentiment analysis in the corporate world is brand monitoring. Bad reviews can accumulate quickly on the internet, and the longer you wait to respond, the worse the problem gets. Sentiment analysis tools alert you promptly to negative brand mentions. You can also track the image and reputation of your brand over time or at any specific point in time, allowing you to measure your progress. Whether you're looking at mentions of your brand in news stories, blogs, forums or social media, you can turn that data into useful information and statistics.

  4. Product analysis: Find out what people are saying about a new product soon after it is released, or go through years of comments you may not have seen before. You can use aspect-based sentiment analysis to locate only the information you need by searching for keywords tied to a particular product attribute (interface, UX, functionality). Learn how your target audience perceives a product, which aspects of the product need to be improved and what will make your most valued customers happy.

  5. Market and competitor research: Use sentiment analysis to find out which of your rivals are getting favorable press and how your marketing efforts stack up. Examine the positive language your rivals use to communicate with their clients and incorporate some of it into your own brand message and voice guide.

Conclusion

With technological advancements, the age of gaining useful insights from social media data has arrived. Sentiment analysis enables companies to make use of vast volumes of unstructured data to better understand their customers' needs and opinions about their brand.

Businesses monitor online conversations in order to improve their products and services and protect their reputation. This kind of analysis raises customer service to a new level: customer service systems use sentiment analysis to categorize incoming inquiries by urgency, letting staff prioritize the most pressing requests. Sentiment analysis may also be used for workforce analytics.

If you have not yet considered using sentiment analysis on your customer data, what are you waiting for?

June 24, 2022

Optimization in deep learning- Learn with examples

 

Deep learning relies on optimization methods. Training a complicated deep learning model can take hours, days or even weeks, and the training efficiency of the model is directly influenced by the optimization algorithm's performance. Understanding the fundamentals of different optimization algorithms and the role of their hyperparameters allows us to tune those hyperparameters in a targeted manner and improve deep learning model performance.

In this blog, we'll go through some of the most popular deep learning optimization techniques in detail.

Table of Contents:

  1. The goal of Optimization in Deep learning
  2. Gradient Descent Deep Learning Optimizer
  3. Stochastic Gradient Descent Deep Learning Optimizer
  4. Mini-batch Stochastic Gradient Descent
  5. Adagrad (Adaptive Gradient Descent) Optimizer
  6. RMSprop (Root Mean Square) Optimizer
  7. Adam Deep Learning Optimizer
  8. AdaDelta Deep Learning Optimizer

The goal of Optimization in Deep learning

Although optimization helps deep learning by lowering the loss function, the aims of optimization and deep learning are fundamentally different. The former is focused on minimizing an objective, whereas the latter is concerned with finding a good model given a finite quantity of data. This is the difference between training error and generalization error: the optimization algorithm's objective function is usually a loss function based on the training dataset, so the purpose of optimization is to minimize training error, whereas deep learning (or, more broadly, statistical inference) aims to reduce generalization error. To achieve the latter, we must watch for overfitting in addition to using the optimization procedure to lower the training error.

Gradient Descent Deep Learning Optimizer

Gradient descent is the most common optimizer in its class. It uses calculus to make consistent changes to the parameters and reach a local minimum. Before going any further, you might be wondering what a gradient is.

Imagine holding a ball on the rim of a bowl. When you let go of the ball, it rolls down along the steepest direction until it reaches the bottom of the bowl. The gradient points in the direction of steepest ascent, so moving against it takes the ball along the steepest path towards a local minimum, which is the bottom of the bowl.

Gradient descent starts with a set of coefficients, calculates their cost and looks for a cost value lower than the current one. It updates the coefficients in the direction that lowers the cost, with a step size set by the learning rate. The procedure continues until a local minimum is found, that is, a point from which no small change in the coefficients lowers the cost any further.

For many problems, gradient descent is a good default choice. It does, however, have significant drawbacks. Calculating the gradients over the whole dataset is time-consuming when the data is large. Gradient descent works well for convex functions, but for nonconvex functions it can settle in a local minimum or saddle point rather than the global minimum, and it is sensitive to the choice of learning rate.
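
The update loop described above fits in a few lines. The following is a minimal sketch for a one-dimensional function; the function name, the default learning rate and the stopping tolerance are assumptions chosen for the example, not part of any library.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100, tol=1e-8):
    """Minimize a one-dimensional function given its derivative `grad`."""
    x = start
    for _ in range(steps):
        step = learning_rate * grad(x)
        x -= step                 # move against the gradient
        if abs(step) < tol:       # stop once the updates become negligible
            break
    return x

# f(x) = (x - 3)^2 has derivative 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

With this convex function the loop settles very close to 3. Raising the learning rate to 1.1 on the same function makes every step overshoot by more than it corrects, so the iterates move away from the minimum, which illustrates the sensitivity to the learning rate.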

Stochastic Gradient Descent Deep Learning Optimizer

On large datasets, gradient descent may not be the best solution, and stochastic gradient descent (SGD) addresses the problem. The word stochastic refers to the randomness built into the algorithm. Instead of using the entire dataset for each iteration, stochastic gradient descent uses a randomly selected sample (in its strictest form a single example, in practice often a small batch). As a result, each update sees only a small portion of the dataset. The first step in this technique is to choose the starting parameters and learning rate. Then, in each pass, shuffle the data at random and update the parameters one sample at a time to move towards an approximate minimum. Compared with the gradient descent approach, the path taken by the algorithm is noisy because each iteration uses only a piece of the dataset rather than all of it.

As a result, SGD requires more iterations to reach a local minimum. The overall computing time grows with the number of iterations, but because each iteration is so much cheaper, the total computation cost usually remains lower than that of the gradient descent optimizer. Therefore, if the data is large and processing time is a consideration, stochastic gradient descent should be favored over batch gradient descent.
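
As an illustration of the shuffle-then-update loop, here is a minimal sketch that fits a straight line with one-sample-at-a-time SGD. The function name, the squared-error loss, the hyperparameters and the tiny noise-free dataset are all assumptions made for the example.

```python
import random

def sgd_linear(data, learning_rate=0.05, epochs=300, seed=0):
    """Fit y = w * x + b with single-sample stochastic gradient descent."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)                  # visit the samples in random order
        for x, y in data:
            error = (w * x + b) - y        # gradient of 0.5 * error^2 w.r.t. prediction
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
points = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w, b = sgd_linear(points)
print(w, b)
```

Each update touches a single point, so it is very cheap, but the parameters zigzag towards the solution rather than following the smooth path of full-batch gradient descent.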

Mini-batch Stochastic Gradient Descent

Mini-batch SGD sits between the two preceding approaches and combines the advantages of both. It draws a small random subset of training samples from the entire dataset (the so-called mini-batch) and computes gradients just from these. By sampling only a fraction of the data, it aims to approximate batch gradient descent.

Because each update averages over a chunk of data rather than a single sample, its gradient estimates are less noisy and it needs fewer iterations than pure SGD, while each iteration stays far cheaper than a pass over the entire dataset. As a result, mini-batch gradient descent often outperforms both stochastic and batch gradient descent in practice. Because the method employs batching, the training data does not all need to be loaded into memory at once, which makes the process more efficient. In addition, the cost function in mini-batch gradient descent is noisier than in batch gradient descent but smoother than in stochastic gradient descent. Mini-batch gradient descent therefore offers a good balance of speed and precision.

Mini-batch SGD is the most commonly used variant in practice, since it is computationally inexpensive and produces more stable convergence.
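
The only change from single-sample SGD is that each update averages the gradient over a small batch. The sketch below shows this under the same assumptions as before: a hypothetical function name, a squared-error loss, illustrative hyperparameters and a small noise-free dataset.

```python
import random

def minibatch_sgd(data, batch_size=2, learning_rate=0.1, epochs=300, seed=0):
    """Fit y = w * x + b, averaging gradients over small random batches."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Average the per-sample gradients over the mini-batch.
            grad_w = sum(((w * x + b) - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b) - y for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 3x - 1.
data = [(x, 3.0 * x - 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]]
w, b = minibatch_sgd(data)
print(w, b)
```

Setting `batch_size=1` recovers single-sample SGD and setting it to `len(data)` recovers batch gradient descent, which is why mini-batch training is usually described as the middle ground between the two.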

Adagrad (Adaptive Gradient Descent) Optimizer

Adagrad keeps a running total of the squared gradients in each dimension and scales the learning rate in each update by that total. As a result, each parameter has its own adaptive learning rate. Because the update divides by the root of the summed squared gradients, only the magnitude of past gradients matters for the scaling, not their sign. The effective learning rate is smaller for parameters whose gradients have been large and larger for parameters whose gradients have been small. Because the running sum of squares grows monotonically, one of Adagrad's major flaws is that the effective learning rate keeps shrinking over time.
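
A minimal sketch of the Adagrad update in plain Python follows. The function name, the default hyperparameters and the test function f(x, y) = x^2 + 10y^2 are assumptions for illustration; deep learning frameworks ship their own implementations.

```python
import math

def adagrad(grad, start, learning_rate=0.5, steps=500, eps=1e-8):
    """Minimize a function of several parameters with the Adagrad update."""
    params = list(start)
    sum_sq = [0.0] * len(params)   # running total of squared gradients
    for _ in range(steps):
        g = grad(params)
        for i, gi in enumerate(g):
            sum_sq[i] += gi * gi
            # Per-parameter step: base rate divided by the root of the total.
            params[i] -= learning_rate * gi / (math.sqrt(sum_sq[i]) + eps)
    return params

# f(x, y) = x^2 + 10 * y^2 is ten times steeper along y than along x.
result = adagrad(lambda p: [2 * p[0], 20 * p[1]], start=[5.0, 5.0])
print(result)
```

Because `sum_sq` never decreases, the denominator only grows, which is the shrinking learning rate described above. The per-parameter scaling also means the steep and the shallow direction make progress at a similar pace.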

RMSprop (Root Mean Square) Optimizer

RMSprop is a popular optimizer among deep learning practitioners. Unusually, it was never formally published (it was introduced in Geoffrey Hinton's lecture notes) but is nonetheless well known in the community. RMSprop is a natural extension of Rprop, which addresses the problem of gradients whose magnitudes vary widely: some are tiny while others may be very large, so a single global learning rate may not be ideal. Rprop adjusts the step size for each weight based only on the sign of the gradient, first comparing the signs of the current and previous gradients for that weight. RMSprop carries this idea over to mini-batch training by dividing each gradient by a running root mean square of recent gradients for that weight.
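
The RMSprop update can be sketched as follows. The function name, the decay of 0.9, the learning rate and the test function are assumptions chosen for the example rather than values prescribed by the text.

```python
import math

def rmsprop(grad, start, learning_rate=0.01, decay=0.9, steps=2000, eps=1e-8):
    """Minimize a function of several parameters with the RMSprop update."""
    params = list(start)
    avg_sq = [0.0] * len(params)   # decaying average of squared gradients
    for _ in range(steps):
        g = grad(params)
        for i, gi in enumerate(g):
            avg_sq[i] = decay * avg_sq[i] + (1.0 - decay) * gi * gi
            # Divide by the root mean square of recent gradients.
            params[i] -= learning_rate * gi / (math.sqrt(avg_sq[i]) + eps)
    return params

# f(x, y) = x^2 + 10 * y^2, started from an arbitrary point.
result = rmsprop(lambda p: [2 * p[0], 20 * p[1]], start=[5.0, -3.0])
print(result)
```

Unlike Adagrad's ever-growing sum, the decaying average forgets old gradients, so the step size does not shrink towards zero. With a constant learning rate the parameters end up hovering in a small neighbourhood of the minimum rather than converging exactly, which is why a learning-rate schedule is often added in practice.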

Adam Deep Learning Optimizer

Adam is a further development of stochastic gradient descent for updating network weights during training. Unlike SGD, Adam adapts the learning rate for each network weight individually rather than keeping a single learning rate for the entire training. Adam inherits characteristics of both Adagrad and RMSprop. Whereas RMSprop scales the learning rate using only the second moment of the gradients, Adam also tracks their first moment (the mean) and uses both. The second moment here means the uncentered variance (the mean is not subtracted).
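
A minimal sketch of the Adam update shows how the two moment estimates are combined. The function name and the test function are assumptions; the decay rates 0.9 and 0.999 are the commonly used defaults, and the learning rate of 0.1 is simply a value that works for this small example.

```python
import math

def adam(grad, start, learning_rate=0.1, beta1=0.9, beta2=0.999,
         steps=1000, eps=1e-8):
    """Minimize a function of several parameters with the Adam update."""
    params = list(start)
    m = [0.0] * len(params)   # first moment: decaying mean of gradients
    v = [0.0] * len(params)   # second moment: decaying mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(params)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            # Bias correction compensates for m and v starting at zero.
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            params[i] -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return params

# f(x, y) = x^2 + 10 * y^2
result = adam(lambda p: [2 * p[0], 20 * p[1]], start=[5.0, 5.0])
print(result)
```

The first moment acts like momentum and smooths the direction of travel, while the second moment rescales each parameter's step, which is the RMSprop-style part of the algorithm.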

AdaDelta Deep Learning Optimizer

AdaDelta is a more robust variant of the AdaGrad optimizer. It is based on adaptive learning rates and is intended to address the major shortcomings of AdaGrad. One disadvantage, shared by AdaGrad and RMSprop, is that the starting learning rate must be set manually. Another, specific to AdaGrad, is the continually decreasing learning rate, which eventually becomes infinitesimally small, so that after a certain number of iterations the model can no longer learn anything new. AdaDelta replaces the ever-growing sum of squared gradients with a decaying average and scales each step by a running average of past updates, which removes the need for a manually chosen learning rate.
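
The AdaDelta update can be sketched as below. Note that there is no learning-rate argument at all. The function name, the values rho = 0.95 and eps = 1e-6, and the test function are assumptions made for the example.

```python
import math

def adadelta(grad, start, rho=0.95, eps=1e-6, steps=5000):
    """Minimize a function of several parameters with the AdaDelta update."""
    params = list(start)
    avg_sq_grad = [0.0] * len(params)   # decaying average of squared gradients
    avg_sq_step = [0.0] * len(params)   # decaying average of squared updates
    for _ in range(steps):
        g = grad(params)
        for i, gi in enumerate(g):
            avg_sq_grad[i] = rho * avg_sq_grad[i] + (1.0 - rho) * gi * gi
            # Step size is the ratio of the two running root mean squares.
            step = -(math.sqrt(avg_sq_step[i] + eps)
                     / math.sqrt(avg_sq_grad[i] + eps)) * gi
            avg_sq_step[i] = rho * avg_sq_step[i] + (1.0 - rho) * step * step
            params[i] += step
    return params

# f(x, y) = x^2 + 10 * y^2
result = adadelta(lambda p: [2 * p[0], 20 * p[1]], start=[5.0, 5.0])
print(result)
```

Because the history of updates starts at zero, the first steps are tiny (on the order of the square root of `eps`) and grow as the history builds up, so AdaDelta tends to start slowly on a problem like this one.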

Conclusion

This has been an overview of the main optimization methods used in deep learning. We went through three types of gradient descent and then moved on to adaptive optimizers. There is still a lot of work to be done in the field of optimization.

For the time being, however, it is important to understand your needs and the type of data you are working with in order to select the most suitable optimization technique and obtain good results.
