Machine learning's central task is to fit a model to data, that is, to learn the model's parameters from that data. Building an accurate model also requires choosing a good set of hyperparameters and tuning them. In practice, learning parameters and model hyperparameters are tuned carefully by hand. Unfortunately, this tuning is often a "black art" that calls for specialized knowledge, rules of thumb, or even brute-force search.
Automated methods that can improve the performance of any given learning algorithm on the problem at hand are therefore highly appealing. Several hyperparameter optimization strategies are commonly used to find a good set of hyperparameters, including grid search, random search, and others.
In general, hyperparameter optimization is a slow process. Bayesian hyperparameter optimization can speed it up considerably, particularly for machine learning models that are expensive to train.
In this blog, we will look at how to use the Bayesian technique to optimize the hyperparameters of machine learning models. We will start with a few basic concepts about hyperparameters and then move on to the more advanced ones.
Table of Contents:
- What are Hyperparameters?
- Hyperparameter Optimization
- Bayesian Optimization
- Mathematical representation of Bayesian Optimization
- Bayesian Optimization Algorithm
- Benefits of Using Bayesian Optimization
- Conclusion
What are Hyperparameters?
Hyperparameters are parameters that cannot be learned directly from the training process and must be set beforehand. They specify higher-level model notions such as complexity, learning capacity, rate of convergence, penalty strength, and so on. Well-chosen hyperparameters lead to more efficient training, faster convergence, and better overall results.
In a nutshell, hyperparameters are the knobs you turn to obtain a stronger statistical learning model. Hyperparameters are also known as free parameters or meta-parameters.
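To make the distinction concrete, here is a minimal sketch, assuming scikit-learn is available; the random forest and its specific settings are illustrative choices, not part of the original post. Hyperparameters are fixed before training, while parameters (the fitted trees and their split thresholds) are learned from the data.

```python
# Hyperparameters are chosen before training; parameters are learned from the data.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: fixed up front, they control model complexity and the training process.
model = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=0)

# Parameters: the individual trees and their split rules are learned during fit().
model.fit(X, y)
print("Number of fitted trees (learned structure):", len(model.estimators_))
```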
Hyperparameter Optimization
Hyperparameter optimization seeks the set of hyperparameters that yields an optimal model, one that minimizes a predefined loss function and, as a result, improves accuracy on held-out independent data.
Hyperparameter optimization is significant because the performance of any machine learning algorithm depends heavily on the values of its hyperparameters. Since hyperparameters are fixed before the algorithm is run, choosing good values is critical, as they strongly influence how well and how quickly the algorithm converges.
As mentioned in the introduction of this blog, there are many hyperparameter optimization techniques, such as grid search, random search, and Bayesian search, though the scope of this blog is limited to Bayesian hyperparameter optimization. We may publish separate, dedicated blogs covering the other techniques in detail.
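For reference, here is a minimal sketch of a conventional grid search, assuming scikit-learn is available; the SVM estimator, toy dataset, and parameter grid are illustrative assumptions, not prescriptions from this post. Every combination in the grid is trained and scored, which is exactly the exhaustive loop that Bayesian optimization tries to shorten.

```python
# Illustrative exhaustive grid search over SVM hyperparameters (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The hyperparameter grid is fixed up front; every combination is trained and scored.
param_grid = {
    "C": [0.1, 1, 10, 100],        # regularization strength
    "gamma": [1e-3, 1e-2, 1e-1],   # RBF kernel width
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validation accuracy:", search.best_score_)
```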
Bayesian Optimization
Bayesian optimization is a type of sequential model-based optimization (SMBO) technique: the outcomes of previous iterations are used to improve the sampling strategy for future evaluations. The Bayesian optimization technique determines which parameters to evaluate the objective function at by using a Gaussian process, where the Gaussian model helps capture the structure of the objective function.
By selecting promising parameter settings from the parameter space, the Bayesian method optimizes the objective function whose structure is modeled by the Gaussian process. The procedure keeps examining new parameter settings until a stopping condition for convergence is reached.
As it iterates, the algorithm balances exploration and exploitation while taking into account what it has learned about the target function. At each step, a Gaussian process is fitted to the known samples (previously investigated points), and the posterior distribution, in conjunction with an acquisition strategy such as Expected Improvement (EI), is used to decide the next point to examine.
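As a hedged illustration of this loop, the sketch below uses scikit-optimize's gp_minimize (assuming the skopt package is installed) to tune the regularization strength of a logistic regression model; the dataset, search space, and number of calls are illustrative assumptions.

```python
# Illustrative Bayesian optimization of one hyperparameter with scikit-optimize (skopt assumed).
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Search space: inverse regularization strength C, explored on a log scale.
space = [Real(1e-3, 1e3, prior="log-uniform", name="C")]

def objective(params):
    (C,) = params
    model = LogisticRegression(C=C, max_iter=5000)
    # gp_minimize minimizes, so return the negative cross-validated accuracy.
    return -cross_val_score(model, X, y, cv=5).mean()

result = gp_minimize(objective, space, n_calls=25, acq_func="EI", random_state=0)
print("Best C:", result.x[0])
print("Best CV accuracy:", -result.fun)
```

Each call to the objective trains and validates a model; the Gaussian process and the EI acquisition function decide which value of C to try next, so far fewer evaluations are needed than with an exhaustive grid.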
Mathematical representation of Bayesian Optimization
The purpose of Bayesian optimization is to discover the point that gives the greatest value of an unknown (black-box) function f:

x* = argmax_{x ∈ A} f(x)

where A represents the search space of x. Bayesian optimization is based on Bayes' theorem, which states that given evidence data E, the posterior probability P(M | E) of a model M is proportional to the likelihood P(E | M) of observing E multiplied by the prior probability P(M). The formula below captures the essence of Bayesian optimization:

P(M | E) ∝ P(E | M) · P(M)
Bayesian Optimization Algorithm
Bayesian approaches aim to construct a function (or, more precisely, a probability distribution over alternative functions) that assesses how good your model is for a given set of hyperparameters. With this approximation (called a surrogate function), you don't have to go through the set-hyperparameters, train, and evaluate loop as many times, because you can simply tune the hyperparameters against the surrogate function instead.
A Gaussian process generates the surrogate function (note: there are numerous ways to represent the surrogate function, but I'll choose a Gaussian process). All of this stuff about Bayesians and Gaussians comes down to an expression of the form:

P(Fn(X) | Xn)

The left side indicates that a probability distribution is involved. Looking within the brackets on the left side, we can see that it's a probability distribution over Fn(X), which might be any function.
Why?
Remember, we're building a probability distribution over all conceivable functions, not just one. In essence, the left side states that the probability that the true function mapping hyperparameters to model metrics (such as validation accuracy, log-likelihood, test error rate, and so on) is Fn(X), given the sample data Xn observed so far, is whatever the Gaussian process posterior on the right assigns to it.
The Bayesian optimization algorithm is as follows. For t = 1, 2, … repeat:
1. Find the next point xt to evaluate by maximizing the acquisition function over the current Gaussian process posterior: xt = argmax_x u(x | D1:t−1).
2. Evaluate the objective function at that point to obtain a (possibly noisy) sample yt = f(xt).
3. Augment the data, D1:t = D1:t−1 ∪ {(xt, yt)}, and update (re-fit) the Gaussian process.
The loop stops when the evaluation budget is exhausted or a convergence criterion is met.
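To make these steps concrete, here is a minimal from-scratch sketch of a single iteration, assuming scikit-learn and SciPy are available; the 1-D toy objective, the Matern kernel, and the candidate grid are all illustrative assumptions standing in for an expensive train-and-validate run.

```python
# One illustrative iteration of the Bayesian optimization loop (scikit-learn/SciPy assumed).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Toy 1-D objective standing in for an expensive model evaluation.
    return -(x - 2.0) ** 2 + 3.0

# Previously evaluated points D_{1:t-1}.
X_obs = np.array([[0.0], [1.0], [4.0]])
y_obs = f(X_obs).ravel()

# Fit the Gaussian process surrogate to the observed samples.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_obs, y_obs)

# Candidate grid over the search space A, and the GP posterior on it.
X_cand = np.linspace(-1.0, 5.0, 200).reshape(-1, 1)
mu, sigma = gp.predict(X_cand, return_std=True)

# Expected Improvement acquisition function (maximization form).
best = y_obs.max()
sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
z = (mu - best) / sigma
ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# The next point to evaluate is the maximizer of the acquisition function.
x_next = X_cand[np.argmax(ei)]
print("Next point to evaluate:", x_next)
print("Objective there:", f(x_next))
```

In a full run, the new observation (x_next, f(x_next)) would be appended to the data and the Gaussian process re-fitted before the next iteration.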
Benefits of Using Bayesian Optimization
Of the many benefits Bayesian optimization offers, a few are listed here.
- The technique evaluates candidate points guided by the surrogate model rather than exhaustively, which keeps tuning time as short as possible.
- The Gaussian process surrogate improves as more points are evaluated, so later suggestions become better and better.
- Bayesian optimization strikes a balance between exploration and exploitation, so that after exploring the parameter space it homes in on an optimal value.
- It does not require the explicit formulation of the function, unlike typical optimization approaches.
- It only needs the range of each parameter as input, which is one of the method's strong points and keeps the procedure simple to set up.
Conclusion
Automated hyperparameter tuning is very likely a step in the right direction: it enables people who do not have a PhD in mathematics to build great machine learning applications. We will conclude this blog here. We hope we did justice to the topic and helped you get a good understanding of hyperparameters and of Bayesian optimization techniques for building better machine learning models.