The concept of optimisation is integral to machine learning. Most machine learning models use training data to learn the relationship between input and output data. The models can then be used to make predictions about trends or classify new input data. This training is a process of optimisation, as each iteration aims to improve the model’s accuracy and lower the margin of error.
Optimisation is a theme that runs through every step of machine learning, whether a data scientist is refining labelled training data or iteratively training and improving a model. At its core, the training of a machine learning model is an optimisation problem: the model learns to perform a function in the most effective way. The most important part of machine learning optimisation is the tweaking and tuning of model configurations, known as hyperparameters.
Hyperparameters are the elements of the model set by the data scientist or developer. They include elements like the learning rate or the number of classification clusters, and are a way of refining a model to fit a specific dataset. In contrast, parameters are elements developed by the machine learning model itself during training. Selecting the optimal hyperparameters is key to ensuring an accurate and efficient machine learning model.
Machine learning optimisation can be performed by optimisation algorithms, which use a range of techniques to refine and improve the model. This guide explores optimisation in machine learning, why it is important, and includes examples of optimisation algorithms used to improve model hyperparameters.
What is machine learning optimisation?
Machine learning optimisation is the process of iteratively improving the accuracy of a machine learning model, lowering its degree of error. Machine learning models learn to generalise and make predictions about new, live data based on insights learned from training data. They do this by approximating the underlying function or relationship between input and output data. A major goal of training a machine learning algorithm is to minimise the degree of error between the predicted output and the true output.
Optimisation is measured through a loss or cost function, which typically defines the difference between the predicted and actual values of the data. Machine learning models aim to minimise this loss function, narrowing the gap between predicted and actual outputs. Iterative optimisation means the machine learning model becomes more accurate at predicting an outcome or classifying data.
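As a concrete illustration, here is a minimal sketch of mean squared error, one common loss function, in Python. The values are invented purely for illustration: a model whose predictions move closer to the true values sees its loss fall.

```python
import numpy as np

def mean_squared_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error: the average squared gap between
    predicted and actual values. Lower is better."""
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical true values and two sets of model predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
before = np.array([2.0, 6.0, 9.0, 7.0])   # before optimisation
after = np.array([3.1, 4.8, 7.2, 8.9])    # after optimisation

print(mean_squared_error(y_true, before))  # 2.5
print(mean_squared_error(y_true, after))   # 0.025
```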
The process of cleaning and preparing training data can be framed as a step to optimise the machine learning process, since raw, unlabelled data must be transformed into training data a model can use. However, machine learning optimisation is generally understood as the iterative improvement of model configurations known as hyperparameters.
Hyperparameters are configurations set by the data scientist; they are not built by the model from the training data. They need to be tuned effectively so the model completes its specific task in the most efficient way, and they are tweaked to align the model with a specific use case, dataset, goal or task.
Hyperparameters are set by the designer of the model and may include elements like the learning rate, the structure of the model, or the number of clusters used to classify data. This differs from the parameters developed during training, such as the model's weights, which change relative to the input training data. Hyperparameter optimisation means the machine learning model can solve the problem it was designed to solve as efficiently and effectively as possible.
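The distinction is visible in a library like scikit-learn: hyperparameters are passed in by the developer before training, while parameters appear as attributes the model has learned afterwards. A minimal sketch, using a synthetic dataset purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyperparameters: chosen by the developer before training begins.
model = SGDClassifier(alpha=0.001,              # regularisation strength
                      learning_rate="constant",
                      eta0=0.01,                # learning rate
                      max_iter=1000,
                      random_state=0)

model.fit(X, y)

# Parameters: learned by the model from the training data.
print(model.coef_)       # learned feature weights
print(model.intercept_)  # learned bias term
```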
Optimising the hyperparameters is an important part of achieving the most accurate model. The process is described as hyperparameter tuning or optimisation, and the aim is maximum accuracy and efficiency with minimum error.
How do you optimise a machine learning model?
Tuning or optimising hyperparameters allows the model to be adapted for specific use cases and datasets. The setting of hyperparameters happens prior to machine learning deployment, but the effect of specific hyperparameters on model performance may not be known in advance, so a process to test and refine them is often required. Historically this was done manually, through trial and error, which is a time-consuming and often resource-intensive process.
Instead, optimisation algorithms are used to identify and deploy the most effective configurations and combinations of hyperparameters. This ensures the structure and configuration of the model is as effective as possible for its assigned task or goal. A range of machine learning optimisation techniques and algorithms is in use; they streamline or automate the discovery and testing of different model configurations, aiming to improve the accuracy of the model and lower its margin of error.
Examples of some of the main approaches to machine learning optimisation include:
- Random searches and grid searches
- Evolutionary optimisation
- Bayesian optimisation
Random searches and grid searches
Random searches and grid searches are among the most straightforward approaches to hyperparameter optimisation in machine learning. The idea is that each hyperparameter is represented as a dimension in a grid, so each point in the grid is a different hyperparameter configuration. Optimisation is the process of searching these dimensions to identify the most effective configurations. However, the process and utility of the two approaches differ: random searches are used to discover new and effective combinations of hyperparameters, as the sample is randomised, while grid searches are used to assess known hyperparameter values and combinations, as each point in the grid is searched and evaluated.
Random search is the process of randomly sampling different points, or hyperparameter configurations, from the grid. This helps to identify new combinations of hyperparameter values for the most effective model. The developer sets the number of iterations to be searched, capping the number of hyperparameter combinations; without this limit, the process can take a long time.
Grid search is an approach most often used to evaluate known hyperparameter values, with the different values plotted as dimensions on a grid. Whereas random searches are usually used to discover new configuration optimisations, grid searches assess the effectiveness of known hyperparameter combinations.
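Both approaches are available off the shelf in scikit-learn as GridSearchCV and RandomizedSearchCV. A minimal sketch, using a synthetic dataset and illustrative value ranges:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Grid search: exhaustively evaluate every combination of known values.
grid = GridSearchCV(SVC(),
                    param_grid={"C": [0.1, 1, 10],
                                "gamma": [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)

# Random search: sample configurations at random, capped at n_iter
# draws so the search does not run indefinitely.
rand = RandomizedSearchCV(SVC(),
                          param_distributions={"C": loguniform(1e-2, 1e2),
                                               "gamma": loguniform(1e-3, 1e1)},
                          n_iter=20, cv=5, random_state=0)
rand.fit(X, y)
print(rand.best_params_)
```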
Evolutionary optimisation
Evolutionary optimisation algorithms optimise models by mimicking selection processes in the natural world, such as natural selection and genetics. Each iteration of a hyperparameter value is assessed and combined with other high-scoring hyperparameter values to form the next iteration. Hyperparameter values are altered each iteration as a ‘mutation’ before the most effective choices are recombined, so each ‘generation’ improves and becomes more effective as it is optimised.
Other approaches within evolutionary optimisation include genetic algorithms. In this process, different hyperparameter configurations are scored, and the most valuable or effective are paired up; the resulting configurations are then used in the next generation of tests and evaluation. Evolutionary optimisation techniques are often used to train neural networks or artificial intelligence models.
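A genetic algorithm can be sketched in a few lines of plain Python. The fitness function below is a stand-in for training and validating a real model, and the hyperparameter ranges are invented for illustration:

```python
import random

random.seed(0)

def fitness(cfg):
    # Stand-in score: in practice this would train and validate a model.
    # It peaks at learning_rate=0.01 and n_units=64.
    lr, units = cfg
    return -((lr - 0.01) ** 2) * 1e4 - ((units - 64) / 64) ** 2

def random_config():
    return [random.uniform(0.0001, 0.1), random.randint(8, 256)]

def mutate(cfg):
    # Small random perturbation of each hyperparameter (a 'mutation').
    lr, units = cfg
    return [min(max(lr * random.uniform(0.5, 2.0), 0.0001), 0.1),
            min(max(units + random.randint(-16, 16), 8), 256)]

def crossover(a, b):
    # Combine hyperparameter values from two high-scoring parents.
    return [random.choice([a[0], b[0]]), random.choice([a[1], b[1]])]

population = [random_config() for _ in range(20)]
for generation in range(10):
    # Keep the fittest half as parents for the next 'generation'.
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children

print(max(population, key=fitness))  # best configuration found
```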
Bayesian optimisation
Bayesian optimisation is an iterative approach to machine learning optimisation. Instead of mapping all known hyperparameter configurations on a grid, as in the random and grid search approaches, Bayesian optimisation is more focused. Hyperparameter combinations are analysed in sequence, with previous results informing the refinements tested in the next experiment. As the search concentrates on the most valuable areas of the hyperparameter space, the model improves with each step. Each iteration selects hyperparameters in light of the objective function, so the search learns which areas of the distribution will bring the most benefit. This focuses resources and time on the hyperparameters most likely to improve the model.
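As a sketch, the third-party scikit-optimize library implements this sequential approach with a Gaussian process surrogate. The objective function below is a stand-in for training and validating a real model, and the search space is illustrative:

```python
# Requires the third-party scikit-optimize package (pip install scikit-optimize).
from skopt import gp_minimize
from skopt.space import Integer, Real

def objective(params):
    # Stand-in: in practice this would train a model with the given
    # hyperparameters and return the validation loss to be minimised.
    learning_rate, n_units = params
    return (learning_rate - 0.01) ** 2 * 1e4 + ((n_units - 64) / 64) ** 2

search_space = [Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
                Integer(8, 256, name="n_units")]

# Each evaluation updates a probabilistic model of the objective,
# which steers the next trial towards the most promising regions.
result = gp_minimize(objective, search_space, n_calls=30, random_state=0)
print(result.x, result.fun)  # best hyperparameters and their loss
```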
Why is optimisation important in machine learning?
Optimisation sits at the very core of machine learning models, as algorithms are trained to perform a function in the most effective way. Machine learning models are used to predict the output of a function, whether that’s to classify an object or predict trends in data. The aim is to achieve the most effective model which can accurately map inputs to expected outputs. The process of optimisation aims to lower the risk of errors or loss from these predictions, and improve the accuracy of the model.
Machine learning models are often trained on local or offline datasets, which are usually static. Optimisation improves the accuracy of predictions and classifications and minimises error. Without optimisation, there would be no learning or development of algorithms; the very premise of machine learning relies on a form of function optimisation.
The process of optimising hyperparameters is vital to achieving an accurate model. The selection of the right model configurations has a direct impact on the accuracy of the model and its ability to achieve specific tasks. However, hyperparameter optimisation can be a difficult task, and it is important to get right: over-optimised and under-optimised models are both at risk of failure.
The wrong hyperparameters may cause either underfitting or overfitting within machine learning models. Overfitting is when a model is trained too closely to its training data, making it inflexible and inaccurate with new data. Machine learning models aim for a degree of generalisation, to be useful in a dynamic environment with new datasets; overfitting is a barrier to this and makes machine learning models inflexible.
Underfitting means a model is poorly trained, so it is ineffective with both training data and new data. Underfitted models are inaccurate even on the training data, so they need to be optimised further before machine learning deployment.
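A common diagnostic is to compare a model's score on the training data with its score on held-out validation data. A minimal sketch with scikit-learn, using a synthetic dataset and a decision tree whose max_depth hyperparameter controls the trade-off:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# max_depth is a hyperparameter: too shallow risks underfitting,
# too deep risks overfitting.
for depth in (1, 4, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    # A large train/validation gap suggests overfitting; low scores
    # on both suggest underfitting.
    print(f"max_depth={depth}: train={train_acc:.2f}, validation={val_acc:.2f}")
```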
Machine learning for every organisation
Seldon moves machine learning from POC to production to scale, reducing time-to-value so models can get to work up to 85% quicker. In this rapidly changing environment, Seldon can give you the edge you need to supercharge your performance.
With Seldon Deploy, your business can efficiently manage and monitor machine learning, minimise risk, and understand how machine learning models impact decisions and business processes, knowing your team has done its due diligence in creating a more equitable system while boosting performance.
Deploy machine learning in your organisation effectively and efficiently. Talk to our team about machine learning solutions today.