What makes automatic machine learning systems interesting? Which frameworks are suitable for AutoML? What are their current limitations? This article answers these questions.
7 Best Automatic Machine Learning Frameworks 2020.
The concept of automatic machine learning
Suppose we have a data set from which we want to build a predictive model. The traditional machine learning approach requires the following sequence of actions:
- preliminary data processing;
- selection of characteristic features or construction of new features;
- choosing the right learning model;
- optimization of hyperparameters;
- training with optimal parameters.
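The five steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not a recommended workflow: the data set, the helper names (`preprocess`, `add_features`, `mean_predict`, `knn_predict`, `validation_error`) and the two candidate models are all invented for the example.

```python
import statistics

def preprocess(rows):
    """Step 1: preliminary processing, here dropping rows with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def add_features(rows):
    """Step 2: construct a new feature (the square of x) next to the original."""
    return [((x, x * x), y) for x, y in rows]

def mean_predict(train, point, k=None):
    """Baseline model: always predict the mean training target."""
    return statistics.fmean([y for _, y in train])

def knn_predict(train, point, k):
    """k-nearest-neighbours regression: average the k closest training targets."""
    def distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], point))
    nearest = sorted(train, key=distance)[:k]
    return statistics.fmean([y for _, y in nearest])

def validation_error(predict, k, train, valid):
    """Mean squared error on held-out rows."""
    return statistics.fmean([(predict(train, x, k) - y) ** 2 for x, y in valid])

# Hypothetical raw data: y = 2x + 1, plus one row with a missing value.
raw = [(float(i), 2.0 * i + 1.0) for i in range(20)] + [(None, 3.0)]
rows = add_features(preprocess(raw))
train, valid = rows[::2], rows[1::2]

# Steps 3 and 4: enumerate model families and hyperparameter values by hand.
candidates = [("mean", mean_predict, None)]
candidates += [("knn", knn_predict, k) for k in (1, 3, 5)]
best_name, best_predict, best_k = min(
    candidates, key=lambda c: validation_error(c[1], c[2], train, valid)
)

# Step 5: "train" the winner on all rows (for k-NN that means storing them).
def final_model(x):
    return best_predict(rows, (x, x * x), best_k)

print(best_name, best_k, final_model(4.5))
```

Even in this tiny case the candidate list is written out by hand. That manual enumeration is the part AutoML tools aim to take over.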
The process can be long and therefore expensive. To get a better result, hypotheses have to be tested repeatedly, and each step can be refined further along the way.
The task of automatic machine learning (AutoML) is to automate all or at least some of these steps without losing predictive accuracy. The ideal AutoML strategy assumes that any user can take raw data, build a model on it, and get predictions with the best possible (for the available sample) accuracy.
But does this mean that the day will come when data analysis specialists are no longer needed? Of course not. AutoML technologies aim to eliminate the routine sequence of operations and the manual enumeration of models so that experts can devote more time to the creative side of the problem.
Consider the machine learning pipeline described above. Each stage requires its own approach. For example, data preparation may require automating:
- determining the type of columns (numeric data, text, Boolean values, etc.);
- detecting the semantic content of a column: for example, whether a text field holds a last name, a date, a geotag, etc.;
- detecting the task: clustering, ranking, etc.
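The first item in this list, determining column types, can be approximated with simple heuristics. The sketch below is only an illustration: the function `infer_column_type`, its helpers, the accepted Boolean spellings, and the single date format are assumptions, and real AutoML tools use far more elaborate rules.

```python
from datetime import datetime

def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def _is_date(text, fmt="%Y-%m-%d"):
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

def infer_column_type(values):
    """Guess a column type from raw string cells using simple heuristics."""
    # Ignore missing and blank cells when deciding the type.
    cells = [v.strip() for v in values if v is not None and v.strip()]
    if not cells:
        return "empty"
    if all(c.lower() in {"true", "false", "yes", "no"} for c in cells):
        return "boolean"
    if all(_is_number(c) for c in cells):
        return "numeric"
    if all(_is_date(c) for c in cells):
        return "date"
    return "text"

# Hypothetical table, stored column by column as raw strings.
table = {
    "age": ["34", "51", "", "27"],
    "subscribed": ["yes", "no", "yes", "no"],
    "signup": ["2020-01-15", "2020-02-01", "2020-03-09", None],
    "surname": ["Ivanova", "Smith", "Chen", "Okafor"],
}
column_types = {name: infer_column_type(cells) for name, cells in table.items()}
print(column_types)
```

Detecting the semantic content (that `surname` holds last names, for instance) is a harder problem and is not attempted here.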
Particular attention is paid to the process of finding the best model hyperparameters. The two most common methods for finding them are:
- Grid search.
- Random search.
The popularity of these methods is explained by their ease of implementation. Both are justified only for a small number of hyperparameters, because the number of configurations to try grows quickly with each added parameter. For larger search spaces, other algorithms are used: Bayesian optimization, simulated annealing, evolutionary algorithms, etc. Let us consider in more detail the frameworks that allow you to find a suitable model and configure its parameters.
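To make the two basic methods concrete, here is a minimal sketch of grid search and random search in plain Python. The objective `validation_loss` is a hypothetical stand-in for "train a model and measure its validation error", and the hyperparameter names `learning_rate` and `depth` are chosen only for illustration.

```python
import itertools
import random

def validation_loss(learning_rate, depth):
    """Hypothetical stand-in for training a model and measuring its error."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 4) ** 2

def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    trials = []
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        trials.append((validation_loss(**params), params))
    return min(trials, key=lambda t: t[0]), len(trials)

def random_search(space, n_trials, seed=0):
    """Evaluate n_trials configurations sampled independently at random."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        trials.append((validation_loss(**params), params))
    return min(trials, key=lambda t: t[0])

grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
(grid_loss, grid_params), n_evaluated = grid_search(grid)

space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform sample
    "depth": lambda rng: rng.randint(1, 10),
}
random_loss, random_params = random_search(space, n_trials=9)

print(n_evaluated, grid_params, grid_loss)
print(random_params, random_loss)
```

With two hyperparameters and three values each, the grid needs 9 evaluations. Three values for each of ten hyperparameters would already need 3**10 = 59049, which is why grid search does not scale. Random search keeps the budget fixed (`n_trials`) regardless of the number of hyperparameters.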
Machine Learning Frameworks
1. MLBox
The MLBox framework has proven itself well. MLBox solves the following tasks:
- Data preparation (the most developed part of the library)
- Model selection
- Hyperparameter search
One shortcoming is installation: MLBox is much harder to install on a Mac or on Windows than on Linux.
2. Auto Sklearn
As the name implies, the Auto Sklearn framework is built on top of the popular scikit-learn machine learning library. What Auto Sklearn can do:
- Feature engineering (a distinctive feature of the framework)
- Model selection
- Hyperparameter tuning
Auto Sklearn does a good job with small data sets but struggles with large ones.
3. TPOT
TPOT is positioned as a framework in which the machine learning pipeline is fully automated. It uses a genetic algorithm to find the optimal model: many different models are built, and the one with the best predictive accuracy is selected.
Like Auto Sklearn, this framework is an add-on for scikit-learn, but TPOT has its own regression and classification algorithms. Its disadvantages include the inability to work with natural language and categorical string data.
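The idea of a genetic search can be sketched in plain Python. This is not TPOT's implementation (TPOT evolves whole pipelines): it is a simplified illustration that evolves a small dictionary of hyperparameters. The `fitness` function is a hypothetical stand-in for cross-validated accuracy, and all names and value ranges are assumptions.

```python
import random

def fitness(config):
    """Hypothetical score standing in for cross-validated accuracy (higher is better)."""
    return -((config["depth"] - 6) ** 2) - (config["n_estimators"] - 50) ** 2 / 100

def random_config(rng):
    return {"depth": rng.randint(1, 12), "n_estimators": rng.randrange(10, 110, 10)}

def crossover(a, b, rng):
    """Build a child by taking each setting from one of the two parents."""
    return {key: rng.choice((a[key], b[key])) for key in a}

def mutate(config, rng):
    """Re-sample one randomly chosen setting."""
    child = dict(config)
    key = rng.choice(sorted(child))
    child[key] = random_config(rng)[key]
    return child

def evolve(generations=20, population_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Selection: the fitter half survives unchanged (elitism).
        parents = population[: population_size // 2]
        # Reproduction: refill the population with mutated crossovers.
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history

best, history = evolve()
print(best, history[0], history[-1])
```

Because the fitter half of each generation is carried over unchanged, the best score never gets worse from one generation to the next.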
4. H2O AutoML
H2O AutoML supports both traditional machine learning models and neural networks. It is especially suitable for those who are looking for a way to automate deep learning.
5. Auto Keras
Auto Keras follows the design of the classic scikit-learn API, but uses a neural architecture search built on Keras to find a suitable model and its parameters.
6. Google Cloud AutoML
Cloud AutoML is built on neural network architectures. This Google product has a simple user interface for training and deploying models.
However, the platform is paid, and in the long run it makes sense to use it only in commercial projects. On the other hand, a restricted version of Cloud AutoML is available free of charge for research purposes throughout the year.
7. Uber Ludwig
The goal of the Uber Ludwig project is to automate the deep learning process with a minimal amount of code. The framework works only with deep learning models and ignores other ML models. And, as is usually the case with deep learning, the amount of data plays a significant role.
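Ludwig is driven by a declarative configuration that lists input and output features instead of model code. A minimal sketch of such a configuration, written as a Python dictionary, might look like this. The column names `review` and `sentiment` are made up, and the way the configuration is handed to Ludwig depends on the installed version, so treat this as an illustration and check the project documentation.

```python
# Column names are hypothetical; the structure follows Ludwig's
# input_features / output_features layout (each feature has a name and a type).
config = {
    "input_features": [
        {"name": "review", "type": "text"},
    ],
    "output_features": [
        {"name": "sentiment", "type": "category"},
    ],
}

def describe(config):
    """Summarize which columns the model reads and which it predicts."""
    inputs = [f["name"] for f in config["input_features"]]
    outputs = [f["name"] for f in config["output_features"]]
    return f"predict {', '.join(outputs)} from {', '.join(inputs)}"

print(describe(config))
```

The user states what goes in and what comes out, and the framework chooses the encoders, the network, and the training loop.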
Current limitations of automatic machine learning
So, AutoML is already fairly good at supervised learning, that is, learning from high-quality labeled data. But so far it cannot solve unsupervised learning or reinforcement learning problems. The latter makes it difficult to implement scenarios such as the artificial intelligence of a robot operating in the real world or of an opponent in a game.
A rare example of a successful implementation of reinforcement learning is AlphaZero, developed by DeepMind. It demonstrated that the quality of play in Go can be improved through training in which the artificial intelligence competes against itself.
AutoML technologies also still have difficulty processing complex raw data and optimizing the construction of new features (feature engineering). For this reason, the selection of significant features remains one of the cornerstones of the model training process.
However, there is progress in all of these areas, and it is accelerating as the number of AutoML contests grows.