How much of your working time do you spend on boring, routine tasks? Data scientists and analysts do a lot of such work, from time-consuming data preparation to trying out the methods and algorithms that might work best on the data. And it doesn't matter whether we are talking about key business tasks or auxiliary ones: most of the time is spent on routine.

If you need to speed up processes and allow specialists to focus on the most important tasks, you should use AutoML frameworks. What is interesting about automatic machine learning systems? What frameworks are suitable for AutoML? What are the limitations so far? We will answer these questions in the article.

In this article:

  1. What is AutoML?
  2. The concept of Automatic Machine Learning
  3. Why use AutoML frameworks?
  4. Use cases for AutoML
  5. Automatic Machine Learning Framework
  6. Conclusion

What is AutoML?

In simple technical terms, an AutoML framework automates the selection, construction, and parameterization of ML models. Simply put, AutoML provides methods and processes to accelerate exploration and prediction.

The skyrocketing demand for AI projects, coupled with a shortage of AI experts, means that complex tasks have to be left for automation. However, AutoML is not a general tool for managing model performance and cannot be used to analyze the resulting data.

One example of an AutoML limitation comes from hill climbing, a search strategy in which the model is tasked with finding a globally optimal solution. An AutoML system will often only search until it reaches the top of the first "hill", a local maximum. While it looks like you've found a solution, data science professionals know that you may not be on the biggest hill, and as the model is extended further, it may become less and less accurate. A skilled person can help steer the search and find the global maximum.
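The local-maximum problem is easy to reproduce on a toy example. The sketch below is only an illustration: the `score` function, the step size, and the random-restart strategy are assumptions made for the example, not part of any particular AutoML framework.

```python
import math
import random


def score(x: float) -> float:
    """Toy objective: a small hill near x = -1 and a taller hill near x = 2."""
    return math.exp(-(x + 1.0) ** 2) + 2.0 * math.exp(-(x - 2.0) ** 2)


def hill_climb(start: float, step: float = 0.05, max_iters: int = 10_000) -> float:
    """Move to the better neighbour until neither neighbour improves the score."""
    x = start
    for _ in range(max_iters):
        best = max((x - step, x + step), key=score)
        if score(best) <= score(x):
            break  # no uphill neighbour: we are on top of *some* hill
        x = best
    return x


def hill_climb_with_restarts(n_restarts: int = 20, seed: int = 0) -> float:
    """Run hill climbing from several random starts and keep the best peak."""
    rng = random.Random(seed)
    peaks = [hill_climb(rng.uniform(-4.0, 4.0)) for _ in range(n_restarts)]
    return max(peaks, key=score)


local_peak = hill_climb(start=-2.0)        # stops on the small hill
global_peak = hill_climb_with_restarts()   # restarts let the search reach the tall hill
print(round(local_peak, 2), round(score(local_peak), 3))
print(round(global_peak, 2), round(score(global_peak), 3))
```

A single run started at -2.0 stops on the smaller hill, while restarting from several random points is one simple way to escape it. Deciding when such extra search is worth the cost is exactly where human judgment still helps.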

Extensive training and testing underpin the project's long-term viability. Here, the importance of bringing in technical expertise, more precisely data scientists, becomes clear.

The concept of Automatic Machine Learning

Suppose we have a data set from which we want to obtain a predictive model. The traditional machine-learning approach requires the following sequence of actions:

  • data preprocessing;
  • selection of characteristic features and construction of new features;
  • choosing the right learning model;
  • optimization of hyperparameters;
  • model training with optimal parameters.

The process can be long and, therefore, expensive. Indeed, for a better result, it is necessary to test the hypothesis repeatedly; moreover, at each step, it can be refined further.
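To make the manual sequence concrete, here is a minimal sketch of it using scikit-learn. The synthetic data set, the choice of logistic regression, and the parameter grid are all assumptions made for illustration; a real project would repeat these steps with different models and features.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real data set (an assumption for illustration).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # data preprocessing
    ("scale", StandardScaler()),                    # data preprocessing
    ("select", SelectKBest(score_func=f_classif)),  # feature selection
    ("model", LogisticRegression(max_iter=1000)),   # the chosen learning model
])

# Hyperparameter optimization: every combination below is cross-validated.
param_grid = {"select__k": [5, 10, 20], "model__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)  # also refits the best configuration on the training set

test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(round(test_accuracy, 3))
```

Every line of this script encodes a decision a person had to make by hand, which is the part AutoML tries to automate.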

The task of automated machine learning is to automate all or at least some of these steps without losing predictive accuracy. The ideal AutoML strategy assumes that any machine learning user can take raw data, build a model on it, and get predictions with the best possible (for the available sample) accuracy.

But does this mean that the day will come when there is no need for data analysis specialists? Of course not. Automated machine learning technologies are aimed at eliminating the routine sequence of operations and manual enumeration of models so that experts can devote more time to the creative side of the issue.

Consider the machine learning pipeline described above. Each stage requires its own approach. For example, to prepare data, it may be necessary to automate:

  • determining the type of columns (numeric data, text, Boolean values, etc.);
  • the semantic content, for example, if the field is text, then what it represents: last name, date, geotag, etc.;
  • task detection: clustering, ranking, etc.
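Column type detection, the first item above, can be sketched in a few lines of plain Python. The rules below (which strings count as Boolean, ISO-style dates only) are simplifying assumptions for the example; real frameworks use richer heuristics.

```python
from datetime import datetime


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_column_type(values: list[str]) -> str:
    """Guess a column type from raw strings; checks run from strict to loose."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "empty"
    if all(v.lower() in {"true", "false", "yes", "no"} for v in non_empty):
        return "boolean"
    if all(_is_number(v) for v in non_empty):
        return "numeric"
    if all(_is_iso_date(v) for v in non_empty):
        return "date"
    return "text"


# Hypothetical raw columns, as they might arrive from a CSV file.
columns = {
    "age": ["31", "45", ""],
    "subscribed": ["yes", "no", "yes"],
    "signup": ["2021-03-01", "2022-11-15"],
    "city": ["Kyiv", "Austin"],
}
inferred = {name: infer_column_type(values) for name, values in columns.items()}
print(inferred)
```

Detecting the semantic content of a text column (last name, geotag, and so on) needs additional rules or a trained classifier on top of this kind of check.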

Particular attention is paid to the process of finding the best model hyperparameters. The two most common methods for finding them are:

  • Grid search.
  • Random search.
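The difference between the two methods can be shown with a small sketch. Here `validation_score` is a hypothetical stand-in for "train a model and measure it on validation data", and the parameter names and ranges are assumptions chosen for the example.

```python
import itertools
import random


def validation_score(learning_rate: float, max_depth: int) -> float:
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return 1.0 - 10 * (learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 6) ** 2


def grid_search(grid: dict) -> tuple[dict, float]:
    """Evaluate every combination of the listed values."""
    names = list(grid)
    candidates = [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]
    best = max(candidates, key=lambda params: validation_score(**params))
    return best, validation_score(**best)


def random_search(n_trials: int, seed: int = 0) -> tuple[dict, float]:
    """Sample each hyperparameter independently from its range, for a fixed budget."""
    rng = random.Random(seed)
    candidates = [
        {"learning_rate": 10 ** rng.uniform(-3, 0), "max_depth": rng.randint(2, 12)}
        for _ in range(n_trials)
    ]
    best = max(candidates, key=lambda params: validation_score(**params))
    return best, validation_score(**best)


grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "max_depth": [2, 4, 6, 8]}
grid_best, grid_score = grid_search(grid)                # 4 x 4 = 16 evaluations
random_best, random_score = random_search(n_trials=16)  # same budget, no fixed grid
print(grid_best, round(grid_score, 3))
print(random_best, round(random_score, 3))
```

With two hyperparameters and four values each, the grid needs 16 evaluations; a third hyperparameter with four values would raise that to 64. Random search keeps the budget fixed regardless of how many hyperparameters there are, but gives no guarantee of covering the space.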

The popularity of these methods is explained by their ease of implementation. Both are practical only for a small number of hyperparameters, because the search space grows quickly with every hyperparameter added. Other algorithms are also used to optimize parameters: Bayesian optimization, neural architecture search, simulated annealing, evolutionary algorithms, etc. Let us consider in more detail the frameworks that allow you to find a suitable model and configure its parameters.


Why use AutoML frameworks?

Automation

One of the key benefits of AutoML frameworks is their ability to improve efficiency. Machine learning algorithms can be trained to perform tasks such as data analysis, forecasting, and classification. This can free up time and resources for other important tasks. Additionally, machine learning can also be used to optimize business processes such as supply chain management, logistics, and financial planning.

Improved Decision-Making

Another benefit of automated machine learning is its ability to improve decision-making. Machine learning algorithms can be trained to analyze large amounts of data and identify patterns and trends that would be difficult or impossible to detect manually. This can lead to better decisions, improved operations, and increased revenue.

Deep Learning

More advanced uses of automated machine learning include deep learning algorithms that learn from unstructured data such as images and video, and reinforcement learning to optimize decision-making in real time.

Diversity of Application

AutoML frameworks are used in almost every field, which makes them versatile. For example, in medicine, they help predict disease outbreaks by analyzing huge amounts of data from various sources.

Use cases for AutoML

AutoML models have a wide range of functionality:

  • Image recognition and classification. AutoML frameworks can detect not only specified objects but also images, symbols, etc. Neural networks compare data with a database of images and look for matches. This is actively used in medicine for diagnostics, for example, of oncological diseases based on X-ray images.

  • Text analysis. This is a machine learning method that is used to extract information from unstructured text data. For example, to analyze customer reviews of a brand's products or services on social networks.

  • Time series forecasting. Automated machine learning is actively used to forecast production capacity, inventory levels, stock prices, etc.

  • Fraud detection and cybersecurity. For example, detecting illegal banking transactions.

  • Recommender systems. This is an intelligent way for the user to navigate a variety of products, TV series, books, etc. Recommender systems are based on personal preferences and frequently viewed content.

  • Speech processing systems. They are used, for example, for automatic translation into different languages.

  • Development of unmanned vehicles. These are automatic control systems for electric vehicles that allow them to move safely without a driver. The system identifies various obstacles (traffic lights, pedestrians, curbs), as well as the optimal route and speed.

  • Audio data analysis. AutoML frameworks allow you to create conditions for a computer to understand the meaning of human speech.

  • Optimization of all business processes. AutoML systems in various areas of industry, production, and trade make it possible to manage many key aspects of activities. This is quality control of manufactured products, automation of routine processes, partial or complete replacement of human resources, minimization of downtime, accidents, etc.

  • Processing of medical data. Machine learning allows you to diagnose cancer or diabetes by analyzing medical data and identifying specified features.

  • Genetic programming. AutoML solutions are also used in genetic programming to automate the design and optimization of algorithms. They automatically tune parameters, select features, and discover optimal models, thereby streamlining the evolutionary process.

Automatic Machine Learning Framework

1. TransmogrifAI

TransmogrifAI is a library written in Scala on top of the SparkML framework. With just a few lines of code, data scientists can perform automated data cleansing, feature engineering, and model selection to get a model with good performance and then perform further exploration and iteration.

TransmogrifAI includes five main components of machine learning:

  • feature inference;
  • transmogrification (i.e., automatic feature engineering);
  • automatic feature validation;
  • automatic model selection;
  • hyperparameter tuning and optimization.

2. AutoGluon

AutoGluon is an open-source library from Amazon Web Services for machine learning application developers that makes automated machine learning easy to use and easy to extend. It aims to deliver high predictive accuracy using modern deep learning methods without requiring special knowledge. It's also a quick way to prototype what you can achieve with a dataset and to get a starting baseline for your machine learning work.

AutoGluon supports:

  • image and text classification;
  • object detection;
  • tabular prediction.

Also, this tool contains a programming interface for advanced software engineers interested in delving into the model parameters themselves.

3. MLJAR

MLJAR is a browser-based platform for rapidly building and deploying machine learning models. It has an intuitive interface and lets you train models in parallel. Its built-in hyperparameter search makes it easier to arrive at a model ready for deployment. MLJAR provides integration with NVIDIA's CUDA, Python, TensorFlow, etc.

You only need to follow three steps to create a good model:

  • Upload your dataset.
  • Train and adjust many machine learning algorithms and select the best algorithm.
  • Use the best predictive model and share your results.

This AutoML framework is currently offered by subscription. There is also a free version with a 0.25 GB data limit. It's worth a try.

4. DataRobot

DataRobot is a platform that allows business analysts to build predictive analytics without knowledge of machine learning or programming. The platform uses automated machine learning (AutoML) to build accurate predictive models in a short amount of time.

DataRobot provides a convenient user interface for creating machine learning models. A company can deploy a real-time predictive analytics service powered by an accurate machine learning model in just a few steps.

A huge advantage of DataRobot is the ability to go deeper into the platform and take control of the machine learning workflow. On the one hand, business analysts can use it as a tool, while experienced data scientists can tune many parameters on their own to get even more accurate models.

Features of using DataRobot:

  • DataRobot uses state-of-the-art distributed processing while running experiments in parallel;
  • the solution can be used locally or in the cloud;
  • quickly and easily connects to any data source;
  • DataRobot offers built-in security for role-based fine-grained authorization and supports Kerberos and LDAP protocols.

5. MLBox

The MLBox framework has proven itself well.

MLBox solves the following machine-learning tasks:

  • Data preparation (the most developed part of this AutoML library)
  • Model selection
  • Hyperparameter search

Among the shortcomings, we note that on Linux the system is much easier to install than on a Mac or Windows.


6. Auto Sklearn

As the name implies, the Auto Sklearn framework is built on top of the popular scikit-learn machine learning library. What Auto Sklearn can do:

  • Feature engineering (a distinguishing capability of the framework)
  • Model selection
  • Hyperparameter tuning

Auto Sklearn does a good job with small training data sets but struggles to "digest" large ones.

7. TPOT

TPOT is positioned as a framework in which the ML pipeline is fully automated. It uses a genetic algorithm to find the optimal model: many candidate models are built, and the one with the best predictive accuracy is selected.

Like Auto Sklearn, this framework is an add-on for scikit-learn. However, TPOT has its own regression and classification algorithms. Disadvantages include the inability of TPOT to work with natural-language text and categorical strings.
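The idea of searching for a pipeline with a genetic algorithm can be illustrated with a toy sketch. This is not TPOT's API or its actual implementation: the search space, the `fitness` table (a hypothetical stand-in for cross-validated accuracy), and the selection scheme are all assumptions made for the example.

```python
import random

# Hypothetical search space: each pipeline is one choice per key.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "model": ["tree", "forest", "linear"],
    "n_features": [5, 10, 20, 40],
}


def fitness(pipeline: dict) -> float:
    """Made-up score table standing in for cross-validated accuracy."""
    score = {"tree": 0.70, "forest": 0.80, "linear": 0.75}[pipeline["model"]]
    score += {"none": 0.00, "standard": 0.05, "minmax": 0.03}[pipeline["scaler"]]
    score += {5: 0.00, 10: 0.04, 20: 0.06, 40: 0.02}[pipeline["n_features"]]
    return score


def random_pipeline(rng: random.Random) -> dict:
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}


def crossover(a: dict, b: dict, rng: random.Random) -> dict:
    """Child takes each setting from one of its two parents."""
    return {key: rng.choice((a[key], b[key])) for key in SEARCH_SPACE}


def mutate(pipeline: dict, rng: random.Random, rate: float = 0.2) -> dict:
    """Occasionally replace a setting with a random one from the search space."""
    return {
        key: rng.choice(SEARCH_SPACE[key]) if rng.random() < rate else value
        for key, value in pipeline.items()
    }


def evolve(generations: int = 15, population_size: int = 12, seed: int = 0) -> dict:
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # the fitter half survives
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, round(fitness(best), 2))
```

Because the fitter half of each generation is carried over unchanged, the best score never gets worse from one generation to the next; mutation is what lets the search reach combinations that no parent contains.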

8. H2O

H2O Flow is an interactive web tool that lets you pull in data from various sources and visualize it, and it provides a seamless environment for model building, forecasting, scoring, and exporting your model. In my opinion, the strength of H2O is distributed in-memory processing.

H2O AutoML is written in Java and supports algorithms commonly used in Data Science, such as GBM, Random Forest, and Stacked Ensembles. H2O works with R, Python, and Scala on Hadoop/Yarn, Spark, or on your laptop.

Advantages of H2O AutoML framework:

  • open source AutoML, distributed (multi-core + multi-node) implementations of advanced ML algorithms;
  • core algorithms implemented in high-performance Java, with APIs in R, Python, and Scala and a web interface;
  • easily deployable models for production as pure Java code;
  • easily works on Hadoop, Spark, AWS, your laptop, etc.

9. Auto Keras

Auto Keras follows the design of the classic scikit-learn API, but uses a powerful neural architecture search, built on Keras, to find the model and its parameters.

10. Google Cloud AutoML

Cloud AutoML uses a neural network architecture. This Google product has a simple user interface for training and deploying models.

However, the platform is paid, and in the long run it makes sense to use it only in commercial projects. On the other hand, Cloud AutoML with restrictions is available free of charge for research purposes throughout the year.

11. Uber Ludwig

The goal of the Uber Ludwig project is to automate modern deep learning systems with a minimal amount of code. This framework only works with deep learning models, ignoring other ML models. And, of course, as is usually the case with Deep Learning, the amount of data plays a significant role.


Conclusion

So, AutoML is already pretty good at supervised learning, given high-quality labeled data. But so far, it has not been able to solve unsupervised or reinforcement learning problems. The latter causes difficulties in implementing scenarios such as the artificial intelligence of a robot operating in the real world or an opponent in a game.

There are some rare examples of successful reinforcement learning, such as AlphaZero, developed by DeepMind. It shows how the quality of play in Go can improve during training in which the artificial intelligence competes against itself.

AutoML frameworks also still have difficulties in processing complex raw data and optimizing the process of constructing new features (feature engineering). For this reason, feature selection remains one of the cornerstones of the model learning process.

However, progress has been observed in all of these areas, which is accelerating with the increasing number of AutoML contests.

Share your list of must-have machine learning frameworks by writing us at info@geniusee.com ;)