What is interesting about automatic machine learning systems? Which AutoML frameworks are suitable for your IT infrastructure? What are their limitations so far? We answer these questions in this article.

In this post:

  1. What is Automatic Machine Learning
  2. Machine Learning Frameworks
  3. Current Limitations Of Automatic Machine Learning

What is Automatic Machine Learning

Suppose we have a data set from which we want to build a predictive model. The traditional machine learning approach requires the following sequence of steps:

  • preliminary data processing;
  • selection of characteristic features and construction of new features;
  • choosing the right learning model;
  • optimization of hyperparameters;
  • training with optimal parameters.

The process can be long and, therefore, expensive. To get a better result, hypotheses have to be tested repeatedly, and each step can be refined further along the way.

The task of automatic machine learning (AutoML) is to automate all or at least some of these steps without losing predictive accuracy. The ideal AutoML strategy assumes that any user can take raw data, build a model on it, and get predictions with the best possible (for the available sample) accuracy.
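
As a minimal sketch of what automating one of these steps can look like, the example below fits several candidate models, scores each on a held-out validation split, and keeps the best one. Everything in it is hypothetical and written in plain Python for illustration: the `auto_select` helper, the two toy models, and the data are our assumptions, and real AutoML frameworks search far larger spaces.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_linear(xs, ys):
    """One-dimensional least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def auto_select(candidates, train, valid):
    """Fit every candidate on the training split and return
    (name, fitted model, validation error) for the best one."""
    results = []
    for name, fit in candidates.items():
        model = fit(*train)
        results.append((mse(model, *valid), name, model))
    best_error, best_name, best_model = min(results, key=lambda r: r[0])
    return best_name, best_model, best_error


# Toy data following y = 2x + 1 (hypothetical, for illustration only).
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
train = (xs[:7], ys[:7])
valid = (xs[7:], ys[7:])

candidates = {"constant": fit_constant, "linear": fit_linear}
best_name, best_model, best_error = auto_select(candidates, train, valid)
print(best_name, best_error)
```

The user supplies only the data and a list of candidates; the loop decides which model to keep. Hyperparameter search and feature construction can be wrapped in the same way.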

But does this mean that the day will come when data analysis specialists are no longer needed? Of course not. AutoML technologies aim to eliminate routine sequences of operations and the manual enumeration of models so that experts can devote more time to the creative side of the work.

Consider the machine learning pipeline described above. Each stage requires its own approach. For example, to prepare data, it may be necessary to automate:

  • detection of column types (numeric data, text, Boolean values, etc.);
  • detection of semantic content: for example, if a field is text, what it represents (last name, date, geotag, etc.);
  • detection of the task: clustering, ranking, etc.
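
Column type detection, the first item above, can be sketched with a few rules. This is a simplified illustration, not how any particular framework does it: the `infer_column_type` helper, the accepted Boolean tokens, and the two date formats are all assumptions made for the example.

```python
from datetime import datetime

BOOLEAN_TOKENS = {"true", "false", "yes", "no"}  # assumed spellings
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")          # assumed formats, extend as needed


def _is_number(value):
    """True if the string parses as a float (this also accepts 'nan' and 'inf')."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    """True if the string matches one of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values):
    """Guess a column type from raw string values.
    Empty strings are treated as missing and ignored."""
    present = [v.strip() for v in values if v.strip()]
    if not present:
        return "empty"
    if all(v.lower() in BOOLEAN_TOKENS for v in present):
        return "boolean"
    if all(_is_number(v) for v in present):
        return "numeric"
    if all(_is_date(v) for v in present):
        return "date"
    return "text"


# Hypothetical raw table, as it might arrive from a CSV file.
table = {
    "age": ["34", "27", "", "45"],
    "subscribed": ["yes", "no", "yes", "no"],
    "signup": ["2021-03-01", "2021-04-15", "2021-05-09", ""],
    "surname": ["Smith", "Ivanova", "Lee", "Garcia"],
}

column_types = {name: infer_column_type(col) for name, col in table.items()}
print(column_types)
```

Detecting semantic content (for example, that a text column holds surnames) is much harder and usually relies on dictionaries or learned classifiers rather than simple rules like these.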

Particular attention is paid to the process of finding the best model hyperparameters. The two most common methods for finding them are:

  • Grid search.
  • Random search.
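
To make the difference between the two concrete, here is a small sketch in plain Python. The `validation_error` function is a hypothetical stand-in for "train a model and measure its error", and the hyperparameter names `learning_rate` and `depth` are chosen only for illustration.

```python
import itertools
import random


def validation_error(learning_rate, depth):
    """Hypothetical stand-in for training a model and measuring its error.
    The minimum is at learning_rate=0.1, depth=6."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    best = None
    for combo in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = validation_error(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def random_search(space, n_trials, seed=0):
    """Sample n_trials random configurations from the given ranges."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(*space["learning_rate"]),
            "depth": rng.randint(*space["depth"]),
        }
        score = validation_error(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}
space = {"learning_rate": (0.01, 1.0), "depth": (2, 8)}

grid_score, grid_params = grid_search(grid)  # 3 * 4 = 12 evaluations
rand_score, rand_params = random_search(space, n_trials=12)

print("grid:  ", grid_params, grid_score)
print("random:", rand_params, rand_score)
```

Grid search needs one evaluation per combination, so its cost multiplies with every hyperparameter added. Random search keeps a fixed budget of trials regardless of how many hyperparameters there are.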

The popularity of these methods is explained by their ease of implementation. However, both are practical only for a small number of hyperparameters. Other algorithms are also used to optimize parameters: Bayesian optimization, simulated annealing, evolutionary algorithms, etc. Let us look in more detail at the frameworks that help you find a suitable model and tune its parameters.

Machine Learning Frameworks

MLBOX

The MLBox framework has proven itself well. MLBox solves the following tasks:

  • Data preparation (the most developed part of the library)
  • Model selection
  • Hyperparameter search

Among its shortcomings, we note that the system is much easier to install on Linux than on macOS or Windows.

AUTO-SKLEARN

As the name implies, the Auto Sklearn framework is built on top of the popular scikit-learn machine learning library. What Auto Sklearn can do:

  • Feature engineering (a distinctive feature of the framework)
  • Model selection
  • Hyperparameter tuning

Auto Sklearn does a good job with small data sets but struggles with large ones.

TPOT

TPOT is positioned as a framework in which the machine learning pipeline is fully automated. It uses a genetic algorithm to find the optimal model: many different models are built, and the one with the best predictive accuracy is chosen.
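
The general idea of a genetic search can be sketched in a few lines. This is not TPOT's code: it is a toy illustration in which the "pipeline" is reduced to two hypothetical settings (`depth` and `trees`) and `fitness` is a made-up stand-in for cross-validated accuracy.

```python
import random


def fitness(config):
    """Hypothetical stand-in for the cross-validated accuracy of a pipeline.
    Higher is better; the best possible config is depth=6, trees=200."""
    return -abs(config["depth"] - 6) - abs(config["trees"] - 200) / 50


def random_config(rng):
    """Draw one random configuration from the search space."""
    return {"depth": rng.randint(1, 12), "trees": rng.choice(range(50, 501, 50))}


def crossover(a, b, rng):
    """The child takes each setting from one of its two parents."""
    return {key: rng.choice((a[key], b[key])) for key in a}


def mutate(config, rng, rate=0.3):
    """With probability `rate`, resample one setting at random."""
    child = dict(config)
    if rng.random() < rate:
        key = rng.choice(list(child))
        child[key] = random_config(rng)[key]
    return child


def evolve(generations=20, population_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        # Reproduction: refill the population with mutated children.
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the better half of each generation survives unchanged, the best configuration found so far is never lost. In a real framework, evaluating `fitness` means training and validating a model, which is where nearly all of the running time goes.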

Like Auto Sklearn, this framework is an add-on for scikit-learn, but TPOT has its own regression and classification algorithms. Its disadvantages include the inability to work with natural language and categorical strings.

H2O

H2O AutoML supports both traditional machine learning models and neural networks. It is especially suitable for those who are looking for a way to automate deep learning.

AUTO KERAS

Auto Keras follows the design of the classic scikit-learn API, but it uses a powerful neural network search, built on Keras, to find model parameters.

GOOGLE CLOUD AUTOML

Cloud AutoML uses a neural network architecture. This Google product has a simple user interface for training and deploying models.

However, the platform is paid, and in the long run it makes sense to use it only in commercial projects. On the other hand, a restricted version of Cloud AutoML is available free of charge for research purposes throughout the year.

UBER LUDWIG

The goal of the Uber Ludwig project is to automate the deep learning process with a minimal amount of code. This framework works only with deep learning models and ignores other ML models. And, as is usually the case with deep learning, the amount of data plays a significant role.

TransmogrifAI

TransmogrifAI is an AutoML library for structured data, written in Scala, that runs on top of Apache Spark.

It was developed to accelerate machine learning developer productivity through ML automation and an API that enforces reuse, modularity, compile-time type safety, and transparency. As a result, it achieves accuracy close to that of hand-tuned models with an almost 100x reduction in time.

AutoGluon

AutoGluon is an AutoML tool that needs only one line of Python code to train highly accurate machine learning models on unprocessed tabular datasets such as CSV files. While other AutoML frameworks focus on model and hyperparameter selection, AutoGluon gets results by ensembling several models and stacking them in multiple layers. It is designed around the principles of simplicity, robustness, fault tolerance, and predictable timing. Tabular prediction, image prediction, object detection, text prediction, and multimodal prediction can all be done with AutoGluon.
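
To show the idea behind combining models, here is a deliberately tiny sketch: a one-layer weighted blend, a much simpler relative of multi-layer stacking. It does not use AutoGluon and is not its algorithm. The two base models, the holdout data, and the `fit_blend_weight` helper are all hypothetical.

```python
def model_a(x):
    """Hypothetical pre-trained base model that over-predicts by 2."""
    return 10 * x + 2


def model_b(x):
    """Hypothetical pre-trained base model that under-predicts by 6."""
    return 10 * x - 6


def fit_blend_weight(preds_a, preds_b, targets):
    """Least-squares weight w for the blend w * a + (1 - w) * b,
    clipped to the range [0, 1]."""
    num = sum((a - b) * (y - b) for a, b, y in zip(preds_a, preds_b, targets))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    if den == 0:
        return 0.5  # the base models agree everywhere, so any weight works
    return min(1.0, max(0.0, num / den))


# Holdout data the base models have not seen (true relation: y = 10x).
holdout_x = [1, 2, 3, 4]
holdout_y = [10 * x for x in holdout_x]

# The second layer is trained on the predictions of the first layer.
weight = fit_blend_weight(
    [model_a(x) for x in holdout_x],
    [model_b(x) for x in holdout_x],
    holdout_y,
)


def stacked(x):
    """Final prediction: weighted combination of the base models."""
    return weight * model_a(x) + (1 - weight) * model_b(x)


print(weight, stacked(5))
```

Each base model is biased in a different direction, and the blend cancels the two errors out. Multi-layer stacking repeats the same step, feeding the outputs of one layer of models into the next.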

AutoWeka

AutoWeka is data mining software based on the Weka machine learning package. It is designed for both novices and experts, being both extremely user-friendly and equipped with powerful features. This software helps you rapidly develop predictive data mining models using two machine learning algorithms (i.e. artificial neural network and support vector machine).

DataRobot

If you need to automate, assure, and accelerate predictive analytics, helping data scientists and analysts build and deploy accurate predictive models in a fraction of the time required by other solutions, this ML platform is for you. If you are experienced in the field and need advanced features, it gives you access to a constantly growing library of the latest algorithms, pre-built prototypes for data preparation and feature extraction, and automated ensembling. New data scientists get plenty of algorithms and parameter values to start from, which eliminates the time spent on trial-and-error guesswork. Businesses win with DataRobot, as it reduces cost, time, and risk while expanding predictive analytics for better decisions.

Splunk

Splunk is a software platform that helps you search, analyze, and visualize the data gathered from the websites, sensors, devices, applications, etc. that make up your business and IT infrastructure. The biggest selling point of Splunk is its real-time processing. You have probably noticed that storage devices have improved over the years and processors have become more efficient, but data movement has not. Splunk addresses this problem. With this platform you can get alert and event notifications at the onset of a machine state, accurately predict the resources needed to scale up the infrastructure, and create knowledge objects for operational intelligence. And that’s just the tip of the iceberg.

Amazon Lex 

Amazon Lex lets you build applications with a speech or text interface, powered by the same technology as Amazon Alexa. According to the official source, “Amazon Lex is a fully managed artificial intelligence (AI) service with advanced natural language models to design, build, test, and deploy conversational interfaces in applications.”

Current Limitations Of Automatic Machine Learning

AutoML is already quite good at supervised learning, that is, learning from high-quality labeled data. But so far it cannot handle unsupervised learning or reinforcement learning. The latter makes it difficult to implement scenarios such as the artificial intelligence of a robot operating in the real world or of an opponent in a game.

A rare example of a successful application of reinforcement learning is AlphaZero, developed by DeepMind. It showed that an artificial intelligence can improve its play in Go by competing against itself during training.

AutoML technologies also still have difficulty processing complex raw data and optimizing the construction of new features (feature engineering). For this reason, the selection of significant features remains one of the cornerstones of the model training process.

However, progress is being made in all of these areas, and it is accelerating as the number of AutoML contests grows.

Share your list of must-have machine learning frameworks by writing to us at info@geniusee.com.