
The Path to AI: How to Quickly Acquire the Data Scientist Profession

Many specialists in Artificial Intelligence, Machine Learning, Data Science and Big Data complain that they studied for their profession in the wrong way and spent far more time on it than necessary. They could have gained the knowledge they have now in less time if they had acquired it in the right order.

The good news is that those who are just starting their careers can learn from other people's mistakes. Beginners who want to get into the profession very often ask: "How do I become a data scientist?" The answers are usually not very comforting. You need to go through seven circles of hell: spend several years studying huge textbooks on probability theory, statistics, mathematical analysis and linear algebra, and learn Python programming, algorithms and data structures, and the SQL query language. Only then can the would-be specialist approach machine learning and data analysis, which form the basis of artificial intelligence. And at the end of the preparation - if you still have the desire - you can study neural networks, computer vision, natural language processing and reinforcement learning.

Indeed, one cannot do without mathematics, and any data scientist needs it at a fairly high level. After all, the word “scientist” in the name of the profession sets a certain bar. Even if such a specialist does not create machine learning models from scratch but uses ready-made implementations, he must understand the internal mechanisms of the algorithms in order to obtain adequate results.

But does the training really have to be so long? Can a training program be tailored to save time?

Let's consider a typical situation.

Imagine a guy who took a theoretical course in machine learning. Without practice, as usual. He then went to the Kaggle site to apply his knowledge in the field.

The first obstacle that will prevent him from creating even the simplest model is a lack of skill with the Python libraries used in Data Science. Machine learning models in Python tend to have a nice syntax. The basic methods for training and applying models are the same in the most popular implementations - fit and predict. But before reaching that stage, you somehow need to read the data, explore it, process it, and only then send it to the model.
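To make the steps above concrete, here is a minimal sketch of the read, explore, process, fit and predict sequence. It assumes pandas and scikit-learn are installed. The tiny dataset, its column names and the choice of a decision tree are all made up for illustration and do not come from any real competition.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for a competition CSV file.
RAW_CSV = """age,income,bought
25,30000,0
47,82000,1
,51000,0
52,,1
33,40000,0
61,95000,1
"""


def load_and_prepare(csv_text):
    """Read the data, count the gaps, and fill them before modelling."""
    df = pd.read_csv(io.StringIO(csv_text))  # 1. read
    missing = df.isna().sum()                # 2. explore: missing values per column
    df = df.fillna(df.median())              # 3. process: fill gaps with column medians
    return df, missing


df, missing = load_and_prepare(RAW_CSV)

X = df[["age", "income"]]  # features
y = df["bought"]           # target

# 4. Only now does the data reach the model: the same two calls
# work for most scikit-learn estimators.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
predictions = model.predict(X)
```

Note that the two model calls at the end are the short part. Most of the code, and most of the beginner's trouble, sits in the lines before them.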

As a result, our newcomer realizes that he has gaps in programming, data processing and visualization. Visualization is needed not only for descriptive statistics, but also for studying data before processing it and applying models. What follows is the realization that he has come to the data analysis competition with practically nothing, although until then he was confident that he had a clear picture of how to apply machine learning.

So even on Kaggle nothing seems to work, although competition data there is presented in the most convenient form and is even partially cleaned. It is scary to imagine what will happen to such a specialist in a real working situation, when data needs to be collected from dozens of tables located in different databases. At the same time, he has to respect not only the business logic but also requirements on program speed and limits on RAM, disk space and processor resources.
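A scaled-down sketch of that "real working situation" might look like the following. It assumes pandas is installed and uses two in-memory SQLite databases as stand-ins for separate production systems. The table names, columns and values are hypothetical.

```python
import sqlite3

import pandas as pd


def make_db(schema_sql, insert_sql, rows):
    """Create an in-memory SQLite database with one populated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two hypothetical databases: one holds customers, the other holds orders.
crm = make_db(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "north"), (2, "south"), (3, "north")],
)
billing = make_db(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 20.0), (11, 1, 35.5), (12, 2, 12.0)],
)

# Pull only the columns that are needed, which keeps memory use down.
customers = pd.read_sql_query("SELECT customer_id, region FROM customers", crm)
orders = pd.read_sql_query("SELECT customer_id, amount FROM orders", billing)

# Aggregate orders per customer, then attach the totals to the customer table.
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()
dataset = customers.merge(totals, on="customer_id", how="left")

# Customers without orders get a total of zero instead of a missing value.
dataset["amount"] = dataset["amount"].fillna(0.0)

crm.close()
billing.close()
```

Even in this toy version there are decisions that a theoretical course does not cover: which columns to load, where to aggregate, and what a missing value should mean for the business.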

This guy has studied the theory of machine learning but cannot apply it in practice. Now he begins to study Python libraries for Data Science, that is, to gain practical knowledge that is constantly applied at work and therefore is not erased from memory. While he fills the gaps in programming and data processing, the knowledge gained earlier (mathematics and machine learning algorithms) finds no application and gradually fades.

Where is the mistake?

Before mastering abstract, mathematics-related knowledge, it is worth mastering practical skills - being able to program and process data. Skills of this kind stay in memory longer than math, which you do not use every day in practice. By studying the material in this order, you can become a real professional faster.

What are employers doing?

Since the labor market lacks well-trained data scientists, employers often have to choose - either take a programmer from their staff and improve his mathematics, or take on a mathematician and wait until he learns how to program.

Which way is better? In terms of how well knowledge is retained, a programmer may well turn into a good data scientist in a year or two, and his experience will be especially valuable when putting models into production.

In the case of a mathematician, everything is more complicated. A senior or postgraduate student of mechanics, computational mathematics and cybernetics, or a similar faculty can, if he wants to, easily learn the basics of programming in a couple of months. But if by the word "mathematician" we mean a person who has merely memorized several dozen formulas in a theoretical machine learning course, it will be difficult for him. While he is becoming a junior programmer, his abstract mathematical knowledge will be forgotten. This is exactly the scenario described at the beginning of the article.

If the path of training most IT specialists can be represented as a line, then for a data scientist it is several parallel lines at once. You need to be able to program, work with data, know mathematics, machine learning and the subject area. These are the main areas in which you need to constantly improve your level.

For a doctor, for example, it is normal to take a professional three-week course every three years. For a data scientist, this is unacceptable - in three years, both the profession and technology can change dramatically. Therefore, we have to develop constantly and in all directions.

Such development is not immediately available to a novice specialist. To begin with, you need to master something simple - something that will come up constantly at work and so will not be forgotten. You can often hear beginners say things like: “I know how the algorithm works, but I can't run it”, “Why haven't I studied pandas and sklearn before?”, “It turns out that working with matrices in numpy is as easy as shelling peas”. One thing is clear from these phrases: library skills are very important for a data scientist.
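As a small illustration of the numpy remark, here is a sketch of fitting a straight line with a few matrix operations. It assumes numpy is installed, and the numbers are invented so that the points lie exactly on the line y = 2x + 1.

```python
import numpy as np

# Hypothetical design matrix: a column of ones for the intercept,
# followed by the single feature x = 0, 1, 2, 3.
X = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
])
y = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 2x + 1

# Solve the least-squares problem; lstsq returns several values,
# of which only the coefficient vector is needed here.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

# Applying the fitted line is a single matrix-vector product.
predictions = X @ weights
```

The same idea written by hand with loops would take several times as many lines, which is roughly what the beginners quoted above discover.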

Summarizing all of the above, we can draw the following conclusion: the best way to train a beginner data scientist is top-down. This approach to learning allows you to start doing something with your own hands almost immediately, gradually deepening your understanding and raising your professional level.

From there, the possibilities are wide open. In particular, you can immerse yourself in mathematical disciplines and machine learning algorithms as deeply as you want. This will help you understand the subjects related to the profession better, and later it will let you make career moves. For example, if you are tired of working in business, you can deepen your knowledge and go into scientific research or the low-level development of machine learning algorithms.
