Data Pipeline Development Services for Robotics
Geniusee's data science department assisted a product robotics company with developing and implementing a data pipeline solution. Our client specializes in the robotics industry, providing a wide range of services: the robots themselves, warehouses as a service, data pipelines, and real-time data streaming into data lakes. This whole digital transformation offering is aimed at optimizing other companies' processes.
Head of business operations
Data and robotics company
Geniusee delivers high-quality products and sets reasonable project timelines to ensure they deliver to the client's requirements. Their team of engineers is excellent and rectifies bugs quickly. The partners communicate regularly — Geniusee is available 24/7 and uses Jira to track project progress.
Business context:
Our client reached out because they needed to improve the data processing system used by their robots. Our data engineering expertise allowed us to offer a data pipeline solution that collects information in real time into a data warehouse and allows robots and resources to be utilized more efficiently.
Key challenges:
- Ensure data quality and build a data collection process that runs without delays, just in time;
- Ensure the completeness of gathered information;
- Improve system stability;
- Build a cost-effective cloud solution that can collect 10 GB of data per second;
- Tailor infrastructure that performs independently of data volume.
Work approach
Batch processing
We processed data in batches. Batching comes in different sizes: mini-batches, which can contain just a couple of samples, or full batches, which can cover days of data.
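As a rough illustration (not the client's actual code), the sketch below applies the same aggregation to mini-batches of a few samples and to one full batch; the telemetry fields and batch size are assumptions.

```python
from typing import Iterable

def process_batch(records: Iterable[dict]) -> dict:
    """Aggregate one batch of robot telemetry records (hypothetical schema)."""
    records = list(records)
    return {
        "count": len(records),
        "avg_payload_kg": sum(r.get("payload_kg", 0.0) for r in records) / max(len(records), 1),
    }

def mini_batches(records: list[dict], size: int = 2):
    """Yield small batches of a couple of samples each."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

telemetry = [{"payload_kg": 1.2}, {"payload_kg": 0.8}, {"payload_kg": 1.5}, {"payload_kg": 1.1}]

# Mini-batching: a couple of samples at a time.
for batch in mini_batches(telemetry):
    print(process_batch(batch))

# Full batching: everything collected over a longer period (e.g. days) at once.
print(process_batch(telemetry))
```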
Streaming
Stream processing handles data sample by sample as it arrives. Instead of building up periodic backlogs, the system processes everything in real time.
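For contrast, here is a minimal stream-processing sketch built around a Kafka consumer (Kafka appears in the integrations list below); the topic name, broker address, and event fields are illustrative assumptions, and the actual client library may differ.

```python
import json

from kafka import KafkaConsumer  # kafka-python; one of several possible client libraries

# Topic and broker are placeholders for illustration only.
consumer = KafkaConsumer(
    "robot-telemetry",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# Each record is handled the moment it arrives, so no backlog builds up.
for message in consumer:
    event = message.value
    print(f"processed event from {event.get('device_id')} at offset {message.offset}")
```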
Experimental approach
Working with clients who create emerging technologies pushes us to explore new horizons. As a result of our collaboration, we implemented some fresh ideas.
Constant self-learning
This project demanded up-to-date data science technologies in a rapidly changing market. As the landscape reshaped itself, we had to adapt quickly to meet our client's needs.
Process
On this project, our team provided data pipeline development services: data collected in real time is processed into various data lakes, then analyzed and presented as metrics and dashboards. The complexity of the work came from the multiple data sources involved: different IoT devices and additional tools such as barcode scanners, CRM systems, products, and third-party services.
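As an illustration of the landing step only, a raw event from any source can be written into a partitioned area of the data lake before analysis; the S3-compatible store, bucket name, partition layout, and event fields below are all assumptions.

```python
import json
from datetime import datetime, timezone

import boto3  # assumes an S3-compatible data lake; the actual store may differ

s3 = boto3.client("s3")
BUCKET = "robotics-data-lake"  # illustrative bucket name

def land_event(source: str, event: dict) -> None:
    """Write one raw event into a source- and date-partitioned area of the data lake."""
    now = datetime.now(timezone.utc)
    key = f"raw/{source}/dt={now:%Y-%m-%d}/{now:%H%M%S%f}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))

# Events can come from IoT devices, barcode scanners, CRM systems, or third-party services.
land_event("barcode-scanner", {"sku": "A-1042", "warehouse": "W3"})
land_event("iot-device", {"device_id": "robot-17", "battery_pct": 84})
```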
Project Tech Stack
As our client serves traders as end customers, we had to choose only fast and secure technologies for warehousing robotics data pipelines. To satisfy those criteria, we decided to use cloud capabilities and microservices solutions, ensuring high performance and efficient resource allocation.
List of technologies:
Geniusee Team
As the project centered on data engineering and required significant expertise to create the pipelines, we decided to staff it with senior specialists only, with our Head of Data Science and Data Engineering as the lead.
Product team
Development team
- Backend engineers
- DevOps engineers
- Tech lead
- UI/UX designer
Features
System monitoring
We implemented a continuous monitoring service that regularly checks several data metrics, such as the time of the last data extraction, to verify that all system components work correctly. If the monitoring system detects a malfunction, it sends an automated notification via multiple communication channels such as e-mail or Slack.
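One such check, sketched below with an assumed lag threshold and a placeholder Slack webhook URL, compares the time of the last extraction against an allowed maximum and posts an alert when it is exceeded; the real service covers more metrics and channels.

```python
from datetime import datetime, timedelta, timezone

import requests  # Slack incoming webhook; e-mail notifications would go through SMTP instead

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
MAX_EXTRACTION_LAG = timedelta(minutes=15)  # illustrative threshold

def check_last_extraction(last_extraction_at: datetime) -> None:
    """Send a Slack alert if the most recent data extraction is older than the allowed lag."""
    lag = datetime.now(timezone.utc) - last_extraction_at
    if lag > MAX_EXTRACTION_LAG:
        requests.post(
            SLACK_WEBHOOK_URL,
            json={"text": f"Data extraction is {lag} behind (threshold {MAX_EXTRACTION_LAG})."},
            timeout=10,
        )
```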
System of automated quality control
We based this system on a batch data processing job. Records from the original source are counted, including empty records from data providers, and compared against production counts over different periods. Once the collected information is verified, the data analytics stage begins.
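The core of the check can be pictured as a per-period count comparison; the record schema and period granularity below are assumptions.

```python
from collections import Counter

def count_by_period(records: list[dict]) -> Counter:
    """Count records per period (e.g. per day); empty records still count toward the total."""
    return Counter(r.get("period", "unknown") for r in records)

def find_mismatches(source_records: list[dict], production_records: list[dict]) -> dict:
    """Return periods where the production count differs from the source count."""
    source_counts = count_by_period(source_records)
    prod_counts = count_by_period(production_records)
    return {
        period: (source_counts[period], prod_counts.get(period, 0))
        for period in source_counts
        if source_counts[period] != prod_counts.get(period, 0)
    }

# Any mismatch blocks the analytics stage until the discrepancy is resolved.
print(find_mismatches(
    [{"period": "2024-05-01"}, {"period": "2024-05-01"}, {"period": "2024-05-02"}],
    [{"period": "2024-05-01"}, {"period": "2024-05-02"}],
))  # {'2024-05-01': (2, 1)}
```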
Microservices infrastructure
Software development using microservices involves creating small, independent services that communicate through well-defined APIs. Small, self-contained teams are responsible for these services. Unlike a traditional monolithic architecture, microservices are separate units that can be developed, updated, deployed, and scaled independently, so software can be updated more frequently, improving reliability, uptime, and performance.
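As a toy sketch of one such unit (FastAPI is an assumption, and the endpoints and data are placeholders), each service exposes a small, well-defined API that monitoring and other services call instead of reaching into its internals.

```python
from fastapi import FastAPI  # hypothetical choice of web framework

app = FastAPI(title="extraction-service")

@app.get("/health")
def health() -> dict:
    """Health probe used by monitoring and by other services."""
    return {"status": "ok"}

@app.get("/extractions/latest")
def latest_extraction() -> dict:
    """Public API other services call; placeholder data for illustration."""
    return {"extracted_at": "2024-05-01T12:00:00Z", "records": 128}
```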
Third-party integrations
- Kafka
- CRM systems
- In-house databases
- IoT devices
Data pipeline alerts
System alerts are triggered if the data pipeline is down, stuck, or receiving no data. This prevents a complete shutdown and keeps the system stable.
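The three alert conditions can be read as simple rules over heartbeat and record timestamps; the thresholds and field names below are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HEARTBEAT_TIMEOUT = timedelta(minutes=5)   # illustrative thresholds
STALE_AFTER = timedelta(minutes=30)

def pipeline_alert(last_heartbeat: Optional[datetime],
                   last_record_at: Optional[datetime],
                   records_in_window: int) -> Optional[str]:
    """Return an alert reason, or None when the pipeline looks healthy."""
    now = datetime.now(timezone.utc)
    if last_heartbeat is None or now - last_heartbeat > HEARTBEAT_TIMEOUT:
        return "pipeline is down"       # the process stopped reporting at all
    if last_record_at is None or now - last_record_at > STALE_AFTER:
        return "pipeline is stuck"      # running, but not producing new records
    if records_in_window == 0:
        return "pipeline has no data"   # sources stopped sending anything
    return None
```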