Overcoming Pitfalls in ML System Design

Dmytro Ivanov
MACHINE LEARNING ENGINEER
Alina Ampilogova
COMMUNICATIONS MANAGER

Building a machine learning system is a complicated process, and many companies have learned that the hard way. According to Gartner, only 15% of data science projects were expected to reach the production stage by 2022. Furthermore, the 2020 State of Enterprise ML report identified the lengthy deployment journey as one of the key culprits behind ML project failure.

While this is a rather grim statistic, it doesn't mean that company leaders should give up on their machine learning initiatives. Most machine learning pilots fail not because there are too many steps in ML system design, but because each step can be sabotaged by pitfalls that are dangerously easy to overlook.

Let's make sure your project doesn't join the 85% and makes it to the finish line instead. In this blog post, we'll break down the most common pitfalls you can come across in ML system design and share tips on how to avoid them.

Key Challenges in ML System Design

Project scoping

It's a common belief that a machine learning project is 80% development. In practice, it's closer to 60% communication, 30% data preparation, and 10% model development. Without nailing that 60%, there is no point in moving further.

  • High cost, low ROI
    Any ML project is costly and time-consuming, so you expect a proper return on investment. To make that expectation a reality, be realistic about your needs, resources, and company processes.
HOW URGENT IS OUR PROBLEM? WHO OR WHAT DEFINES ITS URGENCY?

Assess the urgency of the ongoing problem and look over the factors that dictate this urgency.

WHY DO WE NEED ML TO SOLVE THIS PROBLEM?

Outline the tasks and goals you want to cover and decide whether they justify investing in ML development.

WHAT IS OUR DEFINITION OF SUCCESS?

What is the endgame of your ML project? Be clear about the expected results and how you will measure them (KPIs, revenue, time savings, etc.).

  • Conflicting requirements
    Let's imagine a company needs an intelligent traffic-selling system that supports better decision-making by scoring users based on their conversion progress. The company's marketing team would be the main stakeholder, aiming to profit from low-score users. Meanwhile, the machine learning team wants to use as much data and as many complex techniques as possible. But the company's product and business intelligence (BI) teams don't need that much data and aren't ready for high computing costs. With so many conflicting priorities, a consensus is in order. For example, the ML team could focus on the data with the most significant influence on the model and avoid overly complex algorithms.
STAKEHOLDER       KPI
Marketing team    ARPU lower than resale price
ML team           High ROC AUC
Product team      Performance
BI team           Lean DBMS management

  • Wrong prediction risks
    Consider how faulty predictions can affect work efficiency or customer satisfaction. For example, recommendation algorithms on platforms such as Netflix select and present films based on users' preferences and behavior patterns. If those algorithms keep recommending the wrong movies, users won't bother renewing their subscriptions. So, before you start working on a project, acknowledge the potential risks of inaccurate predictions and decide whether (and to what degree) these risks are acceptable.
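The ML team's ROC AUC target and the business risk of wrong predictions can be made concrete with a short plain-Python sketch. It assumes a binary conversion label and one model score per user. The function names, the toy data, and the per-error costs (`cost_fp`, `cost_fn`) are hypothetical placeholders that each business would set for itself. ROC AUC is computed as the share of positive/negative pairs that the scores rank correctly. The cost function shows that the same model can be cheap or expensive to act on depending on the chosen threshold.

```python
def roc_auc(labels, scores):
    """ROC AUC: the share of (positive, negative) pairs ranked correctly by score.

    Ties count as half a correct ranking. O(P * N) time, which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def expected_error_cost(labels, scores, threshold, cost_fp, cost_fn):
    """Average cost per user of acting on the model at a given score threshold.

    cost_fp: hypothetical cost of treating a non-converting user as converting.
    cost_fn: hypothetical cost of missing a user who would have converted.
    """
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return (fp * cost_fp + fn * cost_fn) / len(labels)


if __name__ == "__main__":
    # Toy data, invented for illustration: 1 = user converted, 0 = did not.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.65, 0.3, 0.35, 0.1, 0.8, 0.7]
    print(f"ROC AUC: {roc_auc(labels, scores):.2f}")
    for t in (0.3, 0.5, 0.7):
        cost = expected_error_cost(labels, scores, t, cost_fp=1.0, cost_fn=5.0)
        print(f"threshold={t}: average error cost {cost:.3f}")
```

On the toy data, ROC AUC is 0.75 at every threshold, but the average error cost changes as the threshold moves. A good AUC alone does not tell the marketing team where to draw the line.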

Data engineering

Any machine learning system is extremely data-hungry and data-sensitive. Give it too little data, and it fails to learn the underlying patterns. Give it large amounts of irrelevant or noisy data, and it can end up fitting the noise instead of the signal. Give it the wrong data, and it doesn't perform as expected. Yet collecting data is only half the battle. Before it can be used, raw data must be cleaned and transformed into a usable format. No wonder this step takes around 45% of data scientists' time: without a thorough approach to data, the project can be compromised by numerous issues.

  • Lack of proper data
    You may think you have all the data at hand, whether in documents, on paper, or already digitized. However, midway through testing, you may discover that your model is missing a large chunk of data because it never appeared in your traditional data sources.

  • Data storage and collection
    Getting proper data means you must be familiar with several ways of gathering data. Not all information can be obtained with just one method or from just one source. It takes a thorough scan across departments and all available data storage platforms to make sure no essential bits are missing. You must also rethink your approach to data storage and structuring.

  • Missing values
    When compiling data into a dataset, it's common to find empty cells, NULL values, or question marks. Missing values should then be removed or substituted, but only after a manual search through all data sources for the actual values.

  • Data noise
    Data is rarely neat and organized. It's messy, noisy, and biased in ways that stakeholders and decision-makers may not even be aware of. Pulling data from sources that lack data validation features also introduces outliers and anomalies.

  • Inconsistent values
    This problem often occurs when combining data from various sources that record the same value in different ways (for example, one system uses NEW YORK, while another uses NY). Such variants add noise, so to avoid this pitfall, the team must find all the variations and map each one to a single standard value.

  • Data compliance issues
    Not all data can be used freely. Some information gleaned from unstructured documents or other data goldmines may be valuable for the model yet protected by the company's NDAs with its clients. So, before data scientists can proceed with modeling, a CIO must review the data they will be using and ensure that it doesn't violate safety guidelines or customers' privacy and that it follows data compliance regulations.
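The missing-value, noise, and inconsistent-value steps above can be sketched in plain Python. This is an illustration under assumptions, not a prescribed pipeline. The missing-value markers, the `CITY_ALIASES` table, the `amount` column, median imputation, and the 1.5 * IQR outlier rule are all hypothetical choices made for the example.

```python
import statistics

# Markers that different sources use for "no value" (assumed for this example).
MISSING_MARKERS = {"", "NULL", "null", "?", "N/A"}

# Hypothetical alias table; in practice it is built by profiling every source.
CITY_ALIASES = {"NY": "NEW YORK", "NYC": "NEW YORK", "NEW YORK CITY": "NEW YORK"}


def normalize_missing(value):
    """Map every 'empty' marker to None so missing values look the same everywhere."""
    if isinstance(value, str) and value.strip() in MISSING_MARKERS:
        return None
    return value


def standardize_city(value):
    """Trim, upper-case, and map known aliases to one canonical spelling."""
    if value is None:
        return None
    key = value.strip().upper()
    return CITY_ALIASES.get(key, key)


def iqr_bounds(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean(rows):
    """Return cleaned copies of `rows` and the indices of rows with an anomalous 'amount'."""
    cleaned = []
    for row in rows:
        r = {col: normalize_missing(v) for col, v in row.items()}
        r["city"] = standardize_city(r.get("city"))
        amount = r.get("amount")
        r["amount"] = float(amount) if amount is not None else None
        cleaned.append(r)

    observed = [r["amount"] for r in cleaned if r["amount"] is not None]
    median = statistics.median(observed)  # robust to the outliers flagged below
    low, high = iqr_bounds(observed)
    outliers = []
    for i, r in enumerate(cleaned):
        if r["amount"] is None:
            r["amount"] = median  # substitution; dropping the row is the alternative
        elif not low <= r["amount"] <= high:
            outliers.append(i)
    return cleaned, outliers
```

Flagged rows are returned rather than deleted. Whether an outlier is a data-entry error or a genuine extreme value is a judgment the team still has to make.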

Modeling

Compared with data engineering, modeling is a relatively easier and less time-consuming step. Once the data has been processed and visualized, data scientists can finally use it to build a model and generate predictions. However, modeling doesn't come without its pitfalls.

  • Interpretability
    A successful model must have transparent and predictable logic that human users can follow and understand. When users struggle to understand how the ML model arrives at its decisions or predictions, the end product will be of little value to them. This is why modeling requires input from every department that will use the future ML system.

  • Scalability
    Deciding that scalability can wait until later is one of the biggest mistakes one can make at this stage. Such an approach may force you to develop a new ML system from scratch, which means more expenses and a lower chance of adequate ROI. To be scalable, a model needs robust infrastructure, the capacity for integration, and a tailored tech stack.

  • Maintenance
    ML models remain sensitive both to the data they were built on and to the data they receive. A seemingly minor change, like a software update or a shift in customer behavior, may lead to concept drift (a change in the relationship between input and output data), reducing the model's accuracy. Data scientists must maintain their ML models: running tests, updating the models after software updates, and validating each model before deployment.
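Permutation importance is one common, model-agnostic way to make a model's logic easier to track. It works by shuffling one feature at a time and measuring how much a metric drops. Below is a plain-Python sketch. It assumes the model is any callable that returns a class label and uses accuracy as the metric. scikit-learn ships a fuller implementation as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(model, X, y):
    """Share of rows where model(row) equals the true label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when each feature column is shuffled on its own.

    model: any callable mapping a feature row to a predicted label (assumed).
    A large drop means the model leans heavily on that feature; a drop near
    zero means the feature barely affects its predictions.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only column j permuted; other columns stay intact.
            permuted = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy data: the label depends only on feature 0; the model ignores feature 1.
    X = [[float(i % 2), float(i % 3)] for i in range(30)]
    y = [i % 2 for i in range(30)]

    def predict(row):
        return int(row[0] > 0.5)  # stand-in for a trained model

    print(permutation_importance(predict, X, y))
```

Shuffling a feature that barely changes accuracy shows the feature has little influence on predictions. This gives non-technical stakeholders a first, rough answer to "what is this model looking at?"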

Monitoring and continuous learning

So, an ML system is designed, modeled, and finally deployed. However, the work is far from over. Deploying an ML project is costly and resource-heavy, so it's important to ensure that not a single cent or minute goes to waste. For that reason, even after deployment, every ML pilot must be closely monitored and scanned for the following disruptive processes:

  • Data drifts
    Data is never static; it's dynamic and constantly shifting. Data scientists must therefore monitor all input features so they can detect changes in the input data and respond to them promptly.

  • Data inconsistencies
    Even if data collection and preparation went smoothly, that doesn't protect the data-gathering pipelines from developing integrity issues (e.g., outdated or missing data points). Regularly scanning input data for missing values keeps your data structure complete and functional.

  • Concept drift
    Concept drift may be sudden or gradual, triggered by external factors (such as the COVID-19 pandemic) or internal ones (such as fluctuations in the evaluation metrics). Regular correlation studies are vital for keeping your machine learning system performing well.
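The three checks above can be approximated with a few small monitors. The sketch makes three assumptions: input features are numeric, labelled outcomes eventually arrive for concept-drift tracking, and bin edges are picked by hand. The Population Stability Index (PSI) is one common drift score among several, and others include the Kolmogorov-Smirnov test. The PSI thresholds in the comment are widely quoted rules of thumb rather than standards. All function and class names are illustrative.

```python
import bisect
import math
from collections import deque


def psi(reference, live, edges, eps=1e-6):
    """Population Stability Index of a live sample against a reference sample.

    edges: sorted bin boundaries shared by both samples. Empty bins are floored
    at `eps` so the logarithm stays defined. Rule of thumb often quoted:
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def missing_rate(values):
    """Fraction of None entries, e.g. per feature per day, to catch broken pipelines."""
    return sum(v is None for v in values) / len(values)


class RollingAccuracy:
    """Accuracy over the last `window` predictions whose true outcome is known.

    A sustained decline is a common symptom of concept drift.
    """

    def __init__(self, window=500):
        self._hits = deque(maxlen=window)

    def update(self, prediction, actual):
        self._hits.append(prediction == actual)

    @property
    def value(self):
        return sum(self._hits) / len(self._hits) if self._hits else float("nan")
```

In practice, each monitor would run on a schedule against the live feature store and raise an alert when its value crosses the threshold the team has agreed on.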


To keep the deployed ML system safe from drifts and inconsistencies, data scientists and stakeholders must regularly invest time and resources in continuous learning. Continuous learning is the fuel that keeps an ML model going. It involves constantly feeding new data to the model and retraining it so that it keeps up with shifts in its environment (new customer behavior patterns, workflow adjustments, new target audience segments, or changes in internal company processes).

Since ML projects operate similarly to machines (requiring new parts, testing and checkups), continuous learning is vital for securing the project’s resilience.
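Continuous learning can mean periodic full retraining or incremental (online) updates. Below is a sketch of the incremental variant, assuming a binary target and numeric features. It uses a hypothetical `OnlineLogisticRegression` class that is updated with stochastic gradient descent on each new batch. The model therefore adapts to new data without being rebuilt from scratch.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineLogisticRegression:
    """Hypothetical minimal model that keeps learning from new batches via SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        """Probability of the positive class for one feature vector."""
        return _sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def partial_fit(self, X, y):
        """One SGD pass over a new batch; what was learned before stays in w and b."""
        for x, label in zip(X, y):
            error = self.predict_proba(x) - label  # d(log loss)/dz for logistic regression
            self.b -= self.lr * error
            self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]


if __name__ == "__main__":
    X = [[-1.0], [1.0]] * 50
    model = OnlineLogisticRegression(n_features=1, learning_rate=0.5)
    model.partial_fit(X, [0, 1] * 50)  # original behavior: high x means conversion
    print(round(model.predict_proba([1.0]), 3))
    for _ in range(3):  # new data: the relationship has flipped
        model.partial_fit(X, [1, 0] * 50)
    print(round(model.predict_proba([1.0]), 3))
```

Streaming batches in which the input-output relationship has flipped gradually pulls the weights the other way. This is the kind of adjustment to concept drift the section describes. In production, such updates would normally go through the pre-deployment validation mentioned under Maintenance before the updated model goes live.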


Reporting (BI)

Reporting is not exactly the last stage of ML system development. It's the stage where a company can start using its deployed product: gleaning business intelligence insights, visualizing the received data, and gaining a deeper look into customer behavior and other factors that affect business efficiency.

The only pitfalls that can compromise reporting are the ones not avoided during previous steps (poor data quality, lack of continuous learning, conflicting business objectives, etc.).

However, with all these steps performed correctly, developers and stakeholders get more than an efficient ML project — they gain a goldmine of data to enrich their understanding of their target audience, driving comprehensive analytical insights and predictions.

Conclusion

Machine learning technology has evolved beyond a futuristic concept — it's already a part of everyday routine and workflow. As the post-pandemic period pushes businesses to expand their data and task management capacity, ML systems are becoming more and more relevant.

However, starting a machine learning initiative without considering potential pitfalls can result in its failure (even after deployment).

Key components of successful ML system design

COMMUNICATION

Communicate your priorities to the rest of the group, be open to their suggestions, and prepare to reach a compromise.

COLLABORATION

Stay in touch with your development team, get them involved at the problem statement level, and encourage them to share their insights and experience in your industry. Be ready to cooperate on all ML system design changes.

CLARITY

Keep your data organized, structured, consistent and accessible. Make sure that all your data sources come with strong data validation controls so that no bad data ends up in your data-gathering pipelines.
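Validation controls at ingestion can start as a simple per-field rule set that rejects bad records before they reach the pipeline. The sketch below assumes that records arrive as Python dictionaries. The `FieldRule` class, the `SCHEMA` fields, and their limits are hypothetical examples, not a recommended schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical validation rule for a single field."""
    type_: type
    required: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed: Optional[frozenset] = None


# Example schema; the field names and limits are placeholders.
SCHEMA = {
    "user_id": FieldRule(str),
    "age": FieldRule(int, min_value=13, max_value=120),
    "city": FieldRule(str, required=False),
    "plan": FieldRule(str, allowed=frozenset({"free", "basic", "premium"})),
}


def validate(record, schema):
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = []
    for name, rule in schema.items():
        value = record.get(name)
        if value is None:
            if rule.required:
                errors.append(f"{name}: missing required field")
            continue
        if not isinstance(value, rule.type_):
            errors.append(f"{name}: expected {rule.type_.__name__}, got {type(value).__name__}")
            continue  # range checks on the wrong type would be meaningless
        if rule.min_value is not None and value < rule.min_value:
            errors.append(f"{name}: {value!r} below minimum {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            errors.append(f"{name}: {value!r} above maximum {rule.max_value}")
        if rule.allowed is not None and value not in rule.allowed:
            errors.append(f"{name}: {value!r} not in allowed values")
    unexpected = sorted(set(record) - set(schema))
    if unexpected:
        errors.append(f"unexpected fields: {unexpected}")
    return errors


def split_batch(records, schema):
    """Separate records that pass validation from those that should be quarantined."""
    accepted, rejected = [], []
    for record in records:
        problems = validate(record, schema)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected
```

Keeping rejected records together with their error messages, instead of silently dropping them, gives data owners a concrete list of issues to fix at the source.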

If you want to bring your ML idea to life and be aware of every potential challenge, let’s chat. We'll provide all the insights you need to reach a consensus in your decision-making group and successfully design, deploy, and scale your machine learning system.
