Machine learning seems to have become a goal that everyone is pursuing. More than 80% of companies are exploring at least one AI project.
Before starting, most people ask the following three questions:
· “How accurate is this machine learning model?”
· “How long does training take?”
· “How much training data is needed?”
Users usually want to know how long it takes to bring a new model online and how well it performs and generalizes. They want a way to weigh the overall cost against performance. Unfortunately, the answers to the questions above do not solve this problem. They can even be misleading.
Model training is just the tip of the iceberg. Obtaining the right data set, then cleaning, storing, aggregating, and labeling it, and building reliable data flows and infrastructure pipelines are all costly, but most users and AI/ML companies overlook this.
According to recent research, companies spend more than 80% of their time in AI/ML projects on data preparation and engineering work. In other words, if planning focuses mostly on building and training models, the total engineering workload and cost may be five times what was expected.
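The arithmetic behind that factor of five can be sketched as follows. This is only an illustration: the function name `total_effort`, the parameter `modeling_share`, and the four-week figure are assumptions made for the example, not numbers from the research cited above.

```python
def total_effort(modeling_effort, modeling_share=0.2):
    """Scale a modeling-only estimate up to a whole-project estimate.

    modeling_share is the assumed fraction of total project time spent on
    building and training models (0.2 if data work takes the other 80%).
    """
    if not 0 < modeling_share <= 1:
        raise ValueError("modeling_share must be in (0, 1]")
    return modeling_effort / modeling_share


# Hypothetical plan that only budgets 4 weeks for building and training models.
modeling_weeks = 4
estimated_total = total_effort(modeling_weeks)
print(f"Estimated total effort: about {estimated_total:.0f} weeks")
```

If data work takes even more than 80% of the time, `modeling_share` shrinks and the multiplier grows beyond five.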
In addition, machine learning blurs the line between users and software developers.
AIaaS (AI as a service) and MLaaS (machine learning as a service) have begun to appear. Models in the cloud keep improving as the data grows. Because of this, the MLaaS business is more challenging than SaaS.
A machine learning model learns from its training data, so without high-quality data the model will not work well. In most cases, users are not aware of best practices for generating or annotating appropriate data sets.
When system performance is poor, users often blame the model. AI/ML companies therefore usually spend a lot of time and resources training users and working with them to ensure data quality, which has become a shared responsibility between AI companies and their customers.
For example, to train a defect inspection model for a production line, a computer vision company needs to work with the customer to install cameras at the correct angle and position, check the resolution and frame rate, and ensure that there are enough positive and negative training samples for each scene.
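The last of those checks, having enough positive and negative samples for each scene, is easy to automate. The sketch below is a minimal illustration, assuming each sample is recorded as a hypothetical `(scene, is_defect)` pair; the function name, the scene names, and the `min_per_class` threshold are made up for the example.

```python
from collections import Counter


def underrepresented_scenes(samples, min_per_class=100):
    """Return the scenes that lack enough positive or negative samples.

    samples: iterable of (scene, is_defect) pairs, where is_defect is a bool.
    min_per_class: assumed minimum count required for each class per scene.
    """
    counts = Counter(samples)
    scenes = {scene for scene, _ in counts}
    gaps = {}
    for scene in sorted(scenes):
        positives = counts[(scene, True)]
        negatives = counts[(scene, False)]
        if positives < min_per_class or negatives < min_per_class:
            gaps[scene] = {"positive": positives, "negative": negatives}
    return gaps


# Hypothetical data: line_a has both classes, line_b has no defect images yet.
samples = (
    [("line_a", True)] * 3
    + [("line_a", False)] * 5
    + [("line_b", False)] * 4
)
print(underrepresented_scenes(samples, min_per_class=3))
```

Running a check like this before training makes a gap in the data visible to the customer early, rather than after the model underperforms.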
Collecting data is even more time-consuming and costly for robotics or autonomous vehicle applications, because the robots or vehicles sometimes have to be operated by people.
Even after training sessions and all the user manuals and guides, you still cannot fully control the data that users generate. A machine vision camera company told me that its engineers manually verify all the data to make sure the input is complete.
All this often overlooked extra training, manual inspection, data cleaning, and labeling brings huge indirect costs to AI companies. This is why we need to build more scalable AI/ML projects. So how can this problem be solved?
1. Scalability is the key.
Identify the right use cases, ones that a large number of customers are willing to pay for, and solve them with the same model architecture. Otherwise, you end up building and training different models for different companies without a standard product.
2. Try to provide self-service.
Automate training and data pipelines as much as possible to improve operational efficiency and reduce dependence on manual labor. Companies tend to prioritize customer-facing features over internal tools and automation, but investment in internal tooling pays off quickly. Make sure that sufficient resources are allocated to automating internal processes.
3. Finally, determine and track costs, especially hidden costs.
How much time do engineers spend cleaning, filtering, or aggregating data? How much time do they spend making sure that a third party completes the annotation correctly? How long do they need to help customers set up their environment and collect data correctly? How much of this work can be automated or outsourced?
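One lightweight way to start answering these questions is to log engineering hours by task category and flag the ones that could be automated. The sketch below is illustrative only: the categories, the hours, and the `summarize_hours` helper are hypothetical, not data from any real project.

```python
from collections import defaultdict


def summarize_hours(entries):
    """Total the logged hours per category and compute the automatable share.

    entries: iterable of (category, hours, can_automate) tuples.
    Returns (hours per category, fraction of all hours flagged as automatable).
    """
    totals = defaultdict(float)
    automatable_hours = 0.0
    for category, hours, can_automate in entries:
        totals[category] += hours
        if can_automate:
            automatable_hours += hours
    grand_total = sum(totals.values())
    share = automatable_hours / grand_total if grand_total else 0.0
    return dict(totals), share


# Hypothetical weekly log for one customer project.
log = [
    ("data cleaning", 12, True),
    ("annotation review", 6, False),
    ("customer setup", 6, False),
    ("data cleaning", 8, True),
]
totals, automatable_share = summarize_hours(log)
print(totals)
print(f"Automatable share: {automatable_share:.1%}")
```

Even a rough log like this turns hidden costs into numbers that can be compared across customers and tracked over time.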
The road to scalable training may be long and difficult, but these problems have to be faced sooner or later.
Responsible editor: CT