How to evaluate industrial AI vision vendors — a founder's checklist

I have been on both sides of this conversation: as a founder pitching CountAI into 25+ enterprise procurement cycles across seven countries, and as the operator who has had to live with the consequences of every architectural decision we made in 2019 about which silicon, which cloud posture, and which integration model to commit to.

From that view, almost every industrial-AI-vision evaluation I see runs the same playbook: a feature checklist, an accuracy demo, a reference call, a procurement template. And almost every one of those evaluations misses the thing that actually predicts whether the deployment will scale beyond the first line.

If you are an operations director, EHS leader, or transformation owner about to short-list AI vision vendors in 2026, the five questions below are the ones that will separate the vendors who will still be running for you in three years from the ones whose pilot will quietly stall in week six. The accuracy number is the easy part. These are the hard parts.

Why the standard evaluation gets this wrong

The standard RFP for industrial AI vision spends most of its weight on detection accuracy, feature coverage, and pricing. Those three things matter, but they are not the things that kill deployments.

What kills deployments, in roughly the order I have watched it happen:

The vendor's platform is hard-tied to one silicon family, the second site has different economics, and the rollout pauses to "re-evaluate."
The architecture depends on the cloud, the customer's IT team finds the inbound firewall rule it requires, and the project stalls in compliance review.
The vendor cannot quote a per-class false-alert rate at sites comparable to the customer's, so after the first 60 days of real-world drift, the operator stops trusting the alerts.
The integration with the existing MES is shallow — the AI sees what the MES never logged, but the MES owner cannot reconcile the two numbers, so the C-suite stops believing either of them.
The model retraining loop is the vendor's secret, but the data is the customer's. When the product mix shifts in year two, nobody knows whose job it is to update the model.

None of these show up in a feature checklist. All of them show up in the second year of the contract. The questions below are designed to surface them in week one of the evaluation, not week 52 of the deployment.

QUESTION 1 of 5

Does the platform run on the cameras, sensors, MES and ERP we already have?

Ask: "Walk us through a deployment that ran on the customer's existing infrastructure end-to-end — specifically, what did you not have to replace?"

The whole economic case for industrial AI vision in 2026 is retrofit. If the deployment plan begins with "we send a recommended camera spec" or "we'd ask you to standardise on our edge box", the platform is not solving a 2026 problem — it is solving the 2018 rip-and-replace problem with rip-and-replace economics. The CFO killed that project the first time. They will kill it again.

A serious vendor will name specific camera brands they have ingested via RTSP (Hikvision, Dahua, Axis, Bosch, Honeywell, Pelco, Avigilon, CP Plus), specific MES and ERP platforms they have integrated with through published interfaces (SAP, Oracle, Plex, Aveva, Siemens Opcenter, Rockwell), and specific industrial protocols they read natively (OPC UA, MQTT, or vendor gateways). Anyone who hedges this answer is selling you a transformation program, not an intelligence platform.

QUESTION 2 of 5

Where does the raw video and operational data physically rest at the end of every network hop?

Ask: "When a camera captures a frame, what is the next hop, and the next, and the next, and where does the data physically rest at the end of that chain?"

This is the question that ends procurement conversations six months in if it was not asked in week one. Cloud-first AI vision platforms still exist, and most of them genuinely work technically. They do not survive a real factory IT and privacy review.

The structural reasons: bandwidth (streaming dozens of HD camera feeds to the cloud is not how factory networks are provisioned); latency (a stoppage alert that arrives 30 seconds late is operationally useless); reliability (a cloud-dependent platform goes blind every time the uplink hiccups); and data residency (in the EU under GDPR, the UK under the Data Protection Act 2018 and UK GDPR, Australia under the Australian Privacy Principles, and a growing number of US states under state-level privacy and biometric laws, video of workers processed offsite is a legal conversation your privacy team will not enjoy).

The platforms that work in production keep raw video on the floor. Only metadata leaves — events, counts, snapshots, with faces blurred where regulators require it. If the vendor offers "both cloud and edge", ask which their production customers actually use. The honest answer separates them.
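To put rough numbers on the bandwidth point, here is a back-of-envelope sketch comparing the uplink needed to stream raw video with the uplink needed when only event metadata leaves the floor. Every input is an illustrative assumption of mine (40 cameras, 4 Mbps per HD stream, 500 events per camera per day, 2 KB per event), not a measurement from any deployment. Snapshots would add to the metadata figure, so substitute your own site's values.

```python
# Back-of-envelope uplink comparison. All inputs are illustrative assumptions.

def video_uplink_mbps(cameras: int, mbps_per_stream: float) -> float:
    """Sustained uplink needed to stream every camera feed offsite."""
    return cameras * mbps_per_stream


def metadata_uplink_mbps(cameras: int, events_per_camera_per_day: int,
                         bytes_per_event: int) -> float:
    """Average uplink needed if only event metadata leaves the floor."""
    bits_per_day = cameras * events_per_camera_per_day * bytes_per_event * 8
    seconds_per_day = 86_400
    return bits_per_day / seconds_per_day / 1_000_000


if __name__ == "__main__":
    cameras = 40  # assumed camera count for one site
    video = video_uplink_mbps(cameras, mbps_per_stream=4.0)
    meta = metadata_uplink_mbps(cameras, events_per_camera_per_day=500,
                                bytes_per_event=2_000)
    print(f"raw video:     {video:.0f} Mbps sustained")
    print(f"metadata only: {meta:.4f} Mbps average")
```

Under these assumptions the raw-video path needs a sustained uplink several orders of magnitude larger than the metadata path, which is the gap a factory network team will ask about.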

QUESTION 3 of 5

What is your false-alert rate by class, after 60 days, at customer sites comparable to ours?

Ask: "Quote me a precision and recall number per use-case class — helmet, vest, gloves, holes, lycra, oil-spot, stop event — from a production site with our camera mix and our lighting conditions, after 60 days in the field."

The lab accuracy number is meaningless. So is the marketing accuracy number. The number that matters is the false-alert rate after 60 days of real production conditions, because that is the number that determines whether your supervisor still trusts the alert in week ten.

If the system fires 20 false alerts a day for the first month, the operator stops looking at the alert feed by the second month. The pilot is dead. The hardest engineering work in production AI vision is not detection — it is the precision-recall trade-off that keeps alerts trustworthy after the model has seen everything a real factory throws at it for two months.

A serious vendor will commit to a per-class benchmark on your cameras during the pilot, measured against ground truth your team labels. Any vendor that quotes a single accuracy figure across all conditions is selling marketing.
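A per-class benchmark of this kind is simple to compute once your team has labelled ground truth. The sketch below is one minimal way to do it, not any vendor's method: the `ReviewedEvent` record and the class names are hypothetical, and it assumes each reviewed event has been marked as alerted or not by the system and as real or not by your labellers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedEvent:
    """One reviewed event: what the system said vs. what the labeller saw."""
    cls: str        # use-case class, e.g. "helmet" or "oil-spot" (examples)
    alerted: bool   # the system raised an alert
    actual: bool    # ground truth from the customer's own labelling


def per_class_metrics(events):
    """Return {class: (precision, recall, false_alerts)}.

    Precision or recall is None when its denominator is zero.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for e in events:
        c = counts[e.cls]
        if e.alerted and e.actual:
            c["tp"] += 1   # correct alert
        elif e.alerted:
            c["fp"] += 1   # false alert
        elif e.actual:
            c["fn"] += 1   # missed event
    results = {}
    for cls, c in counts.items():
        alerts = c["tp"] + c["fp"]
        positives = c["tp"] + c["fn"]
        precision = c["tp"] / alerts if alerts else None
        recall = c["tp"] / positives if positives else None
        results[cls] = (precision, recall, c["fp"])
    return results


if __name__ == "__main__":
    sample = [
        ReviewedEvent("helmet", alerted=True, actual=True),
        ReviewedEvent("helmet", alerted=True, actual=False),
        ReviewedEvent("helmet", alerted=False, actual=True),
        ReviewedEvent("oil-spot", alerted=True, actual=True),
    ]
    for cls, (p, r, fp) in per_class_metrics(sample).items():
        print(cls, "precision:", p, "recall:", r, "false alerts:", fp)
```

Run it on the labelled sample from day 60, not day 1, and keep the classes separate. Averaging across classes hides exactly the class that is flooding the alert feed.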

QUESTION 4 of 5

How do you handle data integrity when the operator forgets to log?

Ask: "Show me, on a real customer site, where your MES-reported number differed from your camera-observed number — and what your platform did about it."

Every MES depends on operator input. The operator who steps away for nine minutes to fix a yarn break does not key in a nine-minute stop reason. He keys in "around five", or he keys in nothing at all because the line was technically still running. Multiply that by a few stations per line, three shifts, 250 working days, and the gap between the MES-reported OEE and the actual floor OEE is usually 8 to 20 points.

A good AI vision platform creates an independent observation that surfaces this gap. It does not replace the MES — the MES remains the system of record for traceability and audit. But it forces the operation to confront the difference between what was logged and what happened.

A bad answer here is "MES is the source of truth." That confuses the system of record with the source of truth. They are not the same thing.
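The gap between what was logged and what happened can be made concrete with very little arithmetic. The sketch below covers only the availability component of OEE and uses invented numbers for one shift. The `ShiftStops` record and its field names are hypothetical, and a real reconciliation would also need performance and quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftStops:
    planned_minutes: float         # planned production time in the shift
    mes_stop_minutes: float        # stop time keyed in by operators
    observed_stop_minutes: float   # stop time seen independently by cameras


def availability(planned: float, stopped: float) -> float:
    """Availability component of OEE: run time over planned time."""
    if planned <= 0:
        raise ValueError("planned time must be positive")
    return (planned - stopped) / planned


def availability_gap_points(shift: ShiftStops) -> float:
    """Percentage points by which the MES figure exceeds the observed one."""
    reported = availability(shift.planned_minutes, shift.mes_stop_minutes)
    observed = availability(shift.planned_minutes, shift.observed_stop_minutes)
    return (reported - observed) * 100


if __name__ == "__main__":
    # Invented example: 480-minute shift, 30 minutes logged, 78 observed.
    shift = ShiftStops(planned_minutes=480, mes_stop_minutes=30,
                       observed_stop_minutes=78)
    print(f"availability gap: {availability_gap_points(shift):.1f} points")
```

The point of computing it per shift is that the MES number stays the system of record while the camera number sits next to it, so the MES owner can see where the two diverge instead of arguing about which one is right.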

QUESTION 5 of 5

Which silicon family will your platform be tied to in five years?

Ask: "Show me your platform running on at least two different inference chips in production today — without forking the codebase."

This is the least-discussed failure mode in this category and, in my experience, the one that gets the most rollouts stuck after the first site. The chip landscape in 2026 is moving fast. NVIDIA Jetson Thor, Intel Core Ultra Series 2, Hailo-8 and Hailo-10H, Qualcomm Dragonwing: each is the right chip for some use case, some power envelope, some unit-economics point. None is the right chip for all of them.

A platform hard-tied to one silicon family is fine on the first site. By the second site, the camera count is different, the lighting is different, the power budget is different, the cost-per-camera math has changed — and the platform cannot adapt without forking. The customer pauses the rollout to re-evaluate. The re-evaluation never finishes.

The platforms that survive multiple chip cycles are the ones that built portability into the inference layer from day one. Ask to see the same software stack running on at least two different chips at two different customer sites. If the vendor cannot show it, they have not yet been forced to.
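What "portability in the inference layer" looks like in code is roughly an interface that the application depends on, with one adapter per chip family behind it. The sketch below is a toy illustration of that shape, not how any particular platform is built. `InferenceBackend`, the two fake backends, and `run_pipeline` are hypothetical names, and the backends are stand-ins that call no real chip SDK.

```python
from typing import Protocol, Sequence


class InferenceBackend(Protocol):
    """Hypothetical interface: one adapter per chip family implements it."""
    name: str

    def infer(self, frame: Sequence[float]) -> list[str]:
        """Return the detected class labels for one frame."""
        ...


class FakeGpuBackend:
    """Stand-in for a GPU-class adapter (no real SDK calls)."""
    name = "fake-gpu"

    def infer(self, frame: Sequence[float]) -> list[str]:
        return ["helmet"] if sum(frame) / len(frame) > 0.5 else []


class FakeNpuBackend:
    """Stand-in for an NPU-class adapter (no real SDK calls)."""
    name = "fake-npu"

    def infer(self, frame: Sequence[float]) -> list[str]:
        return ["helmet"] if sum(frame) / len(frame) > 0.5 else []


def run_pipeline(backend: InferenceBackend, frames) -> list[list[str]]:
    """Application code: knows the interface, never the chip."""
    return [backend.infer(frame) for frame in frames]


if __name__ == "__main__":
    frames = [[0.2, 0.3], [0.9, 0.8]]  # toy "frames", not real images
    for backend in (FakeGpuBackend(), FakeNpuBackend()):
        print(backend.name, run_pipeline(backend, frames))
```

The question to put to the vendor is whether their equivalent of `run_pipeline` is genuinely the same code at both sites, or whether each chip has its own copy.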

Detection accuracy is the easy part. The hard part is everything that happens to a deployment between day 30 and day 730. These five questions are about what changes at scale.

A scoring exercise that takes 20 minutes

If you are running an evaluation right now, here is the cheapest possible way to use the five questions above.

Send each short-listed vendor the five questions in writing. Give them a week to respond. Then score each answer A through F on two dimensions: specificity (did they name actual customers, chips, MES platforms, false-alert rates) and honesty (did they admit anything they cannot do, or did the answer sound like a brochure).

Vendors who score A on both dimensions, on all five questions, are the ones whose deployment will still be running for you in 2029. Vendors who score below B on more than two questions are the ones whose pilot is going to stall in week six. The remaining middle is where most of the field sits, and that is where the reference-customer calls actually matter — because the questions you ask in those calls should be calibrated to the specific weaknesses the written answers exposed.
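If you want to keep the scoring consistent across reviewers, the rule above fits in a few lines. This sketch makes one assumption the text leaves open: a question counts as "below B" if either the specificity grade or the honesty grade is below B. The function name and the returned labels are hypothetical.

```python
VALID_GRADES = "ABCDEF"


def classify_vendor(answers):
    """Classify a vendor from five (specificity, honesty) letter-grade pairs."""
    if len(answers) != 5:
        raise ValueError("expected one grade pair per question (5 total)")
    for pair in answers:
        for grade in pair:
            if len(grade) != 1 or grade not in VALID_GRADES:
                raise ValueError(f"invalid grade: {grade!r}")
    # A on both dimensions, on all five questions.
    if all(s == "A" and h == "A" for s, h in answers):
        return "strong"
    # Letters sort alphabetically, so a grade is below B when it sorts after "B".
    # Assumption: a question is weak if EITHER dimension is below B.
    weak_questions = sum(1 for s, h in answers if max(s, h) > "B")
    if weak_questions > 2:
        return "likely to stall"
    return "middle: probe on reference calls"


if __name__ == "__main__":
    vendor = [("A", "A"), ("B", "A"), ("C", "B"), ("A", "B"), ("B", "B")]
    print(classify_vendor(vendor))
```

For vendors that land in the middle, keep the per-question grades rather than only the label, because the weak questions are the ones to raise on the reference calls.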

Have an evaluation underway? I'll review the answers with you.

If you are short-listing vendors for an industrial AI vision project and have answers to the five questions above (from any vendors, including ones we are not competing with), I will spend 30 minutes reading them with you and pointing out what to ask next. No agenda, no pitch. The motivation is selfish: buyers who do this kind of evaluation well are the customers who later become the references that make this category mature faster.

Email Harsha directly →

Goes to my inbox. Usually replies the same day.

Where this leaves you

Most industrial AI vision deployments do not fail at the model. They fail at deployment economics, cloud architecture, or chip lock-in. The Cisco 2026 State of Industrial AI Report finds 61% of factories deploying physical AI but only 20% successfully scaling it, and the 41-point gap between those two numbers is almost entirely about the questions above.

Run the evaluation honestly. Ask the five questions. Score the answers ruthlessly. The vendor you pick on this basis is the one who will still be there when you need them in 2029 — not the one whose accuracy demo was prettiest in the procurement meeting.