
Image recognition with LLM

We compared Large Language Models against traditional machine learning for detecting electrical fuse states in images. Here is what we found about accuracy, cost, and when each approach makes sense.

Adam Harnúšek

Recognising the growing use of Large Language Models in image recognition, we decided to investigate their applicability to a real project task. This post covers the business perspective — for the technical implementation details, see Image recognition with LLM: A dev's how-to.

The Problem

Our task was to create a model that could determine whether electrical fuses were installed or missing in electrical boxes.

(Image: electrical fuse example)

For the initial machine learning approach, we used Google Vision service trained on a substantial customer-provided dataset hosted in cloud infrastructure.

Testing LLMs for Image Recognition

When we turned to LLMs, we found that relatively few major models support image analysis. We evaluated the available options using two distinct approaches:

Zero-Shot Learning

We presented the model with images and a description of the task along with the desired output format — no examples provided. This is the simplest and cheapest approach.
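As a rough sketch, a zero-shot request can be assembled like this. The prompt wording and the `build_zero_shot_messages` helper are our illustrative assumptions; the message structure follows the OpenAI chat-completions image-input format, which accepts inline base64-encoded images:

```python
import base64


def encode_image(path: str) -> str:
    """Base64-encode an image file for inline submission."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def build_zero_shot_messages(image_b64: str) -> list:
    """Zero-shot prompt: task description and output format, no examples."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Look at the electrical box in the image. "
                         "Answer with exactly one word: 'installed' if a "
                         "fuse is present, 'missing' if it is not."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ]

# The messages would then be sent via the OpenAI SDK, e.g.:
# client.chat.completions.create(model="gpt-4o", messages=messages)
```

Because the reply is constrained to a single word, parsing the model's answer stays trivial.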

Few-Shot Learning

We provided several example images with correct answers before presenting the new task. This consistently produced better results across all models, though at higher token costs due to the additional images in the prompt.
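The few-shot variant can be sketched by prepending labelled examples as alternating user/assistant turns before the image to classify. The `build_few_shot_messages` helper and the exact labels are our illustrative assumptions:

```python
def build_few_shot_messages(examples: list, target_b64: str) -> list:
    """Few-shot prompt: labelled example images, then the new image.

    `examples` is a list of (base64_image, label) pairs, e.g.
    [(img1_b64, "installed"), (img2_b64, "missing")].
    """
    messages = [
        {"role": "system",
         "content": "Classify electrical boxes. Answer 'installed' or 'missing'."}
    ]
    for img_b64, label in examples:
        # Each example is a user image followed by the correct assistant answer.
        messages.append({
            "role": "user",
            "content": [{"type": "image_url",
                         "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}}],
        })
        messages.append({"role": "assistant", "content": label})
    # Finally, the new image to classify.
    messages.append({
        "role": "user",
        "content": [{"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{target_b64}"}}],
    })
    return messages
```

Every example image is billed as input tokens on every call, which is where the higher few-shot cost in our results comes from.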

Results

(Image: ML result statistics)

| Model | Zero-shot cost | Zero-shot precision | Few-shot cost | Few-shot precision |
|---|---|---|---|---|
| gpt-4o | $0.0135 | 58% | $0.04 | 85% |
| gpt-4o-mini | $0.0008 | 21% | $0.0024 | 58% |
| gemini-2.0-flash-lite-preview-02-05 | $0.0002 | 21% | $0.0006 | 75% |

Traditional machine learning achieved superior accuracy, but it required a substantial upfront investment of roughly €2,000 in development effort alone.
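A quick break-even estimate puts that trade-off in perspective. Treating the €2,000 development cost as roughly $2,000 is a simplifying assumption here, not a quoted exchange rate, and the per-call price is the few-shot gpt-4o figure from our tests:

```python
# Rough break-even between the one-off ML build and per-call LLM pricing.
UPFRONT_ML_COST = 2000.0    # ~€2,000 of development effort, taken as ~$2,000
FEW_SHOT_CALL_COST = 0.04   # gpt-4o few-shot cost per image (from our tests)

break_even_calls = UPFRONT_ML_COST / FEW_SHOT_CALL_COST
print(f"Break-even at ~{break_even_calls:,.0f} classifications")
# Below that volume the LLM is cheaper overall; above it, the trained
# model's one-off cost amortises and it wins on running cost as well.
```

On these assumptions the crossover sits around 50,000 classifications, before accounting for the accuracy gap.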

When Does Each Approach Make Sense?

The right choice depends on your specific situation:

Use traditional ML when:

- Maximum accuracy is required — the trained model outperformed every LLM configuration we tested
- Usage volume is high, so per-call LLM fees would quickly exceed the one-off development investment
- You already have a substantial labelled dataset available for training

Use LLMs when:

- Usage volume is low and avoiding the upfront development cost matters
- You need a working solution quickly, without collecting and labelling training data
- Precision in the 75–85% range is acceptable for your use case

Key considerations:

- Usage frequency: per-call LLM cost versus one-off development cost
- Latency tolerance: LLM inference adds response time per image
- Data privacy: images are sent to a third-party provider
- Budget constraints, both upfront and ongoing

Conclusion

LLMs are a genuinely viable option for image recognition tasks, particularly when usage volume is low and avoiding upfront development cost matters. The few-shot approach with gpt-4o reached 85% precision — close enough to traditional ML for many real-world applications. The optimal choice depends on your project's usage frequency, latency tolerance, data privacy requirements, and budget constraints.