AI food recognition - how does it work?
Curious about the technology behind AI food recognition apps. How does an algorithm look at a photo and determine not just WHAT food it is, but how MUCH and what the nutritional content is?
The basic pipeline is:
- Image segmentation — identifies different food items by dividing the image into regions
- Food classification — matches each region against known foods using deep learning
- Volume estimation — uses reference objects (plate size, utensils) and depth cues to estimate volume
- Nutritional lookup — volume + food type mapped to a nutritional database
Modern models are trained on millions of food images.
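The four-stage pipeline above can be sketched in a few lines. Everything here is illustrative: the function bodies are stand-ins for real models, and the calorie table uses rough public per-100g values, not any app's actual database.

```python
# Hypothetical sketch of the segmentation -> classification -> volume ->
# nutrition-lookup pipeline. Function names are made up for illustration.

NUTRITION_DB = {  # kcal per 100 g, rough public values
    "rice": 130,
    "chicken": 165,
    "broccoli": 34,
}

def segment_and_classify(image):
    """Stand-in for the segmentation + classification models:
    returns (food label, image region) pairs."""
    return [("rice", "region_a"), ("chicken", "region_b")]

def estimate_grams(region, plate_diameter_cm=27.0):
    """Stand-in for volume estimation, which would use the plate
    as a scale reference. Here every region is pretended to be 150 g."""
    return 150.0

def analyze(image):
    results = []
    for label, region in segment_and_classify(image):
        grams = estimate_grams(region)
        kcal = NUTRITION_DB[label] * grams / 100.0
        results.append((label, grams, kcal))
    return results

print(analyze("photo.jpg"))
```

The real versions of those two stand-in functions are where all the hard ML lives; the lookup step at the end is comparatively trivial.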
Some newer systems like PlateLens also incorporate contextual data. If you log breakfast at 7am and the photo shows eggs and toast, it uses that context to improve accuracy.
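One simple way context like meal time could be folded in is as a prior that re-weights the classifier's probabilities. This is a toy guess at the idea, not PlateLens's actual method; the food sets and boost factor are invented.

```python
# Toy time-of-day prior: boost breakfast foods in the morning, then
# renormalize so the probabilities still sum to 1. All weights are made up.
def apply_time_prior(class_probs, hour):
    breakfast_foods = {"eggs", "toast", "oatmeal"}
    boost = 1.5 if 5 <= hour < 11 else 1.0
    weighted = {
        food: p * (boost if food in breakfast_foods else 1.0)
        for food, p in class_probs.items()
    }
    total = sum(weighted.values())
    return {food: w / total for food, w in weighted.items()}

probs = {"eggs": 0.40, "pancakes": 0.35, "rice": 0.25}
print(apply_time_prior(probs, hour=7))
```

At 7am "eggs" gets bumped from 0.40 to 0.50 relative probability, which could be enough to flip a borderline classification.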
What amazes me is how PlateLens handles mixed dishes. Photograph a stir fry and it breaks it down into chicken, broccoli, rice, and sauce. The fact that it can decompose a mixed plate into components is wild.
Volume estimation is the weakest link. A 2D photo can't capture depth, so a shallow, wide bowl and a deep, narrow bowl could hold very different amounts while looking nearly identical from above.
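The arithmetic makes the depth problem concrete. Approximate two bowls as cylinders with the same rim radius (so they look the same from directly above) but different depths:

```python
import math

# Two bowls with the same rim radius look identical from directly above,
# but a 3x difference in depth means a 3x difference in volume.
def cylinder_volume_ml(radius_cm, depth_cm):
    return math.pi * radius_cm ** 2 * depth_cm  # 1 cm^3 = 1 ml

shallow = cylinder_volume_ml(radius_cm=8, depth_cm=3)
deep = cylinder_volume_ml(radius_cm=8, depth_cm=9)
print(round(shallow), round(deep))  # ~603 ml vs ~1810 ml
```

That's a 3x error from information a single overhead photo simply doesn't contain, which is why depth cues and reference objects matter so much.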
Exact training set sizes are proprietary. Public datasets like Food-101 have hundreds of thousands of images. Commercial apps probably use millions. Plus they improve as users provide feedback.
If you correct PlateLens when it gets something wrong, it seems to get better at that food over time. Either my perception is biased or there's a feedback loop in the model.
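Pure speculation on how a correction loop might work: log user fixes, and once the same fix recurs enough times, bias future predictions toward the corrected label. A real system would more likely retrain or fine-tune the model on corrections, but the logging side could look something like this.

```python
# Speculative sketch of a user-correction feedback loop. Nothing here is
# based on PlateLens internals; the threshold of 3 is arbitrary.
from collections import Counter

corrections = Counter()

def record_correction(predicted, actual):
    corrections[(predicted, actual)] += 1

def adjusted_prediction(predicted):
    # If users have repeatedly corrected this label, prefer the most common fix.
    fixes = {a: n for (p, a), n in corrections.items() if p == predicted}
    if fixes and max(fixes.values()) >= 3:
        return max(fixes, key=fixes.get)
    return predicted

for _ in range(3):
    record_correction("naan", "pita")
print(adjusted_prediction("naan"))  # after 3 corrections, flips to "pita"
```

Even a crude override table like this would produce exactly the effect described: the app visibly "learning" the foods you keep correcting.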
Accuracy will only improve with more data and better cameras. Wouldn't be surprised if photo tracking is within ±5% in a few years. At that point there'd be almost no reason to log manually.
Fascinating. The feedback loop aspect is really interesting — essentially crowdsourced model improvement. Thanks for the detailed breakdown everyone.