Three independent papers landed this month in three different venues: two journals and one preprint server. Each tackles a different organ system. Each uses a different machine learning architecture. Each studies companion dogs. Read together, they sketch a pretty clear picture of where veterinary AI is going next, and it is not where the marketing decks said it would go.

A radiograph, an LLM, and a vet who still signs the report

When my Labrador Pancake had a heart murmur picked up at his last wellness visit, the next step was a thoracic radiograph and a manual Vertebral Heart Size measurement. The vet leaned over the image, spent about ninety seconds with a digital ruler, quoted me a number. The whole sequence is exactly the kind of task an AI assistant could automate without changing anything else about the workflow.

A team at aivancity in France just published a paper showing this can now be done end to end. In The Veterinary Journal on April 17, 2026, Nguemo and colleagues described an integrated framework that ingests a canine thoracic DICOM, runs a deep learning computer vision model to detect anatomical landmarks, computes the VHS, then routes both the measurement and structured clinical inputs through a Large Language Model to generate a preliminary clinical summary. Quantitative review of the deep learning pipeline showed accurate detection of thoracic anatomical landmarks. Qualitative review of the LLM summaries found them to be coherent, context-aware, and consistent with what the radiographs actually showed.
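To make the "landmarks in, VHS out" step concrete, here is a minimal sketch of the standard manual VHS procedure: measure the cardiac long and short axes, lay each one along the spine starting at the cranial edge of T4, count how many vertebral bodies it spans, and add the two counts. This is not the authors' code. The function names, the landmark layout (two points per axis plus an ordered list of vertebral edge points), and the toy coordinates are all assumptions for illustration, and every point is assumed to come from the same image in the same pixel coordinates.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def length_in_vertebrae(length, vertebra_lengths):
    """Express a length as a count of vertebral bodies, starting at T4.

    vertebra_lengths: per-vertebra lengths in the same units as `length`,
    ordered caudally from T4 (hypothetical input layout).
    """
    remaining = length
    count = 0.0
    for v in vertebra_lengths:
        if remaining <= v:
            # The axis ends partway through this vertebra.
            return count + remaining / v
        remaining -= v
        count += 1.0
    raise ValueError("axis is longer than the vertebrae provided")


def vertebral_heart_size(long_axis, short_axis, vertebra_edges):
    """Sum of the long and short cardiac axes, each in vertebral units.

    long_axis, short_axis: pairs of (x, y) endpoints.
    vertebra_edges: consecutive edge points along the spine, the first
    one being the cranial edge of T4.
    """
    vertebra_lengths = [
        distance(a, b) for a, b in zip(vertebra_edges, vertebra_edges[1:])
    ]
    long_v = length_in_vertebrae(distance(*long_axis), vertebra_lengths)
    short_v = length_in_vertebrae(distance(*short_axis), vertebra_lengths)
    return long_v + short_v


# Toy geometry, not real patient data: eight vertebrae, each 20 px long.
edges = [(20 * i, 0) for i in range(9)]
vhs = vertebral_heart_size(
    long_axis=((0, 0), (0, 110)),   # 110 px = 5.5 vertebrae
    short_axis=((0, 0), (90, 0)),   # 90 px = 4.5 vertebrae
    vertebra_edges=edges,
)
print(round(vhs, 1))  # 5.5 + 4.5 = 10.0 vertebrae
```

The arithmetic is trivial once the landmarks exist, which is why the landmark detector is where the accuracy is won or lost.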

Notably, the authors are direct about the limitation that matters most: the principal weakness of the pipeline relative to the state of the art is a systematic error in the linear measurements that pushes the final VHS root-mean-square error above clinically acceptable thresholds. They flag image calibration and expanded training data as the route to closing that gap.
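Why a systematic error can dominate RMSE is easier to see with the arithmetic written out. Squared RMSE splits exactly into squared bias (the mean signed error) plus the variance of the errors around that bias, so a consistent offset inflates RMSE even when the model is otherwise tight. The sketch below is only an illustration: the function name is hypothetical, the numbers are made-up placeholders rather than values from the paper, and a constant offset is just the simplest form a systematic error can take.

```python
import math


def error_summary(predicted, reference):
    """Return (bias, spread, rmse) for paired measurements.

    bias   = mean signed error (the systematic part)
    spread = population standard deviation of the errors around the bias
    rmse   = root-mean-square error; rmse**2 == bias**2 + spread**2
    """
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    spread = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)
    return bias, spread, rmse


# Placeholder VHS values for illustration only, NOT data from the paper.
predicted = [10.6, 11.1, 9.9, 10.4]
reference = [10.0, 10.5, 9.5, 10.0]

bias, spread, rmse = error_summary(predicted, reference)
print(f"bias={bias:.2f} spread={spread:.2f} rmse={rmse:.2f}")

# Subtracting the bias leaves only the spread as the remaining error.
corrected = [p - bias for p in predicted]
_, _, rmse_corrected = error_summary(corrected, reference)
print(f"rmse after removing bias={rmse_corrected:.2f}")
```

In this toy case almost all of the RMSE comes from the offset, which is the kind of error a calibration step could in principle remove. Whether the paper's error behaves this simply is not something the sketch can say.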

The system runs in seconds. The radiologist still signs the report. And the authors are honest about what is not yet good enough.

That last sentence is the thesis of the paper.

A smartphone, a CNN, and the breeds we keep building airway problems into

A second paper, published April 23, 2026 in Veterinary Research Communications by Chauhan and Kumar at Netaji Subhas University of Technology in Delhi, trained convolutional neural networks to classify stenotic nares in brachycephalic dogs. Brachycephalic Obstructive Airway Syndrome, BOAS, is the constellation of upper airway problems that affects bulldogs, pugs, French bulldogs, and a long list of other short-faced breeds whose popularity keeps climbing. Stenotic nares (narrowed nostrils) are one of its core anatomical features, and grading them is currently a subjective clinical call with documented inter-observer variability.

A trained CNN running on a phone photo of a dog's face turns that subjective call into a measurable, reproducible one. The peer-reviewed paper is fresh and full text is not yet broadly indexed, so deeper methodological detail beyond title and keywords is not externally verifiable as of this writing, and that is worth flagging. But the direction matters: this is the kind of front-line triage tool that fits a general-practice workflow rather than a referral hospital, and it is exactly the population where BOAS over- and under-grading is most consequential.

A spectrometer, a small neural net, and a yeast hiding in plain sight

The third paper is a preprint, posted on bioRxiv on April 6, 2026, from Kurmann and colleagues at the University of Zurich. They paired Fourier-transform infrared spectroscopy with an artificial neural network classifier to assign Malassezia pachydermatis yeast strains, sampled from the ear canals of dogs with and without canine atopic dermatitis, into three phylogroups. Among 60 dogs, M. pachydermatis prevalence was significantly higher in atopic cases than in healthy controls. FTIR-based ANN classification showed full concordance with whole genome sequencing across all 35 sequenced isolates the team used as ground truth. Phylogroups I and III were significantly enriched in atopic dogs. Phylogroup II dominated healthy controls.

The cohort is sixty dogs. The paper is not yet peer-reviewed. The pipeline has not been tested outside the originating lab. All real limitations.

But the implication, if it holds, is that a rapid, cheap spectroscopy test plus a small ANN can reproduce what a multi-day sequencing pipeline does, at a throughput compatible with clinical epidemiology rather than research-only use.

The shared architecture

Three papers, three different problems, one shared structure. Each takes a clinical task that already exists in the veterinary workflow. Each automates the perception layer. Each routes the output back to a clinician for the decision. None tries to replace the vet. None claims diagnostic authority. Each adds a quantitative measurement layer where currently there is human judgment with known variability.

This is a meaningful shift from where the commercial veterinary AI layer sits. Last week I wrote about David Brundage's audit of seventy-one commercial veterinary AI products. Only one disclosed the signalment of its training data, and within the imaging category only 15.8 percent reported confidence intervals. Nguemo, Chauhan, and Kurmann are doing the kind of methodologically transparent work the commercial layer is, on average, not doing yet.

What can actually go wrong

Each paper has real failure modes worth naming.

The Nguemo pipeline is curated end to end and the LLM summary step is described as "for clinician review," which is the right framing. But the open question the paper does not fully resolve is what the system does when the LLM hallucinates a finding that is not on the radiograph. That failure mode is the most consequential one in deployment, and it deserves direct characterization, not just framing.

Chauhan and Kumar's CNN approach is generalizable in principle, but stenosis-grading models trained on one phone camera, one breed mix, and one geographic population often do not transfer to the next camera, breed mix, or population.

Kurmann's preprint shows full WGS concordance in 35 sequenced isolates, which is encouraging but small. Phylogroup classification is also, at this point, a research finding more than a clinical one. There is no validated treatment that depends on phylogroup assignment yet.
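To put a number on "encouraging but small": when every one of n trials agrees, the exact (Clopper-Pearson) two-sided confidence interval for the true agreement rate has a closed form, with an upper limit of 1 and a lower limit of (alpha / 2) ** (1 / n). The sketch below applies that to 35 of 35. The function name is hypothetical, and the calculation assumes the isolates are independent samples, which may not hold if several came from the same dog.

```python
def all_agree_lower_bound(n, confidence=0.95):
    """Lower limit of the two-sided Clopper-Pearson interval when all
    n trials agree (the upper limit is 1.0 in that case)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


# 35 of 35 sequenced isolates concordant, as reported in the preprint.
lower = all_agree_lower_bound(35)
print(f"95% CI for concordance: [{lower:.3f}, 1.000]")
```

For n = 35 the lower limit comes out at roughly 0.90, so the data are compatible with a true concordance anywhere from about 90 percent up to 100 percent. That is a good result and also a reason to want a larger external validation set.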

Three questions before deploying any of this

For veterinarians watching this space, the practical questions are starting to converge on three:

  1. Is the model trained on patient signalment that resembles yours? (A model validated on French bulldogs in Delhi is not automatically a model that works on Boston terriers in Vermont.)

  2. Where is the human-in-the-loop checkpoint, and does the workflow encourage or discourage actual clinician review?

  3. What happens when the model is wrong, and can you tell?

The third question is the one the commercial layer keeps failing. The academic pipeline behind it is starting to answer it, and the most honest of these papers say plainly when the answer is "not yet."

That is the most encouraging thing about the past month of pet tech research. The papers are not bigger. They are more honest about scope. That is the prerequisite for the rest.
