Machine learning for the prediction of subclinical mastitis in cows milked in automatic milking system.


The cited study aimed to identify subclinical mastitis in udder quarters using zootechnical data, infrared thermography (IRT) and machine learning techniques.

Milk samples were collected monthly from 1035 udder quarters with no clinical signs of mastitis, from Holstein and Holstein × Jersey crossbred cows raised in an integrated crop-livestock system in a tropical environment and milked in an automatic milking system. Somatic cell count (SCC) was measured by flow cytometry, with a cut-off of 200 000 cells/mL of milk separating positive from negative cases. Milk fat was measured by Fourier transform infrared spectroscopy, and electrical conductivity (EC) and milk yield were measured by the milking system's sensors. Air temperature and relative humidity were recorded by sensors installed in the milking area. Thermal images of the udders were taken with a thermographic camera and analysed with the IRSoft Testo 4.8 SP1 software to determine the average udder temperature and the coldest and hottest temperature points. Micro-organisms isolated from the milk were identified by MALDI-TOF and grouped into three categories: primary (Staphylococcus aureus, Streptococcus dysgalactiae, Strep. uberis, and E. coli); secondary (Corynebacterium spp., C. bovis, Staph. chromogenes, Staph. auricularis, Staph. epidermidis, Staph. saprophyticus, and other coagulase-negative staphylococci); and others (isolates not classified as primary or secondary). The following attributes were analysed together with the Random Forest machine learning algorithm: air temperature and relative humidity; the cold and hot spots of the udder; the average temperature and temperature range of the quarter; the difference between the average quarter temperature and the rectal temperature; parity; milk production volume; EC; milk fat percentage; and the group to which the isolated micro-organisms belonged. The data were randomly split into 75% for training and 25% for testing.
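The data-preparation steps described above (labelling quarters by the SCC cut-off, assigning isolates to pathogen groups, deriving thermal attributes and making a random 75/25 split) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: all function names are hypothetical, treating an SCC equal to 200 000 cells/mL as positive is an assumption (the source does not say whether the cut-off is inclusive), and the species lists simply mirror the grouping given in the text.

```python
import random

# Assumption: SCC >= 200 000 cells/mL is labelled positive (subclinical mastitis).
SCC_CUTOFF = 200_000

# Pathogen groups as listed in the text; anything else falls into "other".
PRIMARY = {"Staphylococcus aureus", "Streptococcus dysgalactiae",
           "Streptococcus uberis", "Escherichia coli"}
SECONDARY = {"Corynebacterium spp.", "Corynebacterium bovis",
             "Staphylococcus chromogenes", "Staphylococcus auricularis",
             "Staphylococcus epidermidis", "Staphylococcus saprophyticus",
             "other coagulase-negative staphylococci"}


def scc_label(scc_cells_per_ml: float) -> int:
    """Return 1 for a positive (subclinical mastitis) quarter, 0 otherwise."""
    return int(scc_cells_per_ml >= SCC_CUTOFF)


def pathogen_group(species: str) -> str:
    """Map an isolated species to the primary/secondary/other category."""
    if species in PRIMARY:
        return "primary"
    if species in SECONDARY:
        return "secondary"
    return "other"


def thermal_features(mean_t: float, min_t: float, max_t: float,
                     rectal_t: float) -> dict:
    """Derive the quarter-level thermal attributes named in the text (degrees C)."""
    return {
        "cold_spot": min_t,
        "hot_spot": max_t,
        "mean_temp": mean_t,
        "temp_range": max_t - min_t,            # thermal amplitude of the quarter
        "mean_minus_rectal": mean_t - rectal_t,  # quarter vs body temperature
    }


def random_split(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of the records and return (train, test)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = random_split(range(1035))
    print(len(train), len(test))  # 776 259
```

In this sketch the split is made over individual records; whether the study kept all samples from one cow or quarter in the same partition is not stated, and a grouped split would be a reasonable alternative to avoid leakage between training and test data.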

The most common micro-organisms isolated were Staph. aureus and Staph. chromogenes. Among the variables the algorithm used for decision making, EC was the most important, followed by milk volume. The coldest point of the quarter and its thermal amplitude were the most relevant thermal attributes, which supports including thermographic data in the algorithm. The algorithm's ability to screen cows suspected of having subclinical mastitis was shown by an accuracy of 77%, a sensitivity of 90.5%, a precision of 79.6%, an F1 score of 84.7% and an area under the receiver operating characteristic curve (AUC) of 81.7%. However, the specificity of only 50.6% suggests a considerable number of false positives.
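The reported metrics all derive from the confusion matrix of the test set, and the F1 score is the harmonic mean of precision and sensitivity, so it can be checked directly from those two figures. The sketch below uses standard metric definitions; the function names are hypothetical, and the confusion-matrix counts in the second example are invented purely for illustration, not taken from the study.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),       # share of positive quarters that are flagged
        "specificity": tn / (tn + fp),       # share of negative quarters correctly cleared
        "precision": tp / (tp + fp),         # share of flagged quarters that are truly positive
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and sensitivity
    }


def f1_from(precision: float, sensitivity: float) -> float:
    """F1 score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Consistency check with the reported precision (79.6%) and sensitivity (90.5%).
print(round(100 * f1_from(0.796, 0.905), 1))  # 84.7

# Hypothetical counts (not from the study): high sensitivity, poor specificity.
m = screening_metrics(tp=9, fp=5, tn=5, fn=1)
print(m["sensitivity"], m["specificity"])  # 0.9 0.5
```

The hypothetical case shows the trade-off described above: a classifier that flags most positives can still clear only half of the negatives, which is acceptable for screening suspect cows but produces many false alarms.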

It was concluded that the Random Forest machine learning algorithm, combined with zootechnical data and diagnostic techniques such as IRT and microbiological examination, can be used to identify subclinical mastitis in dairy cows, particularly for screening suspect cows. However, data on additional attributes should be incorporated to improve its diagnostic capacity.