Oral cancer, or mouth cancer, refers to cancer involving the lips, buccal mucosae and vestibules, tongue, tonsillar pillars, and the upper throat. It usually begins as a painless white or red patch or a non-healing ulcer that persists and continues to grow. Because oral cancers are initially asymptomatic, patients often neglect them and usually present to clinicians only at advanced stages, often with lymph node involvement. Early detection of oral cancers could enable prompt treatment and therefore decrease the morbidity and mortality associated with the disease.
Machine learning is a type of artificial intelligence (AI) in which software uses algorithms developed from input data to predict output values, or outcomes.
About the study
The Interactive Talk presentation, “Predicting Oral Cancer Risk using Machine Learning”, took place on Saturday, June 25, 2022, at 2 p.m. China Standard Time (UTC+08:00) during the “e-Oral Health Network I” session, in which a recent study for oral cancer prediction was presented.
The prospective study was conducted by John Adeoye of the University of Hong Kong, Hong Kong Special Administrative Region (SAR), China, to develop a machine learning-based platform for predicting the risk of oral potentially malignant disorders (OPMDs) and oral cancer.
In the present prospective study, the oral cavities of 1,467 participants were examined by three Hong Kong dentists using visual oral examination (VOE) in a community-based screening program. A machine learning-based platform was used to predict the participants' risk of OPMDs and oral cancer.
Each individual's status was recorded as positive or negative for OPMD and/or oral cancer. In addition, histopathological investigations were performed to check for epithelial dysplasia (ED) and squamous cell carcinoma (SCC). Individuals who screened negative for OPMD and/or oral cancer were followed up via state-linked electronic medical records.
Data on demographics, lifestyle, habits, and familial history were obtained to identify associated risk factors. Expired carbon monoxide (CO) levels were measured for the participants using a monitor and expressed in parts per million (ppm). A total of 40 input features and histopathological diagnostic criteria were used to develop 12 machine learning algorithms, with the study data split 80:20 into training and test sets.
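To illustrate the 80:20 split described above, here is a minimal sketch in plain Python. It assumes a stratified split, in which each class keeps the same proportion in both sets; the report does not say whether the study stratified its split. The function name `stratified_split` and the label counts are hypothetical and are not study data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting every class in the same ratio."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels (assumption): 1 = screened positive, 0 = screened negative.
labels = [1] * 50 + [0] * 950
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Stratifying matters when positives are rare, as in screening data, because a purely random split could leave the test set with too few positive cases to evaluate recall.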
Recursive feature elimination with 10-fold cross-validation was used to select features, and the synthetic minority oversampling technique with edited nearest neighbors (SMOTE-ENN) was used to correct class imbalance. Internal validation was performed on the held-out 20% of the data, and model outputs were compared using the McNemar statistical test to select the optimal model. The performance metrics included specificity, F1 score, and recall.
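As a rough illustration of the evaluation step, the sketch below computes recall, specificity, and F1 score from binary predictions and compares two models with an exact (binomial) McNemar test. The report does not state which variant of the McNemar test the study used, so the exact form is an assumption, and all function names and the toy labels and predictions are hypothetical.

```python
from math import comb


def recall_specificity_f1(y_true, y_pred):
    """Compute recall, specificity, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return recall, specificity, f1


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value based on the discordant pairs."""
    triples = list(zip(y_true, pred_a, pred_b))
    # b: model A correct and model B wrong; c: model A wrong and model B correct.
    b = sum(1 for t, a, m in triples if a == t and m != t)
    c = sum(1 for t, a, m in triples if a != t and m == t)
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree on correctness
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy test-set labels and predictions from two hypothetical models.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
pred_b = [1, 0, 0, 0, 0, 0, 0, 1, 1, 1]

recall, specificity, f1 = recall_specificity_f1(y_true, pred_a)
p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(recall, specificity, f1, p_value)
```

The McNemar test looks only at cases where the two models disagree about correctness, which makes it suitable for comparing classifiers evaluated on the same held-out samples.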
Overall, the study findings showed that machine learning could be used for oral cancer risk prediction and could also be applied for the identification of high-risk populations in screening programs.