A Systematic Evaluation of Object Detection Networks for Scientific Plots

Published in "Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1379-1387"
Pritha Ganguly, Nitesh Methani, Mitesh M. Khapra, Pratyush Kumar

Are existing object detection methods adequate for detecting text and visual elements in scientific plots, which are arguably different from the objects found in natural images? To answer this question, we train and compare the accuracy of Fast/Faster R-CNN, SSD, YOLO and RetinaNet on the PlotQA dataset, which contains over 220,000 scientific plots. At the standard IOU threshold of 0.5, most networks perform well, with mAP scores above 80% in detecting the relatively simple objects in plots. However, performance drops drastically when evaluated at a stricter IOU of 0.9, with the best model achieving a mAP of 35.70%. Such stricter evaluation is essential for scientific plots, where even minor localisation errors can lead to large errors in downstream numerical inferences. Given this poor performance, we propose minor modifications to existing models that combine ideas from different object detection networks. While this significantly improves performance, two main issues remain: (i) performance on text objects, which are essential for reasoning, is very poor, and (ii) inference time is unacceptably large given the simplicity of plots. To address this open problem, we make a series of contributions: (a) an efficient region proposal method based on Laplacian edge detectors, (b) a feature representation of region proposals that includes neighbouring information, (c) a linking component that joins multiple region proposals to detect longer textual objects, and (d) a custom loss function that combines a smooth ℓ1-loss with an IOU-based loss. Combining these ideas, our final model is very accurate at extreme IOU values, achieving a mAP of 93.44% at 0.9 IOU. At the same time, our model is very efficient, with an inference time 16x lower than that of current models, including one-stage detectors. Our model also achieves high accuracy on an extrinsic plot-to-table conversion task, with an F1 score of 0.77.
With these contributions, we make definitive progress in object detection for plots and enable further exploration of automated reasoning over plots.
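Two points above can be illustrated with a short Python sketch: why a strict IOU threshold of 0.9 is so much harder than 0.5 for small plot elements, and what a loss combining a smooth ℓ1 term with an IOU-based term might look like. The function names `iou`, `smooth_l1` and `combined_loss`, the (x1, y1, x2, y2) box format, the weight `lam` and the choice of 1 − IOU as the IOU-based term are all assumptions for illustration. The abstract does not give the paper's exact loss formulation, so this is a hypothetical sketch rather than the authors' method.

```python
"""Hypothetical sketch: IoU for axis-aligned boxes, and an assumed loss that adds
an IoU-based term to a smooth L1 box-regression loss (not the paper's exact form)."""

from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2), x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Smooth L1 loss summed over box coordinates: quadratic below beta, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def combined_loss(pred: Box, target: Box, lam: float = 1.0) -> float:
    """Assumed combination: smooth L1 + lam * (1 - IoU).

    Both the weight `lam` and the (1 - IoU) term are illustrative assumptions.
    """
    return smooth_l1(pred, target) + lam * (1.0 - iou(pred, target))


if __name__ == "__main__":
    target = (0.0, 0.0, 40.0, 10.0)  # e.g. a 40x10-pixel tick label
    pred = (0.0, 1.0, 40.0, 11.0)    # same box shifted down by 1 pixel
    score = iou(pred, target)
    print(f"IoU = {score:.3f}")  # IoU = 0.818
    print("match @0.5:", score >= 0.5, "| match @0.9:", score >= 0.9)
    print(f"combined loss = {combined_loss(pred, target):.4f}")
```

For a 40×10-pixel text box, a 1-pixel vertical offset already lowers the IOU to 9/11 ≈ 0.82. The prediction counts as a correct detection at 0.5 but not at 0.9, which illustrates why thin text objects are especially difficult under strict evaluation.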