
Vision based system for detecting and counting mobility aids in surveillance videos

Automatic surveillance video analysis is popular among computer vision researchers because of its wide range of applications. Automated systems are intended to replace manual analysis of videos, which is tiresome, expensive, and time-consuming. Image and video processing techniques are often used in the design of automatic detection and monitoring systems. Compared with indoor videos, outdoor surveillance videos are often difficult to process because of the uncontrolled environment, camera angle, and varying lighting and weather conditions. This research aims to contribute to the computer vision field by proposing an object detection and tracking algorithm that can handle multi-object and multi-class scenarios. The problem is addressed by developing an application that counts disabled pedestrians in surveillance videos by automatically detecting and tracking mobility aids and pedestrians. The application demonstrates that the proposed ideas achieve the desired outcomes. There are extensive studies on pedestrian detection and gait analysis in the computer vision field, but little work has been carried out on identifying disabled pedestrians or mobility aids. Detecting mobility aids in videos is challenging because the person using the aid often occludes it, and the visibility of the aid depends on the walking direction relative to the camera. For example, a walking stick is visible most of the time in a front-on view, but it is occluded when it is on the walker's rear side. Furthermore, people use a variety of mobility aids, whose make and type change over time as technology advances. The system should detect the majority of mobility aids to report reliable counting data. The literature review revealed that no system exists for detecting disabled pedestrians or mobility aids in surveillance videos.
A lack of annotated image data containing mobility aids is also an obstacle to developing a machine-learning-based solution to detect mobility aids. In the first part of this thesis, we explored video data of moving pedestrians to extract gait signals using manual and automated procedures. Manual extraction involved marking the pedestrians' head and leg locations and analysing those signals in the time domain. Analysis of stride length and velocity features indicates an abnormality if a walker is physically disabled. The automated system combines the YOLO object detector, GMM-based foreground modelling, and star skeletonisation in a pipeline to extract the gait signal. The automated system failed to recognise a disabled person from their gait because of poor localisation by YOLO and incorrect segmentation and silhouette extraction caused by moving backgrounds and shadows. More broadly, the automated gait analysis approach failed because of environmental constraints, viewing angle, occlusions, shadows, and imperfections in foreground modelling, object segmentation, and silhouette extraction. In the later part of this thesis, we developed a CNN-based approach to detect mobility aids and pedestrians. The task of identifying and counting disabled pedestrians in surveillance videos is divided into three sub-tasks: mobility aid and person detection, tracking and data association of detected objects, and counting healthy and disabled pedestrians. A modern object detector called YOLO, an improved data association algorithm (SORT), and a new pairing approach are applied to complete the three sub-tasks. The improvement of the SORT algorithm and the introduction of the pairing approach are notable contributions to the computer vision field. The original SORT algorithm handles only a single class and has no object counting feature. In this work, SORT is enhanced to be multi-class and to track accelerating or temporarily occluded objects.
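The multi-class idea can be illustrated with a minimal sketch of class-aware data association. This is not the thesis implementation: it assumes boxes given as (x1, y1, x2, y2), a hypothetical IoU threshold `iou_min`, and a simple greedy matcher in place of the Hungarian assignment and Kalman-filter prediction used by the original SORT algorithm. The only point it shows is that a track may be matched only to a detection of the same class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, iou_min=0.3):
    """Greedy, class-aware matching (illustrative, not the thesis method).

    tracks:     {track_id: (class_name, box)}
    detections: [(class_name, box), ...]
    Returns (matches, unmatched) where matches maps track_id to a
    detection index and unmatched lists detection indices left over.
    """
    candidates = []
    for tid, (tcls, tbox) in tracks.items():
        for j, (dcls, dbox) in enumerate(detections):
            if tcls != dcls:
                continue  # never match across classes
            score = iou(tbox, dbox)
            if score >= iou_min:
                candidates.append((score, tid, j))

    # Accept the best-overlapping candidates first.
    candidates.sort(reverse=True)
    matches, used = {}, set()
    for score, tid, j in candidates:
        if tid not in matches and j not in used:
            matches[tid] = j
            used.add(j)

    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched
```

Unmatched detections would typically start new tracks. In a fuller tracker, tracks without a match would be kept alive for a few frames to cope with temporary occlusion, which is the kind of behaviour the enhanced SORT is described as supporting.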
The pairing strategy associates a mobility aid with the nearest pedestrian and monitors the two over time to see whether the pair is reliable. A reliable pair represents a disabled pedestrian, so counting reliable pairs gives the number of disabled people in the video. The thesis also introduces an image database that was gathered as part of this study. The dataset comprises 5819 images belonging to eight object classes: five mobility aids, pedestrians, cars, and bicycles. The dataset was needed to train a CNN that can detect mobility aids in videos. The proposed mobility aid counting system is evaluated on a range of outdoor surveillance videos showing real-world scenarios. The results show that the proposed solution performs satisfactorily in detecting mobility aids in outdoor surveillance videos. The counting accuracy of 94% on test videos meets the design goals set by the advocacy group that needs this application. Most test videos contained objects from multiple classes. The system detected five mobility aids (wheelchair, crutch, walking stick, walking frame, and mobility scooter), pedestrians, and two distractors (car and bicycle). The system was trained on the distractor classes so that it can distinguish mobility aids from objects that resemble them. In some cases, the convolutional neural network reports a mobility aid with an incorrect type. For example, a crutch and a walking stick have very similar shapes, so the system confuses one with the other. However, this does not affect the final counts, because the aim was to obtain the overall count of mobility aids of any type, and determining the exact type of mobility aid is optional.
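The pairing and counting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: tracker output is assumed to be a list of (track_id, class_name, box) tuples per frame, distance is measured between box centres, and the thresholds `max_dist` and `min_frames` are hypothetical values chosen only for the example.

```python
from collections import Counter
from math import hypot

# Class names follow the abstract; the exact labels are an assumption.
AID_CLASSES = {"wheelchair", "crutch", "walking stick",
               "walking frame", "mobility scooter"}


def centre(box):
    """Centre point of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def nearest_person(aid_box, persons, max_dist):
    """Return the id of the closest person within max_dist, or None."""
    ax, ay = centre(aid_box)
    best_id, best_d = None, max_dist
    for pid, box in persons.items():
        px, py = centre(box)
        d = hypot(ax - px, ay - py)
        if d <= best_d:
            best_id, best_d = pid, d
    return best_id


def count_reliable_pairs(frames, max_dist=80.0, min_frames=3):
    """Count pedestrians that stay paired with a mobility aid.

    frames: list of per-frame lists of (track_id, class_name, box).
    A pair is treated as reliable once the same aid has been nearest
    to the same person in at least min_frames frames.
    """
    votes = Counter()
    for tracks in frames:
        persons = {tid: box for tid, cls, box in tracks if cls == "person"}
        for tid, cls, box in tracks:
            if cls in AID_CLASSES:
                pid = nearest_person(box, persons, max_dist)
                if pid is not None:
                    votes[(tid, pid)] += 1

    reliable = {pair for pair, n in votes.items() if n >= min_frames}
    # One person using two aids (e.g. two crutches) is counted once.
    disabled_ids = {pid for _, pid in reliable}
    return len(disabled_ids), reliable
```

Because only the pairing is counted, a crutch mislabelled as a walking stick still yields the same pair and the same count, which is consistent with the observation that type confusion does not affect the final totals.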
The University of Waikato
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.