As a first step in developing algorithms that are scalable in terms of the accuracy versus time complexity trade-off, we compared different implementations of the hard real-time embedded person detection network presented at the previous meeting. We introduced two scalability knobs. The first controls the resolution of the input images fed into the network: higher image resolutions achieve higher detection accuracy on typical drone image datasets, but at the cost of higher computational complexity and hence slower execution on the same hardware, and vice versa. The second knob consists of algorithmic optimisations and approximations of the weights, where more severe approximations yield faster but less accurate detections. Some of these time versus accuracy exploration results can be seen in the graph on the right.
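To give a feel for the resolution knob: the multiply-accumulate count of a convolutional layer grows with the spatial area of its input, so halving the input side roughly quarters the compute. A minimal back-of-the-envelope sketch, using hypothetical layer shapes rather than those of our actual network:

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates of a stride-1, 'same'-padded k x k convolution."""
    return h * w * k * k * c_in * c_out

# Hypothetical first layer: 3x3 kernel, 3 -> 32 channels.
full = conv_macs(416, 416, 3, 3, 32)  # YOLO-style 416x416 input
half = conv_macs(208, 208, 3, 3, 32)  # same layer at half resolution

print(full // half)  # spatial area scales quadratically -> 4x fewer MACs
```

In practice the accuracy loss at lower resolutions is dataset-dependent, which is exactly what the exploration on the right quantifies.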
EAVISE and EDM researchers set up an omnidirectional 720° camera system to gather imagery for the traffic monitoring case. To simulate a (hovering) drone viewpoint, we mounted the camera system on a scissor lift, reaching an altitude of 15 m at full extension. We acquired 720° images at two different positions around Campus De Nayer, Sint-Katelijne-Waver, capturing pedestrian, bicycle and car traffic. The resulting dataset of 50,415 omnidirectional stereo images is now used to further develop the image interpretation algorithms.
We are focusing on improving the speed of our detection pipeline, aiming to achieve a real-time detector on the Jetson TX2.
Building on Google's work on MobileNets, we are researching the impact of using such depthwise separable convolutions in our own YOLO detector. We are investigating the speed-accuracy trade-off of the different networks, as well as ways to transfer weights from a regular YOLO network to the mobile variants. While mobile convolutions provide a speed-up on their own, specialized frameworks like TensorRT will allow these networks to run even faster on the Jetson TX2 platform.
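The parameter (and, roughly, compute) saving of a depthwise separable convolution over a standard one can be computed directly. A short sketch with illustrative layer sizes, not those of our detector:

```python
def standard_conv_params(k, c_in, c_out):
    # A k x k kernel spans all input channels for every output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # One k x k filter per input channel, then a 1x1 pointwise projection.
    return k * k * c_in + c_in * c_out

# Illustrative mid-network layer: 3x3, 256 -> 256 channels.
std = standard_conv_params(3, 256, 256)        # 589,824 weights
sep = depthwise_separable_params(3, 256, 256)  # 67,840 weights
print(round(std / sep, 1))  # ~8.7x fewer parameters
```

The saving approaches a factor k² for wide layers, which is where most of the MobileNet speed-up comes from; the open question for us is how much detection accuracy this costs in the YOLO setting.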
This work also ties in with research from MICAS, who will use these convolutions to implement more energy efficient and high-throughput embedded neural networks on their hardware platforms.
Together with depth information, this method will allow real-time processing of the 360° input images for pedestrian detection on an embedded device.
In line with the work plan, we continuously analyse the evolution of the legal framework concerning safe and secure operations with unmanned aircraft systems. In light of the recent overhaul of the (safety-related) civil aviation legal framework in the EU, the focus has been on the operational rules for drones. The European Commission is also working with the European Aviation Safety Agency (EASA) on guidelines for "standard scenarios" for drone operations that will help drone operators comply with the adopted rules. Finally, the EC is also focused on the development of an institutional, regulatory and architectural framework for the provision of U-space services. The objective of U-space is to enable complex drone operations with a high degree of automation. These developments will become central to the discussion in the final report on regulatory aspects, due to be submitted towards the end of OmniDrone.
Semantic scene analysis
A major challenge with 720° input is the huge amount of data that needs to be processed. To alleviate this problem, we are working on an active exploration mechanism that analyzes a sequence of narrow, limited field-of-view glimpses rather than the whole image at once. This is inspired by the human visual system, where eye movements focus attention on relevant parts of the scene.
We first aim at reconstructing the full 720° image from a limited number of glimpses. We developed an unsupervised soft attention mechanism that tells the agent where to look next in order to reduce its uncertainty about its environment. We also proposed a fit-in module that, with the help of skip connections, makes the agent aware of the glimpses' orientations within the environment.
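A glimpse can be cut out of an equirectangular panorama with horizontal wrap-around, since the left and right image borders are adjacent on the sphere. A minimal sketch of such an extraction (glimpse size and panorama resolution are arbitrary here, and our actual glimpse sensor may differ):

```python
import numpy as np

def take_glimpse(panorama, cy, cx, size):
    """Extract a size x size crop centred at (cy, cx), wrapping horizontally."""
    h, w = panorama.shape[:2]
    half = size // 2
    ys = np.clip(np.arange(cy - half, cy - half + size), 0, h - 1)  # clamp at poles
    xs = np.arange(cx - half, cx - half + size) % w                 # wrap at the seam
    return panorama[np.ix_(ys, xs)]

pano = np.arange(8 * 16).reshape(8, 16)    # toy 8x16 "panorama"
g = take_glimpse(pano, cy=4, cx=0, size=4)  # glimpse straddling the seam
print(g.shape)  # (4, 4)
```

Each glimpse is then fed to the agent together with its orientation, which is the information the fit-in module exploits.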
Learning from streaming data
A typical problem when learning in an online setup, as with the self-supervised obstacle avoidance, is the non-i.i.d. distribution of streaming data. We developed a system that keeps learning over time in a streaming fashion, with data distributions gradually changing and without the notion of separate tasks. In particular, we used Memory Aware Synapses, a method developed earlier in our lab, to estimate the importance of network parameters. We then proposed a protocol that decides i) when to update the importance weights, ii) which data to use to update them, and iii) how to accumulate the importance weights at each update step. We successfully tested this setup both in simulation and in a real-world experiment.
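At the core of Memory Aware Synapses is a per-parameter importance weight: the sensitivity of the squared L2 norm of the network output to that parameter, averaged over (unlabelled) samples. A minimal sketch for a linear model f(x) = Wx, where the gradient of ||Wx||² with respect to W is 2(Wx)xᵀ; the toy dimensions are illustrative, not those of our network:

```python
import numpy as np

def mas_importance(W, xs):
    """Per-parameter importance: mean |d ||Wx||^2 / dW| over samples."""
    omega = np.zeros_like(W)
    for x in xs:
        y = W @ x
        omega += np.abs(2.0 * np.outer(y, x))  # gradient of the squared output norm
    return omega / len(xs)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
xs = [rng.standard_normal(8) for _ in range(32)]
omega = mas_importance(W, xs)
print(omega.shape)  # (4, 8): one importance value per weight
```

When learning continues on new data, these importance values enter a quadratic penalty that discourages changing parameters deemed important for earlier data; the protocol above governs when and from which data the values are refreshed.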
Explaining and interpreting neural networks
We also worked on a new method to explain and interpret neural networks. We proposed a novel scheme for both interpretation and explanation in which, given a pretrained model, we automatically identify the internal features relevant to the set of classes considered by the model, without relying on additional annotations. We interpret the model through average visualizations of this reduced set of features. Then, at test time, we explain the network prediction by accompanying the predicted class label with supporting visualizations derived from the identified features. In addition, we proposed a method to address the artifacts introduced by strided operations in deconvNet-based visualizations. Finally, we introduced an8Flower, a dataset specifically designed for the objective quantitative evaluation of visual explanation methods.
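To illustrate the idea of identifying class-relevant internal features without extra annotations, one simple proxy (the actual selection criterion in our scheme may differ) is to rank feature channels by their class-conditional mean activation:

```python
import numpy as np

def top_features_per_class(acts, labels, n_classes, k):
    """acts: (n_samples, n_features) activations; returns k feature ids per class."""
    tops = {}
    for c in range(n_classes):
        mean_act = acts[labels == c].mean(axis=0)  # class-conditional mean activation
        tops[c] = np.argsort(mean_act)[::-1][:k]   # k most active features
    return tops

# Toy activations for 4 samples over 3 internal features.
acts = np.array([[0.1, 0.9, 0.0],
                 [0.2, 0.8, 0.1],
                 [0.9, 0.1, 0.5],
                 [0.8, 0.0, 0.6]])
labels = np.array([0, 0, 1, 1])
tops = top_features_per_class(acts, labels, n_classes=2, k=1)
print(tops)  # class 0 -> feature 1; class 1 -> feature 0
```

The average visualizations and test-time explanations are then built only from this reduced feature set.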