We are happy to give you our first newsletter.
Since the kick-off meeting in June 2017, we have of course not been sitting still. We are proud to give you a first update on the progress we have made so far. We hope you enjoy reading this newsletter and, as always, we are open to all feedback.
Our first focus in this project is the static 360-degree depth capturing platform that will be developed during the first 24 months of the project, and that will be capable of extracting omnidirectional depth maps in real time in a traffic monitoring context.
The OmniDrone team
The ESAT TELEMIC group combines all the expertise needed for the modeling, design and measurement of next-generation wireless communication systems, ranging from Massive MIMO to mmWave solutions. For OmniDrone, aerial communication performance models are proposed, based on simulations and measurements. Current and future communication technologies are then compared and optimised.
Air-to-ground channel model
It is necessary to develop a fundamental understanding of the distinctive features of air-to-ground (A2G) links. TELEMIC proposed a theoretical framework that incorporates both a height-dependent path loss exponent and small-scale fading, and that unifies a widely used ground-to-ground channel model with that of A2G for the analysis of large-scale wireless networks. We derived analytical expressions for the optimal UAV height that minimizes the outage probability of an arbitrary A2G link. The results of this research enable dynamic adjustment of the network performance without requiring changes in the position of the corresponding UAVs.
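As a toy illustration of why an intermediate altitude can be optimal, the sketch below uses a simplified A2G path loss model in which the exponent shrinks with the elevation angle. The function name, parameter values and interpolation rule are illustrative assumptions, not the expressions derived in the cited work.

```python
import math

def path_loss_db(distance_m, uav_height_m, f_hz=2.4e9,
                 alpha_ground=3.5, alpha_los=2.0):
    """Toy air-to-ground path loss with a height-dependent exponent.

    The exponent interpolates between a ground-level value (alpha_ground)
    and a free-space value (alpha_los) as the elevation angle grows.
    These parameter values are illustrative, not fitted ones.
    """
    c = 3e8
    # 3D link distance and elevation angle toward the UAV
    d3d = math.hypot(distance_m, uav_height_m)
    theta = math.atan2(uav_height_m, distance_m)  # radians
    # Height-dependent exponent: shrinks toward alpha_los at 90 degrees
    alpha = alpha_ground - (alpha_ground - alpha_los) * (2 * theta / math.pi)
    # Reference free-space loss at 1 m, plus the distance-dependent term
    pl0 = 20 * math.log10(4 * math.pi * f_hz / c)
    return pl0 + 10 * alpha * math.log10(d3d)

# Raising the UAV increases the link distance but lowers the exponent;
# the loss first decreases with height and grows again at extreme
# altitudes, so an intermediate height minimizes it.
losses = {h: path_loss_db(500, h) for h in (10, 100, 300)}
```

Even this crude model reproduces the qualitative trade-off that motivates optimizing the UAV altitude.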
Realistic 3D simulator
We designed a 3D propagation simulator that takes into account a real map of Flanders. Together with the real parameters of the base stations and the Vienna LTE-A downlink system-level simulator, this tool allows us to simulate the performance experienced by a UAV at a given point. The next step is to design context-aware, reliable, adaptive communication that achieves high-throughput and high-reliability communication with the ground and between multiple drones.
M. Azari, F. Rosas, K.-C. Chen, S. Pollin, “Ultra Reliable UAV Communication Using Altitude and Cooperation Diversity,” IEEE Transactions on Communications, 2018.
M. Azari, H. Sallouha, A. Chiumento, S. Rajendran, E. Vinogradov, S. Pollin, “Key Technologies and System Trade-offs for Detection and Localization of Amateur Drones,” IEEE Communications Magazine, 2018.
The ESAT PSI group works on innovative solutions for image and video analysis, including traditional problems such as object detection, pose estimation and segmentation as well as novel applications such as image generation, vision-based control, lifelong learning and self-driving cars.
For OmniDrone, our mission is to increase the level of autonomy of drones. So far, we have worked on vision-based control in a virtual environment, on reducing the difference between simulated and real-world data for better transfer to the real world, and on learning-based 6DoF pose estimation in a known environment.
From Pixels to Actions
Virtual environments, such as GTA5 (see above), provide an interesting playground for the development of vision-based control algorithms, without actual crashes or an expensive data collection phase. We trained an end-to-end neural network to predict a car’s steering actions on a highway based on images taken from a single car-mounted camera. By using simulation data, we are able to train networks whose performance is comparable to that of networks trained on real-life datasets. More importantly, we demonstrated that the standard metrics used to evaluate such networks do not necessarily reflect a system’s driving behavior accurately: a promising confusion matrix may result in poor driving behavior, while a very poor-looking confusion matrix may result in good driving behavior.
From Virtual to Real
Training and testing in a virtual environment is a convenient way to get a feeling for what may or may not work, and has the added benefit that a lot of metadata (such as depth and semantic segmentation ground truth) is available. In the end, however, we want a model that can also be applied in the real world.
To facilitate the transfer of a model trained on virtual data to real-world conditions, we trained a generator that can ‘translate’ a virtual image (top row in the figure below) into a more realistic one (bottom row), using CycleGAN (a Generative Adversarial Network trained with a cycle-consistency loss). Such a generator can be trained without the need for paired training data.
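A minimal numpy sketch of the cycle-consistency idea follows; the generators here are toy invertible functions standing in for the learned CNNs, and the function name is our own.

```python
import numpy as np

def cycle_consistency_loss(x_virtual, y_real, G, F):
    """L1 cycle-consistency loss as used in CycleGAN-style training.

    G maps virtual -> real, F maps real -> virtual. The loss demands
    that translating an image and translating it back recovers the
    original, which is what makes unpaired training possible.
    """
    forward = np.abs(F(G(x_virtual)) - x_virtual).mean()
    backward = np.abs(G(F(y_real)) - y_real).mean()
    return forward + backward

# Toy stand-ins for the generators (the real ones are CNNs): a
# brightness shift and its inverse, so the cycle is perfect and the
# loss is (numerically) zero.
G = lambda img: img + 0.1
F = lambda img: img - 0.1

x = np.random.rand(8, 8, 3)  # fake 'virtual' image
y = np.random.rand(8, 8, 3)  # fake 'real' image
loss = cycle_consistency_loss(x, y, G, F)
```

In the actual system this loss is added to the adversarial losses of the two discriminators during training.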
6DoF pose estimation
Towards autonomous navigation and control, we are investigating a data-driven, learning-based alternative to geometry-based methods such as SLAM. As a first step in this direction, we developed a model that estimates the camera position and orientation from a single 2D image taken in a known environment. To this end, we extended the PoseNet architecture with an LSTM layer to capture temporal dynamics. A more in-depth analysis on various benchmark sequences showed that such methods typically have problems generalizing to unseen poses; with further data augmentation, this can be overcome to a large extent. In a next phase, we plan to extend this framework to omnidirectional input and see how that affects the results.
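The regression objective behind PoseNet-style models can be sketched as a position error plus a weighted quaternion orientation error. The helper name and the value of `beta` below are illustrative assumptions; the real model predicts these quantities with a CNN, while here we only evaluate the loss.

```python
import numpy as np

def pose_loss(pred_xyz, pred_quat, gt_xyz, gt_quat, beta=100.0):
    """PoseNet-style 6DoF regression loss: Euclidean position error
    plus a weighted quaternion orientation error. beta balances the
    two terms (metres vs unit-quaternion units); 100 is illustrative.
    """
    pos_err = np.linalg.norm(pred_xyz - gt_xyz)
    # Compare unit quaternions so the orientation term is scale-free
    q_gt = gt_quat / np.linalg.norm(gt_quat)
    q_pred = pred_quat / np.linalg.norm(pred_quat)
    orient_err = np.linalg.norm(q_pred - q_gt)
    return pos_err + beta * orient_err

# A perfect prediction yields zero loss
gt_xyz = np.array([1.0, 2.0, 0.5])
gt_quat = np.array([1.0, 0.0, 0.0, 0.0])
perfect = pose_loss(gt_xyz, gt_quat, gt_xyz, gt_quat)
```

The LSTM extension changes the network that produces `pred_xyz` and `pred_quat`, not this objective.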
J. Heylen, S. Iven, B. De Brabandere, J. Oramas, L. Van Gool and T. Tuytelaars, “From Pixels to Actions: Learning to Drive a Car with Deep Neural Networks,” Winter Conference on Applications of Computer Vision (WACV), 2018.
The EAVISE group aims to exploit depth information in order to develop a robust real-time pedestrian detection system.
Starting from the YOLO detector, a state-of-the-art single-pass network, we are making several improvements towards a robust pedestrian detector that runs in real time on an embedded platform such as the Jetson TX2.
You Only Look Once (YOLO)
We reimplemented the YOLO architecture and training routines in the open-source framework PyTorch. This gives us a more modular approach to constructing detection pipelines, allowing for a faster research and development cycle. The resulting library, Lightnet, achieves results comparable to the official Darknet implementation, whilst being almost 2x faster during inference on both a desktop GPU and the embedded Jetson TX2 platform. More information can be found at https://www.gitlab.com/eavise/lightnet.
Using Depth Information
We investigated the benefit of using depth information on top of regular RGB for pedestrian detection. We implemented different variants of the YOLO network architecture, each fusing depth at a different layer of the network. Our experiments show that midway fusion performs best, substantially outperforming a regular RGB detector in accuracy. Moreover, we showed that our fusion network is better at detecting individuals in a crowd: it both localizes pedestrians more precisely and handles occluded persons better. The resulting network is computationally efficient and achieves real-time performance on both desktop and embedded GPUs. This research has been submitted to the AVSS conference in November and is currently under review. In the future, we will preprocess the high-resolution input images so that only interesting regions are passed on to our detection network.
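A schematic sketch of midway fusion follows, with toy 1x1 random "convolutions" standing in for the learned layers: each modality is first processed by its own branch, the feature maps are concatenated along the channel axis, and shared layers continue from the fused map. All shapes and helper names are illustrative.

```python
import numpy as np

def conv_stub(x, out_channels, seed):
    """Stand-in for a convolutional stage: a fixed random 1x1 channel
    mixing followed by ReLU (real stages are learned 3x3 convolutions)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], out_channels)) * 0.1
    return np.maximum(x @ w, 0.0)

def midway_fusion(rgb, depth):
    """Midway fusion: separate early branches per modality, channel
    concatenation at a middle layer, then shared later layers."""
    f_rgb = conv_stub(rgb, 16, seed=0)      # early RGB branch
    f_depth = conv_stub(depth, 16, seed=1)  # early depth branch
    fused = np.concatenate([f_rgb, f_depth], axis=-1)  # channel concat
    return conv_stub(fused, 32, seed=2)     # shared later layers

rgb = np.random.rand(64, 64, 3)    # H x W x 3 colour input
depth = np.random.rand(64, 64, 1)  # H x W x 1 depth input
out = midway_fusion(rgb, depth)
```

Early fusion would instead concatenate the raw inputs, and late fusion the final feature maps; only the concatenation point moves.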
We are focusing on improving the speed of our detection pipeline, aiming to achieve a real-time detector on the Jetson TX2.
Building on Google's work on MobileNets, we are researching the impact of using such depthwise separable convolutions in our own YOLO detector. We are investigating the speed-accuracy trade-off of the different networks, as well as ways to transfer weights from a regular YOLO network to the mobile variants.
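The expected gain from MobileNet-style depthwise separable convolutions can be estimated with simple multiply-accumulate (MAC) counting; the layer dimensions below are illustrative, not taken from our detector.

```python
def conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates for a standard k x k convolution over an
    h x w feature map with c_in input and c_out output channels."""
    return h * w * c_in * c_out * k * k

def separable_macs(h, w, c_in, c_out, k=3):
    """Depthwise k x k convolution followed by a 1x1 pointwise
    convolution: the MobileNet factorization of a standard conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# An illustrative mid-network layer size
std = conv_macs(52, 52, 256, 256)
sep = separable_macs(52, 52, 256, 256)
speedup = std / sep  # approaches k*k = 9x as c_out grows
```

This is why the factorization pays off most in the wide middle layers of a detector, and why the accuracy cost of the approximation has to be measured rather than assumed.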
While mobile convolutions provide a speedup on their own, using specialized frameworks like TensorRT will allow these networks to be even faster on the Jetson TX2 platform.
This work also ties in with research from MICAS, who will use these convolutions to implement more energy efficient and high-throughput embedded neural networks on their hardware platforms.
Together with the use of depth information, this method will allow real-time processing of the 360° input images for pedestrian detection on an embedded device.
T. Ophoff, K. Van Beeck, T. Goedemé, “Improving Real-Time Pedestrian Detectors with Depth Fusion,” International Conference on Advanced Video and Signal-based Surveillance (AVSS), 2018
The ESAT MICAS group targets innovations towards more energy-efficient and high-throughput embedded deep neural network inference. This is pursued through new HW architectures and HW-algorithm co-optimization. Three different techniques are currently under development and will be combined towards efficient dynamic neural network execution.
Stepping away from layer-by-layer network computation towards joint multi-layer execution for reduced feature map storage
By interleaving computations from different layers, some intermediate feature maps do not have to be stored; intermediate data is instead consumed directly by the next layer. This is especially beneficial in networks such as MobileNet V2, where wide layers are interleaved with smaller layers. We are currently developing an algorithmic framework to benchmark the energy and throughput gains, as well as a dedicated hardware architecture to maximally exploit this opportunity.
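A back-of-the-envelope sketch of the storage saving, assuming line-buffered interleaving that keeps only k rows of each intermediate map alive; the channel counts are a MobileNet V2-like illustration and the helper names are our own.

```python
def layer_by_layer_peak(h, w, channels_per_layer):
    """Peak intermediate storage (in elements) when each layer's full
    output feature map is written to memory before the next layer runs."""
    return max(h * w * c for c in channels_per_layer)

def interleaved_peak(h, w, channels_per_layer, k=3):
    """Peak storage when layers are computed in an interleaved,
    line-by-line fashion: only a sliding window of k rows of each
    intermediate map must be kept alive at any time."""
    return sum(k * w * c for c in channels_per_layer)

# Wide / narrow / wide channel counts, as in an inverted-residual block
chans = [96, 24, 144]
full = layer_by_layer_peak(56, 56, chans)
fused = interleaved_peak(56, 56, chans, k=3)
```

The gap between `full` and `fused` grows with the feature map height, which is exactly the storage a dedicated architecture can reclaim.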
Dynamic network execution
Instead of training one large neural network for a complex task, we can train a series of concatenated networks of increasing complexity. This allows us to detect easy-to-detect objects with only a few operations and terminate the neural network computations early. At the same time, the strategy still achieves good accuracy on difficult data instances by using the complete cascade. The challenge is to find the optimal number of stages and the best characteristics for every stage. We developed a framework to find this optimal network hierarchy.
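The early-exit control flow can be sketched as follows, with toy stand-in stages; the real stages are neural networks, and the stage count and thresholds are what the framework mentioned above optimizes.

```python
def cascade_predict(x, stages, thresholds):
    """Run increasingly complex stages; stop as soon as one is
    confident enough. Returns (label, index_of_stage_used).

    stages:     list of functions x -> (label, confidence)
    thresholds: confidence needed to exit at each stage; the last
                stage always produces the final answer.
    """
    for i, (stage, thr) in enumerate(zip(stages, thresholds)):
        label, conf = stage(x)
        if conf >= thr or i == len(stages) - 1:
            return label, i

# Toy stages: a cheap one whose confidence is the input magnitude,
# and an expensive one that is always confident.
stage_small = lambda x: ("pedestrian" if x > 0 else "background", abs(x))
stage_big = lambda x: ("pedestrian" if x > 0 else "background", 1.0)

easy = cascade_predict(0.9, [stage_small, stage_big], [0.8, 0.0])
hard = cascade_predict(0.1, [stage_small, stage_big], [0.8, 0.0])
```

An easy input exits at the cheap first stage, while an ambiguous one falls through to the full network, which is where the average-case compute saving comes from.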
Efficient network computation
The majority of an embedded device's energy consumption and clock-cycle latency goes to fetching network parameters into and out of memory. Compacting the neural network as much as possible is hence of crucial importance for embedded network execution. We assessed and improved state-of-the-art network pruning techniques. More specifically, we compared the performance of network compaction using Singular Value Decomposition (SVD), pruning, and parameter clustering (Deep Compression). We finally proposed a hybrid of all three techniques that achieves up to an additional 5x improvement in network compression. This work has been accepted to the EMDL conference, to be presented in June 2018.
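A small sketch of the SVD component of such compaction follows, assuming a dense layer whose weight matrix happens to be low-rank; the rank, matrix sizes and helper name are illustrative.

```python
import numpy as np

def svd_compress(W, rank):
    """Replace a dense weight matrix W (m x n) by two thin factors
    A (m x rank) and B (rank x n) from a truncated SVD, trading a
    little accuracy for far fewer stored parameters."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # fold singular values into A
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
# Construct a weight matrix of rank 20 (product of two thin factors),
# so a rank-20 truncation reconstructs it almost exactly.
W = rng.standard_normal((512, 20)) @ rng.standard_normal((20, 512))
A, B = svd_compress(W, rank=20)

orig_params = W.size
compressed_params = A.size + B.size
error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

Real layers are only approximately low-rank, which is why SVD is combined with pruning, clustering and retraining rather than used alone.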
K. Goetschalckx, B. Moons, S. Lauwereins, M. Andraud and M. Verhelst, "Optimized Hierarchical Cascaded Processing," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
K. Goetschalckx, B. Moons, P. Wambacq, and M. Verhelst, “Efficiently Combining SVD, Pruning, Clustering and Retraining for Enhanced Neural Network Compression,” In EMDL’18
B. Moons, K. Goetschalckx, M. Van Berckelaer, M. Verhelst, “Minimum Energy Quantized Neural Networks,” Asilomar Conference on Signals, Systems and Computers, 2017.
The Centre for IT & IP Law of KU Leuven prepared the Initial Report on Regulation, which is very interesting and important for any company working in this domain. The report can be found on the internal website.