If there are any errors, please abort, run `arxiv_required` to install the required packages, and start again.
Please wait while we parse the requested information from the global arXiv (arxiv.org) servers.

GPU based Parallel Optimization for Real Time Panoramic Video Stitching (Chengyao Du - 4 October, 2018)
The performance of the system presented in the paper is 29.2 times that of the former embedded one, while the power dissipation is reduced to 10 W.
Link: https://arxiv.org/abs/1810.03988
====================================================
Extended Bit-Plane Compression for Convolutional Neural Network Accelerators (Lukas Cavigelli - 1 October, 2018)
After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deploying these compute-intensive ML models on tightly power-constrained embedded and mobile systems at low cost, as well as for pushing the throughput in data centers. We show that an average compression ratio of 4.4x relative to uncompressed data and a gain of 60% over existing methods can be achieved for ResNet-34 with a compression block requiring <300 bits of sequential cells and minimal combinational logic.
Link: https://arxiv.org/abs/1810.03979
====================================================
End-to-End Text Classification via Image-based Embedding using Character-level Networks (Shunsuke Kitada - 10 October, 2018)
Through various experiments, we found and confirmed that our CE-CLCNN captures closely embedded features for visually and semantically similar characters and achieves state-of-the-art results on several open document classification tasks.
Link: https://arxiv.org/abs/1810.03595
====================================================
FingerVision Tactile Sensor Design and Slip Detection Using Convolutional LSTM Network (Yazhan Zhang - 5 October, 2018)
This sensor is composed of soft skin with an embedded marker array bonded to a rigid frame, and a web camera with a fisheye lens. The data collection process takes advantage of the human sense of slip: a human hand holds 12 everyday objects, interacts with the sensor skin, and labels the data as slip or non-slip based on the human feeling of slip. Our slip classification framework achieves a high accuracy of 97.62% on the test dataset.
Link: https://arxiv.org/abs/1810.02653
====================================================
Deep Learning Approaches for Understanding Simple Speech Commands (Roman A. Solovyev - 4 October, 2018)
Automatic classification of sound commands is becoming increasingly important, especially for mobile and embedded devices. As a result, we achieved classification accuracy good enough to finish the challenge in 8th place among 1315 teams.
Link: https://arxiv.org/abs/1810.02364
====================================================
RGB-D Object Detection and Semantic Segmentation for Autonomous Manipulation in Clutter (Max Schwarz - 1 October, 2018)
A large variety of objects must be perceived in complex scenes, where they are partially occluded and embedded among many distractors, often in restricted spaces. We evaluate our approach on two challenging data sets: one captured for the Amazon Picking Challenge 2016, where our team NimbRo came in second in the Stowing task and third in the Picking task, and one captured in disaster-response scenarios.
Link: https://arxiv.org/abs/1810.00818
====================================================
High Performance Zero-Memory Overhead Direct Convolutions (Jiyuan Zhang - 19 September, 2018)
In this paper, we demonstrate that direct convolution, when implemented correctly, eliminates all memory overhead and yields performance that is between 10% and 400% better than existing high-performance implementations of convolution layers on conventional and embedded CPU architectures.
Link: https://arxiv.org/abs/1809.10170
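To illustrate the memory argument, here is a minimal sketch of a direct convolution, assuming a single input channel, stride 1, no padding, and the cross-correlation form that CNN layers compute. The output is accumulated straight from the input and the kernel, so no im2col patch matrix is materialized; `im2col_buffer_size` shows what that lowering would have allocated, for comparison. Both function names are hypothetical, and this is not the paper's optimized, vectorized implementation.

```python
from typing import List

Matrix = List[List[float]]


def direct_conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Direct 2D 'valid' cross-correlation (stride 1, single channel).

    Each output pixel is accumulated directly from `image` and `kernel`;
    unlike im2col + GEMM, no intermediate patch matrix is built, so the
    only extra memory is the output itself.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out


def im2col_buffer_size(ih: int, iw: int, kh: int, kw: int) -> int:
    """Elements an im2col patch matrix would need for the same layer."""
    return (ih - kh + 1) * (iw - kw + 1) * kh * kw
```

For a 3x3 input and a 2x2 kernel, the im2col buffer already holds 16 elements, more than the 9-element input it is built from; the direct loop allocates nothing beyond the output.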
====================================================
Almost optimal algorithms for diameter-optimally augmenting trees (Davide Bilò - 2 October, 2018)
Previously, the problem was solved in $O(n^2 \log^3 n)$ time for general weights [Oh and Ahn, ISAAC 2016], in $O(n^2 \log n)$ time for trees embedded in a metric space [Große et al., {\tt arXiv:1607.05547}], and in $O(n \log n)$ time for paths embedded in a metric space [Wang, WADS 2017]. Furthermore, a $(1+\varepsilon)$-approximation algorithm running in $O(n+1/\varepsilon^{3})$ time has been designed for paths embedded in $\mathbb{R}^d$, for constant values of $d$ [Große et al., ICALP 2015].
Link: https://arxiv.org/abs/1809.08822
====================================================
The Key Concepts of Ethics of Artificial Intelligence - A Keyword based Systematic Mapping Study (Ville Vakkuri - 19 September, 2018)
The growing influence and decision-making capacities of autonomous systems and artificial intelligence in our lives force us to consider the values embedded in these systems. Out of 1062 papers retrieved, the systematic mapping study (SMS) discovered 37 recurring keywords in 83 academic papers.
Link: https://arxiv.org/abs/1809.07027
====================================================
FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices (Shuochao Yao - 18 September, 2018)
Hence, we propose a novel framework, called FastDeepIoT, that uncovers the non-linear relation between neural network structure and execution time, then exploits that understanding to find network configurations that significantly improve the trade-off between execution time and accuracy on mobile and embedded devices. We evaluate FastDeepIoT using three different sensing-related tasks on two mobile devices: Nexus 5 and Galaxy Nexus. FastDeepIoT further reduces the neural network execution time by $48\%$ to $78\%$ and energy consumption by $37\%$ to $69\%$ compared with the state-of-the-art compression algorithms.
Link: https://arxiv.org/abs/1809.06970
====================================================
A Generic Multi-modal Dynamic Gesture Recognition System using Machine Learning (Gautham Krishna G - 16 September, 2018)
Moreover, this system was found to run on a low-cost embedded platform - Raspberry Pi Zero (USD 5), making it economically viable.
Link: https://arxiv.org/abs/1809.05839
====================================================
Canonical and Compact Point Cloud Representation for Shape Classification (Kent Fujiwara - 13 September, 2018)
We demonstrate the descriptiveness of the instance-wise, shape-embedded network parameters by using them to classify shapes in 3D datasets.
Link: https://arxiv.org/abs/1809.04820
====================================================
Temporal-Spatial Mapping for Action Recognition (Xiaolin Song - 10 September, 2018)
With each row being the vectorized feature representation of a frame, the temporal-spatial features are compactly represented, while the temporal dynamic evolution is also well embedded. The experimental results show that the proposed scheme achieves state-of-the-art performance, with a 4.2% accuracy gain over Temporal Segment Network (TSN), a competing baseline method, on the challenging human action benchmark dataset HMDB51.
Link: https://arxiv.org/abs/1809.03669
====================================================
Learning to Solve NP-Complete Problems - A Graph Neural Network for the Decision TSP (Marcelo O. R. Prates - 7 September, 2018)
Our model is trained to function as an effective message-passing algorithm in which edges (embedded with their weights) communicate with vertices for a number of iterations, after which the model is asked to decide whether a route with cost $<C$ exists. We were able to obtain $80\%$ accuracy when training with $\pm 2\%$ deviations, and the same trained model can generalize to more relaxed deviations with increasing performance.
Link: https://arxiv.org/abs/1809.02721
====================================================
CIDPro: Custom Instructions for Dynamic Program Diversification (Thinh Hung Pham - 4 September, 2018)
Timing side-channel attacks pose a major threat to embedded systems due to their ease of accessibility. Experimental results show that our solution can achieve 80% and 86% reductions in timing side-channel capacity for two benchmarks, with an acceptable performance overhead compared to existing solutions. In addition, the proposed method incurs only a negligible hardware area overhead of 1% of the slices of the entire RISC-V system.
Link: https://arxiv.org/abs/1809.01221
====================================================
The MISRA C Coding Standard and its Role in the Development and Analysis of Safety- and Security-Critical Embedded Software (Roberto Bagnara - 4 September, 2018)
The MISRA project started in 1990 with the mission of providing world-leading best practice guidelines for the safe and secure application of both embedded control systems and standalone software.
Link: https://arxiv.org/abs/1809.00821
====================================================
Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation (Cong Liao - 30 August, 2018)
The main goal of the adversary performing such an attack is to generate and inject a backdoor into a deep learning model that can be triggered to recognize certain embedded patterns with a target label of the attacker's choice. We carry out extensive experimental evaluations under various assumptions on the adversary model, and demonstrate that such attacks can be effective and achieve a high attack success rate (above $90\%$) at a small cost of model accuracy loss (below $1\%$) with a small injection rate (around $1\%$), even under the weakest assumption, wherein the adversary has no knowledge of either the original training data or the classifier model.
Link: https://arxiv.org/abs/1808.10307
====================================================
Deep Learning for Stress Field Prediction Using Convolutional Neural Networks (Zhenguo Nie - 27 August, 2018)
One is a Feature-Representation-embedded Convolutional Neural Network (FR-CNN) with a single input channel, and the other is a Fully Convolutional Neural Network with embedded Squeeze-and-Excitation Residual network modules (SE-Res-FCN) with multiple input channels. The mean relative error (MRE) of the SE-Res-FCN model is about 0.25% with respect to the average ground truth.
Link: https://arxiv.org/abs/1808.08914
====================================================
Three Efficient, Low-Complexity Algorithms for Automatic Color Trapping (Haiyin Wang - 21 August, 2018)
Our algorithms are designed for software or embedded firmware implementation. The first LUT-based algorithm corrects all registration errors of one pixel in extent and reduces several cases of misregistration errors of two pixels in extent using only 727 Kbytes of storage space. The second LUT-based algorithm corrects all types of misregistration errors of up to two pixels in extent using 3.7 Mbytes of storage space. The third algorithm is a hybrid one that combines LUTs and feature extraction to minimize the storage requirements (724 Kbytes) while still correcting all misregistration errors of up to two pixels in extent.
Link: https://arxiv.org/abs/1808.07096
====================================================
Energy Efficient Service Distribution in Internet of Things (Barzan Yosuf - 18 August, 2018)
The great advancement in the computational power of embedded technologies has enabled the integration of these devices into the IoT network, allowing cloud functionalities to be extended closer to the source of data. Our results show that introducing local computation at the IoT layer can bring up to 90% power savings compared with general-purpose servers in a central cloud.
Link: https://arxiv.org/abs/1808.06120
====================================================
CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams (Lukas Cavigelli - 15 August, 2018)
Applying these networks to images demands a high computational effort and pushes the use of state-of-the-art networks on real-time video data out of reach of embedded platforms. This optimized inference procedure resulted in an average speed-up of 9.1x over cuDNN on the Tegra X2 platform at a negligible accuracy loss of <0.1% and no retraining of the network for a semantic segmentation application. Similarly, an average speed-up of 7.0x has been achieved for a pose detection DNN on static camera video surveillance data. These throughput gains combined with a lower power consumption result in an energy efficiency of 511 GOp/s/W compared to 70 GOp/s/W for the baseline.
Link: https://arxiv.org/abs/1808.05488
====================================================
DNN Feature Map Compression using Learned Representation over GF(2) (Denis A. Gudovskiy - 15 August, 2018)
Unlike previous works, the proposed method is based on converting fixed-point activations into vectors over the smallest GF(2) finite field followed by nonlinear dimensionality reduction (NDR) layers embedded into a DNN. Compared to prior approaches, the conducted experiments show a factor of 2 decrease in memory requirements with minor degradation in accuracy while adding only bitwise computations.
Link: https://arxiv.org/abs/1808.05285
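A minimal sketch of the first step described above, assuming non-negative (for example post-ReLU) activations already quantized to unsigned `bits`-bit fixed-point integers: each value is expanded into a bit vector over GF(2), the field with elements {0, 1} in which addition is XOR. The helper names are hypothetical, and the learned nonlinear dimensionality reduction (NDR) layers that follow in the paper are not shown.

```python
from typing import List


def to_gf2(values: List[int], bits: int = 8) -> List[List[int]]:
    """Expand unsigned fixed-point integers into GF(2) bit vectors.

    Each value v in [0, 2**bits) becomes `bits` elements in {0, 1},
    least-significant bit first.
    """
    vectors = []
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} unsigned bits")
        vectors.append([(v >> i) & 1 for i in range(bits)])
    return vectors


def from_gf2(vectors: List[List[int]]) -> List[int]:
    """Inverse of to_gf2: pack LSB-first bit vectors back into integers."""
    return [sum(bit << i for i, bit in enumerate(vec)) for vec in vectors]


def gf2_add(a: List[int], b: List[int]) -> List[int]:
    """Element-wise addition in GF(2), i.e. XOR (1 + 1 = 0)."""
    return [x ^ y for x, y in zip(a, b)]
```

The expansion is lossless (`from_gf2(to_gf2(x)) == x`), and any later processing on the bit vectors needs only bitwise operations, which is the property the snippet highlights.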
====================================================
Convolutional Neural Networks on 3D Surfaces Using Parallel Frames (Hao Pan - 14 August, 2018)
2D images) to curved surfaces embedded in 3D Euclidean space that are discretized as irregular meshes and widely used to represent geometric data in Computer Vision and Graphics. We define surface convolution on tangent spaces of a surface domain, where the convolution has two desirable properties: 1) the distortion of surface domain signals is locally minimal when they are projected to the tangent space, and 2) the translation equivariance property holds locally, by aligning tangent spaces with the canonical parallel transport that preserves the metric.
Link: https://arxiv.org/abs/1808.04952
====================================================
Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA (Junsong Wang - 31 July, 2018)
Results show that our design can deliver very high performance, peaking at 10.3 TOPS, and classify up to 325.3 images/s/W while running large-scale neural networks at less than 5 W on an embedded FPGA.
Link: https://arxiv.org/abs/1808.04311
====================================================
D-RaNGe: Violating DRAM Timing Constraints for High-Throughput True Random Number Generation using Commodity DRAM Devices (Jeremie S. Kim - 13 August, 2018)
DRAM provides a promising substrate for generating random numbers due to three major reasons: 1) DRAM is composed of a large number of cells that are susceptible to many different failure modes that can be exploited for random number generation, 2) the high-bandwidth DRAM interface provides support for high-throughput random number generation, and 3) DRAM is prevalent in many commodity computing systems today, ranging from embedded devices to high-performance computing platforms.
Link: https://arxiv.org/abs/1808.04286
====================================================
Designing Adaptive Neural Networks for Energy-Constrained Image Classification (Dimitrios Stamoulis - 6 August, 2018)
As convolutional neural networks (CNNs) enable state-of-the-art computer vision applications, their high energy consumption has emerged as a key impediment to their deployment on embedded and mobile devices.
Link: https://arxiv.org/abs/1808.01550
====================================================
GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware (Ananda Samajdar - 13 September, 2018)
We ran GENESYS with a suite of environments from OpenAI gym and observed 2-5 orders of magnitude higher energy-efficiency over state-of-the-art embedded and desktop CPU and GPU systems.
Link: https://arxiv.org/abs/1808.01363
====================================================
Efficient texture retrieval using multiscale local extrema descriptors and covariance embedding (Minh-Tan Pham - 3 August, 2018)
All feature vectors are finally embedded into a covariance matrix, which is then exploited for dissimilarity measurement within the retrieval task. In particular, the proposed framework provides highly competitive retrieval rates for several texture databases, including 94.95% for MIT Vistex, 79.87% for Stex, 76.15% for Outex TC-00013 and 89.74% for USPtex.
Link: https://arxiv.org/abs/1808.01124
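A minimal sketch of a covariance embedding, assuming NumPy and a set of N local feature vectors of dimension d stored as rows: the whole set is summarized by one d x d covariance matrix. The snippet above does not say which dissimilarity is used, so the log-Euclidean distance below is an assumption, chosen because covariance matrices are symmetric positive (semi-)definite and this distance respects that structure. Both function names are hypothetical.

```python
import numpy as np


def covariance_descriptor(features) -> np.ndarray:
    """Embed N feature vectors (rows of an N x d array) into a d x d covariance matrix."""
    x = np.asarray(features, dtype=float)
    return np.cov(x, rowvar=False)


def log_euclidean_distance(c1: np.ndarray, c2: np.ndarray, eps: float = 1e-6) -> float:
    """Assumed dissimilarity: Frobenius norm between matrix logarithms.

    A small ridge eps * I keeps each matrix strictly positive definite,
    so every eigenvalue has a real logarithm.
    """
    def spd_log(c: np.ndarray) -> np.ndarray:
        c = c + eps * np.eye(c.shape[0])
        w, v = np.linalg.eigh(c)          # c = v @ diag(w) @ v.T
        return (v * np.log(w)) @ v.T      # v @ diag(log w) @ v.T

    return float(np.linalg.norm(spd_log(c1) - spd_log(c2), ord="fro"))
```

The descriptor size depends only on d, not on the number of feature vectors, which keeps retrieval comparisons cheap for textures with many local extrema.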
====================================================
Small World Model based on a Sphere Homeomorphic Geometry (Santiago Viertel - 2 August, 2018)
We define a small world model over the octahedron surface and relate its distances with those of embedded spheres, preserving constant bounded distortions. The probability of creating cycles of size three (C3) with long-range edges in a vertex is $\mathcal{O}\left(\log^{-1}n\right)$.
Link: https://arxiv.org/abs/1808.01028
====================================================
Binarized Convolutional Neural Networks for Efficient Inference on GPUs (Mir Khan - 1 August, 2018)
However, they are computationally expensive, which can make their implementation on embedded and low-power devices difficult. In binarized networks, all weights and intermediate computations between layers are quantized to +1 and -1, allowing multiplications and additions to be replaced with bit-wise operations between 32-bit words. Our implementation achieves a maximum speed-up of 7.4x with only 4.4% loss in accuracy compared to a reference implementation.
Link: https://arxiv.org/abs/1808.00209
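The bit-wise trick mentioned above can be sketched as follows, assuming the common XNOR-popcount formulation: +1 is encoded as bit 1 and -1 as bit 0, so positions where two packed words agree (XNOR = 1) contribute +1 to the dot product and the remaining positions contribute -1. The helper names are hypothetical, and this plain-Python version only illustrates the arithmetic, not the paper's GPU kernels.

```python
from typing import List

WORD_BITS = 32


def pack_signs(values: List[int]) -> int:
    """Pack up to 32 values in {+1, -1} into one word (bit 1 encodes +1)."""
    if len(values) > WORD_BITS:
        raise ValueError("at most 32 values per word")
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binarized values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(word_a: int, word_b: int, n: int) -> int:
    """Dot product of two packed {+1, -1} vectors of length n.

    XNOR marks positions where the signs agree (product +1); the other
    positions give -1, so dot = agreements - (n - agreements).
    """
    mask = (1 << n) - 1
    agreements = bin(~(word_a ^ word_b) & mask).count("1")  # popcount
    return 2 * agreements - n
```

For any pair of ±1 vectors `a`, `b` of up to 32 elements, `binary_dot(pack_signs(a), pack_signs(b), len(a))` equals `sum(x * y for x, y in zip(a, b))`, but uses one XOR, one mask and one popcount instead of n multiply-adds.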
====================================================
FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software (Jin Hee Kim - 27 July, 2018)
A complete system is generated where convolution, pooling and padding are realized in the synthesized accelerator, with remaining tasks executing on an embedded ARM processor. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
Link: https://arxiv.org/abs/1807.10695
====================================================
Premise selection with neural networks and distributed representation of features (Andrzej Stanisław Kucik - 26 July, 2018)
To further improve the performance of the model, we use a dimensionality reduction technique to replace long and sparse signature vectors with their compact and dense embedded versions. This allows us to use 512-dimensional embeddings for conjecture-axiom pairs, containing enough information about the original statements to reach an accuracy of 76.45% in the premise selection task with only simple two-layer densely connected neural networks.
Link: https://arxiv.org/abs/1807.10268
====================================================
Reverse Attention for Salient Object Detection (Shuhan Chen - 25 July, 2018)
However, two major challenges still hinder its application in embedded devices: low-resolution output and heavy model weight. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).
Link: https://arxiv.org/abs/1807.09940
====================================================
Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy (Zhezhi He - 20 July, 2018)
However, its enormous model size and massive computation cost have become the main obstacle to deploying such a powerful algorithm in low-power and resource-limited embedded systems. -1, 0, +1), with the objectives of greatly reducing model size, computation cost and the accuracy degradation caused by model compression. With about 16x model compression rate, our ternarized ResNet-32/44/56 could outperform full-precision counterparts by 0.12%, 0.24% and 0.18% on the CIFAR-10 dataset. We also test our ternarization method with AlexNet and ResNet-18 on the ImageNet dataset, both of which achieve the best top-1 accuracy compared to recent similar works, with the same 16x compression rate. If our residual expansion method is further incorporated, our ternarized ResNet-18 even improves the top-5 accuracy by 0.61% compared to the full-precision counterpart and degrades the top-1 accuracy by only 0.42% on the ImageNet dataset, with an 8x model compression rate. It outperforms the recent ABC-Net by 1.03% in top-1 accuracy and 1.78% in top-5 accuracy, with around 1.25x higher compression rate and more than 6x computation reduction due to the weight sparsity.
Link: https://arxiv.org/abs/1807.07948
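The ternarization and the quoted compression rates can be illustrated with a minimal sketch. The thresholding rule below is an assumption (a Ternary-Weight-Network-style rule with threshold 0.7 times the mean absolute weight and a single scaling factor alpha), not necessarily the method of this paper, and the function names are hypothetical. The compression figure follows from storing each weight as a 2-bit ternary code instead of a 32-bit float; by the same arithmetic, the 8x rate quoted for residual expansion corresponds to 4 bits per weight.

```python
from typing import List, Tuple


def ternarize(weights: List[float], delta_factor: float = 0.7) -> Tuple[List[int], float]:
    """Map real-valued weights to alpha * {-1, 0, +1} (assumed threshold rule).

    delta = delta_factor * mean(|w|); weights with |w| <= delta become 0,
    the rest keep their sign. alpha is the mean |w| of the kept weights,
    so that alpha * code approximates each weight.
    """
    if not weights:
        raise ValueError("empty weight list")
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_factor * mean_abs
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, t in zip(weights, codes) if t != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


def compression_rate(full_bits: int = 32, code_bits: int = 2) -> float:
    """Storage ratio of full-precision weights to their compressed codes."""
    return full_bits / code_bits
```

For example, `ternarize([0.9, -0.05, -1.1, 0.02])` zeroes the two small weights and returns codes `[1, 0, -1, 0]` with alpha close to 1.0; the zeros are also what allow the computation reduction from weight sparsity mentioned above.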
====================================================
Development of SageMath filter for Moodle (Yevhenii O. Modlo - 14 July, 2018)
An effective tool for embedding SageMath computer mathematics system models into Moodle is a text filter.
Link: https://arxiv.org/abs/1807.06924
====================================================
Real-time on-board obstacle avoidance for UAVs based on embedded stereo vision (Boitumelo Ruf - 17 July, 2018)
Hence, we aimed at using high-level synthesis (HLS) for porting our algorithms, which are written in C/C++, to the embedded FPGA. We evaluated our implementation of the disparity estimation on the KITTI Stereo 2015 benchmark.
Link: https://arxiv.org/abs/1807.06271
====================================================
A Multimodal Approach to Predict Social Media Popularity (Mayank Meghawat - 16 July, 2018)
Multimodal information embedded in such posts could be useful in predicting their popularity. Specifically, we augment the SMPT1 dataset for social media prediction in the ACM Multimedia Grand Challenge 2017 with image content, titles, descriptions, and tags.
Link: https://arxiv.org/abs/1807.05959
====================================================
Computing Height Persistence and Homology Generators in $\mathbb{R}^3$ Efficiently (Tamal K. Dey - 10 July, 2018)
Recently it has been shown that computing the dimension of the first homology group $H_1(K)$ of a simplicial $2$-complex $K$ embedded linearly in $\mathbb{R}^4$ is as hard as computing the rank of a sparse $0-1$ matrix.
Link: https://arxiv.org/abs/1807.03655
====================================================
Model-based Hand Pose Estimation for Generalized Hand Shape with Appearance Normalization (Jan Wöhlke - 2 July, 2018)
Recently, a hybrid approach has embedded a kinematic layer into the deep learning structure in such a way that the pose estimates obey the physical constraints of human hand kinematics. The effectiveness and limitations of our proposed approach are extensively evaluated on the Hands 2017 challenge dataset and the NYU dataset.
Link: https://arxiv.org/abs/1807.00898
====================================================
Compiler Phase Ordering as an Orthogonal Approach for Reducing Energy Consumption (Ricardo Nobre - 2 July, 2018)
Embedded systems often rely on a battery and, besides energy, also have power dissipation limitations, while HPC centers have a growing concern with electricity and cooling costs. We use our phase selection and ordering framework to explore the design space in the context of a Clang+LLVM compiler targeting a multicore ARM processor in an ODROID board and a dual x86 desktop representative of a node in a supercomputing center. Our experiments with a set of representative kernels show that we can reduce energy consumption by up to 24% and that some of these improvements can only be partially explained by improvements to execution time.
Link: https://arxiv.org/abs/1807.00638
====================================================
Towards real-time unsupervised monocular depth estimation on CPU (Matteo Poggi - 31 July, 2018)
To tackle this issue, in this paper we propose a novel architecture capable of quickly inferring an accurate depth map on a CPU, even on an embedded system, using a pyramid of features extracted from a single input image. Extensive experimental results on the KITTI dataset show that, compared to the top-performing approach, our network has similar accuracy but a much lower complexity (about 6% of the parameters), enabling it to infer a depth map for a KITTI image in about 1.7 s on the Raspberry Pi 3 and at more than 8 Hz on a standard CPU. Moreover, by trading accuracy for efficiency, our network can infer maps at about 2 Hz and 40 Hz, respectively, while still being more accurate than most state-of-the-art slower methods.
Link: https://arxiv.org/abs/1806.11430
====================================================
Compact Deep Neural Networks for Computationally Efficient Gesture Classification From Electromyography Signals (Adam Hartwell - 3 July, 2018)
However, deep neural networks typically have the drawback of large numbers of parameters, requiring large training data sets and powerful hardware not suited to embedded systems. The performance of the compact deep net is benchmarked against an SVM and compared to other contemporary architectures across 10 human subjects, comparing Myo and Delsys Trigno electrode sets. The accuracy of the compact deep net was found to be 84.2 +/- 0.06% versus 70.5 +/- 0.07% for the SVM on the Myo, and 80.3 +/- 0.07% versus 67.8 +/- 0.09% for the Delsys system, demonstrating the superior effectiveness of the proposed compact network, which had just 5,889 parameters (orders of magnitude fewer than some contemporary alternatives in this domain) while maintaining better performance.
Link: https://arxiv.org/abs/1806.08641
====================================================
Par4Sim -- Adaptive Paraphrasing for Text Simplification (Seid Muhie Yimam - 21 June, 2018)
Our experimental results show that, over time, the performance of the embedded paraphrase ranking model improves steadily, from a score of 62.88% up to 75.70% on the NDCG@10 evaluation metric.
Link: https://arxiv.org/abs/1806.08309
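For reference, NDCG@10 can be computed as below. This is a minimal sketch assuming graded relevance scores listed in ranked order and the linear-gain form of DCG (some evaluations use the exponential gain 2^rel - 1 instead); the snippet does not say which variant the paper uses, and the function names are hypothetical. Scores such as 62.88% and 75.70% are NDCG values expressed as percentages.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain of the top-k items (linear gain)."""
    # rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: List[float], k: int = 10) -> float:
    """DCG@k normalized by the DCG@k of the ideal (relevance-sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A ranking that already lists items in decreasing relevance scores 1.0; placing the only relevant item second instead of first gives 1 / log2(3), about 0.63.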
====================================================
A model-driven approach for a new generation of adaptive libraries (Marco Cianfriglia - 19 June, 2018)
We present experimental results for two GPU architectures and show significant performance gains of up to 3x (on a high-end NVIDIA Pascal GPU) and 2.5x (on an embedded ARM Mali GPU) when compared to a traditionally optimized library.
Link: https://arxiv.org/abs/1806.07060
====================================================
Manifold Learning & Stacked Sparse Autoencoder for Robust Breast Cancer Classification from Histopathological Images (Sawon Pratiher - 18 June, 2018)
In this contribution, HI are modelled as spatially-progressive lower dimensional dynamical patterns embedded in the higher dimensional HI space. Classification accuracy of 99.4% obtained on the publicly available BreaKHis dataset outperforms the state-of-the-art methods and validates its adequacy as an adjunct tool to clinicians in confirming their diagnosis.
Link: https://arxiv.org/abs/1806.06876
====================================================
Mitigating Botnet Attack Using Encapsulated Detection Mechanism (EDM) (Maxwell Scale Uwadia Osagie - 16 June, 2018)
Botnets, as they are popularly called, have become fashionable in recent times owing to their embedded force on network servers. Botnets show an exponential growth of about 170,000 per day within network server and client infrastructures. The networking environment battles over 5 million bots on a monthly basis.
Link: https://arxiv.org/abs/1806.06275
====================================================
Ego-Lane Analysis System (ELAS): Dataset and Algorithms (Rodrigo F. Berriel - 15 June, 2018)
Decreasing costs of vision sensors and advances in embedded hardware have boosted lane-related research (detection, estimation, and tracking) in the past two decades. To validate ELAS and address the lack of lane datasets in the literature, a new dataset with more than 20 different scenes (in more than 15,000 frames) covering a variety of scenarios (urban road, highways, traffic, shadows, etc.) was created.
Link: https://arxiv.org/abs/1806.05984
====================================================
Comparing Two Generations of Embedded GPUs Running a Feature Detection Algorithm (Max Danielsson - 13 June, 2018)
We compare two generations of embedded GPUs for mobile devices when running a state-of-the-art feature detection algorithm, i.e., Harris-Hessian/FREAK
Link: https://arxiv.org/abs/1806.04859
====================================================
A Graph Model with Indirect Co-location Links (Md Shahzamal - 26 July, 2018)
Graph models are widely used to analyse diffusion processes embedded in social contacts and to develop applications. We analyze 60 million location updates made by 2 million users of a social networking application to characterize the graph properties, including the space-time correlations and its time-evolving characteristics, such as bursty or ongoing behaviors. The generated synthetic graph reproduces the diffusion dynamics of a realistic contact graph and reduces the prediction error by up to 82% when compared to other contact graph models, demonstrating its potential for forecasting epidemic spread.
Link: https://arxiv.org/abs/1806.03386
====================================================
PID2018 Benchmark Challenge: Model Predictive Control With Conditional Integral Control Using A General Purpose Optimal Control Problem Solver - RIOTS (Sina Dehghan - 5 June, 2018)
A conditional integral (CI) compensator is embedded in the controller to compensate for the small steady state errors. Our solution is introduced in detail in this paper and our final results using the overall relative index, $J$, are 0.2 over C1 and 0.3 over C2, respectively. In other words, we achieved 80% improvement over C1 and 70% improvement over C2
Link: https://arxiv.org/abs/1806.01976
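A quick arithmetic check of the "in other words" sentence: assuming the overall relative index J is the ratio of the proposed controller's cost to the baseline controller's cost (so J < 1 means better), the quoted improvement is simply (1 - J) x 100%. The helper name below is illustrative, not from the paper.

```python
def improvement_percent(relative_index: float) -> float:
    """Percent improvement over a baseline, assuming J = cost_new / cost_baseline."""
    return (1.0 - relative_index) * 100.0


for baseline, j in [("C1", 0.2), ("C2", 0.3)]:
    print(f"{baseline}: J = {j} -> {improvement_percent(j):.0f}% improvement")
# C1: J = 0.2 -> 80% improvement
# C2: J = 0.3 -> 70% improvement
```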
====================================================
Faster Dual-Key Stealth Address for Blockchain-Based Internet of Things Systems (Xinxin Fan - 4 June, 2018)
Our theoretical analysis as well as the extensive experiments on an embedded computing platform demonstrate that DKSAP-IoT is able to reduce the computational overhead by at least 50% when compared to the state-of-the-art scheme, thereby paving the way for its application to blockchain-based IoT systems.
Link: https://arxiv.org/abs/1806.00951
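For orientation, here is a toy version of the basic dual-key stealth address protocol (DKSAP) that DKSAP-IoT is named after; it is not the paper's faster scheme. The receiver holds a scan key pair (s, S) and a spend key pair (b, B); the sender derives a fresh one-time public key from S and B, and only the holder of s can recognize it, while spending also needs b. As an assumption for brevity, the sketch works in the multiplicative group modulo a prime instead of an elliptic-curve group, so it is insecure and only shows the algebra; all names are hypothetical.

```python
import hashlib
import secrets
from typing import Optional, Tuple

# Toy group (assumption): integers modulo the Mersenne prime 2**127 - 1.
# Real DKSAP deployments use an elliptic-curve group; this is NOT secure.
P = 2**127 - 1
G = 3
ORDER = P - 1  # exponents may be reduced mod P - 1 (Fermat's little theorem)


def hash_to_exponent(element: int) -> int:
    """Hypothetical helper: hash a group element to an exponent."""
    digest = hashlib.sha256(element.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % ORDER


def keygen() -> Tuple[int, int]:
    """Return (private, public) with public = G**private mod P."""
    private = secrets.randbelow(ORDER - 1) + 1
    return private, pow(G, private, P)


def sender_derive(scan_pub: int, spend_pub: int) -> Tuple[int, int]:
    """Sender: publish R = G**r and pay to the one-time key G**H(S**r) * B."""
    r, ephemeral_pub = keygen()
    h = hash_to_exponent(pow(scan_pub, r, P))
    return ephemeral_pub, pow(G, h, P) * spend_pub % P


def receiver_scan(scan_priv: int, spend_priv: int, spend_pub: int,
                  ephemeral_pub: int, one_time_pub: int) -> Optional[int]:
    """Receiver: rebuild the shared secret R**s == S**r with the scan key; if the
    one-time key matches, return its private key h + b, otherwise None."""
    h = hash_to_exponent(pow(ephemeral_pub, scan_priv, P))
    if pow(G, h, P) * spend_pub % P != one_time_pub:
        return None
    return (h + spend_priv) % ORDER


s, S = keygen()   # scan key pair
b, B = keygen()   # spend key pair
R, one_time = sender_derive(S, B)
x = receiver_scan(s, b, B, R, one_time)
assert x is not None and pow(G, x, P) == one_time
```

The split into two key pairs is the design point: the scan key alone lets a watcher detect payments, but recovering the one-time private key also requires b.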
====================================================
Fast Rigid 3D Registration Solution: A Simple Method Free of SVD and Eigen-Decomposition (Jin Wu - 2 July, 2018)
The simple framework offers a very easy route to integer implementation on embedded platforms. The final results indicate that the proposed algorithm is accurate and robust, and requires 60% to 80% less computation time than representative methods.
Link: https://arxiv.org/abs/1806.00627
====================================================
A novel channel pruning method for deep neural network compression (Yiming Hu - 29 May, 2018)
However, deploying these deep models on resource-constrained embedded devices such as mobile robots and smartphones is still a big challenge. On the CIFAR-10 and SVHN datasets, the pruned VGGNet outperforms the original model with 8x parameter compression and 3x FLOPs reduction.
Link: https://arxiv.org/abs/1805.11394
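The abstract does not state the pruning criterion. As a generic point of reference only, a common channel-pruning baseline ranks each filter (output channel) by the L1 norm of its weights and keeps the largest ones. The sketch shows that baseline, not the paper's novel method; the layer data and function names are hypothetical.

```python
from typing import List

Filter = List[float]  # flattened weights of one output channel


def l1_norm(f: Filter) -> float:
    """Sum of absolute weights of one filter."""
    return sum(abs(w) for w in f)


def channels_to_keep(filters: List[Filter], keep_ratio: float) -> List[int]:
    """Indices of filters to keep, chosen by largest L1 norm, returned in order."""
    n_keep = max(1, int(round(len(filters) * keep_ratio)))
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]), reverse=True)
    return sorted(ranked[:n_keep])


layer = [
    [0.5, -0.4, 0.1],     # L1 = 1.0
    [0.01, 0.02, -0.01],  # L1 = 0.04
    [-0.9, 0.3, 0.2],     # L1 = 1.4
    [0.05, -0.05, 0.0],   # L1 = 0.1
]
print(channels_to_keep(layer, keep_ratio=0.5))  # [0, 2]
```

Removing a filter also removes the matching input channel of the next layer, which is where the parameter and FLOPs savings compound.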
====================================================
Dynamicity and Durability in Scalable Visual Instance Search (Herwig Lejsek - 25 May, 2018)
We present a detailed performance evaluation of the transactional NV-tree: (i) We show that the insertion throughput is excellent despite the overhead for enforcing the ACID properties; (ii) We also show that this transactional index is truly scalable using a standard image benchmark embedded in collections of up to 28.5 billion high-dimensional vectors; the largest single-server evaluations reported in the literature.
Link: https://arxiv.org/abs/1805.10942
====================================================
Convolutional neural network compression for natural language processing (Krzysztof Wróbel - 28 May, 2018)
Artificial intelligence systems (like humanoid robots) are very often based on embedded systems with constraints on memory, power consumption, etc. Additionally, a significant memory footprint reduction (from 85% up to 93%) was achieved.
Link: https://arxiv.org/abs/1805.10796
====================================================
Compact and Computationally Efficient Representation of Deep Neural Networks (Simon Wiedemann - 27 May, 2018)
For instance, deep neural networks such as VGG-16 require up to 15 giga-operations to perform the dot products in a single forward pass, which results in significant energy consumption and thus limits their use in resource-limited environments such as embedded devices or smartphones. We experimentally show that we attain up to 15x compression ratios, 1.7x speed-ups and 20x energy savings when we losslessly convert state-of-the-art networks such as AlexNet, VGG-16, ResNet152 and DenseNet into the new data structures.
Link: https://arxiv.org/abs/1805.10692
====================================================
Fast Symbolic 3D Registration Solution (Jin Wu - 12 May, 2018)
Experimental results show that the proposed solver does not lose accuracy or robustness and improves the execution speed substantially, by almost 50% to 80%, on both a personal computer and an embedded processor.
Link: https://arxiv.org/abs/1805.08703
====================================================
SqueezeJet: High-level Synthesis Accelerator Design for Deep Convolutional Neural Networks (Panagiotis G. Mousouliotis - 6 May, 2018)
Results show that SqueezeJet can achieve 15.16 times speed-up compared to the software implementation of SqueezeNet running on an embedded mobile processor with less than 1% drop in top-5 accuracy.
Link: https://arxiv.org/abs/1805.08695
====================================================
Speeding-up Age Estimation in Intelligent Demographics System via Network Optimization (Zhenzhen Hui - 21 May, 2018)
Second, we optimize the CNN-based age estimation algorithm, with label distribution and K-L divergence distance, embedded in the fog layer, and evaluate the model on the latest in-the-wild aging dataset. Experimental results demonstrate that: 1. our system collects demographic data dynamically at a far distance without contact, and performs city population analysis automatically; and 2
Link: https://arxiv.org/abs/1805.08373
====================================================
Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines (Sean O. Settle - 21 May, 2018)
These computational models have seemingly insatiable appetites for computational resources, not only during training but also when deployed at scales ranging from data centers all the way down to embedded devices. As a result, increasing attention is being paid to maximizing computational efficiency under limited hardware and energy resources, and inference with reduced precision has emerged as a viable alternative to the IEEE 754 Standard for Floating-Point Arithmetic.
Link: https://arxiv.org/abs/1805.07941
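Reduced-precision inference typically maps floating-point tensors onto small integers plus a scale factor. The sketch shows plain symmetric, per-tensor uniform quantization to signed 8-bit codes; it is a generic illustration, not the quantization scheme proposed in this paper, and the weight values are made up.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale for the whole tensor
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.42, -1.3, 0.07, 0.9, -0.55]
codes, scale = quantize_symmetric(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"scale={scale:.5f}", f"max_error={max_error:.5f}")
```

Rounding to the nearest code bounds the per-value error by half the scale, which is why tensors with a few large outliers lose the most precision under a single per-tensor scale.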
====================================================
Robust curvelet domain watermarking technique that preserves cleanness of high quality images (Wook-Hyung Kim - 16 May, 2018)
The embedded information provides proof of authorship and facilitates tracking of illegal distribution, among other uses. The proposed method achieved a peak signal-to-noise ratio of 57.65 dB in fidelity tests, and mean opinion scores showed that images treated with the proposed method were hardly distinguishable from the originals.
Link: https://arxiv.org/abs/1805.06181
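For readers less familiar with the fidelity metric, the standard PSNR definition is 10 log10(MAX^2 / MSE). The sketch below implements that formula on flattened pixel lists; the test images are made up.

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], distorted: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized (flattened) images."""
    if len(original) != len(distorted) or not original:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# One pixel out of 100 changed by a single grey level.
cover = [100] * 100
marked = cover[:]
marked[0] += 1
print(f"{psnr(cover, marked):.2f} dB")  # 68.13 dB
```

Read in reverse, and assuming 8-bit images (peak value 255), the reported 57.65 dB corresponds to a mean squared error of about 255^2 / 10^5.765, roughly 0.11 squared grey levels per pixel.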
====================================================
Wearable Audio and IMU Based Shot Detection in Racquet Sports (Manish Sharma - 14 May, 2018)
In our paper, we propose a novel, computationally inexpensive, real-time system for shot detection in table tennis, based on the fusion of Inertial Measurement Unit (IMU) and audio sensor data embedded in a wrist-worn wearable. The system builds upon our methodology for synchronizing IMU and audio sensor input in time using detected shots, and achieves 95.6% accuracy.
Link: https://arxiv.org/abs/1805.05456
====================================================
Hu-Fu: Hardware and Software Collaborative Attack Framework against Neural Networks (Wenshuo Li - 14 May, 2018)
However, the robustness of an embedded DL system may be compromised by inserting hardware/software Trojans into the accelerator and the neural network model, since the accelerator and deployment tool (or neural network model) are usually provided by third-party companies. We test our attack framework on image classification and face recognition tasks and obtain attack success rates of 92.6% and 100% on CIFAR10 and YouTube Faces, respectively, while keeping almost the same accuracy as the unattacked model in normal mode.
Link: https://arxiv.org/abs/1805.05098
====================================================
Exact size counting in uniform population protocols in nearly logarithmic time (David Doty - 13 May, 2018)
Crucially, unlike most published protocols with $\omega(1)$ states, our protocol is _uniform_: it uses the same transition algorithm for any population size, so it does not need an estimate of the population size to be embedded into the algorithm. A sub-protocol is the first uniform sublinear-time leader election population protocol, taking $O(\log n \log \log n)$ time and $O(n^{18})$ states. The state complexity of the counting and leader election protocols can be reduced to $O(n^{30})$ and $O(n^{9})$, respectively, while increasing the time to $O(\log^2 n)$.
Link: https://arxiv.org/abs/1805.04832
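To make the population-protocol model concrete: n anonymous agents interact in uniformly random pairs, and time is measured as parallel time, i.e. interactions divided by n. The sketch simulates the classic uniform but slow leader-election rule (two leaders meet, one becomes a follower), whose expected parallel time is (n - 1)^2 / n, i.e. linear in n; this is the baseline that sublinear-time protocols improve on, not the paper's protocol. Names are illustrative.

```python
import random
from typing import Tuple


def classic_leader_election(n: int, seed: int = 0) -> Tuple[int, float]:
    """Simulate the rule L, L -> L, F under a uniform random scheduler.

    Returns (number of leaders left, parallel time = interactions / n).
    """
    rng = random.Random(seed)
    is_leader = [True] * n               # every agent starts as a leader
    leaders = n
    interactions = 0
    while leaders > 1:
        a, b = rng.sample(range(n), 2)   # ordered pair of distinct agents
        interactions += 1
        if is_leader[a] and is_leader[b]:
            is_leader[b] = False         # the responder becomes a follower
            leaders -= 1
    return leaders, interactions / n


for n in (50, 100, 200):
    leaders, time = classic_leader_election(n)
    print(n, leaders, round(time, 1))
```

The rule is uniform in the paper's sense (no dependence on n), but the final two leaders take about n parallel time to meet, which is the bottleneck a fast uniform protocol has to avoid.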
====================================================
ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time (Rudra P K Poudel - 19 July, 2018)
State-of-the-art methods are, however, not directly transferable to real-time applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We analyse our network in a thorough ablation study and present results on the Cityscapes dataset, achieving 66.1% accuracy at 18.3 frames per second at full (1024x2048) resolution (23.2 fps with pipelined computations for streamed data).
Link: https://arxiv.org/abs/1805.04554
====================================================
Evaluation of Game Templates to support Programming Activities in Schools (Bernadette Spieler - 31 August, 2018)
During the project, game genres such as adventure, action, and quiz, as well as reward or victory point mechanisms, were embedded into different subjects, e.g., science, mathematics, and arts. The insights gained during the class hours were used to generate 13 game templates, which are integrated into Create@School (a new version of the Pocket Code app that targets schools).
Link: https://arxiv.org/abs/1805.04517
====================================================
Adaptive Selection of Deep Learning Models on Embedded Systems (Ben Taylor - 11 May, 2018)
We apply our approach to the image classification task and evaluate it on a Jetson TX2 embedded deep learning platform using the ImageNet ILSVRC 2012 validation dataset. Experimental results show that our approach achieves a 7.52% improvement in inference accuracy, and a 1.8x reduction in inference time over the most-capable single DNN model.
Link: https://arxiv.org/abs/1805.04252
====================================================
Convex Programming Based Spectral Clustering (Tomohiko Mizutani - 11 May, 2018)
The nodes with the largest degree in each cluster may be found by computing an enclosing ellipsoid for embedded nodes in real space, and the clusters may be identified by using those nodes. at COLT 2015, is satisfied
Link: https://arxiv.org/abs/1805.04246
====================================================
Combo Loss: Handling Input and Output Imbalance in Multi-Organ Segmentation (Saeid Asgari Taghanaki - 24 September, 2018)
The input imbalance refers to the class imbalance in the input training samples (i.e., small foreground objects embedded in an abundance of background voxels, as well as organs of varying sizes). We evaluated the proposed loss function on three datasets: whole-body positron emission tomography (PET) scans with 5 target organs, magnetic resonance imaging (MRI) prostate scans, and ultrasound echocardiography images with a single target organ, i.e., the left ventricle.
Link: https://arxiv.org/abs/1805.02798
====================================================
Interpretable Fully Convolutional Classification of Intrapapillary Capillary Loops for Real-Time Detection of Early Squamous Neoplasia (Luis C. Garcia-Peraza-Herrera - 2 May, 2018)
Motivated by the classification of oesophageal tissue for real-time detection of early squamous neoplasia, the most frequent kind of oesophageal cancer in Asia, we present a new dataset and a novel deep learning method that by means of deep supervision and a newly introduced concept, the embedded Class Activation Map (eCAM), focuses on the interpretability of results as a design constraint of a convolutional network. In comparison to a baseline method which does not feature deep supervision but provides attention by grafting Class Activation Maps, we improve the F1-score from 87.3% to 92.7% and provide more detailed attention maps.
Link: https://arxiv.org/abs/1805.00632
====================================================
Feedback Control Goes Wireless: Guaranteed Stability over Low-power Multi-hop Networks (Fabian Mager - 24 April, 2018)
This paper presents a wireless embedded system that tames imperfections impairing control performance such as jitter or packet loss, and a control design that exploits the essential properties of this system to provably guarantee closed-loop stability for linear dynamic systems. Using experiments on a testbed with multiple cart-pole systems, we are the first to demonstrate the feasibility and to assess the performance of closed-loop control and coordination over multi-hop low-power wireless for update intervals from 20 ms to 50 ms.
Link: https://arxiv.org/abs/1804.08986
====================================================
Fingerprint Match in Box (Joshua J. Engelsma - 23 April, 2018)
We open-source fingerprint Match in Box, a complete end-to-end fingerprint recognition system embedded within a 4-inch cube. An onboard touch screen and rechargeable battery pack make this device extremely portable and ideal for applying both fingerprint authentication (1:1 comparison) and fingerprint identification (1:N search) to applications (vaccination tracking, food and benefit distribution programs, human trafficking prevention) in rural communities, especially in developing countries. We also show that Match in Box is suited to capturing neonate fingerprints due to its high-resolution (1900 ppi) cameras.
Link: https://arxiv.org/abs/1804.08659
====================================================
A 0.086-mm$^2$ 12.7-pJ/SOP 64k-Synapse 256-Neuron Online-Learning Digital Spiking Neuromorphic Processor in 28nm CMOS (Charlotte Frenkel - 8 October, 2018)
It leverages an efficient implementation of the spike-driven synaptic plasticity (SDSP) learning rule for high-density embedded online learning with only 0.68 $\mu$m$^2$ per 4-bit synapse. Neurons can be independently configured as a standard leaky integrate-and-fire (LIF) model or as a custom phenomenological model that emulates the 20 Izhikevich behaviors found in biological spiking neurons. Using a single presentation of 6k 16$\times$16 MNIST training images to a single-layer fully-connected 10-neuron network with on-chip SDSP-based learning, ODIN achieves a classification accuracy of 84.5% while consuming only 15nJ/inference at 0.55V using rank order coding.
Link: https://arxiv.org/abs/1804.07858
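As a software reference point for the "standard leaky integrate-and-fire (LIF) model" mentioned above, the sketch below is a generic discrete-time LIF neuron: the membrane potential leaks toward rest, integrates input, and emits a spike and resets when it crosses a threshold. The leak factor and threshold are arbitrary assumptions; this is not ODIN's hardware neuron, its Izhikevich mode, or its SDSP learning.

```python
from typing import List


def simulate_lif(input_current: List[float], v_rest: float = 0.0, v_threshold: float = 1.0,
                 v_reset: float = 0.0, leak: float = 0.9) -> List[int]:
    """Discrete-time leaky integrate-and-fire neuron; returns the spike time steps."""
    v = v_rest
    spike_times = []
    for t, current in enumerate(input_current):
        v = v_rest + leak * (v - v_rest) + current   # leak toward rest, then integrate
        if v >= v_threshold:                         # fire and reset
            spike_times.append(t)
            v = v_reset
    return spike_times


print(simulate_lif([0.3] * 12))  # [3, 7, 11]
```

With a constant input the potential saturates at input / (1 - leak), so inputs below 0.1 here never reach the threshold; that leak-induced cutoff is what distinguishes LIF from a plain integrator.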
====================================================
MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices (Sheng Chen - 14 June, 2018)
We present a class of extremely efficient CNN models, MobileFaceNets, which use fewer than 1 million parameters and are specifically tailored for high-accuracy real-time face verification on mobile and embedded devices. Under the same experimental conditions, our MobileFaceNets achieve significantly superior accuracy as well as more than 2 times actual speedup over MobileNetV2. After being trained with ArcFace loss on the refined MS-Celeb-1M, our single MobileFaceNet of 4.0MB size achieves 99.55% accuracy on LFW and 92.59% TAR@FAR1e-6 on MegaFace, which is comparable even to state-of-the-art big CNN models hundreds of MB in size. The fastest of the MobileFaceNets has an actual inference time of 18 milliseconds on a mobile phone.
Link: https://arxiv.org/abs/1804.07573
====================================================
Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression (Shihui Yin - 19 April, 2018)
Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which makes it challenging to implement them on power/area-constrained embedded platforms. The optimized DNN that combines 8X structured compression and 3-bit weight precision showed 98.4% accuracy at 20nJ per classification.
Link: https://arxiv.org/abs/1804.07370
====================================================
Training a Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving (Jiaolong Xu - 17 April, 2018)
Autonomous driving imposes harsh requirements of small model size and energy efficiency to enable the embedded system to achieve real-time on-board object detection. Among them, the binary weight neural network (BWN) is the extreme case, quantizing floating-point weights into just $1$ bit. The experimental results show that the proposed method maintains high detection accuracy while reducing the model size of DarkNet-YOLO from 257 MB to 8.8 MB and MobileNet-YOLO from 193 MB to 7.9 MB.
Link: https://arxiv.org/abs/1804.06332
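A common way to build binary weights, popularized by the binary-weight networks of XNOR-Net and not necessarily the exact scheme used in this paper, approximates a weight tensor W by alpha * sign(W), with alpha the mean absolute weight. The sketch shows that approximation and the idealized storage arithmetic; the weight values are made up.

```python
from typing import List, Tuple


def binarize(weights: List[float]) -> Tuple[float, List[int]]:
    """Approximate W by alpha * sign(W), with alpha = mean(|W|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)  # one float per tensor
    signs = [1 if w >= 0 else -1 for w in weights]       # 1 bit per weight
    return alpha, signs


alpha, signs = binarize([0.5, -0.25, 0.75, -1.0])
print(alpha, signs)  # 0.625 [1, -1, 1, -1]

# Idealized storage: 32-bit floats -> 1-bit signs is a 32x reduction.
print(f"{257 / 32:.1f} MB")  # 8.0 MB
```

The reported 8.8 MB for DarkNet-YOLO is slightly above this idealized 257 / 32 figure.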
====================================================
An Unsupervised Approach to Detect Spam Campaigns that Use Botnets on Twitter (Zhouhan Chen - 14 April, 2018)
The bot groups we detect tweet duplicate content with shortened embedded URLs over extended periods of time. Our experiments with the detection protocol reveal that bots consistently account for 10% to 50% of tweets generated from 7 popular URL shortening services on Twitter
Link: https://arxiv.org/abs/1804.05232
====================================================
On Solving Quantified Bit-Vectors using Invertibility Conditions (Aina Niemetz - 11 May, 2018)
We show that invertibility conditions can be embedded into quantifier instantiations using Hilbert choice expressions, and give experimental evidence that a counterexample-guided approach for quantifier instantiation utilizing these techniques leads to performance improvements with respect to state-of-the-art solvers for quantified bit-vector constraints.
Link: https://arxiv.org/abs/1804.05025
====================================================
Precise Temporal Action Localization by Evolving Temporal Proposals (Haonan Qiu - 13 April, 2018)
Our framework is embedded with an Actionness Network to generate initial proposals through frame-wise similarity grouping, and then a Refinement Network to conduct boundary adjustment on these proposals. Our proposed framework achieves mAP@IoU=0.5 of 34.2%.
Link: https://arxiv.org/abs/1804.04803
====================================================
Towards a Flexible Architecture for Industrial Networking (Michael Karrenbauer - 18 April, 2018)
It is embedded into the Industrial Internet Reference Architecture and the RAMI4.0 reference architecture. The paper shows how the advancements introduced around the new 5G mobile technology can fulfill a wide range of industry requirements and thus enable new Industry 4.0 applications
Link: https://arxiv.org/abs/1804.04531
====================================================
L1 guidance logic extension for small UAVs: handling high winds and small loiter radii (Thomas Stastny - 23 May, 2018)
L1 guidance logic is one of the most widely used path-following controllers for small fixed-wing unmanned aerial vehicles (UAVs), primarily due to its simplicity (low-cost implementation on embedded on-board processors, e.g. ...). Two primary drawbacks remain, specific to small, slow-flying fixed-wing UAVs, namely: 1) the combination of low operator-defined gains and high ground speeds may violate the bounds of the algorithm's convergence region for loiter circles with small radii, and 2) L1 logic breaks down when wind speeds exceed the vehicle's airspeed, another common predicament for small, slow-flying UAVs.
Link: https://arxiv.org/abs/1804.04209
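For reference, the baseline L1 guidance law (Park, Deyst and How) commands a lateral acceleration of 2 V^2 / L1 * sin(eta), where V is ground speed, L1 is the distance to a reference point on the path, and eta is the angle between the ground-velocity vector and the line of sight to that point. The sketch implements only this baseline law, not the paper's extension for high winds and small loiter radii; the numbers are hypothetical.

```python
import math


def l1_lateral_accel(ground_speed: float, l1_distance: float, eta: float) -> float:
    """Baseline L1 lateral acceleration command a = 2 V^2 / L1 * sin(eta).

    ground_speed: V in m/s
    l1_distance:  L1, distance to the reference point on the path, in m
    eta:          angle (rad) between the ground-velocity vector and the line
                  of sight to the reference point
    """
    return 2.0 * ground_speed ** 2 / l1_distance * math.sin(eta)


# Hypothetical numbers: 15 m/s ground speed, 30 m look-ahead, 30 degree error.
print(f"{l1_lateral_accel(15.0, 30.0, math.radians(30.0)):.2f} m/s^2")  # 7.50 m/s^2
```

The command uses ground speed rather than airspeed, which is why the abstract's two failure cases (high ground speeds, and winds exceeding airspeed) are both tied to the ground-velocity vector.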
====================================================
Detecting Multi-Oriented Text with Corner-based Region Proposals (Linjie Deng - 8 April, 2018)
Moreover, we design a simple embedded data augmentation module inside the region-wise subnetwork, which not only ensures the model utilizes training data more efficiently, but also learns to find the most representative instance of the input images for training. On the ICDAR 2013 and 2015 datasets, it obtains F-measure of 0.876 and 0.845 respectively
Link: https://arxiv.org/abs/1804.02690
====================================================
A Multi-Stage Multi-Task Neural Network for Aerial Scene Interpretation and Geolocalization (Alina Marcu - 4 April, 2018)
Furthermore, its size is limited to be tractable on an embedded GPU. We achieve commercial GPS-level localization accuracy from satellite images with spatial resolution of 1 square meter per pixel in a city-wide area of interest
Link: https://arxiv.org/abs/1804.01322
====================================================
Towards Highly Accurate Coral Texture Images Classification Using Deep Convolutional Neural Networks and Data Augmentation (Anabel Gómez-Ríos - 27 March, 2018)
The recognition of coral species based on underwater texture images poses a significant difficulty for machine learning algorithms, due to the following three challenges embedded in the nature of this data: 1) datasets do not include information about the global structure of the coral; 2) several species of coral have very similar characteristics; and 3) defining the spatial borders between classes is difficult, as many corals tend to appear together in groups. We have analyzed 1) several Convolutional Neural Network (CNN) architectures, 2) data augmentation techniques and 3) transfer learning.
Link: https://arxiv.org/abs/1804.00516
====================================================
MicronNet: A Highly Compact Deep Convolutional Neural Network Architecture for Real-time Embedded Traffic Sign Classification (Alexander Wong - 3 October, 2018)
While deep neural networks have been demonstrated in recent years to provide state-of-the-art performance in traffic sign recognition, a key challenge for enabling their widespread deployment for embedded traffic sign recognition is their high computational and memory requirements. The resulting MicronNet has a model size of just ~1MB and ~510,000 parameters (~27x fewer parameters than the state of the art) while still achieving a human-level top-1 accuracy of 98.9% on the German traffic sign recognition benchmark. Furthermore, MicronNet requires just ~10 million multiply-accumulate operations to perform inference, and has a time-to-compute of just 32.19 ms on a Cortex-A53 high-efficiency processor.
Link: https://arxiv.org/abs/1804.00497
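Figures such as "~10 million multiply-accumulate operations" come from summing per-layer counts. The helpers below use the usual formulas for a dense 2D convolution and a fully connected layer (bias ignored); the layer shapes in the example are hypothetical and are not MicronNet's.

```python
def conv2d_macs(out_h: int, out_w: int, in_channels: int, out_channels: int,
                kernel_h: int, kernel_w: int) -> int:
    """Multiply-accumulates for a dense 2D convolution (bias ignored)."""
    # Each output element needs kernel_h * kernel_w * in_channels MACs.
    return out_h * out_w * out_channels * kernel_h * kernel_w * in_channels


def dense_macs(in_features: int, out_features: int) -> int:
    """Multiply-accumulates for a fully connected layer (bias ignored)."""
    return in_features * out_features


# Hypothetical layers: 32x32 output, 3x3 kernel, 16 -> 32 channels; then 512 -> 43 dense.
print(conv2d_macs(32, 32, 16, 32, 3, 3))  # 4718592
print(dense_macs(512, 43))                # 22016
```

Because the convolution count scales with output resolution, compact networks usually spend most of their budget in the early, high-resolution layers.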
====================================================
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings (Da-Rong Liu - 1 April, 2018)
The basic idea is to cluster the embedded acoustic tokens and learn the mapping between the cluster sequences and the unknown phoneme sequences with a Generative Adversarial Network (GAN). An unsupervised phoneme recognition accuracy of 36% was achieved in the preliminary experiments.
Link: https://arxiv.org/abs/1804.00316
====================================================
SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters (Yifan Xu - 12 September, 2018)
SpiderCNN is composed of units called SpiderConv, which extend convolutional operations from regular grids to irregular point sets that can be embedded in R^n, by parametrizing a family of convolutional filters. Experiments on ModelNet40 demonstrate that SpiderCNN achieves state-of-the-art accuracy of 92.4% on standard benchmarks, and shows competitive performance on the segmentation task.
Link: https://arxiv.org/abs/1803.11527
====================================================
Single Stream Parallelization of Recurrent Neural Networks for Low Power and Fast Inference (Wonyong Sung - 30 March, 2018)
When a single-stream recurrent neural network (RNN) is executed for a personal user on an embedded system, it demands a large amount of DRAM accesses, because the network size is usually much bigger than the cache size and the weights of an RNN are used only once at each time step. Experiments with SRU showed speed-ups of about 300% and 930% when the number of multi time steps is 4 and 16, respectively, on an ARM CPU based system.
Link: https://arxiv.org/abs/1803.11389
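The DRAM argument can be made concrete by counting weight reads. When the input projections W x_t do not depend on the previous hidden state, as in the SRU used in the experiments, each weight row can be fetched once and applied to several time steps. The sketch counts weight reads for per-step versus multi-step evaluation of those projections; it illustrates the idea under the simplifying assumption that every weight access in the per-step loop is a fresh fetch, and is not the paper's implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def per_step(W: Matrix, xs: List[Vector]) -> Tuple[List[Vector], int]:
    """Compute W @ x_t one time step at a time; every step re-reads all weights."""
    reads = 0
    ys = []
    for x in xs:
        y = []
        for row in W:
            reads += len(row)
            y.append(sum(w * xi for w, xi in zip(row, x)))
        ys.append(y)
    return ys, reads


def multi_step(W: Matrix, xs: List[Vector]) -> Tuple[List[Vector], int]:
    """Read each weight row once and apply it to all time steps in the block."""
    reads = 0
    ys = [[0.0] * len(W) for _ in xs]
    for r, row in enumerate(W):
        reads += len(row)                      # row fetched once...
        for t, x in enumerate(xs):             # ...and reused for every time step
            ys[t][r] = sum(w * xi for w, xi in zip(row, x))
    return ys, reads


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]  # 4 time steps
ys_a, reads_a = per_step(W, xs)
ys_b, reads_b = multi_step(W, xs)
assert ys_a == ys_b
print(reads_a, reads_b)  # 24 6
```

The weight traffic shrinks by the number of time steps processed together; the recurrent element-wise part of SRU still runs step by step.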
====================================================
Fine-Grained Energy and Performance Profiling framework for Deep Convolutional Neural Networks (Crefeda Faviola Rodrigues - 14 May, 2018)
In this work, we introduce a benchmarking framework called "SyNERGY" to measure the energy and time of 11 representative Deep Convolutional Neural Networks on embedded platforms such as NVidia Jetson TX1. In addition, we build an initial multi-variable linear regression model to predict energy consumption of unseen neural network models based on the number of SIMD instructions executed and main memory accesses of the CPU cores of the TX1 with an average relative test error rate of 8.04 +/- 5.96 %. Our predicted results demonstrate 7.08 +/- 6.0 % average relative error over actual energy measurements of all 11 networks tested, except MobileNet. By including MobileNet the average relative test error increases to 17.33 +/- 12.2 %.
Link: https://arxiv.org/abs/1803.11151
====================================================
SqueezeNext: Hardware-Aware Neural Network Design (Amir Gholami - 27 August, 2018)
One of the main barriers for deploying neural networks on embedded systems has been large memory and power consumption of existing neural networks. This new network is able to match AlexNet's accuracy on the ImageNet benchmark with $112\times$ fewer parameters, and one of its deeper variants is able to achieve VGG-19 accuracy with only 4.4 Million parameters, ($31\times$ smaller than VGG-19)
Link: https://arxiv.org/abs/1803.10615
====================================================
FPGA Implementations of 3D-SIMD Processor Architecture for Deep Neural Networks Using Relative Indexed Compressed Sparse Filter Encoding Format and Stacked Filters Stationary Flow (Yuechao Gao - 12 April, 2018)
It is a challenging task to deploy computationally and memory-intensive state-of-the-art deep neural networks (DNNs) on embedded systems with limited hardware resources and power budgets. In [1], we introduce a computation dataflow, stacked filters stationary dataflow (SFS), and a corresponding data encoding format, relative indexed compressed sparse filter format (CSF), to make the best use of data sparsity and simplify data handling at execution time. Compared with the state-of-the-art results [2,3,4], our methods achieve at least a 2x improvement in computation efficiency per PE on most layers. In particular, our methods achieve an 8x improvement on AlexNet layer CONV4 with 384 filters, and an 11x improvement on VGG16 layer CONV5-3 with 512 filters.
Link: https://arxiv.org/abs/1803.10548
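The abstract does not spell out the CSF layout. As a generic illustration of relative indexing only, the sketch stores each nonzero weight as (gap, value), where gap counts the zeros skipped since the previous stored entry and must fit in a fixed number of index bits; longer zero runs are bridged with explicit filler zeros. This follows the common relative-index idea from sparse-network compression, not necessarily the paper's exact format, and the function names are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[int, float]  # (zeros skipped before this entry, stored value)


def encode_relative(dense: List[float], index_bits: int = 4) -> Tuple[List[Entry], int]:
    """Encode a dense vector as relative-indexed (gap, value) pairs."""
    max_gap = 2 ** index_bits - 1
    entries: List[Entry] = []
    last = -1
    for i, v in enumerate(dense):
        if v == 0:
            continue
        gap = i - last - 1
        while gap > max_gap:              # gap too large for the index field:
            entries.append((max_gap, 0))  # store a filler zero after max_gap zeros
            gap -= max_gap + 1
        entries.append((gap, v))
        last = i
    return entries, len(dense)


def decode_relative(entries: List[Entry], length: int) -> List[float]:
    """Rebuild the dense vector from (gap, value) pairs."""
    dense: List[float] = [0] * length
    pos = -1
    for gap, v in entries:
        pos += gap + 1
        dense[pos] = v
    return dense


vec = [0, 0, 5, 0, 0, 0, 0, 0, 0, 7]
entries, n = encode_relative(vec, index_bits=2)
print(entries)  # [(2, 5), (3, 0), (2, 7)]
assert decode_relative(entries, n) == vec
```

Narrow index fields save storage per entry but add fillers on long zero runs, so the index width is a trade-off against the sparsity pattern.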
====================================================
Hand Gesture Controlled Drones: An Open Source Library (Kathiravan Natarajan - 27 March, 2018)
Drones are conventionally controlled using joysticks, remote controllers, mobile applications, and embedded computers. Classification accuracies show that gestures made in good lighting, against a clear background, and within 3 ft are recognized correctly over 90% of the time.
Link: https://arxiv.org/abs/1803.10344
====================================================
Image Semantic Transformation: Faster, Lighter and Stronger (Dasong Li - 27 March, 2018)
One powerful Euclidean latent space embedded in ISTRC is FaceNet's last layer, which has the power to distinguish and understand images. In this paper, we show that ISTRC performs 10 high-level semantic transformations such as "Male and female", "add smile", "open mouth", "deduct beard or add mustache", "bigger/smaller nose", "make older and younger", "bigger lips", "bigger eyes", "bigger/smaller mouths" and "more attractive". It takes just 3 hours (on a GTX 1080) to train the models for the 10 semantic transformations.
Link: https://arxiv.org/abs/1803.09932
====================================================
On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach (Nikolai Smolyanskiy - 19 April, 2018)
We propose a novel semi-supervised learning approach to training a deep stereo neural network, along with a novel architecture containing a machine-learned argmax layer and a custom runtime (that will be shared publicly) that enables a smaller version of our stereo DNN to run on an embedded GPU. Competitive results are shown on the KITTI 2015 stereo dataset
Link: https://arxiv.org/abs/1803.09719
====================================================
CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF (Linchao Bao - 26 March, 2018)
To this end, we propose a novel CNN-embedded algorithm to perform approximate inference in the MRF. When initialized with an appearance-based one-shot segmentation CNN, our model outperforms the winning entries of the DAVIS 2017 Challenge, without resorting to model ensembling or any dedicated detectors.
Link: https://arxiv.org/abs/1803.09453
====================================================
A Case Study for Grain Quality Assurance Tracking based on a Blockchain Business Network (Percival Lucena - 21 March, 2018)
Those transactions can be generated and governed by special network-embedded software, known as smart contracts, that may be public to all nodes of the network or private to a specific set of peer nodes. Preliminary results support a potential demand for Blockchain-based certification that would lead to an added valuation of around 15% for GM-free soy within a Grain Exporter Business Network in Brazil.
Link: https://arxiv.org/abs/1803.07877
====================================================
EVA$^2$: Exploiting Temporal Redundancy in Live Computer Vision (Mark Buckler - 16 April, 2018)
Hardware support for deep convolutional neural networks (CNNs) is critical to advanced computer vision in mobile and embedded devices. The new unit reduces the average energy per frame by 54.2%, 61.7%, and 87.6% for three CNNs with less than 1% loss in vision accuracy.
Link: https://arxiv.org/abs/1803.06312
====================================================
A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets (Fabian Schuiki - 26 September, 2018)
Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors reducing offloading overhead by 7x over previously published results; (ii) an optimized IEEE754 compliant data path for fast high-precision convolutions and gradient propagation; (iii) evaluation of near-memory computing with NTX embedded into residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data center scenario. We demonstrate a 2.7x energy efficiency improvement of NTX over contemporary GPUs at 4.4x less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At the data center scale, a mesh of NTX achieves above 95% parallel and energy efficiency, while providing 2.1x energy savings or 3.1x performance improvement over a GPU-based system.
Link: https://arxiv.org/abs/1803.04783
====================================================
Breast Tumor Classification Based on Decision Information Genes and Inverse Projection Sparse Representation (Xiaohui Yang - 17 April, 2018)
To complete the classification, an inverse projection sparse representation (IPSR) model is constructed to exploit the information embedded in existing samples, especially the test samples. Compared with the latest published literature, classification accuracy is 14% higher. Specificity and sensitivity reach 94.17% and 97.5%, respectively.
Link: https://arxiv.org/abs/1803.03562
====================================================
TRLG: Fragile blind quad watermarking for image tamper detection and recovery by providing compact digests with quality optimized using LWT and GA (Behrouz Bolourian Haghighi - 7 March, 2018)
Furthermore, a CCS map is used to determine the mapping block for embedding information and to encrypt and confuse the embedded information. The results indicate that the PSNR and SSIM of the watermarked image are about 46 dB and approximately one, respectively. In addition, for several recovered images that had been about 90% destroyed, the mean PSNR and SSIM reach 24 dB and 0.86, respectively.
Link: https://arxiv.org/abs/1803.02623
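The PSNR values quoted above use the standard definition $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$ in dB. A minimal sketch, assuming 8-bit images (MAX = 255) flattened into plain Python lists; the pixel values in the usage line are hypothetical, and SSIM is not computed here:

```python
import math

def psnr(original, distorted, max_value=255):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical 8-bit pixels where every value is off by exactly 1 (MSE = 1).
print(round(psnr([10, 20, 30, 40], [11, 19, 31, 39]), 2))
```

With MSE = 1 the result reduces to $20\log_{10} 255 \approx 48.13$ dB, so the ~46 dB reported above corresponds to an average squared error of a little more than one grey level.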
====================================================
Smartphone-based Home Robotics (Lojain Jibawi - 6 March, 2018)
Smartphones have been embedded in toys and drones before. Here, we introduce a novel smartphone-based robot architecture that achieves a 3x cost reduction and is compatible with iOS and Android.
Link: https://arxiv.org/abs/1803.02122
====================================================
Inferring Missing Categorical Information in Noisy and Sparse Web Markup (Nicolas Tempelmeier - 1 March, 2018)
Embedded markup of Web pages has seen widespread adoption over the past years, driven by standards such as RDFa and Microdata and initiatives such as schema.org; recent studies show adoption by 39% of all Web pages as early as 2016. For instance, of 26 million nodes describing events within the Common Crawl in 2016, 59% provide fewer than six statements, and only 257,000 nodes (0.96%) are typed with more specific event subtypes. Our experiments, conducted on properties of events and movies, show F1 scores of 79% and 83%, respectively, significantly outperforming existing baselines.
Link: https://arxiv.org/abs/1803.00446
====================================================
A High GOPs/Slice Time Series Classifier for Portable and Embedded Biomedical Applications (Hamid Soleimani - 26 February, 2018)
Such a trend in hardware design may not be efficient in applications where on-node computation is required and the focus is more on area and power efficiency, as in the case of portable and embedded biomedical devices. Most notably, our classifier reaches 1.46$\times$ higher GOPs/Slice than similar state-of-the-art FPGA-based accelerators.
Link: https://arxiv.org/abs/1802.10458
====================================================
Less is More: Exploiting the Standard Compiler Optimization Levels for Better Performance and Energy Consumption (Kyriakos Georgiou - 27 February, 2018)
Experimental evaluation with 71 embedded benchmarks demonstrated performance gains for at least half of the benchmarks on both processors. An average execution time reduction of 2.4% and 5.3% was achieved across all the benchmarks for the Cortex-M0 and Cortex-M3 processors, respectively, with execution time improvements ranging from 1% up to 90% over -O2. In contrast to these time-consuming and expensive techniques, our approach only needs to test a limited number of optimization configurations (fewer than 64) to obtain similar or even better savings.
Link: https://arxiv.org/abs/1802.09845
====================================================
SmartUnit: Empirical Evaluations for Automated Unit Testing of Embedded Software in Industry (Chengyu Zhang - 17 June, 2018)
From our experimental results, in general, more than 90% of functions in commercial embedded software achieve 100% statement, branch, and MC/DC coverage, more than 80% of functions in SQLite achieve 100% MC/DC coverage, and more than 60% of functions in PostgreSQL achieve 100% MC/DC coverage.
Link: https://arxiv.org/abs/1802.08547
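MC/DC (modified condition/decision coverage) requires showing that each condition in a decision can change the outcome on its own. A brute-force sketch of finding such "unique-cause" test pairs; the decision `a and (b or c)` is a hypothetical example, and this is not SmartUnit's test-generation method:

```python
from itertools import product

def mcdc_pairs(decision, n_conditions):
    """For each condition index, list test-vector pairs that differ only in that
    condition and produce different decision outcomes (unique-cause MC/DC)."""
    pairs = {}
    for i in range(n_conditions):
        found = []
        for vec in product([False, True], repeat=n_conditions):
            if vec[i]:
                continue  # visit each pair once, starting from condition i = False
            flipped = list(vec)
            flipped[i] = True
            flipped = tuple(flipped)
            if decision(*vec) != decision(*flipped):
                found.append((vec, flipped))
        pairs[i] = found
    return pairs

# Hypothetical decision with three conditions.
def decision(a, b, c):
    return a and (b or c)

for cond, found in mcdc_pairs(decision, 3).items():
    print(cond, found[0] if found else "cannot be shown to act independently")
```

Enumerating all $2^n$ input vectors is only practical for decisions with a handful of conditions; the point here is the pairing criterion itself.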
====================================================
Training wide residual networks for deployment using a single bit for each weight (Mark D. McDonnell - 23 February, 2018)
For fast and energy-efficient deployment of trained deep neural networks on resource-constrained embedded hardware, each learned weight parameter should ideally be represented and stored using a single bit. For CIFAR-10, CIFAR-100 and ImageNet, and models with 1-bit-per-weight requiring less than 10 MB of parameter memory, we achieve error rates of 3.9%, 18.5% and 26.0% / 8.5% (Top-1 / Top-5) respectively. We also considered MNIST, SVHN and ImageNet32, achieving 1-bit-per-weight test results of 0.27%, 1.9%, and 41.3% / 19.1% respectively. For CIFAR, our error rates halve previously reported values, and are within about 1% of our error-rates for the same network with full-precision weights. Using a warm-restart learning-rate schedule, we found that training for 1-bit-per-weight is just as fast as full-precision networks, with better accuracy than standard schedules, and achieved about 98%-99% of peak performance in just 62 training epochs for CIFAR-10/100
Link: https://arxiv.org/abs/1802.08530
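One common way to store a single bit per weight is to keep only each weight's sign plus one shared scale factor per layer. The sketch below assumes the scale is the mean absolute weight (an XNOR-Net-style choice); the paper's own scaling scheme may differ, and the weight values are hypothetical:

```python
def binarize(weights):
    """Keep one sign bit per weight plus a shared scale (mean absolute weight)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else 0 for w in weights]  # the bits actually stored
    return signs, scale

def pack_bits(signs):
    """Pack sign bits into bytes, 8 weights per byte, most significant bit first."""
    packed = bytearray((len(signs) + 7) // 8)
    for i, bit in enumerate(signs):
        if bit:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)

def unbinarize(signs, scale):
    """Reconstruct the +/-scale weights used at inference time."""
    return [scale if s else -scale for s in signs]

signs, scale = binarize([0.5, -0.25, 0.75, -1.0])  # hypothetical layer weights
print(signs, scale, pack_bits(signs), unbinarize(signs, scale))
```

Four 32-bit floats shrink to half a byte of sign bits plus one float; the per-layer scale is amortized over many weights in a real network.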
====================================================
Reversible Image Watermarking for Health Informatics Systems Using Distortion Compensation in Wavelet Domain (Hamidreza Zarrabi - 21 February, 2018)
An integer wavelet transform is used for embedding: in each iteration, one watermark bit is embedded in one transform coefficient. Using a one-level wavelet transform, a maximum capacity of 1.5 BPP (bits per pixel) is obtained.
Link: https://arxiv.org/abs/1802.07786
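To illustrate how one bit can be hidden reversibly in one integer wavelet coefficient, here is a sketch of Tian-style difference expansion on a one-level integer Haar (S-) transform of a pixel pair. This is a swapped-in classical technique, not the paper's distortion-compensation method, and it ignores the pixel-range overflow handling (e.g. location maps) that practical schemes need:

```python
def haar_forward(x, y):
    """Integer Haar (S-transform): low-pass floor average, high-pass difference."""
    return (x + y) // 2, x - y

def haar_inverse(l, h):
    """Exact inverse of haar_forward for any integers l and h."""
    x = l + (h + 1) // 2
    return x, x - h

def embed_bit(x, y, bit):
    """Hide one bit in the high-pass coefficient by expanding it: h' = 2h + bit."""
    l, h = haar_forward(x, y)
    return haar_inverse(l, 2 * h + bit)

def extract_bit(xw, yw):
    """Recover the hidden bit and restore the original pixel pair exactly."""
    l, hw = haar_forward(xw, yw)   # the low-pass value is unchanged by embedding
    bit, h = hw & 1, hw >> 1       # floor division by 2 also works for negatives
    return bit, haar_inverse(l, h)

watermarked = embed_bit(206, 201, 1)
print(watermarked, extract_bit(*watermarked))
```

Because the low-pass value survives embedding unchanged and the expansion is invertible, extraction returns both the bit and the original pixels, which is what "reversible" means here.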
====================================================
Full Virtualization of Renault's Engine Management Software and Application to System Development (Dirk Von Wissel - 16 February, 2018)
Generation of C code (EMS application software) from all module specifications using MATLAB/Simulink Embedded Coder. 3. To ensure software quality, this step is performed repeatedly together with steps 1 and 2, based on the simulation capabilities of MATLAB/Simulink. [...] In contrast to step 3, the interactions of all modules and their interactions with the system environment then become visible and subject to testing. Critical assessment of the above process shows that there is a considerable delay between delivery of a set of specifications to the software project team (at the end of step 3) and system-level tests based on an ECU that runs the entire software (step 6).
Link: https://arxiv.org/abs/1802.06841
====================================================
Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks (Guillem Cucurull - 6 February, 2018)
To understand the reasons behind certain social users' demands and culturally driven interests, however, the knowledge embedded in the 1.8 billion pictures uploaded daily to public profiles has only just started to be exploited, since this process has typically been text-based.
Link: https://arxiv.org/abs/1802.06757
====================================================
Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection (Alexander Wong - 18 February, 2018)
Inspired by the efficiency of the Fire microarchitecture introduced in SqueezeNet and the object detection performance of the single-shot detection macroarchitecture introduced in SSD, this paper introduces Tiny SSD, a single-shot detection deep convolutional neural network for real-time embedded object detection that is composed of a highly optimized, non-uniform Fire sub-network stack and a non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers designed specifically to minimize model size while maintaining object detection performance. The resulting Tiny SSD possesses a model size of 2.3MB (~26X smaller than Tiny YOLO) while still achieving an mAP of 61.3% on VOC 2007 (~4.2% higher than Tiny YOLO).
Link: https://arxiv.org/abs/1802.06488
====================================================
High Speed SRT Divider for Intelligent Embedded System (Bhavana Mehta - 17 February, 2018)
In IoT and other embedded applications, radix-2 and radix-4 division algorithms are typically used. Our approach builds on previous SRT algorithm methods to create a highly parallel pipelined design and uses a Mamdani model to resolve the overlap problem, reducing the overall execution time of radix-4 SRT division on 64-bit double-precision floating-point numbers to 281 ns.
Link: https://arxiv.org/abs/1802.06195
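SRT division picks each quotient digit from a redundant set such as {-1, 0, +1} by comparing the shifted partial remainder against fixed thresholds; the redundancy tolerates an approximate comparison, which is what lets hardware inspect only a few leading bits. The sketch below is radix-2 rather than the paper's radix-4 pipelined design, uses exact `Fraction` arithmetic instead of bit vectors, and assumes a divisor normalized to [1/2, 1) with |x| <= d:

```python
from fractions import Fraction

def srt_radix2_divide(x, d, n_digits):
    """Radix-2 SRT division of x by d with quotient digits in {-1, 0, +1}.

    Returns (digits, quotient, final_partial_remainder).
    """
    x, d = Fraction(x), Fraction(d)
    if not (Fraction(1, 2) <= d < 1):
        raise ValueError("divisor must be normalized to [1/2, 1)")
    if abs(x) > d:
        raise ValueError("dividend must satisfy |x| <= d")
    r = x
    digits = []
    for _ in range(n_digits):
        two_r = 2 * r
        # Constant thresholds +/-1/2 select the digit; any digit that keeps
        # |r| <= d would be valid, which is the redundancy SRT exploits.
        if two_r >= Fraction(1, 2):
            q = 1
        elif two_r < Fraction(-1, 2):
            q = -1
        else:
            q = 0
        r = two_r - q * d
        digits.append(q)
    quotient = sum(Fraction(q, 2 ** (i + 1)) for i, q in enumerate(digits))
    return digits, quotient, r

digits, quotient, remainder = srt_radix2_divide(Fraction(1, 3), Fraction(7, 10), 24)
print(digits[:8], float(quotient))
```

Each step preserves $x = Q\,d + r_n 2^{-n}$ with $|r_n| \le d$, so after $n$ digits the quotient is within $2^{-n}$ of $x/d$.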
====================================================
QRkit: Sparse, Composable QR Decompositions for Efficient and Stable Solutions to Problems in Computer Vision (Jan Svoboda - 11 February, 2018)
Embedded computer vision applications increasingly require the speed and power benefits of single-precision (32 bit) floating point
Link: https://arxiv.org/abs/1802.03773
====================================================
Lightweight Classification of IoT Malware based on Image Recognition (Jiawei Su - 11 February, 2018)
Current IoT devices are typically micro-computers for domain-specific computations rather than traditional function-specific embedded devices. The experimental results show that the proposed system can achieve 94.0% accuracy for the classification of goodware and DDoS malware, and 81.8% accuracy for the classification of goodware and two main malware families.
Link: https://arxiv.org/abs/1802.03714
====================================================
Learning Correlation Space for Time Series (Han Qiu - 15 May, 2018)
The given space is learned such that time series correlation can be effectively approximated from the Euclidean distance between corresponding embedded vectors. For top-$k$ highest correlation search, our method improves the precision from 5% to 20% while keeping the query time similar to that of the baseline approach.
Link: https://arxiv.org/abs/1802.03628
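For z-normalized series of length $n$ (using the population standard deviation), squared Euclidean distance and Pearson correlation are tied by $\lVert \hat a - \hat b \rVert^2 = 2n(1-\rho)$, which is why correlation search can be phrased as distance search. The sketch checks that classical identity numerically; it is not the learned embedding proposed in the paper, and the series are hypothetical:

```python
import math

def znormalize(series):
    """Subtract the mean and divide by the population standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        raise ValueError("constant series cannot be z-normalized")
    return [(v - mean) / std for v in series]

def pearson(a, b):
    """Pearson correlation as the mean product of z-normalized values."""
    za, zb = znormalize(a), znormalize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)

def correlation_from_distance(a, b):
    """Recover correlation from the Euclidean distance: rho = 1 - d^2 / (2n)."""
    n = len(a)
    za, zb = znormalize(a), znormalize(b)
    dist_sq = sum((x - y) ** 2 for x, y in zip(za, zb))
    return 1 - dist_sq / (2 * n)

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0]
print(pearson(a, b), correlation_from_distance(a, b))
```

The identity follows from expanding the square: each z-normalized vector has squared norm $n$, and their dot product is $n\rho$.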
====================================================
MOEA/D with Angle-based Constrained Dominance Principle for Constrained Multi-objective Optimization Problems (Zhun Fan - 10 February, 2018)
This paper proposes a novel constraint-handling mechanism named angle-based constrained dominance principle (ACDP) embedded in a decomposition-based multi-objective evolutionary algorithm (MOEA/D) to solve constrained multi-objective optimization problems (CMOPs). This paper uses 14 benchmark instances to evaluate the performance of the MOEA/D with ACDP (MOEA/D-ACDP)
Link: https://arxiv.org/abs/1802.03608
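As its name suggests, ACDP is a variant of the constrained-dominance principle used to compare candidate solutions in constrained multi-objective optimization. A sketch of the classical rule only (minimization, with a total constraint violation of 0 meaning feasible); the paper's angle-based modification is not reproduced, and the example vectors are hypothetical:

```python
def pareto_dominates(f_a, f_b):
    """True if f_a is no worse in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(f_a, f_b)) and any(x < y for x, y in zip(f_a, f_b))

def constrained_dominates(f_a, cv_a, f_b, cv_b):
    """Classical constrained-dominance principle.

    f_*  : objective vectors
    cv_* : total constraint violation (0 means feasible)
    """
    if cv_a == 0 and cv_b > 0:
        return True                 # a feasible solution beats an infeasible one
    if cv_a > 0 and cv_b > 0:
        return cv_a < cv_b          # between infeasible ones, smaller violation wins
    if cv_a == 0 and cv_b == 0:
        return pareto_dominates(f_a, f_b)  # both feasible: ordinary Pareto dominance
    return False                    # a infeasible, b feasible

print(constrained_dominates((1.0, 2.0), 0.0, (2.0, 2.0), 0.0))
```

The strict feasibility-first ordering is what angle-based variants typically relax, so that promising infeasible solutions are not discarded too early.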
====================================================
OEI: Operation Execution Integrity for Embedded Devices (Zhichuang Sun - 9 February, 2018)
When tested against real-world embedded programs on a development board, OAT incurred only a mild runtime overhead (2.7%).
Link: https://arxiv.org/abs/1802.03462
====================================================
D2.4 Report on the final prototype of programming abstractions for energy-efficient inter-process communication (Phuong Hoai Ha - 8 February, 2018)
[...] (cf. Section 2); ii) Customization methodology for implementation of streaming aggregation in embedded systems (cf. Section 3); iii) Energy Model on CPU for Lock-free Data-structures in Dynamic Environments (cf. Section 4.10); iv) A General and Validated Energy Complexity Model for Multithreaded Algorithms (cf. Section 5).
Link: https://arxiv.org/abs/1802.03013
====================================================
Digital Watermarking for Deep Neural Networks (Yuki Nagai - 6 February, 2018)
The embedded watermark does not disappear even after fine-tuning or parameter pruning; the watermark remains complete even after 65% of parameters are pruned.
Link: https://arxiv.org/abs/1802.02601
====================================================
Polarization and Fake News: Early Warning of Potential Misinformation Targets (Michela Del Vicario - 5 February, 2018)
Moreover, such information may be embedded as a new feature in an additional classifier able to recognize fake news with 91% accuracy
Link: https://arxiv.org/abs/1802.01400
====================================================
Build a Compact Binary Neural Network through Bit-level Sensitivity and Data Pruning (Yixing Li - 2 February, 2018)
Due to the high computational complexity and memory storage requirement, it is hard to directly deploy a full-precision CNN on embedded devices. Our result shows that we can further scale down the network size of the BNN up to 3.9x with no more than 1% accuracy drop. The actual runtime can be reduced up to 2x and 9.9x compared with the baseline BNN and its full-precision counterpart, respectively.
Link: https://arxiv.org/abs/1802.00904
====================================================
Securing On-Body IoT Devices By Exploiting Creeping Wave Propagation (Wei Wang - 28 January, 2018)
These on-body IoT devices are largely embedded devices that lack a sophisticated user interface to facilitate traditional Pre-Shared Key based security protocols. Extensive experiments are conducted in a lab, apartments, malls, and outdoor areas, involving 12 volunteer subjects of different age groups, to demonstrate the robustness of our system. Results show that our system can mitigate 96.13% of active attack attempts while triggering false alarms on merely 5.64% of legitimate traffic.
Link: https://arxiv.org/abs/1801.09224
====================================================
D2.2 White-box methodologies, programming abstractions and libraries (Phuong Hoai Ha - 8 February, 2018)
Regarding programming abstractions and libraries, we have continued investigating the trade-offs between energy consumption and performance of data structures such as concurrent queues and concurrent search trees based on the early results of Task 2.1. The preliminary results show that our concurrent trees are faster and more energy efficient than the state-of-the-art on commodity HPC and embedded platforms.
Link: https://arxiv.org/abs/1801.08761
====================================================
EnKCF: Ensemble of Kernelized Correlation Filters for High-Speed Object Tracking (Burak Uzkent - 20 January, 2018)
Computer vision technologies are very attractive for practical applications running on embedded systems. Experimental results showed that our method achieves, on average, 70.10% precision at 20 pixels and a 53.00% success rate on the OTB100 data, and 54.50% and 40.2%, respectively, on the UAV123 data. Our method outperforms other high-speed trackers by over 5% in precision at 20 pixels and by 10-20% in AUC on average. Moreover, our implementation ran at 340 fps on OTB100 and at 416 fps on UAV123, faster than DCF (292 fps) on OTB100 and KCF (292 fps) on UAV123.
Link: https://arxiv.org/abs/1801.06729
====================================================
In-RDBMS Hardware Acceleration of Advanced Analytics (Divya Mahajan - 18 September, 2018)
The accelerator implementation is generated for a User Defined Function (UDF), expressed as a part of an SQL query using a Python-embedded Domain-Specific Language (DSL). Results show that DAnA-enhanced PostgreSQL provides, on average, 8.3x end-to-end speedup for real datasets, with a maximum of 28.2x. Moreover, DAnA-enhanced PostgreSQL is, on average, 4.0x faster than the multi-threaded Apache MADLib running on Greenplum
Link: https://arxiv.org/abs/1801.06027
====================================================
StressedNets: Efficient Feature Representations via Stress-induced Evolutionary Synthesis of Deep Neural Networks (Mohammad Javad Shafiee - 16 January, 2018)
The computational complexity of leveraging deep neural networks for extracting deep feature representations is a significant barrier to their widespread adoption, particularly for use in embedded devices. Experimental results demonstrate the efficacy of the proposed framework to synthesize StressedNets with significant improvement in network architecture efficiency (e.g., 40x for AlexNet and 33x for YOLOv2) and speed improvements (e.g., 5.5x inference speed-up for YOLOv2 on an Nvidia Tegra X1 mobile processor).
Link: https://arxiv.org/abs/1801.05387
====================================================
Full Wafer Redistribution and Wafer Embedding as Key Technologies for a Multi-Scale Neuromorphic Hardware Cluster (Kai Zoschke - 15 January, 2018)
The panels with the embedded wafers were subsequently stressed with up to 1000 thermal cycles between 0 °C and 100 °C and showed no severe failure formation over the cycle time.
Link: https://arxiv.org/abs/1801.04734
====================================================
PACER: Peripheral Activity Completion Estimation and Recognition (Daniel Moore - 14 January, 2018)
Embedded peripheral devices such as memories, sensors and communications interfaces are used to perform a function external to a host microcontroller. For the peripheral devices under test, the test fixture confirmed decreases in energy expenditures of up to 80% and latency reductions of up to 67%.
Link: https://arxiv.org/abs/1801.04601
====================================================
Development of Energy Models for Design Space Exploration of Embedded Many-Core Systems (Christian Klarhorst - 15 January, 2018)
This paper introduces a methodology to develop energy models for the design space exploration of embedded many-core systems. Compared to a simulation of the CoreVA-MPSoC on gate level in a 28nm FD-SOI standard cell technology, our framework shows an average estimation error of about 4%.
Link: https://arxiv.org/abs/1801.04242
====================================================
The Wireless Technology Landscape in the Manufacturing Industry: A Reality Check (Xavier Vilajosana - 11 January, 2018)
An upcoming industrial IoT revolution, supposedly led by the introduction of embedded sensing and computing, seamless communication and massive data analytics within industrial processes [1], seems unquestionable today
Link: https://arxiv.org/abs/1801.03648
====================================================
Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2018 (David Castells-Rufas - 10 January, 2018)
Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2018. Collocated with HIPEAC 2018 Conference.
Link: https://arxiv.org/abs/1801.03513
====================================================
Text Extraction and Retrieval from Smartphone Screenshots: Building a Repository for Life in Media (Agnese Chiatti - 4 January, 2018)
In this paper, we present the experimental workflow we exploited to: (i) pre-process a unique collection of screen captures, (ii) extract unstructured text embedded in the images, (iii) organize image text and metadata based on a structured schema, (iv) index the resulting document collection, and (v) allow for Image Retrieval through a dedicated vertical search engine application. We show how combining OpenCV-based pre-processing modules with a Long short-term memory (LSTM) based release of Tesseract OCR, without ad hoc training, led to a 74% character-level accuracy of the extracted text
Link: https://arxiv.org/abs/1801.01316
====================================================
Multi-Objective Vehicle Routing Problem Applied to Large Scale Post Office Deliveries (Luis A. A. Meira - 23 December, 2017)
This work creates an extensible real-world mail delivery benchmark for the Vehicle Routing Problem (VRP) in a planar graph embedded in 2D Euclidean space. The problem is multi-objective, on a roadmap with up to 25 vehicles and 30,000 deliveries per day.
Link: https://arxiv.org/abs/1801.00712
====================================================
DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car (Michael G. Bechtel - 29 July, 2018)
We also systematically compare other contemporary embedded computing platforms using the DeepPicar's CNN-based real-time control workload. We find that all tested platforms, including the Pi 3, are capable of supporting the CNN-based real-time control, from 20 Hz up to 100 Hz, depending on hardware platform. To protect the CNN workload, we also evaluate state-of-the-art cache partitioning and memory bandwidth throttling techniques on the Pi 3
Link: https://arxiv.org/abs/1712.08644
====================================================
NDT: Neual Decision Tree Towards Fully Functioned Neural Graph (Han Xiao - 16 December, 2017)
Though traditional algorithms could be embedded into neural architectures with the principle proposed in \cite{xiao2017hungarian}, variables that occur only in the condition of a branch could not be updated, as a special case. [...] ($\mathbf{1}_{x>0}$), then approximate the Dirac symbol with continuous functions (e.g. $1 - e^{-\alpha|x|}$).
Link: https://arxiv.org/abs/1712.05934
====================================================
Fast Monte-Carlo Localization on Aerial Vehicles using Approximate Continuous Belief Representations (Aditya Dhawale - 29 March, 2018)
We demonstrate analysis of this likelihood in the vicinity of the ground truth pose and detail its utilization in a particle filter-based vehicle localization strategy, and later present results of real-time implementations on a desktop system and an off-the-shelf embedded platform that outperform localization results from running a state-of-the-art algorithm on the same environment.
Link: https://arxiv.org/abs/1712.05507
====================================================
Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks (Bo Wu - 12 December, 2017)
Then, based on the embedded data sequence over time, temporal context learning attempts to recurrently learn two adaptive temporal contexts for sequential popularity. Experiments on our released image dataset with about 600K Flickr photos demonstrate that DTCN outperforms state-of-the-art deep prediction algorithms, with an average of 21.51% relative performance improvement in the popularity prediction (Spearman Ranking Correlation).
Link: https://arxiv.org/abs/1712.04443
====================================================
Direction-aware Spatial Context Features for Shadow Detection (Xiaowei Hu - 16 May, 2018)
This design is developed into the DSC module and embedded in a CNN to learn DSC features at different levels. Experimental results show that our network outperforms state-of-the-art methods, achieving 97% accuracy and a 38% reduction in balance error rate.
Link: https://arxiv.org/abs/1712.04142
====================================================
EmLog: Tamper-Resistant System Logging for Constrained Devices with TEEs (Carlton Shepherd - 18 December, 2017)
Remote mobile and embedded devices are used to deliver increasingly impactful services, such as medical rehabilitation and assistive technologies. On average, EmLog runs with low run-time memory overhead (1MB heap and stack), 430--625 logs/second throughput, and five-times persistent storage overhead versus unprotected logs.
Link: https://arxiv.org/abs/1712.03943
====================================================
Memory-based Combination PUFs for Device Authentication in Embedded Systems (Soubhagya Sutar - 5 December, 2017)
Embedded systems play a crucial role in fueling the growth of the Internet-of-Things (IoT) in application domains such as healthcare, home automation, transportation, etc. Extensive authentication tests across a wide temperature range (20 to 60 °C) and accelerated aging (12 months) demonstrate the robustness of the proposed design, which achieves a 100% true-positive rate and 0% false-positive rate for authentication across these parameter ranges.
Link: https://arxiv.org/abs/1712.01611
====================================================
Multicast Routing with the Open-Source Platform XORP (Petar D. Bojovic - 3 December, 2017)
Integrating a software router into embedded systems provides the capabilities of the most modern routers at a much more affordable price. Services that transfer TV and radio signals over an IP network can be activated only by using a multicast routing protocol. Multicast routing is currently a feature of only costly hardware solutions.
Link: https://arxiv.org/abs/1712.00776
====================================================
On the Simultaneous Minimum Spanning Trees Problem (Matěj Konečný - 1 December, 2017)
Simultaneous Embedding with Fixed Edges (SEFE) is a problem where given $k$ planar graphs we ask whether they can be simultaneously embedded so that the embedding of each graph is planar and common edges are drawn the same. Given $k$ graphs with weighted edges, such that they have a common intersection, are there minimum spanning trees of the respective graphs such that they agree on the intersection? We show that the unweighted case is polynomial-time solvable while the weighted case is only polynomial-time solvable for $k=2$ and it is NP-complete for $k\geq 3$.
Link: https://arxiv.org/abs/1712.00253
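For a single graph, a minimum spanning tree can be found with Kruskal's algorithm; the simultaneous problem then asks whether MSTs of the $k$ graphs can be chosen to agree on their shared edges. A sketch with a union-find Kruskal and an agreement check for $k = 2$; the example graphs are hypothetical, and because Kruskal returns just one MST, a failed check does not prove that no agreeing pair exists when weights tie:

```python
def kruskal(nodes, edges):
    """Minimum spanning forest as a set of frozenset({u, v}) edges.

    edges: iterable of (u, v, weight) tuples.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = set()
    for u, v, _ in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:          # edge joins two components: keep it
            parent[ru] = rv
            tree.add(frozenset((u, v)))
    return tree

def agree_on_intersection(tree_1, tree_2, edges_1, edges_2):
    """True if both trees pick exactly the same subset of the common edges."""
    common = ({frozenset((u, v)) for u, v, _ in edges_1}
              & {frozenset((u, v)) for u, v, _ in edges_2})
    return (tree_1 & common) == (tree_2 & common)

g1 = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]
g2 = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1)]
t1 = kruskal({"a", "b", "c"}, g1)
t2 = kruskal({"a", "b", "c", "d"}, g2)
print(agree_on_intersection(t1, t2, g1, g2))
```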
====================================================
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition (Shuyang Sun - 7 July, 2018)
By directly calculating pixel-wise spatiotemporal gradients of the deep feature maps, OFF can be embedded in any existing CNN-based video action recognition framework at only a slight additional cost. The network with OFF, fed only by RGB inputs, achieves a competitive accuracy of 93.3% on UCF-101, comparable to the result obtained with two streams (RGB and optical flow), but is 15 times faster.
Link: https://arxiv.org/abs/1711.11152
====================================================
Energy-Efficient Time-Domain Vector-by-Matrix Multiplier for Neurocomputing and Beyond (Mohammad Bavandpour - 28 November, 2017)
As a case study, we have designed a multilayer perceptron, based on two layers of 10x10 four-quadrant vector-by-matrix multipliers, in 55-nm process with embedded NOR flash memory technology, which allows for compact implementation of adjustable current sources. Post-layout estimates for a conservative 6-bit digital input/output NxN multiplier designed in 55 nm process, including I/O circuitry for converting between digital and time domain representations, show ~7 fJ/Op for N>200, which can be further lowered well below 1 fJ/Op for more optimal and aggressive design.
Link: https://arxiv.org/abs/1711.10673
====================================================
A Transprecision Floating-Point Platform for Ultra-Low Power Computing (Giuseppe Tagliavini - 28 November, 2017)
In modern low-power embedded platforms, floating-point (FP) operations emerge as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. Experimental results on FP-intensive benchmarks show that up to 90% of FP operations can be safely scaled down to 8-bit or 16-bit formats. Thanks to precision tuning and vectorization, execution time is decreased by 12% and memory accesses are reduced by 27% on average, leading to a reduction of energy consumption up to 30%.
Link: https://arxiv.org/abs/1711.10374
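Precision tuning asks whether a value can be demoted to a narrower format without exceeding an error budget. A sketch using Python's `struct` half-precision (`'e'`) format to round-trip values through IEEE 754 binary16; the `can_demote` check and its tolerance are hypothetical, and the paper's 8-bit format is not emulated:

```python
import struct

def to_binary16(x):
    """Round a float to IEEE 754 binary16 and back; None if it is out of range."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return None  # magnitude too large for binary16

def can_demote(values, rel_tol):
    """Hypothetical check: True if every value round-trips through binary16 with
    relative error at most rel_tol (values that flush to zero fail)."""
    for v in values:
        h = to_binary16(v)
        if h is None:
            return False
        if v != 0 and abs(h - v) > rel_tol * abs(v):
            return False
    return True

print(to_binary16(0.1), can_demote([0.5, 0.25, 3.0], 1e-3), can_demote([1e-10], 1e-3))
```

binary16 keeps 11 significant bits, so a relative tolerance around $2^{-11}$ is the natural threshold; tiny values fail because they underflow to zero.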
====================================================
Towards Provably Invisible Network Flow Fingerprints (Ramin Soltani - 22 September, 2018)
Bob, who receives the set of fingerprinted flows after they pass through the network modeled as a collection of independent and parallel $M/M/1$ queues, wishes to extract Alice's embedded fingerprints to infer the connection between input and output links of the network. We consider two scenarios: 1) Alice embeds fingerprints in all of the flows; 2) Alice embeds fingerprints in each flow independently with probability $p$
Link: https://arxiv.org/abs/1711.10079
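Each M/M/1 queue perturbs packet timing, which is the channel a flow fingerprint has to survive. A sketch of one FIFO M/M/1 queue using the recursion $D_i = \max(A_i, D_{i-1}) + S_i$ with Poisson arrivals and exponential service times; the rates are hypothetical and no fingerprint embedding or extraction is shown:

```python
import random

def mm1_departures(arrival_rate, service_rate, n_packets, seed=0):
    """Simulate a FIFO M/M/1 queue; return (arrival_times, departure_times)."""
    rng = random.Random(seed)
    t = 0.0
    last_departure = 0.0
    arrivals, departures = [], []
    for _ in range(n_packets):
        t += rng.expovariate(arrival_rate)       # exponential inter-arrival gaps
        service = rng.expovariate(service_rate)  # exponential service time
        start = max(t, last_departure)           # wait while the server is busy
        last_departure = start + service
        arrivals.append(t)
        departures.append(last_departure)
    return arrivals, departures

arrivals, departures = mm1_departures(0.8, 1.0, 5, seed=42)
print([round(d - a, 3) for a, d in zip(arrivals, departures)])  # per-packet delay
```

The random queueing delays printed at the end are the distortion that Bob must see through when extracting Alice's timing-based fingerprints.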
====================================================
JPEG Steganalysis Based on DenseNet (Jianhua Yang - 17 April, 2018)
Unlike conventional deep learning work in computer vision, which is based on an image's content, deep steganalysis uses deep learning to detect secret information embedded in an image. This poses the challenge of detecting weak, invisible information hidden in a host image, so learning takes place at a very low signal-to-noise ratio (SNR). Compared with the state-of-the-art method XuNet [1] on BOSSbase, the proposed CNN-SCA-GFR architecture reduces the detection error rate by 5.67% for 0.1 bpnzAC and by 4.41% for 0.4 bpnzAC, while using only 17% of the training parameters used by XuNet. It also decreases the detection errors of the conventional method SCA-GFR by 7.89% for 0.1 bpnzAC and 8.06% for 0.4 bpnzAC, respectively.
Link: https://arxiv.org/abs/1711.09335
====================================================
fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs (Stylianos I. Venieris - 23 November, 2017)
Overall, our framework yields designs that improve the performance by up to 6.65x over highly optimised embedded GPU designs for the same power constraints in embedded environments.
Link: https://arxiv.org/abs/1711.08740
====================================================
Software Development Under Stringent Hardware Constraints: Do Agile Methods Have a Chance? (Jussi Ronkainen - 23 November, 2017)
This paper explores the possibility of using agile development techniques in this environment and defines the requirements for new agile methods targeted at facilitating the development of embedded software. The findings are based on an empirical study over a period of 12 months in the development of low-level telecommunications software.
Link: https://arxiv.org/abs/1711.08637
====================================================
Efficient Implementation of a Recognition System Using the Cortex Ventral Stream Model (Ahmad W. Bitar - 21 November, 2017)
At layer C1, the minimum scale values are embedded into the maximum ones using the additive embedding space. The results show that our model improves accuracy at the S1 layer by more than 10% while also reducing computational complexity.
Link: https://arxiv.org/abs/1711.07827
====================================================
SPARE: Spiking Networks Acceleration Using CMOS ROM-Embedded RAM as an In-Memory-Computation Primitive (Amogh Agrawal - 30 July, 2018)
Our results show up to 1.75x, 1.95x and 1.95x improvement in energy, iso-storage area, and iso-area performance, respectively, by using neural network accelerators built on ROM-embedded RAM primitives.
Link: https://arxiv.org/abs/1711.07546
====================================================
SquishedNets: Squishing SqueezeNet further for edge device scenarios via deep evolutionary synthesis (Mohammad Javad Shafiee - 20 November, 2017)
Furthermore, the SquishedNets are still able to achieve accuracies ranging from 81.2% to 77%, and to process 156 to 256 images/sec on an Nvidia Jetson TX1 embedded chip.
Link: https://arxiv.org/abs/1711.07459
====================================================
Mobile Video Object Detection with Temporally-Aware Feature Maps (Mason Liu - 28 March, 2018)
This paper introduces an online model for object detection in videos designed to run in real-time on low-powered mobile and embedded devices. This approach is substantially faster than existing detection methods in video, outperforming the fastest single-frame models in model size and computational cost while attaining accuracy comparable to much more expensive single-frame models on the Imagenet VID 2015 dataset. Our model reaches a real-time inference speed of up to 15 FPS on a mobile CPU.
Link: https://arxiv.org/abs/1711.06368
====================================================
Squeeze-SegNet: A new fast Deep Convolutional Neural Network for Semantic Segmentation (Geraldin Nanfack - 15 November, 2017)
Recent ideas in semantic segmentation with deep learning have advanced the state of the art in accuracy; however, these architectures are very difficult to deploy on embedded systems, as is the case for autonomous driving. On datasets such as CamVid and Cityscapes, our network achieves SegNet-level accuracy with roughly 10 times fewer parameters than SegNet.
Link: https://arxiv.org/abs/1711.05491
====================================================
Towards Interpretable R-CNN by Unfolding Latent Structures (Tianfu Wu - 6 September, 2018)
We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of regions of interest (RoIs). In experiments, we build on R-FCN and test our method on the PASCAL VOC 2007 and 2012 datasets.
Link: https://arxiv.org/abs/1711.05226
====================================================
Quantized Memory-Augmented Neural Networks (Seongsik Park - 10 November, 2017)
Quantization is known to be effective when we deploy deep models on embedded systems with limited resources. In our experiments, we achieved a computation-energy gain of 22x with 8-bit fixed-point and binary quantization compared to the floating-point implementation. Measured on the bAbI dataset, the resulting model, named the quantized MANN (Q-MANN), improved the error rate by 46% and 30% with 8-bit fixed-point and binary quantization, respectively, compared to the MANN quantized using conventional techniques.
Link: https://arxiv.org/abs/1711.03712
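The 8-bit fixed-point quantization mentioned above can be illustrated with a generic signed fixed-point quantizer. This is a minimal sketch, assuming a signed 8-bit format with a chosen number of fractional bits (`frac_bits` is a hypothetical parameter); it shows the conventional technique only, not the Q-MANN method, which the abstract says improves on conventional quantization.

```python
def quantize_fixed_point(x: float, total_bits: int = 8, frac_bits: int = 4) -> int:
    """Map a real value to a signed fixed-point integer code, saturating at the range limits."""
    scale = 1 << frac_bits                   # 2**frac_bits quantization steps per unit
    q_min = -(1 << (total_bits - 1))         # -128 for 8 bits
    q_max = (1 << (total_bits - 1)) - 1      # 127 for 8 bits
    code = round(x * scale)                  # nearest representable step
    return max(q_min, min(q_max, code))      # clip to the representable range


def dequantize_fixed_point(code: int, frac_bits: int = 4) -> float:
    """Recover the approximate real value represented by a fixed-point code."""
    return code / (1 << frac_bits)


if __name__ == "__main__":
    for value in [0.3, -1.27, 9.0]:
        code = quantize_fixed_point(value)
        print(value, code, dequantize_fixed_point(code))
```

With 4 fractional bits the resolution is 1/16, so 0.3 becomes code 5 (0.3125), while 9.0 saturates at code 127 (7.9375); choosing `frac_bits` trades range against precision.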
====================================================
Traffic Prediction Based on Random Connectivity in Deep Learning with Long Short-Term Memory (Yuxiu Hua - 3 April, 2018)
In particular, Long Short-Term Memory (LSTM), a type of Recurrent Neural Network (RNN), has attracted much attention due to its ability to capture the long-range dependencies embedded in sequential traffic data. We apply the RCLSTM to traffic prediction and show that it still performs satisfactorily with as little as 35% neural connectivity.
Link: https://arxiv.org/abs/1711.02833
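RCLSTM's random connectivity can be pictured as a fixed random binary mask over a layer's weights. A minimal sketch, assuming each connection is kept independently with probability `connectivity` (a hypothetical parameter name) and that dropped connections act as zero weights; the paper's exact sampling scheme may differ.

```python
import random
from typing import List

Matrix = List[List[float]]


def random_connectivity_mask(rows: int, cols: int, connectivity: float, seed: int = 0) -> List[List[int]]:
    """Keep each connection independently with probability `connectivity`."""
    rng = random.Random(seed)  # fixed seed so the sparse topology is reproducible
    return [[1 if rng.random() < connectivity else 0 for _ in range(cols)] for _ in range(rows)]


def apply_mask(weights: Matrix, mask: List[List[int]]) -> Matrix:
    """Zero out weights whose connection was not sampled."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(weights, mask)]


def masked_matvec(weights: Matrix, mask: List[List[int]], x: List[float]) -> List[float]:
    """Compute (W * M) @ x, e.g. a gate pre-activation using only the kept connections."""
    masked = apply_mask(weights, mask)
    return [sum(w * xi for w, xi in zip(row, x)) for row in masked]


if __name__ == "__main__":
    mask = random_connectivity_mask(64, 64, connectivity=0.35, seed=42)
    kept = sum(map(sum, mask)) / (64 * 64)
    print(f"fraction of connections kept: {kept:.3f}")
```

Because the mask is sampled once and reused, the sparsity pattern stays fixed during training and inference, which is what lets a sparse implementation skip the dropped multiplications.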
====================================================
Time-Triggered Co-Scheduling of Computation and Communication with Jitter Requirements (Anna Minaeva - 27 November, 2017)
In particular, automotive embedded systems are highly complex, and their functionality is realized by a set of periodic tasks. The paper also proposes a heuristic approach that employs three levels of scheduling and scales to real-world use cases with 10,000 tasks and messages. The results show that up to 28% higher resource utilization can be achieved with relaxed jitter requirements, at the cost of up to 10 times longer computation time.
Link: https://arxiv.org/abs/1711.00398
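In time-triggered schedules, a periodic task's jitter is often measured as the spread of its start times relative to each period's release. A minimal sketch, assuming that definition (maximum minus minimum start offset) and a hypothetical list of scheduled start times; the paper's exact jitter constraint may be formulated differently.

```python
from typing import List


def start_time_jitter(start_times: List[int], period: int) -> int:
    """Spread of start offsets relative to each job's release time k * period.

    start_times[k] is the scheduled start of the k-th job, released at k * period.
    """
    if not start_times:
        return 0
    offsets = [s - k * period for k, s in enumerate(start_times)]
    if any(o < 0 for o in offsets):
        raise ValueError("a job cannot start before its release time")
    return max(offsets) - min(offsets)


def satisfies_jitter(start_times: List[int], period: int, max_jitter: int) -> bool:
    """True if the schedule meets the jitter bound; a larger bound admits more schedules."""
    return start_time_jitter(start_times, period) <= max_jitter


if __name__ == "__main__":
    # Hypothetical task with period 10, scheduled four times within a hyperperiod of 40.
    starts = [2, 13, 21, 34]
    print(start_time_jitter(starts, 10))  # offsets 2, 3, 1, 4 -> jitter 3
```

Relaxing `max_jitter` enlarges the set of feasible start times, which is the intuition behind trading stricter jitter for higher resource utilization.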
====================================================
A multitask deep learning model for real-time deployment in embedded systems (Miquel Martí - 31 October, 2017)
We propose an approach to Multitask Learning (MTL) that makes deep learning models faster and lighter for applications in which multiple tasks must be solved simultaneously, which is particularly useful in embedded, real-time systems. Our multitask network is 1.6x faster, lighter, and uses less memory than deploying the single-task models in parallel.
Link: https://arxiv.org/abs/1711.00146
====================================================
The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA (Yufeng Hao - 16 November, 2017)
Finally, we found that the DRNN language model can be deployed smoothly on the embedded system, and the Overlay accelerator with an AXI Stream interface achieves 20 GOPS processing throughput, a 70.5X and 2.75X speedup compared to the work in Ref.30 and Ref.31, respectively.
Link: https://arxiv.org/abs/1710.10296
====================================================
Spiking Optical Flow for Event-based Sensors Using IBM's TrueNorth Neurosynaptic System (Germain Haessig - 26 October, 2017)
A low-power embedded implementation of the method, which combines the Asynchronous Time-based Image Sensor with IBM's TrueNorth Neurosynaptic System, is presented. These spikes are processed by a spiking neural network running on TrueNorth with a 1 millisecond resolution to accurately determine the order and time difference of spikes from neighboring pixels, and therefore infer the velocity. The system is evaluated on two recordings for which ground-truth motion is available, and achieves an Average Endpoint Error of 11% at an estimated power budget of under 80 mW for the sensor and computation.
Link: https://arxiv.org/abs/1710.09820
====================================================
Trace norm regularization and faster inference for embedded speech recognition RNNs (Markus Kliegl - 6 February, 2018)
We propose and evaluate new techniques for compressing and speeding up dense matrix multiplications as found in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For speedup, we enable faster inference on ARM processors through new open-sourced kernels optimized for small batch sizes, resulting in 3x to 7x speedups over the widely used gemmlowp library.
Link: https://arxiv.org/abs/1710.09026
====================================================
Accelerating Energy Games Solvers on Modern Architectures (Andrea Formisano - 10 October, 2017)
Quantitative games, where quantitative objectives are defined on weighted game arenas, provide natural tools for designing faithful models of embedded controllers. Our solution outperforms the baseline implementation with up to 36x speedup and achieves faster convergence on real-world graphs.
Link: https://arxiv.org/abs/1710.03647
====================================================
Deep learning for source camera identification on mobile devices (David Freire-Obregón - 13 October, 2017)
Our proposal describes a CNN architecture that infers the noise pattern of mobile camera sensors (also known as the camera fingerprint) with the aim of detecting and identifying not only the mobile device used to capture an image (with 98% accuracy), but also which of its embedded cameras captured the image.
Link: https://arxiv.org/abs/1710.01257
====================================================
Efficient Convolutional Neural Network For Audio Event Detection (Matthias Meyer - 28 September, 2017)
In the area of distributed acoustic sensing, combining algorithms that achieve a high classification rate with resource-constrained embedded systems is essential. This paper addresses these aspects by applying structural optimizations to a convolutional neural network for audio event detection, reducing the memory requirement by a factor of more than 500 and the computational effort by a factor of 2.1 while performing 9.2% better.
Link: https://arxiv.org/abs/1709.09888
====================================================
Pseudo-labels for Supervised Learning on Dynamic Vision Sensor Data, Applied to Object Detection under Ego-motion (Nicholas F. Y. Chen - 14 March, 2018)
Built on principles inspired by the retina, the dynamic vision sensor has a high temporal resolution that overcomes motion blurring, a high dynamic range that copes with extreme illumination conditions, and a low power consumption that makes it ideal for embedded systems on platforms such as drones and self-driving cars. We show, for the first time, event-based car detection under ego-motion in a real environment at 100 frames per second, with a test average precision of 40.3% relative to our annotated ground truth.
Link: https://arxiv.org/abs/1709.09323
====================================================
Beyond opening up the black box: Investigating the role of algorithmic systems in Wikipedian organizational culture (R. Stuart Geiger - 1 October, 2017)
Scholars and practitioners across domains are increasingly concerned with algorithmic transparency and opacity, interrogating the values and assumptions embedded in automated, black-boxed systems, particularly in user-generated content platforms. Over the past 15 years, Wikipedian veterans and administrators have made specific decisions to support administrative and editorial workflows with automation in particular ways and not others.
Link: https://arxiv.org/abs/1709.09093
====================================================
Computation Error Analysis of Block Floating Point Arithmetic Oriented Convolution Neural Network Accelerator Design (Zhourui Song - 24 November, 2017)
The heavy burdens of computation and off-chip traffic impede the deployment of large-scale convolutional neural networks on embedded platforms. Experiments revealed that an 8-bit mantissa, including the sign bit, in BFP representation induced less than 0.3% accuracy loss.
Link: https://arxiv.org/abs/1709.07776
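Block floating point (BFP) shares one exponent across a block of values, so each value only stores a small signed mantissa. A minimal sketch, assuming the shared exponent is taken from the block's largest magnitude and mantissas (8 bits including the sign bit, as in the abstract) are rounded to nearest and saturated; the paper's block size and rounding choices may differ.

```python
import math
from typing import List, Tuple


def to_bfp(block: List[float], mantissa_bits: int = 8) -> Tuple[int, List[int]]:
    """Encode a block as (shared_exponent, signed integer mantissas).

    `mantissa_bits` includes the sign bit, so 8 bits gives codes in [-128, 127].
    Each value is approximated as mantissa * 2**(shared_exponent - (mantissa_bits - 1)).
    """
    max_abs = max((abs(v) for v in block), default=0.0)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = f * 2**e with 0.5 <= f < 1, so every |v| < 2**e.
    _, shared_exp = math.frexp(max_abs)
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))  # value of one mantissa LSB
    lo, hi = -(1 << (mantissa_bits - 1)), (1 << (mantissa_bits - 1)) - 1
    mantissas = [max(lo, min(hi, round(v / step))) for v in block]
    return shared_exp, mantissas


def from_bfp(shared_exp: int, mantissas: List[int], mantissa_bits: int = 8) -> List[float]:
    """Decode a BFP block back to floats."""
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * step for m in mantissas]


if __name__ == "__main__":
    exp, mants = to_bfp([0.75, -0.1, 0.02])
    print(exp, mants, from_bfp(exp, mants))
```

Small values in a block dominated by a large one lose relative precision (0.02 is stored as 3/128), which is the kind of error the paper's analysis bounds.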
====================================================
Complexity of Finding Perfect Bipartite Matchings Minimizing the Number of Intersecting Edges (Grzegorz Guśpiel - 22 December, 2017)
We additionally require that H admits a perfect matching and assume that edges of H are embedded in the plane as segments. [3] and generalized by Bonnet, Miltzow and Rzazewski [1]
Link: https://arxiv.org/abs/1709.06805
====================================================
Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video (Mohammad Javad Shafiee - 18 September, 2017)
Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13%, with an average speedup of ~3.3X for object detection in video compared to the original YOLOv2, allowing Fast YOLO to run at an average of ~18 FPS on an Nvidia Jetson TX1 embedded system.
Link: https://arxiv.org/abs/1709.05943
====================================================
A Survey of Calibration Methods for Optical See-Through Head-Mounted Displays (Jens Grubert - 13 September, 2017)
As with most Augmented and Virtual Reality systems, the physical position of an OST HMD is typically determined by an external or embedded 6-Degree-of-Freedom tracking system. For over 20 years, researchers have proposed various calibration methods to determine the required eye position.
Link: https://arxiv.org/abs/1709.04299
====================================================
qDSA: Small and Secure Digital Signatures with Curve-based Diffie--Hellman Key Pairs (Joost Renes - 11 September, 2017)
qDSA is a high-speed, high-security signature scheme that facilitates implementations with a very small memory footprint, a crucial requirement for embedded systems and IoT devices, and that uses the same public keys as modern Diffie-Hellman schemes based on Montgomery curves (such as Curve25519) or Kummer surfaces. qDSA resembles an adaptation of EdDSA to the world of Kummer varieties, which are quotients of algebraic groups by $\pm 1$.
Link: https://arxiv.org/abs/1709.03358
====================================================
Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach (Bowen Cheng - 10 September, 2017)
While video frames are downsampled at the encoder side, the decoder is embedded with a deep network model for joint super-resolution (SR) and recognition. The proposed framework is evaluated on the AVEC 2016 benchmark, and demonstrates significantly better stand-alone recognition performance, as well as rate-distortion (R-D) performance, than either recognizing directly from LR frames or performing SR and recognition separately.
Link: https://arxiv.org/abs/1709.03126
====================================================
Robustness of Interdependent Random Geometric Networks (Jianan Zhang - 4 June, 2018)
Based on this model, we study the robustness of two interdependent spatially embedded networks where interdependence exists between geographically nearby nodes in the two networks. We derive analytical upper bounds on the percolation thresholds of two interdependent RGGs by discretization, and obtain $99\%$ confidence intervals for the percolation thresholds by simulation.
Link: https://arxiv.org/abs/1709.03032
====================================================
RDeepSense: Reliable Deep Mobile Computing Models with Uncertainty Estimations (Shuochao Yao - 9 September, 2017)
In this work, we propose RDeepSense, the first deep learning model that provides well-calibrated uncertainty estimations for resource-constrained mobile and embedded devices. Results show that RDeepSense can reduce around 90% of the energy consumption while producing superior uncertainty estimations and preserving at least the same model accuracy compared with other state-of-the-art methods.
Link: https://arxiv.org/abs/1709.02980
====================================================
Super-speeds with Zero-RAM: Next Generation Large-Scale Optimization in Your Laptop! (Mark Amo-Boateng - 10 September, 2017)
The novel algorithm is capable of achieving breakthrough speeds for very large-scale optimization on general-purpose laptops and embedded systems. Applied to the Griewank function with up to 1 billion decision variables in double precision, the algorithm took only 64,485 seconds (~18 hours) to solve while consuming 7,630 MB (7.6 GB) of RAM on a single-threaded laptop CPU.
Link: https://arxiv.org/abs/1709.02500
====================================================
Embedded Binarized Neural Networks (Bradley McDanel - 6 September, 2017)
We study embedded Binarized Neural Networks (eBNNs) with the aim of allowing current binarized neural networks (BNNs) in the literature to perform feedforward inference efficiently on small embedded devices. All intermediate results from a layer are stored as binary values, as opposed to the floating-point values used in current BNN implementations, leading to a 32x reduction in required temporary space. For example, eBNN achieves 95% accuracy on the MNIST dataset running on an Intel Curie with only 15 KB of usable memory, with an inference runtime of under 50 ms per sample.
Link: https://arxiv.org/abs/1709.02260
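The 32x figure follows from storing each intermediate value in 1 bit instead of a 32-bit float. A minimal sketch of the general binarized-network idea (not eBNN's specific layer reordering), assuming values are binarized by sign to +1/-1, packed into Python integers, and combined with the common XNOR-popcount dot product.

```python
from typing import List


def pack_signs(values: List[float]) -> int:
    """Pack sign bits into an integer: bit i is 1 when values[i] >= 0 (i.e. +1)."""
    packed = 0
    for i, v in enumerate(values):
        if v >= 0:
            packed |= 1 << i
    return packed


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two +1/-1 vectors of length n stored as packed bits.

    XNOR marks positions where the signs agree; each agreement contributes +1
    and each disagreement -1, so the result is 2 * matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n


if __name__ == "__main__":
    a = [0.5, -1.2, 3.0, -0.1]   # signs: +1, -1, +1, -1
    b = [1.0, 1.0, -2.0, -4.0]   # signs: +1, +1, -1, -1
    print(binary_dot(pack_signs(a), pack_signs(b), len(a)))  # 1 - 1 - 1 + 1 = 0
```

Four activations fit in 4 bits here instead of 4 x 32 = 128 bits as float32, the same 32:1 ratio the abstract reports for temporary storage.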
====================================================
Real-time convolutional networks for sonar image classification in low-power embedded systems (Matias Valdenegro-Toro - 7 September, 2017)
Autonomous Underwater Vehicles use low-power embedded systems for sonar image perception, and cannot execute large neural networks in real time. Our networks can classify a 96x96 sonar image with 98.8-99.7% accuracy in only 41 to 61 milliseconds on a Raspberry Pi 2, which corresponds to speedups of 28.6x to 19.7x.
Link: https://arxiv.org/abs/1709.02153
====================================================
Capturing natural-colour 3D models of insects for species discovery (Chuong V. Nguyen - 6 September, 2017)
The resulting models are compact (around 10 megabytes), afford excellent optical resolution, and can be readily embedded into documents and web pages, as well as viewed on mobile devices.
Link: https://arxiv.org/abs/1709.02039
====================================================
Evaluating Content-centric vs User-centric Ad Affect Recognition (Abhinav Shukla - 6 September, 2017)
Specifically, we (1) compile an affective ad dataset capable of evoking coherent emotions across users; (2) explore the efficacy of content-centric convolutional neural network (CNN) features for encoding emotions, and show that CNN features outperform low-level emotion descriptors; (3) examine user-centered ad AR by analyzing Electroencephalogram (EEG) responses acquired from eleven viewers, and find that EEG signals encode emotional information better than content descriptors; (4) investigate the relationship between objective AR and subjective viewer experience while watching an ad-embedded online video stream, based on a study involving 12 users.
Link: https://arxiv.org/abs/1709.01684
====================================================
Affect Recognition in Ads with Application to Computational Advertising (Abhinav Shukla - 6 September, 2017)
This work (i) compiles an affective ad dataset capable of evoking coherent emotions across users, as determined from the affective opinions of five experts and 14 annotators; (ii) explores the efficacy of convolutional neural network (CNN) features for encoding emotions, and observes that CNN features outperform low-level audio-visual emotion descriptors upon extensive experimentation; and (iii) demonstrates how enhanced affect prediction facilitates computational advertising and leads to a better viewing experience while watching an online video stream embedded with ads, based on a study involving 17 users.
Link: https://arxiv.org/abs/1709.01683
====================================================
360 Panorama Cloning on Sphere (Qiang Zhao - 5 September, 2017)
Considering the sphere geometry constraint embedded in spherical panoramic images, we develop a coordinate-based method that clones directly in the spherical domain. The method preserves the patch's orientation and handles large-patch cloning (covering over 180° field of view), which may suffer from discoloration artifacts.
Link: https://arxiv.org/abs/1709.01638
====================================================
Learning Word Embeddings from the Portuguese Twitter Stream: A Study of some Practical Aspects (Pedro Saleiro - 4 September, 2017)
Using a single GPU, we were able to scale up the vocabulary size from 2048 embedded words and 500K training examples to 32768 words over 10M training examples, while keeping a stable validation loss and an approximately linear trend in training time per epoch. We also observed that using less than 50% of the available training examples for each vocabulary size might result in overfitting. Results on intrinsic evaluation show promising performance for a vocabulary size of 32768 words.
Link: https://arxiv.org/abs/1709.00947
====================================================
Should I Stay or Should I Go? On Forces that Drive and Prevent MBSE Adoption in the Embedded Systems Industry (Andreas Vogelsang - 1 September, 2017)
[Goal] In this paper, we investigate the forces that prevent or impede the adoption of MBSE in companies that develop embedded software systems. [Method] Our results are based on 20 interviews with experts from 10 companies.
Link: https://arxiv.org/abs/1709.00266
====================================================
Choreography in the embedded systems domain: A systematic literature review (Nebojša Taušan - 30 August, 2017)
To fulfil this objective, a systematic literature review of scientific publications that focus on the use of choreography in the embedded systems domain was carried out. After screening, 48 publications were selected as primary studies and analysed using thematic synthesis.
Link: https://arxiv.org/abs/1708.09136
====================================================
FirmUSB: Vetting USB Device Firmware using Domain Informed Symbolic Execution (Grant Hernandez - 30 August, 2017)
Embedded USB devices use microcontrollers that have not been well studied by the binary analysis community, and our work demonstrates how lifters into popular intermediate representations for analysis can be built, as well as the challenges of doing so. We develop targeting algorithms and use domain knowledge to speed up these processes by a factor of 7 compared to unconstrained fully symbolic execution. We also successfully find malicious activity in embedded 8051 firmwares without the use of source code.
Link: https://arxiv.org/abs/1708.09114
====================================================
Watch Me, but Don't Touch Me! Contactless Control Flow Monitoring via Electromagnetic Emanations (Yi Han - 29 August, 2017)
We present Zeus, a contactless embedded controller security monitor to ensure its execution control flow integrity. Zeus was able to distinguish between different legitimate and malicious executions with 98.9% accuracy and with zero overhead on PLC execution by design.
Link: https://arxiv.org/abs/1708.09099
====================================================
Adaptive Linear Programming Decoding of Nonbinary Linear Codes Over Prime Fields (Eirik Rosnes - 23 August, 2017)
For $p=3$, there is only a single valid symmetric class, and we prove that the resulting inequalities together with the so-called simplex constraints give a complete and irredundant description of the codeword polytope of the embedded SPC code. For $p>5$, we show that there are additional facets beyond those from the proposed construction. Furthermore, we construct a decoder for linear codes over arbitrary fields $\mathbb{F}_q$ with $q=p^m$ and $m>1$ by a factor graph representation that reduces to several instances of the case $m=1$, which results, in general, in a relaxation of the original decoding polytope.
Link: https://arxiv.org/abs/1708.06959
====================================================
Automatic HVAC Control with Real-time Occupancy Recognition and Simulation-guided Model Predictive Control in Low-cost Embedded System (Muhammad Aftab - 17 August, 2017)
With this in mind, we designed and implemented an occupancy-predictive HVAC control system in a low-cost yet powerful embedded system (using a Raspberry Pi 3) to demonstrate the following key features for building automation: (1) real-time occupancy recognition using video-processing and machine-learning techniques, (2) dynamic analysis and prediction of occupancy patterns, and (3) model predictive control for HVAC operations guided by real-time building thermal response simulations (using an on-board EnergyPlus simulator).
Link: https://arxiv.org/abs/1708.05208
====================================================
Prune the Convolutional Neural Networks with Sparse Shrink (Xin Li - 8 August, 2017)
It is still difficult to adapt Convolutional Neural Network (CNN) based models for deployment on embedded devices. In our experiments, we reduce the total number of parameters by 56.77% and the total number of multiplications by 73.84% with only a minor decrease in accuracy.
Link: https://arxiv.org/abs/1708.02439
====================================================
On the Effect of Semantically Enriched Context Models on Software Modularization (Amir Saeidi - 4 August, 2017)
Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We applied our approach to 10 medium-sized open-source Java projects, and show that introducing contexts for identifiers improves the quality of the modularization of the software systems. In some cases, the authoritativeness of decompositions is improved by 67%.
Link: https://arxiv.org/abs/1708.01680
====================================================
Compiling Deep Learning Models for Custom Hardware Accelerators (Andre Xian Ming Chang - 10 December, 2017)
These models have two properties that leave room for potential software and hardware optimizations for embedded systems. Snowflake with 256 processing units was synthesized on Xilinx's Zynq XC7Z045 FPGA. At 250 MHz, AlexNet achieved 93.6 frames/s with 1.2 GB/s of off-chip memory bandwidth, and ResNet18 achieved 21.4 frames/s with 2.2 GB/s. Total on-chip power is 5 W.
Link: https://arxiv.org/abs/1708.00117
====================================================
Ramsey Spanning Trees and their Applications (Ittai Abraham - 27 July, 2017)
The metric Ramsey problem asks for the largest subset $S$ of a metric space that can be embedded into an ultrametric (more generally, into a Hilbert space) with a given distortion. Mendel and Naor (2007) devised the so-called Ramsey Partitions to address this problem, and showed the algorithmic applications of their techniques to approximate distance oracles and ranking problems.
Link: https://arxiv.org/abs/1707.08769
====================================================
Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration (Jeng-Hau Lin - 15 July, 2017)
Such networks strain the computational capabilities and energy available to embedded and mobile processing platforms, restricting their use in many important applications. Our BCNNw/SF accelerator realizes memory savings of 17% and execution time reduction of 31.3% compared to BCNN with only minor accuracy sacrifices.
Link: https://arxiv.org/abs/1707.04693
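The BCNNw/SF entry centres on separable filters. As a generic illustration (not the paper's binarized decomposition), a k x k filter that is the outer product of a column vector and a row vector can be applied as two 1D passes, costing 2k instead of k^2 multiplications per output. This sketch assumes "valid" cross-correlation (no padding, stride 1), as used in most CNN layers.

```python
from typing import List

Grid = List[List[float]]


def correlate2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Direct 'valid' 2D cross-correlation: kh * kw multiplications per output."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def correlate_separable(image: Grid, col: List[float], row: List[float]) -> Grid:
    """Apply kernel = outer(col, row) as a horizontal pass followed by a vertical pass."""
    k_r, k_c = len(row), len(col)
    # Horizontal pass: len(row) multiplications per intermediate value.
    horiz = [
        [sum(r[j + v] * row[v] for v in range(k_r)) for j in range(len(r) - k_r + 1)]
        for r in image
    ]
    # Vertical pass: len(col) multiplications per output value.
    return [
        [sum(horiz[i + u][j] * col[u] for u in range(k_c)) for j in range(len(horiz[0]))]
        for i in range(len(horiz) - k_c + 1)
    ]


if __name__ == "__main__":
    image = [[float(i * 5 + j) for j in range(5)] for i in range(5)]
    col, row = [1.0, 2.0, 1.0], [-1.0, 0.0, 1.0]  # Sobel-like separable kernel
    kernel = [[c * r for r in row] for c in col]
    print(correlate2d_valid(image, kernel) == correlate_separable(image, col, row))
```

Both functions compute the same sum, regrouped as sum_u col[u] * (sum_v image[i+u][j+v] * row[v]); only filters of rank 1 (or approximated by rank 1) can be split this way.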
====================================================
LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation (Abhishek Chaurasia - 14 June, 2017)
We also compare our network's processing time on an NVIDIA GPU and an embedded system device with existing state-of-the-art architectures for different image resolutions.
Link: https://arxiv.org/abs/1707.03718
====================================================
An Embedded Deep Learning based Word Prediction (Seunghak Yu - 6 July, 2017)
In this work, we propose an embedded deep learning based word prediction method that optimizes run-time memory and also provides a real-time prediction environment. The model size is 7.40 MB, with an average prediction time of 6.47 ms.
Link: https://arxiv.org/abs/1707.01662
====================================================
Towards lightweight convolutional neural networks for object detection (Dmitriy Anisimov - 5 October, 2017)
Our vehicle detection models are accurate and fast, and therefore suitable for embedded visual applications. With only 1.5 GFLOPs, our best model achieves 93.39 AP on the validation subset of the challenging DETRAC dataset. The smallest of our models is the first to achieve real-time inference speed on a CPU, with a reasonable accuracy drop to 91.43 AP.
Link: https://arxiv.org/abs/1707.01395
====================================================
DeepStory: Video Story QA by Deep Embedded Memory Networks (Kyung-Min Kim - 4 July, 2017)
The paper proposes a video-story learning model, Deep Embedded Memory Networks (DEMN), to reconstruct stories from a joint scene-dialogue video stream using a latent embedding space of observed data. The reported gains are attributed mainly to 1) the reconstruction of video stories in a combined scene-dialogue form that utilizes the latent embedding and 2) attention.
Link: https://arxiv.org/abs/1707.00836
====================================================
Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback (Peng Yang - 20 July, 2017)
A cost-sensitive learning model is embedded into the framework. The theoretical result shows that even with a small fraction of 1's in the U-I matrix $M\in\mathbb{R}^{n\times m}$, the cost-sensitive error of the proposed model is upper bounded by $O(\frac{\alpha}{\sqrt{mn}})$, where $\alpha$ is a bias over imbalanced classes.
Link: https://arxiv.org/abs/1707.00536
====================================================
Approximating Sparsest Cut in Low Rank Graphs via Embeddings from Approximately Low-Dimensional Spaces (Yuval Rabani - 21 June, 2017)
Goemans (unpublished, appearing in the work of [Magen and Moharrami, 2008]) showed that such points residing in \emph{exactly} $d$ dimensions can be embedded into $\ell_1$ with distortion at most $\sqrt{d}$. Our result improves upon the previously known bound of $O(r)$ on the average distortion, and the integrality gap of the Goemans-Linial SDP under the same preconditions, proven in the previous works of [Deshpande and Venkat, 2014] and [Deshpande, Harsha and Venkat, 2016].
Link: https://arxiv.org/abs/1706.06806
====================================================
Kernelization of Constraint Satisfaction Problems: A Study through Universal Algebra (Victor Lagerkvist - 19 June, 2017)
We show that a CSP problem has a kernel with O(n) constraints if it can be embedded (via a domain extension) into a CSP problem which is preserved by a Maltsev operation. We also study extensions of this towards SAT and CSP problems with kernels with O(n^c) constraints, c>1, based on embeddings into CSP problems preserved by a k-edge operation, k > c.
Link: https://arxiv.org/abs/1706.05941
====================================================
New Results on Edge Partitions of 1-plane Graphs (Emilio Di Giacomo - 16 June, 2017)
A $1$-plane graph is a graph embedded in the plane such that each edge is crossed at most once. A NIC-plane graph is a $1$-plane graph such that any two pairs of crossing edges share at most one end-vertex. An edge partition of a $1$-plane graph $G$ is a coloring of the edges of $G$ with two colors, red and blue, such that both the graph induced by the red edges and the graph induced by the blue edges are plane graphs. $(ii)$ Deciding whether a $1$-plane graph admits an edge partition such that the red graph has maximum vertex degree two is NP-complete. $(iii)$ Deciding whether a $1$-plane graph admits an edge partition such that the red graph has maximum vertex degree one, and computing one in the positive case, can be done in quadratic time.
Link: https://arxiv.org/abs/1706.05161
====================================================
Block-space GPU Mapping for Embedded Sierpiński Gasket Fractals (Cristóbal A. Navarro - 14 June, 2017)
A block-space map $\lambda: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed that maps from Euclidean parallel space $\mathbb{E}$ to embedded fractal space $\mathbb{F}$ in $\mathcal{O}(\log_2 \log_2(n))$ time and uses no more than $\mathcal{O}(n^\mathbb{H})$ threads, where $\mathbb{H} \approx 1.58$ is the Hausdorff dimension, making it parallel-space efficient. Experimental performance tests show that in practice $\lambda(\omega)$ improves performance at any block size once $n > n_0 = 2^8$, reaching a speedup of approximately $10\times$ for $n=2^{16}$ under optimal block configurations.
Link: https://arxiv.org/abs/1706.04552
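The Hausdorff dimension quoted above follows from the gasket's self-similarity: three copies, each scaled by 1/2. The worked arithmetic below is our own illustration, assuming a discrete gasket of side $n = 2^k$; with $n = 2^{16}$ (the size at which the abstract reports its speedup) it shows how many fewer cells the fractal occupies than its $n \times n$ bounding box. This cell-count ratio is not the same quantity as the measured speedup.

```latex
% Sierpinski gasket: 3 self-similar copies, each scaled by 1/2
\mathbb{H} = \frac{\log 3}{\log 2} = \log_2 3 \approx 1.585

% Cells of a discrete gasket with side n = 2^k (3 copies per level, k levels)
n^{\mathbb{H}} = \left(2^{k}\right)^{\log_2 3} = 3^{k}

% Example with n = 2^{16}: bounding box vs. fractal cells
n^{2} = 2^{32} = 4\,294\,967\,296, \qquad n^{\mathbb{H}} = 3^{16} = 43\,046\,721
```

A bounding-box mapping would launch roughly 100 times more threads than there are fractal cells at this size, which is why an $\mathcal{O}(n^{\mathbb{H}})$ thread count matters.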
====================================================
SEP-Nets: Small and Effective Pattern Networks (Zhe Li - 13 June, 2017)
While going deeper has been shown to improve the performance of convolutional neural networks (CNN), going smaller has recently received increasing attention due to its attractiveness for mobile/embedded applications. The striking difference from most previous work on parameter binarization/quantization lies in the different treatment of $1\times 1$ convolutions and $k\times k$ convolutions ($k>1$): we only binarize $k\times k$ convolutions into binary patterns. Second, in light of the different functionalities of $1\times 1$ convolutions (data projection/transformation) and $k\times k$ convolutions (pattern extraction), we propose a new block structure codenamed the pattern residual block, which adds transformed feature maps generated by $1\times 1$ convolutions to the pattern feature maps generated by $k\times k$ convolutions, and based on which we design a small network with $\sim 1$ million parameters.
Link: https://arxiv.org/abs/1706.03912
====================================================
JetsonLEAP: a Framework to Measure Power on a Heterogeneous System-on-a-Chip Device (Tarsila Bessa - 29 March, 2017)
JetsonLEAP consists of an embedded hardware platform (in our case, the Nvidia Tegra TK1 System-on-a-Chip device), a circuit of our own design to control the flow of energy, and a library to instrument program parts. Our entire infrastructure (board, power meter and both circuits) can be reproduced for about US$500.
Link: https://arxiv.org/abs/1706.03042
====================================================
Two-Bus Holomorphic Embedding Method-based Equivalents and Weak-Bus Determination (Shruti Rao - 21 August, 2017)
A new method of solving the power-flow problem, the holomorphically embedded load-flow method (HELM), is theoretically guaranteed to find the high-voltage solution, if one exists, up to the saddle-node bifurcation point (SNBP), provided sufficient precision is used and the conditions of Stahl's theorem are satisfied. In this paper, it is shown that the sigma condition proposed in [2] does not produce reliable results and that a modified requirement can be used to produce a tight upper bound on the SNBP.
Link: https://arxiv.org/abs/1706.01298
====================================================
DeepIoT: Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework (Shuochao Yao - 22 November, 2017)
Recent advances in deep learning motivate the use of deep neural networks in sensing applications, but their excessive resource needs on constrained embedded devices remain an important impediment. DeepIoT reduces the size of deep neural networks by 90% to 98.9%. It is thus able to shorten execution time by 71.4% to 94.5%, and decrease energy consumption by 72.2% to 95.7%
Link: https://arxiv.org/abs/1706.01215
====================================================
Real-Time Robot Localization, Vision, and Speech Recognition on Nvidia Jetson TX1 (Jie Tang - 31 May, 2017)
Meanwhile, because robots are mobile and usually have tight energy constraints, integrating these services onto an embedded platform with around 10 W of power consumption is critical to the proliferation of mobile robots. In this paper, we present a case study on integrating real-time localization, vision, and speech recognition services on a mobile SoC, Nvidia Jetson TX1, within about 10 W of power envelope
Link: https://arxiv.org/abs/1705.10945
====================================================
HardScope: Thwarting DOP with Hardware-assisted Run-time Scope Enforcement (Thomas Nyman - 12 March, 2018)
We discuss our systematic empirical evaluation of HardScope which demonstrates that it can mitigate all currently known DOP attacks, and has a real-world performance overhead of 3.2% in embedded benchmarks.
Link: https://arxiv.org/abs/1705.10295
====================================================
GridNet with automatic shape prior registration for automatic MRI cardiac segmentation (Clement Zotti - 12 September, 2017)
The novelty of our network comes with its embedded shape prior and its loss function tailored to the cardiac anatomy. Experimental results reveal that our method can segment the left and right ventricles as well as the myocardium from a 3D MRI cardiac volume in 0.4 seconds with an average Dice coefficient of 0.90 and an average Hausdorff distance of 10.4 mm.
Link: https://arxiv.org/abs/1705.08943
====================================================
A Low-Power Accelerator for Deep Neural Networks with Enlarged Near-Zero Sparsity (Yuxiang Huan - 22 May, 2017)
This paper presents a low-power accelerator for processing Deep Neural Networks in the embedded devices. In the proposed accelerator, 256 multipliers are grouped into 16 independent Processing Lanes (PL) to support up to 16 neuron activations simultaneously. Designed and simulated in UMC 65 nm process, the accelerator operating at 500 MHz is $>$ 4X faster than the mobile GPU Tegra K1 in processing the fully-connected layer FC8 of Alexnet, while consuming 717X less energy.
Link: https://arxiv.org/abs/1705.08009
====================================================
Detecting Recycled Commodity SoCs: Exploiting Aging-Induced SRAM PUF Unreliability (Yansong Gao - 20 May, 2017)
The advantage of SRAM PUFs is that they are widely embedded into commodity devices, thus such a PUF is obtained without a custom design and virtually free of implementation costs. We show that less than 1,000 SRAM responses are adequate to guarantee that both false acceptance rate and false rejection rate are no more than 0.001.
Link: https://arxiv.org/abs/1705.07375
====================================================
LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems (Subarna Tripathi - 16 May, 2017)
In this work, we propose LCDet, a fully-convolutional neural network for generic object detection that aims to work in embedded systems. Our experimental results show that the proposed method achieves accuracy comparable with state-of-the-art CNN-based face detection methods, while reducing the model size by 3x and memory bandwidth by ~4x compared with YOLO, one of the best real-time CNN-based object detectors. The TF 8-bit quantized model provides an additional 4x memory reduction while keeping the accuracy as good as the floating-point model
Link: https://arxiv.org/abs/1705.05922
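The extra 4x memory reduction follows from storing each value in 8 bits instead of a 32-bit float. A minimal sketch of generic affine (scale and zero-point) 8-bit quantization, assuming per-tensor min/max calibration; it is not LCDet's or TensorFlow's actual quantization code, and the function names are hypothetical.

```python
from array import array

def quantize_uint8(weights):
    """Affine 8-bit quantization: q = clamp(round(w / scale) + zero_point, 0, 255)."""
    w_min = min(min(weights), 0.0)  # keep 0.0 inside the range so it maps to an exact code
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = round(-w_min / scale)
    q = array("B", (min(255, max(0, round(w / scale) + zero_point)) for w in weights))
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map 8-bit codes back to approximate float values."""
    return [(v - zero_point) * scale for v in q]

weights = array("f", [-0.5, -0.1, 0.0, 0.25, 0.75])  # float32 storage: 4 bytes each
q, scale, zero_point = quantize_uint8(weights)         # uint8 storage: 1 byte each
ratio = (len(weights) * weights.itemsize) / (len(q) * q.itemsize)
print(ratio)  # 4.0 on platforms where a C float is 32-bit
```

Each dequantized value differs from the original by at most about one quantization step (`scale`), which is why accuracy can stay close to the floating-point model.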
====================================================
CLBlast: A Tuned OpenCL BLAS Library (Cedric Nugteren - 27 April, 2018)
CLBlast has five main advantages over other OpenCL BLAS libraries: 1) it is optimized for and tested on a large variety of OpenCL devices including less commonly used devices such as embedded and low-power GPUs, 2) it can be explicitly tuned for specific problem-sizes on specific hardware platforms, 3) it can perform operations in half-precision floating-point FP16 saving bandwidth, time and energy, 4) it has an optional CUDA back-end, 5) and it can combine multiple operations in a single batched routine, accelerating smaller problems significantly
Link: https://arxiv.org/abs/1705.05249
====================================================
Texture to the Rescue: Practical Paper Fingerprinting based on Texture Patterns (Ehsan Toreini - 22 May, 2017)
Through experiments, we demonstrate that the embedded paper texture provides a more reliable source for fingerprinting than features on the surface. Based on the collected datasets, we achieve 0% false rejection and 0% false acceptance rates. We further report that our extracted fingerprints contain 807 degrees-of-freedom (DoF), which is much higher than the 249 DoF with iris codes (that have the same size of 2048 bits)
Link: https://arxiv.org/abs/1705.02510
====================================================
Group Marching Tree: Sampling-Based Approximately Optimal Motion Planning on GPUs (Brian Ichter - 5 May, 2017)
We show solutions for complex planning problems under differential constraints can be found in ~10 ms on a desktop GPU and ~30 ms on an embedded GPU, representing a significant speed up over the state of the art, with only small losses in performance
Link: https://arxiv.org/abs/1705.02403
====================================================
Restart-Based Security Mechanisms for Safety-Critical Embedded Systems (Fardin Abdi - 3 May, 2017)
In this paper, we aim to decouple the safety of the plant from security of the embedded system by taking advantage of the inherent inertia in such systems. We demonstrate the feasibility of our approach using two realistic systems - an actual 3 degree of freedom (3-DoF) helicopter and a simulated warehouse temperature control unit
Link: https://arxiv.org/abs/1705.01520
====================================================
Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks (Rajarshi Das - 26 April, 2017)
Universal schema can support reasoning on the union of both structured KBs and unstructured text by aligning them in a common embedded space. This model also outperforms the current state-of-the-art by 8.5 $F_1$ points. (Code and data available at https://rajarshd.github.io/TextKBQA)
Link: https://arxiv.org/abs/1704.08384
====================================================
Ranking to Learn: Feature Ranking and Selection via Eigenvector Centrality (Giorgio Roffo - 18 April, 2017)
Our approach has been tested on 7 diverse datasets from recent literature (e.g., biological data and object recognition, among others), and compared against filter, embedded and wrappers methods
Link: https://arxiv.org/abs/1704.05409
====================================================
Exploring Sparsity in Recurrent Neural Networks (Sharan Narang - 6 November, 2017)
The number of parameters in recent state-of-the-art networks makes them hard to deploy, especially on mobile phones and embedded devices. The network size is reduced by 8x and the time required to train the model remains constant. Benchmarks show that, using our technique, model size can be reduced by 90% with a speed-up of around 2x to 7x.
Link: https://arxiv.org/abs/1704.05119
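The snippet does not state the pruning rule. As a generic illustration only, a minimal sketch of one-shot magnitude pruning (zeroing the smallest-magnitude weights) followed by a sparse (index, value) representation; this is not the paper's method, which prunes during training, and the function names are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = round(len(weights) * sparsity)  # number of weights to drop
    smallest_first = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in smallest_first[:k]:
        pruned[i] = 0.0
    return pruned

def to_sparse(weights):
    """Keep only nonzero entries as (index, value) pairs, as a sparse format would."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]

w = [0.5, -3.0, 0.1, 2.0, -0.2, 0.05, 1.5, -0.7, 0.3, 0.01]
sparse = to_sparse(magnitude_prune(w, 0.8))
print(sparse)  # [(1, -3.0), (3, 2.0)]
```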
====================================================
Visual Recognition of Paper Analytical Device Images for Detection of Falsified Pharmaceuticals (Sandipan Banerjee - 13 April, 2017)
A dataset of cards embedded with these reagents is produced to generate the most distinctive results for a set of 26 different active pharmaceutical ingredients (APIs) and excipients. On testing, higher-level features performed much better in accurately classifying the PAD images, with the CNN models reaching the highest average accuracy of over 94%.
Link: https://arxiv.org/abs/1704.04251
====================================================
Enabling Embedded Inference Engine with ARM Compute Library: A Case Study (Dawei Sun - 14 April, 2017)
When you need to enable deep learning on low-cost embedded SoCs, is it better to port an existing deep learning framework or should you build one from scratch? In this paper, we share our practical experiences of building an embedded inference engine using ARM Compute Library (ACL). In addition, by utilizing ACL, we managed to build an inference engine that outperforms TensorFlow by 25%
Link: https://arxiv.org/abs/1704.03751
====================================================
FMMU: A Hardware-Automated Flash Map Management Unit for Scalable Performance of NAND Flash-Based SSDs (Yeong-Jae Woo - 11 April, 2017)
Existing SSDs increase the clock frequency of embedded processors or increase the number of embedded processors in order to prevent the FTL from becoming a bottleneck of SSD performance, but these approaches are not scalable. The experimental results show that the FMMU reduces the FTL execution time in the map cache hit case and the miss case by 44% and 37%, respectively, compared with the existing software-based approach running on 4 cores. FMMU also prevents the FTL from becoming a performance bottleneck for up to a 32-channel, 8-way SSD using a PCIe 3.0 x32 host interface.
Link: https://arxiv.org/abs/1704.03168
====================================================
Field of Groves: An Energy-Efficient Random Forest (Zafar Takhirov - 10 April, 2017)
However, their accuracy decreases significantly in energy-constrained mobile and embedded systems space, where all computations need to be completed under a tight energy budget. FoG is ~6.5x less energy efficient than SVM_LR, but achieves 18% higher accuracy on average across all considered datasets.
Link: https://arxiv.org/abs/1704.02978
====================================================
BLASFEO: basic linear algebra subroutines for embedded optimization (Gianluca Frison - 7 January, 2018)
BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization. Compared to both open-source and proprietary highly-tuned BLAS libraries, for matrices of size up to about one hundred the high-performance implementation of BLASFEO is about 20-30% faster than the corresponding level 3 BLAS routines and 2-3 times faster than the corresponding LAPACK routines.
Link: https://arxiv.org/abs/1704.02457
====================================================
HiFrames: High Performance Data Frames in a Scripting Language (Ehsan Totoni - 7 April, 2017)
We demonstrate that HiFrames is significantly faster than alternatives such as Spark SQL on clusters, without forcing the programmer to switch to embedded SQL for part of the program. HiFrames is 3.6x to 70x faster than Spark SQL for basic relational operations, and can be up to 20,000x faster for advanced analytics operations, such as weighted moving averages (WMA), that the map-reduce paradigm cannot handle effectively. HiFrames is also 5x faster than Spark SQL for TPCx-BB Q26 on 64 nodes of Cori supercomputer.
Link: https://arxiv.org/abs/1704.02341
====================================================
Multi-Path Region-Based Convolutional Neural Network for Accurate Detection of Unconstrained "Hard Faces" (Yuguang Liu - 27 March, 2017)
The "atrous" convolution trick (convolution with up-sampled filters) and a newly proposed sampling layer for "hard" examples are embedded in MP-RPN to further boost its performance. Experiments show that this approach achieves state-of-the-art face detection performance on the WIDER FACE dataset "hard" partition, outperforming the former best result by 9.6% for the Average Precision.
Link: https://arxiv.org/abs/1703.09145
====================================================
An embedded segmental K-means model for unsupervised segmentation and clustering of speech (Herman Kamper - 5 September, 2017)
Like its Bayesian counterpart, this embedded segmental K-means model (ES-KMeans) represents arbitrary-length word segments as fixed-dimensional acoustic word embeddings. We first compare ES-KMeans to previous approaches on common English and Xitsonga data sets (5 and 2.5 hours of speech): ES-KMeans outperforms a leading heuristic method in word segmentation, giving similar scores to the Bayesian model while being 5 times faster with fewer hyperparameters. We then show that ES-KMeans scales to larger corpora by applying it to the 5 languages of the Zero Resource Speech Challenge 2017 (up to 45 hours), where it performs competitively compared to the challenge baseline.
Link: https://arxiv.org/abs/1703.08135
====================================================
The Hardness of Embedding Grids and Walls (Yijia Chen - 19 March, 2017)
The dichotomy conjecture for the parameterized embedding problem states that the problem of deciding whether a given graph $G$ from some class $K$ of "pattern graphs" can be embedded into a given graph $H$ (that is, is isomorphic to a subgraph of $H$) is fixed-parameter tractable if $K$ is a class of graphs of bounded tree width and $W[1]$-complete otherwise.
Link: https://arxiv.org/abs/1703.06423
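The embedding relation used above (G is isomorphic to a subgraph of H) can be made concrete with a brute-force check. A minimal sketch that tries every injective vertex map and tests that each edge of G lands on an edge of H; it is exponential, illustrates only the definition (subgraphs need not be induced), and is not an algorithm from the paper. The function name is hypothetical.

```python
from itertools import permutations

def embeds(g_nodes, g_edges, h_nodes, h_edges):
    """True if G is isomorphic to a (not necessarily induced) subgraph of H.

    Brute force over every injective map V(G) -> V(H); exponential, illustration only.
    """
    h_edge_set = {frozenset(e) for e in h_edges}
    g_nodes = list(g_nodes)
    for image in permutations(h_nodes, len(g_nodes)):
        f = dict(zip(g_nodes, image))
        if all(frozenset((f[u], f[v])) in h_edge_set for u, v in g_edges):
            return True
    return False

triangle = ([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
cycle4 = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])  # the 2x2 grid
path3 = ([0, 1, 2], [(0, 1), (1, 2)])
print(embeds(*path3, *cycle4), embeds(*triangle, *cycle4))  # True False
```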
====================================================
Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations (Liangzhen Lai - 8 March, 2017)
Deep convolutional neural network (CNN) inference requires a significant amount of memory and computation, which limits its deployment on embedded devices. Experimental results show that the proposed scheme reduces the weight storage by up to 36% and power consumption of the hardware multiplier by up to 50%.
Link: https://arxiv.org/abs/1703.03073
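The snippet names only the split between floating-point weights and fixed-point activations. A minimal sketch of the numeric idea, assuming a 16-bit signed activation format with 8 fractional bits; the bit widths and function names are illustrative assumptions, and the paper's actual weight format and multiplier design are not reproduced here.

```python
def to_fixed_point(x, frac_bits=8, total_bits=16):
    """Round x to a signed fixed-point integer with `frac_bits` fractional bits, saturating."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))

def from_fixed_point(q, frac_bits=8):
    """Interpret a fixed-point integer as a real value."""
    return q / (1 << frac_bits)

def dot_float_weights_fixed_acts(weights, acts, frac_bits=8):
    """Dot product of floating-point weights with fixed-point-quantized activations."""
    q_acts = [to_fixed_point(a, frac_bits) for a in acts]
    acc = sum(w * q for w, q in zip(weights, q_acts))  # integer activations, float weights
    return from_fixed_point(acc, frac_bits)            # rescale once at the end

print(dot_float_weights_fixed_acts([0.5, 2.0], [1.0, 0.25]))  # 1.0
```

Rounding an activation to this format introduces an error of at most 2^-9, and values outside the representable range saturate instead of wrapping around.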
====================================================
Neural Multi-Step Reasoning for Question Answering on Semi-Structured Tables (Till Haug - 22 March, 2018)
Second, paraphrases of logical forms and questions are embedded in a jointly learned vector space using word and character convolutional neural networks. Our best single model achieves 34.8% accuracy on the WikiTableQuestions dataset, while the best ensemble of our models pushes the state-of-the-art score on this task to 38.7%, thus slightly surpassing both the engineered feature scoring baseline, as well as the Neural Programmer model of [Neelakantan et al., 2016].
Link: https://arxiv.org/abs/1702.06589
====================================================
Evolving Boxes for Fast Vehicle Detection (Li Wang - 29 March, 2017)
Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes as well as to discard unlikely regions early; a fine-tuning network produces detailed features for these candidate boxes. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN by 9.5% mAP
Link: https://arxiv.org/abs/1702.00254
====================================================
Accurate Measurement of Power Consumption Overhead During FPGA Dynamic Partial Reconfiguration (Amor Nafkha - 30 January, 2017)
In the context of embedded systems design, two important challenges are still under investigation. Results in terms of reconfiguration time and power consumption overhead for Virtex 5 FPGAs are shown.
Link: https://arxiv.org/abs/1701.08849
====================================================
Scale effects on spatially embedded contact networks (Peng Gao - 30 January, 2017)
This study examines the scale effects, in terms of spatial extent, on the network structure and the spatial structure of spatially embedded contact networks. Two sets of areal units, regular grids with 24 different levels of spatial extent and census units of three levels of spatial extent, are used to divide one observed and two reference random networks into multiple scales
Link: https://arxiv.org/abs/1701.08721
====================================================
Treelogy: A Novel Tree Classifier Utilizing Deep and Hand-crafted Representations (İlke Çuğu - 28 January, 2017)
The proposed algorithm is embedded in a smart-phone application, which is publicly available. Furthermore, our novel dataset comprised of 5408 leaf images is also made public for use of other researchers.
Link: https://arxiv.org/abs/1701.08291
====================================================
FPGA Architecture for Deep Learning and its application to Planetary Robotics (Pranay Gankidi - 25 January, 2017)
However, embedded systems onboard planetary rovers and spacecraft rarely implement learning algorithms due to the constraints faced in the field, like processing power, chip size, convergence rate and costs due to the need for radiation hardening. We simulate and program our architecture on a Xilinx Virtex 7 FPGA. The results show up to a 43-fold speed up by Virtex 7 FPGAs compared to a conventional Intel i5 2.3 GHz CPU
Link: https://arxiv.org/abs/1701.07543
====================================================
Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes (Erfan Azarkhish - 24 September, 2017)
In this paper, we propose a flexible processor-in-memory (PIM) solution for scalable and energy-efficient execution of deep convolutional networks (ConvNets), one of the fastest-growing workloads for servers and high-end embedded systems. NeuroCluster occupies only 8% of the total logic-base (LoB) die area in a standard HMC and achieves an average performance of 240 GFLOPS for complete execution of full-featured state-of-the-art (SoA) ConvNets within a power budget of 2.5W. Overall 11 W is consumed in a single SMC device, with 22.5 GFLOPS/W energy-efficiency which is 3.5X better than the best GPU implementations in similar technologies. The minor increase in system-level power and the negligible area increase make our PIM system a cost-effective and energy efficient solution, easily scalable to 955 GFLOPS with a small network of just four SMCs.
Link: https://arxiv.org/abs/1701.06420
====================================================
Design of an Audio Interface for Patmos (Daniel Sanz Ausin - 23 January, 2017)
Patmos is part of a project funded by the European Union called T-CREST (Time-predictable Multi-Core Architecture for Embedded Systems).[5] The structure of this project is integrated with the Patmos project: new hardware modules have been added as IOs, which allow the communication between the processor and the audio codec
Link: https://arxiv.org/abs/1701.06382
====================================================
Decoupled Access-Execute on ARM big.LITTLE (Anton Weber - 13 January, 2017)
In this work we target the ARM big.LITTLE, a heterogeneous platform that is dominant in the mobile and embedded market, which allows code to run transparently on different microarchitectures with individual energy and performance characteristics. By prefetching data in Access we can achieve an IPC improvement of up to 37% in the Execute phase, and manage to shift more than half of the program runtime to the LITTLE core
Link: https://arxiv.org/abs/1701.05478
====================================================
Embedding Watermarks into Deep Neural Networks (Yusuke Uchida - 20 April, 2017)
The embedded watermark does not disappear even after fine-tuning or parameter pruning; the watermark remains fully intact even after 65% of the parameters are pruned
Link: https://arxiv.org/abs/1701.04082
====================================================
An Accurate Interconnect Test Structure for Parasitic Validation in On-Chip Machine Learning Accelerators (Chun-Chen Liu - 9 March, 2017)
Compared with the state-of-the-art interconnect test structures, the new structure is compact in size and can be easily embedded on die as a parasitic variation monitor
Link: https://arxiv.org/abs/1701.03181
====================================================
Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2017 (David Castells-Rufas - 11 January, 2017)
Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2017. Collocated with HIPEAC 2017 Conference.
Link: https://arxiv.org/abs/1701.03053
====================================================
SIPHON: Towards Scalable High-Interaction Physical Honeypots (Juan Guarnizo - 11 January, 2017)
In recent years, the emerging Internet-of-Things (IoT) has led to rising concerns about the security of networked embedded devices. We demonstrate the proposed architecture in a large scale experiment with 39 wormhole instances in 16 cities in 9 countries. Based on this setup, six physical IP cameras, one NVR and one IP printer are presented as 85 real IoT devices on the Internet, attracting a daily traffic of 700MB for a period of two months. A preliminary analysis of the collected traffic indicates that devices in some cities attracted significantly more traffic than others (ranging from 600,000 incoming TCP connections for the most popular destination to less than 50,000 for the least popular). We recorded over 400 brute-force login attempts to the web-interface of our devices using a total of 1826 distinct credentials, from which 11 attempts were successful
Link: https://arxiv.org/abs/1701.02446
====================================================
A Framework for Extending microKanren with Constraints (Jason Hemann - 3 January, 2017)
We present a framework for building CLP languages with symbolic constraints based on microKanren, a domain-specific logic language shallowly embedded in Racket. The framework itself and the constraints' implementations amount to just over 100 lines of code
Link: https://arxiv.org/abs/1701.00633
====================================================
Two-Bit Networks for Deep Learning on Resource-Constrained Embedded Devices (Wenjia Meng - 4 January, 2017)
Typical large Convolutional Neural Networks (CNNs) need large amounts of memory and computational power, and cannot be deployed on embedded devices efficiently. We present Two-Bit Networks (TBNs) for model compression of CNNs with edge weights constrained to (-2, -1, 1, 2), which can be encoded with two bits
Link: https://arxiv.org/abs/1701.00485
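Because the four weight values (-2, -1, 1, 2) exclude zero, each weight fits exactly into a 2-bit code, so four weights pack into one byte. A minimal sketch of nearest-level quantization and bit packing, assuming a given per-layer scale `alpha`; how TBNs actually choose the scale and train the network is not shown, and the function names are hypothetical.

```python
LEVELS = (-2, -1, 1, 2)  # allowed weight values (times a scale); a 2-bit code indexes this tuple

def quantize_two_bit(weights, alpha):
    """Map each weight to the nearest value in alpha * LEVELS and return its 2-bit code."""
    return [min(range(4), key=lambda c: abs(w - alpha * LEVELS[c])) for w in weights]

def pack_codes(codes):
    """Pack four 2-bit codes into each byte, lowest code in the lowest bits."""
    out = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        out[i // 4] |= c << (2 * (i % 4))
    return bytes(out)

def unpack_codes(data, n):
    """Recover the first n 2-bit codes from packed bytes."""
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]

weights = [0.9, -0.4, 0.05, -1.6, 2.5]
alpha = 0.5  # assumed per-layer scale, chosen by hand for this example
codes = quantize_two_bit(weights, alpha)
packed = pack_codes(codes)
restored = [alpha * LEVELS[c] for c in unpack_codes(packed, len(weights))]
print(codes, len(packed), restored)  # [3, 1, 2, 0, 3] 2 [1.0, -0.5, 0.5, -1.0, 1.0]
```

Five float32 weights (20 bytes) shrink to 2 bytes here; the 16x ratio holds asymptotically for large layers.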
====================================================
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference (Yaman Umuroglu - 1 December, 2016)
On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 μs latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 μs latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy
Link: https://arxiv.org/abs/1612.07119
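Binarized networks replace multiply-accumulate with bit operations: if +1 is stored as bit 1 and -1 as bit 0, the dot product of two n-element vectors equals 2 * popcount(XNOR(a, b)) - n. A minimal software sketch of this XNOR-popcount formulation, which FINN realizes in FPGA hardware; the function names are hypothetical and this is not FINN's code.

```python
def encode(vec):
    """Encode a {-1, +1} vector as an int bitmask: bit i is set iff vec[i] == +1."""
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
        elif v != -1:
            raise ValueError("binarized values must be -1 or +1")
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1, +1} vectors using XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n             # each match adds +1, each mismatch adds -1

a = [1, -1, 1, 1, -1]
b = [1, 1, -1, 1, -1]
print(binary_dot(encode(a), encode(b), len(a)))  # 1, same as sum(x * y for x, y in zip(a, b))
```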
====================================================
Efficient Optical flow and Stereo Vision for Velocity Estimation and Obstacle Avoidance on an Autonomous Pocket Drone (Kimberly McGuire - 14 March, 2017)
It runs at 20 Hz on a 4 g stereo camera with an embedded STM32F4 microprocessor (168 MHz, 192 kB) and uses feature histograms to calculate optical flow and stereo disparity. The velocity and depth measurements are used for fully autonomous flight of a 40 g pocket drone only relying on on-board sensors
Link: https://arxiv.org/abs/1612.06702
====================================================
An Integrated Optimization + Learning Approach to Optimal Dynamic Pricing for the Retailer with Multi-type Customers in Smart Grids (Fanlin Meng - 21 March, 2018)
In this paper, we consider a realistic and meaningful scenario in the context of smart grids where an electricity retailer serves three different types of customers, i.e., customers with an optimal home energy management system embedded in their smart meters (C-HEMS), customers with only smart meters (C-SM), and customers without smart meters (C-NONE). To this end, we propose a two-level decision-making framework where the retailer, acting as the upper-level agent, first announces its electricity prices for the next 24 hours and customers, acting as lower-level agents, subsequently schedule their energy usage accordingly
Link: https://arxiv.org/abs/1612.05971
====================================================
Copycat: A High Precision Real Time NAND Simulator (Juyong Shin - 11 December, 2016)
This NAND simulator facilitates the development of embedded flash memory management software such as the flash translation layer (FTL). Compared against a real FPGA implementation, the simulator's response time deviation is under 0.28% on average, with a maximum of 10.12%.
Link: https://arxiv.org/abs/1612.04277
====================================================
I Spy with My Little Eye: Analysis and Detection of Spying Browser Extensions (Anupama Aggarwal - 3 May, 2018)