Email: [email protected]
Muhammad Moazam Fraz, The Alan Turing Institute, 96 Euston Rd, London NW1 2DB, United Kingdom
[email protected] (M.M. Fraz)
Orientation Aware Weapons Detection in Visual Data: A Benchmark Dataset
Abstract
Automatic detection of weapons is important for improving the security and well-being of individuals; nonetheless, it is a difficult task because of the large variety in the size, shape and appearance of weapons. Viewpoint variations and occlusion make this task even harder. Furthermore, current object detection algorithms process axis-aligned rectangular regions, whereas a slender, elongated rifle may occupy only a small portion of such a region while the rest contains irrelevant details. To overcome these problems, we propose a CNN architecture for Orientation Aware Weapons Detection, which provides oriented bounding boxes with improved weapon detection performance. The proposed model predicts orientation both by treating the angle as a classification problem, dividing it into eight classes, and by treating the angle as a regression problem. To train our model for weapon detection, a new dataset comprising a total of 6400 weapon images was gathered from the web and then manually annotated with oriented bounding boxes. Our dataset provides not only oriented bounding boxes as ground truth but also horizontal bounding boxes, and we release it in multiple formats compatible with modern object detectors to support further research in this area. The proposed model is evaluated on this dataset, and a comparative analysis with off-the-shelf object detectors, using standard evaluation metrics, shows the superior performance of the proposed model. The dataset and the model implementation are publicly available at this link: https://bit.ly/2TyZICF.
Keywords: Weapons Detection, Oriented Object Detection, Deep Learning, Firearm Violence
1 Introduction
Firearm violence incidents are widespread across the globe and are being observed with increasing frequency moian2018walking, morris2018mass, Texas and planty2013firearm. Every year, around a quarter of a million people die due to firearm violence BBC. Gun control measures do not appear to be effective despite such an enormous number of tragic events. Firearm violence continuously affects the whole globe and has adverse consequences for mankind. The issue needs to be addressed scientifically for the advancement of public well-being and an individual's health and security. Strong voices have been raised recently for data-supported research to prevent gun violence jaffe2018gun and to fund research for such projects NYT. Across the world, security agencies, governments, and private institutions have expanded the use of surveillance systems to protect the lives of people, secure buildings, and monitor commercial areas. In this globalized world there is more people-to-people contact, but at the same time gun violence born of mutual hatred is also common. Therefore, an efficient firearm detection method embedded in a surveillance system, used to avert gun violence or enable a prompt response, is indispensable velastin2006motion and ainsworth2002buyer. Such steps will not only ensure security but also benefit the economy through lower medical costs, because a timely response becomes possible when weapons are detected early by an automated mechanism.
Previously, visual systems have been used for the identification of objects in images or videos. These systems can also be used for firearm detection owing to their particular characteristics. CNN-based systems liu2016ssd, ren2015faster, redmon2016you have been used for such purposes; however, their generic versions are not well suited to firearm detection hu2018sinet; li2017scale, li2017perceptual, liu2019towards and zhang2016faster. The reason lies in the nature of firearm objects, such as unflattering viewing angles, clutter, occlusions, and the shape and size of firearms, which make their detection trickier than that of other objects such as vehicles, human faces, and airplanes. One limitation of existing systems is the use of axis-aligned windows in object detection. Most approaches analyze the features of a window to determine the presence of objects; however, because guns are small and physically elongated, such windows are inefficient for firearm detection. This is due to a very low foreground-to-background ratio, where the weapon acts as foreground and the rest of the window, with all its features, behaves as background. For example, when a person is carrying a gun, a window that covers the gun will also cover much of the person carrying it. This amalgamated mixture makes it difficult for the classifier to differentiate between the required object and the background information. Zhou et al. recently proposed an orientation-aware feature detection method, particularly for handling planar rotations, but it did not prove effective for handling clutter zhou2017oriented. Some methods make the angle part of the anchor, which increases the number of anchors that must be categorized for every object and proposal azimi2018towards, ma2018arbitrary. The drawback of a large number of anchors is that it is computationally inefficient. Moreover, these methods need region proposals trained on oriented bounding boxes, but presently almost all datasets provide only horizontal bounding boxes as ground truth; very few datasets provide oriented bounding boxes. Our proposed method in this paper requires oriented bounding boxes for training.
In this research work, we propose an orientation-aware weapon detection algorithm for visual data which not only improves the detection of objects but also provides information about their orientation. We also prepared a new dataset comprising 6400 weapon images. Our dataset has oriented bounding boxes with angle information as ground truth. To the best of our knowledge, no comprehensive dataset related to weapons is publicly available. Our proposed algorithm is end-to-end trainable. An orientation prediction module is trained to predict the likely object orientation for every region proposal. Our proposed model does this in two ways: the first treats the angle of the object as a classification problem, and the second treats the angle as a regression problem. We name our proposed method Orientation Aware Weapons Detection (OAWD) because it provides not only detection but also the orientation of objects.
For training and evaluation purposes, a broad dataset (OAWD) comprising a wide assortment of guns and pistols was gathered from the web. It contains photos of genuine scenes as well as ones from dramas and motion pictures. Figure 1 shows some random sample pictures from our dataset.

This dataset was annotated manually using the roLabelImg tool, which is an extended version of the LabelImg tool and provides oriented bounding boxes. The dataset contains 6,400 pictures with diverse attributes, including pictures capturing different guns, various conditions, pose variations of people carrying weapons, and pictures with guns but without people. The proposed model is compared with FRCNN ren2015faster trained on the same dataset. Our work will act as a catalyst for further research on orientation-aware weapon recognition.
The major contributions of this research work are listed below.
• We present the first exhaustive work on firearm localization in RGB pictures. We analyze the drawbacks of using horizontal bounding boxes for the detection of orientation-aware objects, e.g. pistols and guns, where background information acts as noise. We propose an orientation-aware model in two ways: one uses the angle as a classification problem and the other uses the angle as a regression problem. An oriented bounding box contains less background information than an axis-aligned window.
• We also propose the first comprehensive orientation-aware weapons dataset, consisting of a total of 6400 images with single and multiple guns and pistols. Our dataset provides not only oriented bounding boxes with angle information as ground truth but also horizontal bounding boxes. We provide annotations of our dataset in three different formats for further research.
• Our proposed orientation-aware model outperforms present state-of-the-art object detection algorithms, e.g. Faster R-CNN.
The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 describes the dataset; our proposed Orientation Aware Weapons Detection (OAWD) model for visual data is described in Section 4; Section 5 presents the experiments and analysis of results; and Section 6 concludes with future directions.
2 Literature Review
Research on visual gun recognition in pictures or videos is very scanty, and presently there is no dedicated gun detector or gun benchmark dataset for performance evaluation. Almost 1000 object classes are present in the ImageNet dataset deng2009imagenet and just two are devoted to guns, covering rapid-fire automatic weapons and pistols with a revolving chamber, which is a very small assortment and not a sufficient dataset for detection purposes. The Open Images dataset V5 also has a handgun class, but it contains only around 600 such images, which is not enough. Moreover, the annotations in these datasets are horizontal boxes, not oriented ones. Olmos et al. applied Faster RCNN ren2015faster for the detection of handguns in videos olmos2018automatic, while no results have been reported on rifle identification. Akcay et al. applied RCNN girshick2014rich, FRCNN ren2015faster, Yolo V2 redmon2017yolo9000 and RFCN dai2016r for object recognition within x-ray baggage security imagery akcay2018using. In contrast to these works, we address, for the first time, the issue of orientation-aware visual gun and rifle recognition in pictures in a more comprehensive way.

Although very limited research work has been done on orientation-aware weapon detection, comprehensive research work is available on generic object recognition algorithms. Tao Gong et al. gong2019using proposed a unique end-to-end framework that improves object detection by using multi-label classification as an auxiliary task, but they did not provide any method for orientation-aware object detection, whereas our proposed method does. Y. Xiao et al. presented a detailed review of different deep learning models for object detection, along with a brief overview of traditional object detection methods. The YOLO family redmon2016you; redmon2017yolo9000; redmon2018yolov3, based on deep convolutional neural networks, contains very good detectors for real-time object detection. YOLO redmon2016you is very fast and can be applied to real-time detection, but it produces more localization errors than other detection methods. YOLO v2 redmon2017yolo9000 and v3 redmon2018yolov3 give better results than YOLO redmon2016you because they use multi-scale training, batch normalization and anchor boxes. However, the performance of the YOLO family is lower than that of Faster RCNN ren2015faster. The YOLO family provides horizontal bounding boxes, whereas our proposed method provides oriented bounding boxes.
Like YOLO redmon2016you, SSD liu2016ssd is a single-stage object recognition algorithm. The VGG16 han2015deep network is used as the base architecture, and instead of classifying from the last fully connected layer of VGG16, it adds four extra convolution layers with smaller filters, takes samples from multiple feature layers, and combines them in the output feature space. The performance of SSD liu2016ssd degrades when objects are small. Another variation of SSD liu2016ssd is DSSD fu2017dssd, which uses the ResNet-101 architecture as the base network instead of the VGG16 used in SSD liu2016ssd. DSSD fu2017dssd uses deconvolutional layers with skip connections, giving richer feature maps for better recognition. A recent study shows that the performance of DSSD fu2017dssd is lower than that of Faster RCNN ren2015faster; also, because of the deconvolutional layers, the computational complexity of DSSD fu2017dssd is higher. SSD and DSSD likewise provide only axis-aligned bounding boxes for object detection; they cannot provide oriented bounding boxes and do not incorporate the angle in any way. Our proposed model incorporates the angle and provides an orientation-aware bounding box.
In weapon detection problems, weapons may appear very small compared to the overall image dimensions. For the recognition of small objects, image size, scale, and contextual information can be essential for learning a deep model hu2017finding. M. Grega et al. grega2016automated proposed a model for automated recognition of knives and firearms in CCTV video. They used a conventional approach for the recognition of firearms and, among weapons, focused only on pistol detection. Although they detect pistols, the detection is not orientation aware, whereas our proposed model provides an oriented bounding box for object detection. T.Y. Lin et al. proposed the novel focal loss to handle the problem of class imbalance lin2017focal. R. Olmos et al. proposed a model based on Faster RCNN ren2015faster which generates an automatic alarm if the model detects a handgun in video olmos2018automatic. They focused only on handgun detection olmos2018automatic and do not report the angle at which the handgun is held; our proposed model also reports this angle. R.K. Tiwari et al. tiwari2015computer proposed a conventional approach for gun recognition in visual data with the help of the Harris interest point detector harris1988combined and FREAK alahi2012freak; their method does not perform well under illumination changes. Chen et al. and Liu et al. have stressed the importance of context and instance relationships for precise object detection liu2018structure, chen2018context. For instance, if a person is holding a tennis racket, then there ought to be a ball nearby. However, in the case of weapons, much of the scene context may remain unrelated to the existence or non-existence of a weapon.
Huang and Rathod et al. performed a detailed analysis of the accuracy-speed trade-off huang2017speed, in which they showed that Faster RCNN ren2015faster is more stable compared to the other detectors. Faster RCNN ren2015faster evolved to its current structure after going through numerous variations. In earlier versions, RCNN used the Selective Search method uijlings2013selective for generating region proposals, applied deep convolutional networks on each proposal to extract low-level features, and then used an SVM furey2000support for classification. Later, the RPN module introduced in Faster RCNN ren2015faster shares convolutional features of the full image with the detection network. It uses a sliding window over the convolutional feature map to generate region proposals. An ROI pooling layer was also introduced for obtaining fixed-size regions of interest. At every position of the feature map, a total of 9 anchors are generated using three scales and three aspect ratios. NMS is performed to obtain high-probability region proposals, and after applying ROI pooling on the selected region proposals, the classifier is applied. Presently, a large number of FRCNN variants have been proposed; FPN and Mask RCNN lin2017feature, he2017mask are particularly significant. All these methods give a horizontal bounding box at the output. Our proposed model provides orientation awareness not only on the basis of angle classification but also on the basis of angle regression. To the best of our knowledge, no one has predicted orientation using angle regression. Thus our proposed Orientation Aware Weapon Detection (OAWD) model is unique and different from present object detection algorithms.
To the best of our knowledge, no comprehensive public weapons dataset is available that provides oriented bounding boxes as ground truth. As can be seen in Table 1, some publicly available datasets (Open Images Dataset V5 kuznetsova2020open, IMAGENET Dataset fei2010imagenet, Handgun Dataset olmos2018automatic, Guns Dataset gunsdataset and Weapons Detection Dataset olmos2018automatic) contain weapon images, but they provide horizontal bounding boxes as ground truth rather than oriented ones, and they contain very few weapon images. Although the HRSC2016 liu2016ship and DOTA xia2018dota datasets provide oriented bounding boxes, they have no weapon class and no weapon images. In contrast to the publicly available datasets mentioned above, our dataset has a total of 6400 weapon images of two classes: gun and pistol. Our dataset provides both oriented bounding boxes with angle information and horizontal bounding boxes as ground truth. We are also working to expand this dataset further.
3 Dataset
For the automatic detection of weapons, no orientation-aware public dataset of guns and pistols is available for training. In this research work, we propose the first orientation-aware dataset for weapons, which we name the "OAWD dataset".

3.1 OAWD Data Set
In recent years, datasets have played a significant role in data-driven research. Large datasets like MSCOCO have been instrumental in advancing object detection and captioning, and the same is true of ImageNet and Places for the classification and scene recognition tasks, respectively. However, for oriented object detection, a dataset resembling MSCOCO and ImageNet, both in terms of the number of images and of oriented annotations, has been missing, which is one of the fundamental obstructions to research, especially in orientation-aware weapon detection. Accordingly, a comprehensive, large-scale and challenging orientation-aware weapons dataset is essential for advancing research in this domain.
Table 1: Comparison of publicly available datasets with the OAWD dataset.

Dataset | Annotation Method | Weapon Class | Weapon Images
---|---|---|---
HRSC2016 liu2016ship | Oriented BB | No | 0
DOTA xia2018dota | Oriented BB | No | 0
Open Images Dataset V5 kuznetsova2020open | Horizontal BB | Yes | 650
IMAGENET Dataset fei2010imagenet | Horizontal BB | Yes | 1200
HandGun Dataset olmos2018automatic | Horizontal BB | Yes | 795
Guns Dataset gunsdataset | Horizontal BB | Yes | 333
Weapons Detection Dataset olmos2018automatic | Horizontal BB | Yes | 3304
OAWD Dataset | Oriented BB | Yes | 6400
We contend that a comprehensive weapons dataset should have four properties, namely: 1) a large number of pictures, 2) numerous examples per category, 3) properly oriented bounding boxes along with the angle, and 4) different classes of weapons, which together make it suitable for real-world applications. The presently available datasets lack a comprehensive collection of weapons, and some of them have no weapon class at all. For general object detection, the ImageNet dataset fei2010imagenet and MSCOCO lin2014microsoft are favored by researchers because of their large number of images, many categories, and good annotations. ImageNet is the largest of these datasets as it has the greatest number of pictures; notwithstanding, the average number of objects per picture is lower than in MSCOCO and our OAWD, in addition to the limitations of its clean backgrounds and deliberately chosen scenes.
3.2 Annotations of OAWD dataset
3.2.1 Images Collection
We gathered our own dataset using web scraping with different keywords, e.g. gun, rifle, pistol, weapon, hand gun, gun with human, and rifle in the human hand. We then manually deleted the images that were not related to guns or rifles. We also collected images using the Google Chrome extension "Download All Images" to download all images of a specific page. In addition, we downloaded movie clips from YouTube and extracted frames containing pistols or guns. We also downloaded videos of shop robberies; such videos are of very low resolution, as are the images gathered from them. The purpose of gathering low-resolution images was to allow our model to detect weapons in realistic settings, since cameras installed in shops normally record at very low resolution.
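The frame extraction step can be reproduced with a short script. The sketch below, assuming OpenCV is installed and using an illustrative sampling interval and file names, shows how frames were harvested from the downloaded clips before manual filtering.

```python
import cv2  # OpenCV for reading video files
from pathlib import Path

def extract_frames(video_path, out_dir, every_n=30):
    """Save every n-th frame of a video as a JPEG image."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:          # end of video
            break
        if idx % every_n == 0:
            out_file = Path(out_dir) / f"{Path(video_path).stem}_{saved:05d}.jpg"
            cv2.imwrite(str(out_file), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Example (hypothetical file): sample one frame per second from a 30 fps clip
# extract_frames("robbery_clip.mp4", "frames/robbery_clip", every_n=30)
```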
Table 2: Weapon count in the training and test splits of the OAWD dataset.

Dataset | Total Images | Gun | Pistol | Total Weapons
---|---|---|---|---
Training | 5149 | 4341 | 3206 | 7547
Test | 1249 | 642 | 825 | 1467
3.2.2 Class Selection
Two classes were chosen and manually annotated in the OAWD dataset: gun and pistol. A gun can be of any category, e.g. bolt-action rifles (Remington 700 and Howa 1500), lever-action rifles (Winchester 94 and Marlin 336) and semi-automatic rifles (AR-15 and Browning BAR), while the pistol class covers short handguns and includes the Mossberg 500, Remington 870, Smith & Wesson Model 686, Ruger GP100 and Glock 17, etc.
3.2.3 Annotation Method
We considered various methods of annotation. In computer vision, numerous visual concepts, for example attributes, objects, relationships, and region descriptions, are annotated using bounding boxes. A typical description of a bounding box is (x_c, y_c, w, h), where (x_c, y_c) is the center of the rectangle and w, h are the width and height of the bounding box. After gathering the images, the next step was to annotate this data using a tool that supports rotated annotations. We used the roLabelImg rolabelimg tool for the annotation of images; it is an extended version of LabelImg that provides annotations in a rotated format. The tool produces annotations in XML, similar to the PASCAL VOC format. The annotation format of our dataset is Xc, Yc, width, height, and angle. Figure 4 shows the sample ground truth of an image annotated with the roLabelImg tool.
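A minimal sketch of reading these annotations is shown below. It assumes the usual roLabelImg output layout, where each object carries a robndbox element with cx, cy, w, h and angle children; the file name in the example is hypothetical.

```python
import xml.etree.ElementTree as ET

def load_rotated_boxes(xml_file):
    """Parse a roLabelImg annotation file into (class, cx, cy, w, h, angle) tuples.

    Assumes the usual roLabelImg layout: each <object> holds a <name> and a
    <robndbox> with <cx>, <cy>, <w>, <h> and <angle> (angle stored in radians).
    """
    boxes = []
    root = ET.parse(xml_file).getroot()
    for obj in root.iter("object"):
        name = obj.findtext("name")              # "gun" or "pistol"
        rb = obj.find("robndbox")
        if rb is None:                           # skip horizontal-only objects
            continue
        boxes.append((
            name,
            float(rb.findtext("cx")), float(rb.findtext("cy")),
            float(rb.findtext("w")),  float(rb.findtext("h")),
            float(rb.findtext("angle")),
        ))
    return boxes

# Example (hypothetical file name):
# for cls, cx, cy, w, h, ang in load_rotated_boxes("image_0001.xml"):
#     print(cls, cx, cy, w, h, ang)
```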
3.2.4 Dataset Splits
Table 2 shows the statistics of the training and testing sets after randomly splitting the dataset into training and testing with a ratio of 80% to 20%. It also shows how many objects of the Gun and Pistol classes are present in the training and testing sets, as well as the total number of objects in each split.
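A minimal sketch of such a random 80/20 split is shown below; the image identifiers and the random seed are illustrative only.

```python
import random

def split_dataset(image_ids, train_ratio=0.8, seed=42):
    """Randomly split image identifiers into training and test lists (80/20)."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

# Example with hypothetical identifiers:
# train_ids, test_ids = split_dataset([f"img_{i:05d}" for i in range(6400)])
# print(len(train_ids), len(test_ids))   # roughly 5120 / 1280
```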
3.3 Properties of OAWD
3.3.1 Image Size
The spatial dimensions of images in the OAWD dataset are not fixed. The minimum size of a single image in our dataset is 104 x 104 pixels, while the maximum is 6720 x 4480 pixels. We performed the manual annotations on the full-resolution images.
3.3.2 Annotation Formats of OAWD Dataset
Annotation formats are very important for facilitating further research. The OAWD dataset is available in the following three formats:
• CNTK Faster RCNN Format
• TensorFlow Pascal VOC Format
• YOLO Format
3.3.3 Angle Wise Distribution of OAWD dataset
For a more statistical analysis of the data, we divided the angle into 8 classes, from class 1 to class 8, each spanning 22.5° of the 0° to 180° angle range: class 1 covers 0° to 22.5°, class 2 covers 22.5° to 45°, class 3 covers 45° to 67.5°, class 4 covers 67.5° to 90°, class 5 covers 90° to 112.5°, class 6 covers 112.5° to 135°, class 7 covers 135° to 157.5°, and class 8 covers 157.5° to 180°.
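Under the assumption of these eight equal 22.5° bins over 0° to 180°, converting a raw annotation angle into its class index reduces to a simple binning operation; the helper below is a sketch of that mapping and is not taken from the released implementation.

```python
import math

def angle_to_class(angle_rad, num_classes=8, max_deg=180.0):
    """Map an oriented-box angle (radians, as stored by roLabelImg) to one of
    eight equal-width classes covering 0-180 degrees (assumed binning)."""
    deg = math.degrees(angle_rad) % max_deg        # fold the angle into [0, 180)
    bin_width = max_deg / num_classes              # 22.5 degrees per class
    return int(deg // bin_width) + 1               # classes numbered 1..8

# Example: an annotation of about 95 degrees falls into class 5
# print(angle_to_class(math.radians(95)))   # -> 5
```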

Figure 4 presents the data statistically according to the 8 angle classes, and Figure 5 shows how many weapons fall into each class. It is clear from Figure 5 that class 1 has 2654 objects, class 8 has the most objects with 2804, and class 5 has the fewest with 17. A minimum of 1 and a maximum of 10 weapons are present per image, with an average of 1.40 weapons per image in our dataset. Classes 4, 5 and 6 contain very few weapons because class 5 corresponds to an angle of 90° and classes 4 and 6 correspond to angles around 90°; it is a very rare case in real life for a person to hold a gun at 90°, because at that angle the gun or rifle muzzle points towards the sky. Moreover, a weapon held at 90° poses little threat to another person, and it is not easy to hold a gun at this angle for a long time.

4 Proposed Model
Most present object detection algorithms provide horizontal bounding boxes, which contain a lot of background information, so the object is less precisely localized. To overcome this issue, we propose the Orientation Aware Weapons Detection (OAWD) algorithm, which provides orientation in two different ways. In the first, the model predicts orientation using the angle as a classification target; in the second, it predicts orientation using the angle as a regression target. With the first approach we can only report one of a fixed set of angles, whereas with the second we can report any angle of the object. To the best of our knowledge, no one has predicted orientation by treating the angle as a regression problem. The two approaches are described in detail below; a sketch contrasting the two output heads follows this paragraph.
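To make the difference concrete, the sketch below (written in PyTorch, with assumed feature dimensions and layer names rather than the exact released implementation) contrasts the two detection heads: the classification variant adds an 8-way angle-class output, while the regression variant adds a single normalized angle output.

```python
import torch
import torch.nn as nn

class OAWDHead(nn.Module):
    """Sketch of the two OAWD detection heads on top of RoI-pooled features.

    Assumptions: 4096-d FC features per RoI, 2 weapon classes plus background,
    8 angle classes for the classification variant, and 1 normalized angle
    value for the regression variant. Layer sizes are illustrative only.
    """
    def __init__(self, in_dim=4096, num_classes=3, angle_mode="classification"):
        super().__init__()
        self.angle_mode = angle_mode
        self.cls_score = nn.Linear(in_dim, num_classes)        # gun / pistol / background
        self.bbox_pred = nn.Linear(in_dim, 4 * num_classes)    # box offsets per class
        if angle_mode == "classification":
            self.angle_out = nn.Linear(in_dim, 8)              # 8 angle classes
        else:
            self.angle_out = nn.Linear(in_dim, 1)              # angle offset in [0, 1]

    def forward(self, roi_feat):
        cls = self.cls_score(roi_feat)
        box = self.bbox_pred(roi_feat)
        ang = self.angle_out(roi_feat)
        if self.angle_mode != "classification":
            ang = torch.sigmoid(ang)    # squash the regressed angle into [0, 1]
        return cls, box, ang

# Example: run both variants on a dummy RoI feature vector
# feat = torch.randn(1, 4096)
# cls_head = OAWDHead(angle_mode="classification")
# reg_head = OAWDHead(angle_mode="regression")
```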
4.1 Angle Classification Method
OAWD using angle classification mainly comprises computing a deep feature map with a CNN backbone, an RPN, and RoI pooling followed by FC (fully connected) layers. For this purpose, we also prepared our dataset for training and defined the angle classes. Only at inference time do we apply a linear rotation transformation to obtain the rotated bounding box. Each of these components is explained in the following subsections. Figure 6 shows the architecture diagram of OAWD using angle classification.
1. Dataset Preparation: Faster R-CNN takes a text document as input, but the annotations in our dataset were in XML format. We therefore first converted the XML files into an Excel sheet. This Excel sheet contains the following information for every image as input to Faster R-CNN: image name, image width, height, X1, Y1, X2, Y2, object class, and angle class.
Figure 6: Our proposed OAWD model architecture diagram with details of feature extraction, selection of top proposals, RoI pooling and detection. It shows that our proposed model provides orientation in two ways: one using angle classification and the other using angle regression.
2. Define Angle Class: In the XML format of the dataset, the angle information is stored in radians, corresponding to degrees from 0° to 180°. When converting the XML file into the Excel sheet, we mapped the angle into the following 8 classes: class 0 covers 0° to 22.5°, class 1 covers 22.5° to 45°, class 2 covers 45° to 67.5°, class 3 covers 67.5° to 90°, class 4 covers 90° to 112.5°, class 5 covers 112.5° to 135°, class 6 covers 135° to 157.5°, and class 7 covers 157.5° to 180°. So we now have additional angle information in the input.
3. Computing Feature Map: After defining the angle classes and preparing the dataset, we used a VGG-like architecture for feature extraction from the input image, though any deep neural network can be used for this purpose. The convolution operation can be represented as shown in equation 1 below:
f_j^l(a,b) = \sum_{c}\sum_{(u,v)} i_c(u,v)\, k_j^l(c,d)   (1)

in which f_j^l(a,b) denotes element (a,b) of the feature map produced by the j-th kernel of layer l, j indexes the channel, i_c(u,v) denotes element (u,v) of the image with respect to channel c, and k_j^l(c,d) denotes element (c,d) of the kernel with respect to layer l. Because input images may differ in size, the spatial dimensions of the feature map may also differ, but the depth of the feature map (512) remains the same. The feature map weights can be learned using equation 2:

f_j^c = G\big(f^{l-1} * w_j^c\big)   (2)

in which G is the activation function (ReLU in our case), f^{l-1} is the output of the previous layer, w_j^c are the weights of the convolution layer in OAWD, and f_j^c is the newly learned feature map at position j with respect to channel c.
4. RPN (Region Proposal Network): The RPN takes the feature map as input. It is randomly initialized and then trained on the OAWD training set to produce objectness scores and proposals for the objects present in a picture. At every location of the feature map, a total of 9 anchor boxes, using three scales and three ratios, are used to handle objects of different sizes. The RPN module generates almost 2000 region proposals. To further reduce the computational complexity, only a small fraction of the best proposals is selected for further processing and used as input to RoI pooling. We selected the top 300 region proposals on the basis of the overlap between the generated region proposals and the ground truth box: if the overlap is greater than 0.7 we keep that box and discard the remaining proposals, and if more than 300 proposals exceed 0.7 overlap, we keep only the 300 with the highest overlap. We labeled these region proposals as given in equation 3:
\text{label} = \begin{cases} \text{gun}, & IoU_g \geq 0.7 \\ \text{pistol}, & IoU_p \geq 0.7 \\ \text{background}, & \text{otherwise} \end{cases}   (3)

where IoU_g is the IoU between the ground truth bounding box of a gun and the RPN box, and IoU_p is the IoU between the ground truth bounding box of a pistol and the RPN box. The regression loss function for the bounding box is given in equation 4:

L_{loc}(t^l, v) = \sum_{i \in \{x,y,w,h\}} \text{smooth}_{L_1}(t_i^l - v_i)   (4)

in which t^l represents the bounding box offsets for object class l and v represents the ground truth bounding box offsets, which can be calculated (following the standard Faster R-CNN parameterization) using equation 5:

v_x = (x^* - x_a)/w_a, \quad v_y = (y^* - y_a)/h_a, \quad v_w = \log(w^*/w_a), \quad v_h = \log(h^*/h_a)   (5)

where (x_a, y_a, w_a, h_a) is the proposal box and (x^*, y^*, w^*, h^*) is the matched ground truth box.
5. Classification Layers: After obtaining the top 300 region proposals, we perform RoI pooling before applying the fully connected layers. The best proposals from the RPN are used as input to RoI pooling, which selects the corresponding regions of the feature map computed by the backbone network. Since the fully connected layers only accept fixed-size input, this is obtained by applying RoI pooling on the feature map. After RoI pooling we apply 2 fully connected layers, and at the output the network gives the bounding box offsets, the object class score, and the angle class score. We use a Softmax loss for angle classification as well. The loss function we used is defined as:
\sigma(a)_m = \dfrac{e^{a_m}}{\sum_{i=1}^{K} e^{a_i}}   (6)

where a is the input vector (a_1, a_2, ..., a_K), m, i = 1, 2, ..., K, and K is the number of classes, which are gun and pistol in our case. At the output we thus have three types of information: the bounding box offsets, the object class, and the angle class. We use the ReLU activation to introduce non-linearity in our model. The activation function can be defined as:

f_A^l = \max\big(0, f^l\big)   (7)

in which f_A^l is the activation output, which takes the output of the convolution f^l for layer l.
6. Linear Transformation: After obtaining the output, we apply a rotation transformation to the 4 corner points of the bounding box. The linear transformation module takes the bounding box coordinates and the angle as input and returns the corner points of the bounding box rotated according to that angle. We then draw the bounding box according to the new corner points and finally obtain the rotated bounding box. For example, if we have any point (a, b) in 2-D, we can rotate that point around a specific position using equations 8 and 9, which give the new coordinates of the point after rotation by a given angle θ:
a' = a\cos\theta - b\sin\theta   (8)
b' = a\sin\theta + b\cos\theta   (9)
We divided the angle into 8 classes as described above, but during the rotation transformation we use a single representative angle for each class. For class 1 we choose an angle of 0°, although class 1 covers the range 0° to 22.5°. Similarly, for classes 2, 3, 4, 5, 6, 7 and 8 we choose angles of 22.5°, 45°, 67.5°, 90°, 112.5°, 135° and 157.5° respectively, as illustrated in the sketch below.
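A minimal sketch of this corner-rotation step, assuming the box is given as (cx, cy, w, h) with the representative angle in degrees, is shown below.

```python
import math

def rotated_corners(cx, cy, w, h, angle_deg):
    """Rotate the 4 corners of an axis-aligned box (cx, cy, w, h) around its
    center by angle_deg and return them as a list of (x, y) points."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets relative to the box center (top-left, top-right, ...)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    corners = []
    for a, b in half:
        # Equations (8) and (9): rotate the offset, then translate back
        a_rot = a * cos_t - b * sin_t
        b_rot = a * sin_t + b * cos_t
        corners.append((cx + a_rot, cy + b_rot))
    return corners

# Example with illustrative values: a 100x40 box centered at (200, 150)
# rotated by the class-5 representative angle of 90 degrees
# print(rotated_corners(200, 150, 100, 40, 90))
```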
4.2 Angle Regression Method
The problem is to detect and classify weapons along with their orientation. In the previous part we treated the angle as a classification problem, but that approach cannot predict every angle from 0° to 180°, so we cannot report the exact angle of an object present in an image. In this part we therefore treat the angle as a regression problem so that we can predict any angle from 0° to 180° rather than only some specific angles. To the best of our knowledge, no method or model is available at present that treats the angle as a regression problem. The only changes are in the dataset preparation for training and in the classifier layers at the end of the model; all other components are the same as in the classification part. For dataset preparation, we again converted the XML files into an Excel sheet containing, for every image, the following information as input to Faster R-CNN: image name, image width, height, X1, Y1, X2, Y2, object class, and angle. In the XML format of the dataset, the angle information ranges from 0° to 180°. When converting the XML file of every image into the Excel sheet, we did not divide the angle into classes; we took the angle exactly as it appears in the XML file, where it is stored in radians.
We then normalized the angle so that it lies in the range 0 to 1 (by dividing every angle value by 8), since a target in the range 0 to 1 is easier to regress. In the classification layers, we use the smooth L1 loss for the regression of the angle. The model gives two outputs in this case: one reports the object class, and the second reports not only the bounding box offsets but also one additional offset value for the angle. Then, using the rotation transformation described above, we obtain the rotated bounding box.
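The sketch below, a small NumPy illustration with assumed variable names and made-up values, shows the smooth L1 penalty applied to a normalized angle prediction against its normalized target; the normalization constant follows the text above and is not taken from the released code.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-like) loss used for the angle regression target."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)

# Illustrative normalized angle targets/predictions in [0, 1]
target_angle = np.array([0.35, 0.90])    # normalized ground-truth angles
pred_angle = np.array([0.30, 0.70])      # network outputs for the same RoIs
print(smooth_l1(pred_angle, target_angle))   # per-RoI angle loss values
```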
5 Experiments and Results
We compare the results of the proposed OAWD model architecture with Faster RCNN, which provides horizontal bounding boxes with good detection results. Our model not only gives better results than Faster RCNN but also provides an oriented bounding box at any angle.
5.1 Training Settings
Training was done on a GPU; we used an NVIDIA GeForce 1080 Ti. The system has 64 GB RAM and a 1 TB hard disk along with a 250 GB SSD. The total number of epochs during training is 200 and one epoch length is 1000 iterations. The RPN overlap threshold is 0.7 and the IoU threshold is set to 0.5 during non-maximum suppression. The learning rate is set to 1e-5.
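For reference, these hyperparameters can be collected into a single configuration; the dictionary below simply restates the values above, and the key names themselves are illustrative rather than taken from the released code.

```python
# Training configuration used for OAWD (key names are illustrative)
TRAIN_CONFIG = {
    "epochs": 200,            # total training epochs
    "epoch_length": 1000,     # iterations per epoch
    "rpn_overlap": 0.7,       # IoU threshold for positive RPN proposals
    "nms_iou": 0.5,           # IoU threshold for non-maximum suppression
    "learning_rate": 1e-5,    # fixed learning rate
}
```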
5.2 Results & Analysis
We used mAP (mean average precision) for the evaluation of our proposed model. We compute the IoU (Intersection over Union) of the detected and the ground truth bounding boxes. If the IoU of an instance is greater than or equal to 0.5, we consider it a TP (True Positive); otherwise it is an FP (False Positive). Precision is calculated as TP/(TP+FP). At each recall level, the average precision is calculated. The same procedure is repeated for every class independently, and the average over all classes is reported as the mAP.
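The TP/FP decision hinges on the IoU of the boxes; a minimal sketch of that computation for axis-aligned boxes in (x1, y1, x2, y2) format, with illustrative coordinates, is given below.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as a true positive when IoU >= 0.5 with a ground truth box
# print(iou((10, 10, 60, 60), (30, 30, 80, 80)))   # ~0.22 -> false positive
```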
Table 3: Training and testing mAP on the OAWD dataset.

Network | Training (mAP) | Testing (mAP)
---|---|---
Faster R-CNN | 86.80 | 81.2
Proposed Model OAWD Classification | 88 | 82.90
Proposed Model OAWD Regression | 87 | 81
Table 3 shows that the training and testing mAP of OAWD using the angle classification approach is greater than that of Faster R-CNN. This is because our dataset is oriented, and the oriented boxes take in less background information than the horizontal bounding boxes that Faster R-CNN uses.

Figure 7 shows the prediction results in the form of bounding boxes overlaid on the original images of the OAWD dataset. Table 3 also shows that, although the mAP of OAWD using angle regression is higher than that of Faster R-CNN, it is slightly lower than that of OAWD using classification.

Figure 8 shows qualitative results of OAWD using angle regression. These qualitative results are very close to the OAWD (classification) results. Because the angle is regressed in this model, it can be seen in Figure 9 that the bounding box rotates with even a slight change in angle. We also present an analysis of the results of OAWD (classification) and OAWD (regression) against Faster R-CNN, computing the mAP at different IoU thresholds. We computed both training and testing mAP at IoU thresholds of 0.25, 0.5 and 0.75; the testing results are shown in Table 4.
Table 4: Testing mAP on the OAWD dataset at different IoU thresholds.

Network | Testing mAP@0.25 | Testing mAP@0.5 | Testing mAP@0.75
---|---|---|---
Faster R-CNN | 86.8 | 81.2 | 58.50
Proposed Model OAWD Classification | 87.20 | 82.90 | 60.5
Proposed Model OAWD Regression | 88 | 81 | 59.5
Table 4 shows that the testing mAP of OAWD using regression at IoU 0.25 is higher than that of the other two models, while the testing mAP of OAWD using classification is higher than the other two models at IoU 0.5 and 0.75.

Figure 9 shows a comparison of the qualitative results of the three models, from left to right: Faster R-CNN, OAWD using angle classification, and OAWD using angle regression. The first two pictures show good results for both oriented models, while the OAWD regression model gives a slightly worse result on the third picture. From the results and comparison given above, it can be seen that the oriented models give better results than the horizontal model. With an oriented model we can detect only the object itself, with less background information than the horizontal model; in other words, the oriented model focuses more on the object and gives a higher foreground-to-background ratio.
5.3 Ablation Study
We tested our model using different parameters. First, we tried to train the model on a GPU with 4 GB of memory, but the training did not give good results; a GPU with around 11 GB of memory is needed for training. Below we discuss, step by step, the parameters at which we evaluated our model.
1. Using ResNet as backbone network: We also used ResNet as the backbone network in our architecture for feature extraction and compared the results with the base model. As can be seen in Table 5, using ResNet as the backbone does not give good results. Although the ResNet architecture is deeper than VGG, here we obtained better results using VGG as the backbone; when the dataset is simple and relatively small, VGG tends to give better results.
Table 5: Comparison of quantitative results using VGG and ResNet as backbone network on the OAWD dataset.

Network | Backbone Network | Testing (mAP)
---|---|---
Faster R-CNN | VGG16 | 72.98
Proposed Model OAWD Classification | VGG16 | 82.90
Proposed Model OAWD Regression | VGG16 | 81
Proposed Model OAWD Classification | ResNet | 73.70
Proposed Model OAWD Regression | ResNet | 73.61
2. Using Different Numbers of Epochs: We trained our model with different numbers of epochs. First we trained the model for 100 epochs and the mean average precision was very low. We then increased the number of epochs to 150; the model performed better than with 100 epochs, but it was still learning and the validation loss was still decreasing. We therefore set the number of epochs to 200, which gave very good results. We also trained for 250 and 300 epochs, but the precision decreased, so we kept the epoch value of 200, which gives the best result.
3. Using Different Learning Rate Values: We also varied the learning rate to check at which value the model learns and performs best. When we changed the learning rate from 1e-5 to 1e-4 or to even lower values, the model gave worse results. We also set the learning rate higher than 1e-5, and the precision did not improve but decreased. A learning rate of 1e-5 gives the best results.
4. Using Different IoU Values: We set different IoU thresholds for non-maximum suppression and checked at which value the model gives good results. Table 4 shows the results at the 0.25, 0.5 and 0.75 IoU values. At an IoU of 0.25 the model gives good numbers, but this setting is not accurate for a real-time scenario, where the model would not perform well. Normally an IoU value of 0.5 is used as the standard to check how accurately a model is working.
6 Conclusion & Future Directions
In this research work, we proposed a novel OAWD algorithm for detecting weapons, mainly guns and pistols, along with their orientation in visual data. OAWD is trained on our own dataset using horizontal bounding boxes and angle information. The orientation prediction is formulated both as a classification problem, by dividing the angle range into eight classes, and as a regression problem, by normalizing the angle to the range from zero to one. OAWD (classification) predicts the bounding box offsets, the object class, and the angle class for every input object proposal, while OAWD (regression) predicts the bounding box offsets plus an angle offset and the object class. The proposed OAWD model therefore provides orientation not only at some specific angles but at any angle of an object in the range from zero to 180 degrees. For training and assessment of the proposed model, a new weapon dataset comprising around 6400 annotated images has been gathered, which will soon be publicly available for further research. The models discussed in the literature review, and the model we compare against, provide horizontal bounding boxes which contain more background information, whereas our proposed model localizes objects better and provides oriented bounding boxes. OAWD is compared with Faster RCNN in a wide range of experiments, and the proposed detector has demonstrated improved detection and localization performance for the task of weapon detection.
In future work, the dataset can be enlarged and more classes can be included. Different types of weapons like tanks, hand grenades, etc. can also be added to the dataset, and every type of gun and rifle can be treated as its own class. With a 3-D dataset for orientation, deep learning techniques could determine whether a weapon is pointed towards a human or not. With a dataset that captures hidden objects, weapons could be detected more easily, assisting law enforcement agencies in taking timely action. The accuracy of object detection can also be increased by rotating the feature map by the predicted angle and then applying the fully connected layers for classification and regression to the rotated feature map. Regression can also be introduced in the Region Proposal Network module to obtain rotated regions of interest at every angle.
References
- (1) Moian, Daniela, Mbeanu and Oana, Walking the Road of Education Under Gun Threat: Case of the USA, The 20th Students' International Conference, page 49, 2018
- (2) Morris and Sam, Mass shootings in the US: There have been 1,624 in 1,870 days, The Guardian, 2018
- (3) Planty, Michael and Truman, JL, Firearm Violence, 1993-2011 (NCJ 241730), Special report prepared by the Bureau of Justice Statistics, Washington, DC, 2013
- (4) Liu, Wei and Liao, Shengcai and Hu, Weidong, Towards accurate tiny vehicle detection in complex scenes, Neurocomputing, 347, 24–33, 2019, Elsevier
- (5) Zhang, Liliang and Lin, Liang and Liang, Xiaodan and He, Kaiming, Is faster R-CNN doing well for pedestrian detection?, European conference on computer vision, 443–457, 2016, Springer
- (6) Zhou, Yanzhao and Ye, Qixiang and Qiu, Qiang and Jiao, Jianbin, Oriented response networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 519–528, 2017
- (7) Azimi, Seyed Majid and Vig, Eleonora and Bahmanyar, Reza and Körner, Marco and Reinartz, Peter, Towards multi-class object detection in unconstrained remote sensing imagery, Asian Conference on Computer Vision, 150–165, 2018, Springer
- (8) Ma, Jianqi and Shao, Weiyuan and Ye, Hao and Wang, Li and Wang, Hong and Zheng, Yingbin and Xue, Xiangyang, Arbitrary-oriented scene text detection via rotation proposals, IEEE Transactions on Multimedia, 20, 11, 3111–3122, 2018, IEEE
- (9) Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li,Imagenet: A large-scale hierarchical image database, 2009 IEEE conference on computer vision and pattern recognition, 248–255, 2009, Ieee
- (10) Olmos, Roberto and Tabik, Siham and Herrera, Francisco,Automatic handgun detection alarm in videos using deep learning, Neurocomputing, 275, 66–72, 2018, Elsevier
- (11) Girshick, Ross and Donahue, Jeff and Darrell, Trevor and Malik, Jitendra,Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE conference on computer vision and pattern recognition, 580–587, 2014
- (12) Redmon, Joseph and Farhadi, Ali,YOLO9000: better, faster, stronger, Proceedings of the IEEE conference on computer vision and pattern recognition, 7263–7271, 2017
- (13) Dai, Jifeng and Li, Yi and He, Kaiming and Sun, Jian,R-fcn: Object detection via region-based fully convolutional networks, Advances in neural information processing systems, 379–387, 2016
- (14) Akcay, Samet and Kundegorski, Mikolaj E and Willcocks, Chris G and Breckon, Toby P,Using deep convolutional neural network architectures for object classification and detection within x-ray baggage security imagery, IEEE transactions on information forensics and security, 13, 9, 2203–2215, 2018, IEEE
- (15) Redmon, Joseph and Farhadi, Ali,Yolov3: An incremental improvement, arXiv preprint arXiv:1804.02767, 2018
- (16) Han, Song and Mao, Huizi and Dally, William J,Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, arXiv preprint arXiv:1510.00149, 2015
- (17) Jaffe, Susan, Gun violence research in the USA: the CDC’s impasse, The Lancet, 391, 10139, 2487–2488, 2018, Elsevier
- (18) Velastin, Sergio A and Boghossian, Boghos A and Vicencio-Silva, Maria Alicia, A motion-based image processing system for detecting potentially dangerous situations in underground railway stations, Transportation Research Part C: Emerging Technologies, 14, 2, 96–113, 2006, Elsevier
- (19) Buyer beware, Ainsworth, Trevor, Security Oz, 19, 18–26, 2002
- (20) Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C, Ssd: Single shot multibox detector, European conference on computer vision, 21–37, 2016, Springer
- (21) Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian,Faster r-cnn: Towards real-time object detection with region proposal networks, Advances in neural information processing systems, 91–99, 2015
- (22) Redmon, Joseph and Divvala, Santosh and Girshick, Ross and Farhadi, Ali, You only look once: Unified, real-time object detection, Proceedings of the IEEE conference on computer vision and pattern recognition, 779–788, 2016
- (23) Hu, Xiaowei and Xu, Xuemiao and Xiao, Yongjie and Chen, Hao and He, Shengfeng and Qin, Jing and Heng, Pheng-Ann, SINet: A scale-insensitive convolutional neural network for fast vehicle detection, IEEE transactions on intelligent transportation systems, 20, 3, 1010–1019, 2018, IEEE
- (24) Li, Jianan and Liang, Xiaodan and Shen, ShengMei and Xu, Tingfa and Feng, Jiashi and Yan, Shuicheng, Scale-aware fast R-CNN for pedestrian detection, IEEE transactions on Multimedia, 20, 4, 985–996, 2017, IEEE
- (25) Li, Jianan and Liang, Xiaodan and Wei, Yunchao and Xu, Tingfa and Feng, Jiashi and Yan, Shuicheng, Perceptual generative adversarial networks for small object detection, Proceedings of the IEEE conference on computer vision and pattern recognition, 1222–1230, 2017
- (26) Fu, Cheng-Yang and Liu, Wei and Ranga, Ananth and Tyagi, Ambrish and Berg, Alexander C,Dssd: Deconvolutional single shot detector, arXiv preprint arXiv:1701.06659, 2017
- (27) Hu, Peiyun and Ramanan, Deva,Finding tiny faces, Proceedings of the IEEE conference on computer vision and pattern recognition, 951–959, 2017
- (28) Grega, Michał and Matiolański, Andrzej and Guzik, Piotr and Leszczuk, Mikołaj,Automated detection of firearms and knives in a CCTV image, Sensors, 16, 1, 47, 2016, Multidisciplinary Digital Publishing Institute
- (29) Lin, Tsung-Yi and Goyal, Priya and Girshick, Ross and He, Kaiming and Dollár, Piotr,Focal loss for dense object detection, Proceedings of the IEEE international conference on computer vision, 2980–2988, 2017
- (30) Tiwari, Rohit Kumar and Verma, Gyanendra K,A computer vision based framework for visual gun detection using harris interest point detector, Procedia Computer Science, 54, 703–712, 2015, Elsevier
- (31) Harris, Christopher G and Stephens, Mike and others,A combined corner and edge detector., Alvey vision conference, 15, 50, 10–5244, 1988, Citeseer
- (32) Alahi, Alexandre and Ortiz, Raphael and Vandergheynst, Pierre,Freak: Fast retina keypoint, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 510–517, 2012, Ieee
- (33) Liu, Yong and Wang, Ruiping and Shan, Shiguang and Chen, Xilin,Structure inference net: Object detection using scene-level context and instance-level relationships, Proceedings of the IEEE conference on computer vision and pattern recognition, 6985–6994, 2018
- (34) Chen, Zhe and Huang, Shaoli and Tao, Dacheng,Context refinement for object detection, Proceedings of the European Conference on Computer Vision (ECCV), 71–86, 2018
- (35) Huang, Jonathan and Rathod, Vivek and Sun, Chen and Zhu, Menglong and Korattikara, Anoop and Fathi, Alireza and Fischer, Ian and Wojna, Zbigniew and Song, Yang and Guadarrama, Sergio and others,Speed/accuracy trade-offs for modern convolutional object detectors, Proceedings of the IEEE conference on computer vision and pattern recognition, 7310–7311, 2017
- (36) Uijlings, Jasper RR and Van De Sande, Koen EA and Gevers, Theo and Smeulders, Arnold WM,Selective search for object recognition, International journal of computer vision, 104, 2, 154–171, 2013, Springer
- (37) Furey, Terrence S and Cristianini, Nello and Duffy, Nigel and Bednarski, David W and Schummer, Michel and Haussler, David,Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, 16, 10, 906–914, 2000, Oxford University Press
- (38) Lin, Tsung-Yi and Dollár, Piotr and Girshick, Ross and He, Kaiming and Hariharan, Bharath and Belongie, Serge,Feature pyramid networks for object detection, Proceedings of the IEEE conference on computer vision and pattern recognition, 2117–2125, 2017
- (39) Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Dollár, Piotr and Zitnick, C Lawrence,Microsoft coco: Common objects in context, European conference on computer vision, 740–755, 2014, Springer
- (40) Lowe, David G,Distinctive image features from scale-invariant keypoints, International journal of computer vision, 60, 2, 91–110, 2004, Springer
- (41) Iqbal, Javed and Munir, Muhammad Akhtar and Mahmood, Arif and Ali, Afsheen Rafaqat and Ali, Mohsen,Orientation Aware Object Detection with Application to Firearms, arXiv preprint arXiv:1904.10032, 2019
- (42) Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei,DOTA: A large-scale dataset for object detection in aerial images, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3974–3983, 2018
- (43) Gong, Tao and Liu, Bin and Chu, Qi and Yu, Nenghai,Using multi-label classification to improve object detection, neurocomputing, 370, 174–185, 2019, Elsevier
- (44) Liu, Zikun and Wang, Hongzhen and Weng, Lubin and Yang, Yiping,Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds, IEEE Geoscience and Remote Sensing Letters, 13, 8, 1074–1078, 2016, IEEE
- (45) Kuznetsova, Alina and Rom, Hassan and Alldrin, Neil and Uijlings, Jasper and Krasin, Ivan and Pont-Tuset, Jordi and Kamali, Shahab and Popov, Stefan and Malloci, Matteo and Kolesnikov, Alexander and others,The open images dataset v4, International Journal of Computer Vision, 1–26, 2020, Springer
- (46) Fei-Fei, Li,ImageNet: crowdsourcing, benchmarking & other cool things, CMU VASC Seminar, 16, 18–25, 2010
- (47) Jennifer Calfas, Texas Was the 22nd School Shooting This Year, https://time.com/5282496/santa-fe-high-school-shooting-2018/, 2018, ”[Online; accessed 13-August-2020]”
- (48) BBC, America’s gun culture in charts, https://www.bbc.com/news/world-us-canada-41488081, 2019, ”[Online; accessed 13-August-2020]”
- (49) The New York Times, Opinion, Restore funding for gun violence research , https://www.nytimes.com/2018/11/06/ opinion/letters/gun violence-research.html, 2018, ”[Online; accessed 13-August-2020]”
- (50) Evil Tim, The IMFDB Internet Movie Firearms Database, http://www.imfdb.org/wiki/Category:Gun, 2011, ”[Online; accessed 25-November-2019]”
- (51) Nick Enoch, Silver shop robbery goes wrong when customers fight back , https://www.dailymail.co.uk/news/article-4974230/Seville-shop-robbery-goes-wrong-customers-fight-back.html, 2017, ”[Online; accessed 08-August-2020]”
- (52) cgvict, roLabelImg, https://github.com/cgvict/roLabelImg, 2017, ”[Online; accessed 23-June-2020]”
- (53) S. Sasank, Guns Object Detection, https://www.kaggle.com/issaisasank/guns-object-detection, 2019, ”[Online; accessed 23-June-2020]”
- (54) He, Kaiming and Gkioxari, Georgia and Dollár, Piotr and Girshick, Ross,Mask r-cnn, Proceedings of the IEEE international conference on computer vision, 2961–2969, 2017
- (55) Xiao, Youzi and Tian, Zhiqiang and Yu, Jiachen and Zhang, Yinshu and Liu, Shuai and Du, Shaoyi and Lan, Xuguang, A review of object detection based on deep learning, Multimedia Tools and Applications, 1–63, 2020, Springer
- (56) Lee, Dong-Hyun, CNN-based single object detection and tracking in videos and its application to drone detection, Multimedia Tools and Applications, 1–12, 2020, Springer