AI-based System for Automatic Detection and Recognition of Weapons in Surveillance Video.
This is a collaborative research project between a company called Integrated and Hong Kong Metropolitan University. The research project aims to design an AI-based software function that can automatically detect and recognize firearms, knives, and other weapons in streaming video captured by surveillance cameras.
Security cameras and video surveillance systems have become important infrastructure for ensuring the safety and security of the general public. However, the detection of high-risk situations through these systems is still performed manually in many cities. The lack of manpower in the security sector and the limited performance of human operators may result in dangers going undetected or threats being detected late, posing risks to the public. In response, various parties have developed real-time, automated solutions for identifying risks in surveillance videos. The aim of this work is to develop a low-cost, efficient, artificial-intelligence-based solution for the real-time detection and recognition of weapons in surveillance videos under different scenarios. The system was developed based on TensorFlow and preliminarily tested with a 294-second video showing 7 weapons within 5 categories: handgun, shotgun, automatic rifle, sniper rifle, and submachine gun. At intersection over union (IoU) values of 0.50 and 0.75, the system achieved a precision of 0.8524 and 0.7006, respectively.
At present, artificial intelligence is no longer an unfamiliar term. From AlphaGo to face-scan payment, drones, and driverless vehicles, applications of artificial intelligence have gradually reached many aspects of daily life. Safety is undoubtedly one of the most important of these aspects, and the technology is becoming more mature. Nevertheless, today's mainstream security methods cannot cope with the numerous security risks posed by technological development. The problem is that it is difficult to find a safe, effective, and low-cost way to ensure security.
Artificial intelligence can be used to improve this situation. In Hong Kong, surveillance cameras are already widespread, with a total of 24,591 in operation. This project plans to combine artificial intelligence with the existing surveillance cameras to automatically detect and recognize firearms, knives, and other weapons in the streaming video they capture.
The figure above shows the functional blocks of the proposed system. After video is captured by the surveillance camera, it is passed to the keyframe extraction subsystem, which reduces the data size by selecting keyframes so that the subsequent steps can run in real time. The extracted frames are then input into the weapon detection algorithm, and the detected weapons are classified and labeled.
- Mode 1: Energy-saving mode. Method:
- Load the surveillance video
- Input the video into the keyframe extraction system
- Detect the extracted images
- Label & classify the weapons
- Purpose: Within a typical surveillance video, most frames are identical because the camera location and background are fixed. Detecting frame by frame in this case wastes computing resources, so with the keyframe extraction system the detector runs only when the surveillance video content changes.
- Applicable environment: Suitable for night-time, when the flow of people is low, or for low-security-risk areas.
- Advantage: Short detection time and reduced consumption of computing resources.
- Disadvantage: Carries a certain security risk.
- Mode 2: High-performance mode. Method:
- Load the real-time surveillance video
- Take the video as input and perform real-time, frame-by-frame detection directly
- Label & classify the weapons
- Purpose: Provide the highest level of security and eliminate all dangerous situations.
- Applicable environment: Densely populated areas or peak periods, or high-security-risk areas.
- Advantage: Provides the highest level of protection.
- Disadvantage: Consumes considerable computing power and resources; compared with the energy-saving mode, the detection time is longer.
According to the flowchart, when the system starts running, it first loads the surveillance video captured by the IP camera. Surveillance video usually contains many worthless segments, which would increase the amount of computation and slow down detection. The system therefore does not run detection on every frame of the video; instead, it takes the images of valid frames as input to speed up detection.
When an image enters the system, the system performs grayscale conversion to remove the background features of the image and convert it to black and white. After this preprocessing, the system detects the objects in the image. Through the Single Shot MultiBox Detector (SSD) MobileNet algorithm, the system extracts features from the objects. After extracting the features, the system solves the identification problem and outputs the category each object belongs to. When an object is identified as a weapon, the system locates and labels it.
A surveillance video entering the system contains many frames, and the scene in many of them is unchanged. Analyzing every frame one by one would increase the amount of computation and reduce detection efficiency. So, for the expected result, the system extracts valid frames to preprocess the video, for example by capturing the screen every 20 frames.
In this project, the keyframe extraction algorithm is based on the interframe difference. The video sequence captured by the camera is continuous: if there are no moving objects in the scene, the change between consecutive frames is very small; if there are moving objects, there are significant changes between successive frames. Based on this concept, the interframe difference is used to extract keyframes from the surveillance video.
Consider frames $k$ and $k-1$ of the video sequence, denoted $f_k$ and $f_{k-1}$. The grayscale values of the corresponding pixel points of the two frames are denoted $f_k(x, y)$ and $f_{k-1}(x, y)$. Subtracting them and taking the absolute value gives the difference image $D_k$:

$$D_k(x, y) = \left| f_k(x, y) - f_{k-1}(x, y) \right|$$

Set a segmentation threshold $T_1$; the difference image is then subjected to threshold segmentation to obtain a binarized foreground image:

$$R_k(x, y) = \begin{cases} 255, & D_k(x, y) > T_1 \\ 0, & D_k(x, y) \le T_1 \end{cases}$$

In this formula, $R_k(x, y)$ is the binary foreground image. If the value of a point in the interframe difference image is not greater than the threshold $T_1$, the point is assigned gray value 0 and judged to be a background point; if $D_k(x, y)$ is greater than $T_1$, the point is judged to be a foreground (moving-target) point. The operation flow of the interframe difference is shown in the figure below.
In this project, the local maximum algorithm based on interframe difference is used to extract keyframes.
The frames at which the average interframe difference is a local maximum are considered to be keyframes.
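As an illustration of this subsection, the following is a minimal sketch of keyframe extraction by interframe difference, keeping the frames at which the average difference is a local maximum. It assumes OpenCV and is not the project's actual code; the function names and the threshold value T1 are illustrative.

import cv2

T1 = 25  # segmentation threshold T_1 (illustrative value)

def average_interframe_differences(video_path):
    """Return the video frames and the mean binarized difference per frame pair."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    frames, diffs = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        d = cv2.absdiff(gray, prev_gray)                      # D_k(x, y)
        _, fg = cv2.threshold(d, T1, 255, cv2.THRESH_BINARY)  # R_k(x, y)
        frames.append(frame)
        diffs.append(float(fg.mean()))  # average interframe difference
        prev_gray = gray
    cap.release()
    return frames, diffs

def keyframes_by_local_maxima(frames, diffs):
    """Keep frames whose average interframe difference is a local maximum."""
    return [frames[i] for i in range(1, len(diffs) - 1)
            if diffs[i - 1] < diffs[i] > diffs[i + 1]]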
The system was developed based on TensorFlow, an open-source platform for machine learning; the Single Shot MultiBox Detector (SSD), a popular object detection algorithm; and MobileNet, a neural network (NN) for producing high-level features. The detection model is SSD-MobileNet, which combines the SSD detector with the MobileNet lightweight deep neural network. In SSD-MobileNet, MobileNet serves as the feature extractor, producing features at different scales in the convolutional layers; the SSD algorithm then uses the extracted feature information to locate and classify the detected object(s).
The figure shows the operation flow of object detection. First, the system obtains an image, crops it, and inputs it into the object detection subsystem. The image then enters the SSD-MobileNet model, whose trained parameters are deployed through TensorFlow. Finally, the weapon detection result is obtained.
In this project, the neural network is trained using supervised classification learning in two basic steps:
- A dataset of each weapon is created from the COCO dataset.
- The above dataset is used to train the weapons detector by fine-tuning an existing pre-trained model.
The COCO dataset project is a large visual database for visual object recognition research. In this project, the data are 1,218 machine gun images from the COCO dataset. The result of fine-tuning is a more efficient model that can detect more weapons. Below is a brief overview of how to fine-tune existing models for custom objects to meet the needs of detecting weapons:
- Use the images extracted by the keyframe extraction function as the train and test images.
- Use the software 'LabelImg' to manually label the train images, creating an XML file that describes the objects in each picture. This is called labeling.
- Convert the XML files to CSV files.
- Generate the TFRecord files from the CSV files.
- Set up the configuration file for the model. The initial configuration parameter settings are shown in Table 1.
- Train the model by using the TFRecord files of the keyframe-extraction output images and the configuration file.
TABLE 1
Name | Parameter |
---|---|
Pretrained model | ssd_mobilenet_v1_coco |
num_classes | 6 |
matched_threshold | 0.5 |
unmatched_threshold | 0.5 |
batch_size | 5 |
initial_learning_rate | 0.004 |
max_detections_per_class | 100 |
max_total_detections | 100 |
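For orientation, the Table 1 values sit in the ssd_mobilenet_v1_coco pipeline config roughly as follows. This is an abridged sketch following the field nesting of the stock sample config distributed with the Object Detection API; ellipses mark the many omitted settings.

model {
  ssd {
    num_classes: 6
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ...
      }
    }
    post_processing {
      batch_non_max_suppression {
        max_detections_per_class: 100
        max_total_detections: 100
        ...
      }
    }
    ...
  }
}
train_config: {
  batch_size: 5
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          ...
        }
      }
    }
  }
  ...
}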
Table 2 summarizes the software environment used for developing the system’s functions.
TABLE 2
Name | Version |
---|---|
Operating System | Microsoft Windows 10 (64-bit) |
Python | 3.7.3 |
TensorFlow | 1.14.0 |
Graphics Driver | Radeon Software Adrenalin 19.20 |
Conda | 4.7.12 |
- Install Python
- Download Anaconda3, make its Python (3.7.3) the default version, and configure the environment variables.
- Install TensorFlow (the CPU version).
- Open the Command Prompt through Start -> Anaconda3 -> Anaconda Prompt and run:
pip install tensorflow==1.14.0
- Download the TensorFlow Object Detection API (the models repository).
Note: this system does not support TensorFlow version 2.0 or later, which is why the version is pinned above.
- Download Protocol Buffers
- After unzipping, put protoc.exe from the bin folder into C:\Windows
- Add the full directories of the Models/Research/ and Models/Research/Slim folders to PYTHONPATH in the system variables of Windows
- Test the TensorFlow Object Detection API by running the following command under the Models/Research/ folder
python object_detection/builders/model_builder_test.py
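Note: before this test will pass, the Protobuf definitions that ship with the Object Detection API normally have to be compiled with the protoc.exe installed above. A minimal sketch, run from the Models/Research/ folder (on Windows the wildcard may need to be expanded into individual .proto files):

protoc object_detection/protos/*.proto --python_out=.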
- Use the images extracted by the keyframe extraction function as the train and test images.
- Use the software 'LabelImg' to manually label the train images, creating an XML file that describes the objects in each picture. This is called labeling.
Figure 3.3 shows part of the training dataset with the corresponding XML files after labeling.
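For reference, LabelImg saves each annotation in the Pascal VOC XML format; a minimal sketch with an illustrative filename, image size, and box coordinates:

<annotation>
    <folder>images</folder>
    <filename>gun_001.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>Handgun</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>120</xmin>
            <ymin>85</ymin>
            <xmax>260</xmax>
            <ymax>200</ymax>
        </bndbox>
    </object>
</annotation>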
- TensorFlow requires Google's specific data format, so we need to convert our data into the TFRecord format.
- Use gun_xml_to_csv to convert the XML files to CSV files (a sketch of this step is given at the end of this subsection).
- Generate the TFRecord files from the CSV files through generate_TFR.
- Create the PBTXT file of the label map, which contains the weapon names and their corresponding IDs:
item {
name: "Sniper rifle"
id: 1
}
item {
name: "Automatic Rifle"
id: 2
}
item {
name: "Submachine gun"
id: 3
}
item {
name: "Shotgun"
id: 4
}
item {
name: "Handgun"
id: 5
}
item {
name: "Knife"
id: 6
}
- The weapon names and IDs in generate_TFR should be consistent with the label map:
# Map a label-map class name to its numeric ID.
def class_text_to_int(row_label):
    if row_label == 'Sniper rifle':
        return 1
    elif row_label == 'Automatic Rifle':
        return 2
    elif row_label == 'Submachine gun':
        return 3
    elif row_label == 'Shotgun':
        return 4
    elif row_label == 'Handgun':
        return 5
    elif row_label == 'Knife':
        return 6
    else:
        return None
- Run generate_TFR with the commands:
# From tensorflow/models/
# Create train data:
python generate_TFR.py --csv_input=data/gun_train.csv --output_path=data/gun_train.record
# Create test data:
python generate_TFR.py --csv_input=data/gun_test.csv --output_path=data/gun_test.record
- So far, the preparation of all the training and test data has been completed.
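As a reference for the XML-to-CSV step listed above, here is a minimal sketch of what a converter script such as gun_xml_to_csv might contain. It assumes the Pascal VOC XML layout written by LabelImg and the column names used by common TensorFlow Object Detection tutorials; the paths are illustrative.

import glob
import xml.etree.ElementTree as ET
import pandas as pd

def xml_to_csv(xml_dir):
    """Collect one row per labeled bounding box from every XML file."""
    rows = []
    for xml_file in glob.glob(xml_dir + '/*.xml'):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            rows.append((root.find('filename').text,
                         int(size.find('width').text),
                         int(size.find('height').text),
                         obj.find('name').text,
                         int(box.find('xmin').text),
                         int(box.find('ymin').text),
                         int(box.find('xmax').text),
                         int(box.find('ymax').text)))
    columns = ['filename', 'width', 'height', 'class',
               'xmin', 'ymin', 'xmax', 'ymax']
    return pd.DataFrame(rows, columns=columns)

# Illustrative usage:
xml_to_csv('images/train').to_csv('data/gun_train.csv', index=False)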
Find a sample configuration file among the TensorFlow model configs.
- In this system, ssd_mobilenet_v1_coco is adopted.
- Download ssd_mobilenet_v1_coco.config, put the file into the training folder, and open it with a text editor.
- Search for "PATH_TO_BE_CONFIGURED" to find the fields that should be configured.
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/\\models\\research\\object_detection\\data\\gun_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/\\models\\research\\object_detection\\data\\gun.pbtxt"
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/\\models\\research\\object_detection\\data\\gun_test.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/\\models\\research\\object_detection\\data\\gun.pbtxt"
shuffle: false
num_readers: 1
}
- Change num_classes according to the practical situation.
- In this system, the number of weapon categories is 6:
num_classes: 6
- If you do not have sufficient graphics memory, reduce the batch_size parameter:
train_config: {
batch_size: 5
- In the newer versions of TensorFlow, the training file is:
model_main.py
- Train the model through the following command, setting the number of training steps to 200,000 and the number of evaluation steps to 6,000:
# From the tensorflow/models/research/ directory
python object_detection/model_main.py \
--pipeline_config_path=object_detection/training2/ssd_mobilenet_v1_coco.config \
--model_dir=object_detection/training2 \
--num_train_steps=200000 \
--num_eval_steps=6000 \
--alsologtostderr
- The training process can be viewed through the Command Prompt.
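Training progress (for example, the loss curves) can also typically be monitored with TensorBoard, which is installed together with TensorFlow. A minimal sketch, assuming the model directory used in the training command above:

tensorboard --logdir=object_detection/training2

Then open the printed local URL (by default http://localhost:6006) in a browser.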
After the training, we can test the model. Find the export_inference_graph.py file under the models/research/object_detection folder. To run this file, you need to pass in the config file and the parameters associated with the checkpoint.
- Open the Anaconda Prompt, navigate to the models/research/object_detection folder, and run the command:
python export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path training/ssd_mobilenet_v1_coco.config \
--trained_checkpoint_prefix training/model.ckpt-200000 \
--output_directory knife_detection
- For --trained_checkpoint_prefix training/model.ckpt-200000, the number after .ckpt represents the checkpoint; check the state of your own trained model under the training folder and fill in the corresponding number (if there is more than one, choose the largest).
- --output_directory knife_detection is the output directory. After running, you will find a set of files under the knife_detection folder, including saved_model, checkpoint, and frozen_inference_graph.pb. The frozen graph is the trained model.
- After obtaining frozen_inference_graph.pb, we can use gun_image_testing to perform weapon image detection and gun_video_testing to perform real-time weapon detection, as sketched below.
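As an illustration of the image-testing step, here is a minimal sketch of running detection with the exported frozen graph, roughly what a script like gun_image_testing would do. The image path and score threshold are illustrative; the tensor names are the standard ones exported by the TF1 Object Detection API.

import cv2
import numpy as np
import tensorflow as tf

PATH_TO_GRAPH = 'knife_detection/frozen_inference_graph.pb'

# Load the frozen graph into a new TF graph.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    image = cv2.imread('test_images/example.jpg')
    # The exported model expects a batch of RGB images (uint8).
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)[np.newaxis, ...]
    boxes, scores, classes, num = sess.run(
        [graph.get_tensor_by_name(name + ':0')
         for name in ('detection_boxes', 'detection_scores',
                      'detection_classes', 'num_detections')],
        feed_dict={graph.get_tensor_by_name('image_tensor:0'): rgb})
    for box, score, cls in zip(boxes[0], scores[0], classes[0]):
        if score > 0.5:  # confidence threshold (illustrative)
            print('class id %d, score %.2f, box %s' % (cls, score, box))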
A low-cost, efficient, artificial-intelligence-based solution for the real-time detection and recognition of weapons in surveillance videos has been developed. The system can detect weapons within 6 categories: handgun, shotgun, automatic rifle, sniper rifle, knife, and submachine gun. At intersection over union (IoU) values of 0.50 and 0.75, the system achieved a precision of 0.8524 and 0.7006, respectively. In energy-saving mode, the time for the system to detect a single image is 1.10 seconds. In high-performance mode, the system achieves real-time detection and gives weapon detection results synchronized with the surveillance video: for a surveillance video with a duration of 148 seconds and a size of 19.8 MB (20,807,246 bytes), the system's reaction time for detecting weapons is also 148 seconds.
- Detection result-automatic rifle
- Detection result-knife
- Screen capture of real-time detection