Proposed Face Detection Classification Model Based on Amazon Web Services Cloud (AWS)

One of the most important features of the Amazon Web Services (AWS) cloud is that a program can be run and accessed from any location. Results can be accessed and monitored remotely, many images can be stored, and computation is faster. This work proposes a face detection classification model based on the AWS cloud that aims to classify faces into two classes: a non-permission class


INTRODUCTION
The massive adoption and expansion of the Internet of Things (IoT) across industries generates massive amounts of data. In hospital surveillance applications, for instance, IoT devices such as cameras generate a tremendous number of images. Face recognition is crucial for safeguarding medical facilities, detecting patient fraud, analyzing hospital traffic patterns, and analyzing patients' emotions and sentiments. Automatic and intelligent face recognition systems achieve high accuracy in a controlled environment but low accuracy in an uncontrolled one. These systems must also function in real time for various applications, including smart healthcare. A deep tree-based method for cloud-based facial recognition has been studied; the suggested deep model uses less computing time without compromising accuracy. In this model, the input volume is divided into several volumes, and a tree is generated for each volume (Masud et al., 2020; Joodi, 2023). Facial recognition combined with cloud-based mobile edge computing has been suggested to provide an immersive online biometric authentication method for online guiding. A combination of technologies was used to create an effective model, and the suggested framework was verified against various state-of-the-art methods (Saad, 2018; Su, 2021). A cloud-based system was used to share computational resources for artificial neural networks (ANNs) to reduce redundant computation. A cloud-based intelligent monitoring system built on hybrid convolutional neural networks (CNNs) was presented to provide intelligent monitoring services such as recognizing strangers, facial expressions, and activities (Yong et al., 2016). A cloud-based deep learning face video retrieval system has also been presented. A dataset is first collected and preprocessed: to generate a viable dataset for the CNN models, blurry photos are eliminated, and face alignment is performed on the remaining images.
The resulting dataset is then used to pre-train the CNN models (VGGFace, ArcFace, and FaceNet) for face recognition (Jabbar, 2018; Lin et al., 2020). Another study aims to develop and implement automatic customer face recognition on a cloud-computing infrastructure. This platform employs cloud-based features to overcome the resource and scalability limitations of traditional systems, which rely on local computing capacity. When a customer is detected, the front-end device takes his or her photograph. The image is saved in the cloud and analyzed through Software as a Service (SaaS) (Dersingh, 2016). A method based on a cloud Hopfield neural network (CHNN) is described for identifying low-resolution grayscale face images. This strategy comprises three steps: first, Otsu's method is used to convert grayscale face images into binary face images; then, the Hebb rule is used to store the binary faces in the weight matrix of the network; and lastly, the CHNN retrieval algorithm is used to return the correct face from a deformed face. In contrast to typical asynchronous retrieval, in which only a single neuron is updated at a time, CHNN consists of clouds containing several distinct neurons that are updated asynchronously (Neha, 2016). Several emerging technologies are discussed in the context of the smart hotel, including artificial intelligence (AI), face recognition, key algorithms, cloud computing, big data, and the Internet of Things. People's faces are studied from the perspective of the smart hotel so that facial recognition technology may be applied to them (Chen, 2020). A real-time attendance tracking system employs a remotely operable web application, using a local server and the Amazon Web Services (AWS) Rekognition application programming interface (API). The first method consists of face detection, preprocessing, training, and face recognition, after which attendance is recorded and sent to each teacher.
The second method is based on the AWS Rekognition API, which analyzes cloud-based data (Pattnaik, 2020).
In an anomaly-based technique for unauthorized entry detection and signature analysis, a face recognition algorithm running on the AWS cloud is used to tell an authorized person from an intruder. This improves the accuracy of authorizing the legitimate person and granting access to the private or personal zone, and it lowers the risk of sending false alerts or alarms (Mahendra, 2020). AWS DeepLens, an embedded device built for machine learning, has been used to compare the performance of an AI-trained model running on Amazon Web Services (AWS) cloud computing services against on-premises deep learning performance (Gregorius Rafael, 2020; Jabbar, 2021). Based on a thorough examination of cloud quality of service (QoS) and media application needs, a performance evaluation technique for the image and video services offered by various cloud platforms was developed. The most popular image and video cloud services at present include facial recognition, image analysis, optical character recognition, video on demand, live streaming, and transcoding (Xue et al., 2018). Face recognition and Amazon Web Services such as S3 and QuickSight were used to create an automatic employee attendance system, in which the opening and closing of the doors is controlled by the face recognition results (Sharma et al., 2020a). A visitor rating system was constructed using a Raspberry Pi with a Raspberry Pi camera. The system relies mainly on the cloud services provided by AWS for storing and evaluating the received ratings, and it uses a facial emotion recognition model to obtain real ratings from real visitors (Sharma et al., 2020b). Another system uses an OpenCV module for image recognition to create a face recognition application for check-in. OpenCV is the principal recognition control module, detecting and tracking the target face using image processing.
Meanwhile, Baidu Cloud holds the face database and provides face recognition and matching scores for face photos (Feng et al., 2021). Another study examines real-time security and surveillance systems that use cloud-based facial recognition, focusing on cloud architecture (Jha et al., 2022). Our faces change with age, yet the photographs in a database remain unchanged. The precision of the Residual Network (ResNet) for cross-age face recognition has been investigated: cross-age reference coding (CARC), Amazon Web Services (AWS) Rekognition, and other approaches are compared on the cross-age celebrity dataset (CACD) and its verification subset (CACD-VS) (Babbar, 2019). The use of a Haar cascade with a CNN is proposed for face detection. The Haar cascade is a method for detecting faces quickly and in real time. The CNN applies the convolution process by sliding a convolution kernel of a certain size across the image and multiplying each image region by the filter being used (Asmara, 2021). Deep learning is now the most significant technology in computer vision. Compared to conventional face recognition systems, it can automatically extract more critical facial characteristics. A face recognition system was developed using the neural computing paradigm and the neural network concept. The experimental results show that the suggested method has a high detection rate and a short processing time (Yu and Pei, 2021). The roles of convolutional neural network-based deep learning approaches for object detection are explained, object identification frameworks and services based on deep learning are described, and modern deep learning approaches for object identification systems are evaluated (Pathak, 2018).
The main contribution of this work is a proposed CNN cloud-based system that shares computational resources for the ANN, reducing redundant computation when training and testing on a real data set created from our camera system. Each captured image is passed to one of two detectors: the Haar cascade detector or the multitask cascaded convolutional neural networks (MTCNN) detector. Testing runs on the AWS cloud using Elastic Compute Cloud (EC2) and Simple Storage Service (S3). Further contributions are: improving the test prediction time by using the paid AWS tier and comparing it with the free AWS tier; making the test predictions accessible from any location through the AWS cloud; saving many captured camera images on AWS S3; and showing the effect of the captured image size on the test execution time.
The remainder of this study is organized as follows. The research background for face recognition systems employing IoT (AWS cloud) and a literature review of face recognition methods that use enhanced convolutional neural networks are provided above. The architecture of the proposed cloud classification model is then described in two parts: the proposed cloud infrastructure, and the Haar cascade and MTCNN detectors. Section 3 presents the experimental results and discussion for the permission and non-permission classes in three subsections: training on the real data set; classification performance; and testing the prediction system as a whole, using the free AWS cloud service and comparing it to a paid service of the same provider.

PROPOSED CLASSIFICATION MODEL BASED ON AWS CLOUD
The proposed classification model was applied to a real data set acquired from stationary cameras. The suggested low-complexity CNN model for face recognition and classification, trained with offline augmentation, is built from scratch and distinguishes two categories: "permitted person" and "non-permitted person." This procedure is designed to identify people permitted to enter high-security locations such as airports or tourist sites. The Amazon Web Services (AWS) cloud may be used to test the complete system, and the AWS application allows access to the images from anywhere. IoT is currently of considerable importance. An Amazon S3 bucket is a public cloud storage resource available through Simple Storage Service (S3), the object storage service of AWS. Amazon S3 buckets store objects made up of data and associated identifying information, much like file folders do. AWS EC2 is a good example of infrastructure as a service (IaaS): it offers a scalable infrastructure for hosting cloud-based applications. AWS is an Amazon subsidiary that provides individuals, companies, and government organizations with on-demand cloud computing platforms and APIs. Through AWS server farms, these cloud computing web services provide software tools and the capability to perform distributed computing.
The topological structure and architecture of the proposed model are shown in Fig. 1 and Table 1, respectively. A convolution combines two functions, f and g, into a function of the position n. Eq. (1) describes a convolution in one dimension (Sharma et al., 2020a). Convolution also applies to two-dimensional (2D) digital images: if A is a 2D image with dimensions i × j, K is a filter with dimensions m × n, and F is a feature map, then the output F is obtained by convolving the image A with the filter K. Eq. (2) defines the operation in this situation, and because convolution is commutative, Eq. (3) can equivalently be used to represent the 2D operation.
The ReLU activation function sets negative values to zero. Additionally, adaptive moment estimation (Adam) is an optimization approach that may replace the standard stochastic gradient descent (SGD) technique to update network parameters based on the training data (Vinh, 2020). According to empirical evidence, Adam converges more quickly than other approaches. Dropout is a common strategy for improving generalization: neurons are randomly dropped during each training step. Binary cross-entropy is a loss function that measures how far apart two values in the range 0 to 1 are from one another. The loss function shown in Eq. (4) (Hughes, 2018) can be considered the definition of binary cross-entropy, where α is the true label value and ά is the value the model returned. It is common practice to minimize the loss over several (α, ά) pairs, that is, several photos, at once.
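For reference, the standard textbook forms of the operations named in this section can be written as follows. They are assumed, not confirmed, to correspond to Eqs. (1) to (4), which are not reproduced in this text. The symbol L for the loss and the use of i, j for the output position and m, n for the summation offsets are notational assumptions, and \hat{\alpha} stands for the model output written ά in the text.

```latex
\begin{align}
(f * g)(n) &= \sum_{m} f(m)\, g(n - m) \\
F(i, j) = (A * K)(i, j) &= \sum_{m} \sum_{n} A(m, n)\, K(i - m,\, j - n) \\
F(i, j) = (K * A)(i, j) &= \sum_{m} \sum_{n} A(i - m,\, j - n)\, K(m, n) \\
\operatorname{ReLU}(x) &= \max(0, x) \notag \\
L(\alpha, \hat{\alpha}) &= -\bigl[\alpha \log \hat{\alpha} + (1 - \alpha) \log(1 - \hat{\alpha})\bigr]
\end{align}
```

The second and third lines differ only in which argument carries the shift, which is the commutative property referred to in the text.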

AWS Cloud infrastructure Description
The proposed cloud architecture is based on Amazon Web Services. Our infrastructure consists of two EC2 instances of types t2.micro and t3a.large. Table 2 shows the EC2 free tier performance, Table 3 shows the EC2 paid tier performance, and Table 4 shows the hardware resources used in our system.

Description of the Haar Cascade and MTCNN Detectors
Face detection in still photos and live streams can be accomplished using the object detection method known as the Haar cascade. Its edge and line detection features were proposed by Viola and Jones (Viola, 2001). Pre-trained models are stored as XML files in the OpenCV repository and can be loaded using OpenCV methods. These models cover face, eye, upper- and lower-body, license-plate, and other detectors. Fig. 4 illustrates some of the features proposed by Viola and Jones. These features make it easy to spot edges, lines, and places in the image where the pixel intensities change quickly, as seen in Fig. 5, which illustrates calculating the Haar value on a rectangular image slice. Pixels with a value of 1 make up the Haar feature's darker areas, whereas pixels with a value of 0 make up its brighter areas. Each feature is in charge of recognizing a certain visual structure, such as an edge, a line, or any other structure with a sudden change in intensity. As the sample image shows, the Haar feature may detect a vertical border with darker pixels on its right and brighter pixels on its left. The operation calculates the sum of all image pixels situated in the brighter region of the Haar feature and the sum of all image pixels situated in the darker region, and then takes the difference between the two. If the image has an edge separating bright pixels on the left from dark pixels on the right, the Haar value is close to 1, which indicates that an edge has been found. Multitask Cascaded Convolutional Networks (MTCNN) is a framework created as a face detection and alignment solution. Convolutional networks are used in three stages to detect faces and locate facial landmarks such as the eyes, nose, and mouth. The authors propose that MTCNN merge both tasks (detection and alignment) using multitask learning. In the initial stage, candidate windows are quickly produced using a shallow CNN. The proposed candidate windows are refined in the second stage using a more complex CNN. A third, still more complex CNN is employed in the final stage to further refine the result and output the positions of the facial landmarks (Zhang, 2016).
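The Haar value calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the OpenCV implementation: the function name `haar_edge_value` is hypothetical, the window is assumed to hold intensities scaled to [0, 1] with darker pixels near 1 (the convention used in the text), and region means are used instead of raw sums so that the result stays between -1 and 1.

```python
def haar_edge_value(window):
    """Haar value of a two-rectangle vertical-edge feature.

    `window` is a rectangular list of rows of intensities in [0, 1], using
    the convention in the text: darker pixels are near 1, brighter near 0.
    The left half is the "bright" rectangle and the right half the "dark" one.
    """
    half = len(window[0]) // 2
    if half == 0:
        raise ValueError("window must be at least two pixels wide")
    bright = [row[c] for row in window for c in range(half)]
    dark = [row[c] for row in window for c in range(half, 2 * half)]
    # Means rather than raw sums keep the value in [-1, 1] (an assumption).
    return sum(dark) / len(dark) - sum(bright) / len(bright)


ideal_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
flat_patch = [[0.5, 0.5, 0.5, 0.5] for _ in range(4)]
print(haar_edge_value(ideal_edge))  # 1.0
print(haar_edge_value(flat_patch))  # 0.0
```

An ideal edge gives a value of 1, while a uniform patch gives 0, which matches the rule that a value near 1 indicates an edge.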

RESULTS AND DISCUSSION
This section comprises four parts. The first prepares the real data set, which contains images of people acquired by building-mounted cameras and is divided into two classes: permission and non-permission. The second trains the proposed CNN model on the real data set with offline augmentation applied. The third uses performance metrics to assess the quality and accuracy of the model. In the fourth, people are classified as permission or non-permission on the AWS cloud based on test predictions for a number of photographs.

Dataset
For the experiment, a real data set of 347 frontal faces taken by cameras, with a minimum size of 100 × 100 pixels, was employed. The dataset is split into a training set of 263 photos and a validation set of 84 photos, each covering the two classes: permission and non-permission. Fig. 6 and Fig. 7 display photos from the real dataset.

Training
All performance indicators were computed for the model trained with offline augmentation of the real data set. The binary cross-entropy loss on the training and validation sets was evaluated using a batch size of 32. Training ran for 75 epochs, and a feature vector with 300 values was selected. Fig. 8a compares the training and validation accuracy with offline augmentation, and Fig. 8b compares the corresponding binary cross-entropy loss.
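As an illustration of the loss evaluated during training, the following sketch computes the mean binary cross-entropy over one batch. It is a plain-Python approximation of what a deep learning framework does internally, not the code used in this study; the function name and the clipping constant `eps` are assumptions.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of labels and probabilities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for alpha, alpha_hat in zip(y_true, y_pred):
        # Clip predictions away from 0 and 1 so that log() stays finite.
        p = min(max(alpha_hat, eps), 1.0 - eps)
        total += -(alpha * math.log(p) + (1 - alpha) * math.log(1.0 - p))
    return total / len(y_true)


# Two confident, correct predictions give a small loss.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```

With a batch size of 32, as used here, `y_true` and `y_pred` would each hold 32 values per update step.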

Performance
The following five metrics were used to evaluate the suggested model's performance: recall, validation accuracy, training error, precision, and specificity. Recall is the percentage of examples in the positive class that are correctly classified, and specificity is the corresponding percentage for the negative class. The validation accuracy is calculated by dividing the number of correct predictions by the total number of cases examined. The training error is calculated by dividing the number of wrong predictions by the total number of cases, as shown in Eqs. (5) to (10). The precision and AUC metrics resulting from the classification represent the quality and accuracy of the model. The results are given in Table 5: with offline data augmentation, the validation accuracy improved to 98.81 %, the validation loss improved to 0.0176, and the AUC value was equal to 1.00.
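The metric definitions above can be sketched from the four confusion-matrix counts. The function name and the example counts are hypothetical: the confusion matrix is not given in the text, and the counts below were chosen only because 83 correct predictions out of 84 validation images would give 98.81 %, which is consistent with the reported validation accuracy.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, error, recall, specificity, and precision from counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one prediction is required")

    def ratio(numerator, denominator):
        # Return 0.0 for an empty class instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, total),     # correct / all cases
        "error": ratio(fp + fn, total),        # wrong / all cases
        "recall": ratio(tp, tp + fn),          # positives found
        "specificity": ratio(tn, tn + fp),     # negatives found
        "precision": ratio(tp, tp + fp),       # predicted positives that are right
    }


# Hypothetical counts for an 84-image validation set with one mistake.
metrics = classification_metrics(tp=42, fp=1, tn=41, fn=0)
print(round(metrics["accuracy"] * 100, 2))  # 98.81
```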

Test Prediction System Results
As a test, images of people entering are captured by distributed fixed cameras located in the important regions of the high-security buildings and then uploaded to AWS S3. The full system runs on AWS EC2: it downloads all the images from S3 and passes them to one of the two detectors, the Haar cascade detector or the MTCNN detector. The detector crops the face region, which is passed to the classifier model and classified as "permission" or "non-permission" by the proposed CNN model for face recognition, which was built from scratch. After that, the images in the non-permission class are uploaded to AWS S3. A window size of 100 × 100 pixels was used to locate the face regions in the captured images.
Fig. 9 and Fig. 10 show the charts of detector accuracy versus the number of images and versus execution time for the two detectors, Haar cascade and MTCNN. Fig. 11 shows the overall test prediction system from S3 to EC2 to S3 (AWS cloud). Table 6 gives the difference in accuracy and execution time when the Haar cascade detector and the MTCNN detector are applied to the real images captured by our camera and downloaded from AWS S3 to AWS EC2 for test prediction. Table 7 illustrates the test prediction for 75 images uploaded to AWS S3 from our camera system. Once saved in the cloud, the images are downloaded to AWS EC2 and passed to the two detectors, the Haar cascade detector and the MTCNN detector. Figs. 12 and 13 show the charts for the two detectors in the free tier of the AWS cloud: the number of images and the total image size versus execution time, respectively.
Table 8 illustrates the execution time to run the overall system as a test prediction for one image from S3 to EC2 to S3 on the AWS cloud paid tier, whose performance is described in Table 3. It also shows the total image size for the test image. Table 9 illustrates the execution time to run the overall system as a test prediction for two images, and the total image size, from S3 to EC2 to S3 on the AWS cloud paid tier. Figs. 16 and 17 illustrate the charts of the overall system when testing two images using the Haar cascade detector or the MTCNN detector; they show the number of images and the total image size versus the execution time. Table 10 illustrates the execution time to run the overall system as a test prediction for four images, and the total image size, from S3 to EC2 to S3 on the AWS cloud paid tier. Table 11 illustrates the same for six images, Table 13 for ten images, and Table 14 for 75 images, all on the AWS cloud paid tier. Figs. 28 and 29 illustrate the charts of the overall system run for 75 images using the Haar cascade detector on the two AWS cloud tiers, free and paid.
Figs. 30 to 33 show the web pages of our AWS account, an AWS EC2 instance, the AWS S3 storage for camera images, and the non-permission class uploaded from AWS EC2. Fig. 34 shows the non-permission class on AWS S3 uploaded from AWS EC2 using the Haar cascade detector, and Fig. 35 shows the permission class saved on AWS EC2 using the Haar cascade detector. Fig. 36 and Fig. 37 show the non-permission and permission classes, respectively, using the MTCNN detector.
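The S3 to EC2 to S3 flow described in this section can be sketched as follows. This is an illustrative outline under stated assumptions rather than the code used in this study: the function names, the `non-permission/` key prefix, and the detector parameters (`scaleFactor=1.1`, `minNeighbors=5`) are hypothetical, `s3` is assumed to be a boto3 S3 client, and `classify` stands in for the trained CNN, which is not reproduced here.

```python
import os
import tempfile


def detect_faces_haar(image_path):
    """Return face boxes (x, y, w, h) from OpenCV's frontal-face Haar cascade."""
    import cv2  # assumption: the opencv-python package is installed

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    image = cv2.imread(image_path)
    if image is None:  # unreadable file: report no faces
        return []
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in box) for box in boxes]


def run_test_prediction(s3, in_bucket, out_bucket, detect, classify):
    """Download images from S3, classify faces, upload non-permitted images.

    `detect(path)` returns face boxes; `classify(path, box)` returns
    "permission" or "non-permission". Returns the keys that were flagged.
    """
    flagged = []
    paginator = s3.get_paginator("list_objects_v2")
    with tempfile.TemporaryDirectory() as workdir:
        for page in paginator.paginate(Bucket=in_bucket):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if key.endswith("/"):
                    continue  # skip folder placeholder objects
                name = os.path.basename(key)
                local = os.path.join(workdir, name)
                s3.download_file(in_bucket, key, local)
                labels = [classify(local, box) for box in detect(local)]
                if "non-permission" in labels:
                    s3.upload_file(local, out_bucket, "non-permission/" + name)
                    flagged.append(key)
    return flagged
```

On an EC2 instance, the client could be created with `boto3.client("s3")` and passed in together with `detect_faces_haar` and a classifier function; an MTCNN-based detector with the same signature could be substituted to compare the two detectors. Passing the client and the two callables as arguments keeps the flow testable without network access.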

CONCLUSION
This work presented a proposed CNN cloud-based system that uses the Haar cascade detector and the MTCNN detector to classify faces into two classes, the permission class and the non-permission class, applied to a real data set collected by fixed cameras. The main idea is to enable the system to distinguish whether or not a person has permission to enter the building by using AWS cloud services applied to RGB images with a size of 100 × 100 pixels. The accuracy and execution time of the whole system were compared for the two detectors used to detect faces in the captured images: the Haar cascade detects only frontal faces, while the MTCNN detects both frontal and non-frontal faces. AWS S3 and an AWS EC2 instance were used to store many images captured from the cameras and to analyze them. The performance of the AWS EC2 free tier was compared with that of the AWS EC2 paid tier, and the paid tier shortened the execution time compared to the free tier.