Face-based Gender Classification Using Deep Learning Model

Gender classification is a critical task in computer vision, with substantial importance in various domains, including surveillance, marketing, and human-computer interaction. In this work, the proposed face gender classification model consists of three main phases. The first phase applies the Viola-Jones algorithm to detect facial images, which comprises four steps: 1) Haar-like features, 2) the integral image, 3) AdaBoost learning, and 4) the cascade classifier. The second phase employs four pre-processing operations, namely cropping, resizing, converting the image from the RGB color space to the LAB color space, and enhancing the images using HE and CLAHE. The final phase applies transfer learning, a powerful deep learning technique, to face gender classification using the Alex-Net architecture. The performance of the proposed gender classification model was evaluated on three datasets: the LFW dataset, containing 1,200 facial images; the Faces94 dataset, containing 400 facial images; and the family dataset, also containing 400. Transfer learning with the Alex-Net model achieved an accuracy of 98.77% on the LFW dataset and 100% on both the Faces94 and family datasets. These results underline the significance of the pre-processing techniques and of transfer learning with the Alex-Net model, which together contribute to more accurate gender classification. The results achieved with the two image contrast enhancement techniques, HE and CLAHE, were also compared; CLAHE achieved the better facial classification accuracy.


INTRODUCTION
The face is the essential focus of attention in society and is usually used for identification. The facial image is considered a reliable indicator for gender detection, since every human has a distinctive face that can be used to distinguish them from other people. The extracted color descriptors, texture features, and preprocessed face images were categorized using the support vector machine technique. The experimental results of that method are reported on various data samples, such as the LFW and FERET datasets. Owing to the additional pre-processing and derived feature descriptors, the results indicate that the recommended strategy achieves admirable classification accuracy through its blend of pre-processing approaches. The average classification accuracy for the two data samples is 99% and 94%, respectively. (Galla et al., 2020) converted visual data to audio data to provide a better sensation for blind persons through modeled feature extraction and classification. The proposed feature extraction used the Multiscale-Invariant Feature Transform (MSIFT), while the support vector machine technique and the LASSO classifier were used to perform feature optimization and classification. The obtained classification results are reported as EN with an accuracy of 93.5%, LR with an accuracy of 93.2%, and RR with an accuracy of 89.6%. (Zaman, 2020) proposed a custom CNN architecture for gender classification of face images into female and male. This CNN contains seven convolutional layers and two fully connected layers, with batch normalization layers between the convolutional layers. Compared to Google-Net and Alex-Net, the number of parameters in the custom CNN is three times and 30 times smaller, respectively. The proposed custom CNN also produced results on par with cutting-edge techniques, achieving the top result of 96% accuracy on the CelebA dataset. The custom CNN trained on the CelebA dataset correctly classified gender in the IMDB and WIKI datasets with 97% and 96%
accuracy, respectively. The proposed method obtained an accuracy of 95% on the LFW dataset. (Azimi and Meghdadi, 2021) proposed a real-time deep neural network model that performs gender classification more quickly and with less computation by decreasing the model parameters and the computation process. The proposed model is a combination of multifold filters and is relatively light. Its performance was assessed on the Adience, LFW, FEI, and CAS datasets, with accuracies of 91.4%, 95%, 95%, and 95%, respectively. (Alamri et al., 2022) proposed using machine learning techniques on the LFW dataset for face detection and gender identification. The SVM classifier, the LBPH technique, and the SIFT feature extraction method were trained and evaluated using the LFW dataset. SIFT with an SVM classifier is reported with an accuracy of 65%, while face recognition with LBPH reaches an accuracy of 88%, exceeding earlier research. Moreover, LBPH provided a gender detection accuracy of 91%. This work aims to develop an accurate and reliable gender classification system that can have numerous practical implications. It can contribute to demographic analysis and crowd monitoring in surveillance applications, and accurate gender classification can enable personalized interactions and tailored experiences in the human-computer interaction field. This work also serves to remove the background from the face image, enabling its utilization in the proposed model by applying the Viola-Jones algorithm, and aims to address challenges related to lighting conditions and expression variations using pre-processing techniques.

Proposed Model for Gender Classification
The proposed model aims to classify gender from face images and consists of three main phases. In the first phase, the Viola-Jones algorithm detects the face in the image. In the second phase, four pre-processing steps are applied: cropping, resizing, conversion from RGB to LAB, and image enhancement. The final phase is the proposed deep learning gender classification model using Transfer Learning Alex-Net. The detailed overall framework for gender classification is shown in Fig. 1.

Figure 1. The overall framework for the proposed gender classification
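As an illustration of the integral-image step of the Viola-Jones detector described above (a sketch of the standard technique, not code from the paper; function names are my own), the summed-area table below lets any Haar-like feature be evaluated with a constant number of lookups:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x] (inclusive)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in the inclusive rectangle, via four table lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

A Haar-like feature value is then simply the difference of two (or more) `rect_sum` calls over adjacent rectangles, which is what makes the cascade fast.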

Data Collection
This step loads the face images from different datasets. Brief descriptions of these datasets are given in the following sub-sections.

Face Image Pre-processing
The pre-processing step enhances the image quality to achieve the best results. A detailed explanation of the pre-processing steps applied to the input images follows. X, Y, and Z are the coordinates of the CIEXYZ color space, and Xn, Yn, and Zn are the corresponding reference white values. Converting images from RGB to CIEXYZ uses Eq. (2).
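The RGB-to-LAB conversion described above can be sketched for a single pixel as follows. This is an illustrative implementation using the standard sRGB matrix and the D65 white point for Xn, Yn, Zn; the paper's exact Eq. (2) coefficients are not reproduced here, so treat the constants as assumptions:

```python
def rgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB (assumes D65 reference white)."""
    def to_linear(c):                      # undo the sRGB gamma curve
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = to_linear(r), to_linear(g), to_linear(b)
    # linear sRGB -> CIEXYZ (standard matrix, assumed to match Eq. (2))
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883     # D65 white point (Xn, Yn, Zn)
    def f(t):                              # CIEXYZ -> LAB companding function
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

Pure white maps to approximately L = 100, a = 0, b = 0, and pure black to L = 0, which is a quick sanity check on the constants.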

Facial Images Enhancement Techniques
Image enhancement is the process of converting the input image into one that is better suited to the required application. In this proposal, the HE and CLAHE techniques are used for image contrast enhancement.

2.5 Gender Classification
Classification is the final phase in the proposed model. This phase enhances the performance of gender classification from face images by using transfer learning with Alex-Net. This section discusses the basic form of the data needed to train Alex-Net. Alex-Net receives samples as fixed-size images: the input image is colored and sized 227×227×3.

Transfer Learning Alex-Net Model
A practical method for resolving a classification problem is transfer learning. In this work, we applied Transfer Learning Alex-Net for gender classification based on the face image (Anusri et al., 2021). As shown in Fig. 8, the Alex-Net architecture contains five convolutional layers, three pooling layers, and three fully connected layers (Rafique et al., 2021). We adopted a modified Alex-Net to fit our problem domain: the final fully connected layer is swapped out for another fully connected layer that classifies two classes (female or male) instead of 1000 classes. The parameters of the layers are given in Table 2.

Pooling layer: Max pooling is primarily used to select the maximum value in an array. Max pooling does not affect the judgment when the image is translated by several pixels, and its main benefit is noise reduction. The first, second, and fifth convolutional layers are followed by max pooling layers (Lin et al., 2020).

Fully connected layer: In Alex-Net, there are three fully connected layers. The first and second each contain 4096 neurons; the kernels of the first are of size 6×6×256, an exact match with the size of the incoming feature map. Each coefficient in the convolution kernel is multiplied by the one-to-one corresponding pixel value of the feature map, so after convolution there are 4096 neurons, i.e., 4096×1×1. This study has two classes; thus, the third fully connected layer is tuned to two neurons, one per class (Lin et al., 2020). The original Alex-Net requires an input RGB image of dimensionality 227×227. In addition, Alex-Net has an output layer with a thousand neurons corresponding to the Image-Net object classes. As a result, the last three layers were modified as part of the fine-tuning phase to address the gender classification problem from the facial image (male/female). These layers are replaced with a fully connected (fc) layer, a Softmax layer, and an output layer. By implementing this fine-tuning phase, the network can infer the necessary bias and specificity of the data, which enhances its ability to make strong discriminations. In addition, to counteract overfitting given the limited size of the dataset, two dropout layers (dropping 50% of units at random) were added to the network. Fig. 9 shows the implementation details of the entire Alex-Net network after these modifications.

EVALUATION METRICS
A confusion matrix can be used to investigate how well a classification algorithm performs gender recognition from a face image. Table 3 illustrates the confusion matrix. The TP and TN values represent the correctly classified positive and negative instances (Goodfellow et al., 2016), while FP and FN represent the incorrectly classified negative and positive instances, respectively. Several commonly used metrics can be derived from the confusion matrix for evaluating classifier performance with varying concentrations of evaluation (Vujović, 2021; Al Jibory et al., 2022). Accuracy (Eq. (5)) counts the proportion of correct predictions and is used to determine how well the gender classification from the facial image performs.
• Recall or sensitivity ratio: the proportion of true positive cases correctly predicted as positive (Mateen et al., 2020). This rate is calculated by dividing the number of images (male and female) correctly detected by the total of correctly detected images plus those falsely detected as negative, using Eq. (6).

RESULTS AND DISCUSSION
The proposed model was implemented using MATLAB (R2020a). The computer system specifications are as follows: processor: Intel Core i7-10510U CPU @ 1.80 GHz; memory size: 8 GB; operating system: 64-bit. Before entering the data into the Transfer Learning Alex-Net model, it was divided into two main parts: training (80%) and testing (20%). A validation set, consisting of 16% of the total training set, was then taken from the training data. This division is commonly employed to ensure proper evaluation and optimization of the model's performance, as given in Table 4. The training and validation sets were used during the model's training, while the test set was used for the final model performance. In the proposed model, the three datasets were trained separately using the same settings with the Transfer Learning Alex-Net model. The model was trained for 20 epochs with a batch size of 20, and the learning rate was set to 0.0003. Three cases are illustrated. Case 1: Results of training without pre-processing for the three datasets, as shown in Table 5. Case 2: Results of training with pre-processing using histogram equalization (HE) for the three datasets, as shown in Table 6.
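The split described above can be sketched as follows. This is a minimal illustration under the assumption that the validation set is 16% of the 80% training portion; the paper does not list its splitting code, and the function name is my own:

```python
def split_sizes(n_images, train_frac=0.80, val_frac_of_train=0.16):
    """Return (train, validation, test) counts for the described split."""
    n_test = round(n_images * (1 - train_frac))        # 20% held out for testing
    n_train_total = n_images - n_test                  # 80% training portion
    n_val = round(n_train_total * val_frac_of_train)   # 16% of training -> validation
    return n_train_total - n_val, n_val, n_test
```

For the 1,200-image LFW set this gives 806 training, 154 validation, and 240 test images, and the three parts always sum back to the original count.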

CONCLUSIONS
The proposed method classifies the gender (male or female) from the face image before inputting the face image into the Transfer Learning Alex-Net model. We apply the Viola-Jones algorithm to the face images to detect faces and exclude the background.

(Kang et al., 2017; Joodi et al., 2023). The problem of gender classification from face images can be summarized as pose variations, varying illumination conditions, and background variation. Gender classification plays a crucial role in demographic analysis and behavior tracking. Gender-aware systems also find application in social robotics and Human-Computer Interaction (HCI) (Lahariya et al., 2021). This work designs a framework based on deep learning with transfer learning and fine-tuning for gender classification from face images; removes the background from the face image and uses it in the proposed model by applying the Viola-Jones algorithm; enhances the face image through several pre-processing techniques, namely cropping, resizing, conversion from RGB to LAB, and contrast enhancement using Contrast Limited Adaptive Histogram Equalization (CLAHE); and finally uses transfer learning with Alex-Net to train the classification model for face images and evaluates the best classification result using accuracy. In the literature review, several methods for gender classification from facial images have been proposed. (Dhomne et al., 2018) proposed an efficient convolutional network architecture called VGG-Net that may be applied in severe situations with a shortage of training data for learning a D-CNN based on this architecture; this model is trained particularly for gender identification. Several face images from the celebrity and LFW datasets were used in the experiments. The test results showed a gender classification accuracy of 95%. (VenkateswarLal et al., 2019) suggested building a method for face recognition using feature descriptors and pre-processed face image techniques, which works well in natural settings.
2.2.1 LFW dataset
LFW is the most often used benchmark for gender classification, even though it was initially developed for unconstrained face recognition with variations such as facial expression, background, lighting, race, gender, age, color, saturation, camera quality, clothing, hairstyles, and others. LFW contains 13,233 images of 5,749 people (10,256 males and 2,977 females) (Chen et al., 2017; Bajrami et al., 2018). The dataset, created and maintained by researchers at the University of Massachusetts, may be accessed and downloaded at http://vis-www.cs.umass.edu/lfw/. Fig. 2 shows samples of the LFW dataset.
2.2.3 Real dataset
The real dataset is collected from family individuals for unconstrained face classification with variations such as facial expression, background, lighting, race, gender, age, color, saturation, camera quality, clothing, hairstyles, and others. It contains 400 images (200 males and 200 females). Faces in the images were detected with the Viola-Jones face detector. Samples of the real dataset are shown in Fig. 4.

Figure 2. Samples of images from the LFW dataset

Figure 3. Samples of images from the Faces94 dataset

Figure 4. Samples of images from the Real dataset

Figure 5. a) Image before cropping, b) Image after cropping

Xn, Yn, and Zn correspond to the reference white values, Eq. (3). Converting from one basis to another is color space conversion: it happens when an image represented in one color space is transformed into a different color space (Rathore et al., 2012).


Histogram equalization (HE): Histogram equalization re-distributes pixel values so that the histogram is as close to the intended histogram as possible (Benitez-Garcia et al., 2011). It allows for increased contrast in areas of low local contrast by automatically producing a transformation function that yields an output image with a consistent histogram. The method adjusts the contrast of an image using the image's own histogram and is usually applied to enhance the overall contrast of images, primarily when the image's valid data are represented by closely spaced contrast values (Musa et al., 2018). As a result, the intensities become more evenly dispersed across the histogram: histogram equalization efficiently spreads out the most common intensity values in the data (Fadhil and Dawood, 2021). The color image and its histogram are shown in Fig. 6, followed by the equalized image and its histogram.
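The re-distribution just described can be sketched on a flat list of 8-bit gray levels as follows (a minimal illustration of the standard CDF-based mapping; the paper does not list its HE code, and the function name is my own):

```python
def hist_equalize(pixels, levels=256):
    """Remap gray levels so the cumulative histogram becomes (near) uniform."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:                          # cumulative distribution
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)     # first occupied CDF value
    denom = max(len(pixels) - cdf_min, 1)
    lut = [round((c - cdf_min) / denom * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]
```

A low-contrast image whose pixels occupy only levels 100 and 101 is stretched to the full 0–255 range, which is exactly the contrast gain HE is used for here.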

Figure 6 .
Figure 6. (a) Original image and its histogram; (b) image enhanced using the HE method and its histogram

Figure 7 .
Figure 7. (a) Original image and its histogram; (b) image enhanced using the CLAHE method and its histogram
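The "contrast-limited" part of CLAHE can be illustrated by the histogram clip-and-redistribute step applied within each tile before equalization (a sketch of the standard procedure, not the paper's code; the function name is my own):

```python
def clip_and_redistribute(hist, clip_limit):
    """Clip histogram bins at clip_limit and spread the excess uniformly,
    preserving the total count (the core contrast-limiting step of CLAHE)."""
    excess = sum(max(h - clip_limit, 0) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]
    share, remainder = divmod(excess, len(hist))
    clipped = [h + share for h in clipped]      # even share to every bin
    for i in range(remainder):                  # hand out the leftover counts
        clipped[i] += 1
    return clipped
```

Capping each bin is what prevents CLAHE from over-amplifying noise in near-uniform regions, the main failure mode of plain HE.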

Figure 8 .
Figure 8. The architecture of Transfer Learning Alex-Net

The output produced by the second layer serves as the third layer's input data. Each pixel layer's border is padded with a single pixel to simplify further processing. The set of 13×13×192 pixel layers generated by the third layer serves as the fourth layer's input data, and the set of 13×13×192 pixel layers produced by the fourth layer serves as the fifth layer's input data; again, each pixel layer's border is padded with a single pixel to make subsequent processing easier. Given the convolution layer's kernel size (KS), stride (S), and padding (P), the dimensions of the output image are obtained from those of the input image using Eq. (4) (Lin et al., 2020):

Output = (Input − KS + 2P) / S + 1
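Applying the convolution output-size relation Output = (Input − KS + 2P)/S + 1 (Eq. (4)) layer by layer reproduces the spatial sizes quoted in the text; the kernel/stride/padding values below are the standard Alex-Net settings, assumed here since Table 2 is not reproduced:

```python
def conv_out(size, ks, s, p):
    """Eq. (4): output dimension of a conv/pool stage."""
    return (size - ks + 2 * p) // s + 1

# (name, kernel, stride, padding) for Alex-Net's conv and max-pool stages
stages = [("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
          ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
          ("conv3", 3, 1, 1), ("conv4", 3, 1, 1),
          ("conv5", 3, 1, 1), ("pool5", 3, 2, 0)]

size = 227                      # input image is 227x227x3
sizes = {}
for name, ks, s, p in stages:
    size = conv_out(size, ks, s, p)
    sizes[name] = size
```

The final 6×6 spatial map (with 256 channels) is exactly the 6×6×256 feature map that the first fully connected layer's kernels match, as described above.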

Figure 9 .
Figure 9. Conceptual diagram of the Transfer Learning Alex-Net model.

• Precision or confidence ratio: Precision represents the proportion of predicted positive cases correctly classified as positive. It is calculated by dividing the number of correctly detected images by the sum of false positives and correctly detected images (Alkentar et al.).
• F1 score: the weighted average of precision and recall. As a result, this score considers both false positives and false negatives, using Eq. (8) (Vujović, 2021).
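The accuracy, recall, precision, and F1 definitions above reduce to a few lines of arithmetic on the confusion-matrix counts; a minimal sketch (not the paper's code):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (5)
    precision = tp / (tp + fp)                   # confidence ratio
    recall = tp / (tp + fn)                      # Eq. (6), sensitivity
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (8)
    return accuracy, precision, recall, f1
```

With a perfect classifier (no FP, no FN) all four metrics equal 1.0, which is the behavior reported for the Faces94 and Real datasets.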

(Thepade et al., 2018; Berbar, 2022).
2.2.2 Faces94 dataset
Each person has 20 images in total. 152 people are depicted, with a 180 by 200 pixel image resolution (portrait format). The collection has 399 images of females and 2,660 images of males. The dataset may be accessed at cswww.essex.ac.uk/mv/allfaces/faces94.html. Samples of the dataset are shown in Fig. 3. Three datasets of facial images are used to test and evaluate the performance of the proposed model for gender classification. The descriptions of these images are given in Table 1.

Table 1 .
Descriptions of the facial images dataset

Table 2 .
Details of Transfer Learning Alex-Net layers.

Table 5 .
Results of training without Preprocessing for three datasets.

Table 6 .
Results of training with Preprocessing using (HE) for three datasets.

Case 3: Results of training with pre-processing using CLAHE for the three datasets, as shown in Table 7.

Table 7.
Results of training with Preprocessing using (CLAHE) for three datasets.

After face detection in the images, pre-processing steps are applied: cropping, resizing, conversion from RGB to LAB, and contrast enhancement using CLAHE. Deep learning techniques are the most common approach for gender detection, aiming to learn essential features automatically from raw data. An Alex-Net-based deep learning model was proposed for the gender classification task. A comparison was made using Alex-Net on the three datasets before and after pre-processing. The Alex-Net model delivered the best accuracy after pre-processing on the LFW, Faces94, and Real datasets, at 98.77%, 100%, and 100%, respectively. The a component is the red-green color component of the LAB color space; it ranges from -128 to 127, where negative values indicate greenness and positive values indicate redness. The b component is the yellow-blue color component; it also ranges from -128 to 127, where negative values indicate blueness and positive values indicate yellowness.