Effluent quality assessment of sewage treatment plant using principal component analysis and cluster analysis

Sewage is a mixture of water and the solids added to it through various uses, so it must be treated to meet local or global standards for environmentally safe disposal. The present study statistically analyzes the quality parameters of the new Maaymyrh sewage treatment plant (STP) in Hilla city. The plant is designed to serve a population of 500,000 and operates on a biological treatment method (activated sludge process) with an average wastewater inflow of 107,000 m³/day. Wastewater data were collected daily by the Mayoralty of Hilla from November 2019 to June 2020 from the influent and effluent of the new Maaymyrh STP for five water quality parameters: BOD5, COD, TSS, TP, and TN. The results showed that the removal efficiency was 88%, 75%, 94%, 57%, and 77%, respectively. The cluster analysis (CA) showed clusters forming in four stages, ending in a final form consisting of two groups. At the same time, two influencing factors were extracted in the principal component analysis (PCA). The final effluent quality (an average of eight consecutive months) complies with the stringent regulations proposed in the Iraqi Quality Requirements.


INTRODUCTION
There is a great deal of spatial and temporal variability in the availability of water in Iraq. Population growth and the expansion of economic activity inevitably increase the demand for water for various purposes. Water supplies in Iraq have also come under significant stress in terms of quantity, especially in the last two decades, for reasons such as dams constructed on the Tigris and Euphrates in the riparian countries, global climate change, an extreme local decrease in annual precipitation rates, and inappropriate planning of water uses within Iraq (Jones et al., 2008; Trondalen, 2009; Rahi and Halihan, 2010). The amount and consistency of supplies coming from various sources influence the quality of water. Therefore, in order to maintain our knowledge and understanding of our climate, overall national planning and water resource management, with emphasis on the allocation of priorities among different uses, are important (Vaux, 2001). It is commonly used in many research publications relating to the needs of sustainable development (Parparov et al., 2006).
Water quality in an aquatic ecosystem is determined by many physical, chemical, and biological factors (Sargaonkar and Deshpande, 2003). Because of its direct impact on the environment and public health, interest in wastewater treatment has increased in recent decades (Khudair and Awad, 2020). The disposal of wastewater without adequate treatment is an issue related to health risks and degradation of the freshwater environment (Khudair and Jasim, 2017). Sewage released into the environment is characterized as a mixture of waste and solids from various uses, such as domestic, industrial, and commercial, that contains high levels of organic material, several pathogenic microorganisms, and some toxic compounds (Zhou, 2002). If the sewage is not treated and the effluent is disposed of into a water body, the environment will be at risk (Bonmatí and Flotats, 2003). Bioremediation processes are cost-effective, easy to operate, and eco-friendly, and they require minimal maintenance (Metcalf and Eddy, 1991; Beach, 2001). In fact, the conventional activated sludge (CAS) process is highly effective in removing organic carbon and nutrients (BOD = 96.6%, COD = 94.8%, SS = 77.1%, N = 84.4%, and P = 88.0%) (Colliver and Stephenson, 2000).
Journal of Engineering, Volume 27, Number 4, April 2021
Multivariate statistical techniques such as cluster analysis (CA), principal component analysis (PCA), and factor analysis (FA) help in the interpretation of complex data matrices and give a better understanding of the effluent quality and ecological status of the studied systems, enabling the identification of possible factors affecting sewage treatment (Igbinosa and Okoh, 2009). The present study aims to assess the sewage quality of the Maaymyrh STP, located in Babil Governorate in Iraq, using principal component analysis and cluster analysis.

MATERIALS AND METHODS
Study area description
Treatment units listed in the plant layout legend (figure not recoverable): fine bar screen room, aerated grit sedimentation tank, anaerobic tank, anoxic zone, oxidation ditch, sludge lifting pump tank, final settling tank, distribution water and sludge well, contact tank, water pump room, and mechanical thickening room and sludge storage tank.

Data collection and analysis
Influent and effluent data at the new Maaymyrh sewage treatment plant (STP) were collected by the Al Hilla Sewage Directorate during this eight-month study, from November 2019 to June 2020. The main water quality parameters selected were biological oxygen demand (BOD), chemical oxygen demand (COD), total suspended solids (TSS), total nitrogen (TN), total phosphorus (TP), pH, and temperature (T). Samples of raw sewage and treated effluent from the Maaymyrh STP were taken daily in 1-liter jars and kept at 4°C during transportation to the laboratory. All tests were performed by the personnel of the new Maaymyrh STP laboratories according to standard sewage analysis methods (APHA et al., 1998).

STATISTICAL ANALYSIS CONCEPTS
In water quality analysis, multivariate statistical techniques have repeatedly been shown to identify the significant components or variables that explain most of a system's variance. They are intended to reduce the number of variables to a limited number of indicators (i.e., key components or variables) while retaining the relationships in the original data and the changes in water quality characteristics (Milanovic et al., 2010; Varol et al., 2012). In this analysis, mean, maximum, minimum, and standard deviation values over the eight months were determined using SPSS V25. Multivariate data analysis was performed using principal component analysis (PCA), cluster analysis (CA), and factor analysis (FA).

Principal Component Analysis (PCA)
PCA is a strong pattern recognition technique by which the variance of a large data set of inter-correlated variables can be represented by, and transformed into, a smaller set of independent variables (principal components). The data matrix is decomposed as
$X = T W^{T} + E$
where X (n × m) is the data matrix, T (n × p) and W (m × p) are the matrices of the principal component scores and loadings, respectively, and E (n × m) is the residual matrix (Hubert and Debruyne, 2009; Rousseeuw and Driessen, 1999).
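As a minimal sketch of this decomposition with a single component (p = 1), the Python code below standardizes a small data matrix, finds the leading loading vector by power iteration on the correlation matrix, and forms the scores and the residual. The data values, the function names, and the use of power iteration are illustrative assumptions; the study itself used SPSS, and this is not its procedure.

```python
import math

def standardize(X):
    """Column-wise z-scores of an n x m data matrix given as a list of rows."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(m)]
    return [[(row[j] - means[j]) / sds[j] for j in range(m)] for row in X]

def first_component(Z, iterations=500):
    """Leading loading vector w, scores t, and eigenvalue, by power iteration."""
    n, m = len(Z), len(Z[0])
    # Sample covariance of standardized data, i.e. the m x m correlation matrix.
    C = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1) for b in range(m)]
         for a in range(m)]
    # Asymmetric start so it is unlikely to be orthogonal to the leading vector.
    w = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        v = [sum(C[a][b] * w[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    eigenvalue = sum(w[a] * sum(C[a][b] * w[b] for b in range(m))
                     for a in range(m))
    t = [sum(Z[i][j] * w[j] for j in range(m)) for i in range(n)]
    return w, t, eigenvalue

def residual(Z, w, t):
    """E = Z - t w^T for the one-component model."""
    return [[Z[i][j] - t[i] * w[j] for j in range(len(w))]
            for i in range(len(Z))]

# Hypothetical 5 x 3 data matrix (rows are samples, columns are variables).
X = [[2.0, 4.1, 1.0],
     [3.0, 6.2, 0.5],
     [4.0, 7.9, 1.5],
     [5.0, 10.1, 0.7],
     [6.0, 12.0, 1.2]]

Z = standardize(X)
w, t, eigenvalue = first_component(Z)
E = residual(Z, w, t)
print("loadings:", [round(v, 3) for v in w])
print("share of total variance:", round(100 * eigenvalue / len(w), 1), "%")
```

Because the variables are standardized, the total variance equals the number of variables, so the share explained by a component is its eigenvalue divided by m.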

Cluster Analysis (CA)
Cluster analysis is an unsupervised pattern recognition method that reveals the intrinsic structure of a data set without a priori assumptions about the data, classifying the objects of the system into classes or clusters based on their proximity or similarity (Vega et al., 1998). The most common approach is hierarchical clustering, in which clusters are formed sequentially, beginning with the most similar pair of objects and forming higher clusters step by step. Within multidimensional space, the distance between objects is simply the geometrical (Euclidean) distance and is calculated as
$D(X, Y) = \sqrt{\sum_{i} (X_i - Y_i)^2}$
where D(X, Y) is the distance between two points X and Y.

Factor Analysis (FA)
Factor analysis reduces the contribution of the less significant variables obtained from PCA and extracts a new group of variables, known as varifactors (VFs), by rotating the axes defined by PCA. VFs can include hypothetical, unobservable, latent variables. Each variable can be expressed in factor analysis as a linear combination of common latent factors and a single unique factor (Malinowski, 1991), where F_im is the factor loading, and F_ip and e_im are the common and unique (error) factors, respectively.
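Returning to the hierarchical clustering described above, the following Python sketch computes the Euclidean distance and merges the two closest clusters at each stage until a requested number of clusters remains. The point labels and coordinates are hypothetical, and single linkage (nearest members) is an assumption, since the text does not state which linkage rule was used.

```python
import math

def euclidean(x, y):
    """D(X, Y) = sqrt(sum_i (X_i - Y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering; returns clusters and merge stages."""
    clusters = [[name] for name in points]
    stages = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        stages.append((merged, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, stages

# Hypothetical objects in a two-dimensional space.
points = {"A": (0, 0), "B": (0, 1), "C": (5, 5), "D": (5, 6), "E": (20, 20)}

clusters, stages = agglomerate(points, n_clusters=2)
for merged, d in stages:
    print("merged", merged, "at distance", round(d, 3))
print("final clusters:", clusters)
```

In this toy example the outlying object E stays in its own cluster while the other four merge, which is the kind of two-group outcome a dendrogram makes visible.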

Raw sewage assessment
The characteristics of the sewage entering the treatment plant are shown in Table 3, which presents the changes over time in the raw sewage parameters TSS, COD, BOD, NO2, NO3, TN, and TP. The BOD5/COD ratio is considered an indicator of biodegradation capacity (Metcalf and Eddy, 1985). If the BOD5/COD ratio is greater than 0.5, biodegradation proceeds easily; if it is between 0.2 and 0.4, biodegradation occurs only in an optimal thermal state; and if it is less than 0.2, biodegradation does not proceed (Contreras et al., 2003). Domestic sewage typically has a BOD5/COD ratio of 0.4 to 0.8 (Metcalf and Eddy, 1985). A BOD5/COD ratio of 0.4 is commonly taken as the dividing point between biodegradable and non-biodegradable waste (Turak and Afsar, 2004). In this study, the BOD5/COD ratio of the raw wastewater was approximately 0.54, suggesting a significant quantity of biodegradable organic matter.
Note to Table 3: water quality units for all parameters are mg/L.
The data collected for the wastewater entering the plant show weak to moderate concentration levels, where weak, medium, and strong levels are defined as 110, 220, and 400 mg/L for BOD5 and 100, 220, and 350 mg/L for suspended solids, respectively (Tchobanoglous and Burton, 1991).
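The BOD5/COD thresholds quoted above can be written as a small classifier. This is a sketch only: the function name and labels are hypothetical, the input concentrations are illustrative values chosen to reproduce the reported ratio of 0.54, and ratios from 0.4 to 0.5 are reported as unclassified because the quoted thresholds do not cover that band.

```python
def biodegradability(bod5, cod):
    """Return the BOD5/COD ratio and the band it falls in (thresholds from the text)."""
    if cod <= 0:
        raise ValueError("COD must be positive")
    ratio = bod5 / cod
    if ratio > 0.5:
        label = "easily biodegradable"
    elif 0.2 <= ratio <= 0.4:
        label = "biodegradable only in an optimal thermal state"
    elif ratio < 0.2:
        label = "biodegradation does not proceed"
    else:
        # Above 0.4 and up to 0.5: not covered by the quoted thresholds.
        label = "unclassified by the quoted thresholds"
    return ratio, label

# Illustrative concentrations in mg/L (not measured values from the study).
ratio, label = biodegradability(bod5=108.0, cod=200.0)
print(round(ratio, 2), label)
```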

Effluent Quality Assessment
The treatment plant's efficiency was evaluated from the characteristics of the treated sewage leaving the plant, as shown in Table 4. To characterize the effluent quality, the maximum and minimum values of the selected parameters were calculated, as shown in Table (2), which gives the descriptive values of the data. The composition of the sewage varies with the treatment stage and with the types of households, enterprises, industries, and public facilities discharging into the system, and this may be a significant contributing factor to the observed variations in pH. A water system's pH level determines its suitability for a number of uses. The quality standard for water sources follows Regulation 25 issued by the Iraqi government. In general, influent and effluent pH values were slightly alkaline, ranging from 6.91 to 8.01 and from 7.03 to 7.94, with mean values of 7.55 and 7.46, respectively, during the study period, as shown in Table 2. This refers to the decrease in the concentration of dissolved CO2 caused by the reduction in the concentration of organic matter (Colmenarejo et al., 2006).
TSS is a critical parameter in regulating sewage discharge. It is a source of aesthetic disturbance along river banks, and it harms irrigation systems, where tubing, sprinklers, emitters, and narrow water channels may be blocked by algae. TSS can also adsorb heavy metals on its surface and thus encourage the formation of heavy metal complexes (Nkegbe et al., 2005). The mean raw influent TSS concentration was 222.50 mg/L, while the mean treated effluent TSS declined sharply to 13.81 mg/L. The raw influent TSS concentration varied greatly (between 56.00 and 980.00 mg/L), whereas only a slight variation in the TSS concentration of the treated effluent (between 4.00 and 40.00 mg/L) was observed over the eight months.
This means that the efficiency of the Maaymyrh STP is independent of the influent characteristics. The plant discharges its effluent into the Shatt Al-Hilla River; high TSS can decrease the intensity of sunlight in water bodies and reduce primary productivity, particularly of green algae. Less light can also have a negative impact on the primary and secondary development of aquatic life and on the temperature stratification of aquatic environments (Nkegbe et al., 2005).
The most important parameters used to verify the quality of sewage are chemical oxygen demand (COD) and biological oxygen demand (BOD), which reflect the amount of organic matter in sewage (Huertas et al., 2008). COD is a measure of the amount of oxygen required by a strong oxidant (for example, potassium dichromate in H2SO4) to break down both organic and inorganic substances in an aqueous system (Akan et al., 2008). High COD levels in water lead to severe oxygen depletion, which adversely affects aquatic organisms (Fatoki et al., 2003). The influent COD averaged 247.02 mg/L, while the residual COD in the final effluent averaged 56.97 mg/L (Table 2). The corresponding BOD of the raw sewage averaged 126.56 mg/L. At an average value of 14.3 mg/L, the BOD concentration in the treated effluent complied with Iraqi quality standards.

Removal efficiency assessments
Treated sewage is one of the alternative sources for watering crops and for other uses. Many physical, chemical, and biological processes are designed and operated to simulate natural treatment processes and reduce the pollutant load to a level that nature can deal with. In this regard, it is necessary to pay special attention to assessing the environmental impacts of existing wastewater treatment facilities and to achieving high pollutant removal rates (Jamra, 1999).
Table 5. Monthly variance in the removal efficiency of parameters for water quality.
From November 2019 to June 2020, the Maaymyrh STP achieved good removal efficiency, except for total nitrogen and total phosphorus, for which removal rates decreased in some months; this may be due to insufficient process control, as shown in Table (5) and Fig. (3).
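Removal efficiency is conventionally computed as the percentage drop from influent to effluent concentration. The sketch below applies that formula to the mean TSS, COD, and BOD concentrations quoted in the effluent quality assessment. The function name is hypothetical, and because the text does not state how the reported efficiencies were averaged (for example, as a mean of monthly values), the figures computed from overall means need not coincide exactly with the reported ones.

```python
def removal_efficiency(influent, effluent):
    """Percent removal = (influent - effluent) / influent * 100."""
    if influent <= 0:
        raise ValueError("influent concentration must be positive")
    return (influent - effluent) / influent * 100.0

# Mean influent and effluent concentrations in mg/L, as quoted in the text.
means = {
    "TSS": (222.50, 13.81),
    "COD": (247.02, 56.97),
    "BOD": (126.56, 14.3),
}

efficiency = {name: removal_efficiency(inf, eff)
              for name, (inf, eff) in means.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.2f}% removal")
```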

Cluster analysis
The cluster analysis results are presented as a dendrogram in which the distance between two months corresponds to their similarity or dissimilarity in terms of treatment efficiency. Fig. (4) shows the clusters forming in four stages, ending in a final form of two clusters, one of which is made up of several branches. In the first stage, four clusters appeared: the first grouped the months of similar efficiency (November, December, and February), the second grouped the months (June, April, and January), the third represented the efficiency of the plant in March, and the fourth represented the efficiency of the plant in May. The second stage consisted of three clusters: the first represents the plant's efficiency for the months (November, December, February, and May), the second for the months (January, April, and June), and the third for March. The fourth stage, in its final form, consisted of two clusters: the first gathered the months (November, December, January, February, April, May, and June), whose efficiencies were close to one another, while the second represented the efficiency of the plant in March, which was higher than in the other months, with high removal efficiency for total nitrogen and total phosphorus (Singh et al., 2004; Pejman et al., 2009; Varol et al., 2012). Cluster analysis is considered one of the most important statistical methods for readily assessing the efficiency of treatment plants, diagnosing problems that occur in the plant, and achieving high-efficiency operation on a regular and continuous basis.

Principal component analysis/factor analysis
The data obtained from the laboratory analysis of the influent and effluent sewage samples from the Maaymyrh STP were used as the PCA/FA input. Each sample is described by six physical and chemical parameters. Before the analysis, the data were standardized to put all variables on a common scale, since the water quality parameters have different units and scales (Davis, 1973). The factor loadings were categorized as strong (> 0.75), moderate (0.5-0.75), and weak (0.4-0.5) (Vega et al., 1998). In the analysis of the raw sewage entering the STP, two factors with eigenvalues > 1 were extracted, representing more than 75.7% of the total variance in the dataset, as shown in Table (6) and Fig. (5). The first factor accounts for 54.192% of the total variance and includes TSS, BOD, TP, and COD with strong positive loadings. This indicates a rise in the total suspended solids, as well as organic materials, in the sewage entering the treatment plant. The second factor explains 21.598% of the total variance and is strongly loaded with pH, with a moderate positive loading of TN. This reflects the high pH of the raw sewage entering the plant and the moderate increase in total nitrogen.
In the analysis of the treated wastewater leaving the treatment plant, two factors with eigenvalues greater than 1 were extracted, representing 54.896% of the total variance in the dataset, as shown in Table (7) and Fig. (6). The first factor accounts for 34.638% of the total variance. TSS, with a strong positive loading, indicates an increase in the total suspended solids in the treated wastewater leaving the treatment plant, although it remains within the standard limits (Vega et al., 1998), while BOD and COD have moderate positive loadings, which indicates that the organic matter has been treated to a level that meets the requirements of the standard but requires further treatment.
The second factor explains 20.258% of the total variance and is strongly loaded with TP, with a moderate positive loading of TN. This can be explained by the presence of total phosphorus and total nitrogen remaining after treatment because of poor management of the operation process.
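The loading categories and the cumulative variance figures in this section can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, the example loadings are illustrative values rather than entries from Tables (6) or (7), and classifying by absolute value is an assumption made so that negative loadings are treated the same way as positive ones.

```python
def loading_strength(loading):
    """Classify a factor loading using the bands quoted from Vega et al. (1998)."""
    size = abs(loading)  # assumption: the sign does not affect the strength band
    if size > 0.75:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.4:
        return "weak"
    return "below the weak band"

# Illustrative loadings only (not values from the study's tables).
examples = [0.91, 0.62, -0.45, 0.10]
labels = [loading_strength(v) for v in examples]
print(list(zip(examples, labels)))

# Percent of total variance explained by the two retained factors (from the text).
influent_total = 54.192 + 21.598
effluent_total = 34.638 + 20.258
print(f"influent: {influent_total:.3f}%  effluent: {effluent_total:.3f}%")
```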

CONCLUSIONS
This research is a first step in determining the general characteristics of domestic sewage in Hilla, and the findings are as follows:
1- The concentrations of the wastewater entering the plant varied from weak to medium. The BOD5/COD ratio of the raw sewage was about 0.54, indicating the presence of a large amount of biodegradable organic matter.
2- The final effluent quality (as a mean value over eight consecutive months) complies with the regulations proposed in the Iraqi Quality Requirements.
3- The TSS removal efficiency was found to be over 94 percent, which is important because the removal of suspended solids is a key function of sewage treatment units. The BOD removal efficiency was over 88%, COD was 75%, TN was 57%, and TP was 77%.
4- Cluster analysis showed that the months (November, December, January, February, April, May, and June) had an average level of treatment compared with March, when the efficiency was good. Principal component analysis of the sewage components extracted two influencing factors.
5- The overall performance of the current sewage treatment plant was satisfactory. A difference in efficiency was observed between months, especially in the nitrogen removal process. Still, in general, performance improved in the later months, as the plant was newly established and needed a certain period for the growth of bacteria.

Acknowledgment
The authors would like to thank the Babil Sewer Directorate for providing access to the data and for facilitating communication with the Station Manager and the Laboratory Staff. The authors also thank the staff of the Sanitary Engineering Lab and the Civil Engineering Department, College of Engineering, University of Baghdad, for their valuable support in completing this work.