Rigid Trunk Sewer Deterioration Prediction Models using Multiple Discriminant and Neural Network Models in Baghdad City, Iraq

The deterioration of buried sewers during their lifetime is affected by several factors that lead to poor performance and, as with other engineering structures, can damage the infrastructure. Hydraulic deterioration of buried sewers is caused by sewer blockages, while structural deterioration is caused by sewer collapses due to sewer specifications, the surrounding soil characteristics and the groundwater level. The main objective of this research is to develop deterioration models that predict changes in sewer condition and can serve as assessment tools for determining the serviceability of sewer networks in Baghdad city. Two deterioration models were developed and tested using the statistical software SPSS: the multiple discriminant deterioration model (MDDM) and the neural network deterioration model (NNDM). The Zublin trunk sewer in Baghdad city was selected as a case study. The NNDM provided the higher overall prediction efficiency, which could be attributed to its inherent ability to model complex processes. The MDDM provided relatively low overall prediction efficiency, which may be due to the restrictive assumptions of this model. For the NNDM, the confusion matrix gave an overall prediction efficiency of about 87.3% for model training and 70% for model validation. Overall, the models suggest that the Zublin trunk sewer is in poor condition.


INTRODUCTION
Sewer networks are subsurface infrastructure systems that convey domestic sewage from different facilities to sewage treatment plants or other places for disposal. Many parts of the sewer network have deteriorated due to several internal and external factors. These parts may need to be replaced, repaired or renovated in order to guarantee the required hydraulic performance and to avoid the possibility of failure, Hemed, 2015. Among previous studies on sewer deterioration models, Davies et al., 2001 reviewed the numerous factors that have been recognized as influencing the structural stability of rigid sewer pipes, together with their effects on the general process of pipe deterioration and failure.
Tran, 2007 developed several hydraulic and structural deterioration models for Dandenong in Victoria, Australia, using a Markov model for the prediction of individual pipes, and the results showed the best performance when predicting sewer deterioration for the selected case study.
Chughtai and Zayed, 2008 applied a multiple regression model to data from two Canadian municipalities (Pierrefonds and Niagara Falls) to simulate the condition state of sewers. The coefficient of determination ($R^2$) indicated that the developed regression models can explain 72 to 88% of the total variability in the operational and structural sewer conditions. Ana, 2009 applied several deterioration models to sewer and inspection data from the cities of Leuven and Antwerp, Belgium. The cohort survival model seemed to be the most reliable pipe group model for this case study. For the pipe-level models, logistic regression and the probabilistic neural network (PNN) showed good overall prediction quality.
Khan et al., 2010 developed deterioration models using data from Pierrefonds, Canada. They used neural network modeling with back-propagation (BPNN) and probabilistic (PNN) approaches, and about 20% of the available dataset was used to test the model. The coefficient of determination ($R^2$) ranged between 71 and 86% depending on the deterioration factors considered. Salman, 2010 applied several deterioration models (ordinal regression, multinomial logistic regression and binary logistic regression analysis) to inspection data from the city of Cincinnati (USA). The binary logistic regression analysis showed the best performance in predicting sewer deterioration, with a total model efficiency of 66%. The prediction efficiency was 78% for good condition and 46% for bad condition.
A number of activities can be undertaken to keep the sewer network functional and in good shape, such as routine maintenance, repair and renovation. This study is significant to the general authority for sewerage services for monitoring the performance of sewer systems in Baghdad city, Iraq, and for helping utilities to predict the timing of future maintenance, rehabilitation or replacement. The aim of the research is to predict changes in sewer condition by developing deterioration models that provide assessment tools for the Zublin trunk sewer in Baghdad city, Iraq, and to investigate the usefulness and applicability of these models in sewer deterioration modeling.

Case Study Description
The case study in this paper is the Zublin trunk sewer. It is one of the main lines that collect sewage from the Al-Rusafa side of Baghdad city, with an estimated total length of around 25.4 km and diameters of 1800-2400 mm at depths of 3-7 m. It reaches the Al-Rustamiya sewage treatment plant with a diameter of 3000 mm at a depth of 6-10 m. The line starts from the municipality of Al-Shaab and ends at the Al-Rustamiya sewage treatment plant (3rd expansion) south of Baghdad, as shown in Fig. 1.

Data Collection
The selection of predictors and the quality and quantity of the collected data affect model predictions. In this study, data were collected from different departments of Baghdad Mayoralty (design, implementation, planning, operating, maintenance and Geographic Information Systems). In addition, other data were collected from different sections in the municipalities of Al-Rusafa that are served by the Zublin line. The data included: sewer condition, age, material, function, type, shape, diameter, depth, length, slope and traffic intensity.

Multiple Discriminant Deterioration Model (MDDM)
One of the statistical methods used to predict or classify individuals into exhaustive and mutually exclusive classes based on a set of predictors is Fisher's linear discriminant analysis (LDA), Huberty, 1994. The aim of the MDDM is to estimate the linear relationship between a single categorical dependent variable (i.e. condition classes) and a set of quantitative independent variables (e.g. deterioration factors) by maximizing the between-class scatter relative to the within-class scatter, which is called Fisher's criterion, Laitinen, 2007. This criterion is used as the calibration technique for the LDA, Johnson and Wichern, 2002.

Model description
The MDDM uses a group of linear equations of the independent variables (i.e. deterioration factors) to determine classification functions, Kley et al., 2013:

$$L_i = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \qquad (1)$$

where $L_i$ is the classification function for class $i = 1$ to $j$, with $j$ being the number of condition classes, $X_1$ to $X_n$ are the independent variables, $\beta_1$ to $\beta_n$ are the classification coefficients corresponding to the $n$ independent variables, and $\alpha$ is the offset. The coefficients can be determined by maximizing the variance between classes relative to the within-class variance of Y, Sharma, 1996.

Model assumptions
The assumptions that must be satisfied when using the MDDM are: linearity of relationships, equal dispersion matrices, and independent variables that follow a multivariate normal distribution, Hair et al., 1998. Studies, however, have shown mixed evidence with regard to the sensitivity of the MDDM to violations of these assumptions. Typically, violations affect the classification process negatively.

Standardized discriminant functions coefficients
The standardized coefficients $\beta_i^*$ are used to assess the relative importance of the discriminator variable $i$ in the discriminant function. These coefficients can be determined using the following expression (Sharma, 1996):

$$\beta_i^* = \beta_i S_i \qquad (2)$$

where $\beta_i^*$ is the standardized coefficient, $\beta_i$ is the unstandardized coefficient and $S_i$ is the pooled standard deviation of variable $i$.
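As a minimal sketch of this standardization step, the snippet below assumes the usual form in which each unstandardized coefficient is multiplied by the pooled standard deviation of its variable. The function name and the numeric values are hypothetical and are not taken from Table 2.

```python
def standardize_coefficients(unstandardized, pooled_sd):
    """Return standardized coefficients, assuming beta_i* = beta_i * S_i."""
    if len(unstandardized) != len(pooled_sd):
        raise ValueError("each coefficient needs one pooled standard deviation")
    return [b * s for b, s in zip(unstandardized, pooled_sd)]


# Hypothetical values for two predictors (not from the study).
beta = [0.5, -2.0]
pooled_sd = [4.0, 0.25]
standardized = standardize_coefficients(beta, pooled_sd)
```

Comparing the absolute values of the standardized coefficients then gives a rough ranking of the predictors that does not depend on their units.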

Neural Network Deterioration Model (NNDM)
Neural networks (NNs) can be used to predict outcome data from input data in a manner that simulates the operation of the human nervous system. Unlike statistical models, NNs make no assumptions about the model structure, because the structure is determined by the data. Generally, the model can simulate nonlinear relationships within the deterioration process and can handle ordinal outputs such as condition classes. In sewer deterioration modeling, the mathematical relationships between the independent variables (deterioration factors) and the dependent variable (sewer condition classes) are investigated by learning the deterioration behavior of pipes from past data. The knowledge gained from the past data is then generalized and stored in the NN to predict a pipe's condition. Mathematically, a neural network function can be written as:

$$Y = f\left(\sum_{i=1}^{K} W_i X_i\right) \qquad (3)$$

where $Y$ is the output signal, $X_i$ are the input signals, $K$ is the number of input signals, $W_i$ are the connection weights, and $f$ is the activation function.

Activation functions
The activation function links the weighted sums of units in a layer to the values of units in the succeeding layer. In this study, the hyperbolic tangent function was used for the hidden layer neurons and the softmax function for the output layer neurons, since automatic architecture selection was used and the output is categorical, IBM® SPSS® Statistics 20 User Guide.
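The two activation functions named above can be sketched in a few lines of plain Python. This is an illustrative sketch of the standard definitions only, not the SPSS implementation; the function names are chosen here for the example.

```python
import math


def tanh_activation(c):
    """Hyperbolic tangent: maps any real input to the interval (-1, 1)."""
    return math.tanh(c)


def softmax(scores):
    """Softmax: maps a list of real scores to probabilities summing to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
```

Because the softmax outputs sum to 1, they can be read as class membership scores, which suits a categorical output such as the sewer condition class.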

Data Processing
The dataset available for this network contained 103 records, each corresponding to an individual manhole-to-manhole sewer length. Some sewers in this database have erroneous entries, e.g. pipes with zero diameter, length or slope; these sewers were discarded from the analysis. Here, 4 samples with zero length and slope were removed, reducing the useful samples to 99. Of the 99 useful sewer samples, 79 were set aside for calibration and 20 for validation. The pipes for calibration and validation were selected using simple random sampling. Some of the entries in the Zublin trunk sewer database are non-numeric (e.g. shape, material). These data were converted to numeric type by assigning codes to them, thus facilitating analysis. The data needed to build the models and their codes are shown in Table 1.
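The cleaning and splitting steps described above can be sketched as follows. The record layout, the field names and the synthetic records are assumptions made for illustration; only the counts (103 records, 4 discarded, 79 for calibration, 20 for validation) come from the text.

```python
import random


def clean_and_split(records, n_validation, seed=0):
    """Drop records with zero diameter, length or slope, then split at random."""
    usable = [
        r for r in records
        if r["diameter"] > 0 and r["length"] > 0 and r["slope"] > 0
    ]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = usable[:]
    rng.shuffle(shuffled)
    validation = shuffled[:n_validation]
    calibration = shuffled[n_validation:]
    return calibration, validation


# Synthetic stand-in for the database: 103 records, 4 with zero length and slope.
records = [{"diameter": 1800, "length": 50.0, "slope": 0.001} for _ in range(99)]
records += [{"diameter": 1800, "length": 0.0, "slope": 0.0} for _ in range(4)]

calibration, validation = clean_and_split(records, n_validation=20)
print(len(calibration), len(validation))  # 79 20
```

Shuffling once and slicing is one simple way to draw a simple random sample without replacement; the seed is fixed here only so that the example is repeatable.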

Coefficients of the classification and standardized discriminant functions
The coefficients of the classification functions can be used to classify sewers easily into condition states, whereas the coefficients of the standardized discriminant functions can be used to assess the relative importance of the discriminator variables, as shown in Table 2.

Sample prediction
Functions similar to Eq. (1) can be written using the above coefficients to create the four classification functions for predicting the condition states of the Zublin sewers. The classification functions, $L_i$, are written for condition states $i = 2, 3, 4, 5$ (there is no sewer pipe in condition 1, which is excellent). To make a classification, the observed values of the predictors are inserted into the classification functions to calculate a classification score. The observation is assigned to the class with the highest classification score.
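The classification rule described above can be sketched as follows, assuming each classification function is linear in the predictors with one offset and one coefficient per predictor. The offsets, coefficients and the two predictors (age, depth) are hypothetical and are not the values from Table 2.

```python
def classification_scores(functions, x):
    """Evaluate L_i = alpha + sum(beta_k * x_k) for every condition state i."""
    scores = {}
    for state, (alpha, betas) in functions.items():
        scores[state] = alpha + sum(b * v for b, v in zip(betas, x))
    return scores


def classify(functions, x):
    """Assign the observation to the state with the highest score."""
    scores = classification_scores(functions, x)
    return max(scores, key=scores.get)


# Hypothetical offsets and coefficients for two predictors (age, depth);
# these are NOT the values from Table 2.
functions = {
    2: (-1.0, [0.10, 0.20]),
    3: (-2.0, [0.20, 0.10]),
    4: (-4.0, [0.30, 0.10]),
    5: (-9.0, [0.40, 0.30]),
}
predicted_state = classify(functions, [30.0, 5.0])
```

With these made-up numbers, a sewer aged 30 years at a depth of 5 m scores highest on the function for condition state 4.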

NNDM
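In this model, approximately 64% of the data were assigned for training, 17% for testing and 19% to a holdout sample. Furthermore, the values of all scale input factors were rescaled using the normalized method according to Eq. (4) to improve network training:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (4)$$

where $x$ is the original value of an input factor, $x_{\min}$ and $x_{\max}$ are its minimum and maximum values, and $x'$ is the rescaled value.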
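As an illustration of the rescaling step, the sketch below assumes that the normalized method is min-max rescaling, in which the minimum is subtracted and the result is divided by the range. The function name is hypothetical, and the sample diameters are simply values within the range reported for the trunk sewer.

```python
def normalize(values):
    """Rescale values to [0, 1] as (x - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot normalize a constant variable")
    return [(v - lowest) / (highest - lowest) for v in values]


# Diameters in mm within the range reported for the trunk sewer.
diameters = [1800, 2100, 2400, 3000]
scaled = normalize(diameters)
```

Rescaling puts predictors with very different units, such as diameter in millimetres and slope as a ratio, on a common 0 to 1 scale before training.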

Training of NNDM
In this study, NNDM training is used to determine the model structure (i.e. the network weights and the number of hidden neurons). Four neurons in the hidden layer were chosen by automatic architecture selection. The optimization algorithm used to estimate the network weights is the scaled conjugate gradient with batch training, as these are suitable for small datasets.

Sample prediction
The model architecture is listed in Table 3. Using this architecture, the condition of a sewer with particular characteristics can be predicted.
The non-linear relationship between the input and output data can be written as follows: where $n$ is the number of predictors, $W_o$ is the bias weight, $H_i$ is the output of the hidden neurons, $Y_j$ is the output of the output neuron, and $H_k$ is the input for $Y_j$.
To make a classification, the observed values of the predictors are inserted into the equations above to calculate a classification score, which is a value between 0 and 1. The observation is assigned to the class with the highest classification score.
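The prediction step can be sketched as one forward pass through a network with a hyperbolic tangent hidden layer and a softmax output layer. All weights below are hypothetical and are not the parameters in Table 3; two predictors are used for brevity, and the four hidden neurons and four condition states follow the text.

```python
import math


def forward(x, hidden_weights, hidden_bias, output_weights, output_bias):
    """One forward pass: tanh hidden layer followed by a softmax output layer."""
    hidden = [
        math.tanh(b + sum(w * v for w, v in zip(weights, x)))
        for weights, b in zip(hidden_weights, hidden_bias)
    ]
    raw = [
        b + sum(w * h for w, h in zip(weights, hidden))
        for weights, b in zip(output_weights, output_bias)
    ]
    shift = max(raw)  # numerical stability for exp
    exps = [math.exp(r - shift) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights: 2 normalized predictors, 4 hidden neurons, 4 classes.
hidden_weights = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2], [0.4, 0.4]]
hidden_bias = [0.0, 0.1, -0.1, 0.2]
output_weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
output_bias = [0.0, 0.0, 0.0, 0.0]
condition_states = [2, 3, 4, 5]

scores = forward([0.6, 0.9], hidden_weights, hidden_bias, output_weights, output_bias)
predicted = condition_states[scores.index(max(scores))]
```

Each score lies between 0 and 1 and the scores sum to 1, so picking the largest one implements the assignment rule described in the text.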

Independent Variable Importance
The importance of each independent variable in determining the neural network was computed based on the combined training and testing samples, IBM® SPSS® Statistics 20 User Guide. Table 4 shows that the variables age and traffic have the greatest effect on how the network classifies sewers, followed by diameter, material, type, slope, length and depth, respectively. The importance of an independent variable is a measure of how much the value predicted by the network model changes for different values of the independent variable, while the normalized importance is simply the importance value divided by the largest importance value and expressed as a percentage.
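The normalization of the importance values is simple arithmetic and can be sketched as follows. The importance values are hypothetical and are not those reported in Table 4; only the rule (divide by the largest value and express as a percentage) comes from the text.

```python
def normalized_importance(importance):
    """Express each importance as a percentage of the largest importance."""
    largest = max(importance.values())
    return {name: 100.0 * value / largest for name, value in importance.items()}


# Hypothetical importance values (not the values reported in Table 4).
importance = {"age": 0.40, "traffic": 0.30, "diameter": 0.20, "depth": 0.10}
normalized = normalized_importance(importance)
```

The most important variable always receives 100%, and the other variables are read relative to it.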

Model Performance Evaluation
For evaluating model performance, the model error (i.e. the difference between predicted and observed values) must be quantified.

Confusion matrix
When comparing observed values with model predictions, four possible situations can arise: (1) true positive (TP), when the model correctly predicts a positive case (i.e. a pipe in good condition), (2) true negative (TN), when the model correctly predicts a negative case (i.e. a pipe in poor condition), (3) false positive (FP), when the model incorrectly predicts a negative case as positive (e.g. a sewer in poor condition predicted as being in good condition), and (4) false negative (FN), when the model incorrectly predicts a positive case as negative (e.g. a sewer in good condition predicted as being in poor condition), as shown in Table 5. TP11 in this table is the number of pipes that were observed and correctly predicted in condition 1. In addition, O1, O2 and O3 represent the total number of pipes observed in conditions 1, 2 and 3, respectively, and P1, P2 and P3 represent the total number of pipes predicted in conditions 1, 2 and 3, respectively, Tran, 2007.
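A confusion matrix of this kind can be tabulated as sketched below, with observed conditions as rows and predicted conditions as columns. The orientation, the function name and the eight inspection results are assumptions made for illustration, not data from the study.

```python
def confusion_matrix(observed, predicted, classes):
    """Count pipes by observed condition (rows) and predicted condition (columns)."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for obs, pred in zip(observed, predicted):
        matrix[index[obs]][index[pred]] += 1
    return matrix


# Hypothetical inspection results for eight pipes and three condition classes.
observed = [1, 1, 2, 2, 2, 3, 3, 3]
predicted = [1, 2, 2, 2, 3, 3, 3, 1]
matrix = confusion_matrix(observed, predicted, classes=[1, 2, 3])

row_totals = [sum(row) for row in matrix]            # O1, O2, O3
column_totals = [sum(col) for col in zip(*matrix)]   # P1, P2, P3
```

The diagonal entries are the correctly predicted pipes, the row totals correspond to O1 to O3, and the column totals correspond to P1 to P3.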
The overall prediction efficiency (OPE) was used to evaluate the prediction performance of the MDDM and NNDM, which were developed in this study to predict changes in pipe condition. The OPE can be computed from the confusion matrix using Eq. (8):

$$\mathrm{OPE} = \frac{\sum_i TP_{ii}}{\sum_i O_i} \times 100\% \qquad (8)$$

where $TP_{ii}$ is the number of pipes observed and correctly predicted in condition $i$ and $O_i$ is the total number of pipes observed in condition $i$. Tables 6 and 7 are the confusion matrices for the MDDM and NNDM. They show that the NNDM provides the higher overall prediction efficiency, which could be attributed to its inherent ability to model complex processes. The MDDM provided relatively low overall prediction efficiency. This may be due to the restrictive assumptions of this model, such as the assumption of normality of the predictor variables, which is difficult to satisfy with the given dataset.
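The OPE calculation can be sketched as follows, assuming it is the share of pipes on the diagonal of the confusion matrix expressed as a percentage. The matrix is hypothetical and does not reproduce Tables 6 or 7.

```python
def overall_prediction_efficiency(matrix):
    """Percentage of pipes on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return 100.0 * correct / total


# Hypothetical 3-class confusion matrix (rows observed, columns predicted).
matrix = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]
ope = overall_prediction_efficiency(matrix)
```

In this made-up example, 23 of 30 pipes lie on the diagonal, so the OPE is roughly 76.7%.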

CONCLUSIONS AND RECOMMENDATIONS
In this paper, the MDDM and NNDM were developed, tested and evaluated using the sewer dataset as assessment tools for determining the serviceability of the Zublin trunk sewer. Of these two models, the NNDM was found to have a higher overall prediction efficiency than the MDDM. This model, however, is susceptible to bias towards predicting the condition classes with the greatest number of samples in the calibration dataset. According to the NNDM, the factors that most influence deterioration are, in order, age, traffic, diameter, material, type, slope, length and depth.
The overall conclusion from these models is that the Zublin trunk sewer may be in poor condition. To address this problem, continuous maintenance may keep the sewer in good condition and working at a high performance level that reaches the design limits. Good documentation of all observations and problems will help in reviewing the sewer system performance and will provide a good source of information for future planning.
et al., 2007.

Model structure
Generally, a neural network is composed of artificial neurons that are connected together and arranged in different layers in order to reduce complexity, Al-Barqawi and Zayed, 2008, as shown in Fig. 2. The connection weights, which are attached to the connections between neurons, are determined by minimizing the error between the predicted output and the actual output value using the observed data, Salman, 2010. NNs always have a special input signal with a value of 1 and an associated bias weight. The function of the bias weight is to allow the input signals to pass through (non-zero value) or to stop them (zero value).

Feed-forward type
The feed-forward type of NN was used in this study to reduce unnecessary complexity when determining the NN model structure, Lou et al., 2001. As can be seen from Fig. 2, the connections in the network flow forward from the input layer to the output layer without any feedback loops.

Wright et al., 2006. When the model error is high, the model performance is low. The confusion matrix is often used for ordinal and categorical outputs. The validation dataset should be used to effectively test the model, Baik et al., 2006.

Table 1. Summary code of sewer network.

Table 2. Coefficients of the classification and standardized discriminant functions.

Table 3. Estimation of hidden and output parameters.

Table 6. Prediction efficiencies during the calibration and validation of the MDDM.

Table 7. Prediction efficiencies during the calibration and validation of the NNDM.