Measuring the Attribute Accuracy and Completeness of the OpenStreetMap Road Networks for Two Regions in Iraq

The OpenStreetMap (OSM) project aims to build a free geospatial database of the entire world that can be edited by volunteers internationally. The OSM database contains a wide range of geographic data and features, including highways, buildings, and land-use areas. Because the volunteers come from varied scientific backgrounds, the quality of the spatial data they produce and share online as OSM datasets can vary. This study compares the completeness and attribute accuracy of the OSM road networks with data produced by digitization for areas in the Baghdad and Thi-Qar governorates. The analyses are based primarily on calculating the proportions of commission (extra roads) and omission (missing roads) in the OSM roads. The calculations also involved measuring the accuracy of the road classifications and of the attributes associated with the geometric features. The results indicated very high completeness in both study areas, whereas the percentages of roads carrying names were low in both areas; however, naming was better on the main roads than on other roads.


INTRODUCTION
In the past few years, academics have paid much attention to collaborative mapping initiatives whose main objective is to gather and disseminate freely accessible geodata. There are many such initiatives on the internet, in which volunteers, for the most part, contribute their knowledge and skills (Neis et al., 2013). These efforts concentrate on gathering various data, particularly information on geographic objects and their associated attributes. This type of information has been termed Volunteered Geographic Information (VGI) because the data are collected by volunteers (Goodchild, 2007). Early questions regarding "the occurrence of VGI, and the use of VGI in doing science", particularly in the field of Geographic Information Science (GIScience), were raised by Kuhn (2007). Most VGI investigations have focused on the OpenStreetMap (OSM) project. OSM, a website built on a geographic database, was introduced by Steve Coast in 2004; it seeks to create openly accessible geographic datasets and provide them to a global audience (Al-Bakri, 2015). The diversity of its data makes the project a rich source of research questions, for example concerning the motivations and range of activity of the community that shares the information, or the development of applications based on the data collected in OSM. Quality is crucial when working with any type of geodata, especially during data production, evaluation, and interchange. This is especially true of OSM data, because no restrictions are placed on contributors during data collection and annotation. Geo-information quality may be assessed using the guiding principles of the International Organization for Standardization (ISO). The fundamental principles of geodata quality are described in the ISO 19113 standard, and procedures for assessing the quality of digital geospatial data are outlined in ISO 19114.
The ISO 19157 "Geographic information: Data quality" standard was developed to consolidate the standards relating to data quality and to update the standards mentioned above. The ISO 19113 data quality elements can be used to assess spatial data quality; they include completeness, logical consistency, positional accuracy, temporal accuracy, and thematic accuracy (Barron et al., 2014). ArcGIS software can be a useful tool for managing and analyzing spatial data quality (Wattan and Al-Bakri, 2019). A geographic information system (GIS) can be used to record, examine, organize, and display all aspects of spatially referenced data, and it allows users to understand and analyze data in ways that reveal patterns, trends, and correlations (Khazael and Al-Bakri, 2021; Hassan and Ibrahim, 2018). Many researchers have focused on analyzing OSM data quality. For instance, Zheng and Zheng (2014) conducted a study in which completeness and positional accuracy were chosen as the two data quality elements to be compared against Baidu datasets. That study used the density of line and point features to describe some aspects of the distribution of OSM data quantitatively. The analysis revealed that 71 percent of the OSM data were, on average, less accurate than the Baidu datasets, while 66 percent of the OSM data were accurate. The most complete OSM data, with the highest positional accuracy, were found for Beijing and Shanghai. Overall, coverage was highly inadequate, since more than 94 percent of the country consisted of "incomplete areas" (regions with little to no data). However, a three-year comparison from 2011 to 2013 showed that OSM data had grown rapidly.
In another study, Al-Bakri and Sfoog (2018) developed a specialized tool in the MATLAB programming language to examine and evaluate the likelihood of matching OSM road features with a reference dataset, in order to determine whether the OSM data are of the same quality in both study regions in Iraq (Baghdad and Karbala). The findings indicated that the OSM data vary considerably, so they can be used reliably for various cartographic purposes but are not accurate enough for commercial decisions. Furthermore, Jasim and Al-Hamadani (2020) compared the OSM road network data for the cities of AL-Aadhamiyah and AL-Kadhumiyah with the official road data obtained from the Mayoralty of Baghdad (MB). Horizontal positional accuracy was evaluated using the National Standard for Spatial Data Accuracy (NSSDA) approach. According to that investigation, the positional accuracy of none of the three data sources at any of the study sites matched that of the MB dataset. The present study examines the OSM line features representing roads and evaluates the relevant quality aspects, namely their completeness and attribute (thematic) accuracy. OSM data for two different regions in Iraq were considered because publicly accessible sources were available for these locations; nonetheless, the ideas discussed in this study can be transferred to any geographic area or dataset.

The Study Sites and Datasets
This study was conducted in two areas of Iraq, one in Baghdad Governorate and the other in Thi-Qar Governorate, so that the results for the two study areas could be compared. Baghdad, the capital of Iraq, lies on the Tigris River in the center of the country. It has a population of over 8,400,000 and an area of approximately 860 km² (Abd Ali et al., 2010). The selected parts (AL-Jadriy, AL-Karada, and the University of Baghdad) are urban areas, mostly commercial and residential (see Fig. 1). Thi-Qar is a governorate in the south of Iraq, with a population of over 2,000,000 and an area of approximately 12,900 km². The parts included in this study (AL-Shula, AL-Shumoukh, and AL-Iskan) are also commercial and residential areas (Shenechel, 2015) (see Fig. 2). The selected urban region in Baghdad covered 36.369 km², comparable to the second study area in Thi-Qar, which covered 36.868 km². To conduct this research, the OSM road network of each study area was compared with a reference dataset. The roads in the OSM data were compared with the equivalent roads in the digitized data in terms of their lengths and names. The reference datasets were produced by digitizing satellite images with a resolution of 0.5 m. The OSM road datasets for Baghdad and Thi-Qar (Iraq) were downloaded from the Geofabrik download service in January 2022 (Geofabric, 2022).
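Comparing roads by length first requires a length for every road. The study does not state how lengths were computed (a GIS such as ArcGIS on projected data is the likely route), so the following is only a minimal sketch under stated assumptions: each road is a polyline of (longitude, latitude) vertices in WGS84 degrees, as stored in OSM, and the Earth is treated as a sphere with a mean radius of 6,371,008.8 m. The helper names `haversine_m` and `way_length_m` are hypothetical.

```python
import math

# Assumption: spherical Earth with the mean radius in metres.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def way_length_m(coords):
    """Length of one road polyline given as [(lon, lat), ...] in degrees.

    Sums the great-circle distances between consecutive vertices;
    a single-vertex (degenerate) road has length 0.
    """
    return sum(haversine_m(*p, *q) for p, q in zip(coords, coords[1:]))
```

For example, `way_length_m([(44.40, 33.30), (44.41, 33.30)])` returns the length of a short east-west segment. For areas of a few tens of square kilometres, a spherical approximation is usually adequate for comparing two datasets that are measured the same way; lengths computed in a suitable projected coordinate system would be the more conventional choice in a GIS workflow.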

Evaluating the Completeness
Completeness describes whether real-world features and their properties are present in or absent from the database, that is, whether a region has been well covered. According to the ISO 19113 standard, completeness comprises the data quality elements of omission and commission: omission refers to data absent from the database, and commission refers to excess data present in it (Yang and Blower et al., 2013). Earlier studies have presented two kinds of approach: object-based methods and unit-based methods. Object-based methods first find matching objects in the two databases and then calculate the percentage of objects missing from the OSM database. The matching procedure can be challenging and time-consuming for roads and other linear features, because one segment in OSM may correspond to several segments in the reference database and vice versa. As a result, segment-matching algorithms may not be the best choice for evaluating road completeness. Several earlier studies employed object-based models to evaluate the completeness of OSM; however, the vast majority employed unit-based techniques. With a unit-based method, completeness can be calculated by comparing the total length of the roads in OSM (the OSM road length, $L_{\text{OSM}}$) with that of the reference dataset (the reference road length, $L_{\text{ref}}$). Completeness by length can be determined with the following formula (Zhang, 2017):

$$\text{Completeness}\,(\%) = \frac{L_{\text{OSM}}}{L_{\text{ref}}} \times 100 \qquad (1)$$
Compared with object-based methods, the unit-based method requires no object matching and is not affected by digitization difficulties; hence, it was used in this research. The omission and commission percentages were also calculated.
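The length-based (unit-based) completeness measure described above reduces to a ratio of summed lengths. The following is a minimal sketch, assuming road lengths in metres are already available for both datasets; the function names are hypothetical. The per-class variant additionally assumes that the reference segments carry the same highway class labels as the OSM roads, which the text does not state explicitly.

```python
from collections import defaultdict


def completeness_pct(osm_lengths_m, reference_lengths_m):
    """Unit-based completeness: total OSM road length as a percentage of the
    total reference road length. No segment matching is required."""
    ref_total = sum(reference_lengths_m)
    if ref_total <= 0:
        raise ValueError("total reference road length must be positive")
    return 100.0 * sum(osm_lengths_m) / ref_total


def completeness_by_class(osm_roads, reference_roads):
    """The same ratio per road class.

    Both inputs are iterables of (highway_class, length_m). Assumes
    (hypothetically) that reference roads are labelled with the same classes.
    Classes with no reference length are skipped to avoid division by zero.
    """
    osm_len, ref_len = defaultdict(float), defaultdict(float)
    for cls, length in osm_roads:
        osm_len[cls] += length
    for cls, length in reference_roads:
        ref_len[cls] += length
    return {cls: 100.0 * osm_len[cls] / total
            for cls, total in sorted(ref_len.items()) if total > 0}
```

Because only totals are compared, roads missing in one place can be offset by extra roads elsewhere, and the ratio can exceed 100% when OSM contains more road length than the reference. This is why omission and commission are assessed separately.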
Omission (absence of data) can be obtained from the reference data agreement percentage. In contrast, commission (presence of additional data) requires a more detailed assessment, since the OSM data agreement percentage refers to all OSM objects, including information that may be missing from the reference dataset. Omission and commission can be calculated by comparing the numbers of omitted roads ($N_{\text{omission}}$) and commission roads ($N_{\text{commission}}$), respectively, with the number of OSM roads ($N_{\text{OSM}}$). The omission and commission percentages can be determined with the following formulas, respectively (Zhou, 2017):

$$\text{Omission}\,(\%) = \frac{N_{\text{omission}}}{N_{\text{OSM}}} \times 100 \qquad (2)$$

$$\text{Commission}\,(\%) = \frac{N_{\text{commission}}}{N_{\text{OSM}}} \times 100 \qquad (3)$$
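The omission and commission percentages can be sketched as set differences once road correspondences are known. This is a minimal illustration, not the authors' procedure: it assumes a separate (hypothetical) matching step has already produced pairs of OSM and reference road identifiers judged to be the same road, and, following the wording of the text, it divides both counts by the number of OSM roads. The function name `omission_commission_pct` is hypothetical.

```python
def omission_commission_pct(osm_ids, reference_ids, matches):
    """Omission and commission percentages, both relative to the number of OSM roads.

    osm_ids, reference_ids: identifiers of the roads in each dataset.
    matches: set of (osm_id, reference_id) pairs judged to be the same road,
             produced by a separate, hypothetical matching step.
    """
    osm_ids, reference_ids = set(osm_ids), set(reference_ids)
    if not osm_ids:
        raise ValueError("no OSM roads")
    matched_osm = {o for o, _ in matches}
    matched_ref = {r for _, r in matches}
    omitted = reference_ids - matched_ref  # in the reference, missing from OSM
    extra = osm_ids - matched_osm          # in OSM, absent from the reference
    n_osm = len(osm_ids)
    return 100.0 * len(omitted) / n_osm, 100.0 * len(extra) / n_osm
```

A common alternative design normalizes omission by the number of reference roads instead, since omission measures what the reference contains but OSM lacks; the two choices agree only when both datasets contain the same number of roads.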

Evaluating the Attribute Accuracy
According to the ISO standard, attribute (thematic) accuracy covers the accuracy of quantitative attributes, the correctness of non-quantitative attributes, and the correctness of feature classifications (López et al., 2020). Attributes describe the characteristics of the physical objects, and their accuracy reveals how well a database entity is modeled (Siebritz, 2014). Attributes in GIS are measured on four scales: nominal, ordinal, interval, and ratio, and different methods are used to assess the correctness of variables on each scale. The road name is one of the most important attributes of a road network because it is essential for locating addresses; as a result, the reliability of street names has been assessed in several studies (Brownson et al., 2004). Road names are nominal variables, so their accuracy cannot be evaluated in the same way as that of numeric variables (Morrison, 1995). In this research, road names were obtained by interviewing local residents, in particular the area's long-standing inhabitants. Most of the names of the main roads were collected and tabulated, and these data were used as the reference for comparison with the attribute tables of OpenStreetMap for the two study areas.
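Because a road name is nominal, a comparison can only say whether two names agree, not by how much one value exceeds another. The following is a minimal sketch of such an agreement check under hypothetical normalization choices (Unicode NFKC, case-folding, and collapsing whitespace) that the text does not specify; the function names are illustrative only.

```python
import unicodedata


def normalize_name(name):
    """Hypothetical normalization before comparing nominal road names:
    Unicode NFKC, case-folding, and collapsing runs of whitespace."""
    if not name:
        return ""
    text = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(text.split())


def names_agree(osm_name, ref_name):
    """Nominal comparison: the names either agree or they do not.
    An empty or missing name never counts as agreement."""
    a, b = normalize_name(osm_name), normalize_name(ref_name)
    return bool(a) and a == b


def agreement_pct(pairs):
    """Percentage of (osm_name, ref_name) pairs whose names agree."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return 100.0 * sum(names_agree(o, r) for o, r in pairs) / len(pairs)
```

Case-folding has no effect on Arabic script, so names recorded in Arabic may need further, language-specific normalization before this kind of comparison is meaningful.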

RESULTS ANALYSIS
This study determined and analyzed the completeness and attribute accuracy values. All roads in the reference map were considered when determining the completeness, omission, and commission of the OSM roads. The percentage of named roads was determined visually by comparison with the reference data. Fig. 3 presents the completeness percentage calculated with Eq. (1): 99.44% for the study area in Baghdad and 98.40% for the study area in Thi-Qar. This indicates high coverage of OpenStreetMap data in both study areas, probably because both areas lie in the centers of the Baghdad and Thi-Qar governorates, which suggests that OSM road data are almost complete in such urban areas or city centers. The completeness percentages of the OSM road classes for the Baghdad study area are presented in Table 1 and illustrated in Fig. 4.

The results of the preliminary analysis of the completeness of the OSM road classes for the Thi-Qar study area are presented in Table 2 and Fig. 5. As Fig. 5 shows, the completeness percentage was 99.9 for construction, 100 for path, 100 for pedestrian, 99.9 for primary, 99.9 for residential, 99.9 for secondary, 99.9 for secondary link, 99.9 for service, 99.8 for steps, 99.9 for tertiary, 99.9 for tertiary link, 99.9 for trunk, 99.9 for trunk link, and 99.9 for unclassified. The completeness percentages were thus high and similar for all OSM road classes. This is perhaps because participants contributed these data in recent years, when data collection sources were available, because many volunteers are interested in adding data to OSM, and because the study area is the center of the region.

The omission and commission percentages for the Baghdad study area were zero when Eqs. (2) and (3) were applied.
For Thi-Qar, the omission and commission percentages were 2.1 and 0.1, respectively, when Eqs. (2) and (3) were applied. The difference between the two areas is attributed to the Baghdad study area being more prominent and having received more attention from volunteers; hence, its data are complete, unlike those of the Thi-Qar study area.

The current study also evaluated the attribute (name) accuracy of the OSM roads. A name is a nominal variable, essentially a string. One effective way to compare two datasets is through a ratio, that is, the OSM data divided by the corresponding reference data. Fig. 6 shows the results of the initial analysis of attribute accuracy for Baghdad: only 9.35% of the OSM roads had names. This is possibly because the volunteers are not from this area and do not know the road names. Fig. 7 shows the accuracy percentage of the OSM road names: 95.5% of the road names are identical in both databases, which is good quality for many applications. Conversely, 4.5% of the OSM road names differed from those in the reference database by 6 or more letters, indicating that these names in the OSM database are probably entirely incorrect.

Fig. 8 presents the attribute accuracy percentage for each class of OSM roads in the Baghdad study area: 0 for footway, 0 for path, 100 for primary, 47 for primary link, 4 for residential, 37 for secondary, 0 for secondary link, 2 for service, 19 for tertiary, 0 for tertiary link, 0 for track, 45 for trunk, 0 for trunk link, and 0 for unclassified. The accuracy of the names varies between classes, although the primary roads were completely named.
This may be due to differences in experience among the people who collected these data, or they may come from regions other than the study area. The results of the attribute accuracy assessment for the Thi-Qar study area are set out in Fig. 9, which shows that only 3.63% of the OSM roads had names. A possible explanation is that the volunteers who uploaded the classes and names of the OSM roads may not live in the study area, so their knowledge of these road categories is relatively limited. The percentage of correct names among the named roads was 100, as the names were completely identical when compared (Fig. 10). The attribute accuracy percentage for each class of OSM roads was also examined. Fig. 11 presents the results for the Thi-Qar study area: 0.17 for residential, 59 for secondary, 0 for secondary link, 0 for service, 0 for steps, 34 for tertiary, 0 for tertiary link, 55 for trunk, 0 for trunk link, 18 for unclassified, 18 for construction, 0 for path, 0 for pedestrian, and 0 for primary. The accuracy of the names varies between classes; however, the secondary and trunk classes had the highest percentages, indicating that the main roads of the study area were better named than other roads.

Figure 9. The attribute accuracy percentage in Thi-Qar.
Figure 10. The percentage of true and false road names in Thi-Qar.
Figure 11. The attribute accuracy percentage of the OSM road classes in Thi-Qar.
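The name analysis above (share of named roads, identical names versus names differing by 6 or more letters, and accuracy per road class) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the "number of differing letters" is assumed to be the Levenshtein edit distance after trimming and case-folding, names differing by 1 to 5 letters are reported as a separate "minor" group, and per-class accuracy is assumed to be the share of OSM roads in a class whose name equals the reference name, counted by road. All function names are hypothetical.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions turning a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def _norm(name):
    # Hypothetical light normalization: trim and case-fold; None becomes "".
    return (name or "").strip().casefold()


def named_share(osm_names):
    """Percentage of OSM roads whose name is non-empty."""
    names = list(osm_names)
    if not names:
        return 0.0
    return 100.0 * sum(1 for n in names if _norm(n)) / len(names)


def classify_names(pairs, wrong_threshold=6):
    """pairs: (osm_name, reference_name) for named OSM roads.
    Returns percentages of identical, minor-difference, and probably wrong names."""
    counts = Counter()
    for osm_name, ref_name in pairs:
        d = levenshtein(_norm(osm_name), _norm(ref_name))
        if d == 0:
            counts["identical"] += 1
        elif d >= wrong_threshold:
            counts["wrong"] += 1
        else:
            counts["minor"] += 1
    n = sum(counts.values())
    return {k: (100.0 * counts[k] / n if n else 0.0)
            for k in ("identical", "minor", "wrong")}


def accuracy_by_class(roads):
    """roads: (highway_class, osm_name, reference_name) per OSM road.
    Percentage of roads in each class whose OSM name equals the reference name."""
    total, correct = Counter(), Counter()
    for cls, osm_name, ref_name in roads:
        total[cls] += 1
        if _norm(osm_name) and _norm(osm_name) == _norm(ref_name):
            correct[cls] += 1
    return {cls: 100.0 * correct[cls] / total[cls] for cls in sorted(total)}
```

The threshold of 6 mirrors the figure reported for Baghdad; with an edit distance, a name with one transposed or doubled letter falls into the "minor" group rather than being counted as wrong.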

CONCLUSION
The present work investigated the completeness and attribute accuracy of OSM roads in two different study areas, one in Baghdad and the other in Thi-Qar. Completeness was assessed in terms of commission (the occurrence of redundant data in the dataset) and omission (the absence of data from the dataset). The attribute accuracy of the OSM road feature classes was also examined. Based on the main findings presented, the following conclusions can be drawn:
1- The completeness of OSM data varies between geographic regions; the findings indicated high completeness percentages in both study areas, with the highest percentage in the Baghdad area. In addition, the omission and commission rates were zero in the Baghdad study area, unlike in the Thi-Qar study area.