Spatial Data Quality and Data Aggregation
Introduction
Crime is widely regarded as one of the main problems facing societies, and authorities need to base their decisions on data in order to reach the desired goals for future well-being.
This practice reviews data about crime events in Estonia and how they are distributed across different administrative areas. The analysis also includes population data obtained at different resolutions as an overlapping layer, with the main purpose of testing for a statistically significant relationship between population and the number of crimes.
Spatial data quality and data aggregation are key to this spatial analysis, because they determine whether the data is interpreted correctly.
1. Description
1.1 Task 1 - Exploring the data quality of the dataset
The first step is to understand where the crimes occurred and how intense the occurrence was. To do so, a .csv file obtained from the Estonian Police and Border Guard Board open data website is represented spatially using its geographic coordinates (WGS 84, EPSG: 4326), and the data is then projected to the Estonian national coordinate system (EPSG: 3301).
The data shows the crimes that occurred in November and December of 2012 in Estonia. It contains a column named ‘Precision’ that holds the resolution at which each record was obtained, such as 500 or 1000, meaning the cell size of the grid that was used to collect the data. The spatial visualization displays the highest resolution in urban areas and lower resolution in rural areas (Figure 1). As a quick remark, the exact location of each crime is confidential, so the crimes are gathered by grid cell and represented as overlapping points, an interesting representation that lets us apply spatial tools.
![]() |
| Figure 1: Map of data quality differentiated by urban and rural areas Source: Vallejo, Spatial Data Studio Lab 8, 2019 |
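The share of records at each resolution can be checked before mapping. The following is a minimal sketch, assuming a CSV with a ‘Precision’ column; the sample rows and the `lat`/`lon` column names are hypothetical and may not match the real file. It does not perform the reprojection to EPSG: 3301.

```python
import csv
import io
from collections import Counter

# Hypothetical sample mimicking the police CSV; real column names may differ.
SAMPLE_CSV = """lat,lon,Precision
59.437,24.753,500
59.437,24.753,500
58.378,26.729,500
58.900,25.600,1000
"""

def count_by_precision(csv_text, column="Precision"):
    """Count how many records were stored at each grid cell size."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row[column]) for row in reader)

precision_counts = count_by_precision(SAMPLE_CSV)
for cell_size, n in sorted(precision_counts.items()):
    print(f"Precision {cell_size}: {n} records")
```

A table like this gives a quick sense of how much of the dataset is stored at the coarser resolution before any density analysis is attempted.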
1.2 Task 2 - Analysing Point Density
1.2.1 Weighted Points Representation
As mentioned, it is not possible to visualize all the points individually: for confidentiality reasons, the points are stored by grid resolution and overlap each other. In total, 15,230 crime incidents are displayed in Estonia, considering all categories.
To appreciate the spatial weight of the crime incidents, we used the ‘Collect Events Tool’ (ESRI, 2019) from ArcGIS software. The result is shown using a quantile classification, and we can see (Figure 2) that the crime incidents are concentrated in the urban areas of Estonia.
![]() |
| Figure 2: Map of crime incidents in Estonia Source: Vallejo, Spatial Data Studio Lab 8, 2019 |
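The idea behind collecting events can be sketched in a few lines: coincident points are merged into one record per unique location, carrying a count that can then be used as a weight. This is only an illustration of the concept, not the ArcGIS implementation; the coordinates and the function name `collect_events` are hypothetical.

```python
from collections import Counter

def collect_events(points):
    """Return one (x, y, count) record per unique location, largest count first."""
    counts = Counter(points)
    return sorted(((x, y, n) for (x, y), n in counts.items()),
                  key=lambda rec: rec[2], reverse=True)

# Hypothetical projected coordinates (metres); several incidents share a location.
incidents = [(542500, 6589500), (542500, 6589500), (542500, 6589500),
             (659500, 6474500), (659500, 6474500), (600000, 6500000)]

weighted = collect_events(incidents)
for x, y, n in weighted:
    print(x, y, n)
```

The count attached to each location is what a graduated symbol or quantile classification would then be based on.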
1.2.2. Point Density with 1000m Grid
The crime incidents can also be displayed with the ‘Point Density Tool’ (ESRI, 2019), which represents the concentration of points using a chosen grid cell size. It is advisable to find a suitable cell size so that the visualization is pleasant and, most importantly, reliable.
In this result the point density is computed with a 1000 m cell size, that is, per square kilometre. As we can see (Figure 3), the crime incidents are concentrated in urban areas. In my view this is a reliable visualization, because it lets us see the concentration around each area precisely.
![]() |
| Figure 3: Map of density of crime incidents by square kilometre 1000m Grid Source: Vallejo, Spatial Data Studio Lab 8, 2019 |
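A simplified version of a grid-based density can be sketched as follows. It only counts points per square cell and divides by the cell area; it is an assumption-laden illustration and not the algorithm of the ArcGIS tool, which may also take a neighbourhood around each cell into account. The function name and sample coordinates are hypothetical.

```python
import math
from collections import Counter

def grid_density(points, cell_size):
    """Count points per square cell and convert the counts to points per km²."""
    cell_area_km2 = (cell_size / 1000.0) ** 2
    counts = Counter((math.floor(x / cell_size), math.floor(y / cell_size))
                     for x, y in points)
    return {cell: n / cell_area_km2 for cell, n in counts.items()}

# Hypothetical projected coordinates in metres.
points = [(100, 100), (900, 900), (1500, 200), (4200, 4800)]

density_1km = grid_density(points, 1000)
for cell, value in sorted(density_1km.items()):
    print(cell, value)
```

With a 1000 m cell the cell area is exactly 1 km², so the density equals the raw count in each cell.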
1.2.3. Point Density with 5000m Grid
As the previous result showed, small areas also show concentration. In this analysis we run the ‘Point Density Tool’ (ESRI, 2019) with a 5000 m cell size, to determine which grid shows the crime incident density more reliably.
The result of this analysis is a map with no concentration of crime incidents in rural areas (Figure 4). This is the point at which the analyst decides to discard some visualizations and favour others, because some representations are more reliable than others.
![]() |
| Figure 4: Map of density of crime incidents by square kilometre 5000m Grid Source: Vallejo, Spatial Data Studio Lab 8, 2019 |
To better understand how the data behaves, the differences between Figure 3 and Figure 4 are shown in the next picture (Figure 5).
![]() |
| Figure 5: Pixel differentiation between 1000m Grid and 5000m Grid in Tallinn Source: Vallejo, Spatial Data Studio Lab 8, 2019 |
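One way to see why the two grids differ is to aggregate counts from 1000 m cells into 5000 m cells and compare the peak densities. The sketch below uses made-up counts and a hypothetical helper `aggregate_cells`; it illustrates the aggregation effect only and does not reproduce the figures.

```python
from collections import defaultdict

def aggregate_cells(fine_counts, factor):
    """Sum the counts of fine cells into coarse cells that are `factor` times wider."""
    coarse = defaultdict(int)
    for (col, row), n in fine_counts.items():
        coarse[(col // factor, row // factor)] += n
    return dict(coarse)

# Hypothetical counts per 1000 m cell: one hot spot and a few isolated incidents.
counts_1km = {(0, 0): 40, (1, 0): 2, (7, 3): 1, (12, 12): 1}
counts_5km = aggregate_cells(counts_1km, 5)

peak_1km = max(counts_1km.values()) / 1.0    # incidents per km² (1 km² cells)
peak_5km = max(counts_5km.values()) / 25.0   # incidents per km² (25 km² cells)
print(peak_1km, peak_5km)
```

In this toy example the hot spot keeps its position but its density drops sharply once it is averaged over 25 km², and isolated incidents become almost invisible, which is consistent with the rural areas fading out at the coarser cell size.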
1.3. Task 3 - Analysing relationship between population and crime incidents
As the previous results show, the greatest concentration of crime incidents is in cities. This leads to the hypothesis that populated places are related to crime. To test this idea, we analyse population and crime at two different levels.
1.3.1. Administrative level analysis
As a starting point, it was necessary to create a choropleth map of absolute population by administrative area. The result, as expected, shows the population concentrated in administrative areas containing cities such as Tallinn, Tartu, Pärnu or Narva. The choropleth is then overlaid with the number of crimes (Figure 6).
![]() |
| Figure 6: Map of population distribution in administrative areas and crimes Source: Vallejo, Spatial Data Studio Lab 8, 2019. |
This is not yet enough to draw a conclusion: we needed to clarify the relationship between population and crime. In the next step we perform a ‘Spatial Join’, which counts how many crimes occur in each administrative area. The idea is to see how closely the result (Figure 7) resembles the map of population per administrative area.
![]() |
| Figure 7: Crime Incidents per administrative area Source: Vallejo, Spatial Data Studio Lab 8, 2019 |
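Conceptually, this kind of spatial join is a point-in-polygon count. The following is a minimal sketch using a ray-casting test on simple polygons; the area names, polygons and points are hypothetical, and a GIS performs the same operation with spatial indexes and full geometry handling.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_per_area(areas, points):
    """Count how many points fall inside each named polygon."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # each point is assigned to at most one area
    return counts

# Hypothetical administrative areas (two adjacent squares) and crime points.
areas = {
    "Area A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Area B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
crimes = [(2, 3), (5, 5), (9, 1), (15, 5), (25, 5)]

crime_counts = count_points_per_area(areas, crimes)
print(crime_counts)
```

The point at (25, 5) lies outside both polygons and is therefore not counted, which mirrors how unmatched points are left out of a join count.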
As anticipated, the map of the number of crimes per administrative unit resembles the map of absolute population: both show concentration in urban areas. At this point we performed a correlation analysis to see how strongly population and the number of crime events are related. The result was R² = 0.95 (Figure 8), which means that 95% of the variation in the population (dependent variable – y) is explained by the variation in the number of crimes (independent variable – x).
![]() |
| Figure 8: Correlation analysis between population and crimes Source: Vallejo, Spatial Data Studio Lab 8, 2019 |
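For reference, the R² of a simple linear relationship can be computed as the square of the Pearson correlation coefficient. The sketch below uses made-up values per administrative area, not the data behind Figure 8.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values per administrative area (not the lab data).
crimes = [5, 12, 30, 80, 400]
population = [1200, 2500, 7000, 15000, 90000]

r = pearson_r(crimes, population)
r_squared = r ** 2
print(f"r = {r:.3f}, R2 = {r_squared:.3f}")
```

Note that in a simple two-variable case R² is symmetric: it takes the same value whichever variable is treated as dependent.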
1.3.2. Square Kilometre Analysis
Working with spatial data requires taking some considerations into account, such as spatial normalization. Absolute numbers aggregated by administrative area are unreliable in the spatial dimension: in this case, the bigger the administrative area, the bigger the number of crimes and the population.
To solve this problem, it is necessary to normalize the data spatially. The population is already normalized to square kilometres (Figure 9); next, I will show how the crimes should be normalized.
![]() |
| Figure 9: Population of Estonia per square kilometre Source: Spatial Data Studio Lab 8, 2019 |
Now, to normalize the crime incidents we had to follow some key steps that are listed here:
1) ‘Create Fishnet Tool’ with the spatial extent of the normalized population layer.
2) ‘Union Tool’ of the new fishnet with the normalized population layer.
3) ‘Raster to Point Tool’ with the raster crime data that is shown in Figure 3.
4) ‘Spatial Join’ of the fishnet (step 2) with the Crime Data points (step 3).
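The core of the steps above (building a regular grid over an extent and summing point values per cell) can be sketched without GIS software. This is a simplified illustration under stated assumptions: the extent, cell size, point values and both function names are hypothetical, and the union with the population layer is not reproduced.

```python
import math

def create_fishnet(xmin, ymin, xmax, ymax, cell_size):
    """Return a dict mapping (col, row) to the cell bounds (x0, y0, x1, y1)."""
    n_cols = math.ceil((xmax - xmin) / cell_size)
    n_rows = math.ceil((ymax - ymin) / cell_size)
    cells = {}
    for col in range(n_cols):
        for row in range(n_rows):
            x0 = xmin + col * cell_size
            y0 = ymin + row * cell_size
            cells[(col, row)] = (x0, y0, x0 + cell_size, y0 + cell_size)
    return cells

def join_points_to_fishnet(cells, points, xmin, ymin, cell_size):
    """Sum the value of every (x, y, value) point that falls inside a fishnet cell."""
    totals = {cell: 0 for cell in cells}
    for x, y, value in points:
        cell = (math.floor((x - xmin) / cell_size),
                math.floor((y - ymin) / cell_size))
        if cell in totals:  # points outside the extent are ignored
            totals[cell] += value
    return totals

# Hypothetical 3 km x 2 km extent with 1000 m cells, and points carrying crime counts.
fishnet = create_fishnet(0, 0, 3000, 2000, 1000)
crime_points = [(500, 500, 4), (700, 300, 1), (2500, 1500, 2), (9000, 9000, 7)]
crimes_per_cell = join_points_to_fishnet(fishnet, crime_points, 0, 0, 1000)
print(crimes_per_cell)
```

Because every cell has the same area, the per-cell totals are directly comparable, which is the purpose of the spatial normalization.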
Then it is necessary to get rid of the unusable data, such as squares with no population or with no crime incidents. We did this with the following SQL statement applied as a query:
"grid_code" <> 0 OR "POP" <> 0
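The grid_code field holds the crime incidents and POP the population. Because the two conditions are joined with OR, the query keeps every square in which at least one of the two values is non-zero. The fishnet will look like Figure 10.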
![]() |
| Figure 10: Fishnet with normalization of population and crime incidents Source: Vallejo, Spatial Data Studio Lab 8, 2019 |
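The effect of the selection query can be checked on a few made-up rows. The sketch below mirrors the OR condition used in the text and, for comparison only, a stricter AND variant that is not part of the original workflow; the row values and function names are hypothetical.

```python
# Hypothetical fishnet attribute rows: crime value (grid_code) and population (POP).
cells = [
    {"id": 1, "grid_code": 0, "POP": 0},
    {"id": 2, "grid_code": 3, "POP": 0},
    {"id": 3, "grid_code": 0, "POP": 120},
    {"id": 4, "grid_code": 5, "POP": 800},
]

def keep_non_empty(rows):
    """Mirror of the query: "grid_code" <> 0 OR "POP" <> 0."""
    return [r for r in rows if r["grid_code"] != 0 or r["POP"] != 0]

def keep_both_present(rows):
    """Stricter alternative (an assumption, not the original query): AND instead of OR."""
    return [r for r in rows if r["grid_code"] != 0 and r["POP"] != 0]

or_ids = [r["id"] for r in keep_non_empty(cells)]
and_ids = [r["id"] for r in keep_both_present(cells)]
print(or_ids, and_ids)
```

With OR, only squares that are empty in both fields are dropped; with AND, squares missing either value are dropped as well, so the choice affects how many cells enter the correlation analysis.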
Finally, we exported the data as a .txt file and opened it in Excel. This step was done in order to run the correlation analysis with the normalized data and understand the relationship between population and crime incidents.
The correlation analysis with normalized areas shows that R² = 0.32 (Figure 11), which means that 32% of the variation in the population (dependent variable – y) is explained by the variation in the number of crimes (independent variable – x). In other words, once the data is normalized by area, population has only a weak relationship with crime incidents, based on this spatial analysis.
![]() |
| Figure 11: Correlation analysis between population and crimes - Normalized area Source: Vallejo, Spatial Data Studio Lab 8, 2019 |
References
- ESRI. (2019). Retrieved 14/11/2019, from: http://desktop.arcgis.com/en/arcmap/10.3/tools/
- Wiseman, A. (2013). youtube.com. (T. T. University, Editor) Retrieved 14/11/2019, from: https://www.youtube.com/watch?v=w5H6ZAg2WIA










