Bryan R. Vallejo e-portfolio: January 2020

Spatial Data Quality and Data Aggregation

Introduction

Crime is widely regarded as one of the main problems facing societies, and authorities need to base their decisions on data in order to reach their goals for future well-being.

This practice reviews data on crime events in Estonia and how they are distributed across administrative areas. The analysis also overlays population data obtained at different resolutions, with the main purpose of testing for a statistically significant relationship between population and the number of crimes.

Spatial data quality and data aggregation are key to this analysis, because both determine whether the data is interpreted correctly.

1. Description

1.1 Task 1 - Exploring the data quality of the dataset

The first step is to understand where the crimes are located and how intensely they occur. To do this, a .csv file obtained from the Estonian Police and Border Guard Board open data website is plotted using its geographic coordinates (WGS 84, EPSG:4326), and the data is then projected to the Estonian national coordinate system (EPSG:3301).

The data covers crimes that occurred in Estonia in November and December 2012. It contains a column named ‘Precision’ that holds the resolution at which each record was captured, such as 500 or 1000; this value is the cell size of the grid used to collect the data. The map shows the finest resolution in urban areas and a coarser resolution in rural areas (Figure 1). As a quick remark, the exact crime locations are confidential, so the crimes are stored at grid resolution and many points overlap, which calls for spatial tools to represent them properly.

Figure 1: Map of data quality differentiated by urban and rural areas
Source: Vallejo, Spatial Data Studio Lab 8, 2019
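
Before mapping, the ‘Precision’ column described above can be summarised to see how many records fall in each resolution class. The following is a minimal sketch using only the Python standard library; the sample rows and the `Lon` and `Lat` column names are hypothetical, and only the ‘Precision’ column name comes from the dataset description.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the 'Precision' column name comes from the text.
SAMPLE_CSV = """Lon,Lat,Precision
24.7536,59.4370,500
24.7536,59.4370,500
26.7290,58.3780,500
25.5900,58.9000,1000
"""

def count_by_precision(csv_text):
    """Count records per value of the 'Precision' column (grid cell size)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row["Precision"]) for row in reader)

counts = count_by_precision(SAMPLE_CSV)
for cell_size, n in sorted(counts.items()):
    print(f"Precision {cell_size}: {n} records")
```

With the real file, the same function would be fed the contents of the downloaded .csv instead of the sample string.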

1.2 Task 2 - Analysing Point Density

1.2.1 Weighted Points Representation

As mentioned, not all points can be seen individually: for confidentiality, the points are stored at grid resolution and therefore overlap. In total, 15,230 crime incidents across all categories are displayed for Estonia.

To show the spatial weight of the crime incidents, we used the ‘Collect Events’ tool from ArcGIS (ESRI, 2019), which merges coincident points into a single weighted point. The result is displayed with a quantile classification, and Figure 2 shows that the crime incidents are concentrated in the urban areas of Estonia.

Figure 2: Map of crime incidents in Estonia
Source: Vallejo, Spatial Data Studio Lab 8, 2019
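
The idea behind collecting events can be sketched outside ArcGIS: points that share exactly the same coordinates are merged into one record with a count. This is only an illustration of the concept, not the ArcGIS implementation, and the coordinates below are hypothetical.

```python
from collections import Counter

def collect_events(points):
    """Return one (x, y, count) record per unique location, highest count first."""
    counts = Counter(points)
    return sorted(((x, y, n) for (x, y), n in counts.items()),
                  key=lambda record: -record[2])

# Hypothetical projected coordinates in metres; overlapping points share a grid position.
points = [(542500, 6589500), (542500, 6589500), (542500, 6589500),
          (659000, 6474000), (659000, 6474000), (600500, 6500500)]

events = collect_events(points)
for x, y, n in events:
    print(x, y, n)
```

The count attached to each location is what drives the graduated symbols in a weighted points map.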

1.2.2 Point Density with 1000m Grid

The crime incidents can also be displayed with the ‘Point Density’ tool (ESRI, 2019), which represents the concentration of points using a chosen grid cell size. It is advisable to find a suitable cell size so that the visualization is pleasant to read and, most importantly, reliable.

In this result the point density is computed with a 1000 m cell size, that is, per square kilometre. As Figure 3 shows, the crime incidents are concentrated in urban areas. In my view this is a reliable visualization, because it lets us see the pattern precisely within each area.

Figure 3: Map of density of crime incidents per square kilometre (1000m Grid)
Source: Vallejo, Spatial Data Studio Lab 8, 2019
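
A simplified version of a density grid can be sketched as follows. Note the assumption: the ArcGIS tool evaluates a neighbourhood around each output cell, whereas this stand-in only counts the points that fall inside each square cell and divides by the cell area. The function name and the coordinates are hypothetical.

```python
from collections import Counter

def cell_density(points, cell_size):
    """Count points per square cell and convert the counts to points per km²."""
    cell_area_km2 = (cell_size / 1000.0) ** 2
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return {cell: n / cell_area_km2 for cell, n in counts.items()}

# Hypothetical projected coordinates in metres.
points = [(542100, 6589200), (542900, 6589800), (542500, 6589500), (545200, 6589100)]

density = cell_density(points, 1000)
for cell, value in sorted(density.items()):
    print(cell, value)
```

With a 1000 m cell the area is exactly 1 km², so the density equals the raw count per cell.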

1.2.3 Point Density with 5000m Grid

As the previous result showed, small areas also show concentration. In this analysis we run the ‘Point Density’ tool (ESRI, 2019) with a 5000 m cell size to determine which grid gives the more reliable picture of crime incident density.

The result is a map with no visible concentration of crime incidents in rural areas (Figure 4). This is the point where the analyst decides to discard some visualizations and favour others, because some representations are more reliable than others.

Figure 4: Map of density of crime incidents per square kilometre (5000m Grid)
Source: Vallejo, Spatial Data Studio Lab 8, 2019
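
The effect of the cell size can be illustrated with a small calculation. The sketch below uses hypothetical points (50 incidents inside one urban square kilometre and a single rural incident) and plain per-cell counting rather than the ArcGIS neighbourhood method, so the numbers only show the direction of the effect.

```python
from collections import Counter

def densities(points, cell_size):
    """Points per km² for every occupied square cell of the given size (metres)."""
    counts = Counter((x // cell_size, y // cell_size) for x, y in points)
    area_km2 = (cell_size / 1000) ** 2
    return sorted(n / area_km2 for n in counts.values())

# Hypothetical data: 50 incidents in one urban km² and 1 isolated rural incident.
urban = [(540200 + 10 * i, 6585300) for i in range(50)]
rural = [(652500, 6471500)]
points = urban + rural

fine = densities(points, 1000)    # 1 km² cells
coarse = densities(points, 5000)  # 25 km² cells
print("1000 m grid:", fine)
print("5000 m grid:", coarse)
```

The same incidents are spread over 25 km² in the coarse grid, so the urban peak drops sharply and the rural value becomes almost invisible on a shared colour scale.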

To better understand how the data behaves, the differences between Figure 3 and Figure 4 are shown in Figure 5.

Figure 5: Pixel differentiation between 1000m Grid and 5000m Grid in Tallinn
Source: Vallejo, Spatial Data Studio Lab 8, 2019

1.3 Task 3 - Analysing relationship between population and crime incidents

As the previous results show, the greatest concentration of crime incidents is in cities. This leads to the hypothesis that populated places are associated with crime. To test it, we analyse population and crime at two different levels.

1.3.1 Administrative level analysis

As a starting point, we created a choropleth map of absolute population by administrative area. As expected, the population is concentrated in the administrative areas containing cities such as Tallinn, Tartu, Pärnu and Narva. The choropleth is then overlaid with the number of crimes (Figure 6).

Figure 6: Map of population distribution in administrative areas and crimes
Source: Vallejo, Spatial Data Studio Lab 8, 2019.

This is not yet enough to draw a conclusion; the relationship between population and crime needs to be made clearer. The next step is a ‘Spatial Join’, which counts how many crimes occur in each administrative area. The aim is to see how closely the result (Figure 7) resembles the map of population per administrative area.

Figure 7: Crime Incidents per administrative area
Source: Vallejo, Spatial Data Studio Lab 8, 2019
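
Counting points per area, which is what the spatial join does here, can be sketched with a standard ray-casting point-in-polygon test. This is a conceptual illustration rather than the ArcGIS implementation; the two rectangular areas and the points are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_per_area(points, areas):
    """Return {area name: number of points inside}, like a join with a count field."""
    return {name: sum(point_in_polygon(x, y, poly) for x, y in points)
            for name, poly in areas.items()}

# Hypothetical administrative areas and crime points.
areas = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (30, 0), (30, 10), (10, 10)],
}
points = [(2, 3), (5, 5), (9, 1), (15, 5), (40, 5)]

crime_counts = count_points_per_area(points, areas)
print(crime_counts)
```

The point at (40, 5) lies outside both areas and is therefore not counted anywhere.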

As anticipated, the map of the number of crimes per administrative unit resembles the map of absolute population: both show concentration in urban areas. At this point we performed a correlation analysis to see how strongly population and the number of crime events are related. It gave R² = 0.95 (Figure 8), which means that 95% of the variation in population (dependent variable, y) is explained by the variation in the number of crimes (independent variable, x) (Wiseman, 2013). The square kilometre analysis is still missing, however.

Figure 8: Correlation analysis between population and crimes
Source: Vallejo, Spatial Data Studio Lab 8, 2019
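
For a simple linear regression with one explanatory variable, R² equals the square of the Pearson correlation coefficient. The sketch below computes it from scratch so the calculation is visible; the numbers are hypothetical and are not the Estonian data.

```python
def r_squared(x, y):
    """R² of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical values per area: crimes (x) and population (y).
crimes = [1, 2, 3, 4, 5]
population = [110, 190, 320, 390, 500]

r2 = r_squared(crimes, population)
print(round(r2, 3))
```

A value close to 1 indicates that the points lie almost on a straight line; a perfectly linear pair such as x = [1, 2, 3] and y = [2, 4, 6] gives exactly 1.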

1.3.2 Square Kilometre Analysis

Working with spatial data requires some considerations, such as spatial normalization. Absolute numbers aggregated by administrative area are unreliable in the spatial dimension: in this case, the bigger the administrative area, the larger its number of crimes and its population.

To solve this problem, it is necessary to normalize the data spatially. The population is already normalized to square kilometres (Figure 9), so what follows shows how the crimes are normalized.

Figure 9: Population of Estonia per square kilometre
Source: Spatial Data Studio Lab 8, 2019

To normalize the crime incidents we followed these key steps:

1) ‘Create Fishnet Tool’ with spatial extent of normalized population layer.

2) ‘Union Tool’ of the new fishnet with the normalized population layer.

3) ‘Raster to Point Tool’ with the raster crime data that is shown in Figure 3.

4) ‘Spatial Join’ of the fishnet (step 2) with the Crime Data points (step 3).
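
The four steps above can be sketched conceptually in plain Python. This is a simplified stand-in for the ArcGIS tools: the fishnet is a dictionary keyed by (column, row), the population layer is assumed to use the same cells, and each crime point is assumed to carry the raster value (`grid_code`) it received in step 3. The cell size, extent origin, and all sample values are hypothetical.

```python
CELL = 1000  # assumed cell size in metres

def build_fishnet(n_cols, n_rows):
    """Step 1: one empty record per fishnet cell."""
    return {(c, r): {"POP": 0, "grid_code": 0}
            for c in range(n_cols) for r in range(n_rows)}

def cell_of(x, y, x_min, y_min, cell=CELL):
    """Index of the fishnet cell that contains the point (x, y)."""
    return (int((x - x_min) // cell), int((y - y_min) // cell))

def join_to_fishnet(fishnet, population, crime_points, x_min=0, y_min=0):
    """Steps 2 and 4: attach population and crime values to each cell."""
    for cell, pop in population.items():
        if cell in fishnet:
            fishnet[cell]["POP"] = pop
    for x, y, value in crime_points:
        cell = cell_of(x, y, x_min, y_min)
        if cell in fishnet:
            fishnet[cell]["grid_code"] += value
    return fishnet

# Hypothetical inputs: population per cell and crime points (x, y, grid_code) from step 3.
population = {(0, 0): 120, (2, 1): 40}
crime_points = [(500, 500, 7), (2500, 1500, 2), (1500, 1500, 1)]

table = join_to_fishnet(build_fishnet(3, 2), population, crime_points)
for cell, record in sorted(table.items()):
    print(cell, record)
```

Each cell ends up with both attributes side by side, which is the structure needed for the correlation on normalized data.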

Next, it is necessary to discard the unwanted records, namely squares with no population and no crime incidents. We did this with the following SQL statement applied as a query:

"grid_code" <> 0 OR "POP" <> 0

The grid_code field holds the crime incidents and the POP field holds the population. The resulting fishnet looks like Figure 10.

Figure 10: Fishnet with normalization of population and crime incidents
Source: Vallejo, Spatial Data Studio Lab 8, 2019
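
The behaviour of the query can be checked on a few records. The sketch below mirrors the OR condition in Python and, for comparison only, shows what an AND condition would keep; the four sample records are hypothetical.

```python
def keep_cell(record):
    """Mirror of the query: "grid_code" <> 0 OR "POP" <> 0."""
    return record["grid_code"] != 0 or record["POP"] != 0

# Hypothetical fishnet records covering the four possible cases.
cells = [
    {"id": 1, "grid_code": 7, "POP": 120},  # crimes and population
    {"id": 2, "grid_code": 0, "POP": 40},   # population only
    {"id": 3, "grid_code": 1, "POP": 0},    # crimes only
    {"id": 4, "grid_code": 0, "POP": 0},    # empty cell
]

kept_with_or = [c["id"] for c in cells if keep_cell(c)]
kept_with_and = [c["id"] for c in cells
                 if c["grid_code"] != 0 and c["POP"] != 0]
print("OR keeps:", kept_with_or)
print("AND would keep:", kept_with_and)
```

With OR, only cells that are empty in both fields are dropped; cells with population but no crimes, or crimes but no population, stay in the table and take part in the correlation.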

Finally, we exported the data as a .txt file and opened it in Excel, in order to run the correlation analysis on the normalized data and understand the relationship between population and crime incidents.

The correlation analysis with normalized areas gives R² = 0.32 (Figure 11), which means that 32% of the variation in population (dependent variable, y) is explained by the variation in the number of crimes (independent variable, x). In other words, once the data is spatially normalized, population has only a weak relationship with crime incidents.

Figure 11: Correlation analysis between population and crimes - Normalized area
Source: Vallejo, Spatial Data Studio Lab 8, 2019

References

ESRI. (2019). Retrieved 14 November 2019, from http://desktop.arcgis.com/en/arcmap/10.3/tools/
Wiseman, A. (2013). youtube.com. (T. T. University, Ed.). Retrieved 14 November 2019, from https://www.youtube.com/watch?v=w5H6ZAg2WIA
