Choropleth Maps in R

Choropleth maps provide a simple, easy-to-understand way to visualize a measurement across different geographical areas, be it states or countries.

If you had to compare the growth rates of Indian states and present them to an audience with only 15-20 seconds to look at the data and draw insights, what would be the right way? The best way? Would the traditional tabular format make sense? Or would bar graphs work better?

Bar graphs would certainly look better, present the data in a visually appealing manner and allow a good comparison; but would they make an impact in 15 seconds? Personally, I doubt they would have the desired effect. Moreover, 36 bars for 36 states and union territories make the chart cumbersome, forcing the viewer to scroll up and down. There is a much better alternative to tables and bar charts: choropleth maps.

Choropleth maps are thematic maps in which areas are colored or shaded according to the value of the statistical variable being represented. For example, if we wanted to compare the population density of the states of the United States of America at a glance, a choropleth map would be our best bet.

Let's look at some examples of where choropleth maps come in handy for presenting data.

Choropleth maps are widely used to represent macroeconomic variables such as GDP growth rate, population density and per-capita income on a world map, providing a proportional comparison among countries. The same can be done for states within a country.
These maps can also present nominal data, such as the gain, loss or no change in the number of seats won by a political party in a country.

One limitation of choropleth maps is that they do not convey totals or absolute values. They are among the best tools for proportional comparison, but when it comes to presenting absolute values, they are not the right fit.
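Because shading works best on proportions, a common workaround is to convert absolute counts into a rate before mapping them, for example people per square kilometer. The sketch below uses three hypothetical regions (names and numbers are made up purely for illustration) to show how the ranking, and therefore the shading, can flip once counts are normalized by area.

```r
# Three hypothetical regions: absolute counts versus a normalized rate
regions <- data.frame(
  name       = c("Region A", "Region B", "Region C"),
  population = c(5000000, 800000, 2000000),
  area_km2   = c(100000, 2000, 50000)
)

# Population density: people per square kilometer
regions$density <- regions$population / regions$area_km2

# Ranking by the absolute count and by the rate gives different orders
regions[order(-regions$population), "name"]  # Region A, Region C, Region B
regions[order(-regions$density), "name"]     # Region B, Region A, Region C
```

Shading by population would make Region A the darkest area, while shading by density makes Region B the darkest, which is usually the comparison a choropleth map is meant to show.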

Now let's look at a practical implementation of choropleth maps in R. The code that follows works through these objectives:

Downloading and importing the map shapefile into R
Creating our own dataset and representing it on the map of India
Merging the dataset and preparing it for visual representation
Improving the visualization
Displaying external data on choropleth maps
Presenting multiple maps at once

Downloading and importing the map shapefile into R

Shapefiles can be downloaded for free from several sites. I used DIVA-GIS (http://www.diva-gis.org/gdata) to download the administrative map of India. Once the download finishes, unzip the file and set your R working directory to the unzipped folder.

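We will load all the necessary libraries at once and discuss each one as we go along. Note that rgdal, rgeos and maptools have since been retired from CRAN, so on a current R installation they may have to be installed from the CRAN archive.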

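# Install the necessary packages (only needed once), then load them into R
# install.packages(c("ggplot2", "RColorBrewer", "ggmap", "maps", "mapproj",
#                    "rgdal", "scales", "maptools", "gridExtra", "rgeos"))
library(ggplot2)      # plotting
library(RColorBrewer) # color palettes
library(ggmap)        # theme_nothing()
library(maps)
library(mapproj)      # required by coord_map()
library(rgdal)        # readOGR()
library(scales)       # pretty_breaks()
library(maptools)     # readShapeSpatial()
library(gridExtra)    # grid.arrange()
library(rgeos)        # needed by fortify(region = ...)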

Set the working directory to the unzipped folder and use the following code to import the shapefile into R.

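# Set the working directory to the unzipped folder first, e.g.
# setwd("<path to the unzipped folder>")
states_shape = readShapeSpatial("IND_adm1.shp")  # read the shapefile (maptools)
class(states_shape)                              # a SpatialPolygonsDataFrame
names(states_shape)                              # attribute fields
print(states_shape$ID_1)                         # unique ID per state/UT
print(states_shape$NAME_1)                       # state/UT names
plot(states_shape, main = "Administrative Map of India")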

> class(states_shape)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
> names(states_shape)
[1] "ID_0"      "ISO"       "NAME_0"    "ID_1"      "NAME_1"    "TYPE_1"    "ENGTYPE_1" "NL_NAME_1" "VARNAME_1"
> print(states_shape$ID_1)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
> print(states_shape$NAME_1)
 [1] Andaman and Nicobar    Andhra Pradesh         Arunachal Pradesh      Assam                  Bihar
 [6] Chandigarh             Chhattisgarh           Dadra and Nagar Haveli Daman and Diu          Delhi
[11] Goa                    Gujarat                Haryana                Himachal Pradesh       Jammu and Kashmir
[16] Jharkhand              Karnataka              Kerala                 Lakshadweep            Madhya Pradesh
[21] Maharashtra            Manipur                Meghalaya              Mizoram                Nagaland
[26] Orissa                 Puducherry             Punjab                 Rajasthan              Sikkim
[31] Tamil Nadu             Telangana              Tripura                Uttar Pradesh          Uttaranchal
[36] West Bengal
36 Levels: Andaman and Nicobar Andhra Pradesh Arunachal Pradesh Assam Bihar Chandigarh Chhattisgarh ... West Bengal
> plot(states_shape, main = "Administrative Map of India")

ID_1 provides a unique ID for each of the 36 states and union territories, while NAME_1 provides their names. We will mainly use these two fields; the other fields hold the country name, the country code and other information that distinguishes one country's data from another's.

Alternatively, we can import the shapefile with the readOGR() function from the 'rgdal' package.

States_shape2 = readOGR(".","IND_adm1")
class(States_shape2)
names(States_shape2)
plot(States_shape2)

> States_shape2<-readOGR(".","IND_adm1")
OGR data source with driver: ESRI Shapefile
Source: ".", layer: "IND_adm1"
with 36 features
It has 9 fields
Integer64 fields read as strings:  ID_0 ID_1
> class(States_shape2)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
> names(States_shape2)
[1] "ID_0"      "ISO"       "NAME_0"    "ID_1"      "NAME_1"    "TYPE_1"    "ENGTYPE_1" "NL_NAME_1" "VARNAME_1"
> plot(States_shape2)

In the call readOGR(".", "IND_adm1"), the first argument "." tells R that the shapefile is in the working directory; otherwise we would have to give the full path to its folder. The second argument is the shapefile (layer) name without its extension; including the extension will throw an error.
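If you start from a full file path, the two arguments can be derived with base R. The helper below, shapefile_args(), is a hypothetical convenience function (not part of rgdal): dirname() gives the folder and tools::file_path_sans_ext() strips the .shp extension.

```r
# Hypothetical helper: split a shapefile path into readOGR()'s two arguments
#   dsn   = the folder holding the file ("." for the working directory)
#   layer = the file name without the .shp extension
shapefile_args <- function(shp_path) {
  list(dsn   = dirname(shp_path),
       layer = tools::file_path_sans_ext(basename(shp_path)))
}

ind_args <- shapefile_args("IND_adm1.shp")
ind_args$dsn     # "."
ind_args$layer   # "IND_adm1"

# With rgdal installed and the file present:
# States_shape2 = readOGR(ind_args$dsn, ind_args$layer)
```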

Creating our own dataset and representing it in the map of India

To begin with, we will create our own data for each of the 36 IDs and call it score, a parameter that represents the dancing talent of each state. (Please note that this score is randomly generated and does not reflect true dancing talent :P)

# Create our own dataset: one random score per state/union territory
set.seed(100)                                            # reproducible random draws
State_count = length(states_shape$NAME_1)                # 36 areas
score_1 = sample(100:1000, State_count, replace = TRUE)  # random integers
score_2 = runif(State_count, 1, 1000)                    # random decimals
score = score_1 + score_2
State_data = data.frame(id = states_shape$ID_1, NAME_1 = states_shape$NAME_1, score)
State_data

> State_data
   id   NAME_1                             score
1   1   Andaman and Nicobar             558.2268
2   2   Andhra Pradesh                  961.7615
3   3   Arunachal Pradesh              1586.5746
4   4   Assam                           281.1586
5   5   Bihar                           853.3299
6   6   Chandigarh                     1400.2554
7   7   Chhattisgarh                   1608.8069
8   8   Dadra and Nagar Haveli         1260.4761
9   9   Daman and Diu                  1195.7210
10 10   Delhi                           744.7406
11 11   Goa                            1443.5782
12 12   Gujarat                        1778.3428
13 13   Haryana                         560.5062
14 14   Himachal Pradesh                766.7788
15 15   Jammu and Kashmir              1118.1993
16 16   Jharkhand                       901.4804
17 17   Karnataka                       520.4586
18 18   Kerala                          697.6118
19 19   Lakshadweep                    1014.7297
20 20   Madhya Pradesh                  975.1373
21 21   Maharashtra                     706.3637
22 22   Manipur                         970.6760
23 23   Meghalaya                      1182.9777
24 24   Mizoram                         986.1971
25 25   Nagaland                        942.2375
26 26   Orissa                          901.4541
27 27   Puducherry                     1754.6125
28 28   Punjab                         1570.7218
29 29   Rajasthan                      1039.7029
30 30   Sikkim                          708.4160
31 31   Tamil Nadu                      995.2757
32 32   Telangana                      1381.9686
33 33   Tripura                         659.8475
34 34   Uttar Pradesh                  1653.6564
35 35   Uttaranchal                    1138.8248
36 36   West Bengal                    1229.3981

Merging dataset and preparing it for visual representation

We will use the fortify() function from the ggplot2 package to convert the shapefile into a data frame, and then merge that data frame with our dataset.

# Fortify file
fortify_shape = fortify(states_shape, region = "ID_1")
class(fortify_shape)

> fortify_shape = fortify(states_shape, region = "ID_1")
> class(fortify_shape)
[1] "data.frame"

# Merge the shape data with our scores, then restore the original drawing order
Merged_data = merge(fortify_shape, State_data, by = "id", all.x = TRUE)
Map_plot = Merged_data[order(Merged_data$order), ]
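The reordering line matters because merge() sorts its result by the key column, which scrambles the vertex order that geom_polygon() relies on to trace each outline. A small hypothetical fortify()-style table (made-up coordinates, not the India data) makes this visible:

```r
# Hypothetical fortify()-style table: one row per vertex, drawn in `order`
shape_df <- data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3),
  lat   = c(0, 0, 1, 1, 0, 0, 1),
  order = 1:7,
  id    = c("2", "2", "2", "2", "1", "1", "1"),
  group = c("2.1", "2.1", "2.1", "2.1", "1.1", "1.1", "1.1")
)
values <- data.frame(id = c("1", "2"), score = c(10, 20))

# merge() sorts by `id`, so the vertices of area "1" now come first
merged <- merge(shape_df, values, by = "id", all.x = TRUE)
merged$order

# Sorting on `order` restores the original drawing sequence
restored <- merged[order(merged$order), ]
restored$order   # 1 2 3 4 5 6 7
```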

Now let's create a basic visualization and see what our map looks like.

ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "black", size = 0.5) +
  coord_map()
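The aesthetics in this call are easiest to see on a tiny example that does not need the shapefile. The sketch below uses two hypothetical square areas: each row is one vertex, group keeps the vertices of one outline together, and fill carries the value being shaded. It uses coord_equal() instead of coord_map() only so that it runs without the mapproj package.

```r
library(ggplot2)

# Two hypothetical square areas in fortify()-style long format:
# one row per vertex, `group` keeps each outline together, `score` is shaded
toy_map <- data.frame(
  long  = c(0, 1, 1, 0, 1, 2, 2, 1),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(c("A.1", "B.1"), each = 4),
  score = rep(c(300, 1500), each = 4)
)

p <- ggplot() +
  geom_polygon(data = toy_map,
               aes(x = long, y = lat, group = group, fill = score),
               color = "black") +
  coord_equal()   # keeps the squares square; coord_map() would need mapproj
print(p)
```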

Improving visualization

We will use some functions from the 'ggplot2' and 'ggmap' packages to improve the visual appeal of the map we have created.

Let's begin with our first plot and then improve it step by step by adding more features.

# Plot 1
ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "black", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Score") +
  theme_nothing(legend = TRUE) +
  labs(title = "Score in India - Distribution by State")

Let's make our map a little more colorful so that the distribution stands out clearly. The function display.brewer.all() from the 'RColorBrewer' package displays all the ColorBrewer palettes available, and we can choose the one we like.

# Check Color palettes
display.brewer.all()
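To narrow the choice, the brewer.pal.info table from RColorBrewer records which family each palette belongs to. Sequential palettes are generally recommended for ordered values such as our score; Set3, used in the plots that follow, is listed as qualitative, so a sequential palette such as "YlOrRd" is an optional variation worth trying, not something the plots below use.

```r
library(RColorBrewer)

# brewer.pal.info lists every ColorBrewer palette with its maximum number of
# colors and its category: "seq" (sequential), "div" (diverging) or "qual"
info <- brewer.pal.info
seq_palettes  <- rownames(info)[info$category == "seq"]
qual_palettes <- rownames(info)[info$category == "qual"]

seq_palettes          # sequential palettes, including Blues and YlOrRd
info["Set3", ]        # Set3 is listed in the "qual" category

# Preview only the sequential palettes
display.brewer.all(type = "seq")
```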

Now change the color palette and add more breaks to the legend.

# Plot 2
ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "darkblue", size = 1) +
  coord_map() +
  scale_fill_distiller(name = "Score", palette = "Set3", breaks = pretty_breaks(n = 7)) +
  theme_nothing(legend = TRUE) +
  labs(title = "Score in India - Distribution by State")

pretty_breaks() is a function from the 'scales' package that sets the desired (approximate) number of breaks in the legend. In the map above, the legend has 7 breaks from 400 to 1600 at intervals of 200, whereas the previous map had only 3.
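scales::pretty_breaks() wraps base R's pretty(), so the break positions can be previewed without drawing the map. The sketch below takes the lowest and highest scores printed in State_data above (281.1586 and 1778.3428; if your random numbers differ, so will the range) and shows where the 400 to 1600 legend comes from: pretty() proposes breaks from 200 to 1800, and only those inside the data range appear on the color bar.

```r
# Smallest and largest score in the State_data printout above (Assam, Gujarat)
score_range <- c(281.1586, 1778.3428)

# scales::pretty_breaks(n = 7) returns a function that calls pretty(x, n = 7)
candidate_breaks <- pretty(score_range, n = 7)
candidate_breaks   # 200 400 600 ... 1800

# Breaks outside the data range are dropped from the color bar
legend_breaks <- candidate_breaks[candidate_breaks >= min(score_range) &
                                  candidate_breaks <= max(score_range)]
legend_breaks      # 400 600 800 1000 1200 1400 1600
```

Note that n is only a target: here pretty() settles on 8 intervals because 200 is the nearest "nice" step size.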

Now add the state names to make the map more informative.

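# Plot 3
# Label positions for the state names: sp's coordinates() returns one label
# point per state (the centroid of its largest polygon)
state_centres = coordinates(states_shape)
name_lat_lon = data.frame(long = state_centres[, 1],
                          lat = state_centres[, 2],
                          NAME_1 = states_shape$NAME_1)

ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "darkblue", size = 1) +
  coord_map() +
  scale_fill_distiller(name = "Score", palette = "Set3", breaks = pretty_breaks(n = 7)) +
  theme_nothing(legend = TRUE) +
  labs(title = "Score in India - Distribution by State") +
  geom_text(data = name_lat_lon, aes(long, lat, label = NAME_1), size = 2)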

Displaying external data on choropleth maps

We will now import external data and create choropleth maps from it. The dataset provides the following information for all 36 states and union territories of India:

ID
State or union territory
Population (2011 Census)
Decadal growth (2001–2011)
Area (km sq)
Density (population per sq km)
Sex ratio

The ID of each state matches the ID_1 value used as the id column in Merged_data earlier, so the two sources can be merged on it.
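Merging on the numeric ID rather than on the name is the safer choice here: the shapefile calls the first territory "Andaman and Nicobar", while the CSV calls it "Andaman and Nicobar Islands". The sketch below uses two small illustrative data frames (stand-ins for the shapefile attributes and d1, with values copied from the printouts in this article) to show the difference, plus a quick check that every shapefile ID has a partner in the CSV.

```r
# Two-row stand-ins for the shapefile attributes and the CSV
shape_keys <- data.frame(id = c(1, 2),
                         NAME_1 = c("Andaman and Nicobar", "Andhra Pradesh"))
csv_rows <- data.frame(ID = c(1, 2),
                       State = c("Andaman and Nicobar Islands", "Andhra Pradesh"),
                       pop = c(379944, 49386799))

# Joining on the names silently drops the territory whose name differs
by_name <- merge(shape_keys, csv_rows, by.x = "NAME_1", by.y = "State")
nrow(by_name)  # 1

# Joining on the IDs keeps both rows
by_id <- merge(shape_keys, csv_rows, by.x = "id", by.y = "ID")
nrow(by_id)    # 2

# Sanity check before merging: IDs in the shapefile that are missing from the CSV
setdiff(shape_keys$id, csv_rows$ID)  # empty, so none are missing
```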

> d1 = read.csv(file.choose(), header = T)
> head(d1)
  ID    State.or.union.territory Population..2011.Census. Decadal.growth..2001.2011. Area..km.sq. Density..population.per.sq.km.
1  1 Andaman and Nicobar Islands                   379944                      0.067         8249                       827.1412
2  2              Andhra Pradesh                 49386799                      0.111       162968                       365.1876
3  3           Arunachal Pradesh                  1382611                      0.259        83743                      1102.3931
4  4                       Assam                 31169272                      0.169        78438                      1029.2471
5  5                       Bihar                103804637                      0.251        94163                       235.5190
6  6                  Chandigarh                  1055450                      0.171          114                       554.6676
  Sex.ratio
1       908
2       946
3       916
4       947
5       931
6       995

> #Merging with external source
> state_data2<-data.frame(id=d1$ID, NAME_1=d1$State.or.union.territory, pop = d1$Population..2011.Census., growth=d1$Decadal.growth..2001.2011., area = d1$Area..km.sq., pop_density = d1$Density..population.per.sq.km., sex_ratio = d1$Sex.ratio)
> head(state_data2)
  id                      NAME_1       pop growth   area pop_density sex_ratio
1  1 Andaman and Nicobar Islands    379944  0.067   8249    827.1412       908
2  2              Andhra Pradesh  49386799  0.111 162968    365.1876       946
3  3           Arunachal Pradesh   1382611  0.259  83743   1102.3931       916
4  4                       Assam  31169272  0.169  78438   1029.2471       947
5  5                       Bihar 103804637  0.251  94163    235.5190       931
6  6                  Chandigarh   1055450  0.171    114    554.6676       995
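The long column names such as Population..2011.Census. come from read.csv(), which by default (check.names = TRUE) passes the headers through make.names() and replaces spaces, parentheses and dashes with dots. The sketch below reproduces this with the headers listed earlier (typed with a plain hyphen instead of the en dash) and one data row copied from the Bihar line above, then renames all columns in a single assignment, a shorter alternative to rebuilding the data frame column by column.

```r
# Headers as listed for the external dataset (plain hyphen in the growth column)
headers <- c("ID", "State or union territory", "Population (2011 Census)",
             "Decadal growth (2001-2011)", "Area (km sq)",
             "Density (population per sq km)", "Sex ratio")

# One data row copied from the Bihar line of head(d1)
csv_text <- paste(
  paste0('"', headers, '"', collapse = ","),
  '5,"Bihar",103804637,0.251,94163,235.5190,931',
  sep = "\n"
)

demo <- read.csv(text = csv_text)   # check.names = TRUE by default
original_names <- names(demo)       # same as make.names(headers)
original_names

# Rename every column in one step
names(demo) <- c("id", "NAME_1", "pop", "growth", "area", "pop_density", "sex_ratio")
demo
```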

# Merge the shape data with the external data and restore the drawing order
merged_data2 <- merge(fortify_shape, state_data2, by = "id", all.x = TRUE)
map_plot2 <- merged_data2[order(merged_data2$order), ]

ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = pop/1000),
               color = "darkblue", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Population (in '000)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Population in India") +
  geom_text(data = name_lat_lon, aes(long, lat, label = NAME_1), size = 2)

Presenting multiple maps at once

To show all five measures side by side in a single figure, we can use grid.arrange() from the 'gridExtra' package. First we create the five maps, then pass them to grid.arrange().

#Plotting multiple maps at once
plot1 = ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = pop/1000),
               color = "darkblue", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Population (in '000)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Population in India")

plot2 = ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = growth*100),
               color = "darkblue", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Decadal Growth (in %)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Decadal growth (in %) in India")

plot3 = ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = area/1000),
               color = "darkblue", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Area (in '000 sq km)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Area (in '000 sq km) in India")

plot4 = ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = pop_density),
               color = "darkblue", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Population Density", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Population Density in India")

plot5 = ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = sex_ratio),
               color = "darkblue", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Sex Ratio", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Sex Ratio (females per 1,000 males) in India")

Load the 'gridExtra' library and call grid.arrange() to present all five maps at once.

library(gridExtra)
grid.arrange(plot1, plot2, plot3, plot4, plot5)

The examples above show how flexible and convenient choropleth maps are for presenting a measurement across geographical areas. I used the map of India as the base, but the same process applies to any geography and dataset.

After going through the article, I am sure you will agree with the point I started with: choropleth maps are our best bet when we want to leave a strong impression on an audience in 15 seconds.
