Mapping Data: Plotly vs. Folium
How often do you see maps in real life? How often do you work with maps as a data scientist? Sometimes we do not use the location information in a data set for visualization at all. But a map can help you understand the situation you are researching, and people grasp the results of an investigation better when they can look at an image.
Examples of data where geo maps can be used:
- a political dataset, where you can show the voting results by state or city;
- epidemiological visualizations for the public, where you can explain how a virus spreads;
- a dataset for realtors, which shows the area of a community where houses are located;
- a police database of dangerous places.
And there are many more datasets with information about a place that we need for our research.
Let’s look at a real example with visualizations in Python. I created some images for the King County Housing data set from the official site. The data set contains real estate information and is used to predict house prices and to analyze the effect of location on house prices. In this data set, we have a longitude and latitude for each house. Usually we do not use this information in the model, but we can use it for EDA. One of the main questions for a data set like this is whether zip codes (neighborhoods) with a higher housing density have an effect on selling price.
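Before any plotting, the data has to sit in a DataFrame with usable coordinates. Here is a minimal sketch of that check, assuming the data set is a CSV with the columns `lat`, `long`, `price`, and `zipcode` that appear later in this post. The three rows are made up for illustration, and the file name in the comment is an assumption.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real file (values are illustrative only).
sample_csv = io.StringIO(
    "price,bedrooms,bathrooms,zipcode,lat,long\n"
    "300000,3,1.0,98001,47.30,-122.25\n"
    "450000,4,2.0,98002,47.31,-122.20\n"
    "250000,2,1.0,98001,47.32,-122.27\n"
)

# With the real file this would be something like
# pd.read_csv("kc_house_data.csv")  (the file name is an assumption).
data = pd.read_csv(sample_csv)

# Check that the columns needed for mapping are present.
required = ["lat", "long", "price", "zipcode"]
missing = [c for c in required if c not in data.columns]

# Count rows that lack a latitude or a longitude.
rows_without_location = int(data[["lat", "long"]].isna().any(axis=1).sum())

print("missing columns:", missing)
print("rows without location:", rows_without_location)
```

If either check reports a problem, it is easier to fix it here than to debug an empty map later.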
First of all, I use the Plotly library to get a quick view of the area where the houses were sold. The scatter_mapbox plot lets us show some additional information on the map. The great advantage of this option is that we can use the data set without complex processing and preparation. Two steps and I have a good visualization!
1. import the necessary library for plots
import plotly.express as px
2. create a plot!
# create a map of the area where the houses from the data set are located
fig = px.scatter_mapbox(data,  # our data set
                        lat="lat", lon="long",  # location
                        color="price",  # select a column for ranking
                        hover_name="price",
                        hover_data=["bedrooms", "bathrooms"],
                        color_discrete_sequence=["green"],
                        size_max=15,
                        zoom=8,
                        width=900, height=600,  # map size
                        title='Map of area, check location')
# style of map
fig.update_layout(mapbox_style="open-street-map")
fig.show(config={'scrollZoom': False})
Map of area for realtors:
As we can see in this picture, most houses are priced below or around a million dollars. But it is hard to see the price distribution for specific addresses.
Folium is a powerful data visualization library in Python that was built to help people visualize geospatial data.
If I want to go further with maps, I use Folium and GeoJSON files. Folium can create a map of any location in the world as long as the latitude and longitude values are known. Here I need to do more preparation before mapping. To visualize the zip codes, we need the GeoJSON data.
1. import the necessary libraries one more time:
# for file paths
import os
# for map visualization
import folium
from folium import plugins
import json
2. load the additional geo information:
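# Get geo data file path and load the GeoJSON (king_geo is used for mapping below)
geo_data_file = os.path.join('zipcode_king_county.geojson')
with open(geo_data_file) as f:
    king_geo = json.load(f)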
I took the *.geojson files from the King County Open Data website.
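Before mapping, it can help to look inside the GeoJSON and see which property holds the zip code and what type it has, because Folium matches data rows to shapes through that property (the `key_on` argument used later in this post). Here is a small sketch using a made-up, one-feature stand-in for the real file. Only the property name `ZIPCODE` comes from this post; the value and the geometry are illustrative.

```python
import json

# A tiny stand-in for the downloaded file; the real GeoJSON has one feature
# per zip code. The value and the geometry here are made up.
sample_geojson = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"ZIPCODE": "98001"},
    "geometry": {"type": "Polygon",
                 "coordinates": [[[-122.3, 47.3], [-122.2, 47.3],
                                  [-122.2, 47.4], [-122.3, 47.4],
                                  [-122.3, 47.3]]]}}
 ]}
"""

king_geo = json.loads(sample_geojson)

# Look at the properties of the first feature.
first_properties = king_geo["features"][0]["properties"]
property_names = sorted(first_properties.keys())

# The type matters: the zip codes in the data must have the same type to match.
zip_type = type(first_properties["ZIPCODE"]).__name__

print(property_names, zip_type)
```

If the zip codes are strings in the GeoJSON but integers in the housing data, the shapes will not pick up any values, so it is worth checking this once.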
3. Count how many houses I have for each zip code.
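The counting step can be sketched with a pandas `groupby`. The names `zipcode_data` and `col` are the ones used in the mapping code in this post, but how they were built is not shown, so the code below is an assumption, and the rows are made up so the block runs on its own.

```python
import pandas as pd

# Made-up rows; in the post, `data` is the full housing DataFrame.
data = pd.DataFrame({
    "zipcode": [98001, 98001, 98002, 98003, 98003, 98003],
    "price": [300000, 350000, 500000, 250000, 275000, 260000],
})

# Assumed name of the column that will be shown on the map.
col = "count"

# One row per zip code with the number of houses in it.
zipcode_data = (
    data.groupby("zipcode")
    .size()
    .reset_index(name=col)
)

# GeoJSON properties are often stored as strings; cast so the keys can match.
# (Whether this cast is needed depends on the actual GeoJSON file.)
zipcode_data["zipcode"] = zipcode_data["zipcode"].astype(str)

print(zipcode_data)
```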
4. Initialize the Folium map, centered on the King County area around Seattle:
king_map = folium.Map(location=[47.35, -121.9], zoom_start=9, detect_retina=True, control_scale=False)
5. And now the last step for mapping:
# Create map (folium.Choropleth replaces the older Map.choropleth method)
folium.Choropleth(
    geo_data=king_geo,
    name='choropleth',
    data=zipcode_data,
    columns=['zipcode', col],
    key_on='feature.properties.ZIPCODE',
    fill_color='YlGn',
    fill_opacity=0.9,
    line_opacity=0.2,
    legend_name='house ' + col
).add_to(king_map)
folium.LayerControl().add_to(king_map)
This map shows how many houses were sold in each area.
In the same way, I created a map of the price distribution by postal code, so we can see the most expensive areas.
Thank you for reading! If you liked the article, I would be glad if you followed me. I regularly post new blogs related to data science and visualization.