Mapping Data: GeoJSON format for geographic data.

Mari Galdina
3 min read · Mar 15, 2021

Businesses usually collect data with a location component, but it often goes unused during data analysis. Adding geodata to an investigation gives us a unique opportunity to gain an edge and deliver better results. This year, I worked with several datasets where location was crucial, for example COVID-19 and realtor data.

What is GeoJSON?

JavaScript Object Notation (JSON) is a way to represent values, arrays, and dictionaries in a text file. JSON files usually follow a standardized structure for different types of data. The conventions can vary from company to company, but the files stay readable for developers. So what does the word Geo add to this format? Like plain JSON, GeoJSON represents information about geospatial objects in a syntax that anybody can read and understand. The original GeoJSON specification was published in 2008, and the format was later standardized as RFC 7946 in 2016. Every geometry type except GeometryCollection contains coordinates, with longitude first and latitude second.
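To make the coordinate order concrete, here is a minimal sketch that builds one GeoJSON Feature as a plain Python dictionary and round-trips it through the standard json module. The coordinates and the "name" property are made-up values for illustration, not taken from any dataset in this article.

```python
import json

# Hypothetical example feature: a single point with a made-up "name" property.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON positions are [longitude, latitude], in that order.
        "coordinates": [-73.9857, 40.7484],
    },
    "properties": {"name": "Example location"},
}

# Serialize to GeoJSON text, then parse the text back into a dictionary.
text = json.dumps(feature)
parsed = json.loads(text)

# Unpack the position: longitude comes first, latitude second.
longitude, latitude = parsed["geometry"]["coordinates"]
print(f"longitude={longitude}, latitude={latitude}")
```

Because GeoJSON is ordinary JSON, no special library is needed just to read or write it; dedicated libraries mostly add validation and convenience.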

GeoJSON is a format for encoding a variety of geographic data structures.

GeoJSON supports the following seven geometry types:

  • Point (a single position, drawn as a dot)
  • LineString (a series of points connected with a line)
  • Polygon (an area bounded by linear rings; in each ring the first and last coordinates are identical, so the ring is closed)
  • MultiPoint (an array of positions)
  • MultiLineString
  • MultiPolygon
  • GeometryCollection
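As a rough illustration of how these geometry types look in practice, the sketch below writes a LineString and a Polygon as Python dictionaries and checks the closed-ring rule for the polygon. The helper name is_closed_ring and all coordinates are hypothetical; the check itself follows the RFC 7946 rule that a linear ring has four or more positions and identical first and last positions.

```python
def is_closed_ring(ring):
    """Return True if the ring has at least four positions and
    its first and last positions are identical."""
    return len(ring) >= 4 and ring[0] == ring[-1]


# A LineString is an ordered list of [longitude, latitude] positions.
line = {
    "type": "LineString",
    "coordinates": [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],
}

# A Polygon is a list of linear rings; the first ring is the exterior boundary.
polygon = {
    "type": "Polygon",
    "coordinates": [
        [[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]],
    ],
}

exterior = polygon["coordinates"][0]
print("polygon exterior closed:", is_closed_ring(exterior))
print("line usable as a ring:", is_closed_ring(line["coordinates"]))
```

The Multi* types wrap the same structures in one more level of nesting, so a MultiPolygon is simply a list of polygon coordinate arrays.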

All these types help to draw plots and locate objects on maps.

Code example in Python

Python has utilities for working with GeoJSON, such as the geojson library.

On the Data.gov site, you can find plenty of GeoJSON files to get familiar with this format and build unexpected map-based solutions.

For example, the National Center for Education Statistics’ (NCES) Education Demographic and Geographic Estimate (EDGE) program develops annually updated school district boundary composite files that include public elementary, secondary, and unified school district boundaries clipped to the U.S. shoreline. We can use these files to create maps and explore the GeoJSON format.

To work with this dataset, I use a local Jupyter Notebook and follow these steps:

  • Download the dataset in GeoJSON format and put the file in the project folder
  • Import necessary libraries
import os 
import folium # I like to use this library for creating maps
import geopandas as gpd
  • For a closer look at the dataset, load it into a dataframe:
rootpath = os.path.abspath(os.getcwd())
df_district = gpd.read_file(os.path.join(rootpath, "data", "School_District.geojson"))
  • Create a map

We can call the simple plot() function to draw the full map, but it takes a while because there is a lot of data: 13352 rows with a geometry column, where each cell holds a set of coordinates such as POLYGON ((-114.69048 35.66379, -114.69051 35.6…)).
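Before plotting a large file, it can help to summarize its structure first. The following is only a sketch using the standard library on a tiny, made-up FeatureCollection (the district names and coordinates are hypothetical, not the real dataset): it counts geometry types and computes a bounding box, from which a map center can be derived.

```python
from collections import Counter

# Hypothetical stand-in for a parsed GeoJSON file; json.load() on a real
# file would return a dictionary with the same shape.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "District A"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [[-114.7, 35.6], [-114.5, 35.6], [-114.5, 35.8], [-114.7, 35.6]],
                ],
            },
        },
        {
            "type": "Feature",
            "properties": {"name": "District B"},
            "geometry": {
                "type": "MultiPolygon",
                "coordinates": [
                    [[[-114.3, 35.4], [-114.1, 35.4], [-114.1, 35.5], [-114.3, 35.4]]],
                ],
            },
        },
    ],
}


def iter_positions(coords):
    """Yield every [lon, lat] position from arbitrarily nested coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def bounding_box(fc):
    """Return (west, south, east, north) over all features in the collection."""
    lons, lats = [], []
    for feature in fc["features"]:
        for lon, lat, *_ in iter_positions(feature["geometry"]["coordinates"]):
            lons.append(lon)
            lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)


type_counts = Counter(f["geometry"]["type"] for f in collection["features"])
west, south, east, north = bounding_box(collection)

# Map libraries such as folium expect [latitude, longitude], the reverse of GeoJSON.
center = [(south + north) / 2, (west + east) / 2]
print(type_counts, center)
```

A center computed this way could be passed as the location of a folium map, and the type counts show quickly whether a file mixes Polygon and MultiPolygon geometries.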

Summary

Nowadays, we have libraries and encoding/decoding features for working with GeoJSON files, but we still need to understand their structure and components to work faster and more productively.

Thank you for reading! If you like the article, I would be glad if you follow me. I regularly post new blogs related to Data Science and Visualization.
