Extracting POIs from OpenStreetMap and Overture Maps: A Practical Guide
Learn how to extract EV charging stations and gas stations for New York City using osmium on OSM data, and compare the workflow with Overture Maps' GeoParquet pipeline.
Point-of-interest (POI) extraction is one of the most common tasks in modern geospatial data engineering. Whether you are building a routing application, analyzing retail density, or mapping infrastructure, you need a reliable pipeline to pull location-tagged features from large-scale vector datasets.
This guide demonstrates two approaches to the same problem: extracting EV charging stations and gas stations within the New York City metropolitan boundary. We will first use osmium to process raw OpenStreetMap (OSM) data, then replicate the same query using Overture Maps’ cloud-native GeoParquet pipeline.
1. The Data Sources
| Source | Format | Update Frequency | License |
|---|---|---|---|
| OpenStreetMap | XML / PBF (Protocolbuffer Binary Format) | Real-time (minutely diffs) | ODbL |
| Overture Maps | GeoParquet (cloud-optimized) | Monthly releases | Varies by theme: CDLA Permissive 2.0 for places, ODbL for OSM-derived themes |
OSM is the gold standard for community-edited vector data, but it requires preprocessing. Overture Maps is a curated, conflated dataset produced by a consortium (Amazon, Esri, Meta, Microsoft, TomTom) that normalizes OSM and other sources into a unified schema.
2. Introduction to osmium-tool
osmium-tool is a fast, multi-purpose command-line utility for processing OpenStreetMap data in PBF or XML format. Built on the high-performance libosmium C++ library, it is the standard tool in the OSM ecosystem for tasks that do not require a full database.
Key capabilities relevant to this guide:
| Command | Purpose |
|---|---|
| osmium extract | Clip a PBF file to a bounding box or polygon boundary |
| osmium tags-filter | Extract objects matching specific tag patterns (e.g., amenity=fuel) |
| osmium export | Convert OSM data to GeoJSON and other geometry formats |
| osmium merge | Combine multiple PBF files into one |
| osmium apply-changes | Apply OSM change files (.osc diffs) to bring an extract up to date |
| osmium check-refs | Validate referential integrity (ways → nodes, relations → members) |
osmium-tool streams data rather than loading the entire file into memory, which makes it suitable for processing multi-gigabyte regional extracts on modest hardware. It is the closest equivalent to ogr2ogr for OSM-native workflows.
3. Extracting POIs with osmium
3.1 Prerequisites
Install osmium-tool with your platform's package manager. No separate converter is needed, because osmium export handles the format conversion used in this guide:
# macOS
brew install osmium-tool
# Ubuntu / Debian
sudo apt-get install osmium-tool
# Arch Linux
sudo pacman -S osmium-tool
You will also need a bounding box for New York City. The approximate WGS84 extent is:
West: -74.30
South: 40.48
East: -73.68
North: 40.92
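Both osmium and the Overture tooling used later take this extent as a comma-separated west,south,east,north string. As a minimal sketch, the extent can be kept in one place and formatted on demand; the BBox class and NYC_BBOX name below are hypothetical helpers for illustration, not part of any library.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """WGS84 bounding box in (west, south, east, north) order."""
    west: float
    south: float
    east: float
    north: float

    def as_cli_arg(self) -> str:
        # Comma-separated west,south,east,north with two decimals, as used in this guide.
        return f"{self.west:.2f},{self.south:.2f},{self.east:.2f},{self.north:.2f}"

    def contains(self, lon: float, lat: float) -> bool:
        # Inclusive test: points on the edge count as inside.
        return self.west <= lon <= self.east and self.south <= lat <= self.north


NYC_BBOX = BBox(west=-74.30, south=40.48, east=-73.68, north=40.92)
print(NYC_BBOX.as_cli_arg())
```

The same object can later serve the spatial check during validation, so the extent is never typed twice.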
3.2 Downloading the OSM Extract
Geofabrik provides continent-, country-, and region-level extracts. The NYC bounding box also covers part of New Jersey, so download the US Northeast region rather than a single state:
wget -O northeast-latest.osm.pbf https://download.geofabrik.de/north-america/us-northeast-latest.osm.pbf
This file contains all OSM data for the northeastern US (~2–3 GB). For a production pipeline, consider keeping the extract current by applying change files with osmium apply-changes, or with a helper such as pyosmium-up-to-date.
3.3 Extracting the NYC Bounding Box
Instead of processing the entire region, extract a subset first. This reduces memory pressure and speeds up tag filtering:
osmium extract \
--bbox="-74.30,40.48,-73.68,40.92" \
--strategy=smart \
--output=nyc.osm.pbf \
northeast-latest.osm.pbf
The --strategy=smart flag tells osmium to preserve complete multipolygon relations even when only part of the geometry falls inside the bbox — critical for administrative boundaries and large buildings.
3.4 Tag-Based POI Filtering
OSM stores POI semantics in tags. The keys we care about are:
| POI Type | Primary Tag |
|---|---|
| EV Charging Station | amenity=charging_station |
| Gas Station | amenity=fuel |
Run osmium’s tag filter to extract nodes, ways, and relations matching these tags:
# Extract EV charging stations
osmium tags-filter \
nyc.osm.pbf \
n/amenity=charging_station \
w/amenity=charging_station \
r/amenity=charging_station \
--overwrite \
--output=nyc_charging_stations.osm.pbf
# Extract gas stations
osmium tags-filter \
nyc.osm.pbf \
n/amenity=fuel \
w/amenity=fuel \
r/amenity=fuel \
--overwrite \
--output=nyc_gas_stations.osm.pbf
Prefixing with n/, w/, and r/ restricts each expression to nodes, ways, and relations respectively. Charging stations are usually mapped as nodes, but some are mapped as ways (building footprints) or relations (complex sites), so filtering all three object types is safest.
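When the same filter has to be repeated for several tags, the argument list can be generated rather than typed by hand. The sketch below only builds the command shown above; tags_filter_command is a hypothetical helper name, and actually running the result assumes osmium-tool is installed.

```python
import shlex


def tags_filter_command(src: str, dst: str, key: str, value: str) -> list[str]:
    """Build an `osmium tags-filter` argument list for one key=value pair."""
    # One expression per OSM object type: node, way, relation.
    expressions = [f"{prefix}/{key}={value}" for prefix in ("n", "w", "r")]
    return ["osmium", "tags-filter", src, *expressions, "--overwrite", f"--output={dst}"]


cmd = tags_filter_command("nyc.osm.pbf", "nyc_gas_stations.osm.pbf", "amenity", "fuel")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run avoids shell quoting issues when a tag value contains spaces or special characters.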
OSM Tag Attributes for These POIs
Raw OSM data stores POI semantics as free-form key-value pairs. Below are the most useful tags for our two target categories. These tags are community-contributed; completeness varies by region.
EV Charging Station (amenity=charging_station)
| Tag | Description | Example |
|---|---|---|
| amenity | POI category (fixed) | charging_station |
| operator | Operating company | Tesla, ChargePoint, EVgo |
| brand | Brand name | Tesla Supercharger, Electrify America |
| capacity | Number of charging points | 8 |
| socket:type2 | Count of Type 2 (Mennekes) connectors | 4 |
| socket:chademo | Count of CHAdeMO connectors | 2 |
| socket:tesla_supercharger | Count of Tesla Supercharger stalls | 8 |
| socket:ccs | Count of CCS connectors | 4 |
| voltage | Supply voltage | 400 |
| amperage | Maximum current | 350 |
| output | Power output in kW | 250 kW |
| fee | Is a fee required? | yes / no |
| opening_hours | Hours of operation | 24/7 |
| payment:* | Accepted payment methods | payment:app=yes |
| network | Charging network identifier | ChargePoint Network |
| access | Access restriction | yes / customers / private |
Gas Station (amenity=fuel)
| Tag | Description | Example |
|---|---|---|
| amenity | POI category (fixed) | fuel |
| brand | Fuel brand | Shell, BP, Exxon |
| operator | Operating company | Shell Oil Company |
| fuel:diesel | Diesel availability | yes |
| fuel:octane_95 | 95 octane petrol availability | yes |
| fuel:octane_98 | 98 octane petrol availability | yes |
| fuel:e10 | E10 ethanol blend availability | yes |
| fuel:electric | On-site EV charging | yes |
| opening_hours | Hours of operation | Mo-Su 06:00-23:00 |
| payment:* | Accepted payment methods | payment:credit_cards=yes |
| self_service | Self-service pumps available? | yes / no |
| car_wash | Car wash on site? | yes / no |
| shop | Convenience store attached? | convenience |
Note: Because OSM tags are crowdsourced, not every station will have all of these fields. Production pipelines should treat all tags as optional and handle missing keys gracefully.
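One way to treat every tag as optional is to read the exported feature properties through dict.get and to coerce values defensively. This is a minimal sketch: station_summary and its output field names are hypothetical, and the fallback order for the display name is an assumption rather than an OSM convention.

```python
def station_summary(properties: dict) -> dict:
    """Normalize the OSM tags of one exported feature; every tag may be absent."""
    raw_capacity = properties.get("capacity")
    try:
        capacity = int(raw_capacity) if raw_capacity is not None else None
    except (TypeError, ValueError):
        capacity = None  # free-form values such as "4-6" are left unset
    return {
        # Assumed fallback order: name, then brand, then operator.
        "name": properties.get("name") or properties.get("brand") or properties.get("operator"),
        "capacity": capacity,
        # Anything other than an explicit yes/no maps to None (unknown).
        "fee": {"yes": True, "no": False}.get(properties.get("fee")),
        "open_24_7": properties.get("opening_hours") == "24/7",
    }


print(station_summary({"amenity": "charging_station", "operator": "EVgo", "capacity": "8"}))
```

Keeping unknown values as None, instead of guessing a default, makes gaps in the crowdsourced data visible downstream.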
3.5 Converting to GeoJSON for Visualization
The resulting .osm.pbf files are not directly viewable in most GIS tools. Convert them to GeoJSON using osmium export:
osmium export \
  --geometry-types=point \
  --add-unique-id=type_id \
  --output-format=geojson \
  nyc_charging_stations.osm.pbf \
  --output=nyc_charging_stations.geojson
osmium export \
  --geometry-types=point \
  --add-unique-id=type_id \
  --output-format=geojson \
  nyc_gas_stations.osm.pbf \
  --output=nyc_gas_stations.geojson
The --geometry-types=point flag restricts the output to point features, so only stations mapped as nodes are written. It does not convert other geometries: stations mapped as ways (for example, building footprints) are dropped. To keep them, omit the flag and derive a representative point, such as the centroid, in a later step.
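When a station is mapped as an area and a single representative point is needed, one common choice is the centroid of the polygon's outer ring. The sketch below is an illustration, not part of osmium: ring_centroid is a hypothetical helper, and it treats longitude and latitude as planar coordinates, an approximation that is reasonable for building-sized footprints.

```python
def ring_centroid(ring):
    """Area-weighted centroid of a closed ring of (lon, lat) pairs.

    The ring must repeat its first vertex at the end, as GeoJSON rings do.
    """
    # Shift to a local origin so tiny footprints keep their floating-point precision.
    ox, oy = ring[0]
    pts = [(x - ox, y - oy) for x, y in ring]
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        # Degenerate ring (a point or a line): fall back to the vertex mean.
        distinct = pts[:-1] or pts
        mx = sum(p[0] for p in distinct) / len(distinct)
        my = sum(p[1] for p in distinct) / len(distinct)
        return (ox + mx, oy + my)
    return (ox + cx / (3.0 * area2), oy + cy / (3.0 * area2))


footprint = [(-74.0, 40.0), (-73.999, 40.0), (-73.999, 40.001), (-74.0, 40.001), (-74.0, 40.0)]
print(ring_centroid(footprint))
```

For strongly concave footprints the centroid can fall outside the polygon; a point-on-surface routine from a geometry library is the safer choice in that case.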
Quick preview: If you are working inside VS Code, drag the resulting .geojson into the editor and press Ctrl/Cmd + Alt + M to open it in the Geo Data Viewer Fast extension for an instant Kepler.gl map preview without leaving your workspace.
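3.6 One-Command Pipeline
For automated pipelines, chain the steps with && so that each one runs only if the previous one succeeded:
osmium extract --bbox="-74.30,40.48,-73.68,40.92" \
  --strategy=smart \
  --output=nyc.osm.pbf \
  northeast-latest.osm.pbf \
&& osmium tags-filter nyc.osm.pbf \
  n/amenity=charging_station w/amenity=charging_station r/amenity=charging_station \
  --output=nyc_charging_stations.osm.pbf \
&& osmium export \
  --geometry-types=point \
  --output-format=geojson \
  --output=nyc_charging_stations.geojson \
  nyc_charging_stations.osm.pbf
The intermediate .osm.pbf files are deliberate. tags-filter (when it also collects referenced objects, which is the default) and export read their input more than once, so they cannot consume a pipe on standard input.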
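For pipelines driven from Python rather than the shell, the same three stages can be run in sequence with intermediate files. This is a sketch under stated assumptions: run_pipeline and STEPS are hypothetical names, the file names follow the earlier sections, and executing it requires osmium-tool on the PATH.

```python
import subprocess

BBOX = "-74.30,40.48,-73.68,40.92"

# Each stage writes a file that the next stage reads.
STEPS = [
    ["osmium", "extract", f"--bbox={BBOX}", "--strategy=smart",
     "--overwrite", "--output=nyc.osm.pbf", "northeast-latest.osm.pbf"],
    ["osmium", "tags-filter", "nyc.osm.pbf",
     "n/amenity=charging_station", "w/amenity=charging_station",
     "r/amenity=charging_station",
     "--overwrite", "--output=nyc_charging_stations.osm.pbf"],
    ["osmium", "export", "--geometry-types=point", "--output-format=geojson",
     "--overwrite", "--output=nyc_charging_stations.geojson",
     "nyc_charging_stations.osm.pbf"],
]


def run_pipeline(steps, runner=subprocess.run):
    """Run each command in order.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so later stages never run on incomplete input.
    """
    for cmd in steps:
        runner(cmd, check=True)


# run_pipeline(STEPS)  # uncomment to execute; needs osmium-tool installed
```

The runner parameter exists only so the sequence can be exercised in tests without invoking osmium.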
4. Extracting the Same POIs from Overture Maps
4.1 What Is Overture Maps?
Overture Maps is a cloud-native geospatial dataset built by normalizing and conflating multiple sources — primarily OpenStreetMap, but also proprietary feeds from Esri, TomTom, and other partners. Data is released monthly as GeoParquet files hosted on AWS S3, partitioned by theme and type.
Key advantages over raw OSM:
- Unified schema: Tags are normalized into a consistent categories taxonomy
- Conflation: Duplicate features from multiple sources are merged
- Cloud-native: Directly queryable via DuckDB or Python without downloading full regions
- No preprocessing: No need to run osmium extract or tags-filter
Overture Maps Schema Attributes
Overture normalizes source data into a strict schema. The places theme (which contains POIs like charging stations and gas stations) exposes the following columns:
| Column | Type | Description |
|---|---|---|
| id | string | Unique Overture feature ID |
| geometry | GeoParquet WKB | Point geometry (WGS84) |
| bbox | struct | Bounding box (xmin, xmax, ymin, ymax) used to skip data outside the query window |
| confidence | double | Confidence score (0.0–1.0) from conflation |
| categories | struct | primary category + alternate array of secondary categories |
| names | struct | primary name + common array + rules for localization |
| addresses | struct | Street, locality, region, postcode, country |
| phones | array | Phone numbers |
| emails | array | Contact emails |
| websites | array | URLs |
| socials | array | Social media links |
| opening_hours | array | Structured opening-hours rules |
| brand | struct | names + wikidata reference |
| operator | struct | names + wikidata reference |
| sources | array | Provenance: source dataset, record ID, and confidence per source |
| subtype | string | Feature sub-classification |
For EV charging stations and gas stations, the critical fields are:
| Overture Field | EV Charging Station | Gas Station |
|---|---|---|
categories.primary | electric_vehicle_charging_station | gas_station |
names.primary | Station or location name | Station or location name |
brand | Charging network brand | Fuel brand |
operator | Operating company | Operating company |
addresses | Street address | Street address |
confidence | Conflation confidence | Conflation confidence |
Unlike OSM’s flat tag model, Overture’s schema is typed and hierarchical. Missing values are represented as NULL rather than absent keys, which simplifies downstream analytics.
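Because the nested structs can themselves be NULL, flattening a record for a report or CSV is easiest with a small accessor that tolerates missing levels. The sketch below assumes each row arrives as a plain Python dict shaped like the table above; flatten_place is a hypothetical helper, and the brand.names.primary path is inferred from that table rather than confirmed against a specific release.

```python
def flatten_place(record: dict) -> dict:
    """Flatten one Overture-style place record; any nested struct may be None."""
    names = record.get("names") or {}
    categories = record.get("categories") or {}
    # Assumed layout: brand is a struct whose names struct has a primary member.
    brand_names = (record.get("brand") or {}).get("names") or {}
    return {
        "id": record.get("id"),
        "name": names.get("primary"),
        "category": categories.get("primary"),
        "brand": brand_names.get("primary"),
        "confidence": record.get("confidence"),
    }


row = flatten_place({
    "id": "abc",
    "names": {"primary": "Shell"},
    "categories": {"primary": "gas_station", "alternate": None},
    "brand": None,
    "confidence": 0.9,
})
print(row)
```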
4.2 Method A: DuckDB + Spatial Extension
The fastest way to query Overture Maps is via DuckDB with the spatial extension. DuckDB can read GeoParquet directly from S3 using HTTP range requests.
Install DuckDB:
brew install duckdb # macOS
# Or download from https://duckdb.org/docs/installation/
Run the query:
INSTALL spatial;
LOAD spatial;
INSTALL httpfs;
LOAD httpfs;
-- The public Overture bucket is hosted in us-west-2
SET s3_region = 'us-west-2';
-- Query EV charging stations in NYC bbox
SELECT
  id,
  names.primary AS name,
  categories.primary AS category,
  confidence,
  ST_X(geometry) AS lon,
  ST_Y(geometry) AS lat
FROM read_parquet(
  's3://overturemaps-us-west-2/release/2025-04-23.0/theme=places/type=place/*',
  hive_partitioning = true
)
WHERE
  bbox.xmin > -74.30
  AND bbox.xmax < -73.68
  AND bbox.ymin > 40.48
  AND bbox.ymax < 40.92
  AND categories.primary = 'electric_vehicle_charging_station';
The bbox column is a struct of plain numeric members (xmin, xmax, ymin, ymax). Because Parquet stores minimum and maximum statistics for each row group, DuckDB can skip row groups that lie entirely outside the query window instead of reading every row. This acts as a coarse spatial index for GeoParquet.
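The skipping logic can be illustrated in a few lines. This is a simplified model, not DuckDB's implementation: can_skip_row_group is a hypothetical function, and each argument is the (min, max) pair that Parquet statistics would record for one bbox member within one row group.

```python
def can_skip_row_group(xmin_range, xmax_range, ymin_range, ymax_range,
                       west, south, east, north) -> bool:
    """True when no row in the group can satisfy the WHERE clause
    xmin > west AND xmax < east AND ymin > south AND ymax < north."""
    return (
        xmin_range[1] <= west      # even the largest xmin is not > west
        or xmax_range[0] >= east   # even the smallest xmax is not < east
        or ymin_range[1] <= south  # even the largest ymin is not > south
        or ymax_range[0] >= north  # even the smallest ymax is not < north
    )


NYC = dict(west=-74.30, south=40.48, east=-73.68, north=40.92)

# A row group holding only London-area points can be skipped without reading it.
london = can_skip_row_group((-0.5, 0.3), (-0.5, 0.3), (51.3, 51.7), (51.3, 51.7), **NYC)
# A row group holding Manhattan-area points must be read.
manhattan = can_skip_row_group((-74.1, -73.8), (-74.1, -73.8), (40.6, 40.8), (40.6, 40.8), **NYC)
print(london, manhattan)
```

The benefit depends on how spatially clustered the rows in each file are: if every row group spans the globe, no group can be skipped.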
For gas stations, change only the category:
AND categories.primary = 'gas_station'
Export results to GeoJSON:
COPY (
  SELECT
    id,
    names.primary AS name,
    categories.primary AS category,
    confidence,
    geometry
  FROM read_parquet('s3://overturemaps-us-west-2/release/2025-04-23.0/theme=places/type=place/*', hive_partitioning = true)
  WHERE
    bbox.xmin > -74.30
    AND bbox.xmax < -73.68
    AND bbox.ymin > 40.48
    AND bbox.ymax < 40.92
    AND categories.primary IN ('electric_vehicle_charging_station', 'gas_station')
) TO 'nyc_pois_overture.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');
4.3 Method B: Python with overturemaps-py
For Python-centric workflows, install the official overturemaps package:
pip install overturemaps
Query and export:
import overturemaps
import geopandas as gpd
# Define NYC bounding box
bbox = (-74.30, 40.48, -73.68, 40.92)
# Download places within bbox
gdf = overturemaps.core.geodataframe("place", bbox=bbox)
# Filter for charging stations and gas stations
pois = gdf[gdf["categories"].apply(
lambda x: x.get("primary") in [
"electric_vehicle_charging_station",
"gas_station"
] if isinstance(x, dict) else False
)]
# Save to GeoJSON
pois.to_file("nyc_pois_overture.geojson", driver="GeoJSON")
print(f"Extracted {len(pois)} POIs")
overturemaps.core.geodataframe handles S3 partitioning, Parquet reading, and geometry reconstruction automatically. The result is a GeoDataFrame ready for analysis or export.
VS Code workflow: If you saved the output as .geojson (via pois.to_file(...)), you can preview it directly in VS Code with the Geo Data Viewer Fast extension, with no need to switch to a browser.
4.4 Method C: Command-Line Interface
For shell-based pipelines, use the overturemaps CLI:
pip install overturemaps
overturemaps download \
--bbox="-74.30,40.48,-73.68,40.92" \
-f geojson \
--type=place \
-o nyc_places.geojson
Then filter locally, for example with jq:
# Keep only charging stations and gas stations
jq '.features |= map(select(.properties.categories.primary as $c | $c == "electric_vehicle_charging_station" or $c == "gas_station"))' \
  nyc_places.geojson > nyc_pois_overture.geojson
Note: The --type=place flag targets the places theme. Other available types include building, division, segment, and connector.
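If neither tool is available, the same local filter can be written with the Python standard library. This is a minimal sketch: filter_places and primary_category are hypothetical helpers, and handling categories both as a nested object and as a JSON-encoded string is a defensive assumption about how different exporters serialize the field.

```python
import json

WANTED = {"electric_vehicle_charging_station", "gas_station"}


def primary_category(feature: dict):
    """Return categories.primary for a GeoJSON feature, or None if unavailable."""
    categories = (feature.get("properties") or {}).get("categories")
    if isinstance(categories, str):
        # Some exporters store nested structs as JSON text.
        try:
            categories = json.loads(categories)
        except json.JSONDecodeError:
            return None
    if isinstance(categories, dict):
        return categories.get("primary")
    return None


def filter_places(collection: dict, wanted=WANTED) -> dict:
    """Keep only features whose primary category is in `wanted`."""
    kept = [f for f in collection.get("features", []) if primary_category(f) in wanted]
    return {"type": "FeatureCollection", "features": kept}


# Usage with the file produced by the CLI:
# with open("nyc_places.geojson") as src:
#     pois = filter_places(json.load(src))
# with open("nyc_pois_overture.geojson", "w") as dst:
#     json.dump(pois, dst)
```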
5. Comparing the Two Approaches
| Dimension | osmium (Raw OSM) | Overture Maps |
|---|---|---|
| Data freshness | Real-time (minutely diffs) | Monthly releases |
| Preprocessing required | Yes — download, extract, tag-filter | No — query directly from S3 |
| Schema consistency | Tag-dependent (community-defined) | Normalized categories taxonomy |
| Conflation | None — raw source data | Yes — deduplicated across sources |
| Query language | osmium CLI / C++ API | DuckDB SQL / Python / CLI |
| Output format | OSM XML / PBF / GeoJSON | GeoParquet / GeoJSON / GeoDataFrame |
| Best for | Real-time applications, custom tag logic | Analytics, rapid prototyping, conflated datasets |
When to Choose osmium
- You need minute-level freshness (e.g., tracking newly opened charging stations)
- You are working with custom OSM tags not yet normalized by Overture
- You are building a custom extract pipeline for a specific region
- You need the full OSM object graph (ways, relations, member references)
When to Choose Overture Maps
- You want immediate queryability without downloading multi-gigabyte regions
- You need a clean, conflated dataset for analysis or visualization
- You are running exploratory analytics where schema consistency matters
- You want cloud-native access without local storage constraints
6. Validating the Results
Regardless of the pipeline you choose, always validate the output before analysis:
- Count check: NYC should have ~500–800 EV charging stations and ~1,000–1,500 gas stations (subject to OSM mapping completeness)
- Spatial check: Plot the points on a basemap to verify they fall within NYC boroughs, not New Jersey or Long Island
- Attribute check: Inspect the name, operator, brand, and capacity fields for charging stations; brand and opening_hours for gas stations
- Duplicate check: Overture’s conflation should reduce duplicates; raw OSM may contain multiple nodes for the same physical location
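The count, spatial, and duplicate checks can be automated over the exported GeoJSON features. The sketch below is one possible approach: validate_points is a hypothetical helper, and treating points that agree to five decimal places (roughly one metre) as the same location is an assumed threshold, not a standard.

```python
from collections import Counter


def validate_points(features, bbox, precision=5):
    """Summarize point features: total count, points outside bbox, repeated locations."""
    west, south, east, north = bbox
    outside = 0
    seen = Counter()
    for feature in features:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # only point features are checked here
        lon, lat = geometry["coordinates"][:2]
        if not (west <= lon <= east and south <= lat <= north):
            outside += 1
        # Round so that near-identical coordinates fall into the same bucket.
        seen[(round(lon, precision), round(lat, precision))] += 1
    return {
        "points": sum(seen.values()),
        "outside_bbox": outside,
        "duplicate_locations": sum(1 for n in seen.values() if n > 1),
    }


# Usage: report = validate_points(json.load(open("nyc_pois_overture.geojson"))["features"],
#                                 (-74.30, 40.48, -73.68, 40.92))
```

A bounding-box test cannot separate the five boroughs from the parts of New Jersey inside the same rectangle; that still needs a polygon boundary or the visual check described above.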
To inspect the GeoJSON visually, you have two options:
- In VS Code: Open the .geojson file in your editor and press Ctrl/Cmd + Alt + M to launch the Geo Data Viewer Fast extension for an instant Kepler.gl preview.
- In the browser: Drag the file into GeoDataViewer’s Studio to render the points on an interactive map with attribute tables.
7. Summary
| Task | Recommended Tool | Command |
|---|---|---|
| Extract raw OSM POIs | osmium | osmium extract → osmium tags-filter → osmium export |
| Query cloud-native POIs | Overture Maps + DuckDB | SELECT ... FROM read_parquet('s3://...') WHERE categories.primary = ... |
| Python-based extraction | Overture Maps + Python | overturemaps.core.geodataframe("place", bbox=...) |
| Quick shell pipeline | Overture Maps CLI | overturemaps download --bbox=... --type=place |
Both OpenStreetMap and Overture Maps are powerful, but they serve different stages of the data pipeline. Use osmium when you need surgical precision on the freshest raw data. Use Overture Maps when you want clean, analytics-ready data without the preprocessing overhead.
For viewing the extracted GeoJSON files, upload them to GeoDataViewer’s Studio for instant browser-based inspection — no QGIS required.