GitHub - gaetjen/place_names: Estimating how common certain substrings are in different regions in Germany.

Intro

I don't remember when or why, but at some point I asked myself "I wonder to what extent it is possible to know in which part of Germany you are, based only on the characteristics of nearby place names." Even just looking at a map, or traveling around Germany a bit, some rough patterns are pretty apparent: In Baden-Württemberg there are many cities ending in "-ingen" (Böblingen, Villingen-Schwenningen). In Berlin and Brandenburg you'll find places ending in "-ow" (Pankow, Rathenow). Other substrings, such as "dorf" (village), "burg" (castle) or "berg" (hill/mountain) are seemingly ubiquitous and not unique to a region.

Well, I've now gone ahead and created some relief maps for the density of places whose names contain certain substrings. They contain interesting insight into linguistic and geographic differences within Germany. Also, if you want to create a fictional German town, this data can give you an idea of what its name should be, based on where in Germany it is located!

Methods

I downloaded OpenStreetMap data for Germany from Geofabrik and used a modified version of lifeeth's osmnodepbf scripts to extract the names and coordinates for all settlements. The extracted data can be found at data/. For the country and state borders I used data from Wolfram/Mathematica.

To estimate densities I calculated histograms and smoothed them with a Gaussian filter. To minimize biases due to different regions having different densities of named settlements I always normalized with the estimated prior density as shown below, which includes all places. For visualization I used Mathematica's builtin ReliefMap function. Due to the normalization in ReliefMap the height values from separate map images cannot be compared directly.

I haven't looked into why some areas have a much higher density of named places than other areas, but there is no obvious connection to population density. I suspect that it is either due to some data sets which were made freely available, or because of volunteers for openstreetmap being more active in those areas for whatever reasons.

A piece of border between Lower Saxony and Brandenburg is missing, because the data from Wolfram was in a stupid format and I couldn't be bothered to fix it.

Here is a collection of some of the more interesting results, more can be found at pics/.

Some of the most common substrings

berg (hill/mountain)

dorf (village)

haus (house)

hof (farm)

mühl (mill)

Variations on bach (brook/stream)

bach

beck

bek

Variations on stadt (city, but originally, and depending on spelling, also place)

stadt

städt

stätt

stedt

stett

Various river and region names

allgäu

harz

pfalz (palatinate)

donau (Danube)

neckar

rhein (Rhine)

ruhr

Cardinal directions

nord (north)

süd (south)

west

ost (east)

Relative positions

ober (upper)

mitte (middle)

unter (lower)

vorder (front)

hinter (behind)

Variations on brunn (well or source)

brunn

bronn

born

tal and thal (valley)

tal

thal

Various, which all mean clearing

rath

scheid

rode

Others

burg (castle)

by (farm in northern Germany, in other parts (river) curve)

hagen (enclosure)

heide (heath/heathen)

heim (home)

leben (legacy/heritage)

torf (peat)

weiler (hamlet)

ing

ingen

itz

oi

ow

tzsch

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
data		data
pics		pics
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
_config.yml		_config.yml
borders.nb		borders.nb
main_script.py		main_script.py
measurements.nb		measurements.nb
substrings.nb		substrings.nb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Intro

Methods

Some of the most common substrings

Variations on bach (brook/stream)

Variations on stadt (city, but originally, and depending on spelling, also place)

Various river and region names

Cardinal directions

Relative positions

Variations on brunn (well or source)

tal and thal (valley)

Various, which all mean clearing

Others

About

Releases

Packages

Languages

License

gaetjen/place_names

Folders and files

Latest commit

History

Repository files navigation

Intro

Methods

Some of the most common substrings

Variations on bach (brook/stream)

Variations on stadt (city, but originally, and depending on spelling, also place)

Various river and region names

Cardinal directions

Relative positions

Variations on brunn (well or source)

tal and thal (valley)

Various, which all mean clearing

Others

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages