Regio

The sam_ml.data.regio module provides helpers for mapping and visualising regional data for Germany.

Functions

Function                 Description
visualise_plz            Visualise data values mapped to zip codes (PLZ) on a map of Germany
get_plz_mapping          Get a dataframe with the ort-postleitzahl-landkreis-bundesland mapping
get_coord_main_cities    Get the coordinates of the top cities in Germany

sam_ml.data.regio.get_coord_main_cities() -> dict[str, tuple[float, float]]

Function to get the coordinates of the top cities in Germany

Returns

top_cities : dict[str, tuple[float, float]]

dictionary with the English names of the top German cities as keys and their coordinates as values

Examples

>>> from sam_ml.data.regio import get_coord_main_cities
>>> get_coord_main_cities()
{...}
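A minimal sketch of how a dictionary of this shape might be used, here to find the listed city closest to a given point. The `cities` dictionary is a stand-in with approximate values, not the actual return value, and the (latitude, longitude) order is an assumption. `haversine_km` and `nearest_city` are hypothetical helpers and not part of sam_ml.

```python
from math import asin, cos, radians, sin, sqrt

# Stand-in with the same shape as the documented return value.
# Values are approximate; the (latitude, longitude) order is an assumption.
cities: dict[str, tuple[float, float]] = {
    "Berlin": (52.52, 13.40),
    "Hamburg": (53.55, 9.99),
    "Munich": (48.14, 11.58),
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius


def nearest_city(point: tuple[float, float],
                 cities: dict[str, tuple[float, float]]) -> str:
    """Name of the city in `cities` closest to `point`."""
    return min(cities, key=lambda name: haversine_km(point, cities[name]))


# Potsdam lies just south-west of Berlin
print(nearest_city((52.40, 13.06), cities))  # Berlin
```

With the real function, `cities` would be replaced by `get_coord_main_cities()`, after checking the coordinate order of the returned tuples.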

sam_ml.data.regio.get_plz_mapping() -> DataFrame

Function to get a dataframe with the ort-postleitzahl-landkreis-bundesland mapping (place, postcode, district, federal state)

Returns

df_mapping : pd.DataFrame

dataframe with columns “ort”, “plz”, “landkreis”, and “bundesland”

Notes

Source: https://www.suche-postleitzahl.org/downloads, “zuordnung_plz_ort.csv”, 18/07/2023

Examples

>>> from sam_ml.data.regio import get_plz_mapping
>>> get_plz_mapping()
    ort     plz     landkreis                   bundesland
0   Aach    78267   Landkreis Konstanz          Baden-Württemberg
1   Aach    54298   Landkreis Trier-Saarburg    Rheinland-Pfalz
...
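As the two "Aach" rows show, a place name alone does not identify a postcode or federal state. A minimal sketch of how such ambiguous names could be detected. The `rows` list is a stand-in copied from the example output, and storing `plz` as a string is an assumption. With the real dataframe, `get_plz_mapping().to_dict("records")` would give dictionaries of this shape.

```python
from collections import defaultdict

# Stand-in rows taken from the example output; plz as string is an assumption.
rows = [
    {"ort": "Aach", "plz": "78267",
     "landkreis": "Landkreis Konstanz", "bundesland": "Baden-Württemberg"},
    {"ort": "Aach", "plz": "54298",
     "landkreis": "Landkreis Trier-Saarburg", "bundesland": "Rheinland-Pfalz"},
]

# Collect every postcode that shares the same place name
plz_by_ort: dict[str, list[str]] = defaultdict(list)
for row in rows:
    plz_by_ort[row["ort"]].append(row["plz"])

# Place names that map to more than one postcode
ambiguous = {ort: plz for ort, plz in plz_by_ort.items() if len(plz) > 1}
print(ambiguous)  # {'Aach': ['78267', '54298']}
```

Joining your own data on "plz" rather than on "ort" avoids this ambiguity.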

sam_ml.data.regio.visualise_plz(plz_region_df: DataFrame, plot_col_name: str, plot_path: str = 'german_map.png', plot_title: str = 'Germany map')

Function to visualise data values mapped to zip codes (PLZ) on a map of Germany

Parameters

plz_region_df : pd.DataFrame

dataframe with a “plz” column of dtype string

plot_col_name : str

name of the column to plot

plot_path : str, default="german_map.png"

path for saving the plot

plot_title : str, default="Germany map"

title of the plot
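Because the "plz" column has to be of dtype string, postcodes that were read as numbers lose their leading zero (for example 01067 becomes 1067). A minimal sketch of one way to restore the five-character form before plotting. `normalise_plz` is a hypothetical helper and not part of sam_ml.

```python
def normalise_plz(value) -> str:
    """Return a five-character postcode string, restoring leading zeros."""
    text = str(value).strip()
    if text.endswith(".0"):  # value was read as a float, e.g. 1067.0
        text = text[:-2]
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a valid German postcode: {value!r}")
    return text.zfill(5)  # pad with zeros on the left up to 5 characters


print(normalise_plz(1067))     # 01067
print(normalise_plz("78267"))  # 78267
```

On a dataframe this could be applied with `df["plz"] = df["plz"].map(normalise_plz)`.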

Returns

Saves the plot at plot_path. If the default path is used, the name of the plotted column is added to the path.

Notes

Source: https://www.suche-postleitzahl.org/downloads, 18/07/2023, accuracy: medium (“Genauigkeit: mittel”)

Examples

First example with fewer than 8 unique values to plot (label legend):

>>> # load data (replace with own data)
>>> import pandas as pd
>>> df = pd.DataFrame({"plz": ['78267', '54298', '52062', '52064', '52066', '52068', '52070', '52072', '52074', '52076'], "income": [1400, 700, 2400, 1400, 300, 2400, 700, 1000, 1400, 2000]})
>>>
>>> from sam_ml.data.regio import visualise_plz
>>> visualise_plz(df, plot_col_name="income")

Second example with 8 or more unique values to plot (colorbar legend):

>>> # load data (replace with own data)
>>> import pandas as pd
>>> df = pd.DataFrame({"plz": ['78267', '54298', '52062', '52064', '52066', '52068', '52070', '52072', '52074', '52076'], "income": [1400, 800, 2400, 1400, 300, 2400, 700, 1000, 1600, 2000]})
>>>
>>> from sam_ml.data.regio import visualise_plz
>>> visualise_plz(df, plot_col_name="income")
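The two examples differ only in how many distinct values the plotted column holds. A minimal sketch for checking this on your own data before plotting. The variable names are illustrative, and the cut-off of 8 is taken from the example captions, not from the implementation.

```python
# "income" values from the first and second example
income_first = [1400, 700, 2400, 1400, 300, 2400, 700, 1000, 1400, 2000]
income_second = [1400, 800, 2400, 1400, 300, 2400, 700, 1000, 1600, 2000]

# Number of distinct values in each column
n_first = len(set(income_first))
n_second = len(set(income_second))

print(n_first)   # 6
print(n_second)  # 8
```

For a dataframe column, `df["income"].nunique()` gives the same count.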