Regio
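The sam_ml.data.regio module provides helper functions for working with German regional data. It can map postal codes (Postleitzahlen, PLZ) to places, districts, and federal states, return the coordinates of the main German cities, and plot PLZ-level values on a map of Germany.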
Functions
| Function | Description |
|---|---|
| visualise_plz(plz_region_df, plot_col_name[, ...]) | Visualise data values mapped to postal codes (PLZ) on a map of Germany |
| get_plz_mapping() | Get a DataFrame with the ort-postleitzahl-landkreis-bundesland mapping |
| get_coord_main_cities() | Get the coordinates of the main cities in Germany |
- sam_ml.data.regio.get_coord_main_cities() -> dict[str, tuple[float, float]]
Function to get the coordinates of the main cities in Germany.
Returns
- top_cities : dict[str, tuple[float, float]]
Dictionary with the English names of the main German cities as keys and their coordinates as values.
Examples
>>> from sam_ml.data.regio import get_coord_main_cities
>>> get_coord_main_cities()
{...}
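The returned dictionary can be fed into standard geodesic formulas, for example to measure distances or to find the main city closest to a point. The following is a minimal sketch that is not part of sam_ml: it uses a small hand-written dictionary in the same dict[str, tuple[float, float]] shape (the keys and approximate coordinates are illustrative only), and it assumes each tuple is ordered as (latitude, longitude) in degrees, which the documentation above does not state. The helper names haversine_km and nearest_city are hypothetical.

```python
import math

# Illustrative stand-in for get_coord_main_cities(); approximate coordinates,
# ASSUMED to be (latitude, longitude) in degrees.
top_cities: dict[str, tuple[float, float]] = {
    "Berlin": (52.520, 13.405),
    "Hamburg": (53.551, 9.994),
    "Munich": (48.137, 11.575),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points (hypothetical helper)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_city(point: tuple[float, float], cities: dict[str, tuple[float, float]]) -> str:
    """Return the name of the city closest to point (hypothetical helper)."""
    return min(cities, key=lambda name: haversine_km(point, cities[name]))


if __name__ == "__main__":
    print(round(haversine_km(top_cities["Berlin"], top_cities["Munich"])))
    print(nearest_city((48.4, 11.7), top_cities))  # a point north of Munich
```

With the real library you would replace the stand-in dictionary by the result of get_coord_main_cities(), after checking the coordinate order.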
- sam_ml.data.regio.get_plz_mapping() -> DataFrame
Function to get a DataFrame that maps places (Ort) to postal codes (Postleitzahl), districts (Landkreis), and federal states (Bundesland).
Returns
- df_mapping : pd.DataFrame
DataFrame with the columns "ort", "plz", "landkreis", and "bundesland"
Notes
Source: https://www.suche-postleitzahl.org/downloads, “zuordnung_plz_ort.csv”, 18/07/2023
Examples
>>> from sam_ml.data.regio import get_plz_mapping
>>> get_plz_mapping()
    ort    plz                 landkreis         bundesland
0  Aach  78267        Landkreis Konstanz  Baden-Württemberg
1  Aach  54298  Landkreis Trier-Saarburg    Rheinland-Pfalz
...
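Because the mapping is keyed by postal code, a typical use is to attach district and federal-state information to your own PLZ-level data with a merge and then aggregate. The sketch below is illustrative only: instead of calling get_plz_mapping() it builds a two-row stand-in with the documented columns (rows copied from the example output above), and the income values are made up. As the "Aach" rows show, one place name can have several PLZ; if a PLZ also appears in several rows, a plain merge would repeat your rows, so the sketch drops rows that differ only in "ort" and uses pandas' validate="many_to_one" to raise an error if any PLZ is still ambiguous.

```python
import pandas as pd

# Stand-in for get_plz_mapping(): same columns, rows copied from the example above.
mapping = pd.DataFrame({
    "ort": ["Aach", "Aach"],
    "plz": ["78267", "54298"],
    "landkreis": ["Landkreis Konstanz", "Landkreis Trier-Saarburg"],
    "bundesland": ["Baden-Württemberg", "Rheinland-Pfalz"],
})

# Hypothetical PLZ-level data; PLZ kept as strings so leading zeros survive.
data = pd.DataFrame({"plz": ["78267", "54298", "78267"], "income": [1400, 700, 2000]})

# Drop rows that differ only in "ort" so each PLZ maps to one district/state;
# validate="many_to_one" raises a MergeError if a PLZ still appears more than once.
plz_info = mapping[["plz", "landkreis", "bundesland"]].drop_duplicates()
merged = data.merge(plz_info, on="plz", how="left", validate="many_to_one")

# Mean income per federal state.
per_state = merged.groupby("bundesland")["income"].mean()
print(per_state)
```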
- sam_ml.data.regio.visualise_plz(plz_region_df: DataFrame, plot_col_name: str, plot_path: str = 'german_map.png', plot_title: str = 'Germany map')
Function to visualise data values mapped to postal codes (PLZ) on a map of Germany.
Parameters
- plz_region_df : pd.DataFrame
DataFrame with a "plz" column of dtype string
- plot_col_name : str
Name of the column to plot
- plot_path : str, default="german_map.png"
Path for saving the plot
- plot_title : str, default="Germany map"
Title of the plot
Returns
Saves the plot at plot_path. If the default path is used, the name of the plotted column is added to the file name.
Notes
Source: https://www.suche-postleitzahl.org/downloads, 18/07/2023, accuracy ("Genauigkeit"): medium ("mittel")
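Since plz_region_df must hold the postal codes in a "plz" column of dtype string, data read from CSV or Excel files often needs converting first: numeric PLZ lose their leading zero (for example 01067 becomes 1067). A minimal pandas sketch, assuming five-digit German PLZ and a raw column that holds integers without missing values; the helper name prepare_plz is hypothetical and not part of sam_ml. Whether visualise_plz expects object-dtype Python strings or pandas' "string" extension dtype is not stated above; this sketch produces plain Python strings.

```python
import pandas as pd


def prepare_plz(df: pd.DataFrame, col: str = "plz") -> pd.DataFrame:
    """Return a copy of df whose col holds zero-padded 5-digit PLZ strings (hypothetical helper)."""
    out = df.copy()
    # int -> str, then left-pad with zeros to 5 characters (e.g. 1067 -> "01067")
    out[col] = out[col].astype(int).astype(str).str.zfill(5)
    return out


raw = pd.DataFrame({"plz": [1067, 78267, 54298], "income": [900, 1400, 700]})
df = prepare_plz(raw)
print(df["plz"].tolist())  # ['01067', '78267', '54298']
```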
Examples
First example, with fewer than 8 unique values to plot (label legend):
>>> # load data (replace with own data)
>>> import pandas as pd
>>> df = pd.DataFrame({"plz": ['78267', '54298', '52062', '52064', '52066', '52068', '52070', '52072', '52074', '52076'], "income": [1400, 700, 2400, 1400, 300, 2400, 700, 1000, 1400, 2000]})
>>>
>>> from sam_ml.data.regio import visualise_plz
>>> visualise_plz(df, plot_col_name="income")
Second example, with 8 or more unique values to plot (colorbar legend):
>>> # load data (replace with own data)
>>> import pandas as pd
>>> df = pd.DataFrame({"plz": ['78267', '54298', '52062', '52064', '52066', '52068', '52070', '52072', '52074', '52076'], "income": [1400, 800, 2400, 1400, 300, 2400, 700, 1000, 1600, 2000]})
>>>
>>> from sam_ml.data.regio import visualise_plz
>>> visualise_plz(df, plot_col_name="income")
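The two examples differ only in how many distinct values the plotted column contains: the first has 6 unique income values and gets a label legend, the second has exactly 8 and gets a colorbar. Judging from these examples alone, the switch appears to happen at 8 unique values. The snippet below is a hypothetical reconstruction of that rule, useful for checking your own data before plotting; it is not sam_ml's code, and both the threshold of 8 and the function name legend_kind are assumptions.

```python
def legend_kind(values, threshold: int = 8) -> str:
    """Guess which legend visualise_plz draws (hypothetical; threshold assumed)."""
    n_unique = len(set(values))
    # fewer than `threshold` distinct values -> one legend entry per value
    return "label" if n_unique < threshold else "colorbar"


first = [1400, 700, 2400, 1400, 300, 2400, 700, 1000, 1400, 2000]
second = [1400, 800, 2400, 1400, 300, 2400, 700, 1000, 1600, 2000]
print(len(set(first)), legend_kind(first))    # 6 label
print(len(set(second)), legend_kind(second))  # 8 colorbar
```

On a DataFrame, df["income"].nunique() gives the same count as len(set(...)) when the column has no missing values.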