pydst package

Submodules

pydst.pydst module

This module provides the Dst class, the workhorse for retrieving subjects, tables, metadata and data from Statistics Denmark.

class pydst.pydst.Dst(lang='en')[source]

Bases: object

Retrieve subjects, metadata and data from Statistics Denmark.

This class provides some simple functions to retrieve information from Statistics Denmark’s API.

lang

Can take the values en for English or da for Danish

Type:str
get_csv(path, table_id, variables=None, lang=None)[source]

Save the table table_id as a CSV file.

Parameters:
  • path (str) – Output directory
  • table_id (str) – Table ID for the table you want to retrieve data from.
  • lang (str, optional) – If lang is provided it uses this argument instead of the Dst’s class attribute lang. Can take the values en for English or da for Danish
  • variables (dict, optional) –
Returns:

Doesn’t return anything.

Return type:

None

Todo

  • Implement tests
  • Ensure that list values in variables (dict) consist entirely of strings
  • Ensure that variables (dict) can also take a single string as a value
get_data(table_id, variables=None, lang=None)[source]

DataFrame with variables contained in table_id

Parameters:
  • table_id (str) – Table ID for the table you want to retrieve data from.
  • lang (str, optional) – If lang is provided it uses this argument instead of the Dst’s class attribute lang. Can take the values en for English or da for Danish
  • variables (dict, optional) –
Returns:

Returns a DataFrame with data from table_id.

Return type:

pandas.DataFrame

Todo

  • Implement tests
  • Ensure that list values in variables (dict) consist entirely of strings
  • Ensure that variables (dict) can also take a single string as a value
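The Todo items above describe two accepted shapes for the values in variables: a single string, or a list made up only of strings. A minimal sketch of such a check follows. check_variables is a hypothetical helper written for illustration; it is not part of pydst, and raising TypeError is an assumption.

```python
def check_variables(variables=None):
    """Check that every value is a string or a list of strings.

    Hypothetical helper sketching the Todo items; not part of pydst.
    """
    if variables is None:
        return
    if not isinstance(variables, dict):
        raise TypeError("variables must be a dict")
    for key, value in variables.items():
        if isinstance(value, str):
            continue
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        # Anything else (numbers, mixed lists, nested dicts) is rejected.
        raise TypeError(f"variables[{key!r}] must be a str or a list of str")
```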
get_metadata(table_id, lang=None)[source]

Metadata about table_id

Parameters:
  • table_id (str) – Table ID for the table you want to retrieve data from.
  • lang (str, optional) – If lang is provided it uses this argument instead of the Dst’s class attribute lang. Can take the values en for English or da for Danish
Returns:

Returns a dictionary containing metadata about the specified data.

Return type:

dict

Todo

  • Implement tests
get_subjects(subjects=None, lang=None)[source]

Retrieve subjects and sub subjects from Statistics Denmark.

This function retrieves the subjects and subsubjects that Statistics Denmark uses to categorize its tables. The subjectsIDs can then be passed to get_tables to retrieve only the tables classified under the respective subjects.

Parameters:
  • subjects (str/list, optional) – If a valid subjectsID is provided it will return the subject’s subsubjects if available. subjects can either be a list of subjectsIDs in string format or a comma-separated string
  • lang (str, optional) – If lang is provided it uses this argument instead of the Dst’s class attribute lang. Can take the values en for English or da for Danish
Returns:

Returns a DataFrame with subjects.

Return type:

pandas.DataFrame

Examples

The example below shows how get_subjects is used.

>>> from pydst import Dst
>>> Dst().get_subjects()
    active                               desc  hasSubjects  id
0     True           Population and elections         True  02
1     True                  Living conditions         True  05
2     True            Education and knowledge         True  03
..     ...                                ...          ...  ..
10    True                   Business sectors         True  11
11    True  Geography, environment and energy         True  01
12    True                              Other         True  19

[13 rows x 4 columns]
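The subjects parameter accepts either a list of subjectsIDs or a comma-separated string. A minimal sketch of how both forms could be reduced to one comma-separated string is shown here. normalize_subjects is a hypothetical helper for illustration only and is not part of pydst; how the package handles this internally is not documented above.

```python
def normalize_subjects(subjects=None):
    """Return subject IDs as one comma-separated string, or None.

    Hypothetical helper for illustration; not part of pydst.
    """
    if subjects is None:
        return None
    if isinstance(subjects, str):
        # Split an existing comma-separated string so whitespace can be trimmed.
        parts = subjects.split(",")
    else:
        parts = list(subjects)
    if not all(isinstance(p, str) for p in parts):
        raise TypeError("subject IDs must be strings")
    return ",".join(p.strip() for p in parts if p.strip())
```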

get_tables(subjects=None, inactive_tables=False, lang=None)[source]
Parameters:
  • inactive_tables (bool, optional) – If True the DataFrame will contain tables that are no longer updated.
  • subjects (str/list, optional) – If a valid subjectsID is provided it will return the subject’s subsubjects if available. subjects can either be a list of subjectsIDs in string format or a comma-separated string
  • lang (str, optional) – If lang is provided it uses this argument instead of the Dst’s class attribute lang. Can take the values en for English or da for Danish
Returns:

Returns a DataFrame with tables.

Return type:

pandas.DataFrame

Todo

  • Check inactive_tables (cerberus validator)

Examples

The example below shows how get_tables is used.

>>> from pydst import Dst
>>> Dst().get_tables()
      active firstPeriod        id latestPeriod
0       True      2008Q1    FOLK1A       2018Q2
1       True      2008Q1    FOLK1B       2018Q2
2       True      2008Q1    FOLK1C       2018Q2
...      ...         ...       ...          ...
1958    True        2005  SKOVRG01         2016
1959    True        2005  SKOVRG02         2016
1960    True        2005  SKOVRG03         2016
                                            text      unit
0     Population at the first day of the quarter    number
1     Population at the first day of the quarter    number
2     Population at the first day of the quarter    number
...                                          ...       ...
1958            Growing stock (physical account)  1,000 m3
1959            Growing stock (monetary account)  DKK mio.
1960      Forest area (Kyoto) (physical account)       km2
                 updated                                          variables
0    2018-05-08 08:00:00           [region, sex, age, marital status, time]
1    2018-05-08 08:00:00              [region, sex, age, citizenship, time]
2    2018-05-08 08:00:00  [region, sex, age, ancestry, country of origin...
...                  ...                                                ...
1958 2017-11-28 08:00:00  [balance items, species of wood, county counci...
1959 2017-11-28 08:00:00  [balance items, species of wood, county counci...
1960 2017-11-28 08:00:00     [balance items, county council district, time]

[1961 rows x 8 columns]

get_variables(table_id, lang=None)[source]

DataFrame with variables contained in table_id

Parameters:
  • table_id (str) – Table ID for the table you want to retrieve data from.
  • lang (str, optional) – If lang is provided it uses this argument instead of the Dst’s class attribute lang. Can take the values en for English or da for Danish
Returns:

Returns a DataFrame with variables.

Return type:

pandas.DataFrame

Todo

  • Implement tests
  • TableID cerberus validator
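The methods of Dst documented above chain naturally: subjects lead to tables, tables to variables, and variables to data. A minimal sketch of that walk follows, using only the method signatures listed above. explore_table is a hypothetical helper, and the IDs in the usage comment are taken from the example outputs above; whether that table is listed under that subject is an assumption.

```python
def explore_table(dst, subject_id, table_id):
    """Walk from subjects to data using the documented Dst methods.

    `dst` is expected to be a pydst Dst instance (or any object exposing
    the same methods); `explore_table` itself is a hypothetical helper.
    """
    subjects = dst.get_subjects()                   # top-level subjects
    tables = dst.get_tables(subjects=[subject_id])  # tables under one subject
    variables = dst.get_variables(table_id)         # variables of one table
    data = dst.get_data(table_id)                   # data, no variable filter
    return subjects, tables, variables, data

# Usage, assuming pydst is installed and the API is reachable:
#   from pydst import Dst
#   subjects, tables, variables, data = explore_table(Dst(lang='en'), '02', 'FOLK1A')
```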

pydst.utils module

pydst.utils.assign_lang(self, lang=None)[source]
pydst.utils.bad_request_wrapper(r)[source]

Raises an error if the response is an HTTP error.

A wrapper around the HTTP error: if an error message is available from Statistics Denmark, that message is used because it is more descriptive.

Parameters:r (requests.models.Response) – Response from the requests library.
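A rough sketch of the behaviour described above: prefer the error message from the API when one is available, otherwise fall back to the generic HTTP error. raise_with_api_message is a hypothetical re-implementation, not the package's code. The "message" key in the error body and the use of RuntimeError are assumptions; only ok, status_code, json() and raise_for_status() from requests.models.Response are relied on.

```python
def raise_with_api_message(r):
    """Raise for a failed response, preferring the API's own message.

    Hypothetical sketch; the "message" key is an assumption about the
    error body and is not confirmed by the documentation above.
    """
    if r.ok:
        return r
    try:
        message = r.json().get("message")
    except (ValueError, AttributeError):
        # Body was not JSON, or the JSON was not an object.
        message = None
    if message:
        raise RuntimeError(f"{r.status_code}: {message}")
    # No descriptive message available: let requests raise its own error.
    r.raise_for_status()
```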
pydst.utils.check_lang(lang)[source]

Returns lang if lang is one of the available languages

Parameters:lang (str) – Can take the values en for English or da for Danish
pydst.utils.construct_url(base, version, app, path, query)[source]

Todo

  • Test that the resulting URL matches the expected URL
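construct_url is not described beyond its signature. One plausible reading is that the pieces are joined into base/version/app/path with query appended as a query string. The sketch below follows that reading. build_url is a hypothetical stand-in, the ordering of the segments is an assumption, and the host in the usage note is a placeholder.

```python
from urllib.parse import urlencode

def build_url(base, version, app, path, query):
    """Join URL pieces into one request URL.

    Hypothetical sketch: the order base/version/app/path?query is an
    assumption inferred from the parameter names only.
    """
    # Drop empty pieces and stray slashes so segments join cleanly.
    segments = [str(s).strip("/") for s in (base, version, app, path) if s]
    url = "/".join(segments)
    if query:
        url += "?" + urlencode(query)
    return url
```

For example, build_url("https://api.example.org/", "v1", "subjects", None, {"lang": "en"}) gives "https://api.example.org/v1/subjects?lang=en".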
pydst.utils.desc_to_df(list_)[source]

Flattens subject response from Statistics Denmark

Parameters:list_ (list) – JSON response from the requests library.
Returns:Returns a DataFrame of the flattened response.
Return type:pandas.DataFrame
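To illustrate what flattening a nested subject response could look like, here is a minimal sketch that works on plain lists and dicts instead of a DataFrame. flatten_subjects is hypothetical. The assumption that children are nested under a "subjects" key is not confirmed by the documentation above, and the parent_id column is added purely for illustration.

```python
def flatten_subjects(nodes, parent_id=None):
    """Flatten a nested list of subject dicts into a flat list of rows.

    Hypothetical sketch: assumes each node may carry its children under
    a "subjects" key; `parent_id` is an illustrative extra column.
    """
    rows = []
    for node in nodes:
        # Copy every field except the nested children.
        row = {k: v for k, v in node.items() if k != "subjects"}
        row["parent_id"] = parent_id
        rows.append(row)
        # Recurse into children, recording this node as their parent.
        rows.extend(flatten_subjects(node.get("subjects") or [], node.get("id")))
    return rows
```

The resulting list of dicts can be handed to pandas.DataFrame(rows) to obtain a table with one row per subject.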
pydst.utils.flatten_list_to_string_in_dict_remove_none(dict)[source]
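This function has no description. Going by its name alone, it plausibly joins list values into strings and drops entries whose value is None. The sketch below shows that reading. flatten_lists_drop_none is a hypothetical stand-in, and the comma separator is an assumption.

```python
def flatten_lists_drop_none(mapping):
    """Join list values into comma-separated strings and drop None values.

    Hypothetical sketch inferred from the function name only.
    """
    result = {}
    for key, value in mapping.items():
        if value is None:
            continue  # remove entries without a value
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        result[key] = value
    return result
```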

pydst.validators module

pydst.validators.dict_keys_to_comma_str(dict)[source]

Joins dict keys into a comma-separated string

Parameters:dict (dict) – A dictionary.
Returns:Comma-separated string of keys.
Return type:str
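A minimal sketch of the described behaviour, assuming keys are joined in the dictionary's insertion order. keys_to_comma_str is a hypothetical name chosen to avoid shadowing the built-in dict.

```python
def keys_to_comma_str(mapping):
    """Return the keys of `mapping` as one comma-separated string.

    Hypothetical sketch; keys are converted with str() before joining.
    """
    return ",".join(str(key) for key in mapping)
```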
pydst.validators.lang_validator(lang, valid_langs)[source]

Validates if language is correctly specified.

This function validates that lang is contained in valid_langs, and that lang and valid_langs have the correct types.

Parameters:
  • lang (str) – Language that is contained in valid_langs. Language must be letters.
  • valid_langs (list of str) – A list of valid languages. Each element must be letters.
Returns:

None
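The checks described above can be sketched in plain Python. validate_lang is a hypothetical stand-in. The Todo notes elsewhere on this page suggest the package may rely on cerberus for validation, so the exception types used here (TypeError and ValueError) are assumptions.

```python
def validate_lang(lang, valid_langs):
    """Validate `lang` against `valid_langs`; return None on success.

    Hypothetical sketch; the exception types are assumptions.
    """
    if not isinstance(lang, str):
        raise TypeError("lang must be a string")
    if not isinstance(valid_langs, list) or not all(
        isinstance(v, str) for v in valid_langs
    ):
        raise TypeError("valid_langs must be a list of strings")
    # Both the language and every valid language must be letters only.
    if not lang.isalpha() or not all(v.isalpha() for v in valid_langs):
        raise ValueError("languages must consist of letters only")
    if lang not in valid_langs:
        raise ValueError(f"lang must be one of {valid_langs}, got {lang!r}")
    return None
```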

pydst.validators.str_in_list(str, list)[source]

Check if a string is contained in a list

Parameters:
  • str (str) – Arbitrary string
  • list (list of str) – Arbitrary list of strings
Returns:

Returns True if str is contained in list, otherwise False

Return type:

bool
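As described, this amounts to a membership test. A minimal sketch follows, with hypothetical parameter names chosen to avoid shadowing the built-ins str and list.

```python
def is_str_in_list(value, items):
    """Return True if `value` is an element of `items`, otherwise False.

    Hypothetical sketch of the documented behaviour.
    """
    return value in items
```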

pydst.validators.subject_validator(subjects)[source]
pydst.validators.validation_error_raise(validationObject)[source]

Module contents

Top-level package for pydst.