Sunbelt Computer Software

UN Stats Sustainable Development Goals

This import includes data from the UN SDG Global Database. Data is read from the sdg-dataset submodule, and geography mappings are read from the sssom-mappings submodule. Both submodules are managed by UN Stats, so keep them up to date.

One-time Setup

Initialize submodules:

git submodule update --init --remote sdg-dataset
git submodule update --init --remote sssom-mappings

Data Refresh

Update submodules:

git submodule update --remote sdg-dataset
git submodule update --remote sssom-mappings

Generate place mappings:

python3 geography.py

Produces:

geography/ folder:
- un_places.mcf (place mcf)
- un_containment.mcf (place containment triples)
- place_mappings.csv (map of SDG code -> dcid)

Note that process.py requires place_mappings.csv, so run geography.py first.
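As a rough illustration of how the SDG code -> dcid map could be consumed, the sketch below loads place_mappings.csv into a dictionary and fails early if the file has not been generated yet. The function name load_place_mappings and the column names "sdg" and "dcid" are assumptions made for this example; check the header of the generated file before relying on them.

```python
import csv
import os


def load_place_mappings(path, code_col="sdg", dcid_col="dcid"):
    """Return {SDG code: dcid} from a CSV file with a header row.

    The default column names are assumptions, not taken from geography.py.
    """
    if not os.path.exists(path):
        # geography.py has to run before anything that needs the mappings.
        raise FileNotFoundError(
            f"{path} not found; run geography.py before process.py")
    with open(path, newline="", encoding="utf-8") as f:
        return {row[code_col]: row[dcid_col] for row in csv.DictReader(f)}
```

A call such as load_place_mappings("geography/place_mappings.csv") would then return the mapping, with codes kept as strings.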

Process data and generate artifacts:

python3 process.py

Produces:

schema/ folder:
- measurement_method.mcf
- schema.mcf (classes and enums)
- sdg.textproto (vertical spec)
- series.mcf (series mcf)
- sv.mcf
- unit.mcf

csv/ folder:
- [CODE].csv

(Note that these folders are not included in the repository but can be regenerated by running the script.)

When refreshing the data, the geography, schema, and csv folders might all get updated and will need to be resubmitted to g3. The corresponding TMCF file is sdg.tmcf.
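Before resubmitting, it can help to confirm that both scripts produced everything listed above. The following is a minimal sketch of such a check. The file names come from the lists in this README, while the helper name missing_artifacts and the choice to require at least one file in csv/ are assumptions of this example.

```python
from pathlib import Path

# Files this README says the two scripts produce, grouped by folder.
EXPECTED = {
    "geography": ["un_places.mcf", "un_containment.mcf", "place_mappings.csv"],
    "schema": [
        "measurement_method.mcf",
        "schema.mcf",
        "sdg.textproto",
        "series.mcf",
        "sv.mcf",
        "unit.mcf",
    ],
}


def missing_artifacts(root="."):
    """Return the relative paths of expected outputs that do not exist."""
    root = Path(root)
    missing = [
        f"{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).is_file()
    ]
    # The csv/ folder holds one [CODE].csv per code; require at least one.
    csv_dir = root / "csv"
    if not csv_dir.is_dir() or not any(csv_dir.glob("*.csv")):
        missing.append("csv/*.csv")
    return missing
```

An empty list from missing_artifacts() means every listed output is present. It says nothing about whether the contents are correct.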

To run unit tests:

python3 -m unittest discover -v -s ../ -p "*_test.py"

Notes:

We currently drop certain series and variables (refer to util.py for the list) that the UN has identified as potentially containing outliers.
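The sketch below shows one way such a drop list could be applied to rows before they are written out. It is not the code in util.py: the set names, the placeholder codes, and the "series" and "variable" row keys are all hypothetical and exist only to illustrate the filtering step.

```python
# Hypothetical drop lists; the real ones are defined in util.py.
DROP_SERIES = {"EXAMPLE_SERIES_A"}
DROP_VARIABLES = {"EXAMPLE_SERIES_B.TOTAL"}


def keep_row(row):
    """Return True unless the row's series or variable is on a drop list."""
    return (row["series"] not in DROP_SERIES
            and row["variable"] not in DROP_VARIABLES)


def filter_rows(rows):
    """Return only the rows that are not flagged for dropping."""
    return [row for row in rows if keep_row(row)]
```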

SDMX

For reference, the sdmx/ folder contains an earlier version of the import scripts that used the UN API (which uses SDMX). These scripts may contain errors and do not use the most up-to-date schema format, so treat them only as an illustration of the SDMX -> MCF mapping and do not run them.

As a quick overview:

- preprocess.py downloads all the raw input CSVs to an input/ folder and adds all dimensions and attributes to a preprocessed/ folder.
- cities.py reads the input CSVs and matches cities with dcids.
- process.py reads the input CSVs and concepts and generates a cleaned CSV and schema.
- util.py contains shared utility functions and constants.
- m49.csv contains country code mappings.
