UN Stats Sustainable Development Goals

This import includes data from the UN SDG Global Database. Data is read from the sdg-dataset submodule, and geography mappings are read from the sssom-mappings submodule; both are managed by UN Stats. Keep both submodules up to date before running the import.

One-time Setup

Initialize submodules:

git submodule update --init --remote sdg-dataset
git submodule update --init --remote sssom-mappings

Data Refresh

Update submodules:

git submodule update --remote sdg-dataset
git submodule update --remote sssom-mappings

Generate place mappings:

python3 geography.py

Produces:

  • geography/ folder:
    • un_places.mcf (place mcf)
    • un_containment.mcf (place containment triples)
    • place_mappings.csv (map of SDG code -> dcid)

Note that place_mappings.csv must exist before the process.py script is run.
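
As an illustration of how a later step might consume the SDG code -> dcid map, here is a minimal sketch that loads such a CSV into a dictionary. The column names `sdg` and `dcid`, the function `load_place_mappings`, and the sample rows are assumptions made for this example; check the header of the generated place_mappings.csv and the actual code in process.py before relying on them.

```python
import csv
import io


def load_place_mappings(csv_file, code_column="sdg", dcid_column="dcid"):
    """Return a dict mapping SDG geography code -> dcid.

    The default column names are hypothetical; pass the real header names
    from the generated place_mappings.csv.
    """
    mappings = {}
    for row in csv.DictReader(csv_file):
        code = row[code_column].strip()
        dcid = row[dcid_column].strip()
        # Skip rows where either side of the mapping is missing.
        if code and dcid:
            mappings[code] = dcid
    return mappings


# Illustrative in-memory sample standing in for geography/place_mappings.csv.
sample = io.StringIO("sdg,dcid\n840,country/USA\n356,country/IND\n999,\n")
place_mappings = load_place_mappings(sample)
```

With a real file, the same function could be called as `load_place_mappings(open("geography/place_mappings.csv", newline=""))`, assuming the column names match.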

Process data and generate artifacts:

python3 process.py

Produces:

  • schema/ folder:
    • measurement_method.mcf
    • schema.mcf (classes and enums)
    • sdg.textproto (vertical spec)
    • series.mcf (series mcf)
    • sv.mcf
    • unit.mcf
  • csv/ folder:
    • [CODE].csv

(Note that these folders are not included in the repository but can be regenerated by running the script.)

When the data is refreshed, the geography, schema, and csv folders may all change and will then need to be resubmitted to g3. The corresponding TMCF file is sdg.tmcf.

To run unit tests:

python3 -m unittest discover -v -s ../ -p "*_test.py"

Notes:

  • We currently drop certain series and variables (refer to util.py for the list) that the UN has identified as potentially containing outliers.
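
The drop step described in the note can be pictured as a simple filter over input rows. The sketch below is not the code in util.py: the set names `DROPPED_SERIES` and `DROPPED_VARIABLES`, their contents, and the column names `series` and `variable` are all placeholders invented for illustration.

```python
import csv
import io

# Hypothetical placeholders: the real lists are maintained in util.py.
DROPPED_SERIES = {"SERIES_TO_DROP"}
DROPPED_VARIABLES = {"VARIABLE_TO_DROP"}


def filter_rows(rows, series_column="series", variable_column="variable"):
    """Yield only rows whose series and variable are not on a drop list."""
    for row in rows:
        if row[series_column] in DROPPED_SERIES:
            continue
        if row[variable_column] in DROPPED_VARIABLES:
            continue
        yield row


# Made-up sample data; real input rows have different columns and codes.
sample = io.StringIO(
    "series,variable,value\n"
    "SERIES_OK,VARIABLE_OK,1.5\n"
    "SERIES_TO_DROP,VARIABLE_OK,2.0\n"
    "SERIES_OK,VARIABLE_TO_DROP,3.0\n"
)
kept_rows = list(filter_rows(csv.DictReader(sample)))
```

Keeping the drop lists as sets in one shared module means a refresh only needs those lists edited when the UN revises which series it flags.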

SDMX

For reference, the sdmx/ folder contains an earlier version of the import scripts that used the UN API (which uses SDMX). These scripts may contain errors and do not use the most up-to-date schema format, so they should be treated only as an illustration of the SDMX -> MCF mapping and should not actually be run.

As a quick overview:

  • preprocess.py downloads all the raw input CSVs to an input/ folder and writes all dimensions and attributes to a preprocessed/ folder.
  • cities.py reads the input CSVs and matches cities with dcids.
  • process.py reads the input CSVs and concepts and generates a cleaned CSV and schema.
  • util.py has various shared util functions and constants.
  • m49.csv has country code mappings.