This import includes data from the UN SDG Global Database. Data is read from the submodule sdg-dataset, which is managed by UN Stats. Geography mappings are read from the submodule sssom-mappings, which is also managed by UN Stats. Please keep both submodules up to date.
Initialize submodules:
git submodule update --init --remote sdg-dataset
git submodule update --init --remote sssom-mappings
Update submodules:
git submodule update --remote sdg-dataset
git submodule update --remote sssom-mappings
Generate place mappings:
python3 geography.py
Produces:
- geography/ folder:
  - un_places.mcf (place mcf)
  - un_containment.mcf (place containment triples)
  - place_mappings.csv (map of SDG code -> dcid)
Note that the place_mappings.csv is required before running the process.py script.
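As a minimal sketch of how such a mapping file might be consumed, the snippet below reads SDG code -> dcid pairs into a dictionary. The column names `sdg` and `dcid`, the sample rows, and the helper `load_place_mappings` are assumptions for illustration only; check the header of the generated place_mappings.csv for the actual names.

```python
import csv
import io


def load_place_mappings(csv_file, code_col="sdg", dcid_col="dcid"):
    """Return a dict mapping SDG geography codes to dcids.

    The column names are assumed here, not taken from the real file.
    """
    mappings = {}
    for row in csv.DictReader(csv_file):
        code = row[code_col].strip()
        dcid = row[dcid_col].strip()
        if code and dcid:  # skip rows that have no resolved dcid
            mappings[code] = dcid
    return mappings


# Illustrative in-memory data standing in for geography/place_mappings.csv.
sample = io.StringIO("sdg,dcid\n4,country/AFG\n8,country/ALB\n999,\n")
place_mappings = load_place_mappings(sample)
```

To try this against the generated file, pass an open file handle for geography/place_mappings.csv in place of `sample`, adjusting the column names as needed.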
Process data and generate artifacts:
python3 process.py
Produces:
- schema/ folder:
  - measurement_method.mcf
  - schema.mcf (classes and enums)
  - sdg.textproto (vertical spec)
  - series.mcf (series mcf)
  - sv.mcf
  - unit.mcf
- csv/ folder:
  - [CODE].csv
(Note that these folders are not included in the repository but can be regenerated by running the script.)
When refreshing the data, the geography, schema, and csv folders might all get updated and will need to be resubmitted to g3. The corresponding TMCF file is sdg.tmcf.
To run unit tests:
python3 -m unittest discover -v -s ../ -p "*_test.py"
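The discovery command above picks up any file whose name matches `*_test.py`. As a rough illustration of what such a file could contain, here is a self-contained sketch; the function `normalize_code` and the test class are hypothetical and are not part of this import.

```python
import unittest


def normalize_code(code):
    """Hypothetical helper: trim whitespace and upper-case a series code."""
    return code.strip().upper()


class NormalizeCodeTest(unittest.TestCase):

    def test_strips_and_uppercases(self):
        self.assertEqual(normalize_code("  si_pov_day1 "), "SI_POV_DAY1")


# Run the test case programmatically; `unittest discover` would do the
# equivalent for every matching file it finds.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeCodeTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```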
Notes:
- We currently drop certain series and variables (refer to util.py for the list) that the UN has identified as potentially containing outliers.
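The filtering described in this note could look roughly like the sketch below. The set names `DROP_SERIES` and `DROP_VARIABLES`, the codes in them, and the row layout are all placeholders chosen for illustration; the actual lists and their format are defined in util.py.

```python
# Placeholder drop lists; the real ones live in util.py.
DROP_SERIES = {"EXAMPLE_SERIES_1"}
DROP_VARIABLES = {"EXAMPLE_VARIABLE_9"}


def should_drop(series_code, variable_code):
    """Return True if the row belongs to a dropped series or variable."""
    return series_code in DROP_SERIES or variable_code in DROP_VARIABLES


# Assumed row shape: one dict per observation with series and variable codes.
rows = [
    {"series": "EXAMPLE_SERIES_1", "variable": "EXAMPLE_VARIABLE_1"},
    {"series": "EXAMPLE_SERIES_2", "variable": "EXAMPLE_VARIABLE_9"},
    {"series": "EXAMPLE_SERIES_2", "variable": "EXAMPLE_VARIABLE_2"},
]
kept = [r for r in rows if not should_drop(r["series"], r["variable"])]
```

Using sets keeps each membership check constant-time, which matters little for short lists but keeps the filter simple to extend.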
For reference, the sdmx/ folder contains an earlier version of the import scripts that used the UN API (which uses SDMX). These scripts may have errors and do not use the most up-to-date schema format, so they should only be read as an illustration of the SDMX -> MCF mapping and should not actually be run.
As a quick overview:
- preprocess.py downloads all the raw input CSVs to an input/ folder and adds all dimensions and attributes to a preprocessed/ folder.
- cities.py reads the input CSVs and matches cities with dcids.
- process.py reads the input CSVs and concepts and generates a cleaned CSV and schema.
- util.py has various shared util functions and constants.
- m49.csv has country code mappings.
