A Python script that parses clang-doc's YAML output and converts it into a structured JSON format suitable for the Plugify project.
This script replaces the previous parser that used cxxheaderparser to directly parse C++ files with Doxygen comments. The new approach leverages clang-doc's YAML output, which provides more accurate and complete information about functions, enums, and typedefs.
- Parses clang-doc YAML output: Extracts function signatures, parameters, return types, and documentation
- Type mapping: Converts C++ types to simplified user-friendly types (e.g., plg::string → string)
- Enum handling: Automatically resolves enum types to their base types
- Enum structure generation: Includes full enum definitions with values and descriptions
- Function typedef parsing: Parses function pointer typedefs and generates complete prototypes
- Nested enum support: Handles enums within function prototypes
- Typedef support: Identifies and marks typedef parameters
- Description extraction: Extracts Doxygen/brief descriptions, parameter descriptions, and return descriptions
- Filtering: Filter functions by name or source file prefix
- Batch processing: Process single YAML files or entire directories
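As an illustration of the batch-processing feature, the sketch below shows one way a path argument could be expanded into a list of YAML files. The helper name `collect_yaml_files` and the recursive search are assumptions made for this example, not necessarily how the script itself works.

```python
from pathlib import Path
from typing import List


def collect_yaml_files(input_path: str) -> List[Path]:
    """Return the YAML files named by input_path (a single file or a directory).

    Hypothetical helper: the name and the recursive search are assumptions.
    """
    path = Path(input_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Search recursively in case the YAML output is spread over subfolders.
        return sorted(p for p in path.rglob("*.yaml") if p.is_file())
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```

Sorting the result keeps the processing order stable between runs, which makes the generated JSON easier to diff.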
- Python 3.6+
- PyYAML
Install dependencies:
pip install pyyaml

python3 clang_doc_parser.py <input_path> <output_file>

Arguments:
- input_path: Path to a YAML file or directory containing YAML files
- output_file: Path to the output JSON file
- Process a single YAML file:
  python3 clang_doc_parser.py index.yaml output.json
- Process all YAML files in a directory:
  python3 clang_doc_parser.py ./docs output.json
- Filter by function name:
  python3 clang_doc_parser.py index.yaml output.json --name-filter "Command"
  This will only include functions with "Command" in their name.
- Filter by source file prefix:
  python3 clang_doc_parser.py index.yaml output.json --file-filter "commands"
  This will only include functions from files starting with "commands" (e.g., commands.cpp).
- Combine filters:
  python3 clang_doc_parser.py index.yaml output.json --name-filter "Add" --file-filter "commands"

- --name-filter / -n: Filter functions by name (case-insensitive substring match)
- --file-filter / -f: Filter functions by source filename prefix
- --help / -h: Show help message
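The command-line interface and filter semantics described above can be sketched with `argparse`. The function names `build_arg_parser` and `matches_filters` are hypothetical, and treating the file prefix match as case-sensitive is an assumption, since only the name filter is documented as case-insensitive.

```python
import argparse
import os
from typing import Optional


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments and options."""
    parser = argparse.ArgumentParser(
        description="Convert clang-doc YAML output to JSON.")
    parser.add_argument("input_path",
                        help="YAML file or directory containing YAML files")
    parser.add_argument("output_file", help="Path to the output JSON file")
    parser.add_argument("--name-filter", "-n", default=None,
                        help="Case-insensitive substring of the function name")
    parser.add_argument("--file-filter", "-f", default=None,
                        help="Prefix of the source filename")
    return parser


def matches_filters(func_name: str, source_file: str,
                    name_filter: Optional[str] = None,
                    file_filter: Optional[str] = None) -> bool:
    """Return True if a function passes every filter that was supplied."""
    if name_filter and name_filter.lower() not in func_name.lower():
        return False
    if file_filter:
        # Assumption: the prefix is compared against the bare filename,
        # case-sensitively, ignoring any leading directories.
        if not os.path.basename(source_file).startswith(file_filter):
            return False
    return True
```

When both filters are given, a function has to satisfy both, which matches the "Combine filters" usage.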
To integrate clang-doc into your CMake project:
find_program(CLANG_DOC clang-doc)
if(CLANG_DOC)
    add_custom_target(docs
        COMMAND ${CLANG_DOC}
            --executor=all-TUs
            -p ${CMAKE_CURRENT_BINARY_DIR}
            --output=${CMAKE_CURRENT_SOURCE_DIR}/docs
            --extra-arg=-Wno-error
            --format=yaml
            ${CMAKE_SOURCE_DIR}/src/*.cpp
        WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
        COMMENT "Generating documentation with clang-doc"
    )
endif()

Then run:
cmake --build . --target docs
python3 clang_doc_parser.py docs/index.yaml exported_functions.json

The script generates JSON with comprehensive type information including enum structures and function prototypes.
{
"name": "AddAdminCommand",
"group": "Commands",
"description": "Creates a console command as an administrative command.",
"funcName": "AddAdminCommand",
"paramTypes": [
{
"name": "name",
"type": "string",
"ref": false,
"description": "The name of the console command."
}
],
"retType": {
"type": "bool",
"description": "A boolean indicating whether the command was successfully added."
}
}

When a parameter uses an enum type, the full enum definition is included:
{
"name": "mode",
"type": "uint8",
"ref": false,
"description": "Whether the hook was in post mode (after processing) or pre mode (before processing).",
"enum": {
"name": "HookMode",
"description": "Enum representing the type of callback.",
"values": [
{
"name": "Pre",
"value": 0,
"description": "Callback will be executed before the original function"
},
{
"name": "Post",
"value": 1,
"description": "Callback will be executed after the original function"
}
]
}
}

When a parameter is a function pointer typedef, the complete function signature is included:
{
"name": "callback",
"type": "function",
"ref": false,
"description": "A callback function that is invoked when the command is executed.",
"prototype": {
"name": "CommandCallback",
"funcName": "CommandCallback",
"description": "Handles the execution of a command triggered by a caller.",
"paramTypes": [
{
"name": "param1",
"type": "int32",
"ref": false
},
{
"name": "param2",
"type": "int32",
"ref": false,
"enum": {
"name": "CommandCallingContext",
"description": "The command execution context.",
"values": [
{
"name": "Console",
"value": 0,
"description": "The command execute from the client's console."
}
]
}
}
],
"retType": {
"type": "int32",
"enum": {
"name": "ResultType",
"description": "Enum representing the possible results of an operation.",
"values": [...]
}
}
}
}

- name: Function name
- group: Derived from the source filename (e.g., "Commands" from commands.cpp)
- description: Brief description from Doxygen comments
- funcName: Function name (same as name)
- paramTypes: Array of parameter objects:
  - name: Parameter name
  - type: Mapped type (e.g., "string", "int32", "vec3[]", "function")
  - ref: Boolean indicating if it's a reference parameter
  - description: Parameter description from Doxygen comments (if available)
  - enum: (Optional) Full enum structure if parameter is an enum type
  - prototype: (Optional) Full function signature if parameter is a function pointer typedef
- retType: Return type object:
  - type: Mapped return type
  - description: Return description from Doxygen comments (if available)
  - enum: (Optional) Full enum structure if return type is an enum
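To make the `group` field concrete, here is a minimal sketch of deriving a group name from a source filename, consistent with the "Commands" from commands.cpp example. The helper name `derive_group` is hypothetical, and how the script treats multi-word or mixed-case filenames is not stated, so this version only upper-cases the first character.

```python
import os


def derive_group(source_file: str) -> str:
    """Derive a group label from a source path (hypothetical helper)."""
    # Drop directories and the extension: "src/commands.cpp" -> "commands".
    stem = os.path.splitext(os.path.basename(source_file))[0]
    # Upper-case only the first character; the rest of the stem is kept as-is.
    return stem[:1].upper() + stem[1:]
```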
The parser automatically detects enum types and includes their complete definition:
- Enum name and description
- All enum values with their numeric values
- Per-value descriptions (if documented)
- Automatically filters out sentinel values like "Count", "MAX", "INVALID"
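The enum handling listed above could look roughly like the following. This is a sketch under assumptions: the input is a simplified list of member dictionaries rather than clang-doc's actual YAML layout, and the function name `build_enum_sketch` is made up for the example. The sentinel set is the script's documented default.

```python
from typing import Any, Dict, List

# Default sentinel names filtered out of the generated enum structure.
SENTINEL_NAMES = {"Count", "MAX", "Max", "INVALID", "Invalid", "NUM", "Num"}


def build_enum_sketch(name: str, description: str,
                      members: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the JSON enum structure from simplified member records.

    Each member is assumed to be a dict with "name", "value" and an
    optional "description"; this shape is hypothetical.
    """
    values = []
    for member in members:
        if member["name"] in SENTINEL_NAMES:
            continue  # skip bookkeeping entries such as Count or MAX
        entry = {"name": member["name"], "value": int(member["value"])}
        if member.get("description"):
            # Per-value descriptions are included only when documented.
            entry["description"] = member["description"]
        values.append(entry)
    return {"name": name, "description": description, "values": values}
```

Matching sentinel names exactly, rather than by substring, avoids dropping legitimate values whose names merely contain a word like "Count".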
The parser can parse function pointer typedefs from their underlying signature:
- Extracts return type and parameter types
- Recursively processes parameter types (including nested enums)
- Generates parameter names (param1, param2, etc.)
- Includes full function description from typedef documentation
Note: Parameter names in function prototypes are auto-generated as param1, param2, etc. To include meaningful parameter names and descriptions, you'll need to add them manually or extend the parser to read from additional documentation sources.
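A rough sketch of the typedef parsing described above follows. It assumes the underlying signature is available as a string of the form `Ret (*)(A, B)`; the exact text clang-doc emits may differ, and the names `split_top_level` and `parse_function_pointer` are hypothetical. The types are returned unmapped, so type mapping and enum resolution would still need to be applied to each entry.

```python
import re
from typing import Any, Dict, List

# Matches "ReturnType (*)(param, param, ...)".
_FUNC_PTR = re.compile(
    r"^\s*(?P<ret>.+?)\s*\(\s*\*\s*\)\s*\((?P<params>.*)\)\s*$")


def split_top_level(params: str) -> List[str]:
    """Split on commas that are not nested inside <...> or (...)."""
    parts, depth, current = [], 0, []
    for ch in params:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_function_pointer(signature: str) -> Dict[str, Any]:
    """Extract the return type and auto-named parameters from a signature."""
    match = _FUNC_PTR.match(signature)
    if match is None:
        raise ValueError(f"Not a function pointer signature: {signature!r}")
    raw_params = split_top_level(match.group("params"))
    if raw_params == ["void"]:
        raw_params = []  # "(void)" means no parameters
    return {
        "retType": {"type": match.group("ret")},
        # Names are generated because the signature carries none.
        "paramTypes": [{"name": f"param{i}", "type": t}
                       for i, t in enumerate(raw_params, start=1)],
    }
```

Tracking bracket depth keeps template arguments and nested function types from being split at their inner commas.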
Enums can appear at multiple levels:
- Function parameters
- Function return types
- Function prototype parameters (within typedef)
- Function prototype return types (within typedef)
All enum references include the complete enum structure with values and descriptions.
The script maps C++ types to simplified types:
Pointers (except void*) are mapped to ptr64, and unknown types default to ?.
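The lookup-with-fallback behavior can be sketched as below. The table entries are illustrative assumptions (only plg::string → string is stated in this document), and `map_type_sketch` is a hypothetical stand-in, not the script's `map_type()`. The target of `void*` is not stated here, so the sketch simply excludes it from the pointer rule and lets it fall through to the default.

```python
import re

# Illustrative entries only; the script's real table is larger.
TYPE_MAP = {
    "void": "void",
    "bool": "bool",
    "int": "int32",
    "int32_t": "int32",
    "uint8_t": "uint8",
    "plg::string": "string",
}


def map_type_sketch(cpp_type: str) -> str:
    """Map a C++ type spelling to a simplified type name."""
    # Normalize: drop const and reference markers, collapse whitespace.
    cleaned = re.sub(r"\bconst\b", "", cpp_type).replace("&", "")
    cleaned = re.sub(r"\s+", " ", cleaned).strip().replace(" *", "*")
    if cleaned in TYPE_MAP:
        return TYPE_MAP[cleaned]
    if cleaned.endswith("*") and cleaned != "void*":
        return "ptr64"  # any pointer other than void*
    return "?"  # unknown types (and, in this sketch, void*)
```

Adding a custom mapping then amounts to adding one more entry to the dictionary.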
- More accurate parsing: Leverages Clang's AST instead of regex-based parsing
- Better enum handling: Automatically resolves enum base types and includes full enum definitions
- Function typedef support: Parses function pointer signatures from typedefs
- Structured documentation: Extracts Doxygen comments in a more reliable way
- Type safety: Uses Clang's type system for accurate type information
- Better scalability: Can handle complex C++ constructs that regex parsing struggled with
- Nested type support: Handles enums within function prototypes and other complex scenarios
- Ensure your YAML file contains a ChildFunctions section
- Check that your filters aren't too restrictive
- Verify the YAML file is properly formatted
- The type is not in the type mapping dictionary
- You can add custom type mappings to the map_type() function
- Ensure your source files have proper Doxygen comments with @brief, @param, and @return tags
- Verify clang-doc is extracting the comments (check the YAML file)
- Verify the enum is defined in the YAML file's ChildEnums section
- Ensure the enum has proper Doxygen documentation
- Check that enum values are documented
- This is expected behavior: clang-doc doesn't preserve parameter names in typedef signatures
- Parameter names are auto-generated as param1, param2, etc.
- To add meaningful names, you can either:
  - Manually edit the JSON output
  - Extend the parser to read parameter metadata from additional sources
  - Use wrapper functions with documented parameters instead of raw typedefs
By default, the parser filters out common sentinel enum values like "Count", "MAX", "INVALID", etc. You can customize this behavior by modifying the sentinel_names set in the build_enum_structure() function:

sentinel_names = {'Count', 'MAX', 'Max', 'INVALID', 'Invalid', 'NUM', 'Num'}

This script is part of the Plugify project.
