doc: generate man pages from markdown by petrasovaa · Pull Request #7540 · OSGeo/grass · GitHub
doc: generate man pages from markdown #7540

Open
petrasovaa wants to merge 3 commits into OSGeo:main from petrasovaa:md2man

@petrasovaa

Contributor

This PR adds a Markdown-to-groff converter for man pages. It reuses the groff Formatter and adjusts the makefiles to build the man pages from Markdown.

Context: We still have HTML documentation files, which we ultimately need to remove. They are still used to generate the man pages.

AI disclosure: This has been generated by Fable. I tried to write such a converter myself a while ago, but this one works much better. I tested it myself by inspecting the results for several pages. Fable itself verified that all the syntax is covered, including by testing the markdown in addons.

More details on the architecture of the md→man conversion (AI summary):

Details: The pipeline reuses the existing groff backend and replaces only the front end. There are three stages:

Stage 1 — parse Markdown to a tag tree (gmd.py). gmd.parse(text) returns (metadata, nodes). It strips YAML front matter and HTML comments, then splits the body into block-level nodes. Crucially, the output is a tree of (tag, attrs, body) tuples using HTML tag names (h2, p, dl/dt/dd, ul/li, table/tr/td, pre, b, i, code, …) — deliberately the same shape that ghtml.HTMLParser produces. This is the design pivot: by emitting HTML-flavored nodes, the markdown converter can feed the unchanged groff formatter that g.html2man already uses. Block parsing handles the constructs that actually occur in the corpus (headings, fenced code, mkdocs === "tabs", admonitions, the &nbsp;-indented parameter lists from --md-description, lists, tables, blockquotes), and raw HTML fragments are delegated to ghtml.HTMLParser and spliced in. Inline parsing (parse_inline) handles emphasis, code spans, links/images (text kept, target dropped, as man pages can't link), and protects backslash escapes via private-use placeholders so an escaped * can't pair with real emphasis.
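To make the node shape and the escape handling in Stage 1 concrete, here is a minimal sketch. It is not the code from gmd.py: the names protect_escapes, restore_escapes, and parse_inline_sketch, the regular expressions, and the choice of U+E000 as the placeholder base are all assumptions for illustration. Only the (tag, attrs, body) tuple shape and the private-use placeholder idea come from the description above.

```python
import re

# Assumption: placeholders start at U+E000 (Unicode private-use area) and the
# input text does not already contain characters from that range.
_PUA_BASE = 0xE000
_ESCAPE_RE = re.compile(r"\\([\\`*_{}\[\]()#+\-.!|])")
_TOKEN_RE = re.compile(
    r"`(?P<code>[^`]+)`"                    # code span
    r"|\*\*(?P<b>.+?)\*\*"                  # strong emphasis
    r"|\*(?P<i>.+?)\*"                      # emphasis
    r"|!?\[(?P<link>[^\]]*)\]\([^)]*\)"     # link or image: keep text only
)


def protect_escapes(text):
    """Swap backslash-escaped characters for private-use placeholders."""
    saved = []

    def stash(match):
        saved.append(match.group(1))
        return chr(_PUA_BASE + len(saved) - 1)

    return _ESCAPE_RE.sub(stash, text), saved


def restore_escapes(text, saved):
    """Put the literal characters back after emphasis has been matched."""
    for index, char in enumerate(saved):
        text = text.replace(chr(_PUA_BASE + index), char)
    return text


def parse_inline_sketch(text):
    """Return a list of strings and (tag, attrs, body) tuples."""
    protected, saved = protect_escapes(text)
    nodes = []
    pos = 0
    for match in _TOKEN_RE.finditer(protected):
        if match.start() > pos:
            nodes.append(restore_escapes(protected[pos:match.start()], saved))
        kind = match.lastgroup
        body = restore_escapes(match.group(kind), saved)
        if kind == "link":
            if body:
                nodes.append(body)  # target dropped, text kept
        else:
            nodes.append((kind, {}, [body]))
        pos = match.end()
    if pos < len(protected):
        nodes.append(restore_escapes(protected[pos:], saved))
    return nodes
```

Because the escaped asterisks are replaced before emphasis is matched, the two literal asterisks in `a \*literal\* and *real*` stay plain text and only `real` becomes an i node.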

Stage 2 — shape the document for a man page (g.md2man.py). build_document() takes the parsed tree and applies man-page-specific policy: it pulls name/description/keywords from front matter into .SH NAME / .SH KEYWORDS, turns the "Command line" mkdocs tab into the SYNOPSIS (dropping the Python API tabs, per an earlier decision, via transform_tabs/expand_tabs), drops the lead paragraphs that merely repeat the description, and adds the conventional bare-name and --help synopsis lines. Pages without front matter (index/topic pages, and html-only addon docs) keep their heading as a section instead.
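Two of the Stage 2 policies can be sketched as follows. This is only an illustration under stated assumptions, not the body of build_document(): the helper names (node_text, build_header_sketch, drop_repeated_lead), the "name - description" layout of the NAME paragraph, and the handling of keywords as either a list or a comma-separated string are all hypothetical.

```python
def node_text(node):
    """Flatten a string or a (tag, attrs, body) node to plain text."""
    if isinstance(node, str):
        return node
    _tag, _attrs, body = node
    return "".join(node_text(child) for child in body)


def build_header_sketch(metadata):
    """Turn front matter into NAME and KEYWORDS sections (hypothetical layout)."""
    nodes = []
    name = metadata.get("name")
    description = metadata.get("description", "")
    if name:
        body = [("b", {}, [name])]
        if description:
            body.append(f" - {description}")
        nodes.append(("h2", {}, ["NAME"]))
        nodes.append(("p", {}, body))
    keywords = metadata.get("keywords") or []
    if isinstance(keywords, str):
        keywords = [word.strip() for word in keywords.split(",") if word.strip()]
    if keywords:
        nodes.append(("h2", {}, ["KEYWORDS"]))
        nodes.append(("p", {}, [", ".join(keywords)]))
    return nodes


def drop_repeated_lead(nodes, description):
    """Skip leading paragraphs whose text merely repeats the description."""
    remaining = list(nodes)
    target = description.strip()
    while (
        target
        and remaining
        and isinstance(remaining[0], tuple)
        and remaining[0][0] == "p"
        and node_text(remaining[0]).strip() == target
    ):
        remaining.pop(0)
    return remaining
```

Since the output uses the same tuple shape as the parsed tree, the header nodes can simply be placed in front of the remaining body nodes before formatting.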

Stage 3 — format to groff (ggroff.Formatter, borrowed from g.html2man). The shaped tree is walked by the existing formatter, which emits the groff macros (.SH, .SS, .IP, .TS/.TE, .nf, font escapes). g.md2man.py then prepends the .TH title line and applies the same whitespace cleanup as g.html2man.py.
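For readers unfamiliar with the groff side, the following sketch shows the general idea of walking a (tag, attrs, body) tree and emitting man macros. It is a deliberately tiny stand-in, not ggroff.Formatter: the function names are hypothetical, only a handful of tags are handled, and the .TH line carries just the page name and section, which may differ from what g.md2man.py actually writes.

```python
def groff_escape(text):
    """Escape backslashes; groff prints \\e as a literal backslash."""
    return text.replace("\\", "\\e")


def guard_line(line):
    """Stop a leading '.' or "'" from being read as a groff request."""
    if line.startswith((".", "'")):
        return "\\&" + line
    return line


def render_inline(nodes):
    """Render strings and inline nodes using font escapes."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(groff_escape(node))
            continue
        tag, _attrs, body = node
        inner = render_inline(body)
        if tag in ("b", "strong"):
            out.append("\\fB" + inner + "\\fR")
        elif tag in ("i", "em"):
            out.append("\\fI" + inner + "\\fR")
        else:
            out.append(inner)
    return "".join(out)


def render_man_sketch(name, nodes, section="1"):
    """Emit a .TH line followed by one macro group per block node."""
    lines = [f".TH {name} {section}"]
    for tag, _attrs, body in nodes:
        text = render_inline(body)
        if tag == "h2":
            lines.append(".SH " + text)
        elif tag == "h3":
            lines.append(".SS " + text)
        elif tag == "pre":
            lines.append(".nf")  # no-fill mode keeps line breaks
            lines.extend(guard_line(line) for line in text.split("\n"))
            lines.append(".fi")
        else:
            lines.append(".PP")
            lines.extend(guard_line(line) for line in text.split("\n"))
    return "\n".join(lines) + "\n"
```

The sketch skips lists, tables (.TS/.TE), hyphen handling, and the whitespace cleanup mentioned above; those are left to the real formatter.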

Build wiring. include/Make/Html.make builds each man page from $(MDDIR)/source/%.md (the already-generated markdown) via $(MD2MAN); man/Makefile derives the page list from the markdown wildcard and builds the md indices before the man pages. So the data flow is: tool --md-description → mkmarkdown.py (assembles the generated .md) → g.md2man.py → gmd.py → ggroff.py → .1. The net result is that man generation no longer depends on the HTML build at all, while reusing its proven groff emitter.
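The "page list from the markdown wildcard" step amounts to mapping every generated .md file to one man page target. The sketch below expresses that mapping in Python rather than make syntax, purely for illustration: man_targets is a hypothetical helper, and the directory arguments stand in for $(MDDIR)/source and the man output directory, whose real paths are defined by the makefiles.

```python
from pathlib import Path


def man_targets(md_source_dir, man_dir, section="1"):
    """Pair each Markdown source with the man page that would be built from it."""
    pairs = []
    for source in sorted(Path(md_source_dir).glob("*.md")):
        # Path.stem drops only the final suffix, so "r.info.md" -> "r.info".
        target = Path(man_dir) / f"{source.stem}.{section}"
        pairs.append((source, target))
    return pairs
```

Each pair corresponds to one application of the pattern rule: the source is the already-generated markdown and the target is the .1 file written by the converter.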

For context, next steps after this PR (AI summary):

  1. First add markdown generation to CMake. cmake/modules/generate_docs.cmake needs to run each tool's --md-description and mkmarkdown.py (the way Autotools' Html.make does), producing the $(MDDIR)/source/*.md files. This is the prerequisite — g.md2man has nothing to consume until these exist.

  2. Then switch the man rule in generate_docs.cmake from mkhtml.py+g.html2man.py to mkmarkdown.py+g.md2man.py, mirroring the Autotools change. Also wire g.md2man/gmd.py into the CMake install (the equivalent of the utils/Makefile SUBDIRS entry).

  3. After that: remove the HTML build (delete mkhtml.py, the html Make/CMake rules, the committed .html files, and the html index builders). Two cleanups fold in here:

    • Move ggroff.py (and ghtml.py, still used for raw-HTML fragments) into utils/g.md2man/. Once they're no longer shared with g.html2man, the ImportError fallback in g.md2man.py goes away and becomes a plain import ggroff.
    • g.extension needs real work in this PR. Three things break when HTML disappears:
      • update_manual_page() rewrites links inside installed html pages — it has no markdown/man equivalent.
      • The addons prefix only installs docs/html; the generated markdown never gets installed, so there's no browsable doc format for addons once html is gone.
      • remove_extension_std() cleans up html/man/rest but not md.
      • Also create_md_if_missing() (html→md fallback for html-only third-party addons) should be revisited — its assumptions change once html isn't the source of truth.
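The ImportError fallback mentioned in the first cleanup above is a common pattern for modules shared between two script directories. The actual code in g.md2man.py is not shown here, so the following is only a generic sketch of that pattern: import_with_fallback and its fallback_dir argument are hypothetical, and the real script may locate ggroff differently.

```python
import importlib
import sys


def import_with_fallback(module_name, fallback_dir):
    """Import a module, retrying with an extra directory on sys.path."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Assumption: the shared module lives in a sibling tool's directory.
        sys.path.insert(0, str(fallback_dir))
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

Once the shared modules live next to the script, a call like this reduces to a plain import statement, which is the simplification the cleanup describes.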

@github-actions github-actions bot added the Python, module, docs, markdown, general, and tests labels on Jun 13, 2026
Comment thread utils/g.md2man/gmd.py Fixed
Comment thread utils/g.md2man/tests/g_md2man_test.py Fixed

Member

Since this file is a script and even has a shebang, does it have the executable bit set? I can't tell from the GitHub interface.
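One way to answer this locally, as a small sketch: check the shebang and the permission bits of the checked-out file. The helper name script_status is hypothetical, and the check reflects the working tree on a POSIX system; the mode recorded in the repository can be inspected separately with `git ls-files -s`, which shows 100755 for executable files.

```python
import os
import stat


def script_status(path):
    """Return (has_shebang, is_executable) for a file on a POSIX system."""
    with open(path, "rb") as handle:
        has_shebang = handle.read(2) == b"#!"
    mode = os.stat(path).st_mode
    any_exec_bit = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return has_shebang, bool(mode & any_exec_bit)
```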

@wenzeslaus

Member


Projects

Status: In Progress


4 participants