While many extensions to Python-Markdown add new syntax, occasionally, you want to simply alter the way Markdown renders the existing syntax. For example, you may want to display some images inline, but require externally hosted images to simply be links which point to the image.
Suppose the following Markdown was provided:

    ![a local image](/path/to/image.jpg)

    ![a remote image](http://example.com/image.jpg)

We would like Python-Markdown to return the following HTML:
    <p><img alt="a local image" src="/path/to/image.jpg" /></p>
    <p><a href="http://example.com/image.jpg">a remote image</a></p>
Note: This tutorial is very generic and assumes a basic Python 3 development environment. A basic understanding of Python development is expected.
Analysis
Let's consider the options available to us:
1. Override the image-related inline patterns.
   While this would work, we don't need to alter the existing patterns. The parser is recognizing the syntax just fine. All we need to do is alter the HTML output.
   We also want to support both inline image links and reference style image links, which would require redefining both inline patterns, doubling the work.
2. Leave the existing patterns alone and use a Treeprocessor to alter the HTML.
   This does not alter the tokenization of the Markdown syntax in any way. We can be sure that anything which represents an image will be included, even any new image syntax added by other third-party extensions.
Given the above, let's use option two.
The Solution
To begin, let's create a new Treeprocessor:
    from markdown.treeprocessors import Treeprocessor

    class InlineImageProcessor(Treeprocessor):
        def run(self, root):
            # Modify the HTML here
            pass
The run method of a Treeprocessor receives a root argument which contains the root Element of an ElementTree document. We need to iterate over all of the img elements within that element and alter those which contain external URLs. Therefore, add the following code to the run method:
            # Iterate over img elements only
            for element in root.iter('img'):
                # Copy the element's attributes for later use. dict() makes a
                # real copy, so element.clear() below cannot empty it.
                attrib = dict(element.attrib)
                # Check for links to external images
                if attrib['src'].startswith('http'):
                    # Save the tail
                    tail = element.tail
                    # Reset the element
                    element.clear()
                    # Change the element to a link
                    element.tag = 'a'
                    # Copy src to href
                    element.set('href', attrib.pop('src'))
                    # Copy alt to label
                    element.text = attrib.pop('alt')
                    # Reassign tail
                    element.tail = tail
                    # Copy all remaining attributes to element
                    for k, v in attrib.items():
                        element.set(k, v)
A few things to note about the above code:
We make a copy of the element's attributes so that we don't lose them when we later reset the element with element.clear(). The same applies for the tail. As img elements don't have text, we don't need to worry about that.
We explicitly set the href attribute and the element.text, as those are assigned to different attribute names on a elements than on img elements. When doing so, we pop the src and alt attributes from attrib so that they are no longer present when we copy all remaining attributes in the last step.
We don't need to make changes to img elements which point to internal images, so there is no need to reference them in the code (they simply get skipped).
The test for external links (startswith('http')) could be improved and is left as an exercise for the reader.
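The notes above can be checked without Markdown at all. The following is a minimal sketch that applies the same steps to a hand-built tree using only the standard library. The helper name image_to_link and the sample HTML are assumptions made for illustration; they are not part of Python-Markdown.

```python
import xml.etree.ElementTree as etree


def image_to_link(element):
    """Turn an <img> element into an <a> element in place (hypothetical helper)."""
    # Take a real copy of the attributes before resetting the element.
    attrib = dict(element.attrib)
    # Save the tail, which clear() would otherwise discard.
    tail = element.tail
    element.clear()
    element.tag = 'a'
    # src becomes href and alt becomes the link label.
    element.set('href', attrib.pop('src'))
    element.text = attrib.pop('alt', '')
    element.tail = tail
    # Carry over whatever is left, such as title.
    for key, value in attrib.items():
        element.set(key, value)


# Hand-built stand-in for the tree that run() would receive.
root = etree.fromstring(
    '<div><p><img alt="a remote image" src="http://example.com/image.jpg" '
    'title="A title." /> trailing text</p></div>'
)

for element in root.iter('img'):
    if element.get('src', '').startswith('http'):
        image_to_link(element)

result = etree.tostring(root, encoding='unicode')
print(result)
```

After the loop, the tree no longer contains an img element. The new a element carries the href and title attributes, uses the old alt text as its label, and is still followed by the text that trailed the image.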
Now we need to inform Markdown of our new Treeprocessor with an Extension subclass:
    from markdown.extensions import Extension

    class ImageExtension(Extension):
        def extendMarkdown(self, md):
            # Register the new treeprocessor
            md.treeprocessors.register(InlineImageProcessor(md), 'inlineimageprocessor', 15)
We register the Treeprocessor with a priority of 15, which ensures that it runs after all inline processing is done.
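To try it out, save the code above in a file named ImageExtension.py and create a Test.py file with the following:

    import markdown
    from ImageExtension import ImageExtension

    input = """
    ![a local image](/path/to/image.jpg "A title.")

    ![a remote image](http://example.com/image.jpg "A title.")
    """

    html = markdown.markdown(input, extensions=[ImageExtension()])
    print(html)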
And running python Test.py correctly returns the following output:
    <p><img alt="a local image" src="/path/to/image.jpg" title="A title." /></p>
    <p><a href="http://example.com/image.jpg" title="A title.">a remote image</a></p>
Success! Note that we included a title for each image, which was also properly retained.
Adding Configuration Settings
Suppose we want to allow the user to provide a list of known image hosts. Any img tags which point at images on those hosts may be inlined, but any other images should be external links. Of course, we want to keep the existing behavior for internal (relative) links.
    class ImageExtension(Extension):
        def __init__(self, **kwargs):
            # Define a config with defaults
            self.config = {'hosts': [[], 'List of approved hosts']}
            super().__init__(**kwargs)
We defined a hosts configuration setting which defaults to an empty list. Now, we need to pass that option on to our treeprocessor in the extendMarkdown method:
        def extendMarkdown(self, md):
            # Pass hosts to the treeprocessor
            md.treeprocessors.register(
                InlineImageProcessor(md, hosts=self.getConfig('hosts')),
                'inlineimageprocessor',
                15
            )
Next, we need to modify our treeprocessor to accept the new setting:
    class InlineImageProcessor(Treeprocessor):
        def __init__(self, md, hosts):
            self.md = md
            # Assign the setting to the hosts attribute of the class instance
            self.hosts = hosts
Then, we can add a method which uses the setting to test a URL:
    from urllib.parse import urlparse

    class InlineImageProcessor(Treeprocessor):
        ...
        def is_unknown_host(self, url):
            url = urlparse(url)
            # Return False if the network location is empty or a known host
            return bool(url.netloc) and url.netloc not in self.hosts
Finally, we can make use of the test method by replacing the if attrib['src'].startswith('http'): line of the run method with if self.is_unknown_host(attrib['src']):.
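To see how the host test treats different kinds of URLs, here is a minimal standalone sketch of the same check written as a plain function. The hosts list and the sample URLs are assumptions chosen for illustration.

```python
from urllib.parse import urlparse


def is_unknown_host(url, hosts):
    """Return True only for URLs that name a host outside the approved list."""
    netloc = urlparse(url).netloc
    return bool(netloc) and netloc not in hosts


# Hypothetical list of approved hosts, as a user might configure it.
hosts = ['example.com']

checks = {
    url: is_unknown_host(url, hosts)
    for url in (
        '/path/to/image.jpg',              # relative: empty netloc, stays inline
        'http://example.com/image.jpg',    # approved host, stays inline
        'http://other.example.org/a.jpg',  # unknown host, becomes a link
        '//other.example.org/a.jpg',       # protocol-relative, also caught
    )
}

for url, unknown in checks.items():
    print(f'{url!r}: {"link" if unknown else "inline image"}')
```

Unlike the startswith('http') test, this check also catches protocol-relative URLs. Note that netloc includes any port number, so a URL such as http://example.com:8080/image.jpg would not match a hosts entry of example.com. Comparing against urlparse(url).hostname instead is one way to ignore the port.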