Generate a TOC (navigation menu) for a Markdown file when converting it to HTML with Python

Some people write their articles in Markdown and use Python-Markdown to convert them to HTML before publishing. The question is how to generate a TOC (table of contents) for the article automatically.

Insert a TOC into the Markdown

Make sure that you have already installed Python-Markdown:

pip install markdown

In the following code, we insert the special marker [TOC] at the beginning of the Markdown content:

import markdown

# Put "[TOC]" wherever the TOC should appear in the Markdown content
content = '''
[TOC]

# A
# B
## B1
## B2
# C
'''

# extensions=['toc'] tells Python-Markdown to replace the "[TOC]" marker with the generated TOC
html = markdown.markdown(content, output_format='html5', extensions=['toc'])
print(html)

The output HTML (indented here for readability) is:

<div class="toc">
    <ul>
        <li><a href="#a">A</a></li>
        <li><a href="#b">B</a>
            <ul>
                <li><a href="#b1">B1</a></li>
                <li><a href="#b2">B2</a></li>
            </ul>
        </li>
        <li><a href="#c">C</a></li>
    </ul>
</div>
<h1 id="a">A</h1>
<h1 id="b">B</h1>
<h2 id="b1">B1</h2>
<h2 id="b2">B2</h2>
<h1 id="c">C</h1>

In the HTML above, a div with the class name "toc" is generated. It uses nested <ul> lists to form a navigation menu, so subheadings become submenus. Each menu entry links to its heading through the heading's id attribute.
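The ids in the HTML above (a, b, b1, ...) look like lowercased versions of the heading text. The sketch below shows roughly how such an id could be derived from a heading. The helper name simple_slug is hypothetical and this is only an approximation, not Python-Markdown's exact code; among other things, it does not handle duplicate headings.

```python
import re
import unicodedata


def simple_slug(title, separator="-"):
    """Hypothetical helper: approximate a heading id from its text."""
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove punctuation, trim surrounding whitespace, and lowercase.
    cleaned = re.sub(r"[^\w\s-]", "", ascii_text).strip().lower()
    # Collapse runs of whitespace or hyphens into a single separator.
    return re.sub(r"[\s-]+", separator, cleaned)


print(simple_slug("B1"))                # b1
print(simple_slug("Getting Started!"))  # getting-started
```

Knowing the rough shape of the ids helps when you want to link to a heading by hand, but for anything important it is safer to read the id from the generated HTML.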

Separate the TOC from the other content

Sometimes you may want the TOC alone. Beautiful Soup (the bs4 package) can separate the TOC from the rest of the content.

Make sure that you have already installed beautifulsoup4:

pip install beautifulsoup4

Here's the code:

import markdown
from bs4 import BeautifulSoup

content = '''
[TOC]

# A
# B
## B1
## B2
# C
'''

html = markdown.markdown(content, output_format='html5', extensions=['toc'])
soup = BeautifulSoup(html, 'html.parser')  # use BeautifulSoup to parse html
toc = soup.select_one('div.toc')  # find TOC element
print(toc)  # print TOC alone
toc.extract()  # remove TOC from the html
print(soup)  # print content without TOC

The code above prints twice: first the standalone TOC, then the remaining content with the TOC removed.
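As an alternative that avoids parsing the HTML again, the toc extension also stores the rendered TOC on the Markdown instance as the toc attribute. The sketch below assumes you use the markdown.Markdown class instead of the markdown.markdown() shortcut, and it leaves the [TOC] marker out of the content so the body contains no TOC.

```python
import markdown

# No "[TOC]" marker here, so the converted body will not contain the TOC.
content = '''
# A
# B
## B1
## B2
# C
'''

# Use the Markdown class so the instance (and its toc attribute) stays available.
md = markdown.Markdown(output_format='html5', extensions=['toc'])
body = md.convert(content)  # HTML of the content, without a TOC
toc = md.toc                # HTML of the TOC, set by the toc extension

print(toc)   # print TOC alone
print(body)  # print content without TOC
```

This keeps beautifulsoup4 out of the picture when the TOC and the body are rendered in different places of a page template. The bs4 approach is still handy when you only have the final HTML string.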

Posted on 2022-05-23