Some people write their articles in Markdown and use Python-Markdown
to convert the Markdown files to HTML before publishing. The question is: how do you automatically generate a table of contents (TOC) for such an article?
Make sure that you have already installed Python-Markdown:
pip install markdown
In the following code, we insert a special marker, [TOC], at the beginning of the Markdown content:
import markdown
# put "[TOC]" wherever the TOC should appear in the markdown content
content = '''
[TOC]
# A
# B
## B1
## B2
# C
'''
# extensions=['toc'] tells Python-Markdown to handle the "[TOC]" marker
html = markdown.markdown(content, output_format='html5', extensions=['toc'])
print(html)
The output html is:
<div class="toc">
<ul>
<li><a href="#a">A</a></li>
<li><a href="#b">B</a>
<ul>
<li><a href="#b1">B1</a></li>
<li><a href="#b2">B2</a></li>
</ul>
</li>
<li><a href="#c">C</a></li>
</ul>
</div>
<h1 id="a">A</h1>
<h1 id="b">B</h1>
<h2 id="b1">B1</h2>
<h2 id="b2">B2</h2>
<h1 id="c">C</h1>
In the HTML above, a div with the class name "toc" is generated in place of the [TOC] marker. It uses nested <ul> lists to form a navigation menu with submenus, and each entry links to the id of the matching heading.
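To make the link between a heading and its anchor more concrete, here is a rough sketch of how heading text could be turned into an id such as "a" or "b1". The function name simple_slug is hypothetical and the rules are a simplified approximation, not the library's exact implementation. In particular, it does not make repeated ids unique.

```python
import re
import unicodedata


def simple_slug(heading_text: str) -> str:
    """Approximate a heading-to-id conversion (illustrative only)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize('NFKD', heading_text)
        .encode('ascii', 'ignore')
        .decode('ascii')
    )
    # Remove punctuation, trim surrounding whitespace, and lowercase.
    cleaned = re.sub(r'[^\w\s-]', '', ascii_text).strip().lower()
    # Collapse runs of whitespace or hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', cleaned)


# "Getting Started" is a made-up heading to show the multi-word case.
for heading in ['A', 'B1', 'Getting Started']:
    print(heading, '->', '#' + simple_slug(heading))
```

With the headings from the example, this yields the same anchors that appear in the href attributes of the TOC links.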
Sometimes you may want the TOC on its own. We can use bs4 (Beautiful Soup) to separate the TOC from the rest of the content.
Make sure that you have already installed beautifulsoup4:
pip install beautifulsoup4
Here's the code:
import markdown
from bs4 import BeautifulSoup
content = '''
[TOC]
# A
# B
## B1
## B2
# C
'''
html = markdown.markdown(content, output_format='html5', extensions=['toc'])
soup = BeautifulSoup(html, 'html.parser') # use BeautifulSoup to parse html
toc = soup.select_one('div.toc') # find TOC element
print(toc) # print TOC alone
toc.extract() # remove TOC from the html
print(soup) # print content without TOC
The code above prints twice: first the standalone TOC, then the remaining content with the TOC removed.
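If the only goal is to get the TOC and the body separately, another option that may be simpler is to skip the [TOC] marker and read the toc attribute that the toc extension sets on a markdown.Markdown instance after conversion. The sketch below assumes a reasonably recent Python-Markdown release and is not the approach used above. The variable names are illustrative.

```python
import markdown

# No "[TOC]" marker here, so the converted body contains only the headings.
content = '''
# A
# B
## B1
## B2
# C
'''

# Use the class-based API so the extension can store the TOC on the instance.
md = markdown.Markdown(extensions=['toc'])
body = md.convert(content)  # HTML for the content itself
toc = md.toc                # HTML for the TOC, set by the 'toc' extension

print(toc)   # print TOC alone
print(body)  # print content without TOC
```

This avoids the extra dependency on beautifulsoup4, while the bs4 approach remains useful when you only have the final HTML and not the Markdown source.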