EDDYMENS

Matching Markdown And HTML Headings Using Regex | PHP

Headings can be written in three different ways in a Markdown document. Two of them, hash-prefixed headings and HTML heading tags, support up to six levels; the underlined (setext) syntax supports only two. The PHP script below matches all three types of heading.

Note that the script does not enforce the six-level limit. Any line that follows the hash heading syntax is matched, so an invalid heading such as ################# Text will be matched too.
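If you do want to enforce the six-level limit, the hash alternative can be bounded. The snippet below is a minimal sketch and not part of the original script: the pattern name `$atxLevel1to6` and the sample text are illustrative assumptions. It anchors `#{1,6}` to the start of the line and requires whitespace right after the hashes, so a run of seven or more `#` characters is rejected.

```php
<?php

// Hypothetical variant (not in the original script): accept only hash
// headings of level 1 to 6. '^' with the m flag forces the hashes to start
// the line, and since '[ \t]+' must follow them, 7+ hashes cannot match.
$atxLevel1to6 = '/^(?<hashes>#{1,6})[ \t]+(?<text>.*)$/m';

$sample = "################# Text\n### Valid heading\n####### Seven hashes";

preg_match_all($atxLevel1to6, $sample, $found, PREG_SET_ORDER);

foreach ($found as $match) {
    // Heading level is the number of hashes
    echo strlen($match['hashes']), ': ', $match['text'], PHP_EOL;
}
// prints "3: Valid heading"
```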

Also note that because Markdown allows raw HTML to be embedded in a document, HTML heading tags are matched as well.
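To see how the HTML alternative behaves on its own, the sketch below runs only that part of the script's pattern against a hypothetical HTML fragment. The lazy `.*?` before `>` skips attributes such as `class` or `id`, and the lazy `.*?` for the text stops at the first closing heading tag, so two headings on the same line come back as separate matches.

```php
<?php

// The HTML alternative from the script's pattern, used in isolation.
// '.*?>' skips any attributes; the lazy '(?<text>.*?)' stops at the
// first closing </h1>..</h6>, so each heading is matched separately.
$htmlHeading = '/<(?<tag>h[1-6]).*?>(?<text>.*?)<\/h[1-6]>/';

$html = '<h2 class="title">Install</h2><p>Body</p><h3 id="next">Next steps</h3>';

preg_match_all($htmlHeading, $html, $found, PREG_SET_ORDER);

foreach ($found as $match) {
    echo $match['tag'], ': ', $match['text'], PHP_EOL;
}
// prints "h2: Install" then "h3: Next steps"
```

Because the pattern has no `s` modifier, `.` does not match a newline, so a heading whose text is split across several lines is not matched.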

The different types of headings the script matches are:

  • Markdown hash-based (ATX) headings: #, ##, ###, ####, etc.
  • Alternative (setext) Markdown headings: a line of text underlined with ==== (level 1) or ---- (level 2).
  • HTML heading tags: <h1>, <h2>, etc.
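The three syntaxes encode the heading level differently. As a hypothetical add-on (the `headingLevel()` function is not part of the original script), the level can be derived from a matched tag string such as `#`, `====` or `h2`: count the hashes, map an `=` underline to 1 and a `-` underline to 2, or read the digit from the HTML tag name. Like the script, it does not cap hash headings at six.

```php
<?php

// Hypothetical helper (not part of the original script): turn a matched
// heading tag ('#', '===', '---', 'h3', ...) into a numeric heading level.
function headingLevel(string $tag): int
{
    if ($tag === '') {
        return 0;                        // nothing matched
    }
    if ($tag[0] === '#') {
        return strlen($tag);             // hash heading: one level per '#'
    }
    if ($tag[0] === '=') {
        return 1;                        // setext: '=' underline is level 1
    }
    if ($tag[0] === '-') {
        return 2;                        // setext: '-' underline is level 2
    }
    if (preg_match('/^h([1-6])$/i', $tag, $m)) {
        return (int) $m[1];              // HTML: digit from <h1> .. <h6>
    }
    return 0;                            // unrecognised tag
}

foreach (['#', '####', '=====', '---', 'h1', 'h6'] as $tag) {
    echo $tag, ' => ', headingLevel($tag), PHP_EOL;
}
```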

Script


```php
<?php

// The pattern has three alternatives, tried in order at each position:
//   1. Hash (ATX) headings: one or more '#' at the start of a line, then the text.
//   2. HTML heading tags: <h1>...</h1> to <h6>...</h6>, optionally with attributes;
//      the closing tag must match the opening one.
//   3. Alternative (setext) headings: a line of text underlined by '=' or '-'.
$re = '/^(?<headerTag>#+)[ \t]+(?<headingText>[^\n]*)'
    . '|<(?<HTMLHeaderTag>h[1-6])(?:\s[^>]*)?>(?<HTMLHeadingText>.*?)<\/\k<HTMLHeaderTag>>'
    . '|^(?<altMDHeadingText>.+)\n(?<altMDHeadingTag>=+|-+)[ \t]*$/m';

$str = '# Overview
AWS S3 is a cloud storage service that caters to the storage needs of modern software applications. S3 buckets can be used to host static sites.

## Getting started
Once you have your AWS account all set up you can log in and then use the search bar up top to search for the S3 service.

### Third-level header
 Third-level header content goes here.

#### Fourth-level header 
 Fourth-level content goes here.

Alternative Heading Level 1
===========================
Alternative heading 1 text.

Alternative Heading Level 2
---------------------------
Alternative heading 2 text.

<h1>   HTML Header 1  </h1>
Level 1 heading

<h2> HTML Header 2  </h2>
Level 2 heading.

<h6> HTML Header 6  </h6>
Level 6 heading.';

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Only one alternative takes part in each match, so return the first
// non-empty candidate with surrounding whitespace removed.
function getData(string $set1, string $set2, string $set3): string {
    foreach ([$set1, $set2, $set3] as $candidate) {
        if (strlen($candidate)) {
            return trim($candidate);
        }
    }
    return '';
}

$headings = [];
foreach ($matches as $eachMatch) {
    // Groups from alternatives that did not match are '' or missing, hence ?? ''
    $headings[] = [
        'headingTag'  => getData($eachMatch['headerTag'] ?? '', $eachMatch['HTMLHeaderTag'] ?? '', $eachMatch['altMDHeadingTag'] ?? ''),
        'headingText' => getData($eachMatch['headingText'] ?? '', $eachMatch['HTMLHeadingText'] ?? '', $eachMatch['altMDHeadingText'] ?? ''),
    ];
}

// Print the extracted headings as pretty-printed JSON
echo json_encode($headings, JSON_PRETTY_PRINT);
```

Output

```json
[
    {
        "headingTag": "#",
        "headingText": "Overview"
    },
    {
        "headingTag": "##",
        "headingText": "Getting started"
    },
    {
        "headingTag": "###",
        "headingText": "Third-level header"
    },
    {
        "headingTag": "####",
        "headingText": "Fourth-level header"
    },
    {
        "headingTag": "===========================",
        "headingText": "Alternative Heading Level 1"
    },
    {
        "headingTag": "---------------------------",
        "headingText": "Alternative Heading Level 2"
    },
    {
        "headingTag": "h1",
        "headingText": "HTML Header 1"
    },
    {
        "headingTag": "h2",
        "headingText": "HTML Header 2"
    },
    {
        "headingTag": "h6",
        "headingText": "HTML Header 6"
    }
]
```
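As a design alternative to the `getData()` helper, PHP 7.2 and later accept the `PREG_UNMATCHED_AS_NULL` flag, which reports groups from alternatives that did not take part in a match as `null`. The null-coalescing operator `??` can then pick whichever alternative matched. The sketch below uses a trimmed-down, two-alternative pattern with hypothetical group names (`atxTag`, `htmlTag`, and so on) rather than the script's full regex.

```php
<?php

// Hypothetical two-alternative pattern: hash headings or plain <hN> tags.
$re = '/^(?<atxTag>#+)[ \t]+(?<atxText>.*)$|<(?<htmlTag>h[1-6])>(?<htmlText>.*?)<\/h[1-6]>/m';
$str = "# Overview\n<h2>Details</h2>";

// Unmatched groups become null instead of '', so ?? selects the branch that matched.
preg_match_all($re, $str, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

$headings = [];
foreach ($matches as $m) {
    $headings[] = [
        'headingTag'  => $m['atxTag'] ?? $m['htmlTag'],
        'headingText' => $m['atxText'] ?? $m['htmlText'],
    ];
}

echo json_encode($headings), PHP_EOL;
// prints [{"headingTag":"#","headingText":"Overview"},{"headingTag":"h2","headingText":"Details"}]
```

One side effect of this approach: a group that did take part in the match but captured an empty string stays `''` rather than `null`, so `??` still selects it, whereas a `strlen()` check would skip it.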
