EDDYMENS

Last updated 2023-05-27 08:48:08

Matching Markdown And HTML Headings Using Regex | PHP

Markdown documents can contain three kinds of headings: hash-based headings, underlined headings, and raw HTML heading tags. The hash and HTML forms support up to six levels, while the underlined form supports only two. The PHP script below matches all three.

The script does not enforce the six-level limit on hash-based headings: it matches anything that follows the heading pattern. Thus an invalid heading such as ################# Text will still be matched.
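
If the six-level limit matters for your use case, the hash part of the pattern can be tightened. The snippet below is a minimal sketch and not part of the original script: the function name matchStrictHashHeadings is hypothetical, and it assumes headings start at the beginning of a line and are followed by a space or tab.

```php
<?php

// Hypothetical helper (not from the original script):
// matches hash-based headings of level 1 to 6 only.
function matchStrictHashHeadings(string $markdown): array
{
    // ^ anchors to the start of a line (m flag); #{1,6} caps the level at 6.
    $re = '/^(?<headerTag>#{1,6})[ \t]+(?<headingText>.+)$/m';
    preg_match_all($re, $markdown, $matches, PREG_SET_ORDER);

    $headings = [];
    foreach ($matches as $match) {
        $headings[] = [
            'headingTag'  => $match['headerTag'],
            'headingText' => trim($match['headingText']),
        ];
    }
    return $headings;
}

$sample = "## Getting started\n################# Text";
echo json_encode(matchStrictHashHeadings($sample)), "\n";
```

With this sample only the ## heading is returned, because the run of seventeen hashes never reaches the required space within six characters.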

Also note that since Markdown allows raw HTML, the HTML heading tags <h1> through <h6> are matched as well.
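
The HTML part of the script's pattern accepts any closing heading tag, so a mismatched pair such as <h1>...</h3> would still match. As a rough sketch of a stricter alternative (the function name matchHtmlHeadings is hypothetical and not from the original script), a named backreference can require the closing tag to repeat the opening one:

```php
<?php

// Hypothetical helper (not from the original script):
// matches <h1>..<h6> elements whose closing tag equals the opening tag.
function matchHtmlHeadings(string $text): array
{
    // \k<tag> is a backreference to the named group "tag".
    // i = case-insensitive tags, s = let the heading text span lines.
    $re = '/<(?<tag>h[1-6])\b[^>]*>(?<text>.*?)<\/\k<tag>>/is';
    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    $headings = [];
    foreach ($matches as $match) {
        $headings[] = [
            'headingTag'  => strtolower($match['tag']),
            'headingText' => trim($match['text']),
        ];
    }
    return $headings;
}

$sample = '<h2 class="title"> HTML Header 2 </h2> <h1>Mismatched</h3>';
echo json_encode(matchHtmlHeadings($sample)), "\n";
```

Regex-based HTML matching remains a heuristic; for nested or malformed markup a real HTML parser is the safer choice.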

The different types of headings the script matches are:

  • Markdown hash-based headings: #, ##, ###, ####, etc.
  • Alternative (underlined) Markdown headings: text followed by a line of ==== or ------.
  • HTML heading tags: <h1>, <h2>, etc.
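
Because each syntax reports its heading tag differently (a run of hashes, an underline, or a tag name), it can help to normalize the tag to a numeric level. The following is a sketch under the assumption that a ==== underline means level 1 and a ------ underline means level 2, as in standard Markdown; the function name headingLevel is hypothetical.

```php
<?php

// Hypothetical helper (not from the original script):
// converts a matched heading tag into a numeric level, or 0 if unrecognized.
function headingLevel(string $tag): int
{
    if ($tag === '') {
        return 0;
    }
    if ($tag[0] === '#') {
        return strlen($tag);   // "###" is level 3
    }
    if ($tag[0] === '=') {
        return 1;              // "====" underline is level 1
    }
    if ($tag[0] === '-') {
        return 2;              // "----" underline is level 2
    }
    if (preg_match('/^h([1-6])$/i', $tag, $m) === 1) {
        return (int) $m[1];    // "h6" is level 6
    }
    return 0;
}

foreach (['###', '=====', '-----', 'h6'] as $tag) {
    echo $tag, ' => ', headingLevel($tag), "\n";
}
```

Note that for hash headings the level is simply the number of hashes, so an over-long run such as ################# yields a level above 6 unless it is filtered out first.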

Script

<?php

// One alternative per heading style: hash-based, HTML tag, underlined.
$re = '/(?<headerTag>#+)\s+(?<headingText>[^"\n]*)|(<(?<HTMLHeaderTag>h[1-6]).*?>(?<HTMLHeadingText>.*?)<\/h[1-6]>)|(((?<altMDHeadingText>.+)\n)(?<altMDHeadingTag>-+|=+))/m';
$str = '# Overview
AWS S3 is a cloud storage service that caters to the storage needs of modern software applications. S3 buckets can be used to host static sites.

## Getting started
Once you have your AWS account all set up you can log in and then use the search bar up top to search for the S3 service.

### Third-level header
Third-level header content goes here.

#### Fourth-level header
Fourth-level content goes here.

Alternative Heading Level 1
===========================
Alternative heading 1 text.

Alternative Heading Level 2
---------------------------
Alternative heading 2 text.

<h1> HTML Header 1 </h1>
Level 1 heading

<h2> HTML Header 2 </h2>
Level 2 heading.

<h6> HTML Header 6 </h6>
Level 6 heading.';

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Return the first non-empty value; only one alternative matches per heading.
function getData($set1, $set2, $set3) {
    if (strlen($set1)) return $set1;
    if (strlen($set2)) return $set2;
    if (strlen($set3)) return $set3;
    return '';
}

$headings = [];
foreach ($matches as $eachMatch) {
    $headings[] = [
        'headingTag'  => getData($eachMatch['headerTag'] ?? '', $eachMatch['HTMLHeaderTag'] ?? '', $eachMatch['altMDHeadingTag'] ?? ''),
        'headingText' => getData($eachMatch['headingText'] ?? '', $eachMatch['HTMLHeadingText'] ?? '', $eachMatch['altMDHeadingText'] ?? ''),
    ];
}

// Print every matched heading as formatted JSON.
echo json_encode($headings, JSON_PRETTY_PRINT);

Output

[
    {
        "headingTag": "#",
        "headingText": "Overview"
    },
    {
        "headingTag": "##",
        "headingText": "Getting started"
    },
    {
        "headingTag": "###",
        "headingText": "Third-level header"
    },
    {
        "headingTag": "####",
        "headingText": "Fourth-level header"
    },
    {
        "headingTag": "===========================",
        "headingText": "Alternative Heading Level 1"
    },
    {
        "headingTag": "---------------------------",
        "headingText": "Alternative Heading Level 2"
    },
    {
        "headingTag": "h1",
        "headingText": " HTML Header 1 "
    },
    {
        "headingTag": "h2",
        "headingText": " HTML Header 2 "
    },
    {
        "headingTag": "h6",
        "headingText": " HTML Header 6 "
    }
]
