Regex Get All Images In A Markdown File | PHP


The PHP script below extracts the URL and alt text of every image found in a markdown string.

Note: I added support for the HTML image tag as well, since markdown allows inline HTML.

Script


<?php
// Three alternatives: markdown image, <img> with src before alt, <img> with alt before src.
// Negated character classes ([^\]]*, [^)]+) stop each capture at its own closing delimiter.
$re = '/!\[(?<altText>[^\]]*)\]\s*\((?<imageURL>[^)]+)\)|<img\s+src="(?<imageURL1>[^"]*)"\s+alt="(?<altText1>[^"]*)"\s*\/?>|<img\s+alt="(?<altText2>[^"]*)"\s+src="(?<imageURL2>[^"]*)"\s*\/?>/';
$str = 'AWS S3 is a cloud storage service that caters to the storage needs of modern software applications. S3 buckets can be used to host static sites.

## Getting started
Once you have your AWS account all setup you can login and then use the search bar up top to search for the S3 service.

![alt1](/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg)

<img src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.fpg" alt="alt2" />

<img alt="alt3" src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.mono" />';

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Return the first non-empty value among the three alternatives.
function getData(string $set1, string $set2, string $set3): string {
    if (strlen($set1)) return $set1;
    if (strlen($set2)) return $set2;
    if (strlen($set3)) return $set3;
    return '';
}

$images = [];
foreach ($matches as $eachMatch) {
    // Trailing groups that did not take part in the match are absent from the array, hence ?? ''.
    $images[] = [
        'src' => getData($eachMatch['imageURL'] ?? '', $eachMatch['imageURL1'] ?? '', $eachMatch['imageURL2'] ?? ''),
        'alt' => getData($eachMatch['altText'] ?? '', $eachMatch['altText1'] ?? '', $eachMatch['altText2'] ?? ''),
    ];
}

// Print the extracted images as readable JSON
echo json_encode($images, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
?>

The regex matches three types of image tag structures:

  • markdown syntax, i.e. ![alt](URL)
  • HTML image tag where src comes before the alt
  • HTML image tag where alt comes before src

The above possible matches are why we have the getData function. Each structure captures into its own named groups, so only one set of groups is filled for any given match. The function returns the data from whichever set is non-empty.
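As an alternative to a helper like getData, the same selection can be sketched with PHP's PREG_UNMATCHED_AS_NULL flag and the ?? operator. This is only a sketch: the function name extractImages is hypothetical, the pattern is a tidied variant of the one in the script, and it assumes PHP 7.4 or later, where unmatched groups are reported as null in every position.

```php
<?php
// Hypothetical helper (not part of the original script): returns a list of
// ['src' => ..., 'alt' => ...] arrays for every image found in $text.
function extractImages(string $text): array
{
    $re = '/!\[(?<altText>[^\]]*)\]\s*\((?<imageURL>[^)\s]+)\)'
        . '|<img\s+src="(?<imageURL1>[^"]*)"\s+alt="(?<altText1>[^"]*)"\s*\/?>'
        . '|<img\s+alt="(?<altText2>[^"]*)"\s+src="(?<imageURL2>[^"]*)"\s*\/?>/';

    // With PREG_UNMATCHED_AS_NULL, groups that took no part in a match are null,
    // so a ?? chain picks the first group that actually matched.
    preg_match_all($re, $text, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $images = [];
    foreach ($matches as $m) {
        $images[] = [
            'src' => $m['imageURL'] ?? $m['imageURL1'] ?? $m['imageURL2'] ?? '',
            'alt' => $m['altText'] ?? $m['altText1'] ?? $m['altText2'] ?? '',
        ];
    }
    return $images;
}

$found = extractImages("![a](/x.png) and ![b](/y.png)\n<img alt=\"c\" src=\"/z.gif\" />");
echo count($found), "\n"; // prints 3
```

Unlike the strlen check, ?? distinguishes "did not match" from "matched an empty string", so an image with an empty alt such as ![](/x.png) still yields an empty alt rather than falling through to another group.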

Output


[
    {
        "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg",
        "alt": "alt1"
    },
    {
        "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.fpg",
        "alt": "alt2"
    },
    {
        "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.mono",
        "alt": "alt3"
    }
]
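The sample text has one image per line. When several markdown images share a line, the choice of quantifier matters: a greedy .* in the alt-text group runs to the last ] on the line, while a negated character class stops at the first. The comparison below is a small sketch on a made-up one-line input, with both patterns reduced to the markdown alternative for illustration.

```php
<?php
// Hypothetical input: two markdown images on the same line.
$line = '![one](/a.png) and ![two](/b.png)';

// Greedy: .* and .+ consume as much of the line as they can.
$greedy = '/!\[(?<alt>.*)\]\s*\((?<url>.+)\)/';
// Negated classes: each capture stops at its own closing delimiter.
$strict = '/!\[(?<alt>[^\]]*)\]\s*\((?<url>[^)]+)\)/';

preg_match_all($greedy, $line, $greedyMatches, PREG_SET_ORDER);
preg_match_all($strict, $line, $strictMatches, PREG_SET_ORDER);

echo count($greedyMatches), "\n";      // prints 1
echo $greedyMatches[0]['alt'], "\n";   // prints one](/a.png) and ![two
echo count($strictMatches), "\n";      // prints 2
echo $strictMatches[1]['url'], "\n";   // prints /b.png
```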
