How to extract links from an HTML page using XSL

A small XSL stylesheet to extract all the links from an HTML page:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>

<xsl:template match="/">
<html>
<head>
<title>Link List</title>
</head>
<body>
<ul>
<!-- Copy every anchor that has an href, one list item per link -->
<xsl:for-each select="//a[@href]">
<li><xsl:copy-of select="."/></li>
</xsl:for-each>
</ul>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

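To use it, install libxslt-bin with Fink or xsltproc on Debian, save the stylesheet as filter.xsl, and run the following command. The --html option makes xsltproc parse the input as HTML, which matters for pages that are not well-formed XML:

xsltproc --html filter.xsl index.html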
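
If only the URLs are needed rather than an HTML list, a variant of the stylesheet can print them as plain text, one per line. This is a minimal sketch, not part of the original post: it assumes the page is parsed with the --html option (so elements carry no namespace), and the file name links-text.xsl is a hypothetical choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Emit plain text instead of HTML markup -->
<xsl:output method="text"/>

<xsl:template match="/">
<!-- Visit every anchor that carries an href attribute -->
<xsl:for-each select="//a[@href]">
<!-- Print the link target followed by a newline -->
<xsl:value-of select="@href"/>
<xsl:text>&#10;</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
```

It would be run the same way, for example: xsltproc --html links-text.xsl index.html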
