String operations or regex please
Code:
<h1 class="headline1"><a href="article/2011-03-31-obama-saga-sucks-in-senior-democrats">
Obama in serious trouble
</a></h1>
I am trying to parse a page which has headlines of this sort on it. My objective is to isolate every link found in an h1 tag of class headline1, for example:
Code:
article/2011-03-31-obama-saga-sucks-in-senior-democrats
and then concatenate it with http://www.website.com/ so I end up with:
Code:
http://www.website.com/article/2011-03-31-obama-saga-sucks-in-senior-democrats
I will then use this URL to get the content of the article in question.
There are perhaps 15 links with the structure described above. How do I get these links and ignore all the other HTML on the page? Help please :(
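To make the goal concrete, here is a rough regex sketch of the extraction described above. It assumes Python (the post does not name a language), assumes the page source is already available as a string, and assumes every headline follows the exact shape shown (double-quoted attributes, the a tag directly inside the h1). The names BASE_URL, HEADLINE_LINK and headline_urls are made up for illustration.

```python
import re
from urllib.parse import urljoin

# Assumption: the site root given in the post.
BASE_URL = "http://www.website.com/"

# Match an <h1> whose class attribute is exactly "headline1", followed
# (after optional whitespace) by an <a> tag, and capture its href value.
# [^>]* never crosses a ">", so the class must belong to that same <h1>.
HEADLINE_LINK = re.compile(
    r'<h1[^>]*\bclass="headline1"[^>]*>\s*<a[^>]*\bhref="([^"]+)"',
    re.IGNORECASE,
)

def headline_urls(html, base_url=BASE_URL):
    """Return absolute URLs for every headline link found in html."""
    # urljoin resolves each relative href against the site root.
    return [urljoin(base_url, href) for href in HEADLINE_LINK.findall(html)]
```

Calling headline_urls(page_source) on the full page would give a list of absolute article URLs and skip links outside headline1 h1 tags. A regex like this breaks as soon as the markup varies (single quotes, extra tags between the h1 and the a), so a real HTML parser may be the safer choice if the pages are not uniform.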