- 01-02-2013, 12:27 PM #1
Parse HTML with Jsoup while preserving the layout
Hey there,
I need to convert an HTML e-mail to plain text, but I want to keep the layout. I heard Jsoup was what I was looking for, so I started experimenting with it, but I can't seem to parse the HTML in a way that returns the text while keeping the layout intact.
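Part of the problem is that calling Jsoup's `Element.text()` on the whole document returns whitespace-normalized text, so `<br>` tags and table rows no longer produce line breaks. A common remedy is to walk the parsed tree yourself and emit a newline at `<br>` and at the end of block elements such as `<p>`, `<div>` and `<tr>`; in Jsoup that walk is usually written with a `NodeVisitor` (its `head`/`tail` callbacks). So the sketch below runs without extra jars, it swaps in the JDK's built-in Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`) instead of Jsoup; treat it as an illustration of the technique, not as Jsoup code. `LineBreakText` is a hypothetical name, and the Swing parser only knows the older HTML 3.2 model, so it may mishandle some markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: walks the HTML with the JDK's Swing parser (not Jsoup)
// and inserts line breaks where the rendered e-mail would start a new line.
final class LineBreakText {

    static String extract(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data); // text content between tags
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.BR) {
                    out.append('\n'); // <br> is an empty element, so it is reported here
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TD) {
                    out.append('\t'); // keep the cells of one row on the same line
                } else if (tag == HTML.Tag.TR || tag == HTML.Tag.P || tag == HTML.Tag.DIV) {
                    endLine(out); // closing a row or block element ends the current line
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Drops trailing cell separators and adds at most one newline.
    private static void endLine(StringBuilder out) {
        while (out.length() > 0 && out.charAt(out.length() - 1) == '\t') {
            out.setLength(out.length() - 1);
        }
        if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
            out.append('\n');
        }
    }
}
```

With Jsoup the same decisions (newline on `<br>` in `head`, newline after `tr`/`p`/`div` in `tail`) would move into a `NodeVisitor` passed to the document traversal.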
This is what the e-mail looks like: [screenshot not included]

This is the HTML from the e-mail:
<html dir=3D"ltr"><head><meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Diso-8859-1"><style id=3D"owaParaStyle">=0A<!--=0Ap=0A {margin-top:0px;=0A margin-bottom:0px}=0A-->=0A</style></head><body bgcolor=3D"#ffffff" fpstyle=3D"1" ocsi=3D"0"><div style=3D"direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;"><table border=3D"0" cellspacing=3D"0" cellpadding=3D"0" width=3D"816" style=3D"font-family: 'Times New Roman';"><tbody><tr><td width=3D"35"></td><td width=3D"1"></td><td width=3D"18"></td><td width=3D"101"></td><td width=3D"7"></td><td width=3D"24"></td><td width=3D"83"></td><td width=3D"18"></td><td width=3D"13"></td><td width=3D"89"></td><td width=3D"6"></td><td width=3D"7"></td><td width=3D"137"></td><td width=3D"132"></td><td width=3D"6"></td><td width=3D"12"></td><td width=3D"127"></td></tr><tr valign=3D"top"><td rowspan=3D"21" width=3D"689" colspan=3D"16"><font size=3D"2" face=3D"Verdana">Geachte heer/mevrouw,</font><br><font size=3D"2" face=3D"Verdana"> </font><br><font size=3D"2" face=3D"Verdana">Wij hebben uw inzending ontvangen en gecontroleerd.&nbsp;Hierbij het verslag van de controle.</font><br><font size=3D"2" face=3D"Verdana"> </font><br><font size=3D"2" face=3D"Verdana">Fiscaal nummer: 3183579 </font><br><font size=3D"2" face=3D"Verdana">Berichtsoort : UZS:&nbsp;OTP plateau 3 bericht </font><br><font size=3D"2" face=3D"Verdana">Datum/tijd aanmaak : 13-06-2012 02:02:03 </font><br><font size=3D"2" face=3D"Verdana">Referentie : </font><br><font size=3D"2" face=3D"Verdana">Volgnummer inzending :&nbsp; </font><br><font size=3D"2" face=3D"Verdana">Aantal meldingen: 1&nbsp;</font><br><font size=3D"2" face=3D"Verdana">Aantal meldingen verwerkt in de administratie: 1  ;</font><br><font size=3D"2" face=3D"Verdana">Aantal afgekeurde meldingen: 0 </font><br><font size=3D"2" face=3D"Verdana"> </font><br><font size=3D"2" face=3D"Verdana">De inzending is in onze administratie opgenomen onder nummer 8. 
</font><br></td><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td rowspan=3D"2" width=3D"677" colspan=3D"15"><font size=3D"2" face=3D"Verdana">De inzending bevatte&nb sp;de volgende leveringen:</font></td><td height=3D"9" colspan=3D"2"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"2"></td></tr><tr><td height=3D"9" colspan=3D"17"></td></tr><tr valign=3D"top"><td rowspan=3D"4" width=3D"35"><font size=3D"2" face=3D"Verdana">Volg</font><br><font size=3D"2" face=3D"Verdana">nr</font><br></td><td height=3D"9"></td><td rowspan=3D"3" width=3D"119" colspan=3D"2" align=3D"right"><font size=3D"2" face=3D"Verdana">Aantal verwerkt</font><br><font size=3D"2" face=3D"Verdana">in administratie</font><br></td><td></td><td rowspan=3D"3" width=3D"125" colspan=3D"3" align=3D"right"><font size=3D"2" face=3D"Verdana">Aantal afgekeurde</font><br><font size=3D"2" face=3D"Verdana">meldingen</font><br></td><td></td><td rowspan=3D"3" width=3D"95" colspan=3D"2"><font size=3D"2" face=3D"Verdana">Loonheffingen</font><br><font size=3D"2" 
face=3D"Verdana">nummer</font><br></td><td></td><td rowspan=3D"2" width=3D"137"><font size=3D"2" face=3D"Verdana">Naam</font></td><td colspan=3D"4"></td></tr><tr valign=3D"top"><td height=3D"9"></td><td></td><td></td><td></td><td colspan=3D"4"></td></tr><tr valign=3D"top"><td height=3D"9"></td><td></td><td></td><td colspan=3D"6"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"16"></td></tr><tr valign=3D"top"><td rowspan=3D"2" width=3D"35"><font size=3D"2" face=3D"Verdana">1</font></td><td height=3D"9" colspan=3D"2"></td><td rowspan=3D"2" width=3D"101" align=3D"right"><font size=3D"2" face=3D"Verdana">1</font></td><td colspan=3D"2"></td><td rowspan=3D"2" width=3D"101" colspan=3D"2" align=3D"right"><font size=3D"2" face=3D"Verdana">0</font></td><td></td><td rowspan=3D"2" width=3D"89"><font size=3D"2" face=3D"Verdana">003183579L02</font></td><td colspan=3D"2"></td><td rowspan=3D"2" width=3D"269" colspan=3D"2"><font size=3D"2" face=3D"Verdana">ETOS B.V.</font></td><td colspan=3D"3"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"2"></td><td colspan=3D"2"></td><td></td><td colspan=3D"2"></td><td colspan=3D"3"></td></tr><tr><td height=3D"19" colspan=3D"17"></td></tr><tr valign=3D"top"><td rowspan=3D"4" width=3D"269" colspan=3D"7"><font size=3D"2" face=3D"Verdana">Met vriendelijke groet, </font><br><font size=3D"2" face=3D"Verdana">UWV</font><br></td><td height=3D"9" colspan=3D"10"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"10"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"10"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"10"></td></tr></tbody></table><div style=3D"font-family: Times New Roman; color: #000000; font-size: 16px"><div><div style=3D"direction:ltr; font-family:Calibri; color:#000000; font-size:10pt"><div style=3D"font-family:Times New Roman; color:#000000; font-size:16px"><div><p style=3D"page-break-before:always"></p><br><hr><font color=3D"gray" size=3D"1" face=3D"Verdana"><br>Dit bericht kan informatie 
bevatten die niet voor u bedoeld is. Bent u niet de geadresseerde, of is dit bericht per<br>ongeluk aan u verzonden? Meld dit dan aan de afzender en verwijder dit bericht.<br><br>Wees milieuvriendelijk, voorkom papierverspilling door bewust te printen.<br><br></font></div></div></div></div></div></div><style type=3D"text/css"></style></body></html>
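Note that this source is still quoted-printable encoded, as it appears in the raw message: `=3D` stands for `=`, `=0A` for a line feed, and an `=` at the very end of a line is a soft line break. If the text goes into an HTML parser as-is, attribute values such as `width=3D"816"` will be read incorrectly, so it helps to decode it first. If JavaMail is on the classpath, `MimeUtility.decode(inputStream, "quoted-printable")` does this; the sketch below is a minimal stand-alone decoder (the class name `QuotedPrintable` is hypothetical) that follows the RFC 2045 rules and then turns the bytes into text using the iso-8859-1 charset the e-mail's `<meta>` tag declares.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: minimal quoted-printable decoder (RFC 2045, section 6.7).
final class QuotedPrintable {

    // Convenience overload for this e-mail, whose <meta> tag declares iso-8859-1.
    static String decode(String encoded) {
        return decode(encoded, StandardCharsets.ISO_8859_1);
    }

    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c != '=') {
                bytes.write(c); // quoted-printable text is 7-bit ASCII, one byte per char
                i++;
                continue;
            }
            // "=" followed by a line ending is a soft line break: drop both
            if (i + 1 < encoded.length() && encoded.charAt(i + 1) == '\n') {
                i += 2;
                continue;
            }
            if (i + 2 < encoded.length() && encoded.charAt(i + 1) == '\r' && encoded.charAt(i + 2) == '\n') {
                i += 3;
                continue;
            }
            // "=XY" encodes the byte with hex value XY, e.g. =3D is '=' and =0A is a line feed
            if (i + 2 < encoded.length()) {
                int high = Character.digit(encoded.charAt(i + 1), 16);
                int low = Character.digit(encoded.charAt(i + 2), 16);
                if (high >= 0 && low >= 0) {
                    bytes.write((high << 4) | low);
                    i += 3;
                    continue;
                }
            }
            bytes.write('='); // malformed escape: keep the "=" literally
            i++;
        }
        return new String(bytes.toByteArray(), charset);
    }
}
```

The decoded string is what you would then hand to `Jsoup.parse(...)`.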
I tried parsing and cleaning the HTML, but I can't get the output to keep the same layout as in the screenshot. Is this even possible? And if so, how can I do it?
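The layout of this e-mail comes from an HTML table, so plain text can only approximate it, by aligning columns in a monospaced font rather than reproducing pixel widths, `rowspan` or `colspan`. One way is to collect the non-empty cell texts of each `<tr>` (for example with Jsoup's `select("tr")` on the document and then `select("td")` on each row) and pad the columns yourself. The sketch below covers only the padding step; collecting the rows and skipping the many empty spacer cells this e-mail uses are left as assumptions. `TextTable` and its `gap` parameter are hypothetical names, and `String.repeat`/`stripTrailing` need Java 11 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lays out rows of cell text as space-padded columns,
// so a table-based layout stays readable in a monospaced plain-text view.
final class TextTable {

    static String render(List<List<String>> rows, int gap) {
        if (gap < 0) throw new IllegalArgumentException("gap must be >= 0");
        // 1. Find the widest cell in every column (rows may have different lengths).
        List<Integer> widths = new ArrayList<>();
        for (List<String> row : rows) {
            for (int c = 0; c < row.size(); c++) {
                int len = row.get(c).length();
                if (c == widths.size()) {
                    widths.add(len);
                } else {
                    widths.set(c, Math.max(widths.get(c), len));
                }
            }
        }
        // 2. Pad every cell except the last one to its column width plus the gap.
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            StringBuilder line = new StringBuilder();
            for (int c = 0; c < row.size(); c++) {
                String cell = row.get(c);
                line.append(cell);
                if (c < row.size() - 1) {
                    line.append(" ".repeat(widths.get(c) - cell.length() + gap));
                }
            }
            out.append(line.toString().stripTrailing()).append('\n');
        }
        return out.toString();
    }
}
```

For example, a header row `Volg nr`, `Aantal verwerkt`, `Naam` and the data row `1`, `1`, `ETOS B.V.` rendered with `gap = 2` put `Naam` and `ETOS B.V.` in the same starting column.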
Any help is very much appreciated, this has been bothering me for weeks now.
Thanks!
Last edited by Jef; 01-02-2013 at 12:28 PM. Reason: Removed HTML code tag
- 01-02-2013, 07:28 PM #2