  1. #1
    Jef

    Parse HTML with Jsoup, remain layout

    Hey there,

    I need to convert an HTML e-mail to plain text, but I want to keep the layout. I heard Jsoup was what I was looking for, so I started experimenting with it, but I can't seem to parse the HTML in a way that returns the text with the layout intact.

    This is what the e-mail looks like:
    [Screenshot: examplemail.png]


    This is the HTML from the e-mail:

    <html dir=3D"ltr"><head><meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Diso-8859-1"><style id=3D"owaParaStyle">=0A<!--=0Ap=0A {margin-top:0px;=0A margin-bottom:0px}=0A-->=0A</style></head><body bgcolor=3D"#ffffff" fpstyle=3D"1" ocsi=3D"0"><div style=3D"direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;"><table border=3D"0" cellspacing=3D"0" cellpadding=3D"0" width=3D"816" style=3D"font-family: 'Times New Roman';"><tbody><tr><td width=3D"35"></td><td width=3D"1"></td><td width=3D"18"></td><td width=3D"101"></td><td width=3D"7"></td><td width=3D"24"></td><td width=3D"83"></td><td width=3D"18"></td><td width=3D"13"></td><td width=3D"89"></td><td width=3D"6"></td><td width=3D"7"></td><td width=3D"137"></td><td width=3D"132"></td><td width=3D"6"></td><td width=3D"12"></td><td width=3D"127"></td></tr><tr valign=3D"top"><td rowspan=3D"21" width=3D"689" colspan=3D"16"><font size=3D"2" face=3D"Verdana">Geachte&nbsp;heer/mevrouw,</font><br><font size=3D"2" face=3D"Verdana">&nbsp;</font><br><font size=3D"2" face=3D"Verdana">Wij&nbsp;hebben&nbsp;uw&nbsp;inze nding&nbsp;ontvangen&nbsp;en&nbsp;gecontroleerd.&n bsp;Hierbij&nbsp;het&nbsp;verslag&nbsp;van&nbsp;de &nbsp;controle.</font><br><font size=3D"2" face=3D"Verdana">&nbsp;</font><br><font size=3D"2" face=3D"Verdana">Fiscaal&nbsp;nummer:&nbsp;3183579 &nbsp;</font><br><font size=3D"2" face=3D"Verdana">Berichtsoort&nbsp;:&nbsp;UZS:&nbs p;OTP&nbsp;plateau&nbsp;3&nbsp;bericht&nbsp;</font><br><font size=3D"2" face=3D"Verdana">Datum/tijd&nbsp;aanmaak&nbsp;:&nbsp;13-06-2012&nbsp;02:02:03&nbsp;</font><br><font size=3D"2" face=3D"Verdana">Referentie&nbsp;:&nbsp;&nbsp;</font><br><font size=3D"2" face=3D"Verdana">Volgnummer&nbsp;inzending&nbsp;:& nbsp;&nbsp;</font><br><font size=3D"2" face=3D"Verdana">Aantal&nbsp;meldingen:&nbsp;1&nbs p;</font><br><font size=3D"2" face=3D"Verdana">Aantal&nbsp;meldingen&nbsp;verwer kt&nbsp;in&nbsp;de&nbsp;administratie:&nbsp;1&nbsp ;</font><br><font size=3D"2" 
face=3D"Verdana">Aantal&nbsp;afgekeurde&nbsp;meldi ngen:&nbsp;0&nbsp;</font><br><font size=3D"2" face=3D"Verdana">&nbsp;</font><br><font size=3D"2" face=3D"Verdana">De&nbsp;inzending&nbsp;is&nbsp;in &nbsp;onze&nbsp;administratie&nbsp;opgenomen&nbsp; onder&nbsp;nummer&nbsp;8.&nbsp;</font><br></td><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td height=3D"9"></td></tr><tr valign=3D"top"><td rowspan=3D"2" width=3D"677" colspan=3D"15"><font size=3D"2" face=3D"Verdana">De&nbsp;inzending&nbsp;bevatte&nb sp;de&nbsp;volgende&nbsp;leveringen:</font></td><td height=3D"9" colspan=3D"2"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"2"></td></tr><tr><td height=3D"9" colspan=3D"17"></td></tr><tr valign=3D"top"><td rowspan=3D"4" width=3D"35"><font size=3D"2" face=3D"Verdana">Volg</font><br><font size=3D"2" face=3D"Verdana">nr</font><br></td><td height=3D"9"></td><td rowspan=3D"3" width=3D"119" colspan=3D"2" align=3D"right"><font size=3D"2" face=3D"Verdana">Aantal&nbsp;verwerkt</font><br><font size=3D"2" face=3D"Verdana">in&nbsp;administratie</font><br></td><td></td><td rowspan=3D"3" width=3D"125" colspan=3D"3" 
align=3D"right"><font size=3D"2" face=3D"Verdana">Aantal&nbsp;afgekeurde</font><br><font size=3D"2" face=3D"Verdana">meldingen</font><br></td><td></td><td rowspan=3D"3" width=3D"95" colspan=3D"2"><font size=3D"2" face=3D"Verdana">Loonheffingen</font><br><font size=3D"2" face=3D"Verdana">nummer</font><br></td><td></td><td rowspan=3D"2" width=3D"137"><font size=3D"2" face=3D"Verdana">Naam</font></td><td colspan=3D"4"></td></tr><tr valign=3D"top"><td height=3D"9"></td><td></td><td></td><td></td><td colspan=3D"4"></td></tr><tr valign=3D"top"><td height=3D"9"></td><td></td><td></td><td colspan=3D"6"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"16"></td></tr><tr valign=3D"top"><td rowspan=3D"2" width=3D"35"><font size=3D"2" face=3D"Verdana">1</font></td><td height=3D"9" colspan=3D"2"></td><td rowspan=3D"2" width=3D"101" align=3D"right"><font size=3D"2" face=3D"Verdana">1</font></td><td colspan=3D"2"></td><td rowspan=3D"2" width=3D"101" colspan=3D"2" align=3D"right"><font size=3D"2" face=3D"Verdana">0</font></td><td></td><td rowspan=3D"2" width=3D"89"><font size=3D"2" face=3D"Verdana">003183579L02</font></td><td colspan=3D"2"></td><td rowspan=3D"2" width=3D"269" colspan=3D"2"><font size=3D"2" face=3D"Verdana">ETOS&nbsp;B.V.</font></td><td colspan=3D"3"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"2"></td><td colspan=3D"2"></td><td></td><td colspan=3D"2"></td><td colspan=3D"3"></td></tr><tr><td height=3D"19" colspan=3D"17"></td></tr><tr valign=3D"top"><td rowspan=3D"4" width=3D"269" colspan=3D"7"><font size=3D"2" face=3D"Verdana">Met&nbsp;vriendelijke&nbsp;groet, </font><br><font size=3D"2" face=3D"Verdana">UWV</font><br></td><td height=3D"9" colspan=3D"10"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"10"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"10"></td></tr><tr valign=3D"top"><td height=3D"9" colspan=3D"10"></td></tr></tbody></table><div style=3D"font-family: Times New Roman; color: #000000; font-size: 16px"><div><div 
style=3D"direction:ltr; font-family:Calibri; color:#000000; font-size:10pt"><div style=3D"font-family:Times New Roman; color:#000000; font-size:16px"><div><p style=3D"page-break-before:always"></p><br><hr><font color=3D"gray" size=3D"1" face=3D"Verdana"><br>Dit bericht kan informatie bevatten die niet voor u bedoeld is. Bent u niet de geadresseerde, of is dit bericht per<br>ongeluk aan u verzonden? Meld dit dan aan de afzender en verwijder dit bericht.<br><br>Wees milieuvriendelijk, voorkom papierverspilling door bewust te printen.<br><br></font></div></div></div></div></div></div><style type=3D"text/css"></style></body></html>
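
    Note that the body above is still quoted-printable encoded: `=3D` stands for `=` and `=0A` for a line feed. It most likely has to be decoded before any HTML parser sees it, otherwise attribute values come out mangled. A mail library will normally do this when the message part is read through its API; the block below is only a minimal standard-library sketch of the decoding step. The class name `QuotedPrintableSketch` and the method `decode` are made up for illustration, and the ISO-8859-1 charset is taken from the `meta` tag in the message.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableSketch {

    /**
     * Decodes quoted-printable text: "=XX" becomes the byte 0xXX and
     * "=" directly before a line break is a soft break that is dropped.
     * Malformed escapes are copied through unchanged.
     */
    public static String decode(String encoded, Charset charset) {
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            if (b != '=') {                       // ordinary character
                out.write(b);
                i++;
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;                           // soft break "=\n"
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;                           // soft break "=\r\n"
            } else {
                int hi = i + 2 < in.length ? Character.digit(in[i + 1], 16) : -1;
                int lo = i + 2 < in.length ? Character.digit(in[i + 2], 16) : -1;
                if (hi >= 0 && lo >= 0) {         // valid "=XX" escape
                    out.write((hi << 4) | lo);
                    i += 3;
                } else {                          // malformed: keep the "=" literally
                    out.write(b);
                    i++;
                }
            }
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // prints <html dir="ltr">
        System.out.println(decode("<html dir=3D\"ltr\">", StandardCharsets.ISO_8859_1));
    }
}
```

    The decoded string is what should be handed to the HTML parser.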

    I tried to parse and clean it, but I can't seem to keep the same layout as in the screenshot. Is that even possible? If so, how can I do it?

    Any help is very much appreciated; this has been bothering me for weeks now.

    Thanks!
    Last edited by Jef; 01-02-2013 at 01:28 PM. Reason: Removed HTML code tag

  2. #2
    Zyril

    Re: Parse HTML with Jsoup, remain layout

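    The layout you see is produced when a mail client renders the HTML together with its CSS and table attributes, whereas you are only parsing the HTML. Jsoup can parse the HTML into a document tree, but it does not render it, so it cannot reproduce the CSS layout for you.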
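
    An approximation is still possible by walking the parsed tree and emitting a line break or tab for the elements that carry structure. The following is a rough sketch rather than a faithful renderer: it assumes Jsoup is on the classpath and that the HTML is already decoded, and the names `HtmlToTextSketch` and `toPlainText` as well as the choice of a tab between cells and a newline after `tr`, `p` and `div` are assumptions made for illustration. It ignores `rowspan`, `colspan`, column widths and alignment, so multi-line cells will not line up.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

public class HtmlToTextSketch {

    /** Converts HTML to plain text, keeping line breaks and table rows. */
    public static String toPlainText(String html) {
        final StringBuilder out = new StringBuilder();
        Jsoup.parse(html).body().traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    // Collapse source whitespace, then turn &nbsp; (U+00A0) into a space
                    String text = ((TextNode) node).getWholeText();
                    out.append(text.replaceAll("[ \\t\\r\\n\\f]+", " ").replace('\u00A0', ' '));
                } else if (node instanceof Element) {
                    Element el = (Element) node;
                    String name = el.tagName();
                    if (name.equals("br")) {
                        out.append('\n');
                    } else if ((name.equals("td") || name.equals("th"))
                            && el.hasText() && !atLineStart(out)) {
                        out.append('\t');   // separate non-empty cells on the same line
                    }
                }
            }

            @Override
            public void tail(Node node, int depth) {
                if (node instanceof Element) {
                    Element el = (Element) node;
                    String name = el.tagName();
                    boolean block = name.equals("tr") || name.equals("p") || name.equals("div");
                    // Skip empty spacer rows and avoid doubling line breaks
                    if (block && el.hasText() && !atLineStart(out)) {
                        out.append('\n');
                    }
                }
            }
        });
        return out.toString();
    }

    private static boolean atLineStart(StringBuilder out) {
        return out.length() == 0 || out.charAt(out.length() - 1) == '\n';
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>Volg<br>nr</td><td>Naam</td></tr>"
                + "<tr><td>1</td><td>ETOS&nbsp;B.V.</td></tr></table>";
        System.out.print(toPlainText(html));
    }
}
```

    For exact column alignment you would have to pad each cell to a fixed width yourself, for example based on the `width` attributes in the table's first row.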
