  1. #1
    Neruk is offline Member

    HTML table to two-dimensional array

    Hello everyone!
    The problem is to parse an HTML table into a 2D array.
    The requirements are:
    1. The parser should handle rowspans, colspans, nested tables, and so on, that is, everything the HTML standard allows. The tables I need to parse are generated automatically by another program, so they are rather complex and verbose.
    2. If the table contains spans, the value of a spanned cell should be put into the array only once, in the top-left corner of the area it covers; the other array cells corresponding to the spanned cell in the HTML table should be null (this may be hard to follow; see the example below).
    So far I have found solutions in PHP (JS_Extractor, "JS_Extractor! And the death of Table Extractor" by Jack Sleight) and in Java ("Java HTML Table parser Simbiosis"), but it seems they don't suit my requirements: JS_Extractor is written in PHP and doesn't handle nested tables, and "Java HTML Table parser Simbiosis" doesn't even handle spans.
    Today I tried HTTPUnit, but the results are disappointing too. Simple tables are parsed correctly, but complex ones are not.
    E.g.
    Table code:
    <html>
    	<body>
    	<table   border="2" width="20%" height="20%">
    		<tr bgcolor="red">
    			<td colspan="2" rowspan="2">
    				<span>1</span>
    			</td>
    			<td>
    				<span>2</span>
    			</td>
    			<td>
    				<span>3</span></td>
    			<td>
    				<span>4.1</span>
    			</td>
    			<td>
    				<span>5.1</span>
    			</td>
    			<td>
    				<span>6 last</span>
    			</td>
    		</tr>
    		<tr bgcolor="green">
    			<td rowspan="2">
    				<span>1</span>
    			</td>
    			<td>
    				<span>2.4x</span>
    			</td>
    			<td>
    				<span>3.3x</span>
    			</td>
    			<td>
    				<span>4</span>
    			</td>
    			<td>
    				<span>5 last</span>
    			</td>
    		</tr>
    		<tr bgcolor="ffcc00">
    			<td>
    				<span>1x</span>
    			</td>
    			<td>
    				<span>2</span>
    			</td>
    			<td>
    				<span>3</span>
    			</td>
    			<td>
    				<span>4</span>
    			</td>
    			<td>
    				<span>5.8</span>
    			</td>
    			<td>
    				<span>6 last</span>
    			</td>
    		</tr>
    		<tr bgcolor="yellow">
    			<td><span>1</span></td>
    			<td><span>2</span></td>
    			<td><span>3</span></td>
    			<td><span>4</span></td>
    			<td><span>5</span></td>
    			<td><span>6</span></td>
    			<td><span>7 last</span></td>
    		</tr>
    	</table>	
    </body>
    </html>
    This is the table as rendered in Chrome (each color marks a different row):

    The array I want to get as a result:

    Here you can see what I meant in requirement 2. The "1" from the first span and the "1" from the second span are each put in the top-left corner of their spanned area, while the remaining cells in that area are null.
    The result produced by HTTPUnit:
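Requirement 2 can be sketched as a grid-filling pass. The following is a minimal sketch only, not any library's implementation: it assumes the cells have already been extracted into a hypothetical `Cell` model (text, rowspan, colspan), and the class and method names (`Cell`, `SpanGrid`, `toGrid`, `exampleRows`) are made up for illustration. Nested tables are not handled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical cell model: the cell text plus its rowspan/colspan values. */
final class Cell {
    final String text;
    final int rowSpan;
    final int colSpan;

    Cell(String text, int rowSpan, int colSpan) {
        this.text = text;
        this.rowSpan = Math.max(1, rowSpan); // treat missing or invalid spans as 1
        this.colSpan = Math.max(1, colSpan);
    }
}

final class SpanGrid {
    /** Puts each cell's text in the top-left slot of the area it spans; other slots stay null. */
    static String[][] toGrid(List<List<Cell>> rows) {
        int height = rows.size();
        int bound = 0; // upper bound on the width: sum of all colspans
        for (List<Cell> row : rows) {
            for (Cell c : row) bound += c.colSpan;
        }
        boolean[][] taken = new boolean[height][bound];
        String[][] text = new String[height][bound];
        int width = 0;
        for (int r = 0; r < height; r++) {
            int col = 0;
            for (Cell c : rows.get(r)) {
                while (taken[r][col]) col++; // skip slots claimed by rowspans from earlier rows
                text[r][col] = c.text;
                int endRow = Math.min(height, r + c.rowSpan); // clip spans that run past the table
                for (int rr = r; rr < endRow; rr++) {
                    for (int cc = col; cc < col + c.colSpan; cc++) taken[rr][cc] = true;
                }
                col += c.colSpan;
                width = Math.max(width, col);
            }
        }
        String[][] grid = new String[height][];
        for (int r = 0; r < height; r++) grid[r] = Arrays.copyOf(text[r], width);
        return grid;
    }

    private static Cell c(String text) {
        return new Cell(text, 1, 1);
    }

    /** The example table from this post, entered by hand. */
    static List<List<Cell>> exampleRows() {
        List<List<Cell>> rows = new ArrayList<>();
        rows.add(Arrays.asList(new Cell("1", 2, 2), c("2"), c("3"), c("4.1"), c("5.1"), c("6 last")));
        rows.add(Arrays.asList(new Cell("1", 2, 1), c("2.4x"), c("3.3x"), c("4"), c("5 last")));
        rows.add(Arrays.asList(c("1x"), c("2"), c("3"), c("4"), c("5.8"), c("6 last")));
        rows.add(Arrays.asList(c("1"), c("2"), c("3"), c("4"), c("5"), c("6"), c("7 last")));
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : toGrid(exampleRows())) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For the example table this yields a 4 x 7 array whose first row is `1, null, 2, 3, 4.1, 5.1, 6 last`. The occupancy array is what lets the third row skip the column still held by the second rowspan.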

    As you can see, even if we drop requirement 2, there is an error in the third row.
    And this is a rather simple example without nested tables; with them the result is badly wrong.
    What would you recommend in this case?
    I'd be grateful for any help, as I can't believe this problem hasn't been solved yet!

  2. #2
    masijade is offline Senior Member

    Okay? So try parsing it yourself.

    You can pull out the table and then use an XML parser to navigate it if you want to avoid most of the regex work (which I would suggest doing).
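A minimal sketch of that XML-parser route, using the JDK's built-in DOM parser. It assumes the markup is well-formed XML with lowercase tag names (as in the example above), which real-world HTML often is not. `TableReader` and `readFirstTable` are hypothetical names. Only the direct rows and cells of the first table are visited, so rows of nested tables are not mixed in, although `getTextContent()` still includes their text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class TableReader {
    /** Returns one list per tr; each entry is {text, rowspan, colspan} as strings. */
    static List<List<String[]>> readFirstTable(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            Element table = (Element) doc.getElementsByTagName("table").item(0);
            if (table == null) throw new IllegalArgumentException("no table element found");
            List<List<String[]>> rows = new ArrayList<>();
            collectRows(table, rows);
            return rows;
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            // Malformed input ends up here; a lenient HTML parser would be needed for it.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Collects direct tr children, looking inside thead/tbody/tfoot but not nested tables. */
    private static void collectRows(Element parent, List<List<String[]>> rows) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;
            Element e = (Element) n;
            String tag = e.getTagName();
            if (tag.equals("tr")) {
                rows.add(readCells(e));
            } else if (tag.equals("thead") || tag.equals("tbody") || tag.equals("tfoot")) {
                collectRows(e, rows);
            }
        }
    }

    private static List<String[]> readCells(Element tr) {
        List<String[]> cells = new ArrayList<>();
        for (Node n = tr.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;
            Element e = (Element) n;
            String tag = e.getTagName();
            if (tag.equals("td") || tag.equals("th")) {
                cells.add(new String[] {
                    e.getTextContent().trim(), span(e, "rowspan"), span(e, "colspan")
                });
            }
        }
        return cells;
    }

    /** getAttribute returns "" when the attribute is absent; default the span to "1". */
    private static String span(Element cell, String name) {
        String value = cell.getAttribute(name).trim();
        return value.isEmpty() ? "1" : value;
    }
}
```

The span values are kept as strings here; converting them to integers and laying the cells out on a grid would be a separate step.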

  3. #3
    Neruk is offline Member

    Originally posted by masijade:
    Okay? So try parsing it yourself.

    You can pull out the table and then use an XML parser to navigate it if you want to avoid most of the regex work (which I would suggest doing).

    Thank you for your response. I just thought there might be some already implemented algorithms, not publicly available, written by developers who might use this forum.
