- 11-03-2010, 08:43 AM #21
Senior Member
- Join Date
- Jun 2008
- Posts
- 2,366
- Rep Power
- 8
It seems to be using some homegrown session handling, and looking at the HTML source, there are a few hidden fields that are passed back and forth; those probably control the sessions. Look at the source again (that one JavaScript method is contained in that page) and see what other fields you might be able to pass along. The "mechanics" now seem to be correct; you simply need to find the right information to post. If need be, get a packet sniffer, make the connection with the browser, and then examine the resulting network traffic to see what is really going on.
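To make the "hidden fields" advice concrete, here is a minimal sketch that pulls every hidden input out of a page's HTML so the values can be sent back with the next request. The class name HiddenFields and the sample markup are made up for illustration, and the regular expressions assume double-quoted attributes; a real HTML parser would be more robust.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HiddenFields {
    // Matches whole <input ...> tags; attributes are read separately because their order varies.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the value of one double-quoted attribute inside a tag, or null if it is absent.
    static String attr(String tag, String name) {
        Matcher m = Pattern.compile("\\b" + name + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    // Collects name -> value for every <input type="hidden"> found in the page source.
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher m = INPUT_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String name = attr(tag, "name");
            if ("hidden".equalsIgnoreCase(attr(tag, "type")) && name != null) {
                String value = attr(tag, "value");
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up page fragment, not the real page source.
        String html = "<form><input type=\"hidden\" name=\"token\" value=\"abc\" />"
                + "<input type=\"text\" name=\"parcel\" value=\"\" /></form>";
        System.out.println(extract(html)); // {token=abc}
    }
}
```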
- 11-03-2010, 08:59 AM #22
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
For now, I have given up trying to post anything to the site ... only trying to read. Do you still think the source and included Javascript will help when all I am trying to do is read the HTML page that the web site is creating ?
- 11-03-2010, 09:03 AM #23
Senior Member
- Join Date
- Jun 2008
- Posts
- 2,366
- Rep Power
- 8
Whether with "Get" or "Post", you are still providing information. And that site seems to run through a couple of redirects (seemingly building up a session as it goes). I still think your best bet is to go to the page where you have to enter the number, start a packet sniffer, enter a number, submit the form, find out exactly what the form submits, and recreate that (whether with get or post probably won't matter).
- 11-03-2010, 09:09 AM #24
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
Thanks for the tip. I will try that tomorrow. My brain batteries run out for tonight.
- 11-04-2010, 07:57 AM #25
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
Here is the first HTTP line from WireShark. There are other lines, but let's start with the first:
Protocol = HTTP
Info = GET /Parcel/SetSession.asp?pn=10101663&tcd=1&fpn=101-01-663%201&mry=2010&sec=True&rf=False&dp=S HTTP/1.1
Based on the above, I added the following to my code:
String msg = "pn=" + URLEncoder.encode("10101663");
msg += "&tcd=" + URLEncoder.encode("1");
msg += "&fpn=" + URLEncoder.encode("101-01-663%201");
msg += "&mry=" + URLEncoder.encode("2010");
msg += "&sec=" + URLEncoder.encode("True");
msg += "&rf=" + URLEncoder.encode("False");
msg += "&dp=" + URLEncoder.encode("S");
msg is what I ultimately send to the web server.
How would you translate the "/Parcel/SetSession.asp?" part into Java code ?
It takes a few seconds for the web server to come back with the information I need. Should I make my program sleep a few seconds, or is there a smarter way to detect when the web server is ready to serve me the information I need ?
Abdenour
Last edited by achab; 11-04-2010 at 08:09 AM.
- 11-04-2010, 08:05 AM #26
Senior Member
- Join Date
- Jun 2008
- Posts
- 2,366
- Rep Power
- 8
http://<server>/Parcel/SetSession.asp
With the rest of that as the part written to the outputstream in the post call.
As far as sleeping, no, simply attempt to open the inputstream and it will "wait" until it gets some data, or until the server closes it.
But this "rf=False$dp=S" is
Java Code:
msg += "&rf=" + URLEncoder.encode("False$dp=S");
and not
Java Code:
msg += "&rf=" + URLEncoder.encode("False");
msg += "&dp=" + URLEncoder.encode("S");
if that is really a "$" and not an "&".
BTW, that String is already URL encoded (%20 is " "), so there is no reason to call encode at all, and if you do, change
Java Code:
msg += "&fpn=" + URLEncoder.encode("101-01-663%201");
to
Java Code:
msg += "&fpn=" + URLEncoder.encode("101-01-663 1");
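The double-encoding point can be checked directly. The sketch below (the class name QueryEncoding is just for illustration) encodes the raw value and the already-encoded value with the two-argument URLEncoder.encode and prints what each one turns into.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryEncoding {
    // Encodes one raw (not yet encoded) value for a query string or form body.
    static String enc(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Raw value: the space becomes "+", which form decoding reads back as a space.
        System.out.println("fpn=" + enc("101-01-663 1"));   // fpn=101-01-663+1
        // Already-encoded value: "%" is escaped again, so the server would see a literal "%20".
        System.out.println("fpn=" + enc("101-01-663%201")); // fpn=101-01-663%25201
    }
}
```

So either paste the captured query string as-is, or start from the raw values and encode each one exactly once.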
- 11-04-2010, 08:21 AM #27
Senior Member
- Join Date
- Jun 2008
- Posts
- 2,366
- Rep Power
- 8
Or, if you want to use a get, simply use
Java Code:
"http://treasurer.maricopa.gov/Parcel/SetSession.asp?pn=10101663&tcd=1&fpn=101-01-663%201&mry=2010&sec=True&rf=False&dp=S"
as the url
- 11-04-2010, 08:25 AM #28
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
Oops ... it's rf=False&dp=S
I changed 101-01-663%201 to 101-01-663 1 in the write/post version.
With the new URL, the "get/read" version of the code no longer throws an exception. But, both the "get/read" and the "write/post followed by read" versions of the code give me an HTTP page source that contains "Maricopa County Parcel Inquiry", which is the page source you get if you go to treasurer.maricopa.gov/Parcel/Default.aspx and don't type anything.
As for the input stream waiting until it gets some data, when you do it manually, there is data there before you type anything. It just happens that the data I want is the one after you type, hit Submit, and wait a few seconds. Making the code sleep 5 seconds before getting in the InputStream doesn't solve the problem though.
Abdenour
- 11-04-2010, 08:34 AM #29
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
Here is the new full version of the "Get/Read" version of the code, which still doesn't work.
Java Code:
import java.io.*;
import java.util.*;
import java.net.*;

public class WebScanner {
    public static void main(String args[]) {
        try {
            // URL taxIDURL = new URL("http://treasurer.maricopa.gov/parcels/default.asp?Parcel=10101663");
            URL taxIDURL = new URL("http://treasurer.maricopa.gov/Parcel/SetSession.asp?pn=10101663&tcd=1&fpn=101-01-663%201&mry=2010&sec=True&rf=False&dp=S");
            HttpURLConnection conn = (HttpURLConnection) taxIDURL.openConnection();
            conn.setRequestProperty("Proxy-Authorization", "Basic ");
            conn.setRequestProperty("User-Agent", "MSIE");
            conn.setFollowRedirects(true);
            BufferedReader data = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            for ( int i=0 ; i<45 ; i++ ) {
                String line = data.readLine();
                System.out.println ("line = " + line);
            }
        } catch (MalformedURLException e) {
            System.out.println("Malformed URL: http://treasurer.maricopa.gov/parcels/default.asp?Parcel=10101663");
        } catch (IOException e) {
            System.out.println("scanTreasurersWebSite - Caugh Exception " + e.toString());
            e.printStackTrace();
        }
    }
}
Last edited by achab; 11-04-2010 at 08:52 AM.
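A side note on the read loop: readLine() blocks until data arrives and returns null at end of stream, so a fixed 45-iteration loop prints "line = null" for any page shorter than that and cuts off any page that is longer. A sketch of a loop that stops at end of stream instead, with a StringReader standing in for the connection's stream (the class name ReadAllLines is made up):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadAllLines {
    // Reads until readLine() returns null, which signals end of stream.
    static List<String> readAll(Reader source) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(source);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // The StringReader stands in for new InputStreamReader(conn.getInputStream()).
        for (String line : readAll(new StringReader("<html>\n<title>t</title>\n</html>"))) {
            System.out.println("line = " + line);
        }
    }
}
```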
- 11-04-2010, 08:44 AM #30
Senior Member
- Join Date
- Jun 2008
- Posts
- 2,366
- Rep Power
- 8
push "QUOTE" on one of my posts with code tags and you will see them.
- 11-04-2010, 08:54 AM #31
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
Thanks. I just did it. So the progress in the "Get/Read" version from yesterday is that there is no longer an exception thrown. The progress that needs to be made is to get the same HTTP one would get manually from typing:
treasurer.maricopa.gov/Parcel/SetSession.asp?pn=10101663&tcd=1&fpn=101-01-663%201&mry=2010&sec=True&rf=False&dp=S
into a web browser and waiting 1 second or so.
- 11-04-2010, 09:34 AM #32
Senior Member
- Join Date
- Jun 2008
- Posts
- 2,366
- Rep Power
- 8
When I try it I get a "server redirected too many times" error. The server seems to be actively attempting to block automated lookups. I have the feeling the problem is at the server side.
BTW
Java Code:
conn.setFollowRedirects(true);
should be
Java Code:
HttpURLConnection.setFollowRedirects(true);
and should be done before calling openConnection.
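For reference, setFollowRedirects is a static method that changes the default for connections created afterwards, while setInstanceFollowRedirects changes a single connection. A small sketch of the difference follows; the class name RedirectSettings is made up, and nothing is sent to the server because the request only goes out on connect() or getInputStream().

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class RedirectSettings {
    // Creates the connection object without contacting the server yet.
    static HttpURLConnection open(String url, boolean follow) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setInstanceFollowRedirects(follow); // per-connection setting
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Global default, picked up by connections created after this call.
        HttpURLConnection.setFollowRedirects(true);
        HttpURLConnection conn = open("http://treasurer.maricopa.gov/Parcel/Default.aspx", false);
        System.out.println(conn.getInstanceFollowRedirects()); // false
    }
}
```

Turning redirects off for one connection and printing getResponseCode() and the Location header is also a handy way to see each hop of a redirect chain by hand.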
- 11-04-2010, 06:32 PM #33
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
Thanks Masijade. I changed that. The problem still remains though. The output is the same: the page source of what you would get if you went to treasurer.maricopa.gov/Parcel/Default.aspx
I just reran WireShark, this time simulating the "Get/Read" version, that is, hitting enter on my browser's URL, in which I had typed treasurer.maricopa.gov/Parcel/SetSession.asp?pn=10101663&tcd=1&fpn=101-01-663%201&mry=2010&sec=True&rf=False&dp=S before starting to sniff.
Here is the Info for first HTTP and HTTP/XML lines in WireShark. I highlighted in Bold the few lines that I think are relevant:
GET /root.sxml HTTP/1.1
HTTP/1.1 200 OK
HTTP/1.1 200 OK
HTTP/1.1 200 OK
GET /WANIPConn1.xml HTTP/1.1
HTTP/1.1 200 OK
POST /sqm/windowsLive/sqmserver.dll HTTP/1.1
HTTP/1.1 403 .6 Forbidden (text/html)
GET /1reupdate/short?!/~Live.ConfigServer.SuiteUpdate/~/~/~/~/~op-GetShortCatalog-ship
HTTP/1.1 301 Moved Permanently
GET /capacity HTTP/1.1
HTTP/1.1 200 OK (text/plain)
GET /capacity HTTP/1.1
HTTP/1.1 200 OK (text/plain)
GET /1reupdate/short?!/~Live.ConfigServer.SuiteUpdate/~/~/~/~/~op-GetShortCatalog-ship/~ts-101104/~l-en/config.xml HTTP/1.1
HTTP/1.1 200 OK
GET /Parcel/SetSession.asp?pn=10101663&tcd=1&fpn=101-01-663%201&mry=2010&sec=True&rf=False&dp=S HTTP/1.1
HTTP/1.1 302 Object moved (text/html)
GET /Parcel/Summary.aspx HTTP/1.1
GET /1reupdate/short?!/~Live.ConfigServer.SuiteUpdate/~/~/~/~/~op-GetShortCatalog-ship
HTTP/1.1 200 OK (text/plain)
GET /1reupdate/short?!/~Live.ConfigServer.SuiteUpdate/~/~/~/~/~op-GetShortCatalog-ship/~ts-101104/~l-en/config.xml HTTP/1.1
HTTP/1.1 200 OK
GET /_utm.gif?utmwv=4.8.6&utmn=1516772251&utmhn=treasurer.maricopa.gov&utmcs=UTF-8&utmsr=1280x800&utmcs=32-bit&utmul=en-us&utmje=1&utmfl=..
I can't read the full last line above. How would you translate it into code ?
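In the capture, SetSession.asp answers with a 302 and the browser then requests /Parcel/Summary.aspx. One possible reason the Java version ends up on the default page is that the session set up by the first response is not carried over to the second request. This is an assumption: the capture lines shown here do not include headers, so it is not confirmed that the site tracks the session with a cookie. If it does, note that HttpURLConnection only stores and resends cookies once a CookieHandler is installed. The sketch below installs one and simulates the exchange without any network traffic; the class name SessionCookies and the cookie name and value are made up.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SessionCookies {
    // Installs a JVM-wide cookie store; HttpURLConnection consults it on every request.
    static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    public static void main(String[] args) throws IOException {
        CookieManager manager = install();

        // Simulate the first response setting a session cookie (hypothetical name and value).
        Map<String, List<String>> response = new HashMap<String, List<String>>();
        response.put("Set-Cookie", Arrays.asList("EXAMPLESESSION=abc123; path=/"));
        manager.put(URI.create("http://treasurer.maricopa.gov/Parcel/SetSession.asp"), response);

        // Ask which cookies would be sent with a request to the redirect target on the same host.
        Map<String, List<String>> request = manager.get(
                URI.create("http://treasurer.maricopa.gov/Parcel/Summary.aspx"),
                Collections.<String, List<String>>emptyMap());
        System.out.println(request.get("Cookie"));
    }
}
```

In a real program only the install() call is needed, before the first openConnection().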
- 11-04-2010, 07:52 PM #34
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
It seems like javascript:WebForm_DoPostBackWithOptions is a standard JavaScript method/function, not something defined in an included .js file. Is there an easy way to integrate Java with JavaScript, so that my code calls javascript:WebForm_DoPostBackWithOptions ?
Abdenour
- 11-05-2010, 06:31 AM #35
Senior Member
- Join Date
- Jun 2008
- Posts
- 2,366
- Rep Power
- 8
No, it's not a standard. If you look at the source of the page where you enter the number it is contained at the top of the page between SCRIPT tags.
- 11-05-2010, 06:45 AM #36
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
I have done a search through Ctrl-F, and don't see it. Here are the script tags:
Java Code:
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
//]]>
</script>
.......
Java Code:
<script src="/Parcel/WebResource.axd?d=82X_DmeRwShoLJobNHsQnbVOlScs8_yWxC9DCuLI2YS__W5fmnbySsuN6RVESgJ05_dGX5bZ3fQBxx2amMPSDiePiBQ1&t=634234388291823091" type="text/javascript"></script>
.................
Java Code:
<script src="/Parcel/WebResource.axd?d=0lTb8rR5d7hmnuOluNU1bbXvkHvoFQ_jnwji6bwN6BWAUT5z66CJqXrS0vVoz3RIP0OL323hu9vmTevPzdVyoW8A-7U1&t=634234388291823091" type="text/javascript"></script>
................
Java Code:
<script type="text/javascript">
//<![CDATA[
WebForm_AutoFocus('ctl00_cphMainContent_txtParcelNumber');//]]>
</script>
</form>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-2900680-1");
pageTracker._initData();
pageTracker._trackPageview();
</script>
I am now trying to get the thing to work using the HttpClient class in org.apache.*, but am running into a stupid "package org.apache..... does not exist", even though my CLASSPATH includes the directory where org is.
I posted that issue in a separate thread under the "New to Java" category, at
package org.apache.httpcomponents.httpclient.debian does not ex
Abdenour
- 11-05-2010, 06:49 AM #37
Senior Member
- Join Date
- Jun 2008
- Posts
- 2,366
- Rep Power
- 8
You see those two script tags with the "src" param? Those are javascript files.
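To read one of those files, its src has to be turned into a full URL first, since it is given relative to the site root. A small sketch using java.net.URI follows; the class name ScriptUrl is made up, and the d= and t= values are placeholders rather than the real tokens from the page.

```java
import java.net.URI;

class ScriptUrl {
    // Resolves a script's src attribute against the URL of the page that contains it.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // Placeholder query values, not the real ones from the page source.
        System.out.println(resolve(
                "http://treasurer.maricopa.gov/Parcel/Default.aspx",
                "/Parcel/WebResource.axd?d=EXAMPLE&t=0"));
        // http://treasurer.maricopa.gov/Parcel/WebResource.axd?d=EXAMPLE&t=0
    }
}
```

The resulting URL can be opened in a browser or fetched like any other page, and the text searched for WebForm_DoPostBackWithOptions.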
- 11-05-2010, 06:54 AM #38
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
- 11-06-2010, 01:21 AM #39
Member
- Join Date
- Nov 2010
- Posts
- 26
- Rep Power
- 0
Here is how JavaScript changes the web page from the one titled "Maricopa County Parcel Inquiry" to a page titled "Maricopa County Tax History". The JavaScript function below doesn't give me the data I want, but goes farther than I managed to accomplish in Java. In Java, I have not been able to get a web page titled "Maricopa County Tax History".
new WebForm_PostBackOptions("ctl00$cphMainContent$btnSubmit", "", true, "", "", false, false)
Java Code:
function WebForm_PostBackOptions(eventTarget, eventArgument, validation, validationGroup, actionUrl, trackFocus, clientSubmit) {
    this.eventTarget = eventTarget;
    this.eventArgument = eventArgument;
    this.validation = validation;
    this.validationGroup = validationGroup;
    this.actionUrl = actionUrl;
    this.trackFocus = trackFocus;
    this.clientSubmit = clientSubmit;
}
The arguments passed to WebForm_PostBackOptions are:
actionURL = ""
clientSubmit = false
eventArgument = ""
eventTarget = "ctl00$cphMainContent$btnSubmit"
trackFocus = false
validation = true
validationGroup = ""
I don't know much about JavaScript. Is the keyword new, coupled with ctl00$cphMainContent$btnSubmit, what is creating the new page titled "Maricopa County Tax History" ? If so, and if all I wanted in Java was to get that far, how do I accomplish in Java the equivalent of what new/ctl00$cphMainContent$btnSubmit did in JavaScript ?
Abdenour
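On the JavaScript side, new only builds an object that holds those seven values; it does not load a page by itself. The page changes when the form is submitted, and the __doPostBack function in the page source shows what a script-driven submit does: it fills in the __EVENTTARGET and __EVENTARGUMENT fields and calls theForm.submit(). A rough Java equivalent is therefore a POST of the form's fields. The sketch below only builds the request body. Several things in it are assumptions: the class name PostbackForm, the text box's field name (derived from the id passed to WebForm_AutoFocus), and the button's value "Submit". Any other hidden fields in the form would need to be copied from the page source as well.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class PostbackForm {
    // Encodes name=value pairs as an application/x-www-form-urlencoded body.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        // The two fields that __doPostBack sets before calling theForm.submit().
        fields.put("__EVENTTARGET", "");
        fields.put("__EVENTARGUMENT", "");
        // Assumed names and values for the text box and the submit button.
        fields.put("ctl00$cphMainContent$txtParcelNumber", "10101663");
        fields.put("ctl00$cphMainContent$btnSubmit", "Submit");
        System.out.println(encode(fields));
    }
}
```

The resulting string is what would be written to the connection's output stream in the post version of the code.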