Results 1 to 12 of 12
Thread: Percentile by date
- 06-14-2010, 05:48 PM #1
Member
- Join Date
- Jun 2010
- Posts
- 5
- Rep Power
- 0
Percentile by date
Hi all,
I have just started learning Java (reading Sun's tutorial) because I have been working on a project in Microsoft Access & Excel which those applications cannot seem to handle. I have seen some Java programs before and think Java could more efficiently address my problem, but I currently have no idea what I'm doing and would very much appreciate some help.
I have a table in Access consisting of about 1.5 million records which I export to a csv file (called ComboOut.csv which I'm hoping to use as input for the Java program). The fields of interest include "Date", "Metric1", and "Metric2". Date is in the format YYYYMMDD. Metric1 and Metric2 are numbers where a higher number is a better "score" of sorts. For each date, there will be several thousand records. The table is currently NOT sorted by any field, but could be if necessary. Metric1 and Metric2 will often contain blanks for certain IDs on certain dates (e.g. for 20071231, of 1,400 records, 1,200 IDs might have a score for Metric1, 1,300 might have a score for Metric2, and 100 might be blank for both scores).
I am trying to append two new fields, "Pntl_Met1" and "Pntl_Met2" which would be the percentile of Metric1 and Metric2, respectively (if it's of help, the Excel function I'm somewhat trying to mimic is PERCENTRANK). This percentile would be date-dependent and would ignore blanks. The formula I'm trying to use is basically [RANK(MetricX for ID xxxxxx on Date YYYYMMDD relative to all other non-blank MetricXs for the same date) - 1]/[Number of non-blank MetricX records on Date YYYYMMDD - 1]. I hope that explanation was somewhat inteligible.
Any help on this would be very much appreciated. Please let me know if I can clarify any further. Many thanks.
- 06-14-2010, 06:13 PM #2
Member
- Join Date
- Jun 2010
- Location
- Berlin
- Posts
- 22
- Rep Power
- 0
Well, while you've just started it's not that easy to give you an advise where to start.
The most important thing to get an application developed is, to break down the one big problem into parts that are small and easy to solve. Afterwards you just have to solve these small parts and assemble all together to get a good solution.
So let's have a look on your problem:
1. You want to read some textfile
2. You want to split each line into the fields date, metricX and metricY
3. You want to sum all metric<X|Y> values of one date
4. You want to compute some rank
5. You want to write back the data
That's pretty much it, isn't it?
I would suggest you start at item 1. and just lookup how to do this. There should be a lot of tutorials available. While I/O is somehow a complex theme (some techniques like exception handling, streaming and buffers might be included) feel free to ask anything that you don't get. Although it might be quite easy to post the whole code, I think it is worth to get this done by yourself, while this is really a good task to get started with Java!
- 06-14-2010, 09:15 PM #3
Member
- Join Date
- Jun 2010
- Posts
- 5
- Rep Power
- 0
HerrK. -
Thanks for the reply. I think you're underestimating how little I know about programming. I have a few files of java code that an intern wrote for me a few summers ago. He taught me how to compile the files and how to run them, but that's really it (I'm just a financial analyst who wanted to improve the efficiency of a few processes I run). I've never programmed myself, though I clearly should take a class or two.
Anyhow, from looking at the codes he wrote and a few tutorials, I think I understand how to do #1 and #5. As for the rest, I really have no idea how to approach this. Any tips would be great. Thanks.
- 06-14-2010, 09:48 PM #4
For splitting a line look at the String.split() method in the API doc. And do a search for coding examples.
Then to convert String to numbers, look at the Integer and Double classes for parse methods.
- 06-15-2010, 09:42 AM #5
Moderator
- Join Date
- Apr 2009
- Posts
- 10,484
- Rep Power
- 16
How long have you got?
If this is supposed to be a quick job then it's possible Java isn't the best solution, since it appears you'll have to leanr programming on top of Java. Java really requires some form of grounding in programming.
- 06-15-2010, 03:13 PM #6
Member
- Join Date
- Jun 2010
- Posts
- 5
- Rep Power
- 0
Tolls -
I'm trying to finish this week. You're right, I'm trying to learn programming and Java simultaneously...it's a bit overwhelming.
Here's what I have so far:
I'm not sure if I'm on the right track or not. Regardless, I'm pretty stuck right now, so any help would be great.Java Code:import java.lang.*; import java.io.*; import java.util.*; class PercentileCalc { public static void main(String args[]) { if (args.length < 1) {System.out.println("proper usage: java PercentileCalc inputFile"); System.exit(1);} String inputFile = args[0]; try { BufferedReader in = new BufferedReader(new FileReader(inputFile)); BufferedWriter bw1 = new BufferedWriter(new FileWriter("PercentileOut.csv")); String s = in.readLine(); bw1.write(s + ",Pntl_Met1,Pntl_Met2"); bw1.newLine(); s = in.readLine(); while (s != null) { String[] data = s.split(","); // code here; System.out.println(s); bw1.write(s + "," + pntlMet1 + "," + pntlMet2); bw1.newLine(); s = in.readLine(); } in.close(); bw1.close(); } catch(IOException error) { System.out.println(error); System.exit(2); } } }
Thanks!
- 06-15-2010, 03:16 PM #7
A help would be to show what the program currently does and explain what is wrong with it and what you want instead.
- 06-15-2010, 03:24 PM #8
Moderator
- Join Date
- Apr 2009
- Posts
- 10,484
- Rep Power
- 16
OK, looking at HerrK's reasonable breakdown you've done 1, part of 2 (split, but not assigned yet), and 5.
The structure looks OK, from what I can gather of your requirements...it's also better than a lot of things we get to see here.
So you now need to figure out which bits of the String[] you need, and do whatever it is you need to do to them.
ETA: Oh, one thing. Generally resources (like the reader and writer you're using) should be closed in a finally block of a try/catch. That ensures they get closed. so:
Java Code:BufferedReader br = null; BufferedWriter bw = null; try { } catch () { // Print the stacktrace in here at a minimum. } finally { if (br != null) { //close in its own try/catch. } if (bw != null) { // ditto. } }Last edited by Tolls; 06-15-2010 at 03:27 PM.
- 06-15-2010, 03:32 PM #9
Member
- Join Date
- Jun 2010
- Posts
- 5
- Rep Power
- 0
Norm,
Thanks for the reply. Right now, I don't think it really does anything. Where I've put // code here; is where I'm expecting to calculate pntlMet1 and pntlMet2. Unfortunately, I think the way I'm reading the file in might not allow me to calculate pntlMet1 and pntlMet2 (as I'll need the entire array - or at least some subset of the array - in order to rank and count for the date). Basically, I'm not sure how to access the necessary portion of the array. Once I can access the necessary portion of the array, I *think* I'll be able to run the calculation. Apologies for my lack of coherence...I'm fairly confused right now.
- 06-15-2010, 03:37 PM #10
For debugging Print out the contents of the data array so you can understand what's there.
I think the Arrays.toString(data) will do it.
- 06-15-2010, 03:37 PM #11
Moderator
- Join Date
- Apr 2009
- Posts
- 10,484
- Rep Power
- 16
Oh.
Half the battle is figuring out what needs to be done.
That is, figuring out your requirements.
Until you've got that then we will have trouble helping you.
I don't know the Excel stuff you're referring to, so I can't say what need doing.
- 06-15-2010, 04:28 PM #12
Member
- Join Date
- Jun 2010
- Posts
- 5
- Rep Power
- 0
Norm & Tolls -
Thank you both for your replies and patience. I'm going to try to post some examples (that I did in Excel) to hopefully help me explain what I'm trying to do (apologies I'm having such trouble articulating this). Here is the data file I'm reading in/intending to work on:
And here is the desired output after running the program:Java Code:Date,Company,Metric1,Metric2 20071231,0010,10.07085121,64.20815943 20071231,0004,13.1092769,23.15186119 20071231,0016,16.09019481,15.96625588 20071231,0012,23.03682051,9.587661338 20071231,0007,25.8556977, 20071231,0009,30.85260815,17.10465687 20071231,0013,36.32942237,40.74556526 20071231,0006,50.16859251,63.80327017 20071231,0002,50.70576746,42.28305485 20071231,0015,50.83626351,42.56306583 20071231,0019,54.33077808,15.96116427 20071231,0014,55.10938229,34.5599827 20071231,0003,55.89285515,40.77108696 20071231,0017,60.3381419,55.83834781 20071231,0020,65.60747367,50.91258182 20071231,0001,66.49241283,16.46791153 20071231,0011,69.72067182,40.05724068 20071231,0018,, 20071231,0005,, 20071231,0008,, 20081231,0009,1.965026802,49.04228993 20081231,0013,4.763362931,26.74176994 20081231,0010,9.785422608,34.37196992 20081231,0020,22.97690384,56.65950917 20081231,0007,23.4541784, 20081231,0011,24.94277515,7.7721929 20081231,0014,34.11500239,59.68550482 20081231,0001,35.30451095,34.83260328 20081231,0017,37.34781662,31.97472359 20081231,0019,39.84911186,64.52047971 20081231,0016,48.21062048,12.46950378 20081231,0005,52.20484184,68.69450404 20081231,0015,53.67857535,22.02146368 20081231,0006,58.56011078,32.21944042 20081231,0018,66.35641082,10.28161706 20081231,0008,67.25302768,56.40646462 20081231,0002,67.63581224,5.650759421 20081231,0003,67.7130615,19.59578098 20081231,0012,, 20081231,0004,,60.46363235
In terms of the #N/A errors in the output, this is from Excel. Any non-numeric output for these errors would be fine.Java Code:Date,Company,Metric1,Metric2,Pntl_Met1,Pntl_Met2, 20071231,0010,10.07085121,64.20815943,0,1, 20071231,0004,13.1092769,23.15186119,0.062,0.333, 20071231,0016,16.09019481,15.96625588,0.125,0.133, 20071231,0012,23.03682051,9.587661338,0.187,0, 20071231,0007,25.8556977,,0.25,#N/A, 20071231,0009,30.85260815,17.10465687,0.312,0.266, 20071231,0013,36.32942237,40.74556526,0.375,0.533, 20071231,0006,50.16859251,63.80327017,0.437,0.933, 20071231,0002,50.70576746,42.28305485,0.5,0.666, 20071231,0015,50.83626351,42.56306583,0.562,0.733, 20071231,0019,54.33077808,15.96116427,0.625,0.066, 20071231,0014,55.10938229,34.5599827,0.687,0.4, 20071231,0003,55.89285515,40.77108696,0.75,0.6, 20071231,0017,60.3381419,55.83834781,0.812,0.866, 20071231,0020,65.60747367,50.91258182,0.875,0.8, 20071231,0001,66.49241283,16.46791153,0.937,0.2, 20071231,0011,69.72067182,40.05724068,1,0.466, 20071231,0018,,,#N/A,#N/A, 20071231,0005,,,#N/A,#N/A, 20071231,0008,,,#N/A,#N/A, 20081231,0009,1.965026802,49.04228993,0,0.647, 20081231,0013,4.763362931,26.74176994,0.058,0.352, 20081231,0010,9.785422608,34.37196992,0.117,0.529, 20081231,0020,22.97690384,56.65950917,0.176,0.764, 20081231,0007,23.4541784,,0.235,#N/A, 20081231,0011,24.94277515,7.7721929,0.294,0.058, 20081231,0014,34.11500239,59.68550482,0.352,0.823, 20081231,0001,35.30451095,34.83260328,0.411,0.588, 20081231,0017,37.34781662,31.97472359,0.47,0.411, 20081231,0019,39.84911186,64.52047971,0.529,0.941, 20081231,0016,48.21062048,12.46950378,0.588,0.176, 20081231,0005,52.20484184,68.69450404,0.647,1, 20081231,0015,53.67857535,22.02146368,0.705,0.294, 20081231,0006,58.56011078,32.21944042,0.764,0.47, 20081231,0018,66.35641082,10.28161706,0.823,0.117, 20081231,0008,67.25302768,56.40646462,0.882,0.705, 20081231,0002,67.63581224,5.650759421,0.941,0, 20081231,0003,67.7130615,19.59578098,1,0.235, 20081231,0012,,,#N/A,#N/A, 20081231,0004,,60.46363235,#N/A,0.882
I tried the line System.out.println(Arrays.toString(data)); and I think it has alerted me to a problem (maybe?). When I look at the output in the command prompt screen, I have a string for each row of my input above. That is:
I think (maybe) that rather than a string for each row, I need to read the file in as a matrix (perhaps I am and just don't realize it). Regardless, what I need to do once I correctly have the data read in is to work with one subgroup (by date) at a time. My example above has two subgroups - 20071231 and 20081231. My actual data has several hundred of these subgroups.Java Code:[20071231,0010,10.07085121,64.20815943] [20071231,0004,13.1092769,23.15186119] [20071231,0016,16.09019481,15.96625588] [20071231,0012,23.03682051,9.587661338] etc.
To hopefully clarify a bit more by example: in the 4th row of my desired output data above, the 5th column is 0.125. This value was calculated by the following steps:
1. I restricted the data considered to only those rows where the date was 20071231.So, I know which operations I want to do to my data, but I don't know how to tell Java to:
2. For this subset, I ranked the value in column 3 (16.09019481) relative to all other *values* in column 3 for the subset of data I was considering. That is, of the 20 rows pertaining to date = 20071231, 17 have values (3 are blank). The value 16.09019481 ranks 3 in this subset.
3. I subtract 1 from the rank, yielding 2.
4. I count all of the rows where date = 20071231 and the third column contains a value (i.e. the 17 mentioned in step 2).
5. I subtract 1 from the count, yielding 16.
6. I divide the answer in step 3 by the answer in step 5: 2/16 = 0.125. This value populates column 5 in my output file.
1. Restrict the data considered to rows where the date equals the date in the row under consideration.I'm still playing with this the best I can, but your help is very much appreciated.
2. Ignore rows where column 3 is blank (when calculating Pntl_Met1...when we calculate Pntl_Met2, rows where column 4 are blank should be ignored instead).
3. Rank and count the appropriate data.
Thanks.
Similar Threads
-
want the first date and last date of each month
By javastuden in forum New To JavaReplies: 3Last Post: 02-24-2010, 10:31 AM -
julian date to full date format
By judy318 in forum New To JavaReplies: 7Last Post: 11-02-2009, 12:17 PM -
Compare date input to database with current date
By hungleon88 in forum Advanced JavaReplies: 2Last Post: 11-25-2008, 08:10 AM -
Creating a Gregorian Calendar using a Date object gives date - 1
By prachi_goliwadekar in forum New To JavaReplies: 1Last Post: 05-08-2008, 08:32 PM -
Difference between current date and anothe date
By vijay balusamy in forum New To JavaReplies: 1Last Post: 04-16-2008, 04:15 PM


LinkBack URL
About LinkBacks
Reply With Quote
Bookmarks