Results 1 to 12 of 12
  1. #1
    Join Date
    Jun 2010
    Posts
    5
    Rep Power
    0

    Default Percentile by date

    Hi all,

    I have just started learning Java (reading Sun's tutorial) because I have been working on a project in Microsoft Access & Excel which those applications cannot seem to handle. I have seen some Java programs before and think Java could more efficiently address my problem, but I currently have no idea what I'm doing and would very much appreciate some help.

    I have a table in Access consisting of about 1.5 million records which I export to a csv file (called ComboOut.csv which I'm hoping to use as input for the Java program). The fields of interest include "Date", "Metric1", and "Metric2". Date is in the format YYYYMMDD. Metric1 and Metric2 are numbers where a higher number is a better "score" of sorts. For each date, there will be several thousand records. The table is currently NOT sorted by any field, but could be if necessary. Metric1 and Metric2 will often contain blanks for certain IDs on certain dates (e.g. for 20071231, of 1,400 records, 1,200 IDs might have a score for Metric1, 1,300 might have a score for Metric2, and 100 might be blank for both scores).

    I am trying to append two new fields, "Pntl_Met1" and "Pntl_Met2" which would be the percentile of Metric1 and Metric2, respectively (if it's of help, the Excel function I'm somewhat trying to mimic is PERCENTRANK). This percentile would be date-dependent and would ignore blanks. The formula I'm trying to use is basically [RANK(MetricX for ID xxxxxx on Date YYYYMMDD relative to all other non-blank MetricXs for the same date) - 1]/[Number of non-blank MetricX records on Date YYYYMMDD - 1]. I hope that explanation was somewhat inteligible.

    Any help on this would be very much appreciated. Please let me know if I can clarify any further. Many thanks.

  2. #2
    HerrK. is offline Member
    Join Date
    Jun 2010
    Location
    Berlin
    Posts
    22
    Rep Power
    0

    Default

    Well, while you've just started it's not that easy to give you an advise where to start.

    The most important thing to get an application developed is, to break down the one big problem into parts that are small and easy to solve. Afterwards you just have to solve these small parts and assemble all together to get a good solution.

    So let's have a look on your problem:

    1. You want to read some textfile
    2. You want to split each line into the fields date, metricX and metricY
    3. You want to sum all metric<X|Y> values of one date
    4. You want to compute some rank
    5. You want to write back the data

    That's pretty much it, isn't it?

    I would suggest you start at item 1. and just lookup how to do this. There should be a lot of tutorials available. While I/O is somehow a complex theme (some techniques like exception handling, streaming and buffers might be included) feel free to ask anything that you don't get. Although it might be quite easy to post the whole code, I think it is worth to get this done by yourself, while this is really a good task to get started with Java!

  3. #3
    Join Date
    Jun 2010
    Posts
    5
    Rep Power
    0

    Default

    HerrK. -

    Thanks for the reply. I think you're underestimating how little I know about programming. I have a few files of java code that an intern wrote for me a few summers ago. He taught me how to compile the files and how to run them, but that's really it (I'm just a financial analyst who wanted to improve the efficiency of a few processes I run). I've never programmed myself, though I clearly should take a class or two.

    Anyhow, from looking at the codes he wrote and a few tutorials, I think I understand how to do #1 and #5. As for the rest, I really have no idea how to approach this. Any tips would be great. Thanks.

  4. #4
    Norm's Avatar
    Norm is online now Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,559
    Rep Power
    25

    Default

    For splitting a line look at the String.split() method in the API doc. And do a search for coding examples.
    Then to convert String to numbers, look at the Integer and Double classes for parse methods.

  5. #5
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,090
    Rep Power
    20

    Default

    How long have you got?
    If this is supposed to be a quick job then it's possible Java isn't the best solution, since it appears you'll have to leanr programming on top of Java. Java really requires some form of grounding in programming.

  6. #6
    Join Date
    Jun 2010
    Posts
    5
    Rep Power
    0

    Default

    Tolls -

    I'm trying to finish this week. You're right, I'm trying to learn programming and Java simultaneously...it's a bit overwhelming.

    Here's what I have so far:

    Java Code:
    import java.lang.*;
    import java.io.*;
    import java.util.*;
    
    
    
    class PercentileCalc
    {
    
    	public static void main(String args[])
    	{
    
    		if (args.length < 1) {System.out.println("proper usage:  java PercentileCalc inputFile"); System.exit(1);}
    
    		String inputFile = args[0];
    		
    
    		try
    		{
    			BufferedReader in = new BufferedReader(new FileReader(inputFile));
    			BufferedWriter bw1 = new BufferedWriter(new FileWriter("PercentileOut.csv"));
    			
    			String s = in.readLine();
    			bw1.write(s + ",Pntl_Met1,Pntl_Met2");
    			bw1.newLine();			
    
    			s = in.readLine();
    			while (s != null) 
    			{
    				String[] data = s.split(",");
    				
    				// code here;
    
    				System.out.println(s);				
    
    				bw1.write(s + "," + pntlMet1 + "," + pntlMet2);
    				bw1.newLine();
    				s = in.readLine();
    			}
    			in.close();
    			bw1.close();
    
    			
    		}
    		catch(IOException error)
    		{
    			System.out.println(error);
    			System.exit(2);
    		}
    
    
    	}
    
    }
    I'm not sure if I'm on the right track or not. Regardless, I'm pretty stuck right now, so any help would be great.

    Thanks!

  7. #7
    Norm's Avatar
    Norm is online now Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,559
    Rep Power
    25

    Default

    A help would be to show what the program currently does and explain what is wrong with it and what you want instead.

  8. #8
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,090
    Rep Power
    20

    Default

    OK, looking at HerrK's reasonable breakdown you've done 1, part of 2 (split, but not assigned yet), and 5.

    The structure looks OK, from what I can gather of your requirements...it's also better than a lot of things we get to see here.

    So you now need to figure out which bits of the String[] you need, and do whatever it is you need to do to them.

    ETA: Oh, one thing. Generally resources (like the reader and writer you're using) should be closed in a finally block of a try/catch. That ensures they get closed. so:

    Java Code:
    BufferedReader br = null;
    BufferedWriter bw = null;
    try {
    } catch () {
        // Print the stacktrace in here at a minimum.
    } finally {
        if (br != null) {
            //close in its own try/catch.
        }
        if (bw != null) {
            // ditto.
        }
    }
    Last edited by Tolls; 06-15-2010 at 03:27 PM.

  9. #9
    Join Date
    Jun 2010
    Posts
    5
    Rep Power
    0

    Default

    Norm,

    Thanks for the reply. Right now, I don't think it really does anything. Where I've put // code here; is where I'm expecting to calculate pntlMet1 and pntlMet2. Unfortunately, I think the way I'm reading the file in might not allow me to calculate pntlMet1 and pntlMet2 (as I'll need the entire array - or at least some subset of the array - in order to rank and count for the date). Basically, I'm not sure how to access the necessary portion of the array. Once I can access the necessary portion of the array, I *think* I'll be able to run the calculation. Apologies for my lack of coherence...I'm fairly confused right now.

  10. #10
    Norm's Avatar
    Norm is online now Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,559
    Rep Power
    25

    Default

    For debugging Print out the contents of the data array so you can understand what's there.
    I think the Arrays.toString(data) will do it.

  11. #11
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,090
    Rep Power
    20

    Default

    Oh.
    Half the battle is figuring out what needs to be done.
    That is, figuring out your requirements.
    Until you've got that then we will have trouble helping you.

    I don't know the Excel stuff you're referring to, so I can't say what need doing.

  12. #12
    Join Date
    Jun 2010
    Posts
    5
    Rep Power
    0

    Default

    Norm & Tolls -

    Thank you both for your replies and patience. I'm going to try to post some examples (that I did in Excel) to hopefully help me explain what I'm trying to do (apologies I'm having such trouble articulating this). Here is the data file I'm reading in/intending to work on:

    Java Code:
    Date,Company,Metric1,Metric2
    20071231,0010,10.07085121,64.20815943
    20071231,0004,13.1092769,23.15186119
    20071231,0016,16.09019481,15.96625588
    20071231,0012,23.03682051,9.587661338
    20071231,0007,25.8556977,
    20071231,0009,30.85260815,17.10465687
    20071231,0013,36.32942237,40.74556526
    20071231,0006,50.16859251,63.80327017
    20071231,0002,50.70576746,42.28305485
    20071231,0015,50.83626351,42.56306583
    20071231,0019,54.33077808,15.96116427
    20071231,0014,55.10938229,34.5599827
    20071231,0003,55.89285515,40.77108696
    20071231,0017,60.3381419,55.83834781
    20071231,0020,65.60747367,50.91258182
    20071231,0001,66.49241283,16.46791153
    20071231,0011,69.72067182,40.05724068
    20071231,0018,,
    20071231,0005,,
    20071231,0008,,
    20081231,0009,1.965026802,49.04228993
    20081231,0013,4.763362931,26.74176994
    20081231,0010,9.785422608,34.37196992
    20081231,0020,22.97690384,56.65950917
    20081231,0007,23.4541784,
    20081231,0011,24.94277515,7.7721929
    20081231,0014,34.11500239,59.68550482
    20081231,0001,35.30451095,34.83260328
    20081231,0017,37.34781662,31.97472359
    20081231,0019,39.84911186,64.52047971
    20081231,0016,48.21062048,12.46950378
    20081231,0005,52.20484184,68.69450404
    20081231,0015,53.67857535,22.02146368
    20081231,0006,58.56011078,32.21944042
    20081231,0018,66.35641082,10.28161706
    20081231,0008,67.25302768,56.40646462
    20081231,0002,67.63581224,5.650759421
    20081231,0003,67.7130615,19.59578098
    20081231,0012,,
    20081231,0004,,60.46363235
    And here is the desired output after running the program:

    Java Code:
    Date,Company,Metric1,Metric2,Pntl_Met1,Pntl_Met2,
    20071231,0010,10.07085121,64.20815943,0,1,
    20071231,0004,13.1092769,23.15186119,0.062,0.333,
    20071231,0016,16.09019481,15.96625588,0.125,0.133,
    20071231,0012,23.03682051,9.587661338,0.187,0,
    20071231,0007,25.8556977,,0.25,#N/A,
    20071231,0009,30.85260815,17.10465687,0.312,0.266,
    20071231,0013,36.32942237,40.74556526,0.375,0.533,
    20071231,0006,50.16859251,63.80327017,0.437,0.933,
    20071231,0002,50.70576746,42.28305485,0.5,0.666,
    20071231,0015,50.83626351,42.56306583,0.562,0.733,
    20071231,0019,54.33077808,15.96116427,0.625,0.066,
    20071231,0014,55.10938229,34.5599827,0.687,0.4,
    20071231,0003,55.89285515,40.77108696,0.75,0.6,
    20071231,0017,60.3381419,55.83834781,0.812,0.866,
    20071231,0020,65.60747367,50.91258182,0.875,0.8,
    20071231,0001,66.49241283,16.46791153,0.937,0.2,
    20071231,0011,69.72067182,40.05724068,1,0.466,
    20071231,0018,,,#N/A,#N/A,
    20071231,0005,,,#N/A,#N/A,
    20071231,0008,,,#N/A,#N/A,
    20081231,0009,1.965026802,49.04228993,0,0.647,
    20081231,0013,4.763362931,26.74176994,0.058,0.352,
    20081231,0010,9.785422608,34.37196992,0.117,0.529,
    20081231,0020,22.97690384,56.65950917,0.176,0.764,
    20081231,0007,23.4541784,,0.235,#N/A,
    20081231,0011,24.94277515,7.7721929,0.294,0.058,
    20081231,0014,34.11500239,59.68550482,0.352,0.823,
    20081231,0001,35.30451095,34.83260328,0.411,0.588,
    20081231,0017,37.34781662,31.97472359,0.47,0.411,
    20081231,0019,39.84911186,64.52047971,0.529,0.941,
    20081231,0016,48.21062048,12.46950378,0.588,0.176,
    20081231,0005,52.20484184,68.69450404,0.647,1,
    20081231,0015,53.67857535,22.02146368,0.705,0.294,
    20081231,0006,58.56011078,32.21944042,0.764,0.47,
    20081231,0018,66.35641082,10.28161706,0.823,0.117,
    20081231,0008,67.25302768,56.40646462,0.882,0.705,
    20081231,0002,67.63581224,5.650759421,0.941,0,
    20081231,0003,67.7130615,19.59578098,1,0.235,
    20081231,0012,,,#N/A,#N/A,
    20081231,0004,,60.46363235,#N/A,0.882
    In terms of the #N/A errors in the output, this is from Excel. Any non-numeric output for these errors would be fine.

    I tried the line System.out.println(Arrays.toString(data)); and I think it has alerted me to a problem (maybe?). When I look at the output in the command prompt screen, I have a string for each row of my input above. That is:

    Java Code:
    [20071231,0010,10.07085121,64.20815943]
    [20071231,0004,13.1092769,23.15186119]
    [20071231,0016,16.09019481,15.96625588]
    [20071231,0012,23.03682051,9.587661338]
    etc.
    I think (maybe) that rather than a string for each row, I need to read the file in as a matrix (perhaps I am and just don't realize it). Regardless, what I need to do once I correctly have the data read in is to work with one subgroup (by date) at a time. My example above has two subgroups - 20071231 and 20081231. My actual data has several hundred of these subgroups.

    To hopefully clarify a bit more by example: in the 4th row of my desired output data above, the 5th column is 0.125. This value was calculated by the following steps:

    1. I restricted the data considered to only those rows where the date was 20071231.

    2. For this subset, I ranked the value in column 3 (16.09019481) relative to all other *values* in column 3 for the subset of data I was considering. That is, of the 20 rows pertaining to date = 20071231, 17 have values (3 are blank). The value 16.09019481 ranks 3 in this subset.

    3. I subtract 1 from the rank, yielding 2.

    4. I count all of the rows where date = 20071231 and the third column contains a value (i.e. the 17 mentioned in step 2).

    5. I subtract 1 from the count, yielding 16.

    6. I divide the answer in step 3 by the answer in step 5: 2/16 = 0.125. This value populates column 5 in my output file.
    So, I know which operations I want to do to my data, but I don't know how to tell Java to:

    1. Restrict the data considered to rows where the date equals the date in the row under consideration.

    2. Ignore rows where column 3 is blank (when calculating Pntl_Met1...when we calculate Pntl_Met2, rows where column 4 are blank should be ignored instead).

    3. Rank and count the appropriate data.
    I'm still playing with this the best I can, but your help is very much appreciated.

    Thanks.

Similar Threads

  1. want the first date and last date of each month
    By javastuden in forum New To Java
    Replies: 3
    Last Post: 02-24-2010, 10:31 AM
  2. julian date to full date format
    By judy318 in forum New To Java
    Replies: 7
    Last Post: 11-02-2009, 12:17 PM
  3. Compare date input to database with current date
    By hungleon88 in forum Advanced Java
    Replies: 2
    Last Post: 11-25-2008, 08:10 AM
  4. Creating a Gregorian Calendar using a Date object gives date - 1
    By prachi_goliwadekar in forum New To Java
    Replies: 1
    Last Post: 05-08-2008, 08:32 PM
  5. Difference between current date and anothe date
    By vijay balusamy in forum New To Java
    Replies: 1
    Last Post: 04-16-2008, 04:15 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •