  1. #1
    Ndt
    Ndt is offline Member
    Join Date
    Jun 2008
    Posts
    6
    Rep Power
    0

    Default How can I improve the execution time of a Java Project

    Hi,

    I have a Java project which does the following:

    1> Connects to DB2 to collect the account information (here I use the DKDDO method) and puts it into an ArrayList, which holds 51000 records.

    2> Loops over the ArrayList until it ends:
    a> Connect to Content Manager (using CMBConnection, CMBSearchResults, CMBDataManagement) to get the documents belonging to the account.
    b> If found, create a folder on the F: drive and copy all documents related to that account to that folder (named after the Account No).

    Since we are testing, 50990 records have the same Account No (which has 6 documents related to it), and only 10 other records have 10 different Account Nos (which have various documents related to them).

    My application ran and completed successfully in 13 hours and 20 minutes, which I calculated as almost 1 second per account (with 6 documents related) or about 0.15 sec per document.

    My boss claimed it takes too long and asked me to improve the speed of my application. Could you show me how to do that? Thanks a lot.

  2. #2
    sukatoa's Avatar
    sukatoa is offline Senior Member
    Join Date
    Jan 2008
    Location
    Cebu City, Philippines
    Posts
    556
    Rep Power
    7

    Default

    It depends on your implementation...

    Remove all unnecessary iterations... use casting where possible.
    And for those variables that do not change as the calculations execute, make them all static...

    For all constants, make them final...

    Or you can read Java Optimization
    freedom exists in the world of ideas

  3. #3
    Zosden's Avatar
    Zosden is offline Senior Member
    Join Date
    Apr 2008
    Posts
    384
    Rep Power
    7

    Default

    Three words: switch to C++. Java is slow because, from what I
    understand (correct me if I'm wrong), all objects are allocated on
    the heap, whereas in C++ you can allocate on the stack.
    My IP address is 127.0.0.1

  4. #4
    Eranga's Avatar
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    19

  5. #5
    Zosden's Avatar
    Zosden is offline Senior Member
    Join Date
    Apr 2008
    Posts
    384
    Rep Power
    7

    Default

    I didn't say all objects, but some. You can just google why C++ is faster than Java.
    My IP address is 127.0.0.1

  6. #6
    Eranga's Avatar
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    19

    Default

    In your post you have used the two words "All objects". Seems you don't know what you have posted here either.

  7. #7
    Zosden's Avatar
    Zosden is offline Senior Member
    Join Date
    Apr 2008
    Posts
    384
    Rep Power
    7

    Default

    No, I said all objects in Java are allocated on the heap.

    You CAN allocate them on the stack in C++, or on the heap.
    My IP address is 127.0.0.1

  8. #8
    Eranga's Avatar
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    19

  9. #9
    sukatoa's Avatar
    sukatoa is offline Senior Member
    Join Date
    Jan 2008
    Location
    Cebu City, Philippines
    Posts
    556
    Rep Power
    7

    Default

    Quote Originally Posted by Zosden
    Three words: switch to C++. Java is slow because, from what I
    understand (correct me if I'm wrong), all objects are allocated on
    the heap, whereas in C++ you can allocate on the stack.
    Heap or stack, it doesn't matter; if it can be measured, it is negligible...

    The difference between Java and C++ is that Java is interpreted (2 steps):
    1st from the program to the JVM, 2nd from the JVM to the machine...

    C++ compiled files are pure machine language... no interpretation required...
    and they are directly executed...

    To keep you updated, try to look at this article....
    freedom exists in the world of ideas

  10. #10
    Ndt
    Ndt is offline Member
    Join Date
    Jun 2008
    Posts
    6
    Rep Power
    0

    Default

    Zosden, thanks for the recommendation. I have always loved C++ and VB, but we need to go in the direction the company decided. :)

    Sukatoa, thanks for the advice, I will try to improve it as you recommend as much as possible. Regarding the implementation, I export the project to a Jar file and then run it from the DOS command line (I built a .BAT file to run it).

    Heap size does not matter much for this project. I mean, I increased it to 500M initial and 500M max, and when I checked the usage, it only used an average of 40M-50M. I also asked the DBA to trace the run; he said my Java thread took an average of a quick 5-10 ms to execute when retrieving objects (files) from Content Manager.

    My boss said that if it took 10 ms per 6 documents to retrieve them from CM, the total for 51000 accounts == 510000 ms == 510 sec == 8.5 minutes, so why did it run for more than 12 hours? I said the documents need to be saved to the F: drive too, and I cannot prove how long it takes to save a file to a drive (the 6 documents that were repeatedly retrieved and copied over average 200 KB each). It also depends on the network traffic and on the number of accesses to the F: drive from other users.
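    The time to save a file to a drive can be measured rather than argued about. Below is a minimal sketch, assuming Java 8 or later; the class name `WriteTimer` and the file names are made up for illustration, and the 200 KB payload just mirrors the document size mentioned above. Pass the directory to test as the first argument (for example a folder on F:); without one it falls back to a local temp directory. Timings will vary with network traffic, so treat a single run as a rough indication only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original project.
final class WriteTimer {

    /** Writes `count` files of `sizeBytes` each into `dir`; returns average milliseconds per file. */
    static double averageWriteMillis(Path dir, int count, int sizeBytes) throws IOException {
        byte[] payload = new byte[sizeBytes];   // dummy content, all zeros
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Files.write(dir.resolve("doc" + i + ".bin"), payload);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / count;
    }

    public static void main(String[] args) throws IOException {
        // First argument: directory to test; default is a local temp directory.
        Path dir = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempDirectory("write-timer");
        double ms = averageWriteMillis(dir, 100, 200 * 1024);
        System.out.printf("average %.3f ms per 200 KB file in %s%n", ms, dir);
    }
}
```

    Running it once against a local folder and once against the network drive gives two numbers that can be compared directly.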

  11. #11
    Zosden's Avatar
    Zosden is offline Senior Member
    Join Date
    Apr 2008
    Posts
    384
    Rep Power
    7

    Default

    Writing to a disk takes a very long time compared to RAM. Tell your boss that maybe using a local drive would help, and maybe look into flash memory storage. All of this would dramatically help your run time.
    My IP address is 127.0.0.1

  12. #12
    Ndt
    Ndt is offline Member
    Join Date
    Jun 2008
    Posts
    6
    Rep Power
    0

    Default

    Thanks Zosden. I just talked to the DBA after he put a trace on my application's execution yesterday, and he said there is nothing else he or I can do about it. The real execution time for Java to get a document from CM is 0.03 ms, and the real time to process it (read and copy the document to the F: drive) is 0.15 sec/document.
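    These figures can be cross-checked against the total run time. Here is a small sketch of the arithmetic, using only numbers quoted in this thread (51000 accounts, 6 documents per account, 0.15 sec per document); the class name `RunTimeEstimate` is made up for illustration.

```java
// Hypothetical helper for a back-of-the-envelope estimate.
final class RunTimeEstimate {

    /** Total seconds for `accounts` accounts with `docsPerAccount` documents at `secondsPerDoc` each. */
    static double totalSeconds(int accounts, int docsPerAccount, double secondsPerDoc) {
        return accounts * (double) docsPerAccount * secondsPerDoc;
    }

    public static void main(String[] args) {
        double seconds = totalSeconds(51_000, 6, 0.15);
        // 51000 * 6 * 0.15 = 45900 seconds, which is 12.75 hours
        System.out.printf("%.0f seconds = %.2f hours%n", seconds, seconds / 3600.0);
    }
}
```

    That lands in the same range as the observed 12 to 13 hours, which suggests the per-document retrieve-and-copy time accounts for nearly the whole run.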

    Another thing that helped: I ran the app again last night and it took less than 12 hours to finish, almost an hour and a half less than the previous run. I said that it depends on the network traffic, since I started the application at 4pm (almost the time when people leave the building), and my boss had no choice but to accept my explanation.

    Thanks for all your help.

  13. #13
    sukatoa's Avatar
    sukatoa is offline Senior Member
    Join Date
    Jan 2008
    Location
    Cebu City, Philippines
    Posts
    556
    Rep Power
    7

    Default

    2> Loop of Array List until it ends
    a> Connect to Content Manager (using CMBConnection, CMBSearchResults, CMBDataManagement) to get the documents belong to the account.
    b> If found, create folder in F: drive and copy all documents related to that account to that folder (name same as Account No).
    Since all of those steps are file-creation-related operations,
    search tasks may be negligible if a binary search algorithm is used (on already-sorted data).

    Maybe you can divide the task (quoted above) and create some threads that each do part of the divided task and run them concurrently...

    That would save more time...
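    Dividing the loop among threads could look roughly like the sketch below. It assumes Java 8 or later. `processAccount` is a placeholder standing in for the Content Manager search and the file copy (the real CMB calls are not shown, since how they are used in the project is not known), and `ConcurrentExport` and the pool size are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the original project's code.
final class ConcurrentExport {

    /** Placeholder for "search CM and copy the documents"; returns the number of documents copied. */
    static int processAccount(String accountNo) {
        return 6; // stand-in value; the real work would go here
    }

    /** Processes all accounts on a fixed pool of worker threads; returns the total documents copied. */
    static int processAll(List<String> accounts, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String account : accounts) {
                results.add(pool.submit(() -> processAccount(account)));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();   // waits for that task to finish
            }
            return total;
        } finally {
            pool.shutdown();             // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> accounts = Arrays.asList("A001", "A002", "A003");
        System.out.println("documents copied: " + processAll(accounts, 4));
    }
}
```

    Threads only help if Content Manager and the F: drive can serve several requests at once, so it is worth timing a run with 1, 2 and 4 threads before settling on a number.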
    freedom exists in the world of ideas

  14. #14
    Nicholas Jordan's Avatar
    Nicholas Jordan is offline Senior Member
    Join Date
    Jun 2008
    Location
    Southwest
    Posts
    1,018
    Rep Power
    7

    Default

    Three things I see here, even for massive data stores it may not need to be an irresolvable 10-12 hour ordeal.

    First, I read a book by Dov Bulka: "Efficient C++" and though that is not a dedicated Java approach, the book discusses in exquisite detail the time-performance issue. The authors make no wasted bones, picking every topic and giving it the full roomful of rockers for any cat they can find.

    Two: Java currently has a runtime (just-in-time) compiler that optimizes core loops that are visited frequently. With the correct switches, this compiler will compile the Java bytecode to native machine code during the run.

    Three: A standard first tool to apply is Threading. It may be that reads/writes may be lifted from the processing loop. If so that usually results in dramatic time-performance gains on the first test run.

    200 kb is not a lot of room, it sounds like the profiling is telling us something is available for efficiency work: The numbers suggest an unthought of prior code that is bottlenecking the workflow.

    [ looked at the code:]
    The code pulls a db connection, getting records one at a time - that ( in greatly abbreviated words ) is the bottleneck. Will gladly explain but we need to know where you got this code and how you put things together, an immediate and dramatic time-domain reduction by about an order of magnitude is available ..... but we really need to know what the programming environment is as this is a bulky codebase ~ That reveals a project management issue that is wicked and we must tread carefully.

    I do not have the resources to take a hickey on this.
    Last edited by Nicholas Jordan; 06-11-2008 at 04:55 PM. Reason: Additional information.

  15. #15
    Ndt
    Ndt is offline Member
    Join Date
    Jun 2008
    Posts
    6
    Rep Power
    0

    Default

    Nicholas, thanks for the comment.

    As I said, I have to go with a Java application (since this is what Management decided to use). Therefore your First and Two won't help much here.

    Three: I divided my program into 2 steps:
    1 - Read DB2 to get the accounts and put them into an ArrayList (real runtime was about 2-4 minutes for 50000 accounts). This looks fine.
    2 - Loop over the accounts in the ArrayList, go to Content Manager to search for documents, then rename and copy them to the F: drive. Real runtime showed that searching for the documents belonging to an account took 0.2-0.3 ms, and 0.15 second for a document to be retrieved from CM and copied to the F: drive.

    The reason I sectioned this program into 2 parts is that I only want 1 database connection at a time (especially in step 2, which took 12 hours). I can eliminate step 1, but then I will have 2 connections lasting 12 hours each. And if the connection to DB2 is lost, the connection to CM no longer matters.

    The question here is what I can do to improve the copy time to the F: drive, which is 0.15 seconds for a document of 200 KB. This time depends on network traffic and the F: drive's accessibility. How can I improve it from Java code?

    And please explain my bottleneck; it will greatly help me write future programs better. Thanks again for spending your time to help me.

  16. #16
    Nicholas Jordan's Avatar
    Nicholas Jordan is offline Senior Member
    Join Date
    Jun 2008
    Location
    Southwest
    Posts
    1,018
    Rep Power
    7

    Default

    Okay, I did not expect this degree of coding skills. I have several things to work on so let this be an iterative development cycle. The solution is called floppy-copy in sample code, it consists of a Triune concept ( that is my nomenclature, dreamed up late at night to codify a concept - it is rare in cs literature, I have seen it exactly once ). Leave the 'two sections' exactly as you have it. Constrain all of our effort to first getting something to show gains, then tweaking and testing for at least a little while.

    In the ArrayList ( which is not synchronized ) we have the central structure around which we can ( read should ) build a FIFO - also known by other names, I like to have an informal style and if you want to humor me: Propose explanations of how FIDO became FIFO. I have done some work with really skilled people and those who cannot spot and skip such things present the subtle risks I was concerned about.

    What we do is have three Threads, that may be implemented as a Runnable interface or as a Thread Object. Discussion of which is best may be left to rumble and rot between the OO-er's v Student coders. The first thread reads the database, then tries to write to the ArrayList, the second works on the item in the array list then 'marks' the object as completed. The third thread tries to do the write()'s to drive F:\
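    The three-thread arrangement can be sketched as follows. This is only an illustration under stated assumptions: it swaps the unsynchronized ArrayList for `java.util.concurrent.ArrayBlockingQueue` as the FIFO, the three stages are stand-ins (no real DB2, Content Manager or F: drive access), and the names `ThreeStagePipeline` and `DONE` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of a read -> process -> write pipeline.
final class ThreeStagePipeline {

    // End-of-stream marker; a distinct object that is compared by identity (==) on purpose.
    private static final String DONE = new String("<done>");

    /** Runs the three stages on three threads joined by two bounded FIFO queues. */
    static List<String> run(List<String> accounts) throws InterruptedException {
        BlockingQueue<String> toProcess = new ArrayBlockingQueue<>(100);
        BlockingQueue<String> toWrite = new ArrayBlockingQueue<>(100);
        List<String> written = Collections.synchronizedList(new ArrayList<String>());

        Thread reader = new Thread(() -> {
            try {
                for (String a : accounts) toProcess.put(a);   // stand-in for the DB2 read
                toProcess.put(DONE);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread processor = new Thread(() -> {
            try {
                for (String a = toProcess.take(); a != DONE; a = toProcess.take()) {
                    toWrite.put("docs-for-" + a);             // stand-in for the CM search
                }
                toWrite.put(DONE);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread writer = new Thread(() -> {
            try {
                for (String d = toWrite.take(); d != DONE; d = toWrite.take()) {
                    written.add(d);                           // stand-in for the copy to F:
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        reader.start(); processor.start(); writer.start();
        reader.join(); processor.join(); writer.join();
        return new ArrayList<>(written);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(Arrays.asList("1001", "1002", "1003")));
    }
}
```

    The bounded queues make a fast stage block instead of piling up work in memory while a slow stage catches up. A real version would also need a way to stop the other stages if one of them fails.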

    I did some testing in STL and I do not see any dramatic improvements in your core-loop times for disc writes. We can do some, likely marginal, improvements by keeping a block of F:\ dedicated to this task and keeping it defragmented and so on. We stack up some requests in an init(), then start Processing Thread - it gets very bizarre in that we can have some hidden failures that do not reveal until hundreds of runs, then ruining a weekend. We absolutely must ( and I have no wiggle room here ) have a fully tested and operational backup that runs several layers deep and has no-writeback protection.

    I have more to say, let me see what you do with this. I have great need of proof of first actual work for an actual company so a Pro-Forma Business Letter of Thank You to my Team Lead would be a para-LifeSaver for me right now. If I can get to it today, I will put up some preliminary concepting code on my server and give you a directory. pm me an email address or something, do not put up an unobscured email in the clear here in the thread, maybe I can find it in your profile or something.

    In any event, fifteen milliseconds times 50,000 is likely 12.5 minutes so I think my original estimate of an hour is targetable as realistic hope. All caveats apply, except as noted in the caveat correction manual.

    There are also some disk buffering approaches that promise improvements in data retirement rate.
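    One simple form of buffering is to move the data in large chunks, so that each read and write call carries many kilobytes instead of a few bytes. The sketch below assumes Java 7 or later; `BufferedCopy` and the 64 KB chunk size are illustrative choices, and whether this improves on the project's existing copy code is not known without measuring.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original project.
final class BufferedCopy {

    /** Copies `source` to `target` in chunks of `bufferSize` bytes; returns the bytes copied. */
    static long copy(Path source, Path target, int bufferSize) throws IOException {
        long total = 0;
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            byte[] chunk = new byte[bufferSize];
            int n;
            while ((n = in.read(chunk)) != -1) {   // -1 signals end of file
                out.write(chunk, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("source", ".bin");
        Path target = Files.createTempFile("target", ".bin");
        Files.write(source, new byte[200 * 1024]);            // 200 KB dummy document
        System.out.println("copied " + copy(source, target, 64 * 1024) + " bytes");
    }
}
```

    The standard library call `Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING)` is a shorter alternative when no custom chunk size is needed.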
    Last edited by Nicholas Jordan; 06-13-2008 at 02:27 AM. Reason: Add additional thoughts on disk buffering.

  17. #17
    Nicholas Jordan's Avatar
    Nicholas Jordan is offline Senior Member
    Join Date
    Jun 2008
    Location
    Southwest
    Posts
    1,018
    Rep Power
    7

    Default

    I coded about two hundred lines, I will have to read some DEK to do a FIFO.

    I have to get twenty posts before this thing will let me do email and private messaging, so change the ++ to -- in // Not Rot-13 and run main() if you have any private or proprietary concerns not cleared for clearcode.
    Java Code:
    /* 85a9d7f1c2fb06697f11845eb66e0b24fb5bef264ec592911d6941
     * dca87393c866fcb962318b937b0f36abe63599e14e631b43b6474
     * aaca701b9cf10b182512d9cc79366fcc40c4c173e93e1d0e87386
     * 2b690e5ce615406747ebe27421052a9c260c673341a6710b41b2 
     */
    public class Rotator
    {
        public static void main(String[] args)
        {
            String c29 = "gb93:9c9Ahnbjm/dpn";
            char[] d74a3 = new char[c29.length()];
            //
            int b10 = c29.length();
            do
            {
                d74a3[--b10] = c29.charAt(b10);
                d74a3[b10]++;// Not Rot-13
            }
            while(b10 > 0);//
            System.out.println(new String(d74a3));//
        }
    }

  18. #18
    developer321 is offline Member
    Join Date
    Jun 2008
    Posts
    10
    Rep Power
    0

    Default

    Seems most of the time is taken transferring the files to F:.

    Is it possible to write to a local drive and then bulk upload to the F: drive?

    Currently the process is sequential; this can be divided into 2 or 3 threads, using a queue to pass the work:
    1. one process to get the data from Content Manager
    2. another process to write the data to the F: drive.

    You can also use multiple systems (CPUs).
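    The "write locally, then bulk upload" idea might be sketched like this. It assumes Java 7 or later; `StagedUpload`, the staging folder and the file names are hypothetical. The target directory is taken from the first argument (for example a folder on F:), with a temp directory as a stand-in when none is given.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not the original project's code.
final class StagedUpload {

    /** Copies every regular file in `staging` into `target` (created if missing); returns the file count. */
    static int uploadAll(Path staging, Path target) throws IOException {
        Files.createDirectories(target);
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(staging)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, target.resolve(file.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging");   // local drive
        Path target = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempDirectory("target");
        Files.write(staging.resolve("doc1.bin"), new byte[200 * 1024]);
        Files.write(staging.resolve("doc2.bin"), new byte[200 * 1024]);
        System.out.println("uploaded " + uploadAll(staging, target) + " files to " + target);
    }
}
```

    The same bytes still have to cross the network, so this mainly helps if the upload can run in the background or at a quieter time while retrieval from Content Manager continues.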

  19. #19
    Ndt
    Ndt is offline Member
    Join Date
    Jun 2008
    Posts
    6
    Rep Power
    0

    Default

    Nicholas, thank you for your concern. I would gladly do what you requested if I still had the job, but I don't. I got into a discussion about making my program run faster with my team leader and my boss, where I explained and proved that my code worked fast but the transfer of files to F:\ took nearly the whole time. I also said that manipulating files cannot be compared to calculating and updating transactions (where 90% of transactions do not depend on copying files from one place to another, but happen only in memory), but we completely disagreed on everything and I was let go. But hey, I still really appreciate you volunteering your time to read and try to help me improve it. Thanks again.

    Developer321, as I am brand new to Java, I read about threads and tried some examples that I found on the net. But I don't know if they can improve the timing much; all they can save is the 2-4 minutes (the time to read the DB2 table and put the records into the ArrayList). The time it took to transfer from Content Manager to the F: drive stays almost the same. I told my supervisor to run it after 5pm to make it run faster (and I proved it saved almost 2 hours on the process), but they are still convinced that the program should finish in a time frame of 3-5 hours. I made a simple DOS batch file to copy 50000 files (by duplicating one account of 5 files averaging 200K per file) from one place to another to show the time it should take, but they didn't want to try it. I knew it couldn't finish within the 3-5 hours they wanted, because in the past I ran a VBA app to handle 400 communities' Excel files averaging 300 KB (download from SQL Server, update the date in the Excel file, then upload to another place in SQL Server), and that already took about 2-3 hours.

    But I'm curious about your proposal of using multiple systems (CPUs); how can you do that? If you can show me a link where I can read how to do so, that would be wonderful. Thank you.

  20. #20
    developer321 is offline Member
    Join Date
    Jun 2008
    Posts
    10
    Rep Power
    0

    Default Thread implementation

    Below are the observations:
    1. The current drawback in the process is that it is sequential.
    Suggestions:
    1. Currently one thread does all the processing.
    This can be divided among 3 to n threads.
    For example, with 3 threads:
    First thread processes the first 17000 records
    Second thread processes records 17000 to 34000
    Third thread processes 34000 to the rest of the records

    Since your task is divided, this should help improve performance.

    2. Maybe the data for the 50000 objects can be cached.

    If you want to use multiple CPUs, divide the thread work across different systems/CPUs.
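    Splitting the record list into contiguous ranges, one per thread, can be done with a few lines of arithmetic. A minimal sketch, with the made-up class name `RangePartitioner`; each range would then be handed to a worker, for example via `accounts.subList(from, to)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for dividing the work.
final class RangePartitioner {

    /** Splits [0, total) into `parts` contiguous ranges; each int[] is {fromInclusive, toExclusive}. */
    static List<int[]> split(int total, int parts) {
        List<int[]> ranges = new ArrayList<>();
        int base = total / parts;
        int extra = total % parts;   // the first `extra` ranges get one more record
        int from = 0;
        for (int i = 0; i < parts; i++) {
            int to = from + base + (i < extra ? 1 : 0);
            ranges.add(new int[] {from, to});
            from = to;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // For 51000 records and 3 threads: 0-17000, 17000-34000, 34000-51000
        for (int[] r : split(51_000, 3)) {
            System.out.println(r[0] + " to " + r[1]);
        }
    }
}
```

    The same ranges can be sent to different machines instead of different threads, as long as each machine can reach both Content Manager and the target drive.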
