Java Forums

  #1 (permalink)  
Old 06-05-2008, 11:30 PM
Ndt Ndt is offline
Member
 
Join Date: Jun 2008
Posts: 6
How can I improve the execution time of a Java Project
Hi,

I have a Java project that does the following:

1> Connects to DB2 to collect the account information (here I use the DKDDO method) and puts it into an ArrayList, which holds 51000 records.

2> Loops over the ArrayList until it ends:
a> Connects to Content Manager (using CMBConnection, CMBSearchResults, CMBDataManagement) to get the documents belonging to the account.
b> If any are found, creates a folder on the F: drive (named after the Account No) and copies all documents related to that account into that folder.

Since we are testing, 50990 records have the same Account No (which has 6 documents related to it); only the 10 other records have 10 different Account Nos (each with various documents related to it).

My application ran and completed successfully in 13 hours and 20 minutes, which I calculate as almost 1 second per account (with 6 documents related) or about 0.15 sec per document.

My boss claims that is too long and is asking me to improve the speed of my application. Could you show me how to do that? Thanks a lot.
Attached Files
File Type: txt TConnect.txt (3.6 KB, 9 views)
File Type: txt TSearch .txt (14.7 KB, 4 views)
File Type: txt LoanSale .txt (18.7 KB, 2 views)
  #2 (permalink)  
Old 06-06-2008, 04:25 AM
sukatoa's Avatar
Senior Member
 
Join Date: Jan 2008
Location: Cebu City, Philippines
Posts: 518
It depends on your implementation...

Remove all unnecessary iterations, and use casting where possible.
Make static any variables that do not change while the calculations execute.

Make all constants final.

Or you can read Java Optimization
__________________
A specific, detailed, simple, well elaborated, and "tested before asking" question may gather more quick replies. hopefully
  #3 (permalink)  
Old 06-06-2008, 05:03 AM
Zosden's Avatar
Senior Member
 
Join Date: Apr 2008
Posts: 386
Three words: switch to C++. Java is slow because, from what I
understand (correct me if I'm wrong), all objects are allocated on
the heap, whereas in C++ you can allocate on the stack.
__________________
My IP address is 127.0.0.1
  #4 (permalink)  
Old 06-06-2008, 07:00 AM
Eranga's Avatar
Moderator
 
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 2,875
C++ does not allocate all objects on the stack. Who says that? Where did you find it?
__________________
Use an appropriate Subject. "Help, urgent!" isn't one.
  #5 (permalink)  
Old 06-06-2008, 07:06 AM
Zosden's Avatar
Senior Member
 
Join Date: Apr 2008
Posts: 386
I didn't say all objects, only some. You can just google why C++ is faster than Java.
  #6 (permalink)  
Old 06-06-2008, 07:10 AM
Eranga's Avatar
Moderator
 
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 2,875
In your post you used the two words "All objects". It seems you don't know what you posted here either.
  #7 (permalink)  
Old 06-06-2008, 07:25 AM
Zosden's Avatar
Senior Member
 
Join Date: Apr 2008
Posts: 386
No, I said all objects in Java are allocated on the heap.

In C++ you CAN allocate them on the stack or on the heap.
  #8 (permalink)  
Old 06-06-2008, 07:32 AM
Eranga's Avatar
Moderator
 
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 2,875
Sorry for the misunderstanding.
  #9 (permalink)  
Old 06-06-2008, 07:55 AM
sukatoa's Avatar
Senior Member
 
Join Date: Jan 2008
Location: Cebu City, Philippines
Posts: 518
Quote:
Originally Posted by Zosden
Three words: switch to C++. Java is slow because, from what I
understand (correct me if I'm wrong), all objects are allocated on
the heap, whereas in C++ you can allocate on the stack.
Heap or stack, it doesn't matter; if the difference can be measured at all, it is negligible...

The difference between Java and C++ is that Java is interpreted (2 steps):
1st from the program to the JVM, 2nd from the JVM to the machine...

C++ compiled files are pure machine language... no interpretation is required,
and they are executed directly...

To keep you updated, try to look at this article...
  #10 (permalink)  
Old 06-06-2008, 03:23 PM
Ndt Ndt is offline
Member
 
Join Date: Jun 2008
Posts: 6
Zosden, thanks for the recommendation. I have always loved C++ and VB, but we need to go in the direction the company decided.

Sukatoa, thanks for the advice; I will try to improve it as you recommend as much as possible. Regarding the implementation, I export the project to a Jar file and then run it from the DOS command line (I built a .BAT file to run it).

Heap size does not matter much to this project. I mean, I increased it to 500M initial and 500M max, but when I checked, it only used an average of 40M-50M. I also asked the DBA to trace a run, and he said that my Java thread took an average of a quick 5-10 ms to execute when it retrieved objects (files) from Content Manager.

My boss said that if it took 10 ms per 6 documents to retrieve them from CM, the total for 51000 accounts == 510000 ms == 510 sec == 8.5 minutes, so why did it run for more than 12 hours? I said the documents need to be saved to the F: drive too, and I cannot prove how long it takes to save a file to a drive (the 6 documents that were repeatedly retrieved and copied average 200 KB each). It also depends on the network traffic and on the number of accesses to the F: drive by other users.
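
The time spent saving a file can be measured directly rather than argued about. The following is only a minimal sketch under assumptions, not code from the attached files: the class `CopyTimer` and the method `averageCopyMillis` are hypothetical names, and it uses `java.nio.file.Files.copy` (Java 7 or later; `Path.of` needs Java 11) with `System.nanoTime()` to average the time of repeated copies of one file into a target folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: measures how long copying one file into a folder takes.
final class CopyTimer {

    // Copies `source` into `targetDir` `repeats` times under different names
    // and returns the average wall-clock time per copy in milliseconds.
    static double averageCopyMillis(Path source, Path targetDir, int repeats) {
        try {
            Files.createDirectories(targetDir);
            long start = System.nanoTime();
            for (int i = 0; i < repeats; i++) {
                Path target = targetDir.resolve(i + "_" + source.getFileName());
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            }
            long elapsedNanos = System.nanoTime() - start;
            return elapsedNanos / 1_000_000.0 / repeats;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java CopyTimer <sourceFile> <targetDir>");
            return;
        }
        // The two paths are supplied by the caller; nothing here is specific to F:.
        double ms = averageCopyMillis(Path.of(args[0]), Path.of(args[1]), 100);
        System.out.printf("average copy time: %.2f ms%n", ms);
    }
}
```

Running it once against a local folder and once against a folder on the network drive, with a document of about 200 KB, would give two numbers that can be compared side by side.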
  #11 (permalink)  
Old 06-06-2008, 09:56 PM
Zosden's Avatar
Senior Member
 
Join Date: Apr 2008
Posts: 386
Writing to a disk takes a very long time compared to RAM. Tell your boss that using a local drive might help, and maybe look into flash memory storage. All of this could dramatically improve your run time.
  #12 (permalink)  
Old 06-06-2008, 11:59 PM
Ndt Ndt is offline
Member
 
Join Date: Jun 2008
Posts: 6
Thanks, Zosden. I just talked to the DBA after he put a trace on my application's execution yesterday, and he said there is nothing else he or I can do about it. The real execution time for Java to get a document from CM is 0.03 ms, and the real time to process it (read and copy the document to the F: drive) is 0.15 sec/document.

Another thing that helped: I ran the app again last night and it took less than 12 hours to finish, almost an hour and a half less than the previous run. I said that it depends on the network traffic, since I started the application at 4pm (almost the time people leave the building), and my boss had no choice but to accept my explanation.

Thanks for all your help.
  #13 (permalink)  
Old 06-07-2008, 04:09 AM
sukatoa's Avatar
Senior Member
 
Join Date: Jan 2008
Location: Cebu City, Philippines
Posts: 518
Quote:
2> Loop of Array List until it ends
a> Connect to Content Manager (using CMBConnection, CMBSearchResults, CMBDataManagement) to get the documents belong to the account.
b> If found, create folder in F: drive and copy all documents related to that account to that folder (name same as Account No).
Since all of those steps are file-creation operations, the search tasks may be negligible if a binary search algorithm is used (on already sorted data).

Maybe you can divide the task (the one quoted above) and create some threads that each do part of it and run concurrently...

That would save more time...
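
Dividing the quoted loop across threads could look roughly like the sketch below. It is an illustration under assumptions, not the original program: `ParallelAccounts` and `processAll` are hypothetical names, the `Consumer<String>` stands in for the real "search Content Manager and copy the documents" step, and the pool size is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: runs the per-account work on a fixed pool of threads.
final class ParallelAccounts {

    // Calls `work` once per account number using `threads` worker threads
    // and returns only after every account has been processed.
    static void processAll(List<String> accountNos, int threads, Consumer<String> work) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String accountNo : accountNos) {
                pending.add(pool.submit(() -> work.accept(accountNo)));
            }
            for (Future<?> f : pending) {
                f.get(); // blocks until the task ends; surfaces a worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether the Content Manager connection classes may be shared between threads is not stated anywhere in this thread, so each worker may need its own connection. The gain also depends on the F: drive being able to serve several writers at once.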
  #14 (permalink)  
Old 06-11-2008, 05:34 PM
Nicholas Jordan's Avatar
Senior Member
 
Join Date: Jun 2008
Location: Southwest
Posts: 402
I see three things here. Even for massive data stores, this need not be an irresolvable 10-12 hour ordeal.

First, I read a book by Dov Bulka, "Efficient C++", and though it does not take a dedicated Java approach, the book discusses the time-performance issue in exquisite detail. The authors make no wasted bones, picking every topic and giving it the full roomful of rockers for any cat they can find.

Two: Java currently has a runtime (just-in-time) compiler that optimizes core loops that are visited frequently. With the correct switches, this compiler compiles the Java bytecode to native machine code during the run.

Three: A standard first tool to apply is threading. It may be that the reads/writes can be lifted out of the processing loop. If so, that usually results in dramatic time-performance gains on the first test run.

200 KB is not a lot of data. It sounds like the profiling is telling us there is room for efficiency work: the numbers suggest that some earlier, overlooked code is bottlenecking the workflow.

[Looked at the code:]
The code pulls a DB connection and gets records one at a time; that (in greatly abbreviated words) is the bottleneck. I will gladly explain, but we need to know where you got this code and how you put things together. An immediate and dramatic time reduction of about an order of magnitude is available, but we really need to know what the programming environment is, as this is a bulky codebase. That reveals a project management issue that is wicked, and we must tread carefully.

I do not have the resources to take a hickey on this.

Last edited by Nicholas Jordan : 06-11-2008 at 05:55 PM. Reason: Additional information.
  #15 (permalink)  
Old 06-12-2008, 07:46 PM
Ndt Ndt is offline
Member
 
Join Date: Jun 2008
Posts: 6
Nicholas, thanks for the comment.

As I said, I have to go with a Java application (since this is what management decided to use). Therefore your First and Two won't help much here.

Three: I divided my program into 2 steps:
1 - Read DB2 to get the accounts and put them into an ArrayList (real runtime was about 2-4 minutes for 50000 accounts). This looks fine.
2 - Loop over the accounts in the ArrayList, go to Content Manager to search for documents, rename them, then copy them to the F: drive. Real runtime showed 0.2-0.3 ms to search for the documents belonging to an account and 0.15 seconds for a document to be retrieved from CM and copied to the F: drive.

The reason I split this program into 2 parts is that I only want 1 database connection at a time (especially during step 2, which took 12 hours). I could eliminate step 1, but then I would have 2 connections lasting 12 hours each. And if the connection to DB2 were lost, the connection to CM would no longer matter.

The question here is what I can do to improve the copy time to the F: drive, which is 0.15 seconds for a document of 200 KB. This time depends on network traffic and on the F: drive's accessibility. How can I improve it from Java code?

And please explain my bottleneck; it will greatly help me write future programs better. Thanks again for spending your time to help me.
  #16 (permalink)  
Old 06-12-2008, 08:21 PM
Nicholas Jordan's Avatar
Senior Member
 
Join Date: Jun 2008
Location: Southwest
Posts: 402
Okay, I did not expect this degree of coding skill. I have several things to work on, so let this be an iterative development cycle. The solution is called floppy-copy in sample code; it consists of a Triune concept (that is my nomenclature, dreamed up late at night to codify a concept; it is rare in CS literature, I have seen it exactly once). Leave the 'two sections' exactly as you have them. Constrain all of our effort to first getting something that shows gains, then tweaking and testing for at least a little while.

In the ArrayList (which is not synchronized) we have the central structure around which we can (read: should) build a FIFO, also known by other names. I like to have an informal style, and if you want to humor me: propose explanations of how FIDO became FIFO. I have done some work with really skilled people, and those who cannot spot and skip such things present the subtle risks I was concerned about.

What we do is have three threads, each of which may be implemented via the Runnable interface or as a Thread object. Discussion of which is best may be left to rumble and rot between the OO-ers and the student coders. The first thread reads the database, then tries to write to the ArrayList; the second works on the item in the ArrayList, then 'marks' the object as completed. The third thread tries to do the write()s to drive F:\
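
The three-thread arrangement can be sketched with two FIFO queues between the stages. This is a minimal illustration under assumptions and not the floppy-copy sample code: it swaps the unsynchronized ArrayList for `java.util.concurrent.ArrayBlockingQueue`, the names `ThreeStagePipeline` and `run` are hypothetical, and plain strings stand in for the DB2 read, the Content Manager search, and the write to F:\.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: reader -> processor -> writer, joined by two bounded FIFO queues.
final class ThreeStagePipeline {

    // End-of-stream marker; a distinct object so it can be compared by identity.
    private static final String DONE = new String("<done>");

    // Pushes every account through the three stages and returns, in order,
    // what the writer stage received.
    static List<String> run(List<String> accountNos) {
        BlockingQueue<String> toProcess = new ArrayBlockingQueue<>(100);
        BlockingQueue<String> toWrite = new ArrayBlockingQueue<>(100);
        List<String> written = new ArrayList<>(); // touched only by the writer until join()

        Thread reader = new Thread(() -> {
            try {
                for (String a : accountNos) {
                    toProcess.put(a);                 // stands in for the database read
                }
                toProcess.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread processor = new Thread(() -> {
            try {
                for (String a = toProcess.take(); a != DONE; a = toProcess.take()) {
                    toWrite.put("docs-for-" + a);     // stands in for the document search
                }
                toWrite.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread writer = new Thread(() -> {
            try {
                for (String d = toWrite.take(); d != DONE; d = toWrite.take()) {
                    written.add(d);                   // stands in for the write to the drive
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        processor.start();
        writer.start();
        try {
            reader.join();
            processor.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }
}
```

The bounded queues make a fast stage wait for a slow one instead of piling up work in memory. If the write stage is the slow one, as the timings in this thread suggest, the pipeline on its own can only hide the time of the faster stages.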

I did some testing in STL and I do not see any dramatic improvements available in your core-loop times for disk writes. We can make some, likely marginal, improvements by keeping a block of F:\ dedicated to this task, keeping it defragmented, and so on. We stack up some requests in an init(), then start the processing thread. It gets very bizarre, in that we can have hidden failures that do not reveal themselves until hundreds of runs in, then ruin a weekend. We absolutely must (and I have no wiggle room here) have a fully tested and operational backup that runs several layers deep and has no-writeback protection.

I have more to say; let me see what you do with this. I have great need of proof of first actual work for an actual company, so a pro-forma business letter of thank-you to my team lead would be a para-lifesaver for me right now. If I can get to it today, I will put up some preliminary concept code on my server and give you a directory. PM me an email address or something; do not put an unobscured email address in the clear here in the thread. Maybe I can find it in your profile or something.

In any event, fifteen milliseconds times 50,000 is 12.5 minutes, so I think my original estimate of an hour is a realistic target. All caveats apply, except as noted in the caveat correction manual.

There are also some disk buffering approaches that promise improvements in data retirement rate.

Last edited by Nicholas Jordan : 06-13-2008 at 03:27 AM. Reason: Add additional thoughts on disk buffering.
  #17 (permalink)  
Old 06-13-2008, 04:27 AM
Nicholas Jordan's Avatar
Senior Member
 
Join Date: Jun 2008
Location: Southwest
Posts: 402
I coded about two hundred lines; I will have to read some DEK to do a FIFO.

I have to get twenty posts before this thing will let me do email and private messaging, so change the ++ to -- on the line marked // Not Rot-13 and run main() if you have any private or proprietary concerns not cleared for clearcode.
Code:
/* 85a9d7f1c2fb06697f11845eb66e0b24fb5bef264ec592911d6941
 * dca87393c866fcb962318b937b0f36abe63599e14e631b43b6474
 * aaca701b9cf10b182512d9cc79366fcc40c4c173e93e1d0e87386
 * 2b690e5ce615406747ebe27421052a9c260c673341a6710b41b2 */
public class Rotator {
    public static void main(String[] args) {
        String c29 = "gb93:9c9Ahnbjm/dpn";
        char[] d74a3 = new char[c29.length()];
        int b10 = c29.length();
        // Walk the string from the last character to the first.
        do {
            d74a3[--b10] = c29.charAt(b10);
            d74a3[b10]++; // Not Rot-13
        } while (b10 > 0);
        System.out.println(new String(d74a3));
    }
}
  #18 (permalink)  
Old 06-15-2008, 06:52 AM
Member
 
Join Date: Jun 2008
Posts: 10
developer321
It seems most of the time is taken by transferring the files to F:.

Is it possible to write to a local drive first and then bulk upload to the F: drive?

Currently the process is sequential. It can be divided into 2 or 3 threads, using a queue to pass the work along:
1. One process gets the data from Content Manager.
2. Another process writes the data to the F: drive.

You can also use multiple systems (CPUs).
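
The "write locally, then bulk upload" idea could be sketched as follows. This is only an illustration under assumptions: `BulkUpload` and `copyTree` are hypothetical names, the staging folder is assumed to hold one sub-folder per Account No, and the copy uses `java.nio.file.Files.walk` and `Files.copy` (Java 8 or later).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: copies a locally staged folder tree to its final destination in one pass.
final class BulkUpload {

    // Copies everything under `localStaging` to `destination`, keeping the folder layout.
    // Returns the number of regular files copied.
    static int copyTree(Path localStaging, Path destination) {
        try (Stream<Path> walk = Files.walk(localStaging)) {
            // Files.walk lists a directory before the entries inside it.
            List<Path> entries = walk.collect(Collectors.toList());
            int copied = 0;
            for (Path entry : entries) {
                Path target = destination.resolve(localStaging.relativize(entry).toString());
                if (Files.isDirectory(entry)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(entry, target, StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same number of bytes still has to cross the network, so any saving would come from doing the transfer in one uninterrupted pass (possibly at a quiet time of day) rather than from less data being moved.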
  #19 (permalink)  
Old 06-16-2008, 03:56 PM
Ndt Ndt is offline
Member
 
Join Date: Jun 2008
Posts: 6
Nicholas, thank you for your concern. I would gladly do what you request if I still had the job, but I don't. I got into a discussion with my team leader and my boss about making my program run faster, where I explained and proved that my code worked fast but that transferring the files to F:\ took nearly all of the time. I also said that manipulating files cannot be compared to calculating and updating transactions (where 90% of the transactions did not depend on copying files from one place to another but happened only in memory), but we completely disagreed on everything and I was let go. But hey, I still appreciate a lot that you volunteered your time to read and try to help me improve it. Thanks again.

Developer321, as I am brand new to Java, I read about threads and tried some examples that I found on the net. But I don't know if they can improve the timing much; all they can save is the 2-4 minutes it takes to read the DB2 table and put the records into the ArrayList. The time it took to transfer from Content Manager to the F: drive stays almost the same. I told my supervisor to run it after 5pm to make it run faster (and I proved it saved almost 2 hours on the process), but they are still convinced that the program should finish within 3-5 hours. I made a simple DOS batch file to copy 50000 files (by duplicating one account of 5 files averaging 200K per file) from one place to another to show the time it should take, but they did not want to try it. I knew it could not finish within the 3-5 hours they wanted, because in the past I ran a VBA app that handled 400 community Excel files averaging 300 KB each (download from SQL Server, update the date in the Excel file, then upload to another place in SQL Server), and that already took about 2-3 hours.

But I'm curious about your proposal of using multiple systems (CPUs). How can you do that? If you can show me a link where I can read how to do so, it would be wonderful. Thank you.
  #20 (permalink)  
Old 06-19-2008, 05:52 AM
Member
 
Join Date: Jun 2008
Posts: 10
developer321
Thread implementation
Below are my observations:
1. The current drawback of the process is that it is sequential.
Suggestions:
1. Currently one thread does all the processing.
This can be divided among 3 to n threads.
For example, with 3 threads:
The first thread processes the first 17000 records
The second thread processes records 17000 to 34000
The third thread processes record 34000 through the rest

Since the task is divided, this should help improve performance.

2. Maybe the data for the 50000 objects can be cached.
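
For the caching suggestion, the test data described at the start of the thread repeats one Account No 50990 times, so remembering which accounts were already handled would skip most of the work. The sketch below is an assumption-laden illustration: `AccountCache` and `processDistinct` are hypothetical names, and the `Consumer<String>` stands in for the real search-and-copy step.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper: processes each distinct account number only once.
final class AccountCache {

    // Calls `work` the first time an account number is seen, skips repeats,
    // and returns how many accounts were actually processed.
    static int processDistinct(List<String> accountNos, Consumer<String> work) {
        Set<String> done = new HashSet<>();
        int processed = 0;
        for (String accountNo : accountNos) {
            if (done.add(accountNo)) { // add() returns false for a repeat
                work.accept(accountNo);
                processed++;
            }
        }
        return processed;
    }
}
```

This only helps if repeated accounts may really be skipped. If the repeats in the test data are meant to simulate the load of 51000 different accounts, skipping them would make the test meaningless.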

If you want to use multiple CPUs, divide the thread work across different systems/CPUs.
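
Splitting the record list into contiguous ranges, whether for threads or for separate machines, comes down to a little index arithmetic. A minimal sketch, assuming a hypothetical helper `RangeSplitter.split` that returns `[start, end)` index pairs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: divides [0, total) into near-equal contiguous ranges.
final class RangeSplitter {

    // Returns `parts` ranges as {start, end} pairs, end exclusive.
    // When total is not divisible by parts, the first ranges get one extra element.
    static List<int[]> split(int total, int parts) {
        List<int[]> ranges = new ArrayList<>();
        int base = total / parts;
        int extra = total % parts;
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int end = start + base + (i < extra ? 1 : 0);
            ranges.add(new int[] {start, end});
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Prints "0 to 17000", "17000 to 34000", "34000 to 51000", one per line.
        for (int[] r : split(51000, 3)) {
            System.out.println(r[0] + " to " + r[1]);
        }
    }
}
```

Each pair can then be handed to its own worker, for example through `accounts.subList(r[0], r[1])` on the ArrayList, as long as no thread modifies the list while the others read it. For separate machines, each one would be started with its own range as a parameter.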