# Thread: Newbie distributed computing question

1. Member

## Newbie distributed computing question

I am interested in distributed computing but new to the field, and I have a question about it: how can I write a program that counts the occurrences of a large number of integers? For instance, suppose there are 1m integers, some of which repeat, e.g., ... 2, 2, .... 999 ... 999 ...; what I would like to do is count how many times each of those integers occurs.

I understand there are frameworks, e.g., Hadoop, that can help with this kind of task. But what I would like is to work through an explanation and learn some of the important issues in distributed computing through a simple example.

Is there any example or tutorial with this kind of explanation? Or any resource or book that covers this?

I appreciate any suggestions.

Thank you very much.

2. Originally Posted by shogun1234
My question is 'how can I program to compute and find the occurrence of a great deal of integers?' [...] what I would like to do is to count how many times those integers occur.

The classical MPI-style approach (MPI == Message Passing Interface, a standard for passing messages between processes in a distributed system) is as follows: given a (long) list of numbers n1, n2, n3 ..., assume the list is sorted; chop up the list so that the chunks are completely disjoint from each other, i.e. no value appears in more than one chunk; distribute the chunks to the 'other' machines in the cluster and let each one count the numbers in its chunk. Finally, collect all the statistics.
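As a rough illustration of the sorted case, here is a minimal single-machine sketch in Python. It is not MPI code: the function names (`split_sorted_disjoint`, `count_sorted_chunk`, `collect`) are made up for this example, and the loop over chunks only stands in for sending each chunk to a different machine.

```python
from itertools import groupby


def split_sorted_disjoint(sorted_values, n_chunks):
    """Cut a sorted list into at most n_chunks pieces without splitting a run of equal values."""
    n = len(sorted_values)
    target = max(1, -(-n // n_chunks))  # ceiling division: rough chunk size
    chunks = []
    start = 0
    while start < n:
        end = min(start + target, n)
        # Push the cut forward until the value changes, so equal values stay together.
        while end < n and sorted_values[end] == sorted_values[end - 1]:
            end += 1
        chunks.append(sorted_values[start:end])
        start = end
    return chunks


def count_sorted_chunk(chunk):
    """Work done by one 'machine': count each run of equal values in its sorted chunk."""
    return {value: sum(1 for _ in run) for value, run in groupby(chunk)}


def collect(partials):
    """Chunks are disjoint, so no value appears twice and the results can simply be combined."""
    result = {}
    for partial in partials:
        result.update(partial)
    return result


numbers = sorted([2, 2, 5, 999, 999, 999, 7, 7, 2])
chunks = split_sorted_disjoint(numbers, 3)
counts = collect(count_sorted_chunk(chunk) for chunk in chunks)
print(counts)
```

Because every value lives in exactly one chunk, the collecting step needs no arithmetic at all; it only gathers the per-chunk tables.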

If the list is not sorted you can do the same, but then you have to do some post-processing on the results: the chunks might not have been disjoint, so the same number can show up in several partial results and its counts have to be added together.
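For the unsorted case, a minimal sketch of the same idea might look like the following. The assumptions are mine: a thread pool stands in for the machines in the cluster, and the names `split_evenly`, `count_chunk`, `merge_counts` and `distributed_count` are hypothetical. The post-processing step is the merge, which sums the partial counts per number.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_evenly(values, n_chunks):
    """Cut the list into at most n_chunks pieces of roughly equal size, ignoring the values."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def count_chunk(chunk):
    """Work done by one 'machine': count the numbers in its own chunk."""
    return Counter(chunk)


def merge_counts(partials):
    """Post-processing: the same number may occur in several chunks, so add the counts up."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return total


def distributed_count(values, n_workers=4):
    chunks = split_evenly(values, n_workers)
    # The thread pool is only a stand-in for separate machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    return merge_counts(partials)


result = distributed_count([2, 999, 2, 7, 999, 2, 5, 999, 7], n_workers=3)
print(dict(result))
```

The split is cheaper here because it does not look at the values, but the merge has to do real work. This is the same shape as the map and reduce steps in frameworks such as Hadoop.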

kind regards,

Jos
