# Creating an algorithm for finding the Longest Common Sequence

• 03-22-2013, 03:01 AM
abi
IndexOutOfBounds error
I have a .txt file which contains the following data. The first line of data is an integer number n that gives the number of pairs of DNA to follow. Reading one pair of string at a time I need to display the LCS.

3
GAAGGTCGAA
CCTCGGGA
ATGATGGAC
GTGATAAGGACCC
AAATTT
GGGCCC

I keep getting this error:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1958)
at DNA_12.LCS(DNA_12.java:46)
at DNA_12.main(DNA_12.java:32)

Code:

```import java.io.*; import java.util.Scanner; public class DNA_12 {   public static void main(String[] args) throws IOException     {             Scanner sc = new Scanner( new File("dna.txt"));                 int n = sc.nextInt();         for(int i = 0; i < n; i++)       {         String input1 = sc.nextLine();         String input2 = sc.nextLine();           String dna1;         String dna2;                  if(input1.length() > input2.length())         {           dna1 = input1;           dna2 = input2;         }       else         {           dna1 = input2;           dna2 = input1;         }           System.out.println(LCS(dna1 , dna2));         }         sc.close();     }     public static String LCS(String dna1, String dna2)     {         String temp = "";         String longest = "";         int match;         for (int i = 0; i < dna2.length();i++)         {             for (int j = 1; j < (dna2.length() - 1); j++)             {                 temp = dna2.substring(i, j);                 match = dna1.indexOf(temp);                 if (match != -1)                 {                     longest=temp;                 }             }         }         return longest;     } }```
• 03-22-2013, 03:25 AM
Junky
Re: Creating an algorithm for finding the Longest Common Sequence
In future there is no need to start another thread, just keep asking questions in the same one.

If you do not have to write the code yourself then the String class has two methods that can help: contains and indexOf. If you have to write the code yourself then you need to write a loop that compares each character of one String to each character of the other String. As soon as you find a mismatch you know that substring is not correct. This can be done in a method which returns false as soon as it finds a mismatch.
• 03-22-2013, 03:30 AM
Junky
Re: Creating an algorithm for finding the Longest Common Sequence
BTW your algorithm is not robust. Using your example the substrings of s2 are
CCTCGGGA
CCTCGGG
CCTCGG
CCTCG
CCTC
CCT
CC
C
of which only the last one exists in s1. However the longest sequence is TCG which exists in the middle of s2 and would require dropping chars from both ends of the String.
• 03-22-2013, 03:42 AM
abi
Re: Creating an algorithm for finding the Longest Common Sequence
Thank You for that information.

Exactly, that's why I mentioned I haven't figured out the appropriate way to compare them.

So, I should convert them to arrays first
then compare the arrays starting from two going up correct??

Could you give me a sample code? A simple one.
• 03-22-2013, 03:43 AM
Junky
Re: Creating an algorithm for finding the Longest Common Sequence
There is no need to convert to arrays. Once again the String class has a method that can help. Perhaps you should spend some time reading the API.
• 03-22-2013, 03:44 AM
abi
Re: Creating an algorithm for finding the Longest Common Sequence
I am a beginner so maybe I haven't come across those methods yet. I just started a few months ago. Well I have to write the code my self aswell.
• 03-22-2013, 03:49 AM
Junky
Re: Creating an algorithm for finding the Longest Common Sequence
Since you are a beginner that is even more reason to read the API. That is where you found out what methods each class has and what they can do.
• 03-22-2013, 04:12 AM
abi
Re: Creating an algorithm for finding the Longest Common Sequence
This probably isn't correct but something of this manner is what your trying to get me to do/understand right?
Code:

```public class PracDna {    public static void main(String[] args)   {     String s1 = "GAAGGTCGAA";     String s2 = "CCTCGGGA";     int s2Length = s2.length();     int s1Length = s1.length();     for(int i = 0; i < s2Length; i++)     {       String s2SubString = s2.substring(0,s2Length - i);       int index = index(s2SubString, s1);       if( index > 0 )         {           System.out.print(s2SubString);         }     }   }   public static int index(String string1, String string2 )   {     return string2.indexOf(string1);   } }```
• 03-22-2013, 04:17 AM
Junky
Re: Creating an algorithm for finding the Longest Common Sequence
You might want to make the if statement >= 0 since 0 is the index of the first char in a String.
• 03-22-2013, 04:32 AM
abi
Re: Creating an algorithm for finding the Longest Common Sequence
Ok. Could you help me with creating the substrings for s2? I can't figure out how to do it properly.
• 03-22-2013, 05:08 AM
Junky
Re: Creating an algorithm for finding the Longest Common Sequence
Spam! Spam! Spam! Spam!

Recursion would probably work but I guess that is beyond your level of knowledge as yet. Try nested loops then instead of making smaller substrings make longer ones. For example for the String "ABCD" you would do
A
AB
ABC
ABCD
B
BC
BCD
C
CD
D
• 03-22-2013, 09:58 AM
JosAH
Re: Creating an algorithm for finding the Longest Common Sequence
Don't try to reinvent the wheel; the LCS problem is a well known problem and dynamic programming can be of help; see Longest common subsequence problem - Wikipedia, the free encyclopedia.

kind regards,

Jos
• 03-22-2013, 09:59 PM
abi
Re: Creating an algorithm for finding the Longest Common Sequence
Thanks. I did create a code but I have one small problem with it. It has to do with use of indexOf(). I updated this post if you would like to look at it.