Results 1 to 9 of 9
Thread: Longest common sequence
- 04-26-2011, 03:45 PM #1
Member
- Join Date
- Feb 2011
- Posts
- 78
- Rep Power
- 0
Longest common sequence
I have an assignment in my data structures class to find the longest common sequence of letters of 2 strings. It does not need to be done programatically, but rather with a table. For example with the 2 strings 'MZJAWXU' and XMJYAUZ here is the table. I am not following the logic. Any help appreciated. I looked at the wikipedia page and its still not clicking.
| 0 1 2 3 4 5 6 7
| M Z J A W X U
-----|-----------------
0 | 0 0 0 0 0 0 0 0
1 X | 0 0 0 0 0 0 1 1
2 M | 0 1 1 1 1 1 1 1
3 J | 0 1 1 2 2 2 2 2
4 Y | 0 1 1 2 2 2 2 2
5 A | 0 1 1 2 3 3 3 3
6 U | 0 1 1 2 3 3 3 4
7 Z | 0 1 2 2 3 3 3 4
- 04-26-2011, 03:49 PM #2
- Join Date
- Jan 2011
- Location
- Richmond, Virginia
- Posts
- 3,069
- Blog Entries
- 3
- Rep Power
- 7
Would you mind posting the assignment details verbatim?
- 04-26-2011, 04:25 PM #3
Member
- Join Date
- Feb 2011
- Posts
- 78
- Rep Power
- 0
Find longest common sequence of letters from the 2 strings:
X = "skullandbones" Y = "lullabybabies"
Use the matrix as done in class (referring to a matrix similar to the one found here: Longest common subsequence problem - Wikipedia, the free encyclopedia)
The 'Worked Example' section on that page shows the way to do it, I just dont follow it.
-
i Think i get it...
difference between subsequence and substring,
the longest common substring would be 'ulla',
but the longest common subsequence would be 'ullabes'
the matrix is just a technique to find the longest subsequence progressively
and each step in building the matrix is just a break-down of comparisons of subsequences
1. If we have a sequence called S which contained AGCA, all the possible subsequences would be:
S1 = A
S2 = AG
S3 = AGC
S4 = AGCA
2. When finding the Longest Common Subsequence (LCS) of Two sequences,
to simplify our calculations, we first check the END of the sequence and remove any common last elements.
The example given is two sequences: BANANA and ATANA.
Why bother going through the whole sequence when you can see they both end with 'ANA'. The idea is to save your calculation and confusions, so remove ANA from both sequences and you can find the LCS of the shorter sequences (which would be BAN and AT).
the LCS of BAN and AT is just 'A', so append the last elements you removed and you get the final LCS of A+ANA = 'AANA'.
3. When you get two sequences that do not end in the same symbol,
then you'd never need the last element from one of them.
ATK and BTG don't have a common last element, so the LCS could never include those last elements (K or G). In that case, you're trying to find the LCS of two sequences: ATK and BT - or - the LCS of AT and BTG
if ATK is X and BTG is Y, then you're looking for LCS(Xn,Ym-1) or LCS(Xn-1,Ym) in this case.
4. Make the Matrix Table (one sequence as row header, the other sequence as column header), and for each co-ordinate make comparisons between the two sequences.
5. The two sequences you were given both end in 'ES', so you can remove those from your table, and append 'ES' to your LCS laterLast edited by ozzyman; 04-26-2011 at 05:46 PM.
-
Here's the table as we start off
-
The first comparison is S and L. Not the same, so in this case the longer sequence is taken,
but they're both empty, so 0 is given instead, with arrows pointing to both.
-
The same goes for the next two comparisons; L against K and L against U, they're not the same and neither direction contains a longer subsequence since they are empty
See picture
-
Finally we have a few matches, L and L. In the first case, since there is no longer subsequence, we have to use a diagonal arrow pointing to an empty cell. The resulting subsequence is (0L) = (L).
In the second case, the sequence to the left is longer (1 symbol) than the sequence to the top (empty), so we use an arrow pointing to the longer sequence (left).
See 1st pic.
Now we have a longest common subsequence of LL so far.
Continue this work for the rest of the row and notice that the remaining cells always point Left because the sequence there is longer (LL) against (0).
See 2nd pic.
By the end of the 1st row, we have the longest subsequence of (LL)+(ES) = (LLES)Last edited by ozzyman; 04-26-2011 at 06:24 PM.
- 04-29-2011, 02:00 AM #9
Why the table?
To be honest I found the table very confusing.
Maybe someone can explain to me why it would be needed, or is it to explain the process? It all works fine with recursion:
(By the way: thanks for the link. I first did DNA-alignment by an other recursive algorithm, repeatedly chopping it in half, searching and extending.Java Code:package lcs; public class LCS_maker{ private String DNA1 = ""; private String DNA2 = ""; public LCS_maker( String dna1, String dna2 ){ DNA1 = dna1; DNA2 = dna2; printLCS(); } public static void main( String[] args ){ new LCS_maker( "skullandbones", "lullabybabies" ); // new LCS_maker( "AGCAT", "GAC" ); } private void printLCS(){ int len1 = DNA1.length(); int len2 = DNA2.length(); String lcs = LCS( len1, len2 ); System.out.println( "LCS_maker: \"" + lcs + "\"" ); } //Don't use Strings, to save space private String LCS( int len1, int len2 ){ char last1, last2; String lcs1, lcs2; int lenLcs1, lenLcs2; if( len1 <= 0 || len2 <= 0 )return ""; last1 = DNA1.charAt( len1-1 ); last2 = DNA2.charAt( len2-1 ); if ( last1 == last2 ){ return LCS( len1-1, len2-1 ) + last1; }else{ lcs1 = LCS( len1, len2-1 ); lcs2 = LCS( len1-1, len2 ); lenLcs1 = lcs1.length(); lenLcs2 = lcs2.length(); return lenLcs1 > lenLcs2 ? lcs1 : lcs2;//Could be throwing some solutions away here } } }
It worked fine, but this is much more elegant!)Last edited by Jodokus; 04-29-2011 at 02:02 AM. Reason: spelling
Similar Threads
-
longest word present in a string
By somnath6088 in forum New To JavaReplies: 4Last Post: 11-09-2010, 05:43 PM -
Longest word in a program...
By hustlas4ever in forum New To JavaReplies: 5Last Post: 08-20-2010, 01:34 PM -
Linked list sequence and array sequence
By Predz in forum New To JavaReplies: 1Last Post: 12-31-2009, 01:30 AM -
java longest word in array
By mayhewj7 in forum New To JavaReplies: 10Last Post: 04-24-2009, 01:39 AM -
How to get Home directory using Common Net ftp api?
By cprash.aggarwal in forum NetworkingReplies: 1Last Post: 03-01-2009, 05:37 PM


LinkBack URL
About LinkBacks
Reply With Quote
Bookmarks