Results 1 to 9 of 9
  1. #1
    fam2315 is offline Member
    Join Date
    Feb 2011
    Posts
    78
    Rep Power
    0

    Default Longest common sequence

    I have an assignment in my data structures class to find the longest common sequence of letters of 2 strings. It does not need to be done programatically, but rather with a table. For example with the 2 strings 'MZJAWXU' and XMJYAUZ here is the table. I am not following the logic. Any help appreciated. I looked at the wikipedia page and its still not clicking.

    | 0 1 2 3 4 5 6 7
    | M Z J A W X U
    -----|-----------------
    0 | 0 0 0 0 0 0 0 0
    1 X | 0 0 0 0 0 0 1 1
    2 M | 0 1 1 1 1 1 1 1
    3 J | 0 1 1 2 2 2 2 2
    4 Y | 0 1 1 2 2 2 2 2
    5 A | 0 1 1 2 3 3 3 3
    6 U | 0 1 1 2 3 3 3 4
    7 Z | 0 1 2 2 3 3 3 4

  2. #2
    sunde887's Avatar
    sunde887 is offline Moderator
    Join Date
    Jan 2011
    Location
    Richmond, Virginia
    Posts
    3,069
    Blog Entries
    3
    Rep Power
    8

    Default

    Would you mind posting the assignment details verbatim?

  3. #3
    fam2315 is offline Member
    Join Date
    Feb 2011
    Posts
    78
    Rep Power
    0

    Default

    Find longest common sequence of letters from the 2 strings:
    X = "skullandbones" Y = "lullabybabies"


    Use the matrix as done in class (referring to a matrix similar to the one found here: Longest common subsequence problem - Wikipedia, the free encyclopedia)

    The 'Worked Example' section on that page shows the way to do it, I just dont follow it.

  4. #4
    ozzyman's Avatar
    ozzyman is offline Senior Member
    Join Date
    Mar 2011
    Location
    London, UK
    Posts
    797
    Blog Entries
    2
    Rep Power
    4

    Default

    i Think i get it...

    difference between subsequence and substring,

    the longest common substring would be 'ulla',

    but the longest common subsequence would be 'ullabes'

    the matrix is just a technique to find the longest subsequence progressively

    and each step in building the matrix is just a break-down of comparisons of subsequences

    1. If we have a sequence called S which contained AGCA, all the possible subsequences would be:
    S1 = A
    S2 = AG
    S3 = AGC
    S4 = AGCA

    2. When finding the Longest Common Subsequence (LCS) of Two sequences,
    to simplify our calculations, we first check the END of the sequence and remove any common last elements.

    The example given is two sequences: BANANA and ATANA.
    Why bother going through the whole sequence when you can see they both end with 'ANA'. The idea is to save your calculation and confusions, so remove ANA from both sequences and you can find the LCS of the shorter sequences (which would be BAN and AT).

    the LCS of BAN and AT is just 'A', so append the last elements you removed and you get the final LCS of A+ANA = 'AANA'.

    3. When you get two sequences that do not end in the same symbol,
    then you'd never need the last element from one of them.

    ATK and BTG don't have a common last element, so the LCS could never include those last elements (K or G). In that case, you're trying to find the LCS of two sequences: ATK and BT - or - the LCS of AT and BTG

    if ATK is X and BTG is Y, then you're looking for LCS(Xn,Ym-1) or LCS(Xn-1,Ym) in this case.

    4. Make the Matrix Table (one sequence as row header, the other sequence as column header), and for each co-ordinate make comparisons between the two sequences.

    5. The two sequences you were given both end in 'ES', so you can remove those from your table, and append 'ES' to your LCS later
    Last edited by ozzyman; 04-26-2011 at 05:46 PM.

  5. #5
    ozzyman's Avatar
    ozzyman is offline Senior Member
    Join Date
    Mar 2011
    Location
    London, UK
    Posts
    797
    Blog Entries
    2
    Rep Power
    4

    Default

    Here's the table as we start off


  6. #6
    ozzyman's Avatar
    ozzyman is offline Senior Member
    Join Date
    Mar 2011
    Location
    London, UK
    Posts
    797
    Blog Entries
    2
    Rep Power
    4

    Default

    The first comparison is S and L. Not the same, so in this case the longer sequence is taken,

    but they're both empty, so 0 is given instead, with arrows pointing to both.
    Attached Thumbnails Attached Thumbnails Longest common sequence-untitled.jpg  

  7. #7
    ozzyman's Avatar
    ozzyman is offline Senior Member
    Join Date
    Mar 2011
    Location
    London, UK
    Posts
    797
    Blog Entries
    2
    Rep Power
    4

    Default

    The same goes for the next two comparisons; L against K and L against U, they're not the same and neither direction contains a longer subsequence since they are empty

    See picture
    Attached Thumbnails Attached Thumbnails Longest common sequence-untitled1.png  

  8. #8
    ozzyman's Avatar
    ozzyman is offline Senior Member
    Join Date
    Mar 2011
    Location
    London, UK
    Posts
    797
    Blog Entries
    2
    Rep Power
    4

    Default

    Finally we have a few matches, L and L. In the first case, since there is no longer subsequence, we have to use a diagonal arrow pointing to an empty cell. The resulting subsequence is (0L) = (L).

    In the second case, the sequence to the left is longer (1 symbol) than the sequence to the top (empty), so we use an arrow pointing to the longer sequence (left).

    See 1st pic.

    Now we have a longest common subsequence of LL so far.

    Continue this work for the rest of the row and notice that the remaining cells always point Left because the sequence there is longer (LL) against (0).

    See 2nd pic.

    By the end of the 1st row, we have the longest subsequence of (LL)+(ES) = (LLES)
    Attached Thumbnails Attached Thumbnails Longest common sequence-untitled3.png   Longest common sequence-untitled4.jpg  
    Last edited by ozzyman; 04-26-2011 at 06:24 PM.

  9. #9
    Jodokus's Avatar
    Jodokus is offline Senior Member
    Join Date
    Jan 2011
    Location
    Amsterdam, the Netherlands
    Posts
    230
    Rep Power
    4

    Default Why the table?

    To be honest I found the table very confusing.
    Maybe someone can explain to me why it would be needed, or is it to explain the process? It all works fine with recursion:

    Java Code:
    package lcs;
    public class LCS_maker{
    	private String DNA1 = "";
    	private String DNA2 = "";
    	
    	public LCS_maker( String dna1, String dna2 ){
    		DNA1 = dna1;
    		DNA2 = dna2;
    		printLCS();
    	}
    	public static void main( String[] args ){
    		new LCS_maker( "skullandbones", "lullabybabies" );
    //		new LCS_maker( "AGCAT", "GAC" );
    	}
    	private void printLCS(){
    		int len1 = DNA1.length();
    		int len2 = DNA2.length();
    		String lcs = LCS( len1, len2 );
    		System.out.println( "LCS_maker: \"" + lcs + "\"" );
    	} 
    	//Don't use Strings, to save space
    	private String LCS( int len1, int len2 ){
    		char last1, last2;
    		String lcs1, lcs2;
    		int lenLcs1, lenLcs2;
    		if( len1 <= 0 || len2 <= 0 )return "";
    		last1 = DNA1.charAt( len1-1 );
    		last2 = DNA2.charAt( len2-1 );
    		if ( last1 == last2 ){
    			return LCS( len1-1, len2-1 ) + last1;
    		}else{
    			lcs1 = LCS( len1, len2-1 ); 
    			lcs2 = LCS( len1-1, len2 );
    			lenLcs1 = lcs1.length();
    			lenLcs2 = lcs2.length();
    			return lenLcs1 > lenLcs2 ? lcs1 : lcs2;//Could be throwing some solutions away here
    		}
    	}
    }
    (By the way: thanks for the link. I first did DNA-alignment by an other recursive algorithm, repeatedly chopping it in half, searching and extending.
    It worked fine, but this is much more elegant!)
    Last edited by Jodokus; 04-29-2011 at 02:02 AM. Reason: spelling

Similar Threads

  1. longest word present in a string
    By somnath6088 in forum New To Java
    Replies: 4
    Last Post: 11-09-2010, 05:43 PM
  2. Longest word in a program...
    By hustlas4ever in forum New To Java
    Replies: 5
    Last Post: 08-20-2010, 01:34 PM
  3. Linked list sequence and array sequence
    By Predz in forum New To Java
    Replies: 1
    Last Post: 12-31-2009, 01:30 AM
  4. java longest word in array
    By mayhewj7 in forum New To Java
    Replies: 10
    Last Post: 04-24-2009, 01:39 AM
  5. How to get Home directory using Common Net ftp api?
    By cprash.aggarwal in forum Networking
    Replies: 1
    Last Post: 03-01-2009, 05:37 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •