Parsing information from an EMBL file
I need to write a Java program that does the following. I am completely lost. Please help.
Develop an API for representing the information in an EMBL file (e.g. the one at the bottom). This must include the EMBL ID, the species information as a list of taxon classifications, and the DNA sequence. In terms of Java classes, I need to produce at least EMBL.java and Sequence.java, with the EMBL class having a Sequence object as a field.
Develop a class with a main method that takes the path of an EMBL file as a command-line parameter. This class should use the file path passed to the main method to create a Scanner object, and then parse the EMBL file into the classes defined in the first task.
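A minimal sketch of the entry-point class, assuming standard EMBL flat-file conventions: the ID is the first semicolon-separated field of the ID line, the taxon classification sits on OC lines separated by semicolons (the example at the bottom has no OC lines, so its taxonomy list would simply stay empty), and the sequence follows the SQ line until a `//` terminator or the end of the file. The class name `EmblParser` and its fields are hypothetical. In the full program these values would be used to construct the EMBL and Sequence objects.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical parser: collects the ID, OC taxonomy and bases as plain values.
class EmblParser {
    String id;
    final List<String> taxonomy = new ArrayList<>();
    final StringBuilder bases = new StringBuilder();

    void parse(Scanner in) {
        boolean inSequence = false;
        while (in.hasNextLine()) {
            String line = in.nextLine();
            if (line.startsWith("//")) {
                break; // end of the entry
            } else if (inSequence) {
                // Drop whitespace and position numbers, keep only the bases
                bases.append(line.replaceAll("[\\s\\d]", ""));
            } else if (line.startsWith("ID")) {
                // "ID BB252375; SV 1; ..." -> "BB252375"
                id = line.substring(2).split(";")[0].trim();
            } else if (line.startsWith("OC")) {
                // "OC Eukaryota; Metazoa; ..." -> one list entry per taxon
                for (String part : line.substring(2).split(";")) {
                    String taxon = part.trim();
                    if (taxon.endsWith(".")) {
                        taxon = taxon.substring(0, taxon.length() - 1);
                    }
                    if (!taxon.isEmpty()) {
                        taxonomy.add(taxon);
                    }
                }
            } else if (line.startsWith("SQ")) {
                inSequence = true; // sequence data starts on the next line
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        if (args.length != 1) {
            System.err.println("Usage: java EmblParser <path-to-embl-file>");
            return;
        }
        EmblParser parser = new EmblParser();
        // Build the Scanner from the path given on the command line
        try (Scanner in = new Scanner(new File(args[0]))) {
            parser.parse(in);
        }
        System.out.println("ID:       " + parser.id);
        System.out.println("Taxonomy: " + parser.taxonomy);
        System.out.println("Length:   " + parser.bases.length() + " bases");
    }
}
```

Run it with the path of your own EMBL file as the only argument. Passing a `Scanner` into `parse` rather than a path keeps the parsing logic testable with `new Scanner(String)`.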
The Sequence class should implement the java.lang.CharSequence interface. It should store the sequence as a List of java.lang.Character objects. In particular, charAt(int) should extract the appropriate Character from the list and return the equivalent char.
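A minimal sketch of Sequence backed by an `ArrayList<Character>`, together with an EMBL class that holds one as a field. The constructor signatures, the getter names and the choice to copy the taxonomy into an unmodifiable list are assumptions, not part of the assignment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sequence stores each base as a java.lang.Character, as the assignment requires.
class Sequence implements CharSequence {
    private final List<Character> bases;

    // Copies the characters of any CharSequence (e.g. a String) into the list
    Sequence(CharSequence text) {
        bases = new ArrayList<>(text.length());
        for (int i = 0; i < text.length(); i++) {
            bases.add(text.charAt(i)); // char is autoboxed to Character
        }
    }

    // Used by subSequence; takes ownership of an already-copied list
    private Sequence(List<Character> bases) {
        this.bases = bases;
    }

    @Override
    public int length() {
        return bases.size();
    }

    @Override
    public char charAt(int index) {
        Character base = bases.get(index); // throws IndexOutOfBoundsException if out of range
        return base.charValue();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > bases.size() || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        return new Sequence(new ArrayList<Character>(bases.subList(start, end)));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(bases.size());
        for (char c : bases) {
            sb.append(c);
        }
        return sb.toString();
    }
}

// EMBL holds the parsed fields of one entry, with a Sequence as a field.
class EMBL {
    private final String id;
    private final List<String> taxonomy;
    private final Sequence sequence;

    EMBL(String id, List<String> taxonomy, Sequence sequence) {
        this.id = id;
        this.taxonomy = Collections.unmodifiableList(new ArrayList<>(taxonomy));
        this.sequence = sequence;
    }

    String getId() { return id; }
    List<String> getTaxonomy() { return taxonomy; }
    Sequence getSequence() { return sequence; }
}
```

Storing each base as a boxed Character uses far more memory than a String would, which is fine for an exercise of this size. Because Sequence implements CharSequence, it can be passed to any method that accepts a CharSequence.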
Finally, carry out one of the following:
Write a method that searches the Sequence for a given DNA string. Write a test method that searches for a Shine-Dalgarno sequence (AGGAGGU, which corresponds to AGGAGGT in DNA) in the DNA sequence.
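A sketch of the search, assuming a simple naive (brute-force) scan is acceptable. It is written against CharSequence so it works on a Sequence as well as on plain strings. Matching is case-insensitive because the example file stores bases in lowercase, and the RNA motif AGGAGGU is converted to its DNA form AGGAGGT (uracil becomes thymine) before searching. The class name `DnaSearch`, its methods and the test input are hypothetical; in the real program the test would run on the parsed sequence instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Naive substring search over any CharSequence, so it works on Sequence too.
final class DnaSearch {
    private DnaSearch() {}

    // Index of the first case-insensitive match of pattern at or after from, or -1
    static int indexOf(CharSequence seq, CharSequence pattern, int from) {
        int n = seq.length();
        int m = pattern.length();
        for (int i = Math.max(from, 0); i + m <= n; i++) {
            int j = 0;
            while (j < m && Character.toLowerCase(seq.charAt(i + j))
                    == Character.toLowerCase(pattern.charAt(j))) {
                j++;
            }
            if (j == m) {
                return i;
            }
        }
        return -1;
    }

    // All start positions of pattern, including overlapping ones
    static List<Integer> findAll(CharSequence seq, CharSequence pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.length() == 0) {
            return hits;
        }
        for (int i = indexOf(seq, pattern, 0); i >= 0; i = indexOf(seq, pattern, i + 1)) {
            hits.add(i);
        }
        return hits;
    }

    // RNA motifs use U (uracil); the same motif in DNA uses T (thymine)
    static String rnaToDna(String rna) {
        return rna.replace('U', 'T').replace('u', 't');
    }

    // Test method: made-up input with the motif planted at positions 4 and 14
    static void testShineDalgarno() {
        String motif = rnaToDna("AGGAGGU"); // "AGGAGGT"
        CharSequence dna = "ccccaggaggtcccaggaggt";
        List<Integer> hits = findAll(dna, motif);
        if (!hits.equals(Arrays.asList(4, 14))) {
            throw new AssertionError("unexpected matches: " + hits);
        }
    }

    public static void main(String[] args) {
        testShineDalgarno();
        System.out.println("Shine-Dalgarno search test passed");
    }
}
```

For a 318 bp record the naive scan is more than fast enough; a linear-time algorithm such as Knuth-Morris-Pratt would only matter for much longer sequences.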
Example EMBL file:
ID BB252375; SV 1; linear; mRNA; EST; MUS; 318 BP.
DT 01-JUL-2000 (Rel. 64, Created)
DT 01-DEC-2005 (Rel. 86, Last updated, Version 4)
DE Mus musculus 7 days neonate cerebellum cDNA, RIKEN full-length
DE enriched library, clone:A730051B17, 3' end partial sequence, similar to
DE refseq:NM_000280 Homo sapiens paired box gene 6 (PAX6), isoform a,
DE mRNA.
SQ Sequence 318 BP; 80 A; 129 C; 35 G; 73 T; 1 other;
ttatctatcat ctccacccct cacctctcca tcctcacccc ccggccccca 50
taaacacact tgagccatca ccaatcagca cagctgtncc ggctgcaccc 100