Thread: Format some text with Java
- 02-18-2010, 06:45 AM #1
Hi All,
I am new to this forum and this is my first thread. I am new to Java as well.
I have some text like this:
Java Code:
[NP The/DT U/NNP ] P/. [NP Workers/NNPS April/NNP skip/NN ] [PP to/TO ] [NP main/JJ skip/NN ] [PP to/TO ] [NP sidebar/NN ] [NP The/DT U/NNP ] P/. Workers/NNPS [NP This/DT site/NN ] [VP is/VBZ ] [ADJP open/JJ ] [PP for/IN ] [NP posting/VBG and/CC comments/NNS ] [PP by/IN ] [NP all/DT rank/NN and/CC file/NN administrative/JJ employees/NNS ] [PP of/IN ] [NP the/DT University/NNP ] [PP of/IN ] [NP the/DT Philippines/NNPS ] and/CC [NP the/DT Philippine/NNP General/NNP Hospital/NNP The/NNP National/NNP University/NNP Hospital/NNP ] [ADVP especially/RB ] [NP the/DT officers/NNS and/CC members/NNS ] [PP of/IN ] [NP the/DT All/NNP U/NNP ] P/. [NP Workers/NNPS Union/NNP ] [NP Friday/NNP April/NNP Stop/NNP Paying/NNP Nuke/NNP Plant/NNP Debt/NNP SC/NNP Justice/NNP Urges/NNPS Gov't/NNP ] [VP Posted/VBD pm/VBN ] [NP Mla/NNP time/NN April/NNP By/NNP Vincent/NNP Cabreza/NNP Inquirer/NNP News/NNP Service/NNP Published/NNP ] [PP on/IN ] [NP page/NN A/NNP ] [PP of/IN ] [NP the/DT Apr/NNP ] But/CC [NP Puno/NNP ] [VP points/VBZ ] [PRT out/RP ] [SBAR that/IN ] [NP the/DT US/NNP law/NN ] [VP bars/VBZ ] [NP the/DT towns/NNS ] [PP from/IN ] [VP issuing/VBG ] [NP new/JJ taxes/NNS ] [VP to/TO pay/VB ] [PP for/IN ] [NP their/PRP$ debts/NNS ] unsafe/JJ www/WRB -----etc-----------------
I need to convert the text into the following format (this is the desired output format):
Java Code:
The DT B-NP U NNP I-NP P Workers NNPS B-NP April NNP I-NP skip NN I-NP to TO B-PP main JJ B-NP skip NN I-NP to TO B-PP sidebar NN B-NP The DT B-NP U NNP I-NP P Workers NNPS ......... etc .......
I have written code to do this transformation, but its output does not match the desired output above, so the requirement is not met.
I am using a regex to solve the problem:
Java Code:
Pattern p = Pattern.compile("\\[(\\p{Alpha}+) +(\\p{Graph}+)/(\\p{Alpha}+)(?: +(\\p{Alnum}+)/(\\p{Alpha}+))?(?: +(\\p{Alnum}+)/(\\p{Alpha}+))?(?: +(\\p{Alnum}+)/(\\p{Alpha}+))?(?: +(\\p{Alnum}+)/(\\p{Alpha}+))?(?: +(\\p{Alnum}+)/(\\p{Alpha}+))?(?: +(\\p{Alnum}+)/(\\p{Alpha}+))? ]+(?:(\\./. |\\./.$))?(?: +(\\./. |\\./.$))?(?: +(\\p{Alnum}+)/(\\p{Alpha}+))?(?:(\\p{Alnum}+)/(\\p{Alpha}+))?", Pattern.MULTILINE);
I print the output like this:
Java Code:
while (matcher.find()) {
    //System.out.println();
    System.out.println("For: " + matcher.group());
    System.out.println(matcher.group(2) + "\t" + matcher.group(3) + "\tB-" + matcher.group(1));
    if (matcher.group(4) != null) {
        System.out.println(matcher.group(4) + "\t" + matcher.group(5) + "\tI-" + matcher.group(1));
    }
    -------etc---------------------------------------
The regex is long because I have built it up to capture all types of words inside the brackets []. However, it fails to generate any output when it sees "But/CC" or a similar pattern in my text, although it does generate output for the second kind, such as "unsafe/JJ".
So currently my output (which is wrong) looks like this (with no gaps after a sentence):
Java Code:
The DT B-NP U NNP I-NP Workers NNPS B-NP April NNP I-NP skip NN I-NP to TO B-PP main JJ B-NP skip NN I-NP to TO B-PP sidebar NN B-NP The DT B-NP U NNP I-NP This DT B-NP site NN I-NP is VBZ B-VP -------
You can see that it has omitted some words outright.
So I have two requirements:
1. How can I capture a pattern such as "But/CC" (or anything of this type) that is not inside brackets?
2. In the input text there is a line gap after every sentence or pattern. In the output I also need a line break after each sentence, exactly where the input text file has one. [There should also be a line break after P/., as there is in the input.]
Please refer to the desired output shown earlier in this thread. I need a regex-based solution for this. Please help me modify my code or write a new version.
Thanks!
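One way to approach both requirements is to match one token at a time instead of one whole bracketed group with a single large regex. The sketch below is only an illustration under stated assumptions, not a confirmed solution: the class name `ChunkFormatter` and the method `convert` are hypothetical, each sentence is assumed to sit on its own input line, and words outside brackets are given the tag `O` (a common chunking convention that the post itself does not specify).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the "O" tag are assumptions.
class ChunkFormatter {
    // One alternative per token kind:
    //   group 1: chunk label after an opening bracket, e.g. "[NP"
    //   group 2: a closing bracket "]"
    //   groups 3 and 4: word and tag of "word/TAG" (split at the last slash)
    private static final Pattern TOKEN =
            Pattern.compile("\\[(\\S+)|(\\])|(\\S+)/(\\S+)");

    static String convert(String input) {
        StringBuilder out = new StringBuilder();
        // Assumption: every non-blank input line is one sentence.
        for (String line : input.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String chunk = null;   // label of the chunk we are inside, if any
            boolean first = false; // true until the first word of a chunk is written
            Matcher m = TOKEN.matcher(line);
            while (m.find()) {
                if (m.group(1) != null) {          // "[NP" opens a chunk
                    chunk = m.group(1);
                    first = true;
                } else if (m.group(2) != null) {   // "]" closes it
                    chunk = null;
                } else {                           // "word/TAG"
                    String label = (chunk == null)
                            ? "O"                  // outside brackets, e.g. But/CC
                            : (first ? "B-" : "I-") + chunk;
                    out.append(m.group(3)).append('\t')
                       .append(m.group(4)).append('\t')
                       .append(label).append('\n');
                    first = false;
                }
            }
            out.append('\n'); // blank line after each sentence
        }
        return out.toString();
    }
}
```

Calling `ChunkFormatter.convert(text)` on the input would then emit one `word<TAB>tag<TAB>label` line per token, including tokens such as `But/CC` and `P/.` that sit outside any bracket. Because the bracket state is tracked in a variable, the regex does not need one optional group per word, so chunks of any length are handled the same way.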