  1. #1
    moaxjlou is offline Member
    Join Date
    Oct 2008
    Posts
    23
    Rep Power
    0

    Regex that matches whole word

    Hi
    I'm trying to work out a regex that matches a name. A name has two parts, a first name and a last name (assuming everybody's name is in that format), but the first name cannot be the word "Cand" and the last name cannot be the word "Key".

    This is what I've done:
    Pattern studentNamePattern = Pattern.compile("[a-zA-Z]+\\s[a-zA-Z]+");

    Tested against this input,
    Java Code:
    Cand Key
    Cand Key
    Ramone Ray
    Paul Brimbell
    Neil Bristow
    it returns
    Java Code:
    Cand Key
    Cand Key
    Ramone Ray
    Paul Brimbell
    Neil Bristow
    but I only want
    Java Code:
    Ramone Ray
    Paul Brimbell
    Neil Bristow
    What should I add to the regex to say that the first name cannot be "Cand" and the last name cannot be "Key"?
    Last edited by moaxjlou; 11-02-2008 at 07:19 PM. Reason: mistakes in the first question
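
    One possible reading of the question above is that a name is rejected when its first part is exactly "Cand" or its last part is exactly "Key". Under that assumption, a negative lookahead can exclude whole words. This is only a sketch: the class name ExcludedWordsDemo and the method filter are made up for illustration, and the extra inputs "Cand Ray", "Paul Key" and "Candice Keys" are not from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class ExcludedWordsDemo {

    // (?!Cand\b) rejects a first name that is exactly "Cand";
    // (?!Key\b) rejects a last name that is exactly "Key".
    // Longer names such as "Candice" or "Keys" still pass.
    static final Pattern NAME = Pattern.compile(
            "\\b(?!Cand\\b)[a-zA-Z]+\\s(?!Key\\b)[a-zA-Z]+\\b");

    // Keeps only the lines that are, in full, an acceptable two-part name.
    static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (NAME.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Cand Key", "Cand Key", "Ramone Ray", "Paul Brimbell", "Neil Bristow");
        System.out.println(filter(input)); // prints [Ramone Ray, Paul Brimbell, Neil Bristow]
    }
}
```

    Note that this reading also rejects "Cand Ray" and "Paul Key". If only the exact pair "Cand Key" should be excluded, a single lookahead such as (?!Cand\sKey\b) at the start of the pattern would be the closer fit.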

  2. #2
    DarrylBurke is online now Member
    Join Date
    Sep 2008
    Location
    Madgaon, Goa, India
    Posts
    11,242
    Rep Power
    19

    Default

    Your specification is incomplete. What result do you want if the name is Cand Ray? If it is Paul Key?

    db

  3. #3
    moaxjlou is offline Member
    Join Date
    Oct 2008
    Posts
    23
    Rep Power
    0

    Default

    Hi db

    Your specification is incomplete. What result do you want if the name is Cand Ray? If it is Paul Key?
    Sorry, actually it should not capture "#Cand Key", which is the name of one of the columns in the input file.

    Any idea how I can avoid capturing those specific words?

  4. #4
    DarrylBurke is online now Member
    Join Date
    Sep 2008
    Location
    Madgaon, Goa, India
    Posts
    11,242
    Rep Power
    19

    Default

    The regex you posted won't match "#Cand Key" because of the #.

    I think you need to be more clear.

    db

  5. #5
    Fubarable is offline Moderator
    Join Date
    Jun 2008
    Posts
    19,316
    Blog Entries
    1
    Rep Power
    26

    Default

    It always seems to be problems with the clarity and completeness of the specification that doom regex questions here, doesn't it?

  6. #6
    moaxjlou is offline Member
    Join Date
    Oct 2008
    Posts
    23
    Rep Power
    0

    Default

    Hi Db
    The regex you posted won't match "#Cand Key" because of the #.
    Actually, the regex did match it (I tested it against the input file), which is why I'm confused now.

    Does anyone have any ideas for solving this problem?

  7. #7
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    SW Missouri
    Posts
    17,421
    Rep Power
    25

    Default

    If the input is as shown, why do you need a regex?
    The examples given would be satisfied by using .equals()

    Can you give some complete examples of the Strings to be searched?

    Show which are to be matched and which are NOT.
    Given your first post, equals() would work.

  8. #8
    moaxjlou is offline Member
    Join Date
    Oct 2008
    Posts
    23
    Rep Power
    0

    Default

    Hi Norm
    If the input is as shown, why do you need a regex?
    I need a regex to match every occurrence of a person's name, so I don't think I can use equals() for that.

    Here is the complete example of the string input:
    #Cand Key
    Nick Bray
    Michelle Ramone
    Rochelle Aroulins
    Paul Abraham
    Donna Lewis
    Cristina Saar
    Tracey Mumford

    so I want to return only the names, not the #Cand Key.

    So essentially, the regex should start with a letter, followed by any number of further letters, then a space, then another letter followed by any number of letters for the last name.

    Having said that, this is what I believe it looks like in Java:
    Java Code:
    Pattern studentNamePattern = Pattern.compile("[A-Z][a-zA-Z]+\\s[A-Z][a-zA-Z]+");
    The problem is, the output still gives the following:
    Java Code:
    Cand Key <-- (problem here: this shouldn't be output, because the input is #Cand Key, which doesn't start with a letter)
    Nick Bray
    Michelle Ramone
    Rochelle Aroulins
    Paul Abraham
    Donna Lewis
    Cristina Saar
    Tracey Mumford
    How do I specify the regex to overcome this problem?
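
    Assuming the pattern is applied with Matcher.find(), as the code later in the thread does, "Cand Key" is still reported because find() looks for a match anywhere inside the input and can simply start after the #. One way to prevent that, sketched here under the assumption that a name must not be directly preceded by # or by another letter, is a negative lookbehind. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class LookbehindDemo {

    // (?<![#A-Za-z]) fails when the character just before the candidate
    // name is '#' or a letter, so the match cannot begin right after '#'
    // or in the middle of a word.
    static final Pattern NAME = Pattern.compile(
            "(?<![#A-Za-z])[A-Z][a-zA-Z]+\\s[A-Z][a-zA-Z]+");

    // Collects every name found in one line of input.
    static List<String> findNames(String line) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(line);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(findNames("#Cand Key")); // prints []
        System.out.println(findNames("Nick Bray")); // prints [Nick Bray]
    }
}
```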

  9. #9
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    SW Missouri
    Posts
    17,421
    Rep Power
    25

    Default

    Can you give some complete examples of the Strings to be searched?

    Show which are to be matched and which are NOT.
    You give one small example: "the input is #Cand Key" that is NOT to match.
    .equals("#Cand Key") would match that input.
    Does the "Cand Key" string ever start at the beginning of the string? Or does it always have a leading blank? The same goes for the end of the string: does it always have a trailing blank?
    Then .equals(" Cand Key ") would find it. Are there delimiters other than a space?

    Without a good problem statement and good examples, how can anyone work on a solution?
    Last edited by Norm; 11-03-2008 at 02:38 AM.

  10. #10
    moaxjlou is offline Member
    Join Date
    Oct 2008
    Posts
    23
    Rep Power
    0

    Default

    Norm
    Can you give some complete examples of the Strings to be searched?
    Here is the .csv input file
    Java Code:
    #Cand Key,Name,#Cand Key
    #Z11236,Nick Bray,#Z11236
    #Z11345,Cristina Saar,#Z11345
    #Z14321,Paul Abraham,#Z14321
    #Z10987,Susan Curtis,#Z10987
    Above is the .csv input file

    Does the "Cand Key" string ever start at the beginning of the string?
    From the input, yes.

    Without a good problem statement and good examples, how can anyone work on a solution?
    The problem now is that the regex I created cannot identify exact words. In other words, even though #Cand Key starts with a # sign rather than a letter, it is still considered a match because of the way I designed the regex (see my previous post). So I was wondering how to modify my regex, but you suggested using the equals() method.

    Hence, I tried your suggestion and made the equals() call part of the while loop's condition, as shown below:

    Java Code:
    while(nameMatcher.find() && !(nameMatcher.equals("#Cand Key"))){
    	names = nameMatcher.group();
    	studentNames.add(names);
    						
    }
    The complete method is shown below for reference:
    Java Code:
    public ArrayList<String> generateStudentNames() {
        BufferedReader nameReader = null;
        Pattern studentNamePattern = Pattern.compile("[a-zA-Z]+\\s[A-Z][a-zA-Z]+");
        String nameLine = null;
        String names;
        ArrayList<String> studentNames = new ArrayList<String>();

        if (input == null)
            return studentNames;
        else
            try {
                nameReader = new BufferedReader(new FileReader(input));

                while ((nameLine = nameReader.readLine()) != null) {
                    try {
                        Matcher nameMatcher = studentNamePattern.matcher(nameLine);

                        while (nameMatcher.find() && !(nameMatcher.equals("#Cand Key"))) {
                            names = nameMatcher.group();
                            studentNames.add(names);
                        }
                    }
                    catch (NullPointerException e) {
                        System.err.println("Exception occured:");
                        System.out.println(e.getMessage());
                        e.printStackTrace();
                    }
                }
            }
            catch (FileNotFoundException ex) {
                ex.printStackTrace();
            }
            catch (IOException ex) {
                ex.printStackTrace();
            }
            finally {
                // Close the BufferedReader
                try {
                    if (nameReader != null)
                        nameReader.close();
                } catch (IOException ex) {
                    ex.printStackTrace();
                }
            }

        System.out.println(studentNames);

        return studentNames;
    }
    However, when tested against the input below:
    Java Code:
    #Cand Key,Name,#Cand Key
    #Z11236,Nick Bray,#Z11236
    #Z11345,Cristina Saar,#Z11345
    #Z14321,Paul Abraham,#Z14321
    #Z10987,Susan Curtis,#Z10987
    The result still captures Cand Key:
    Java Code:
    [Cand Key, Cand Key, Nick Bray, Cristina Saar, Paul Abraham, Susan Curtis]
    What's wrong with my implementation, and how do I solve this?
    Thanks for any help.
    Last edited by moaxjlou; 11-03-2008 at 01:56 PM.
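
    A likely reason the check in the loop condition has no effect: nameMatcher is a Matcher, and Matcher does not override Object.equals, so comparing it with a String is always false. The matched text comes from group(), and with this pattern it is "Cand Key" without the #. Putting the test into the while condition with && would also end the loop at the first excluded match instead of skipping it. A minimal sketch of the alternative, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class GroupCompareDemo {

    // Same pattern as in the method posted above.
    static final Pattern NAME = Pattern.compile("[a-zA-Z]+\\s[A-Z][a-zA-Z]+");

    // Collects the names in one line, skipping the header text "Cand Key".
    static List<String> namesIn(String line) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(line);
        while (m.find()) {
            String candidate = m.group(); // the matched text, e.g. "Cand Key"
            if (candidate.equals("Cand Key")) {
                continue; // skip this match but keep scanning the line
            }
            names.add(candidate);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesIn("#Cand Key,Name,#Cand Key"));  // prints []
        System.out.println(namesIn("#Z11236,Nick Bray,#Z11236")); // prints [Nick Bray]
    }
}
```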

  11. #11
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    SW Missouri
    Posts
    17,421
    Rep Power
    25

    Default

    Is the purpose of this program to extract the student name from each record/line? Looking at the input data, it appears that there are three fields/columns separated by commas.
    The second column is the name field.
    You want to skip the heading: "Name" which is also on the first line.
    Can you use any of these details describing the input file to process it?
    For example the data starts with the second line, the first is the header -> Skip the first line when processing
    The desired data is the second field (separated by commas) on each line. Use a StringTokenizer to get the second field.

    With a regex, any column heading that has two parts would also match.
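
    The StringTokenizer suggestion could look roughly like the sketch below. It assumes that no field is empty and that no field contains a comma, because StringTokenizer treats a run of consecutive delimiters as a single one and knows nothing about CSV quoting. The class and method names are hypothetical.

```java
import java.util.StringTokenizer;

// Hypothetical helper, not from the thread.
final class TokenizerDemo {

    // Returns the second comma-separated field of a line,
    // or null when the line has fewer than two fields.
    static String secondField(String line) {
        StringTokenizer tokens = new StringTokenizer(line, ",");
        if (tokens.countTokens() < 2) {
            return null;
        }
        tokens.nextToken();               // first field, e.g. "#Z11236"
        return tokens.nextToken().trim(); // second field, e.g. "Nick Bray"
    }

    public static void main(String[] args) {
        System.out.println(secondField("#Z11236,Nick Bray,#Z11236")); // prints Nick Bray
    }
}
```

    The caller would still need to skip the first line of the file, since the header row's second field is "Name".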

  12. #12
    DarrylBurke is online now Member
    Join Date
    Sep 2008
    Location
    Madgaon, Goa, India
    Posts
    11,242
    Rep Power
    19

    Default

    nameMatcher.find() will find a subsequence that matches. Either use nameMatcher.matches() or anchor the regex to the start of the string.
    Java Code:
    "^[a-zA-Z]+\\s[A-Z][a-zA-Z]+"
    Note the ^ added at the start of the regex, which is the metacharacter for the beginning of input.
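
    The difference between find(), matches() and the ^ anchor can be checked with a small sketch like this one (class and method names are hypothetical). One caveat, assuming the .csv layout from post #10: every data row there also begins with #, so the anchored pattern rejects those rows as well as the header.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class FindVsMatchesDemo {

    static final Pattern LOOSE = Pattern.compile("[a-zA-Z]+\\s[A-Z][a-zA-Z]+");
    static final Pattern ANCHORED = Pattern.compile("^[a-zA-Z]+\\s[A-Z][a-zA-Z]+");

    // find(): succeeds if any part of the input matches.
    static boolean findsLoose(String s) {
        return LOOSE.matcher(s).find();
    }

    // matches(): succeeds only if the entire input matches.
    static boolean matchesLoose(String s) {
        return LOOSE.matcher(s).matches();
    }

    // find() with ^: the match must begin at the start of the input.
    static boolean findsAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(findsLoose("#Cand Key"));    // prints true
        System.out.println(matchesLoose("#Cand Key"));  // prints false
        System.out.println(findsAnchored("#Cand Key")); // prints false
        System.out.println(findsAnchored("Nick Bray")); // prints true
        System.out.println(findsAnchored("#Z11236,Nick Bray,#Z11236")); // prints false
    }
}
```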

    db

  13. #13
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    SW Missouri
    Posts
    17,421
    Rep Power
    25

    Default

    If the data is a CSV file, then it seems to me that a regex is NOT the right tool. The solution is much simpler than that.
    The data is in rows and columns. What the OP wants is the second column starting in row 2. Skip the first row, then
    String.split(",") and grab column 2.
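
    A minimal sketch of that approach, assuming the three-column layout from post #10 and no commas inside a field. The class name, the method name and the use of a Reader parameter are choices made for this example, not something from the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the thread.
final class CsvNameColumnDemo {

    // Skips the header row, then collects column 2 of every remaining row.
    static List<String> readNames(Reader source) {
        List<String> names = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // header row "#Cand Key,Name,#Cand Key" is discarded
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                if (fields.length >= 2) {
                    names.add(fields[1].trim()); // index 1 is the second column
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String csv = "#Cand Key,Name,#Cand Key\n"
                + "#Z11236,Nick Bray,#Z11236\n"
                + "#Z11345,Cristina Saar,#Z11345\n";
        System.out.println(readNames(new StringReader(csv))); // prints [Nick Bray, Cristina Saar]
    }
}
```

    For a real file, a FileReader could be passed in place of the StringReader.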

  14. #14
    moaxjlou is offline Member
    Join Date
    Oct 2008
    Posts
    23
    Rep Power
    0

    Default

    Hi Norm
    Skip the first row, then
    String.split(",") and grab column 2.
    Thanks, it's working :-)
