Remove all punctuation except dots, hyphens and spaces with a regular expression
I would like to remove all punctuation from an article that is stored as a string, but retain the spaces, hyphens and dots so I can still determine word and sentence boundaries.
I have tried:
txt = txt.replaceAll("\\W([^\\.]|[^\\s]|[^\\-])", "");
However, it matches a non-word character followed by something that is not a dot, space or hyphen, rather than a non-word character that is itself not a dot, space or hyphen.
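To make the behaviour concrete, here is a minimal reproduction using the pattern from the attempt above. The class name AttemptDemo, the method name attempt and the sample sentence are made up for illustration only.

```java
public class AttemptDemo {
    // The pattern from the attempt above, unchanged.
    static String attempt(String txt) {
        return txt.replaceAll("\\W([^\\.]|[^\\s]|[^\\-])", "");
    }

    public static void main(String[] args) {
        // Each non-word character is removed together with the character
        // after it, because the alternation accepts any following character.
        // The final "!" survives because nothing follows it.
        System.out.println(attempt("Hello, world. Foo-bar!"));
        // prints "HelloworldFooar!"
    }
}
```

The spaces after the comma and the dot are removed, and so is the "b" after the hyphen, which is the opposite of what I want.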
How do you express exceptions like this? Are they supported? If not, how would I go about solving this problem without specifying every single possible punctuation character (including non-ASCII characters)?
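One possible direction, offered as a sketch rather than a definitive answer: Java's regex syntax supports character class intersection with &&, so the Unicode punctuation category \p{P} can be combined with a negated class that excludes the dot and the hyphen. The class name PunctuationSketch and the method name stripPunctuation are hypothetical, and the sketch assumes that "punctuation" means the Unicode punctuation categories only.

```java
public class PunctuationSketch {
    // \p{P} matches any Unicode punctuation character.
    // && [^.\-] intersects it with "anything except dot and hyphen",
    // so dots and hyphens are kept. Spaces are not punctuation, so they
    // are never matched in the first place.
    static String stripPunctuation(String txt) {
        return txt.replaceAll("[\\p{P}&&[^.\\-]]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("Hello, world. Foo-bar!"));
        // prints "Hello world. Foo-bar"

        // Non-ASCII punctuation (curly quotes, written as escapes here)
        // is removed, while the accented letter is kept.
        System.out.println(stripPunctuation("\u201CCaf\u00E9\u201D"));
    }
}
```

Note that symbols such as $ or + belong to the \p{S} categories rather than \p{P}, so this sketch leaves them in place. Whether that is acceptable depends on the article text.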
Thanks in advance.