Here are some quick thoughts:
Everyone who is suggesting that you use something other than regular expressions is giving you very good advice. On the other hand, it's always a good time to learn more about regular expression syntax...
An expression in square brackets, [...], matches any single character inside those brackets. So writing [,], which contains only a single character, is exactly equivalent to writing a plain, unadorned comma (,).
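If you want to convince yourself, here is a quick check using Python's re module. The sample string is made up for illustration.

```python
import re

text = "December, 31, 2012"

# [,] is a character class with exactly one member; "," is the literal itself.
with_class = re.findall(r"[,]", text)
with_literal = re.findall(r",", text)

print(with_class)    # [',', ',']
print(with_literal)  # [',', ',']
```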
The .findall method returns a list of all non-overlapping matches in the string. When the pattern contains groups, it returns the groups instead (as tuples if there is more than one group).
A group is identified by parentheses, (...), and groups are numbered from left to right by their opening parenthesis, outermost first. The final part of your expression looks like this:
((19|20)[0-9][0-9])
The outermost parentheses match the entire year, and the inner parentheses match the first two digits. Hence, for a year like "1989", the two match groups are going to be 1989 and 19. Since you don't want the inner group (first two digits), you should use a non-capturing group instead. Non-capturing groups start with ?: right after the opening parenthesis, used like this:
((?:19|20)[0-9][0-9])
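Here is a small sketch of the difference, using only the year part of the pattern. The sample text is invented for the demonstration.

```python
import re

text = "born in 1989, moved in 2004"

# Two capturing groups: findall returns one tuple per match.
capturing = re.findall(r"((19|20)[0-9][0-9])", text)
print(capturing)      # [('1989', '19'), ('2004', '20')]

# Inner group made non-capturing: only the whole year is returned.
non_capturing = re.findall(r"((?:19|20)[0-9][0-9])", text)
print(non_capturing)  # ['1989', '2004']
```

The stray "19" and "20" disappear as soon as the inner group stops capturing.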
By the way, there is some good documentation on how to use regular expressions here.
Python has a date parser as part of the time module:
import time
time.strptime("December 31, 2012", "%B %d, %Y")
The above is all you need if the date format is always the same.
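As a rough sketch of that call in use: the result is a struct_time whose fields you can read directly, and strptime raises ValueError for text that is not a real date in that format. The helper name is_real_date is mine, purely for illustration, and %B assumes English month names (the default C locale).

```python
import time

parsed = time.strptime("December 31, 2012", "%B %d, %Y")
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)  # 2012 12 31

def is_real_date(text):
    """Return True if text is a valid date in 'Month day, year' form."""
    try:
        time.strptime(text, "%B %d, %Y")
    except ValueError:
        # Wrong format, unknown month name, or a day the month does not have.
        return False
    return True

print(is_real_date("February 28, 2012"))  # True
print(is_real_date("February 31, 2012"))  # False
```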
So, in real production code, I would write a regular expression that picks the date apart, and then use the pieces it captures to build a date string that is always in the same format, which the date parser can then handle.
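Something along these lines is what I have in mind. Treat it as a sketch: the pattern DATE_RE and the function parse_date are hypothetical names of my own, and the pattern is deliberately loose (any capitalized word, optional commas), leaving it to strptime to reject bad month names or days.

```python
import re
import time

# Hypothetical, deliberately loose pattern: capitalized word, optional comma,
# one- or two-digit day, optional comma, four-digit year.
DATE_RE = re.compile(r"([A-Z][a-z]+),?\s+(\d{1,2}),?\s+(\d{4})")

def parse_date(text):
    """Return a time.struct_time for the first date found in text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    month, day, year = match.groups()
    # Rebuild the date in one fixed format so a single strptime call suffices.
    normalized = "{} {}, {}".format(month, day, year)
    return time.strptime(normalized, "%B %d, %Y")

parsed_date = parse_date("Due on December, 31, 2012 at noon")
print(parsed_date.tm_year, parsed_date.tm_mon, parsed_date.tm_mday)  # 2012 12 31
```

The regular expression only has to cope with the messy punctuation; the actual date logic stays in the date parser.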
Now that you have said, in the comments, that this is homework, I'll post another answer with tips on regular expressions.
You have this regular expression:
pattern = "(January|February|March|April|May|June|July|August|September|October|November|December)[,][ ](0[1-9]|[0-9]|3)[,][ ]((19|20)[0-9][0-9])"
One feature of regular expressions is a "character class". Characters in square brackets make a character class. Thus [,] is a character class matching a single character: a comma (,). You might as well just write the comma.
Perhaps you wanted to make the comma optional? You can do that by putting a question mark after it:
,?
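A short sketch of the optional comma in action. The pattern and sample strings are made up for the demonstration.

```python
import re

# ",?" matches zero or one comma, so both spellings are accepted.
optional_comma = re.compile(r"December,? 31")

print(bool(optional_comma.search("December, 31")))  # True
print(bool(optional_comma.search("December 31")))   # True
print(bool(optional_comma.search("December; 31")))  # False
```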
Anything you put into parentheses makes a "match group". I think the mysterious extra "19" came from a match group you didn't mean to have. You can make a non-capturing group using this syntax:
(?:...)
So, for example:
(?:red|blue) socks
This would match "red socks" or "blue socks" but does not make a match group. If you then put that inside plain parentheses:
((?:red|blue) socks)
That would make a match group, whose value would be "red socks" or "blue socks".
I think if you apply these comments to your regular expression, it will work. It is mostly correct now.
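To see how the two kinds of group behave, here is the socks example as a runnable sketch. The third pattern, with a plain capturing group around the colors, is my own addition for contrast.

```python
import re

text = "red socks and blue socks"

# Only a non-capturing group: findall returns each whole match.
whole = re.findall(r"(?:red|blue) socks", text)
print(whole)    # ['red socks', 'blue socks']

# Wrapped in capturing parentheses: the single group is the whole phrase.
wrapped = re.findall(r"((?:red|blue) socks)", text)
print(wrapped)  # ['red socks', 'blue socks']

# A plain capturing group around the colors returns only the color.
colors = re.findall(r"(red|blue) socks", text)
print(colors)   # ['red', 'blue']
```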
As for validating the date against the month, that is way beyond the scope of a regular expression. Your pattern will match "February 31" and there is no easy way to fix that.