Python: How to use pandas .replace() with a list of regexes while honoring list order?

Here is a way to do this using a nested list comprehension and the re.sub() function:

import re
import pandas as pd

A = pd.DataFrame({'wildcards' : ['(.*)activation.playready.microsoft.com',
                                 '(.*)v10.vortex-win.data.microsoft.com',
                                 '(.*)i.microsoft.com', '(.*)microsoft.com'],
                  'regex' : [re.compile('^(.*)activation.playready.microsoft.com$'),
                             re.compile('^(.*)v10.vortex-win.data.microsoft.com$'), 
                             re.compile('^(.*)i.microsoft.com$'), 
                             re.compile('^(.*)microsoft.com$')]})

B = pd.DataFrame({'server_hostname' : ['v10.vortex-win.data.microsoft.com',
                                       'www.microsoft.com']})
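# For each server_hostname, try every regex and keep the longest resulting wildcard.
# re.sub() returns x unchanged when the regex does not match, so those are filtered out.
# default=None avoids a ValueError when no regex matches at all.
B['wildcards'] = [max([re.sub(to_replace, value, x) for to_replace, value
                       in A[['regex', 'wildcards']].values
                       if re.sub(to_replace, value, x) != x], key=len, default=None)
                  for x in B['server_hostname']]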

Output:
                     server_hostname                              wildcards
0  v10.vortex-win.data.microsoft.com  (.*)v10.vortex-win.data.microsoft.com
1                  www.microsoft.com                      (.*)microsoft.com
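
If the goal is to honor list order rather than to pick the longest match, a first-match lookup is one option. The sketch below is a variant under stated assumptions, not the code from the answer above: it treats everything after the leading (.*) as a literal suffix (so the dots are escaped with re.escape), and the helper name first_wildcard and the example.org row are made up for illustration.

```python
import re
import pandas as pd

wildcards = [
    "(.*)activation.playready.microsoft.com",
    "(.*)v10.vortex-win.data.microsoft.com",
    "(.*)i.microsoft.com",
    "(.*)microsoft.com",
]

# Assumption: the part after "(.*)" is a literal hostname suffix, so escape it.
prefix = "(.*)"
compiled = [
    (w, re.compile("^(.*)" + re.escape(w[len(prefix):]) + "$"))
    for w in wildcards
]

def first_wildcard(hostname):
    """Return the first wildcard, in list order, whose regex matches hostname."""
    return next((w for w, rx in compiled if rx.match(hostname)), None)

B = pd.DataFrame({"server_hostname": ["v10.vortex-win.data.microsoft.com",
                                      "www.microsoft.com",
                                      "example.org"]})  # last row matches nothing
B["wildcards"] = B["server_hostname"].map(first_wildcard)
```

With this ordering the most specific patterns come first, so the result agrees with the longest-match version for the two hostnames shown; reordering the list changes which wildcard wins.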
Answer:1

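An alternative tack, which unfortunately still needs an apply, is to use the match object's lastgroup attribute. This entails compiling all the patterns into a single regex, with one named group per row, and then looking up the name of the group (row) that matched. Here s is the DataFrame holding the wildcards column. Alternation tries its branches left to right, so the first pattern in row order that matches wins: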

In [11]: regex = re.compile("|".join([f"(?P<i{i}>{regex})" for i, regex in s["wildcards"].items()]))

In [12]: regex
Out[12]:
re.compile(r'(?P<i42>(.*)activation.playready.microsoft.com)|(?P<i35>(.*)v10.vortex-win.data.microsoft.com)|(?P<i40>(.*)settings-win.data.microsoft.com)|(?P<i43>(.*)smartscreen.microsoft.com)|(?P<i39>(.*).playready.microsoft.com)|(?P<i38>(.*)go.microsoft.com)|(?P<i240>(.*)i.microsoft.com)|(?P<i238>(.*)microsoft.com)',
re.UNICODE)

In [13]: B.server_hostname.apply(lambda s: int(re.match(regex, s).lastgroup[1:]))
Out[13]:
146    238
205     40
341     43
406     35
667    238
Name: server_hostname, dtype: int64

In [14]: B.server_hostname.apply(lambda s: int(re.match(regex, s).lastgroup[1:])).map(s.wildcards)
Out[14]:
146                        (.*)microsoft.com
205      (.*)settings-win.data.microsoft.com
341            (.*)smartscreen.microsoft.com
406    (.*)v10.vortex-win.data.microsoft.com
667                        (.*)microsoft.com
Name: server_hostname, dtype: object

The lastgroup attribute isn't exposed by pandas' string methods, hence the apply (though it might be possible to do something clever with the internals).
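
The session above calls .lastgroup directly on the result of re.match, which raises an AttributeError for any hostname that matches none of the patterns. A minimal self-contained sketch of the same idea with a guard for that case follows; the small table s, its index labels, the helper name lookup_wildcard, and the example.org entry are stand-ins chosen for illustration.

```python
import re
import pandas as pd

# Hypothetical stand-in for the wildcard table `s`; the index labels are arbitrary.
s = pd.DataFrame(
    {"wildcards": ["(.*)activation.playready.microsoft.com",
                   "(.*)v10.vortex-win.data.microsoft.com",
                   "(.*)i.microsoft.com",
                   "(.*)microsoft.com"]},
    index=[42, 35, 240, 238],
)

# One named group per row; branches are tried left to right, so row order decides.
combined = re.compile(
    "|".join(f"(?P<i{i}>{pat})" for i, pat in s["wildcards"].items())
)

def lookup_wildcard(hostname):
    """Return the first wildcard in row order that matches, or None if none does."""
    m = combined.match(hostname)
    if m is None:
        return None
    # lastgroup is the name of the outer group that matched, e.g. "i35".
    return s.at[int(m.lastgroup[1:]), "wildcards"]

hosts = pd.Series(["v10.vortex-win.data.microsoft.com",
                   "www.microsoft.com",
                   "example.org"])
result = hosts.apply(lookup_wildcard)
```

Returning the wildcard string directly, instead of the integer label, keeps the result as an object column and avoids mixing integers with missing values.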

Answer:2



