The differences between match, search, findall, and finditer in Python's re module

Benjamin · October 25, 2020, 17:05

The following article is reposted from: https://www.itread01.com/p/435642.html

These four functions are the common ways to find a particular substring in a string, or to test whether a string fits a given pattern.

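1. match

re.match(pattern, string[, flags])

Matching starts at the first character of string. If pattern matches at the beginning of string, a Match object is returned; otherwise None is returned. A match that occurs only later in the string does not count. To require that the whole string match, end pattern with $.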
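As a minimal sketch of the anchoring behaviour described above, the following compares a prefix match with a full match. The sample string and patterns are made up for illustration. re.fullmatch is a standard-library function (Python 3.4 and later) that the original article does not mention; it is shown here only as an alternative to the trailing $.

```python
import re

text = "2023abc"  # hypothetical sample string

# match() only requires the pattern to match at the start of the string.
prefix = re.match(r"\d+", text)
prefix_text = prefix.group() if prefix else None

# Ending the pattern with $ forces the match to reach the end of the string,
# so digits followed by letters no longer match.
anchored = re.match(r"\d+$", text)

# fullmatch() requires the whole string to match, with no explicit anchor.
full = re.fullmatch(r"\d+[a-z]+", text)
full_text = full.group() if full else None

print(prefix_text, anchored, full_text)  # 2023 None 2023abc
```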

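2. search

re.search(pattern, string[, flags])

If pattern matches anywhere in string, a Match object is returned; otherwise None is returned. Note that if string contains several matches for pattern, only the first one is returned.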
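A short sketch of how search() differs from match() on the same input; the sample string is hypothetical.

```python
import re

text = "abc123def456"  # hypothetical sample string

# match() fails because the string does not start with a digit.
at_start = re.match(r"\d+", text)

# search() scans forward and stops at the first match it finds.
first = re.search(r"\d+", text)
first_text = first.group() if first else None
first_span = first.span() if first else None

print(at_start, first_text, first_span)  # None 123 (3, 6)
```

The second run of digits, 456, is never reported by search(); findall() or finditer() is needed for that.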

3. findall

re.findall(pattern, string[, flags])

Returns every non-overlapping match of pattern in string, as a list.

4. finditer

re.finditer(pattern, string[, flags])

Returns every non-overlapping match of pattern in string, as an iterator of Match objects.
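To make the difference in return types concrete, here is a small sketch that runs finditer() and findall() over the same hypothetical string. Each item from finditer() is a Match object, so positions are available as well as the matched text.

```python
import re

text = "a1b22c333"  # hypothetical sample string

# finditer() yields one Match object per non-overlapping match, left to right.
matches = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

# findall() on the same pattern (no groups) returns only the matched strings.
strings = re.findall(r"\d+", text)

print(matches)  # [('1', 1, 2), ('22', 3, 5), ('333', 6, 9)]
print(strings)  # ['1', '22', '333']
```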

On success, match() and search() return a Match object, and finditer() returns an iterator of Match objects. To get the matched text, call group(), groups(), or group(index) on the Match object.

The differences between group(), groups(), and group(index) are shown below:

>>> import re
>>> s = '23432werwre2342werwrew'
>>> p = r'(\d*)([a-zA-Z]*)'
>>> m = re.match(p,s)
>>> m.group()
'23432werwre'
>>> m.group(0)
'23432werwre'
>>> m.group(1)
'23432'
>>> m.group(2)
'werwre'
>>> m.groups()
('23432', 'werwre')
>>> m = re.findall(p,s)
>>> m
[('23432', 'werwre'), ('2342', 'werwrew'), ('', '')]
>>> p=r'(\d+)'
>>> m=re.match(p,s)
>>> m.group()
'23432'
>>> m.group(0)
'23432'
>>> m.group(1)
'23432'
>>> m.groups()
('23432',)
>>> m=re.findall(p,s)
>>> m
['23432', '2342']

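In summary:
group(): the substring of the source string that matched the whole pattern;
group(0): the same result as group();
groups(): a tuple of all the groups. group(1) is the substring matched by the first group in pattern, group(2) the second, and so on; if index is out of range, IndexError is raised;
findall(): returns a list of these groups() results, that is, a list of tuples with one tuple of groups per match in the source string. Note also that when the pattern has only one group, findall() returns a list of substrings rather than a list of tuples.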
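The findall() rule in the summary can be sketched side by side for zero, one, and two groups. The sample string and patterns are hypothetical. The non-capturing group (?:...) is standard re syntax not covered in the original article; it is included as one way to group without changing what findall() returns.

```python
import re

text = "k1=v1;k2=v2"  # hypothetical sample string

no_groups = re.findall(r"\w+=\w+", text)       # list of whole matches
one_group = re.findall(r"(\w+)=\w+", text)     # list of the single group
two_groups = re.findall(r"(\w+)=(\w+)", text)  # list of tuples of groups

# A non-capturing group (?:...) groups without affecting findall()'s result.
non_capturing = re.findall(r"(?:\w+)=\w+", text)

print(no_groups)      # ['k1=v1', 'k2=v2']
print(one_group)      # ['k1', 'k2']
print(two_groups)     # [('k1', 'v1'), ('k2', 'v2')]
print(non_capturing)  # ['k1=v1', 'k2=v2']
```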

Example
s = "1113446777"
Use a regular expression to split s into 111, 3, 44, 6, 777

>>> import re
>>> s='1113446777'
>>> m = re.findall(r'(\d)\1*',s)
>>> print(m)
['1', '3', '4', '6', '7']
>>> m=re.search(r'(\d)\1*',s)
>>> m.group()
'111'
>>> m.groups()
('1',)
>>> m.group(0)
'111'
>>> m.group(1)
'1'
>>> m.group(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
>>> m=re.finditer(r'(\d)\1*',s)
>>> next(m).group()
'111'
>>> next(m).group()
'3'
>>> next(m).group()
'44'
>>> next(m).group()
'6'
>>> next(m).group()
'777'
>>> next(m).group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
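Since findall() returns only the captured digit here, one way to collect the full runs in a single step is to loop over finditer() and take group() from each Match. This is a minimal sketch of that idea using the same s and pattern as the session above.

```python
import re

s = "1113446777"

# (\d) captures one digit; \1* then consumes any repeats of that same digit.
runs = [m.group() for m in re.finditer(r"(\d)\1*", s)]

print(runs)  # ['111', '3', '44', '6', '777']
```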

Another example:

>>> import re
>>> s = '1111werwrw3333rertert4444'
>>> p = r'(\d)\1+([a-zA-Z]*)'
>>> re.findall(p,s)
[('1', 'werwrw'), ('3', 'rertert'), ('4', '')]
>>> m = re.search(p,s)
>>> m.group()
'1111werwrw'
>>> m.group(1)
'1'
>>> m.group(2)
'werwrw'
>>> m.groups()
('1', 'werwrw')
>>> m = re.finditer(p,s)
>>> next(m).group()
'1111werwrw'
>>> next(m).group()
'3333rertert'
>>> next(m).group()
'4444'
>>> next(m).group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration