• 静思
  • 吴言片语
    • 吴言
    • 片语
    • 杂七杂八
  • 死于青春
    • 一路走好
  • 乌合麒麟
  • 纪念
    • 5.12
    • 3.23
  • GitHub
    • A List of Post-mortems
    • The Art of Command Line
  • 关于
    • Privacy Policy

程序员的信仰

金鳞岂是池中物,一遇风云便化龙

HOME » 技术生活 » Differences Among Greedy, Reluctant, and Possessive Quantifiers (for RexExp in Java)

Differences Among Greedy, Reluctant, and Possessive Quantifiers (for RexExp in Java)

2011 年 9 月 20 日 @ 上午 10:53 by Jay | 被踩了 4,984 脚

There are subtle differences among greedy, reluctant, and possessive quantifiers.

Greedy quantifiers are considered “greedy” because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.

The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input

Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

To illustrate, consider the input string xfooxxxxxxfoo.

Enter your regex: .*foo // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text “xfooxxxxxxfoo” starting at index 0 and ending at index 13.

Enter your regex: .*?foo // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text “xfoo” starting at index 0 and ending at index 4.
I found the text “xxxxxxfoo” starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

The first example uses the greedy quantifier .* to find “anything”, zero or more times, followed by the letters “f” “o” “o”. Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters (“f” “o” “o”) have already been consumed. So the matcher slowly backs off one letter at a time until the rightmost occurrence of “foo” has been regurgitated, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by first consuming “nothing”. Because “foo” doesn’t appear at the beginning of the string, it’s forced to swallow the first letter (an “x”), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13.

The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+, leaving nothing left over to satisfy the “foo” at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found


-- EOF --

除非注明(如“转载”、“[zz]”等),本博文章皆为原创内容,转载时请注明: 「转载自程序员的信仰©」
本文链接地址:Differences Among Greedy, Reluctant, and Possessive Quantifiers (for RexExp in Java)

分享

  • 点击分享到 Facebook (在新窗口中打开) Facebook
  • 点击以分享到 X(在新窗口中打开) X
  • 更多
  • 点击分享到Reddit(在新窗口中打开) Reddit
  • 点击分享到Telegram(在新窗口中打开) Telegram
  • 点击以在 Mastodon 上共享(在新窗口中打开) Mastodon

赞过:

赞 正在加载……

相关

Posted in: 技术生活 Tagged: java, regexp
← Learn Vim Progressively
X and XX Usgaes for Java →

android (9) apple (20) augmentum (9) Beijing (21) bt (8) career (28) coding (38) firefox (10) google (36) hibernate (11) ibm (11) iphone (10) java (93) linux (16) m$ (26) mac (58) macos (27) nazca (9) olympics (8) oo (8) playstation (10) rip (8) Shanghai (39) spring (9) tips (45) tommy emmanuel (8) ubuntu (12) usa (23) windows (9) 北航 (17) 博客 (29) 吐槽 (8) 周末 (9) 和谐社会 (26) 小资 (11) 愤青 (40) 方言 (10) 朋友 (77) 歌词 (8) 烟酒不分家 (18) 爱国 (19) 爱情 (8) 犯二 (15) 破解 (8) 足球 (11)

烫手山芋

  • 再谈苹果的输入法:这一次是靠OS X自带的输入法来翻身的~ - 被踩了 27,853 脚
  • 生活,就是一个期待跟着一个期待 - 被踩了 21,403 脚
  • 星巴克饮品缩写大全(Starbucks Drink ID Codes)[zz] - 被踩了 18,518 脚
  • 从一个全角冒号说一下我为什么不感冒iOS - 被踩了 14,457 脚
  • 有关Character.isLetter()和Character.isLetterOrDigit() - 被踩了 13,639 脚

刚拍的砖

  • leo 发表在《再谈苹果的输入法:这一次是靠OS X自带的输入法来翻身的~》
  • 花 发表在《再谈苹果的输入法:这一次是靠OS X自带的输入法来翻身的~》
  • 无名氏 发表在《从一个全角冒号说一下我为什么不感冒iOS》
  • Jay 发表在《Mac OS geek级问题》
  • Wei Wang 发表在《再谈苹果的输入法:这一次是靠OS X自带的输入法来翻身的~》

随便看看

  • 如何在CLI下更新Glassfish15 年 ago
  • “麦莎”来了20 年 ago
  • 博客后台升级到wordpress 2.9 beta 216 年 ago
  • 关于『程序出错后,程序员给测试人员的20条高频回复』14 年 ago
  • 彩云之南,美丽之江(序)9 年 ago

文以类聚

光阴似箭

其他操作

  • 登录
  • 条目 feed
  • 评论 feed
  • WordPress.org

Copyright © 2025 程序员的信仰.

Jay's Omega WordPress Theme by Jay

 

正在加载评论...
 

    %d