Greedy vs. Reluctant vs. Possessive Quantifiers
来源:互联网 发布:迅雷mac下载速度 编辑:程序博客网 时间:2024/05/14 11:33
Greedy vs. Reluctant vs. Possessive Quantifiers
Ref_01:http://stackoverflow.com/questions/5319840/greedy-vs-reluctant-vs-possessive-quantifiers
Ref_02:http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
Question:
I found this excellent tutorial on regular expressions and while I intuitively understand what “greedy”, “reluctant” and “possessive” quantifiers do, there seems to be a serious hole in my understanding.
Specifically, in the following example:
Enter your regex: .*foo // greedy quantifierEnter input string to search: xfooxxxxxxfooI found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.Enter your regex: .*?foo // reluctant quantifierEnter input string to search: xfooxxxxxxfooI found the text "xfoo" starting at index 0 and ending at index 4.I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.Enter your regex: .*+foo // possessive quantifierEnter input string to search: xfooxxxxxxfooNo match found.
The explanation mentions eating the entire input string, letters been consumed, matcher backing off, rightmost occurrence of “foo” has been regurgitated, etc.
Unfortunately, despite the nice metaphors, I still don’t understand what is eaten by whom… Do you know of another tutorial that explains (concisely) how regular expressions engines work?
Alternatively, if someone can explain in somewhat different phrasing the following paragraph, that would be much appreciated:
- The first example uses the greedy quantifier .* to find “anything”, zero or more times, followed by the letters “f” “o” “o”. Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters (“f” “o” “o”) have already been consumed (by whom?). So the matcher slowly backs off (from right-to-left?) one letter at a time until the rightmost occurrence of “foo” has been regurgitated (what does this mean?), at which point the match succeeds and the search ends.
- The second example, however, is reluctant, so it starts by first consuming (by whom?) “nothing”. Because “foo” doesn’t appear at the beginning of the string, it’s forced to swallow (who swallows?) the first letter (an “x”), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13.
- The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+, (how?) leaving nothing left over to satisfy the “foo” at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off (what does back off mean?); it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.
Answer:
- A greedy quantifier first matches as much as possible. So the .* matches the entire string. Then the matcher tries to match the f following, but there are no characters left. So it “backtracks”, making the greedy quantifier match one less thing (leaving the “o” at the end of the string unmatched). That still doesn’t match the f in the regex, so it “backtracks” one more step, making the greedy quantifier match one less thing again (leaving the “oo” at the end of the string unmatched). That still doesn’t match the f in the regex, so it backtracks one more step (leaving the “foo” at the end of the string unmatched). Now, the matcher finally matches the f in the regex, and the o and the next o are matched too. Success!
A reluctant or “non-greedy” quantifier first matches as little as possible. So the .* matches nothing at first, leaving the entire string unmatched. Then the matcher tries to match the f following, but the unmatched portion of the string starts with “x” so that doesn’t work. So the matcher backtracks, making the non-greedy quantifier match one more thing (now it matches the “x”, leaving “fooxxxxxxfoo” unmatched). Then it tries to match the f, which succeeds, and the o and the next o in the regex match too. Success!
In your example, it then starts the process over with the remaining unmatched portion of the string, following the same process.
- A possessive quantifier is just like the greedy quantifier, but it doesn’t backtrack. So it starts out with .* matching the entire string, leaving nothing unmatched. Then there is nothing left for it to match with the f in the regex. Since the possessive quantifier doesn’t backtrack, the match fails there.
- Greedy vs. Reluctant vs. Possessive Quantifiers
- Greedy vs. Reluctant vs. Possessive Quantifiers
- Greedy quantifiers, Reluctant quantifiers, Possessive quantifiers
- Differences Among Greedy, Reluctant, and Possessive Quantifiers
- Greedy Reluctant Possessive区别
- Greedy Reluctant Possessive 正则表达式
- greedy、reluctant和possessive量词
- greedy、reluctant和possessive量词的区别
- greedy、reluctant和possessive量词的区别
- 正则表达式:Greedy、Reluctant、Possessive 区别
- Java正则中的Greedy、Reluctant 、Possessive
- Java正则中的Greedy、Reluctant 、Possessive qualifier
- [转载]正则表达式 的greedy、reluctant和possessive量词
- Java正则表达中Greedy Reluctant Possessive 的区别
- 正则表达式Greedy、Reluctant、Possessive三种策略的区别
- java 正则表达式Greedy、Reluctant、Possessive的理解
- Java正则表达中Greedy Reluctant Possessive 的区别
- 正则表达式中Greedy、Reluctant、Possessive数量词的区别
- 2017/03/18 关于dos下使用批处理
- springmvc文件配置
- Android动画专题之逐帧动画
- Linux Ubuntu 启动、连接redis测试
- Lambda&Java多核编程-5-函数式接口与function包
- Greedy vs. Reluctant vs. Possessive Quantifiers
- android FileDownloader
- android接入QQ原生登录功能
- 后期材质(PostProcess Material)+ Custom Node 实现"描边"和"马赛克"效果【UE4】
- U-Boot2010.06移植(2440)-----添加支持YAFFS2文件系统
- 算法竞赛入门经典(刘汝佳)习题3-5 UVa227
- Postfix邮件服务器搭建之软件安装与配置
- MyBatis无法扫描Spring Boot别名的Bug
- Android 性能优化总结