Notes on a claim made by Li et al. USENIX Security 2022

Tachio Terauchi
August 30th 2022

The following paper makes a claim that our repair method may synthesize ReDoS vulnerable regexes:

Y. Li, Y. Sun, Z. Xu, J. Cao, Y. Li, R. Li, H. Chen, S. Cheung, Y. Liu, and Y. Xiao. RegexScalpel: Regular Expression Denial of Service (ReDoS) Defense by Localize-and-Fix. The 31st USENIX Security Symposium (USENIX Security 2022)

The claim is false, and we have notified the authors of the paper of the possible misunderstanding by sending the email below. However, the authors have not replied to the email and the false claim persists in their paper.

----The copy of the email sent to the authors----
Dear Yeting Li, Zhiwu Xu, Jialun Cao, Rongchen Li, Haiming Chen, and Shing-Chi Cheung,

We are the authors of the paper "Repairing DoS Vulnerability of Real-World Regexes" (S&P'22). We read your paper "RegexScalpel: Regular Expression Denial of Service (ReDoS) Defense by Localize-and-Fix" (USENIX Security'22) and found a claim regarding the repair method introduced in our S&P'22 paper. Your paper claims that the regexes synthesized by our repair method may still be vulnerable to ReDoS attacks. However, we think that the claim is false. We have proven in our paper that our repair method only synthesizes invulnerable regexes. In fact, the purported counterexamples given in your paper are actually invulnerable to ReDoS attacks. For example, your paper claims that the following regexes are vulnerable.

1. <span([^".1-8B-Y\[\\\]^b-dfh-y]*)font-style:italic([^>]*)>

(p. 3, For example, Chida and Terauchi [7] return a repaired regex ...)

2. a+b

(p. 15, For example, they cannot repair the regex a+b.)

However, these regexes are invulnerable to ReDoS attacks because they satisfy RWS1U condition (in fact, even S1U since these are pure regexes), and we have proven in our paper that any regex satisfying RWS1U is ReDoS invulnerable.

Indeed, for the first regex, you give concrete attack strings '<spannfont-style:italic'.repeat(10000), but you can empirically check that these are not attack strings as they do not cause super-linear behavior:


>>> import re
>>> re.match('<span([^".1-8B-Y\[\\\]^b-dfh-y]*)font-style:italic([^>]*)>', '<spannfont-style:italic' * 10000)


>>> node -e 'console.log(/^< span([^".1-8B-Y\[\\\]^b-dfh-y]*)font-style:italic([^>]*)>$/.test("< spannfont-style:italic".repeat(10000)))'

Note that we need to add the anchors ^ and $ for the case of Node.js because its test function does substring matching rather than whole string matching. We note that "spannfont" in your paper may be a typo of "span font", but the string '<span font-style:italic'.repeat(10000) also does not cause super-linear behavior (indeed, no strings will cause it because the regex is invulnerable as implied by our theorem).

Perhaps there is some misunderstanding about our paper. In particular, please note that our paper is concerned with matching the given regex on the whole input string, and not finding substrings of the input string that match the regex. Therefore, we kindly ask you to revise the erroneous claims on p.3 of p.15 of your paper that suggest that our repair method may synthesize vulnerable regexes.

Kind regards,

Nariyoshi Chida and Tachio Terauchi