Lock contention when a re2::RE2 object is shared between threads #569

@michalsieron

Is it expected that absl::*::Mutex::(un)lock_shared operations will dominate runtime when a re2::RE2 object is shared among threads?

I noticed this while analyzing why the switch to RE2 causes a significant slowdown in falconindy/pkgfile#72. It's so bad that doing the work in a single thread was faster than splitting it among threads. perf shows that 80-90% of the time is spent in the lock operations (falconindy/pkgfile#72 (comment)).

Here is a small reproducer: https://gist.github.com/michalsieron/a076eacb5abe7acc826105844022e270

#include <algorithm>
#include <cstdlib>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

#ifdef USE_PCRE
#include <pcre.h>
std::pair<pcre *, pcre_extra *> prepare_regex(const std::string& pattern) {
    const char *err;
    int offset;

    auto re = pcre_compile(pattern.c_str(), 0, &err, &offset, nullptr);
    if (re == nullptr) std::abort();
    auto re_extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &err);
    if (err) std::abort();
    return std::pair<pcre *, pcre_extra *>(re, re_extra);
}
#else
#include <re2/re2.h>
std::unique_ptr<re2::RE2> prepare_regex(const std::string& pattern) {
    auto re = std::make_unique<re2::RE2>(pattern);
    if (!re->ok()) std::abort();
    return re;
}
#endif

int main() {
    std::string line = "example string";
    std::string pattern = "i won't match";

#ifdef SHARED
    auto re = prepare_regex(pattern);
#endif

    const auto MAX_REPS = 1'000'000;
    const auto num_workers = std::min<int>(std::thread::hardware_concurrency(), 64);
    std::vector<std::thread> workers;

    for (int i = 0; i < num_workers; i++) {
        workers.push_back(std::thread([&] {
#ifndef SHARED
            auto re = prepare_regex(pattern);
#endif
            for (int rep = 0; rep < MAX_REPS; rep++)
#ifdef USE_PCRE
                pcre_exec(re.first, re.second, line.c_str(), line.size(), 0, PCRE_NO_UTF16_CHECK, nullptr, 0);
#else
                re2::RE2::PartialMatch(line, *re);
#endif
        }));
    }

    for (auto& worker : workers)
        worker.join();
}

It builds against either RE2 (the default) or PCRE (pass -DUSE_PCRE), and either shares one regex object across all threads (pass -DSHARED) or gives each thread its own.
Compile with g++ repro.cpp -O2 [-DSHARED] {-DUSE_PCRE -lpcre|-lre2}
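
Without -DSHARED, each worker builds its own regex object inside the thread body. For threads that live longer than one loop, the same idea can be sketched with a `thread_local` instance. This is a minimal sketch, not RE2's API: `Matcher`, `local_matcher`, `count_matches` and `g_constructed` are hypothetical names, and `Matcher` is a substring-search stand-in so the example needs only the standard library. With RE2 the stand-in would be replaced by `re2::RE2`, at the cost of one compilation per thread.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Counts Matcher constructions so the one-per-thread behaviour can be checked.
std::atomic<int> g_constructed{0};

// Hypothetical stand-in for a compiled regex such as re2::RE2; it does a
// plain substring search so that the sketch needs only the standard library.
class Matcher {
public:
    explicit Matcher(std::string pattern) : pattern_(std::move(pattern)) {
        ++g_constructed;
    }
    bool PartialMatch(const std::string& text) const {
        return text.find(pattern_) != std::string::npos;
    }
private:
    std::string pattern_;
};

// Returns the calling thread's private Matcher, built on first use.
// Caveat: the pattern is fixed by the first call made on each thread.
const Matcher& local_matcher(const std::string& pattern) {
    thread_local Matcher matcher(pattern);
    return matcher;
}

// Runs `reps` match attempts on each of `num_workers` threads and returns
// the total number of successful matches.
int count_matches(int num_workers, int reps,
                  const std::string& line, const std::string& pattern) {
    std::atomic<int> matches{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < num_workers; i++) {
        workers.emplace_back([&] {
            const Matcher& re = local_matcher(pattern);
            for (int rep = 0; rep < reps; rep++)
                if (re.PartialMatch(line)) ++matches;
        });
    }
    for (auto& worker : workers) worker.join();
    return matches.load();
}
```

Calling `count_matches(4, 1000000, "example string", "i won't match")` would build four `Matcher` objects, one per worker, and no two threads would touch the same instance.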
Here are my results from testing with hyperfine:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| build/re2-separate | 173.2 ± 12.2 | 163.8 | 217.5 | 1.63 ± 0.15 |
| build/re2-shared | 966.8 ± 116.4 | 861.5 | 1170.4 | 9.10 ± 1.21 |
| build/pcre-shared | 123.6 ± 18.1 | 104.2 | 170.8 | 1.16 ± 0.18 |
| build/pcre-separate | 106.3 ± 6.0 | 96.1 | 119.2 | 1.00 |
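
The Relative column is normalized to the fastest run (pcre-separate), so it does not directly show the cost of sharing within each library. A small sketch of that arithmetic, using only the mean times above (the names `Timing`, `kTimings`, `slowdown` and `print_slowdowns` are illustrative): sharing makes RE2 roughly 5.6 times slower on average, while for PCRE the factor is roughly 1.16, which is within the reported noise.

```cpp
#include <cstdio>

// Mean wall-clock times in milliseconds, copied from the hyperfine results.
struct Timing {
    const char* name;
    double mean_ms;
};

const Timing kTimings[] = {
    {"re2-separate", 173.2},
    {"re2-shared", 966.8},
    {"pcre-shared", 123.6},
    {"pcre-separate", 106.3},
};

// Ratio of two mean times; a value above 1 means `slow_ms` took longer.
double slowdown(double slow_ms, double fast_ms) {
    return slow_ms / fast_ms;
}

// Prints the shared-vs-separate slowdown for each library.
void print_slowdowns() {
    std::printf("RE2  shared/separate: %.2fx\n",
                slowdown(kTimings[1].mean_ms, kTimings[0].mean_ms));
    std::printf("PCRE shared/separate: %.2fx\n",
                slowdown(kTimings[2].mean_ms, kTimings[3].mean_ms));
}
```

These ratios compare means only and ignore the ± spread reported by hyperfine, so they are an approximate reading of the measurements rather than a precise figure.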
