A State of
Compile Time
Regular Expressions

Hana Dusíková

Hana Dusíková

  • National Body Chair of Czech Republic in WG21
  • Senior Researcher in Avast
  • Organizer of Avast Prague C++ Meetup

Regular Expressions

(almost every language has a regex library)

[a-z0-9]+[a-z0-9]+abcd[a-z]

🤷🏻‍♀️

Thanks to my friends:
Lucie Mohelníková, Michal Vaner, Martin Doložílek, and Filip Slunečko

C++ std::regex


						struct date {
							std::string year;
							std::string month;
							std::string day;
						};
							
						std::optional<date> extract_date(std::string_view input) noexcept {
							static std::regex pattern("([0-9]{4})/([0-9]{2})/([0-9]{2})");
							
							std::smatch_results<std::string_view::iterator> captures;
							
							if (!std::regex_match(input.begin(), input.end(), captures, pattern)) {
								return std::nullopt;
							}
							
							return date{
								captures[1].str(),
								captures[2].str(),
								captures[3].str()
							};
						}
						

... and it takes only 1.3 sec to compile 🤦🏻‍♀️

The Compile Time
Regular Expressions
Library

Compile Time Regular Expressions


							struct date {
								std::string year;
								std::string month;
								std::string day;
							};
							
							std::optional<date> extract_date(std::string_view input) noexcept {
								auto [match, y, m, d] = ctre::match<"([0-9]{4})/([0-9]{2})/([0-9]{2})">(input);
								
								if (!match) {
									return std::nullopt;
								}
								
								return date{y.str(), m.str(), d.str()};
							}
						

Error Reporting


						bool fnc(std::string_view view) {
							return ctre::match<"(hello">(view);
						}
						
In file included from test.cpp:1: In file included from include/ctre.hpp:5: include/ctre/functions.hpp:48:2: error: static_assert failed due to requirement 'correct' "Regular Expression contains syntax error." static_assert(correct, "Regular Expression contains syntax error."); ^ ~~~~~~~ test.cpp:2:15: note: in instantiation of function template specialization 'ctre::match<"(hello">' requested here return ctre::match<"(hello">(view); ^

Example: Unicode Support (🌈🦄🥰)


							constexpr bool is_emoji_only(std::u32string_view input) noexcept {
								return ctre::match<"\\p{emoji}+">(input);
							}
						

							  static_assert( is_emoji_only(U"🤯😍🥰🖖"));
							  static_assert(!is_emoji_only(U"this text is 👻"));
						
*unicode support is not yet fully merged (ecma-unicode branch)
**thanks to Corentin Jabot ❤️ for providing constexpr unicode tables

Example: DFA matching / searching


							bool contains_date(std::string_view input) noexcept {
								return ctre::fast_search<"[0-9]{4}/[0-9]{2}/[0-9]{2}">(input);
							}
						

Features of CTRE

  • PCRE2 compatible dialect
    • basic unicode support
  • constexpr
    • doesn't support runtime defined patterns
    • compile time constructing
    • constexpr & runtime matching
  • quick regex matching/searching
    • structured-bindings for extracting captures
    • DFA without captures
    • generates compact assembly

Technical details of CTRE

  • small, header only C++17/20 library (8k LoC)
  • supports all major compilers
    • clang 6.0
    • gcc 7.4
    • msvc 15.8.8+
  • open-source

Regular Expressions

hello|aloha|hallo|ahoj|bonjour

[a-z]+[0-9]+

A Regular Expression is (formally)

  • the empty set ( or {}),
  • an empty string (ε or {""}),
  • a literal character (e.g. a or {"a"}),
  • the concatenation of two REs (A⋅B),
  • the alternation of two REs (A|B),
  • or repetitions (Kleene star) (A*)

Non-formal syntax constructs

  • Optional A (A? is ε|A)
  • Plus repetition of A (A⋅A*)
  • Back-reference
not all features of a "regular expression"
implementation is technically a regular expression

Example: Non-formal syntax constructs

We are all used to a pattern like this:
(ct)?re|[a-f]+
The pattern is equivalent to:
(ε|ct)re|(a|b|c|d|e|f)(a|b|c|d|e|f)*
Every pattern can be converted into a formal form.

A Regular Expression
can be described as an abstract syntax tree

(ct)?re|[a-f]+

How to store the AST
in C++ compile-time evaluated code?

  • tree-like (allocated) data structure (merged in C++20, implemented only in EDG)
  • type based expression
    • expression templates (like boost::xpressive)
    • tuple-like empty types

Types as the building blocks


							// building blocks
							struct epsilon { };
							template <character_like auto C> struct ch { };
							
							// operations
							template <typename...> struct concat { };
							template <typename...> struct alt { };
							template <typename> struct star { };
							
							// for convenient usage
							template <typename E> using opt = alternation<E, epsilon>;
							template <typename E> using plus = concat<E, star<E>>;
						

Example: A Regex Type

Converting
A pattern into an AST: Parsing

How to convert the pattern?

  • Use a generic LL(1) parser for converting a pattern into a type.
  • The parser uses a provided PCRE compatible grammar.
  • Output type is the AST.

How does an LL(1) parser work?

  • Starts with a start symbol on the stack.
  • On every step it pops one symbol from the stack
    and checks the current character at the input.
  • Based on the pair of symbol and character it decides to:
    • push a string of symbols to the stack,
    • pop a character from the input,
    • or reject.
  • Repeat until the stack and input are empty then accept.

What does the grammar look like?

f(symbol,char) → (...) ε pop input reject accept
 ()*+?|otherε
→ S( alt0 ) mod seq alt      other char mod seq alt ε
alt0( alt0 ) mod seq alt      other char mod seq alt  
alt ε   | seq0 alt alt  ε
modεε* star + plus ? opt εεε
seq0( alt0 ) mod seq      other char mod seq  
seq( alt0 ) mod seq seq ε   εother char mod seq seq ε
( pop              
)   pop            
*     pop          
+       pop        
?         pop      
|           pop    
other             pop  
Z0               accept

How does an LL1 parser work?

input:
a*b*ε
step: 0
stack:
S
 ()*+?|otherε
→ S( alt0 ) mod seq alt      other char mod seq alt ε
alt0( alt0 ) mod seq alt      other char mod seq alt  
alt ε   | seq0 alt alt  ε
modεε* star + plus ? opt εεε
seq0( alt0 ) mod seq      other char mod seq  
seq( alt0 ) mod seq seq ε   εother char mod seq seq ε
terminal pop pop pop pop pop pop pop  
Z0               accept

How can we represent
the symbols in C++?


							struct S {};
							struct alt0 {};
							struct alt {};
							struct mod {};
							struct seq0 {};
							struct seq {};
					
							using start_symbol = S;
						

How can we represent
an LL(1) table in C++?


							// f (symbol, char) → (...)
							auto f(symbol, term<'c'>) -> list{...};
					
							// f (symbol, symbol) → pop input
							template <auto S> auto f(term<S>, term<S>) -> pop_input;
					
							// f (symbol, char) → reject
							auto f(...) -> reject;
					
							// f (Z0, ε) → accept
							auto f(empty_stack, epsilon) -> accept;
						

How can we pass
the grammar into the parser?


							struct pcre {
								struct S {};
								struct alt0 {};
								struct alt {};
								// ...
						
								using start_symbol = S;
						
								auto f(...) -> reject;
								// ...
							}
						

How is the parser used?


							constexpr bool ok = parser<pcre, "a+b+">::correct;
						

How is the parser implemented?


							template <typename Grammar, ...> struct parser {
								//...
								auto next_move = Grammar::f(top_of_stack, current_term);
								//...
							}
						

How do we implement the input string?


							template <typename CharT, size_t N> struct fixed_string {
								CharT data[N+1];
								// constexpr constructor from const char[N]
								constexpr auto operator[](size_t i) const noexcept { return data[i]; }
								constexpr size_t size() const noexcept { return N; }
								constexpr auto operator<=>(const fixed_string &) = default;
							};
					
							template <typename CharT, size_t N>
							 fixed_string(const CharT[N]) -> fixed_string<CharT, N>;
						 
(more info about class NTTP in p0732 by Jeff Snyder ❤️ and Louis Dionne ❤️)
(Jeff's talk Data in the Type System: Complex Non-Type Template Parameters in C++20)

How do we implement the stack?


							template <typename... Ts> struct list { };
							
							template <typename... Ts, typename... As>
							 constexpr auto push(list<Ts...>, As...) -> list<As..., Ts...>;
							 
							template <typename T, typename... As>
							 constexpr auto pop(list<T, Ts...>) -> list<Ts...>;
							 
							template <typename T, typename... Ts>
							 constexpr auto top(list<T, Ts...>) -> T;
							 
							struct empty { };
							constexpr auto top(list<>) -> empty;
						

How does it fit together?


							constexpr bool ok = parser<pcre, "a*b*">::correct;
						

LL1 constexpr Parser


							template <typename Grammar, fixed_string Str> struct parser {
								
								static constexpr bool correct = parse(list<Grammar::start_symbol>{});
							
								// return current term
								template <size_t Pos> constexpr auto get_character() const {
									if constexpr (Pos < Str.size()) return term<Str[Pos]>{};
									else return epsilon{};
								}
							
								// prepare each step and move to next
								template <size_t Pos = 0, typename S> static constexpr bool parse(S stack) {
							
									auto symbol = top(stack);
									auto current_term = get_character<Pos>();
									
									auto next_move = decltype(Grammar::f(symbol, current_term)){};
							
									return move<Pos>(next_move, pop(stack));
								}
						
								// pushing something to stack (epsilon case included)
								template <size_t Pos, typename... Push, typename Stack>
								static constexpr bool move(list<Push...>, Stack stack) {
									
									return parse<Pos>(push(stack, Push{}...));
								}
				
								// move to next character
								template <size_t Pos, typename Stack>
								static constexpr bool move(pop_input, Stack stack) {
							
									return parse<Pos+1>(stack);
								}
				
								// reject
								template <size_t Pos, typename Stack>
								static constexpr bool move(reject, Stack) {	
									return false;
								}
				
								// accept
								template <size_t Pos, typename Stack>
								static constexpr bool move(accept, Stack) {
									return true;
								}
					
							};
						

We need more than
just returning a boolean.

A Pattern to the AST:
Let there be a type

How can we build a type from a string?

Where are the semantic actions placed?

 ()*+?|otherε
→ S( alt0 ) mod seq alt      other char mod seq alt ε
alt0( alt0 ) mod seq alt      other char mod seq alt  
alt ε   | seq0 alt alt  ε
modεε* star + plus ? opt εεε
seq0( alt0 ) mod seq      other char mod seq  
seq( alt0 ) mod concat seq ε   εother char mod concat seq ε
( pop              
)   pop            
*     pop          
+       pop        
?         pop      
|           pop    
other             pop  
Z0               accept

What do the Semantic Action
symbols look like?


							struct pcre {
								struct _char: action {};
								struct alpha: action {};
								struct digit: action {};
								struct seq: action {};
								struct star: action {};
								struct plus: action {};
								struct opt: action {};
								// ...
							}
						

What are the changes to the parser?


							template <typename Grammar, fixed_string Str> struct parser {
								
								template <typename Subject>
								static constexpr auto output = parse(list<Grammar::start_symbol>{}, Subject{});
							
								// return current term
								template <size_t Pos> constexpr auto get_character() const {
									if constexpr (Pos < Str.size()) return term<Str[Pos]>{};
									else return epsilon{};
								}
							
								// prepare each step and move to next
								template <size_t Pos = 0, typename S, typename T>
								static constexpr auto parse(S stack, T subject) {
							
									auto symbol = top(stack);
									
									if constexpr (is_semantic_action(symbol)) {
										auto previous_term = get_character<Pos-1>();
										
										auto next_subject = decltype( modify(symbol, previous_term, subject) ){};
										
										return parse<Pos>(pop(stack), next_subject);
									} else {
										auto current_term = get_character<Pos>();
										auto next_move = decltype(Grammar::f(symbol, current_term));
							
										return move<Pos>(next_move, pop(stack), subject);
									}
								}
						
								// pushing something to stack (epsilon case included)
								template <size_t Pos, typename... Push, typename Stack, typename Subject>
								static constexpr bool move(list<Push...>, Stack stack, Subject subject) {
									
									return parse<Pos>(push(stack, Push{}...), subject);
								}
				
								// move to next character
								template <size_t Pos, typename Stack, typename Subject>
								static constexpr bool move(pop_input, Stack stack, Subject subject) {
							
									return parse<Pos+1>(stack, subject);
								}
				
								// reject
								template <size_t Pos, typename Stack, typename Subject>
								static constexpr auto move(reject, Stack, Subject subject) {	
									return std::pair{false, subject};
								}
				
								// accept
								template <size_t Pos, typename Stack>
								static constexpr auto move(accept, Stack, Subject subject) {
									return std::pair{true, subject};
								}
								
								bool correct = output<list<>{}>.first;
								
								using type = decltype( output<list<>{}>.second );
							};
						

What does building from a string look like?

What about the modify function?

Building the AST on a stack


							// pushing a character
							template <character_like auto C, typename... Ts>
							 auto modify(pcre::_char, term<C>, list<Ts...>) 
							  -> list<ch<C>, Ts...>;
							  
							// concatenating a sequence (notice the switched order of A & B on the stack)
							template <character_like auto C, typename A, typename B, typename... Ts>
							 auto modify(pcre::seq, term<C>, list<B, A, Ts...>)
							  -> list<concat<A, B>, Ts...>;
							  
							// adding to a concatenated sequence
							template <character_like auto C, typename... As, typename B, typename... Ts>
							 auto modify(pcre::seq, term<C>, list<B, concat<As...>, Ts...>)
							  -> list<concat<As..., B>, Ts...>;
							  
							// making an alternate (again, switched order of A & B)
							template <character_like auto C, typename A, typename B, typename... Ts>
							 auto modify(pcre::alt, term<C>, list<B, A, Ts...>)
							  -> list<alt<A, B>, Ts...>;
							  
							// adding to an alternate
							template <auto C, typename... As, typename B, typename... Ts>
							 auto modify(pcre::alt, term<C>, list<B, alt<As...>, Ts...>)
							  -> list<alt<As..., B>, Ts...>;
							  
							// making something optional
							template <character_like auto C, typename Opt, typename... Ts>
							 auto modify(pcre::opt, term<C>, list<Opt, Ts...>)
							  -> list<opt<Opt>, Ts...>;
							  
							// making a plus cycle
							template <character_like auto C, typename Plus, typename... Ts>
							 auto modify(pcre::plus, term<C>, list<Plus, Ts...>)
							  -> list<plus<Plus>, Ts...>;
							
							// making a star cycle
							template <character_like auto C, typename Star, typename... Ts>
							 auto modify(pcre::star, term<C>, list<Star, Ts...>)
							  -> list<star<Star>, Ts...>;
						

We have the AST!


						static_assert(std::is_same_v<
							,
							decltype( top(parser<pcre,"">::type) )
						>);
						

How do we match
a regular expression
in the form of an AST?

Finite Automaton

"And if thou gaze long at a finite automaton,
a finite automaton also gazes into thee."
– Friedrich Nietzsche

What's a finite automaton?

(Q, Σ, δ, q0, F)
  • a finite set of states Q
  • a finite set of input symbols (the alphabet) Σ
  • a transition function δ: Q × Σ →
    • P(Q) (nondeterministic FA)
    • Q (deterministic FA)
  • a start state q0 ∈ Q
  • a set of accept (final) states F ⊆ Q
A finite automaton accepts exactly the same class of languages as a regular expression.

Example: Integral Numbers
[0-9]+

How do you convert
a RegEx into an FA?

∅ (empty set)

There is a direct mapping between
the RE definition and operations over the FAs.

And we need to map it onto C++ code.

How to represent an FA in C++?

(Q, Σ, δ, q0, F)
  • a finite set of states Qset<int> int (implicit)
  • a finite set of input symbols (the alphabet) Σchar32_t
  • a transition function δ: Q × Σ →
    • P(Q) (nondeterministic FA) set<tuple<int, int, char32_t>>
    • Q (deterministic FA)
  • a start state q0 ∈ Q int(0)
  • a set of accept (final) states F ⊆ Q set<int>

constexpr fixed_set


							template <size_t Sz, typename T> class fixed_set {
								T data[Sz];
							public:
								constexpr auto begin();
								constexpr auto end();
								constexpr auto begin() const;
								constexpr auto end() const;
			
								constexpr size_t size() const;
								constexpr auto & operator[](size_t);
								constexpr const auto & operator[](size_t) const;
								
								constexpr auto insert(T);
								
								template <typename K> constexpr auto erase(K);
								constexpr auto erase(T *);
								
								constexpr set();
								// ...
							};
						

constexpr finite_automaton


							struct transition {
								int source;
								int target;
								char32_t term;
								
								constexpr transition(int, int, char32_t);
								constexpr bool match(char32_t c) const;
								// comparable with int against source
							};
							
							template <size_t Tr, size_t FSz> struct finite_automaton {
								fixed_set<Tr, transition> transitions; // sorted as source + target tuple
								fixed_set<FSz, int> final_states;
								
								// normal constructor / copy constructor ...
							};
						

Basic building blocks


								static constexpr auto empty = finite_automaton<0,0>{};
							
								static constexpr auto epsilon = finite_automaton<0,1>{{}, {0}};
							
								template <char32_t C> static constexpr auto one_char = finite_automaton<1,1>{
									{transition(0, 1, C)},
									{1}
								};
						

Concatenation algorithm

Concatenation algorithm


							template <finite_automaton Lhs, finite_automaton Rhs> struct concat_two {
								
								constexpr static auto calculate() {
									// precalculate size
									constexpr size_t tr_count = Lhs.transitions.size() + Rhs.transitions.size();
									constexpr size_t tr_start = Rhs.count_transitions(0);
								
									// calculate the first available state id
									constexpr int prefix = Lhs.get_max_state() + 1;
								
									// create output object
									finite_automaton<(tr_count + tr_start), Rhs.final_states.count()> out;
								
									// copy left FA into output
									copy(Lhs.transitions.begin(), Lhs.transitions.end(), out.transitions.begin());
							
									// connect left final states to the right start state
									for (int f: Lhs.final_states) {
										Rhs.foreach_transition_from(0, [&](transition t){
											t.source = f;
											t.target += prefix;
											out.add_transition(t);
										});
									}
								
									// copy right FA into output
									for (transition t: Rhs.transitions) {
										t.source += prefix;
										t.target += prefix;
										out.add_transition(t);
									}
								
									// mark final states from right as the final states
									for (int f: Rhs.final_states) out.mark_final(f + prefix);
									
									return out;
								}
								
								// store the output
								static constexpr auto value = calculate();
							};
							
							// shorthand
							template <finite_automaton Lhs, finite_automaton Rhs> 
							  static constexpr auto concat = concat_two<Lhs, Rhs>::value;
						

Other algorithms

  • alternation
  • repeating (Kleene star)

Both other algorithms work the same,
pre-calculate the size and then populate the content.

constexpr auto fa = concat<one_char<'a'>, one_char<'b'>>;

  • I called it a "compile time" function with compile-time arguments.
  • Unlike a constexpr function, it can take constexpr variables as arguments
    and maintain their "constexpr-bility".
  • It makes sure the value is cached.

Determinization

Remove nondeterministic transitions, can generate 2n new states.

Determinization algorithm

Determinization

Remove nondeterministic transitions, can generate 2n new states.

Constexpr implementation must be iterative,
as the whole output can't be calculated in one step.

It's useful to minimize the FA after the determinization.

We have a deterministic finite automaton

The constexpr problem (NFA determinization)

"Abstraction good ... but also bad"
– Chandler Carruth

constexpr problems

  • every operation counts
    • there are limits (and different in every compiler)
  • move semantics don't matter
  • problematic error handling & debugging
    • use a conditional UB or throw-statement as form of an assert
    • printf styled debugging
      template <typename T> struct identify_type;

Matching
a Regular Expression

"hello" =~ /aloha|[a-z]+/

Matching and searching


							ctre::match<"aloha|[a-z]+">("aloha"sv);
					
							// search(x) is actually match(.*x.*)
							ctre::search<"aloha|[a-z]+">("blahblah aloha blahblah"sv);
						

Deterministic match wrapper


							template <fixed_string re> bool fast_match(const std::forward_range auto & rng) {
								static_assert(parser<pcre, re>::correct);
								using RE = parser<pcre, re>::type;
								
								constexpr auto dfa = fa::minimize<fa::determinize<nfa_from<RE>>>;
							
								return match_state<dfa>(rng.begin(), rng.end());
							}
						

Building the FA from a RE's AST


							namespace fa {
								static constexpr auto empty = finite_automaton<0, 0>{};
							}
							
							template <typename RE>
							static constexpr auto nfa_from = build(fa::empty, list<RE>{});
						

RE is a type, finite_automaton is a value.

Building the empty RE


							auto build(const fa_like auto seed, list<>) {
								// for an empty RE seed is empty FA
								
								return seed;
								
							}
						

Building epsilon (empty string)


							namespace fa {
								// epsilon is FA with a start state final
								static constexpr auto epsilon = finite_automaton<0, 1>{{}, {0}};
							}
							
							template <typename... Ts>
							auto build(const fa_like auto lhs, list<epsilon, Ts...>) {
								
								return build(fa::concat<lhs, fa::epsilon>, list<Ts...>{});
								
							}
						

Building a character


							namespace fa {
								// epsilon is FA with a start state final
								template <char32_t Term> static constexpr auto 
								character = finite_automaton<1, 1>{{transition{0, 1, Term}}, {1}};
							}
							
							template <char32_t Term, typename... Ts>
							auto build(const fa_like auto lhs, list<character<Term>, Ts...>) {
								
								return build(fa::concat<lhs, fa::character<Term>>, list<Ts...>{});
								
							}
						

Building a concat sequence


							template <typename... Content, typename... Ts>
							auto build(const fa_like auto lhs, list<concat<Content...>, Ts...>) {
								
								return build(lhs, list<Content..., Ts...>{});
								
							}
						

Building an alternation


							template <typename... Options, typename... Ts>
							auto build(const fa_like auto lhs, list<alt<Options...>, Ts...>) {
								
								constexpr auto inner = fa::alternative<build(fa::empty, Options)...>;
								
								return build(fa::concat<lhs, inner>, list<Ts...>{});
								
							}
						

Building a Kleene star (cycle)


							template <typename Content, typename Ts...>
							auto build(const fa_like auto seed, list<star<Content>, Ts...>) {
								
								auto content = build(fa::empty, Content);
								
								return build(fa::concat<lhs, fa::star<content>>, list<Ts...>{});
								
							}
						

We have a finite automaton,
now we need to do the matching.

Deterministic match wrapper


							template <fixed_string re> bool fast_match(const std::forward_range auto & rng) {
								static_assert(parser<pcre, re>::correct);
								using RE = parser<pcre, re>::type;
								
								constexpr auto dfa = fa::minimize<fa::determinize<nfa_from<RE>>>;
							
								return match_state<dfa>(rng.begin(), rng.end());
							}
						

Matching a state


							template <finite_automaton DFA, int State = 0, typename Iterator>
							constexpr bool match_state(Iterator it, Iterator end) noexcept {
								
								// we can end correctly only if this state is final
								if constexpr (DFA.is_final(State)) {
									if (end == it) return true;
								} else {
									if (end == it) return false;
								}
	
								// transitions are sorted by source state
								constexpr auto index = DFA.first_transition_index_of(State);

								// tail-recursion
								return choose_transition<DFA, index, State>(it, end);
							}
						

Matching a transition


							template <finite_automaton DFA, int Index, int State, typename Iterator>
							constexpr bool choose_transition(Iterator it, const Iterator end) noexcept {
								// check if we are still in same state
								if constexpr (Index >= DFA.transitions.size()
								  || DFA.transitions[Index].source != State) {
									
									return false;
								}
								
								if (DFA.transitions[Index].match(*it)) {
									// this transition is the one, go to next state (tail recursion)
									return match_state<DFA, DFA.transitions[Index].target>(std::next(it), end);
								}
								
								// try next one (also tail recursion)
								return choose_transition<DFA, Index+1, State>(it, end);
							}
						

The deterministic matching

  • O(n * t) time complexity
  • O(1) dynamic memory complexity

n = length of the input, t = average number of transitions per state

  • doesn't support capture tracking
  • theoretically the quickest approach
  • allows streamed data

Other matching algorithms

  • NFA backtracking (used for captures in CTRE)
  • NFA without backtracking

Comparison

Let's read some assembly :)

Benchmarks

Measurement methodology

  • grep-like scenario searching in CSV file (1.3GiB) with 6.5 MLoC
  • Median of 3 measurements
  • MacBook Pro 13" 2016 i7 3.3Ghz
  • GCC 9.1 & clang 8 (-std=2a -O3)

Runtime Searching (clang): [a-z0-9]+[0-9]

Measured code (grep-like utility)


							#include <everything>
							
							int main(int argc, char ** argv) {
								auto re = PREPARE("PATTERN");
								auto lines = load_file_by_lines(argv[1]);
								
								size_t count = 0;
								
								// some warm-up
								
								auto start = steady_clock::now();
								for (string_view line: lines) {
									if (re.SEARCH(line)) count++;
								}
								auto end = steady_clock::now();
								
								// print results
								
							}
						

Compilation (clang): [a-z0-9]+[0-9]

Future of the CTRE library

  • Better (constexpr) parser (which can still emit type)
  • NFA matching with captures

I really need runtime RegEx!

Check Hal Finkel's talk:
Faster Compile Times and Better Performance:
Bringing Just-in-Time Compilation to C++

(on Friday 9:00)

Ok, Five things I showed you…

  • How regular expression engines work.
  • Connection between theory and code.
  • Parsing of a pattern during compilation.
  • Building of a finite automaton.
  • Comparison between different regex implementations.

And four things you should remember…

  • Compilers can optimize your code, if you help them.
  • Combining TMP and constexpr code can
    help you express things more precisely.
  • Stop pretending that constexpr C++ is a compiled language.
  • Try the CTRE library, not just for speed but for code clarity.

Thank you!

💁🏻‍♀️ compile-time.re