std::regex_token_iterator

From cppreference.com
Defined in header <regex>
template<
    class BidirIt,
    class CharT = typename std::iterator_traits<BidirIt>::value_type,
    class Traits = std::regex_traits<CharT>
> class regex_token_iterator;
(since C++11)

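std::regex_token_iterator is a read-only LegacyForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).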

On construction, it constructs a std::regex_iterator, and on every increment it steps through the requested sub-matches from the current match_results, incrementing the underlying std::regex_iterator when incrementing away from the last submatch.
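The stepping order described above can be observed by requesting more than one sub-match index. The following is a minimal sketch; the helper name collect_submatches is hypothetical and not part of the library. It uses the constructor overload that takes a std::vector<int> of sub-match indices.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not a library function): collects the requested
// sub-matches of every match, in the order the iterator visits them.
// Both `text` and `re` must stay alive for the duration of the call.
std::vector<std::string> collect_submatches(const std::string& text,
                                            const std::regex& re,
                                            const std::vector<int>& indices)
{
    std::vector<std::string> out;
    const std::sregex_token_iterator end; // end-of-sequence iterator
    for (std::sregex_token_iterator it(text.begin(), text.end(), re, indices);
         it != end; ++it)
    {
        out.push_back(it->str()); // copy the characters of the current sub-match
    }
    return out;
}
```

With the pattern (\w+)=(\w+) and the indices {1, 2}, the text "width=80 height=24" is visited as sub-match 1 and then sub-match 2 of the first match, followed by the same two sub-matches of the second match.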

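The default-constructed std::regex_token_iterator is the end-of-sequence iterator. When a valid std::regex_token_iterator is incremented after reaching the last submatch of the last match, it becomes equal to the end-of-sequence iterator. Dereferencing or incrementing it further invokes undefined behavior.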
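Because a default-constructed iterator marks the end of the sequence, a pair of token iterators can be handed to any algorithm that accepts an iterator range. A minimal sketch, assuming a hypothetical helper count_words and the pattern \w+ chosen only for illustration:

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the matches of \w+ in `text` by measuring the
// distance between a token iterator and the end-of-sequence iterator.
std::size_t count_words(const std::string& text)
{
    static const std::regex word_re(R"(\w+)"); // outlives every iterator below
    const std::sregex_token_iterator first(text.begin(), text.end(), word_re);
    const std::sregex_token_iterator last;     // default-constructed: end of sequence
    return static_cast<std::size_t>(std::distance(first, last));
}
```

When no sub-match index is passed, the iterator yields sub-match 0 (the whole match) for every match.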

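Just before becoming the end-of-sequence iterator, a std::regex_token_iterator may become a suffix iterator, if the index -1 (non-matched fragment) appears in the list of the requested submatch indices. Such an iterator, if dereferenced, returns a sub_match corresponding to the sequence of characters between the last match and the end of the sequence.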
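The word "may" matters here: the suffix iterator only appears when characters remain after the last match. A minimal sketch, assuming a hypothetical helper split_on_commas:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at commas by requesting index -1,
// i.e. the fragments that the regular expression did not match.
std::vector<std::string> split_on_commas(const std::string& text)
{
    static const std::regex comma_re(",");
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), comma_re, -1),
        std::sregex_token_iterator());
}
```

For "a,b,,c" the final field "c" is produced by the suffix iterator, and the empty fragment between the two adjacent commas is kept. For "a,b," nothing follows the last comma, so no suffix iterator is formed and the result ends with "b".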

A typical implementation of std::regex_token_iterator holds the underlying std::regex_iterator, a container (e.g. std::vector<int>) of the requested submatch indices, an internal counter equal to the index of the current submatch, a pointer to std::sub_match pointing at the current submatch of the current match, and a std::match_results object containing the last non-matched character sequence (used in tokenizer mode).
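To make the roles of these members concrete, here is a deliberately simplified toy model. The class token_cursor and all of its member names are hypothetical; it is not how any particular standard library implements the iterator. It returns copies of the tokens instead of pointing at a std::sub_match, ignores some edge cases (for example an empty input), and requires C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Toy model only (hypothetical). `text` and `re` must outlive the cursor,
// and `subs` must not be empty.
class token_cursor
{
public:
    token_cursor(const std::string& text, const std::regex& re, std::vector<int> subs)
        : position_(text.begin(), text.end(), re),
          subs_(std::move(subs)),
          suffix_first_(text.begin()),
          text_last_(text.end())
    {
        for (int s : subs_)
            wants_suffix_ = wants_suffix_ || s == -1;
    }

    // Returns the next token, or std::nullopt once the sequence is exhausted.
    std::optional<std::string> next()
    {
        if (position_ != std::sregex_iterator())
        {
            const std::smatch& match = *position_;
            const int index = subs_[counter_];
            std::string token = index == -1
                ? match.prefix().str() // non-matched fragment before this match
                : match[static_cast<std::size_t>(index)].str();
            suffix_first_ = match[0].second; // remember where this match ended
            if (++counter_ == subs_.size())
            {
                counter_ = 0; // last requested submatch: move to the next match
                ++position_;
            }
            return token;
        }
        if (wants_suffix_ && suffix_first_ != text_last_)
        {
            std::string token(suffix_first_, text_last_); // trailing fragment
            suffix_first_ = text_last_;                   // hand it out only once
            return token;
        }
        return std::nullopt;
    }

private:
    std::sregex_iterator position_;            // underlying match iterator
    std::vector<int> subs_;                    // requested submatch indices
    std::size_t counter_ = 0;                  // position within subs_
    bool wants_suffix_ = false;                // true if -1 was requested
    std::string::const_iterator suffix_first_; // end of the last match seen
    std::string::const_iterator text_last_;    // end of the input
};
```

With the text "a1b22c", the pattern \d+ and the indices {-1, 0}, repeated calls to next() yield the non-matched fragments and the matches interleaved, ending with the trailing fragment "c".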

Type requirements

- BidirIt must meet the requirements of LegacyBidirectionalIterator.

Specializations

Several specializations for common character sequence types are defined:

Defined in header <regex>
Type                         Definition
std::cregex_token_iterator   std::regex_token_iterator<const char*>
std::wcregex_token_iterator  std::regex_token_iterator<const wchar_t*>
std::sregex_token_iterator   std::regex_token_iterator<std::string::const_iterator>
std::wsregex_token_iterator  std::regex_token_iterator<std::wstring::const_iterator>
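The specializations differ only in the iterator type they traverse. As an illustration, std::cregex_token_iterator works directly on a null-terminated character array without first copying it into a std::string. A minimal sketch, assuming a hypothetical helper extract_numbers:

```cpp
#include <cstring>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: extracts every run of decimal digits from a C string.
// `s` must point to a null-terminated array that outlives the call.
std::vector<std::string> extract_numbers(const char* s)
{
    static const std::regex number_re("[0-9]+");
    std::vector<std::string> out;
    const std::cregex_token_iterator end; // end-of-sequence iterator
    for (std::cregex_token_iterator it(s, s + std::strlen(s), number_re);
         it != end; ++it)
    {
        out.push_back(it->str());
    }
    return out;
}
```

Called with "x=10, y=25", the helper collects the two tokens "10" and "25".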

Member types

Member type               Definition
value_type                std::sub_match<BidirIt>
difference_type           std::ptrdiff_t
pointer                   const value_type*
reference                 const value_type&
iterator_category         std::forward_iterator_tag
iterator_concept (C++20)  std::input_iterator_tag
regex_type                std::basic_regex<CharT, Traits>

Member functions

(constructor)                       constructs a new regex_token_iterator (public member function)
(destructor) (implicitly declared)  destructs a regex_token_iterator, including the cached value (public member function)
operator=                           assigns contents (public member function)
operator==                          compares two regex_token_iterators (public member function)
operator!= (removed in C++20)
operator*                           accesses current submatch (public member function)
operator->
operator++                          advances the iterator to the next submatch (public member function)
operator++(int)

Notes

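It is the programmer's responsibility to ensure that the std::basic_regex object passed to the iterator's constructor outlives the iterator. Because the iterator stores a std::regex_iterator, which stores a pointer to the regex, incrementing the iterator after the regex has been destroyed results in undefined behavior.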
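One way to stay clear of this problem is to keep the regex in a named object and to copy the tokens out before that object goes away. The sketch below assumes a hypothetical helper tokens_of. Note that the constructor overloads taking an rvalue regex are declared as deleted, so passing a temporary std::regex directly to the constructor does not compile.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the regex is a named local object that outlives
// every iterator created from it, and only std::string copies of the
// tokens leave the function.
std::vector<std::string> tokens_of(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern); // alive until the end of this function
    std::vector<std::string> out;
    const std::sregex_token_iterator end;
    for (std::sregex_token_iterator it(text.begin(), text.end(), re);
         it != end; ++it)
    {
        out.push_back(it->str()); // copy the characters, not the sub_match
    }
    return out; // no iterator referring to `re` escapes
}
```

Returning std::sub_match objects instead of strings would move the lifetime question to the text, since a sub_match only stores iterators into the searched sequence.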

Example

#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
 
int main()
{
    // Tokenization (non-matched fragments)
    // Note that regex is matched only two times; when the third value is obtained
    // the iterator is a suffix iterator.
    const std::string text = "Quick brown fox.";
    const std::regex ws_re("\\s+"); // whitespace
    std::copy(std::sregex_token_iterator(text.begin(), text.end(), ws_re, -1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
 
    std::cout << '\n';
 
    // Iterating the first submatches
    const std::string html = R"(<p><a href="http://google.com">google</a> )"
                             R"(< a HREF ="http://cppreference.com">cppreference</a>\n</p>)";
    const std::regex url_re(R"!!(<\s*A\s+[^>]*href\s*=\s*"([^"]*)")!!", std::regex::icase);
    std::copy(std::sregex_token_iterator(html.begin(), html.end(), url_re, 1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}

Output:

Quick
brown
fox.
 
http://google.com
http://cppreference.com

Defect reports

The following behavior-changing defect reports were applied retroactively to previously published C++ standards.

DR                  Applied to  Behavior as published                                                        Correct behavior
LWG 3698 (P2770R0)  C++20       regex_token_iterator was a forward_iterator while being a stashing iterator  made input_iterator[1]

  1. iterator_category was unchanged by the resolution, because changing it to std::input_iterator_tag might break too much existing code.