Chapter 2 Working with Numbers and Strings
2.1 Converting between numeric and string types
Use `std::to_string()` to convert a numeric value (of any integral or floating-point type) to a string.

Use `std::stoi()` to convert a string to an `int`. Besides the string, it accepts two optional parameters: a pointer to a `std::size_t` variable that receives the number of characters processed, and the numeric base (default 10). Note that a `0` (or `0x`) prefix in the string is only valid when the base is 0 or 8 (0 or 16).

Use `std::stod()` to convert a string to a `double`. It takes no base parameter, but the string may still take several forms: decimal floating point (optionally with an `e` exponent), hexadecimal floating point (with a `0x` prefix and an optional `p` exponent), `inf`, and `nan`.

The functions that convert strings to numeric types can throw two exceptions: `std::invalid_argument` (no conversion could be performed) and `std::out_of_range` (the converted value does not fit in the result type).
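As a minimal sketch of these conversions, the hypothetical helper `parse_int` below (not part of the standard library) wraps `std::stoi()`. It uses the position output to reject trailing characters and turns both exceptions into a `false` return.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse the whole of `text` as an int in the given base.
// Returns false instead of throwing when the text is not a valid number.
bool parse_int(const std::string& text, int& out, int base = 10) {
    try {
        std::size_t pos = 0;                  // receives the number of characters processed
        const int value = std::stoi(text, &pos, base);
        if (pos != text.size()) {
            return false;                     // trailing characters, e.g. "42abc"
        }
        out = value;
        return true;
    } catch (const std::invalid_argument&) {
        return false;                         // no conversion possible, e.g. "abc"
    } catch (const std::out_of_range&) {
        return false;                         // value does not fit in an int
    }
}
```

With base 16, `parse_int("0x2A", n, 16)` stores 42 in `n`. With base 0 the prefix selects the base, so `"052"` is read as octal and also yields 42.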
2.2 Limits and other properties of numeric types
`std::numeric_limits`, a class template, provides information about the properties of numeric types. Its most commonly used members are `min()` and `max()`.

- Since C++11, all static members of `std::numeric_limits` are `constexpr`, so they can be used anywhere, including in constant expressions. The C-style macros for numeric properties are therefore no longer needed.
- reference
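A short sketch of what `constexpr` members make possible. The helper `fits_in` is hypothetical and assumes an integral target type no wider than `long long` (and not `unsigned long long`). The last two checks illustrate a common pitfall: for floating-point types `min()` is the smallest positive normalized value, while `lowest()` is the most negative finite value.

```cpp
#include <limits>

// Hypothetical helper: does `value` fit in the integral type To?
// Assumes To is an integral type whose full range fits in long long.
template <typename To>
constexpr bool fits_in(long long value) {
    return value >= static_cast<long long>(std::numeric_limits<To>::min())
        && value <= static_cast<long long>(std::numeric_limits<To>::max());
}

// The limits are usable in constant expressions, so these are checked at compile time
static_assert(fits_in<signed char>(127), "127 always fits in signed char");
static_assert(!fits_in<unsigned char>(-1), "unsigned types start at 0");

// Floating-point pitfall: min() is positive, lowest() is negative
static_assert(std::numeric_limits<double>::min() > 0.0, "smallest positive normalized value");
static_assert(std::numeric_limits<double>::lowest() < 0.0, "most negative finite value");
```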
2.3 Generating pseudo-random numbers
When talking about random numbers in modern C++, we need to be clear about two concepts, engines and distributions:
- Engines produce pseudo-random numbers with a uniform distribution.
- Distributions convert the output of an engine to a specified distribution.
So the recipe is: choose an engine to produce random numbers, then use a distribution to map them to, say, the range we want:
```cpp
#include <random>

std::random_device rd{};                             // source of the seed
auto mtgen = std::mt19937{ rd() };                   // Mersenne Twister engine, seeded once
auto ud = std::uniform_int_distribution<>{ 1, 6 };   // maps engine output to [1, 6]
for (auto i = 0; i < 20; ++i) {
    auto number = ud(mtgen);                         // a value between 1 and 6, inclusive
}
```

First, `std::random_device` produces a random number to use as a seed. That seed initializes the `std::mt19937` engine, which the distribution draws from. Next, a uniform integer distribution restricts the range to 1 through 6, inclusive. Finally, invoking the distribution with the engine yields random numbers in that range.
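The same steps can be packaged as a function. This is a sketch only: `roll_dice` is a hypothetical name, and taking the engine by reference is a design choice so that its state advances across calls instead of restarting from a copy.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: roll a six-sided die `count` times with the given engine.
// The engine is taken by reference so its state advances across calls.
std::vector<int> roll_dice(std::mt19937& engine, std::size_t count) {
    std::uniform_int_distribution<int> die{1, 6};   // inclusive range [1, 6]
    std::vector<int> rolls;
    rolls.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        rolls.push_back(die(engine));
    }
    return rolls;
}
```

In production code the engine would typically be seeded from `std::random_device`, as above. Seeding it with a fixed value instead makes the sequence reproducible on a given standard library implementation, which is handy in tests.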
2.5 Creating cooked user-defined literals
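Since C++11, we can create cooked user-defined literals by defining a literal operator, `operator""`:

```cpp
#include <array>
#include <cstddef>

using byte = unsigned char;  // assumed alias; the snippet does not define byte

// No space between "" and the suffix, so _KB is not parsed as a separate identifier
constexpr std::size_t operator""_KB(const unsigned long long size) {
    return static_cast<std::size_t>(size * 1024);
}

auto size{ 4_KB };                        // std::size_t size = 4096;
auto buffer = std::array<byte, 1_KB>{};   // array of 1024 bytes
```

There are some points to mention: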
- For integral literals, the parameter must be `unsigned long long`; for floating-point literals, it must be `long double`. That is, the operator receives the value in the largest possible type.
- It's recommended to define literal operators in a separate namespace and bring them into scope with `using`, to avoid name collisions.
- User-defined suffixes should start with an underscore (`_`). Suffixes without one are reserved for the standard library, which has defined its own since C++14 (such as `s`, `min` and so on).
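A sketch that applies these points together. The namespace name `unit_literals` and both suffixes are hypothetical, chosen only for illustration.

```cpp
namespace unit_literals {   // hypothetical namespace name

// Integral literal: the parameter must be unsigned long long
constexpr unsigned long long operator""_KB(unsigned long long size) {
    return size * 1024;
}

// Floating-point literal: the parameter must be long double
constexpr long double operator""_percent(long double value) {
    return value / 100.0L;
}

}  // namespace unit_literals
```

After `using namespace unit_literals;`, `4_KB` evaluates to 4096 and `50.0_percent` to 0.5. Writing `50_percent` would not compile, because `50` is an integer literal and only a `long double` overload exists for that suffix.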
2.6 Creating raw user-defined literals
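Raw literal operators act as a fallback when no matching cooked literal operator exists. They apply only to numeric (integer and floating-point) literals and receive the characters of the literal as written in the source, either as a null-terminated `const char*` or as a pack of `char` template arguments:

```cpp
T operator""_suffix(const char*);
template<char...> T operator""_suffix();
```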
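To illustrate the `const char*` form, here is a minimal sketch of a hypothetical `_bin` suffix that reads the digits of the literal as a binary number. It assumes C++14 or later, since the `constexpr` function contains a loop.

```cpp
#include <stdexcept>

// Hypothetical raw literal operator: interprets the literal's digits as binary.
// It receives the source characters of the literal, e.g. "1010" for 1010_bin.
constexpr unsigned long long operator""_bin(const char* digits) {
    unsigned long long value = 0;
    for (; *digits != '\0'; ++digits) {
        if (*digits != '0' && *digits != '1') {
            throw std::invalid_argument("binary literal may contain only 0 and 1");
        }
        value = value * 2 + static_cast<unsigned long long>(*digits - '0');
    }
    return value;
}

static_assert(1010_bin == 10, "evaluated at compile time");
```

A cooked operator could not do this: it would receive the already converted value 1010, not the characters `1`, `0`, `1`, `0`.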
2.7 Using raw string literals to avoid escaping characters
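Raw string literals have two forms:

```cpp
R"( literal )"
R"delimiter( literal )delimiter"
```

The principle is what you see is what you get, e.g.:

```cpp
using namespace std::string_literals;  // for the s suffix

auto sqlselect {
R"(SELECT *
FROM Books
WHERE Publisher='Paktpub'
ORDER BY PubDate DESC)"s
};
```

Line breaks in the source become part of the string, and escape sequences such as `\n` are not interpreted: they are kept as literal characters.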
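A few small cases may make the two forms concrete. The variable names and sample texts below are made up for illustration.

```cpp
#include <string>

// The same Windows path, written with escapes and as a raw string literal
const std::string escaped_path = "C:\\Users\\Marius\\Documents";
const std::string raw_path = R"(C:\Users\Marius\Documents)";

// The text itself contains )" so a custom delimiter (here: !!) is required
const std::string quoted_call = R"!!(print("f(x)"))!!";

// A line break inside the literal becomes a newline character in the string
const std::string two_lines = R"(first
second)";
```

Here `escaped_path` and `raw_path` hold the same characters, and `quoted_call` holds `print("f(x)")`.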
2.8 Creating a library of string helpers
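One thing worth noting is that the `std::remove()` algorithm does not shrink the container. It moves the elements to keep to the front and returns an iterator to the new logical end, so a following `erase()` is needed to actually drop the leftover elements:

```cpp
#include <algorithm>
#include <string>

std::string str = "Text with some spaces";
str.erase(std::remove(str.begin(), str.end(), ' '), str.end());  // "Textwithsomespaces"
```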
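As a rough idea of what such a helper library could contain, here is a minimal sketch. The namespace and function names are hypothetical, and the helpers handle plain `char` strings only.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

namespace string_helpers {   // hypothetical namespace name

// Returns a copy of `text` with every character converted to upper case
inline std::string to_upper(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return text;
}

// Returns a copy of `text` without leading and trailing whitespace
inline std::string trim(const std::string& text) {
    const auto first = text.find_first_not_of(" \t\n\r");
    if (first == std::string::npos) {
        return {};                        // empty or all whitespace
    }
    const auto last = text.find_last_not_of(" \t\n\r");
    return text.substr(first, last - first + 1);
}

// Returns a copy of `text` with every occurrence of `ch` removed (erase-remove idiom)
inline std::string remove_char(std::string text, char ch) {
    text.erase(std::remove(text.begin(), text.end(), ch), text.end());
    return text;
}

}  // namespace string_helpers
```

Taking the argument by value in `to_upper` and `remove_char` lets the function modify and return its own copy, which leaves the caller's string untouched.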
2.9 Verifying the format of a string using regular expressions
Use a regular expression to match against a string:
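```cpp
using namespace std::string_literals;  // for the s suffix

auto pattern {R"(^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$)"s};
auto rx = std::regex{pattern};
auto valid = std::regex_match("marius@domain.com"s, rx);  // false: the pattern lists only upper-case letters
```

When constructing the `std::regex`, we can specify extra options. Because the pattern above lists only upper-case letters, the lower-case address matches only if letter case is ignored:

```cpp
auto rx = std::regex{pattern, std::regex_constants::icase};
```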
`std::regex_match()` has several overloads, one of which also returns the matched subexpressions:

```cpp
// email is the std::string to validate
auto rx = std::regex{R"(^([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,})$)"s};
auto result = std::smatch{};
auto success = std::regex_match(email, result, rx);
```

Note the three pairs of parentheses in the regular expression: each one marks a subexpression to capture. After calling `std::regex_match()`, the results can be queried from the `std::smatch`:

```cpp
std::cout << result[0].str() << std::endl; // the entire match
std::cout << result[1].str() << std::endl; // subexpression 1
std::cout << result[2].str() << std::endl; // subexpression 2
std::cout << result[3].str() << std::endl; // subexpression 3
```
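Putting the pieces together, a sketch of a function that validates an address and returns its parts. `EmailParts` and `parse_email` are hypothetical names. The pattern is the one shown above, used with the case-insensitive option.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: the three captured parts of an address
struct EmailParts {
    bool valid;
    std::string local;    // before the @
    std::string domain;   // between the @ and the last dot
    std::string suffix;   // after the last dot
};

// Hypothetical helper: validate `email` and split it into its parts
EmailParts parse_email(const std::string& email) {
    static const std::regex rx{
        R"(^([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,})$)",
        std::regex_constants::icase};
    std::smatch result;
    if (!std::regex_match(email, result, rx)) {
        return EmailParts{false, {}, {}, {}};
    }
    return EmailParts{true, result[1].str(), result[2].str(), result[3].str()};
}
```

The regex is a `static` local so that it is compiled once, not on every call. The function takes a `const std::string&` because the `std::smatch` results refer to the characters of the string that was matched.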
2.10 Parsing the content of a string using regular expressions
`std::regex_match()` succeeds only if the whole string matches. To find a match inside a larger string, we can use `std::regex_search()`, which has similar overloads:

```cpp
// text is the std::string to parse; rx is a std::regex with two capture groups
auto match = std::smatch{};
if (std::regex_search(text, match, rx)) {
    std::cout << match[1] << '=' << match[2] << std::endl;
}
```

However, `std::regex_search()` performs a single search: it stops at the first match and does not iterate over the string to find all matching substrings. To find them all, we can use `std::sregex_iterator` or `std::sregex_token_iterator`:

```cpp
auto end = std::sregex_iterator{};
for (auto it = std::sregex_iterator{ std::begin(text), std::end(text), rx };
     it != end; ++it) {
    std::cout << (*it)[1] << '=' << (*it)[2] << std::endl;
}
```
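The iterator loop can be wrapped into a function that collects every match. In this sketch, `parse_settings` and its simple `name = value` pattern are assumptions made for illustration, not the pattern used by the snippets above.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: collect every "name = value" pair found in `text`.
// The pattern is a simplified assumption: names and values are runs of word characters.
std::map<std::string, std::string> parse_settings(const std::string& text) {
    static const std::regex rx{R"((\w+)\s*=\s*(\w+))"};
    std::map<std::string, std::string> settings;
    const auto end = std::sregex_iterator{};
    for (auto it = std::sregex_iterator{text.begin(), text.end(), rx}; it != end; ++it) {
        settings[(*it)[1].str()] = (*it)[2].str();   // group 1: name, group 2: value
    }
    return settings;
}
```

Each dereferenced iterator is a `std::smatch`, so the capture groups are read exactly as after a single `std::regex_search()` call.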
2.11 Replacing the content of a string using regular expressions
Use `std::regex_replace()` to replace the content of a string. Its parameters are:

- the input string on which the replacement is performed,
- a `std::basic_regex` to match against,
- the format string used as the replacement,
- and optional flags.

```cpp
auto text{ "bancila, marius"s };
auto rx = std::regex{ R"((\w+),\s*(\w+))"s };
auto newtext = std::regex_replace(text, rx, "$2 $1"s);  // "marius bancila"
```

The last two parameters are worth mentioning. The format string can use match identifiers to refer to parts of the match: `$1` is the first matched subexpression, `$&` is the entire match, `$'` is the text that follows the match, and so on.

The flags can be something like `std::regex_constants::format_first_only`, which replaces only the first match.
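A small sketch of both points, the `$&` identifier and the `format_first_only` flag. The function name `bracket_words` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wrap words in square brackets.
// When `first_only` is true, only the first word is replaced.
std::string bracket_words(const std::string& text, bool first_only) {
    static const std::regex word{R"(\w+)"};
    const auto flags = first_only ? std::regex_constants::format_first_only
                                  : std::regex_constants::format_default;
    return std::regex_replace(text, word, "[$&]", flags);   // $& is the entire match
}
```

For the input `abc def`, the call without the flag returns `[abc] [def]`, and the call with `first_only` set returns `[abc] def`.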
2.12 Using string_view instead of constant string references
C++17 introduces `std::string_view`, a non-owning (it does not manage the lifetime of the data), constant (it cannot modify the data) reference to a character sequence. It avoids the cost of the temporary `std::string` objects that are created when, for example, a string literal is passed to a `const std::string&` parameter.

`std::string_view` provides an interface that is almost the same as that of `std::string`, so we can usually replace `const std::string&` with `std::string_view` unless a `std::string` is really needed (for example, to pass a null-terminated string to a C API, since a view is not guaranteed to be null-terminated).

Essentially, a `std::string_view` just holds a pointer to the start of the character sequence and its length. It provides the `remove_prefix()` and `remove_suffix()` methods to shrink the viewed range.

A `std::string_view` can be constructed implicitly from a `std::string`. Converting back requires constructing a `std::string` explicitly, which copies the characters.
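As an illustration of `remove_prefix()` and `remove_suffix()`, here is a sketch of a trimming function that never copies characters. The name `trim_view` is hypothetical, and C++17 is assumed.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: return a view of `text` without leading and trailing blanks.
// No characters are copied; the result refers to the same data as the argument.
std::string_view trim_view(std::string_view text) {
    const auto first = text.find_first_not_of(" \t");
    if (first == std::string_view::npos) {
        return {};                                   // empty or all blanks
    }
    text.remove_prefix(first);                       // drop leading blanks
    const auto last = text.find_last_not_of(" \t");
    text.remove_suffix(text.size() - last - 1);      // drop trailing blanks
    return text;
}
```

The returned view is only valid while the original character data is alive. To keep the result longer, copy it explicitly, for example with `std::string owned{trim_view(input)};`.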