Guide¶
Installation and building¶
NOTE: the following instructions assume a Unix-like system with a command line.
If your development environment is different (e.g. Visual Studio), these steps cannot be followed verbatim.
Some IDEs, like Visual Studio, have CMake integration built into them.
In some other environments, you may have to use the CMake GUI or your system’s command line.
To build a CMake project without make
, use cmake --build .
.
scnlib
uses CMake for building.
If you’re already using CMake for your project, integration is easy with find_package
.
$ mkdir build
$ cd build
$ cmake ..
$ make -j
$ make install
# Find scnlib package
find_package(scn CONFIG REQUIRED)
# Target which you'd like to use scnlib
# scn::scn-header-only to use the header-only version
add_executable(my_program ...)
target_link_libraries(my_program scn::scn)
Tests and benchmarks¶
To build and run the tests and benchmarks for scnlib, clone the repository, and build it with CMake.
Building and running tests:
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make -j
$ ctest -j4
Building and running the runtime performance benchmarks:
# create build directory like above
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON -DSCN_NATIVE_ARCH=ON ..
$ make -j
# disable CPU frequency scaling
$ sudo cpupower frequency-set --governor performance
# run the benchmark of your choosing
$ ./benchmark/runtime/integer/bench-int
# re-enable CPU frequency scaling
$ sudo cpupower frequency-set --governor powersave
Without CMake¶
Headers for the library can be found from the include/
directory, and source files from the src/
directory.
Building and linking the library:
$ mkdir build
$ cd build
$ c++ -c -I../include/ ../src/*.cpp
$ ar rcs libscn.a *.o
libscn.a
can then be linked, as usual.
# in your project
$ c++ ... -Lpath-to-scn/build -lscn
When building as header-only, src/
has to be in the include path, and SCN_HEADER_ONLY
must be defined to 1
:
In this case, a separate build stage obviously isn’t required
# in your project
$ c++ ... -Ipath-to-scn/include -Ipath-to-scn/src -DSCN_HEADER_ONLY=1
Basic usage¶
scn::scan
can be used to parse various values from a source range.
A range is an object that has a beginning and an end.
Examples of ranges are string literals, std::string
and std::vector<char>
.
Objects of these types, and more, can be passed to scn::scan
.
To learn more about the requirements on these ranges, see the API documentation on source ranges.
After the source range, scn::scan
is passed a format string.
This is similar in nature to scanf
, and has virtually the same syntax as std::format
and {fmt}.
In the format string, arguments are marked with curly braces {}
.
Each {}
means that a single value is to be scanned from the source range.
Because scnlib
uses variadic templates, type information is not required in the format string,
like it is with scanf
(like %d
).
After the format string, references to arguments to be parsed are given.
// Scanning an int
int i;
scn::scan("123", "{}", i):
// i == 123
// Scanning a double
double d;
scn::scan("3.14", "{}", d);
// d == 3.14
// Scanning multiple values
int a, b;
scn::scan("0 1 2", "{} {}", a, b);
// a == 0
// b == 1
// Note, that " 2" was not scanned,
// because only two integers were requested
// Scanning a string means scanning a "word" --
// that is, until the next whitespace character
// this is the same behavior as with iostreams
std::string str;
scn::scan("hello world", "{}", str);
// str == "hello"
Compare the above example to the same implemented with std::istringstream
:
int i;
std::istringstream{"123"} >> i;
double d;
std::istringstream{"3.14"} >> d;
int a, b;
std::istringstream{"0 1 2"} >> a >> b;
std::string str;
std::istringstream{"hello world"} >> str;
Or with sscanf
:
int i;
std::sscanf("123", "%d", &i);
double d;
std::sscanf("3.14", "%lf", &d);
int a, b;
std::sscanf("0 1 2", "%d %d", &a, &b);
// Not really possible with scanf!
// Using a fixed size buffer
char buf[6] = {0};
std::sscanf("hello world", "%5s", buf);
// buf == "hello"
Error handling and return values¶
scnlib does not use exceptions.
The library compiles with -fno-exceptions -fno-rtti
and is perfectly usable without them.
Instead, it uses return values to signal errors.
This return value is truthy if the operation succeeded.
Using the .error()
member function more information about the error can be gathered.
If an error occurs, the arguments that were not scanned are not written to. Beware of using possibly uninitialized variables.
int i;
// "foo" is not an integer
auto result = scn::scan("foo", "{}", i);
// fails, i still uninitialized
if (!result) {
std::cout << result.error().msg() << '\n';
}
Unlike with scanf
, partial successes are not supported.
Either the entire scanning operation succeeds, or a failure is returned.
int a, b;
// "foo" is still not an integer
auto result = scn::scan("123 foo", "{} {}", a, b);
// fails -- result == false
// a is written to, a == 123
// b is still uninitialized
Oftentimes, the entire source range is not scanned, and the remainder of the range may be useful later.
The leftover range can be accessed with the member function .range()
.
int i;
auto result = scn::scan("123 456", "{}", i);
// result == true
// i == 123
// result.range() == " 456"
result = scn::scan(result.range(), "{}", i);
// result == true
// i == 456
// result.range() == ""
The return type of .range()
is a valid range, but it is of an library-internal, user-unnameable type.
Its type is not the same as the source range.
If possible for the given source range type, the .reconstruct()
member function can be used to create a range of the original source range type.
Note, that .reconstruct()
is not usable with string literals.
std::string source{"foo bar"};
std::string str;
auto result = scn::scan(source, "{}", str);
// result == true
// str == "foo"
// result.reconstruct() == " bar"
The result type has some additional useful member functions. These include:
.empty()
: returnstrue
if the leftover range is empty, meaning that there are definitely no values to scan from the source range any more.
.string()
,.string_view()
, and.span()
: like.reconstruct()
, except they work for every contiguous range, and return a value of the type specified in the function name
See the API documentation for more details.
To enable multiple useful patterns, the library provides a function scn::make_result
.
This function will return the result object for a given source range, that can be later reassigned to.
For example:
auto result = scn::make_result("foo");
int i;
if (result = scn::scan(result.range(), "{}", i)) {
// success
// i is usable
} else {
// failure
// result contains more info, i not usable
}
Or:
auto result = scn::make_result("123 456");
int i;
while (result = scn::scan(result.range(), "{}", i)) {
// success
// i is usable:
// iteration #1: i == 123
// iteration #2: i == 456
}
// failure
// can either be an invalid value or EOF
// in this case, it's EOF
// i not modified, still i == 456
Files and stdin
¶
To easily read from stdin
, use scn::input
or scn::prompt
.
They work similarly scn::scan
.
// Reads an integer from stdin
int i;
auto result = scn::input("{}", i);
// Same, but with an accompanying message sent to stdout
int i;
auto result = scn::prompt("Write an integer: ", "{}", i);
To use scn::scan
with stdin
, use scn::cstdin()
.
It returns a scn::file&
, which is a range mapping to a FILE*
.
int i;
auto result = scn::scan(scn::cstdin(), "{}", i);
scn::input
and scn::prompt
sync with <cstdio>
automatically,
so if you wish to mix-and-match scn::input
and scanf
, it’s possible without any further action.
scn::scan
and scn::cstdin()
don’t do this, but you must explicitly call scn::cstdin().sync()
when synchronization is needed.
int i, j;
scn::input("{}", i);
std::scanf("%d", &j);
int i, j;
scn::scan(scn::cstdin(), "{}", i);
scn::cstdin().sync(); // needed here, because we wish to use <cstdio>
std::scanf("%d", &j);
You can also scan from other file handles than stdin
.
You can either use scn::file
or scn::owning_file
, depending on if you want to handle the lifetime of the FILE*
yourself, or let the library handle it, respectively.
auto f = std::fopen("file.txt", "r");
scn::file file{f};
f.close();
// file now unusable
scn::owning_file file{"file.txt", "r"};
Both scn::file
and scn::owning_file
are valid source ranges, and can be passed to scn::scan
and other scanning functions.
scn::owning_file
is a child class of scn::file
, so scn::owning_file& -> scn::file&
is a valid conversion.
There’s also scn::mapped_file
for easier management of memory mapped files, see the API documentation for more.
Other scanning functions¶
scn::scan_default
¶
Oftentimes, specific parsing configuration through the format string is not required.
In this case, scn::scan_default
can be used.
Using it has some performance benefits, as a format string doesn’t need to be parsed.
Using scn::scan_default
with N args has the same semantics as
using scn::scan
with a format string with N space-separated "{}"
s.
int a, b;
auto result = scn::scan_default("123 456", a, b);
// result == true
// a == 123
// b == 456
// Equivalent to:
int a, b;
auto result = scn::scan("123 456", "{} {}", a, b);
scn::scan_value
¶
If you only wish to scan a single value with default options, you can avoid using output arguments by using scn::scan_value
.
The return value of scn::scan_value<T>
contains a .value()
member function that returns a T
if the operation succeeded.
auto result = scn::scan_value<int>("123");
// result == true
// result.value() == 123
As is evident by the presence of an extra member function, the return type of scan_value
is not the same as the one of scan
.
The return type of scan
inherits from scn::wrapped_error
, but the return type of scan_value
inherits from scn::expected<T>
.
To use make_result
with make_value
, this needs to be taken into account:
auto result = scn::make_result<scn::expected<int>>(...);
result = scn::scan_value<int>(result.range());
The return types of scan
and scan_value
are not compatible, and cannot be assigned to each other.
Localization: scn::scan_localized
¶
By default, scnlib
isn’t affected by changes to the global C or C++ locale.
All functions will behave as if the global locale was set to "C"
.
A std::locale
can be passed to scn::scan_localized
to scan with a locale.
This is mostly used with numbers, especially floats, giving locale-specific decimal separators.
Because of the way std::locale
is, parsing with a locale is significantly slower than without one.
This is because the library effectively has to use iostreams for parsing.
// Reads a localized float
double d;
auto result = scn::scan_localized(std::locale{"fi_FI.UTF-8"}, "2,73", "{}", d);
// result == true
// d == 2.73
Because scan_localized
uses iostreams under the hood, the results will not be identical to scn::scan
,
even if std::locale::classic()
was passed.
scn::getline
¶
scn::getline
works similarly to std::getline
.
It takes a range to read from, a string to read into, and optionally a delimeter character defaulting to '\n'
.
std::string line;
auto result = scn::getline("first\nsecond\nthird", line);
// result == true
// line == "first"
// result.range() == "second\nthird" (note that the delim '\n' is skipped)
// setting '\n' explicitly
result = scn::getline(result.range(), line, '\n');
// result == true
// line == "second"
// result.range() == "third"
// delim doesn't have to be '\n' or even whitespace
result = scn::getline(result.range(), line, 'r');
// result == true
// line == "thi"
// result.range() == "d"
If the string to read into passed to scn::getline
is a scn::string_view
, and the source range is contiguous,
the string_view
is modified to point into the source range.
This increases performance (no copying or memory allocations) at the expense of lifetime safety.
std::string source = "foo\nbar";
scn::string_view line;
auto result = scn::getline(source, line);
// result == true
// line == "foo"
// result.range() == "bar"
// line.data() == source.data() (point to the same address -- `line` points to `source`)
scn::ignore_until
and scn::ignore_until_n
¶
scn::ignore_until_n
is like std::istream::ignore
.
It takes an integer N and a character C,
and reads the source range until either N characters were read or character C was found from the source range.
scn::ignore_until
works in the same way, except the only condition for stopping to read is finding the end character.
This is effectively equivalent to passing std::numeric_limits<std::ptrdiff_t>::max()
as N to scn::ignore_until_n
.
scn::scan_list
and temporaries¶
To easily scan multiple values of the same type, scn::scan_list
can be used.
It takes a source range and a container to write the scanned values to.
Its return type is similar to that of scn::scan
.
std::vector<int> list;
auto result = scn::scan_list("123 456 789", list);
// result == true
// list == [123, 456, 789]
scn::scan_list
can also be passed a third argument marking a delimeter character:
std::vector<int> list;
auto result = scn::scan_list("123, 456, 789", list, ',');
// result == true
// list == [123, 456, 789]
scn::scan_list
will read until an invalid value or delimeter is found or the source range is exhausted.
scn::scan_list_until
can be used to control this behavior.
As its third argument, it takes a character, until which it will read the source range, similar to getline
.
The delimeter character argument is still the last argument and optional.
std::vector<int> list;
auto result = scn::scan_list_until("123 456 789\n123", list, '\n');
// result == true
// list == [123, 456, 789]
// result.range() == "123"
If you’ve already allocated memory for the list, scan_list
and scan_list_until
can be passed a scn::span
.
Because the container must be passed to scan_list
as an lvalue reference, the span
must be constructed separately,
which can be tedious.
The library provides some helpers for this.
// Doing everything explicitly
std::vector<int> list(64, 0);
auto span = scn::make_span(list);
auto result = scn::scan_list("123 456 789", span);
// result == true
// list == span == [123, 456, 789]
// Using scn::temp
// Takes an rvalue and makes it usable as an argument to scanning functions requiring an lvalue reference
// Useful with spans and other views
std::vector<int> list(64, 0);
auto result = scn::scan_list("123 456 789", scn::temp(scn::make_span(list)));
// result == true
// list == span == [123, 456, 789]
// Using scn::make_span_list_wrapper
// Takes a container and returns a span into it, already wrapped with scn::temp
// Effectively equivalent to the example above
std::vector<int> list(64, 0);
auto result = scn::scan_list("123 456 789", scn::make_span_list_wrapper(list));
// result == true
// list == span == [123, 456, 789]
scn::temp
can be also utilized elsewhere
std::vector<char> buffer(64, 0);
// Reads up to 64 chars into the buffer
auto result = scn::scan_default(source, scn::temp(scn::make_span(buffer)));
User types¶
To scan a value of a program-defined type, specialize scn::scanner
struct int_and_double {
int i;
double d;
};
template <typename CharT>
struct scn::scanner<CharT, int_and_double> : scn::empty_parser<CharT> {
template <typename Context>
error scan(int_and_double& val, Context& ctx)
{
auto r = scn::scan(ctx.range(), "[{}, {}]", val.i, val.d);
ctx.range() = std::move(r.range());
return r.error();
}
};
// ...
int_and_double val;
auto result = scn::scan("[123, 3.14]", "{}", val);
// result == true
// val.i == 123
// val.d == 3.14
The above example inherits from scn::empty_parser
.
This implements the format string functionality for this type.
scn::empty_parser
is a good default choice, as it only accepts empty format strings.
You could also inherit from other scanner types (like scn::scanner<CharT, int>
),
or implement parse()
by hand (see reader.h
in the library source code).
Alternatively, you could also include the header <scn/istream.h>
.
This enables scanning of types with a std::istream
compatible operator>>
.
Using this functionality is discouraged, as using iostreams to scan these values presents some difficulties with error recovery,
and will lead to worse performance.
Specializing scn::scanner
should be preferred.
Tuple-based scanning API¶
By including <scn/tuple_return.h>
, you’ll get access to an alternative API,
which returns the scanned values in a tuple instead of output parameters.
See Rationale for why this is not the default API.
These functions are slightly slower compared to their output-parameter equivalents, both in runtime and compile time.
#include <scn/tuple_return.h>
// Way more usable with C++17 structured bindings
// Can also be used without them
auto [result, i] = scn::scan_tuple<int>("123", "{}");
// result == true
// i == 123
Miscellaneous¶
string
vs string_view
vs span<char>
¶
Three types that at first glance might appear quite similar, have significant differences what comes to how they’re scanned by the library.
std::string
works very similarly to how it works with <iostream>
.
It scans a “word”: a sequence of letters separated by spaces.
More precisely, it reads the source range into the string, until a whitespace character is found or the range reaches its end.
span<char>
works like istream.read
: it copies bytes from the range into the buffer it’s pointing to.
string_view
works like std::string
, except it doesn’t copy, but changes its data pointer to point into the source stream.
Scanning a string_view
works only with contiguous ranges, and may lead to lifetime issues, but it will give you better performace (avoids copying and allocation).
scn::string_view source{"hello world"};
std::string str;
scn::scan(source, "{}", str);
// str == "hello"
scn::string_view sv;
scn::scan(source, "{}", sv);
// sv == "hello"
// sv.data() == source.data() -- sv points to source
// Make sure that `source` outlives `sv`
std::vector<char> buffer(5, '\0'); // 5 bytes, all zero
scn::span<char> s = scn::make_span(buffer);
scn::scan(source, "{}", s);
// s == buffer == "hello"
// Reads 5 bytes, doesn't care about whitespace
// No lifetime problems, the data is copied into the span/the buffer it points to
Wide ranges¶
Source ranges have an associated character type, either char
or wchar_t
.
This character type is determined by the type of deferencing an iterator into the range, which is either CharT
or scn::expected<CharT>
.
For most use cases, this type is char
. In this case, the range is said to be narrow.
If the character type is wchar_t
, the range is said to be wide.
The return types of scanning narrow and wide ranges are incompatible and cannot be mixed.
char
, std::string
, scn::string_view
, and scn::span<char>
cannot be scanned from a wide range.
wchar_t
, std::wstring
, scn::wstring_view
, and scn::span<wchar_t>
cannot be scanned from a narrow range.
Wide ranges are useful if your source data is wide (often the case on Windows). Narrow ranges should be preferred if possible, however.
The encoding of wide ranges is assumed to be whatever is set in the global C locale. The encoding must be ASCII-compatible.