c++ - How to split a sentence with an escaped whitespace?
I want to split a sentence using whitespace as the delimiter, except for escaped whitespace. Using boost::split and a regex, how can I split it? If that is not possible, how else?
Example:

    std::string sentence = "my dog fluffy\\ cake likes to jump";

Result:
my
dog
fluffy\ cake
likes
to
jump
Here are three implementations:
- with Boost Spirit
- with Boost Regex
- handwritten parser
With Boost Spirit
Here's how I'd do it with Boost Spirit. It might seem like overkill, but experience teaches me that once you're splitting input text, you will likely require more parsing logic.
Boost Spirit shines when you scale from "just splitting tokens" to a real grammar with production rules.

    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    namespace qi = boost::spirit::qi;

    int main()
    {
        std::string const sentence = "my dog fluffy\\ cake likes to jump";

        using It = std::string::const_iterator;
        It f = sentence.begin(), l = sentence.end();

        std::vector<std::string> words;

        bool ok = qi::phrase_parse(f, l,
                *qi::lexeme [ +('\\' >> qi::char_ | qi::graph) ], // words
                qi::space - "\\ ",                                // skipper
                words);

        if (ok)
        {
            std::cout << "Parsed:\n";
            for (auto& w : words)
                std::cout << "\t'" << w << "'\n";
        }
        else
        {
            std::cout << "Parse failed\n";
        }

        // report any input the parser did not consume
        if (f != l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }

With Boost Regex
This looks succinct, but:
- it requires linking against boost_regex
- it uses a "black magic" negative lookbehind assertion: http://www.regular-expressions.info/lookaround.html

    #include <iostream>
    #include <string>
    #include <vector>
    #include <boost/regex.hpp>
    #include <boost/algorithm/string_regex.hpp>

    int main()
    {
        std::string const sentence = "my dog fluffy\\ cake likes to jump";

        std::vector<std::string> words;
        // split on any whitespace that is not preceded by a backslash
        boost::algorithm::split_regex(words, sentence, boost::regex("(?<!\\\\)\\s"), boost::match_default);

        for (auto& w : words)
            std::cout << " '" << w << "'\n";
    }

Using C++11 raw string literals, you can write the regular expression less obscurely:
boost::regex(R"((?<!\\)\s)"), meaning "any whitespace that does not follow a backslash".
Handwritten parser
This is somewhat more tedious, but like the Spirit grammar it is fully generic, and it allows nice performance.
However, it doesn't scale as gracefully as the Spirit approach once you start adding complexity to your grammar. An advantage is that you spend less time compiling the code than with the Spirit version.

    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>

    template <typename It, typename Out>
    Out tokens(It f, It l, Out out) {
        std::string accum;
        // emit the accumulated token, if any, and start a new one
        auto flush = [&] {
            if (!accum.empty()) {
                *out++ = accum;
                accum.resize(0);
            }
        };

        while (f!=l) {
            switch(*f) {
                case '\\':
                    if (++f!=l && *f==' ') {
                        accum += ' ';  // escaped space becomes part of the token
                        ++f;           // consume it so it is not seen as a delimiter
                    } else {
                        accum += '\\'; // lone backslash is kept literally
                    }
                    break;
                case ' ': case '\t': case '\r': case '\n':
                    ++f;
                    flush();
                    break;
                default:
                    accum += *f++;
            }
        }
        flush();
        return out;
    }

    int main()
    {
        std::string const sentence = "my dog fluffy\\ cake likes to jump";

        std::vector<std::string> words;

        tokens(sentence.begin(), sentence.end(), std::back_inserter(words));

        for (auto& w : words)
            std::cout << "\t'" << w << "'\n";
    }