C++ - How to split a sentence with an escaped whitespace?


I want to split a sentence using whitespace as the delimiter, except for escaped whitespace. Using boost::split and a regex, how can I split it? If that is not possible, how else can it be done?

example:

std::string sentence = "my dog fluffy\\ cake likes to jump";

result:
my
dog
fluffy\ cake
likes
to
jump

Three implementations:

  1. With Boost Spirit
  2. With Boost Regex
  3. Handwritten parser

With Boost Spirit

Here's how I'd do it with Boost Spirit. It might seem like overkill, but experience teaches me that once you're splitting input text, you will likely require more parsing logic.

Boost Spirit shines when you scale from "just splitting tokens" to a real grammar with production rules.

    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    namespace qi = boost::spirit::qi;

    int main() {
        std::string const sentence = "my dog fluffy\\ cake likes to jump";
        using It = std::string::const_iterator;
        It f = sentence.begin(), l = sentence.end();

        std::vector<std::string> words;

        bool ok = qi::phrase_parse(f, l,
                // a word: a backslash followed by any character, or any printable non-space character
                *qi::lexeme [ +('\\' >> qi::char_ | qi::graph) ], // words
                qi::space - "\\ ", // skipper
                words);

        if (ok) {
            std::cout << "parsed:\n";
            for (auto& w : words)
                std::cout << "\t'" << w << "'\n";
        } else {
            std::cout << "parse failed\n";
        }

        // report any input the grammar did not consume
        if (f != l)
            std::cout << "remaining unparsed: '" << std::string(f,l) << "'\n";
    }

With Boost Regex

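This looks succinct, but it only splits: the escaping backslash stays in the token (fluffy\ cake), and every single unescaped whitespace character acts as a separator, so runs of whitespace produce empty tokens.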

    #include <boost/algorithm/string_regex.hpp>
    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        std::string const sentence = "my dog fluffy\\ cake likes to jump";

        std::vector<std::string> words;
        // split on any whitespace character that is not preceded by a backslash
        boost::algorithm::split_regex(words, sentence, boost::regex("(?<!\\\\)\\s"), boost::match_default);

        for (auto& w : words)
            std::cout << " '" << w << "'\n";
    }

Using C++11 raw string literals, you can write the regular expression less obscurely: boost::regex(R"((?<!\\)\s)"), meaning "any whitespace not following a backslash".
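If Boost is not an option, note that std::regex with its default ECMAScript grammar has no lookbehind, so the pattern cannot be reused as is. The following is a minimal sketch of an alternative, assuming it is acceptable to match the tokens themselves rather than split on the delimiters. The helper name split_escaped is hypothetical and not part of the original answer.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not from the original answer):
// collects tokens by matching them instead of splitting on delimiters.
std::vector<std::string> split_escaped(std::string const& input) {
    // A token is one or more of: a backslash followed by any character
    // (except a line terminator), or a single non-whitespace character.
    static std::regex const token(R"((?:\\.|\S)+)");

    std::vector<std::string> words;
    for (std::sregex_iterator it(input.begin(), input.end(), token), end; it != end; ++it)
        words.push_back(it->str());
    return words;
}
```

Called on the example sentence, this should yield the six tokens from the question, with the backslash kept in fluffy\ cake. Unlike the split-based version, runs of whitespace do not produce empty tokens, because only non-empty matches are collected.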

Handwritten parser

This is more tedious but, like the Spirit grammar, it is generic and allows for good performance.

However, it doesn't scale as gracefully as the Spirit approach once you start adding complexity to your grammar. An advantage is that you spend less time compiling the code than with the Spirit version.

    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>

    template <typename It, typename Out>
    Out tokens(It f, It l, Out out) {
        std::string accum;
        // emit the accumulated token, if any, and start a new one
        auto flush = [&] {
            if (!accum.empty()) {
                *out++ = accum;
                accum.resize(0);
            }
        };

        while (f!=l) {
            switch(*f) {
                case '\\':
                    if (++f!=l && *f==' ')
                        accum += *f++;  // escaped space: keep the space and consume it
                    else
                        accum += '\\';  // lone backslash: keep it literally
                    break;
                case ' ': case '\t': case '\r': case '\n':
                    ++f;
                    flush();            // unescaped whitespace ends the token
                    break;
                default:
                    accum += *f++;
            }
        }
        flush();
        return out;
    }

    int main() {
        std::string const sentence = "my dog fluffy\\ cake likes to jump";

        std::vector<std::string> words;

        tokens(sentence.begin(), sentence.end(), std::back_inserter(words));

        for (auto& w : words)
            std::cout << "\t'" << w << "'\n";
    }
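The hand-written tokenizer stores an escaped space without its backslash, whereas the expected result in the question shows fluffy\ cake with the backslash kept. Below is a sketch of a variant that makes this a choice. The name tokens_keep and the keep_escape parameter are assumptions made for illustration, and unlike the original it treats a backslash before any of the four whitespace characters as an escape, not only before a space.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical variant of the hand-written tokenizer; `tokens_keep` and
// `keep_escape` are assumed names, not part of the original answer.
template <typename It, typename Out>
Out tokens_keep(It f, It l, Out out, bool keep_escape = true) {
    std::string accum;
    auto flush = [&] {
        if (!accum.empty()) {
            *out++ = accum;
            accum.clear();
        }
    };
    auto is_space = [](char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    };

    while (f != l) {
        char const c = *f++;
        if (c == '\\' && f != l && is_space(*f)) {
            if (keep_escape) accum += c;  // optionally keep the backslash itself
            accum += *f++;                // escaped whitespace joins the token
        } else if (is_space(c)) {
            flush();                      // unescaped whitespace ends the token
        } else {
            accum += c;                   // ordinary character or lone backslash
        }
    }
    flush();
    return out;
}
```

With the default keep_escape = true, the example sentence should come out as in the question (fluffy\ cake); passing false gives fluffy cake instead. It is called the same way as the original, for example tokens_keep(sentence.begin(), sentence.end(), std::back_inserter(words)).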
