Instead of loading a logfile in Python and creating a new copy of it every time the code transforms it, code written as generators works through the logfile line by line, saving memory while keeping each transformation in its own clearly separated piece of code. Consider a short example:
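import re

def matches_numeric(file_lines):
    # Pass through only lines made up entirely of digits, spaces, tabs, and dots.
    # The '+' lets the pattern cover the whole line rather than a single character.
    for line in file_lines:
        if re.match(r'^[0-9 \t.]+$', line):
            yield line

if __name__ == '__main__':
    # split_columns, basic_checks, and search_error are further generator
    # stages of the same shape; they are not shown here.
    with open('infile.txt', 'r') as file_handle:
        data = matches_numeric(file_handle)
        columnar = split_columns(data)
        checked = basic_checks(columnar)
        red_flags = search_error(checked)
        # Nothing is read from the file until this loop pulls lines through.
        for line in red_flags:
            print(line)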
This way, the transformations are represented clearly and are easy to mix and match.
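The example above calls split_columns, basic_checks, and search_error without defining them. Here is a minimal sketch of what such stages might look like. The bodies, the expected_columns and threshold parameters, the _is_float helper, and the sample data are all assumptions made for illustration, not the original code.

```python
import re

# Assumes the pattern is meant to match whole lines of digits, blanks, and dots.
NUMERIC_LINE = re.compile(r'^[0-9 \t.]+$')

def matches_numeric(file_lines):
    """Yield only lines made up of digits, spaces, tabs, and dots."""
    for line in file_lines:
        if NUMERIC_LINE.match(line):
            yield line

def split_columns(lines):
    """Hypothetical stage: split each line into whitespace-separated fields."""
    for line in lines:
        yield line.split()

def _is_float(text):
    """Hypothetical helper: True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def basic_checks(rows, expected_columns=3):
    """Hypothetical stage: keep rows with the expected number of numeric fields."""
    for row in rows:
        if len(row) == expected_columns and all(_is_float(f) for f in row):
            yield row

def search_error(rows, threshold=100.0):
    """Hypothetical stage: yield rows in which any value exceeds the threshold."""
    for row in rows:
        if any(float(field) > threshold for field in row):
            yield row

# Any iterable of lines works as a source, so a list stands in for the file.
sample = ["time x y\n", "0.0 1.5 2.5\n", "1.0 250.0 3.0\n", "2.0 4.0\n"]
pipeline = search_error(basic_checks(split_columns(matches_numeric(sample))))
flagged = list(pipeline)
```

Because every stage takes an iterable and yields items, stages can be reordered or dropped without touching the others, and no stage holds more than one line at a time.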
C++ doesn't have a yield keyword, and its iterators don't signal that they are complete by throwing StopIteration. Instead, an iterator is finished when it compares equal to an end-of-stream iterator, so that's what we can construct in C++. This means that the moral, but not syntactic, equivalent of a Python generator is a function that returns a pair of iterators.
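To make the two completion protocols concrete, here is a small Python sketch that runs the same generator both ways: once the Python way, stopping on StopIteration, and once imitating the C++ way, stopping when the current position equals an end marker. The countdown generator and the END object are hypothetical names used only for this illustration.

```python
def countdown(n):
    """A small generator used only to have something to iterate over."""
    while n > 0:
        yield n
        n -= 1

# Python's protocol: keep calling next() until StopIteration is raised.
# This is what a for loop does behind the scenes.
by_exception = []
gen = countdown(3)
while True:
    try:
        by_exception.append(next(gen))
    except StopIteration:
        break

# C++-style protocol, imitated in Python: advance until the current
# position compares equal to a dedicated end marker.
END = object()  # hypothetical stand-in for an end-of-stream iterator
by_sentinel = []
gen = countdown(3)
current = next(gen, END)
while current is not END:
    by_sentinel.append(current)
    current = next(gen, END)
```

Both loops visit the same values. In the C++ function that follows, a default-constructed split_iterator plays the role of the end marker.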
template<class SOURCE>
boost::array<split_iterator<SOURCE>,2> split_line(boost::array<SOURCE,2>& begin_end) {
    boost::array<split_iterator<SOURCE>,2> iters = {{
        split_iterator<SOURCE>(begin_end), split_iterator<SOURCE>()
    }};
    return iters;
}
These iterators are packaged in a boost::array, but you could use a std::pair or leave them unpackaged, as you please. The goal is the same: to create a nice way to express a series of transformations.
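std::ifstream in_file("z.txt");
auto file_line=file_by_line(in_file);
auto splits=split_line(file_line);
// splits[0] is the current position; splits[1] is the end-of-stream iterator.
while (splits[0]!=splits[1]) {
    // Dereferencing the iterator is assumed to yield the words of the current line.
    const auto& words=*splits[0];
    for (auto word=begin(words); word!=end(words); word++) {
        std::cout << *word << ":";
    }
    std::cout << std::endl;
    ++splits[0];  // advance to the next line
}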
The C++ looks similar to the Python, but each transformation builds on the type of the previous transformation, so it ends up chaining types, though less explicitly than boost::accumulators does.
The code is on github.