While trying to improve the performance of Pan, I profiled the code with valgrind and kcachegrind, and one of the places where I noticed a potential problem was in building newsrc lines. The code was writing each read-article range, such as '123-3214', into a buffer and then appending that buffer to a std::string. The problem with this is that an append that outgrows the string's current storage causes a new string to be allocated and the old one to be copied into it. As the string becomes longer, this takes more time and also fragments the address space.
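To make the pattern concrete, here is a minimal sketch of that kind of loop. It is not Pan's actual code: the `Range` type, the function name `build_line_naive`, and the "group: ranges" layout are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical range type: inclusive [first, second] article numbers.
using Range = std::pair<unsigned long, unsigned long>;

// Format each range into a small buffer, then append it to the line.
std::string build_line_naive(const std::string& group,
                             const std::vector<Range>& ranges)
{
    std::string line = group + ": ";
    char buf[64];
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        const Range& r = ranges[i];
        if (r.first == r.second)
            std::snprintf(buf, sizeof buf, "%lu", r.first);
        else
            std::snprintf(buf, sizeof buf, "%lu-%lu", r.first, r.second);
        if (i != 0)
            line += ',';
        // Once the capacity is exceeded, this append reallocates
        // and copies everything written so far.
        line += buf;
    }
    return line;
}
```

Nothing here tells the string how long the line will end up, so a long line pays for several reallocations and copies along the way.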
My first attempt at optimizing this was to use a std::deque as temporary storage for the entire line and then copy it into the output string as the final step. Profiling showed varied results, from slower to significantly faster. The slowdown was caused by the iterators used during the copy. For short strings they dominated the running time of the function; however, as the final string became larger, the time saved by not having to allocate and copy more than made up for this.
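The deque version can be sketched roughly as follows. Again, this is an illustration and not the code from Pan: `Range` and `build_line_deque` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdio>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical range type: inclusive [first, second] article numbers.
using Range = std::pair<unsigned long, unsigned long>;

std::string build_line_deque(const std::string& group,
                             const std::vector<Range>& ranges)
{
    // A deque grows in chunks and never copies existing characters.
    std::deque<char> tmp(group.begin(), group.end());
    tmp.push_back(':');
    tmp.push_back(' ');
    char buf[64];
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        const Range& r = ranges[i];
        int n = (r.first == r.second)
            ? std::snprintf(buf, sizeof buf, "%lu", r.first)
            : std::snprintf(buf, sizeof buf, "%lu-%lu", r.first, r.second);
        if (i != 0)
            tmp.push_back(',');
        tmp.insert(tmp.end(), buf, buf + n);
    }
    // Final step: one pass over the deque through its iterators.
    // This copy is the part that dominated for short lines.
    return std::string(tmp.begin(), tmp.end());
}
```

The final copy walks the deque's chunked storage, which is slower per character than copying one contiguous buffer, so it only pays off once the line is long enough to have needed several reallocations.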
After realizing this, I decided to profile my newsrc files to determine the line lengths I was actually using. The results indicated that about 75% of the lines were under 256 bytes, which is too small for the deque to be useful. This time I decided to use a string for the temporary and reserve an initial 256 bytes so that most of the lines could be written without any of the copying. The profile showed a definite speedup from doing this. I could optimize this further by making the reserved space a function of the number of ranges in the line and a fixed estimate of the number of bytes per range. This would nearly eliminate the need for a reallocation, but I haven't tried it.
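A minimal sketch of the reserve approach, plus the untried size estimate, might look like this. The names `build_line_reserved` and `estimate_line_size` are hypothetical, and the 12 bytes per range is a guess for illustration, not a measured value.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical range type: inclusive [first, second] article numbers.
using Range = std::pair<unsigned long, unsigned long>;

// Assumed average size of one "123-3214," entry; a guess, not measured.
const std::size_t kAssumedBytesPerRange = 12;

// Untried idea: group name, ": " separator, then a fixed cost per range.
std::size_t estimate_line_size(std::size_t group_len, std::size_t range_count)
{
    return group_len + 2 + range_count * kAssumedBytesPerRange;
}

std::string build_line_reserved(const std::string& group,
                                const std::vector<Range>& ranges)
{
    std::string line;
    // 256 bytes covers most lines, so they are written with one allocation.
    // Swapping in estimate_line_size(group.size(), ranges.size()) here
    // would be the per-line variant.
    line.reserve(256);
    line += group;
    line += ": ";
    char buf[64];
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        const Range& r = ranges[i];
        if (r.first == r.second)
            std::snprintf(buf, sizeof buf, "%lu", r.first);
        else
            std::snprintf(buf, sizeof buf, "%lu-%lu", r.first, r.second);
        if (i != 0)
            line += ',';
        line += buf;   // no reallocation while the line fits in 256 bytes
    }
    return line;
}
```

Lines longer than the reserved size still work; they simply fall back to the string's normal growth.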