Saturday, August 21, 2010

Keeping Pan responsive

Keeping an event-driven program responsive when dealing with potentially long operations can be difficult.  There are two ways of doing this, multi-threading and asynchronous operations, and each is best suited to different tasks with some overlap between them.  Network IO is already handled mostly with async IO, with threading used only for connection setup due to API limitations.  File IO is a different story.  Pan's usage of files doesn't really lend itself to async IO, so threading is used, though at present only attachment saving runs in other threads.  The other slow part is the loading & saving of the group header files.  Currently this code runs in the main thread, which means it stalls network IO, especially when dealing with large groups, where a load can easily take minutes.

A couple of articles by Herb Sutter, Active Objects and Prefer Futures and Callbacks, have given me an idea for solving this problem.  The basic idea is to have an active object that handles loading and saving groups.  The hard part will be making the parts that need to load groups async.  The saving of groups might be easy if that is triggered by a ref count.  The object would be something like this:

class GroupHandler {
  public:
    typedef std::tr1::function<void(group*)> Callback;

    // queue a load; cb is invoked with the group once the
    // worker thread has loaded it
    void load_group (const quark& group, Callback cb)
      { queue.add (LOAD, group, cb); }

    // queue a save of the group's headers
    void save_group (const quark& group)
      { queue.add (SAVE, group); }

    // tell the worker to finish up, then wait for it
    ~GroupHandler () { done = true; thread->join (); }

  private:
    enum Op { LOAD, SAVE };

    // run on the worker thread, one queued task at a time
    virtual void do_load (const quark& group, Callback cb);
    virtual void do_save (const quark& group);

    bool done;        // set in the destructor to stop the worker
    Queue queue;      // thread-safe task queue (sketch)
    Thread * thread;  // the active object's single worker thread
};
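Calling code would hand load_group a callback and return immediately.  A rough sketch of the calling side, where DataImpl, on_group_loaded, and the handler member are hypothetical names just for illustration (std::tr1::bind comes from <tr1/functional>):

void DataImpl :: read_group (const quark& name)
{
  // returns right away; the callback runs once the worker
  // thread has finished loading the group from disk
  handler.load_group (name,
      std::tr1::bind (&DataImpl::on_group_loaded, this,
                      std::tr1::placeholders::_1));
}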

Thursday, August 19, 2010

strings & memory

In trying to improve the performance of Pan, one of the places where I noticed a potential problem (profiling with valgrind and kcachegrind) was in building newsrc lines.  The code was writing each read-article range, e.g. '123-3214', into a buffer and then appending that to a std::string.  The problem is that each append can cause a new, larger string to be allocated and the old one copied into it.  As the string becomes longer this takes more time & also fragments the address space.
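For illustration, the pattern was roughly this (a simplified reconstruction, not the actual Pan code; Range is a stand-in type):

#include <cstdio>
#include <string>
#include <vector>

struct Range { unsigned long lo, hi; };

std::string
build_newsrc_line (const std::vector<Range>& ranges)
{
  std::string line;
  for (size_t i = 0; i != ranges.size(); ++i) {
    char buf[64];
    // format one read-article range, e.g. "123-3214"
    snprintf (buf, sizeof(buf), "%lu-%lu", ranges[i].lo, ranges[i].hi);
    if (i) line += ',';
    line += buf;  // each append may reallocate & copy the whole line
  }
  return line;
}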

My first attempt at optimizing this was to use a std::deque for temporary storage of the entire string, then copy that into the output string as the final step.  Profiling showed varied results, ranging from slower to significantly faster.  The slowdown was caused by the iterators used during the copy: for short strings they dominated the performance of the function, but as the final string became larger, the savings from not having to reallocate & copy more than made up for them.
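The deque version, reusing the Range type from the sketch above, buffers the characters and does one bulk copy at the end:

#include <deque>

std::string
build_newsrc_line_deque (const std::vector<Range>& ranges)
{
  std::deque<char> tmp;
  for (size_t i = 0; i != ranges.size(); ++i) {
    char buf[64];
    int len = snprintf (buf, sizeof(buf), "%lu-%lu",
                        ranges[i].lo, ranges[i].hi);
    if (i) tmp.push_back (',');
    tmp.insert (tmp.end(), buf, buf + len);
  }
  // one copy into the result; these deque iterators are what
  // dominated the run time for short lines
  return std::string (tmp.begin(), tmp.end());
}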

After realizing this I decided to profile my newsrc files to determine the line lengths I was actually using.  The results showed that about 75% of the lines were under 256 bytes, too small for the deque to be useful.  This time I used a string for the temporary and reserved an initial 256 bytes so that most of the lines could be written without all the copying.  The profile showed a definite speedup doing this.  I could optimize this further by making the reserved space a function of the number of ranges in the line times a fixed estimate of the number of bytes per range, which would nearly eliminate the need for a reallocation, but I haven't tried this.
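The reserved-string version is the same loop with one extra line; the commented-out estimate is the untried refinement mentioned above (the 12 bytes per range is my own guess, not a measured figure):

std::string
build_newsrc_line_reserved (const std::vector<Range>& ranges)
{
  std::string line;
  line.reserve (256);  // big enough for ~75% of the lines
  // untried: scale with the range count instead, e.g.
  //   line.reserve (ranges.size() * 12);
  for (size_t i = 0; i != ranges.size(); ++i) {
    char buf[64];
    int len = snprintf (buf, sizeof(buf), "%lu-%lu",
                        ranges[i].lo, ranges[i].hi);
    if (i) line += ',';
    line.append (buf, len);
  }
  return line;
}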