Friday, May 21, 2010

Developing new account types, Part 3: Updating folders (part 2)

This series of blog posts discusses the creation of a new account type implemented in JavaScript. Over the course of these blogs, I use the development of my Web Forums extension to explain the necessary actions in creating new account types. I hope to add a new post once every two weeks (I cannot guarantee it, though).

This blog post is a continuation of my previous post, which is being broken up into multiple segments to lower the amount of text one has to read in a single sitting. The current step is to actually implement the folder update.

Folder updating

To actually achieve our goal of getting a correct message list, we are going to modify the implementation of updateFolder. This function is called whenever a folder is selected in the folder pane; conceptually, you can view the function as causing the cached database to be resynchronized with the actual folder. For example, this is where a local folder would actually reparse the mailbox if the database was incorrect or missing.

This function essentially consists of three steps: figure out new messages, process them (i.e., apply filters), and then announce to the world that they exist. Some account types (like IMAP) may need to do more involved message processing, but this is the general gist of what goes on [4]. I'll ignore the processing step until I start talking about filters.

Database Details Devil

To start with, I'll cover the last step. Announcing to the world that a message exist boils down to adding a new header to the database. So how do you add a new header to the database? It requires three easy steps: create the header, populate the fields, and then add it to the database. With the proper listener setup, all of the other notification is done for you automatically. But as they say, the devil is in the details.

Let me begin by explaining some things about messages. There are five different representations of the message: the message key, the message header, the message ID, the message URI, and the necko URL object. Siddharth Agarwal has a nice diagram that shows how to convert between these representations. The last two are more concerned with displaying messages; it is the first three that are interesting right now.

Message keys are the internal database key for a message; the tuple (folder, key) is guaranteed to be unique by the database. Message keys are unsigned 32-bit integers (with 0xFFFFFFFF, or -1 in 2's complement, reserved as the "no message here" key). In general, any time a property needs to refer to another message, the message key is used; as a consequence, it means that such properties cannot refer to stuff across folders.

Message IDs are the RFC 5322 identifier for a message. These identifiers are supposed to be unique (for logical messages, not in a "the message at offset 0x234f3d in this file" sense). The most important use case for message IDs is that they are a critical component for threading.

The message header object is an object of type nsIMsgDBHdr. These are objects are directly backed by the database. However, many of the properties do not notify the database of changes, so you generally do not want to actually set them. Like all generalities [5], there are exceptions to this rule. Right now, we want to manipulate headers before adding them to database, and therefore we do not want to notify people of changes to not-yet-existing headers, so we want to actually use the fields of nsIMsgDBHdr.

So, the first thing you need to do is to decide what your message key is. Message keys are going to be used to get the message URI, so it should be a property that is easy to associate with methods. IMAP uses message UIDS, local folders the offset into the mbox [6], and NNTP uses the key numbers in the group. In my case, it appears that the forum assigns each post a unique number, so that is what I'll use.

After the message key, the most important properties are the major ones for display. The author attribute correlates to the "From" header, subject to the "Subject" header, and date to the "Date" header. All of these will be used to generate values in the thread pane columns; things would look strange without these.

The other major property in the display is flags. Flags, as the name implies, is an integer where each bit corresponds to a different flag. The most important of these are probably HasRe, Flagged, and New. Flags should be set with OrFlags and AndFlags instead of manipulating the value directly. And don't set these values with the mark* methods, as these cause notifications to be fired (remember that we haven't added the message to the database yet).

If you want to do real threading, you will want to set message IDs and references [7]. The References header is a space-separated list of message ID tokens (wrapped in angle brackets), although the parser routine in the database does a pretty good job of ignoring any random crap. The list is in the reverse order of hierarchy, so the last element is the message's parent, second-to-last the grandparent, etc.

Threading is implemented in the following manner. First, the database attempts to find a message for each message ID in reverse order. If it finds one, that is made the parent header and threading stops. Otherwise, if correct threading is enabled, an attempt is to made to find a thread which has that message ID. Otherwise, if use strict threading is not enabled, a thread that has a message which has the same subject (without Re) is used as the thread. If threading without re is disabled, the message has to have the HasRe flag checked to perform the last step. Finally, if a thread could not be found by this point, a new one is created.

To combine messages in a thread, then, the References field needs to be set for the messages. If people enable correct threading (this is done by default), you can use a simple trick: create a valid message ID for each thread and stuff that as the References header.

A practical example

In my case, I have an author (without email addresses), a subject (with possible non-ASCII text but without Re: stuff), a date in a standard format, as well as a simple per-thread unique identifier for message keys. I also want to make threads—although this will only be two-level threads. Ideally, I should also be flagging the sticky threads, but I'll leave that for a later version. So what does this code look like?

_loadThread: function (document, firstMsgId) {
  let database = this._folder.getDatabase();
  let conv = Cc['@mozilla.org/messenger/mimeconverter;1']
               .getService(Ci.nsIMimeConverter);
  let subject = /* one for the thread */
  let hostname = this._folder.server.hostName;
  let charset = document.characterSet;
  /* for each new message */ {
    let postID = /* generate msg key */;
    let author = /* get author name */;
    let date = new Date(/* get text string*/);
    let msgHdr = database.CreateNewHdr(postID);
    // The | is to prevent accidental message delivery
    msgHdr.author = conv.encodeMimePartIIStr_UTF8(
      author + " <" + author + "@" + hostname + "|>", true, charset, 0, 72);
    msgHdr.subject = conv.encodeMimePartIIStr_UTF8(subject, false, charset,
      0, 72);
    // PRTime is in µs, JS date in ms
    msgHdr.date = date * 1000;
    msgHdr.Charset = charset;
    msgHdr.messageId = postID + "@" + document.documentURI;
    if (firstMsgId) {
      msgHdr.setReferences("<" + firstMsgId + ">");
      msgHdr.OrFlags(Ci.nsMsgMessageFlags.HasRe);
    } else {
      firstMsgId = msgHdr.messageId;
   }
   msgHdr.OrFlags(Ci.nsMsgMessageFlags.New);
   database.AddNewHdrToDB(msgHdr, true);
  }
}

First, we get a reference to the database. Remember we implemented this in our last step, so this shouldn't present any problems. We also get the things that are shared in this thread: the subject, hostname of the server, and the charset. For each of the posts, we collect the post ID, the author, and the date of the post as text strings, and then convert them into an integer, string, and a date respectively.

Using the CreateNewHdr function, we get a new message header that we can manipulate. Since I'm trying to be aware of non-ASCII text, I'm using the MIME encoding strings to prepare the author and subject. Remember that the MIME specifications want you to encode non-ASCII text in the headers; the function we use is the simplest way to do the encoding.

If you're not working with actual email, the from string can be contorted. What I did was to create a fictituous email that could be theoretically tied back to the author in a systematic way (for a possible future compose code that does forum private messaging). The purpose of the pipe character at the end is to prevent accidental mail delivery; I also used the hostName and not the realHostName, so this email address would be traceable even if the user changes the host name on me.

The message date I have is a formatted string; the Date constructor is pretty handy at converting most forms of these strings into a usable JS Date object. Then I have a JS Date object, which is measured in milliseconds, whereas the date attribute is a PRTime, which is measured in microseconds, so I need to multiply by 1000 to actually set the property. Ironically, the date is actually stored in seconds in database and is converted to and from microseconds on the fly.

The Charset attribute, apparently only used for search right now, is derived from the character set as reported by the DOM. This means that it is the same character set as would be assumed by the layout engine, including character set overrides.

The message ID is simpler to generate: valid URIs are pretty much valid right-hand-sides of a message ID. A post is pretty much representable as a tuple of the thread page and the path to the post in the DOM, so this message ID is also an easy way to get to the message. References are also generated as I described above; in a later version, I may try to do sniffing to figure out from quoting who is replying to whom and recreate actual threads. Note that when setting the message ID, the outer angle brackets are optional.

The last thing I set is the flags. A complete listing of flags can be found on MDC. In this case, the only flags I care about are HasRe (since I want to generate "Re:" headers) and New; most of the others will probably be set by the user in the UI.

Finally, we add the header to the database. The last parameter tells the database to tell anyone listening that we have a new message. After we have loaded all of the messages, we need to commit the database:

database.Commit(Ci.nsMsgDBCommitType.kLargeCommit);

A brief note to make here: it doesn't really matter if you do a large or session commit, they both end up doing the same thing. Small commits end up doing nothing.

Notes

  1. Like most synchronization stuff, you theoretically also have to deal with deletion on the remote side as well as read changes, etc. The more I think about it, the more I'm torn on whether or not I should implement it. For now, I'll recommend that you weigh the cost of trying to determine deleted messages versus the commonality of deletion or other modification.
  2. Except, I am told, that all words that end in -tion in French are female.
  3. Incidentally, this is a major part of the reason why there is a 4 GiB limit on mailbox size in Thunderbird and SeaMonkey.
  4. What about In-Reply-To, you may ask. This information is pretty much redundant with References, so what happens is that, for the purposes of computing threading, this header is appended to the References header. And you do this before calling on the database header.

No comments: