
Rewriting IRC log timestamps

Recently I had the desire to rewrite the timestamps in my IRC logs. I have logs going back to 2013, and over the years I’ve had three different timestamp formats:

  • the default:
    23:59 < someone> Hello
  • a more verbose timestamp:
    [12/24/24 23:59:59] < someone> Hello
  • and then a Unix timestamp prepended on the front:
    1735106399 [12/24/24 23:59:59]
  • … and then I moved to another timezone, so the code has to deal with that too (same format, but the Unix timestamp for the same wall-clock time is an hour later):
    1735109999 [12/24/24 23:59:59]

The first one doesn’t include the date (or the seconds), so we have to find it in earlier lines of the log, which carry the date in one of two more formats:

  • --- Log opened Thu Nov 21 10:32:28 2013
  • --- Day changed Fri Nov 22 2013
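The two Unix-timestamp examples above show the same wall-clock time an hour apart, which is the whole timezone problem in a nutshell. The post doesn't say which zones were involved; the offsets implied by the numbers are UTC-6 and UTC-7, so this minimal sketch uses the fixed POSIX TZ strings CST6 and MST7 purely as stand-ins, and the helper name verbose_ts is hypothetical. It renders each timestamp back into the verbose format.

```perl
use strict;
use warnings;
use POSIX qw(strftime tzset);

# Hypothetical helper: render a Unix timestamp as "[mm/dd/yy HH:MM:SS]" in a zone.
# CST6 / MST7 are assumed stand-ins (fixed UTC-6 / UTC-7, no DST).
# Note: setting $ENV{TZ} changes the zone for the whole process.
sub verbose_ts {
    my ($epoch, $tz) = @_;
    $ENV{TZ} = $tz;
    tzset();    # make localtime() pick up the new TZ
    return strftime("[%m/%d/%y %H:%M:%S]", localtime $epoch);
}

print verbose_ts(1735106399, "CST6"), "\n";    # before the move
print verbose_ts(1735109999, "MST7"), "\n";    # after the move
```

Both lines print [12/24/24 23:59:59]: converting the older local-time formats to Unix time only works if the script knows which zone each line was written in.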

Why bother?

Over the years I’ve written code to parse out the timestamp and message many times. I help keep a trivia channel running, and while adding a record of each user’s highest answer streak to the bot, I needed to go back through my logs for the historical data. Before I rewrote the logs, this would have meant writing code to handle all five timestamp formats and convert each of them to a Unix timestamp before processing the lines.

I never saved most of that code, so each time I had to write it from scratch and deal (or not!) with the different timestamps all over again.

After editing the logs, all it takes is a single (relatively) simple regex to capture the timestamp, nick, and message:
/^(\d+) \[.+?\] <.(.+?)> (.*)$/
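A minimal sketch of using that regex on one rewritten line (the helper name parse_line is hypothetical). The <. part skips the single character irssi writes before the nick, which normally holds the nick mode: a space, @, or +.

```perl
use strict;
use warnings;

# Hypothetical helper: split a rewritten line into (epoch, nick, message).
# Returns an empty list for lines that don't match (e.g. "--- Day changed ...").
sub parse_line {
    my ($line) = @_;
    return $line =~ /^(\d+) \[.+?\] <.(.+?)> (.*)$/ ? ($1, $2, $3) : ();
}

my ($ts, $nick, $msg) = parse_line("1735106399 [12/24/24 23:59:59] < someone> Hello");
print "$nick said \"$msg\" at $ts\n";
```

The epoch comes out ready for arithmetic, so nothing downstream has to care which of the original formats a line started in.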

Perl!

This is where Perl shines. Its brevity and expressiveness can come at the expense of readability, but I haven’t found any language that can beat it for one-time text processing. This particular script only took me about an hour to write and test, and the meat is just this:

# Runs once per log line in $_. %months (month abbreviation => 0-based month)
# and the FMT constant are defined elsewhere in the full script.
if (/^--- (?:Log opened|Day changed) \w\w\w (?<b>\w\w\w) (?<d>\d\d) (?:(?<H>\d\d):\d\d:\d\d )?(?<Y>\d\d\d\d)/) {
	# remember the date for the HH:MM lines that follow
	$sm = $months{$+{b}}; # month = %m - 1
	$sd = $+{d};
	$sy = $+{Y}-1900; # year = %Y - 1900
} elsif ($sd && s/^(?<H>\d\d):(?<M>\d\d)(?= )//) { # starts with HH:MM; keep the space before the nick
	my $newts = strftime(FMT, 0, $+{M}, $+{H}, $sd, $sm, $sy); # no seconds in this format, use 0
	$_ = "$newts$_";
} elsif (s@^\[(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d) (?<H>\d\d):(?<M>\d\d):(?<S>\d\d)\]@@) { # starts with [mm/dd/yy HH:MM:SS]
	my $newts = strftime(FMT, $+{S}, $+{M}, $+{H}, $+{d}, $+{m}-1, (100+$+{y})); # year = 2000 + %y - 1900
	$_ = "$newts$_";
} elsif (/^1\d{9} \[/) { # starts with unixts
	next; # already converted; leave it alone
}
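To try the same logic outside the full script, here is a self-contained sketch that applies the same three branches to a list of lines instead of editing files in place. The helpers normalize_lines and fmt_ts are hypothetical, and there are two deliberate differences from the excerpt: the epoch is computed with POSIX::mktime rather than strftime's %s (which ISO C's strftime doesn't define), and the zone of the old local-time lines is passed in as a parameter. CST6 in the usage line is only a stand-in, since the post doesn't name the zone. As in the excerpt's other branches, the space before the nick is left in place because the new timestamp has no trailing space.

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime tzset);

# Month abbreviation => 0-based month (what the excerpt's %months presumably holds).
my %months;
@months{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Epoch followed by " [mm/dd/yy HH:MM:SS]", i.e. FMT = "%s [%m/%d/%y %H:%M:%S]".
sub fmt_ts {
    my ($epoch) = @_;
    return $epoch . strftime(" [%m/%d/%y %H:%M:%S]", localtime $epoch);
}

# Rewrite old-format lines; $tz is the zone the old local times were logged in.
sub normalize_lines {
    my ($tz, @lines) = @_;
    $ENV{TZ} = $tz;    # process-wide setting
    tzset();
    my ($sd, $sm, $sy);    # day, 0-based month, year - 1900 from the last header
    my @out;
    for my $line (@lines) {
        if ($line =~ /^--- (?:Log opened|Day changed) \w\w\w (?<b>\w\w\w) (?<d>\d\d) (?:\d\d:\d\d:\d\d )?(?<Y>\d\d\d\d)/) {
            ($sm, $sd, $sy) = ($months{$+{b}}, $+{d}, $+{Y} - 1900);
        } elsif (defined $sd && $line =~ s/^(?<H>\d\d):(?<M>\d\d)(?= )//) {
            # default "HH:MM " format: date from the header, seconds set to 0
            $line = fmt_ts(mktime(0, $+{M}, $+{H}, $sd, $sm, $sy)) . $line;
        } elsif ($line =~ s@^\[(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d) (?<H>\d\d):(?<M>\d\d):(?<S>\d\d)\]@@) {
            # verbose "[mm/dd/yy HH:MM:SS]" format; yy is taken as 20yy
            $line = fmt_ts(mktime($+{S}, $+{M}, $+{H}, $+{d}, $+{m} - 1, 100 + $+{y})) . $line;
        }
        # every line is kept; headers and lines that already start with a
        # Unix timestamp come out unchanged
        push @out, $line;
    }
    return @out;
}

print "$_\n" for normalize_lines("CST6",
    "--- Log opened Tue Dec 24 23:50:00 2024",
    "23:59 < someone> Hello",
    "[12/24/24 23:59:59] < someone> Hello");
```

Working on a list keeps the experiment side-effect free; the in-place version only has to wrap the same branches in a per-line loop over the files.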

I have 1.5GB of logs, and running this script over them was almost instantaneous. Winner!

Edge cases

There are a couple of edge cases that I didn’t deem worth worrying about, given the context.

  • The seconds are “made up” for some of the oldest lines. That’s okay, though. I don’t care whether something happened 10 years ago or 10 years and 30 seconds ago. They all end up as :00, so it’s obvious at a glance, too. This is a limitation of irssi’s default format, and there’s nothing anyone can do to go back in time and fix it.
  • I restarted irssi when I changed my timezone. If I had somehow changed it without restarting irssi, or if there had been other Log opened messages within the gap of the timezone change, there would be no reliable way to determine which timezone a message had occurred in.
  • It’s not portable. irssi probably writes out locale-specific month abbreviations, so it should really use something like strptime. I didn’t need to deal with that, but if you do, you can easily update the %months hash to handle it.
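If your logs don't use English month abbreviations, one way to update %months is to build it from the locale itself. This is only a sketch, and the helper name build_months is hypothetical. It assumes irssi wrote the logs under the same LC_TIME locale the script runs under, and that POSIX::strftime's %b follows that locale (how Perl handles LC_TIME has changed between versions; see perllocale).

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: "abbreviated month name => 0-based month", taken from
# whatever LC_TIME locale is currently active, as a drop-in for a fixed %months.
sub build_months {
    my %months;
    for my $mon (0 .. 11) {
        # 1st of each month in 2000 (tm_year = 100); only %b is used
        $months{ strftime("%b", 0, 0, 0, 1, $mon, 100) } = $mon;
    }
    return %months;
}

my %months = build_months();
print join(" ", sort { $months{$a} <=> $months{$b} } keys %months), "\n";
```

In the C or an English locale this gives the same Jan to Dec mapping as a hand-written hash.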

Source

I’m putting the full source code out there in case someone finds it useful to do a similar task (or I want to do it again). As usual, don’t worry about the license.

If you are trying this code, consider first running it on a couple of small log files with the output going to your terminal (cd irclogs/network; perl nick1.log nick2.log) before using the command at the top of the file to update all your logs. That sample command will leave you a backup of each file, suffixed with .orig, just in case.

If you’re just moving from the default timestamp format to something else, the only changes you should need to make are:

  1. Remove all 3 $ENV{TZ} lines
  2. Update the use constant FMT => "%s [%m/%d/%y %H:%M:%S]"; line to match your new log_timestamp setting in irssi (minus its trailing space)
  3. ???
  4. Profit, or something

[Image: the result of this work, historical streak dates]

If you’ve got other timestamp formats mixed in (or you already changed your timestamp format), you’ll probably need to add more elsifs.
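As an illustration, suppose some logs used an ISO-style stamp such as 2024-12-24 23:59:59 (a hypothetical format; none of the logs above use it). A new branch has the same shape as the existing ones: strip the old stamp with s///, rebuild the time from the named captures, and prepend the new timestamp. This standalone sketch wraps that in a hypothetical convert_iso helper, uses POSIX::mktime for the epoch, and again uses CST6 only as a stand-in zone.

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime tzset);

# Hypothetical extra branch for lines starting "YYYY-MM-DD HH:MM:SS ".
sub convert_iso {
    my ($line) = @_;
    if ($line =~ s/^(?<Y>\d{4})-(?<m>\d\d)-(?<d>\d\d) (?<H>\d\d):(?<M>\d\d):(?<S>\d\d)(?= )//) {
        # local time in the current TZ -> epoch
        my $epoch = mktime($+{S}, $+{M}, $+{H}, $+{d}, $+{m} - 1, $+{Y} - 1900);
        # same layout as FMT = "%s [%m/%d/%y %H:%M:%S]"; the space before "<" is kept
        return $epoch . strftime(" [%m/%d/%y %H:%M:%S]", localtime $epoch) . $line;
    }
    return $line;    # not this format: leave it for the other branches
}

$ENV{TZ} = "CST6";    # stand-in zone for the demo
tzset();
print convert_iso("2024-12-24 23:59:59 < someone> Hello"), "\n";
```

In the real script the body of the if block would become one more elsif alongside the existing branches, operating on $_ directly.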