]> jfr.im git - irc.git/blob - software/!RELEASES/ircservices/achurch.org/services/lists/ircservices-coding/2007/003312.html
RELEASE -> !RELEASE
[irc.git] / software / !RELEASES / ircservices / achurch.org / services / lists / ircservices-coding / 2007 / 003312.html
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
2 <HTML>
3 <HEAD>
4 <TITLE> [IRCServices Coding] Analysis of the 5.1.2 bug
5 </TITLE>
6 <LINK REL="Index" HREF="index.html" >
7 <LINK REL="made" HREF="mailto:ircservices-coding%40ircservices.za.net?Subject=%5BIRCServices%20Coding%5D%20Analysis%20of%20the%205.1.2%20bug&In-Reply-To=">
8 <META NAME="robots" CONTENT="index,nofollow">
9 <META http-equiv="Content-Type" content="text/html; charset=us-ascii">
10 <LINK REL="Previous" HREF="003311.html">
11 <LINK REL="Next" HREF="003313.html">
12 </HEAD>
13 <BODY BGCOLOR="#ffffff">
14 <H1>[IRCServices Coding] Analysis of the 5.1.2 bug</H1>
15 <B>Andrew Church</B>
16 <A HREF="mailto:ircservices-coding%40ircservices.za.net?Subject=%5BIRCServices%20Coding%5D%20Analysis%20of%20the%205.1.2%20bug&In-Reply-To="
17 TITLE="[IRCServices Coding] Analysis of the 5.1.2 bug">achurch at achurch.org
18 </A><BR>
19 <I>Tue Oct 30 00:45:50 PDT 2007</I>
20 <P><UL>
21 <LI>Previous message: <A HREF="003311.html">[IRCServices Coding] ircservices-church ITP
22 </A></li>
23 <LI>Next message: <A HREF="003313.html">[IRCServices Coding] Analysis of the 5.1.2 bug
24 </A></li>
25 <LI> <B>Messages sorted by:</B>
26 <a href="date.html#3312">[ date ]</a>
27 <a href="thread.html#3312">[ thread ]</a>
28 <a href="subject.html#3312">[ subject ]</a>
29 <a href="author.html#3312">[ author ]</a>
30 </LI>
31 </UL>
32 <HR>
33 <!--beginarticle-->
34 <PRE> Now that a bit of time has passed since the release of Services
35 5.1.2, and (hopefully) everyone has upgraded, I thought I'd post a more
36 detailed explanation of exactly how the critical bug that prompted this
37 release came about. (As you are probably aware, various and sundry bugs
38 have pushed Services all the way to 5.1.6 since then, but in this text
39 I'll focus on the bug that was fixed by 5.1.2. Just as a reminder, all
40 5.1.x releases through 5.1.5 have bugs allowing users to crash Services,
41 so you should immediately upgrade to 5.1.6 if you haven't already.)
42
43 The Bug
44 =======
45
46 The bug itself is fairly straightforward (as those who read the
47 diff file will have already realized): using the ChanServ AKICK VIEW
48 command on a channel with at least one autokick caused Services to
49 crash, without exception. By default, the AKICK command requires
50 channel access level 100 to use; this is the basis for the &quot;sufficiently
51 privileged users&quot; text in the announcement. In reality, unless channel
52 registration was disabled, any user could register a new channel (thus
53 gaining founder status) and use the AKICK command on that channel,
54 effectively allowing any user to crash Services. (Ironically, when
55 channel registration was disabled, a separate bug--fixed in 5.1.3--had
56 the potential to let any user cause a crash.)
57
58 A most embarrassing bug, indeed...
59
60 How It Came About
61 =================
62
63 One of the changes implemented in Services 5.1 was to unify (from
64 an interface standpoint, at least; internally, there's still far too
65 much code duplication) the listing functionality of all pseudoclient
66 commands. Among other things, this included removing the entry numbers
67 from the ChanServ ACCESS, XOP (SOP/AOP/HOP/VOP/NOP), and AKICK commands,
68 mostly to avoid the confusion that can result from multiple users
69 deleting entries in the middle of the list.
70
71 Conceptually, this is a simple change, only requiring the removal
72 of the corresponding parameter from the code that outputs entries to the
73 user. But theory and practice are different beasts, as they say; in the
74 case of AKICK VIEW, the data for each entry is formatted using a
75 language-specific string, requiring a change not only in the code itself
76 but also in each of the language files. I actually made this latter
77 change, presumably with the intent of going back and updating the code
78 once I had all the language files updates out of the way. However, it
79 seems that I forgot to make that final update, leaving the entry number
80 present in the the code that formatted autokick entries for the VIEW
81 subcommand. This resulted, eventually, in something like printf(&quot;%s&quot;,1)
82 --which of course crashed when the program tried to dereference the
83 entry number as a pointer.
84
85 Ordinarily, GCC can catch inconsistencies between format strings
86 and parameters like this; I always compile Services using GCC's -Werror
87 option, to ensure I catch as many potential problems as possible. With
88 language strings, however, the format string itself is located in a data
89 file separate from the source code. Since the data isn't accessible to
90 the compiler, these checks can't be made, and the error slipped by
91 unnoticed until this bug was reported.
92
93 Are Any Other Commands At Risk?
94 ===============================
95
96 With respect to this particular change, no. The removal of list
97 entry numbers affected only the ChanServ ACCESS LIST, XOP LIST, and
98 AKICK LIST/VIEW commands. Of these, XOP LIST and AKICK LIST use literal
99 format strings for their output, which were corrected as part of the
100 code change; ACCESS LIST uses the language system, but its code and
101 format strings were both updated properly, and it is likewise not
102 susceptible to this problem.
103
104 With respect to all commands in general, all I can say is &quot;I hope
105 not.&quot; I've done my best, within the constraints of time and an eleven-
106 year-old codebase, to program defensively; since 5.1.2, I've also gone
107 over all uses of language strings, and added some checks to my release
108 process (see below) to help avoid any more slipping in. But given that
109 I let these bugs by, who knows what else might be hiding?
110
111 Lessons To Be Learned
112 =====================
113
114 Lesson 1: Test suites
115
116 Ah, our good old friend, testing. I actually had started work on a
117 test suite for Services at one point, which was the main motivation for
118 the OperServ debug commands. Regrettably, between a lack of time in my
119 earlier days of Services development (being not nearly as fluent in
120 progamming then as I am now, writing a test suite to interact with a
121 server was a formidable task, and I chose to use my time working on
122 Services itself instead) and a lack of interest later, I never took the
123 test suite beyond simple nickname operations.
124
125 Of course, even a basic test that just ran through all the commands
126 would have caught this bug, and to that I can only say: Mea culpa.
127 Lack of interest or otherwise, I failed to follow best practices, and
128 that failure came back to bite me.
129
130 Lesson 2: Format strings are dangerous
131
132 Services makes use of the standard printf()-style format strings in
133 its language files, partly out of convenience and partly because I
134 didn't know any better when I designed the language system. While the
135 flexibility of format strings is undeniably useful, their implementation
136 leaves much to be desired--especially in a language like C that does not
137 pass type information around. In fact, inconsistencies between format
138 strings in different language files has resulted in similar crashes in
139 the past (including the less-dangerous one fixed in version 5.1.3). Of
140 course, most of the danger in using format strings comes from the fact
141 that strings are second-class citizens in the C data type world: to
142 process a string, you access successive bytes through a pointer--and if
143 that particular value happens to be something that's not a pointer,
144 BOOM. So this lesson could just as easily be &quot;C strings are dangerous&quot;.
145
146 So what can be done about it? Well, one of the obvious solutions--
147 and probably the best idea in the long run--is to rewrite the
148 pseudoclients in a higher-level language, such as Perl; Perl's dynamic
149 typing ensures that this sort of problem can't occur. (Then again,
150 dynamically typed languages have their own problems, so this isn't a
151 perfect solution.) I'd actually had thoughts about implementing a Perl
152 base for pseudoclients in a future version of Services, but I don't have
153 nearly enough time or interest to do so at this point.
154
155 Another option would be to change the language file system to use a
156 custom formatting system. Parameters could be named, for example, and
157 an array of structures including parameter names and data types along
158 with the parameters themselves could be passed to routines like
159 notice_lang(). Of course, this has the downside of turning what's
160 currently a simple list of parameters into a more complex variable
161 declaration; and since the data types would have to be specified
162 manually, there's still the danger of mismatched types, leading to
163 potential crashes. Nonetheless, this might be a more feasible approach
164 for those who are interested in improving Services' robustness but don't
165 want to go as far as rewriting half of it in Perl.
166
167 I've implemented two crude solutions in Services 5.1.3. One is the
168 use of a script, modified from one provided by German translator Jacek
169 Margos, to check all language files for format string inconsistencies at
170 release time. The other is a simple compilation check: at release time,
171 I run a test compilation where notice_lang() and notice_help() calls
172 using fixed message IDs are replaced with notice() calls that use the
173 corresponding strings from the English language file (lang/en_us.l).
174 The resulting executable is of course useless from a multilingual point
175 of view, but the compiler will complain about any cases where format
176 strings and parameters don't match. While this won't catch cases where
177 the format string is selected by a variable, it would have found the
178 particular bug fixed in version 5.1.2.
179
180 Final Thoughts
181 ==============
182
183 Needless to say, this bug in particular, as well as the fact that I
184 had to make five releases of a supposedly &quot;stable&quot; version of Services
185 in the space of ten days to correct other bugs of varying seriousness,
186 is quite embarrassing. I'm of course acutely aware of the trouble it's
187 caused everyone using Services, of the repeated interruptions in normal
188 network operations that will have resulted from upgrades (if not
189 crashes)--and let none say that it's &quot;just IRC&quot;! But at the same time,
190 I take pride in my work as a software developer, and seeing myself make
191 such trivial mistakes as these is, well, depressing to say the least.
192 I could make excuses that the code base is aging (which is true), or
193 that I've lost too much interest in the program to do a proper job
194 (which is also true, to be honest--that's the main reason I declared 5.1
195 to be the final version of Services). I could even claim that I've got
196 another project twice the sice of Services that's considerably more
197 stable (which is true as well, thanks in no small part to the experience
198 I've gained from Services itself). But since none of that changes the
199 facts, all I can say in the end is that I screwed up, and I'm sorry.
200
201 --Andrew Church
202 <A HREF="http://lists.ircservices.za.net/mailman/listinfo/ircservices-coding">achurch at achurch.org</A>
203 <A HREF="http://achurch.org/">http://achurch.org/</A>
204 </PRE>
205
206
207 <!--endarticle-->
208 <HR>
209 <P><UL>
210 <!--threads-->
211 <LI>Previous message: <A HREF="003311.html">[IRCServices Coding] ircservices-church ITP
212 </A></li>
213 <LI>Next message: <A HREF="003313.html">[IRCServices Coding] Analysis of the 5.1.2 bug
214 </A></li>
215 <LI> <B>Messages sorted by:</B>
216 <a href="date.html#3312">[ date ]</a>
217 <a href="thread.html#3312">[ thread ]</a>
218 <a href="subject.html#3312">[ subject ]</a>
219 <a href="author.html#3312">[ author ]</a>
220 </LI>
221 </UL>
222
223 </body></html>