Commit | Line | Data |
---|---|---|
212380e3 | 1 | The hostmask/netmask system. |
2 | Copyright(C) 2001 by Andrew Miller(A1kmm)<a1kmm@mware.virtualave.net> | |
212380e3 | 3 | |
4 | Contents | |
5 | ======== | |
6 | * Section 1: Motivation | |
7 | * Section 2: Underlying mechanism | |
8 | - 2.1: General overview. | |
9 | - 2.2: IPv4 netmasks. | |
10 | - 2.3: IPv6 netmasks. | |
11 | - 2.4: Hostmasks. | |
12 | * Section 3: Exposed abstraction layer | |
13 | - 3.1: Parsing masks. | |
14 | - 3.2: Adding configuration items. | |
15 | - 3.3: Initialising and rehashing. | |
16 | - 3.4: Finding IP/host confs. | |
17 | - 3.5: Deleting entries. | |
18 | - 3.6: Reporting entries. | |
19 | ||
20 | Section 1: Motivation | |
21 | ===================== | |
22 | Looking up config hostnames and IP addresses (such as for I-lines and | |
23 | K-lines) needs to be implemented efficiently. It turns out that a | |
24 | hash-based algorithm like the one employed here performs well in the | |
25 | average case, which is what we should be most concerned about. A profiling | |
26 | comparison with the mtrie code using data from a real network confirmed | |
27 | that this algorithm performs much better. | |
28 | ||
29 | Section 2: Underlying mechanism | |
30 | =============================== | |
31 | 2.1: General overview | |
32 | --------------------- | |
33 | In short, a hash table with linked lists for buckets is used to locate | |
34 | the correct hostname/netmask entries. In order to support CIDR IPs and | |
35 | wildcard masks, only part of each key can be hashed, and a lookup has to | |
36 | rehash progressively shorter parts of the key. The means for deciding how | |
37 | much to hash differs between hostmasks and IPv4/6 netmasks. | |
38 | ||
39 | 2.2: IPv4 netmasks | |
40 | ------------------ | |
41 | To add an IPv4 netmask to the hash table, the mask is first converted to | |
42 | a 32-bit address and a count of the bits used. All unused bits are set | |
43 | to 0. The mask can take any of these forms: | |
44 | 1.2.3.4 => 1.2.3.4 32 | |
45 | 1.2.3.* => 1.2.3.0 24 | |
46 | 1.2 => 1.2.0.0 16 | |
47 | 1.2.3.64/26 => 1.2.3.64 26 | |
48 | The number of whole bytes is then calculated, and only those bytes are | |
49 | hashed (e.g. 1.2.3.64/26 and 1.2.3.0/24 hash the same). | |
50 | When a complete IPv4 address is looked up to find a matching netmask, | |
51 | the entire IP address is first hashed and looked up in the table. Then | |
52 | the most significant three bytes are hashed, followed by the most | |
53 | significant two and the most significant one. Finally, the 'identity | |
54 | hash' bucket is searched (to match masks like 192/7). | |
55 | ||
56 | 2.3: IPv6 netmasks | |
57 | ------------------ | |
58 | As per IPv4 netmasks, except that rehashing uses a 16-bit (two-byte) | |
59 | granularity instead of a one-byte granularity, as 16 rehashes per lookup | |
60 | is considered too great a fixed cost to be justified by a (possible) | |
61 | slight reduction in hash collisions. | |
62 | ||
63 | 2.4: Hostmasks | |
64 | -------------- | |
65 | On adding a hostmask to the hash, everything to the right of the next | |
66 | dot after the last wildcard character in the string is hashed (for | |
67 | *.example.com, that is example.com). If there are no wildcards in the | |
68 | hostmask, the entire string is hashed. | |
69 | On searching for a hostmask match, the entire hostname is hashed, followed | |
70 | by the part of the hostname after the first dot, followed by the part | |
71 | after the second dot, and so on. Finally, the 'identity' hash bucket is | |
72 | checked, to catch hostmasks like *test*. | |
73 | ||
74 | Section 3: Exposed abstraction layer | |
75 | ==================================== | |
76 | Section 3.1: Parsing masks | |
77 | -------------------------- | |
78 | Call parse_netmask() with the netmask, a pointer to an irc_inaddr | |
79 | structure to be filled in, and a pointer to an integer where the number | |
80 | of bits will be placed. | |
81 | Always check the return value. If it returns HM_HOST, the mask is | |
82 | probably a hostname mask. If it returns HM_IPV4, it is an IPv4 address. | |
83 | If it returns HM_IPV6, it is an IPv6 address. | |
84 | If parse_netmask() returns HM_HOST, no change is made to the irc_inaddr | |
85 | structure or the number of bits. | |
86 | ||
87 | Section 3.2: Adding configuration items | |
88 | --------------------------------------- | |
89 | Call "add_conf_by_address" with the hostname or IP mask, the username, | |
90 | and the ConfItem* to associate with this mask. | |
91 | ||
92 | Section 3.3: Initialising and rehashing | |
93 | ---------------------------------------- | |
94 | To initialise, call init_host_hash(). This only needs to be done once on | |
95 | startup. | |
96 | On rehash, call clear_out_address_conf() to wipe out the old, unwanted | |
97 | confs and free those that no longer have any references. | |
98 | ||
99 | Section 3.4: Finding IP/host confs | |
100 | ---------------------------------- | |
101 | Call find_address_conf() with the hostname, the username, the address, | |
102 | and the address family. | |
103 | To find a d-line, call find_dline() with the address and address family. | |
104 | ||
105 | Section 3.5: Deleting entries | |
106 | ----------------------------- | |
107 | Call delete_one_address_conf() with the hostname and the ConfItem*. | |
108 | ||
109 | Section 3.6: Reporting entries | |
110 | ------------------------------ | |
111 | Call report_dlines(), report_exemptlines(), report_Klines() or report_Ilines() | |
112 | with the client pointer to report to. Note that these walk the hash, which | |
113 | is inefficient, but they are not called often enough to justify the memory | |
114 | and maintenance clock cycles of a more efficient data structure. |
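
The key-selection rules in Sections 2.2 and 2.4 can be illustrated with a
short Python sketch. It is not the ircd's C implementation: the function
names are hypothetical, the 'identity' bucket is modelled as an empty key,
and treating both '*' and '?' as wildcard characters is an assumption. It
only shows which part of a mask would be hashed on insertion and which keys
a lookup would try, in order; the hash function itself is left out.

```python
def parse_ipv4_mask(mask):
    """Turn a mask in one of the Section 2.2 forms into (4 bytes, bits used)."""
    text, _, cidr = mask.partition("/")
    octets = []
    for part in text.split("."):
        if part == "*":              # "1.2.3.*" stops at the wildcard
            break
        octets.append(int(part))
    # Without an explicit /N, every byte that was given counts as used.
    bits = int(cidr) if cidr else 8 * len(octets)
    octets += [0] * (4 - len(octets))
    # Set all unused (low-order) bits to 0.
    keep = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    value = int.from_bytes(bytes(octets), "big") & keep
    return value.to_bytes(4, "big"), bits


def ipv4_mask_key(mask):
    """Bytes hashed when the mask is added: whole bytes only."""
    address, bits = parse_ipv4_mask(mask)
    return address[: bits // 8]      # b"" stands for the identity bucket


def ipv4_lookup_keys(address):
    """Keys tried for a full address: 4 bytes, 3, 2, 1, then identity."""
    octets = bytes(int(part) for part in address.split("."))
    return [octets[:n] for n in (4, 3, 2, 1, 0)]


def hostmask_key(mask):
    """Part of a hostmask hashed on insertion (Section 2.4)."""
    last = max(mask.rfind("*"), mask.rfind("?"))   # assumed wildcard set
    if last == -1:
        return mask                  # no wildcards: hash the whole string
    dot = mask.find(".", last)
    # No dot after the last wildcard: the mask goes to the identity bucket.
    return mask[dot + 1:] if dot != -1 else ""


def host_lookup_keys(host):
    """Keys tried for a hostname: whole name, then after each dot, then identity."""
    keys = [host]
    rest = host
    while "." in rest:
        rest = rest.split(".", 1)[1]
        keys.append(rest)
    keys.append("")                  # identity bucket, e.g. for *test*
    return keys
```

With these definitions, ipv4_mask_key("1.2.3.64/26") and
ipv4_mask_key("1.2.3.0/24") both give the three bytes 1.2.3, and
ipv4_mask_key("192/7") gives the empty key, so such a mask is only found
when the lookup reaches the identity bucket. Likewise,
host_lookup_keys("irc.example.com") tries "irc.example.com", "example.com",
"com" and finally the empty key, which is where a mask like *test* would be
stored.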