Official versions: English, French, Italian, Bulgarian
Current maintainer since 2013: Matthias Andree <m-a@users.sf.net>
Previous maintainer of ten years: David Relson <relson@osagesoftware.com>
This document is intended to answer frequently asked questions about bogofilter.
Bogofilter is a fast Bayesian spam filter along the lines suggested by Paul Graham in his article A Plan For Spam. Bogofilter uses Gary Robinson's geometric-mean algorithm, with Fisher's method as a modification, to classify email as spam or non-spam.
The bogofilter home page at SourceForge is the central clearinghouse for bogofilter resources.
Bogofilter was started by Eric S. Raymond on August 19, 2002. It gained popularity in September 2002, and a number of other authors have since contributed to the project.
The NEWS file describes bogofilter's version history starting with version 1.0.0. Older news (before release 1.0.0) is in the NEWS.0 file.
Bogofilter is a kind of bogometer or bogon filter, i.e., it tries to identify bogus mail by measuring its bogosity.
See the man page's THEORY OF OPERATION section for an introduction. The main source for understanding this is Gary Robinson's Linux Journal article "A Statistical Approach to the Spam Problem".
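The following minimal Python sketch makes the combining step concrete. It follows the geometric-mean/Fisher calculation described in Robinson's article: per-token spam probabilities f(w) become two inverse chi-square values, H and S, and the message score is I = (1 + H - S) / 2. It illustrates the technique and is not bogofilter's C implementation. The min_dev filter mirrors bogofilter's parameter of that name, but the value used here is an assumption.

```python
import math


def chi2q(chi2, df):
    """Upper-tail probability P(X >= chi2) of a chi-square variable with even df.

    For even df the tail has the closed form
    exp(-m) * sum(m**i / i!, i = 0 .. df/2 - 1) with m = chi2 / 2.
    """
    if df % 2:
        raise ValueError("this closed form needs even degrees of freedom")
    m = chi2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


def fisher_spamicity(probs, min_dev=0.1):
    """Combine per-token spam probabilities f(w) into one score in [0, 1].

    Tokens whose f(w) lies within min_dev of 0.5 are skipped (min_dev=0.1 is
    an illustrative value). With no usable tokens the result is neutral, 0.5.
    """
    eps = 1e-9  # keeps log() finite if a probability is exactly 0 or 1
    used = [min(max(p, eps), 1.0 - eps) for p in probs if abs(p - 0.5) >= min_dev]
    if not used:
        return 0.5
    df = 2 * len(used)
    # H from -2 ln prod f(w), S from -2 ln prod (1 - f(w)), both via the tail.
    h = chi2q(-2.0 * sum(math.log(p) for p in used), df)
    s = chi2q(-2.0 * sum(math.log(1.0 - p) for p in used), df)
    return (1.0 + h - s) / 2.0
```

Balanced evidence cancels out: fisher_spamicity([0.75, 0.25]) returns 0.5. Ten tokens at 0.99 push the score close to 1.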
After you read all this you might ask some questions. The first could be "Is bogofilter really a Bayesian spam filter?" Bogofilter is based on Bayes' theorem and uses it in the initial calculations and other statistical methods later. Without doubt it is a statistical spam filter with a Bayesian flavor.
Other questions might concern the basic assumptions of Bayes' theory. Two short answers are: "No, they are not satisfied" and "We don't care as long as it works". A longer answer will mention that the basic assumption that "an e-mail is a random collection of words, each independent of the others" is violated. There are several places where practice doesn't follow theory. Some are always present, and some depend on the way you use bogofilter:
As the man page explains, bogofilter tries to understand how badly the null hypothesis fails. Some people argue that "those departures from reality usually work in our favor" (from Gary's article). Others argue that, even so, we should not violate the assumptions too much. Nobody really knows. Just keep in mind that problems might occur if you push too hard. The key to bogofilter's approach is: what matters most is simply what works in the real world.
Now that you have been warned, have fun and use bogofilter as suits you best.
There are currently four mailing lists for bogofilter:
| List Address | Links | Description | 
|---|---|---|
| bogofilter-announce@bogofilter.org | [subscribe] [archives: mailman] | An announcement-only list where new versions are announced. | 
| bogofilter@bogofilter.org | [subscribe] [archives: mailman] | A discussion list where any conversation about bogofilter may take place. | 
| bogofilter-dev@bogofilter.org | [subscribe] [archives: mailman] | A list for sharing patches, development, and technical discussions. | 
| bogofilter-cvs@lists.sourceforge.net | [subscribe] [archive] | Mailing list announcing code changes to the SVN repository. (For our users' convenience, the list keeps its old CVS name from before the migration.) |
The bogofilter-announce list is moderated and is used only for important announcements (e.g., new versions). It is low traffic. If you have subscribed to the users' list or the developers' list, you don't need to subscribe to the announce list. Messages posted to the announce list are also distributed to the others.
To classify messages as ham (non-spam) or spam, bogofilter needs to learn from your mail. To start with, it is best to have collections (as large as possible) of messages you know for sure are ham or spam. (Errors here will cause problems later, so try hard ;-) Warning: Only use your own mail. Using other collections (like a spam collection found on the web) might cause bogofilter to draw wrong conclusions. After all, you want it to understand your mail.
Once you have the spam and ham collections, you have basically four choices. In all cases it works better if your training base (the above collections) is bigger, rather than smaller. The smaller your training collection is, the higher the number of errors bogofilter will make in production. Let's assume your collection is two mbox files: ham.mbox and spam.mbox.
Method 1) Full training. Train bogofilter with all your messages. In our example:
    bogofilter -s < spam.mbox
    bogofilter -n < ham.mbox
Note: Bogofilter's contrib directory includes two scripts that both use a train-on-error technique. This technique scores each message and adds to the database only those messages that were scored incorrectly (messages scored as uncertain, ham scored as spam, or spam scored as ham). The goal is to build a database of those words needed to correctly classify messages. The resulting database is smaller than the one built using full training.
Method 2) Use the script bogominitrain.pl (in the contrib directory). It checks the messages in the same order as your mailbox files. You can use the -f option, which repeats this until all messages in your training collection are classified correctly (you can even adjust the level of certainty). Since the script makes sure the database understands your training collection "exactly" (with your chosen precision), it works very well. You can use -o to create a security margin around your spam_cutoff. Assuming spam_cutoff=0.6, you might want to score all ham in your collection below 0.3 and all spam above 0.9. Our example is:
bogominitrain.pl -fnv ~/.bogofilter ham.mbox spam.mbox '-o 0.9,0.3'
Method 3) Use the script randomtrain (in the contrib directory). The script generates a list of all the messages in the mailboxes, randomly shuffles the list, and then scores each message, with training as needed. In our example:
randomtrain -s spam.mbox -n ham.mbox
As with method 4, it works better if you start with full training using several thousand messages. This will give a database that is more comprehensive and significantly bigger.
Method 4) If you have enough spams and non-spams in your training collection, separate out some 10,000 spams and 10,000 non-spams into separate mbox files, and train as in method 1. Then use bogofilter to classify the remaining spams and non-spams. Take any messages that it classifies as unsure or classifies incorrectly, and train with those. Here are two little scripts you can use to classify the train-on-error messages:
    #! /bin/sh
    #  class3 -- classify one message as bad, good or unsure
    #  (bogofilter exits with 0 for spam, 1 for ham, 2 for unsure)
    tmp=msg.$$
    cat >"$tmp"
    bogofilter "$@" <"$tmp"
    case $? in
        0) cat "$tmp" >>corpus.bad ;;
        1) cat "$tmp" >>corpus.good ;;
        2) cat "$tmp" >>corpus.unsure ;;
    esac
    rm -f "$tmp"
    #! /bin/sh
    # classify -- put all messages in mbox through class3
    src=$1
    shift
    formail -s class3 "$@" <"$src"
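Both scripts above dispatch on bogofilter's exit status. If you drive bogofilter from Python instead of sh, note that os.system (like waitpid(2) in C) returns a raw wait status rather than the exit code. The sketch below decodes that status with os.WIFEXITED/os.WEXITSTATUS and maps it the same way class3 does. The meaning of 3 (error) comes from the bogofilter man page. By contrast, subprocess.run(...).returncode is already decoded.

```python
import os

# Same mapping as the class3 script; 3 = error per the bogofilter man page.
RESULTS = {0: "Spam", 1: "Ham", 2: "Unsure", 3: "Error"}


def decode_wait_status(status):
    """Map a raw wait status (from os.system or os.waitpid) to a result label."""
    if not os.WIFEXITED(status):
        return "Error"  # e.g. bogofilter was killed by a signal
    return RESULTS.get(os.WEXITSTATUS(status), "Error")
```

On a POSIX system you might call decode_wait_status(os.system("bogofilter < msg.txt")), where msg.txt is a placeholder file name.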
In our example (after the initial full training):
    classify spam.mbox [bogofilter options]
    bogofilter -s < corpus.good
    rm -f corpus.*
    classify ham.mbox [bogofilter options]
    bogofilter -n < corpus.bad
    rm -f corpus.*
It is important to understand the consequences of the methods just described. Doing full training as in methods 1 and 4 produces a larger database than training with methods 2 or 3. If your database size needs to be small (for example due to quota limitations), use methods 2 or 3.
Full training with method 1 is fastest. Training on error (as in methods 2, 3 and 4) is effective, but the initial training takes longer.
To train with messages stored in mbox files:
    bogofilter -M -s -I ~/mail/Spam
    bogofilter -M -n -I ~/mail/NonSpam
To train with messages stored in Maildir folders:
    bogofilter -s -B ~/Maildir/.Spam
    bogofilter -n -B ~/Maildir/.NonSpam
To correct misclassified messages (mbox, then Maildir):
    bogofilter -M -Ns -I ~/mail/Missed_Spam
    bogofilter -M -Sn -I ~/mail/False_Spam
    bogofilter -s -B ~/Maildir/.Missed_Spam
    bogofilter -n -B ~/Maildir/.False_Spam
Bogofilter will make mistakes once in a while, so ongoing training is important. There are two main methodologies for doing this. First, you can train with every incoming message (using the -u option). Second, you can train on error only.
Since you might want to rebuild your database at some point, for example when a major new feature is implemented in bogofilter, it can be very useful to update your training collection continuously.
Bogofilter always does the best it can with the information available to it. However, it will make mistakes, i.e., classify ham as spam (false positives) or spam as ham (false negatives). To reduce the likelihood of repeating a mistake, train bogofilter with the misclassified message. If a message is incorrectly classified as spam, use switch -n to train with it as ham. If a spam message is classified as ham, use switch -s to train with it as spam.
Bogofilter has a -u switch that automatically updates the wordlists after scoring each message. As bogofilter sometimes misclassifies a message, monitoring is necessary to correct any mistakes. Corrections can be done using -Sn to change a message's classification from spam to non-spam and -Ns to change it from non-spam to spam.
Correcting a misclassified message may affect the classification of other messages. The smaller your database is, the more likely it is that a training error will cause a misclassification.
Using a method like #2 or #3 (above) can compensate for this effect. Repeat the training with your complete training collection (including all the new messages added since the earlier training). This adds messages to the database that counteract the adverse effect on both sides, until a new equilibrium is reached.
An alternative strategy, based on method 4 in the previous section, is the following: Periodically take blocks of messages and use the scripts in method 4 above to classify them. Then manually review the good, bad and unsure files, correct any errors, and split the unsures into spam and non-spam. Until you have accumulated some 10,000 spam and 10,000 non-spam in your training database, train with the good, the bad, and the separated errors and unsures. After that, train only with the separated errors and unsures, discarding the messages that bogofilter already classifies correctly.
Bogofilter understands the traditional Unix mbox format, as well as the Maildir and MH formats. Note, though, that bogofilter does not descend into subfolders. For MH or Maildir++ folders you have to list each subfolder explicitly by giving its full path.
For unsupported formats, you will have to convert the mailbox to a format bogofilter understands. Mbox is often convenient because it can be piped into bogofilter.
For example, to convert UW-IMAP/PINE mbx format to mbox:
mailtool copy /full/path/to/mail.mbox '#driver.unix//full/path/to/mbox'
or, to convert a Maildir to mbox:
    for MSG in /full/path/to/maildir/cur/* /full/path/to/maildir/new/* ; do
        formail -I Status: < "$MSG" >> /full/path/to/mbox
    done
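If Python is available, its standard mailbox module can do the same Maildir-to-mbox conversion without formail. The sketch below uses placeholder paths. It reads every message in a Maildir (both cur and new), drops the Status header as formail -I Status: does, and appends the result to an mbox file, locking the file while writing.

```python
import email
import mailbox


def maildir_to_mbox(maildir_path, mbox_path, drop_headers=("Status",)):
    """Append every message of a Maildir to an mbox file; return the count."""
    src = mailbox.Maildir(maildir_path, factory=None, create=False)
    dst = mailbox.mbox(mbox_path)  # the file is created if it does not exist
    dst.lock()
    try:
        count = 0
        for key in src.iterkeys():  # covers both the cur/ and new/ subdirectories
            msg = email.message_from_bytes(src.get_bytes(key))
            for name in drop_headers:
                del msg[name]  # removes all copies; no error if absent
            dst.add(msg)
            count += 1
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
    return count
```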
Bogofilter can be instructed to display information on the scoring of a message by running it with the flags "-v", "-vv", "-vvv", or "-R".
X-Bogosity: Ham, tests=bogofilter, spamicity=0.500000
    X-Bogosity: Ham, tests=bogofilter, spamicity=0.500000
      int  cnt    prob   spamicity  histogram
     0.00   29  0.000209  0.000052  #############################
     0.10    2  0.179065  0.003425  ##
     0.20    2  0.276880  0.008870  ##
     0.30   18  0.363295  0.069245  ##################
     0.40    0  0.000000  0.069245
     0.50    0  0.000000  0.069245
     0.60   37  0.667823  0.257307  #####################################
     0.70    5  0.767436  0.278892  #####
     0.80   13  0.836789  0.334980  #############
     0.90   32  0.984903  0.499835  ################################
Each row shows an interval, the count of tokens with scores in that interval, the average spam probability for those tokens, the message's spamicity score (for those tokens and all lesser valued tokens), and a bar graph corresponding to the token count.
In the above histogram there are a lot of low scoring tokens and a lot of high scoring tokens. They "balance" one another to give the spamicity score of 0.500000.
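A few lines of Python can sketch the grouping behind the -vv histogram. This is an illustration under stated assumptions: it buckets token probabilities into ten intervals of width 0.1 and reports the count and average probability per interval. It leaves out the cumulative spamicity column, which would need the full Fisher combination.

```python
def score_histogram(probs, bins=10):
    """Group token probabilities into equal-width intervals on [0, 1].

    Returns (interval_start, count, mean_prob) tuples; the mean is 0.0 for
    empty intervals, as in the sample output.
    """
    counts = [0] * bins
    sums = [0.0] * bins
    for p in probs:
        i = min(int(p * bins), bins - 1)  # p == 1.0 goes into the last interval
        counts[i] += 1
        sums[i] += p
    return [
        (i / bins, counts[i], sums[i] / counts[i] if counts[i] else 0.0)
        for i in range(bins)
    ]


def print_histogram(rows):
    print("  int  cnt      prob  histogram")
    for start, cnt, mean in rows:
        print(f"{start:5.2f} {cnt:4d} {mean:9.6f}  {'#' * cnt}")
```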
    X-Bogosity: Ham, tests=bogofilter, spamicity=0.500000
                          n    pgood     pbad      fw     U
    "which"              10  0.208333  0.000000  0.000041 +
    "own"                 7  0.145833  0.000000  0.000059 +
    "having"              6  0.125000  0.000000  0.000069 +
    ...
    "unsubscribe.asp"     2  0.000000  0.095238  0.999708 +
    "million"             4  0.000000  0.190476  0.999854 +
    "copy"                5  0.000000  0.238095  0.999883 +
    N_P_Q_S_s_x_md      138  0.00e+00  0.00e+00  5.00e-01
                             1.00e-03  4.15e-01  0.100
The columns printed contain the following information:
The final lines show:
The "-R" output is formatted for use with the R language for statistical computing. More information is available at The R Project for Statistical Computing.
Bogofilter's default configuration will classify a message as spam or non-spam. The SPAM_CUTOFF parameter is used for this. Messages with scores greater than or equal to SPAM_CUTOFF are classified as spam. Other messages are classified as ham.
There is also a HAM_CUTOFF parameter. When used, messages must have scores less than or equal to HAM_CUTOFF to be classified as ham. Messages with scores between HAM_CUTOFF and SPAM_CUTOFF are classified as unsure. If you look in bogofilter.cf, you will see the following lines:
    #### CUTOFF Values
    #
    #    both ham_cutoff and spam_cutoff are allowed.
    #    setting ham_cutoff to a non-zero value will
    #    enable tri-state results (Spam/Ham/Unsure).
    #
    #ham_cutoff  = 0.45
    #spam_cutoff = 0.99
    #
    #    for two-state classification:
    #
    ## ham_cutoff = 0.00
    ## spam_cutoff= 0.99
To turn on Spam/Ham/Unsure classification, remove the #'s from the ham_cutoff = 0.45 and spam_cutoff = 0.99 lines.
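The cutoff rules described above reduce to a small decision function. The Python sketch below illustrates those rules and is not taken from bogofilter's source. Its default values come from the sample bogofilter.cf lines, and the tags argument plays the role of spamicity_tags.

```python
def classify(score, spam_cutoff=0.99, ham_cutoff=0.0,
             tags=("Spam", "Ham", "Unsure")):
    """Label a spamicity score following the cutoff rules described above.

    ham_cutoff == 0 means two-state results (Spam/Ham); any non-zero
    ham_cutoff enables the tri-state Spam/Ham/Unsure mode.
    """
    spam_tag, ham_tag, unsure_tag = tags
    if score >= spam_cutoff:
        return spam_tag
    if ham_cutoff == 0 or score <= ham_cutoff:
        return ham_tag
    return unsure_tag
```

With ham_cutoff=0.45, a score of 0.7 comes out as "Unsure". With the two-state default, the same score is "Ham".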
If you'd rather use the labels Yes/No/Unsure instead of Spam/Ham/Unsure, remove the #'s from the following bogofilter.cf line:
## spamicity_tags = Yes, No, Unsure
Once that's done, you may want to set the filtering rules for your mail program to include rules like:
    if header contains "X-Bogosity: Spam", put in Spam folder
    if header contains "X-Bogosity: Unsure", put in Unsure folder
Alternatively, bogofilter.cf has directives for modifying the Subject: line:
    #### SPAM_SUBJECT_TAG
    #
    #    tag added to "Subject: " line for identifying spam or unsure
    #    default is to add nothing.
    #
    ##spam_subject_tag=***SPAM***
    ##unsure_subject_tag=???UNSURE???
With these subject tags, the filtering rules would look like:
    if subject contains "***SPAM***", put in Spam folder
    if subject contains "???UNSURE???", put in Unsure folder
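Where the filtering step is a script rather than a set of mail-client rules, Python's email module can make the same decisions, as sketched below. The folder names are placeholders, and the tag strings are the sample values from bogofilter.cf. Adjust both to your configuration.

```python
import email


def choose_folder(raw_message, spam_tag="***SPAM***", unsure_tag="???UNSURE???"):
    """Pick a folder the way the filter rules above describe.

    Checks the X-Bogosity header first, then the Subject tags. The folder
    names "Spam", "Unsure" and "Inbox" are placeholders.
    """
    msg = email.message_from_string(raw_message)
    bogosity = str(msg.get("X-Bogosity", "")).strip()
    subject = str(msg.get("Subject", ""))
    if bogosity.startswith("Spam") or spam_tag in subject:
        return "Spam"
    if bogosity.startswith("Unsure") or unsure_tag in subject:
        return "Unsure"
    return "Inbox"
```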
    "Training on error" involves scanning a corpus of known spam and non-spam messages; only those that are misclassified, or classed as unsure, get registered in the training database. It's been found that sampling just messages prone to misclassification is an effective way to train; if you train bogofilter on the hard messages, it learns to handle obvious spam and non-spam too.
This method can be enhanced by using a "security margin". By increasing the spam cutoff value and decreasing the ham cutoff value, messages which are close to a cutoff will be used for training. Using security margins improves results when training on error. In general, greater margins help more (although too large a margin isn't optimal either). As a rule of thumb, spam_cutoff +/- 0.3 gives good results. For tri-state mode, you might try the middle of the unsure interval +/- 0.3 for training.
Repeating training on error on the same message corpus can improve accuracy. The idea is that messages which were rated correctly at first might be rated wrongly after further training. Repeating the pass catches and corrects them.
"Training to exhaustion" is repeating training on error, with the same message corpus, until no errors remain. This method, too, can be improved with security margins. See Gary Robinson's Rants on this topic for more details.
Note: bogominitrain.pl has a -f option to do "training to exhaustion". Using -fn avoids repeated training for each message.
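The interplay of train-on-error, security margins and repetition can be sketched as a loop. In this Python illustration, score and train are hypothetical callables standing in for running bogofilter and registering a message with -s or -n. Here center is spam_cutoff in two-state mode, or the middle of the unsure interval in tri-state mode. The default margin of 0.3 follows the rule of thumb given above.

```python
def needs_training(score, is_spam, center, margin=0.3):
    """True if a message is misclassified or lies inside the security margin.

    center is spam_cutoff (two-state) or the middle of the unsure interval
    (tri-state); margin=0.3 follows the rule of thumb in the text.
    """
    if is_spam:
        return score < center + margin
    return score > center - margin


def train_to_exhaustion(corpus, score, train, center, margin=0.3, max_passes=10):
    """Repeat train-on-error passes until one pass trains nothing.

    corpus is a list of (message, is_spam) pairs. score(message) -> float and
    train(message, is_spam) are hypothetical stand-ins for running bogofilter
    and registering the message with -s or -n. Returns the passes used.
    """
    for passes in range(1, max_passes + 1):
        trained = 0
        for message, is_spam in corpus:
            if needs_training(score(message), is_spam, center, margin):
                train(message, is_spam)
                trained += 1
        if trained == 0:
            return passes
    return max_passes
```

The loop stops when a full pass trains nothing, which is the "training to exhaustion" criterion. max_passes guards against corpora that never settle.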
The "-u" switch (autoupdate) is used to automatically expand the wordlist. When this switch is used and bogofilter classifies a message as Spam or Ham, the message's tokens are added to the wordlist with a ham/spam tag (as appropriate).
As an example, suppose a new "Refinance now - best Mortgage rates" message comes in. It will have some words that bogofilter has seen and (probably) some new ones as well. Using '-u' the new words will be added to the wordlist so that bogofilter can better recognize the next, related message.
If you use '-u', you need to be on the lookout for classification errors and retrain bogofilter with any messages that have been classified incorrectly. An incorrectly classified message that is auto-updated _may_ cause bogofilter to make additional classification errors in the future. This is the same problem as when you (the sys admin) incorrectly register a ham message as spam (or vice versa).
If you have a working SpamAssassin installation (or care to create one), you can use its return codes to train bogofilter. The easiest way is to create a script for your MDA that runs SpamAssassin, tests the spam/non-spam return code, and runs bogofilter to register the message as spam (or non-spam). The sample procmail recipe below shows one way to do this:
    BOGOFILTER     = "/usr/bin/bogofilter"
    BOGOFILTER_DIR = "training"
    SPAMASSASSIN  = "/usr/bin/spamassassin"
    :0 HBc
    * ? $SPAMASSASSIN -e
    #spam yields non-zero
    #non-spam yields zero
    | $BOGOFILTER -n -d $BOGOFILTER_DIR
    #else (E)
    :0Ec
    | $BOGOFILTER -s -d $BOGOFILTER_DIR
    :0fw
    | $BOGOFILTER -p -e
    :0:
    * ^X-Bogosity:.Spam
    spam
    :0:
    * ^X-Bogosity:.Ham
    non-spam
Many people get unsolicited email using Asian language charsets. Since they don't know the languages and don't know people there, they assume it's spam.
The good news is that bogofilter does detect them quite successfully. The bad news is that this can be expensive. You have basically two choices:
You can simply let bogofilter handle it. Just train bogofilter with the Asian language messages identified as spam. Bogofilter will parse the messages as best it can and will add tokens to the spam wordlist. The wordlist will contain many tokens which don't make sense to you (since the charset cannot be displayed), but bogofilter can work with them and successfully identify Asian spam.
A second method is to use the "replace_nonascii_characters" config file option. This will replace high-bit characters, i.e. those between 0x80 and 0xFF, with question marks, '?'. This keeps the database much smaller. Unfortunately, this conflicts with European languages, which have many accented vowels and consonants in the high-bit range.
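A one-table byte translation in Python illustrates the effect of replace_nonascii_characters. This sketches the described behavior (bytes 0x80 to 0xFF become '?') and is not bogofilter's code. The example shows why European text suffers: an accented letter loses its identity, and in UTF-8 it even turns into two question marks.

```python
# Translation table: bytes 0x00-0x7F map to themselves, 0x80-0xFF to b"?".
_NONASCII_TABLE = bytes(range(0x80)) + b"?" * 0x80


def replace_nonascii(data: bytes) -> bytes:
    """Replace every high-bit byte (0x80-0xFF) with a question mark."""
    return data.translate(_NONASCII_TABLE)
```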
If you are sure you will not receive any legitimate messages in those languages, you can kill them right away. This will keep the database smaller. You can do this with an MDA script.
Here's a procmail recipe that will sideline messages written with Asian charsets:
    ## Silently drop all Asian language mail
    UNREADABLE='[^?"]*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987'
    :0:
    * 1^0 $ ^Subject:.*=\?($UNREADABLE)
    * 1^0 $ ^Content-Type:.*charset="?($UNREADABLE)
    spam-unreadable
    :0:
    * ^Content-Type:.*multipart
    * B ?? $ ^Content-Type:.*^?.*charset="?($UNREADABLE)
    spam-unreadable
With the above recipe, bogofilter will never see the message.
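If you filter with a script instead of procmail, Python's email module can do the same charset check, as sketched below. The charset set mirrors the UNREADABLE list in the recipe above. Unlike the recipe's regular expression, the sketch matches charset names exactly, so prefixed variants are not caught.

```python
import email
from email.header import decode_header

# Mirrors the UNREADABLE list in the procmail recipe (compared in lower case).
UNREADABLE = {"big5", "iso-2022-jp", "iso-2022-kr", "euc-kr", "gb2312", "ks_c_5601-1987"}


def uses_unreadable_charset(raw_message):
    """True if the Subject or any MIME part declares one of the UNREADABLE charsets."""
    msg = email.message_from_string(raw_message)
    # Encoded-word charsets in the Subject, e.g. =?big5?B?...?=
    for _, charset in decode_header(str(msg.get("Subject", ""))):
        if charset and charset.lower() in UNREADABLE:
            return True
    # charset= parameters of the top-level Content-Type and of every MIME part
    for part in msg.walk():
        charset = part.get_content_charset()  # already lower-cased by email
        if charset and charset in UNREADABLE:
            return True
    return False
```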
You can periodically compact the database so it occupies a minimum of disk space. Assuming your wordlist is in directory ~/.bogofilter, for bogofilter 0.93.0 (or newer) use:
bf_compact ~/.bogofilter wordlist.db
For bogofilter older than 0.93.0, use:
    cd ~/.bogofilter
    bogoutil -d wordlist.db | bogoutil -l wordlist.db.new
    mv wordlist.db wordlist.db.prv
    mv wordlist.db.new wordlist.db
The bf_compact script is needed to duplicate your database environment (in order to support Berkeley DB transaction processing). Your original directory will be renamed to ~/.bogofilter.old and ~/.bogofilter will contain the new database environment.
Since older versions of bogofilter don't use Berkeley DB transactions, the database is just a single file (wordlist.db) and it isn't necessary to use the script. The commands shown above create a new compact database and rename the original file to wordlist.db.prv.
Note: it's O.K. to use the script with old versions of bogofilter.
To find the spam and ham counts for a token (word) use bogoutil's '-w' option. For example, "bogoutil -w $BOGOFILTER_DIR/wordlist.db example.com" gives the good and bad counts for "example.com".
If you want the spam score in addition to the spam and ham counts for a token (word), use bogoutil's '-p' option. For example, "bogoutil -p $BOGOFILTER_DIR/wordlist.db example.com" shows the spam score as well as the good and bad counts for "example.com".
To find out how many messages are in your wordlists query the special token .MSG_COUNT, i.e., run command "bogoutil -w $BOGOFILTER_DIR/wordlist.db .MSG_COUNT" to see the counts for the spam and ham wordlists.
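The counts reported by bogoutil -w, together with the .MSG_COUNT totals, are the inputs to Robinson's per-token estimate f(w) = (s*x + n*p(w)) / (s + n). Here n is the token's total count and p(w) compares how often the token appears in spam versus ham. The Python sketch below computes that value under assumptions: robs and robx stand for Robinson's s and x, and their default values here are illustrative. Check bogofilter.cf, and compare with bogoutil -p, before relying on the numbers.

```python
def token_probability(spam_count, ham_count, spam_msgs, ham_msgs,
                      robs=0.0178, robx=0.52):
    """Robinson's smoothed spam probability f(w) for one token.

    spam_count/ham_count: the token's counts (as from bogoutil -w).
    spam_msgs/ham_msgs:   the .MSG_COUNT totals.
    robs/robx: Robinson's s (strength of the prior) and x (prior for unknown
    tokens); the default values here are illustrative assumptions.
    """
    n = spam_count + ham_count
    if n == 0 or spam_msgs == 0 or ham_msgs == 0:
        return robx  # no usable evidence: fall back to the prior x
    b = spam_count / spam_msgs  # relative frequency in spam
    g = ham_count / ham_msgs    # relative frequency in ham
    p = b / (b + g)
    return (robs * robx + n * p) / (robs + n)
```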
To tell how many tokens are in your wordlists pipe the output of bogoutil's dump command to command "wc", i.e. use "bogoutil -d $BOGOFILTER_DIR/wordlist.db | wc -l " to display the count.
Yes. Bogofilter can be run with multiple wordlists. For example, if you have both user and system wordlists, bogofilter can be instructed to check the user list and, if the word isn't there, then check the system list. Alternatively, it can be instructed to add together the information from the two lists.
Following are the config file options and some examples:
A wordlist has several attributes, notably type, name, filename, and precedence.
Example 1 - merge user and system lists:
    wordlist R,user,~/wordlist.db,1
    wordlist R,system,/var/spool/bogofilter/wordlist.db,1
Example 2 - prefer user to system list:
    wordlist R,user,~/wordlist.db,2
    wordlist R,system,/var/spool/bogofilter/wordlist.db,3
Example 3 - prefer system to user list:
    wordlist R,user,~/wordlist.db,5
    wordlist R,system,/var/spool/bogofilter/wordlist.db,4
Note 1: bogofilter's registration flags ('-s', '-n', '-u', '-S', '-N') will apply to the lowest numbered list.
Note 2: lists of types 'R' and 'I' with the same precedence are not allowed, because the types are contradictory.
Through the use of an ignore list, bogofilter will ignore the listed tokens when scoring the message.
Example:
    wordlist I,ignore,~/ignorelist.db,7
    wordlist R,system,/var/spool/bogofilter/wordlist.db,8
Because ignorelist.db has a lower precedence number (7) than wordlist.db (8), bogofilter will stop looking when it finds a token in ignorelist.db.
Note: Technically, bogofilter gives a score of ROBX to the tokens and expects the min_dev parameter to drop them from the scoring.
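The precedence rules can be modeled in a few lines of Python. This sketch is a simplified model of the behavior described above, not bogofilter's implementation. Wordlists that share the lowest precedence number containing the token have their counts added. Higher numbers are consulted only if the token was not found. A hit in an ignore ('I') list means the token is not scored at all.

```python
from dataclasses import dataclass, field


@dataclass
class Wordlist:
    kind: str        # "R" (regular) or "I" (ignore), as in bogofilter.cf
    name: str
    precedence: int  # lower numbers are consulted first
    counts: dict = field(default_factory=dict)  # token -> (spam, ham)


def lookup(token, wordlists):
    """Return (spam, ham) counts for token, or None if an ignore list has it."""
    for prec in sorted({wl.precedence for wl in wordlists}):
        hits = [wl for wl in wordlists
                if wl.precedence == prec and token in wl.counts]
        if not hits:
            continue  # not found at this precedence: try the next one
        if any(wl.kind == "I" for wl in hits):
            return None  # ignored: the token does not take part in scoring
        spam = sum(wl.counts[token][0] for wl in hits)
        ham = sum(wl.counts[token][1] for wl in hits)
        return spam, ham
    return 0, 0  # unknown token
```

With both lists at precedence 1, a token's counts are summed (Example 1). With precedences 2 and 3, the user list wins whenever it knows the token (Example 2).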
There are two main methods for building/maintaining an ignore list.
First, a text file can be created and maintained using any text editor. Bogoutil can convert the text file to database format, e.g. "bogoutil -l ignorelist.db < ignorelist.txt".
Alternatively, echo ... | bogoutil ... can be used to add a single token, for example "ignore.me", as in:
echo ignore.me | bogoutil -l ~/ignorelist.db
Run script bogoupgrade. For more info, run "bogoupgrade -h" to see its help message or run "man bogoupgrade" and read its man page.
NOTE: some distributors rename all the db_ utilities given below by inserting or appending the version number, with or without dot, for instance db4.1_verify or db_verify-4.2. There is no standard on the renaming of these utilities.
If you think your wordlists are hosed, you can see what BerkeleyDB thinks by running:
db_verify wordlist.db
You may be able to recover some (or all) of the tokens and their counts with the following commands:
bogoutil -d wordlist.db | bogoutil -l wordlist.new.db
or - if there has been more damage to the token list - with
    db_dump -r wordlist.db > wordlist.txt
    db_load wordlist.new.db < wordlist.txt
You can also use a text file instead of a pipe, as in:
    bogoutil -d wordlist.db > wordlist.txt
    bogoutil -l wordlist.db.new < wordlist.txt
Wordlists can be converted from raw storage to unicode using:
    bogoutil -d wordlist.db > wordlist.raw.txt
    iconv -f iso-8859-1 -t utf-8 < wordlist.raw.txt > wordlist.utf8.txt
    bogoutil -l wordlist.db.new < wordlist.utf8.txt
    or:
bogoutil --unicode=yes -m wordlist.db
Wordlists can be converted from unicode to raw storage using:
    bogoutil -d wordlist.db > wordlist.utf8.txt
    iconv -f utf-8  -t iso-8859-1 < wordlist.utf8.txt > wordlist.raw.txt
    bogoutil -l wordlist.db.new < wordlist.raw.txt
    or:
bogoutil --unicode=no -m wordlist.db
The above methods work best when the wordlist is based on the iso-8859-1 charset. If your wordlist is based on a different charset, for example CP866 or KOI8-R, use that charset in the above commands.
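The iconv step can also be done in Python, which makes it easy to plug in another charset such as koi8-r or cp866. Below is a minimal sketch with placeholder file names. Decoding is strict, so a dump that does not fit the target charset raises an error instead of silently corrupting tokens.

```python
from pathlib import Path


def convert_dump(raw: bytes, from_charset="iso-8859-1", to_charset="utf-8") -> bytes:
    """Re-encode a bogoutil -d dump; strict errors instead of silent damage."""
    return raw.decode(from_charset).encode(to_charset)


def convert_dump_file(src, dst, from_charset="iso-8859-1", to_charset="utf-8"):
    """File-to-file variant, e.g. wordlist.raw.txt -> wordlist.utf8.txt."""
    Path(dst).write_bytes(convert_dump(Path(src).read_bytes(), from_charset, to_charset))
```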
For a wordlist containing tokens from multiple languages, particularly non-European languages, the conversion methods described above may not work well. Building a new wordlist from scratch will likely work better, because the new wordlist will be based solely on unicode.
How to do this is fully documented in file doc/README.db section 2.2.1. We suggest you read the whole section.
In brief, use these commands:
    cd ~/.bogofilter
    bogoutil -d wordlist.db > wordlist.txt
    mv wordlist.db wordlist.db.old
    bogoutil --db-transaction=yes -l wordlist.db < wordlist.txt
If everything went well, you can remove the backup files:
rm wordlist.db.old wordlist.txt
How to do this is fully documented in file doc/README.db section 2.2.2. We suggest you read the whole section.
In brief, you can use bogoutil to dump/load the wordlist, for example:
    cd ~/.bogofilter
    bogoutil -d wordlist.db > wordlist.txt
    mv wordlist.db wordlist.db.old
    rm -f log.?????????? __db.???
    bogoutil --db-transaction=no -l wordlist.db < wordlist.txt
The transactional and concurrent modes of BerkeleyDB require a lock table whose size corresponds to the size of the database. See the README.db file for a detailed explanation and a remedy.
The size of the lock table can be set in bogofilter.cf or in DB_CONFIG. Bogofilter.cf uses the db_lk_max_locks and db_lk_max_objects directives, while DB_CONFIG uses the set_lk_max_objects and set_lk_max_locks directives.
After changing these values in DB_CONFIG, run command
bogoutil --db-recover /your/bogofilter/directory
to rebuild the lock tables.
You have a problem with your BerkeleyDB database. There are two likely causes: either you've hit a max size limit or the database is corrupt.
Some mail transfer agents, such as Postfix, impose file size limits. When bogofilter's database reaches that limit, write problems will occur.
To show the database size use:
ls -lh $BOGOFILTER_DIR/wordlist.db
To show the postfix setting:
postconf | grep mailbox_size_limit
To set the limit to 73MB (or whatever size is right for you):
postconf -e mailbox_size_limit=73000000
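A small check can warn you before the wordlist reaches the limit. The Python sketch below parses postconf-style "name = value" output and compares a file's size against the limit. The 90% headroom is an arbitrary assumption. A limit of 0 is treated as "no limit", as Postfix does for mailbox_size_limit.

```python
import os


def postconf_value(output, name):
    """Return the value of name from postconf-style "name = value" lines."""
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            return value.strip()
    return None


def near_size_limit(path, limit_bytes, headroom=0.9):
    """True if the file uses at least headroom * limit_bytes; 0 means no limit."""
    if limit_bytes == 0:
        return False
    return os.path.getsize(path) >= headroom * limit_bytes
```

For example, feed postconf_value the output of "postconf mailbox_size_limit", then pass the resulting integer and the path of wordlist.db to near_size_limit.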
If you think your database may be corrupt, read the "How can I tell if my wordlists are corrupted?" FAQ entry.
Some distributors (for instance the Fedora Project) package Berkeley DB with support for POSIX threading and hence POSIX mutexes, but your system may not support POSIX mutexes (whether it does depends on the kernel version and exact processor type).
To work around this problem:
Yes, it can. There are multiple, distinct strategies for doing this. The two extremes are:
As a middle ground, the bogofilter administrator can create and maintain the global wordlists and each user can be given the choice of using the global wordlist or a private wordlist. An MDA, such as procmail, can be programmed to first apply the global wordlist (with a very stringent spam cutoff) and then (if necessary) apply the user's wordlist.
If you're just reading from them, there are no problems. When you're updating them, you need to use the correct file locking to avoid data corruption. When you compile bogofilter, you will need to verify that the configure script has set "#define HAVE_FCNTL 1" in your config.h file. Popular UNIX operating systems will all support this. If you are running an unusual, or an older version of an operating system, make sure it supports fcntl(). If your system does not support fcntl(), then you will not be able to share wordlist files over NFS without the risk of data corruption.
Next, make sure you have NFS set up properly, with "lockd" running. Refer to your NFS documentation for more information about running "lockd" or "rpc.lockd". Most operating systems with NFS turn this on by default.
For shared directories (NFS directories used by multiple machines, for instance, Sparc/Itanium/Alpha and x86), the architecture-specific parts can be installed separately by giving a different --exec-prefix (it defaults to --prefix).
Likely the return codes are being reformatted by waitpid(2). In C, use WEXITSTATUS(status) from sys/wait.h, or a comparable macro, to get the correct value. In Perl you can just use 'system("bogofilter $input") >> 8'. If you want more info, run "man waitpid".
Over time bogofilter accumulated a large number of functions. Some of those were discontinued or changed. Please read the NEWS file for details.
The lexer, i.e., that part of bogofilter which extracts tokens from a message, evolves. This results in different readings of messages with the consequence that some tokens in the database can no longer be used.
If you encounter this problem, you are strongly advised to rebuild your database. If this is not an option for you, you might want to use version 0.15.13 and read the documentation which comes with it for how to migrate your database.
Bogoutil lets you dump a wordlist and load the tokens into a new wordlist. With the added use of awk and grep, counts can be zeroed and tokens with zero counts for both spam and non-spam can be deleted.
The following commands will delete the tokens from spam messages:
    bogoutil -d wordlist.db | \
    awk '{print $1 " " $2 " 0"}' | grep -v " 0 0" | \
    bogoutil -l wordlist.new.db
The following commands will delete the tokens from non-spam messages:
    bogoutil -d wordlist.db | \
    awk '{print $1 " 0 " $3}' | grep -v " 0 0" | \
    bogoutil -l wordlist.new.db
If you don't already have a v3.0+ version of BerkeleyDB, then download it (take one of the 4.4.X, 4.3.X or 4.2.X versions), unpack it, and run these commands in the db directory:
    $ cd build_unix
    $ sh ../dist/configure
    $ make
    # make install
Next, download a portable version of bogofilter.
Be sure that your PATH environment variable begins with /usr/xpg6/bin:/usr/xpg4/bin:/usr/ccs/bin (/usr/xpg6/bin is only present on Solaris 10 and can be omitted on Solaris 9 and older versions). That is required for POSIX compliance.
Unpack it, and then do:
    $ ./configure --with-libdb-prefix=/usr/local/BerkeleyDB.4.4
    $ make
    # make install-strip
You will either want to put a symlink to libdb.so in /usr/lib, or set a modified LD_LIBRARY_PATH environment variable before you start bogofilter. On newer systems, the most convenient way is probably to use the crle(1) tool to set the path permanently, so BerkeleyDB is available to all applications.
    $ LD_LIBRARY_PATH=/usr/lib:/usr/local/lib:/usr/local/BerkeleyDB.4.4/lib
    $ export LD_LIBRARY_PATH
Note that some "make" versions shipped with older Solaris versions break when you try to build bogofilter outside of its source directory. Either build in the source directory (as suggested above) or use GNU make (gmake).
If your Solaris GCC complains with "ld: fatal: file values-Xa.o: open failed: No such file or directory", install the SUNWarc package.
The FreeBSD ports collection carries the latest stable versions of bogofilter to be compiled from source. The bogofilter ports are also auto-built and provided as binary packages for you to install.
The binary package approach uses the default installed software. To install bogofilter from a binary package, type, as the privileged user:
    pkg install -y bogofilter
The from-source ports approach uses the highly recommended portmaster and portsnap software packages. To install portmaster (you need to do this only once), type, as root:
    pkg install -y portmaster
To install or upgrade bogofilter, first update your ports tree using portsnap, then type, as root:
    portmaster mail/bogofilter
Note: This assumes you are root. If not, read through the remainder of this FreeBSD section and then see how you can build if you haven't got root privileges.
pkgsrc should be offering a reasonably recent stable bogofilter release. See http://www.pkgsrc.org/ for information on pkgsrc.
See the file doc/programmer/README.hp-ux in the source distribution.
Bogofilter has been successfully built on many operating systems using GNU make and the native make commands. However, bogofilter's Makefile doesn't work with some make commands.
GNU make is recommended for building bogofilter because we know it works. We cannot support less capable make commands. If your non-GNU make command can successfully build bogofilter, that's great. If you encounter problems, the right thing to do is install GNU make. If your non-GNU make can't build bogofilter, we're sorry but you're on your own. If it takes just a minor and clean patch to make it compatible, we might take it.
To install bogofilter to a non-standard path (for example, as a non-root user without write permission to the standard directories), you need to provide the installation prefix when you run ./configure.

After downloading and unpacking the source code, run ./configure --prefix=PATH, where PATH is the installation prefix for the generated files (binaries, man pages, etc.). Then run the usual build commands: make && make check && make install.
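A minimal sketch of a non-root installation, assuming a hypothetical prefix of $HOME/opt/bogofilter: configure installs the programs under PREFIX/bin, so that directory has to be on your PATH before bogofilter can be run by name. The man page location in the comment is the usual autoconf default and is an assumption; check where make install actually put the files.

```shell
#!/bin/sh
# Hypothetical installation prefix in the user's home directory.
PREFIX="$HOME/opt/bogofilter"

# Build and install from the unpacked source directory:
#   ./configure --prefix="$PREFIX" && make && make check && make install

# Make the installed programs visible to this shell (put these lines
# in a startup file such as ~/.profile to keep them for later logins).
PATH="$PREFIX/bin:$PATH"
export PATH

# Man pages usually end up under $PREFIX/share/man, e.g.:
#   man -M "$PREFIX/share/man" bogofilter
```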
    
If you need to apply patches, get the source code and unpack it using tar -xzf or gunzip | tar -xf - (as appropriate). Change to the source directory and run ./configure --prefix=PATH, where PATH is the installation prefix for the generated files (binaries, man pages, etc.). Apply your patches, then run make && make install.
    
When space is tight, you can use make install-strip instead of make install. This saves space, but crashes can't be debugged unless you provide the developers with more information on reproducing the bug.
If you are configuring a database path, for instance with --with-libdb-prefix or via CPPFLAGS and LIBS, be sure to pass an absolute path (with a leading slash); a relative path will not work. Example: use --with-libdb-prefix=/usr/local/BerkeleyDB.4.2, not --with-libdb-prefix=../BerkeleyDB.4.2.
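If you only know a relative location, you can let the shell turn it into an absolute path before passing it to configure. This is a sketch with a hypothetical helper, abs_dir; it fails if the directory does not exist, which also catches typos early.

```shell
#!/bin/sh
# abs_dir DIR (hypothetical helper): print DIR as an absolute path.
# The cd runs in a subshell, so the caller's working directory is not
# changed; CDPATH is cleared so cd prints nothing itself.
abs_dir() {
    (CDPATH= cd "$1" && pwd)
}

# Example with a relative BerkeleyDB location:
#   ./configure --with-libdb-prefix="$(abs_dir ../BerkeleyDB.4.2)"
```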
Bogofilter is known to work with KMail, Mozilla Mail, mutt, Alpine and Sylpheed-Claws. A Google search will help you find more information on using bogofilter with the mail program you use.
Use a mail filter (procmail, maildrop, etc.) to filter mail into different folders based on bogofilter's return code and set mutt key bindings to train bogofilter on errors:
    macro index S "|bogofilter -s\ns=junkmail"  "Learn as spam and save to junk"
    macro pager S "|bogofilter -s\ns=junkmail"  "Learn as spam and save to junk"
    macro index H "|bogofilter -n\ns="          "Learn as ham and save"
    macro pager H "|bogofilter -n\ns="          "Learn as ham and save"
These macros pipe the selected message through bogofilter, training a false ham as spam (S) or a false spam as ham (H), then offer to save the message to a different folder.
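A mail filter that sorts by return code has to map bogofilter's exit status to a folder: 0 means spam, 1 non-spam, 2 unsure and 3 an I/O or other error. A sketch of that decision as a shell function; the folder names are hypothetical, and in practice the same logic would be expressed in your procmail or maildrop rules or in a wrapper script.

```shell
#!/bin/sh
# folder_for_status STATUS (hypothetical helper): map bogofilter's exit
# status to a destination folder. Folder names are examples only.
folder_for_status() {
    case "$1" in
        0) echo junkmail ;;   # spam
        1) echo inbox ;;      # non-spam (ham)
        2) echo unsure ;;     # unsure (ternary classification)
        *) echo inbox ;;      # error: deliver normally rather than lose mail
    esac
}

# Typical use, with the message on standard input:
#   bogofilter < message
#   folder=$(folder_for_status $?)
```

Treating errors like non-spam is a design choice: a broken wordlist then delivers mail to the inbox instead of hiding it in a junk folder.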
Add a filtering rule that runs bogofilter on incoming messages, and an action to perform if a message is spam:
    condition:
    * test "bogofilter < %F"
    action:
    * move "#mh/YOUR_SPAM_BOX"
Note: this assumes that bogofilter is in your PATH!
Create two Claws actions - one for marking messages as spam and one for marking messages as ham. Use the "Mark As Spam" action for messages incorrectly classified as ham and use the "Mark As Ham" action for messages incorrectly classified as spam.
    Mark as ham / spam:
    * bogofilter -n -v -B "%f" (mark ham)
    * bogofilter -s -v -B "%f" (mark spam)
Another approach is to save incorrectly classified messages in a folder (or folders) and run a script like:
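    #!/bin/sh
    CONFIGDIR=~/.bogofilter
    SPAMDIRS="$CONFIGDIR/spamdirs"     # one folder path per line
    MARKFILE="$CONFIGDIR/lastbogorun"  # timestamp of the previous run
    # On the first run there is no marker yet; create an old one so
    # that every message already in the folders is picked up.
    [ -f "$MARKFILE" ] || touch -t 198001010000 "$MARKFILE"
    while read -r D; do
        find "$D" -type f -newer "$MARKFILE" ! -name ".sylpheed*"
    done < "$SPAMDIRS" | bogofilter -bNsv
    touch "$MARKFILE"
This script can be used as an action and/or made into a toolbar button. It registers as spam (and unregisters as ham) the messages in the folders listed in $SPAMDIRS that are newer than $MARKFILE.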
Additional information is available at the Sylpheed-Claws wiki.
Another approach is to run bogofilter from procmail, maildrop, etc., and have Claws check the X-Bogosity header and filter messages into Spam and Unsure folders, e.g.:
    Condition:
        header "X-Bogosity" matchcase "Spam"
    Action:
        move "#mh/Mailbox/Spam"
    Condition:
        header "X-Bogosity" matchcase "Unsure"
    Action:
        move "#mh/Mailbox/Unsure"
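The filter conditions above look at the classification at the start of the X-Bogosity header, which bogofilter adds when it runs in passthrough mode (-p). To check what such a rule will see, the hypothetical helper below prints that word (for example Spam or Unsure) from a message file, reading only the header section.

```shell
#!/bin/sh
# bogosity_class FILE (hypothetical helper): print the classification
# word of the X-Bogosity header, e.g. "Spam" from
#   X-Bogosity: Spam, tests=bogofilter, spamicity=0.999, ...
# Only lines before the first empty line (the message header) are read.
bogosity_class() {
    awk '/^$/ { exit }
         tolower($1) == "x-bogosity:" { sub(/,.*/, "", $2); print $2; exit }' "$1"
}
```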
Any messages in the Unsure folder should be used for training, as should messages incorrectly classified as ham or spam. The actions below handle these cases:
    Register Spam:
        bogofilter -s < "%f"
    Register Ham:
        bogofilter -n < "%f"
    Unregister Spam:
        bogofilter -S < "%f"
    Unregister Ham:
        bogofilter -N < "%f"
To look inside bogofilter's scoring mechanism, the following diagnostics are useful:
    BogoTest -vv:
        bogofilter -vv < "%f"
    BogoTest -vvv:
        bogofilter -vvv < "%f"
Additional information on this approach is available here.
You need to put the separate file vm-bogofilter.el (included in bogofilter's contrib directory; the latest version is at http://www.cis.upenn.edu/~bjornk/bogofilter/vm-bogofilter.el) in your Emacs load path.
Then add the following to your ~/.vm configuration file:
    ;; load bogofilter capabilities (spam)
    (require 'vm-bogofilter)
    ;; short-keys for bogofilter:
    ;; K (shift-k) marks a message as spam
    ;; C (shift-c) marks a message as clean (ham)
    (define-key vm-mode-map "K" 'vm-bogofilter-is-spam)
    (define-key vm-mode-map "C" 'vm-bogofilter-is-clean)
All messages are filtered by bogofilter each time you check for newly arrived e-mail. When you change the status of a message, its X-Bogosity: header is updated.
There is one limitation: VM cannot change the headers of several messages at once; you have to do it message by message.
The default setting of the 'mh-junk-program' option is 'Auto-detect', which means that MH-E will automatically choose one of SpamAssassin, Bogofilter, or SpamProbe, in that order. If, for example, you have both SpamAssassin and Bogofilter installed and you want to use Bogofilter, set this option to 'Bogofilter'.
The 'J b' ('mh-junk-blacklist') command trains the spam program in use with the content of the range and then handles the message(s) as specified by the 'mh-junk-disposition' option. By default, this option is set to 'Delete Spam' but you can also specify the name of the folder which is useful for building a corpus of spam for training purposes.
In contrast, the 'J w' ('mh-junk-whitelist') command reclassifies as ham a range of messages that were incorrectly classified as spam. It then refiles the messages into the '+inbox' folder.
For more information, see the MH-E home page.