author Reiner Steib 2006-04-20 20:14:50 +0000
committer Reiner Steib 2006-04-20 20:14:50 +0000
commit 93f86ee0b157d3f328ebd407b11abd6002a4b130
tree 5df26e098cb03ba01d3b54dd14438eb7ce5bc80f
parent 5a02d811ed1b04c4f877e1bfd8607750dc02cf22
2006-04-20 Reiner Steib <Reiner.Steib@gmx.de>

* gnus.texi (Spam Statistics Package): Fix typo in @pxref.
(Splitting mail using spam-stat): Fix @xref.

2006-04-20 Chong Yidong <cyd@stupidchicken.com>

* gnus.texi (Spam Package): Major revision of the text. Previouly
this node was "Filtering Spam Using The Spam ELisp Package".

-rw-r--r-- man/ChangeLog 12
-rw-r--r-- man/gnus.texi 613
2 files changed, 328 insertions, 297 deletions
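
As a reading aid for the gnus.texi changes in this patch, the setup that the revised Spam Package node describes can be sketched roughly as follows: enable the spam back ends, run `spam-initialize`, and call `spam-split` from a fancy split rule. This is only an illustration assembled from the examples quoted in the patch, not part of the commit. It assumes an nnimap back end for which fancy splitting has already been selected, and the group names "regex-spam", "ding" and "mail" are the manual's own placeholders.

```elisp
;; Sketch only: assumes Gnus with spam.el available and an nnimap
;; back end that already uses fancy mail splitting.

;; Enable the spam back ends first; the manual notes that spam.el does
;; some conditional loading based on which spam-use-* variables are set.
(setq spam-use-regex-headers t
      spam-use-blackholes t)

;; Autoload spam.el and install its hooks.
(spam-initialize)

;; Group that receives detected spam.  It must be an unqualified name:
;; on server "your-server" this means "nnimap+your-server:spam".
(setq spam-split-group "spam")

(setq nnimap-split-fancy
      '(|
        ;; Spam caught by the regex-headers check goes to "regex-spam".
        (: spam-split "regex-spam" 'spam-use-regex-headers)
        ;; Mailing-list rule, applied before the remaining spam checks.
        (any "ding" "ding")
        ;; All other enabled back ends; spam goes to spam-split-group.
        (: spam-split)
        ;; Default mailbox.
        "mail"))
```

With nnmail instead of nnimap, the same rule would go into `nnmail-split-fancy`, as the revised text notes.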
diff --git a/man/ChangeLog b/man/ChangeLog
index 8f5e306d290..100920e311a 100644
--- a/man/ChangeLog
+++ b/man/ChangeLog
@@ -1,3 +1,13 @@
+2006-04-20 Reiner Steib <Reiner.Steib@gmx.de>
+
+ * gnus.texi (Spam Statistics Package): Fix typo in @pxref.
+ (Splitting mail using spam-stat): Fix @xref.
+
+2006-04-20 Chong Yidong <cyd@stupidchicken.com>
+
+ * gnus.texi (Spam Package): Major revision of the text. Previouly
+ this node was "Filtering Spam Using The Spam ELisp Package".
+
 2006-04-20 Carsten Dominik <dominik@science.uva.nl>
 
  * org.texi: (Time stamps): Better explanation of the purpose of
@@ -8,7 +18,7 @@
 2006-04-18 J.D. Smith <jdsmith@as.arizona.edu>
 
  * misc.texi (Shell Ring): Added notes on saved input when
  navigating off the end of the history list.
 
 2006-04-18 Chong Yidong <cyd@mit.edu>
 
diff --git a/man/gnus.texi b/man/gnus.texi
index 75e6243ba5e..2f1a7322dc0 100644
--- a/man/gnus.texi
+++ b/man/gnus.texi
@@ -799,7 +799,8 @@ Various
 * Moderation:: What to do if you're a moderator.
 * Image Enhancements:: Modern versions of Emacs/XEmacs can display images.
 * Fuzzy Matching:: What's the big fuzz?
-* Thwarting Email Spam:: A how-to on avoiding unsolicited commercial email.
+* Thwarting Email Spam:: Simple ways to avoid unsolicited commercial email.
+* Spam Package:: A package for filtering and processing spam.
 * Other modes:: Interaction with other modes.
 * Various Various:: Things that are really various.
 
@@ -818,7 +819,8 @@ Image Enhancements
 
 * X-Face:: Display a funky, teensy black-and-white image.
 * Face:: Display a funkier, teensier colored image.
-* Smileys:: Show all those happy faces the way they were meant to be shown.
+* Smileys:: Show all those happy faces the way they were
+ meant to be shown.
 * Picons:: How to display pictures of what you're reading.
 * XVarious:: Other XEmacsy Gnusey variables.
 
@@ -828,28 +830,19 @@ Thwarting Email Spam
 * Anti-Spam Basics:: Simple steps to reduce the amount of spam.
 * SpamAssassin:: How to use external anti-spam tools.
 * Hashcash:: Reduce spam by burning CPU time.
-* Filtering Spam Using The Spam ELisp Package::
-* Filtering Spam Using Statistics with spam-stat::
 
-Filtering Spam Using The Spam ELisp Package
+Spam Package
 
-* Spam ELisp Package Sequence of Events::
-* Spam ELisp Package Filtering of Incoming Mail::
-* Spam ELisp Package Global Variables::
-* Spam ELisp Package Configuration Examples::
-* Blacklists and Whitelists::
-* BBDB Whitelists::
-* Gmane Spam Reporting::
-* Anti-spam Hashcash Payments::
-* Blackholes::
-* Regular Expressions Header Matching::
-* Bogofilter::
-* ifile spam filtering::
-* spam-stat spam filtering::
-* SpamOracle::
-* Extending the Spam ELisp package::
+* Spam Package Introduction::
+* Filtering Incoming Mail::
+* Detecting Spam in Groups::
+* Spam and Ham Processors::
+* Spam Package Configuration Examples::
+* Spam Back Ends::
+* Extending the Spam package::
+* Spam Statistics Package::
 
-Filtering Spam Using Statistics with spam-stat
+Spam Statistics Package
 
 * Creating a spam-stat dictionary::
 * Splitting mail using spam-stat::
@@ -20797,7 +20790,8 @@ four days, Gnus will decay the scores four times, for instance.
 * Fetching a Group:: Starting Gnus just to read a group.
 * Image Enhancements:: Modern versions of Emacs/XEmacs can display images.
 * Fuzzy Matching:: What's the big fuzz?
-* Thwarting Email Spam:: A how-to on avoiding unsolicited commercial email.
+* Thwarting Email Spam:: Simple ways to avoid unsolicited commercial email.
+* Spam Package:: A package for filtering and processing spam.
 * Other modes:: Interaction with other modes.
 * Various Various:: Things that are really various.
 @end menu
@@ -22479,8 +22473,6 @@ This is annoying. Here's what you can do about it.
 * Anti-Spam Basics:: Simple steps to reduce the amount of spam.
 * SpamAssassin:: How to use external anti-spam tools.
 * Hashcash:: Reduce spam by burning CPU time.
-* Filtering Spam Using The Spam ELisp Package::
-* Filtering Spam Using Statistics with spam-stat::
 @end menu
 
 @node The problem of spam
@@ -22796,41 +22788,107 @@ hashcash cookies, it is expected that this is performed by your hand
 customized mail filtering scripts. Improvements in this area would be
 a useful contribution, however.
 
-@node Filtering Spam Using The Spam ELisp Package
-@subsection Filtering Spam Using The Spam ELisp Package
+@node Spam Package
+@section Spam Package
+@cindex spam filtering
+@cindex spam
+
+The Spam package provides Gnus with a centralized mechanism for
+detecting and filtering spam. It filters new mail, and processes
+messages according to whether they are spam or ham. (@dfn{Ham} is the
+name used throughout this manual to indicate non-spam messages.)
+
+@menu
+* Spam Package Introduction::
+* Filtering Incoming Mail::
+* Detecting Spam in Groups::
+* Spam and Ham Processors::
+* Spam Package Configuration Examples::
+* Spam Back Ends::
+* Extending the Spam package::
+* Spam Statistics Package::
+@end menu
+
+@node Spam Package Introduction
+@subsection Spam Package Introduction
 @cindex spam filtering
+@cindex spam filtering sequence of events
 @cindex spam
 
-The idea behind @file{spam.el} is to have a control center for spam detection
-and filtering in Gnus. To that end, @file{spam.el} does two things: it
-filters new mail, and it analyzes mail known to be spam or ham.
-@dfn{Ham} is the name used throughout @file{spam.el} to indicate
-non-spam messages.
+You must read this section to understand how the Spam package works.
+Do not skip, speed-read, or glance through this section.
 
 @cindex spam-initialize
-First of all, you @strong{must} run the function
-@code{spam-initialize} to autoload @code{spam.el} and to install the
-@code{spam.el} hooks. There is one exception: if you use the
-@code{spam-use-stat} (@pxref{spam-stat spam filtering}) setting, you
-should turn it on before @code{spam-initialize}:
+@vindex spam-use-stat
+To use the Spam package, you @strong{must} first run the function
+@code{spam-initialize}:
 
 @example
-(setq spam-use-stat t) ;; if needed
 (spam-initialize)
 @end example
 
-So, what happens when you load @file{spam.el}?
-
-First, some hooks will get installed by @code{spam-initialize}. There
-are some hooks for @code{spam-stat} so it can save its databases, and
-there are hooks so interesting things will happen when you enter and
-leave a group. More on the sequence of events later (@pxref{Spam
-ELisp Package Sequence of Events}).
-
-You get the following keyboard commands:
+This autoloads @code{spam.el} and installs the various hooks necessary
+to let the Spam package do its job. In order to make use of the Spam
+package, you have to set up certain group parameters and variables,
+which we will describe below. All of the variables controlling the
+Spam package can be found in the @samp{spam} customization group.
+
+There are two ``contact points'' between the Spam package and the rest
+of Gnus: checking new mail for spam, and leaving a group.
+
+Checking new mail for spam is done in one of two ways: while splitting
+incoming mail, or when you enter a group.
+
+The first way, checking for spam while splitting incoming mail, is
+suited to mail back ends such as @code{nnml} or @code{nnimap}, where
+new mail appears in a single spool file. The Spam package processes
+incoming mail, and sends mail considered to be spam to a designated
+``spam'' group. @xref{Filtering Incoming Mail}.
+
+The second way is suited to back ends such as @code{nntp}, which have
+no incoming mail spool, or back ends where the server is in charge of
+splitting incoming mail. In this case, when you enter a Gnus group,
+the unseen or unread messages in that group are checked for spam.
+Detected spam messages are marked as spam. @xref{Detecting Spam in
+Groups}.
+
+@cindex spam back ends
+In either case, you have to tell the Spam package what method to use
+to detect spam messages. There are several methods, or @dfn{spam back
+ends} (not to be confused with Gnus back ends!) to choose from: spam
+``blacklists'' and ``whitelists'', dictionary-based filters, and so
+forth. @xref{Spam Back Ends}.
+
+In the Gnus summary buffer, messages that have been identified as spam
+always appear with a @samp{$} symbol.
+
+The Spam package divides Gnus groups into three categories: ham
+groups, spam groups, and unclassified groups. You should mark each of
+the groups you subscribe to as either a ham group or a spam group,
+using the @code{spam-contents} group parameter (@pxref{Group
+Parameters}). Spam groups have a special property: when you enter a
+spam group, all unseen articles are marked as spam. Thus, mail split
+into a spam group is automatically marked as spam.
+
+Identifying spam messages is only half of the Spam package's job. The
+second half comes into play whenever you exit a group buffer. At this
+point, the Spam package does several things:
+
+First, it calls @dfn{spam and ham processors} to process the articles
+according to whether they are spam or ham. There is a pair of spam
+and ham processors associated with each spam back end, and what the
+processors do depends on the back end. At present, the main role of
+spam and ham processors is for dictionary-based spam filters: they add
+the contents of the messages in the group to the filter's dictionary,
+to improve its ability to detect future spam. The @code{spam-process}
+group parameter specifies what spam processors to use. @xref{Spam and
+Ham Processors}.
+
+If the spam filter failed to mark a spam message, you can mark it
+yourself, so that the message is processed as spam when you exit the
+group:
 
 @table @kbd
-
 @item M-d
 @itemx M s x
 @itemx S x
@@ -22838,189 +22896,103 @@ You get the following keyboard commands:
 @kindex S x
 @kindex M s x
 @findex gnus-summary-mark-as-spam
-@code{gnus-summary-mark-as-spam}.
-
-Mark current article as spam, showing it with the @samp{$} mark.
-Whenever you see a spam article, make sure to mark its summary line
-with @kbd{M-d} before leaving the group. This is done automatically
-for unread articles in @emph{spam} groups.
-
-@item M s t
-@itemx S t
-@kindex M s t
-@kindex S t
-@findex spam-bogofilter-score
-@code{spam-bogofilter-score}.
-
-You must have Bogofilter installed for that command to work properly.
-
-@xref{Bogofilter}.
-
+@findex gnus-summary-mark-as-spam
+Mark current article as spam, showing it with the @samp{$} mark
+(@code{gnus-summary-mark-as-spam}).
 @end table
 
-Also, when you load @file{spam.el}, you will be able to customize its
-variables. Try @code{customize-group} on the @samp{spam} variable
-group.
-
-@menu
-* Spam ELisp Package Sequence of Events::
-* Spam ELisp Package Filtering of Incoming Mail::
-* Spam ELisp Package Global Variables::
-* Spam ELisp Package Configuration Examples::
-* Blacklists and Whitelists::
-* BBDB Whitelists::
-* Gmane Spam Reporting::
-* Anti-spam Hashcash Payments::
-* Blackholes::
-* Regular Expressions Header Matching::
-* Bogofilter::
-* ifile spam filtering::
-* spam-stat spam filtering::
-* SpamOracle::
-* Extending the Spam ELisp package::
-@end menu
-
-@node Spam ELisp Package Sequence of Events
-@subsubsection Spam ELisp Package Sequence of Events
-@cindex spam filtering
-@cindex spam filtering sequence of events
-@cindex spam
-
-You must read this section to understand how @code{spam.el} works.
-Do not skip, speed-read, or glance through this section.
-
-There are two @emph{contact points}, if you will, between
-@code{spam.el} and the rest of Gnus: checking new mail for spam, and
-leaving a group.
-
-Getting new mail is done in one of two ways. You can either split
-your incoming mail or you can classify new articles as ham or spam
-when you enter the group.
-
-Splitting incoming mail is better suited to mail backends such as
-@code{nnml} or @code{nnimap} where new mail appears in a single file
-called a @dfn{Spool File}. See @xref{Spam ELisp Package Filtering of
-Incoming Mail}.
-
-For backends such as @code{nntp} there is no incoming mail spool, so
-an alternate mechanism must be used. This may also happen for
-backends where the server is in charge of splitting incoming mail, and
-Gnus does not do further splitting. The @code{spam-autodetect} and
-@code{spam-autodetect-methods} group parameters (accessible with
-@kbd{G c} and @kbd{G p} as usual), and the corresponding variables
-@code{gnus-spam-autodetect-methods} and
-@code{gnus-spam-autodetect-methods} (accessible with @kbd{M-x
-customize-variable} as usual).
-
-When @code{spam-autodetect} is used, it hooks into the process of
-entering a group. Thus, entering a group with unseen or unread
-articles becomes the substitute for checking incoming mail. Whether
-only unseen articles or all unread articles will be processed is
-determined by the @code{spam-autodetect-recheck-messages}. When set
-to @code{t}, unread messages will be rechecked.
-
-@code{spam-autodetect} grants the user at once more and less control
-of spam filtering. The user will have more control over each group's
-spam methods, so for instance the @samp{ding} group may have
-@code{spam-use-BBDB} as the autodetection method, while the
-@samp{suspect} group may have the @code{spam-use-blacklist} and
-@code{spam-use-bogofilter} methods enabled. Every article detected to
-be spam will be marked with the spam mark @samp{$} and processed on
-exit from the group as normal spam. The user has less control over
-the @emph{sequence} of checks, as he might with @code{spam-split}.
-
-When the newly split mail goes into groups, or messages are
-autodetected to be ham or spam, those groups must be exited (after
-entering, if needed) for further spam processing to happen. It
-matters whether the group is considered a ham group, a spam group, or
-is unclassified, based on its @code{spam-content} parameter
-(@pxref{Spam ELisp Package Global Variables}). Spam groups have the
-additional characteristic that, when entered, any unseen or unread
-articles (depending on the @code{spam-mark-only-unseen-as-spam}
-variable) will be marked as spam. Thus, mail split into a spam group
-gets automatically marked as spam when you enter the group.
-
-So, when you exit a group, the @code{spam-processors} are applied, if
-any are set, and the processed mail is moved to the
-@code{ham-process-destination} or the @code{spam-process-destination}
-depending on the article's classification. If the
-@code{ham-process-destination} or the @code{spam-process-destination},
-whichever is appropriate, are @code{nil}, the article is left in the
-current group.
-
-If a spam is found in any group (this can be changed to only non-spam
-groups with @code{spam-move-spam-nonspam-groups-only}), it is
-processed by the active @code{spam-processors} (@pxref{Spam ELisp
-Package Global Variables}) when the group is exited. Furthermore, the
-spam is moved to the @code{spam-process-destination} (@pxref{Spam
-ELisp Package Global Variables}) for further training or deletion.
-You have to load the @code{gnus-registry.el} package and enable the
-@code{spam-log-to-registry} variable if you want spam to be processed
-no more than once. Thus, spam is detected and processed everywhere,
-which is what most people want. If the
-@code{spam-process-destination} is @code{nil}, the spam is marked as
-expired, which is usually the right thing to do.
-
-If spam can not be moved---because of a read-only backend such as
-@acronym{NNTP}, for example, it will be copied.
+@noindent
+Similarly, you can unmark an article if it has been erroneously marked
+as spam. @xref{Setting Marks}.
 
-If a ham mail is found in a ham group, as determined by the
-@code{ham-marks} parameter, it is processed as ham by the active ham
-@code{spam-processor} when the group is exited. With the variables
+Normally, a ham message found in a non-ham group is not processed as
+ham---the rationale is that it should be moved into a ham group for
+further processing (see below). However, you can force these articles
+to be processed as ham by setting
 @code{spam-process-ham-in-spam-groups} and
-@code{spam-process-ham-in-nonham-groups} the behavior can be further
-altered so ham found anywhere can be processed. You have to load the
-@code{gnus-registry.el} package and enable the
-@code{spam-log-to-registry} variable if you want ham to be processed
-no more than once. Thus, ham is detected and processed only when
-necessary, which is what most people want. More on this in
-@xref{Spam ELisp Package Configuration Examples}.
+@code{spam-process-ham-in-nonham-groups}.
 
-If ham can not be moved---because of a read-only backend such as
-@acronym{NNTP}, for example, it will be copied.
+@vindex gnus-ham-process-destinations
+@vindex gnus-spam-process-destinations
+The second thing that the Spam package does when you exit a group is
+to move ham articles out of spam groups, and spam articles out of ham
+groups. Ham in a spam group is moved to the group specified by the
+variable @code{gnus-ham-process-destinations}, or the group parameter
+@code{ham-process-destination}. Spam in a ham group is moved to the
+group specified by the variable @code{gnus-spam-process-destinations},
+or the group parameter @code{spam-process-destination}. If these
+variables are not set, the articles are left in their current group.
+If an article cannot not be moved (e.g., with a read-only backend such
+as @acronym{NNTP}), it is copied.
+
+If an article is moved to another group, it is processed again when
+you visit the new group. Normally, this is not a problem, but if you
+want each article to be processed only once, load the
+@code{gnus-registry.el} package and set the variable
+@code{spam-log-to-registry} to @code{t}. @xref{Spam Package
+Configuration Examples}.
+
+Normally, spam groups ignore @code{gnus-spam-process-destinations}.
+However, if you set @code{spam-move-spam-nonspam-groups-only} to
+@code{nil}, spam will also be moved out of spam groups, depending on
+the @code{spam-process-destination} parameter.
+
+The final thing the Spam package does is to mark spam articles as
+expired, which is usually the right thing to do.
 
 If all this seems confusing, don't worry. Soon it will be as natural
 as typing Lisp one-liners on a neural interface@dots{} err, sorry, that's
 50 years in the future yet. Just trust us, it's not so bad.
 
-@node Spam ELisp Package Filtering of Incoming Mail
-@subsubsection Spam ELisp Package Filtering of Incoming Mail
+@node Filtering Incoming Mail
+@subsection Filtering Incoming Mail
 @cindex spam filtering
 @cindex spam filtering incoming mail
 @cindex spam
 
-To use the @file{spam.el} facilities for incoming mail filtering, you
-must add the following to your fancy split list
-@code{nnmail-split-fancy} or @code{nnimap-split-fancy}:
+To use the Spam package to filter incoming mail, you must first set up
+fancy mail splitting. @xref{Fancy Mail Splitting}. The Spam package
+defines a special splitting function that you can add to your fancy
+split variable (either @code{nnmail-split-fancy} or
+@code{nnimap-split-fancy}, depending on your mail back end):
 
 @example
 (: spam-split)
 @end example
 
-Note that the fancy split may be called @code{nnmail-split-fancy} or
-@code{nnimap-split-fancy}, depending on whether you use the nnmail or
-nnimap back ends to retrieve your mail.
-
-Also, @code{spam-split} will not modify incoming mail in any way.
-
-The @code{spam-split} function will process incoming mail and send the
-mail considered to be spam into the group name given by the variable
-@code{spam-split-group}. By default that group name is @samp{spam},
-but you can customize @code{spam-split-group}. Make sure the contents
-of @code{spam-split-group} are an @emph{unqualified} group name, for
-instance in an @code{nnimap} server @samp{your-server} the value
-@samp{spam} will turn out to be @samp{nnimap+your-server:spam}. The
-value @samp{nnimap+server:spam}, therefore, is wrong and will
-actually give you the group
-@samp{nnimap+your-server:nnimap+server:spam} which may or may not
-work depending on your server's tolerance for strange group names.
-
-You can also give @code{spam-split} a parameter,
-e.g. @code{spam-use-regex-headers} or @code{"maybe-spam"}. Why is
-this useful?
+@vindex spam-split-group
+@noindent
+The @code{spam-split} function scans incoming mail according to your
+chosen spam back end(s), and sends messages identified as spam to a
+spam group. By default, the spam group is a group named @samp{spam},
+but you can change this by customizing @code{spam-split-group}. Make
+sure the contents of @code{spam-split-group} are an unqualified group
+name. For instance, in an @code{nnimap} server @samp{your-server},
+the value @samp{spam} means @samp{nnimap+your-server:spam}. The value
+@samp{nnimap+server:spam} is therefore wrong---it gives the group
+@samp{nnimap+your-server:nnimap+server:spam}.
+
+@code{spam-split} does not modify the contents of messages in any way.
 
-Take these split rules (with @code{spam-use-regex-headers} and
-@code{spam-use-blackholes} set):
+@vindex nnimap-split-download-body
+Note for IMAP users: if you use the @code{spam-check-bogofilter},
+@code{spam-check-ifile}, and @code{spam-check-stat} spam back ends,
+you should also set set the variable @code{nnimap-split-download-body}
+to @code{t}. These spam back ends are most useful when they can
+``scan'' the full message body. By default, the nnimap back end only
+retrieves the message headers; @code{nnimap-split-download-body} tells
+it to retrieve the message bodies as well. We don't set this by
+default because it will slow @acronym{IMAP} down, and that is not an
+appropriate decision to make on behalf of the user. @xref{Splitting
+in IMAP}.
+
+You have to specify one or more spam back ends for @code{spam-split}
+to use, by setting the @code{spam-use-*} variables. @xref{Spam Back
+Ends}. Normally, @code{spam-split} simply uses all the spam back ends
+you enabled in this way. However, you can tell @code{spam-split} to
+use only some of them. Why this is useful? Suppose you are using the
+@code{spam-use-regex-headers} and @code{spam-use-blackholes} spam back
+ends, and the following split rule:
 
 @example
  nnimap-split-fancy '(|
@@ -23030,21 +23002,23 @@ Take these split rules (with @code{spam-use-regex-headers} and
  "mail")
 @end example
 
-Now, the problem is that you want all ding messages to make it to the
-ding folder. But that will let obvious spam (for example, spam
-detected by SpamAssassin, and @code{spam-use-regex-headers}) through,
-when it's sent to the ding list. On the other hand, some messages to
-the ding list are from a mail server in the blackhole list, so the
-invocation of @code{spam-split} can't be before the ding rule.
-
-You can let SpamAssassin headers supersede ding rules, but all other
-@code{spam-split} rules (including a second invocation of the
-regex-headers check) will be after the ding rule:
+@noindent
+The problem is that you want all ding messages to make it to the ding
+folder. But that will let obvious spam (for example, spam detected by
+SpamAssassin, and @code{spam-use-regex-headers}) through, when it's
+sent to the ding list. On the other hand, some messages to the ding
+list are from a mail server in the blackhole list, so the invocation
+of @code{spam-split} can't be before the ding rule.
+
+The solution is to let SpamAssassin headers supersede ding rules, and
+perform the other @code{spam-split} rules (including a second
+invocation of the regex-headers check) after the ding rule. This is
+done by passing a parameter to @code{spam-split}:
 
 @example
 nnimap-split-fancy
  '(|
- ;; @r{all spam detected by @code{spam-use-regex-headers} goes to @samp{regex-spam}}
+ ;; @r{spam detected by @code{spam-use-regex-headers} goes to @samp{regex-spam}}
  (: spam-split "regex-spam" 'spam-use-regex-headers)
  (any "ding" "ding")
  ;; @r{all other spam detected by spam-split goes to @code{spam-split-group}}
@@ -23053,58 +23027,68 @@ nnimap-split-fancy
  "mail")
 @end example
 
+@noindent
 This lets you invoke specific @code{spam-split} checks depending on
-your particular needs, and to target the results of those checks to a
+your particular needs, and target the results of those checks to a
 particular spam group. You don't have to throw all mail into all the
 spam tests. Another reason why this is nice is that messages to
 mailing lists you have rules for don't have to have resource-intensive
 blackhole checks performed on them. You could also specify different
 spam checks for your nnmail split vs. your nnimap split. Go crazy.
 
-You should still have specific checks such as
-@code{spam-use-regex-headers} set to @code{t}, even if you
-specifically invoke @code{spam-split} with the check. The reason is
-that when loading @file{spam.el}, some conditional loading is done
-depending on what @code{spam-use-xyz} variables you have set. This
-is usually not critical, though.
-
-@emph{Note for IMAP users}
+You should set the @code{spam-use-*} variables for whatever spam back
+ends you intend to use. The reason is that when loading
+@file{spam.el}, some conditional loading is done depending on what
+@code{spam-use-xyz} variables you have set. @xref{Spam Back Ends}.
+
+@c @emph{TODO: spam.el needs to provide a uniform way of training all the
+@c statistical databases. Some have that functionality built-in, others
+@c don't.}
 
-The boolean variable @code{nnimap-split-download-body} needs to be
-set, if you want to split based on the whole message instead of just
-the headers. By default, the nnimap back end will only retrieve the
-message headers. If you use @code{spam-check-bogofilter},
-@code{spam-check-ifile}, or @code{spam-check-stat} (the splitters that
-can benefit from the full message body), you should set this variable.
-It is not set by default because it will slow @acronym{IMAP} down, and
-that is not an appropriate decision to make on behalf of the user.
-
-@xref{Splitting in IMAP}.
-
-@emph{TODO: spam.el needs to provide a uniform way of training all the
-statistical databases. Some have that functionality built-in, others
-don't.}
-
-@node Spam ELisp Package Global Variables
-@subsubsection Spam ELisp Package Global Variables
+@node Detecting Spam in Groups
+@subsection Detecting Spam in Groups
+
+To detect spam when visiting a group, set the group's
+@code{spam-autodetect} and @code{spam-autodetect-methods} group
+parameters. These are accessible with @kbd{G c} or @kbd{G p}, as
+usual (@pxref{Group Parameters}).
+
+You should set the @code{spam-use-*} variables for whatever spam back
+ends you intend to use. The reason is that when loading
+@file{spam.el}, some conditional loading is done depending on what
+@code{spam-use-xyz} variables you have set.
+
+By default, only unseen articles are processed for spam. You can
+force Gnus to recheck all messages in the group by setting the
+variable @code{spam-autodetect-recheck-messages} to @code{t}.
+
+If you use the @code{spam-autodetect} method of checking for spam, you
+can specify different spam detection methods for different groups.
+For instance, the @samp{ding} group may have @code{spam-use-BBDB} as
+the autodetection method, while the @samp{suspect} group may have the
+@code{spam-use-blacklist} and @code{spam-use-bogofilter} methods
+enabled. Unlike with @code{spam-split}, you don't have any control
+over the @emph{sequence} of checks, but this is probably unimportant.
+
+@node Spam and Ham Processors
+@subsection Spam and Ham Processors
 @cindex spam filtering
 @cindex spam filtering variables
23092@cindex spam variables 23077@cindex spam variables
23093@cindex spam 23078@cindex spam
23094 23079
23095@vindex gnus-spam-process-newsgroups 23080@vindex gnus-spam-process-newsgroups
23096The concepts of ham processors and spam processors are very important. 23081Spam and ham processors specify special actions to take when you exit
23097Ham processors and spam processors for a group can be set with the 23082a group buffer. Spam processors act on spam messages, and ham
23098@code{spam-process} group parameter, or the 23083processors on ham messages. At present, the main role of these
23099@code{gnus-spam-process-newsgroups} variable. Ham processors take 23084processors is to update the dictionaries of dictionary-based spam back
23100mail known to be non-spam (@emph{ham}) and process it in some way so 23085ends such as Bogofilter (@pxref{Bogofilter}) and the Spam Statistics
23101that later similar mail will also be considered non-spam. Spam 23086package (@pxref{Spam Statistics Filtering}).
23102processors take mail known to be spam and process it so similar spam 23087
23103will be detected later. 23088The spam and ham processors that apply to each group are determined by
23104 23089the group's @code{spam-process} group parameter. If this group
23105The format of the spam or ham processor entry used to be a symbol, 23090parameter is not defined, they are determined by the variable
23106but now it is a @sc{cons} cell. See the individual spam processor entries 23091@code{gnus-spam-process-newsgroups}.
23107for more information.
23108 23092
23109@vindex gnus-spam-newsgroup-contents 23093@vindex gnus-spam-newsgroup-contents
23110Gnus learns from the spam you get. You have to collect your spam in 23094Gnus learns from the spam you get. You have to collect your spam in
@@ -23258,8 +23242,8 @@ When autodetecting spam, this variable tells @code{spam.el} whether
23258only unseen articles or all unread articles should be checked for 23242only unseen articles or all unread articles should be checked for
23259spam. It is recommended that you leave it off. 23243spam. It is recommended that you leave it off.
23260 23244
23261@node Spam ELisp Package Configuration Examples 23245@node Spam Package Configuration Examples
23262@subsubsection Spam ELisp Package Configuration Examples 23246@subsection Spam Package Configuration Examples
23263@cindex spam filtering 23247@cindex spam filtering
23264@cindex spam filtering configuration examples 23248@cindex spam filtering configuration examples
23265@cindex spam configuration examples 23249@cindex spam configuration examples
@@ -23384,11 +23368,11 @@ bogofilter or DCC).
23384 23368
23385Because of the @code{gnus-group-spam-classification-spam} entry, all 23369Because of the @code{gnus-group-spam-classification-spam} entry, all
23386messages are marked as spam (with @code{$}). When I find a false 23370messages are marked as spam (with @code{$}). When I find a false
23387positive, I mark the message with some other ham mark (@code{ham-marks}, 23371positive, I mark the message with some other ham mark
23388@ref{Spam ELisp Package Global Variables}). On group exit, those 23372(@code{ham-marks}, @ref{Spam and Ham Processors}). On group exit,
23389messages are copied to both groups, @samp{INBOX} (where I want to have 23373those messages are copied to both groups, @samp{INBOX} (where I want
23390the article) and @samp{training.ham} (for training bogofilter) and 23374to have the article) and @samp{training.ham} (for training bogofilter)
23391deleted from the @samp{spam.detected} folder. 23375and deleted from the @samp{spam.detected} folder.
23392 23376
23393The @code{gnus-article-sort-by-chars} entry simplifies detection of 23377The @code{gnus-article-sort-by-chars} entry simplifies detection of
23394false positives for me. I receive lots of worms (sweN, @dots{}), that all 23378false positives for me. I receive lots of worms (sweN, @dots{}), that all
@@ -23424,6 +23408,29 @@ through my local news server (leafnode). I.e. the article numbers are
23424not the same as on news.gmane.org, thus @code{spam-report.el} has to check 23408not the same as on news.gmane.org, thus @code{spam-report.el} has to check
23425the @code{X-Report-Spam} header to find the correct number. 23409the @code{X-Report-Spam} header to find the correct number.
23426 23410
23411@node Spam Back Ends
23412@subsection Spam Back Ends
23413@cindex spam back ends
23414
23415The spam package offers a variety of back ends for detecting spam.
23416Each back end defines a set of methods for detecting spam
23417(@pxref{Filtering Incoming Mail}, @pxref{Detecting Spam in Groups}),
23418and a pair of spam and ham processors (@pxref{Spam and Ham
23419Processors}).
23420
23421@menu
23422* Blacklists and Whitelists::
23423* BBDB Whitelists::
23424* Gmane Spam Reporting::
23425* Anti-spam Hashcash Payments::
23426* Blackholes::
23427* Regular Expressions Header Matching::
23428* Bogofilter::
23429* ifile spam filtering::
23430* Spam Statistics Filtering::
23431* SpamOracle::
23432@end menu
23433
23427@node Blacklists and Whitelists 23434@node Blacklists and Whitelists
23428@subsubsection Blacklists and Whitelists 23435@subsubsection Blacklists and Whitelists
23429@cindex spam filtering 23436@cindex spam filtering
@@ -23728,6 +23735,15 @@ You should not enable this if you use @code{spam-use-bogofilter-headers}.
23728 23735
23729@end defvar 23736@end defvar
23730 23737
23738@table @kbd
23739@item M s t
23740@itemx S t
23741@kindex M s t
23742@kindex S t
23743@findex spam-bogofilter-score
23744Get the Bogofilter spamicity score (@code{spam-bogofilter-score}).
23745@end table
23746
23731@defvar spam-use-bogofilter-headers 23747@defvar spam-use-bogofilter-headers
23732 23748
23733Set this variable if you want @code{spam-split} to use Eric Raymond's 23749Set this variable if you want @code{spam-split} to use Eric Raymond's
@@ -23829,20 +23845,21 @@ purpose. A ham and a spam processor are provided, plus the
23829should be used. The 1.2.1 version of ifile was used to test this 23845should be used. The 1.2.1 version of ifile was used to test this
23830functionality. 23846functionality.
23831 23847
23832@node spam-stat spam filtering 23848@node Spam Statistics Filtering
23833@subsubsection spam-stat spam filtering 23849@subsubsection Spam Statistics Filtering
23834@cindex spam filtering 23850@cindex spam filtering
23835@cindex spam-stat, spam filtering 23851@cindex spam-stat, spam filtering
23836@cindex spam-stat 23852@cindex spam-stat
23837@cindex spam 23853@cindex spam
23838 23854
23839@xref{Filtering Spam Using Statistics with spam-stat}. 23855This back end uses the Spam Statistics Emacs Lisp package to perform
23856statistics-based filtering (@pxref{Spam Statistics Package}). Before
23857using this, you may want to perform some additional steps to
23858initialize your Spam Statistics dictionary. @xref{Creating a
23859spam-stat dictionary}.
23840 23860
23841@defvar spam-use-stat 23861@defvar spam-use-stat
23842 23862
23843Enable this variable if you want @code{spam-split} to use
23844spam-stat.el, an Emacs Lisp statistical analyzer.
23845
23846@end defvar 23863@end defvar
23847 23864
23848@defvar gnus-group-spam-exit-processor-stat 23865@defvar gnus-group-spam-exit-processor-stat
@@ -23902,18 +23919,17 @@ One possibility is to run SpamOracle as a @code{:prescript} from the
23902@xref{Mail Source Specifiers}, (@pxref{SpamAssassin}). This method has 23919@xref{Mail Source Specifiers}, (@pxref{SpamAssassin}). This method has
23903the advantage that the user can see the @emph{X-Spam} headers. 23920the advantage that the user can see the @emph{X-Spam} headers.
23904 23921
23905The easiest method is to make @file{spam.el} (@pxref{Filtering Spam 23922The easiest method is to make @file{spam.el} (@pxref{Spam Package})
23906Using The Spam ELisp Package}) call SpamOracle. 23923call SpamOracle.
23907 23924
23908@vindex spam-use-spamoracle 23925@vindex spam-use-spamoracle
23909To enable SpamOracle usage by @file{spam.el}, set the variable 23926To enable SpamOracle usage by @file{spam.el}, set the variable
23910@code{spam-use-spamoracle} to @code{t} and configure the 23927@code{spam-use-spamoracle} to @code{t} and configure the
23911@code{nnmail-split-fancy} or @code{nnimap-split-fancy} as described in 23928@code{nnmail-split-fancy} or @code{nnimap-split-fancy}. @xref{Spam
23912the section @xref{Filtering Spam Using The Spam ELisp Package}. In 23929Package}. In this example the @samp{INBOX} of an nnimap server is
23913this example the @samp{INBOX} of an nnimap server is filtered using 23930filtered using SpamOracle. Mails recognized as spam mails will be
23914SpamOracle. Mails recognized as spam mails will be moved to 23931moved to @code{spam-split-group}, @samp{Junk} in this case. Ham
23915@code{spam-split-group}, @samp{Junk} in this case. Ham messages stay 23932messages stay in @samp{INBOX}:
23916in @samp{INBOX}:
23917 23933
23918@example 23934@example
23919(setq spam-use-spamoracle t 23935(setq spam-use-spamoracle t
@@ -23945,14 +23961,14 @@ database to live somewhere special, set
23945 23961
23946SpamOracle employs a statistical algorithm to determine whether a 23962SpamOracle employs a statistical algorithm to determine whether a
23947message is spam or ham. In order to get good results, meaning few 23963message is spam or ham. In order to get good results, meaning few
23948false hits or misses, SpamOracle needs training. SpamOracle learns the 23964false hits or misses, SpamOracle needs training. SpamOracle learns
23949characteristics of your spam mails. Using the @emph{add} mode 23965the characteristics of your spam mails. Using the @emph{add} mode
23950(training mode) one has to feed good (ham) and spam mails to 23966(training mode) one has to feed good (ham) and spam mails to
23951SpamOracle. This can be done by pressing @kbd{|} in the Summary buffer 23967SpamOracle. This can be done by pressing @kbd{|} in the Summary
23952and pipe the mail to a SpamOracle process or using @file{spam.el}'s 23968buffer and pipe the mail to a SpamOracle process or using
23953spam- and ham-processors, which is much more convenient. For a 23969@file{spam.el}'s spam- and ham-processors, which is much more
23954detailed description of spam- and ham-processors, @xref{Filtering Spam 23970convenient. For a detailed description of spam- and ham-processors,
23955Using The Spam ELisp Package}. 23971@xref{Spam Package}.
23956 23972
23957@defvar gnus-group-spam-exit-processor-spamoracle 23973@defvar gnus-group-spam-exit-processor-spamoracle
23958Add this symbol to a group's @code{spam-process} parameter by 23974Add this symbol to a group's @code{spam-process} parameter by
@@ -24001,8 +24017,8 @@ the user marks some messages as spam messages, these messages will be
24001processed by SpamOracle. The processor sends the messages to 24017processed by SpamOracle. The processor sends the messages to
24002SpamOracle as new samples for spam. 24018SpamOracle as new samples for spam.
24003 24019
24004@node Extending the Spam ELisp package 24020@node Extending the Spam package
24005@subsubsection Extending the Spam ELisp package 24021@subsection Extending the Spam package
24006@cindex spam filtering 24022@cindex spam filtering
24007@cindex spam elisp package, extending 24023@cindex spam elisp package, extending
24008@cindex extending the spam elisp package 24024@cindex extending the spam elisp package
@@ -24109,9 +24125,8 @@ to the @code{spam-autodetect-methods} group parameter in
24109 24125
24110@end enumerate 24126@end enumerate
24111 24127
24112 24128@node Spam Statistics Package
24113@node Filtering Spam Using Statistics with spam-stat 24129@subsection Spam Statistics Package
24114@subsection Filtering Spam Using Statistics with spam-stat
24115@cindex Paul Graham 24130@cindex Paul Graham
24116@cindex Graham, Paul 24131@cindex Graham, Paul
24117@cindex naive Bayesian spam filtering 24132@cindex naive Bayesian spam filtering
@@ -24138,7 +24153,11 @@ non-spam mail. Use the 15 most conspicuous words, compute the total
24138probability of the mail being spam. If this probability is higher 24153probability of the mail being spam. If this probability is higher
24139than a certain threshold, the mail is considered to be spam. 24154than a certain threshold, the mail is considered to be spam.
24140 24155
24141Gnus supports this kind of filtering. But it needs some setting up. 24156The Spam Statistics package adds support to Gnus for this kind of
24157filtering. It can be used as one of the back ends of the Spam package
24158(@pxref{Spam Package}), or by itself.
24159
24160Before using the Spam Statistics package, you need to set it up.
24142First, you need two collections of your mail, one with spam, one with 24161First, you need two collections of your mail, one with spam, one with
24143non-spam. Then you need to create a dictionary using these two 24162non-spam. Then you need to create a dictionary using these two
24144collections, and save it. And last but not least, you need to use 24163collections, and save it. And last but not least, you need to use
@@ -24224,8 +24243,10 @@ The filename used to store the dictionary. This defaults to
24224@node Splitting mail using spam-stat 24243@node Splitting mail using spam-stat
24225@subsubsection Splitting mail using spam-stat 24244@subsubsection Splitting mail using spam-stat
24226 24245
24227In order to use @code{spam-stat} to split your mail, you need to add the 24246This section describes how to use the Spam statistics
24228following to your @file{~/.gnus.el} file: 24247@emph{independently} of the @xref{Spam Package}.
24248
24249First, add the following to your @file{~/.gnus.el} file:
24229 24250
24230@lisp 24251@lisp
24231(require 'spam-stat) 24252(require 'spam-stat)