Index of /pub/CPAN/authors/id/P/PA/PALVARO

 Name                       Last modified      Size  Description
 Bloom-Faster-1.3.1.meta    2007-03-17 04:08  312   
 Bloom-Faster-1.3.1.readme  2007-03-10 08:16  2.5K  
 Bloom-Faster-1.3.1.tar.gz  2007-03-17 15:25  8.5M  
 Bloom-Faster-1.3.meta      2007-02-23 14:48  310   
 Bloom-Faster-1.3.readme    2007-02-23 14:45  2.5K  
 Bloom-Faster-1.3.tar.gz    2007-02-23 14:51  477K  
 Bloom-Faster-1.4.meta      2007-03-17 16:44  310   
 Bloom-Faster-1.4.readme    2007-03-10 08:16  2.5K  
 Bloom-Faster-1.4.tar.gz    2007-03-17 16:48  602K  
 Bloom-Faster-1.6.2.meta    2010-06-13 06:05  312   
 Bloom-Faster-1.6.2.readme  2009-06-22 09:19  2.5K  
 Bloom-Faster-1.6.2.tar.gz  2010-06-13 06:16   21K  
 Bloom-Faster-1.6.meta      2009-06-23 11:41  307   
 Bloom-Faster-1.6.readme    2009-06-22 09:31  2.5K  
 Bloom-Faster-1.6.tar.gz    2009-06-23 11:42   22K  
 Bloom-Faster-1.7.meta      2010-06-13 07:06  310   
 Bloom-Faster-1.7.readme    2009-06-22 09:19  2.5K  
 Bloom-Faster-1.7.tar.gz    2010-06-13 07:17   21K  
 CHECKSUMS                  2021-11-22 07:55  4.5K  
 README                     2007-02-24 02:54  2.5K

NAME
    Bloom::Faster - Perl extension for the C library libbloom.

INSTALLATION
    See INSTALL.

SYNOPSIS
      use Bloom::Faster;
  
      # m = ideal vector size.
      # k = number of hash functions to use.

      my $bloom = Bloom::Faster->new({m => 1000000, k => 5});

      # This gives us very tight control of memory usage (a function of m)
      # and performance (a function of k). In most applications, though, we
      # won't know the optimal values of either. In those cases it is much
      # easier to supply:
      #
      # n = number of expected elements to check for duplicates,
      # e = acceptable error rate (probability of a false positive)
      #
      # my $bloom = Bloom::Faster->new({n => 1000000, e => 0.00001});

      while (<>) {
          chomp;
          # $bloom->add() returns true when the value is a duplicate
          # (or, occasionally, a false positive).
          if ($bloom->add($_)) {
              print "DUP: $_\n";
          }
      }
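
    To get a feel for how n and e relate to m and k, the sketch below
    applies the textbook Bloom filter sizing formulas. It is an
    illustration only: the helper name bloom_parameters is hypothetical,
    and libbloom may size its vector differently when given n and e.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Textbook sizing for a Bloom filter (an assumption for illustration,
# not necessarily what libbloom computes internally):
#   m = -n * ln(e) / (ln 2)^2    bits in the vector
#   k = (m / n) * ln 2           number of hash functions
sub bloom_parameters {
    my ($n, $e) = @_;
    die "n must be positive\n"        unless $n > 0;
    die "e must be between 0 and 1\n" unless $e > 0 && $e < 1;

    my $ln2 = log(2);
    my $m   = ceil(-$n * log($e) / ($ln2 ** 2));
    my $k   = int($m / $n * $ln2 + 0.5);    # round to nearest integer
    $k = 1 if $k < 1;
    return ($m, $k);
}

my ($m, $k) = bloom_parameters(1_000_000, 0.00001);
printf "m = %d bits (about %.1f MiB), k = %d\n",
    $m, $m / 8 / 1024 / 1024, $k;
```

    For n = 1000000 and e = 0.00001 these formulas give roughly 24 million
    bits (about 3 MB) and 17 hash functions, which shows how far a
    hand-picked m and k can be from the values a target error rate implies.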

DESCRIPTION
    Bloom filters are a lightweight duplicate detection algorithm proposed
    by Burton Bloom
    (http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal), with
    applications in stream data processing, among others. Bloom filters are
    a very cool thing. Where occasional false positives are acceptable,
    Bloom filters give us the ability to detect duplicates in a fast and
    resource-friendly manner.
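
    The idea can be sketched in a few lines of pure Perl. This is a
    minimal illustration, not the libbloom implementation: the package
    name Sketch::Bloom, the use of MD5, and the double-hashing scheme are
    all assumptions chosen to keep the example short.

```perl
use strict;
use warnings;

# Illustrative pure-Perl filter; libbloom's hashing and layout will differ.
package Sketch::Bloom;

use Digest::MD5 qw(md5);

sub new {
    my ($class, %args) = @_;
    my $self = bless { m => $args{m}, k => $args{k}, bits => '' }, $class;
    vec($self->{bits}, $self->{m} - 1, 1) = 0;    # pre-size the bit string
    return $self;
}

# Derive k bit positions from two 32-bit words of an MD5 digest
# (double hashing: (h1 + i * h2) mod m).
sub _positions {
    my ($self, $key) = @_;
    my ($h1, $h2) = unpack 'N2', md5($key);
    my $m = $self->{m};
    $h1 %= $m;
    $h2 %= $m;
    return map { ($h1 + $_ * $h2) % $m } 0 .. $self->{k} - 1;
}

# Set the key's bits. Return true if every bit was already set,
# which means "probably seen before".
sub add {
    my ($self, $key) = @_;
    my $seen = 1;
    for my $pos ($self->_positions($key)) {
        $seen = 0 unless vec($self->{bits}, $pos, 1);
        vec($self->{bits}, $pos, 1) = 1;
    }
    return $seen;
}

package main;

my $filter = Sketch::Bloom->new(m => 10_000, k => 5);
for my $word (qw(apple pear apple plum pear)) {
    print "DUP: $word\n" if $filter->add($word);
}
```

    A repeated key always finds its bits set, so a true duplicate is never
    missed. A new key is reported as a duplicate only when other keys
    happen to have set all of its bits, which is the false positive case.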

    Memory for the bit vector is allocated in the C layer, but Perl's
    object system handles the cleanup. When a Bloom::Faster object goes out
    of scope, the vector pointed to by the C structure is free()d. To do
    this manually, call the object's DESTROY method.
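
    The scope-based cleanup described above relies on standard Perl
    behaviour, which the following sketch demonstrates with a stand-in
    class. Sketch::Handle is hypothetical and holds no C memory; it only
    records when its destructor runs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an object that owns a C-side resource.
package Sketch::Handle;

our @released;    # records cleanup order so it can be observed

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

# Perl calls DESTROY when the last reference to the object goes away.
# An XS module would free() its C structure at this point.
sub DESTROY {
    my ($self) = @_;
    push @released, $self->{name};
}

package main;

{
    my $scoped = Sketch::Handle->new('scoped');
    # ... use $scoped ...
}   # $scoped leaves scope here, so DESTROY runs

my $manual = Sketch::Handle->new('manual');
undef $manual;    # dropping the last reference also triggers DESTROY

print "released: @Sketch::Handle::released\n";    # released: scoped manual
```

    Releasing the last reference with undef is the usual Perl idiom for
    early cleanup, because the destructor then runs exactly once.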

    A Bloom filter Perl module is currently available on CPAN, but it is
    profoundly slow and cannot handle large vectors. This alternative uses a
    more efficient C library which can handle much larger vectors (up to
    the maximum value of a "long long" datatype, at least
    9223372036854775807 on supported systems).

  EXPORT
    None by default.

  Exportable constants
      HASHCNT
      PRIME_SIZ
      SIZ

SEE ALSO
    libbloom.so

AUTHOR
    Peter Alvaro and Dmitriy Ryaboy, <palvaro@ask.com>

COPYRIGHT AND LICENSE
    Copyright (C) 2006 by Peter Alvaro and Dmitriy Ryaboy

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself, either Perl version 5.8.5 or, at
    your option, any later version of Perl 5 you may have available.