Mollom: Under the Hood
The initial version of the back end for Mollom, a Web service that keeps invading hordes of spambots at bay, was based on a custom Java SE–based XML-RPC server. To allow increased focus on Mollom’s core Machine Learning technology, Mollom co-creators Dries Buytaert and Benjamin Schrauwen decided to switch to a Java EE 6 back end, hosted on a GlassFish Server Open Source Edition 3.1.1 cluster. The Mollom team now extensively uses several core technologies such as JAX-RS with Jersey, the Java Persistence API, and Enterprise JavaBeans 3.1 (both singleton and stateless session beans). This has enabled the team to quickly develop an extended REST version of their API.
For storage, Buytaert and Schrauwen chose a Cassandra cluster, because the back end is very write-intensive. Cassandra also allows the Mollom team to easily scale the storage layer across multiple data centers. Currently Mollom is running on dedicated machines, and to further reduce maintenance work, Buytaert and Schrauwen plan to move the infrastructure to Amazon’s cloud environment, and to take advantage of the upcoming cloud features in GlassFish Server Open Source Edition.
Buytaert says. “We know that’s
not human behavior. It helps us
identify and block these malicious
automated programs.”
By intelligently combining all
these techniques and only using
the CAPTCHA as a final test to
weed out the small number of
false positives and false negatives,
Mollom provides the best-possible
quality in terms of blocking spam
while delivering the best-possible
user experience for visitors to the
sites it protects.
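The idea of reserving the CAPTCHA for uncertain cases can be illustrated with a short Java sketch. It is not Mollom's actual code: the class name, the score scale from 0 to 1, the two thresholds, and the simple averaging of scores are all assumptions made for illustration.

```java
import java.util.List;

public class CaptchaFallbackSketch {

    // Hypothetical three-way outcome: clearly fine, clearly spam, or not sure.
    enum Verdict { HAM, SPAM, UNSURE }

    // Assumed thresholds on a 0..1 spam score; the article gives no real values.
    static final double HAM_BELOW = 0.2;
    static final double SPAM_ABOVE = 0.8;

    // Combine the scores of several independent checks by averaging them.
    // Averaging is a stand-in; the real combination method is not described.
    static double combine(List<Double> scores) {
        return scores.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.5); // no evidence at all counts as "unsure"
    }

    // Map a combined score to a verdict.
    static Verdict classify(double spamScore) {
        if (spamScore < HAM_BELOW) {
            return Verdict.HAM;
        }
        if (spamScore > SPAM_ABOVE) {
            return Verdict.SPAM;
        }
        return Verdict.UNSURE;
    }

    // Only the unsure cases depend on the CAPTCHA result.
    static boolean accept(double spamScore, boolean captchaSolved) {
        switch (classify(spamScore)) {
            case HAM:
                return true;
            case SPAM:
                return false;
            default:
                return captchaSolved;
        }
    }

    public static void main(String[] args) {
        double clearSpam = combine(List.of(0.9, 0.95, 0.85));
        double borderline = combine(List.of(0.4, 0.6));
        System.out.println("clear spam: " + classify(clearSpam));
        System.out.println("borderline: " + classify(borderline));
        System.out.println("borderline accepted after CAPTCHA: " + accept(borderline, true));
    }
}
```

In this sketch most visitors never see a CAPTCHA, because only scores that fall between the two thresholds trigger one.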
Buytaert clears his head
by Mollom’s “campfire,”
a place where staffers can
brainstorm or just relax.
ENLISTING JAVA
Mollom is a real-time service,
eliminating spam, filtering profanity, and safeguarding all kinds
of sites—from small blogs that
may only get one spam message
per week to large social networks
such as Europe’s Netlog, with its
85 million registered users, and
large media companies such as
Sony Music Entertainment, which
hosts hundreds of popular artists’ sites. When someone makes
a comment on a protected site,
that site sends the comment to
Mollom. Mollom analyzes the
content and then returns a reply
to the site. “It’s important for
us to have a low latency so we
can process complex statistical
analysis in milliseconds,” Buytaert
explains. “Visitors to Websites
don’t want to wait after clicking
Submit to see if their comments
were accepted. To accomplish
this, we need a technology
that can keep a lot of data in
persistent memory so we can
instantly classify content.”
When Buytaert created
Mollom with his partner,
Benjamin Schrauwen, they
initially built everything from
scratch. But as Mollom grew
to the point where it was
handling millions of messages
per day, Buytaert and
Schrauwen found that they
were spending too much
time on infrastructure-
related issues. That’s when
they switched to GlassFish
Server Open Source Edition.
“We migrated to GlassFish
because it let us worry less
about memory management,
XML parsing, database connection pooling,
REST handling, and a lot of other factors that
make big information systems work,” Buytaert
says. “It lets us scale that runtime environment
more effectively. By migrating from our home-
grown Java-based solution to GlassFish, we
freed ourselves to focus more on the domain-
specific challenges of Mollom. It’s a great fit.”
Buytaert says new users can get Mollom
up and running within minutes. They simply
create an account on mollom.com, install a
plug-in, and enter public and private keys.
“The volume of spam comments we block
is one measure of Mollom’s success, but the
quality of our service is equally important,”
Buytaert says. “Our efficiency rate is currently 99.96 percent. That means if we see
10,000 messages, we make four mistakes.
We know we are helping to keep the Web
a little bit cleaner. In business terms, that
means Website owners can spend a lot less
time worrying about the content that doesn’t
belong and more time creating valuable services for their user base.”
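The arithmetic behind that figure is easy to check. The following Java sketch, whose class and method names are invented for illustration, converts an efficiency rate in percent into the expected number of misclassified messages for a given volume.

```java
public class EfficiencyRateSketch {

    // Expected number of mistakes for a message volume at a given
    // efficiency rate, rounded to the nearest whole message.
    static long expectedMistakes(long messages, double efficiencyPercent) {
        double errorPercent = 100.0 - efficiencyPercent;
        return Math.round(messages * errorPercent / 100.0);
    }

    public static void main(String[] args) {
        // The figures quoted in the article: 99.96 percent over 10,000 messages.
        System.out.println(expectedMistakes(10_000, 99.96)); // prints 4

        // Assumed volume of one million messages, for scale.
        System.out.println(expectedMistakes(1_000_000, 99.96)); // prints 400
    }
}
```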
David Baum and Ed Baum are freelance writers
specializing in innovative businesses, emerging technologies, and compelling lifestyles.