Introducing Drools 5

Brian Sam-Bodden
  • August 2009
  • Java

A Java Rule Engine for the Rest of Us

For most Java developers the idea of using a rule engine evokes thoughts of vendors in suits selling their bosses a complex and expensive piece of software they don’t need, and of introducing something completely foreign and intrusive into their code base. Drools 5 (http://www.jboss.org/drools/) aims to change this perception by bridging the gap between the Java developer and the world of rule-based systems.

The Quick and Dirty on Rule Engines

The simplest explanation of a rule engine is that it is a very efficient pattern matcher. It matches data, referred to as “facts”, against rules. Rules are simple if-then constructs that operate on the matched data. For example, imagine an application in which the data is a loan application containing the credit score of the loan applicant. A rule could express a credit score requirement for a loan such as:

"A loan application for ACME loans for which the loan applicant has a credit score lower than 680 will be rejected"

Expressed as a Drools rule, this requirement involves three types of objects:

  • Mortgage: Represents a loan product belonging to a lender. It contains, amongst other data, the name of the lender providing the loan.
  • LoanApplication: Represents an application for a specific loan product. It contains information specific to the applicant such as the credit score and the lender associated with the application.
  • RejectionNotice: Represents a communication to the applicant that the application has been rejected.

rule "LowCreditScoreRejection"
dialect "java"
  when
    mortgage:Mortgage(
      lender:lenderName == "ACME"
    )
    application:LoanApplication(
      lender == mortgage.lenderName,
      score:creditScore < 680
    )
  then
    application.reject("the score " + score + " is too low. \n" +
                       "A credit score of at least 680 is required");
    insert(new RejectionNotice(application));
end

Listing SAM-1 A simple rule

The rule named “LowCreditScoreRejection” is a typical Drools Rule Language (DRL) rule. It has two parts: the “when” part, also known as the “predicate”, “premise”, “condition” or simply the “left-hand side” (LHS for short), and the “then” part, also known as the “consequence”, “action”, “conclusion” or “right-hand side” (RHS).

The Rule Condition

The “when” part, or rule condition, determines the patterns to be matched, that is, the types and characteristics of the objects that will activate the rule. In the example shown, we are looking to match two objects: an object of type Mortgage and an object of type LoanApplication. The “mortgage” object must have a lenderName (mortgage.getLenderName) equal to “ACME”, and the “application” must have a matching lender name and a creditScore (application.getCreditScore) that is less than 680.

As you can see, this rule only gets evaluated if two objects of the aforementioned types are present, and it is only activated if those objects’ properties match the conditions in parentheses.

An observant Java developer will notice that the rule sort of looks like Java code, but not quite. The “when” part lists the classes of the objects that must be present and the values of the properties of those instances needed to activate the rule. The “when” part is purely a pattern matching expression using first-order logic. You can think of the “when” part as the “where” clause in a SQL statement. Just like in a SQL statement, you can create aliases for the objects being matched (and their properties). In the “LowCreditScoreRejection” rule we have four aliases: “mortgage”, “lender”, “application” and “score”. Once you have aliased a matched object, that object can be used elsewhere in the rule condition and also in the rule consequence.
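To make the SQL analogy concrete, here is a minimal plain-Java sketch of what the condition of “LowCreditScoreRejection” means: a join between the two fact types, filtered by the constraints. The classes MortgageFact, LoanApplicationFact and ConditionSketch are hypothetical stand-ins invented for this illustration, and the nested loops only mirror the semantics of the condition. Drools itself evaluates conditions through its Rete network, not like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the article's fact classes; only the fields the rule reads.
class MortgageFact {
  final String lenderName;
  MortgageFact(String lenderName) { this.lenderName = lenderName; }
}

class LoanApplicationFact {
  final String lender;
  final int creditScore;
  LoanApplicationFact(String lender, int creditScore) {
    this.lender = lender;
    this.creditScore = creditScore;
  }
}

class ConditionSketch {
  // Returns every (mortgage, application) pair that satisfies the "when" part.
  static List<Object[]> match(List<MortgageFact> mortgages,
                              List<LoanApplicationFact> applications) {
    List<Object[]> matches = new ArrayList<Object[]>();
    for (MortgageFact mortgage : mortgages) {
      if (!"ACME".equals(mortgage.lenderName)) {
        continue;                                           // lenderName == "ACME"
      }
      for (LoanApplicationFact application : applications) {
        if (application.lender.equals(mortgage.lenderName)  // lender == mortgage.lenderName
            && application.creditScore < 680) {             // creditScore < 680
          matches.add(new Object[] { mortgage, application });
        }
      }
    }
    return matches;
  }
}
```

Each returned pair corresponds to one activation of the rule, with the two array elements playing the role of the “mortgage” and “application” aliases.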

The Rule Consequence

We learned that the rule condition determines the pattern of objects that will activate the rule. The “then” part, or rule consequence, is what happens when the conditions set forth in the “when” part are met. In Drools the consequence is simply a block of Java code. In the “LowCreditScoreRejection” rule there are two things happening in the “then” part. First, we call the “reject” method on the application, passing a message that explains why the application is being rejected (notice the use of the alias “score”). Then we create a new object of type RejectionNotice, passing the application object to its constructor, and pass the newly created object to the insert method. The insert method is a Drools working memory method that tells the rule engine that there is a new object that should be considered when evaluating the rules. In this case the expectation is that there will be another rule with a condition that matches objects of type RejectionNotice and acts upon them.

The Rule Engine

From the simple example of the “LowCreditScoreRejection” rule we see that a rule engine is a system that matches facts (our data objects) against rules. The rules are then used to infer conclusions about the data. In the example, the conclusions inferred were the rejection of the loan application and the creation of the rejection notice. This type of system is what is referred to as a data-driven, forward chaining reasoning system. At the heart of the system is an “inference engine”: the component that does the pattern matching, activates the rules and determines how to execute the activated rules. This process is typically referred to as truth maintenance. Under the covers Drools uses a custom version of the popular Rete algorithm (see http://en.wikipedia.org/wiki/Rete_algorithm). As we can see, the “forward chaining” process starts with the available facts and uses the rules to infer more facts (such as RejectionNotice) until a desired goal has been reached (determining whether a loan application is approved or rejected and communicating the rejections).

Main Drools Classes

Rules like “LowCreditScoreRejection” are contained in DRL files (files with the extension .drl) that comply with Drools’ native rule language syntax. To use the rules in a Java application you work with four main classes:

  1. KnowledgeBuilder: A knowledge builder is the class that knows how to load and process DRL files. The cost of parsing the DRL file and building in-memory objects from it is incurred at this point. An instance of a knowledge builder can be obtained from the KnowledgeBuilderFactory using the newKnowledgeBuilder() method.
  2. KnowledgePackage: A knowledge package is the result of processing a DRL file; it is a serializable object. The knowledge builder has a getKnowledgePackages() method that returns a collection of knowledge packages.
  3. KnowledgeBase: A knowledge base is where the knowledge packages are deployed so that they can be used at runtime. The knowledge base is the object that will be used to create knowledge sessions. It is a thread-safe construct that serves as a factory of sessions.
  4. KnowledgeSession: The knowledge session is the object used to interact with the rule engine. It is through a session that you make the rule engine aware of the facts (the data) that it will evaluate against the available rules.

Inside the Rule Engine

Inside the rule engine, after you have inserted your facts into a knowledge session and invoked the session’s fireAllRules() method:

  1. Rules are matched against the facts in the knowledge session using the pattern matcher (Rete algorithm).
  2. The unordered list of activations (the rules that apply based on the facts) is ordered internally (by a conflict resolver) in the session’s agenda.
  3. The first rule in the agenda is then executed, possibly triggering steps 1 and 2 again.

This loop is how a rule engine infers knowledge from existing facts using rules. You can see that a rule engine uses pattern matching to reduce or transform the problem space and arrive at a set of facts that can be considered a solution to the problem at hand.
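The three steps can be sketched as a toy forward-chaining loop in plain Java. This is only an illustration of the match, order and fire cycle under simplifying assumptions: ToyEngine and its Rule class are hypothetical names, rules match a single fact at a time, conflict resolution is just “highest priority first”, and matching is done by brute force. None of this reflects how Drools is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// A toy forward-chaining loop; hypothetical, with no Rete network behind it.
class ToyEngine {
  // A rule here is a priority, a condition ("when") and a consequence ("then").
  static final class Rule {
    final String name;
    final int priority;
    final Predicate<Object> when;
    final BiConsumer<Object, ToyEngine> then;

    Rule(String name, int priority, Predicate<Object> when, BiConsumer<Object, ToyEngine> then) {
      this.name = name;
      this.priority = priority;
      this.when = when;
      this.then = then;
    }
  }

  private final List<Rule> rules = new ArrayList<Rule>();
  private final Set<Object> facts = new LinkedHashSet<Object>();
  // (rule, fact) pairs that already fired, so that no activation fires twice.
  private final Set<List<Object>> alreadyFired = new HashSet<List<Object>>();

  void addRule(Rule rule) { rules.add(rule); }
  void insert(Object fact) { facts.add(fact); }
  Set<Object> facts() { return facts; }

  int fireAllRules() {
    int firedCount = 0;
    while (true) {
      // Step 1: match every rule against every fact.
      List<List<Object>> agenda = new ArrayList<List<Object>>();
      for (Rule rule : rules) {
        for (Object fact : facts) {
          List<Object> activation = Arrays.<Object>asList(rule, fact);
          if (rule.when.test(fact) && !alreadyFired.contains(activation)) {
            agenda.add(activation);
          }
        }
      }
      if (agenda.isEmpty()) {
        return firedCount;  // nothing left to infer
      }
      // Step 2: conflict resolution orders the agenda.
      agenda.sort((a, b) -> Integer.compare(((Rule) b.get(0)).priority, ((Rule) a.get(0)).priority));
      // Step 3: fire the first activation; it may insert facts, so loop back to step 1.
      List<Object> first = agenda.get(0);
      alreadyFired.add(first);
      ((Rule) first.get(0)).then.accept(first.get(1), this);
      firedCount++;
    }
  }

  public static void main(String[] args) {
    ToyEngine engine = new ToyEngine();
    engine.addRule(new Rule("reject low score", 10,
        fact -> fact instanceof Integer && (Integer) fact < 680,
        (fact, e) -> e.insert("REJECTED:" + fact)));
    engine.addRule(new Rule("notify applicant", 0,
        fact -> fact instanceof String && ((String) fact).startsWith("REJECTED:"),
        (fact, e) -> e.insert("NOTIFIED:" + fact)));
    engine.insert(650);
    System.out.println(engine.fireAllRules() + " rules fired, facts = " + engine.facts());
  }
}
```

In the main method the first rule inserts a new fact, which is what activates the second rule on the next pass through the loop. That chaining from fact to inferred fact is the essence of the process described above.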

Getting Started with Drools 5: Classifying Your Twitter Network

Rule engine development introduces enough conceptual complexity (mainly inherited from A.I. lingo and academia) that it feels fairly unapproachable to us Java developers. So, let’s tackle a simple yet representative problem using Drools 5.

Ranking and Classifying Twitter Users

Recently Twitter has become the darling of social media applications. Classified by some as micro-blogging, Twitter’s goal is for users to constantly answer the question “What are you doing?” Of course, the intended usage of a tool by its creators has no bearing on how people will actually use it. As an active Twitter user, I find it especially annoying to deal with those attempting to exploit the tool in ways detrimental to other users’ experience.

Yes, I am speaking about spammers! Recently blogger Allan Young wrote about what he termed the “Twitter Influence Ratio” (http://allantyoung.com). In that blog entry he described a simple way to measure a Twitter user’s influence as the ratio of the number of followers to the number of people the user is following. The article talks about the inexactitude of the measurement for certain notable users such as Robert Scoble.

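Based on the Twitter influence ratio, Evan Prodromou came up with a simple scale to classify Twitter users. The scale reads as following:followers, so 1:5 describes a user with five followers for every account followed: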

  • 1:5 => Twitter Caster
  • 1:2 => Notable
  • 1:1 => Socially Healthy
  • 2:1 => Newbie
  • 5:1 => Twitter Spammer

Twitter Drools 5.0 Ranking / Classifier Tool

The goal of our Drools 5 project will be to corroborate or repudiate the classification above by also providing a number that can be used to further narrow down the classification. Reading about the ways people judge whether a user is a potential spammer (see references), I’ve collected a small set of rules that I think can help us narrow down a Twitter user’s classification:

  • “User has no picture”: A characteristic of spammers (but also of newbies) is that they don’t have a picture set other than the “brownie” icon that Twitter uses by default. This rule will subtract 30 points from the user’s ranking if he or she doesn’t have a picture set.
  • “Follower with no mutual followers”: A user follows you but none of your followers also follow him or her. At best this seems to be an indication that the follower belongs to a totally different social circle. This rule will subtract 10 points.
  • “Follower not being followed back”: A user follows you but you are not following them back. This indicates that the interest is one-sided (of course, you could also have forgotten to follow back). This rule will subtract 5 points.
  • “Inactive follower”: If a user follows you but has not produced any tweets in the last 30 days, we will consider that user inactive and subtract 15 points.
  • “Low activity follower”, “Medium activity follower”, “High activity follower” and “Hyper activity follower”: These four rules will check the average tweets per day for a user and classify them based on the following ranges: 0.0-0.5 is Low, 0.5-3.0 is Medium, 3.0-5.0 is High, > 5.0 is Hyper. The activity rules will add -5, +5, +10 and +15 points respectively.

You might be asking how scientific or statistically accurate the rules above are, and the answer is: “I haven’t a clue”. These rules are exploratory: just as with a learning algorithm, the rule author can use rules to discover hidden patterns in the data. What I’m attempting to do here is to set up a framework that can be easily tweaked and enhanced.
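As a worked example of the arithmetic, the point values above can be collapsed into one plain-Java function. This is a sketch, not part of the sample application: the class RankingSketch is hypothetical, it assumes every applicable rule fires once and the points simply accumulate from a starting ranking of 0, and it assumes a boundary value (for example exactly 0.5 tweets per day) falls into the lower range, since the ranges in the list overlap at their edges.

```java
// Hypothetical helper that adds up the points described by the ranking rules.
class RankingSketch {
  // Points for the four activity rules.
  // Assumption: a boundary value belongs to the lower range (0.5 counts as Low).
  static double activityPoints(double averageTweetsPerDay) {
    if (averageTweetsPerDay <= 0.5) return -5.0;  // Low
    if (averageTweetsPerDay <= 3.0) return 5.0;   // Medium
    if (averageTweetsPerDay <= 5.0) return 10.0;  // High
    return 15.0;                                  // Hyper
  }

  // Assumption: all rules are independent and their points accumulate.
  static double ranking(boolean hasPicture, int followersInCommon, boolean followedBack,
                        boolean inactive, double averageTweetsPerDay) {
    double ranking = 0.0;
    if (!hasPicture) ranking -= 30.0;             // "User has no picture"
    if (followersInCommon == 0) ranking -= 10.0;  // "Follower with no mutual followers"
    if (!followedBack) ranking -= 5.0;            // "Follower not being followed back"
    if (inactive) ranking -= 15.0;                // "Inactive follower"
    ranking += activityPoints(averageTweetsPerDay);
    return ranking;
  }
}
```

Under these assumptions a follower with a picture, mutual followers, a follow back and four tweets per day ends up at +10, while a silent, picture-less stranger bottoms out at -65. The point of writing the same logic as separate rules is that each adjustment can be added, removed or tweaked without touching the others.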

Interacting with the Twitter API

One of the big decisions you’ll face when implementing a rule-based system with Drools is how much Java to put in your rules. As with any other object-oriented application, we want to encapsulate complex behavior to make our rules more readable. There are also things that are much more easily accomplished, tested and developed outside the realm of the rule engine.

To deal with the interaction with Twitter I decided to use Twitter4J (http://yusuke.homeip.net/twitter4j), a Java library to interact with the Twitter API. Using Twitter4J I created a simple collection of static utility methods contained in the class TwitterUtils.java. Some of the available methods are:

  • Double getTwitterInfluenceRatio(User user)
  • Boolean hasSetProfileImage(User user)
  • Double averageTweetsPerDay(User user)
  • Boolean inactiveForTheLast(User user, Integer days)
  • Integer followersInCommon(Twitter twitter, User target)
  • Boolean isFollowing(Twitter twitter, String target)

Twitter4J provides the classes twitter4j.Twitter, which represents the authenticated Twitter user, and twitter4j.User, which represents detailed information about a Twitter user.
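The bodies of these utility methods are not shown in this article, so the following is only a guess at the kind of arithmetic behind two of them. The class TwitterMetricsSketch is hypothetical and takes plain numbers instead of a twitter4j.User so that it stands on its own. It assumes the influence ratio is computed as following divided by followers, and it assumes the daily average is the total number of tweets divided by the age of the account in days. How the real TwitterUtils class computes either value may differ.

```java
// Hypothetical stand-in for two of the TwitterUtils calculations, using plain numbers.
class TwitterMetricsSketch {
  // Assumption: ratio = accounts followed / followers, so a small value means
  // many followers per account followed.
  static double influenceRatio(int followingCount, int followersCount) {
    if (followersCount == 0) {
      // Assumption: with no followers the ratio is treated as unbounded,
      // unless the account follows nobody either.
      return followingCount == 0 ? 0.0 : Double.MAX_VALUE;
    }
    return (double) followingCount / followersCount;
  }

  // Assumption: average over the whole lifetime of the account.
  static double averageTweetsPerDay(int totalTweets, long accountAgeInDays) {
    long days = Math.max(1L, accountAgeInDays);  // avoid dividing by zero for new accounts
    return (double) totalTweets / days;
  }
}
```

Keeping calculations like these in ordinary Java methods means they can be unit tested without a rule engine, and the rules only have to compare the resulting numbers.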

Classifying Users Using the Twitter Influence Ratio

Implementing a rule-based system is all about choices and trade-offs. The first rule we’ll develop is an example of such a trade-off. The rule is more of a utility rule to extract and classify the users. To accomplish this I’m using a simple Java enumeration called TwitterUserType that contains a static method to return the right enumeration value given a Twitter user’s influence ratio:

import java.util.EnumSet;

public enum TwitterUserType {
  // Each type covers the influence ratio range (low, high].
  // UNCLASSIFIED is only the fallback; its bounds are never tested by getType.
  UNCLASSIFIED      (-Double.MAX_VALUE, 0.0),
  TWITTER_CASTER    (0.0, 0.2),
  NOTABLE           (0.2, 0.5),
  SOCIALLY_HEALTY   (0.5, 1.0),
  NEWBIE            (1.0, 2.0),
  POTENTIAL_SPAMMER (2.0, Double.MAX_VALUE);

  private final Double low, high;

  TwitterUserType(Double low, Double high) {
    this.low = low;
    this.high = high;
  }

  // Returns the type whose range contains the ratio, or UNCLASSIFIED if none does.
  public static TwitterUserType getType(Double influenceRatio) {
    for (TwitterUserType userType : EnumSet.range(TWITTER_CASTER, POTENTIAL_SPAMMER)) {
      if ((influenceRatio > userType.low) &&
          (influenceRatio <= userType.high)) {
        return userType;
      }
    }

    return UNCLASSIFIED;
  }
}

Listing SAM-2 A Java Enum to classify Twitter Users

declare Follower
  user : User
  classification : TwitterUserType
  follows : Twitter
  hasPicture : Boolean
  followedBack : Boolean
  inactive : Boolean
  averageTweetsPerDay : Double
  followersInCommon : Integer
  followeesInCommon : Integer
  ranking : Double
end

Listing SAM-3 A DRL custom data type

In Listing SAM-3 we declare a simple POJO called Follower that will contain some of the metrics used by the rules. The rule “Extract and classify followers” will match any object of type Twitter (the Twitter4J class representing the authenticated Twitter user), extract its followers and, for each follower, create a Follower object and set its values using the static methods in TwitterUtils. Each object created will then be inserted into the knowledge session using the insert method.

rule "Extract and classify followers"
dialect "java"
when
  twitter : Twitter()
then
  for (User user : twitter.getFollowers()) {
    Follower follower = new Follower();
    follower.setUser(user);
    follower.setFollows(twitter);
    follower.setClassification(TwitterUserType.getType(TwitterUtils.getTwitterInfluenceRatio(user)));
    follower.setHasPicture(TwitterUtils.hasSetProfileImage(user));
    follower.setFollowedBack(TwitterUtils.isFollowing(twitter, user));
    follower.setInactive(TwitterUtils.inactiveForTheLast(user, 30));
    follower.setAverageTweetsPerDay(TwitterUtils.averageTweetsPerDay(user));
    follower.setFollowersInCommon(TwitterUtils.followersInCommon(twitter, user));
    follower.setFolloweesInCommon(TwitterUtils.followingInCommon(twitter, user));
    follower.setRanking(0.00);
    logger.info("Inserting follower => " + user.getScreenName());
    insert(follower);
  }
end

Listing SAM-4 A utility rule to extract and classify Twitter followers

One of the things that you’ll discover early on is that your application domain objects might not be well suited to be used as rule engine facts. In the case of this simple tool, having a simple data object such as the DRL-specific Follower object makes rule creation simpler and consequently makes the rules much more readable and easier to maintain.

Writing the Twitter Ranking Rules

With the Follower objects created and inserted into the knowledge session we can now write a set of rules that match Follower objects with certain characteristics.

The “User has no picture” rule matches any object of type Follower where the hasPicture boolean value is false. It aliases the matched object as “follower” and in the consequence it subtracts 30.0 points from the ranking value.

rule "User has no picture"
dialect "java"
when
  follower : Follower(hasPicture == false)
then
  follower.setRanking(follower.getRanking() - 30.0);
end

Listing SAM-5 The “User has no picture” rule

The “Follower with no mutual followers” rule is equally simple:

rule "Follower with no mutual followers"
dialect "java"
when
  follower : Follower(followersInCommon == 0)
then
  follower.setRanking(follower.getRanking() - 10.0);
end

Listing SAM-6 The “Follower with no mutual followers” rule

As you can see once we created and populated objects suitable for the rule engine, writing the rules becomes a simple task. The rest of the rules are left as an exercise for the reader (or you can download them with the complete sample application on github).

Writing the Java Application

The Java application that will exercise our Twitter rules is a simple class with a main method. We’ll pass a Twitter username and password via the String[] arguments of the main method. The code needed to read the DRL file and create a knowledge package is standard boilerplate Drools code, as shown in Listing SAM-7:

import java.util.Collection;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.definition.KnowledgePackage;
import org.drools.io.ResourceFactory;

// get a knowledge builder
KnowledgeBuilder knowledgeBuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();

// parse and compile the DRL file
knowledgeBuilder.add(
  ResourceFactory.newClassPathResource("TwitterRules.drl", TwitterDroolsExample.class),
  ResourceType.DRL);

// check the builder for errors
if (knowledgeBuilder.hasErrors()) {
  logger.error(knowledgeBuilder.getErrors().toString());
  throw new RuntimeException("Unable to compile \"TwitterRules.drl\".");
}

// get the compiled packages (which are serializable)
Collection<KnowledgePackage> pkgs = knowledgeBuilder.getKnowledgePackages();

// add the packages to a knowledge base (deploy the knowledge packages)
KnowledgeBase knowledgeBase = KnowledgeBaseFactory.newKnowledgeBase();
knowledgeBase.addKnowledgePackages(pkgs);

Listing SAM-7 Loading and compiling the rules

With a knowledge base in place we can now create and configure a knowledge session as shown in Listing SAM-8:

StatefulKnowledgeSession knowledgeSession = knowledgeBase.newStatefulKnowledgeSession();
Twitter twitter = new Twitter(twitterUser, twitterPassword);
knowledgeSession.insert(twitter);
knowledgeSession.fireAllRules();

Listing SAM-8 Asserting facts and firing the rules

The knowledge session is created from the knowledge base. We then insert a Twitter4J Twitter object, which will be matched by our “Extract and classify followers” rule, which in turn will produce Follower objects that will be evaluated by the ranking rules.

Extracting the Results

The astute reader will notice that the DRL-scoped Follower objects represent the “result” of our process. The question then is: how do we retrieve those objects from the knowledge session after the rules have executed? Although we could put System.out.println statements (or even logging statements) in our rules’ consequence blocks, in a real application you’ll most likely need to retrieve those objects from the knowledge session so they can be used in your Java application.

To retrieve an object from the knowledge session Drools provides a querying facility. Drools queries are just like rules that have no consequence block. For example if we wanted to just retrieve all Follower objects in the knowledge session we could write a query like:

query "get all followers"
  follower : Follower()
end

Listing SAM-9 A Drools Query

The Drools query in Listing SAM-9 simply matches any objects of type Follower and aliases those as “follower”. In the Java code then we could use the query as shown in Listing SAM-10 to retrieve the results:

// Follower is declared in the DRL, so its fields are read through its FactType
FactType followerType = knowledgeBase.getFactType("org.drools.examples", "Follower");

QueryResults results = knowledgeSession.getQueryResults("get all followers");

// QueryResults is iterable and yields one QueryResultsRow per match
for (QueryResultsRow row : results) {
  // the key is the alias declared in the query
  Object follower = row.get("follower");
  User user = (User) followerType.get(follower, "user");

  TwitterUserType type = (TwitterUserType) followerType.get(follower, "classification");
  Double ranking = (Double) followerType.get(follower, "ranking");
  logger.info(user.getScreenName() + " is a " + type + " with a ranking of " + ranking);
}

Listing SAM-10 Using a Drools Query and dealing with custom data types

Running the Application

Running the application will produce output similar to:

stuarthalloway is a TWITTER_CASTER with a ranking of -5.0
bmaso is a NEWBIE with a ranking of 5.0
bobmcwhirter is a NOTABLE with a ranking of 5.0
jaredrichardson is a NOTABLE with a ranking of 10.0

As we can see from the output, the numeric ranking begins to shed light on the influence and intentions of your followers on Twitter. Our friend Stu is classified as a Twitter Caster, but due to his low activity (average tweets per day) our rules took 5 points off his ranking, while Jared is ranked a little lower, as a Notable, but due to his high activity he gets a ranking of +10. The next step is to tweak our rules based on observation and investigation of the flagged users. We can see that the simple Twitter Influence Ratio is not sufficient to accurately predict a Twitter spammer. But if we continue adding rules that go deeper than the simple static analysis we’ve performed here, we can start getting closer to our goal. With a few more rules, possibly taking advantage of semantic analysis, we could look at hash tags, URLs embedded in tweets and other content, and classify Twitter users more accurately.

Conclusion

Rule engines can provide a Java developer with an environment in which logic and data are clearly separated. In the simple example used in this article it is easy to see how the DRL file becomes your laboratory of centralized knowledge about the problem at hand. Once the plumbing code is in place you can truly concentrate on the “business logic” in atomic, discrete, manageable chunks.

In this article we’ve barely scratched the surface of the capabilities provided by Drools. Drools 5 is a complete offering that includes the rule engine (Drools Expert), a Business Rule Management System (Drools Guvnor), a process/workflow engine (Drools Flow) and an event processing/temporal reasoning engine (Drools Fusion).

Resources

The code for the example above can be found at https://github.com/bsbodden/drools-twitter
