Smarter ValueObjects & an (even more) elegant Builder

Value Objects (VOs) are prevalent and needed in traditional Java programming. They’re almost everywhere — to hold information within a process, for message passing, and in various other places.

Apart from having getters and setters for the properties, these VOs are often also required to implement equals() and hashCode(). Developers usually hand-write these methods or use the modern IDE templates to generate them. This works fine initially, until the VO needs to be updated with one or more additional properties.

With an update, the baggage that comes with new properties includes:

  • a new set of getters and setters,
  • updates to equals() and hashCode(), and
  • an update to toString(), if needed

This is, of course, cumbersome, error-prone, and the simple VO soon starts looking like an airplane cockpit!

Google’s AutoValue framework is a smart approach to address this issue. With just a couple of annotations, almost all of the “junk” is done away with, and the class becomes smarter — for any future property updates, the accessors as well as equals()*, hashCode()** and toString() are all handled automagically!

The VO then just looks like a basic set of properties of the given type, like so:

import com.google.auto.value.AutoValue;

@AutoValue
abstract class CartItem {
    abstract int itemCode();

    abstract int quantity();

    abstract int price();

    static CartItem create(int itemCode, int quantity, int price) {
        return new AutoValue_CartItem(itemCode, quantity, price);
    }
}

Note the static factory method create(), as suggested in Effective Java [Bloch, 2017], Item 1.

The use of this annotated VO would be no different from a typical one. For instance, the CartItem defined above would have a simple invocation like this:

@Test
public void create() throws Exception {
    CartItem item1 = CartItem.create(10, 33, 12);
    CartItem item2 = CartItem.create(10, 33, 12);

    assertEquals(item1, item2); // this would be true
}

Apart from the default support for static factories, AutoValue also supports Builder classes within the VOs. Armed with this knowledge, let’s take another jab at the example in my previous post on Builders.
We continue with the same Cake example and add the required annotations and modifiers. The updated version of the class would then be:

import com.google.auto.value.AutoValue;

@AutoValue
abstract class Cake {
    // Required params
    abstract int flour();
    abstract int bakingPowder();

    // Optional params
    abstract int eggs();
    abstract int sugar();
    abstract int oil();

    static Maker builder(int flourCups, int bkngPwdr) {
        // return a builder instance with defaults for the non-required fields
        return new AutoValue_Cake.Builder()
                .flour(flourCups)
                .bakingPowder(bkngPwdr)
                .eggs(0)
                .sugar(0)
                .oil(0);
    }

    @AutoValue.Builder
    abstract static class Maker {
        abstract Maker flour(int flourCups);
        abstract Maker bakingPowder(int bkngPwdr);
        abstract Maker eggs(int eggCount);
        abstract Maker sugar(int sugarMg);     
        abstract Maker oil(int oilOz);

        abstract Cake build();
    }
}

Observe that:

  • the member Builder class (named Maker here) just needs to be marked with the @AutoValue.Builder annotation, and the framework takes care of everything else
  • in the parent class, we could also have had a no-arg builder() method, but we specifically want to have only one way of building this class — with the required params
  • as shown above, the optional parameters should be set to their default values, since we want the flexibility of choosing only the relevant optional params. [With non-primitive members, @Nullable can be used, as sketched below.]
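To illustrate that last point, here is a minimal, hypothetical sketch of a non-primitive optional property marked @Nullable (the frosting field is made up for illustration; any @Nullable annotation visible on the classpath works):

import com.google.auto.value.AutoValue;
import javax.annotation.Nullable;

@AutoValue
abstract class IcedCake {
    abstract int flour();

    @Nullable
    abstract String frosting(); // optional; may simply be left unset

    static Builder builder(int flourCups) {
        return new AutoValue_IcedCake.Builder().flour(flourCups);
    }

    @AutoValue.Builder
    abstract static class Builder {
        abstract Builder flour(int flourCups);
        abstract Builder frosting(@Nullable String frosting); // no default value needed
        abstract IcedCake build();
    }
}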

Just to complete the discussion, here is an example of the ease with which this new builder can be invoked:

@Test
public void makeCakes() {

    // Build a cake without oil
    Cake cakeNoOil = Cake.builder(2, 3).sugar(2).eggs(2).build();

    assertNotNull(cakeNoOil);

    // Check that it has 0 oil
    assertEquals(0, cakeNoOil.oil()); // default

    // Make cake with oil
    Cake cakeWOil = Cake.builder(2, 3).sugar(2).oil(1).eggs(2).build();

    // Obviously, both the cakes are different
    assertNotEquals(cakeNoOil, cakeWOil); // valid

    // Another cake that's same as cake w/ oil
    Cake anotherCakeWOil = Cake.builder(2, 3).sugar(2).oil(1)
            .eggs(2).build();

    assertEquals(cakeWOil, anotherCakeWOil); // valid
}

There are many other fine-grained things that can be done while using AutoValue, like specifying getters for specific properties or customizing toString(), etc.
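For instance, to my understanding, if a hand-written toString() is preferred, a concrete override can simply be placed in the abstract class (say, inside CartItem) and AutoValue will skip generating its own:

// Inside the @AutoValue class: a concrete toString() takes precedence over the generated one
@Override
public String toString() {
    return "CartItem[" + itemCode() + " x " + quantity() + " @ " + price() + "]";
}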

It’s impressive how AutoValue facilitates writing static factory methods and builders quickly — taking the headache out of defining and updating VOs.

[Full implementation of the abovementioned example is here.]

Further reading:

  1. AutoValue with Builders
  2. Project Lombok also addresses the VO update issue, along with other things

* Effective Java [Bloch, 2017], Item 10
** Effective Java [Bloch, 2017], Item 11

Books: Java

[Image: Effective Java book cover]

When I was writing the last post, I realized how much I used to be in awe of Effective Java [Bloch, 2017]. It was a book that covered what no other did. It wasn’t just coding — there are plenty of books where one could learn “Your first Java Program” and beyond, and throw an air punch. Nor was it about language syntax and semantics (Java Complete Reference [Schildt, 2017] fitted the bill there*), or OOP (every other Java book started with OOP concepts). Rather, Effective Java covered the basics of writing elegant Java code, and as a by-product, also underlined how easy it was to be swayed by ‘convention’. I wouldn’t recommend it as a Java learner’s first book. But it should very well be one’s second Java book, and the one to keep revisiting throughout one’s programming career.

Ever since I’ve picked it up again, it’s become tough to keep it aside. With each of its topics, I realize how much I have drifted away from the delight of writing good code over time, and how much I still need to learn.
Go read it now if you haven’t had a chance yet. What’s more, the much-awaited 3/E, which covers Java 7, 8 and 9, is out now!

While on this topic, let me talk about another one of my favourite books on Java — Thinking in Java [Eckel, 1998].

[Image: Thinking in Java book cover]

This is the book I considered a Java developer’s Bible at one point in time. Since there are no new editions after the 4/E, the syntactical parts might be a bit obsolete now. But still, in my opinion, it’s the best book to get one’s (Java & OOP) fundamentals in place.

* I have always found Java Complete Reference a bit too elaborate for my liking; most of it is about language syntax and semantics. All of that might have been useful in the early days of the Internet, when it wasn’t that easy to look up things online. But I doubt that’s needed now.

The elegance of Builder pattern

Paraphrasing Josh Bloch in Effective Java [Bloch, 2017, Item 2]:

While creating objects, in cases where the number of optional parameters of an object is considerable, say 4 or more, one might think of static factory methods [Bloch, 2017, Item 1] as a solution — but they’re more suitable for a small set of parameters. When there are several optional params, static factories cannot be used as it’s cumbersome to imagine and cater to all possible parameter combinations. Another approach that’s proposed in such cases is using JavaBeans but it has its own shortcomings.

Therefore, we usually go with multiple (telescoping) constructors for such requirements. For example:

public Cake(int oilTbsp, int flourMg){
  this(oilTbsp, flourMg, 0);
}

public Cake(int oilTbsp, int flourMg, int eggCount){
  this(oilTbsp, flourMg, eggCount, 0);
}

public Cake(int oilTbsp, int flourMg, int eggCount, int bakingPowderMg){
  this.oilTbsp = oilTbsp;
  this.flourMg = flourMg;
  this.eggCount = eggCount;
  this.bakingPowderMg = bakingPowderMg;
}

//...

Such implementations, although purpose-serving, are a bit contrived in that the class client needs to tally the parameters accurately. With a large parameter list, this quickly becomes overkill.

A variation of the Builder pattern [Gamma, 1995] is what Bloch suggests for such cases. In it, a builder class is a static member of the class it builds, for example:

public class Cake{
  //...
  private Cake(Builder builder){
    //...
  }

  public static class Builder{
    //...
  }
}

Since the original constructor is hidden, the client first gets a reference to the Builder class — passing all the required params to its constructor or static factory. The client then calls setters on the returned builder object to set the optional parameters of interest. Finally, the client makes a call to the build() method to generate an immutable object.
Since the builder setter methods return the builder itself, the invocations can be chained, like so:

// Set only the parameters of interest
Cake cake = new Cake.Builder(350, 45).egg(2).sugar(240).cocoa(35)...build();

As is apparent, this is intuitive as well as concise.

A builder can be further enhanced by enabling it to build more than one object, based on parameters. One has to be cautious, however, to disallow building an object in an inconsistent state. This can be ensured by validating the passed parameters as early as possible and throwing a suitable exception.

Builders can also be used to automate certain tasks and fill in fields automatically, for example auto-incrementing the object id. Both points are illustrated in the sketch below.
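As a rough illustration of both points, here is a sketch with a made-up Order class (not the Cake example above): the quantity parameter is validated at the point it is passed, and the id is filled in automatically at build time.

import java.util.concurrent.atomic.AtomicLong;

public class Order {
    private static final AtomicLong SEQUENCE = new AtomicLong();

    private final long id;
    private final int quantity;

    private Order(Builder builder) {
        this.id = builder.id;
        this.quantity = builder.quantity;
    }

    public static class Builder {
        private long id;
        private int quantity;

        public Builder quantity(int quantity) {
            // Validate as early as possible, i.e. when the parameter is passed
            if (quantity <= 0) {
                throw new IllegalArgumentException("quantity must be positive: " + quantity);
            }
            this.quantity = quantity;
            return this;
        }

        public Order build() {
            // Automate a field: auto-increment the object id just before construction
            this.id = SEQUENCE.incrementAndGet();
            return new Order(this);
        }
    }
}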

As Josh Bloch advises, we should be using Builders as often as possible, especially in cases where the number of parameters is significant. They’re a simple and elegant alternative to telescoping constructors or JavaBeans.

[Full implementation of the Cake builder example is here.]

A Spark learning.

About a month back, I’d done something I was not very proud of — a piece of code that I was not very happy about — and I had decided to get back to it when time permitted.

The scenario was something close to the typical Word Count problem, in which the task required counting words at specific indices, and then print the unified count per word. Of course, there were many other things to consider in the final solution since a streaming context was being dealt with — but those are outside the purview of what I’m trying to highlight here.

An abridged problem statement could be put as:

Given a comma-separated stream of lines, pick the words at indices j and k from each line, and print the cumulative count of all the words at jth and kth positions.

So the ‘crude’ solution to isolate all the words was:

  1. Implement a PairFunction<String, String, Integer>, to get all the words at a given index
  2. Use the above function to get all occurrences of words at index j, and get a Pair (Word -> Count) RDD
  3. Use the same function to get all occurrences of words at index k, and get another Pair (Word -> Count) RDD
  4. Use the RDD union() operator to combine the above two RDDs, and get a unified RDD
  5. Do other operations (reduceByKey, etc. …)
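For reference, a rough sketch of what that two-pass approach looked like (assuming lines is a JavaDStream<String>, and the same Constants.J_INDEX/K_INDEX used in the final version below; this is not the original project code verbatim):

// Pass 1: words at index j
JavaPairDStream<String, Integer> atJ = lines.mapToPair(
        s -> new Tuple2<>(s.split(",")[Constants.J_INDEX], 1));

// Pass 2: words at index k
JavaPairDStream<String, Integer> atK = lines.mapToPair(
        s -> new Tuple2<>(s.split(",")[Constants.K_INDEX], 1));

// Combine the two pair streams, then reduce
JavaPairDStream<String, Integer> unified = atJ.union(atK);
JavaPairDStream<String, Integer> counts = unified.reduceByKey((a, b) -> a + b);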

As is apparent, anyone would cringe at this approach, especially due to the two passes (#2, #3) over the entire data set — even though it gets the work done!
So I decided to revisit this piece, armed with some additional knowledge of what Spark offers.

One useful tool is the flatMap operation that Spark’s Java 8 API offers. By Spark’s definition:

flatMap is a DStream operation that creates a new DStream by generating multiple new records from each record in the source DStream

Given our requirement, this was exactly what was needed — create two records (one each for the jth and kth index words) for each incoming line. This would, of course, benefit us in that we get the final (unified) RDD in just a single pass over the incoming stream of lines.

I went ahead with a flatMapToPair implementation, like so:

JavaPairDStream<String, Integer> unified = lines.flatMapToPair(s -> {
    String[] a = s.split(",");
    List<Tuple2<String, Integer>> apFreq = new ArrayList<>();
    apFreq.add(new Tuple2<>(a[Constants.J_INDEX], 1)); // word at index j
    apFreq.add(new Tuple2<>(a[Constants.K_INDEX], 1)); // word at index k
    return apFreq.iterator();
});
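From there, the per-word counts follow in a single further step, something like:

JavaPairDStream<String, Integer> counts = unified.reduceByKey((a, b) -> a + b);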

To further validate the benefits, I ran some tests* with datasets ranging from 1M to 100M records, and the benefits of the flatMap approach became more and more pronounced as the data grew.

Following were the observations.

[Chart: time spent in the union vs. flatMap stage for 1M, 10M, and 100M records]

As we can see, whilst the difference is ~2s for 1 million records, the gap almost doubles as we reach 10M, and more than doubles around the 100M mark.
It is therefore obvious that production systems (e.g. a real-time analytics solution), where the data volume is much higher, need to be cautious about the choice of each operation (transformation, filtering, or any other action), as these govern the per-stage as well as the overall throughput of a Spark application.

 


* Test conditions:
– Performed on a 3-Node (m4.large) Spark cluster on AWS, using Spark 2.2.0 on Hadoop 2.7
– Considers only the time spent on a particular stage (union or flatMap), available via Spark UI
– Each reading is an average of time taken in 3 separate runs

Reliable Distributed Communication with JGroups

We were looking for an alternative to the Java Message Service (JMS) for a project requirement – I had had my fair share of unpleasant experiences with JMS in the past, most of which had to do with the housekeeping involved in the event of a broker outage.

Issue at hand

The solution we were working on was a legacy web-based application, which was being enabled for a cluster. Scalability, it seems, was not designed into the system. The solution, therefore, demanded making the system scalable with a quick turnaround, minimal code changes, and, of course, if not a positive, then at least no negative impact on application throughput.

The main challenge was the presence of a programmatic cache (using core Java data structures) for some specific requirements, which, of course, needed to be made cluster-aware. Data consistency is the primary concern in such cases, because the application pages can be serviced by potentially any node.

In this post, I want to talk about how we achieved this – since such a scenario of different nodes needing to “talk” can arise in many present-day applications, as distributed computing becomes more and more popular.

Analysis

When we say cluster-aware, we imply that the clustered application might need to talk to its peers at various junctures. It might or might not decide to perform an operation based on those interactions – but the conversation does need to occur. There are, of course, several options today to facilitate this – JMS, Hazelcast, JGroups, etc. JMS requires the presence of an active broker process at all times, and the housekeeping involved in broker outages, especially when message persistence is enabled (for reliability), becomes a problem. Solutions like Hazelcast required us to modify the existing code to use the data structures provided by them. This wasn’t feasible because we wanted to minimize the code changes and rather work out a solution on top of the existing code base.

We settled for JGroups because it seemed to address all the concerns:

  • there is no active process required beforehand, but rather, each node either joins an existing communication session, or else, starts one of its own. This means that there’s no single point of failure (SPOF)
  • message reliability, ordering and flow control are all configurable
  • code changes could be contained, because it involved minor modifications.

JGroups is a reliable IP multicast toolkit that is based upon a flexible protocol stack. IP multicast implies sending a message to multiple recipients at once. Traditionally, multicast communication is UDP (User Datagram Protocol) based, and is unreliable. For instance, think of a streaming video from YouTube – once the streaming starts, little does it matter that a few frames are missed or jagged. Such are typically unreliable UDP-based transmissions.

JGroups addresses the shortcomings of traditional IP multicast by adding aspects like reliability and membership to it.

Flexible protocol stack implies that there are options to tailor JGroups behaviour according to the application requirements. For example, aspects like Failure Detection, Encryption, etc., can be configured.

Design

Conceptually, the design was very simple: the caches on each node needed to “talk” to other nodes, whenever a significant operation occurred. Since we were dealing with user-defined entities, these operations mostly involved some kind of update of these entities.

Since each of the node needed a JGroups handle in order to be able to talk, we invoked a local JGroups session on each. Through the concept of groups and membership, a JGroups session looks for other active sessions when it comes alive. Consequently, members belonging to the same group are aware of all the peers. For example:
Suppose node n1 comes up as the very first node, and initializes a JGroups session, creating a group called ProjectXGroup. Any subsequent node, say n2 or n3, does the same, but in its case it finds an active ProjectXGroup already in place, and hence joins it rather than creating a new session. Thus, each node (peer) is aware of all the members that form the group. JGroups, by design, provides failure detection (FD): any node that goes down is removed from the list of active members, thus ensuring that at any given point of time, all the group members are aware of the current state of the group.
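A minimal sketch of that bootstrap with the JGroups API (the class and group names are illustrative; this assumes the JGroups 3.x/4.x JChannel/ReceiverAdapter API):

import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class CacheCluster extends ReceiverAdapter {

    private JChannel channel;

    public void start() throws Exception {
        channel = new JChannel();          // default protocol stack
        channel.setReceiver(this);
        channel.connect("ProjectXGroup");  // joins the group, or creates it if it doesn't exist yet
    }

    @Override
    public void viewAccepted(View view) {
        // called whenever the membership changes (a node joins, leaves, or is suspected)
        System.out.println("Current members: " + view.getMembers());
    }

    @Override
    public void receive(Message msg) {
        // tasks (LockTask, UpdateTask, ...) from peers arrive here
    }
}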

Any local update of one or more entities required the change to be propagated to all the peers. We identified that this should be a 3-step process. Given an entity EntityX, the following needed to be done on all the nodes:

  • Lock EntityX
  • Update EntityX
  • Unlock EntityX

We also needed to ensure that a failure to perform any of these operations results in a rollback. We subdivided any given operation into smaller tasks, for example LockTask, UpdateTask, UnlockTask, etc. It is these tasks that would be the language of communication. Typically, a task would have a taskID, information about the entities involved, a status flag (SCHEDULED, SUCCESSFUL, FAILED...), etc.
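A bare-bones sketch of what such a task could look like (field names are illustrative; it has to be serializable since it travels between peers):

import java.io.Serializable;

public class LockTask implements Serializable {

    public enum Status { SCHEDULED, SUCCESSFUL, FAILED }

    private final String taskId;
    private final String entityId;           // the entity (e.g. EntityX) this task acts upon
    private Status status = Status.SCHEDULED;

    public LockTask(String taskId, String entityId) {
        this.taskId = taskId;
        this.entityId = entityId;
    }

    public String getTaskId()   { return taskId; }
    public String getEntityId() { return entityId; }
    public Status getStatus()   { return status; }
    public void setStatus(Status status) { this.status = status; }
}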

JGroups has options to either broadcast a message or unicast it. Since our implementation required that the caches on all the peers be consistent, a sender broadcasts a task to all the peers. Each peer would then perform the task, and report the outcome back to the sender. Continuing with our earlier example, consider a task T1 which involves the following sub-tasks on an entity known as E1:

  1. Lock E1
  2. Update E1
  3. Unlock E1

Let’s consider the first sub-task — Lock E1:
This would require E1 to be locked on all the nodes, and therefore we invoke the LockTask, with information about E1. Now, suppose n2 is the node catering to the user at a given point in time; n2 would first lock E1 locally, and then broadcast this LockTask to the group. Since it is a broadcast, all the nodes (including n2) would receive and process the LockTask (and lock E1 locally). We can easily identify the sender and not perform the task there (since the task was issued to the other nodes only after the sender had performed it). This means that n2 would ignore the task, but n1, n3, ..., nn would need to perform it.

Once the given task is processed, in order to report the outcome back, we set the appropriate status flags in the original Task object and send it back to the sender. This time, however, we send it as a unicast communication (by specifying the recipient), since we know whom this message is directed to. What it boils down to in terms of JGroups is: after performing the task, n1, n3, ..., nn would all get a handle to the JGroups session and use the same LockTask instance to report back the outcome. Upon receiving responses from all the peers, if the responses are positive, the sender can then proceed with the other operations (sub-tasks): Update E1, Unlock E1, and so on.
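In JGroups terms, the two legs of this conversation boil down to the destination address on the Message: null for a broadcast to the whole group, and the sender's address for the unicast reply. A sketch against the 3.x/4.x API (channel is the JChannel from earlier; lockLocally() is a placeholder for the local cache operation):

// On the initiating node (e.g. n2): broadcast the task to the whole group
void broadcast(LockTask task) throws Exception {
    channel.send(new Message(null, task));              // null destination = all group members
}

// On every peer: perform the task, then report the outcome back to the sender alone
@Override
public void receive(Message msg) {
    LockTask task = (LockTask) msg.getObject();
    if (channel.getAddress().equals(msg.getSrc())) {
        return;                                          // the sender already did the work locally
    }
    task.setStatus(lockLocally(task) ? LockTask.Status.SUCCESSFUL
                                     : LockTask.Status.FAILED);
    try {
        channel.send(new Message(msg.getSrc(), task));   // unicast the outcome back to the sender
    } catch (Exception e) {
        // handle the failed reply (one of the exceptional scenarios below)
    }
}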

Exceptional Scenarios

There are various junctures at which the above conversation could fail, and all such exceptional scenarios need to be taken care of. To list a few that we handled:

  • No response (or response timeout) from one or more peers
  • Failure of subtask in the sender node
  • Failure of subtask in any peer, etc.

Summary

We discussed how JGroups can be an apt solution in situations where reliable distributed communication is necessary. We also saw an example where JGroups was used to enable inter-node communication with both broadcast and unicast messages.

anyString() v/s anyString() in Mocking frameworks

So I was trying out mocking frameworks for the first time a few days ago, and there’s an interesting observation: even though EasyMock, PowerMock, and Mockito are said to be brothers-in-arms, there are a few nuances to take care of. One of them is the following.

There are convenience methods (called matchers) provided by both Mockito and EasyMock to define an expectation that accepts any value of a given parameter type. It is claimed that these any implementations (e.g. anyObject, anyString, etc.) are interchangeable, but that does not seem to be the case.

Consider a very simple scenario, where one would want to mock a login() implementation. That is to say, success (true in this case) must be returned, regardless of the parameters passed.

class N {

    public boolean login(String username, String password) {
        return doLogin(username, password);
    }

    private boolean doLogin(String u, String p) {
        // validate login
        // ...
        return true;
    }
}

Now that the method is defined, let’s take a look at the test case where we would mock its call.

@Test
public void testMockLogin() throws Exception {
  
  // mocking only specific methods
  N n = createPartialMock(N.class, "doLogin", String.class, String.class);
  boolean expected = true;

  expectPrivate(n, "doLogin", anyString(), anyString()).andReturn(expected);
  replay(n);

  boolean actual = n.login("foo", "bar");

  verify(n);

  assertEquals("Expected and actual did not match", expected, actual);

}

Easy, peasy! Except that anyString() in the expectPrivate(...) line is used via the import org.mockito.Matchers.anyString, causing the following exception:

java.lang.AssertionError:
Unexpected method call N.doLogin("foo", "bar"):
N.doLogin("", ""): expected: 1, actual: 0
at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:44)
at org.powermock.api.easymock.internal.invocationcontrol.EasyMockMethodInvocationControl.invoke(EasyMockMethodInvocationControl.java:91)
at org.powermock.core.MockGateway.doMethodCall(MockGateway.java:124)
at org.powermock.core.MockGateway.methodCall(MockGateway.java:185)
at com.pugmarx.mock.N.doLogin(N.java)
...

The same test case passes when the anyString() implementation of Mockito is replaced by the implementation provided by EasyMock (via org.easymock.EasyMock.anyString).
If we dig a bit, we realize the likely reason: Mockito’s anyString registers its matcher with Mockito rather than with EasyMock, so EasyMock only sees the value it returns (an empty string, as against the null returned by org.easymock.EasyMock.anyString) and records it as a literal expected argument, hence the doLogin("", "") in the error above.

[Thanks to troig from the wonderful StackOverflow community for helping me with this query.]

Not guilty!

We have been recruiting a lot lately. I try to stick to the basics, and sometimes get surprised by the responses I receive. A lot of seemingly successful candidates — who can take any J2EE question you throw at them — fumble when the convenience of their favourite data structures is taken away from them. For example, a simple Map implementation seems to give them a tough time.

I am surprised by one of the answers I receive quite a lot when asked about data structures: “I don’t remember — it was a long time ago” (that they were ‘taught’ about it). In other words: they plead “not guilty!”. Now, even though we are not recruiting for a programmer position, so to speak, such an answer does spark a bit of outrage within me. And then I ponder: what are we doing in our colleges? Are we getting so driven by the shiny-white things like BigData, BigAnalysis, IoT and whatnot that we don’t really worry much about the basics of computer science? First of all: is the essence of computer science even put forth to the students? That it’s a science… that one has to develop a scientific mindset! I believe that’s something above all the big college tag/big technology/big offer/big robot that one develops.

Even after college, do they ever go back and revisit the basics they had been ‘taught’? Or at least think about them?

One of the candidates we came across recently could not proceed with a small programming problem because we took away the luxury of using HashMap. When I asked why he couldn’t create a Map of his own, the response was: “I am not sure what language are Java Maps written in.”

Take a peek inside .jar files!

jar tf is a very useful command if we just want to peek into a jar. For example, I had a recent requirement of scanning several jars to look for Hibernate *.cfg.xml files. Naturally, it would have been tedious had I had to open each jar to check. The following simple script (using jar tf) came to the rescue.

for f in $(find . -name '*.jar'); do echo "--$f--"; jar tf "$f" | grep -n 'cfg\.xml'; done

Python, Spring, and Java concurrency

Attended a Python training recently, and instantly fell in love with it. Mostly because I could appreciate the ‘beauty’ of the language, coming from the oh-so-verbose-Java background. And it is then that I wondered what had prevented me from exploring it way back when I first heard about it! Hmm.
Anyway, now that I have been formally introduced to it, I’m looking forward to a long-term association. 🙂

Also attended a Spring training. I don’t have words to express what the Spring community has done to come up with a lightweight J2EE framework, with seamless integration with just about anything! I did have a chance to work with the 1.2.x version somewhere around 2006, but then other technologies took priority and I could not keep in touch. Nevertheless, I plan to explore it much more this time around.

Java concurrency is another thing I’ve started following up now. I remember, of the numerous interviews I’ve attended, I’ve (confessionally) remained on the back-foot when it comes to discussions about concurrency. Decided to break this trend, and actively see what it’s all about.

I would highly recommend Java Concurrency in Practice to get your hands dirty with Java concurrency fundamentals. It has my favourite author ‘Joshua Bloch’ as a co-author.
The only let-down is that this book is currently out-of-stock in India (at least Mumbai & Pune (as of Jun’10)), but it’s worth the wait!

Sorting using Comparable or Comparator

Of late, I’ve been involved in taking some interviews. Many of the candidates seem to be confused about the very simple concepts of Comparator and Comparable.
Comparable is an interface whose contract is to provide your implementation of the compareTo() method. For example:


// compares on the basis of last name
class MyVO implements Comparable {

    private String firstName;
    private String lastName;
    ...

    public int compareTo(Object o) {
        return lastName.compareTo(((MyVO) o).getLastName());
    }

    ...
}

Whereas Comparator is similar to the C concept of function pointers (functors). In Java, we incorporate this by defining a class which implements Comparator and overrides the compare() method.
If we think a bit more broadly, we can relate this to the Strategy design pattern: provide different Comparator implementations for different situations. For example, you could switch to sorting on the basis of firstName instead of lastName at runtime, based upon user input.

A comparator implementation is as follows.


class LastNameComparator implements Comparator {

    public int compare(Object o1, Object o2) {
        return ((MyVO) o1).getLastName().compareTo(((MyVO) o2).getLastName());
    }
}

Once this is done, there are numerous ways in which you could sort your Collection.
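For example, assuming a List<MyVO> named people, either route gets the list sorted:

// Natural ordering, via Comparable (compareTo on last name)
Collections.sort(people);

// Strategy-style: plug in whichever Comparator fits the situation
Collections.sort(people, new LastNameComparator());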

See: Using the Strategy Design Pattern for Sorting POJOs