Oct 6

Strings are a critical part of the Java language and especially so in a web application. Here are a variety of pointers for dealing with Strings.

Use StringUtils

A lot of the common String operations that aren’t in the JDK library are available in Commons Lang StringUtils. Check out those methods and you’ll find your code looking much cleaner (and you’ll have to write less of it!). Anytime you’re about to write a piece of simple string manipulation code that you think should be in the library, check Commons first.

Concatenating a List<String> or a String[]

Check out StringUtils.join() to see if it’s already done for you! Also, see the discussion below about the different ways to concatenate strings.
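If Commons Lang isn’t on your classpath, newer JDKs ship a similar helper. Here is a minimal sketch, assuming Java 8 or later. Note that it uses the JDK’s String.join rather than StringUtils.join(), and that the class name JoinDemo and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class JoinDemo {
    // Joins the items with ", " between them using the JDK's String.join (Java 8+).
    static String joinWithCommas(List<String> items) {
        return String.join(", ", items);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(joinWithCommas(names)); // prints "alpha, beta, gamma"

        // String.join also has a varargs overload, so a String[] works too.
        String[] array = {"x", "y"};
        System.out.println(String.join("-", array)); // prints "x-y"
    }
}
```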

Case insensitive comparison

Strings have a .equalsIgnoreCase() method that lets you do what you actually want rather than worrying about case conversions.
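A quick sketch of what that looks like in practice. The class and helper names here are hypothetical; only equalsIgnoreCase() itself comes from the JDK.

```java
class CaseInsensitiveDemo {
    // Compares two strings without caring about case.
    // The null check guards the receiver; equalsIgnoreCase already
    // returns false when its argument is null.
    static boolean sameIgnoringCase(String a, String b) {
        return a != null && a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        // No toLowerCase()/toUpperCase() juggling needed.
        System.out.println(sameIgnoringCase("Content-Type", "content-type")); // prints "true"
        System.out.println(sameIgnoringCase("abc", "abd"));                   // prints "false"
    }
}
```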

Prefer StrBuilder to StringBuffer/StringBuilder

Instead of the JDK StringBu* classes and the associated confusion explained below, try using StrBuilder. Apache Commons has created this very handy class that has a much more powerful and flexible API than StringBuffer. Give it a try and you won’t be able to go back to the regular StringBu* classes.

StringBuffer, StringBuilder and +

Strings in Java are immutable, so to append two Strings together the JVM must create another String. When you do that just once, the performance impact is negligible, but if you do it many times (in a loop, or a more complicated piece of code) the cost of all of those intermediate Strings adds up. To deal with this problem, the StringBuffer and StringBuilder classes were added to the Java core libraries. StringBuffer is thread-safe but slightly slower (because of the synchronization), and StringBuilder is not thread-safe but slightly faster. Both of the builder implementations have the drawback that they make the code harder to read.

The difference between using StringBu* and the ‘+’ operator is generally trivial, except in loops where it can become a significant performance hit.

As a general guideline, you should use the ‘+’ operator to concatenate simple Strings. The Java compiler will actually convert code like:

a = b + c + d;

To look like:

a = new StringBuilder(b).append(c).append(d).toString();

So you get the performance when the code is compiled, and easier readability while the code is still in source. As of Java 1.5, the compiler can NOT yet optimize away the ‘+=’ operator.

This also means that if you use + in a loop, then you’re incurring a performance penalty for creating a new StringBuilder in every iteration. They’ll be garbage collected eventually, but it’s not a good habit to get into. Make sure to declare the buffer/builder outside of the loop:

// Declare the builder once, outside the loop.
StringBuilder b = new StringBuilder();

for (String s : stringList) {
    b.append(s);
}

return b.toString();

Again, most of the basic use cases where you’d want to do that are actually already implemented for you in Commons Lang StringUtils.
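To see the two loop styles side by side, here is a small sketch. The class and method names are made up for illustration. Both methods return the same String; the difference is only in how many intermediate objects get created along the way.

```java
import java.util.Arrays;
import java.util.List;

class ConcatDemo {
    // Uses += inside the loop: every iteration produces a new intermediate String.
    static String concatWithPlus(List<String> parts) {
        String result = "";
        for (String s : parts) {
            result += s;
        }
        return result;
    }

    // Reuses a single StringBuilder declared outside the loop.
    static String concatWithBuilder(List<String> parts) {
        StringBuilder b = new StringBuilder();
        for (String s : parts) {
            b.append(s);
        }
        return b.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(concatWithPlus(parts));    // prints "abc"
        System.out.println(concatWithBuilder(parts)); // prints "abc"
    }
}
```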

Don’t optimize String concatenations!

This may seem to go against the advice of the rest of this post, but bear with me. Don’t immediately go into your codebase fixing these issues everywhere they crop up. That’s just a waste of time. Use profiling to determine which loops are actually causing slowdowns and fix those. Just keep these tips in mind as you write new code and focus on core readability as a primary requirement.


Aug 10

The Collections Framework is an awesome way to represent many common types of data using the mathematical basis for sets. Sun has a great tutorial that explains what the framework is and how to use it. This is meant to be a simple guide (in the tradition of Effective Java) on how to use the framework correctly.

To make it clear at a glance, the good and bad code samples are marked in green and red respectively.

Collections have an isEmpty() method

The Collections framework goes out of its way to make the code reflect what you’re actually thinking. The proper way to check if a collection is empty is not to check if its size is zero. Use the isEmpty() method!
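A short sketch of the difference in readability. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class IsEmptyDemo {
    static String describe(List<String> items) {
        // Reads as the question you are actually asking,
        // unlike items.size() == 0.
        if (items.isEmpty()) {
            return "nothing to show";
        }
        return items.size() + " item(s)";
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<String>();
        System.out.println(describe(items)); // prints "nothing to show"
        items.add("first");
        System.out.println(describe(items)); // prints "1 item(s)"
    }
}
```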

Avoid returning null to mean an empty collection

Leaving open the possibility of returning null instead of an empty collection forces your caller to write extra boilerplate code to deal with that case. Most of the time an empty collection has the same meaning that a null would. Trying to use null to represent an error in whatever was supposed to generate the collection bypasses Java’s preferred way of handling errors: Exceptions.
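Here is one way this could look. It’s a sketch, not a prescription: OrderLookup, findOrders and the customer id "c42" are all invented for the example. Bad input raises an exception, "no results" is an empty list, and the caller never has to check for null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OrderLookup {
    // Hypothetical lookup: returns the orders for a customer, never null.
    static List<String> findOrders(String customerId) {
        if (customerId == null || customerId.isEmpty()) {
            // An error is signalled with an exception, not with null.
            throw new IllegalArgumentException("customerId is required");
        }
        if (!"c42".equals(customerId)) {
            // "No orders" is an empty collection.
            return Collections.emptyList();
        }
        List<String> orders = new ArrayList<String>();
        orders.add("order-1");
        orders.add("order-2");
        return orders;
    }

    public static void main(String[] args) {
        // The caller can loop straight away; no null check required.
        for (String order : findOrders("unknown")) {
            System.out.println(order);
        }
        System.out.println(findOrders("c42").size()); // prints "2"
    }
}
```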

Create an empty collection using Collections.empty***() methods

If you need an empty collection that’s guaranteed to stay empty (for example in your test code) use the Collections utility class to create them:

List<String> myEmptyList = Collections.emptyList();
Set<String> myEmptySet = Collections.emptySet();
Map<String, Integer> myEmptyMap = Collections.emptyMap();

There are also the constants Collections.EMPTY_LIST, Collections.EMPTY_SET and Collections.EMPTY_MAP, but they’re best avoided since they are raw types and do not support generics.
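"Guaranteed to stay empty" means the returned collection is immutable. A small sketch to demonstrate it (the class and helper names are made up):

```java
import java.util.Collections;
import java.util.List;

class EmptyCollectionsDemo {
    // Returns true if the list refuses to accept a new element.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The element type is inferred from the variable's declared type.
        List<String> empty = Collections.emptyList();
        System.out.println(empty.isEmpty());   // prints "true"
        System.out.println(rejectsAdd(empty)); // prints "true"
    }
}
```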

Iterate through collections using the foreach form when possible

You want the code to match your thought process as much as possible. Sometimes you can’t use the new foreach syntax introduced in Java 5 because you need access to the loop counter but in most cases the best way to iterate through a collection looks like this:

List<String> foo = new ArrayList<String>(); // typed, so each element is a String
for (String bar : foo) {
    System.out.println(bar);
}

The same foreach loop also works for arrays, with the exact same syntax.
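For example, summing an int[] looks just like looping over a List. The class and method names below are hypothetical.

```java
class ArrayForeachDemo {
    // Sums the values using the same foreach form used for collections.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3})); // prints "6"
    }
}
```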

Choosing which collection type to use

When you create a new data structure, don’t think about whether it’s a hash table, a tree, or whatever. Think about the relationship between the things you’re storing. Is it a Collection, a Set, a Map, or a List?

If you’re not clear on what the difference is between a Collection, Map, Set, and a List, check out the Javadocs (Sun’s Cliff’s notes). Start with what the requirements are for the data structure. Does it need to be ordered? Do the elements need to be unique? And so on. Those answers will tell you which interface you need to use, and then you can decide what the underlying implementation should be. What if you use the wrong type of map/set/list and performance sucks? Well, then changing that implementation involves one line of code. And no one else needs to know or worry about it.
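As a sketch of that reasoning: suppose the requirement is "unique values, kept in the order they were first seen". Uniqueness points to the Set interface, and the ordering requirement picks the implementation. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ChooseCollectionDemo {
    // Unique values -> Set. First-seen order -> LinkedHashSet.
    static Set<String> uniqueInOrder(List<String> input) {
        // This is the one line to change if the requirement changes,
        // e.g. to a TreeSet for sorted order. Callers only see Set.
        Set<String> seen = new LinkedHashSet<String>(input);
        return seen;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b");
        System.out.println(new ArrayList<String>(uniqueInOrder(input))); // prints "[b, a]"
    }
}
```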

The left side is always an interface!

Like the title says, what you’re assigning to should almost never be an implementation. As a corollary, you should never return an implementation instead of an interface.

Instead of:

ArrayList foo = new ArrayList();
HashMap bar = new LinkedHashMap();

public TreeMap foo(){
…
}

do this:

List foo = new ArrayList();
Map bar = new LinkedHashMap();

public SortedMap foo(){
…
}

There might be a few exceptions to this rule, but they’re VERY rare. If you find yourself in one of those, stop and think really hard about whether you do actually need to break it. Better yet, ask the developer next to you about what she thinks.

If you’re explicitly casting, chances are something is wrong. Use generics.

Try to use generics on your collections. Knowing if foo is a list of Integers or Strings will make your client code much easier to read and will protect you from a lot of mistakes. Much more importantly, it will make it significantly easier for the next person who reads the code to understand what’s going on. If you find that you need a collection that needs to store multiple types of objects, then you should ask yourself whether you really have the right data structure and interface. Almost always the need to do that is a sign of a bad OO design.
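A small illustration, with hypothetical names. Because the parameter is declared as List<String>, the loop needs no cast, and the compiler rejects any attempt to add a non-String to the list.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsDemo {
    // The signature alone tells the reader what the list contains.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) { // no (String) cast needed
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        words.add("ab");
        words.add("cde");
        // words.add(42); // would not compile: 42 is not a String
        System.out.println(totalLength(words)); // prints "5"
    }
}
```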