Semlab : Muri : System : Code packages



Overview of packages

A modern programming language needs a way to isolate functional units of code larger than a single class, and Java is no exception. The problem is that if I write a class called Parser and you write a class called Parser, how does one distinguish between the two? The usual solution is called "namespaces"; a namespace is a distinct set of names that have been defined. You can refer to a definition of a class or other symbol by its absolute name (including the namespace) which is unambiguous, or you can "import" the whole namespace so that you can refer to its members by their relative names (which is more convenient, but brings back the ambiguity problem).

Java, C++, Python, and many other languages each have some kind of namespace feature. In Java, this is how it works: each class belongs to a package, which is declared in a package statement that must be the first statement in the source file (only comments and blank lines may come before it). Other classes in the same package can be referred to by their short names, but if you want to refer to a class from a different package, you must either spell out its long name (the fully qualified name, which includes every enclosing package) or import the class into the current namespace.

Packages can contain other packages or just classes, or both. You use the . (period) character to separate package names in a chain. You're probably already used to things like "import java.util.Vector"; let's dissect this and see what it actually means.

First, all classes provided as part of the standard Java library are divided into packages; those that are part of the core Java specification are in the "java" package. The "java" package contains another package called "util", which contains a class called "Vector". The long name "java.util.Vector" refers unambiguously to this class. If you want to use a Vector, you need to tell Java which of the potentially unlimited variety of Vector classes you mean; you can do this by using the long name everywhere you mean Vector, or by putting the above "import" statement at the top of your file, which means "in this file, 'Vector' should be taken to mean 'java.util.Vector'." (There's also "import java.util.*", which makes every class in java.util, though not in its sub-packages, available by its short name. This is a bad practice and should be discouraged: it may add slightly to compile times and, more importantly, it can bring back the exact problem packages were designed to prevent, because you don't know what you might be importing. It's best to import the individual classes you need by name.)
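As a small sketch of the two options just described (the class name ImportDemo is made up for illustration): the import lets Vector be written by its short name, while java.sql.Date, a standard-library class picked here only because its short name collides with java.util.Date, is spelled out in full so it cannot be confused with the imported Date.

```java
import java.util.Date;    // single-type import: "Date" now means java.util.Date
import java.util.Vector;  // single-type import: "Vector" now means java.util.Vector

// ImportDemo is a hypothetical class name used only for this illustration.
public class ImportDemo {
    // Short name, resolved through the import above.
    public static String shortVector() {
        return new Vector<String>().getClass().getName();
    }

    // Long name, which needs no import at all.
    public static String longVector() {
        return new java.util.Vector<String>().getClass().getName();
    }

    // "Date" here is java.util.Date because of the import.
    public static String importedDate() {
        return new Date(0L).getClass().getName();
    }

    // A different Date class, reachable only by its long name in this file.
    public static String spelledOutDate() {
        return new java.sql.Date(0L).getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(shortVector());     // java.util.Vector
        System.out.println(longVector());      // java.util.Vector
        System.out.println(importedDate());    // java.util.Date
        System.out.println(spelledOutDate());  // java.sql.Date
    }
}
```

Had the file used "import java.util.*" together with "import java.sql.*" instead, a bare Date would be rejected by the compiler as ambiguous.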

Now that we've seen what packages are and why to use them, let's talk about how Java actually uses them.

Mechanics of packages in Java

Each Java source file should start with a package statement; if there's no package statement, then the classes in it become members of the unnamed package (often called the default package; this page calls it the anonymous package). There's only one anonymous package, which contains all classes without package statements.
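A minimal sketch of a source file that declares its package (csli.example and PackageDemo are hypothetical names chosen for this illustration, not part of our real source tree). It shows that the package statement becomes part of the class's long name, while the short name stays the same.

```java
package csli.example;  // hypothetical package; must come before any import or class

// PackageDemo is a hypothetical class used only to show the effect of the
// package statement above.
public class PackageDemo {
    // The long name includes the package declared at the top of the file.
    public static String fullName() {
        return PackageDemo.class.getName();
    }

    // The short name is what other classes in csli.example would write.
    public static String shortName() {
        return PackageDemo.class.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(fullName());   // csli.example.PackageDemo
        System.out.println(shortName());  // PackageDemo
    }
}
```

To try it, save the file as csli/example/PackageDemo.java, compile it from the directory that contains csli ("javac csli/example/PackageDemo.java"), and run it by its long name ("java csli.example.PackageDemo").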

The package layout, for better or for worse (more on this value judgment later), must mirror your on-disk directory structure, with packages corresponding to directories. So if you have a class named csli.agents.manager.AgentManager, it must be in a file whose relative pathname (from some base point) is csli/agents/manager/AgentManager.java. Now the value judgment: this is initially a pain. It forces you to work a certain way, it will probably force you to change existing code slightly when you first convert it to packages, and it forces you to go edit the files if you move them to another directory. In the long run, though, it makes things much easier, because you always know where to find a given class. So does the Java compiler, which is what lets it work without Makefiles. Be happy with it!
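The name-to-directory rule is mechanical enough to write down. The sketch below is only an illustration (PackagePaths and its method names are invented here): it turns a long class name into the relative source path it has to live at, and pulls the package part back out of the name.

```java
// PackagePaths is a hypothetical helper, written only to illustrate the rule.
public class PackagePaths {
    // Each dot in the long name becomes a directory separator.
    public static String sourcePathFor(String fullClassName) {
        return fullClassName.replace('.', '/') + ".java";
    }

    // The package is everything before the last dot; a name with no dot
    // belongs to the package with no name, shown here as an empty string.
    public static String packageOf(String fullClassName) {
        int lastDot = fullClassName.lastIndexOf('.');
        return lastDot < 0 ? "" : fullClassName.substring(0, lastDot);
    }

    public static void main(String[] args) {
        String name = "csli.agents.manager.AgentManager";
        System.out.println(sourcePathFor(name));  // csli/agents/manager/AgentManager.java
        System.out.println(packageOf(name));      // csli.agents.manager
    }
}
```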

Here's the clincher: a class can automatically refer to any other class in the same package by just its short name, and the class will be found. This goes for the anonymous package too, so if you have a bunch of classes in the same directory with no package information, they will be able to refer to each other. But there's no practical way for a class to refer to a class in a different directory unless you use packages (short of adding every such directory to your CLASSPATH, which gets unwieldy fast). This means that packages are effectively mandatory for a project of any size, once you want to divide the source code into different disk directories, and especially once you want to share source code between different projects. On the other hand, once you do start using packages to share code between different projects, you'll wonder how you ever did things without them.

To refer to classes from another directory, you just use the import statement at the top of your file (after its own package statement but before any real code), or, if you're only using a certain class once or twice, you can just spell its name out (including package info) wherever you use it. You're already doing this with Java's provided classes in java.util, javax.swing, and so on (java.lang is the one package that is imported for you automatically), and it's not really any different for packages you create yourself.

The final nice thing about this is that instead of adding each directory in your project to your CLASSPATH (which is often necessary when using the anonymous package), you only have to add the root directory of your project to CLASSPATH, and Java will use the directory information encoded in the package name to find classes automatically.
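To make the lookup concrete, here is a simplified model of it, not the JVM's actual class loader (ClasspathLookup and its methods are invented for this sketch, and it ignores jar files): for each root on the class path, turn the long class name into a relative path and see whether a class file exists there.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// ClasspathLookup is a hypothetical, simplified model of how a class is
// located under directory roots; real class loading also handles jar files.
public class ClasspathLookup {
    // Where the class file would sit under one root directory.
    public static Path candidate(Path root, String fullClassName) {
        return root.resolve(fullClassName.replace('.', '/') + ".class");
    }

    // Tries the roots in order and returns the first class file that exists.
    public static Optional<Path> locate(List<Path> roots, String fullClassName) {
        for (Path root : roots) {
            Path p = candidate(root, fullClassName);
            if (Files.isRegularFile(p)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "csli.agents.manager.AgentManager";
        // Use the roots this JVM was actually started with.
        List<Path> roots = new ArrayList<>();
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            roots.add(Paths.get(entry));
        }
        System.out.println(locate(roots, name)
                .map(Path::toString)
                .orElse("not found under any directory root"));
    }
}
```

With a single project root on the class path, one entry is enough for every package underneath it, which is the convenience described above.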

Our package layout for CSLI/Semlab projects

We've (so far) created a top-level package called csli, with sub-packages called muri (for MURI-specific stuff), witas (for Witas-specific stuff), and agents (for OAA agents, which can be shared between both projects and presumably future stuff too).

The package mechanism lends itself nicely to a hierarchical organization (the same as you'd do with directories, which is no accident). For example, under csli.muri there's a project called tutor, which is the logic for our tutoring system, and whose whole name is csli.muri.tutor. Inside csli.muri.tutor there are further subprojects that relate only to the tutor, such as csli.muri.tutor.session. Note that this exactly mirrors the directory structure, so even for non-Java stuff (for example, our Gemini grammar, which is in csli/muri/grammar) you can create the directories in a way that seems logical, and for the files that are Java classes, the package mechanism maps onto the existing directory structure in a logical way.

In the agents directory (csli.agents package), there are several subdirectories: manager (a Java utility for starting and stopping agents), and v1 and v2, which are versions of the natural language enabling agents built on incompatible source code bases. The v1 (version 1) agents were copied from the Witas project as of the "final/demo3" stage, and enhanced for the MURI project while preserving the same semantics on operations. The v2 (version 2) agents are a completely new effort by the Witas people, learning from their v1 effort, and do not preserve these semantics, hence you can't substitute a v2 agent for a v1 agent. Regardless, each of v1 and v2 should have agents for speech recognition, text-to-speech, etc.; you can run them standalone (java csli.agents.v1.tts.CSLI_TTSAgent), or use them in your own programs ("import csli.agents.v1.nlinterface2.CSLI_NLInterfaceWindow2").


mginzton@csli / 5/30/2001