Toma implementation in Ontopia
October 15, 2009 by ontopia
Thomas Neidhart of SpaceApplications has implemented the Toma query language on top of Ontopia. Toma is a query language for Topic Maps designed by Rani Pinchuk of SpaceApplications. The implementation lives in the sandbox part of Subversion and is not part of Ontopia proper, but the wiki page explains how to check it out and run it. For now, Toma queries can only be run using a command-line client (or the API), but work is under way to make Toma available in the Omnigator query plug-in.
OKS and Ontopia
October 13, 2009 by ontopia
Some users who have written to us have been confused about the relationship between the old OKS (Ontopia Knowledge Suite) and the new open source Ontopia edition. They have also been confused about what is and isn’t allowed with the new open source edition. So we thought we’d post a little about that to make things clearer.
The commercial edition of the product, which existed from May 2001 until June 2009, was the Ontopia Knowledge Suite (OKS). It came in four different editions:
- OKS Personal, which was just the engine.
- OKS Professional, which was everything but the database backend.
- OKS Enterprise, which was everything.
- OKS Samplers, which was the free download. It was basically OKS Professional minus the tools for making new applications.
Now there is just Ontopia, which is almost exactly the same as the old OKS Enterprise. The difference is that we took out the code that deals with license keys since we don’t sell license keys any more. We also cleaned up a bit, and threw out some legacy stuff that nobody used.
As for what you can do with the new open source Ontopia, that’s simple: anything the Apache Software License says you can do. Specifically, that means you can download it as many times as you want, install it anywhere you want, and use it for any purpose you want. All for free.
Ontopia Code Camp
October 2, 2009 by ontopia
We are arranging a code camp as part of the TMRA 2009 (Topic Maps Research and Applications) conference in Leipzig, on November 11. The code camp will introduce new developers and users to the project and the product suite, and show how to develop with (and for) Ontopia. At the end we’ll close with a discussion of what users and developers want to see from the project.
Lars Marius Garshol and Geir Ove Grønmo, the two lead developers of Ontopia since 2000, will be leading the code camp. For more information, please see the official code camp program.
We hope to see you in Leipzig!
A faster and more compact Set
September 23, 2009 by ontopia
Looking at some of the new code that is being added to Ontopia, I thought it might be useful to point out that as part of Ontopia we have a class called CompactHashSet, which is both faster and more compact than java.util.HashSet. So when you use Sets in your code, it might be worth your while to use CompactHashSet instead of HashSet.
The in-memory engine uses Sets throughout for all the collections in the core API. There are a lot of these: the set of topics, the set of names for each topic, the set of item identifiers for each topic, … Ontopia originally used a lot of memory, and we identified HashSet as the source of a lot of this, so CompactHashSet was written to reduce memory usage somewhat. An interesting side-effect was that it also turned out to be faster.
HashSet uses open hashing (also known as separate chaining), which means that each bucket in the hash table refers to a linked list of entries whose hash codes place them in the same bucket. This means that for each entry in the set an extra linked list element object must be allocated, which of course requires extra memory.
CompactHashSet, by contrast, uses closed hashing (also known as open addressing). This is a strategy where, if the bucket you want to place a new entry in is already occupied, you run further down the hash table (in a predictable way!) looking for a free bucket. This means you can do away with the linked list, thus saving memory.
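To make the idea concrete, here is a minimal sketch of closed hashing with linear probing. It is not the actual CompactHashSet code: the class name ProbingSet, the initial size of 16 and the load factor of 0.5 are all assumptions made for illustration, and removal and null elements are left out.

```java
/** Toy closed-hashing set with linear probing. Illustration only, not Ontopia's CompactHashSet. */
public class ProbingSet {
    // One flat array holds the entries themselves; there are no per-entry list nodes.
    private Object[] slots = new Object[16];
    private int size = 0;

    /** Adds o (must not be null); returns false if it was already present. */
    public boolean add(Object o) {
        // Keep the table at most half full, so a probe always reaches a free slot.
        if ((size + 1) * 2 > slots.length)
            grow();
        int ix = indexFor(o, slots);
        if (slots[ix] != null)
            return false; // found an equal entry
        slots[ix] = o;
        size++;
        return true;
    }

    public boolean contains(Object o) {
        return slots[indexFor(o, slots)] != null;
    }

    public int size() {
        return size;
    }

    // Start at the bucket chosen by the hash code, then walk forward one slot
    // at a time (wrapping around) until we find o or a free slot.
    private static int indexFor(Object o, Object[] table) {
        int ix = (o.hashCode() & 0x7fffffff) % table.length;
        while (table[ix] != null && !table[ix].equals(o))
            ix = (ix + 1) % table.length;
        return ix;
    }

    // Double the table and re-insert every entry at its new position.
    private void grow() {
        Object[] bigger = new Object[slots.length * 2];
        for (Object o : slots)
            if (o != null)
                bigger[indexFor(o, bigger)] = o;
        slots = bigger;
    }
}
```

The probe loop only terminates because the table is never allowed to fill up completely, which is why the sketch grows the array before it gets more than half full.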
So how much faster and more compact is CompactHashSet? I put together a little unscientific test and ran it three times for each of the implementations. The test first adds to the set 1 million times, then does 1 million lookups, then traverses the set, then removes 1 million objects. (See below for the test code.) These are the results:
CLASS           TIME (ms)  MEMORY (bytes)
HashSet         4477       44644624
HashSet         4447       44651984
HashSet         4500       44632824
CompactHashSet  2416       22886464
CompactHashSet  2351       22889408
CompactHashSet  2370       22895872
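In other words, in our little test, CompactHashSet is nearly twice as fast (about 2.4 seconds per run against about 4.5 seconds) and uses about half as much memory (about 22.9 million bytes against about 44.6 million bytes) as HashSet. When you win on both speed and memory use there isn’t much left, really…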
Except, of course, reliability. Initially, there were some problems with the CompactHashSet class. In some cases, the run down the hashtable to find free buckets could get into an infinite loop without ever finding a bucket. That’s now been solved. And, many years ago, there was a memory leak in it, causing deleted objects to be kept when rehashing. This caused serious performance problems for some customers and took months to track down.
EDIT: Discussion on reddit.com shows that many people misunderstood the above. The infinite loop bug was found during initial testing. The memory leak was found once customers started deploying the code for real, which may have been about a year after it was implemented. This was in 2003. Since then we have found exactly zero bugs in the code. END EDIT
By now, however, we have an extensive test suite for the class, and it’s been in use unchanged for many years with no problems. Using CompactHashSet should be entirely safe.
The test code
If you want to try the test yourself, here is the test code:
import java.util.Set;
import java.util.HashSet;
import java.util.Iterator;
import net.ontopia.utils.CompactHashSet;

public class TestHashSet {
  private static final int TIMES = 1000000; // operations per phase
  private static final int MAX = 5000000;   // random values fall in 0..MAX

  public static void main(String[] argv) {
    // first, get the JIT going
    test(false, new CompactHashSet());
    test(false, new HashSet());

    // then, do real timings
    for (int ix = 0; ix < 3; ix++)
      test(true, new HashSet());
    for (int ix = 0; ix < 3; ix++)
      test(true, new CompactHashSet());
  }

  public static void test(boolean output, Set set) {
    long start = System.currentTimeMillis();

    // measure used memory before filling the set
    if (output) {
      System.gc(); System.gc();
    }
    long before = Runtime.getRuntime().totalMemory() -
      Runtime.getRuntime().freeMemory();

    // add
    for (int ix = 0; ix < TIMES; ix++)
      set.add(Long.valueOf(Math.round(Math.random() * MAX)));

    // measure used memory again, now that the set is full
    if (output) {
      System.gc(); System.gc();
      long after = Runtime.getRuntime().totalMemory() -
        Runtime.getRuntime().freeMemory();
      System.out.println("Memory before: " + before);
      System.out.println("Memory after: " + after);
      System.out.println("Memory usage: " + (after - before));
    }

    // lookup
    int count = 0;
    for (int ix = 0; ix < TIMES; ix++) {
      Long number = Long.valueOf(Math.round(Math.random() * MAX));
      if (set.contains(number))
        count++;
    }

    // iterate
    Iterator it = set.iterator();
    while (it.hasNext()) {
      Long number = (Long) it.next();
    }

    // remove
    for (int ix = 0; ix < TIMES; ix++) {
      Long number = Long.valueOf(Math.round(Math.random() * MAX));
      set.remove(number);
    }

    // the reported time covers all four phases, plus the gc calls above
    if (output)
      System.out.println("TIME: " + (System.currentTimeMillis() - start));
  }
}
NRK/Skole logo competition
September 23, 2009 by ontopia
The NRK/Skole project needs a new logo, and has decided to crowdsource the problem by announcing a logo competition. The competition announcement has text in both Norwegian and English.
Hardship at NRK
September 21, 2009 by ontopia
At NRK, Per Magnus Løvold is the internal developer responsible for keeping the NRK/Skole application running, and so he is the one who had to deal with the frustration from the editorial team when it was not working properly.
In the end he made a vow not to shave before the problem was solved, possibly inspired by Harald Finehair. He sent us some photographic evidence, which we thought worth sharing:
Ontopia 5.0.2 released
September 14, 2009 by ontopia
Ontopia 5.0.1 has been out for just about a week, but we decided to release 5.0.2 today even so. The reason is that 5.0.2 fixes a performance bug that is important for one of our customers. Beyond this fix the new release contains a few very minor fixes, but is otherwise nearly identical to 5.0.1.
Performance problems at NRK/Skole
September 14, 2009 by ontopia
In the presentation of the NRK/Skole case at the August 27 meeting the customer spent some time on the performance problems that they have been experiencing recently, and there was quite a bit of discussion at the meeting of what the possible causes might be. Given that the whole site is based on Ontopia, much of the suspicion obviously centered on the Ontopia software and possible performance problems with it.
The initial report we had from the customer was that the site was “unstable”, meaning that it would sometimes not respond or be very slow. On occasion it would also have to be restarted. There was just one server with Ontopia deployed on it, but traffic on the site could only be described as modest (it’s not officially in use by schools yet), so it should easily be able to handle the traffic.
The customer wanted to switch to Ontopia 5.0.0 from OKS 4.1 in order to be able to use two load-balanced servers in a cluster (OKS 4.1 is priced per CPU, so it was too expensive to do it with this version). Unfortunately, testing showed that Ontopia 5.0.0 performed even worse than 4.1. It was at this point that the customer asked us to investigate and solve the performance problems.
Investigation
Our first step was to use TologSpy, a query profiler for tolog queries, to see if any of the queries in the application were slow. We quickly determined that the new optimizer made one of the queries very slow because it, well, de-optimized the query quite considerably. However, this was a problem with Ontopia 5.0.0 only, and couldn’t be the cause of the problems with OKS 4.1. Further, inserting a comment with a pragma to the optimizer should solve the problem.
Further testing showed that other than this the application seemed perfectly fine. We tested it with JMeter scripts running 10 client threads in parallel without being able to provoke any real problems. TologSpy showed that the bad query was still an issue (even with the pragma), but other than that everything seemed just fine.
So we changed tactics. Instead of using JMeter to pound some randomly selected URIs we decided to write a Python script to play back portions of the access log from the production site. This gave the same results, initially. Then the customer noticed that two sets of URLs were “hanging” and not giving responses. So further investigation naturally focused on these.
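For readers who want to try the same approach, here is a rough sketch of such a replay tool. It is not the script we used (that one was written in Python); this version is in Java, and it assumes the access log is in Common Log Format, that only GET requests are worth replaying, and that anything taking longer than 30 seconds counts as hanging. The class and method names are made up for the example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogReplay {
    // Assumed log format: ... "GET /some/path HTTP/1.1" 200 1234
    private static final Pattern REQUEST =
        Pattern.compile("\"GET (\\S+) HTTP/[0-9.]+\"");

    /** Pulls the requested paths out of the log lines, in their original order. */
    public static List<String> extractPaths(List<String> logLines) {
        List<String> paths = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = REQUEST.matcher(line);
            if (m.find())
                paths.add(m.group(1));
        }
        return paths;
    }

    /** Requests each path from baseUrl, printing status and timing, and flagging timeouts. */
    public static void replay(String baseUrl, List<String> paths) throws Exception {
        HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();
        for (String path : paths) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + path))
                .timeout(Duration.ofSeconds(30)) // assumed threshold for "hanging"
                .GET()
                .build();
            long start = System.nanoTime();
            try {
                HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.println(response.statusCode() + " " + millis + " ms " + path);
            } catch (HttpTimeoutException e) {
                System.out.println("HANGING " + path);
            }
        }
    }
}
```

Replaying real traffic in its original order hits the same mix of pages as production does, which is what random URI selection in JMeter failed to do for us.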
Problem #1
It turned out that for some clips the metadata on the left would never be shown. Instead, the server would use all available memory and 100% CPU until the thread died with an out-of-memory error. Further digging showed that some of the metadata fields were being parsed with an Antlr-based parser written specifically for this project (i.e. not part of Ontopia at all) in order to have the metadata nicely formatted. Running the parser from the command line, without Ontopia or a web server, we found that some string values would make it consume 100% CPU until it hit the ceiling of 700 MB RAM, and then crash.
Studying the Antlr-based parser showed that its lexer had a token definition which could match empty strings. If an illegal character (that is, one that couldn’t match any token) appeared, the lexer would take the only option open to it and generate an infinite list of empty string tokens in front of that character.
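The effect is easy to reproduce with a hand-written toy lexer. The sketch below is not the project's Antlr grammar, and the guard in it is not necessarily how the .g file was fixed; the token rule (zero or more lowercase letters), the class name and the maxTokens cap, which stands in for running out of memory, are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyTokenDemo {
    // Hypothetical token rule: zero or more lowercase letters, so it also matches "".
    // Returns the position just past the match.
    static int matchWord(String input, int pos) {
        int end = pos;
        while (end < input.length()
               && input.charAt(end) >= 'a' && input.charAt(end) <= 'z')
            end++;
        return end;
    }

    /** Naive loop: emits whatever the rule matches, even if that is nothing. */
    public static List<String> lexNaive(String input, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            int end = matchWord(input, pos);
            tokens.add(input.substring(pos, end));
            pos = end; // no progress at all when the match was empty
        }
        return tokens;
    }

    /** Guarded loop: an empty match means no token can start here, so report it. */
    public static List<String> lexGuarded(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int end = matchWord(input, pos);
            if (end == pos)
                throw new IllegalArgumentException(
                    "Illegal character '" + input.charAt(pos) + "' at position " + pos);
            tokens.add(input.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Without the cap, the '#' would make this loop until memory ran out.
        System.out.println(lexNaive("ab#cd", 5).size() + " tokens before hitting the cap");
        System.out.println(lexGuarded("abcd"));
    }
}
```

In the naive loop the lexer reaches the illegal character, matches an empty token in front of it, and stays put, so every further iteration adds another empty token to the list.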
This, of course, was the problem. Any time a user asked for one of these clips the server would use all available CPU and all memory, dumping out most of the objects in the Ontopia object cache. Typically, the user would reload when the server didn’t respond the first time, thus setting off the same thing again. Other pages would of course be slow during this time, especially as all caches would have to be filled up again, and the server would appear frozen/sluggish for a while before things calmed down again.
Fixing this required just changing a few lines in the .g file in the project code.
Problem #2
The second problem was the de-optimized tolog query. Setting the pragma option to turn off the new optimizer for some reason did not solve the problem. Running the query in the tolog plug-in in Omnigator worked fine, even if the query was a little slow. Running from the project Java code, however, the query would never complete.
It took a while to work out what was causing this, but in the end we realized that the inference rules were being handled differently in Omnigator than in the Java code. The Omnigator plug-in passes all rule declarations to the query processor together with the query as a single string, but the Java code was parsing the rule declarations into a DeclarationContextIF object, which was then passed in together with the query.
Further study showed that the code which optimizes the inference rules was not being run when rules were parsed into a context object. It was, however, run when the rules were declared as part of the query. Once this observation was made, the fix was quite simple.
Conclusion
In other words, a bug in the project code (in the Antlr-based metadata parser) was the cause of the performance and stability issues. A fairly obscure bug in the tolog parser was the reason why 5.0.0 performed even worse than the old version. Now that these two issues are out of the way the site is behaving much better, and the customer will be able to change over to a load-balanced environment with two clustered Ontopia servers.
Ontopia 5.0.1 is released!
September 7, 2009 by ontopia
After a few days in beta with 30 downloads and no error reports we have released Ontopia 5.0.1. This release mostly holds bug fixes and minor improvements throughout the product. From the release notes:
The following changes have been made:
- Support for tolog.properties added (blog post)
- Refactored Vizigator to simplify embedding
- Dependency on Apache ORO removed
- Ontopoly can now be built as a .jar file
- Ontopia now runs on Google AppEngine, thanks to two small fixes (118, 114)
- Minor improvements in TologSpy to make reports more readable
- tolog can now count to zero
- tolog plug-in in Omnigator can show query plan without running query
- New version of Italian Opera topic map.
The following bugs have been fixed:
- Synchronization problem in DB2TM fixed
- XTM 1.0 crash on duplicate non-topic objects
- XTM 2.0 crash on duplicate non-topic objects
- NullPointerException on names in TMAPI 2.0
- TMSync doesn’t synchronize reifying topics
- Reification of sub-topic maps carried into main TM in XTM 2.0
- Cannot log in to new installation
tolog.properties
August 31, 2009 by ontopia
Ontopia 5.0.0 enabled a new tolog optimizer by default, a move we knew risked causing difficulties for customers. One customer (NRK/Skole) has reported that they suspect this change caused problems for their application following the upgrade.
It is possible to control which optimizers are turned on and off in queries by including a comment in a special syntax. Unfortunately, in order for NRK to turn off the new optimizer in their application, they need to add that OPTION string to every single query in their application, since in 5.0.0 there is no way to set the option globally.
To avoid this, revision 450 implements a file called tolog.properties. This is a normal Java properties file, loaded from the classpath. In the new version, options in the query take precedence over options in tolog.properties, which in turn take precedence over the defaults.
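The precedence order can be pictured as a three-layer lookup. The sketch below is not the code from revision 450: the class LayeredOptions and the option name optimizer.reorder used in the usage note are hypothetical, chosen only to show the lookup order and the behaviour when the file is missing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.Properties;

public class LayeredOptions {
    private final Map<String, String> defaults;     // built-in defaults
    private final Properties global;                // e.g. loaded from tolog.properties
    private final Map<String, String> queryOptions; // options given in the query itself

    public LayeredOptions(Map<String, String> defaults, Properties global,
                          Map<String, String> queryOptions) {
        this.defaults = defaults;
        this.global = global;
        this.queryOptions = queryOptions;
    }

    /** Query options win over the properties file, which wins over the defaults. */
    public String get(String name) {
        if (queryOptions.containsKey(name))
            return queryOptions.get(name);
        String value = global.getProperty(name);
        if (value != null)
            return value;
        return defaults.get(name);
    }

    /** Loads a properties file from the classpath; a missing file yields no settings. */
    public static Properties loadFromClasspath(String resource) throws IOException {
        Properties props = new Properties();
        try (InputStream in =
                 LayeredOptions.class.getClassLoader().getResourceAsStream(resource)) {
            if (in != null)
                props.load(in);
        }
        return props;
    }
}
```

With a default of optimizer.reorder=true (again, a made-up name) and a properties file that sets it to false, get returns false for every query unless the query itself sets the option.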
This means you can now control properties globally for your application, should you wish to. If there is no tolog.properties everything will work as before.

