Introducing Ontopia REST

It is with great joy that we announce a new addition to the Ontopia framework: Ontopia REST. This new module aims to provide a complete REST implementation of core Ontopia functionality. The module is loosely based on the work David Damen did in the Tropics sandbox project, and as such it also uses the well-known and widely used Restlet Java framework.

Goal

The goal of this new module is two-fold:

  • To provide developers new ways to use the power of Ontopia, without the need for in-depth knowledge of the full Ontopia code
  • To work towards replacing deprecated and antiquated modules such as
    • Webeditor framework
    • TMRAP
    • Vizigator
    • Navigator framework
    • Omnigator

While the first goal requires only an Ontopia release to realize, the second will be an undertaking of its own.

Functionality

A short summary of functionality included:

  • Getting, adding, changing and removing all Topic Map constructs
  • Communication mainly focused on JSON, leveraging Jackson
  • Exposing full Topic Maps as XTM, CTM, LTM and TMXML
  • Exposing topic fragments as XTM and TMXML
  • Paging on collection requests
  • Exposing methods in indexes such as ClassInstanceIndexIF and SearcherIF
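As an illustration of the paging point above: a collection request returns only a slice of the full collection, selected by offset and limit style parameters. The sketch below shows the general idea in plain Java; the class, method and parameter names are invented for this example and are not the actual Ontopia REST API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of offset/limit paging over a collection, the way a
// REST collection endpoint might apply it. All names here are illustrative.
public class PagingExample {

  // Returns the requested page, clamping out-of-range offsets and limits.
  public static <T> List<T> page(List<T> items, int offset, int limit) {
    if (offset < 0 || offset >= items.size() || limit <= 0)
      return new ArrayList<T>(); // empty page for out-of-range requests
    int end = Math.min(offset + limit, items.size());
    return new ArrayList<T>(items.subList(offset, end));
  }

  public static void main(String[] args) {
    List<Integer> topics = new ArrayList<Integer>();
    for (int i = 0; i < 10; i++)
      topics.add(i);
    System.out.println(page(topics, 4, 3)); // prints [4, 5, 6]
  }
}
```

In a REST setting, offset and limit would typically arrive as query parameters on the collection URL.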

Status

The module that was added to the Ontopia source contains a large part of the envisioned functionality. A test suite of close to 600 tests has been set up to cover everything implemented so far. However, there is still work to be done:

  • Documentation
  • TOLOG integration
  • Extensibility
  • Dedicated clients (javascript, java, …)

For this last point we ask for your help. If you are an Ontopia enthusiast who has experience creating REST clients and wants to help us out, please let us know!

How to get it

Ontopia REST will be added to the next major release of Ontopia. You can build it from source if you would like to use or test it before the release. You can find the source in the ontopia-rest directory in the master branch.

 

Open source perks

With the move to GitHub, several perks of being an open source project came to light:

Travis-CI

GitHub has a nice integration with Travis-CI, which offers free continuous integration for open source projects. Every push to a branch or pull-request branch can trigger a build and test of the project. The configuration of the build process is contained in the repository, so each branch can determine its own testing parameters.
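For illustration, a minimal .travis.yml for a Maven-built Java project could look like the following; this is a generic sketch, not necessarily the actual Ontopia configuration.

```yaml
# Generic example of a Travis-CI configuration for a Maven project.
language: java
jdk:
  - oraclejdk8
# Travis has a default Maven build step for Java projects;
# an explicit script line makes the test step visible:
script: mvn clean test -B
```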

We’ve enabled the Travis-CI functionality for the main Ontopia repository, and the results are publicly available.


The automated building and testing will help developers determine whether branches or pull requests can be merged into the master branch or need more work first.

Codacy

Codacy offers code analysis and measurement as a cloud service. These measurements can uncover possible improvements to the project, such as coding style issues, performance problems and security threats.

As with Travis, use of Codacy is free for open source projects. The analysis and measurements of Ontopia are publicly available.

Some of the issues Codacy reports are a good practice exercise for people who would like to contribute to the project without needing a full in-depth understanding of all the code. Feel free to open pull requests referencing the issues you resolve.

Ontopia has moved to GitHub

It might be old news to the members of the Ontopia mailing list, but we kinda forgot to update the rest of the world. So here it is.

When Google announced back in March 2015 that Google Code would shut down, we set out to find a new home for the Ontopia code base. Google Code would go into read-only mode in August, so we had to find a new home before then or we would lose the ability to commit new changes to Ontopia.

Ontopia JDO

It is with great pleasure that we announce the next step in the Ontopia development, the Ontopia JDO project.

Rationale
As we worked toward a release of Ontopia 5.3.1, we encountered the massive block of code that is the Ontopia RDBMS backend. This large and extremely important piece of code was developed as part of the OKS, before it was open sourced. As such, most of this code is now at least 6 years old, and the core code probably even older. Over the years we have come to understand the limitations of this code; to name a few important ones:

  • Supported databases are limited
  • Optimization requires extensive knowledge of the code
  • Optimization is database dependent
  • Tracing and debugging is complicated
  • Full text search is complicated, and database dependent
  • TOLOG RDBMS is incomplete or broken
  • Adding use case oriented optimized SQL queries requires complex code hacking

When faced with these issues during real-world projects, we set out to improve upon Ontopia. Soon thereafter we came to the conclusion that improving or changing the code would be a massive undertaking. So instead we chose to research new, yet proven, technologies that offer the features Ontopia requires.

JDO
JDO, which stands for Java Data Objects, is a specification for Java object persistence. It allows a domain model, represented by POJOs, to be mapped to a persistence store. JDO was initially designed in JSR 12 in 2002 and the last version (3.0) was released in 2010. JDO was not the only ORM technology we looked at, but it best suited the needs of Ontopia.

DataNucleus
Because JDO is a JSR, several implementations exist. We chose to work with DataNucleus initially, as it is the reference implementation for the latest JDO specification and supports the largest number of data stores.

Benefits
Here are some of the benefits we should be able to achieve with this project:

  • Many datastores: RDBMS (all?), graph based (Neo4j!), document based (Mongo, …), object based, web based.
  • External optimization: optimization is (mostly) part of the JDO abstraction layer, which means we won’t have to program it ourselves.
  • Use of the open source community: JDO and DataNucleus are maintained by a large open source community, which means we get improvements with each new version.
  • Better integration: extending Ontopia’s datamodel with your own JDO-persisted POJOs should now be possible.
  • Basic full-text searching for every datastore: the project provides a very basic full-text search over JDO.
  • TOLOG RDBMS remake possible: the inner workings of tolog-rdbms create JDO queries that are converted to SQL. This could now directly leverage JDO features.
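To give a flavor of the "better integration" point above, this is roughly what a JDO-persisted POJO looks like. The class and fields below are invented for this illustration (they are not part of Ontopia's data model), and running it requires a JDO implementation such as DataNucleus on the classpath.

```java
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

// Hypothetical domain object, mapped to the datastore by JDO annotations.
@PersistenceCapable
public class Bookmark {
  @PrimaryKey
  private long id;          // primary key in the datastore

  @Persistent
  private String title;     // persisted as a regular field

  @Persistent
  private String topicRef;  // e.g. a reference to a topic in the map
}
```

Retrieval then goes through datastore-independent JDOQL, e.g. `pm.newQuery(Bookmark.class, "title == t")` with a declared parameter, and the JDO implementation translates that to SQL, a Mongo query, or whatever the configured store needs.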

Downside
Sadly, there is a downside. The RDBMS schema has changed. Although the schema closely resembles the Ontopia 5.x schema, it was impossible to fully reuse it. To mitigate this issue, we plan to create a tool that can migrate an existing RDBMS backend to a new JDO backend, keeping the result as optimized as possible.

Project status
The code committed to GitHub at the time of this post has been in development for about a year. It has been tested within the scope of Ontopia code, meaning all the backend tests in net.ontopia.topicmaps.core. Beyond these basic tests, Morpheus has tested the project in combination with existing frameworks and projects based on Ontopia. All these tests are now successful, which means the project is ready to be beta tested.

Roadmap
The project goes into beta testing with this post. We ask you, the Ontopia community, to test it in your projects and frameworks. We would especially like to see all the different datastores tested before we officially claim that Ontopia can be used with all of them. Do not hesitate to ask questions, report issues, or even better: create pull requests. In the coming days we will add known issues and to-dos to the issue tracker. See the README on GitHub to get started with ontopia-jdo.

The Ontopia committers.

Ontopia.toMaven()

Ontopia’s developer team is committed to switching from Ant to Maven as the build and project management tool for the Ontopia code base. Making this switch has been ongoing work since 2009. This blog post serves as a summary of the work that has been done so far and the work that still needs to be done.

Why Maven?
Ontopia’s biggest problem is that the code base forms one massive block that cannot be split up. Many developers and end users have complained about this and have requested that the product be modularized. The Ant build file currently used to build Ontopia is about 3000 lines long and has become difficult to maintain. Also, as we discovered along the way, it contains obsolete parts and many tasks are heavily tangled. Cleaning up the build file is not straightforward and will remain a problem as the project evolves. At TMRA 2010, Morpheus presented a proposal to start using Maven instead of Ant.
Maven is a project management and comprehension tool that has become increasingly popular over the last couple of years. It uses convention over configuration. Instead of configuring every setting over and over again, Maven uses conventions for commonly used tasks. It uses a standard for directory naming and for the build cycle that is used to compile, test, build and deploy software. As a result, it takes a lot less XML to tell the system how Ontopia should be built. Of course this requires the code base to follow the convention, which is what we’ve been working on since July 2009.
Additional benefits of following the Maven convention are that the directory structure starts to reflect the modular architecture of the code base and that test files and resources become separated from the actual code. This creates a more transparent code base, in which developers can find their way more easily.
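Concretely, the Maven convention prescribes a standard layout for each module, which is where the separation mentioned above comes from:

```
pom.xml             project/module descriptor
src/main/java       production Java source
src/main/resources  resources bundled into the artifact
src/main/webapp     web content (for web application modules)
src/test/java       test source, kept apart from production code
src/test/resources  test-only resources
target/             generated build output
```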
Maven is used extensively in Java software. It is mature, and much support can be found online. Many plugins are available, for instance for pre-compiling JSPs or creating DocBook documentation. We believe Maven is currently the best option for Ontopia. Later it would be possible to create build scripts based on other project comprehension tools like Gradle or Buildr, which use the same file layout as Maven.

What needed to be done to support a modularized architecture?
To modularize Ontopia we distinguish three main parts:

  • Java code: the core functionality, db2tm, classify, navigator, etc. contain only Java code. We’ve split these functionalities into modules where this was possible.
  • Web applications: Omnigator, Ontopoly and the supporting web applications are now modules of Ontopia. Each application can be built separately if needed, and will eventually end up in the distribution.
  • Distribution: for most users, this is what Ontopia is: the zip file containing Tomcat and all tools and applications. Currently this module builds only a Tomcat distribution, but it is set up to allow other container server distributions to be added in the future.

Maven has a specific project structure. To implement this structure we needed to move a lot of files into the correct locations. After all the code was moved and was once again compilable, we started working on the test cases. Maven forces testing on every build, which the current Ontopia build does not, and Maven automatically detects test classes. Most of our work went into changing the test cases into Maven-runnable test cases. During this process we discovered that not every Ontopia test case is actually run by the current build process.
The next step was to move all the web applications into Maven modules. The new Maven web applications are now being pre-compiled, which brought some old and broken code to light.
Finally, the distribution needed to be redefined. The Maven modules are collected and placed into a freshly downloaded Tomcat.

What will change for end users?
We aim toward a build that generates essentially the same distribution as the one that is now available, so that users are not directly affected by the changes. After a successful transfer to Maven, we can start improving the quality of Ontopia. This of course translates into fewer bugs for users.

What will change for developers?
The biggest changes in this process are aimed at the ease of using Ontopia as a project dependency and the maintenance of Ontopia itself. Developers using Ontopia will get more choice in which parts of Ontopia they would like to use. For example: a Topic Maps browsing web application would depend on the Navigator module only.
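In Maven terms, such a selective dependency becomes a single declaration in the project's pom.xml. The coordinates below are a guess at how the module might be named; check the actual groupId, artifactId and version in the Ontopia POMs.

```xml
<dependency>
  <groupId>net.ontopia</groupId>
  <artifactId>ontopia-navigator</artifactId>
  <version>5.1.0</version>
</dependency>
```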
The work of the Ontopia developers can now be aimed more directly at a certain module, allowing for easier splitting of developer tasks. The modules now have clear lines between them, so that debugging becomes easier.

What has been done so far?
Currently, we are at about 90% completion of the conversion to Maven. We have created a branch on Google Code, called ontopia-maven, in which we are working. The steps we have taken so far include:

  • Java code has been moved into modules (100%)
  • Most of the test files have been adapted to the new situation (99%)
  • Web applications have been moved into modules (90%)
  • A distribution with Tomcat has been added (80%)

We are still working on finishing the TMRAP service, the documentation, the vizlet and the overall fine-tuning of the distribution.

When is the switch expected to be finished?
At the moment there is no date set for any action after finishing the branch. There are several decisions to be made before we can define a time frame. Of course there will be ample notification in advance of any major changes. Our best guess is to finalize the transition in the summer of 2011.

How will the merging be done?
At the moment there are several thoughts about how we can merge the results of our work back into the trunk:

  • Replacing the trunk with the branch. This means we have to replay every commit made on the trunk since the branching point onto the branch. The current trunk would then be tagged as the last non-Maven version.
  • A ‘normal’ merge to the trunk. The usual way of ending the life of a branch is by merging it back into the trunk. We expect that this will create a lot of conflicts due to the amount of moves, copies and changes that needed to be done. During the process of fixing these conflicts, the trunk would be locked until we reach a stable build.
  • Replacing the trunk with a backup plan. This would be almost the same as the second option, except for a safety precaution: we would create a new branch from the trunk as backup for emergency fixes/builds. Once the merge is finished, we can merge back any emergency changes from the trunk branch.

None of these options has been chosen yet, and we are open to suggestions from people with experience in massive branch merges.

How is the progress monitored?
The Ontopia Maven branch is currently deployed on a Hudson server at Morpheus, which runs nightly builds as well as a Sonar analysis. This is used to keep track of the buildability of the Maven project and the status of the test cases. It also provides us with some nice metrics:

  • Modules: 18 (10 java, 7 web applications, 1 distribution)
  • Lines of code: 148,895
  • Java classes: 2145
  • Tests: 4529 (of which 39 are currently failing)
  • Test success: 99.1%

We are looking into the possibility of sharing access to the Hudson and Sonar results.

Is there something to see / play around with?
Yes there is! The Ontopia Maven code is publicly available in the Ontopia repository, under branches/ontopia-maven/ontopia-maven. If you want to build Ontopia yourself, please install Maven and run the following from the project’s root directory:

 mvn clean install -Dmaven.test.failure.ignore=true -Pontopia-distribution-tomcat
The failure.ignore setting is a temporary workaround for the last failing test cases; without it, the build process halts on the first failure. Since Maven runs all test cases (including the RDBMS ones), the number of test failures will be a few hundred if you do not provide an RDBMS property file (-DargLine="-Dnet.ontopia.topicmaps.impl.rdbms.PropertyFile=/path/to/file.props"). The -Pontopia-distribution-tomcat flag is an additional profile setting to include the distribution in the build; it is not part of a default build. Once the build is complete, you can find the distribution in the ontopia-distribution-tomcat/target/ontopia-distribution-tomcat-**/ folder.

How can I help? Whom to contact for questions?
You can help us by building Ontopia with Maven yourself and either trying out the distribution or the new artifacts as dependencies in other projects. Issues you find can be reported on the Ontopia issue tracker. Keep in mind however that this branch is quite old and might not contain fixes already committed to the trunk.
Any of the Ontopia contact options can put you into contact with people that can answer your questions.

Conclusion
Switching to Maven will be a great leap in the maintainability of the Ontopia code base. Building, testing, releasing, etc. of the code will be done based on a standardized life cycle. The file layout will be more transparent by standardizing the directory structure and separating test and resource files from the code. Also, by using Maven’s modularized approach, we will be able to build parts of Ontopia separately and gain the possibility to create customized distributions, for example for different web containers.

The conversion is now almost complete, but still resides in a Subversion branch, awaiting the merge back into the trunk. We are looking forward to meeting you on the other side.

A faster and more compact Set

Looking at some of the new code being added to Ontopia, I thought it might be useful to point out that Ontopia includes a class called CompactHashSet, which is both faster and more compact than java.util.HashSet. So when you use Sets in your code, it might be worth your while to use CompactHashSet instead of HashSet.

The in-memory engine uses Sets throughout for all the collections in the core API. There are a lot of these: the set of topics, the set of names for each topic, the set of item identifiers for each topic, … Ontopia originally used a lot of memory, and we identified HashSet as the source of a lot of this, and CompactHashSet was written in order to reduce memory usage somewhat. An interesting side-effect was that it also turned out to be faster.

HashSet uses open hashing (also known as separate chaining): each bucket in the hash table refers to a linked list of entries whose hash codes place them in the same bucket. This means that for each entry in the set an extra linked-list element object must be allocated, which of course requires extra memory.

CompactHashSet, by contrast, uses closed hashing (also known as open addressing). This is a strategy where, if the bucket you want to place a new entry in is already occupied, you run further down the hash table (in a predictable way!) looking for a free bucket. This means you can do away with the linked list, thus saving memory.
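To make the difference concrete, here is a toy open-addressing set with linear probing. This is a simplified sketch of the strategy, not the actual CompactHashSet code: it does no rehashing or deletion, and simply refuses to add once full (a real implementation grows the table well before that point).

```java
// Toy illustration of closed hashing (open addressing) with linear probing.
// Not the real CompactHashSet: no rehashing, no removal, fixed capacity.
public class ProbingSetSketch {
  private final Object[] buckets;
  private int size;

  public ProbingSetSketch(int capacity) {
    buckets = new Object[capacity];
  }

  public boolean add(Object o) {
    if (size == buckets.length)
      throw new IllegalStateException("full; a real implementation rehashes");
    int ix = (o.hashCode() & 0x7FFFFFFF) % buckets.length;
    // probe predictably down the table until a free or matching bucket
    while (buckets[ix] != null) {
      if (buckets[ix].equals(o))
        return false;                 // already present
      ix = (ix + 1) % buckets.length; // linear probing step
    }
    buckets[ix] = o; // store directly: no linked-list node allocated
    size++;
    return true;
  }

  public boolean contains(Object o) {
    int ix = (o.hashCode() & 0x7FFFFFFF) % buckets.length;
    while (buckets[ix] != null) {
      if (buckets[ix].equals(o))
        return true;
      ix = (ix + 1) % buckets.length;
    }
    return false; // hit an empty bucket: the element cannot be in the table
  }
}
```

Because entries live directly in the table, the per-entry overhead is just one array slot instead of a separate node object, which is where the memory savings come from.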

So how much faster and more compact is CompactHashSet? I put together a little unscientific test and ran it three times for each of the implementations. The test first adds to the set 1 million times, then does 1 million lookups, then traverses the set, then removes 1 million objects. (See below for the test code.) These are the results:

CLASS           TIME (ms)  MEMORY (bytes)
HashSet         4477       44644624
HashSet         4447       44651984
HashSet         4500       44632824
CompactHashSet  2416       22886464
CompactHashSet  2351       22889408
CompactHashSet  2370       22895872

In other words, in our little test, CompactHashSet is nearly twice as fast and uses about half as much memory compared to HashSet. When you win on both speed and memory use there isn’t much left, really…

Except, of course, reliability. Initially, there were some problems with the CompactHashSet class. In some cases, the run down the hash table to find free buckets could get into an infinite loop without ever finding one. That has since been solved. And, many years ago, there was a memory leak in it, causing deleted objects to be retained when rehashing. This caused serious performance problems for some customers and took months to track down.

EDIT: Discussion on reddit.com shows that many people misunderstood the above. The infinite loop bug was found during initial testing. The memory leak was found once customers started deploying the code for real, which may have been about a year after it was implemented. This was in 2003. Since then we have found exactly zero bugs in the code. END EDIT

By now, however, we have an extensive test suite for the class, and it’s been in use unchanged for many years with no problems. Using CompactHashSet should be entirely safe.

The test code

If you want to try the test yourself, here is the test code:

import java.util.Set;
import java.util.HashSet;
import java.util.Iterator;
import net.ontopia.utils.CompactHashSet;

public class TestHashSet {
  private static final int TIMES = 1000000;
  private static final int MAX = 5000000;
  
  public static void main(String[] argv) {
    // first, get the JIT going
    test(false, new CompactHashSet());
    test(false, new HashSet());

    // then, do real timings
    for (int ix = 0; ix < 3; ix++)
      test(true, new HashSet());
    for (int ix = 0; ix < 3; ix++)
      test(true, new CompactHashSet());
  }

  public static void test(boolean output, Set set) {
    long start = System.currentTimeMillis();

    if (output) {
      System.gc(); System.gc();
    }
    long before = Runtime.getRuntime().totalMemory() -
      Runtime.getRuntime().freeMemory();
    
    // add
    for (int ix = 0; ix < TIMES; ix++)
      set.add(new Long(Math.round(Math.random() * MAX)));

    if (output) {
      System.gc(); System.gc();
      long after = Runtime.getRuntime().totalMemory() -
        Runtime.getRuntime().freeMemory();
      System.out.println("Memory before: " + before);
      System.out.println("Memory after: " + after);
      System.out.println("Memory usage: " + (after - before));
    }
    
    // lookup
    int count = 0;
    for (int ix = 0; ix < TIMES; ix++) {
      Long number = new Long(Math.round(Math.random() * MAX));
      if (set.contains(number))
        count++;
    }

    // iterate
    Iterator it = set.iterator();
    while (it.hasNext()) {
      Long number = (Long) it.next();
    }

    // remove
    for (int ix = 0; ix < TIMES; ix++) {
      Long number = new Long(Math.round(Math.random() * MAX));
      set.remove(number);
    }

    if (output)
      System.out.println("TIME: " + (System.currentTimeMillis() - start));
  }
}