Implementation of data types

Sébastien Doeraene edited this page Mar 16, 2012 · 12 revisions

This section explains how to write new data types in the Object Model.

A data type, or implementation, in the object model is what we would usually call a class in a classical object-oriented language: a collection of fields and methods acting on these fields.

The skeleton

To get started with a new data type, you should copy and paste this skeleton. As a running example, we will explain how to write the Cons data type.

You can put the following, e.g., in the file datatypes-decl.hh of the experiment application.

#include <mozartcore.hh>

namespace mozart {

//////////
// Cons //
//////////

// Forward declaration of the Type subclass for Cons
class Cons;

// Stuff generated by the generator (don't worry about it now)
#ifndef MOZART_GENERATOR
#include "Cons-implem-decl.hh"
#endif

template <>
class Implementation<Cons> {
public:
  typedef SelfType<Cons>::Self Self;
public:
  // The standard constructor (related to make<Cons>)
  inline
  Implementation(VM vm, UnstableNode* head, UnstableNode* tail);

  // The garbage-collection constructor (always the same signature)
  inline
  Implementation(VM vm, GC gc, Self from);

  // Any method you want, just as in a regular class
  // For example:
  StableNode* getHead() {
    return &_head;
  }

  StableNode* getTail() {
    return &_tail;
  }
private:
  // Any field you want, just as in a regular class
  // For Cons we'll have:
  StableNode _head;
  StableNode _tail;
};

// Stuff generated by the generator (don't worry about it now)
#ifndef MOZART_GENERATOR
#include "Cons-implem-decl-after.hh"
#endif

}

So what does this declaration say? At the C++ level, you have probably figured it out: it is a template specialization of Implementation<T> for the type parameter Cons. At the object model level, however, it means much more!

This piece of code declares a new data type called Cons. This data type is non-copiable and non-transient (the defaults). Its type identity can be accessed with Cons::type(). Moreover, the declaration links this data type to a memory representation (the Implementation<Cons> class itself), to a means of garbage-collecting an entity of this type, etc.

Part of this magic is implemented by a rather clever type system over the C++ type system, written as a collection of (variadic) template classes in the core object model headers (store.hh, storage.hh, type.hh and memword.hh). The rest of the magic is just generated automatically by a clang-based generator.
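
If the C++ side of this is still fuzzy, the following toy may help. It is a minimal sketch, not the actual Mozart headers: Type, Implementation and Cons are simplified stand-ins written for this page, the fields are plain ints rather than StableNodes, and the singleton is a function-local static chosen for brevity. It only shows the pattern: a primary template that is never defined, a type class whose single instance serves as type identity, and a specialization that holds the data.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-in for the base class of all type descriptors (assumption).
class Type {
public:
  Type(std::string name, bool copiable, bool transient)
    : _name(std::move(name)), _copiable(copiable), _transient(transient) {}

  const std::string& getName() const { return _name; }
  bool isCopiable() const { return _copiable; }
  bool isTransient() const { return _transient; }
private:
  std::string _name;
  bool _copiable;
  bool _transient;
};

// Primary template: declared but never defined, so only
// explicitly specialized data types can be instantiated.
template <class T>
class Implementation;

// The type descriptor. Its single instance is the type identity.
class Cons : public Type {
public:
  Cons() : Type("Cons", /* copiable = */ false, /* transient = */ false) {}

  static const Cons* type() {
    static const Cons instance;  // toy singleton, created on first use
    return &instance;
  }
};

// The memory representation, tied to Cons by the specialization.
template <>
class Implementation<Cons> {
public:
  Implementation(int head, int tail) : _head(head), _tail(tail) {}

  int getHead() const { return _head; }
  int getTail() const { return _tail; }
private:
  int _head;  // a StableNode in the real data type
  int _tail;  // a StableNode in the real data type
};
```

With these definitions, every call to Cons::type() returns the same pointer, and trying to use Implementation<T> for a T without a specialization fails at compile time.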

You do not need to care about most of the generated code: it is pure boilerplate. But you should know that the actual declaration of the class Cons is entirely generated, and contains the following:

class Cons: public Type {
private:
  typedef SelfType<Cons>::Self Self;
public:
  Cons() : Type("Cons",
                /* copiable  = */ false,
                /* transient = */ false) {}

  /** Type identity of Cons
   *  It is the singleton instance of this class.
   */
  static const Cons* const type() {
    return &RawType<Cons>::rawType;
  }

  inline
  void gCollect(GC gc, RichNode from, StableNode& to) const;

  inline
  void gCollect(GC gc, RichNode from, UnstableNode& to) const;
};

In Implementation<Cons>, there are a few things you probably do not understand yet, such as the typedef Self. Do not let them bother you; we will explain them in time. For most practical purposes, you can consider Self as an alias for Implementation<Cons>*.

For now, focus on the things you do understand:

  • A constructor, which takes a contextual VM, and the head and tail of the Cons,
  • Two fields, which are StableNodes, for storing the head and the tail,
  • Two accessor methods for the head and the tail.

A pretty regular C++ class, I should say.

This class actually defines the behavior of your data type entirely: its memory layout as well as the operations you can call on it.

The implementation of the constructors, as well as any non-trivial method, should be put in a file named datatypes.hh. For the minimal skeleton we showed above, it should contain the following:

#include "datatypes-decl.hh"

namespace mozart {

//////////
// Cons //
//////////

#ifndef MOZART_GENERATOR
#include "Cons-implem.hh"
#endif

Implementation<Cons>::Implementation(VM vm, UnstableNode* head,
                                     UnstableNode* tail) {
  _head.init(vm, *head);
  _tail.init(vm, *tail);
}

Implementation<Cons>::Implementation(VM vm, GC gc, Self from) {
  gc->gcStableNode(from->_head, _head);
  gc->gcStableNode(from->_tail, _tail);
}

}

Again, you can implement the methods as in any regular class. Here, the regular constructor initializes the head and tail with the parameters given. Nodes are usually passed as UnstableNode*.

The garbage-collection constructor instructs the GC that it should GC-copy from->_head (resp. from->_tail) into _head (resp. _tail). You need not know how it does so at this point. Just make sure that, in your GC constructor, you:

  • Use gc->gcStableNode and/or gc->gcUnstableNode to copy nodes,
  • Use the regular assignment operator of C++ to copy any other simple data (int, bool, etc.).
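
These two rules can be tried out in isolation. The sketch below is a toy under stated assumptions: ToyStableNode, ToyGC and ToyPair are hypothetical stand-ins, not Mozart classes, and ToyGC::gcNode merely copies a value and counts its calls, whereas the real gc->gcStableNode does whatever the collector needs. The point is only the shape of the GC constructor: nodes go through the collector, simple data is assigned directly.

```cpp
#include <cassert>

// Hypothetical stand-in for a node; the real StableNode is opaque to us here.
struct ToyStableNode {
  int payload = 0;
};

// Hypothetical stand-in for the collector.
class ToyGC {
public:
  // Plays the role of gcStableNode(from, to): copies one node.
  void gcNode(const ToyStableNode& from, ToyStableNode& to) {
    to.payload = from.payload;
    ++_copied;
  }

  int copiedNodes() const { return _copied; }
private:
  int _copied = 0;
};

class ToyPair {
public:
  // Regular constructor.
  ToyPair(int head, int tail, bool marked) : _marked(marked) {
    _head.payload = head;
    _tail.payload = tail;
  }

  // GC constructor: builds the new copy from the old entity.
  ToyPair(ToyGC* gc, const ToyPair* from) {
    gc->gcNode(from->_head, _head);  // nodes: always through the collector
    gc->gcNode(from->_tail, _tail);
    _marked = from->_marked;         // simple data: regular assignment
  }

  int head() const { return _head.payload; }
  int tail() const { return _tail.payload; }
  bool marked() const { return _marked; }
private:
  ToyStableNode _head;
  ToyStableNode _tail;
  bool _marked = false;
};
```

Constructing a ToyPair from another one through a ToyGC copies both nodes via gcNode, so copiedNodes() goes up by two, while the bool is copied without the collector being involved.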

The generator

Now you have what you should write by hand. But there are still parts of the code that are missing. The generator will write them for you, but you need to instruct it to do so.

I will not expand on this now. In the experiment application, we have set up CMakeLists.txt so that the generator is run automatically on interfaces.hh, which includes datatypes-decl.hh. Hence, you need not worry about it.

When working on the core of Mozart, you only need to modify coreinterfaces.hh. If, e.g., you have named your file cons-decl.hh, at the beginning of coreinterfaces.hh, add:

#include "cons-decl.hh"

and at the end, inside the conditional includes, add:

#include "cons.hh"

The datatypes.hh/cons.hh file should never be read by the generator, so its inclusion must always be protected by #ifndef MOZART_GENERATOR.

How to use your new data type

Now that you have defined your brand new data type, you'll want to use it. You must never instantiate an Implementation<Cons> directly: always go through the make<Cons>() method of an UnstableNode. Using the as<Cons>() method, you may call any public method of Implementation<Cons> through a node.

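#include <iostream>

#include "interfaces.hh"

// Assumes that a VM named vm is in scope, and that we are inside
// namespace mozart (or after a using namespace mozart).

// Build three nodes: a small integer, the atom nil, and the Cons itself
UnstableNode head, tail, cons;
head.make<SmallInt>(vm, 5);
tail.make<Atom>(vm, u"nil");
cons.make<Cons>(vm, &head, &tail);

// Inspect the type of the new node
RichNode richCons = cons;
std::cout << richCons.type()->getName() << std::endl;

// Build a fresh node from the head of the Cons and inspect it
UnstableNode head2(vm, *richCons.as<Cons>().getHead());
RichNode richHead = head2;
std::cout << richHead.type()->getName() << std::endl;
std::cout << richHead.as<SmallInt>().value() << std::endl;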
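
To see why going through make and as is enough, here is a small self-contained toy. It is a sketch under loud assumptions: ToyNode and Counter are invented for this example, there is no VM parameter, ownership uses std::shared_ptr, and the type check is a plain assert. None of this is how the real UnstableNode or RichNode is written; it only mimics the calling convention.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Primary template, specialized once per data type (as on this page).
template <class T>
class Implementation;

// Hypothetical type tag, invented for this example.
struct Counter {};

template <>
class Implementation<Counter> {
public:
  explicit Implementation(int start) : _count(start) {}

  int value() const { return _count; }
  void increment() { ++_count; }
private:
  int _count;
};

// Toy node: remembers which type it holds and owns the implementation.
class ToyNode {
public:
  // Stand-in for make<T>(vm, args...): the only place that
  // instantiates an Implementation<T>.
  template <class T, class... Args>
  void make(Args&&... args) {
    _impl = std::make_shared<Implementation<T>>(std::forward<Args>(args)...);
    _type = typeOf<T>();
  }

  template <class T>
  bool is() const { return _type == typeOf<T>(); }

  // Stand-in for as<T>(): typed access to the public methods.
  template <class T>
  Implementation<T>& as() {
    assert(is<T>());
    return *std::static_pointer_cast<Implementation<T>>(_impl);
  }

private:
  // One distinct address per T, used as a poor man's type identity.
  template <class T>
  static const void* typeOf() {
    static const char tag = 0;
    return &tag;
  }

  const void* _type = nullptr;
  std::shared_ptr<void> _impl;
};
```

User code never names a constructor of Implementation<Counter>: it writes node.make<Counter>(5) and then node.as<Counter>().increment(), which is the same shape as the Cons example above.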

Memory layout

The machinery of the Object Model takes care of all the details of memory management. But if you care about exactly how a node of type Cons is laid out in memory, here it is: the first word in the node is Cons::type(), and the second word in the node is an Implementation<Cons>*. The latter points to an actual Implementation<Cons> in memory, i.e., to an area of 4 words: 2 for _head and 2 for _tail.
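
The word counts can be checked with a toy layout. This is a sketch, assuming a mainstream platform where all object pointers have the same size and the structs need no padding; ToyType, TwoWordNode and ToyConsImpl are hypothetical names that only mirror the description above, not the real node classes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a type descriptor such as Cons.
struct ToyType {
  const char* name;
};

// A node is two words: the type identity, then one word of payload.
// For a Cons, the payload is a pointer to the implementation.
struct TwoWordNode {
  const ToyType* type;
  void* value;
};

// The area pointed to by the second word of a Cons node:
// two nodes, hence four words.
struct ToyConsImpl {
  TwoWordNode head;
  TwoWordNode tail;
};

static_assert(sizeof(TwoWordNode) == 2 * sizeof(void*),
              "a node takes two words");
static_assert(sizeof(ToyConsImpl) == 4 * sizeof(void*),
              "the Cons area takes four words");

// Builds a node whose second word points to the given implementation.
inline TwoWordNode makeConsNode(const ToyType* consType, ToyConsImpl* impl) {
  return TwoWordNode{consType, impl};
}
```

If either static_assert fails on your platform, the toy assumptions do not hold there; that says nothing about the real object model.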
