Saturday, 8 August 2015

Let's, please, not C++

Let's face it, there are basically two "favourite" languages for doing performance-oriented development: C and C++.

I stopped having trouble with C years ago. It was my second programming language (after Pascal), and I still do the occasional development in it. It's dirty, but not too dirty and --I think-- C has a very good reason to be dirty: Tools that are used to do dirty work tend to get dirty themselves. Show me a masonry chisel without mortar dust and I will show you a masonry chisel that isn't used.

C++, on the other hand, is a different monster. I once read about C++ that it includes almost every language feature known to man, but always only the most simple implementation of that feature. I do not agree with that. C++ includes many features, but what it implements might be called the internally most lean version of a feature. That is not the same as "simple" and it is certainly not the same as "easy to use".

The problem, it seems to me, is that those seemingly simple (i.e. internally lean) implementations usually leak badly, and the interaction with other language features typically adds layers of complexity that can slow you down to a crawl. I call this "pile design": "Just dump everything over there". You can make C++ do almost everything, but you need a lot of lubricant (both in terms of time and code) to make C++'s cogs turn properly. And even if the machine seemingly runs alright, there is always another corner case that you did not think of.

Whenever you want to do something "right" in C++, you also seem to immediately need a serious pile of workaround engineering. By this, I mean every piece of questionable code that needs to be written to make the core code behave properly according to user expectations.

I do judge a language by the amount of time I spend making my code adhere to the concepts of the language, which includes --for example-- the behaviour patterns expected from standard library functionality. And in C++ this amount of time is disproportionately large. That usually means that either the language or the standard library (or both) is misdesigned.

Well, yes, C++ is misdesigned. Its language designers seem to constantly make a specific mistake: They mistake internal conceptual leanness for external conceptual leanness. Internal conceptual leanness is achieved when --inside your own code-- you try to use as few different concepts as possible. This --often enough-- also includes writing as little code as possible, but --most importantly-- it means that somebody reading your code does not have to read the full language design part of the computer library to understand it. External conceptual leanness means that you expose as little complexity as possible.

So internal leanness is a combination of
  • fast (performance wise) algorithms
  • efficiency/simplicity for the tool/language creator
  • limited number of internal concepts
  • and so on
External leanness is a combination of
  • Easy access to algorithms
  • simple APIs without over-complex dependencies
  • limited number of user-visible concepts
The goal in every design (language or software) is to have as much external conceptual leanness as possible without sacrificing features, while keeping internal conceptual complexity low enough to be manageable. The trick (and the glory) in the design is to balance internal and external leanness, which are quite often at odds.

From my personal experience:  Designs by committee are never externally lean.

And C++ fails almost spectacularly to be externally lean. In many parts of the language itself and of the standard library, it tries very hard to be internally lean (down to really, really bad variable names, which is the dark side of internal leanness), for example by exposing internals of library functionality that should better be wrapped (to make it externally lean).

Let me give you an example: You want an indexable collection that mimics an STL container but allows listening to modifications, e.g. when you call insert(), you want notify_before_add() (or similar) to be called.

If you are a C++ programmer, you might already have guessed that the above collection is quite tricky to get right. First of all, a proper STL container needs a real load of annoying boilerplate code to work properly. Which is bad in itself, but given the functionality provided it might be acceptable. It still is a failure with respect to external leanness (it is actually done for internal leanness, because it keeps code duplication in the STL low).
However, remember that some (those that return at least non-const forward iterators) of the STL containers pass back references which you can modify in place. Nicely enough, you cannot test for forward iterators.

std::set's iterator type changed from bidirectional to constant bidirectional in C++11. The documentation goes to great lengths telling you that you cannot modify an element inside a set, but downgrading the iterator to its non-mutable type was probably simply overlooked in earlier versions. I can imagine that tracking all of the exceptions to your own rules can get you lost.
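The C++11 behaviour can be checked at compile time. The following is only a sketch, assuming a C++11 (or later) standard library; the names set_ref and set_elements_are_read_only are made up for this illustration.

```cpp
#include <iterator>
#include <set>
#include <type_traits>

// Type obtained by dereferencing a std::set<int>::iterator.
typedef std::iterator_traits<std::set<int>::iterator>::reference set_ref;

// Hypothetical helper: true if dereferencing yields a reference to const.
constexpr bool set_elements_are_read_only()
{
    return std::is_const<std::remove_reference<set_ref>::type>::value;
}

static_assert(set_elements_are_read_only(),
              "since C++11, set iterators only hand out const access");

// The iterator category itself is still bidirectional.
static_assert(std::is_same<
                  std::iterator_traits<std::set<int>::iterator>::iterator_category,
                  std::bidirectional_iterator_tag>::value,
              "std::set iterators are bidirectional");
```

Both assertions are evaluated by the compiler, so a translation unit containing them only builds if the claims hold for the library in use.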

For example,

    std::vector<int> v;
    v.push_back(1000);
    *(v.begin()) = 2000;

is a perfectly valid way to modify the first element of a std::vector.
So if we want to be notified of that change (interpreting it either as a remove followed by an add or, if you have a notification for that, as an actual replace), we need to wrap the iterators. All four of them, because you cannot make assumptions.

This is not too bad (you need to do so in Java, too). My point about language misdesign is, however, the externalizing of the lvalue reference to the internal object. Most of the sequential STL collections allow you to modify the object inside the collection in place, which creates a real load of complexity should you ever try to wrap a collection.
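To get a feeling for how many doors a wrapper has to guard, here is a small sketch that writes to the first element of a std::vector through each mutable access path I could think of. The function name overwrite_first_element is made up for this example, and data() assumes C++11.

```cpp
#include <cassert>
#include <vector>

// Writes to the first element through every mutable access path of
// std::vector and returns the final value of that element.
// None of these calls is a "modifying" member function of the container,
// so a notifying wrapper has to intercept each of them separately.
int overwrite_first_element(std::vector<int> &v)
{
    assert(!v.empty());
    *v.begin() = 1;        // iterator
    *(v.rend() - 1) = 2;   // reverse iterator (rend() - 1 is the first element)
    v.front() = 3;         // front() / back()
    v.at(0) = 4;           // checked index
    v[0] = 5;              // unchecked index
    v.data()[0] = 6;       // raw pointer into the storage (C++11)
    return v[0];
}
```

Every one of these expressions hands out a plain int& (or int*), which is exactly the externalized lvalue reference complained about above.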

Side note: Fortunately, you cannot modify std::map's keys in place, that would be really awkward. So this

    std::map<int, std::string> m;
    std::get<0>(*m.begin()) = 1;

does not compile.
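The reason is that the key half of the stored pair is const. A small sketch (map_type and overwrite_first_value are names chosen for this example only) that checks this at compile time and shows that the mapped value is still writable in place:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>

typedef std::map<int, std::string> map_type;

// The element type of a map is a pair with a const key.
static_assert(std::is_same<map_type::value_type,
                           std::pair<const int, std::string>>::value,
              "map keys are const inside the stored pair");

// The mapped value, however, can be modified through the iterator.
std::string overwrite_first_value(map_type &m)
{
    assert(!m.empty());
    m.begin()->second = "changed";
    return m.begin()->second;
}
```

So a notifying wrapper around std::map has the same in-place modification problem, just limited to the value half of each entry.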

Let me show you the problem with operator[]. Every language I have seen so far gets one thing right: when you allow overloading of the index operator, you really need to provide two operations, get and set. C# does it (this[]), Python does it (__setitem__(), __getitem__()), Ruby does it ([] and []=) and so on.

Guess which language decided that a[i] = b can be decomposed into two operations, namely "auto &temp = a[i]; temp = b;" and hence, you only need the accessor part of [], because that is so much simpler (internally)?

Yeah, of course.
If you are a Python and/or C# developer, you might be inclined to protest that

    a[i] = x;

and

    value_type &temp = a[i];
    temp = x;

are two very different things. Yes, in your (and in my) world that makes perfect sense.
But this is C++ and C++ often seems to make a point of being different. Different as in not externally lean. In C++,

    a[i] = x;

means "copy the contents of x into the object at the ith position inside a". Since the object (i.e. memory) at a[i] does not move, from C++'s perspective, the collection's contents have not been modified. So in C++, unlike everywhere else I know of, a[i] = x is not considered a modification to the container.
I think it is those little annoyances of being different for the sake of internal leanness that make C++ suck so often.
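This can be made visible with an element type that counts its own assignments. The sketch below is only an illustration; Counted and assign_in_place are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often it is assigned to.
struct Counted {
    int value;
    int assignments;

    Counted(int v = 0) : value(v), assignments(0) {}

    Counted &operator=(const Counted &other)
    {
        value = other.value;
        ++assignments;   // the element notices the write, the vector does not
        return *this;
    }
};

// Performs a[i] = x and returns true if the element still lives at the
// same address afterwards, i.e. the vector did not move or replace it.
bool assign_in_place(std::vector<Counted> &a, std::size_t i, const Counted &x)
{
    assert(i < a.size());
    const Counted *before = &a[i];
    a[i] = x;            // only Counted::operator= runs here
    return before == &a[i];
}
```

The vector's operator[] merely returns a Counted&; everything after that happens between the element and x, with the container out of the loop.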
Hence in C++, if you want the notifying collection, you need some kind of descriptor to act as the returned reference:

    #include <cstddef>
    #include <vector>

    // Proxy returned by Array::operator[] so that assignments can be observed.
    // C is the notifying collection, Ref the reference type of its storage.
    template<typename C, typename Ref>
    class Descriptor {
    private:
       C &_collection;
       Ref _ref;

    public:
       Descriptor(C &collection, Ref ref)
         : _collection(collection), _ref(ref)
       {
       }

       // Assignment is where the notifications are fired.
       template<typename T>
       Descriptor &operator=(T value)
       {
          _collection.notify_before_remove(_ref);
          _ref = value;
          _collection.notify_after_added(value);
          return *this;
       }

       // Read access: convert back to the underlying reference.
       operator Ref()
       {
          return _ref;
       }
    };

    template<typename T, typename C = std::vector<T>>
    class Array {
    public:
       typedef typename C::value_type value_type;
       typedef Descriptor<Array, typename C::reference> reference;
       // and quite a few more

    private:
       C _items;

    public:
       // Notification hooks (left empty here; the listener mechanism is up to you).
       void notify_before_remove(const value_type &) {}
       void notify_after_added(const value_type &) {}

       reference operator[](std::size_t n)
       {
          return reference(*this, _items[n]);
       }
    };
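The descriptor idea can be tried out with a cut-down, int-only variant. This is a sketch under my own assumptions, not a finished container: NotifyingInts is a made-up class, and instead of calling real listeners it simply records each notification in a log of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal notifying container for ints.
class NotifyingInts {
public:
    // Proxy standing in for int&; assignments go through the owner.
    class Descriptor {
    public:
        Descriptor(NotifyingInts &owner, int &ref) : _owner(owner), _ref(ref) {}

        Descriptor &operator=(int value)
        {
            _owner.notify_before_remove(_ref);
            _ref = value;
            _owner.notify_after_added(value);
            return *this;
        }

        operator int() const { return _ref; }   // read access, by value

    private:
        NotifyingInts &_owner;
        int &_ref;
    };

    explicit NotifyingInts(std::size_t n) : _items(n, 0) {}

    Descriptor operator[](std::size_t n)
    {
        assert(n < _items.size());
        return Descriptor(*this, _items[n]);
    }

    // Notifications recorded so far, oldest first.
    const std::vector<std::string> &log() const { return _log; }

private:
    void notify_before_remove(int old_value)
    {
        _log.push_back("remove " + std::to_string(old_value));
    }

    void notify_after_added(int new_value)
    {
        _log.push_back("add " + std::to_string(new_value));
    }

    std::vector<int> _items;
    std::vector<std::string> _log;
};
```

With this, writing `a[1] = 5;` on a NotifyingInts `a` appends "remove 0" and "add 5" to the log, while `int x = a[1];` reads the value without any notification. Note that the conversion operator returns int by value here; handing out int& again would reopen the very hole the descriptor is supposed to close.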

Of course, we are going against C++'s implied semantics here. For the standard library containers, everything you stuff in there is just a memory location that they manage. They do not care what you do with that memory, as long as you provide the standard library with sufficient tools (copy and move constructors, copy-assignment and move-assignment operators) to shuffle the contents of the managed memory around to a new location if needed.

<functional> offers reference_wrapper<> which also covers callables. As usual: Caveat emptor for the overrider since nothing is virtual.
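For reference, a short sketch of what std::reference_wrapper does and does not do. The function names twice and reference_wrapper_demo are invented for this example.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical free function used as the wrapped callable.
inline int twice(int x) { return 2 * x; }

int reference_wrapper_demo()
{
    int a = 1, b = 2;

    // A vector cannot hold int&, but it can hold reference_wrapper<int>.
    std::vector<std::reference_wrapper<int>> refs;
    refs.push_back(std::ref(a));
    refs.push_back(std::ref(b));

    refs[0].get() = 10;     // writes through to a
    int &alias = refs[1];   // implicit conversion back to int&
    alias = 20;             // writes through to b
    assert(a == 10 && b == 20);

    // reference_wrapper forwards operator() to a wrapped callable.
    std::reference_wrapper<int(int)> f = std::ref(twice);
    return f(a + b);        // 2 * (10 + 20)
}
```

Assigning one reference_wrapper to another rebinds it instead of writing through, and there is no hook in it to observe writes, so it does not replace a hand-written descriptor.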