C++ future and the pointer
published at 20.12.2013 17:21 by Jens Weller
[Update from 23.12.13 in italic]
In the weeks since Meeting C++ 2013 I've been thinking a lot about C++, and also a little bit about pointers. While C++11 brought only small changes for pointers (nullptr, for example), the semantics and usage of pointers in C++ have changed over the last years.
I'd like to start with the pointer itself. With C++11 it is simply type* pt = nullptr;. The pointer itself is a core mechanic of C, so C++ did not really invent the pointer. And afaik C did not either, but C defines the semantics of the pointer and how to use it for C and C++. A pointer is a variable that does not store a value itself, but the address of a value in memory. If you dereference the pointer, you can access the value it points to. The pointer itself is a very basic variable: it doesn't know whether it points to something useful, and it is not notified when the address it holds becomes invalid. In C there is the simple rule that a pointer holding the address value 0 does not point to anything, and hence must not be dereferenced. Every other pointer should in theory point to some useful address, but in practice some pointers are either not initialized correctly or outlive the value they point to, for example when that value goes out of scope.
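To make the basic mechanics concrete, here is a minimal sketch of taking an address, checking against nullptr and dereferencing. The helper names readOr, setThrough and pointerBasicsDemo are made up for this illustration.

```cpp
#include <cassert>

// Reads the value behind a pointer, or returns a fallback for a null pointer.
// (readOr is a hypothetical helper name used only for this illustration.)
int readOr(const int* pt, int fallback) {
    return pt != nullptr ? *pt : fallback;
}

// Writes through the pointer, so the change is visible in the pointed-to variable.
void setThrough(int* pt, int value) {
    assert(pt != nullptr);  // dereferencing a null pointer is undefined behavior
    *pt = value;
}

void pointerBasicsDemo() {
    int number = 1;
    int* pt = nullptr;               // points to nothing yet
    assert(readOr(pt, -1) == -1);

    pt = &number;                    // now holds the address of number
    setThrough(pt, 42);
    assert(number == 42);            // the pointed-to variable changed
    assert(readOr(pt, -1) == 42);
}
```

The pointer has no way to notice when number goes away. Once the pointed-to variable has left its scope, the pointer still holds the old address, and only the nullptr check above can be done safely.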
With C++11, the correct way to initialise a pointer to 0 is the keyword nullptr. Because nullptr has its own type (std::nullptr_t), the compiler can tell a null pointer apart from the integer 0. There is also a tradition of masking 0 with NULL or other defines/statements; C++11 replaces this with nullptr. C++ also introduced references, which act as aliases for variables. The advantage is that a reference always has to be initialized, so it should refer to something useful when its life starts. Still, a reference behaves much like an already dereferenced pointer, so the value it refers to can again go out of scope, and then the reference is not valid anymore. While you can set a pointer to 0, you can't do that with a reference.
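A small sketch of why nullptr is more than a new spelling of 0: with two overloads, the literal 0 selects the integer version, while nullptr can only convert to a pointer. The function names describe, increment and nullptrDemo are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <string>

// Two hypothetical overloads: one for integers, one for pointers.
std::string describe(int)         { return "int"; }
std::string describe(const char*) { return "pointer"; }

// A reference is an alias: writing through it changes the original variable.
void increment(int& value) { ++value; }

void nullptrDemo() {
    assert(describe(0) == "int");            // 0 is an int literal first
    assert(describe(nullptr) == "pointer");  // nullptr converts only to pointers

    int counter = 1;
    increment(counter);                      // no address-of, no null check needed
    assert(counter == 2);
}
```

What a call with NULL does depends on how the implementation defines the macro, which is one of the reasons to prefer nullptr.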
But with C++11, and the years that have led to C++11, things have changed a little bit. While the pointer is a core mechanic, you will rarely see it in modern C++ code written for libraries or applications. Long before C++11, boost had a very helpful set of smart pointer classes, which encapsulate the pointer itself but expose its core mechanics via operator overloading. The smart pointer itself should not be handled through a pointer, but live on the stack or as a member of an object. Smart pointers use RAII to solve a problem that actually is not the pointer's fault. When creating an object on the heap, new returns a pointer to this memory allocation, so whenever dynamic memory is needed, a pointer is needed too, to act as a sort of handle to the created object. But the pointer itself is only a simple variable that knows nothing about ownership or about freeing the object on the heap again. The smart pointer takes this role: it owns the pointer and frees the heap object it points to once the smart pointer goes out of scope. Living on the stack means that whenever the stack is unwound, the object on the heap gets freed, even when an exception occurs.
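The exception case can be sketched like this. Tracked is a hypothetical type that merely counts its live instances, so the cleanup becomes observable; std::unique_ptr stands in for any RAII smart pointer.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical resource that counts how many instances are alive.
struct Tracked {
    static int alive;
    Tracked()  { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Allocates on the heap, then throws. The unique_ptr lives on the stack,
// so stack unwinding runs its destructor and the heap object is freed.
void allocateThenThrow() {
    std::unique_ptr<Tracked> owner(new Tracked);
    assert(Tracked::alive == 1);
    throw std::runtime_error("something went wrong");
}

// Returns the number of live Tracked objects after the exception was handled.
int aliveAfterException() {
    try {
        allocateThenThrow();
    } catch (const std::runtime_error&) {
        // nothing to clean up by hand
    }
    return Tracked::alive;
}
```

With a raw pointer in allocateThenThrow, the object would leak unless the function caught the exception itself and called delete before rethrowing.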
Now, over the years a few different styles have emerged in C++, from C with Classes and its heavy use of pointers to OOP frameworks such as wxWidgets or Qt. The trend in the last 5-10 years has been towards a style called modern C++, a style that tries to use the language to its full extent, and tries to find out which feature is useful for which solution or problem. Boost in particular has been a leading framework written in this style, and with C++11 the C++ standard itself tries to follow this style in its library design. With this, value semantics have also become popular, and together with move semantics they are a key element in the future of C++. So, what got me thinking about pointers in the first place is this slide from Tony van Eerd's Meeting C++ keynote. It has two columns, one for reference semantics and one for value semantics, and it boils the topic down to this catchy phrase:
Oh NO! Pointers! vs. Oh, no pointers!
So, with C++11 (and C++14, ...) and value semantics, the trend goes away from using the pointer. It might still be there in the background, but with C++14 even new and delete no longer need to be used directly: new gets abstracted into make_shared/make_unique, which use new internally and return a smart pointer. Both shared_ptr and unique_ptr act as value semantic types. The smart pointer also takes care of delete at the end of its scope. This got me thinking: can every usage of a pointer, as it can fill different "roles" in C++, be replaced?
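As a rough sketch of code without a visible new or delete: std::make_shared is available since C++11, std::make_unique since C++14. The function names below are invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Creates a uniquely owned string without writing new (make_unique is C++14).
std::unique_ptr<std::string> makeName() {
    return std::make_unique<std::string>("pointer");
}

// Shared ownership: copies of a shared_ptr share one object.
long sharedOwners() {
    std::shared_ptr<int> first = std::make_shared<int>(42);
    std::shared_ptr<int> second = first;   // both now own the same int
    return first.use_count();              // counts first and second
}

// unique_ptr cannot be copied, only moved; the source is left empty.
bool moveLeavesSourceEmpty() {
    std::unique_ptr<std::string> a = makeName();
    std::unique_ptr<std::string> b = std::move(a);
    assert(b != nullptr);
    return a == nullptr && *b == "pointer";
}
```

Both smart pointers are handled like values here: they are returned, copied or moved, and the heap object is deleted when its last owner goes out of scope.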
Inheritance and virtual functions
One key use of pointers is to use inheritance in order to have a common interface to a related set of types. I like the shape example to demonstrate this. There is a base class called Shape, which has a virtual function called area, which is then implemented in the derived classes Rectangle, Circle and Triangle. Now one can have a container of pointers (e.g. std::vector<Shape*>) that contains pointers to different shape objects, which all know how to calculate their area. This is IMHO the most widespread usage of pointers in C++, especially when OO is used heavily. Now, the good news is that this still works with smart pointers, as they emulate the pointer and access it internally. boost even has pointer containers, which free their content themselves instead of holding smart pointers as elements.
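The shape example could look roughly like this with smart pointers instead of std::vector<Shape*>. The member names and the helper functions totalArea and makeShapes are assumptions of this sketch, and Circle is left out only to keep the numbers exact.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Shape {
    virtual ~Shape() = default;          // virtual destructor: delete through base is safe
    virtual double area() const = 0;
};

struct Rectangle : Shape {
    double width, height;
    Rectangle(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
};

struct Triangle : Shape {
    double base, height;
    Triangle(double b, double h) : base(b), height(h) {}
    double area() const override { return 0.5 * base * height; }
};

// Sums the areas through the common interface; each call is dispatched dynamically.
double totalArea(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& shape : shapes) {
        assert(shape != nullptr);
        sum += shape->area();
    }
    return sum;
}

// Builds a small mixed container; the vector owns and frees its elements.
std::vector<std::unique_ptr<Shape>> makeShapes() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Rectangle(2.0, 3.0)));
    shapes.push_back(std::unique_ptr<Shape>(new Triangle(4.0, 5.0)));
    return shapes;
}
```

The calling code still uses -> as with a raw pointer, but nobody has to loop over the container and call delete.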
Now, not directly related to pointers: virtual function calls (aka dynamic dispatch) are a bit slower and often harder for the optimizer to work with. So, when the types are known at compile time, one could simply use static dispatch or compile time polymorphism to call the correct function without using virtual in the interface. There is a known pattern called CRTP to implement this behavior. A recent blog entry showed that this can gain performance in GCC 4.8, but interestingly the comments state that with GCC 4.9 the optimizer will be able to optimize dynamic dispatch further as well. But let's get back to the pointer.
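A minimal CRTP sketch, assuming the same area interface as in the shape example. The names ShapeBase, areaImpl, Square, Rect and doubledArea are made up here; this only shows the pattern and makes no claim about the performance difference.

```cpp
#include <cassert>

// CRTP base: the derived type is a template parameter, so area() can be
// resolved at compile time without a virtual call.
template <typename Derived>
struct ShapeBase {
    double area() const {
        return static_cast<const Derived&>(*this).areaImpl();
    }
};

struct Square : ShapeBase<Square> {
    double side;
    explicit Square(double s) : side(s) {}
    double areaImpl() const { return side * side; }
};

struct Rect : ShapeBase<Rect> {
    double width, height;
    Rect(double w, double h) : width(w), height(h) {}
    double areaImpl() const { return width * height; }
};

// Works for any CRTP shape; the concrete type must be known at compile time.
template <typename Derived>
double doubledArea(const ShapeBase<Derived>& shape) {
    return 2.0 * shape.area();
}

void crtpDemo() {
    assert(Square(3.0).area() == 9.0);
    assert(doubledArea(Rect(2.0, 5.0)) == 20.0);
}
```

The price is that Square and Rect no longer share a common base class, so they cannot be stored together in one container of base pointers.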
The maybe pointer
Sometimes the mechanics of a pointer are used to have a sort of optional value as a parameter to, or return value from, a function. Often its default is then 0, and the user can choose to hand a valid pointer to the function. Or, in the return case, the function can signal that it failed by returning a null pointer. In modern C++ exceptions are often used for the failure case, but on some embedded platforms exceptions are not available, so this is still a valid use case in some areas of C++. Again, the role could be filled with a smart pointer that acts as a handle to the pointer. But often this would add overhead (heap usage), or not really fill the maybe role. This role can be filled with an optional type, which indicates whether it holds a valid value or not. The boost libraries have boost::optional for this, and for some time it looked like a very similar optional class would be part of the C++14 standard. Currently, std::optional is being moved into a technical specification (TS) first, and may become part of a later standard after that.
The current standard already uses a sort of optional type: std::set::insert, for example, returns a pair<iterator,bool>, where the second member indicates whether the value was inserted into the set. When returning an iterator, the end iterator would be a valid alternative, but when returning a value, this role has in the past often been filled by a pointer that is 0 when the function could not succeed. So this role could be filled by an optional type:
optional<MyValue> ov = queryValue(42);
if(ov)
    cout << *ov;
else
    cerr << "value could not be retrieved";
So the optional type has, like the smart pointer types, some of the semantics of a pointer, and fills a certain role. But it has value semantics, and should mostly live on the stack.
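A self-contained variant of the snippet above, written against std::optional as it later shipped in C++17 (boost::optional is used in much the same way). The body of queryValue, the std::string value type and the helper valueOrMessage are assumptions made only for this sketch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical lookup: returns the value for an id, or an empty optional
// when the id is unknown. No heap allocation and no null pointer involved.
std::optional<std::string> queryValue(int id) {
    static const std::map<int, std::string> values = {{42, "answer"}};
    auto it = values.find(id);
    if (it == values.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Mirrors the if/else above: use the value if present, a message otherwise.
std::string valueOrMessage(int id) {
    std::optional<std::string> ov = queryValue(id);
    if (ov) {
        return *ov;   // dereferencing works like with a pointer
    }
    return "value could not be retrieved";
}

void optionalDemo() {
    assert(valueOrMessage(42) == "answer");
    assert(valueOrMessage(7) == "value could not be retrieved");
}
```

The test in if(ov) and the dereference with *ov are the pointer-like part of the interface, while the value itself is stored inside the optional object.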
While writing down my thoughts on the usage of pointers in C++, I've mostly thought about use cases where the pointer is getting replaced (smart pointers and optional types, for example), and overlooked that for a few use cases the pointer actually stays useful. Also, thanks for the feedback through reddit, email and social media.
Non-owning pointers are such a use case, where pointers will stay useful for the coming years. While shared_ptr has weak_ptr, unique_ptr has no such counterpart, so a non-owning raw pointer can take this role, for example in a relation between parent and child objects forming a tree or graph. But in the far future of C++, this role could be filled by exempt_ptr.
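The parent/child relation could be sketched like this: ownership goes down the tree through unique_ptr, while the way back up is a plain, non-owning pointer. Node, addChild and depth are hypothetical names for this example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical tree node: the parent owns its children through unique_ptr,
// while each child only observes its parent through a raw pointer.
struct Node {
    int value;
    Node* parent = nullptr;                        // non-owning, never deleted here
    std::vector<std::unique_ptr<Node>> children;   // owning

    explicit Node(int v) : value(v) {}

    // Creates a child, links it back to this node and returns an observer to it.
    Node* addChild(int v) {
        children.push_back(std::unique_ptr<Node>(new Node(v)));
        Node* child = children.back().get();
        child->parent = this;
        return child;
    }
};

// Walks the non-owning parent links up to the root and counts the steps.
int depth(const Node* node) {
    assert(node != nullptr);
    int steps = 0;
    while (node->parent != nullptr) {
        node = node->parent;
        ++steps;
    }
    return steps;
}
```

The raw pointer is safe here only because a child can never outlive the parent that owns it. The sketch also assumes that a node with children is not copied or moved, since that would leave the parent links of its direct children pointing at the old object.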
How to hand values to a function is another case where pointers can still be useful. Herb Sutter has written a very good GotW about this in May. Eric Niebler also talked about this in his keynote at Meeting C++, and about how move semantics influence how you should pass and return values:
|Argument kind|How to pass|
|---|---|
|Input: small/POD/sink|pass by value|
|Input: all others|pass by const ref|
|Output|return by value|
|Input/Output|non-const ref / stateful algorithm object|
This table is from Eric Niebler's keynote, look at slide 16/31 (actually, read all slides)
So, Eric Niebler says you should enable move semantics when possible. A sink argument is, for example, an argument to vector::emplace_back, where it can just be moved into the right place. The same goes for output: when returning by value, the compiler can apply move semantics or copy elision where it's useful. For objects with input/output mechanics, non-const refs are still a valid option, but Eric pointed in his keynote to stateful algorithm objects, which initially take a sink argument in their constructor.
When passing by (non-)const ref, passing a pointer would do the same, with the only difference that you should then test the argument against nullptr. I personally favor references over pointers when passing arguments into functions/methods or constructors.
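The guidelines from the table, plus the pointer variant, could be sketched as follows. All type and function names are invented for this example and are not taken from the keynote.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sink argument: taken by value and moved into place.
struct Person {
    std::string name;
    explicit Person(std::string n) : name(std::move(n)) {}
};

// Input that is only read: pass by const reference.
std::size_t totalLength(const std::vector<std::string>& words) {
    std::size_t sum = 0;
    for (const auto& word : words) sum += word.size();
    return sum;
}

// Output: return by value; the compiler can move or elide the copy.
std::vector<std::string> makeWords() {
    return {"oh", "no", "pointers"};
}

// Input/output: non-const reference, the argument can never be null.
void appendExclamation(std::string& text) { text += '!'; }

// The pointer variant does the same, but has to be checked against nullptr.
void appendExclamation(std::string* text) {
    if (text != nullptr) *text += '!';
}

void passingDemo() {
    Person person("Ada");
    assert(person.name == "Ada");
    assert(totalLength(makeWords()) == 12);

    std::string text = "Oh";
    appendExclamation(text);    // reference version
    appendExclamation(&text);   // pointer version
    assert(text == "Oh!!");
}
```

At the call site the pointer version at least makes visible that the argument may be modified (&text), which is the argument usually made in favor of pointers for input/output parameters.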
A little earlier I mentioned that from my view a pointer is just a normal variable which holds an address, or to be precise, mostly the address number of the value it points to. This address number can be manipulated: you can add to or subtract from it. This is used to traverse arrays or to calculate the distance between two pointers, which is also mostly useful for arrays. The traversal is actually what an iterator does, so in generic code the pointer can fill the role of an iterator. But in my many years as a C++ programmer, I have actually never used arithmetic operations on a pointer itself, so in C++, pointer arithmetic is already abstracted very well. In my opinion it's important to understand how pointer arithmetic works, to fully understand what a pointer exactly is and does in code.
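A short sketch of pointer arithmetic on an array, and of a pointer filling the iterator role in a standard algorithm. The function names are hypothetical.

```cpp
#include <cassert>
#include <numeric>

// Sums an array by walking a pointer from the first element to one past the last.
int sumWithPointer(const int* first, const int* last) {
    int sum = 0;
    for (const int* it = first; it != last; ++it) {  // ++it moves one element forward
        sum += *it;
    }
    return sum;
}

// Pointers are valid iterators, so generic algorithms accept them directly.
int sumWithAlgorithm(const int* first, const int* last) {
    return std::accumulate(first, last, 0);
}

void pointerArithmeticDemo() {
    int data[] = {1, 2, 3, 4};
    const int* begin = data;         // array decays to a pointer to its first element
    const int* end = data + 4;       // one past the last element
    assert(end - begin == 4);        // distance is counted in elements, not bytes
    assert(*(begin + 2) == 3);       // same as data[2]
    assert(sumWithPointer(begin, end) == 10);
    assert(sumWithAlgorithm(begin, end) == 10);
}
```

Adding 1 to an int* advances the address by sizeof(int) bytes, which is why the arithmetic only makes sense within an array of that element type.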
In theory, C++ can be used without pointers, but as they are a core language feature of C and C++, the pointer itself will stay for sure. But its role changes: you don't have to understand how pointers work anymore to use C++. As C++ keeps developing, C++11 and C++14 have moved in the direction of abstraction and of making things easier for developers. With smart pointers and optional types, the use case for the pointer is either wrapped safely into value semantic types, or even fully replaced by them.