C generics and preprocessor madness

Date: 2025-06-18
Edited: 2025-06-27

One of the features I miss most when writing C is generics; the language has no sane syntax for them at all. Yes, the lack of memory safety is a problem, but for the sorts of hobby projects I end up using C for it’s less of a big deal.

Realistically, I should just use C++, but that’s the boring option. I’ve debated writing a compiler that adds some extensions to C, like templates, sum types, pattern matching, and such, but that’s a lot of work (and who in their right mind would use or standardize my hobby compiler?). You’re also effectively reinventing C++ at that point.

So, I’m left with the features that come inbuilt with C.

Void pointers

Void pointers are the first thing that comes to mind when writing a generic C datastructure. You could implement a generic vector as follows:


struct vector {
    void *data;
    size_t num_elements;
    size_t alloc_size;
    size_t element_size;
};

int vector_new(struct vector *vec, size_t element_size);
int vector_push(struct vector *vec, void *elem);
int vector_pop(struct vector *vec);
void vector_delete(struct vector *vec);

Of course, vector_new could return a pointer to a newly-allocated vector if you wish to hide the internal implementation of the vector, but you get the idea.

This has some downsides. First, the type erasure.

There’s nothing about the type struct vector that tells you what type it holds; you have to communicate that with comments or variable names. If you change the type it holds, the compiler isn’t going to provide errors telling you what to fix. You could grep for variable names/comments, but even that is error-prone; what if you misname a variable or don’t add a comment saying the type of the vector?

It’s also not typesafe. There’s nothing stopping you from storing multiple types of the same size in this vector, either on purpose or, more likely, by accident.

There’s also a small overhead for storing the element size with the vector, but that’s admittedly pretty minimal.

The benefit of this approach is that the code to implement it is pretty normal and pretty readable. A bit annoying, sure, the constant calls to memcpy() are a pain, but still readable. I’m guilty of using this approach occasionally when I can’t be bothered to do things “properly”.
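
For reference, here is a minimal sketch of how the functions declared above might be implemented. The doubling growth policy, the 0/-1 return convention, treating alloc_size as a capacity counted in elements, and the vector_at() helper are all assumptions made for the sketch, not part of the interface above:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct vector {
    void *data;
    size_t num_elements;
    size_t alloc_size;   /* assumed here: capacity, counted in elements */
    size_t element_size;
};

/* Assumed convention: every int-returning function gives 0 on success, -1 on failure. */
int vector_new(struct vector *vec, size_t element_size)
{
    if (element_size == 0)
        return -1;
    vec->data = NULL;
    vec->num_elements = 0;
    vec->alloc_size = 0;
    vec->element_size = element_size;
    return 0;
}

int vector_push(struct vector *vec, void *elem)
{
    if (vec->num_elements == vec->alloc_size) {
        size_t new_alloc = vec->alloc_size ? vec->alloc_size * 2 : 8;
        void *tmp = realloc(vec->data, new_alloc * vec->element_size);
        if (tmp == NULL)
            return -1;
        vec->data = tmp;
        vec->alloc_size = new_alloc;
    }
    /* You cannot assign through a void *, hence the memcpy(). */
    memcpy((char *)vec->data + vec->num_elements * vec->element_size,
           elem, vec->element_size);
    vec->num_elements++;
    return 0;
}

int vector_pop(struct vector *vec)
{
    if (vec->num_elements == 0)
        return -1;
    vec->num_elements--;
    return 0;
}

void vector_delete(struct vector *vec)
{
    free(vec->data);
    vec->data = NULL;
    vec->num_elements = vec->alloc_size = 0;
}

/* Hypothetical helper, not in the interface above: pointer to element i. */
void *vector_at(struct vector *vec, size_t i)
{
    assert(i < vec->num_elements);
    return (char *)vec->data + i * vec->element_size;
}
```

Reading an element back means casting, as in *(int *)vector_at(&vec, 0). Nothing stops you from casting to the wrong type there, or from pushing the address of a float into a vector created with sizeof(int), which is exactly the type-safety hole described above.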

Multiple includes

Another approach to generics is to intentionally include a file multiple times. You require users to predefine a macro, say VECTOR_TYPE, before including the header that contains the vector implementation. You can then do something like this:

Usage:


#define VECTOR_TYPE int
#include "vector.h"

VECTOR(int) new_vec;
VECTOR_NEW(int, &new_vec);
int num = 1234;
VECTOR_PUSH(int, &new_vec, &num);
VECTOR_POP(int, &new_vec);
/* ... */
VECTOR_DELETE(int, &new_vec);

vector.h:


#ifndef VECTOR_TYPE
#error "Must define VECTOR_TYPE to the desired type"
#endif

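/* Two-level paste so that VECTOR_TYPE is expanded before ## is applied
   (VECTOR_PASTE is a helper name made up for this example). */
#define VECTOR_PASTE_(A, B) A##B
#define VECTOR_PASTE(A, B) VECTOR_PASTE_(A, B)

#define VECTOR(TYPE) struct vector_##TYPE
#define VECTOR_NEW(TYPE, VECPTR) vector_new_##TYPE(VECPTR)
#define VECTOR_PUSH(TYPE, VECPTR, ELEMPTR) vector_push_##TYPE(VECPTR, ELEMPTR)
// ...etc

static inline int VECTOR_PASTE(vector_new_, VECTOR_TYPE)(
    struct VECTOR_PASTE(vector_, VECTOR_TYPE) *vec)
{
    /* implementation */
}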

// ...etc


#undef VECTOR_TYPE

This works fine. It is typesafe, although it does require that you pass the type to each function you call, which is annoying.

You also have to do the whole #define and #include nonsense for each type you want to use within a single file. Again, annoying, but manageable.
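
One preprocessor detail matters when writing a header like this: the ## operator pastes its operands exactly as written, without macro-expanding them first. Pasting VECTOR_TYPE directly would therefore give you a name ending in VECTOR_TYPE rather than int. The usual workaround is a two-level macro. A small standalone sketch of the difference, where PASTE, PASTE_RAW, ELEM, and the tag_ functions are all hypothetical names chosen for the demonstration:

```c
#include <assert.h>

#define PASTE_RAW(A, B) A##B        /* pastes the tokens exactly as given */
#define PASTE_(A, B) A##B
#define PASTE(A, B) PASTE_(A, B)    /* expands A and B first, then pastes */

#define ELEM int                    /* stands in for VECTOR_TYPE */

/* Defines tag_int(): ELEM was expanded to int before pasting. */
static ELEM PASTE(tag_, ELEM)(void) { return 1; }

/* Defines tag_ELEM(): ELEM was pasted literally, which is rarely what you want. */
static ELEM PASTE_RAW(tag_, ELEM)(void) { return 2; }

int paste_demo(void)
{
    /* Both names exist, which shows the two macros produced different identifiers. */
    assert(tag_int() == 1);
    assert(tag_ELEM() == 2);
    return tag_int() + tag_ELEM();
}
```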

A related approach is adding back the include guards and using a macro called something like VECTOR_DEFINE_FUNCTIONS that takes the vector type as an argument and expands to the definitions of the proper functions.

This latter approach is a bit more flexible, as you could have separate macros for defining function declarations and definitions, which could be useful. The downside of the latter approach is that multiline macro definitions are horrendously ugly (as you’ll later see).
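
A rough sketch of what such a VECTOR_DEFINE_FUNCTIONS macro could look like. The struct layout, the function names, the growth policy, and the return conventions are assumptions made for illustration; a real header would also wrap this in include guards:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: stamps out a struct plus its functions for one element type. */
#define VECTOR_DEFINE_FUNCTIONS(TYPE) \
    struct vector_##TYPE { \
        TYPE *data; \
        size_t num_elements; \
        size_t alloc_size; \
    }; \
    \
    static inline int vector_new_##TYPE(struct vector_##TYPE *vec) \
    { \
        vec->data = NULL; \
        vec->num_elements = 0; \
        vec->alloc_size = 0; \
        return 0; \
    } \
    \
    static inline int vector_push_##TYPE(struct vector_##TYPE *vec, \
                                         const TYPE *elem) \
    { \
        if (vec->num_elements == vec->alloc_size) { \
            size_t new_alloc = vec->alloc_size ? vec->alloc_size * 2 : 8; \
            TYPE *tmp = realloc(vec->data, new_alloc * sizeof(TYPE)); \
            if (tmp == NULL) \
                return -1; \
            vec->data = tmp; \
            vec->alloc_size = new_alloc; \
        } \
        vec->data[vec->num_elements++] = *elem; \
        return 0; \
    } \
    \
    static inline void vector_delete_##TYPE(struct vector_##TYPE *vec) \
    { \
        free(vec->data); \
        vec->data = NULL; \
        vec->num_elements = vec->alloc_size = 0; \
    }

/* One invocation per element type, at file scope. */
VECTOR_DEFINE_FUNCTIONS(int)
VECTOR_DEFINE_FUNCTIONS(double)
```

Because the element type is known inside the macro, the push is a plain assignment instead of a memcpy(), and passing a double * to vector_push_int draws a compiler diagnostic.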

Another downside to this approach is that it defines a bunch of functions which might be a bit of a hazard if you ever define any function with the same name. This issue is probably avoidable if you call each function something like vector_push_internal_##TYPE or make the function names all-caps.

The final flaw of this approach is the requirement that your types be a single identifier. As someone who doesn’t usually typedef their structs, this is the biggest (and likely pettiest) issue for me.
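
To make the restriction concrete, here is a small sketch using a made-up BOX container in place of the vector (BOX, BOX_DEFINE, and struct point are hypothetical names). The type name gets pasted into an identifier, so a two-word type such as struct point cannot be used directly; a typedef works around it:

```c
#include <assert.h>

#define BOX(TYPE) struct box_##TYPE
#define BOX_DEFINE(TYPE) BOX(TYPE) { TYPE value; }

struct point { int x, y; };

/* BOX_DEFINE(struct point) would expand to
 *     struct box_struct point { struct point value; }
 * which is a syntax error. A typedef gives the type a single-identifier
 * name that can be pasted. */
typedef struct point point;

BOX_DEFINE(int);     /* defines struct box_int */
BOX_DEFINE(point);   /* defines struct box_point */

int box_demo(void)
{
    BOX(point) p = { { 3, 4 } };
    BOX(int) i = { 5 };
    return p.value.x + p.value.y + i.value;
}
```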

typeof() and Statement-Expressions

A fair warning before we start: typeof() is a C23 feature, and before that was supported as an extension by GCC, Clang, and probably some other compilers. If you’re using MSVC, you may be out of luck. The same goes for statement-expressions, except they’re not even standardized.

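typeof() is similar to sizeof() in that it operates on the type of an expression rather than its value; where sizeof() gives you the size of that type, typeof() gives you the type itself, which you can then use anywhere a type name is expected.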
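
A quick sketch of what that looks like in practice. It uses the GNU spelling __typeof__, which GCC and Clang accept in every language mode; with a C23 compiler the plain typeof keyword should behave the same way. SWAP and typeof_demo are names made up for the example:

```c
#include <assert.h>

/* Swaps two lvalues of the same type without naming that type. */
#define SWAP(A, B) \
    do { \
        __typeof__(A) swap_tmp_ = (A); \
        (A) = (B); \
        (B) = swap_tmp_; \
    } while (0)

double typeof_demo(void)
{
    int i = 1;
    __typeof__(i) j = 2;          /* j is an int, like i */
    __typeof__(i * 1.5) d = 0;    /* d is a double; i * 1.5 is not evaluated */

    SWAP(i, j);                   /* now i == 2 and j == 1 */
    d = i + 0.5;
    return d;
}
```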

Statement-expressions are described here.

I won’t really show an implementation here, as the implementations are annoying and this blog post was mostly an excuse to test a macro expander I’m developing. Here’s a project of mine that uses this form of generics extensively, if you’d like to take a look, though.

Statement-expressions allow you to have local scoping and to effectively return variables from a macro. They are still quite flawed; any internal variables will need to be named specially to avoid name collisions. But, they allow you to write code that looks as follows:
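
The classic small example is a MAX macro that evaluates each argument exactly once. This is only a sketch, relying on the GCC/Clang statement-expression extension and on __typeof__; the trailing-underscore variable names are one possible convention for dodging collisions, not a rule:

```c
#include <assert.h>

/* The value of the last expression inside ({ ... }) is the macro's result. */
#define MAX(A, B) ({ \
    __typeof__(A) max_a_ = (A); \
    __typeof__(B) max_b_ = (B); \
    max_a_ > max_b_ ? max_a_ : max_b_; \
})

int max_demo(void)
{
    int i = 3;
    int m = MAX(i++, 2);   /* i++ runs once, so m == 3 and i == 4 afterwards */

    /* The collision problem: if a caller had a variable named max_a_ and
     * wrote MAX(max_a_, 1), the macro's own max_a_ would be initialized
     * from itself. Hence the odd internal names. */
    return m + i;
}
```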


VECTOR_TYPE(int) newvec;
vector_new(newvec);
vector_push(newvec, 123);
vector_pop(newvec);
vector_delete(newvec);

(The actual code I’ve linked is a bit more complicated because I wanted to provide a facility for the use of a generic allocator.)

Note the fact that pointers aren’t needed as these are macros, rather than functions. You can even pass these vectors around between functions, something like int func(VECTOR_TYPE(int) *vector); will do the trick.

This approach is also typesafe; effectively, you specify a type when creating the struct for a given datastructure, and then write some generic macros that use typeof() to figure out what type the datastructure passed in was and act accordingly.
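
A stripped-down sketch of the idea, not the implementation from the linked project. It assumes GCC or Clang (for __typeof__ and statement-expressions), uses an anonymous struct for VECTOR_TYPE, and invents its own field names, growth policy, and 0/-1 return convention:

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout: the element type is baked into the data pointer. */
#define VECTOR_TYPE(T) struct { T *data; size_t len; size_t cap; }

#define vector_new(vec) \
    ((void)((vec).data = NULL, (vec).len = 0, (vec).cap = 0))

/* Evaluates to 0 on success, -1 if the allocation failed. */
#define vector_push(vec, elem) ({ \
    __typeof__(vec) *vpush_v_ = &(vec); \
    int vpush_ok_ = 1; \
    if (vpush_v_->len == vpush_v_->cap) { \
        size_t vpush_cap_ = vpush_v_->cap ? vpush_v_->cap * 2 : 4; \
        void *vpush_tmp_ = realloc(vpush_v_->data, \
                                   vpush_cap_ * sizeof *vpush_v_->data); \
        if (vpush_tmp_ != NULL) { \
            vpush_v_->data = vpush_tmp_; \
            vpush_v_->cap = vpush_cap_; \
        } else { \
            vpush_ok_ = 0; \
        } \
    } \
    if (vpush_ok_) \
        vpush_v_->data[vpush_v_->len++] = (elem); \
    vpush_ok_ ? 0 : -1; \
})

/* Evaluates to 0 on success, -1 if the vector was already empty. */
#define vector_pop(vec) ({ \
    __typeof__(vec) *vpop_v_ = &(vec); \
    vpop_v_->len ? (vpop_v_->len--, 0) : -1; \
})

#define vector_delete(vec) ({ \
    __typeof__(vec) *vdel_v_ = &(vec); \
    free(vdel_v_->data); \
    vdel_v_->data = NULL; \
    vdel_v_->len = vdel_v_->cap = 0; \
    (void)0; \
})

size_t vector_demo(void)
{
    VECTOR_TYPE(int) newvec;
    vector_new(newvec);
    vector_push(newvec, 123);
    vector_push(newvec, 456);
    vector_pop(newvec);
    size_t remaining = newvec.len;   /* one element left */
    vector_delete(newvec);
    return remaining;
}
```

The assignment into data[...] is where the type checking happens: pushing a string literal into a VECTOR_TYPE(int) gets the same diagnostic as assigning one to an int. Note that with an anonymous struct like this, every VECTOR_TYPE(int) is a distinct type, so sharing one between functions needs a typedef or a tagged struct instead.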

The major downside of this is, as stated before, the ugly implementations. The issue of name collisions means that variable names look horrible, and multiline macros don’t make things any better.

Seriously, I can’t quite convey how soul-draining implementing some of these was. If something goes wrong with your implementation, the error messages are damn near unreadable, and the code is incredibly ugly.

Another downside is bad interop with C++ code. The article above describes this in detail. Then again, if you’re using C++, you have templates anyways.

I don’t think this approach is viable for complicated datastructures (at least, not if I want to keep my sanity intact), but for simple things (vectors, hashmaps, heaps, deques, etc), it’s fine.

The syntax for using the “functions” made by this approach is perfect, though. It’s kinda similar to actual C++ templates; you even get pseudo-references, since macros don’t need pointers to work. I can’t decide if pseudo-references are an awesome feature or just incredibly evil.