protobuf


https://protobuf.dev

Protobuf is a language-independent serialization format with code generation for most programming languages and a relatively compact binary wire encoding.

Why?

syntax = "proto3";

package dev.npry.machines.example;

message Vec3 {
    float x = 1;
    float y = 2;
    float z = 3;
}

message IMUReading {
    Vec3 accel = 1;
    Vec3 gyro = 2;
    Vec3 mag = 3;
}

message TempHumReading {
    float temp_c = 1;
    float relative_humidity = 2;
}

message SensorReading {
    uint64 timestamp_epoch_millis = 1;

    oneof reading {
        IMUReading imu = 2;
        TempHumReading temp_hum = 3;
    }
}

Automatically generates something like this:

$ nanopb_generator sensor.proto 
# produces -> sensor.pb.h, sensor.pb.c
// sensor.pb.h
typedef struct {
    float x;
    float y;
    float z;
} vec3;

typedef struct {
    vec3 accel;
    vec3 gyro;
    vec3 mag;
} imu_reading;

typedef struct {
    float temp_c;
    float relative_humidity;
} temp_hum_reading;

typedef struct {
    uint64_t timestamp_epoch_millis;

    // tag of the oneof member currently stored in the union
    pb_size_t which_reading;
    union {
        temp_hum_reading temp_hum;
        imu_reading imu;
    };
} sensor_reading;

#define sensor_reading_imu_tag 2
#define sensor_reading_temp_hum_tag 3

#define sensor_reading_fields // encoded field info
// cont'd for other structs

Which lets you write code like this:

// main.c
#include <stdlib.h>
#include <stdio.h>
#include <pb_encode.h>
#include <pb_decode.h>
#include "sensor.pb.h"

// read_imu(), now(), uart_send() and uart_receive() are platform-specific
// helpers assumed to exist elsewhere; uart_receive() is assumed to return
// the number of bytes it read.

int main(void) {
    uint8_t buf[256];
    pb_ostream_t ostream = pb_ostream_from_buffer(buf, sizeof(buf));

    sensor_reading reading = {
        .timestamp_epoch_millis = now(),

        // select the active oneof member, then fill it in
        .which_reading = sensor_reading_imu_tag,
        .imu = read_imu(),
    };

    if (!pb_encode(&ostream, sensor_reading_fields, &reading)) {
        fprintf(stderr, "failed to encode protobuf: %s\n", PB_GET_ERROR(&ostream));
        return EXIT_FAILURE;
    }

    uart_send(buf, ostream.bytes_written);

    // decode only the bytes actually received, not the whole buffer
    size_t received = uart_receive(buf, sizeof(buf));

    pb_istream_t istream = pb_istream_from_buffer(buf, received);
    if (!pb_decode(&istream, sensor_reading_fields, &reading)) {
        fprintf(stderr, "failed to decode protobuf: %s\n", PB_GET_ERROR(&istream));
        return EXIT_FAILURE;
    }

    if (reading.which_reading == sensor_reading_temp_hum_tag) {
        printf("remote temperature: %fC\n", reading.temp_hum.temp_c);
    }

    return EXIT_SUCCESS;
}

Or in Python, you get classes, something like this (API inexact):

$ protoc --python_out=gen sensor.proto
# betterproto is preferred (get the beta!)
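from typing import Optional

class TempHumReading:
    @property
    def temp_c(self) -> float:
        ...

    # ...

class SensorReading:
    @property
    def timestamp_epoch_millis(self) -> int:
        ...

    @property
    def temp_hum(self) -> Optional[TempHumReading]:
        ...

    # ...

def main():
    bytes_ = read_serial()
    # the real call is SensorReading.FromString(bytes_) with the official
    # bindings, or SensorReading().parse(bytes_) with betterproto
    reading = SensorReading.decode(bytes_)

    # temp_hum is only set when the oneof holds that member
    if reading.temp_hum is not None:
        print(reading.temp_hum.temp_c)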

Or in Rust, Go, Java, JavaScript, C#, C++ -- code generation and bindings exist for every mainstream language, and it means you don't need to rewrite your serialization code everywhere if you need to change something.
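For a sense of what the generated code takes off your hands, here is a rough schema-less decoder for the wire format, hand-written in Python. It is only a sketch and not how any real library is implemented: it handles just the three wire types the SensorReading example needs, the payload was encoded by hand, and `read_varint` and `decode_fields` are names made up for this example.

```python
import struct


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def decode_fields(buf: bytes) -> dict[int, object]:
    """Split one message into {field_number: raw value}, without a schema."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint (uint64, bool, ...)
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (nested message, string, bytes)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # 32-bit fixed, interpreted here as a float
            (value,) = struct.unpack_from("<f", buf, pos)
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[field_number] = value
    return fields


# SensorReading with timestamp_epoch_millis = 1000 and temp_hum.temp_c = 21.5
payload = bytes.fromhex("08e807" "1a05" "0d0000ac41")
outer = decode_fields(payload)
inner = decode_fields(outer[3])  # field 3 is the nested TempHumReading
print(outer[1], inner[1])  # 1000 21.5
```

Note that the decoder cannot tell a nested message from a string, or a float from a fixed32, without the schema; that knowledge is exactly what the generated code bakes in.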

But JSON exists, I hear you saying -- yes, but JSON is not compact, efficient, or generally suitable for embedded systems. Any self-describing format (msgpack, BSON, CBOR, RON, etc.) has to pay the overhead of at least the string field names in every message, whereas Protobuf identifies fields by small integer tags and, in the general case, does not.
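To put a rough number on that overhead, the sketch below encodes a TempHumReading by hand in the proto3 wire format and compares it with the same data as JSON. The encoder is a hypothetical stand-in for generated code, written only for this comparison, and the sample values are arbitrary.

```python
import json
import struct


def encode_temp_hum(temp_c: float, relative_humidity: float) -> bytes:
    """Hand-rolled proto3 wire encoding of TempHumReading (two float fields)."""
    out = b""
    for field_number, value in ((1, temp_c), (2, relative_humidity)):
        if value == 0.0:
            continue  # proto3 leaves out fields that hold their default value
        key = (field_number << 3) | 5  # wire type 5 = 32-bit fixed
        out += bytes([key]) + struct.pack("<f", value)
    return out


wire = encode_temp_hum(21.5, 40.0)
as_json = json.dumps({"temp_c": 21.5, "relative_humidity": 40.0}).encode()
print(len(wire), len(as_json))  # 10 43
```

Each field costs one key byte plus four bytes of float on the wire, while the JSON version spends most of its bytes on the field names.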

Similar alternatives include FlatBuffers (also a Google project, more efficient than Protobuf but clunkier) and Cap'n Proto (to my knowledge its libraries expect a full OS environment, so it's not suitable for embedded).

personal conventions

I typically define one message type per (channel, direction) tuple, with the individual messages it can carry discriminated by a oneof field.

syntax = "proto3";

package dev.npry.machines.example;

// Unit/the empty tuple. "void" in C-style parlance.
message Unit {}

message Uplink {
    oneof command {
        Unit start_doing_something = 1;
        uint64 do_something_n_times = 2;
    }
}

message Downlink {
    oneof response {
        bool doing_something_status = 1;
        uint64 did_something_times = 2;
    }
}
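On the receiving end this convention reduces to one decode followed by one branch per oneof member. A minimal Python sketch of that shape follows; the dataclasses are hypothetical stand-ins for generated classes (real bindings expose the active oneof member differently), and `handle_uplink` is a made-up name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    pass


@dataclass
class Uplink:
    # stand-in for a generated class: at most one oneof member is not None
    start_doing_something: Optional[Unit] = None
    do_something_n_times: Optional[int] = None


@dataclass
class Downlink:
    doing_something_status: Optional[bool] = None
    did_something_times: Optional[int] = None


def handle_uplink(command: Uplink) -> Downlink:
    """Receive side of the channel: one branch per oneof member."""
    if command.start_doing_something is not None:
        return Downlink(doing_something_status=True)
    if command.do_something_n_times is not None:
        # pretend the work was done and report how many times
        return Downlink(did_something_times=command.do_something_n_times)
    raise ValueError("Uplink with no command set")


print(handle_uplink(Uplink(do_something_n_times=3)))
```

Adding a command then means adding one oneof member and one branch, and the channel still carries a single message type in each direction.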

I also conventionally name these Uplink and Downlink when the system has a clear center of communication that gives every link a direction. "Up" is a universal reference regardless of whether I'm the client, server, receiver, or transmitter, momentarily or generally. This direction doesn't always exist, but when it does, it's usually absolute.

I generally do not use protobuf service definitions because code generators don't exist for embedded (to my knowledge).