Sunday, December 06, 2020

C Enum equivalent in Rust

In C code we often use enums to represent constants like so:

enum TokenType {
	TOK_OFS = 256,
	TOK_and,
	TOK_break,
	TOK_STRING,

	FIRST_RESERVED = TOK_OFS + 1,
	LAST_RESERVED = TOK_while - TOK_OFS
};

Rust also has an enum type, but it is not at all like the C enum. A Rust enum is closer to a tagged (discriminated) union in C.

Superficially, though, you can almost write the same thing in Rust:

enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_STRING,

    FIRST_RESERVED = TOK_OFS + 1,
    LAST_RESERVED = TOK_while - TOK_OFS
}

But this will not compile. First, Rust requires an explicit conversion from an enum to an integer, so you can try:

enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_STRING,

    FIRST_RESERVED = TOK_OFS as isize + 1,
    LAST_RESERVED = TOK_while as isize - TOK_OFS as isize
}

However, this does not work either, because a Rust enum variant is not simply a named constant. Each discriminant value must be unique, so we cannot have two variants, TOK_and and FIRST_RESERVED, with the same value.

Perhaps we could try this:

enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_STRING,
    // ... remaining tokens, including TOK_while ...
}

const FIRST_RESERVED: isize = TokenType::TOK_OFS as isize + 1;
const LAST_RESERVED: isize = TokenType::TOK_while as isize - TokenType::TOK_OFS as isize;

This compiles, but the enum values are a pain to use as constants because every use needs an explicit conversion to an integer.
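To make the friction concrete, here is a minimal sketch. The three-variant token list, the `#[repr(i32)]` attribute, and the helpers `is_token` and `token_from_i32` are assumptions made for this illustration only; they are not taken from the project.

```rust
// Hypothetical, trimmed-down token set for illustration only.
#[allow(non_camel_case_types, dead_code)]
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenType {
    TOK_OFS = 256,
    TOK_and,   // 257
    TOK_break, // 258
}

// Enum -> integer: every comparison needs an explicit `as` cast.
fn is_token(raw: i32, tok: TokenType) -> bool {
    raw == tok as i32
}

// Integer -> enum: there is no built-in conversion, so it is written by hand.
fn token_from_i32(raw: i32) -> Option<TokenType> {
    match raw {
        x if x == TokenType::TOK_OFS as i32 => Some(TokenType::TOK_OFS),
        x if x == TokenType::TOK_and as i32 => Some(TokenType::TOK_and),
        x if x == TokenType::TOK_break as i32 => Some(TokenType::TOK_break),
        _ => None,
    }
}
```

A lexer that passes tokens around as plain integers, as the C code does, would hit one of these two conversions at almost every use.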

In the end I settled on plain constants:

const TOK_OFS: i32 = 256;

const TOK_and: i32 = 257;
const TOK_break: i32 = 258;
const TOK_STRING: i32 = 301;

const FIRST_RESERVED: i32 = TOK_OFS + 1;
const LAST_RESERVED: i32 = TOK_while - TOK_OFS;

Not great but it gives me what I need.
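A possible middle ground is to keep the enum for the tokens themselves and define the derived values as associated constants next to it. The sketch below assumes a shortened token list in which TOK_while comes right after TOK_break; the real list is longer and its values differ, so the numbers here are illustrative only.

```rust
// Hypothetical, shortened token list; the positions are assumptions.
#[allow(non_camel_case_types, dead_code)]
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenType {
    TOK_OFS = 256,
    TOK_and,    // 257
    TOK_break,  // 258
    TOK_while,  // 259 in this shortened list
    TOK_STRING, // 260 in this shortened list
}

impl TokenType {
    // These are constants, not variants, so they may repeat a
    // discriminant value without clashing with any variant.
    const FIRST_RESERVED: i32 = TokenType::TOK_OFS as i32 + 1;
    const LAST_RESERVED: i32 = TokenType::TOK_while as i32 - TokenType::TOK_OFS as i32;
}
```

With this layout `TokenType::FIRST_RESERVED` has the same value as `TokenType::TOK_and as i32`, which is what the C enum expressed. The casts do not disappear, but they are confined to one place.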

Beginning Rust

I am translating one of my projects to Rust as a way of learning Rust. To make things more interesting, I am trying to implement my own memory allocators and data structures. After all, in C or C++ that is something I can do easily, so it is worthwhile figuring out how much effort this will be in Rust.

Here is my first piece of code. Let's first look at the C version and then at my attempt to do this in Rust.

struct lexer_state {
	const char *buf;
	size_t bufsize;
	size_t n;
	const char *p;
};
struct lexer_state *raviX_init_lexer(const char *buf, size_t buflen)
{
	struct lexer_state *ls = (struct lexer_state *)calloc(1, sizeof(struct lexer_state));
	if (ls == NULL) return NULL; /* allocation failed */
	ls->buf = buf;
	ls->bufsize = buflen;
	ls->n = ls->bufsize;
	ls->p = ls->buf;
	return ls;
}
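enum { EOZ = -1 }; /* end of stream */
#define cast(t, exp) ((t)(exp)) /* assumed definition, as in Lua's llimits.h */
#define cast_uchar(c) cast(unsigned char, c)
/* Check before decrementing: n is unsigned, so decrementing it at 0 would wrap around. */
static inline int zgetc(struct lexer_state *z) { return z->n > 0 ? (z->n--, cast_uchar(*z->p++)) : EOZ; }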

The goal here is to take a buffer as the input source and return one character at a time, or EOZ when the input is exhausted.

Note that the buffer is supplied by the caller, and the C code simply assumes that the caller keeps the buffer valid for as long as the lexer_state is in use.

Here is my attempt to do this in Rust. Bear in mind that I am new to Rust and this is my first ever Rust code, therefore I may not be doing this the best possible way.

pub struct Source<'a> {
    len: usize,
    bytes: &'a [u8],
    n: usize,
}

pub const EOZ: i32 = -1;

impl<'a> Source<'a> {
    pub fn new(input: &'a str) -> Source<'a> {
        Source {
            len: input.len(),
            bytes: input.as_bytes(),
            n: 0,
        }
    }

    pub fn getc(&mut self) -> i32 {
        let ch = if self.n >= self.len {
            EOZ
        } else {
            self.bytes[self.n] as i32
        };
        self.n += 1;
        ch
    }
}

The Rust version takes a string as input, and every time getc() is called it returns the next byte of the input string, or EOZ once the input is exhausted.
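As a usage sketch, the Source struct from above is repeated here together with a small loop that drains it. The helper `read_all` is a hypothetical name introduced only for this example.

```rust
pub struct Source<'a> {
    len: usize,
    bytes: &'a [u8],
    n: usize,
}

pub const EOZ: i32 = -1;

impl<'a> Source<'a> {
    pub fn new(input: &'a str) -> Source<'a> {
        Source {
            len: input.len(),
            bytes: input.as_bytes(),
            n: 0,
        }
    }

    pub fn getc(&mut self) -> i32 {
        let ch = if self.n >= self.len {
            EOZ
        } else {
            self.bytes[self.n] as i32
        };
        self.n += 1;
        ch
    }
}

// Hypothetical helper: call getc() until it reports EOZ and
// collect everything that was returned before that.
fn read_all(src: &mut Source<'_>) -> Vec<i32> {
    let mut out = Vec::new();
    loop {
        let ch = src.getc();
        if ch == EOZ {
            break;
        }
        out.push(ch);
    }
    out
}
```

Because getc() works on bytes rather than characters, a non-ASCII character in the input comes back as several values, one per UTF-8 byte.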

The main difference in the Rust version is that the compiler tracks the fact that the Source struct holds a reference to the input, and so guarantees that the input outlives the Source struct.

The Rust code contains lifetime annotations such as 'a. These are not something I could handle as a beginner, but fortunately the Visual Studio Code Rust plugin is helpful enough to tell me that an annotation is necessary and to insert it in the right place. I believe the goal of the lifetime annotation is to link the lifetime of the input string to the byte slice reference inside the Source struct.
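A small sketch of what that link means in practice follows. It uses a cut-down Source, and the accessor `rest` and the function `tail_of` are hypothetical names added for illustration.

```rust
pub struct Source<'a> {
    bytes: &'a [u8],
    n: usize,
}

impl<'a> Source<'a> {
    pub fn new(input: &'a str) -> Source<'a> {
        Source { bytes: input.as_bytes(), n: 0 }
    }

    // Hypothetical accessor: the returned slice carries the lifetime 'a
    // of the original input, not the lifetime of the Source itself.
    pub fn rest(&self) -> &'a [u8] {
        let bytes: &'a [u8] = self.bytes;
        &bytes[self.n..]
    }
}

// Accepted: the result borrows from `input`, which outlives the
// local `src`, so dropping `src` at the end of the function is fine.
fn tail_of(input: &str) -> &[u8] {
    let src = Source::new(input);
    src.rest()
}

// Rejected by the compiler if uncommented: `owned` is dropped at the
// end of the function while the returned Source still refers to it.
//
// fn dangling() -> Source<'static> {
//     let owned = String::from("abc");
//     Source::new(&owned)
// }
```

In the C version the equivalent of `dangling` would compile without complaint and fail at run time; here the 'a annotation is what lets the compiler reject it.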