Sunday, December 06, 2020

Beginning Rust

I am translating one of my projects to Rust as a way of learning Rust. To make things more interesting, I am trying to implement my own memory allocators and data structures. After all, in C or C++ that is something I can do easily, so it is worthwhile figuring out how much effort this will be in Rust.

Here is my first piece of code. Let's first look at the C version and then at my attempt to do this in Rust.

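#include <stddef.h>
#include <stdlib.h>

struct lexer_state {
	const char *buf; /* start of the caller-owned input buffer */
	size_t bufsize;  /* total size of the buffer in bytes */
	size_t n;        /* number of bytes still unread */
	const char *p;   /* current read position */
};
struct lexer_state *raviX_init_lexer(const char *buf, size_t buflen)
{
	struct lexer_state *ls = (struct lexer_state *)calloc(1, sizeof(struct lexer_state));
	if (ls == NULL)
		return NULL; /* allocation failed */
	ls->buf = buf;
	ls->bufsize = buflen;
	ls->n = ls->bufsize;
	ls->p = ls->buf;
	return ls;
}
enum { EOZ = -1 }; /* end of stream */
#define cast(t, exp) ((t)(exp))
#define cast_uchar(c) cast(unsigned char, c)
/* Decrement n only when a byte is consumed, so the unsigned counter cannot wrap around after EOZ. */
static inline int zgetc(struct lexer_state *z) { return z->n > 0 ? (z->n--, cast_uchar(*z->p++)) : EOZ; }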

The goal here is to take a buffer as the input source and return one character at a time, or EOZ when the input is exhausted.

What you can observe above is that the buffer is supplied by the caller, and the C code assumes that the caller will ensure that the buffer is valid as long as lexer_state is active.

Here is my attempt to do this in Rust. Bear in mind that I am new to Rust and this is my first ever Rust code, therefore I may not be doing this the best possible way.

pub struct Source<'a> {
    len: usize,
    bytes: &'a [u8],
    n: usize,
}

pub const EOZ: i32 = -1;

impl<'a> Source<'a> {
    pub fn new(input: &'a str) -> Source<'a> {
        Source {
            len: input.len(),
            bytes: input.as_bytes(),
            n: 0,
        }
    }

    pub fn getc(&mut self) -> i32 {
        if self.n >= self.len {
            // Input exhausted: do not advance the position any further.
            return EOZ;
        }
        let ch = self.bytes[self.n] as i32;
        self.n += 1;
        ch
    }
}

The Rust version takes a string as input, and every time getc() is called, it returns the next byte from the input string, or EOZ if the input is exhausted.
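To make that behaviour concrete, here is a small sketch of how Source might be driven. The Source type is repeated from above so that the snippet compiles on its own. The read_all helper is my own addition for illustration and is not part of the project.

```rust
// Copy of the Source type from this post, repeated so the sketch is self-contained.
pub struct Source<'a> {
    len: usize,
    bytes: &'a [u8],
    n: usize,
}

pub const EOZ: i32 = -1;

impl<'a> Source<'a> {
    pub fn new(input: &'a str) -> Source<'a> {
        Source {
            len: input.len(),
            bytes: input.as_bytes(),
            n: 0,
        }
    }

    pub fn getc(&mut self) -> i32 {
        let ch = if self.n >= self.len {
            EOZ
        } else {
            self.bytes[self.n] as i32
        };
        self.n += 1;
        ch
    }
}

/// Hypothetical helper (not in the original code): calls getc() until it
/// returns EOZ and collects every byte value seen before that.
pub fn read_all(input: &str) -> Vec<i32> {
    let mut src = Source::new(input);
    let mut out = Vec::new();
    loop {
        let ch = src.getc();
        if ch == EOZ {
            break;
        }
        out.push(ch);
    }
    out
}
```

Calling read_all("ab") gives the two byte values 97 and 98. Note that getc() works on bytes rather than characters, so a non-ASCII character that takes several bytes in UTF-8 is returned one byte at a time.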

The main difference in the Rust version is that the compiler tracks that the input is being referenced in the Source struct so that it can ensure that the input is valid as long as the Source struct is active.

The Rust code contains lifetime annotations such as 'a, which are not something I could handle as a beginner. Fortunately, the Visual Studio Code Rust plugin was helpful enough to tell me that the annotation was necessary and to insert it in the right place. I believe the goal of the lifetime annotation is to link the lifetime of the input string to the byte slice reference inside the Source struct.
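As far as I understand it, the effect of that link can be seen in a sketch like the one below. It uses a trimmed-down copy of Source (without the len field, and using slice::get for the bounds check), and the function names first_byte_of_owned and dangling are made up for this illustration.

```rust
pub const EOZ: i32 = -1;

// Trimmed-down variant of Source for illustration only.
pub struct Source<'a> {
    bytes: &'a [u8], // borrowed from the caller's string for the lifetime 'a
    n: usize,
}

impl<'a> Source<'a> {
    pub fn new(input: &'a str) -> Source<'a> {
        Source {
            bytes: input.as_bytes(),
            n: 0,
        }
    }

    pub fn getc(&mut self) -> i32 {
        match self.bytes.get(self.n) {
            Some(&b) => {
                self.n += 1;
                b as i32
            }
            None => EOZ,
        }
    }
}

/// Accepted: `text` lives until the end of the function, so it outlives `src`.
pub fn first_byte_of_owned() -> i32 {
    let text = String::from("hi");
    let mut src = Source::new(&text);
    src.getc()
}

// Rejected by the compiler if uncommented: the returned Source would still
// borrow from `text`, which is dropped when the function returns.
//
// pub fn dangling<'a>() -> Source<'a> {
//     let text = String::from("hi");
//     Source::new(&text)
// }
```

In the C version the equivalent of dangling() would compile without complaint and leave the lexer pointing at freed memory, which is exactly the situation the caller has to avoid by hand.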
