Wednesday, December 09, 2020

Memory Allocation in Rust

I am porting a project from C to Rust. The C project manages all memory allocation using chunk allocators. All memory is freed at the end. Another property of the implementation is that a memory reference to an object is never reused or changed, so that any code holding a reference doesn't need to worry that the object it refers to will somehow be deleted and the reference will become invalid.

When porting to Rust, I had to decide whether to keep the existing model or adopt a different strategy where each allocation is freed as soon as it is no longer needed. In the end I decided to retain the existing design, because freeing objects at intermediate stages is not only inefficient, it is also not always feasible.

In the interest of learning Rust I decided to implement my own allocators in Rust, mimicking how they work in the C code.

One of the first things I needed was a string allocator. In the C version a chunk allocator hands out string slices which are then used to cache all possible strings. I won't show the C code here, but it is quite a straightforward implementation.

One key point here is that the string slices are guaranteed to be stable, i.e. they do not change or move around.

My first attempt to write this in Rust looked like this:
struct StringChunk {
    chunk: [u8; 1024],
}

struct StringAllocator {
    chunks: Vec<Box<StringChunk>>,
    pos: usize
}

impl StringAllocator {
    fn new() -> Self {
        StringAllocator {
            chunks : vec![Box::new(StringChunk{ chunk: [0; 1024] })],
            pos: 0
        }
    }

    fn alloc_string(&mut self, n: usize) -> &mut [u8] {
        let cur = self.chunks.last_mut();
        match cur {
            None => {
                self.chunks.push(Box::new(StringChunk{ chunk: [0; 1024] }));
                self.pos = 0;
            }
            Some(p) => {
                if self.pos + n > p.chunk.len() {
                    self.chunks.push(Box::new(StringChunk{ chunk: [0; 1024] }));
                    self.pos = 0;
                }
            }
        }
        let i = self.chunks.len()-1;
        let top = &mut self.chunks[i];
        let pos = self.pos;
        self.pos = self.pos + n;
        &mut top.chunk[pos .. self.pos]
    }
}

This version has the issue that strings are allocated only from the topmost chunk, so even if space is available in earlier chunks it is not used.

The allocator returns a slice into a byte array. To ensure that we never invalidate a reference, the vector in StringAllocator holds boxed chunks rather than the chunks themselves. When the vector grows and reallocates, only the pointers to the chunks move; the chunks stay put, so references to slices within them remain valid.
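To convince myself of this, I wrote a small standalone check (not part of the allocator itself) that records the address of the first chunk's data and verifies it has not changed after the vector has grown many times:

```rust
fn main() {
    // A Vec of boxed chunks: the Vec may reallocate as it grows, but each
    // chunk lives in its own heap allocation and never moves.
    let mut chunks: Vec<Box<[u8; 1024]>> = vec![Box::new([0u8; 1024])];
    let first = chunks[0].as_ptr();
    for _ in 0..100 {
        chunks.push(Box::new([0u8; 1024])); // likely forces the Vec to reallocate
    }
    // The first chunk's data still lives at the same address.
    assert_eq!(first, chunks[0].as_ptr());
    println!("chunk address is stable");
}
```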

One curiosity in Rust I discovered is this:

Box::new(StringChunk{ chunk: [0; 1024] })

The above seems to create the byte array of 1024 elements on the stack and then pass it to Box::new(), which presumably copies the array to heap memory. There appears to be no placement-new equivalent in Rust, although I think some library functions are available to work around this.
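One workaround I am aware of (a sketch on my part, not something the port relies on) is to build the buffer with vec!, which allocates directly on the heap, and then convert it into a boxed slice:

```rust
fn main() {
    // vec! allocates its buffer on the heap, so no 1024-byte stack temporary
    // is constructed; into_boxed_slice() hands over ownership as a Box<[u8]>.
    let chunk: Box<[u8]> = vec![0u8; 1024].into_boxed_slice();
    assert_eq!(chunk.len(), 1024);
}
```

The trade-off is that the chunk is now a slice rather than a fixed-size array, so its length is carried at runtime instead of in the type.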

Sunday, December 06, 2020

C Enum equivalent in Rust

In C code we often use enums to represent constants like so:

enum TokenType {
	TOK_OFS = 256,
	TOK_and,
	TOK_break,
	TOK_STRING,

	FIRST_RESERVED = TOK_OFS + 1,
	LAST_RESERVED = TOK_while - TOK_OFS
};

Rust also has an enum type, but it is not at all like the C enum; the Rust enum is more like a discriminated union in C.

Superficially though, you can almost write something like the above in Rust.

enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_STRING,

    FIRST_RESERVED = TOK_OFS + 1,
    LAST_RESERVED = TOK_while - TOK_OFS
}

But this will not compile. Firstly, Rust expects an explicit conversion from an enum variant to an integer, so you can try:

enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_STRING,

    FIRST_RESERVED = TOK_OFS as isize + 1,
    LAST_RESERVED = TOK_while as isize - TOK_OFS as isize
}

However this will not work either, because an enum variant in Rust is not really a constant. Discriminant values must be unique, so we cannot have two variants TOK_and and FIRST_RESERVED with the same value.

Perhaps we could try this:

enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_STRING,

}

const FIRST_RESERVED: isize = TokenType::TOK_OFS as isize + 1;
const LAST_RESERVED: isize = TokenType::TOK_while as isize - TokenType::TOK_OFS as isize;

This compiles, but the enums are a pain to use as constants because of the need to explicitly convert to integer values.
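To illustrate the friction, here is a small sketch (the is_and helper is hypothetical, not from my port) showing that every comparison against a raw token value needs a cast:

```rust
#[allow(non_camel_case_types, dead_code)]
enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_STRING,
}

// Every comparison site needs an explicit `as` cast to reach the integer value.
fn is_and(tok: i32) -> bool {
    tok == TokenType::TOK_and as i32
}

fn main() {
    assert!(is_and(257)); // TOK_and follows TOK_OFS, so its value is 257
    assert!(!is_and(258));
}
```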

In the end I ended up doing:

const TOK_OFS: i32 = 256;

const TOK_and: i32 = 257;
const TOK_break: i32 = 258;
const TOK_STRING: i32 = 301;

const FIRST_RESERVED: i32 = TOK_OFS + 1;
const LAST_RESERVED: i32 = TOK_while - TOK_OFS;

Not great but it gives me what I need.

Beginning Rust

I am translating one of my projects to Rust as a way of learning Rust. To make things more interesting, I am trying to implement my own memory allocators and data structures. After all, in C or C++ that is something I can do easily, so it is worthwhile figuring out how much effort this will be in Rust.

Here is my first piece of code. Let's first look at the C version and then at my attempt to do this in Rust.

struct lexer_state {
	const char *buf;
	size_t bufsize;
	size_t n;
	const char *p;
};  
struct lexer_state *raviX_init_lexer(const char *buf, size_t buflen)
{
	struct lexer_state *ls = (struct lexer_state *)calloc(1, sizeof(struct lexer_state));
	ls->buf = buf;
	ls->bufsize = buflen;
	ls->n = ls->bufsize;
	ls->p = ls->buf;
	return ls;
}
enum { EOZ = -1 }; /* end of stream */
#define cast_uchar(c) cast(unsigned char, c)
static inline int zgetc(struct lexer_state *z) { return z->n-- > 0 ? cast_uchar(*z->p++) : EOZ; }

The goal here is to take a buffer as the input source and return one character at a time, or EOZ when the input is exhausted.

What you can observe above is that the buffer is supplied by the caller, and the C code assumes that the caller will ensure that the buffer is valid as long as lexer_state is active.

Here is my attempt to do this in Rust. Bear in mind that I am new to Rust and this is my first ever Rust code, so I may not be doing this in the best possible way.

pub struct Source<'a> {
    len: usize,
    bytes: &'a [u8],
    n: usize,
}

pub const EOZ: i32 = -1;

impl<'a> Source<'a> {
    pub fn new(input: &'a str) -> Source<'a> {
        Source {
            len: input.len(),
            bytes: input.as_bytes(),
            n: 0,
        }
    }

    pub fn getc(&mut self) -> i32 {
        if self.n >= self.len {
            // Input exhausted: return EOZ without advancing, so repeated
            // calls cannot keep incrementing the position counter.
            EOZ
        } else {
            let ch = self.bytes[self.n] as i32;
            self.n += 1;
            ch
        }
    }
}

The Rust version takes a string as input, and every time getc() is called, it returns a byte from the input string, or EOZ if the input is exhausted.

The main difference in the Rust version is that the compiler tracks that the input is being referenced in the Source struct so that it can ensure that the input is valid as long as the Source struct is active.

The Rust code contains lifetime annotations such as 'a, which is not something I could have worked out as a beginner; fortunately the Visual Studio Code Rust plugin is helpful enough to let me know that the annotation is necessary and to insert it in the right place. I believe the goal of the lifetime annotation is to tie the lifetime of the input string to the byte-array reference inside the Source struct.
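To see it in action, here is a standalone usage sketch (repeating the definitions so it compiles on its own, with the position bump kept inside the else branch):

```rust
pub struct Source<'a> {
    len: usize,
    bytes: &'a [u8],
    n: usize,
}

pub const EOZ: i32 = -1; // end of stream, as in the C version

impl<'a> Source<'a> {
    pub fn new(input: &'a str) -> Source<'a> {
        Source { len: input.len(), bytes: input.as_bytes(), n: 0 }
    }

    pub fn getc(&mut self) -> i32 {
        if self.n >= self.len {
            EOZ
        } else {
            let ch = self.bytes[self.n] as i32;
            self.n += 1;
            ch
        }
    }
}

fn main() {
    let input = String::from("ab");
    let mut src = Source::new(&input);
    assert_eq!(src.getc(), 'a' as i32);
    assert_eq!(src.getc(), 'b' as i32);
    assert_eq!(src.getc(), EOZ); // input exhausted
    assert_eq!(src.getc(), EOZ); // stays at EOZ on repeated calls
}
```

The borrow checker will refuse to compile any code that drops the input string while a Source borrowing it is still alive, which is exactly the guarantee the C version leaves to the caller's discipline.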

Saturday, July 06, 2019

Better late than never: making Linux the main development platform

Although I have been developing on Linux for many years, Windows has always been my primary development environment. Usually I develop first on Windows, and then test on Mac OSX and Linux. But recently as I started to get more into Docker, it has started to make more sense to develop on Linux first.

I have chosen RHEL 7.x as my main development platform. This is an unusual choice, I guess; the availability of the developer edition made it possible. I like that Red Hat makes the latest versions of the programming languages available via their developer tool sets. On Ubuntu I was always using older versions.

Friday, March 29, 2019

Back from C# to Java

I wanted to write a quick post about my move back from C# to Java.

In 2016 I chose to use C# as the language for a system I was developing. The release of .Net core prompted this move. The key benefits I expected from C# were:
  • Higher productivity
  • Ability to generate native executables - unfortunately, this did not materialize as it seems the AOT functionality in .Net Core was not a priority on server platforms
  • Easier integration with external C libraries
  • More memory efficient implementation due to support for primitive types in containers, Struct types etc.
Apart from the disappointment with AOT compilation, C# met my expectations. So then why move back to Java?

Well, primarily because of the Java eco-system. Of course Java has improved significantly in terms of developer productivity since Java 8. It even has variable declarations with type inference now. However, what sets Java apart is the huge eco-system for server side development, largely because of the Apache and Spring projects. With .Net Core I struggled even with basic stuff such as application logging. Maybe this has changed now, but .Net Core 1.0 didn't have any standard way of logging, which meant I had to roll my own.

One thing I am not convinced about is the proliferation of 'async/await' style programming in C#. It just seems wrong that your program will be converted to a state machine. I think if this has to be done, then the approach adopted by Go is better. Anyway, I am digressing.

Thursday, December 22, 2016

SimpleDBM - a NoSQL Transactional DB in Java

The goal of the SimpleDBM project was primarily to teach myself how DBMSes work. In that goal it succeeded, I think, and it was also great fun researching the computer science literature on database technology and applying the techniques invented by the great pioneers in this area.

I highlighted some of the sources I used in the implementation of SimpleDBM in this blog post.

Unfortunately I have not been able to devote much time to SimpleDBM in the past few years, so new features are not being implemented and the project is in maintenance mode; that is, I will fix reported bugs.

It is not possible to know whether anyone is using SimpleDBM. I have not used it in anger in a production environment, so in that sense it is not production software. However, I think its main value today is pedagogical: it can be used to understand and learn the traditional techniques used to implement database engines. The implementation is much better documented than any other open-source DBMS I have come across, partly because I once thought of writing a book on how to implement a DBMS.

The implementation handles some of the hard problems, such as transactions, write ahead logging, concurrent and recoverable BTREE operations, deadlock detection, etc.

The project is now hosted on GitHub. For anyone just wanting to use it, Maven packages are available.

Wednesday, December 07, 2016

What's great about Lua?

Lua is an amazing programming language implementation. I say implementation because it is not just the language itself but how it is implemented that is particularly impressive.

As a programming language, Lua can be characterised as a small but powerful language. The power comes from clever use of a few core meta-mechanisms, as the Lua authors like to put it. A nice introduction to some of these is in the recent talk by Roberto Ierusalimschy at the Lua Workshop 2016.

I used to think that Lua is a simple language, but appearances are deceptive. I now think of Lua as a 'small' and 'powerful' language rather than a 'simple' one.

The language design is clever, but the implementation is what makes it great.

Firstly it is a very compact implementation, just a few C source files, and that's it. No dependencies other than an ANSI C compiler.

Secondly, despite the compact implementation, it features:

  • A byte-code compiler and Virtual Machine.
  • An incremental garbage collector.
  • Extremely fast parser and code generator.
  • And the language is delivered as a library with an extremely well designed C API, that makes it easy to embed Lua as well as extend it.
It is this combination of economical design and beautiful implementation that makes Lua great.

Lua 5.3 Bytecode Reference

Lua bytecodes are not officially documented as they are considered to be an implementation detail. The best attempt to document Lua bytecodes is A No-Frills Introduction to Lua 5.1 VM Instructions by Kein-Hong Man. However, this document is quite old now and does not reflect the changes made since Lua 5.2.

Some time ago I started an attempt to bring this document up-to-date for Lua 5.3. I recently managed to spend a few hours updating the new Lua 5.3 bytecode reference. This is still not complete but the most important bytecodes are covered.

I want to eventually produce a version that is like a specification, i.e., one that allows independent implementations to replicate the bytecode generation. My interest in this is due to my desire to create a new parser and code generator for Lua. 

Thursday, August 25, 2016

Unique problems of a start-up

Well, recently I took the plunge and started my own tech start-up after years of thinking about it. So I have been battling the usual things that many start-ups face, I guess.

Of great help are inspirational talks such as these.

First is the talk by the creator of Ruby on Rails, David Heinemeier Hansson, at Startup School '08, where he talks about how to create a successful start-up.


Next is this talk by Walter Bright where he talks about how he started the programming language D, and he describes the way he works.


My journey has only just started. I would love to discuss some of the issues that start-ups face and how I am trying to solve the ones I face.