Saturday, August 22, 2015

An LLVM binding for Lua

I announced the development of Ravi back in January 2015. It started as an experiment - I was not certain whether Ravi could achieve the performance levels of LuaJIT, the reigning monarch for Lua JIT compilation.

I am pleased to report that Ravi is able to match LuaJIT's performance for a few selected benchmarks. More details are available here. While this is positive news, there is still much to do to make Ravi competitive in a variety of situations.

LuaJIT offers a powerful FFI for interfacing with external libraries and the like. This is certainly convenient, but the approach taken is not compatible with standard Lua. After some thought I decided that rather than creating an FFI for Ravi, a more general capability would be to allow both Lua and Ravi users to write JIT code using LLVM. Work on this has just started so there is not much to show yet, but I hope to make progress fairly quickly.

LLVM is a very low-level API - lower level than C. This has its pluses and minuses. On the plus side the LLVM binding will allow Lua and Ravi users to exploit the full power of LLVM. On the minus side even writing trivial functions can take quite some effort.

Friday, April 03, 2015

Memory bugs and finding them

I am working on creating a JIT compiler for Lua and Ravi. Ravi is a Lua-derived language with some enhancements to help improve performance. The JIT compilation is being implemented using LLVM.

As I implement more parts of the Lua language I am able to run more of the standard Lua test suites in JIT mode. A few days ago I encountered a nasty bug - one of the test cases failed on Windows with a run-time exception that seemed to imply an invalid or misaligned stack. Now this is a particularly hard bug to find as at runtime there is a mix of code compiled by the MSVC compiler (the Lua C code) and the code generated by LLVM. The LLVM generated code has no debugging information right now as I haven't yet implemented the required metadata. So the problem is how to investigate the issue if the fault is in LLVM generated code.

Confusingly, the error occurred only on Windows and not on Linux, so I was initially led to believe that it might be due to some issue with how LLVM was using the stack on Windows.

The problem with memory errors is that they are Heisenbugs. Any change you make to investigate the issue, such as adding a debug print, may help the bug to hide. So investigation is particularly hard.

My initial attempt at finding the root cause was to run the tests under Valgrind. Valgrind reported some possible memory leaks but not in my code - the reported potential leaks were in LLVM code. There were no reports of buffer overflow or memory overwrites.

I guess that if the problem is some portion of the stack being overwritten then Valgrind cannot find it as it is more a tool for analyzing issues with heap allocation.

My next stop was to use Address Sanitizer. This is a tool created by Google engineers - it works by instrumenting the code using some compiler extensions. Fortunately this capability is available in GCC 4.8.2 which is the compiler I am using on Ubuntu. Just by adding a compiler option (-fsanitize=address) one can get instrumented C code. The tool not only checks heap allocations but also the stack for invalid reads/writes.

When I ran the test suite with the instrumented version asan reported an issue and aborted the program. Fortunately, you can just run the application under GDB, and break at the point where asan reports the error. And if the code has appropriate debug information, then GDB will tell you the line of code that caused the error.
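To make this workflow concrete, here is a minimal sketch in C of the kind of stack error that Address Sanitizer catches. The names sum_first, check_in_bounds and BUF_LEN are made up for illustration; this is not code from Lua or Ravi, and the bug is deliberate.

```c
#include <assert.h>
#include <stddef.h>

enum { BUF_LEN = 8 };

/* Stages the values 0..n-1 in a fixed-size stack buffer and returns their sum.
   Deliberate bug: nothing stops n from exceeding BUF_LEN. */
int sum_first(size_t n) {
    int buf[BUF_LEN];
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        buf[i] = (int)i;        /* out-of-bounds stack write once i >= BUF_LEN */
    for (i = 0; i < n; i++)
        total += buf[i];
    return total;
}

/* In-bounds call: safe to run with or without the sanitizer. */
void check_in_bounds(void) {
    assert(sum_first(BUF_LEN) == 28);   /* 0 + 1 + ... + 7 */
}
```

Assuming a GCC or Clang build with -g -fsanitize=address, a call such as sum_first(BUF_LEN + 1) from a small test driver should make asan report a stack buffer overflow at the marked line, whereas an uninstrumented build may silently corrupt a neighbouring stack slot.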

To cut a long story short - I found that the memory error was being caused because I had not modified the Lua Debug API to work with JITed code. I had incorrectly assumed that the Debug API is only used if invoked explicitly. In fact, when an error is raised in Lua, the error routine uses the Debug API to obtain information about the code running at the time. This relies upon the 'savedpc' field, which is not populated in JITed code - so any call to the Debug API that relies on this field can lead to unexpected memory access. The fix I implemented was to treat JITed functions as if they are C functions.
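The idea behind the fix can be sketched roughly as follows. This is a heavily simplified model and not the actual Lua or Ravi source: the types FnKind, Proto and Frame and the function current_line are hypothetical stand-ins, and savedpc is modelled as an instruction index rather than a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the VM's internal structures. */
typedef enum { FN_LUA, FN_C, FN_JIT } FnKind;

typedef struct {
    const int *lineinfo;   /* source line of each bytecode instruction */
    size_t ninstr;         /* number of instructions */
} Proto;

typedef struct {
    FnKind kind;
    const Proto *proto;    /* NULL for C functions */
    size_t savedpc;        /* index of the next instruction; only the
                              interpreter keeps this up to date */
} Frame;

/* Returns the source line the frame is executing, or -1 if unknown. */
int current_line(const Frame *f) {
    assert(f != NULL);
    /* JITed frames take the same path as C frames: savedpc is stale or
       unset, so it must never be used to index lineinfo. */
    if (f->kind != FN_LUA)
        return -1;
    assert(f->proto != NULL);
    assert(f->savedpc >= 1 && f->savedpc <= f->proto->ninstr);
    return f->proto->lineinfo[f->savedpc - 1];
}
```

The design choice here is to report "no line information" for JITed frames rather than to guess; the cost is less informative error messages for JITed code.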

My real reason for writing this post, however, is that I found Address Sanitizer to be an amazing tool. It is a lifesaver for C/C++ programmers.

Sunday, January 25, 2015

Ravi - an attempt to create optional typing for Lua

I am in love with Lua, as blogged previously. While it is a perfect little language, there is scope for improving the performance of Lua. Obviously great work in this area has already been done by Mike Pall, who created LuaJIT. However, there are some issues with LuaJIT that are hard to overcome.
  • Large parts of LuaJIT are written in assembler - which means that it would take a significant investment of time and effort to understand how it works and to fix issues or make enhancements to it.
  • Mike Pall is undoubtedly a genius, but he is the sole developer of LuaJIT. The latest version 2.1 has not been released yet as Mike is presumably working on other things, as reported on his sponsorship page. So the destiny of LuaJIT is pretty much tied up with how much effort Mike puts into it.
  • LuaJIT was based on Lua 5.1, and for good reasons it has stayed compatible with 5.1, avoiding ABI-incompatible features in later versions. But this is increasingly going to be a problem as newer versions of Lua introduce new features.
  • LuaJIT's FFI is great but not compatible with Lua, so any code exploiting FFI is not compatible with Lua.
So my solution to the above is to enhance core Lua to support optional typing so that the VM can use type-specific bytecode. This will hopefully help interpreter performance, but more importantly it will enable simple JIT compilation of performance-critical functions.
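To illustrate what type-specific bytecode buys, here is a toy sketch in C. It is not Ravi's actual instruction set or value representation: the opcodes OP_ADD and OP_ADDII, the Value type and the helper functions are all assumptions made up for this example.

```c
#include <assert.h>

/* Toy tagged value: either an integer or a float. */
typedef enum { T_INT, T_FLT } Tag;
typedef struct {
    Tag tag;
    union { long long i; double n; } v;
} Value;

/* OP_ADD is the generic add; OP_ADDII is a hypothetical opcode emitted
   when type annotations guarantee that both operands are integers. */
typedef enum { OP_ADD, OP_ADDII } Op;

Value mk_int(long long i) { Value r; r.tag = T_INT; r.v.i = i; return r; }
Value mk_flt(double n)    { Value r; r.tag = T_FLT; r.v.n = n; return r; }

/* Generic add: must inspect both tags at run time. */
Value add_generic(Value a, Value b) {
    double x, y;
    if (a.tag == T_INT && b.tag == T_INT)
        return mk_int(a.v.i + b.v.i);
    x = (a.tag == T_INT) ? (double)a.v.i : a.v.n;
    y = (b.tag == T_INT) ? (double)b.v.i : b.v.n;
    return mk_flt(x + y);
}

/* Type-specific add: no tag checks, because the compiler already knows
   the operand types from the annotations. */
Value add_int_int(Value a, Value b) {
    return mk_int(a.v.i + b.v.i);
}

/* Executes a single binary instruction. */
Value execute(Op op, Value a, Value b) {
    switch (op) {
    case OP_ADD:   return add_generic(a, b);
    case OP_ADDII: return add_int_int(a, b);
    }
    assert(0 && "unknown opcode");
    return mk_int(0);
}
```

The specialised path is a single machine add with no branching, which is also what makes it straightforward to translate into native code in a simple JIT compiler.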

I am naming this new dialect of Lua Ravi. Full details of the project can be found at the Ravi GitHub site.

Monday, August 25, 2014

Lua is small and small is beautiful

Lua is a tiny language and like C has a tiny standard library. Like many other users of Lua, there are times when I wish it had some language feature (ability to specify types, for example) - but when I think about who Lua is meant for by its designers, I get the logic behind keeping it really small and simple.

Although Lua is powerful enough as a language that it can be used for many complex tasks - see DynASM for an assembler written in Lua - its primary design goal is to provide applications with an extension language. So for example you have a spreadsheet application, and you wish to allow users to write their own functions that they can use in the spreadsheet. Or you have an editor and you wish to allow users to customise it. And so on. In all these use cases, we cannot assume that the end user who is coding in Lua is a competent programmer. Hence Lua needs to be ultra-simple for such users. Having types in the language, for example, would immediately complicate the syntax of Lua.

That Lua fulfils the needs of its users is evident from the fact that a number of attempts have been made to create a Lua clone that is more powerful as a language - but none of these alternative improved Lua clones have any great following (Note that I exclude LuaJIT from this list as it is 100% faithful to Lua 5.1 so it is another implementation of Lua rather than a clone). I guess the question you have to ask is:

  • Are you trying to create an extension language for ordinary users who are not programmers?

If not then perhaps Lua is not the language you need. 

Tuesday, August 05, 2014

Lua - A fabulous programming language

Last year I discovered Lua and LuaJIT.

These are both amazing implementations of the programming language Lua.

Ok now I need to explain why I think Lua is fabulous and these implementations are amazing.

Lua is a very small, dynamic, scripting language that is extremely easy to learn, and that can be used standalone as well as an embedded scripting language. The well documented and well crafted C API for extending the language is probably one of the best features of the Lua system.

It takes less than a minute to compile and build the Lua language and its basic libraries. Since the language is written in ANSI C, you can build it on virtually any platform.

LuaJIT is a JIT implementation of Lua created by a guy called Mike Pall who is without a doubt a programming wizard. LuaJIT features an interpreter that is hand-crafted in assembler, and has an amazing FFI library that allows easy extensions in C, including creating new data types. LuaJIT comes with an assembler called DynASM that is itself written in Lua.

Neither of these implementations depends on third-party libraries or tools ... which is an amazing thing in today's world (just look at the dependency list of Julia for comparison).

I hope to use Lua extensively in the future, so much so that I decided to learn assembler in order to be able to understand LuaJIT better.

Friday, August 01, 2014

Should you ever embark on a complete rewrite?

I embarked on V2 of my project SimpleDBM about 3 years back. Finally last month I closed down the V2 branch and merged the useful stuff back into the main branch.

V2 was going to be a major refactoring of the system. That is what killed it - because any major refactoring is a large amount of effort. One of the best write-ups on why no one should ever do this is this article at Joel on Software.

That doesn't mean one should not refactor software - it is just that making small incremental changes that are immediately merged and tested with the mainline is the better way to do it.

Sunday, July 27, 2014

Life after Java

After working exclusively in Java for several years, I have been dabbling in C++ for the last year or so. The question arises - is C++ still a viable language? If the TIOBE Index is to be believed, C++ has been steadily declining in popularity since about 2005 - coincidentally this was the year I decided to move from C++ to Java for my project SimpleDBM. At the time I stated my reasons for the move in my second blog post.

So what has happened in the meantime and is C++ still a viable language?

At my day job, I introduced Java into the realm of financial risk analytics. I led the team that converted a C++ based application to Java - and in the process we proved that the Java implementation was several times faster. The reason for this had nothing to do with the choice of language - it was just that with Java you can focus on better algorithms and data structures, rather than fighting the language - which made all the difference in my view.

And yet it is in the realm of numerical computing where C++ is arguably the best language with the exception perhaps of Fortran (of which I have no experience sadly). The main advantages of C++ are:
  • Ability to seamlessly call C++, Fortran and C libraries - a lot of high performance numerical libraries out there are written in these languages.
  • Control of memory layout of data structures.
  • Efficient array access via pointers - and no bounds checking.
  • Templates for generating type specific code.

C++ is still an ugly language with too many features - but the recent changes in C++11 have made life tolerable, if not completely easy. I have been looking at alternatives such as D, Go, Julia, etc. but haven't found a viable one yet. These other languages are either immature or have very restrictive paradigms. JVM-based languages such as Scala have essentially the same issues as Java.