tag:blogger.com,1999:blog-205471172024-03-13T23:06:22.308+00:00A Programmer's BlogDibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.comBlogger116125tag:blogger.com,1999:blog-20547117.post-55985141414034921642022-06-05T22:35:00.000+01:002022-06-05T22:35:38.732+01:00How not to start with Rust<p>My attempt to translate an existing C project to Rust stalled after a few days as I struggled to express the concepts that I used so easily in C.</p>
<p>In the C code, I used a bump allocator, and I had data structures that ensured the stability of pointers. Thus I could just reference objects by their pointers everywhere, and at the end all objects were discarded by destroying the allocator.</p>
<p>In Rust, expressing this is hard, and actually impossible without the use of unsafe code. There is no way to convince the Rust compiler that a reference is safe if it's stored in multiple places: while I know that the memory pointed to by the reference is held by the allocator and is safe, the compiler cannot see this at compile time, and therefore will not allow you to code this way.</p>
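<p>One workaround often used in safe Rust is to keep every object in a single owning container and pass around indices instead of references. The sketch below only illustrates that idea; it is not a translation of the original C allocator, and the names <code>Arena</code>, <code>NodeId</code> and <code>Node</code> are made up for the example.</p>

```rust
/// A handle to a node: an index into the arena, freely copyable.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

#[derive(Debug)]
pub struct Node {
    pub name: String,
    /// Handles to other nodes; several nodes may refer to the same one.
    pub links: Vec<NodeId>,
}

/// Owns every node; dropping the arena frees them all at once.
#[derive(Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Stores a new node and returns a handle that stays valid
    /// for as long as the arena itself is alive.
    pub fn alloc(&mut self, name: &str) -> NodeId {
        self.nodes.push(Node {
            name: name.to_string(),
            links: Vec::new(),
        });
        NodeId(self.nodes.len() - 1)
    }

    pub fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }

    pub fn get_mut(&mut self, id: NodeId) -> &mut Node {
        &mut self.nodes[id.0]
    }
}
```

<p>Because a <code>NodeId</code> is just a number, it can be stored in as many places as needed without involving the borrow checker; the cost is an extra lookup through the arena on every access.</p>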
<p>I got bogged down trying to think of ways in which I could achieve what I had in C. This was a bit like trying to run before learning to walk.</p>
<p>Normally I prefer to learn a new programming language by just coding in it. That was one of the mistakes I made with Rust. Rust is one of those languages that you need to spend time learning by reading good books. Despite its popularity, the language specification in <a href="https://doc.rust-lang.org/reference/introduction.html">The Rust Reference</a> appears to be strangely deficient and unfinished; I struggled to find a good description of how lifetime annotations work. <a href="https://doc.rust-lang.org/book/">The Rust Programming Language</a> book is a good starting point, but doesn't really go into in-depth discussion of any topic. Examples tend to be simplistic too. For instance, in the chapter on collections, it wasn't deemed necessary to explain or illustrate how to use user-defined types as keys and values in a hash map. A much better book for learning beyond the basics is <a href="https://www.oreilly.com/library/view/programming-rust-2nd/9781492052586/">Programming Rust</a>.</p>
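<p>For what it is worth, here is a minimal sketch of the hash map case mentioned above. The types <code>SymbolKey</code> and <code>SymbolInfo</code> and the function <code>build_table</code> are invented for illustration; the only real requirement shown is that a key type used with <code>std::collections::HashMap</code> must implement <code>Eq</code> and <code>Hash</code>, which can usually be derived.</p>

```rust
use std::collections::HashMap;

/// A user-defined key type: it must implement `Eq` and `Hash`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SymbolKey {
    pub scope: u32,
    pub name: String,
}

/// A user-defined value type; values need no special traits.
#[derive(Debug, Clone, PartialEq)]
pub struct SymbolInfo {
    pub slot: usize,
    pub is_constant: bool,
}

/// Builds a small table with two entries that differ only in scope.
pub fn build_table() -> HashMap<SymbolKey, SymbolInfo> {
    let mut table = HashMap::new();
    table.insert(
        SymbolKey { scope: 0, name: "x".to_string() },
        SymbolInfo { slot: 0, is_constant: false },
    );
    table.insert(
        SymbolKey { scope: 1, name: "x".to_string() },
        SymbolInfo { slot: 3, is_constant: true },
    );
    table
}
```

<p>Lookups take a reference to a key, for example <code>table.get(&amp;key)</code>, and return an <code>Option</code> holding a reference to the value.</p>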
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-58574405608096385712020-12-06T12:36:00.000+00:002020-12-06T12:36:21.701+00:00C Enum equivalent in Rust<p>In C code we often use enums to represent constants like so:</p>
<pre>
enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    /* ... remaining tokens, including TOK_while, omitted here ... */
    TOK_STRING,
    FIRST_RESERVED = TOK_OFS + 1,
    LAST_RESERVED = TOK_while - TOK_OFS
};
</pre>
<p>Rust also has an enum type but it is not at all like the C enum. The Rust enum is more like a discriminated union in C.</p>
<p>Superficially though you can almost write something like above in Rust.</p>
<pre>
enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_STRING,
    FIRST_RESERVED = TOK_OFS + 1,
    LAST_RESERVED = TOK_while - TOK_OFS
}
</pre>
<p>But this will not compile. Firstly Rust expects explicit conversion from enum to int, so you can try:</p>
<pre>
enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_STRING,
    FIRST_RESERVED = TOK_OFS as isize + 1,
    LAST_RESERVED = TOK_while as isize - TOK_OFS as isize
}
</pre>
<p>However this will not work either because an enum in Rust is not really a constant. The enum discriminant value needs to be unique so we cannot have two instances <code>TOK_and</code> and <code>FIRST_RESERVED</code> with the same value.</p>
<p>Perhaps we could try this:</p>
<pre>
enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    // ... remaining tokens, including TOK_while, omitted here ...
    TOK_STRING,
}
const FIRST_RESERVED: isize = TokenType::TOK_OFS as isize + 1;
const LAST_RESERVED: isize = TokenType::TOK_while as isize - TokenType::TOK_OFS as isize;
</pre>
<p>This compiles, but the enums are a pain to use as constants because of the need to explicitly convert to integer values.</p>
<p>In the end I settled on plain constants:</p>
<pre>
const TOK_OFS: i32 = 256;
const TOK_and: i32 = 257;
const TOK_break: i32 = 258;
// ... remaining tokens, including TOK_while, omitted here ...
const TOK_STRING: i32 = 301;
const FIRST_RESERVED: i32 = TOK_OFS + 1;
const LAST_RESERVED: i32 = TOK_while - TOK_OFS;
</pre>
<p>Not great but it gives me what I need.</p>
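<p>A possible middle ground, sketched here under some assumptions, is to keep the enum for the tokens themselves and move the aliases into associated constants, so that no two variants share a discriminant. The position of <code>TOK_while</code> and the resulting values are made up for the example (the real token list is longer), and the <code>code()</code> helper is hypothetical.</p>

```rust
/// Token codes start above the single-byte range, as in the C enum.
#[allow(non_camel_case_types, dead_code)]
#[repr(i32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TokenType {
    TOK_OFS = 256,
    TOK_and,
    TOK_break,
    TOK_while, // hypothetical position; the real token list is longer
    TOK_STRING,
}

impl TokenType {
    /// Aliases are associated constants rather than variants,
    /// so they may repeat a value that a variant already uses.
    pub const FIRST_RESERVED: i32 = TokenType::TOK_OFS as i32 + 1;
    pub const LAST_RESERVED: i32 =
        TokenType::TOK_while as i32 - TokenType::TOK_OFS as i32;

    /// The single place where a token is converted to an integer.
    pub fn code(self) -> i32 {
        self as i32
    }
}
```

<p>With this layout the explicit <code>as</code> conversions are confined to the <code>impl</code> block, and callers compare <code>token.code()</code> against <code>TokenType::FIRST_RESERVED</code>.</p>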
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-8172404450433343172020-12-06T11:41:00.002+00:002020-12-06T12:36:36.324+00:00Beginning Rust<p>I am translating one of my projects to Rust as a way of learning Rust. To make things more interesting, I am trying to implement my own memory allocators and data structures. After all, in C or C++ that is something I can do easily, so it is worthwhile figuring out how much effort this will be in Rust.</p>
<p>Here is my first piece of code. Let's first look at the C version and then at my attempt to do this in Rust.</p>
<pre>
#include <stdlib.h>

struct lexer_state {
    const char *buf;
    size_t bufsize;
    size_t n;      /* bytes still unread */
    const char *p; /* current position in buf */
};
struct lexer_state *raviX_init_lexer(const char *buf, size_t buflen)
{
    struct lexer_state *ls = (struct lexer_state *)calloc(1, sizeof(struct lexer_state));
    ls->buf = buf;
    ls->bufsize = buflen;
    ls->n = ls->bufsize;
    ls->p = ls->buf;
    return ls;
}
enum { EOZ = -1 }; /* end of stream */
#define cast(t, exp) ((t)(exp))
#define cast_uchar(c) cast(unsigned char, c)
/* Test before decrementing: n is unsigned, so decrementing at 0 would wrap around. */
static inline int zgetc(struct lexer_state *z) { return z->n > 0 ? (z->n--, cast_uchar(*z->p++)) : EOZ; }
</pre>
<p>The goal here is to take a buffer as the input source and return one character at a time, or EOZ when the input is exhausted.</p>
<p>What you can observe above is that the buffer is supplied by the caller, and the C code assumes that the caller will ensure that the buffer is valid as long as <code>lexer_state</code> is active.</p>
<p>Here is my attempt to do this in Rust. Bear in mind that I am new to Rust and this is my first ever Rust code, therefore I may not be doing this the best possible way.</p>
<pre>
pub struct Source<'a> {
    len: usize,
    bytes: &'a [u8],
    n: usize,
}
pub const EOZ: i32 = -1;
impl<'a> Source<'a> {
    pub fn new(input: &'a str) -> Source<'a> {
        Source {
            len: input.len(),
            bytes: input.as_bytes(),
            n: 0,
        }
    }
    pub fn getc(&mut self) -> i32 {
        let ch = if self.n >= self.len {
            EOZ
        } else {
            self.bytes[self.n] as i32
        };
        self.n += 1;
        ch
    }
}
</pre>
<p>The Rust version takes a string as input, and every time <code>getc()</code> is called, it returns a byte from the input string, or EOZ if the input is exhausted.</p>
<p>The main difference in the Rust version is that the compiler tracks that the input is being referenced in the <code>Source</code> struct so that it can ensure that the input is valid as long as the <code>Source</code> struct is active.</p>
<p>The Rust code contains lifetime annotations such as <code>'a</code>, which are not something I could handle as a beginner, but fortunately the Visual Studio Code Rust plugin is helpful enough to let me know that the annotation is necessary and also insert it in the right place. I believe the goal of the lifetime annotation is to link the lifetime of the input string to the byte array reference inside the <code>Source</code> struct.</p>
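<p>As a variation on the same idea, the sketch below returns <code>Option&lt;u8&gt;</code> instead of an <code>EOZ</code> sentinel and implements the standard <code>Iterator</code> trait. The names <code>ByteSource</code> and <code>count_digits</code> are invented for this example; the lifetime <code>'a</code> plays the same role as before, tying the struct to the string it borrows from.</p>

```rust
/// Borrows the input for lifetime 'a; end of input is `None`
/// rather than a sentinel value.
pub struct ByteSource<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> ByteSource<'a> {
    pub fn new(input: &'a str) -> ByteSource<'a> {
        ByteSource {
            bytes: input.as_bytes(),
            pos: 0,
        }
    }
}

impl<'a> Iterator for ByteSource<'a> {
    type Item = u8;

    /// Returns the next byte, or `None` once the input is exhausted.
    fn next(&mut self) -> Option<u8> {
        let byte = self.bytes.get(self.pos).copied()?;
        self.pos += 1;
        Some(byte)
    }
}

/// Example consumer: counts ASCII digits using iterator adapters.
pub fn count_digits(input: &str) -> usize {
    ByteSource::new(input).filter(|b| b.is_ascii_digit()).count()
}
```

<p>Because a <code>ByteSource</code> holds a borrow of the input, the compiler rejects any attempt to drop or modify the string while the source is still in use, which is the guarantee the C version leaves to the caller.</p>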
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-62147291815384197422019-07-06T01:12:00.001+01:002019-07-06T01:12:50.621+01:00Better late than never: making Linux the main development platform<div dir="ltr" style="text-align: left;" trbidi="on">
Although I have been developing on Linux for many years, Windows has always been my primary development environment. Usually I develop first on Windows, and then test on Mac OSX and Linux. But recently as I started to get more into Docker, it has started to make more sense to develop on Linux first.<br />
<br />
I have chosen RHEL 7.x as my main development platform. This is an unusual choice, I guess. The availability of the developer edition made it possible. I like that Red Hat makes the latest versions of the programming languages available via their developer tool sets. On Ubuntu I was always using older versions.<br />
<br />
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-53861954699827578162019-03-29T22:08:00.001+00:002019-03-29T22:09:58.447+00:00Back from C# to Java<div dir="ltr" style="text-align: left;" trbidi="on">
I wanted to write a quick post about my move back from C# to Java.<br />
<br />
In 2016 I chose to use C# as the language for a system I was developing. The release of .Net core prompted this move. The key benefits I expected from C# were:<br />
<ul style="text-align: left;">
<li>Higher productivity</li>
<li>Ability to generate native executables - unfortunately, this did not materialize as it seems the AOT functionality in .Net Core was not a priority on server platforms</li>
<li>Easier integration with external C libraries</li>
<li>More memory efficient implementation due to support for primitive types in containers, Struct types etc.</li>
</ul>
<div>
Apart from the disappointment with AOT compilation, C# met my expectations. So then why move back to Java?</div>
<div>
<br /></div>
<div>
Well, primarily because of the Java eco-system. Of course Java has improved significantly in terms of developer productivity since Java 8. It even has variable declarations with type inference now. However, what sets Java apart is the huge eco-system for server-side development, largely because of the Apache and Spring projects. With .Net Core I struggled even with basic stuff such as application logging. Maybe this has changed now, but .Net Core 1.0 didn't have any standard way of logging, which meant I had to roll my own.<br />
<br />
One thing I am not convinced about is the proliferation of 'async/await' style programming in C#. It just seems wrong that your program will be converted to a state machine. I think if this has to be done, then the approach adopted by Go is better. Anyway, I am digressing.</div>
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-82809612341125879752016-12-22T14:42:00.000+00:002016-12-22T14:42:08.700+00:00SimpleDBM - a NoSQL Transactional DB in Java<div dir="ltr" style="text-align: left;" trbidi="on">
The goal of the <a href="http://simpledbm.org/">SimpleDBM</a> project was primarily to teach myself how DBMSes work. In that goal it succeeded, I think, and it was also great fun researching the computer science literature on database technology and applying the techniques invented by the great pioneers in this area.<br />
<br />
I highlighted some of the sources I used in the implementation of SimpleDBM in <a href="http://trycatchfinally.blogspot.co.uk/2009/03/dwarfs-standing-on-shoulders-of-giants.html">this blog post</a>.<br />
<br />
Unfortunately, due to lack of time, I have not been able to devote much attention to SimpleDBM in the past few years. New features are not being implemented and the project is in maintenance mode; that is, I will fix reported bugs.<br />
<br />
It is not possible to know if anyone is using SimpleDBM or not. I have not used it in anger in a Production environment so in that sense it is not Production software. However, I think its main value today is pedagogical in that it can be used to understand and learn the traditional techniques used to implement database engines. The implementation is <a href="https://simpledbm.readthedocs.io/en/latest/index.html">much better documented</a> than any other opensource DBMS I have come across. This is partly because I once thought of writing a book on how to implement a DBMS. <br />
<br />
The implementation handles some of the hard problems, such as transactions, write ahead logging, concurrent and recoverable BTREE operations, deadlock detection, etc. <br />
<br />
The project is now <a href="https://github.com/dibyendumajumdar/simpledbm">hosted on GitHub</a>. For anyone just wanting to use it Maven packages are available.</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-50006041552924055712016-12-07T23:31:00.000+00:002016-12-07T23:31:42.498+00:00What's great about Lua?<div dir="ltr" style="text-align: left;" trbidi="on">
<a href="http://www.lua.org/">Lua</a> is an amazing programming language implementation. I say implementation because it is not just the language itself but how it is implemented that is particularly impressive.<br />
<br />
As a programming language, Lua can be characterised as a small but powerful language. The power comes from clever use of a few core <i>meta-mechanisms</i>, as the Lua authors like to put it. A nice introduction to some of these is in the recent talk by Roberto Ierusalimschy at <a href="https://www.youtube.com/watch?v=QystqRlz6bw">the Lua Workshop 2016</a>.<br />
<br />
I used to think that Lua is a simple language, but appearances are deceptive. I now think of Lua as a 'small' and 'powerful' language rather than a 'simple' language.<br />
<br />
The language design is clever, but the implementation is what makes it great.<br />
<br />
Firstly it is a very compact implementation, just a few C source files, and that's it. No dependencies other than an ANSI C compiler.<br />
<br />
Secondly, despite the compact implementation, it features:<br />
<br />
<ul style="text-align: left;">
<li>A byte-code compiler and Virtual Machine.</li>
<li>An incremental garbage collector.</li>
<li>Extremely fast parser and code generator.</li>
<li>And the language is <a href="http://queue.acm.org/detail.cfm?id=1983083">delivered as a library</a> with an extremely well designed C API, that makes it easy to embed Lua as well as extend it.</li>
</ul>
<div>
It is this combination of economical design and beautiful implementation that makes Lua great.</div>
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-36120295142054275562016-12-07T23:02:00.000+00:002016-12-07T23:38:00.716+00:00Lua 5.3 Bytecode Reference<div dir="ltr" style="text-align: left;" trbidi="on">
<a href="http://www.lua.org/">Lua</a> bytecodes are not officially documented as they are considered to be an implementation detail. The best attempt to document Lua bytecodes is the <a href="http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf">A No-Frills Introduction to Lua 5.1 VM Instructions</a> by Kein-Hong Man. However this document is quite old now and does not reflect the changes made since Lua 5.2.<br />
<br />
Some time ago I started an attempt to bring this document up-to-date for Lua 5.3. I recently managed to spend a few hours updating the <a href="https://github.com/dibyendumajumdar/ravi/blob/master/readthedocs/lua_bytecode_reference.rst">new Lua 5.3 bytecode reference</a>. This is still not complete but the most important bytecodes are covered.<br />
<br />
I want to eventually produce a version that is like a specification, i.e., one that allows independent implementations to replicate the bytecode generation. My interest in this is due to my desire to create a new parser and code generator for Lua. </div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-20111024843395640472016-08-25T23:11:00.001+01:002016-08-25T23:14:51.564+01:00Unique problems of a start-up<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
Well recently I took the plunge and started my <a href="http://www.redukti.com/">own tech start-up</a> after years of thinking about it. So I have been battling the usual things that many start-ups face I guess. </div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Of great help are inspirational talks such as these. </div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
First is the talk by creator of Ruby on Rails - David Heinemeier Hansson at Startup School 08 where he talks about how to create a successful start-up.</div>
<div style="text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/0CDXJ6bMkMY/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/0CDXJ6bMkMY?feature=player_embedded" width="320"></iframe></div>
<div>
<br /></div>
<div>
Next is this talk by Walter Bright where he talks about how he started the programming language D, and he describes the way he works.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/0os8giTbQkk/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/0os8giTbQkk?feature=player_embedded" width="320"></iframe></div>
<div>
<br /></div>
<div>
My journey has only just started. I would love to discuss some of the issues that start-ups face and how I am trying to solve the ones I face.</div>
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-71340144854718081362016-08-19T21:52:00.001+01:002016-08-25T23:15:44.850+01:00I love the new Microsoft!<div dir="ltr" style="text-align: left;" trbidi="on">
I have always been a Java person ... until now. With the new cross platform open source CLR (.Net) platform from Microsoft, I am doing exciting new work in C#!<br />
<br />
I also love the <a href="https://msdn.microsoft.com/en-gb/commandline/wsl/faq">Linux subsystem on Windows 10</a> - which I installed today. I was able to build my Lua derived scripting language <a href="http://ravilang.org/">Ravi</a> using Linux tools right within Windows. I did not enable LLVM but that is next on my list to try. If this really works, then I can decommission my Linux Virtual Machine.<br />
<br />
I also love that Microsoft has made the Visual Studio Community edition free - and this is the full version and not a cut-down version. And the new Visual Studio Code editor is great - check out my <a href="https://marketplace.visualstudio.com/items?itemName=ravilang.ravi-debug">Lua/Ravi 5.3 debugger extension</a> for it!<br />
<br />
All in all - great stuff, Microsoft! </div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-4285251572048873232016-07-29T01:24:00.003+01:002016-07-29T01:27:28.607+01:00Never been as good as now for creating software<div dir="ltr" style="text-align: left;" trbidi="on">
When I started building software 25 years ago, one had to pay for everything, even a basic C compiler was not free. There was no internet, no Linux, no OpenSource.<br />
<br />
The computing world has really moved on.<br />
<br />
Now when one starts a project there is a whole range of OpenSource building blocks one can choose from. And really cool stuff too. In no particular order here are some of the projects that have excited me in recent times:<br />
<br />
<ul style="text-align: left;">
<li><a href="https://github.com/dotnet/coreclr">CoreCLR</a></li>
<li><a href="http://www.grpc.io/">grpc</a></li>
<li><a href="https://developers.google.com/protocol-buffers/">Protocol Buffers</a></li>
<li><a href="https://zookeeper.apache.org/">Apache Zookeeper</a></li>
<li><a href="http://rocksdb.org/">RocksDB</a></li>
<li><a href="http://www.lua.org/">Lua</a></li>
<li><a href="http://luajit.org/">LuaJIT</a></li>
<li><a href="https://code.visualstudio.com/">Visual Studio Code</a></li>
<li><a href="http://libuv.org/">libuv</a></li>
<li><a href="http://nodejs.org/">node.js</a></li>
<li><a href="http://www.llvm.org/">LLVM</a></li>
</ul>
<div>
One could go on. The wealth of knowledge that is now accessible to all is tremendous, and it is all there for anyone who is interested. </div>
<div>
<br /></div>
<div>
I think this is our time - we who create software. The world is being changed forever as software will drive everything everywhere, and we, the software creators, are at the core of this revolution.</div>
<div>
<br /></div>
<div>
<br /></div>
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-63545173091593440162016-07-27T00:51:00.000+01:002016-07-27T00:51:23.953+01:00From Java to C#<div dir="ltr" style="text-align: left;" trbidi="on">
I am using C# for the first time in a major new project. This has been made possible thanks to Microsoft's open sourcing C# and making it <a href="https://www.microsoft.com/net">cross platform</a>. I am using C# both on the client desktop as well as the server backend.<br />
<br />
My initial impression is that as a language C# is much more productive than Java!<br />
<br />
If only Microsoft <a href="https://blogs.msdn.microsoft.com/dotnet/2015/02/03/coreclr-is-now-open-source/">had done this earlier</a> - the programming landscape might have been quite different today. Java still has an advantage of a larger eco-system in the opensource world, especially when it comes to writing server applications. </div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-31583933747274990282015-10-31T12:03:00.000+00:002015-10-31T12:07:44.639+00:00Lua Workshop 2015<div dir="ltr" style="text-align: left;" trbidi="on">
I presented <a href="http://ravilang.org/">Ravi</a> at the <a href="http://www.lua.org/wshop15.html">Lua Workshop 2015</a> held in Stockholm. You can see the slides of my presentation <a href="http://www.lua.org/wshop15/Majumdar.pdf">here</a>.</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-82094625085263179572015-08-22T21:09:00.002+01:002015-10-31T11:48:14.090+00:00An LLVM binding for Lua<div dir="ltr" style="text-align: left;" trbidi="on">
I announced the development of <a href="http://trycatchfinally.blogspot.co.uk/2015/01/ravi-attempt-to-create-optional-typing.html">Ravi</a> back in January 2015. It started as an experiment - I was not certain whether Ravi could achieve the performance levels of <a href="https://github.com/LuaJIT/LuaJIT">LuaJIT</a>, the reigning monarch for <a href="http://www.lua.org/">Lua</a> JIT compilation.<br />
<br />
I am pleased to report that Ravi is able to match LuaJIT's performance for a few selected benchmarks. More details are available <a href="http://the-ravi-programming-language.readthedocs.org/en/latest/ravi-benchmarks.html">here</a>. While this is positive news, there is still much to do to make Ravi competitive in a variety of situations.<br />
<br />
LuaJIT offers a powerful <a href="http://luajit.org/ext_ffi.html">FFI interface</a> for interfacing with external libraries and the like. This is very convenient for sure, but the approach taken is not compatible with Lua. After some thought I decided that rather than creating an FFI interface for Ravi, a more general capability would be to allow both Lua and Ravi users to write JIT code using <a href="http://www.llvm.org/">LLVM</a>. Work on this has just started so there is not much to show yet, but I hope to make progress fairly quickly.<br />
<br />
LLVM is a very low level api - lower level than C. This has its pluses and minuses. On the plus side the LLVM binding will allow Lua and Ravi users to exploit the full power of LLVM. On the minus side even writing trivial functions can be quite some effort.<br />
<br />
<br /></div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com4tag:blogger.com,1999:blog-20547117.post-70433889552193978112015-07-18T15:10:00.000+01:002015-10-31T12:03:30.709+00:00SimpleDBM migrated to GitHub<div dir="ltr" style="text-align: left;" trbidi="on">
As Google decided to shut down the project hosting service I have been forced to find a new home for SimpleDBM. So SimpleDBM is now hosted at the following github site:<br />
<div>
<br /></div>
<div>
<a href="https://github.com/dibyendumajumdar/simpledbm">https://github.com/dibyendumajumdar/simpledbm</a></div>
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-33490495127503699782015-04-03T13:19:00.001+01:002015-10-31T11:49:04.696+00:00Memory bugs and finding them<div dir="ltr" style="text-align: left;" trbidi="on">
I am working on creating a JIT compiler for <a href="http://www.lua.org/">Lua</a> and <a href="https://github.com/dibyendumajumdar/ravi">Ravi</a>. Ravi is a Lua derived language with some enhancements to help improve performance. The JIT compilation is being implemented using <a href="http://www.llvm.org/">LLVM</a>.<br />
<br />
As I implement more parts of the Lua language I am able to run more of the standard Lua test suites in JIT mode. A few days ago I encountered a nasty bug - one of the test cases failed on Windows with a run-time exception that seemed to imply an invalid or misaligned stack. Now this is a particularly hard bug to find as at runtime there is a mix of code compiled by the MSVC compiler (the Lua C code) and the code generated by LLVM. The LLVM generated code has no debugging information right now as I haven't yet implemented the required metadata. So the problem is how to investigate the issue if the fault is in LLVM generated code.<br />
<br />
Confusingly the error only occurred on Windows - but not on Linux, so I was initially led to believe that the error may be due to some issue with how LLVM was using the stack on Windows.<br />
<br />
The problem with memory errors is that they are <a href="http://en.wikipedia.org/wiki/Heisenbug">Heisenbug</a>s. Any change you make to investigate the issue, such as adding a debug print, may help the bug to hide. So investigation is particularly hard.<br />
<br />
My initial attempt at finding the root cause was to run the tests under <a href="http://valgrind.org/">Valgrind</a>. Valgrind reported some possible memory leaks but not in my code - the reported potential leaks were in LLVM code. There were no reports of buffer overflow or memory overwrites.<br />
<br />
I guess that if the problem is some portion of the stack being overwritten then Valgrind cannot find it as it is more a tool for analyzing issues with heap allocation.<br />
<br />
My next stop was to use <a href="https://code.google.com/p/address-sanitizer/">Address Sanitizer</a>. This is a tool created by Google engineers - it works by instrumenting the code using some compiler extensions. Fortunately this capability is available in GCC 4.8.2 which is the compiler I am using on Ubuntu. Just by adding a compiler option (-fsanitize=address) one can get instrumented C code. The tool not only checks heap allocations but also the stack for invalid reads/writes.<br />
<br />
When I ran the test suite with the instrumented version asan reported an issue and aborted the program. Fortunately, you can just run the application under GDB, and <a href="https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAndDebugger">break at the point</a> where asan reports the error. And if the code has appropriate debug information, then GDB will tell you the line of code that caused the error.<br />
<br />
To cut a long story short - I found that the memory error was being caused because I had not modified the Lua Debug API to work with JITed code. I had incorrectly assumed that the Debug API is only used if invoked explicitly. When an error is raised in Lua, the error routine uses the Debug API to obtain information about the code running at the time. This relies upon the 'savedpc' field, which is not populated in JITed code - so any calls to the Debug API that rely on this field can lead to unexpected memory access. The fix I implemented was to treat JITed functions as if they are C functions.<br />
<br />
The reason for this post is however that I found Address Sanitizer to be an amazing tool. It is a life saver for C/C++ programmers. </div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-40993805341743898802015-01-25T11:52:00.000+00:002015-10-31T11:49:20.240+00:00Ravi - an attempt to create optional typing for Lua<div dir="ltr" style="text-align: left;" trbidi="on">
I am in love with Lua as blogged previously. While it is a perfect little language, there is scope for improving the performance of Lua. Obviously great work in this area has already been done by Mike Pall who created <a href="http://luajit.org/">Luajit</a>. However, there are some issues with Luajit that are hard to overcome.<br />
<ul style="text-align: left;">
<li>Large parts of Luajit are written in assembler - which means that it would take significant investment of time and effort to understand how it works and fix issues or make enhancements to it.</li>
<li>Mike Pall is undoubtedly a genius, but he is the sole developer of Luajit. The latest version 2.1 has not been released yet as Mike is presumably working on other things as reported on his sponsorship page. So the destiny of Luajit is pretty much tied up with how much effort Mike puts into Luajit.</li>
<li>Luajit was based on Lua 5.1, and for good reasons it has stayed compatible with 5.1, avoiding ABI incompatible features in later versions. But this is increasingly going to be a problem as newer versions of Lua introduce new features.</li>
<li>Luajit's FFI is great but not compatible with Lua, so any code exploiting FFI is not compatible with Lua. </li>
</ul>
So my solution to above is to enhance core Lua to support optional typing so that the VM can use type specific bytecode. This will hopefully help the interpreter performance but more importantly it will enable simple JIT compilation of performance critical functions.<br />
<br />
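As a rough illustration of what "type specific bytecode" can mean, here is a small sketch written in Rust rather than in Ravi's C implementation. The opcode names <code>Add</code> and <code>AddInt</code>, the <code>Value</code> type and the <code>execute</code> function are all hypothetical and are not Ravi's actual instructions; the point is only that the generic instruction must look at operand types at run time, while the typed one can assume them.<br />

```rust
/// A dynamically typed value, roughly as a Lua-like VM might see it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Float(f64),
}

/// Hypothetical opcodes; these are not Ravi's actual instruction names.
#[derive(Clone, Copy, Debug)]
pub enum Op {
    /// Generic add: must inspect operand types at run time.
    Add,
    /// Typed add: emitted only when both operands are known integers.
    AddInt,
}

pub fn execute(op: Op, a: Value, b: Value) -> Value {
    match op {
        // The generic path dispatches on every type combination.
        Op::Add => match (a, b) {
            (Value::Int(x), Value::Int(y)) => Value::Int(x.wrapping_add(y)),
            (Value::Int(x), Value::Float(y)) => Value::Float(x as f64 + y),
            (Value::Float(x), Value::Int(y)) => Value::Float(x + y as f64),
            (Value::Float(x), Value::Float(y)) => Value::Float(x + y),
        },
        // The typed path has a single case; a real VM could skip
        // the tag check entirely because the compiler proved the types.
        Op::AddInt => match (a, b) {
            (Value::Int(x), Value::Int(y)) => Value::Int(x.wrapping_add(y)),
            _ => unreachable!("AddInt emitted for non-integer operands"),
        },
    }
}
```

A JIT compiler benefits in the same way: with the operand types fixed by the opcode, it can emit a plain machine add instead of a type dispatch.<br />
<br />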
I am naming this new dialect of Lua as Ravi. Full details of the project can be found at the <a href="https://github.com/dibyendumajumdar/ravi">Ravi github site</a>. </div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-38106816497655098172014-08-25T16:46:00.000+01:002015-10-31T11:49:34.578+00:00Lua is small and small is beautiful<div dir="ltr" style="text-align: left;" trbidi="on">
Lua is a tiny language and like C has a tiny standard library. Like many other users of Lua, there are times when I wish it had some language feature (ability to specify types, for example) - but when I think about who Lua is meant for by its designers, I get the logic behind keeping it really small and simple.<br />
<br />
Although Lua is powerful enough as a language that it can be used for many complex tasks - see <a href="http://luajit.org/dynasm.html">DynASM</a> for an assembler written in Lua - its primary design goal is to provide applications with an extension language. For example, you have a spreadsheet application, and you wish to allow users to write their own functions that they can use in the spreadsheet. Or you have an editor and you wish to allow users to customise it. And so on. In all these use cases, we cannot assume that the end user who is coding in Lua is a competent programmer. Hence Lua needs to be ultra-simple for such users. Having types in the language, for example, would immediately complicate the syntax of Lua.<br />
<br />
That Lua fulfils the needs of its users is evident from the fact that a <a href="http://stackoverflow.com/questions/2526300/list-of-lua-derived-vms-and-languages">number of attempts</a> have been made to create a Lua clone that is more powerful as a language - but none of these alternative improved Lua clones have any great following (Note that I exclude LuaJIT from this list as it is 100% faithful to Lua 5.1 so it is another implementation of Lua rather than a clone). I guess the question you have to ask is:<br />
<br />
<ul style="text-align: left;">
<li>Are you trying to create an extension language for ordinary users who are not programmers?</li>
</ul>
<br />
If not then perhaps Lua is not the language you need. </div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-17167843888723257162014-08-05T23:35:00.000+01:002015-10-31T11:49:55.648+00:00Lua - A fabulous programming language<div dir="ltr" style="text-align: left;" trbidi="on">
Last year I discovered <a href="http://www.lua.org/">Lua</a> and <a href="http://luajit.org/luajit.html">LuaJIT</a>.<br />
<br />
These are both amazing implementations of the programming language Lua.<br />
<br />
Ok now I need to explain why I think Lua is fabulous and these implementations are amazing.<br />
<br />
Lua is a <a href="http://www.lua.org/manual/5.2/manual.html#9">very small</a>, dynamic, scripting language that is extremely easy to learn, and that can be used standalone as well as an embedded scripting language. The well documented and well crafted C API for extending the language is probably one of the best features of the Lua system.<br />
<br />
It takes less than a minute to compile and build the Lua language and its basic libraries. Since the language is <a href="http://www.lua.org/source/5.2/">written in ANSI C</a>, you can build it on virtually any platform.<br />
<br />
LuaJIT is a JIT implementation of Lua created by a guy called Mike Pall who is without a doubt a programming wizard. LuaJIT features an interpreter that is hand-crafted in assembler, and has an amazing FFI library that allows easy extensions in C, including creating new data types. LuaJIT comes with an assembler called <a href="http://corsix.github.io/dynasm-doc/index.html">DynASM</a> that is itself written in Lua.<br />
<br />
Neither of these implementations depends on third-party libraries or tools ... which is an amazing thing in today's world (just look at the <a href="https://github.com/JuliaLang/julia#Required-Build-Tools-External-Libraries">dependency list</a> of <a href="http://julialang.org/">Julia</a> for comparison).<br />
<br />
I hope to use Lua extensively in the future, so much so that I decided to learn assembler in order to be able to understand LuaJIT better.<br />
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-22010728599850352622014-08-01T08:38:00.002+01:002015-10-31T12:03:47.911+00:00Should you ever embark on a complete rewrite?<div dir="ltr" style="text-align: left;" trbidi="on">
I embarked on V2 of my project <a href="https://code.google.com/p/simpledbm/">SimpleDBM</a> about 3 years back. Finally last month I closed down the V2 branch and merged the useful stuff back into the main branch.<br />
<br />
V2 was going to be a major refactoring of the system. That is what killed it - because any major refactoring is a large amount of effort. One of the best write-ups on why no one should ever do this is <a href="http://www.joelonsoftware.com/articles/fog0000000069.html">this article</a> at <a href="http://www.joelonsoftware.com/">Joel on Software</a>.<br />
<br />
That doesn't mean one should not refactor software - it is just that small incremental changes that are immediately merged and tested with the mainline are the better way to do it.</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-53985009156673632742014-08-01T08:30:00.000+01:002015-10-31T12:04:05.432+00:00SimpleDBM is now licensed under Apache License 2.0<div dir="ltr" style="text-align: left;" trbidi="on">
I am pleased to announce that as of v1.0.23 the license is changing to Apache License 2.0. The new release is available in Maven Central.<br />
<br />
See my earlier <a href="http://simpledbm.blogspot.co.uk/2010/04/licensing-revisited.html">post</a> for the rationale for this change.</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-69103343156431473152014-07-27T00:12:00.000+01:002015-10-31T12:04:20.241+00:00Life after Java<div dir="ltr" style="text-align: left;" trbidi="on">
After working exclusively in Java for several years, I have been dabbling in C++ for the last year or so. The question arises - is C++ still a viable language? If the <a href="http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html">TIOBE Index</a> is to be believed, C++ has been steadily declining in popularity since about 2005 - coincidentally this was the year I decided to move from C++ to Java for my project <a href="https://code.google.com/p/simpledbm/">SimpleDBM</a>. At the time I stated my reasons for the move in <a href="http://simpledbm.blogspot.co.uk/2005/08/why-java.html">my second blog post</a>.<br />
<br />
So what has happened in the meantime and is C++ still a viable language?<br />
<br />
<div style="text-align: left;">
At my day job, I introduced Java into the realm of <a href="http://www.waterstechnology.com/buy-side-technology/analysis/2316513/buy-side-technology-awards-2013-best-buy-side-risk-portfolio-analytics-product-lchclearnet">financial risk analytics</a>. I led the team that converted a C++ based application to Java - and in the process we proved that the Java implementation was several times faster. The reason for this had nothing to do with the choice of language as such - it was just that with Java you can focus on better algorithms and data structures rather than fighting the language, which made all the difference in my view.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
And yet it is in the realm of numerical computing where C++ is arguably the best language with the exception perhaps of Fortran (of which I have no experience sadly). The main advantages of C++ are:</div>
<div style="text-align: left;">
</div>
<ul style="text-align: left;">
<li>Ability to seamlessly call C++, Fortran and C libraries - a lot of high performance numerical libraries out there are written in these languages.</li>
<li>Control of memory layout of data structures.</li>
<li>Efficient array access via pointers - and no bounds checking.</li>
<li>Templates for generating type specific code.</li>
</ul>
<br />
<div style="text-align: left;">
C++ is still an ugly language with too many features - but the recent changes in <a href="http://www.stroustrup.com/C++11FAQ.html">C++ 11</a> have made life tolerable if not completely easy. I have been looking at alternatives such as <a href="http://dlang.org/">D</a>, <a href="http://golang.org/">Go</a>, <a href="http://julialang.org/">Julia</a>, etc. but haven't found a viable alternative yet. These other languages are either immature or have very restrictive paradigms. JVM based languages such as <a href="http://www.scala-lang.org/">Scala</a> have the same issues essentially as Java.</div>
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-30430872308383325422013-05-22T23:33:00.000+01:002015-10-31T12:04:32.058+00:00SimpleDBM now available in Maven Central<div dir="ltr" style="text-align: left;" trbidi="on">
I am pleased to announce that I have finally managed to get SimpleDBM (v 1.0.22) uploaded to <a href="http://search.maven.org/#browse%7C1051502365">Maven Central</a>. This should make it much easier for people to use SimpleDBM in their projects.<br />
<br />
I am working on fixing the build systems of the sample projects - the source repository has fixes for btreedemo and tupledemo samples.</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-50377623792112219432012-07-09T22:45:00.000+01:002015-10-31T12:04:41.769+00:00Performance optimisations in typesystem<div dir="ltr" style="text-align: left;" trbidi="on">
The new typesystem in v2 will support only the following types:<br />
<ul style="text-align: left;">
<li>bool </li>
<li>int </li>
<li>long</li>
<li>double</li>
<li>utf8 string</li>
<li>varbyte (raw)</li>
</ul>
<div>
The older version had types like BigInteger, BigDecimal, Date, VarChar.</div>
<div>
The new typesystem attempts to be closer to native types, both for performance reasons and so that ports to C-like languages are easier.</div>
<div>
<br /></div>
<div>
Another major change is the way the row data is serialised. The older typesystem did not store the metadata associated with each column, therefore a separate data dictionary was required to deserialize the data. The new typesystem encodes the types in the serialised format, therefore a row can be reconstructed from the serialised data without reference to a data dictionary. </div>
<div>
<br /></div>
<div>
What is unchanged is the status byte per column. Previously this only stored the value type in the column, i.e., Null, PlusInfinity, MinusInfinity or Value. In the new version I am hoping to expand the use of the status byte to encode 3 things (the middle 3 bits are interpreted differently depending on the data type):</div>
<div>
<ul style="text-align: left;">
<li><span style="background-color: white;">value type - only Null or Value, taking 1 bit</span></li>
<li>If int or long or double, then the number of bytes used to store the value - encoded in 3 bits</li>
<li>If bool then the bool value encoded in 1 bit (overlaid on the 3 bits above, 2 bits unused)</li>
<li>If utf-8 string or varbyte then 1 bit to encode if the data is zero length or not (overlaid on the 3 bits above, 2 bits unused)</li>
<li>The remaining 4 bits to encode the type of the data.</li>
</ul>
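<div>
To make the bit budget concrete (1 + 3 + 4 = 8 bits), here is a minimal Java sketch of how such a status byte could be packed and unpacked. The bit positions, the type codes, the class name <code>StatusByte</code> and the choice to store the byte count as count minus one (so that 8 bytes fits in 3 bits) are all my assumptions for illustration; the actual layout is not decided here.</div>

```java
// Illustrative sketch only. Assumed layout (not fixed by the post):
//   bit 7     : 1 = Null, 0 = Value
//   bits 6..4 : auxiliary field (byte count - 1, bool value, or zero-length flag)
//   bits 3..0 : type code
final class StatusByte {
    // Hypothetical type codes; any values in 0..15 would do.
    static final int TYPE_BOOL = 0, TYPE_INT = 1, TYPE_LONG = 2,
                     TYPE_DOUBLE = 3, TYPE_UTF8 = 4, TYPE_VARBYTE = 5;

    private static final int NULL_BIT = 0x80;
    private static final int AUX_SHIFT = 4;
    private static final int AUX_MASK = 0x07;
    private static final int TYPE_MASK = 0x0F;

    static byte encode(boolean isNull, int aux, int type) {
        if (aux < 0 || aux > AUX_MASK) throw new IllegalArgumentException("aux out of range: " + aux);
        if (type < 0 || type > TYPE_MASK) throw new IllegalArgumentException("type out of range: " + type);
        return (byte) ((isNull ? NULL_BIT : 0) | (aux << AUX_SHIFT) | type);
    }

    // int, long or double: a byte count of 1..8 is stored as 0..7 (assumption).
    static byte forNumber(int type, int byteCount) {
        if (byteCount < 1 || byteCount > 8) throw new IllegalArgumentException("byteCount: " + byteCount);
        return encode(false, byteCount - 1, type);
    }

    // bool: the value itself lives in the lowest auxiliary bit.
    static byte forBool(boolean value) { return encode(false, value ? 1 : 0, TYPE_BOOL); }

    // utf-8 string or varbyte: the lowest auxiliary bit flags zero-length data.
    static byte forBytes(int type, boolean zeroLength) { return encode(false, zeroLength ? 1 : 0, type); }

    static byte forNull(int type) { return encode(true, 0, type); }

    static boolean isNull(byte status) { return (status & NULL_BIT) != 0; }
    static int aux(byte status) { return (status >> AUX_SHIFT) & AUX_MASK; }
    static int type(byte status) { return status & TYPE_MASK; }
    static int byteCount(byte status) { return aux(status) + 1; }
    static boolean boolValue(byte status) { return (aux(status) & 1) != 0; }
    static boolean isZeroLength(byte status) { return (aux(status) & 1) != 0; }
}
```

<div>
Because the type code travels with every column, a reader can decide how to interpret the auxiliary bits from the status byte alone, without consulting a data dictionary.</div>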
<div>
One of the performance killers in the old version is the complete deserialization of data whenever a row is read into memory. This is a killer as the overhead of parsing certain types such as Strings, BigInteger, or BigDecimal is huge. The new version will try to avoid parsing the data whenever possible.</div>
</div>
<div>
<br /></div>
<div>
We all know these days that immutable objects are good for multi-threaded applications as they allow us to share data without synchronisation. The old typesystem relies heavily on immutable objects, which is one reason for parsing the row immediately upon deserialization. The problem with lazy parsing is that state must be maintained, and fields initialised upon first access - this makes the row itself mutable, and hence thread unsafe. The solution I am adopting for this is to create separate types. The Row type is immutable, but cannot directly access the bytestream of serialised data. A new type called RowReader is designed to mirror the access methods of Row, but do this over the bytestream. This type is not thread-safe - the caller must ensure that the type is not shared between threads in an unprotected manner. We shall also have a RowBuilder for constructing Row objects incrementally; the RowBuilder is also not thread-safe, and access to it must not be shared across threads in an unprotected manner.</div>
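<div>
The split between the three types can be sketched roughly as follows. This is only an illustration of the idea, not the SimpleDBM code: the method names, the string-only columns, and the serialised format (a 4-byte big-endian length followed by UTF-8 bytes per column) are assumptions made to keep the example short.</div>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Immutable: safe to share between threads without synchronisation.
final class Row {
    private final Object[] values;
    Row(Object[] values) { this.values = values.clone(); } // defensive copy
    int columnCount() { return values.length; }
    Object get(int column) { return values[column]; }
}

// Mutable and NOT thread-safe: builds a Row incrementally.
final class RowBuilder {
    private final List<Object> values = new ArrayList<>();
    RowBuilder add(Object value) { values.add(value); return this; }
    Row build() { return new Row(values.toArray()); }
}

// Reads columns lazily from serialised bytes. It caches column offsets on
// first access, and that hidden mutable state is why it is NOT thread-safe.
final class RowReader {
    private final byte[] bytes;
    private final ByteBuffer data;
    private final int[] offsets;   // offsets[i] is valid only for i < located
    private int located = 0;       // number of columns located so far
    private int nextOffset = 0;    // where the next unlocated column starts

    RowReader(byte[] bytes, int columnCount) {
        this.bytes = bytes;
        this.data = ByteBuffer.wrap(bytes);
        this.offsets = new int[columnCount];
    }

    String getString(int column) {
        if (column < 0 || column >= offsets.length) throw new IndexOutOfBoundsException("column " + column);
        locate(column);
        int offset = offsets[column];
        int length = data.getInt(offset);
        return new String(bytes, offset + 4, length, StandardCharsets.UTF_8);
    }

    // Walks forward only as far as needed; columns are never parsed twice.
    private void locate(int column) {
        while (located <= column) {
            offsets[located] = nextOffset;
            nextOffset += 4 + data.getInt(nextOffset);
            located++;
        }
    }

    // Materialises an immutable Row once the caller wants to share the data.
    Row toRow() {
        RowBuilder builder = new RowBuilder();
        for (int i = 0; i < offsets.length; i++) builder.add(getString(i));
        return builder.build();
    }

    // Helper for the example: writes the assumed length-prefixed format.
    static byte[] serialize(String... columns) {
        byte[][] encoded = new byte[columns.length][];
        int size = 0;
        for (int i = 0; i < columns.length; i++) {
            encoded[i] = columns[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + encoded[i].length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        for (byte[] column : encoded) buffer.putInt(column.length).put(column);
        return buffer.array();
    }
}
```

<div>
A thread would typically use a RowReader privately to look at only the columns it needs, and call something like <code>toRow()</code> only when the data has to be handed to other threads.</div>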
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0tag:blogger.com,1999:blog-20547117.post-40193701045035145312011-01-30T12:44:00.000+00:002015-10-31T12:04:53.525+00:00Next version of SimpleDBM<div dir="ltr" style="text-align: left;" trbidi="on">
After a break, I have resumed work on creating SimpleDBM V2. This version is mostly a refactoring of the existing codebase to ensure:<br />
<ol style="text-align: left;">
<li>Better project structure - break up the project into smaller modules.</li>
<li>Simple IOC container for auto wiring the modules.</li>
<li>TypeSystem now integral part of the core, hence some modules can take advantage of the Row structure; in the current version, the TypeSystem is an add-on component.</li>
<li>Multi-licensed - SimpleDBM V2 will be available under Apache as well as GPL licenses.</li>
<li>Magic number and version info in the SimpleDBM files.</li>
</ol>
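<div>
As an illustration of the last item, a file header carrying a magic number and a version could look something like the sketch below. The magic value, the 8-byte layout and the class name <code>FileHeader</code> are made up for this example; the real format has not been decided.</div>

```java
import java.nio.ByteBuffer;

// Illustrative sketch: an 8-byte header of magic number + format version.
final class FileHeader {
    static final int MAGIC = 0x53444246; // hypothetical value, not the real one
    static final int SIZE = 8;           // two big-endian 4-byte ints

    final int version;

    FileHeader(int version) { this.version = version; }

    byte[] toBytes() {
        return ByteBuffer.allocate(SIZE).putInt(MAGIC).putInt(version).array();
    }

    // Rejects data that does not start with the expected magic number.
    static FileHeader parse(byte[] bytes) {
        if (bytes.length < SIZE) throw new IllegalArgumentException("header too short");
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        if (buffer.getInt() != MAGIC) throw new IllegalArgumentException("bad magic number");
        return new FileHeader(buffer.getInt());
    }
}
```

<div>
Checking the magic number first lets the code fail fast on a file that is not a SimpleDBM file, and the version field leaves room for changing the on-disk format later.</div>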
<div>
Rather than making enhancements at the same time as refactoring the codebase, I am going to keep changes to a small number, as I want to get the project restructure completed by Q1 2011. </div>
<div>
<br /></div>
<div>
Moving the type system into the core is the biggest change in SimpleDBM. I think this will allow some modules to exploit the knowledge about rows and column types, and optimise for performance. Some things that may be possible are:</div>
<div>
<ul style="text-align: left;">
<li>Compression of data in pages</li>
<li>Smaller redo log records for data updates</li>
</ul>
<div>
Note that the SimpleDBM V2 code is in a separate Mercurial repository. Changes can be viewed at <a href="http://code.google.com/p/simpledbm/source/list?repo=v2">http://code.google.com/p/simpledbm/source/list?repo=v2</a>.</div>
</div>
</div>
Dibyendu Majumdarhttp://www.blogger.com/profile/08417788730731238290noreply@blogger.com0