parscan - experimental interpreter

Age	Commit message (Collapse)	Author
2023-11-13	parser: initial support for type declarations.	Marc Vertes
	The parsing logic for type declarations is there. Note that no tokens are produced, only symbols. The different type kinds will be added next.
2023-11-10	vm: add Grow instruction to increase stack	Marc Vertes

2023-11-10	parser: implement support for var declarations	Marc Vertes
	The full Go syntax is supported, blocks or line, mutiple comma separated variables, assignments. In local and global frame.
2023-11-08	parser: fix break in switch statements	Marc Vertes

2023-11-07	parser: implement switch statement	Marc Vertes
	A VM instruction `EqualSet` has been added to preserve the left operand on the stack in case of failure, to allow efficient multiple tests on the same value. Both the pattern 'if/else if' and the classical case clauses have been implemented.
2023-11-04	fix: skip parsing empty expression	Marc Vertes

2023-11-04	parser: add tests for logical operatators	Marc Vertes

2023-11-03	feat: add support for control flow operators in expressions	Marc Vertes
	Logical operators `&&` (and), `\|\|` (or) are now parsed in expressions. The control flow tokens (labels, conditional jumps) are added accordingly.
2023-10-21	parser: implement operator precedence rules in expressions	Marc Vertes

2023-10-14	parser: implement label, goto and continue statements	Marc Vertes

2023-10-13	parser: do not allocate VM data for labels	Marc Vertes

2023-10-13	parser: include absolute paths in symbols	Marc Vertes

2023-10-13	parser: refactor tests	Marc Vertes

2023-10-13	parser: implement 'break' statement	Marc Vertes

2023-10-12	parser: implement 'for' statement	Marc Vertes

2023-10-12	move to a direct byte code compiler (#8)	Marc Vertes
	* chore: refactor to keep only the new parser and bytecode vm * scanner: remove Token.value field * scanner: remove scanner.kind field * chore: move language specification in lang package This avoid a cyclic dependency in scanner_test which can now use the golang/GoSpec language specification for Go. * clean code * scanner: export scanner fields Also parser now generate function calls, including externals. * chore: fix lint issues * parser: handle strings * wip * parser: implement support for 'if, else, else if' statements Resolving labels in the compiler still in progress. * parser: support if statements, improve compiler * improve handling of functions * improve support of local variables * scanner: trim leading and trailing spaces * fixes to make fibonacci work * parser: improve README, fix function parameters parsing
2023-09-08	scanner: automatic semi-colon insertion at EOF	Marc Vertes

2023-09-08	cli: use flags in command line	Marc Vertes
	Also isolate repl in a separate function. Introduce `run` function for executing programs, letting room for other verbs to be done later: test, debug, ... Use file path as argument. Standard input redirect is no more mandatory.
2023-09-06	scanner: add automatic insertion of semi-colon after newline	Marc Vertes
	As specified in the Go specification, adapted to the following: - the scanner recognise blocks as tokens - the scanner is multi-language: define keywords in scanner spec - as a result, we define how to skip semi-colon insertion rather than how to add it.
2023-09-06	chore: refactor some APIs	Marc Vertes
	The scanner returns a slice of pointers to tokens instead of a slice of tokens. The parser now pass the initial node context.
2023-09-04	codegen: add interpreter tests	Marc Vertes
	Also simplify project structure. The executable is now produced in the root directory. Work in progress.
2023-09-04	scan: fix scanning blocks with inner strings	Marc Vertes

2023-09-01	parser: skip comment modes	Marc Vertes
	Refctor node kind names by concatenating category and instance, to allow better sorting. Comments are now parsed and skipped during generation of AST.
2023-08-31	codegen: fix interpreter re-entrance	Marc Vertes
	So multiple successive incremental Evals function correctly. Also improve the following: - Apply the same Eval API to vm0 and vm1 - parser: dot diagram display is now synchronous - codegen: outsource complex code generation for readability - vm1: Pop take the number of values to pop as operand
2023-08-29	gint: add an interactive REPL	Marc Vertes

2023-08-25	chore: appease lint and vet	Marc Vertes

2023-08-25	gint: simplify	Marc Vertes

2023-08-24	codegen: add Interpreter struct	Marc Vertes
	This makes the code easier to use.
2023-08-24	doc: add diagrams to scanner and parser readmes	Marc Vertes

2023-08-24	fix: parser must be initialized before use	Marc Vertes

2023-08-24	Update README.md	Marc Vertes
	fix diagram
2023-08-24	vm1: improve function calling (#6)	Marc Vertes
	The "Enter" instruction has been removed and the frame pointer is now saved by the "Call" instruction. The "Return" instruction now takes the number of function input parameters as the second operand. It's used to return the output values at the correct place in the caller frame, no matter the number of input parameters. The tests and the code generator have been updated accordingly.
2023-08-24	scanner: handle long string delimiters (#7)	Marc Vertes
	* scanner: handle long string delimiters Strings now can be delimited by arbitrary sequences of characters. It is also possible to exclude the end delimiter, as for example required for `//` C or Go comments. * scanner: fix handling strings within blocks
2023-08-09	codegen: add a bytecode generator (#5)	Marc Vertes
	* codegen: add a bytecode generator * cleaning scanner, parser and vm1.
2023-07-24	vm1: add file pos for debug and a few immediate instructions (#4)	Marc Vertes
	* vm1: add file pos for debug and a few immediate instructions `op[0]` is now reserved for storing the file position computed at code compiling. No impact on performances. Added `Subi` and `Infi` which use integer immediate argument instead of stack. Experimental. Improves `fib()` speed by ~ 20%. * vm1: add benchmark as proposed by @ajnavarro
2023-07-24	parser: define all node kinds to make the parser multi-language (#3)	Marc Vertes
	* parser: define all node kinds to make the parser multi-language Defining all AST node kinds in the parser is necessary to make the parser really multi-language. If a language requires a node kind not already present in parser/kind.go, it will be necessary to add it first here. Note that as long as a node kind subtree is structurally identical between languages, even if there are lexical and/or syntaxic differences, it can (and must) be shared amongst multiple language definitions. For example, an "if" statememt in shell script or in C code should give the same `IfStmt` at AST level. In order to let the parser deal with the various language syntaxes, and produce the right node kind and subtree, parser flags will be set in language definitions (see `Flags` field in `NodeSpec` struct). * lang/golang: use parser node kinds * vm0: remode dependency on language definition.
2023-07-24	scanner: compute numerical values (#2)	Marc Vertes
	The conversion to numerical values is done by the scanner so it's only done once. This will simplify and accelerate vm0 and the code generator.
2023-07-20	vm1: introduce a fast and simple bytecode stack machine (#1)	Marc Vertes
	`vm1` is bytecode based stack machine. The purpose of `vm1` is to provide a simple, fast, embeddable and portable Go execution environment. The bytecode consists of a dozen of instructions, each taking 0 or 1 immediate argument (non-immediate arguments are only passed through the stack). Only a few operators for a few types are implemented. I expect to have 1 instruction per operator per numerical type, all with the same pattern, which would be generated from a template. Estimate is around 20 operators and 10 numerical types, so around 200 instructions in final. Structurally, the vm implements logical and arithmetic operators, condional jumps for `if`, `for` and `switch` control flow, and function call, return and frame management. the memory state of the vm is a slice of Go interfaces (`[]any`). The whole vm is coded in a single function of 80 lines with no dependencies. The size will grow as we add missing instructions, but the code complexity will remain the same. the vm1 package is totally standalone and could be used for any purpose outside of parscan and/or gno. ## Performances The benchmark is performed on a fibonacci program as below: ```go= func fib(n int) int { if n < 2 { return n } return fib(n-2) + fib(n-1) } ``` Note: I generated the bytecode by hand, waiting for the code generator package (next week). I measured the execution time of `fib(35)` on my Apple M1 with: - a static executable generated by the Go compiler (native Go) - a byte code interpreted by vm1 - a byte code interpreted by vm1 where memory is `[]int` instead of `[]any` - the source interpreted by vm0 - using gno and yaegi interpreters \| \| go native \| vm1 \| vm1-int \| yaegi \| vm0 \| gno \| \| ------- \| --------- \| ---- \| ------- \| ----- \| --- \| --- \| \| fib(35) \| 0.1s \| 1.4s \| 1.0s \| 9.2s \| 18s \| 23s \| Notes: - both gno and vm0 are an AST based stack machine - yaegi is a register machine with no bytecode but closures - vm1 is a bytecode stack machine - vm1-int is identical to vm1 except that memory is `[]int` instead of `[]any` (just for benchmark) - go-native is statically compiled native machine code ### Additional notes on vm1 Only one instruction allows to execute code outside of the VM: `CallX`. It can be easily restricted. The performance of vm1 is achieved by simplicity and locality. All the execution of the virtual machine takes place inside a single `for` loop and a single `switch` on the opcode. The code of each instruction is trivial. There is no function called, except for optional `CallX`, and invoking the Go slice builtins (`append`, `len`), which are inlined by the Go compiler. The memory state of vm1, as in vm0, is defined as a `[]any` (slice of interfaces), allowing arbitrary objects, and insuring type consistency (each item of the `mem` slice is a tuple `{typePtr, DataPtr}`). To measure the overhead of maintaining type consistency in the runtime, I defined the memory as `[]int` instead of `[]any`, `fib(35)` executed in 1s instead of 1.4s. In terms of memory management, vm1, as vm0, benefits from the way slices are implemented in Go, minimizing the impact of the garbage collector upon growing and shrinking the stack, as nested and/or recursive function calls require. In this scheme, the bottom of the stack is naturally devoted to storage of global data. ### Breaking performance limits ? Vm1 is approximately 10 to 15 times slower than a Go compiled binary, but is simple, fully dynamic, portable, and self-contained. Given the simplicity of the code, I think it is close to the max of what could be obtained by remaining portable and safe on top of the Go runtime. To break this performance barrier, and get even closer to the performance of a Go native binary, a possible approach is to translate the bytecode into target processor instructions (x86, arm64, ...), then execute them directly, as done for example by some WASM jit interpreters. This would induce either a dependence on librairies inside or outside Go (the fastest existing being outside), or, add dependencies both on each supported CPU architecture (to encode instructions) and each supported OS (to execute a memory-mapped segment). See also the PR about LLVM jit as proposed in [gno#774](https://github.com/gnolang/gno/pull/774). ## Next steps A bytecode generator is still necessary to generate the input bytecode from the AST. I would like to propose that as the next step. hint: it will involve a single tree-walk of the AST.
2023-07-10	readme: how to display AST using graphviz	Marc Vertes

2023-07-10	readme: add usage	Marc Vertes

2023-07-10	first commit	Marc Vertes