| Age | Commit message (Collapse) | Author |
|
The parsing logic for type declarations is there. Note that no
tokens are produced, only symbols. The different type kinds will
be added next.
|
|
|
|
The full Go syntax is supported, blocks or line,
mutiple comma separated variables, assignments.
In local and global frame.
|
|
|
|
A VM instruction `EqualSet` has been added to preserve the left
operand on the stack in case of failure, to allow efficient
multiple tests on the same value.
Both the pattern 'if/else if' and the classical case clauses have
been implemented.
|
|
|
|
|
|
Logical operators `&&` (and), `||` (or) are now parsed in expressions.
The control flow tokens (labels, conditional jumps) are added
accordingly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
* chore: refactor to keep only the new parser and bytecode vm
* scanner: remove Token.value field
* scanner: remove scanner.kind field
* chore: move language specification in lang package
This avoid a cyclic dependency in scanner_test which can now use
the golang/GoSpec language specification for Go.
* clean code
* scanner: export scanner fields
Also parser now generate function calls, including externals.
* chore: fix lint issues
* parser: handle strings
* wip
* parser: implement support for 'if, else, else if' statements
Resolving labels in the compiler still in progress.
* parser: support if statements, improve compiler
* improve handling of functions
* improve support of local variables
* scanner: trim leading and trailing spaces
* fixes to make fibonacci work
* parser: improve README, fix function parameters parsing
|
|
|
|
Also isolate repl in a separate function.
Introduce `run` function for executing programs, letting room for
other verbs to be done later: test, debug, ...
Use file path as argument. Standard input redirect is no more mandatory.
|
|
As specified in the Go specification, adapted to the following:
- the scanner recognise blocks as tokens
- the scanner is multi-language: define keywords in scanner spec
- as a result, we define how to skip semi-colon insertion rather
than how to add it.
|
|
The scanner returns a slice of pointers to tokens instead of a
slice of tokens. The parser now pass the initial node context.
|
|
Also simplify project structure. The executable is now produced in
the root directory. Work in progress.
|
|
|
|
Refctor node kind names by concatenating category and instance, to
allow better sorting. Comments are now parsed and skipped during
generation of AST.
|
|
So multiple successive incremental Evals function correctly.
Also improve the following:
- Apply the same Eval API to vm0 and vm1
- parser: dot diagram display is now synchronous
- codegen: outsource complex code generation for readability
- vm1: Pop take the number of values to pop as operand
|
|
|
|
|
|
|
|
This makes the code easier to use.
|
|
|
|
|
|
fix diagram
|
|
The "Enter" instruction has been removed and the frame pointer is
now saved by the "Call" instruction.
The "Return" instruction now takes the number of function input
parameters as the second operand. It's used to return the output
values at the correct place in the caller frame, no matter the
number of input parameters.
The tests and the code generator have been updated accordingly.
|
|
* scanner: handle long string delimiters
Strings now can be delimited by arbitrary sequences of characters.
It is also possible to exclude the end delimiter, as for example
required for `//` C or Go comments.
* scanner: fix handling strings within blocks
|
|
* codegen: add a bytecode generator
* cleaning scanner, parser and vm1.
|
|
* vm1: add file pos for debug and a few immediate instructions
`op[0]` is now reserved for storing the file position computed at
code compiling. No impact on performances.
Added `Subi` and `Infi` which use integer immediate argument instead
of stack. Experimental. Improves `fib()` speed by ~ 20%.
* vm1: add benchmark as proposed by @ajnavarro
|
|
* parser: define all node kinds to make the parser multi-language
Defining all AST node kinds in the parser is necessary to make the
parser really multi-language. If a language requires a node kind not
already present in parser/kind.go, it will be necessary to add it first
here.
Note that as long as a node kind subtree is structurally identical
between languages, even if there are lexical and/or syntaxic
differences, it can (and must) be shared amongst multiple language
definitions.
For example, an "if" statememt in shell script or in C code should give
the same `IfStmt` at AST level.
In order to let the parser deal with the various language syntaxes,
and produce the right node kind and subtree, parser flags will be set
in language definitions (see `Flags` field in `NodeSpec` struct).
* lang/golang: use parser node kinds
* vm0: remode dependency on language definition.
|
|
The conversion to numerical values is done by the scanner so it's
only done once. This will simplify and accelerate vm0 and the code
generator.
|
|
`vm1` is bytecode based stack machine.
The purpose of `vm1` is to provide a simple, fast, embeddable and portable Go execution environment.
The bytecode consists of a dozen of instructions, each taking 0 or 1 immediate argument (non-immediate arguments are only passed through the stack). Only a few operators for a few types are implemented. I expect to have 1 instruction per operator per numerical type, all with the same pattern, which would be generated from a template. Estimate is around 20 operators and 10 numerical types, so around 200 instructions in final.
Structurally, the vm implements logical and arithmetic operators, condional jumps for `if`, `for` and `switch` control flow, and function call, return and frame management.
the memory state of the vm is a slice of Go interfaces (`[]any`).
The whole vm is coded in a single function of 80 lines with no dependencies. The size will grow as we add missing instructions, but the code complexity will remain the same.
the vm1 package is totally standalone and could be used for any purpose outside of parscan and/or gno.
## Performances
The benchmark is performed on a fibonacci program as below:
```go=
func fib(n int) int {
if n < 2 {
return n
}
return fib(n-2) + fib(n-1)
}
```
Note: I generated the bytecode by hand, waiting for the code generator package (next week).
I measured the execution time of `fib(35)` on my Apple M1 with:
- a static executable generated by the Go compiler (native Go)
- a byte code interpreted by vm1
- a byte code interpreted by vm1 where memory is `[]int` instead of `[]any`
- the source interpreted by vm0
- using gno and yaegi interpreters
| | go native | vm1 | vm1-int | yaegi | vm0 | gno |
| ------- | --------- | ---- | ------- | ----- | --- | --- |
| fib(35) | 0.1s | 1.4s | 1.0s | 9.2s | 18s | 23s |
Notes:
- both gno and vm0 are an AST based stack machine
- yaegi is a register machine with no bytecode but closures
- vm1 is a bytecode stack machine
- vm1-int is identical to vm1 except that memory is `[]int` instead of `[]any` (just for benchmark)
- go-native is statically compiled native machine code
### Additional notes on vm1
Only one instruction allows to execute code outside of the VM: `CallX`. It can be easily restricted.
The performance of vm1 is achieved by simplicity and locality. All the execution of the virtual machine takes place inside a single `for` loop and a single `switch` on the opcode. The code of each instruction is trivial. There is no function called, except for optional `CallX`, and invoking the Go slice builtins (`append`, `len`), which are inlined by the Go compiler.
The memory state of vm1, as in vm0, is defined as a `[]any` (slice of interfaces), allowing arbitrary objects, and insuring type consistency (each item of the `mem` slice is a tuple `{typePtr, DataPtr}`). To measure the overhead of maintaining type consistency in the runtime, I defined the memory as `[]int` instead of `[]any`, `fib(35)` executed in 1s instead of 1.4s.
In terms of memory management, vm1, as vm0, benefits from the way slices are implemented in Go, minimizing the impact of the garbage collector upon growing and shrinking the stack, as nested and/or recursive function calls require. In this scheme, the bottom of the stack is naturally devoted to storage of global data.
### Breaking performance limits ?
Vm1 is approximately 10 to 15 times slower than a Go compiled binary, but is simple, fully dynamic, portable, and self-contained. Given the simplicity of the code, I think it is close to the max of what could be obtained by remaining portable and safe on top of the Go runtime.
To break this performance barrier, and get even closer to the performance of a Go native binary, a possible approach is to *translate* the bytecode into target processor instructions (x86, arm64, ...), then execute them directly, as done for example by some WASM jit interpreters. This would induce either a dependence on librairies inside or outside Go (the fastest existing being outside), or, add dependencies both on each supported CPU architecture (to encode instructions) and each supported OS (to execute a memory-mapped segment).
See also the PR about LLVM jit as proposed in [gno#774](https://github.com/gnolang/gno/pull/774).
## Next steps
A bytecode generator is still necessary to generate the input bytecode from the AST. I would like to propose that as the next step. hint: it will involve a single tree-walk of the AST.
|
|
|
|
|
|
|