summaryrefslogtreecommitdiff
path: root/parser/README.md
diff options
context:
space:
mode:
authorMarc Vertes <marc.vertes@tendermint.com>2023-10-12 10:51:58 +0200
committerGitHub <noreply@github.com>2023-10-12 10:51:58 +0200
commit37b9da32d3b911091deb254f6cba2a137c471287 (patch)
treeb4451de0fa0473a937a77d39fd1f8a4f87c8f60d /parser/README.md
parenta21b9b12ad865a19ff687645082f9093c4101039 (diff)
move to a direct byte code compiler (#8)
* chore: refactor to keep only the new parser and bytecode vm * scanner: remove Token.value field * scanner: remove scanner.kind field * chore: move language specification in lang package This avoid a cyclic dependency in scanner_test which can now use the golang/GoSpec language specification for Go. * clean code * scanner: export scanner fields Also parser now generate function calls, including externals. * chore: fix lint issues * parser: handle strings * wip * parser: implement support for 'if, else, else if' statements Resolving labels in the compiler still in progress. * parser: support if statements, improve compiler * improve handling of functions * improve support of local variables * scanner: trim leading and trailing spaces * fixes to make fibonacci work * parser: improve README, fix function parameters parsing
Diffstat (limited to 'parser/README.md')
-rw-r--r--parser/README.md157
1 files changed, 99 insertions, 58 deletions
diff --git a/parser/README.md b/parser/README.md
index b8bc0d4..d46608c 100644
--- a/parser/README.md
+++ b/parser/README.md
@@ -1,71 +1,112 @@
# Parser
-A parser takes an array of tokens (produced by the scanner) in input and
-returns a node representing a syntax tree. A node is an object
-containing a kind, the corresponding token and the ordered references to
-descendent nodes.
+See if we can build an almost single pass compiler for Go doing syntax
+directed translation, without any complex data structure (no syntax
+tree), only lists of tokens.
-```mermaid
-graph LR
-s[ ] --> |source| a(scanner)
---> |tokens| b(parser)
---> |AST| c[ ]
-subgraph parser
- b
-end
-style s height:0px;
-style c height:0px;
-```
+The goal is to have the shortest and simplest path from source to
+bytecode.
-A goal is to make the parser generic enough so it can generate syntax
-trees for most of existing programming languages (no claim of generality
-yet), provided a small set of generating rules per language, and a small
-set of validating rules (yet to be defined) to detect invalid
-constructs.
+## Design
-The input tokens are particular in the sense that they include classical
-lexical items such as words, separators, numbers, but also strings and
-nested blocks, which are resolved at scanning stage rather than parsing
-stage. See the scanner for more details.
+The input of parser is a list of tokens produced by the scanner.
+Multiple tokens are processed at once. The minimal set to get
+meaningful results (not an error or nil) is a complete statemement.
-The language specification includes the following:
+The output of parser is also a list of tokens, to be consummed by
+the compiler to produce bytecode. The output tokens set is identical
+to the bytecode instructions set except that:
-- a scanner specification, to produce the set of possible tokens.
-- a map of node specification per token name. The node specification
- defines some parameters influing how the tree is generated.
+- code locations may be provided as as labels instead of numerical
+ values,
+- memory locations for constants and variables may be provided as
+ symbol names instead of numerical values.
-## Development status
+## Status
-A successful test must be provided to check the status.
+Go language support:
-- [x] binary operator expressions
-- [x] unary operator (prefix) expressions
-- [ ] unary operator (suffix) expressions
-- [x] operator precedence rules
-- [x] parenthesis in expressions
-- [x] call expressions
-- [ ] nested calls
-- [x] index expressions
-- [x] single assignments
-- [ ] multi assignments
-- [x] simple `if` statement (no `else`)
-- [ ] full `if` statement (including `else`, `else if`)
-- [x] init expressions in `if` statements
-- [x] statement blocks
-- [x] comments
+- [x] named functions
+- [ ] anonymous functions (closures)
+- [ ] methods
+- [x] internal function calls
+- [x] external function calls (calling runtime symbols in interpreter)
+- [ ] export to runtime
+- [ ] builtin calls (new, make, copy, delete, len, cap, ...)
+- [ ] out of order declarations
+- [ ] arbirtrary precision constants
+- [x] basic types
+- [ ] complete numeric types
+- [x] function types
+- [ ] variadic functions
+- [ ] pointers
+- [ ] structures
+- [ ] embedded structures
+- [ ] recursive structures
+- [ ] interfaces
+- [ ] arrays, slices
+- [ ] maps
+- [ ] deterministic maps
+- [ ] channel types
+- [ ] channel operations
+- [x] var defined by assign :=
+- [x] var assign =
+- [ ] var declaration
+- [ ] type declaration
+- [x] func declaration
+- [ ] const declaration
+- [ ] iota expression
+- [ ] defer statement
+- [ ] recover statement
+- [ ] go statement
+- [x] if statement (including else and else if)
- [ ] for statement
- [ ] switch statement
+- [ ] break statement
+- [ ] continue statement
+- [ ] fallthrough statement
+- [ ] goto statement
- [ ] select statement
-- [x] return statement
-- [x] function declaration
-- [ ] method declaration
-- [ ] anonymous functions (closures)
-- [ ] type declaration
-- [ ] var, const, type single declaration
-- [ ] var, const, type multi declaration
-- [ ] type parametric expressions
-- [x] literal numbers (see scanner)
-- [x] literal strings
-- [ ] composite literal
-- [ ] import statements
-- [ ] go.mod syntax
+- [x] binary operators
+- [ ] unary operators
+- [ ] logical operators && and ||
+- [ ] assign operators
+- [ ] operator precedence
+- [x] parenthesis expressions
+- [x] call expressions
+- [ ] index expressions
+- [ ] selector expressions
+- [ ] type conversions
+- [ ] type assertions
+- [ ] parametric types (generic)
+- [ ] type parametric functions (generic)
+- [ ] type constraints (generic)
+- [ ] type checking
+- [ ] comment pragmas
+- [ ] packages import
+- [ ] modules
+
+Other items:
+
+- [x] REPL
+- [x] multiline statements in REPL
+- [ ] completion, history in REPL
+- [x] eval strings
+- [x] eval files (including stdin, ...)
+- [x] debug traces for scanner, parser, compiler, bytecode vm
+- [x] simple interpreter tests to exercise from source to execution
+- [ ] compile time error detection and diagnosis
+- [ ] stack dump
+- [ ] symbol tables, data tables, code binded to source lines
+- [ ] interactive debugger: breaks, continue, instrospection, ...
+- [ ] machine level debugger
+- [ ] source level debugger
+- [ ] replay debugger, backward instruction execution
+- [ ] vm monitor: live memory / code display during run
+- [ ] stdlib wrappers a la yaegi
+- [ ] system and environment sandboxing
+- [ ] build constraints (arch, sys, etc)
+- [ ] test command (running go test / benchmark / example files)
+- [ ] skipping / including test files
+- [ ] test coverage
+- [ ] fuzzy tests for scanner, vm, ...