| field | value | date |
|---|---|---|
| author | Marc Vertes <marc.vertes@tendermint.com> | 2023-08-09 11:47:39 +0200 |
| committer | GitHub <noreply@github.com> | 2023-08-09 11:47:39 +0200 |
| commit | 947873b34aabe46dfb9f8d06214736cb11b5a6b2 | |
| tree | 9fc4728cf39017ee0275d62a7578881cbb3073bb /scanner/readme.md | |
| parent | 355750be61fbf4b90d132a9560e01113f22f4c38 | |
codegen: add a bytecode generator (#5)
* codegen: add a bytecode generator
* cleaning scanner, parser and vm1.
Diffstat (limited to 'scanner/readme.md')
| -rw-r--r-- | scanner/readme.md | 42 |
1 files changed, 42 insertions, 0 deletions
diff --git a/scanner/readme.md b/scanner/readme.md
new file mode 100644
index 0000000..b8b31fb
--- /dev/null
+++ b/scanner/readme.md
@@ -0,0 +1,42 @@
+# Scanner
+
+A scanner takes a string in input and returns an array of tokens.
+
+Tokens can be of the following kinds:
+- identifier
+- number
+- operator
+- separator
+- string
+- block
+
+Resolving nested blocks in the scanner is making the parser simple
+and generic, without having to resort to parse tables.
+
+The lexical rules are provided by a language specification at language
+level which includes the following:
+
+- a set of composable properties (1 per bit, on an integer) for each
+  character in the ASCII range (where all separator, operators and
+  reserved keywords must be defined).
+- for each block or string, the specification of starting and ending
+  delimiter.
+
+## Development status
+
+A successful test must be provided to check the status.
+
+- [x] numbers starting with a digit
+- [ ] numbers starting otherwise
+- [x] unescaped strings (including multiline)
+- [x] escaped string (including multiline)
+- [x] separators (in UTF-8 range)
+- [ ] single line string (\n not allowed)
+- [x] identifiers (in UTF-8 range)
+- [x] operators, concatenated or not
+- [x] single character block/string delimiters
+- [x] arbitrarly nested blocks and strings
+- [ ] multiple characters block/string delimiters
+- [ ] blocks delimited by identifiers/operators/separators
+- [ ] blocks with delimiter inclusion/exclusion rules
+- [ ] blocks delimited by indentation level
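The readme added by this patch describes a scanner driven by a language specification: one property bit per character class, plus a starting and ending delimiter for each block or string, with nested blocks resolved in the scanner itself. The Go sketch below illustrates that idea only. It is not the code from this repository, and every name in it (`Prop`, `Token`, `Spec`, `NewSpec`, `Scan`, `closing`) is hypothetical. It also makes a few assumptions of its own: the property bits double as token kinds, whitespace is skipped, a backslash escapes the next character inside a string, and only single-character delimiters are handled.

```go
package main

import "fmt"

// Prop is a character property, one per bit so that properties compose with |.
// In this sketch (an assumption) the same bits also name the token kinds.
type Prop uint

const (
	Identifier Prop = 1 << iota // letters and '_'
	Number                      // digits
	Operator
	Separator
	String // string starting delimiter
	Block  // block starting delimiter
	Space  // whitespace, skipped (an assumption of this sketch)
)

// Token is a scanned token; blocks and strings keep their delimiters.
type Token struct {
	Kind Prop
	Text string
}

// Spec is a hypothetical language specification.
type Spec struct {
	Props [256]Prop     // indexed by byte; only the ASCII range is set
	End   map[byte]byte // ending delimiter of each starting delimiter
}

// NewSpec builds a small example specification.
func NewSpec() *Spec {
	s := &Spec{End: map[byte]byte{'(': ')', '{': '}', '"': '"'}}
	for chars, p := range map[string]Prop{
		" \t\n": Space, ",;": Separator, "+-*/=<>!&|": Operator, "({": Block, `"`: String,
		"0123456789": Number, "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ": Identifier,
	} {
		for i := 0; i < len(chars); i++ {
			s.Props[chars[i]] |= p
		}
	}
	return s
}

// closing returns the index of the delimiter matching src[i], or -1.
func (s *Spec) closing(src string, i int) int {
	end, isStr := s.End[src[i]], s.Props[src[i]]&String != 0
	for j := i + 1; j < len(src); j++ {
		switch c := src[j]; {
		case c == end:
			return j
		case isStr && c == '\\':
			j++ // skip the escaped character
		case !isStr && s.Props[c]&(String|Block) != 0:
			if j = s.closing(src, j); j < 0 { // nested block or string
				return -1
			}
		}
	}
	return -1
}

// Scan splits src into a flat slice of tokens.
func (s *Spec) Scan(src string) ([]Token, error) {
	var toks []Token
	for i := 0; i < len(src); {
		p, j, kind, cont := s.Props[src[i]], i+1, Separator, Prop(0)
		switch {
		case p&Space != 0:
			i++
			continue
		case p&(String|Block) != 0:
			kind, j = p&(String|Block), s.closing(src, i)+1
		case p&Identifier != 0:
			kind, cont = Identifier, Identifier|Number
		case p&Number != 0:
			kind, cont = Number, Number
		case p&Operator != 0:
			kind, cont = Operator, Operator
		case p&Separator == 0:
			return nil, fmt.Errorf("unexpected character %q at offset %d", src[i], i)
		}
		if j == 0 {
			return nil, fmt.Errorf("unterminated delimiter at offset %d", i)
		}
		for j < len(src) && s.Props[src[j]]&cont != 0 {
			j++
		}
		toks = append(toks, Token{kind, src[i:j]})
		i = j
	}
	return toks, nil
}

func main() {
	fmt.Println(NewSpec().Scan(`f(a, "x)" + g{1}) >= 42;`))
}
```

In this sketch a whole block, including any blocks or strings nested inside it, comes back as a single `Block` token. A parser can then call `Scan` again on the block's inner text, which is one way to read the readme's claim that resolving nesting in the scanner keeps the parser simple and free of parse tables.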
