author Marc Vertes <marc.vertes@tendermint.com> 2023-08-09 11:47:39 +0200
committer GitHub <noreply@github.com> 2023-08-09 11:47:39 +0200
commit 947873b34aabe46dfb9f8d06214736cb11b5a6b2
tree 9fc4728cf39017ee0275d62a7578881cbb3073bb /scanner/readme.md
parent 355750be61fbf4b90d132a9560e01113f22f4c38
codegen: add a bytecode generator (#5)
* codegen: add a bytecode generator
* cleaning scanner, parser and vm1.
Diffstat (limited to 'scanner/readme.md')
-rw-r--r-- scanner/readme.md | 42
1 files changed, 42 insertions, 0 deletions
diff --git a/scanner/readme.md b/scanner/readme.md
new file mode 100644
index 0000000..b8b31fb
--- /dev/null
+++ b/scanner/readme.md
@@ -0,0 +1,42 @@
+# Scanner
+
+A scanner takes a string as input and returns an array of tokens.
+
+Tokens can be of the following kinds:
+- identifier
+- number
+- operator
+- separator
+- string
+- block
+
+Resolving nested blocks in the scanner keeps the parser simple
+and generic, without having to resort to parse tables.
+
+The lexical rules are provided by a per-language specification,
+which includes the following:
+
+- a set of composable properties (one per bit of an integer) for each
+ character in the ASCII range (where all separators, operators and
+ reserved keywords must be defined).
+- for each block or string, its starting and ending
+ delimiters.
+
+## Development status
+
+Each checked item must be covered by a passing test.
+
+- [x] numbers starting with a digit
+- [ ] numbers starting otherwise
+- [x] unescaped strings (including multiline)
+- [x] escaped strings (including multiline)
+- [x] separators (in UTF-8 range)
+- [ ] single-line strings (\n not allowed)
+- [x] identifiers (in UTF-8 range)
+- [x] operators, concatenated or not
+- [x] single character block/string delimiters
+- [x] arbitrarily nested blocks and strings
+- [ ] multi-character block/string delimiters
+- [ ] blocks delimited by identifiers/operators/separators
+- [ ] blocks with delimiter inclusion/exclusion rules
+- [ ] blocks delimited by indentation level
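
The two parts of the language specification described in the README (per-character property bits and block/string delimiters) can be illustrated with a minimal Go sketch. It is not the code from this commit: every name in it (`charProp`, `spec`, `newSpec`, `scan`, `blockEnd`, `token`) is hypothetical, and it makes simplifying assumptions: ASCII input only, single-character delimiters, no escape sequences in strings, and blanks are skipped rather than emitted as separators. A nested block is returned as a single `block` token whose text includes its delimiters.

```go
package main

import (
	"fmt"
	"strings"
)

// charProp holds composable character properties, one per bit (hypothetical names).
type charProp uint8

const (
	propDigit charProp = 1 << iota
	propLetter
	propOperator
	propSeparator
	propBlockStart
	propBlockEnd
	propStringDelim
)

// spec is a hypothetical language specification: a property set for each
// ASCII character, plus the ending delimiter of each block or string.
type spec struct {
	props [128]charProp
	end   map[byte]byte
}

func newSpec() *spec {
	s := &spec{end: map[byte]byte{'(': ')', '{': '}', '[': ']', '"': '"'}}
	for c := byte('0'); c <= '9'; c++ {
		s.props[c] = propDigit
	}
	for c := byte('a'); c <= 'z'; c++ {
		s.props[c], s.props[c-32] = propLetter, propLetter // lower and upper case
	}
	for _, c := range []byte("+-*/=<>!") {
		s.props[c] = propOperator
	}
	for _, c := range []byte(",;\n") {
		s.props[c] = propSeparator
	}
	for start, end := range s.end {
		if start == end { // same delimiter on both sides: a string
			s.props[start] |= propStringDelim
			continue
		}
		s.props[start] |= propBlockStart
		s.props[end] |= propBlockEnd
	}
	return s
}

// has reports whether c carries at least one of the property bits in p.
func (s *spec) has(c byte, p charProp) bool { return c < 128 && s.props[c]&p != 0 }

type token struct{ kind, text string }

// blockEnd returns the offset just past the delimiter closing the block
// opened at src[i], tracking nested blocks with a stack of expected closers.
func (s *spec) blockEnd(src string, i int) (int, error) {
	stack := []byte{s.end[src[i]]}
	for j := i + 1; j < len(src); j++ {
		switch c := src[j]; {
		case s.has(c, propStringDelim): // skip the string: it may contain delimiters
			n := strings.IndexByte(src[j+1:], c)
			if n < 0 {
				return 0, fmt.Errorf("unterminated string at offset %d", j)
			}
			j += n + 1
		case s.has(c, propBlockStart):
			stack = append(stack, s.end[c])
		case s.has(c, propBlockEnd):
			if c != stack[len(stack)-1] {
				return 0, fmt.Errorf("mismatched %q at offset %d", c, j)
			}
			if stack = stack[:len(stack)-1]; len(stack) == 0 {
				return j + 1, nil
			}
		}
	}
	return 0, fmt.Errorf("unterminated block at offset %d", i)
}

// scan splits src into tokens according to the character properties.
func (s *spec) scan(src string) ([]token, error) {
	var toks []token
	for i := 0; i < len(src); {
		start, c, kind, p := i, src[i], "", charProp(0)
		switch {
		case c == ' ' || c == '\t': // sketch simplification: drop blanks
			i++
			continue
		case s.has(c, propDigit):
			kind, p = "number", propDigit
		case s.has(c, propLetter):
			kind, p = "identifier", propLetter|propDigit
		case s.has(c, propOperator):
			kind, p = "operator", propOperator // concatenated operators form one token
		case s.has(c, propSeparator):
			kind, i = "separator", i+1
		case s.has(c, propStringDelim):
			n := strings.IndexByte(src[i+1:], c)
			if n < 0 {
				return nil, fmt.Errorf("unterminated string at offset %d", i)
			}
			kind, i = "string", i+n+2
		case s.has(c, propBlockStart):
			end, err := s.blockEnd(src, i)
			if err != nil {
				return nil, err
			}
			kind, i = "block", end
		default:
			return nil, fmt.Errorf("unexpected character %q at offset %d", c, i)
		}
		for p != 0 && i < len(src) && s.has(src[i], p) {
			i++ // consume a run of characters sharing the property
		}
		toks = append(toks, token{kind, src[start:i]})
	}
	return toks, nil
}

func main() {
	toks, err := newSpec().scan(`f(a, "b)", {1+2}) += 10`)
	fmt.Println(toks, err)
}
```

In this sketch the `)` inside the string `"b)"` does not close the block, and `{1+2}` stays inside the enclosing `(...)` token, so a parser could rescan the block content with the same function instead of relying on parse tables.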