I was watching TV, and there was a commercial which proclaimed, "It's time to do what you want!" I replied to the TV, "It's time to write a JSON parser in 6502 assembly language?" Somehow I don't think that's what they had in mind, but the TV is right, I should do what I want.
So, here is my JSON parser. The core parser is written entirely in 6502 assembly language, and is meant to be assembled with ca65. However, it is meant to be called from C, and uses the cc65 calling convention (specifically, the fastcall
convention).
JSON65 should work on any processor in the 6502 family. (It does not use any 65C02 instructions.)
The assembly language parts of JSON65 use the zero page locations used by cc65
, in a way which is compatible with the C calling convention.
JSON65 should work on any target supported by the cc65
toolchain. I have tested it on sim65
and on an unenhanced Apple //e.
Parser (json65.h)
JSON65 is an event-driven (SAX-style) parser, so the parser is given a callback function, which it calls for each event.
JSON65 supports incremental parsing, so you can freely feed it any sized chunks of input, and you don't need to have the whole file in memory at once.
JSON65 is fully reentrant, so you can incrementally parse several files at once if you so desire.
JSON65 does have a couple of limits: strings are limited to 255 bytes, and the nesting depth (of nested arrays or objects) is limited to 224. However, there is no limit on the length of a line, or the length of a file.
JSON65 uses 512 bytes of memory for each parser, which must be allocated by the caller. JSON65 does not use dynamic memory allocation.
In accordance with the JSON specification, JSON65 assumes its input is UTF-8 encoded. However JSON65 does not validate the UTF-8, so any encoding can be used, as long as all bytes with the high bit clear represent ASCII characters. Bytes with the high bit set are only allowed inside strings. The only place where JSON65 assumes UTF-8 is in the processing of \u
escape sequences. In accordance with the JSON specification, a single \u
escape can be used to specify code points in the Basic Multilingual Plane, and two consecutive \u
escapes (a UTF-16 surrogate pair) can be used to specify a code point outside the Basic Multilingual Plane. These escapes will be translated into the proper UTF-8.
Because JSON only allows newlines in places where arbitrary whitespace is allowed, JSON65 is agnostic to the type of line ending. (CR, LF, or CRLF.) For the purposes of counting line numbers for error reporting, JSON65 handles CR, LF, or CRLF line endings.
JSON65 will parse numbers which fit into a 32-bit signed long, and will provide the long to the callback. All other numbers (i. e. floating point numbers, or integers which overflow a 32-bit long) are provided to the callback as a string. (Like strings, numbers cannot be more than 255 digits long.)
The callback function may return an error if it wishes. This will cause parsing to stop immediately, and the error code returned by the callback will be returned by j65_parse()
. Error codes are negative numbers, and the user may use the codes from J65_USER_ERROR
to -1
, inclusive, for their own error codes.
Tree interface (json65-tree.h)
If you use the event-driver parser, you'll need to build your own data structure (or otherwise handle the data somehow) as the events come in. If you don't want to do that, you can use the tree interface (json65-tree.h
) instead, which builds up a data structure for you. This only works for small files, because the entire tree has to fit in memory at once.
Unlike the event-based parser, the tree interface uses dynamic memory allocation.
Printing JSON (json65-print.h)
Mostly, JSON65 is a parser. However, it does have some support for printing JSON back to a file, in json65-print.h
. The function j65_print_tree()
will print a JSON tree (from the tree interface in json65-tree.h
) to a given filehandle. It prints the entire JSON tree on a single line with no whitespace. This is the most compact format for a machine-readable JSON file, but it is not particularly human-readable.
If you write your own code to print JSON, either because you want to pretty-print it, or because you are using a data structure other than j65_node
, you may still want to use the function j65_print_escaped()
from json65-quote.h
. It handles escaping a string using the JSON escape sequences.
API documentation
I don't have any fancy Doxygen documentation, but the API is documented by comments in the header files. If you wish to use the event-driven parser, read json65.h. If you wish to use the tree interface, read json65-tree.h.
Library organization
If you simply wish to use the event-driven (SAX-style) parser, you only need one header file (json65.h
) and one assembly file (json65.s
). However, there are some helper functions in other files, which you can optionally use with JSON65 if you like. Most notable is the tree interface to JSON65, which you may use instead of the event-driven interface for small files.
Each header file corresponds directly to one implementation file. Some of the implementation files are written in assembly language, and some are written in C. Here is a description of each, along with the size of the machine code of the implementation (CODE
section plus RODATA
section; none of the implementation files have any DATA
or BSS
).
- json65.h (2240 bytes) - The core, event-driven parser. This is the only file that is required if you wish to build your own data structure.
- json65-string.h (291 bytes) - This implements a string intern pool which is used by the tree interface.
- json65-tree.h (1300 bytes) - The tree interface, which builds up a tree data structure as the file is parsed. You may then traverse the tree to your heart's content.
- json65-quote.h (226 bytes) - This has a function which prints strings, replacing special characters with the escape sequences from the JSON specification. It is used by the tree printer, but can also be used standalone if you are printing JSON files yourself without using the tree interface.
- json65-print.h (710 bytes) - Prints a tree to a file as JSON. Use this if you are using the tree interface, and wish to write JSON files as well as read them.
- json65-file.h (1378 bytes) - Provides a helper function to feed data to the parser from a file, in chunks, and to display error messages to the user (including printing the offending line, and printing a caret to indicate the offending position of the line).
I hate build systems (or at least, build systems for C code), so I have not provided one. (Other than a lame little Perl script to build and run the tests using sim65.) Instead, I encourage you to copy the source files and header files you need into your own project, and use whatever build system you are already using for your project. (Such as the GNU Make based cc65 build system.)
You can use the following dependency graph to determine which source files you will need to copy into your project. (For each source file, you will also need to copy the corresponding header file.) Source files with no dependencies (such as json65.s
) are at the top of the graph, while the source file with the most dependencies (json65-print.c
) is at the bottom of the graph.
json65.s json65-string.s
/ \ /
/ \ /
/ \ /
json65-file.c json65-tree.c json65-quote.s
\ /
\ /
\ /
json65-print.c
If you wish to build and run the tests, simply run the run-test.pl
Perl script at the top level of the repository. (It takes no arguments.) You'll need to have the cc65 toolchain installed.
Note: version 2.17 and earlier of sim65 have a bug in the implementation of the BIT instruction, so the tests will fail. You'll need a more recent version to get the tests to pass. (This only affects the simulation of the tests. If you plan on running JSON65 on real hardware, or on an emulator other than sim65, then you'll be fine with an older version of cc65.)
License
JSON65 is licensed under the zlib/libpng license, which is approved by the OSI and the FSF.
from Hacker News https://ift.tt/37W6YQB
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.