-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Motorola 68k assembly lexer (Devpac syntax) #909
Conversation
Added new lexer for Motorola 68000 assembly, for the Devpac syntax. There are numerous assembler for the 68000 with slightly different syntax. Devpac is the only one I know, but it could be used as a starting point to implement the other variants. It doesn't cover the "older" Devpac syntax where comments can be put after the last operand, at the end of the line, without a separator (`;` or `*`). This would require the lexer to know the semantics of the language, and that was a bit too much for my first lexer 😉 I started it from the NASM lexer, but it's a bit simpler. I initially associated it with `*.asm` files but that conflicts with NASM. I also associated it with the `text/x-asm` mime type, but I removed it as I'm not sure if that makes sense (why this one, and not NASM?) As a result there are no tests, since there's nothing to test in terms of file/mime association.
This is my first attempt and I'm not especially familiar with Ruby, so let me know if there's anything I can improve, thanks! |
Thanks for the PR @nguillaumin. How is/isn't the Motorola 68000 assembler related to NASM? As this is currently largely a copy/paste of NASM I think we should consider how they're related and see if subclasses/delegating would be appropriate here. Note from the README:
Happy to help you through this if you can help me understand the relationship/differences between the two. |
Hi, NASM is an assembler for the x86 platform (i.e. PCs). This syntax is for assembly of Motorola 68000 processors which is a completely different architecture. It was for example used on older computers like Atari ST and Amiga. As such, there's not much that can be shared between the two, as they have completely different instructions and syntax. The only thing that's common is that it's assembler so it's mostly single-line instruction oriented and lacks control structures or higher level constructs, but I don't think it would make sense to try to refactor that. It would be similar to trying to use a common base for C and Java just because they both use braces and have I don't believe it's largely a copy/paste of NASM, if you I hope this helps? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor comments. Thanks for your contribution @nguillaumin and for explaining. I commented on why I thought it a copy/paste.
lib/rouge/lexers/m68k_devpac.rb
Outdated
def self.keywords_type | ||
@keywords_type ||= Set.new %w( | ||
dc ds dcb | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent the keywords two spaces.
@keywords_type ||= Set.new %w(
dc ds dcb
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this throughout, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/rouge/lexers/m68k_devpac.rb
Outdated
end | ||
|
||
state :inline_whitespace do | ||
rule /[ \t\r]+/, Text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the space before the \t
to match on a space character? If so, use \s
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the semantics differ because \s
will match more than just a space (It will match [ \t\r\n\f]
) but to be honest I don't really understand this part and using \s
works, so fine by me.
lib/rouge/lexers/m68k_devpac.rb
Outdated
state :expr_whitespace do | ||
rule /\n+/m, Text, :expr_bol | ||
mixin :whitespace | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What added value does expr_whitespace
have? The first rule is identical to whitespace
and then it mixes in whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the rule that made me think it was a direct copy/paste of nasm. It has this state and I'm not sure why it's needed. It seems to be a paradigm that has traveled through about 4 different lexers. Unless we know why it's needed, let's leave it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, this is indeed coming from NASM and I can't say I understand it fully. I removed it.
Yes, you’re right. I’m ok with either way you decide to go.
…On Tue, Jun 19, 2018 at 9:40 PM Nicolas Guillaumin ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In lib/rouge/lexers/m68k_devpac.rb
<#909 (comment)>:
> + @builtins ||=Set.new %w(
+ d0 d1 d2 d3 d4 d5 d6 d7
+ a0 a1 a2 a3 a4 a5 a6 a7 a7'
+ pc usp ssp ccr
+ )
+ end
+
+ start { push :expr_bol }
+
+ state :expr_bol do
+ mixin :inline_whitespace
+ rule(//) { pop! }
+ end
+
+ state :inline_whitespace do
+ rule /[ \t\r]+/, Text
I think the semantics differ because \s will match more than just a space
(It will match [ \t\r\n\f]) but to be honest I don't really understand
this part and using \s works, so fine by me.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#909 (comment)>, or mute
the thread
</~https://github.com/notifications/unsubscribe-auth/ACSKlHNgAiH4Va2MzkUZOfvkFcjqPLT9ks5t-bYogaJpZM4T66yH>
.
|
First, thanks for the job of implementing m68k-syntax, I was just about to read up on how to do it and was glad to see this. However, I think the should be called m68k just, even if it is initially only tested with one 68k-compiler (devpac). Why? Because m68k for the other compilers have almost the very same syntax in most regards, and It wouldn't make sense, that for example, I next would add an m68k-asmone, m68k-vasm. Let's try to have all the m68k-support in the same lexer. So I suggest to drop the devpac suffix. If I get the time, I'll look into adding full support for code written in some of the others. |
I'm only familiar with the Devpac one and I assumed other syntaxes would be different enough to warrant a separate lexer. If we can support all syntaxes in a single lexer though, that's great! 👍 |
I will test your highlighting on a few codebases I have in other 68k assemblers, and see what I can find , and try to suggest additions or minor changes. I noticed Vim and Atom has m68k-highlighting so one could also peek there to get some inspiration. I have never done Ruby before, but I think I understand enough of the lexers now to be able to contribute. Also, we should think about how to identify a file, maybe we could make the guessing better first give it the common extensions (s,i, asm, 68k, more?), and then guess by instructions in the code like move (nasm doesn't have this I think) etc. Rouge only supports utf-8 though as most codebases in m68k are not (unless they are converted on github/lab etc), but I guess utf-8 only is a design decision and will never change? - Asking @jneen |
I'm keen to get something merge soon-ish as I need the m68k syntax for another project. So even if we intend to make improvements later, I'll probably just remove the devpac suffix on this branch so that it can be merged as-is, as an initial version, if everyone is ok with that? |
So I've pushed a commit that does that, removing the Devpac suffix. I think it's good to go (Subsequent improvements can be made on a separate PR). The PR still says "Changes requested" but I think I have addressed these? Let me know if that weren't the case, thanks! |
Sound good, I don't know if/when I get the time to contribute more to it, so just merge, I'm happy with the devpac suffix removed. |
@nguillaumin FYI, I've started working on improvements, almost done, after testing on various source codes and looking at the 68k spec. It is covering more cases. The cases with with free style comments possible to follow any opcode or operand made me have to refactor a your contribution a bit and make it more of a flow with a few states. Also, I added a test with s,i, m68k and also looking for move.(bsw) and register d0 in the source which is very characteristic of 68k, and should be good enough to identify a 68k file. I still suggest however this PR is merged first and then I'll add my contribution in a separate PR later, as not to confuse, or delay this PR even longer. @dblessing |
The build is failing, but I don't think that's related to my changes from what I can tell... |
Is there anything I can do to help progress this? Thanks! |
Excellent, thanks! |
Added new lexer for Motorola 68000 assembly, for the Devpac syntax.
There are numerous assembler for the 68000 with slightly different
syntax. Devpac is the only one I know, but it could be used as a starting
point to implement the other variants.
It doesn't cover the "older" Devpac syntax where comments can be put
after the last operand, at the end of the line, without a separator (
;
or
*
). This would require the lexer to know the semantics of thelanguage, and that was a bit too much for my first lexer 😉
I started it from the NASM lexer, but it's a bit simpler.
I initially associated it with
*.asm
files but that conflicts withNASM. I also associated it with the
text/x-asm
mime type, but Iremoved it as I'm not sure if that makes sense (why this one, and not
NASM?)
As a result there are no tests, since there's nothing to test in terms
of file/mime association.