Using PHP_CodeSniffer for nefarious purposes!


I recently had the task of turning our internal company PHP coding standards, which exist only as a set of JIRA pages, into something that could be integrated into the Subversion pre-commit checks, so that code not only has to be free of syntax errors but also has to adhere to the coding standards. I remembered reading about PHP_CodeSniffer (PCS) and volunteered to "have a look" and see if I could implement one of the simpler standards, our variable naming convention, as a starter exercise.

Well, after the mandatory head-banging and spike in coffee consumption that accompanies learning new stuff, I became very impressed by the way that PCS actually does what it does. It builds upon the token_get_all() function and creates not just an array of tokens and their positions; it also figures out (I read the source for half an hour!) which bits of code are contained within other bits of code, in other words the context within which the current token resides. This is very useful as it means that when, for example, a T_FUNCTION callback is being processed, you can know where the function body and the function signature are in the token stream. That's beside the point though; if you want to know more about PCS then visit the PHP_CodeSniffer PEAR site. If you have PEAR installed then it is but a mere incantation away. The sequence as given on the site:
pear install PHP_CodeSniffer
didn't actually work for me. Some message or other came up, but when I cut and pasted the suggested alternative command it worked and everything was just fine after that. That was a relief, because some of the most frustrating PHP vibes I've ever had have come from tackling PEAR applications on various platforms that just refuse to install cleanly.
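Going back to token_get_all() for a moment, here is a rough sketch of what "knowing the context" means. It is not how PCS does it internally; it just tags every token with the brace depth it sits at, using nothing but the built-in tokenizer. The helper name tokensWithDepth() is my own invention for illustration.

```php
<?php
// Sketch: annotate each token with the brace depth it sits at.
// tokensWithDepth() is a hypothetical helper, not part of PHP_CodeSniffer.
function tokensWithDepth(string $source): array
{
    // Braces opened inside interpolated strings arrive as these array tokens.
    $stringBraces = [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES];
    $depth = 0;
    $out = [];

    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        $text = is_array($token) ? $token[1] : $token;
        $name = is_array($token) ? token_name($token[0]) : $text;

        if ($token === '}') {
            $depth--; // report the closing brace at the outer level
        }
        if ($name !== 'T_WHITESPACE') {
            $out[] = [$name, $text, $depth];
        }
        if ($token === '{' || (is_array($token) && in_array($token[0], $stringBraces, true))) {
            $depth++;
        }
    }

    return $out;
}

// Usage: print an indented view of the token stream.
foreach (tokensWithDepth('<?php function f($a) { return $a; }') as [$name, $text, $depth]) {
    echo str_repeat('  ', $depth), $name, ' ', trim($text), PHP_EOL;
}
```

PCS goes a good deal further than this (it records where each scope opens and closes, not just how deep you are), but the idea is the same.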

Tick tock tick tock...

A week passes... and I have managed to really "get into" PCS and how it works, and I have even managed to write some really cool stuff with it, such as a check for one of our standards that says:
  1. all functions must have a single return
  2. the return value must be a variable
  3. the returned variable must begin with a data type and end with "Out", e.g. $intOut
By using the positions of the opening and closing braces of the function body and the very useful findNext functions, I count the number of "return" statements within the body of the function. None means no further checking is required. More than one raises an error and a subsequent commit rejection. Exactly one means checking that it is in fact the last statement before the closing brace and that the returned value is a variable that fits the acceptable pattern.
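For the curious, the logic can be sketched without PCS at all. What follows is not the actual sniff: it swaps PCS's scope data and findNext for plain token_get_all() and hand-rolled brace matching, so that it runs on its own. The function name checkSingleReturn(), the error messages and the list of type prefixes are all assumptions made for illustration; our real standard's prefix list is not reproduced here.

```php
<?php
// Sketch only: plain token_get_all() plus manual brace matching stands in
// for the scope data PHP_CodeSniffer gives a sniff. All names are hypothetical.

const TYPE_PREFIXES = 'int|str|arr|bool|flt|obj'; // assumed prefix list

function checkSingleReturn(string $source): array
{
    $skip = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    $stringBraces = [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES];

    // Keep only meaningful tokens so "next token" is easy to reason about.
    $tokens = array_values(array_filter(
        token_get_all($source),
        function ($t) use ($skip) {
            return !is_array($t) || !in_array($t[0], $skip, true);
        }
    ));
    $errors = [];
    $total = count($tokens);

    for ($i = 0; $i < $total; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        $line = $tokens[$i][2];

        // Skip the signature: stop at the body's "{" or at ";" (no body).
        $open = $i + 1;
        while ($open < $total && $tokens[$open] !== '{' && $tokens[$open] !== ';') {
            $open++;
        }
        if ($open >= $total || $tokens[$open] === ';') {
            continue;
        }

        // Walk to the matching "}", noting every return on the way.
        $depth = 0;
        $returns = [];
        $close = $open;
        for ($j = $open; $j < $total; $j++) {
            $t = $tokens[$j];
            if ($t === '{' || (is_array($t) && in_array($t[0], $stringBraces, true))) {
                $depth++;
            } elseif ($t === '}') {
                if (--$depth === 0) {
                    $close = $j;
                    break;
                }
            } elseif (is_array($t) && $t[0] === T_RETURN) {
                $returns[] = $j;
            }
        }

        if (count($returns) > 1) {
            $errors[] = "line $line: more than one return";
        } elseif (count($returns) === 1) {
            $r = $returns[0];
            $value = $tokens[$r + 1] ?? null;
            // "return $var ;" must sit immediately before the closing brace.
            $isLast = ($tokens[$r + 2] ?? null) === ';' && $r + 3 === $close;
            $isNamed = is_array($value) && $value[0] === T_VARIABLE
                && preg_match('/^\$(' . TYPE_PREFIXES . ')\w*Out$/', $value[1]) === 1;
            if (!$isLast || !$isNamed) {
                $errors[] = "line $line: return must be the last statement and a typed ...Out variable";
            }
        }
    }

    return $errors;
}

// Usage: an empty array means the code passed.
print_r(checkSingleReturn('<?php function a() { $intOut = 1; return $intOut; }'));
```

One known shortcut: a return inside a closure or nested function is counted against the enclosing function as well. The scope information PCS provides makes that kind of thing much easier to get right.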

The "mad idea" bit about using PCS...

Then I got to thinking... I have recently released my own programming "system" called FELT, and I have been looking for a way to reverse engineer existing PHP code and then translate it into FELT code. My mad plan would be to reverse engineer Drupal and then make it work as a Node.js site, for example. There would be more work to do after that, but converting it to JavaScript code would be easier if it could be automated.

I had initially planned on using one of the already available projects that parse PHP code or provide a grammar description that can be re-used to do it, but that's a lot of effort, and having spent enough time with PCS now I think it might just be able to pull it off... or it might not, but here's the gist of it.

I am going to see if I can use PCS and the data it provides to reverse engineer the code into FELT code.

Sounds good but...


I have a gut feeling that it won't actually be enough because, good though it is, it doesn't provide the full AST that I think I am going to need, in which case I may need other tools or have to roll my own.
