Dig a Python regex trench… or not.

Let’s talk about something that is not so obvious but still important: the order in which a regex is evaluated.

Let’s imagine we have a list of strings:

sarr = ['kitty.env', 'doggy-bone.env', 'dog-plum.env', 'dog-bone.as',
        'dog-pass-bone1.env', 'dog-shout.env', 'kitty-paw.as', 'doggy-pitstop.env']

and we need to find every string that matches '^dog.*\.env$' but does not contain '-bone[0-9]*'.

So the matches should be the 3rd, 6th and 8th strings.

Reference docs: Python’s regex

The first thing that comes to mind is the following:

>>> import re
>>> [re.match(r'^dog.*(?!-bone[0-9]*)\.env$', s) is not None for s in sarr]
[False, True, True, False, True, True, False, True]

“Whuuuut?” was my first reaction. A few minutes later, with a colleague’s help, we found the solution. Here it is:

>>> [re.match(r'^dog(?!.*-bone[0-9]*).*\.env$', s) is not None for s in sarr]
[False, False, True, False, False, True, False, True]
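For anyone who wants to rerun the comparison outside the REPL, here is a small self-contained sketch. The names naive, fixed, naive_hits and fixed_hits are made up for this illustration; the list and both patterns are the ones from the post.

```python
import re

sarr = ['kitty.env', 'doggy-bone.env', 'dog-plum.env', 'dog-bone.as',
        'dog-pass-bone1.env', 'dog-shout.env', 'kitty-paw.as', 'doggy-pitstop.env']

# First attempt: the negative lookahead sits right before '\.env'.
naive = re.compile(r'^dog.*(?!-bone[0-9]*)\.env$')
# Working version: the lookahead sits right after 'dog' and carries its own '.*'.
fixed = re.compile(r'^dog(?!.*-bone[0-9]*).*\.env$')

# Collect the strings each pattern accepts.
naive_hits = [s for s in sarr if naive.match(s)]
fixed_hits = [s for s in sarr if fixed.match(s)]

print(naive_hits)
print(fixed_hits)
```

Printing the strings instead of True/False makes it easier to see that the first pattern lets both '-bone' entries through.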

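As you may have guessed, Python’s regex engine walks the pattern from left to right, and a lookahead like (?!...) is tested only at the spot it occupies in the pattern, not against the whole string.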

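So, the first pattern in human words:

1) Does the string start with 'dog'?
2) Consume any characters with '.*' and remember where that run ends.
3) At exactly that position, does '-bone[0-9]*' come next?
    No? OK, carry on.
    Yes? That is not a reason to quit: the engine backtracks and lets '.*' end somewhere else, so it is still OK.
4) Does the string end with '\.env' starting from that position?

Since step 4 needs '.env' right at that position, '-bone' can never be the next thing there, so the check in step 3 never rejects anything.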
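To see where the check in step 3 really happens, here is a minimal sketch that wraps '.*' in a capturing group. The group and the variable names s, m and tail are additions for illustration only; the rest of the pattern is unchanged.

```python
import re

s = 'doggy-bone.env'

# Same pattern as the first attempt, but '.*' is captured so we can
# inspect where it stopped.
m = re.match(r'^dog(.*)(?!-bone[0-9]*)\.env$', s)

# Text that follows the position where the lookahead was finally tested.
tail = s[m.end(1):]

print(repr(m.group(1)))  # what '.*' swallowed
print(repr(tail))        # what the lookahead was looking at
```

The '.*' has already eaten '-bone', and the lookahead only sees '.env', which of course is not '-bone'.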

Whereas the second one reads as:

1) Does the string start with 'dog'?
2) Looking ahead from right after 'dog', does the rest of the string match '.*-bone[0-9]*', i.e. contain '-bone' anywhere? Yes? This isn't our case, quit.
3) Does the rest of the string match '.*\.env' up to the very end?
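The three questions can also be written as plain string checks, which gives a handy cross-check of the regex. This is only a sketch for the sample list from the post: by_hand and by_regex are names invented here, and the string-method version is a stand-in for the pattern rather than an exact equivalent for every possible input (for example, strings with newlines).

```python
import re

sarr = ['kitty.env', 'doggy-bone.env', 'dog-plum.env', 'dog-bone.as',
        'dog-pass-bone1.env', 'dog-shout.env', 'kitty-paw.as', 'doggy-pitstop.env']

pattern = re.compile(r'^dog(?!.*-bone[0-9]*).*\.env$')
by_regex = [pattern.match(s) is not None for s in sarr]

# 1) starts with 'dog'  2) no '-bone' anywhere  3) ends with '.env'
# '[0-9]*' may match zero digits, so looking for '-bone' alone is enough here.
by_hand = [s.startswith('dog') and '-bone' not in s and s.endswith('.env')
           for s in sarr]

print(by_regex == by_hand)
```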

That’s it. I hope this post helps you, or at least gives you a new piece of knowledge.

HH