Friday, April 11, 2014

Non-capturing groups in regular expressions and their use

Recently I got stuck on a regular expression in which I needed grouping in several places but did not need the values those groups matched. The problem is hard to explain without the text, so here it is:


Controller Id Connection Status Connection State Secure Role
------------- ----------------- ---------------- ------ ------
1             Connected         Active           No     Slave
25            Disconnected      Idle             No     Master


My goal was to extract the numeric value at the beginning of each line while still matching the whole line for safety, so that other numbers elsewhere in the output would not match. As the output shows, each column holds one value from a fixed set; for example, Connection Status can only be Connected or Disconnected. That calls for the following regular expression:


  
(\d+)\s+(Connected|Disconnected)\s+(Active|Idle|Connecting)\s+(Yes|No)\s+(Equal|Master|Slave)
  
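To make the problem concrete, here is a minimal sketch using Python's standard `re` module (Python is my choice; the post does not name a language or tool). The sample text is the output above plus one hypothetical extra line, `Total controllers: 2`, added only to show why matching the whole row matters. The names `OUTPUT` and `FULL_ROW` are illustrative.

```python
import re

# The sample output from above, plus one HYPOTHETICAL extra line
# ("Total controllers: 2") added only for illustration.
OUTPUT = """\
Controller Id Connection Status Connection State Secure Role
------------- ----------------- ---------------- ------ ------
1             Connected         Active           No     Slave
25            Disconnected      Idle             No     Master
Total controllers: 2
"""

# A bare \d+ picks up every number in the text, including the stray "2".
naive = re.findall(r"\d+", OUTPUT)
print(naive)  # ['1', '25', '2']

# The full-row pattern only matches table rows, but every (...) captures,
# so re.findall returns a 5-tuple with all columns for each row.
FULL_ROW = (r"(\d+)\s+(Connected|Disconnected)\s+(Active|Idle|Connecting)"
            r"\s+(Yes|No)\s+(Equal|Master|Slave)")
rows = re.findall(FULL_ROW, OUTPUT)
print(rows)
# [('1', 'Connected', 'Active', 'No', 'Slave'),
#  ('25', 'Disconnected', 'Idle', 'No', 'Master')]
```

The full pattern solves the safety problem, but we now have to dig the ID out of each tuple.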


However, the expression above captures every column, when all we want is the first one. This is where non-capturing groups come into the picture: they keep the grouping behaviour (here, limiting the scope of each | alternation) without adding the group's value to the captured results.

The corrected regular expression is:


  
(\d+)\s+(?:Connected|Disconnected)\s+(?:Active|Idle|Connecting)\s+(?:Yes|No)\s+(?:Equal|Master|Slave)
  
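Here is the corrected pattern run through Python's `re.findall`, as a sketch (Python is my assumption, since the post does not name a language; `OUTPUT` and `ID_ONLY` are illustrative names). Because the pattern now has exactly one capturing group, `re.findall` returns a flat list of the IDs instead of tuples.

```python
import re

# The sample output from above.
OUTPUT = """\
Controller Id Connection Status Connection State Secure Role
------------- ----------------- ---------------- ------ ------
1             Connected         Active           No     Slave
25            Disconnected      Idle             No     Master
"""

# Only the ID is a capturing group; the column alternations use (?:...)
# so they group the alternatives without capturing them.
ID_ONLY = (r"(\d+)\s+(?:Connected|Disconnected)\s+(?:Active|Idle|Connecting)"
           r"\s+(?:Yes|No)\s+(?:Equal|Master|Slave)")

# With a single capturing group, re.findall returns a list of strings.
ids = re.findall(ID_ONLY, OUTPUT)
print(ids)  # ['1', '25']
```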


Putting "?:" immediately after the opening parenthesis tells the regular expression engine not to capture that group. The group still has to match, and its text is still part of the overall match, but it is not returned as a separate group. You can use this anywhere you need grouping but not the grouped value.
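A small Python sketch of both points (Python and the names `CAPTURING`, `NON_CAPTURING` and `row` are my own choices for illustration): the non-capturing version reports only one group, yet the whole row is still matched. It also shows why we can't simply drop the parentheses: without grouping, `|` splits the entire pattern into two alternatives.

```python
import re

CAPTURING = (r"(\d+)\s+(Connected|Disconnected)\s+(Active|Idle|Connecting)"
             r"\s+(Yes|No)\s+(Equal|Master|Slave)")
NON_CAPTURING = (r"(\d+)\s+(?:Connected|Disconnected)\s+(?:Active|Idle|Connecting)"
                 r"\s+(?:Yes|No)\s+(?:Equal|Master|Slave)")

row = "25            Disconnected      Idle             No     Master"

# Pattern.groups is the number of capturing groups in the pattern.
print(re.compile(CAPTURING).groups)      # 5
print(re.compile(NON_CAPTURING).groups)  # 1

m = re.search(NON_CAPTURING, row)
# The (?:...) groups still take part in the match: group(0) is the whole row...
print(m.group(0) == row)  # True
# ...but only the ID is available as a numbered group.
print(m.groups())  # ('25',)

# Without parentheses, "|" splits the whole pattern into the alternatives
# "(\d+)\s+Connected" and "Disconnected", so the ID is lost.
bad = re.search(r"(\d+)\s+Connected|Disconnected", row)
print(bad.group(0), bad.group(1))  # Disconnected None
```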