What is grok?
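Grok is a well-established way of using regular expressions to parse files made up of single-line records, such as log files. Rather than writing one large expression by hand, you compose named, reusable patterns such as %{IP:clientip}.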
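Here is a minimal sketch of how grok-style expansion can work in C#. It is not this library's implementation. The class name MiniGrok, its Parse method, and the simplified base patterns (INT, NUMBER, IP, NOTSPACE, WORD) are hypothetical. Each %{SYNTAX:name} token is rewritten into a .NET named capture group. An optional :int or :float suffix, like the ones in the ELB pattern further down, converts the captured text.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical minimal grok-style parser (a sketch, not the library's API).
public sealed class MiniGrok
{
    // Simplified stand-ins for real grok base patterns, which are much stricter
    // and may reference each other recursively.
    private static readonly Dictionary<string, string> BasePatterns = new Dictionary<string, string>
    {
        ["INT"] = @"[+-]?\d+",
        ["NUMBER"] = @"[+-]?\d+(?:\.\d+)?",
        ["IP"] = @"\d{1,3}(?:\.\d{1,3}){3}",
        ["NOTSPACE"] = @"\S+",
        ["WORD"] = @"\w+",
    };

    // Matches %{SYNTAX}, %{SYNTAX:name} and %{SYNTAX:name:int} / %{SYNTAX:name:float}.
    private static readonly Regex Token = new Regex(@"%\{(\w+)(?::(\w+))?(?::(int|float))?\}");

    private readonly Regex _regex;
    private readonly Dictionary<string, string> _types = new Dictionary<string, string>();

    public MiniGrok(string grok)
    {
        // Rewrite every token as a named (or non-capturing) .NET group;
        // an unknown SYNTAX name throws KeyNotFoundException.
        string pattern = Token.Replace(grok, m =>
        {
            string body = BasePatterns[m.Groups[1].Value];
            if (!m.Groups[2].Success) return "(?:" + body + ")";
            string name = m.Groups[2].Value;
            if (m.Groups[3].Success) _types[name] = m.Groups[3].Value;
            return "(?<" + name + ">" + body + ")";
        });
        // Built once here and reused for every call to Parse.
        _regex = new Regex(pattern, RegexOptions.Compiled);
    }

    // Returns (name, value) pairs; the list is empty if the line does not match.
    public List<(string Name, object Value)> Parse(string line)
    {
        var captures = new List<(string Name, object Value)>();
        Match match = _regex.Match(line);
        if (!match.Success) return captures;

        foreach (string name in _regex.GetGroupNames())
        {
            Group group = match.Groups[name];
            // Skip numbered groups ("0" is the whole match) and optional groups that did not match.
            if (int.TryParse(name, out _) || !group.Success) continue;
            object value = group.Value;
            if (_types.TryGetValue(name, out string type))
            {
                value = type == "int"
                    ? (object)int.Parse(group.Value, CultureInfo.InvariantCulture)
                    : double.Parse(group.Value, CultureInfo.InvariantCulture);
            }
            captures.Add((name, value));
        }
        return captures;
    }
}
```

For example, `new MiniGrok("%{IP:client}:%{INT:port:int} %{WORD:method}").Parse("10.0.0.1:443 GET")` should yield client, port (as an int) and method. Real grok pattern sets let patterns reference other patterns recursively, which this sketch skips for brevity.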
Why create a .NET version?
Speed, the ability to run it in AWS Lambda, and because .NET Core is my preferred language.
Syntax
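// Build the parser once from a grok pattern string
var grok = new Grok("<grok string>");
// Parse a line and print each capture (Item1 = name, Item2 = value)
var response = grok.ParseLine("<line to parse>");
foreach (var match in response.Captures)
{
    Console.WriteLine($"Name: {match.Item1} Value: {match.Item2?.ToString()}");
}
// The same Grok instance can be reused for further lines
var response2 = grok.ParseLine("<line to parse>");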
Performance?
So far, I'm seeing roughly 0.0002 ms per parse for complex patterns such as AWS ELB logs. I think this is the benefit of using a compiled Regex: the pattern is compiled once and reused on the second and subsequent parses. The ELB pattern I'm testing with is:
"%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb} %{IP:clientip}:%{INT:clientport:int} (?:(%{IP:backendip}:?:%{INT:backendport:int})|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:-|%{INT:elb_status_code:int}) (?:-|%{INT:backend_status_code:int}) %{INT:received_bytes:int} %{INT:sent_bytes:int} \"%{ELB_REQUEST_LINE}\" \"(?:-|%{DATA:user_agent})\" (?:-|%{NOTSPACE:ssl_cipher}) (?:-|%{NOTSPACE:ssl_protocol})"
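One way to check the compiled-Regex effect yourself is a small Stopwatch timing sketch. The class ReuseBenchmark, its simplified Pattern (a stand-in for the first few fields of an expanded ELB pattern, not the full expansion) and the sample line are all hypothetical. Timings depend on hardware and runtime, so no particular figures are implied.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical timing harness for reusing one compiled Regex across many lines.
public static class ReuseBenchmark
{
    // Simplified stand-in for the first fields of an expanded ELB grok pattern.
    public const string Pattern =
        @"^(?<timestamp>\S+) (?<elb>\S+) (?<clientip>\d{1,3}(?:\.\d{1,3}){3}):(?<clientport>\d+)";

    // RegexOptions.Compiled pays a one-off cost to generate IL; that cost is only
    // recovered when the same instance is reused for many parses.
    public static readonly Regex Compiled = new Regex(Pattern, RegexOptions.Compiled);

    // Average time per call in microseconds; the first (untimed) call doubles as a warm-up.
    public static double AverageMicroseconds(Func<string, bool> parse, string line, int iterations)
    {
        if (!parse(line)) throw new ArgumentException("sample line does not match", nameof(line));
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            parse(line);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds * 1000.0 / iterations;
    }
}
```

Passing `l => ReuseBenchmark.Compiled.IsMatch(l)` times the reused compiled instance. Passing `l => new Regex(ReuseBenchmark.Pattern).IsMatch(l)` rebuilds the regex for every line, which is exactly the cost that reusing one compiled pattern avoids.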
Where can I get it?
It will be going up on NuGet at some point this week. It will be a rough-and-ready version with some core patterns, and there may be breaking changes while I work out the best approach for the outbound interface.
It will also be open source on GitHub, in case anyone wants to take it on and do something amazing with it.