Get results as structured data, instead of Apache text log format #13

pszabop · 2024-11-02T01:33:29Z

Currently libmodsecurity does not provide any way to get structured data back from the result of a transaction (such as the score). This means, for example, that you can't integrate it with another scoring mechanism or use JSON logging.

It's also a weird legacy of being joined at the hip with Apache that hasn't been addressed.

It would be a useful feature to report results back as structured data. For example, here's regex-based code that converts the final (score) log message into a struct of results:

use regex::Regex;

#[derive(Debug)]
struct LogEntry {
    client_ip: String,
    status_code: u16,
    score: u32,
    msg: String,
    uri: String,
    unique_id: String,
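}

/// Parses the final "Access denied" anomaly-score log line into a `LogEntry`.
/// Returns `None` if the line does not match the expected format.
fn parse_log_entry(log: &str) -> Option<LogEntry> {
    // Regex matching the log entry format; named groups capture the fields we keep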
    // `parameter` is the rule's threshold; the actual anomaly score is in `Value`
    let re = Regex::new(r#"\[client (?P<client_ip>[\d\.]+)\] ModSecurity: Access denied with code (?P<status_code>\d{3}) \(phase \d+\)\. Matched "Operator `Ge' with parameter `\d+' against variable `TX:BLOCKING_INBOUND_ANOMALY_SCORE' \(Value: `(?P<score>\d+)' \) \[file ".*?"\] \[line "\d+"\] \[id "\d+"\] \[rev ""\] \[msg "(?P<msg>.*?)"\] \[data ".*?"\] \[severity "\d+"\] \[ver ".*?"\] \[maturity "\d+"\] \[accuracy "\d+"\] \[tag ".*?"\] \[tag ".*?"\] \[hostname ".*?"\] \[uri "(?P<uri>.*?)"\] \[unique_id "(?P<unique_id>.*?)"\] \[ref ".*?"\]"#).unwrap();

    // Bail out with `None` if the line does not match the pattern
    let captures = re.captures(log)?;

    Some(LogEntry {
        client_ip: captures["client_ip"].to_string(),
        // Numeric fields fall back to 0 if they fail to parse (e.g. on overflow)
        status_code: captures["status_code"].parse().unwrap_or(0),
        score: captures["score"].parse().unwrap_or(0),
        msg: captures["msg"].to_string(),
        uri: captures["uri"].to_string(),
        unique_id: captures["unique_id"].to_string(),
    })
}

rkrishn7 · 2024-11-21T04:36:05Z

Yes, agreed it would be nice to have a more structured representation of the log data!

My only hesitation is the format of the actual log, and whether it always adheres to the structure shown in your comment. Maybe we can get around that by wrapping each field in an Option? That way callers decide how to handle missing values. It would also help if the log format ever changes in a backwards-incompatible way.
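To make the Option idea concrete, here is a rough sketch of what it could look like. It is only an illustration, not a proposed API: `PartialLogEntry`, `between`, and `parse_partial` are hypothetical names, and it uses plain string searches from the standard library instead of one large regex so that each field is extracted independently. It assumes the fields keep the `[key "value"]` shape seen in the log line above, and it would cut a value short if that value itself contained `"]`.

```rust
/// Hypothetical structured view of one log line; every field is optional
/// so a missing or reformatted field does not fail the whole parse.
#[derive(Debug, Default, PartialEq)]
struct PartialLogEntry {
    client_ip: Option<String>,
    status_code: Option<u16>,
    msg: Option<String>,
    uri: Option<String>,
    unique_id: Option<String>,
}

/// Returns the text between the first `start` marker and the next `end`
/// marker after it, or `None` if either marker is missing.
fn between<'a>(log: &'a str, start: &str, end: &str) -> Option<&'a str> {
    let from = log.find(start)? + start.len();
    let len = log[from..].find(end)?;
    Some(&log[from..from + len])
}

/// Extracts whatever fields are present; absent fields stay `None`.
fn parse_partial(log: &str) -> PartialLogEntry {
    PartialLogEntry {
        client_ip: between(log, "[client ", "]").map(str::to_string),
        // A non-numeric status code becomes `None` rather than a default of 0.
        status_code: between(log, "with code ", " ").and_then(|s| s.parse().ok()),
        msg: between(log, "[msg \"", "\"]").map(str::to_string),
        uri: between(log, "[uri \"", "\"]").map(str::to_string),
        unique_id: between(log, "[unique_id \"", "\"]").map(str::to_string),
    }
}
```

With this shape, a caller can match on `entry.status_code` or `entry.uri` individually and decide per field whether a missing value is an error.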

Just some thoughts from my end! Would you be willing to make a PR for this?

pszabop · 2024-12-16T23:51:27Z

A bigger problem is that Rust treats the callback as asynchronous, so combining the log message with local context is proving to be a nightmare. I will have to rewrite all my code to copy the context (which then gets copied again when serialized, ugh).

Is there some annotation we can add to the function signature(s) to mitigate this, or is this inherent in how Rust deals with call stacks and async (using tokio in this example)?
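For illustration, one possible way to avoid deep-copying the context is sketched below. It rests on an assumption that is not confirmed anywhere in this thread: that the callback has to be `'static` because the library stores it, which would rule out borrowing locals. Under that assumption, wrapping the context in an `Arc` makes each capture a reference-count bump instead of a full copy. `RequestContext`, `FakeTransaction`, and `combine_logs` are hypothetical stand-ins, not this crate's API.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical per-request context; not part of any real crate.
struct RequestContext {
    request_id: String,
    route: String,
}

// Hypothetical stand-in for a library type that stores a log callback.
// The `'static` bound is the assumption that forces owned captures.
struct FakeTransaction {
    on_log: Box<dyn Fn(&str) + Send + Sync + 'static>,
}

impl FakeTransaction {
    fn new(on_log: impl Fn(&str) + Send + Sync + 'static) -> Self {
        Self { on_log: Box::new(on_log) }
    }

    // Simulates the library invoking the callback with a log line.
    fn emit(&self, line: &str) {
        (self.on_log)(line)
    }
}

// Combines each emitted log line with the shared context.
fn combine_logs(ctx: Arc<RequestContext>, lines: &[&str]) -> Vec<String> {
    let sink = Arc::new(Mutex::new(Vec::new()));
    let tx = {
        // Cloning an `Arc` copies a pointer, not the context itself.
        let ctx = Arc::clone(&ctx);
        let sink = Arc::clone(&sink);
        FakeTransaction::new(move |line| {
            let combined = format!("{} {} {}", ctx.request_id, ctx.route, line);
            sink.lock().unwrap().push(combined);
        })
    };
    for line in lines {
        tx.emit(line);
    }
    // Dropping the transaction releases the callback's captured `Arc`s.
    drop(tx);
    let collected = sink.lock().unwrap().clone();
    collected
}
```

Whether this applies depends on how the real callback is declared; if it only needs to live as long as the transaction, a lifetime parameter on the callback type might allow plain borrows instead.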
