Get results as structured data, instead of Apache text log format #13

Open
pszabop opened this issue Nov 2, 2024 · 2 comments

pszabop commented Nov 2, 2024

Currently libmodsecurity does not provide any way to get structured data back from the result of a transaction (such as the score). This means, for example, that you can't integrate it with another scoring mechanism or use JSON logging.

It's also a weird legacy of being tied at the hip to Apache that hasn't been addressed.

It would be a useful feature to report results back as structured data. For example, here's some regex-based code that converts the final (score) log message into a struct of results:

use regex::Regex;

#[derive(Debug)]
struct LogEntry {
    client_ip: String,
    status_code: u16,
    score: u32,
    msg: String,
    uri: String,
    unique_id: String,
}
fn parse_log_entry(log: &str) -> Option<LogEntry> {
    // Define a regex pattern to match the log entry format
    let re = Regex::new(r#"\[client (?P<client_ip>[\d\.]+)\] ModSecurity: Access denied with code (?P<status_code>\d{3}) \(phase \d+\). Matched "Operator `Ge' with parameter `(?P<score>\d+)' against variable `TX:BLOCKING_INBOUND_ANOMALY_SCORE' \(Value: `\d+' \) \[file ".*?"\] \[line "\d+"\] \[id "\d+"\] \[rev ""\] \[msg "(?P<msg>.*?)"\] \[data ".*?"\] \[severity "\d+"\] \[ver ".*?"\] \[maturity "\d+"\] \[accuracy "\d+"\] \[tag ".*?"\] \[tag ".*?"\] \[hostname ".*?"\] \[uri "(?P<uri>.*?)"\] \[unique_id "(?P<unique_id>.*?)"\] \[ref ".*?"\]"#).unwrap();

    // Capture the groups using the regex
    if let Some(captures) = re.captures(log) {
        Some(LogEntry {
            client_ip: captures["client_ip"].to_string(),
            status_code: captures["status_code"].parse().unwrap_or(0),
            score: captures["score"].parse().unwrap_or(0),
            msg: captures["msg"].to_string(),
            uri: captures["uri"].to_string(),
            unique_id: captures["unique_id"].to_string(),
        })
    } else {
        None
    }
}

@rkrishn7
Owner

Yes, agreed it would be nice to have a more structured representation of the log data!

My only hesitation is the format of the log itself, and whether it always adheres to the structure shown in your comment. Maybe we can get around that by wrapping each field in an Option? That way callers decide how to handle missing values. It would also help if the log format ever changes in a backwards-incompatible way.
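
To make the Option idea concrete, here is a rough sketch of what it could look like. It is not the crate's API: the struct layout and the helper names `between` and `bracket_field` are made up for illustration, and it uses plain string searches instead of the regex so that each field is extracted independently. It assumes the `[key "value"]` layout from the log line in the first comment, and `score` is read from the operator parameter only because that is what the regex above captures.

```rust
/// Fields pulled from a ModSecurity log line. Every field is optional, so a
/// missing or reformatted piece does not discard the whole entry.
#[derive(Debug, Default, PartialEq)]
pub struct LogEntry {
    pub client_ip: Option<String>,
    pub status_code: Option<u16>,
    pub score: Option<u32>,
    pub msg: Option<String>,
    pub uri: Option<String>,
    pub unique_id: Option<String>,
}

/// Returns the text between `prefix` and the next occurrence of `end`.
/// (Hypothetical helper, not part of any library.)
fn between<'a>(log: &'a str, prefix: &str, end: &str) -> Option<&'a str> {
    let start = log.find(prefix)? + prefix.len();
    let rest = &log[start..];
    let stop = rest.find(end)?;
    Some(&rest[..stop])
}

/// Returns the value of a `[key "value"]` pair, if present.
/// (Hypothetical helper, not part of any library.)
fn bracket_field(log: &str, key: &str) -> Option<String> {
    let prefix = format!("[{key} \"");
    between(log, &prefix, "\"]").map(str::to_string)
}

/// Parses whatever fields can be found; absent fields stay `None`.
pub fn parse_log_entry(log: &str) -> LogEntry {
    LogEntry {
        client_ip: between(log, "[client ", "]").map(str::to_string),
        status_code: between(log, "with code ", " ").and_then(|s| s.parse().ok()),
        // Mirrors the regex in the issue, which captures the operator parameter.
        score: between(log, "with parameter `", "'").and_then(|s| s.parse().ok()),
        msg: bracket_field(log, "msg"),
        uri: bracket_field(log, "uri"),
        unique_id: bracket_field(log, "unique_id"),
    }
}
```

With this shape, a line that only contains `[client ...]` still yields an entry with `client_ip` set and everything else `None`, instead of the whole parse failing.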

Just some thoughts from my end! Would you be willing to make a PR for this?

@pszabop
Author

pszabop commented Dec 16, 2024

A bigger problem is that Rust is treating the callback as asynchronous, so combining the log message with local context is proving to be a nightmare. I will have to rewrite all my code to copy the context (which then gets copied again when serialized, ugh).

Is there some annotation we can add to the function signature(s) to mitigate this, or is this inherent in how Rust deals with call stacks and async (using tokio in this example)?
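
For reference, one common workaround when a callback closure has to own its captures is to have it write into a shared sink and attach the local context afterwards, rather than copying the context into the closure. The sketch below is only an illustration under assumptions: `Engine`, `set_log_callback`, and `process` are hypothetical stand-ins, not this crate's API, and the `'static + Send + Sync` bound is assumed to be what prevents borrowing locals.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a library type that stores a log callback.
/// The `'static` bound is an assumption: it forces the closure to own
/// its captures instead of borrowing locals from the caller.
struct Engine {
    log_cb: Option<Box<dyn Fn(&str) + Send + Sync + 'static>>,
}

impl Engine {
    fn new() -> Self {
        Engine { log_cb: None }
    }

    fn set_log_callback<F>(&mut self, cb: F)
    where
        F: Fn(&str) + Send + Sync + 'static,
    {
        self.log_cb = Some(Box::new(cb));
    }

    /// Pretends to process a request and emits one log line synchronously.
    fn process(&self, uri: &str) {
        if let Some(cb) = &self.log_cb {
            cb(&format!("Access denied [uri \"{uri}\"]"));
        }
    }
}

/// Runs one request and pairs every captured log line with local context
/// (here just a request id) after the engine has finished.
fn handle_request(request_id: u64, uri: &str) -> Vec<(u64, String)> {
    // Shared sink: the callback owns one handle, the caller keeps the other.
    let sink: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(Vec::new()));
    let sink_for_cb = Arc::clone(&sink);

    let mut engine = Engine::new();
    engine.set_log_callback(move |line| {
        sink_for_cb.lock().unwrap().push(line.to_string());
    });
    engine.process(uri);

    // Take the collected lines and attach the context without cloning it
    // into the closure.
    let lines = std::mem::take(&mut *sink.lock().unwrap());
    lines.into_iter().map(|l| (request_id, l)).collect()
}
```

The only thing moved into the closure is a cheap `Arc` handle; the request context stays on the caller's side and is joined with the log lines once processing returns. A `std::sync::mpsc` channel would work the same way if the lines need to cross threads.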
