Merge pull request #24 from hueristiq/dev
Development version 0.1.0
enenumxela authored May 18, 2023
2 parents b0e0d9c + f48b231 commit a852204
Showing 29 changed files with 964 additions and 863 deletions.
67 changes: 19 additions & 48 deletions README.md
@@ -2,7 +2,7 @@

![made with go](https://img.shields.io/badge/made%20with-Go-0000FF.svg) [![release](https://img.shields.io/github/release/hueristiq/xurlfind3r?style=flat&color=0000FF)](https://github.com/hueristiq/xurlfind3r/releases) [![license](https://img.shields.io/badge/license-MIT-gray.svg?color=0000FF)](https://github.com/hueristiq/xurlfind3r/blob/master/LICENSE) ![maintenance](https://img.shields.io/badge/maintained%3F-yes-0000FF.svg) [![open issues](https://img.shields.io/github/issues-raw/hueristiq/xurlfind3r.svg?style=flat&color=0000FF)](https://github.com/hueristiq/xurlfind3r/issues?q=is:issue+is:open) [![closed issues](https://img.shields.io/github/issues-closed-raw/hueristiq/xurlfind3r.svg?style=flat&color=0000FF)](https://github.com/hueristiq/xurlfind3r/issues?q=is:issue+is:closed) [![contribution](https://img.shields.io/badge/contributions-welcome-0000FF.svg)](https://github.com/hueristiq/xurlfind3r/blob/master/CONTRIBUTING.md)

`xurlfind3r` is a command-line interface (CLI) utility to fetch known URLs.
`xurlfind3r` is a command-line interface (CLI) utility to passively find a domain's known URLs from **[AlienVault's Open Threat Exchange](https://otx.alienvault.com/)**, **[Common Crawl](https://commoncrawl.org/)**, **[Github](https://github.com)**, **[Intelligence X](https://intelx.io)**, **[URLScan](https://urlscan.io/)**, and the **[Wayback Machine](https://archive.org/web/)**.

## Resource

@@ -14,22 +14,14 @@
* [`go build ...` the development Version](#go-build--the-development-version)
* [Post Installation](#post-installation)
* [Usage](#usage)
* [Examples](#examples)
* [Basic](#basic)
* [Regex filter URLs](#regex-filter-urls)
* [Include Subdomains' URLs](#include-subdomains-urls)
* [Contribution](#contribution)
* [Licensing](#licensing)

## Features

* Fetches known URLs:-
    * ... from **[AlienVault's OTX](https://otx.alienvault.com/)**, **[Common Crawl](https://commoncrawl.org/)**, **[URLScan](https://urlscan.io/)**, **[Github](https://github.com)**, **[Intelligence X](https://intelx.io)** and the **[Wayback Machine](https://archive.org/web/)**.
    * ... from parsing `robots.txt` snapshots on the Wayback Machine for disallowed paths.
* Reduces noise:-
    * ... by regex filtering URLs.
    * ... by removing duplicate pages, i.e. URL patterns that are likely repetitive and point to the same web template.
* Outputs to stdout, for piping, or to a file.
* Fetches known URLs from **[AlienVault's OTX](https://otx.alienvault.com/)**, **[Common Crawl](https://commoncrawl.org/)**, **[URLScan](https://urlscan.io/)**, **[Github](https://github.com)**, **[Intelligence X](https://intelx.io)** and the **[Wayback Machine](https://archive.org/web/)**.
* Parses URLs from `robots.txt` snapshots on the Wayback Machine.
* Parses URLs from webpage snapshots on the Wayback Machine.

## Installation

@@ -109,20 +101,19 @@ go install -v github.com/hueristiq/xurlfind3r/cmd/xurlfind3r@latest

## Post Installation

xurlfind3r will work after [installation](#installation). However, to configure xurlfind3r to work with certain services - currently GitHub - you will need to set up API keys. The API keys are stored in the `$HOME/.hueristiq/xurlfind3r/config.yaml` file - created upon first run - which uses the YAML format. Multiple API keys can be specified for each of these services.
`xurlfind3r` will work right after [installation](#installation). However, **[Github](https://github.com)** and **[Intelligence X](https://intelx.io)** require API keys to work. The API keys are stored in the `$HOME/.hueristiq/xurlfind3r/config.yaml` file - created upon first run - which uses the YAML format.

Example:

```yaml
version: 0.0.0
version: 0.1.0
sources:
- commoncrawl
- github
- intelx
- otx
- urlscan
- wayback
- waybackrobots
keys:
  github:
    - d23a554bbc1aabb208c9acfbd2dd41ce7fc9db39
@@ -133,68 +124,48 @@ keys:
## Usage
**DISCLAIMER:** fetching URLs from GitHub is a bit slow.
To display the help message for `xurlfind3r`, use the `-h` flag:

```bash
xurlfind3r -h
```

This will display help for the tool.
Help message:

```
_ __ _ _ _____
__ ___ _ _ __| |/ _(_)_ __ __| |___ / _ __
\ \/ / | | | '__| | |_| | '_ \ / _` | |_ \| '__|
> <| |_| | | | | _| | | | | (_| |___) | |
/_/\_\\__,_|_| |_|_| |_|_| |_|\__,_|____/|_| v0.0.0
/_/\_\\__,_|_| |_|_| |_|_| |_|\__,_|____/|_| v0.1.0

A CLI utility to fetch known URLs.
A CLI utility to find domain's known URLs.

USAGE:
xurlfind3r [OPTIONS]

INPUT:
TARGET:
-d, --domain string target domain
--include-subdomains bool include domain's subdomains

SOURCES:
--use-sources strings comma(,) separated sources to use
--exclude-sources strings comma(,) separated sources to exclude
--list-sources list all the available sources
--list-sources bool list available sources
-s --sources strings comma(,) separated sources to use (default: commoncrawl,github,intelx,otx,urlscan,wayback)

FILTER:
--include-subdomains include subdomains
-f, --filter string URL filtering regex
CONFIGURATION:
--skip-wayback-robots bool skip parsing wayback robots.txt snapshots
--skip-wayback-source bool skip parsing wayback source code snapshots

OUTPUT:
-m, --monochrome no colored output mode
-o, --output string output file to write found URLs
-v, --verbosity debug, info, warning, error, fatal or silent (default: info)
```
### Examples

#### Basic

```bash
xurlfind3r -d tesla.com
```

#### Regex filter URLs

```bash
xurlfind3r -d tesla.com -f ".(jpg|jpeg|gif|png|ico|css|eot|tif|tiff|ttf|woff|woff2)"
```

#### Include Subdomains' URLs

```bash
xurlfind3r -d tesla.com --include-subdomains
```
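
The `-f` regex-filter example above relies on a flag that no longer appears in the 0.1.0 help output. A comparable run with the new flag set might look like the following sketch (flag names are taken from the help text above; the source selection and output filename are only illustrations):

```bash
# find tesla.com's known URLs, including subdomains, restricted to two sources,
# and write the results to a file
xurlfind3r -d tesla.com --include-subdomains -s otx,wayback -o tesla.com.urls.txt
```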

## Contribution
[Issues](https://github.com/hueristiq/xurlfind3r/issues) and [Pull Requests](https://github.com/hueristiq/xurlfind3r/pulls) are welcome! Check out the [contribution guidelines.](./CONTRIBUTING.md)
[Issues](https://github.com/hueristiq/xurlfind3r/issues) and [Pull Requests](https://github.com/hueristiq/xurlfind3r/pulls) are welcome! Check out the [contribution guidelines](./CONTRIBUTING.md).
## Licensing
This utility is distributed under the [MIT license](./LICENSE)
This utility is distributed under the [MIT license](./LICENSE).
122 changes: 52 additions & 70 deletions cmd/xurlfind3r/main.go
@@ -2,8 +2,6 @@ package main

import (
"bufio"
"regexp"

"fmt"
"os"
"path/filepath"
@@ -15,10 +13,8 @@ import (
"github.com/hueristiq/hqgoutils/log/formatter"
"github.com/hueristiq/hqgoutils/log/levels"
"github.com/hueristiq/xurlfind3r/internal/configuration"
"github.com/hueristiq/xurlfind3r/pkg/runner"
"github.com/hueristiq/xurlfind3r/pkg/runner/collector"
"github.com/hueristiq/xurlfind3r/pkg/runner/collector/filter"
"github.com/hueristiq/xurlfind3r/pkg/runner/collector/sources"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources"
"github.com/imdario/mergo"
"github.com/logrusorgru/aurora/v3"
"github.com/spf13/pflag"
@@ -27,28 +23,27 @@
var (
au aurora.Aurora

listSources bool

domain string
sourcesToUse, sourcesToExclude []string
includeSubdomains bool
filterRegex string
output string

monochrome bool
verbosity string
domain string
includeSubdomains bool
listSources bool
sourcesToUse []string
skipWaybackRobots bool
skipWaybackSource bool
monochrome bool
output string
verbosity string
)

func init() {
// parse flags
pflag.StringVarP(&domain, "domain", "d", "", "target domain")
pflag.BoolVar(&includeSubdomains, "include-subdomains", false, "include subdomains")
pflag.StringVarP(&filterRegex, "filter", "f", "", "URL filtering regex")
pflag.StringSliceVar(&sourcesToUse, "use-sources", []string{}, "comma(,) separated sources to use")
pflag.StringSliceVar(&sourcesToExclude, "exclude-sources", []string{}, "comma(,) separated sources to exclude")
pflag.BoolVar(&listSources, "list-sources", false, "list all the available sources")
pflag.BoolVarP(&monochrome, "monochrome", "m", false, "no colored output mode")
pflag.StringVarP(&output, "output", "o", "", "output file")
// Handle command line arguments & flags
pflag.StringVarP(&domain, "domain", "d", "", "")
pflag.BoolVar(&includeSubdomains, "include-subdomains", false, "")
pflag.BoolVar(&listSources, "list-sources", false, "")
pflag.StringSliceVarP(&sourcesToUse, "sources", "s", sources.List, "")
pflag.BoolVar(&skipWaybackRobots, "skip-wayback-robots", false, "")
pflag.BoolVar(&skipWaybackSource, "skip-wayback-source", false, "")
pflag.BoolVarP(&monochrome, "monochrome", "m", false, "")
pflag.StringVarP(&output, "output", "o", "", "")
pflag.StringVarP(&verbosity, "verbosity", "v", string(levels.LevelInfo), "")

pflag.CommandLine.SortFlags = false
@@ -58,18 +53,17 @@ func init() {
h := "USAGE:\n"
h += " xurlfind3r [OPTIONS]\n"

h += "\nINPUT:\n"
h += "\nTARGET:\n"
h += " -d, --domain string target domain\n"
h += " --include-subdomains bool include domain's subdomains\n"

h += "\nSOURCES:\n"
h += " --use-sources strings comma(,) separated sources to use\n"
h += " --exclude-sources strings comma(,) separated sources to exclude\n"
h += " --list-sources list all the available sources\n"

h += "\nFILTER:\n"
h += " --list-sources bool list available sources\n"
h += " -s --sources strings comma(,) separated sources to use (default: commoncrawl,github,intelx,otx,urlscan,wayback)\n"

h += " --include-subdomains include subdomains\n"
h += " -f, --filter string URL filtering regex\n"
h += "\nCONFIGURATION:\n"
h += " --skip-wayback-robots bool skip parsing wayback robots.txt snapshots\n"
h += " --skip-wayback-source bool skip parsing wayback source code snapshots\n"

h += "\nOUTPUT:\n"
h += " -m, --monochrome no colored output mode\n"
@@ -81,43 +75,43 @@ func init() {

pflag.Parse()

// initialize logger
// Initialize logger
hqlog.DefaultLogger.SetMaxLevel(levels.LevelStr(verbosity))
hqlog.DefaultLogger.SetFormatter(formatter.NewCLI(&formatter.CLIOptions{
Colorize: !monochrome,
}))

// initialize configuration
// Handle configuration on initial run
var (
err error
conf configuration.Configuration
err error
config configuration.Configuration
)

_, err = os.Stat(configuration.ConfigurationFilePath)
if err != nil {
if os.IsNotExist(err) {
conf = configuration.Default
config = configuration.Default

if err = configuration.Write(&conf); err != nil {
if err = configuration.Write(&config); err != nil {
hqlog.Fatal().Msg(err.Error())
}
} else {
hqlog.Fatal().Msg(err.Error())
}
} else {
conf, err = configuration.Read()
config, err = configuration.Read()
if err != nil {
hqlog.Fatal().Msg(err.Error())
}

if conf.Version != configuration.VERSION {
if err = mergo.Merge(&conf, configuration.Default); err != nil {
if config.Version != configuration.VERSION {
if err = mergo.Merge(&config, configuration.Default); err != nil {
hqlog.Fatal().Msg(err.Error())
}

conf.Version = configuration.VERSION
config.Version = configuration.VERSION

if err = configuration.Write(&conf); err != nil {
if err = configuration.Write(&config); err != nil {
hqlog.Fatal().Msg(err.Error())
}
}
@@ -127,14 +121,6 @@ }
}

func main() {
var (
keys sources.Keys
regex *regexp.Regexp
ftr filter.Filter
clr *collector.Collector
rnr *runner.Runner
)

if verbosity != string(levels.LevelSilent) {
fmt.Fprintln(os.Stderr, configuration.BANNER)
}
@@ -144,8 +130,9 @@ func main() {
hqlog.Fatal().Msg(err.Error())
}

keys = config.GetKeys()
keys := config.GetKeys()

// Handle sources listing
if listSources {
hqlog.Info().Msgf("current list of the available %v sources", au.Underline(strconv.Itoa(len(config.Sources))).Bold())
hqlog.Info().Msg("sources marked with an * needs key or token")
@@ -170,33 +157,28 @@
os.Exit(0)
}

// Handle URLs finding
if verbosity != string(levels.LevelSilent) {
hqlog.Info().Msgf("`fetching urls for %v", au.Underline(domain).Bold())
hqlog.Info().Msgf("finding URLs for %v.", au.Underline(domain).Bold())

if includeSubdomains {
hqlog.Info().Msg("`--include-subdomains` used: includes subdomains' urls")
hqlog.Info().Msg("`--include-subdomains` used: includes subdomains' URLs.")
}

hqlog.Print().Msg("")
}

if filterRegex != "" {
regex = regexp.MustCompile(filterRegex)
options := &xurlfind3r.Options{
Domain: domain,
IncludeSubdomains: includeSubdomains,
Sources: sourcesToUse,
Keys: keys,
ParseWaybackRobots: !skipWaybackRobots,
ParseWaybackSource: !skipWaybackSource,
}

ftr = filter.Filter{
Domain: domain,
IncludeSubdomains: includeSubdomains,
ExcludeRegex: regex,
}

clr = collector.New(sourcesToUse, sourcesToExclude, keys, ftr)
rnr = runner.New(clr)

URLs, err := rnr.Run()
if err != nil {
hqlog.Fatal().Msg(err.Error())
}
finder := xurlfind3r.New(options)
URLs := finder.Find()

if output != "" {
directory := filepath.Dir(output)
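
The `main.go` changes above replace the old `runner`/`collector`/`filter` pipeline with the new `xurlfind3r` package (`xurlfind3r.Options`, `xurlfind3r.New`, `finder.Find()`). Below is a minimal standalone sketch of that API based only on what these hunks show; the output-handling hunk is truncated, so the assumption that `Find()` streams results with a `Value` field is mine, and `example.com` is a placeholder target.

```go
package main

import (
	"fmt"

	"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r"
)

func main() {
	// Options mirrors the fields populated in the new main.go.
	// Sources and Keys are omitted here; in main.go they come from the
	// command-line flags and the configuration file respectively.
	options := &xurlfind3r.Options{
		Domain:             "example.com", // placeholder target domain
		IncludeSubdomains:  true,
		ParseWaybackRobots: true,
		ParseWaybackSource: true,
	}

	finder := xurlfind3r.New(options)

	// Assumption: Find() returns a channel of results exposing a Value field;
	// the exact element type is not visible in the truncated hunk above.
	for URL := range finder.Find() {
		fmt.Println(URL.Value)
	}
}
```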
12 changes: 6 additions & 6 deletions go.mod
@@ -9,15 +9,15 @@ require (
github.com/logrusorgru/aurora/v3 v3.0.0
github.com/spf13/pflag v1.0.5
github.com/tomnomnom/linkheader v0.0.0-20180905144013-02ca5825eb80
github.com/valyala/fasthttp v1.44.0
github.com/valyala/fasthttp v1.47.0
gopkg.in/yaml.v3 v3.0.1
)

require (
github.com/andybalholm/brotli v1.0.4 // indirect
github.com/klauspost/compress v1.15.9 // indirect
github.com/andybalholm/brotli v1.0.5 // indirect
github.com/klauspost/compress v1.16.3 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
golang.org/x/net v0.7.0 // indirect
golang.org/x/sys v0.5.0 // indirect
golang.org/x/term v0.5.0 // indirect
golang.org/x/net v0.8.0 // indirect
golang.org/x/sys v0.6.0 // indirect
golang.org/x/term v0.6.0 // indirect
)