<h1 align="center">Shardz</h1>
<p align="center">
    <img src="./.screens/shardz.jpg">
</p>

Shardz is a lightweight C utility that shards *(splits)* the output of any process for distributed processing. It lets you spread a workload across multiple processes or machines by dealing the lines of an input stream out to a fixed number of shards of near-equal size.

## Use Cases
- Distributing large datasets across multiple workers
- Parallel processing of log files
- Load balancing input streams
- Splitting any line-based input for distributed processing

## Building & Installation

### Quick Build
```bash
gcc -o shardz shardz.c
```

### Using Make
```bash
# Build only
make

# Build and install system-wide (requires root/sudo)
sudo make install

# To uninstall
sudo make uninstall
```

## Usage
```bash
some_command | shardz INDEX/TOTAL
```

Where:
- `INDEX` is the shard number (starting from 1)
- `TOTAL` is the total number of shards
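
The `INDEX/TOTAL` argument format can be illustrated with a small C sketch. This is not the code from `shardz.c`. The helper names `read_number` and `parse_shard_spec` are hypothetical, and the validation rules (INDEX between 1 and TOTAL, no signs, spaces or trailing characters) are assumptions about what a reasonable parser would accept.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: read an unsigned decimal number at *p and advance
 * *p past it. A leading digit is required, so signs and spaces (which
 * strtoul would otherwise skip or accept) are rejected. Overflow is not
 * checked in this sketch. */
static bool read_number(const char **p, unsigned long *out)
{
    char *end;

    if (!isdigit((unsigned char)**p))
        return false;
    *out = strtoul(*p, &end, 10);
    *p = end;
    return true;
}

/* Hypothetical helper: parse "INDEX/TOTAL" (e.g. "2/3"). Succeeds only for
 * two numbers separated by a single '/', with nothing after them, and
 * 1 <= INDEX <= TOTAL. */
static bool parse_shard_spec(const char *spec, unsigned long *index,
                             unsigned long *total)
{
    const char *p = spec;

    if (!read_number(&p, index) || *p++ != '/')
        return false;
    if (!read_number(&p, total) || *p != '\0')
        return false;
    return *index >= 1 && *index <= *total;
}
```

A `main` function could call `parse_shard_spec(argv[1], &index, &total)` and print a usage message when it returns false.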

### Examples
Let's say you have a very large list of domains and you want to do recon on each domain. Using a single machine, this could take a very long time. However, you can split the workload across multiple machines:

- Machine number 1 would run:
```bash
curl https://example.com/datasets/large_domain_list.txt | shardz 1/3 | httpx -title -ip -tech-detect -json -o shard-1.json
```

- Machine number 2 would run:
```bash
curl https://example.com/datasets/large_domain_list.txt | shardz 2/3 | httpx -title -ip -tech-detect -json -o shard-2.json
```

- Machine number 3 would run:
```bash
curl https://example.com/datasets/large_domain_list.txt | shardz 3/3 | httpx -title -ip -tech-detect -json -o shard-3.json
```

## How It Works

Shardz uses a modulo operation to determine which lines should be processed by each shard. For example, with `3` total shards:
- Shard 1 processes lines 1, 4, 7, 10, ...
- Shard 2 processes lines 2, 5, 8, 11, ...
- Shard 3 processes lines 3, 6, 9, 12, ...

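With N input lines, each shard receives either ⌊N/TOTAL⌋ or ⌈N/TOTAL⌉ lines, so shard sizes never differ by more than one line. The split is even by line count, which balances the actual workload when each line takes roughly the same time to process.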
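
A minimal C sketch of the modulo selection described above, assuming 1-based line numbers and shard indices. It is not the actual `shardz.c` source. `shard_owns_line`, `shard_line_count` and `shard_stream` are hypothetical names, and reading through `fgets` with a fixed 4096-byte buffer is an assumed design choice.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: true when 1-based line number line_no belongs to shard
 * index/total. Shard 1 gets lines 1, 1+total, 1+2*total, ...; shard total
 * gets lines total, 2*total, ... (a round-robin deal by line number). */
static bool shard_owns_line(unsigned long line_no, unsigned long index,
                            unsigned long total)
{
    return (line_no - 1) % total == index - 1;
}

/* Hypothetical: number of lines shard index receives from n_lines input
 * lines. Every shard gets n_lines / total, and the first n_lines % total
 * shards get one extra, so sizes differ by at most one line. */
static unsigned long shard_line_count(unsigned long n_lines,
                                      unsigned long index,
                                      unsigned long total)
{
    return n_lines / total + (index <= n_lines % total ? 1 : 0);
}

/* Hypothetical: copy this shard's lines from in to out. fgets may split a
 * long line into several buffer-sized pieces, so the line counter only
 * advances after the piece containing the '\n' has been handled. */
static void shard_stream(FILE *in, FILE *out, unsigned long index,
                         unsigned long total)
{
    char buf[4096];
    unsigned long line_no = 1;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (shard_owns_line(line_no, index, total))
            fputs(buf, out);
        if (strchr(buf, '\n') != NULL)
            line_no++;
    }
}
```

For example, `shard_stream(stdin, stdout, 2, 3)` would print lines 2, 5, 8, ... of its input. `shard_line_count(10, 1, 3)` returns 4 because shard 1 gets lines 1, 4, 7 and 10.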

## Simplicity

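For what it's worth, the same functionality can be achieved with a bash function in your `.bashrc`:
```bash
shardz() {
	# Split the "INDEX/TOTAL" argument (e.g. "2/3") into its two parts
	local index="${1%/*}" total="${1#*/}"
	# Shard INDEX (1-based) keeps lines INDEX, INDEX+TOTAL, INDEX+2*TOTAL, ...
	awk -v n="$index" -v t="$total" '(NR - 1) % t == n - 1'
}
```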

```bash
cat domains.txt | shardz 1/3 | httpx -title -ip -tech-detect -json -o shard-1.json
cat domains.txt | shardz 2/3 | httpx -title -ip -tech-detect -json -o shard-2.json
cat domains.txt | shardz 3/3 | httpx -title -ip -tech-detect -json -o shard-3.json
```

This was just a fun little project to brush up on my C and to explore what it takes to get a package added to Linux distribution package repositories.

---

###### Mirrors: [acid.vegas](https://git.acid.vegas/shardz) • [SuperNETs](https://git.supernets.org/acidvegas/shardz) • [GitHub](https://github.com/acidvegas/shardz) • [GitLab](https://gitlab.com/acidvegas/shardz) • [Codeberg](https://codeberg.org/acidvegas/shardz)