<h1 align="center">Shardz</h1>
<p align="center">
	<img src="./.screens/shardz.jpg">
</p>

Shardz is a lightweight C utility that shards *(splits)* the output of any process for distributed processing. It lets you distribute a workload across multiple processes or machines by splitting a line-based input stream into evenly sized shards.

## Use Cases
- Distributing large datasets across multiple workers
- Parallel processing of log files
- Load balancing input streams
- Splitting any line-based input for distributed processing

## Building & Installation

### Quick Build
```bash
gcc -o shardz shardz.c
```

### Using Make
```bash
# Build only
make

# Build and install system-wide (requires root/sudo)
sudo make install

# To uninstall
sudo make uninstall
```

## Usage
```bash
some_command | shardz INDEX/TOTAL
```

Where:
- `INDEX` is the shard number (starting from 1)
- `TOTAL` is the total number of shards

### Examples
Let's say you have a very large list of domains and you want to do recon on each domain. On a single machine, this could take a very long time. Instead, you can split the workload across multiple machines:

- Machine number 1 would run:
```bash
curl https://example.com/datasets/large_domain_list.txt | shardz 1/3 | httpx -title -ip -tech-detect -json -o shard-1.json
```

- Machine number 2 would run:
```bash
curl https://example.com/datasets/large_domain_list.txt | shardz 2/3 | httpx -title -ip -tech-detect -json -o shard-2.json
```

- Machine number 3 would run:
```bash
curl https://example.com/datasets/large_domain_list.txt | shardz 3/3 | httpx -title -ip -tech-detect -json -o shard-3.json
```

## How It Works

Shardz uses a modulo operation on the line number to determine which lines each shard processes. For example, with `3` total shards:
- Shard 1 processes lines 1, 4, 7, 10, ...
- Shard 2 processes lines 2, 5, 8, 11, ...
- Shard 3 processes lines 3, 6, 9, 12, ...

This spreads the workload evenly: the shard sizes differ by at most one line.

## Simplicity

For what it's worth, the same functionality can be achieved with a bash function in your `.bashrc`:
```bash
shardz() {
	# Split the "INDEX/TOTAL" argument (e.g. 1/3) into n and t, then keep
	# every line whose 1-based number maps to shard n.
	awk -v n="${1%/*}" -v t="${1#*/}" '(NR - 1) % t == n - 1'
}
```

```bash
cat domains.txt | shardz 1/3 | httpx -title -ip -tech-detect -json -o shard-1.json
cat domains.txt | shardz 2/3 | httpx -title -ip -tech-detect -json -o shard-2.json
cat domains.txt | shardz 3/3 | httpx -title -ip -tech-detect -json -o shard-3.json
```

This was just a fun little project to brush up on my C and to explore the requirements for getting a package added to Linux package manager repositories.

---

###### Mirrors: [acid.vegas](https://git.acid.vegas/shardz) • [SuperNETs](https://git.supernets.org/acidvegas/shardz) • [GitHub](https://github.com/acidvegas/shardz) • [GitLab](https://gitlab.com/acidvegas/shardz) • [Codeberg](https://codeberg.org/acidvegas/shardz)
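
The modulo rule described under How It Works can be sketched in C roughly as follows. This is a minimal illustration, not the contents of `shardz.c`: the function names `parse_shard_spec` and `shard_stream` are hypothetical, and the real source may parse arguments and read input differently.

```c
#include <stdio.h>

/* Hypothetical helper: parse a spec such as "2/3" into index and total.
 * Returns 1 on success, 0 if the spec is malformed or out of range. */
int parse_shard_spec(const char *spec, long *index, long *total)
{
    char trailing;

    /* Exactly two fields must convert; a third match means trailing junk. */
    if (sscanf(spec, "%ld/%ld%c", index, total, &trailing) != 2)
        return 0;
    return *total >= 1 && *index >= 1 && *index <= *total;
}

/* Hypothetical helper: copy to `out` only the lines of `in` that belong to
 * shard `index` of `total`. Lines are numbered from 1, so shard 1 of 3 gets
 * lines 1, 4, 7, ... Returns the number of lines kept. */
unsigned long shard_stream(FILE *in, FILE *out, long index, long total)
{
    unsigned long line_no = 0;  /* 1-based number of the current line */
    unsigned long kept = 0;
    int at_line_start = 1;
    int keep = 0;
    int c;

    /* Reading byte by byte avoids any limit on line length. */
    while ((c = getc(in)) != EOF) {
        if (at_line_start) {
            line_no++;
            keep = (line_no - 1) % (unsigned long)total
                   == (unsigned long)(index - 1);
            if (keep)
                kept++;
            at_line_start = 0;
        }
        if (keep)
            putc(c, out);
        if (c == '\n')
            at_line_start = 1;
    }
    return kept;
}
```

Under these assumptions, a `main` would pass `argv[1]` to `parse_shard_spec` and, on success, call `shard_stream(stdin, stdout, index, total)`.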