Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 43 |
1 files changed, 43 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..cf15bb4
--- /dev/null
+++ b/README.md
@@ -0,0 +1,43 @@
+# CS172 Group 10 Project
+
+## Scraper Instructions
+
+To run the scraper, run the following script:
+
+```bash
+./scraper.sh
+```
+
+This will run the scraper with the default settings and create a CSV file titled “computerscience_data.csv”
+in the current working directory.
+
+The scraper can be configured with environment variables like so:
+
+```bash
+SCRAPY_MAX_CONCURRENT_REQUESTS="16" SCRAPY_MAX_FILE_SIZE_GB="1" ./scraper.sh
+```
+
+The following environment variables exist:
+
+- `SCRAPY_MAX_FILE_SIZE_GB`
+  - Maximum file size in GB before scraper is closed
+  - Default: 0.5
+- `SCRAPY_MAX_CONCURRENT_REQUESTS`
+  - Maximum number of concurrent requests
+  - Default: 8
+- `SCRAPY_MAX_REQUESTS_PER_DOMAIN`
+  - Maximum concurrent requests per domain
+  - Default: 4
+- `SCRAPY_OUTPUT_FILE`
+  - Output CSV File
+  - Default: “computerscience_data.csv”
+
+## Group Members
+
+| Name                     | SID       | NID      |
+|--------------------------|-----------|----------|
+| Nikhil Anand Mahendrakar | 862464249 | nmahe008 |
+| Anshul Gupta             | 862319580 | agupt109 |
+| Ishaan Bijor             | 862128714 | ibijo001 |
+| Junbo Yang               | 862234040 | jyang389 |
+| Junyan Hou               | 862394589 | jhou038  |
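
The README added in this diff lists four environment variables and their defaults. The sketch below shows one way to preview the values a run would use, with standard shell parameter expansion falling back to the documented defaults. It is an illustration only: the function name `resolve_settings` is hypothetical and not part of the repository, and it assumes `scraper.sh` simply inherits these variables from its environment.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repository): print the effective value
# of each variable documented in the README, using the documented default
# whenever the variable is unset or empty.
resolve_settings() {
  echo "SCRAPY_MAX_FILE_SIZE_GB=${SCRAPY_MAX_FILE_SIZE_GB:-0.5}"
  echo "SCRAPY_MAX_CONCURRENT_REQUESTS=${SCRAPY_MAX_CONCURRENT_REQUESTS:-8}"
  echo "SCRAPY_MAX_REQUESTS_PER_DOMAIN=${SCRAPY_MAX_REQUESTS_PER_DOMAIN:-4}"
  echo "SCRAPY_OUTPUT_FILE=${SCRAPY_OUTPUT_FILE:-computerscience_data.csv}"
}

resolve_settings
```

Prefixing the call with an assignment, as in `SCRAPY_MAX_FILE_SIZE_GB="1" resolve_settings`, overrides a value for that call only, mirroring the invocation style the README shows for `./scraper.sh`.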