Diffstat (limited to 'README.md')
-rw-r--r-- README.md 43
1 files changed, 43 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..cf15bb4
--- /dev/null
+++ b/README.md
@@ -0,0 +1,43 @@
+# CS172 Group 10 Project
+
+## Scraper Instructions
+
+To start the scraper, run the following script:
+
+```bash
+./scraper.sh
+```
+
+This runs the scraper with the default settings and creates a CSV file named `computerscience_data.csv`
+in the current working directory.
+
+The scraper can be configured by setting environment variables on the command line:
+
+```bash
+SCRAPY_MAX_CONCURRENT_REQUESTS="16" SCRAPY_MAX_FILE_SIZE_GB="1" ./scraper.sh
+```
+
+The following environment variables are supported:
+
+- `SCRAPY_MAX_FILE_SIZE_GB`
+  - Maximum file size in GB before the scraper is closed
+  - Default: `0.5`
+- `SCRAPY_MAX_CONCURRENT_REQUESTS`
+  - Maximum number of concurrent requests
+  - Default: `8`
+- `SCRAPY_MAX_REQUESTS_PER_DOMAIN`
+  - Maximum number of concurrent requests per domain
+  - Default: `4`
+- `SCRAPY_OUTPUT_FILE`
+  - Output CSV file
+  - Default: `computerscience_data.csv`
+
+## Group Members
+
+| Name | SID | NID |
+|--------------------------|-----------|----------|
+| Nikhil Anand Mahendrakar | 862464249 | nmahe008 |
+| Anshul Gupta | 862319580 | agupt109 |
+| Ishaan Bijor | 862128714 | ibijo001 |
+| Junbo Yang | 862234040 | jyang389 |
+| Junyan Hou | 862394589 | jhou038 |
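
The defaults listed under "Scraper Instructions" can be illustrated with a short shell sketch. This is an assumption about how a wrapper script could apply them, not the actual contents of `scraper.sh`, which this diff does not show. It uses only the variable names and default values documented in the README.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real scraper.sh: fall back to the documented
# default whenever a variable is unset or empty.
: "${SCRAPY_MAX_FILE_SIZE_GB:=0.5}"
: "${SCRAPY_MAX_CONCURRENT_REQUESTS:=8}"
: "${SCRAPY_MAX_REQUESTS_PER_DOMAIN:=4}"
: "${SCRAPY_OUTPUT_FILE:=computerscience_data.csv}"

# Export the values so a child process (such as the scraper) can read them.
export SCRAPY_MAX_FILE_SIZE_GB SCRAPY_MAX_CONCURRENT_REQUESTS \
       SCRAPY_MAX_REQUESTS_PER_DOMAIN SCRAPY_OUTPUT_FILE

# Print the effective configuration for inspection.
echo "file size limit (GB): ${SCRAPY_MAX_FILE_SIZE_GB}"
echo "concurrent requests:  ${SCRAPY_MAX_CONCURRENT_REQUESTS}"
echo "requests per domain:  ${SCRAPY_MAX_REQUESTS_PER_DOMAIN}"
echo "output file:          ${SCRAPY_OUTPUT_FILE}"
```

With this pattern, a value supplied on the command line (as in the README's `SCRAPY_MAX_FILE_SIZE_GB="1" ./scraper.sh` example) takes precedence, because `:=` assigns only when the variable is unset or empty.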