I made a python script to scrape threads.net :) (lemmy.ca)

submitted 9 months ago* (last edited 9 months ago) by kionite231@lemmy.ca to c/python@programming.dev

3 comments fedilink hide all child comments

Hello,

I made a simple script to scraper threads.net using python and selenium. the script is just few lines long and it's easy to understand.

So what this script does?

first it will open edge browser(which you can change it to firefox or chrome). now you have to enter credentials to log into it. your browsing data and credentials will be stored in user_data which you can move around.

It scroll through threads's feed/hashtag/explore and It will store the src of every image it encounters so at the end we will have a links.txt file containing all the links to the images we have encountered.

now we have links.txt and we can use the following command to download all the images from the links.txt

wget -i links.txt

the script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.options import Options
import time

options = Options()
options.add_argument("--user-data-dir=user_data")

driver = webdriver.Edge(options=options)

driver.get('https://threads.net')

s = set()

input("Press any key to continue...")
for i in range(30):
    try:
        elements = driver.find_elements(By.XPATH, "//img")
        for e in elements:
            s.add(e.get_attribute("src"))
        driver.execute_script("window.scrollBy(0, 1000);")
        time.sleep(0.2)
    except:
        print("oopsie")

with open("links.txt", 'w') as f:
    links = list(s)
    for l in links:
        f.write(l+"\n")

driver.quit()

I hope it was usefull :D

Edit: here is a link to links.txt https://0x0.st/HGjx.txt

you are viewing a single comment's thread
view the rest of the comments

[-] conorm@feddit.uk 1 points 9 months ago

nice code, could you share the asm for all that?