A New App! A Technical Internet Utility
This is an internet-crawling, HTML-scraping Android app. Crawl throughout the internet, from website to website, scraping here and scraping there, to be wherever, to find whatever there is to be found,
to see what you can see…
Celtic dragons, symbolic of power and wisdom,
protect the Earth and all living things.
CrawlNScrape: Ethical Internet-Crawler & HTML-Scraper
Crawl Wherever! Scrape Whatever!
What is CrawlNScrape?
CrawlNScrape [Crawl And Scrape] is an Android app for crawling through the internet: following links from website to website, peering in here and there, and getting an introduction to ethical internet crawling and HTML scraping along the way. Go wherever you go, find whatever you may come across, see what there is to be seen. Yes, this is a true crawl through unfamiliar, and perhaps unknown, facets of the internet.
As an internet crawler and HTML scraper, CrawlNScrape lets you visit arbitrary websites and extract whatever data may be found there: technical details of the HTML code, images, icons, author, description, keywords, Meta Data, Forms Data, Media, and especially IP addresses, geoLocations and links - above all, links to other websites!
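To make the idea of scraping concrete, here is a minimal sketch in Python of pulling meta data and links out of a page and picking out the offsite links. It is not how CrawlNScrape itself is built; the class name `PageScraper`, the helper `offsite_links` and the sample HTML are all made up for illustration, and only the Python standard library is assumed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageScraper(HTMLParser):
    """Collect <meta> name/content pairs and absolute link targets from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.meta = {}    # e.g. {"author": "...", "keywords": "..."}
        self.links = []   # absolute URLs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))


def offsite_links(base_url, links):
    """Keep only links whose host differs from the page's own host."""
    home = urlparse(base_url).netloc
    return [u for u in links if urlparse(u).netloc not in ("", home)]


# Hypothetical page content, used instead of a live download.
SAMPLE_HTML = """
<html><head>
  <meta name="author" content="Jane Doe">
  <meta name="keywords" content="dragons, crawling">
</head><body>
  <a href="/about.html">About</a>
  <a href="https://other.example.org/start">Elsewhere</a>
</body></html>
"""

scraper = PageScraper("https://www.example.com/index.html")
scraper.feed(SAMPLE_HTML)
print(scraper.meta)
print(offsite_links(scraper.base_url, scraper.links))
```

The offsite links are the interesting ones for crawling, since they are what carry you from one website to the next.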
Unlike a traditional Web Crawler, with CrawlNScrape the web crawling is under your control. A typical web crawler such as Googlebot is given a list of “seed sites” and turned loose to crawl and scrape. With CrawlNScrape, you are the bot and CrawlNScrape is your tool for crawling and scraping. You control the choice of seed site, which sites you visit and what data you scrape. For ideas about choosing seed sites, consider a Google search for “seed websites” or for any topic that interests you.
If you are interested in internet crawling and HTML scraping, you should enjoy working with this app. It can be tedious at first: until you become familiar with how to Select | Copy | Paste on your device and how to use technical parts of the app such as the Stack, until you settle into the pace of crawling (this is not a road race!), and until you discover which websites are “good seeds” for your particular interests - preferably those with many offsite links.
A note concerning Ethical Web Scraping…
A web crawler should respect the rules set in a site’s robots.txt file and avoid frequent visits to any single website. CrawlNScrape gives you the tools to work this way. HTML scraping is like any other tool: you can use it for good and you can use it for bad. That web scraping itself is not illegal doesn’t mean you can scrape any site you want. Some sites explicitly prohibit any sort of automated data extraction, either in the robots.txt file or on their Terms of Service page. You can usually find a site’s Terms of Service page with a browser and a Google search. CrawlNScrape lets you download and study the robots.txt file, so you can choose to visit or not visit individual sites, and to scrape or not scrape particular folders and files, as appropriate.
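For readers who want to see what checking a robots.txt file looks like in practice, the sketch below uses Python’s standard `urllib.robotparser` module on a made-up robots.txt. The rules, the URLs and the user-agent name `MyCrawler` are assumptions chosen for the example; a real file would be downloaded from the site you intend to visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at https://<site>/robots.txt
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)
agent = "MyCrawler"  # hypothetical user-agent name

# May this agent fetch these pages?
print(rules.can_fetch(agent, "https://www.example.com/index.html"))
print(rules.can_fetch(agent, "https://www.example.com/private/notes.html"))

# How many seconds does the site ask crawlers to wait between visits?
print(rules.crawl_delay(agent))
```

A `Disallow` line tells you which folders to leave alone, and a `Crawl-delay` line, where present, tells you how long to wait between visits. Neither replaces reading the site’s Terms of Service.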
Crawling and Scraping the Deeper Web!
With Meta Quest you can collect, in a special data structure called the Stack, the URLs of pages whose HTML code and data you may want to extract. With Deeper Crawling the idea is to search any web page for links, especially links to other web pages or websites in other countries, then explore those pages and sites for further links to other websites, to yet other countries, to wherever. Then continue, deeper and deeper, through the World Wide Web.
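The interplay between a stack of URLs and deeper crawling can be sketched in a few lines of Python. How CrawlNScrape implements its Stack is not described here, so everything below is an assumption for illustration: the `CrawlStack` class, the `crawl` function and the tiny `FAKE_WEB` dictionary that stands in for real pages and their links.

```python
from urllib.parse import urlparse


class CrawlStack:
    """A last-in, first-out list of URLs that never stores the same URL twice."""

    def __init__(self):
        self._items = []
        self._seen = set()

    def push(self, url):
        if url in self._seen:
            return False          # already collected once; skip it
        self._seen.add(url)
        self._items.append(url)
        return True

    def pop(self):
        return self._items.pop() if self._items else None

    def __len__(self):
        return len(self._items)


# Hypothetical miniature web: page URL -> links found on that page.
FAKE_WEB = {
    "https://a.example/": ["https://a.example/about", "https://b.example/"],
    "https://b.example/": ["https://c.example/", "https://a.example/"],
    "https://c.example/": [],
}


def crawl(seed, max_pages=10):
    """Visit pages depth-first from the seed, following only offsite links."""
    stack = CrawlStack()
    stack.push(seed)
    visited = []
    while len(stack) and len(visited) < max_pages:
        url = stack.pop()
        visited.append(url)
        for link in FAKE_WEB.get(url, []):
            # Only links to a different host lead deeper into the web.
            if urlparse(link).netloc != urlparse(url).netloc:
                stack.push(link)
    return visited


print(crawl("https://a.example/"))
```

Remembering which URLs have already been pushed keeps the crawl from going in circles when two sites link to each other, and the `max_pages` limit keeps any single crawl short.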
From the opening view, CrawlNScrape offers practical, introductory lessons to get you started. You can also exit to any other app, such as Google Maps, Google Search, a text editor or your favorite browser, then return to CrawlNScrape with your “breadcrumbs” intact in the Stack. So you can go wherever there is a place to go and explore whatever is to be found there, confident that you can get back there again.
Download CrawlNScrape from Google Play Store
First, before you run the app, you will want to get a free, private API_KEY from https://ipgeolocation.io/. This will allow your copy of CrawlNScrape to retrieve from the internet the geoLocation that corresponds to each IP address that it finds. An illustrated set of steps to follow to get your free, personal API_KEY is here…
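To show roughly what an API_KEY is used for, here is a small Python sketch that builds a geoLocation lookup request for one IP address. The endpoint `https://api.ipgeolocation.io/ipgeo` and the parameter names `apiKey` and `ip` are assumptions to be checked against the service’s own documentation, and the function `build_lookup_url` and the environment variable `IPGEO_API_KEY` are hypothetical. The sketch only builds the URL; it does not contact the service.

```python
import os
from urllib.parse import urlencode

# Assumed endpoint and parameter names; confirm them in the service's documentation.
IPGEO_ENDPOINT = "https://api.ipgeolocation.io/ipgeo"


def build_lookup_url(ip_address, api_key):
    """Return the request URL for looking up one IP address."""
    if not api_key:
        raise ValueError("an API key is required")
    query = urlencode({"apiKey": api_key, "ip": ip_address})
    return f"{IPGEO_ENDPOINT}?{query}"


# Read the key from the environment so it never has to appear in the code itself.
api_key = os.environ.get("IPGEO_API_KEY", "YOUR_API_KEY")
print(build_lookup_url("8.8.8.8", api_key))
```

Because the key is private, it is worth keeping it out of anything you share, which is why the sketch reads it from the environment rather than writing it into the code.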
Second, a preview is available right now…
This introductory Crawl begins with an overview of the CrawlNScrape menu options so you gain an understanding of the app structure and flow, including installing the free API_KEY. It then starts a crawl at https://www.example.com in Phoenix, Arizona, United States and tours throughout the internet to Stockholm, Stockholm County, Sweden. Click here to get started…
Lastly, from your Android device, click here to download CrawlNScrape…
… and continue this tour through Stockholm, Stockholm County, Sweden; London, England, UK; Dublin, Leinster, Ireland; and, well, to wherever…
… to see what you can see