prevent download entire website
I'm not an expert, but I was able to create a website that is currently online. I have a lot of files in subdirectories that are linked from the main page, and it works fine.
I got a "download entire website" program to test whether I could download everything just by entering my website name, and it succeeded.
What I want to do is let people download from the website but block access by "download entire website" type programs.
The purpose of the Internet is to publish information so that others can view it. So, the short answer is that you cannot prevent people and scripts/programs from following the links on your pages and saving the HTML/CSS/JavaScript/images. Also see this link - http://www.programmingtalk.com/showthread.php?t=21985
A script/program that indexes all the pages on your site simply starts at the root document, extracts all the links and any CSS/JavaScript/images, and saves them. It then visits each link it found that is within your domain and repeats the process, the same way that a visitor could follow all the links he sees on your pages.
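To make the crawling step described above concrete, here is a minimal sketch in Python of the "extract the links, keep the ones in your domain" part. It works on an HTML string and makes no network requests. The names `LinkExtractor` and `same_site_links` and the example.com URLs are made up for illustration and are not taken from any particular site-ripper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects URLs from href/src attributes (links, CSS, scripts, images)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative paths against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


def same_site_links(html, base_url):
    """Return the unique URLs found in html that stay on base_url's host."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen = []
    for url in parser.urls:
        if urlparse(url).netloc == host and url not in seen:
            seen.append(url)
    return seen


# Hypothetical page content, used only to exercise the function.
page = (
    '<a href="/files/a.zip">A</a> <img src="img/logo.png"> '
    '<a href="http://other.example/x">X</a>'
)
print(same_site_links(page, "http://www.example.com/index.html"))
```

A real downloader would fetch each returned URL, save it, and feed any HTML it gets back into the same function until no new URLs turn up.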
End User
07-27-06, 01:55 PM
This is for all intents and purposes impossible to do. You can make it more difficult but there isn't really anything you can do to prevent people from copying your site. There are some limitations you can put in place, but if the information is available to a browser then it can be harvested.
I want to make it as difficult as possible. What could I do?
End User
07-28-06, 12:25 AM
1) Serve your content dynamically,
2) vary the output dynamically so it's difficult to scrape programmatically,
3) randomly inject and/or generate inconsistent internal page structures,
4) institute time- and page-based limits or caps,
5) use an exclusionary robots.txt file,
6) serve images dynamically, renaming them dynamically as well,
7) make use of htaccess as widely as possible.
This will stop or slow down a lot of folks. It wouldn't stop me, though. If I wanted your content badly enough, I'd get it.
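As an illustration of item 4 (time- and page-based limits), here is a rough Python sketch of a per-client sliding-window limiter. The class name `RateLimiter`, the limit of 3 requests per 10 seconds, and the client address are all assumptions chosen for the example. A real site would wire something like this into its web server or application framework.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allows at most max_requests per client within a sliding time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client id -> timestamps of that client's recent allowed requests
        self.hits = defaultdict(deque)

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        recent = self.hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


# Hypothetical client making requests at t = 0, 1, 2, 3 and 12 seconds.
limiter = RateLimiter(max_requests=3, window_seconds=10)
decisions = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 12)]
print(decisions)  # [True, True, True, False, True]
```

As the post says, this slows a harvester down rather than stopping it. A patient script can simply stay under the cap or spread its requests across several addresses.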
As the other posters already said, the point of publishing is that people get your stuff.
What are you afraid of?
If it's the design: forget it. If somebody gets just one page, your design is gone.
If it's the information, then I don't know why you publish it in the first place.
If it's the bandwidth, then you should throttle your Apache server, if your provider allows that, but page access will get slower!
You could also put in a robots.txt that disallows spidering of pictures etc., but make sure Googlebot can still see your site (and note that not all spidering programs respect robots.txt).
Happy Coding!
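For the robots.txt suggestion, a small sketch using Python's standard `urllib.robotparser` shows how a well-behaved crawler would interpret such a file. The rules, the `/images/` and `/files/` paths, and the "SomeSiteRipper" user agent are invented for the example. Nothing here is enforced by the server: a program that ignores robots.txt is not affected at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may fetch everything,
# every other crawler is asked to stay out of two directories.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /images/
Disallow: /files/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

image_url = "http://www.example.com/images/house.jpg"
print(parser.can_fetch("Googlebot", image_url))       # True
print(parser.can_fetch("SomeSiteRipper", image_url))  # False
print(parser.can_fetch("SomeSiteRipper", "http://www.example.com/index.html"))  # True
```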
curbview.com
07-28-06, 04:12 AM
I read with great interest to see if anyone was going to give you the best solution, but I have yet to see someone even mention it. There are two ways to prevent programs like that from running. Run your application on one of my sites to test what I mean. (http://www.curbview.com) Tell me how many pages you are able to download :)
If your pages are served via Perl or PHP, simply "turn off" any request that is not using GET or POST. The second method is to turn your links into a form that your browser can interpret but that is very difficult for spiders. (This will prevent most search engines from viewing your site, but you didn't mention that you are concerned about that.) Take the following *code* and place it in an HTML document. Then view it. Do you see a difference? **Remove the spaces in the line below when you have pasted it in your HTML document.
info & # 64 ; curbview.com
Use the same method for your links and cock your feet up on the desk while you laugh at the bots... :)
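For readers weighing this technique: `&#64;` is a standard HTML numeric character reference for "@", and the same decoding a browser performs is available to any script. A short Python sketch, assuming a made-up link path `/listings/page2.html` encoded the same way, shows what a harvesting program would see after one standard-library call:

```python
import html
import re

# Hypothetical markup with "/" and "@" written as numeric character references.
encoded = 'Contact: <a href="&#47;listings&#47;page2.html">info&#64;curbview.com</a>'

# html.unescape applies the same entity decoding a browser does.
decoded = html.unescape(encoded)
print(decoded)

# After decoding, the link can be pulled out like any other href.
hrefs = re.findall(r'href="([^"]+)"', decoded)
print(hrefs)
```

So the encoding may trip up naive tools that match raw text, but it is unlikely to hold against anything that parses HTML properly.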
Error:
195.Red-88-15-227.dynamicIP.rima-tde.net | 88.15.227.195 has been banned
I can't access your site. It says I'm banned. lol
Me too, I don't even see the entry page.
Of course, you can forbid Apache to serve anyone, but what's the point?
curbview.com: Good downloaders such as wget have no problems providing POST variables, so your method is useless against this tool.
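To illustrate the point about POST variables: wget has a `--post-data` option, and any scripting language can do the same. Below is a minimal Python sketch that builds (but does not send) such a request. The URL, form field names, and User-Agent string are placeholders invented for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields a page might expect in a POST body.
form_fields = {"page": "2", "city": "Springfield"}

request = Request(
    "http://www.example.com/listings",
    data=urlencode(form_fields).encode("ascii"),
    headers={"User-Agent": "Mozilla/5.0 (compatible; example)"},
    method="POST",
)

# The request is only constructed here; nothing is sent over the network.
print(request.get_method())  # POST
print(request.data)          # b'page=2&city=Springfield'
```

Since a script can use GET and POST just as a browser does, allowing only those two methods does not by itself tell a downloader apart from a visitor.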
curbview.com
07-28-06, 06:33 AM
Things work like I want them to :)
Here, try your repeated 28 requests again. :) :)
curbview.com
07-28-06, 06:39 AM
wget have no problems providing post variables, so your method is useless against this tool.
I guess that is why you don't see the entry page... Doesn't work eh? yea right... :)
curbview.com
07-28-06, 06:41 AM
I guess that is why you don't see the entry page... Doesn't work eh? yea right... :)
I see you though:
<< snip >>
I guess that is why you don't see the entry page... Doesn't work eh? yea right... :)
Well, I just looked at it using my normal browser.
To the original poster: The point made in the above exchange of posts is - if you want to make your site usable and viewable to human visitors (which includes the majority of the people on this planet :p ), don't be paranoid and waste your time trying to make it dysfunctional. You will just end up losing customers who cannot view it for one reason or another (you cannot test for every possible eventuality).
curbview.com
07-28-06, 01:09 PM
What point was made? Duesi can see it with a normal browser, you did too. I have yet to have someone say that a "site - ripper" was able to download the site. My method works, search Google, Yahoo... I actually had to ban both for hitting my site so hard. SIMPLY SEARCH THEM. And before you say it, I still have them visiting the site just like "other" people on the planet. Here is a live snapshot of visitors (mind ya, my site is not even open yet):
<< snip >>
End User
07-28-06, 05:29 PM
If your pages are served via Perl or PHP, simply "turn off" any request that is not using GET or POST. The second method is to turn your links into a language that your browser can interpret, but very difficult for spiders.
Trust me, neither of these techniques would prevent me from harvesting your site if I wanted the data.
curbview.com
07-28-06, 06:01 PM
I hear a lot of chatter but no data... PROVE ME WRONG... I KNOW YOU WANT TO :) but the proof has YET to be seen. :)
End User
07-28-06, 07:03 PM
I hear a lot of chatter but no data... PROVE ME WRONG... I KNOW YOU WANT TO :)
Whatever. I have no interest in proving you wrong. I'm also old enough so that I don't feel compelled to rise to childish "bet you can't do it" challenges like this.
Honestly though, if you think you can put data on the web without someone being able to harvest it, you're considerably more foolish and inexperienced than I thought, lol.
If you want to feel secure, be my guest.... but like I said, if I wanted your data, I'd get it. I just don't want it. Real estate doesn't interest me. Clients pay me to harvest data from sites like yours all the time, but I don't do it for free or just to prove a point. If someone paid me, I'd have your site's data completely collected in a few days, possibly less. :)
curbview.com
07-28-06, 07:20 PM
MY POINT HAS BEEN PROVEN!!! (But I knew it would) Case closed... If you have any other mindless chatter, please send it privately. Otherwise, for those that need a solution, feel free to post "constructive" questions and don't believe the hype of naysayers that protecting your site is not possible.
End User
07-28-06, 09:43 PM
MY POINT HAS BEEN PROVEN!!!
If this is what you call "proving a point", I'd suggest that you not plan on pursuing a career in law. But your enthusiasm on the subject is adorable.
If you have any other mindless chatter, please send it privately.
I'll certainly defer to you on the subject of "mindless chatter", as you seem to be quite the expert. :)
Otherwise, for those that need a solution, feel free to post "constructive" questions and don't believe the hype of naysayers that protecting your site is not possible.
Well, I would caution people not to rely upon anyone who insists that site content can't be taken. It can be, regardless of what the less informed among us claim. :)
Of course, if you're so sure of yourself, how about you and I each put some worthwhile amount (perhaps $10,000?) in a trusted escrow.... I'll have some mutually agreed-upon time to reproduce your site's content. If I can extract your site's content I get your $10K. If I can't, you'll get mine.
Since you seem to be so certain that it's not possible for me to do this then you should have nothing to fear, right? If you can't afford $10K, we might agree on a lesser sum, but I won't do it for peanuts.
Like I said, I do this sort of thing all the time for paying clients, and I haven't missed a house payment yet. :)
Closed for obvious reason.