Book-bot.com - Children's Internet Protection Act (CIPA) Ruling by United States District Court For The Eastern District Of Pennsylvania

content. If a Web page or site is not linked by others, then
spidering will not discover that page or site.
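
The dependence of spidering on links can be illustrated with a minimal
sketch. It assumes a toy, in-memory link graph in place of real Web
requests; the function name `spider`, the dictionary `toy_web`, and the
`.example` host names are hypothetical and do not come from the ruling.

```python
from collections import deque


def spider(link_graph, seeds):
    """Return every page reachable by following links from the seed pages."""
    discovered = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        # Follow each outgoing link of the current page.
        for target in link_graph.get(page, []):
            if target not in discovered:
                discovered.add(target)
                queue.append(target)
    return discovered


# Hypothetical toy Web: "orphan.example" links out, but no page links to it.
toy_web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
    "orphan.example": ["a.example"],
}

found = spider(toy_web, ["a.example"])
```

In this sketch `found` contains only the three pages connected by links.
The page `orphan.example` exists in the toy Web but is never discovered,
because the spider has no link leading to it.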

Furthermore, many larger Web sites contain instructions,
through software, that prevent spiders from investigating that
site, and therefore the contents of such sites also cannot be
indexed using spidering technology. Because of the vast size and
decentralized structure of the Web, no search engine or directory
indexes all of the content on the publicly indexable Web. We
credit current estimates that no more than 50% of the content
currently on the publicly indexable Web has been indexed by all
search engines and directories combined. No currently available
method or combination of methods for collecting URLs can collect
the addresses of all URLs on the Web.
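
The ruling does not name the mechanism by which sites instruct spiders
to stay away. One widely used convention is a robots.txt file, and the
sketch below assumes that convention, using Python's standard
`urllib.robotparser` module. The robots.txt text, the spider name
`ExampleSpider`, and the URLs are hypothetical. A robots.txt file is a
request that well-behaved spiders honor, not a technical barrier.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all spiders to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_text, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = may_crawl(
    ROBOTS_TXT, "ExampleSpider", "http://large-site.example/private/report.html"
)
allowed = may_crawl(
    ROBOTS_TXT, "ExampleSpider", "http://large-site.example/index.html"
)
```

Under these assumptions `blocked` is `False` and `allowed` is `True`, so
a spider that honors the file would index the home page but not the
contents of the excluded directory.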

The portion of the Web that is not theoretically indexable
through the use of "spidering" technology, because other Web
pages do not link to it, is called the "Deep Web." Such sites or
pages can still be made publicly accessible without being made
publicly indexable by, for example, using individual or mass
emailings (also known as "spam") to distribute the URL to
potential readers or customers, or by using types of Web links
that cannot be found by spiders but can be seen and used by
readers. "Spamming" is a common method of distributing to
potential customers links to sexually explicit content that is
not indexable.

Because the Web is decentralized, it is impossible to say
exactly how large it is. A 2000 study estimated a total of 7.1
million unique Web sites, which at the Web's historical rate of
growth, would have increased to 11 million unique sites as of