The Deep Web (also called the Deepnet, the Invisible Web, the Undernet or the
Hidden Web) is World Wide Web content that is not part of the Surface Web, which
is indexed by standard search engines like Google. It should not be confused
with the dark Internet, the computers that can no longer be reached via the
Internet, or with a Darknet distributed filesharing network, which could be
classified as a smaller part of the Deep Web. [1]
The Deep Web is a complex concept. It essentially covers two categories of
data. The first is any information that is not easy to obtain through standard
searching, which could be Twitter or Facebook posts, links buried many layers
down in a dynamic page, or results that sit so far down the standard search
results that typical users will never find them. The second category is the
larger of the two and represents a vast repository of information that is not
accessible to standard search engines. It comprises content found in
websites, databases, and other sources. Often it is only accessible through a
custom query directed at individual websites, which cannot be accomplished by a
simple “surface web” search. [2] Some of the more comprehensive search engines
have written algorithms to search the deeper portions of the World Wide Web by
attempting to find files such as .pdf, .docx, .xls, .ppt, .ps and others. These
files are predominantly used by businesses to communicate within their
organization or to disseminate topical information and work product to
customers and potential clients.
[Fig] The Deep Web size compared to the Surface Web size.
Traditional search engines create their indices by spidering or crawling
surface Web pages. To be discovered, a page must be static and linked to other
pages. Traditional search engines cannot "see" or retrieve content in the Deep
Web; those pages do not exist until they are created dynamically as the result
of a specific search. Because traditional search engine crawlers cannot probe
beneath the surface, the Deep Web has heretofore been hidden. The Deep Web is
qualitatively different from the surface Web. Deep Web sources store their
content in searchable databases that only produce results dynamically in
response to a direct request. But a direct query is a laborious, "one at a
time" way to search.
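The difference between crawling and querying can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the page names, the link graph and the two database records are hypothetical, and no real search engine works on an in-memory dictionary. It simply shows that a crawler which only follows links never reaches records that exist solely as answers to a query.

```python
from collections import deque

# Hypothetical static pages of one site and the links each page contains.
STATIC_PAGES = {
    "/": ["/about", "/catalog"],
    "/about": ["/"],
    "/catalog": ["/search"],
    "/search": [],  # the search form itself links to no individual record
}

# Hypothetical database behind the search form; records have no static URL.
DATABASE = {"report-2013": "Annual report", "patent-42": "Patent filing"}


def crawl(start):
    """Breadth-first link crawl: return every page reachable by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def search(term):
    """Simulate a direct query: the result only exists once it is requested."""
    return {key: text for key, text in DATABASE.items() if term in key}


indexed = crawl("/")
print(sorted(indexed))           # only the linked, static pages
print(search("report"))          # a record the crawler never saw
```

The crawler ends up with the four linked pages, while the database records only appear when `search` is called with a matching term, one query at a time.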
Surface Web : Parts of the internet that can be found via link-crawling
techniques, meaning the data is linked and can be reached by following links
from the homepage of a domain; Google can find this data.
Deep Web : Portions of the internet that cannot be accessed by a link-crawling
search engine like Google. The only way a user can access this portion of the
internet is by submitting a directed query through a web search form to reach
content within a database that is not linked data. In layman’s terms, it is a
search within a particular website. [7]
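As a rough sketch of what such a directed query looks like in practice, the snippet below builds the URL that a search form would submit. The address `https://example.org/search` and the field names `q` and `page` are assumptions chosen for illustration; every real website defines its own form fields.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_directed_query(form_action, fields):
    """Build the URL a GET search form would request for the given field values."""
    return f"{form_action}?{urlencode(fields)}"


# Hypothetical site and field names; real forms differ from site to site.
url = build_directed_query("https://example.org/search", {"q": "deep web", "page": 1})
print(url)

# The site reads the fields back out of the query string and runs the search.
print(parse_qs(urlsplit(url).query))
```

The resulting page exists only as the answer to this particular request, so no link on the site's homepage leads a crawler to it.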
To put it in context, the Deep Web isn’t found in a single location. It
consists of both structured and unstructured content, a huge amount of which is
found in databases. This content has often been compiled by experts,
researchers and analysts, or through automated processing systems, at an array
of institutions throughout the world. All of the content is housed in different
systems, with different structures, at physical locations that can be as far
apart as New York and Hong Kong. It’s almost impossible to measure the size of
the Deep Web. While some early estimates put the size of the Deep Web at
4,000-5,000 times larger than the Surface Web, the changing dynamic of how
information is accessed and presented means that the Deep Web is growing
exponentially and at a rate that defies quantification. [2]
“The Deep
Web has existed for more than a decade but came under the spotlight last month
after police shutdown the Silk Road website - the online marketplace dubbed the
'eBay of drugs' - and arrested its creator. But experts warn this has done next
to nothing to stem the rising tide of such illicit online exchanges, which are
already jostling to fill the gap now left in this unregulated virtual world. Meanwhile,
even as the Silk Road was trundling to a halt, already hundreds of other
websites were springing up in its place, peddling anything from drugs to stolen
identities, illegal weapons to sickening child pornography and even explosives.
In June it emerged one such site, called Atlantis, was even offering its wares in
an advert posted on YouTube. Hiring a hitman has never been easier. Nor has
purchasing cocaine or heroin, nor even viewing horrific child pornography. Such
purchases are now so easy, in fact, that they can all be done from the comfort
of one's home at the click of a button... and there's almost nothing the police
can do about it.” [3]
However, is the above hype really justified? There seems to be a grave
misunderstanding of what the Deep Web actually is and what information is
accessible on the Internet through encryption procedures. All recent cases that
have “shocked” reporters and users alike refer to Tor sites, i.e. sites that
are not accessible through the standard search engines because they are
encrypted via the Tor protocol.
Tor (originally TOR, an acronym for The Onion Router, a usage now abandoned) is
free software for enabling online anonymity. Tor directs Internet traffic
through a free, worldwide, volunteer network consisting of more than four
thousand relays to conceal a user's location or usage from anyone conducting
network surveillance or traffic analysis. Using Tor makes it more difficult to
trace Internet activity, including "visits to Web sites, online posts,
instant messages, and other communication forms", back to the user and is
intended to protect the personal privacy of users, as well as their freedom and
ability to conduct confidential business by keeping their internet activities
from being monitored. [4] Read the “history” section of the Wikipedia article
and you will realise that Tor is neither as secure nor as “independent” in
nature as most believe. Nor is Tor “on the deep web”, as most people suggest.
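To give a feel for why passing traffic through several relays hides where it came from, here is a toy Python sketch of the layering idea behind the name "onion router". It is emphatically not how Tor is implemented: the relay names, the numeric keys and the single-byte XOR "cipher" are all invented for illustration and offer no security whatsoever. The point is only that each relay can remove its own layer and learns nothing but the next hop.

```python
import json

# Hypothetical three-relay circuit; names and keys are invented for this sketch.
RELAY_KEYS = {"entry": 17, "middle": 42, "exit": 99}


def xor_bytes(data, key):
    """Toy reversible 'cipher' (single-byte XOR). NOT real encryption."""
    return bytes(b ^ key for b in data)


def wrap(message, destination, circuit):
    """Wrap the message in one layer per relay; the innermost layer is for the last relay."""
    next_hop, payload = destination, message.encode()
    for relay in reversed(circuit):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = xor_bytes(layer, RELAY_KEYS[relay])
        next_hop = relay
    return payload


def peel(relay, payload):
    """A relay removes its own layer and learns only where to forward the rest."""
    layer = json.loads(xor_bytes(payload, RELAY_KEYS[relay]))
    return layer["next"], bytes.fromhex(layer["data"])


circuit = ["entry", "middle", "exit"]
packet = wrap("hello", "example.org", circuit)

hops = []
for relay in circuit:
    next_hop, packet = peel(relay, packet)
    hops.append(next_hop)

print(hops)             # what each relay learned: only its next hop
print(packet.decode())  # the message as it leaves the last relay
```

In this sketch only the last relay sees the destination, and it has no information about who built the packet; that separation is the property the relay network is meant to provide.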
While Tor is used by everyone from law enforcement to Syrian dissidents to
protect valuable information, it is a double-edged sword. Many experts warn
that groups ranging from the Russian mafia to international drug cartels are
looking closely at the lessons learned from the Silk Road. It took the FBI more
than two years of investigative work to find its creator, Ross Ulbricht. The
Bureau has neither the resources to compete with Silicon Valley in hiring nor
the tools: a long-hoped-for modernization of the law governing online
wiretapping is on ice in Congress thanks to Edward Snowden. [5]
“The so-called Darknet is a part of the internet encrypted and partially hidden from indexing, but it still runs on the physical network infrastructure and uses the TCP and IP protocols for transmission and identification. Far more interesting are the new physical networks being created, using preexisting power, cable and telecommunication lines, with new data transfer and node ID protocols. Totally separate networks using a multitude of unique languages and rules. Three primitive forms run in LA, New York and possibly London. Inter-network communication is provided by translator servers, with local protocol knowledge. These will surely be the real darknet of the not so distant future.” [3, comments page]
In your quest to surf the Deep Web, you may try the following search
companions; be advised, however, that they mostly deal with Tor sites. The
source article also offers a comparison of the performance of these search
engines. [6]
Evil Wiki : Without a doubt, this is the
single best entry point into the world of Tor. The well-maintained website
provides an organized list of links to hidden services with explanations and
even reviews. It’s not meant to be used as a search engine, but it often is.
TorSearch : A new search engine that has
garnered some buzz in publications like VentureBeat. It operates in much the
same way as Google, with a link-crawling spider that will forever build its
arsenal.
Google : With proxy tools like Onion.to,
Google actually crawls much of the Deep Web in a roundabout way. And because
it’s so popular, it’s the first tool that almost anyone who hears about the
Deep Web uses.
DuckDuckGo : Similar to Google but with one
significant difference, DuckDuckGo offers anonymous search, a feature in
keeping with Tor’s powers of anonymity. It’s no surprise that it’s popular
among the Tor crowd.
Torch : An older Deep Web search engine,
Torch has existed for a long time, but with little fanfare.
Hidden Wiki : The Hidden Wiki is a website that
uses hidden services available through the Tor network. The site has a
collection of links to other .onion sites, and encyclopedia articles in a wiki
format.
Further reading :
Bergman, Michael K., White Paper, “The Deep
Web: Surfacing Hidden Value”, http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104
Zillman, Marcus P., “Deep Web Research and
Discovery Resources 2013”, http://www.zillman.us/white-papers/deep-web-research-and-discovery-resources-2013-llrx-feature-article-and-online-white-paper/
Tools for Mining the Deep Web : http://www.learnthenet.com/how-to/search-the-deep-web/index.php?p=02