The Ask.com search technology uses semantic and extraction capabilities to recognize the best answer from within a sea of relevant pages. Instead of 10 blue links, Ask delivers the best answer to users' questions right at the top of the page. Using an established technique pioneered at Ask, our search technology uses click-through behavior to determine a site's relevance and extract the answer. Rather than presenting text snippets from the destination site, this technology presents the actual answer to a user's question without requiring an additional click. Underpinning these advancements are Ask.com's innovative DADS, DAFS, and AnswerFarm technologies, which break new ground in semantic search, web extraction, and ranking. These technologies index questions and answers from numerous and diverse sources across the web. Ask.com then applies its advances in semantic search (clustering, rephrasing, and answer relevance) to filter out insignificant and less meaningful answer formats. To extract and rank answers, as opposed to merely ranking web pages, Ask.com continues to develop unique algorithms and technologies based on new relevance signals tuned specifically to questions.
Ask's Website crawler is our Web-indexing robot (or crawler/spider). The crawler collects documents from the
Web to build the ever-expanding index for our advanced search functionality at Ask and other Web sites that
license the proprietary Ask search technology.
Ask search technology differs from other search technologies because it analyzes the Web as it actually exists: in
subject-specific communities. This process begins with building a comprehensive, high-quality index. Web crawling is
an essential tool for this approach, and it keeps our search results up to date.
On this page you'll find answers to the most commonly asked questions about how the Ask Website crawler works. For these
and other Webmaster FAQs, visit our Searchable FAQ Database.
Q: What is a website crawler?
A: A website crawler is a software program designed
to follow hyperlinks throughout a Web site, retrieving and indexing
pages to document the site for searching purposes. The crawlers
are innocuous and cause no harm to an owner's site or servers.
Q: Why does Ask use website crawlers?
A: Ask utilizes website crawlers to collect raw data and gather information
that is used in building our ever-expanding search index. Crawling
ensures that the information in our results is as up-to-date and
relevant as it can possibly be. Our crawlers are well designed and
professionally operated, providing an invaluable service that is
in accordance with search industry standards.
Q: How does the Ask crawler work?
Q: How frequently will the Ask Crawler download pages from my site?
A: The crawler will download only one page at a time from your site
(specifically, from your IP address). After it receives a page, it
will pause a certain amount of time before downloading the next page.
This delay time may range from 0.1 second to hours. The quicker your
site responds to the crawler when it asks for pages, the shorter the
delay.
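The scheduling rule above (one request at a time per IP address, with a pause that shrinks as the server responds faster) can be sketched in Python. This is an illustration, not Ask's crawler: the 10x multiplier and the one-hour cap are hypothetical values; only the 0.1-second floor comes from the answer above.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HostScheduler:
    """Tracks when each IP address may next be contacted (hypothetical design)."""
    multiplier: float = 10.0   # assumption: pause = 10x the observed response time
    min_delay: float = 0.1     # floor taken from the 0.1 second lower bound above
    max_delay: float = 3600.0  # assumption: cap the pause at one hour
    next_allowed: dict = field(default_factory=dict)

    def delay_for(self, response_time: float) -> float:
        """Faster responses give shorter pauses, clamped to [min_delay, max_delay]."""
        return min(self.max_delay, max(self.min_delay, self.multiplier * response_time))

    def ready(self, ip: str, now: float) -> bool:
        """True if the host at `ip` may be contacted at time `now`."""
        return now >= self.next_allowed.get(ip, 0.0)

    def record_fetch(self, ip: str, finished_at: float, response_time: float) -> float:
        """Call after a download completes; returns the pause chosen for this host."""
        pause = self.delay_for(response_time)
        self.next_allowed[ip] = finished_at + pause
        return pause


def polite_fetch(scheduler: HostScheduler, ip: str, fetch):
    """Download one page from `ip`, sleeping first if its pause has not elapsed."""
    wait = scheduler.next_allowed.get(ip, 0.0) - time.monotonic()
    if wait > 0:
        time.sleep(wait)
    start = time.monotonic()
    page = fetch()  # caller-supplied download function
    end = time.monotonic()
    scheduler.record_fetch(ip, end, end - start)
    return page
```

Because the pause is derived from the measured response time, a slow or overloaded server automatically receives fewer requests per unit of time.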
Q: Can I prevent the Teoma/Ask search
engine from showing a cached copy of my page?
A: Yes. We obey the "noarchive" meta tag. If you place the
following tag in the <HEAD> section of your HTML page, we will not provide an archived
copy of the document to the user:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
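To honor this tag, a crawler has to read the robots meta directives from a page before deciding whether to keep a cached copy. A minimal sketch using Python's standard html.parser module; the class and function names are illustrative and not part of Ask's software:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lowercased robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head><body>Hi</body></html>'
print("noarchive" in robots_directives(page))  # True
```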
Q: Does Ask observe the Robots Exclusion Standard?
A: Yes, we obey the 1994 Robots Exclusion Standard (RES), which
is part of the Robot Exclusion Protocol. The Robots Exclusion Protocol
is a method that allows Web site administrators to indicate to robots
which parts of their site should not be visited by the robot. For
more information on the RES, and the Robot Exclusion Protocol, please
visit http://www.robotstxt.org/wc/exclusion.html.
Q: Can I prevent the Ask crawler from
indexing all or part of my site/URL?
A: Yes. The Ask crawler will respect and obey commands that direct it not to index all or part of a given URL. To specify that the
Ask crawler visit only pages whose paths begin with /public, include
the following lines:
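A minimal sketch of a robots.txt that admits only /public, checked with Python's standard urllib.robotparser. Two assumptions apply: that the crawler answers to the user-agent token Teoma (the name used elsewhere on this page), and that it honors Allow, a widely supported extension that is not part of the 1994 standard. Allow is listed before Disallow because urllib.robotparser applies the first rule whose path prefix matches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: admit only paths that begin with /public.
# "Teoma" as the user-agent token and support for "Allow" are assumptions.
ROBOTS_TXT = """\
User-agent: Teoma
Allow: /public
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://www.mysite.com/public/index.html",
            "http://www.mysite.com/private/data.html"):
    print(url, parser.can_fetch("Teoma", url))
```

Crawlers whose user-agent matches no record (and with no `User-agent: *` record present) are unrestricted by this file.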
Q: Where do I put my robots.txt file?
A: Your file must be at the top level of your Web site. For example,
if www.mysite.com is the name
of your Web site, then the robots.txt
file must be at http://www.mysite.com/robots.txt.
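Because the file's location is fixed, a crawler can derive it from any page URL on the same host. A small Python sketch (the function name is illustrative):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Only the top level of the host is consulted, so the path, query
    string, and fragment of page_url are discarded; scheme and host
    (including any port) are kept.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.mysite.com/docs/faq.html?x=1"))
# http://www.mysite.com/robots.txt
```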
Q: How can I tell if the Ask crawler has visited my site/URL?
A: To determine whether the Ask crawler has visited your site, check
your server logs. Specifically, you should be looking for the following
user-agent string:
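To scan logs programmatically, a sketch like the following counts matching requests per client IP in the common Apache/Nginx "combined" log format. The token Teoma is an assumption (substitute the exact user-agent string Ask publishes), and the sample log lines are invented for illustration. Because user-agent strings are easy to forge, a match only suggests a visit; the authentication question below describes how to confirm one.

```python
import re
from collections import Counter

# "combined" log format: the user agent is the last quoted field.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_hits(lines, agent_token):
    """Count requests per client IP whose User-Agent contains agent_token."""
    hits = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and agent_token in m.group("agent"):
            hits[m.group("ip")] += 1
    return hits


# Invented sample lines; "Teoma" is an assumed token, not a confirmed string.
sample = [
    '192.0.2.10 - - [10/Oct/2007:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/2.0 (compatible; Teoma)"',
    '198.51.100.4 - - [10/Oct/2007:13:56:01 -0700] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
print(crawler_hits(sample, "Teoma"))  # Counter({'192.0.2.10': 1})
```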
Q: How can I prevent the Ask crawler
from indexing my page or following links from a particular page?
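A: If you place the following tag in the <HEAD> section of your HTML page, the Ask crawler will not index the page:
<META NAME="ROBOTS" CONTENT="NOINDEX">
To keep the crawler from following the links on the page, use CONTENT="NOFOLLOW"; to do both, use CONTENT="NOINDEX, NOFOLLOW".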
Q: Why is the Ask crawler
downloading the same page on my site multiple times?
A: Generally, the Ask crawler should only download one copy of each
file from your site during a given crawl. There are two exceptions:
Q: Why is the Ask crawler trying to
download incorrect links from my server? Or from a server that doesn't
exist?
A: It is a property of the Web that many links will be broken or
outdated at any given time. Whenever any Web page contains a broken
or outdated link to your site, or to a site that never existed or
no longer exists, Ask will visit that link trying to find the
Web page it references. This may cause the crawler to ask for URLs
which no longer exist or which never existed, or to try to make
HTTP requests on IP addresses which no longer have a Web server
or never had one. The crawler is not randomly generating addresses;
it is following links. This is why you may also notice activity
on a machine that is not a Web server.
Q: How did the Ask Website crawler find my URL?
A: The Ask crawler finds pages by following links (HREF tags in
HTML) from other pages. When the crawler finds a page that contains
frames (i.e., it is a frameset), the crawler downloads the component
frames and includes their content as part of the original page.
The Ask crawler will not index the component frames as URLs themselves
unless they are linked via HREF from other pages.
Q: What types of links does the Ask crawler follow?
A: The Ask crawler will follow HREF links, SRC links, and redirects.
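The HREF and SRC link types listed above, including the SRC attributes of frame tags, can be illustrated with a short Python sketch built on the standard html.parser module. It also honors the standard NOFOLLOW robots meta directive by returning no links for such pages. This shows the general technique only, not Ask's crawler code; redirects happen at the HTTP layer and are not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects HREF and SRC targets, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            tokens = {t.strip().lower() for t in attr.get("content", "").split(",")}
            if "nofollow" in tokens:
                self.nofollow = True
        # Covers e.g. <a href>, <link href>, <frame src>, <img src>.
        for key in ("href", "src"):
            if attr.get(key):
                self.links.append(urljoin(self.base_url, attr[key]))


def links_to_follow(html, base_url):
    """Return absolute link targets, or [] if the page asks for NOFOLLOW."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return [] if parser.nofollow else parser.links


frames = '<html><frameset><frame src="menu.html"><frame src="/main.html"></frameset></html>'
print(links_to_follow(frames, "http://www.mysite.com/dir/index.html"))
```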
Q: Can I control the rate at which the Ask crawler visits my site?
A: Yes. We support the "Crawl-Delay" robots.txt directive. Using this directive, you can specify the minimum delay between two successive requests from our spider to your site.
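A sketch of how a crawler might read this directive, using Python's standard urllib.robotparser (crawl_delay requires Python 3.6+). The 10-second value, the Teoma token, and the 1-second fallback are hypothetical. Note that urllib.robotparser only accepts whole-number delays; this page does not say whether Ask accepts fractional values.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def wait_between_requests(parser, agent, default=1.0):
    """Seconds to pause between requests; `default` is an assumed fallback."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


print(parser.crawl_delay("Teoma"))  # 10
```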
Q: Why has the Ask crawler not visited my URL?
A: If the Ask crawler has not visited your URL, it is because
we did not discover any link to that URL from other pages (URLs)
we visited.
Q: Does the Ask crawler support HTTP compression?
A: Yes, it does. Both the HTTP client and the server must support
HTTP compression for it to work. When they do, the Web server can send
documents compressed with gzip or another format instead of
uncompressed documents, which can save significant bandwidth
for both the server and the client. Encoding and decoding add a
small CPU overhead on each side, but the bandwidth savings usually
outweigh it. With a popular compression method such as gzip, text
content such as HTML can often be reduced in size by about 75%.
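The negotiation uses two standard HTTP headers: the client advertises `Accept-Encoding: gzip`, and the server marks a compressed reply with `Content-Encoding: gzip`. A Python sketch of the client-side decoding step, using the standard gzip module; the payload is synthetic and highly repetitive, so its compression ratio says nothing about real pages.

```python
import gzip
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo a gzip Content-Encoding on a response body; pass others through."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


# Synthetic, HTML-like payload; real savings depend on the content.
html = ("<tr><td class='cell'>Ask crawler FAQ</td></tr>\n" * 200).encode("utf-8")
compressed = gzip.compress(html)  # what a server would send with Content-Encoding: gzip
print(f"{len(html)} -> {len(compressed)} bytes")
```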
Q: How do I register my site/URL with Ask so that it will be indexed?
A: We appreciate your interest in having your site listed on Ask.com and the Ask.com search engine. Your best bet is to
follow the open-format Sitemaps protocol, which Ask.com supports. Once you have
prepared a sitemap for your site, add the sitemap auto-discovery directive to robots.txt, or submit the sitemap file
directly to us via the ping URL. (For more information on this process, see "Does Ask.com support sitemaps?" below.)
Please note that sitemap submissions do not guarantee the indexing of URLs.
Create your Web site and set up your Web server to optimize how search engines view your site's content, how they
index it, and which search keywords it is returned for. You'll find a variety of resources online that
provide tips on how best to do this.
Q: Why aren't pages crawled by the Ask crawler showing up in the search results at Ask.com?
A: If you don't see your pages indexed in our search results, don't
be alarmed. Because we are so thorough about the quality of our
index, it takes some time for us to analyze the results of a crawl
and then process the results for inclusion in the database. Ask
does not necessarily include every site it has crawled in its index.
Q: How do I authenticate the Ask Crawler?
A: The User-Agent string is no guarantee of authenticity, because it is trivial for a malicious user to mimic the properties of the
Ask Crawler. To authenticate the Ask Crawler properly, a round-trip DNS lookup is required. First, take the IP address of the
requesting crawler and perform a reverse DNS lookup, making sure that the resulting host name belongs to the ask.com domain.
Then perform a forward DNS lookup on that host name, making sure that the resulting
IP address matches the original.
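The round-trip (forward-confirmed reverse) DNS check described above can be sketched in Python with the standard socket module. The lookup functions are injectable so the logic can be exercised without network access; `socket.gethostbyname_ex` covers IPv4 only, and the suffix test deliberately rejects look-alike names such as notask.com. The function name is illustrative.

```python
import socket
from typing import Callable, List, Tuple

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_ask_crawler(ip: str,
                   reverse: Lookup = socket.gethostbyaddr,
                   forward: Lookup = socket.gethostbyname_ex,
                   domain: str = "ask.com") -> bool:
    """Forward-confirmed reverse DNS check for a claimed Ask crawler IP."""
    try:
        host, _aliases, _addrs = reverse(ip)       # step 1: IP -> host name
    except OSError:                                # socket.herror is an OSError
        return False
    host = host.rstrip(".").lower()
    if not (host == domain or host.endswith("." + domain)):
        return False                               # host is outside ask.com
    try:
        _name, _aliases, addrs = forward(host)     # step 2: host name -> IPs
    except OSError:                                # socket.gaierror is an OSError
        return False
    return ip in addrs                             # must round-trip to the same IP
```

Calling `is_ask_crawler(client_ip)` with the defaults performs live DNS lookups, so results are best cached per IP address.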
Q: Does Ask.com support sitemaps?
A: Yes, Ask.com supports the open-format Sitemaps protocol. Once you have prepared the sitemap, add the sitemap auto-discovery directive to robots.txt as follows:
SITEMAP: http://www.mysite.com/sitemap.xml
The value should be the full URL of your sitemap (www.mysite.com/sitemap.xml is only an example). Alternatively, you can submit your sitemap through the ping URL:
http://submissions.ask.com/ping?sitemap=http%3A//www.mysite.com/sitemap.xml
Please note that sitemap submissions do not guarantee the indexing of URLs. To learn more about the protocol, please visit the Sitemaps web site at http://www.sitemaps.org.
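Both submission routes can be sketched in Python: building the ping URL in the http%3A//... form shown above, and reading the auto-discovery directive back out of robots.txt with urllib.robotparser (`site_maps()` requires Python 3.8+). The ping endpoint is taken from the answer above; the helper names and the www.mysite.com URL are illustrative.

```python
from urllib.parse import quote
from urllib.robotparser import RobotFileParser

PING_ENDPOINT = "http://submissions.ask.com/ping?sitemap="  # from the answer above


def sitemap_ping_url(sitemap_url: str) -> str:
    """Build the ping URL; quote() keeps "/" by default and encodes ":" as %3A."""
    return PING_ENDPOINT + quote(sitemap_url)


def sitemaps_from_robots(robots_lines):
    """Return the Sitemap URLs declared in robots.txt (directive name is case-insensitive)."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.site_maps() or []


print(sitemap_ping_url("http://www.mysite.com/sitemap.xml"))
# http://submissions.ask.com/ping?sitemap=http%3A//www.mysite.com/sitemap.xml
```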
Q: How can I add Ask.com search to my site?
A: We've made this easy: you can generate the necessary code here.
Q: How can I get additional information?
A: Please visit our full Searchable FAQ Database.
Please note that we cannot respond to emails about updates to
your site/URL or to requests to be indexed.