The Ask.com search technology uses semantic and extraction capabilities to recognize the best answer from within a sea of relevant pages. Instead of 10 blue links, Ask delivers the best answer to users' questions right at the top of the page. Using an established technique pioneered at Ask, our search technology draws on click-through behavior to determine a site's relevance and extract the answer. Rather than presenting text snippets of the destination site, this technology presents the actual answer to a user's question without requiring an additional click. Underpinning these advancements are Ask.com's DADS, DAFS, and AnswerFarm technologies, which break new ground in the areas of semantic search, web extraction, and ranking. These technologies index questions and answers from numerous and diverse sources across the web. Ask.com then applies its semantic search advancements in clustering, rephrasing, and answer relevance to filter out insignificant and less meaningful answer formats. In order to extract and rank existing answers, as opposed to merely ranking web pages, Ask.com continues to develop unique algorithms and technologies based on new signals for evaluating relevancy that are specifically tuned to questions.
Ask's Website crawler is our Web-indexing robot (or crawler/spider). The crawler collects documents from the Web to build the ever-expanding index for our advanced search functionality at Ask and other Web sites that license the proprietary Ask search technology.
Ask search technology differs from other search technologies because it analyzes the Web as it actually exists -- in subject-specific communities. This process begins by creating a comprehensive and high-quality index. Web crawling is an essential tool for this approach, and it ensures that we have the most up-to-date search results.
On this page you'll find answers to the most commonly asked questions about how the Ask Website crawler works. For these and other Webmaster FAQs, visit our Searchable FAQ Database.
Q: What is a website crawler?
A: A website crawler is a software program designed to follow hyperlinks throughout a Web site, retrieving and indexing pages to document the site for searching purposes. The crawlers are innocuous and cause no harm to an owner's site or servers.
Q: Why does Ask use website crawlers?
A: Ask utilizes website crawlers to collect raw data and gather information that is used in building our ever-expanding search index. Crawling ensures that the information in our results is as up-to-date and relevant as it can possibly be. Our crawlers are well designed and professionally operated, providing an invaluable service that is in accordance with search industry standards.
Q: How frequently will the Ask Crawler download pages from my site?
A: The crawler will download only one page at a time from your site (specifically, from your IP address). After it receives a page, it will pause a certain amount of time before downloading the next page. This delay time may range from 0.1 second to hours. The quicker your site responds to the crawler when it asks for pages, the shorter the delay.
Q: Can I prevent the Teoma/Ask search engine from showing a cached copy of my page?
A: Yes. We obey the "noarchive" meta tag. If you place the following command in your HTML page, we will not provide an archived copy of the document to the user.
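As a rough illustration, the Python sketch below checks whether a page carries a "noarchive" directive. It assumes the conventional robots meta tag form, <meta name="robots" content="noarchive">; the class and function names are hypothetical, and this is not a description of how the Ask crawler itself parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so default to "".
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noarchive(html_text):
    """Return True if the page declares the noarchive robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives


# Assumed conventional tag form, placed in the page's <head>.
page = '<html><head><meta name="robots" content="noarchive"></head><body>Hi</body></html>'
print(has_noarchive(page))
```

Running a check like this against your own pages is a quick way to confirm the tag is actually present in the HTML you serve.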
Q: Does Ask observe the Robot Exclusion Standard?
A: Yes, we obey the 1994 Robots Exclusion Standard (RES), which is part of the Robot Exclusion Protocol. The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to robots which parts of their site should not be visited by the robot. For more information on the RES, and the Robot Exclusion Protocol, please visit http://www.robotstxt.org/wc/exclusion.html.
Q: Can I prevent the Ask crawler from indexing all or part of my site/URL?
A: Yes. The Ask crawler will respect and obey commands that direct it not to index all or part of a given URL. To specify that the Ask crawler visit only pages whose paths begin with /public, include the following lines:
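For the /public case, one possible robots.txt can be tested locally with Python's standard urllib.robotparser, as in the sketch below. Two assumptions are made here: the agent token "Teoma" is inferred from the Teoma name used elsewhere on this page, and the "Allow" line is a widely supported extension rather than part of the 1994 standard. Python's parser applies the first matching rule, so the Allow line is listed before the blanket Disallow.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "Teoma" is the robots.txt agent token for the Ask crawler.
ROBOTS_TXT = """\
User-agent: Teoma
Allow: /public
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Return True if the assumed Ask agent may fetch the given path."""
    return parser.can_fetch("Teoma", "http://www.mysite.com" + path)


print(allowed("/public/index.html"))    # path begins with /public
print(allowed("/private/report.html"))  # everything else is disallowed
```

Checking a few representative paths this way before publishing the file helps catch rules that block more, or less, than intended.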
Q: Where do I put my robots.txt file?
A: Your file must be at the top level of your Web site. For example, if www.mysite.com is the name of your Web site, then the robots.txt file must be at http://www.mysite.com/robots.txt.
Q: How can I tell if the Ask crawler has visited my site/URL?
A: To determine whether the Ask crawler has visited your site, check your server logs. Specifically, you should be looking for the following user-agent string:
Q: Why is the Ask crawler downloading the same page on my site multiple times?
A: Generally, the Ask crawler should only download one copy of each file from your site during a given crawl. There are two exceptions:
Q: Why is the Ask crawler trying to download incorrect links from my server? Or from a server that doesn't exist?
A: It is a property of the Web that many links will be broken or outdated at any given time. Whenever any Web page contains a broken or outdated link to your site, or to a site that never existed or no longer exists, Ask will visit that link trying to find the Web page it references. This may cause the crawler to ask for URLs which no longer exist or which never existed, or to try to make HTTP requests on IP addresses which no longer have a Web server or never had one. The crawler is not randomly generating addresses; it is following links. This is why you may also notice activity on a machine that is not a Web server.
Q: How did the Ask Website crawler find my URL?
A: The Ask crawler finds pages by following links (HREF tags in HTML) from other pages. When the crawler finds a page that contains frames (i.e., it is a frameset), the crawler downloads the component frames and includes their content as part of the original page. The Ask crawler will not index the component frames as URLs themselves unless they are linked via HREF from other pages.
Q: Can I control the rate at which the Ask crawler visits my site?
A: Yes. We support the "Crawl-Delay" robots.txt directive. Using this directive you may specify the minimum delay between two successive requests from our spider to your site.
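A minimal Python sketch of how such a directive reads back, using the standard urllib.robotparser (crawl_delay requires Python 3.6 or later). The agent token "Teoma" and the value 10 are assumptions for illustration; the value is conventionally interpreted as seconds.

```python
from urllib.robotparser import RobotFileParser

# Assumptions: "Teoma" as the agent token, and a 10-second delay.
ROBOTS_TXT = """\
User-agent: Teoma
Crawl-Delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Returns the delay as an integer, or None if no delay applies to the agent.
delay = parser.crawl_delay("Teoma")
print(delay)
```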
Q: Does Ask crawler support HTTP compression?
A: Yes, it does. Both HTTP client and server should support this for the HTTP compression feature to work. When supported, it lets webservers send compressed documents (compressed using gzip or other formats) instead of the actual documents. This would result in significant bandwidth savings for both the server and the client. There is a little CPU overhead at both server and client for encoding/decoding, but it is worth it. Using a popular compression method such as gzip, one could easily reduce file size by about 75%.
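To get a feel for the savings, the Python sketch below gzips a sample HTML document and reports the size reduction. The sample markup is invented and highly repetitive, so it compresses better than a typical page would; treat the number it prints as an illustration rather than a benchmark.

```python
import gzip

# Invented sample page; real pages are less repetitive and compress less.
html = (
    "<html><body>"
    + "<p>Ask crawler FAQ sample paragraph.</p>\n" * 200
    + "</body></html>"
).encode("utf-8")

# A server that supports compression would send these bytes with the
# response header "Content-Encoding: gzip" when the client's request
# included "Accept-Encoding: gzip".
compressed = gzip.compress(html)
saving = 1 - len(compressed) / len(html)

print(f"original={len(html)} bytes, compressed={len(compressed)} bytes, "
      f"saving={saving:.0%}")

# The client decodes the body back to the original document.
restored = gzip.decompress(compressed)
```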
Q: How do I register my site/URL with Ask so that it will be indexed?
A: We appreciate your interest in having your site listed on Ask.com and the Ask.com search engine. Your best bet is to follow the open-format Sitemaps protocol, which Ask.com supports. Once you have prepared a sitemap for your site, add the sitemap auto-discovery directive to robots.txt, or submit the sitemap file directly to us via the ping URL. (For more information on this process, see Does Ask.com support sitemaps?) Please note that sitemap submissions do not guarantee the indexing of URLs.
Create your Web site and set up your Web server to optimize how search engines look at your site's content, and how they index it and match it against different types of search keywords. You'll find a variety of resources online that provide tips and helpful information on how best to do this.
Q: Why aren't the pages the Ask crawler indexed showing up in the search results at Ask.com?
A: If you don't see your pages indexed in our search results, don't be alarmed. Because we are so thorough about the quality of our index, it takes some time for us to analyze the results of a crawl and then process the results for inclusion into the database. Ask does not necessarily include every site it has crawled in its index.
Q: How do I authenticate the Ask Crawler?
A: The User-Agent is no guarantee of authenticity, as it is trivial for a malicious user to mimic the properties of the Ask Crawler. To properly authenticate the Ask Crawler, a round-trip DNS lookup is required. First, take the IP address of the Ask Crawler and perform a reverse DNS lookup to confirm that the IP address belongs to the ask.com domain. Then perform a forward DNS lookup on the resulting host name and confirm that it resolves back to the original IP address.
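The round trip described above can be sketched in Python with the standard socket module. This is a minimal illustration, not an official verification tool: the function name is hypothetical, the forward lookup used here handles IPv4 only, and the resolver functions are passed in as parameters so the logic can be exercised without network access. The host name and addresses in the demonstration are made up (192.0.2.0/24 is reserved for documentation).

```python
import socket


def is_ask_crawler(ip,
                   reverse_lookup=socket.gethostbyaddr,
                   forward_lookup=socket.gethostbyname_ex):
    """Round-trip DNS check: IP -> host name under ask.com -> same IP."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse DNS record

    hostname = hostname.rstrip(".").lower()
    # Match the domain on a label boundary so that a name such as
    # "ask.com.evil.example" or "notask.com" is rejected.
    if not (hostname == "ask.com" or hostname.endswith(".ask.com")):
        return False

    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False  # host name does not resolve
    return ip in addresses


# Offline demonstration with fake resolvers (hypothetical host name).
def fake_reverse(ip):
    return ("crawler1.ask.com", [], [ip])


def fake_forward(host):
    return (host, [], ["192.0.2.10"])


print(is_ask_crawler("192.0.2.10", fake_reverse, fake_forward))
print(is_ask_crawler("192.0.2.99", fake_reverse, fake_forward))
```

In production you would call is_ask_crawler(ip) with the default resolvers and the client IP taken from your server logs, ideally caching results to avoid repeated lookups.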
Q: Does Ask.com support sitemaps?
A: Yes, Ask.com supports the open-format Sitemaps protocol. Once you have prepared the sitemap, add the sitemap auto-discovery directive to robots.txt as follows:
SITEMAP: http://www.the URL of your sitemap here.xml
The sitemap location should be the full sitemap URL. Alternatively, you can also submit your sitemap through the ping URL:
http://submissions.ask.com/ping?sitemap=http%3A//www.the URL of your sitemap here.xml
Please note that sitemap submissions do not guarantee the indexing of URLs. To learn more about the protocol, please visit the Sitemaps web site at http://www.sitemaps.org.
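As a small illustration, the Python sketch below builds both the robots.txt directive and the ping URL for a given sitemap. The helper names and the example.com sitemap address are hypothetical; the ping endpoint and the encoding style (colon escaped as %3A, slashes left as-is) follow the example shown above.

```python
from urllib.parse import quote

# Ping endpoint as given in the answer above.
PING_ENDPOINT = "http://submissions.ask.com/ping"


def sitemap_directive(sitemap_url):
    """Return the auto-discovery line to add to robots.txt."""
    return f"SITEMAP: {sitemap_url}"


def ping_url(sitemap_url):
    """Return the ping URL with the sitemap location percent-encoded."""
    # safe="/" keeps slashes readable while escaping ":" as %3A.
    return f"{PING_ENDPOINT}?sitemap={quote(sitemap_url, safe='/')}"


# Hypothetical sitemap location; replace with your own full sitemap URL.
example = "http://www.example.com/sitemap.xml"
print(sitemap_directive(example))
print(ping_url(example))
```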
Q: How can I add Ask.com search to my site?
A: We've made this easy: you can generate the necessary code here.
Q: How can I get additional information?
A: Please visit our full Searchable FAQ Database.
Please note that we cannot honor your emails regarding updates to your site/URL or requests to be indexed.