# This file lists the areas of www.ucs.ed.ac.uk that we do not want
# to go into search engines.
#
# There is first a local section dealing with how we want the EUCS
# Ultraseek service to see things, then a general section for all
# external spiders, and finally a section with specific blocks on
# known (faulty) spiders that don't handle the robot exclusion protocol
# properly.
#
# Graham Rule 10/8/2000

# The User-agent string "EUCS-Ultraseek-Others" is used when the
# local search engine is spidering for the 'others' collection.
# This collection is not searched by default and contains files
# which have their own search pages (eg back issues of Bits go in
# here).

User-agent: EUCS-Ultraseek-Others
Disallow: /~                            # exclude all user directories
Disallow: /home/
Disallow: /fmd/unix/People/
Disallow: /fmd/unix/ug/members/
Disallow: /bin/                         # exclude common cgi and binaries
Disallow: /cgi/
Disallow: /cgi-bin/
Disallow: /cgi-bin-private/
Disallow: /scripts/
Disallow: /Icons/                       # don't index lists of icons
Disallow: /icons/
Disallow: /Template/                    # exclude incomplete files included
Disallow: /Templates/                   # in documents as server-side includes
Disallow: /includes/
Disallow: /Architext/                   # stay clear of packages used to maintain
Disallow: /Excite/                      # our site and the files they produce
Disallow: /wusage/
Disallow: /stats/
Disallow: /linkscan/
Disallow: /nsd/abman/                   # we start our locally chosen exclusions
Disallow: /nsd/clydenet/                # with a big chunk from NSD who have stats
Disallow: /nsd/dept_errors/             # and other files they do not want indexed
Disallow: /nsd/dept_status/
Disallow: /nsd/ed-only/
Disallow: /nsd/ed-trunks/
Disallow: /nsd/epvcs/
Disallow: /nsd/estats/
Disallow: /nsd/fatman/
Disallow: /nsd/mrtg/
Disallow: /nsd/nfmorse/
Disallow: /nsd/nfstats/
Disallow: /nsd/nfxtras/
Disallow: /nsd/private/
Disallow: /nsd/restofscotland/
Disallow: /nsd/sam/
Disallow: /nsd/scotx/
Disallow: /nsd/scotx97/
Disallow: /nsd/scotx98/
Disallow: /nsd/sxintra/
Disallow: /nsd/sxsouth/
Disallow: /nsd/ustats/
Disallow: /tmp/                         # Someone's temporary files
Disallow: /test.fra/                    # Various test things - shouldn't really
Disallow: /test.graham/                 # be linked to anyway
Disallow: /test/
Disallow: /Temporary%20Items/           # Has someone been using a Mac? :-)
Disallow: /Network%20Trash%20Folder/
Disallow: /TheFindByContentFolder/
Disallow: /Alerts.old/                  # Old stuff that shouldn't be linked to
Disallow: /a-z.(old)/                   # and probably should be deleted
Disallow: /Menu/                        # don't index navigation tools such as the
Disallow: /a-z/                         # menu, topics list and a-z
Disallow: /a2z.shtml
# Disallow: /topics.shtml (NS, 20020723)
Disallow: /ucsinfo/scottk/              # stuff for public system in entrance
Disallow: /exhibit/                     # annual exhibition of arts & crafts
Disallow: /usd/iss/newdocs/             # ?
Disallow: /HomePage/                    # ? what is this - not readable anyway
Disallow: /Index/                       # ? placeholder
Disallow: /Y2K/                         # Not current any more
Disallow: /alerts/                      # alerts have their own search system
Disallow: /datalib/                     # redirects to the datalibrary site
Disallow: /selfhelp/                    # ? redirects to LRC pages on ASG site
Disallow: /ay2zed/                      # material for ASG site
Disallow: /fmd/unix/learn/usail/        # local mirror of unix material
Disallow: /usd/iss/ol/mm/Deltagraph_PC/   # local mirror of manual
Disallow: /usd/iss/ol/mm/Deltagraph_Mac/  # local mirror of manual
Disallow: /tsd/software/Special/yr2000/   # no longer current
Disallow: /fmd/unix/rob_mccron/         # obituary notices

# The "EUCS-Ultraseek" agent populates the main 'edcat' collection which
# is searched by default from the main University search page (and the
# EUCSinfo one).
# Here we should exclude pages that we do not want to appear in the
# results of normal searches (eg old editions of Bits)

User-agent: EUCS-Ultraseek
Disallow: /~                            # exclude all user directories
Disallow: /home/
Disallow: /fmd/unix/People/
Disallow: /fmd/unix/ug/members/
Disallow: /bin/                         # exclude common cgi and binaries
Disallow: /cgi/
Disallow: /cgi-bin/
Disallow: /cgi-bin-private/
Disallow: /scripts/
Disallow: /Icons/                       # don't index lists of icons
Disallow: /icons/
Disallow: /Template/                    # exclude incomplete files included
Disallow: /Templates/                   # in documents as server-side includes
Disallow: /includes/
Disallow: /Architext/                   # stay clear of packages used to maintain
Disallow: /Excite/                      # our site and the files they produce
Disallow: /wusage/
Disallow: /stats/
Disallow: /linkscan/
Disallow: /nsd/abman/                   # we start our locally chosen exclusions
Disallow: /nsd/clydenet/                # with a big chunk from NSD who have stats
Disallow: /nsd/dept_errors/             # and other files they do not want indexed
Disallow: /nsd/dept_status/
Disallow: /nsd/ed-only/
Disallow: /nsd/ed-trunks/
Disallow: /nsd/epvcs/
Disallow: /nsd/estats/
Disallow: /nsd/fatman/
Disallow: /nsd/mrtg/
Disallow: /nsd/nfmorse/
Disallow: /nsd/nfstats/
Disallow: /nsd/nfxtras/
Disallow: /nsd/private/
Disallow: /nsd/restofscotland/
Disallow: /nsd/sam/
Disallow: /nsd/scotx/
Disallow: /nsd/scotx97/
Disallow: /nsd/scotx98/
Disallow: /nsd/sxintra/
Disallow: /nsd/sxsouth/
Disallow: /nsd/ustats/
Disallow: /bits/1994/                   # back issues of Bits should only
Disallow: /bits/1995/                   # appear in the 'others' collection
Disallow: /bits/1996/                   # dealt with above
Disallow: /bits/1997/
Disallow: /bits/1998/
Disallow: /bits/1999/
Disallow: /bits/2000/
Disallow: /bits/2001/
Disallow: /tmp/                         # Someone's temporary files
Disallow: /test.fra/                    # Various test things - shouldn't really
Disallow: /test.graham/                 # be linked to anyway
Disallow: /test/
Disallow: /Temporary%20Items/           # Has someone been using a Mac? :-)
Disallow: /Network%20Trash%20Folder/
Disallow: /TheFindByContentFolder/
Disallow: /Alerts.old/                  # Old stuff that shouldn't be linked to
Disallow: /a-z.(old)/                   # and probably should be deleted
Disallow: /Menu/                        # don't index navigation tools such as the
Disallow: /a-z/                         # menu, topics list and a-z
Disallow: /a2z.shtml
# Disallow: /topics.shtml (NS, 20020723)
Disallow: /ucsinfo/scottk/              # stuff for public system in entrance
Disallow: /exhibit/                     # annual exhibition of arts & crafts
Disallow: /usd/iss/newdocs/             # ?
Disallow: /HomePage/                    # ? what is this - not readable anyway
Disallow: /Index/                       # ? placeholder
Disallow: /Y2K/                         # Not current any more
Disallow: /alerts/                      # alerts have their own search system
Disallow: /datalib/                     # redirects to the datalibrary site
Disallow: /selfhelp/                    # ? redirects to LRC pages on ASG site
Disallow: /ay2zed/                      # material for ASG site
Disallow: /fmd/unix/learn/usail/        # local mirror of unix material
Disallow: /usd/iss/ol/mm/Deltagraph_PC/   # local mirror of manual
Disallow: /usd/iss/ol/mm/Deltagraph_Mac/  # local mirror of manual
Disallow: /tsd/software/Special/yr2000/   # no longer current
Disallow: /fmd/unix/rob_mccron/         # obituary notices
Disallow: /ucsinfo/cttees/              # committees

#######################################################################
## Google ##
# John Murison has asked me to allow Google to index this site.
# I have given it exactly the same rules as we have for Ultraseek
# Graham Rule 14/12/2001

User-agent: Googlebot
Disallow: /~                            # exclude all user directories
Disallow: /home/
Disallow: /fmd/unix/People/
Disallow: /fmd/unix/ug/members/
Disallow: /bin/                         # exclude common cgi and binaries
Disallow: /cgi/
Disallow: /cgi-bin/
Disallow: /cgi-bin-private/
Disallow: /scripts/
Disallow: /Icons/                       # don't index lists of icons
Disallow: /icons/
Disallow: /Template/                    # exclude incomplete files included
Disallow: /Templates/                   # in documents as server-side includes
Disallow: /includes/
Disallow: /Architext/                   # stay clear of packages used to maintain
Disallow: /Excite/                      # our site and the files they produce
Disallow: /wusage/
Disallow: /stats/
Disallow: /linkscan/
Disallow: /nsd/abman/                   # we start our locally chosen exclusions
Disallow: /nsd/clydenet/                # with a big chunk from NSD who have stats
Disallow: /nsd/dept_errors/             # and other files they do not want indexed
Disallow: /nsd/dept_status/
Disallow: /nsd/ed-only/
Disallow: /nsd/ed-trunks/
Disallow: /nsd/epvcs/
Disallow: /nsd/estats/
Disallow: /nsd/fatman/
Disallow: /nsd/mrtg/
Disallow: /nsd/nfmorse/
Disallow: /nsd/nfstats/
Disallow: /nsd/nfxtras/
Disallow: /nsd/private/
Disallow: /nsd/restofscotland/
Disallow: /nsd/sam/
Disallow: /nsd/scotx/
Disallow: /nsd/scotx97/
Disallow: /nsd/scotx98/
Disallow: /nsd/sxintra/
Disallow: /nsd/sxsouth/
Disallow: /nsd/ustats/
Disallow: /bits/1994/                   # back issues of Bits should only
Disallow: /bits/1995/                   # appear in the 'others' collection
Disallow: /bits/1996/                   # dealt with above
Disallow: /bits/1997/
Disallow: /bits/1998/
Disallow: /bits/1999/
Disallow: /bits/2000/January_2000/
Disallow: /bits/2000/February_2000/
Disallow: /bits/2000/March_2000/
Disallow: /bits/2000/April_2000/
Disallow: /bits/2000/May_2000/
Disallow: /tmp/                         # Someone's temporary files
Disallow: /test.fra/                    # Various test things - shouldn't really
Disallow: /test.graham/                 # be linked to anyway
Disallow: /test/
Disallow: /Temporary%20Items/           # Has someone been using a Mac? :-)
Disallow: /Network%20Trash%20Folder/
Disallow: /TheFindByContentFolder/
Disallow: /Alerts.old/                  # Old stuff that shouldn't be linked to
Disallow: /a-z.(old)/                   # and probably should be deleted
Disallow: /Menu/                        # don't index navigation tools such as the
Disallow: /a-z/                         # menu, topics list and a-z
Disallow: /a2z.shtml
# Disallow: /topics.shtml (NS, 20020723)
Disallow: /ucsinfo/scottk/              # stuff for public system in entrance
Disallow: /exhibit/                     # annual exhibition of arts & crafts
Disallow: /usd/iss/newdocs/             # ?
Disallow: /HomePage/                    # ? what is this - not readable anyway
Disallow: /Index/                       # ? placeholder
Disallow: /Y2K/                         # Not current any more
Disallow: /alerts/                      # alerts have their own search system
Disallow: /datalib/                     # redirects to the datalibrary site
Disallow: /selfhelp/                    # ? redirects to LRC pages on ASG site
Disallow: /ay2zed/                      # material for ASG site
Disallow: /fmd/unix/learn/usail/        # local mirror of unix material
Disallow: /usd/iss/ol/mm/Deltagraph_PC/   # local mirror of manual
Disallow: /usd/iss/ol/mm/Deltagraph_Mac/  # local mirror of manual
Disallow: /tsd/software/Special/yr2000/   # no longer current
Disallow: /fmd/unix/rob_mccron/         # obituary notices

#######################################################################
## General Catch-all ##
# Here we tell ALL other robots to go away.
# This is a very strict policy - do we really need to be as restrictive?
## No! (25/9/02)
#User-agent: *
#Disallow: /
#
# Well, it may have been OK to remove the general restriction but
# there are some things that we don't want spidered (ever)

User-agent: *
Disallow: /wusage/
Disallow: /linkscan/
# added by Arthur Wilson 15th March 2007 to stop Slurp and other agents
# from crawling over /fmd/unix/cgi-bin/info2www and any other scripts in any
# cgi-bin directory
Disallow: /*cgi-bin/

#######################################################################
## Special bits ##
## Handle special cases - I don't know why this is here as any well
## behaved robot will have been put off by the catch-all above
# Presumably Scooter has been naughty!

User-agent: Scooter/2.0 G.R.A.B. V1.1.0
Disallow: /