THE WELCH COMPANY
440 Davis Court #1602
San Francisco, CA 94111-2496
415 781 5700
rodwelch@pacbell.net


S U M M A R Y


DIARY: July 16, 2003 08:13 AM Wednesday; Rod Welch

Gary called with guidance for purging Google cache.

1...Summary/Objective
2...Instructions on Notifying Google to Avoid Indexing and Caching
....2...Removing Google Index
3...Google Account Setup to Maintain Indexing and Caching of Welchco
4...Specify Index Cache Maintenance Using Robots.txt Instructions
5...Index and Cache User Control Supports Confidentiality
6...Confidentiality User Controls Index Cache Adds Utility to Internet
7...Confidentiality Exclude Records from Index Cache Using Robots.txt
8...Exclude Records from Index Cache Using Robots.txt for Confidentiality
9...Robots.txt Configured to Exclude Specific Records from Index Cache
........Robots.txt Instructions on Submission for Processing
........Instructions on Submission of Robots.txt for Processing
10...Google Account Operational for Maintaining Indexing and Caching
11...Google Processes Request for Index and Cache Update
12...Request to Google for Index Cache Update Processed Successfully
13...Report Progress Correcting Inadvertent Publication of Records
14...Verify Only Google Maintains Secondary Cache of Files on Internet



CONTACTS 
0201/- Boeing                                                                                                                                                             O-00000816 0505
020101 - Mr. Garold L. Johnson
020103/- Modeling and Simulation                                                                                                                                          O-00000816 0505

SUBJECTS
Visibility of SDS on Radar at Objects to SDS Records
Objects to SDS Records on Project Published on Internet Ac
Meetings Gary to Capture Record of TM Meetings
Procedures Distribute SDS Records Via Email Problem Today Did Not Use
Confidentiality Distribute SDS Records on Aerospace company Intranet Ga   Lookin
Distribution via Email Using Outlook Adequate to Introduce Aerospace co  any Sta
Radar Visibility of SDS on Radar at Aerospace company Objects to SDS Re  rds on
Google Cache Instructions to Expedite Removal of References in SDS to
Cache Search Engine Management Control of Internet Website
Index Cache Internet Search Engine Radar Visibility of SDS on Radar a

4912 -
4912 -    ..
4913 - Summary/Objective
4914 -
491401 - Follow up ref SDS 16 0000, ref SDS 11 0000.
491402 -
491403 - Gary provided information on notifying Google to immediately index our
491404 - records and to omit certain portions of the record from the index and
491405 - from caching using a file -- robots.txt. ref SDS 0 016C  He found that
491406 - only Google maintains a cache.  We did some further testing that
491407 - seemed to support Gary's research. ref SDS 0 P150  Developed a list of
491408 - records and files to exclude from indexing and caching. ref SDS 0 3R6F
491409 - Instructions for submitting robots.txt for processing to suppress
491410 - indexing, ref SDS 0 ZR9H, were successful in getting this processed by
491411 - Google. ref SDS 0 PS3W  This experience revealed procedures that may
491412 - support the goal of using the Internet for fast, efficient delivery of
491413 - anytime, anywhere intelligence, while maintaining a measure of
491414 - confidentiality. ref SDS 0 AQ92
491416 -  ..
491417 - In the afternoon, submitted an email linked to this record showing
491418 - performance on request to purge records; called to give timely notice
491419 - of action. ref SDS 0 P27U
491420 -
491421 -
491422 -
491424 -  ..
4915 -
4916 -
4917 - Progress
4918 -
491801 - Instructions on Notifying Google to Avoid Indexing and Caching
491802 -
491803 - Received ref DRT 1 0001 from Gary saying....
491804 -
491805 -    1.  Google has cached the pages, which we need to remove. I have
491806 -        done some looking, and:
491808 -         ..
491809 -    2.  Removing Google Index
491810 -
491811 -        Google has an automatic URL removal system for *urgent*
491812 -        requests.
491813 -
491814 -    3.  Instructions are at:
491815 -
491816 -           http://www.google.com/remove.html
491818 -            ..
491819 -           http://services.google.com:8882/urlconsole/controller
491820 -
491821 -
491823 -  ..
491824 - Review of this location shows....
491825 -
491826 -              Google views the quality of its search results as an
491827 -              extremely important priority. Therefore, Google stops
491828 -              indexing the pages on your site only at the request of
491829 -              the webmaster who is responsible for those pages or as
491830 -              required by law. This policy is necessary to ensure that
491831 -              pages are not inappropriately removed from our index.
491833 -               ..
491834 -              Google keeps the text of the many documents it crawls
491835 -              available in a cache. This allows an archived, or
491836 -              "cached", version of a web page to be retrieved for your
491837 -              end users if the original page is ever unavailable (due
491838 -              to temporary failure of the page's web server).  The
491839 -              cached page appears to users exactly as it looked when
491840 -              Google last crawled it. The cached page also includes a
491841 -              message (at the top of the page) to indicate that it's a
491842 -              cached version of the page.
491844 -               ..
491845 -              If you want to prevent all robots from archiving content
491846 -              on your site, use the NOARCHIVE meta tag. Place this tag
491847 -              in the <HEAD> section of your documents as follows
491848 -                    ..
491849 -                    <META NAME="ROBOTS" CONTENT="NOARCHIVE">
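A minimal sketch, in Python, of adding the NOARCHIVE tag quoted above to a page's HEAD section. The function name add_noarchive is hypothetical, and the sketch assumes each page has a single opening HEAD tag; pages without one are returned unchanged.

```python
import re

# Tag text as given in Google's instructions.
NOARCHIVE_TAG = '<META NAME="ROBOTS" CONTENT="NOARCHIVE">'


def add_noarchive(html):
    """Insert the NOARCHIVE tag right after the opening HEAD tag, once."""
    # Skip pages that already carry the tag.
    if re.search(r'content\s*=\s*"noarchive"', html, flags=re.IGNORECASE):
        return html
    # Match <head> or <HEAD ...attributes>, but not <header>.
    head_open = re.compile(r"<head(\s[^>]*)?>", flags=re.IGNORECASE)
    return head_open.sub(lambda m: m.group(0) + "\n" + NOARCHIVE_TAG,
                         html, count=1)


page = "<HTML><HEAD><TITLE>t</TITLE></HEAD><BODY></BODY></HTML>"
print(add_noarchive(page))
```

Running the function a second time on its own output leaves the page as is, so it could be applied to a whole directory of records without duplicating the tag.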
491851 -  ..
491852 - Seems like this is a good idea for SDS records, so that when a
491853 - decision is made to remove a record, there is not a second process to
491854 - perform, as we are doing now.
491856 -  ..
491857 - Upon further review, this may be unnecessary.  If we can control
491858 - immediate updating using robots.txt, per below, this will provide
491859 - significant confidentiality by reducing occasions when records are
491860 - encountered by people looking randomly for information on the
491861 - Internet.
491863 -  ..
491864 - The Google instructions further say...
491865 -
491866 -              Note: If you believe your request is urgent and cannot
491867 -              wait until the next time Google crawls your site, use
491868 -              our automatic URL removal system. In order for this
491869 -              automated process to work, your webmaster must first
491870 -              insert the appropriate meta tags into the page's HTML
491871 -              code.
491873 -               ..
491874 -              Google will continue to exclude your site or directories
491875 -              from successive crawls if the robots.txt file exists in
491876 -              the web server root. If you do not have access to the
491877 -              root level of your server, you may place a robots.txt
491878 -              file at the same level as the files you want to remove.
491879 -              Doing this and submitting via the automatic URL removal
491880 -              system will cause a temporary, 90 day removal of your
491881 -              site from the Google index. (Keeping the robots.txt file
491882 -              at the same level would require you to return to the URL
491883 -              removal system every 90 days to reissue the removal.)
491884 -
491885 -
491886 -
491888 -  ..
491889 - Google Account Setup to Maintain Indexing and Caching of Welchco
491890 -
491891 - Applying this feature has instructions at....
491892 -
491893 -
491894 -           http://services.google.com:8882/urlconsole/controller?cmd=reload&lastcmd=login  
491895 -
491896 - ...which say in part....
491897 -
491898 -              In order to remove a URL from the Google index or an
491899 -              article from Google Groups, we need to first verify your
491900 -              e-mail address. Please enter it below, along with a
491901 -              password.
491903 -       ..
491904 -      [On 030728 Google reports there is a limit on the size of
491905 -      robots.txt, but does not specify the size, nor where this
491906 -      guidance is specified. ref SDS 17 0001
491908 -  ..
491909 - Provided email address and password
491910 -
491912 -
491914 -  ..
491915 - Got a message saying....
491916 -
491917 -              Your account has been created!
491919 -               ..
491920 -              A message has been sent to your email account. Follow the
491921 -              instructions in that email to proceed. Please note that
491922 -              you must activate your account within 24 hours or it will
491923 -              be deleted.
491925 -         ..
491926 -        [...below, received letter from Google.... ref SDS 0 0B5F
491928 -         ..
491929 -        [...see below instructions on processing robots.txt to
491930 -        suppress indexing files on the Internet. ref SDS 0 ZR9H
491931 -
491932 -
491933 -
491935 -  ..
491936 - Specify Index Cache Maintenance Using Robots.txt Instructions
491937 -
491938 - Gary continues...
491939 -
491940 -    4.  It involves placing a file named "robots.txt" at the level
491941 -        wanted to remove, in this case the GLJDY directory.
491943 -  ..
491944 - This requires having the directory "GLJDY," so used wspft to put the
491945 - directory back, which was removed yesterday. ref SDS 16 F553
491947 -         ..
491948 -    5.  The file should contain:
491949 -
491950 -           User-agent: *
491951 -           Disallow: /
491953 -         ..
491954 -        [...below, develop file to accomplish this task. ref SDS 0 3R6F
491955 -
491957 -  ..
491958 - What is the source for this information?  Do not see it on the Google
491959 - site so far???
491960 -
491961 -        [...below, Gary provides guidance. ref SDS 0 P24V
491963 -  ..
491964 - Looking at Google's instructions under...
491965 -
491966 -              Remove an image from Google's Image Search
491968 -  ..
491969 - This section seems to say that instructions in para 5 of Gary's letter
491970 - pertain to removing images with a jpg or equivalent filename
491971 - extension.  In this case, we do not have any files of this kind in the
491972 - GLJDY directory.
491973 -
491974 -
491975 -
491977 -  ..
4920 -
4921 -
4922 - 0827 called Gary
492301 -  ..
492302 - Gary advised there is additional guidance on using robots.txt at....
492303 -
492304 -             http://www.robotstxt.org/wc/norobots.html
492305 -
492306 - ...and this indicates that when a file or directory is listed in
492307 - robots.txt, then the file and/or directory is neither indexed nor
492308 - cached.  Second, there is a procedure for notifying Google to
492309 - immediately update the index, so that files listed for exclusion will
492310 - no longer be displayed.
492311 -  ..
492312 -
492313 -      [On 030728 Google reports there is a limit on the size of
492314 -      robots.txt, but does not specify the size, nor where this
492315 -      guidance is specified. ref SDS 17 0001
492316 -
492318 -  ..
492319 - Index and Cache User Control Supports Confidentiality
492320 - Confidentiality User Controls Index Cache Adds Utility to Internet
492321 -
492322 - Being able to exclude files from indexing and caching means we can
492323 - put files on the Internet for use by people upon specific notice, and
492324 - not have these files accessible to others through search engine
492325 - operations.  This capability supports using web mail with a measure of
492326 - confidentiality, so we can come closer to objectives on 971021 for
492327 - delivering anytime, anywhere intelligence, without opening the
492328 - Pandora's Box of losing privacy. ref SDS 1 3636
492329 -
492330 -       Question: it seems likely the robots.txt procedure works only
492331 -       for Google and not for other search engines.  If so,
492332 -       robots.txt only avoids processing by Google, and avoiding
492333 -       visibility through other search engines still requires removal
492334 -       from the Internet????
492336 -        ..
492337 -       Maybe Gary can comment.
492338 -
492339 -
492341 -  ..
492342 - Confidentiality Exclude Records from Index Cache Using Robots.txt
492343 - Exclude Records from Index Cache Using Robots.txt for Confidentiality
492344 - Robots.txt Configured to Exclude Specific Records from Index Cache
492345 -
492346 - For example, I am inclined to list the entire directory for Gary's
492347 - project....
492348 -
492349 -                       03 00101
492350 -
492351 - ...and also Gary SDS directory...
492352 -
492353 -                       04 00074
492354 -
492356 -  ..
492357 - Created file ref OF 1 0000, listing SDS records identified yesterday,
492358 - ref SDS 16 0A3K, plus, following...
492359 -
492360 -     User-agent: *
492361 -     Disallow: /03/00101/  #  Gary's project directory
492362 -     Disallow: /04/00074/  #  Gary's SDS project directory
492363 -     Disallow: /sd/08/GLJDY/   #  Gary's personal SDS directory
492364 -     Disallow: /sd/08/00101/02/03/01/23/101114.HTM..... ref SDS 2 0001
492365 -     Disallow: /sd/08/00101/02/03/01/25/082838.HTM..... ref SDS 3 0001
492366 -     Disallow: /sd/08/00101/02/03/02/24/205217.HTM..... ref SDS 4 0001
492367 -     Disallow: /sd/08/00101/02/03/04/19/081308.HTM..... ref SDS 8 0001
492368 -  *  Disallow: /sd/08/00101/02/03/04/22/075739.HTM..... ref SDS 9 0001
492369 -     Disallow: /sd/08/00101/02/03/05/16/083917.HTM..... ref SDS 10 IZ4N
492370 -     Disallow: /sd/08/00101/02/03/05/25/084653.HTM..... ref SDS 12 0001
492371 -     Disallow: /sd/08/00101/02/03/05/27/090556.HTM..... ref SDS 13 0001
492372 -
492373 - ...to remove from the index and from the cache the list of URLs
492374 - removed yesterday. ref SDS 16 0A3K
492375 -
492376 -        *  Note the record on 030422 has been purged of any references
492377 -           to the project and to Gary's company.
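A minimal sketch, in Python, of checking that rules like those listed above block the intended addresses. It uses the standard library module urllib.robotparser. The trailing "ref SDS" annotations are assumed to be cross-references in this record rather than part of the posted file, only the first record path is shown, and the addresses example.HTM and index.html are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Rules as listed in this record, with the "ref SDS" annotations omitted.
ROBOTS_TXT = """\
User-agent: *
Disallow: /03/00101/  #  Gary's project directory
Disallow: /04/00074/  #  Gary's SDS project directory
Disallow: /sd/08/GLJDY/   #  Gary's personal SDS directory
Disallow: /sd/08/00101/02/03/01/23/101114.HTM
"""


def is_blocked(robots_text, url, agent="*"):
    """Return True if the rules tell the given robot not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(agent, url)


for url in (
    "http://www.welchco.com/sd/08/GLJDY/example.HTM",    # hypothetical file
    "http://www.welchco.com/sd/08/00101/02/03/01/23/101114.HTM",
    "http://www.welchco.com/index.html",                 # hypothetical page
):
    print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed")
```

This parser compares each rule against the start of the path and treats upper and lower case as different, so a rule ending in .HTM would not cover a file stored as .htm. Whether a given search engine applies the same matching is not confirmed here.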
492379 -  ..
492380 - Actually, we should only need a single file, so put everything in one
492381 - file and posted it to the root directory of the website.
492382 -
492383 -                http://www.welchco.com/robots.txt
492384 -
492386 -  ..
492387 - This provides a way to use the Internet for delivery, but provide a
492388 - measure of privacy by preventing access through Internet search
492389 - engines, per above. ref SDS 0 AQ92
492391 -  ..
492392 - Gary continues...
492393 -
492394 -    6.  The URL removal request has to be made by the webmaster from
492395 -        the link above.
492396 -
492398 -         ..
492399 -        Robots.txt Instructions on Submission for Processing
492400 -        Instructions on Submission of Robots.txt for Processing
492401 -
492402 -
492403 - After robots.txt is created and uploaded to the Internet, in this
492404 - case to....
492405 -
492406 -              http://www.welchco.com/robots.txt
492407 -
492408 - ....then open the Internet page (requires "cookies on") at....
492409 -
492410 -              http://services.google.com:8882/urlconsole/controller
492411 -
492412 - ...and enter the webmaster's email address and password set up, per
492413 - above. ref SDS 0 YY89
492415 -  ..
492416 - Sometimes when this location is initially opened there is an error
492417 - message saying the account is not set up.  Pressing "Reload" on the
492418 - browser clears this error, and the form is presented for entering an
492419 - email address and password.  Press Enter to process the request.
492421 -  ..
492422 - This opens another form.
492424 -  ..
492425 - Enter the address for the robots.txt file, which in our case is....
492426 -
492427 -         http://www.welchco.com/robots.txt
492428 -
492429 - ...and then press Enter or click on the button for processing.
492431 -  ..
492432 - After a few moments there is a message displayed showing the request
492433 - has been processed to immediately update the indexing for the
492434 - specified address, per below. ref SDS 0 Y24N
492435 -
492436 -
492437 -
492438 -
492440 -  ..
4925 -
4926 -
4927 - 1223
4928 -
492801 - Google Account Operational for Maintaining Indexing and Caching
492802 -
492803 - Received letter from Google saying.....
492804 -
492805 -        Subject: Removing your URL
492806 -        Date: Wed, 16 Jul 2003 11:36:43 -0700
492807 -        From: Googlebot url-remove@google.com
492808 -        To: rodwelch@pacbell.net
492810 -         ..
492811 -        Greetings from Google!
492813 -         ..
492814 -        This email was automatically generated when your account was
492815 -        created. Visit the following link to activate your account:
492817 -         ..
492818 -        http://services.google.com:8882/urlconsole/controller?cmd=newUserActivation&code=54cab9ea&uid=78368
492820 -         ..
492821 -        You must activate your account within 24 hours or it will be
492822 -        removed.
492824 -         ..
492825 -        If you did not request this account, or do not wish to proceed
492826 -        with the removal of a URL, please ignore this message.
492828 -         ..
492829 -        Regards,
492830 -        The Google Team
492831 -        help@google.com
492832 -
492833 -
492834 -
492836 -  ..
4929 -
4930 -
4931 - 1438
4932 -
493201 - Google Processes Request for Index and Cache Update
493202 - Request to Google for Index Cache Update Processed Successfully
493203 -
493204 - After revising the specification of SDS records and files to exclude
493205 - from indexing, per para 5 above, ref SDS 0 3R6F, was successful
493206 - performing steps in para 6, per above, ref SDS 0 EH3X, by logging onto
493207 - Google Internet address for requesting immediate index update, per
493208 - above. ref SDS 0 RJ6N
493210 -  ..
493211 - Got following response....
493212 -
493213 -        Your request has been submitted.
493214 -
493215 -        Your request should be processed within 24 hours.
493216 -
493217 -        You may visit the options page to check the status of all your
493218 -        pending requests or submit another one.
493220 -  ..
493221 - This solves earlier problem we had, where for some reason Google was
493222 - rejecting the robots.txt submission.
493223 -
493224 -
493225 -
493227 -  ..
4933 -
4934 -
4935 - 1657
4936 -
493601 - Report Progress Correcting Inadvertent Publication of Records
493602 -
493603 - Gary called back.
493605 -  ..
493606 - Gary met with his boss, Steve, per planning yesterday to review
493607 - progress on request by Andy Johnson. ref SDS 16 T66F  Gary explained
493608 - inadvertent transfer of TE meeting record to Internet, per analysis
493609 - yesterday. ref SDS 16 0001  Steve will relay the report up the chain
493610 - of command.
493612 -  ..
493613 - This afternoon, Gary got a call from Steve with more feedback...
493614 -
493615 -     Sounds like Andy Johnson may have further concerns.
493617 -      ..
493618 -     Apparently someone read the TEM records and concluded there is
493619 -     nothing material that impacts interests of the project, the
493620 -     government nor Gary's company.
493622 -      ..
493623 -     Steve wants a report from Gary on actions taken to address this
493624 -     issue.
493626 -      ..
493627 -     There was a request for my phone number, and Gary gave this out
493628 -     even though generally we are avoiding my getting in the loop.
493629 -
493631 -  ..
493632 - Gary wanted to know whether email has gone out transmitting this
493633 - record, so he can complete his report to Steve, given earlier problems
493634 - sending email.
493636 -  ..
493637 - Explained prior problems sending email seem to have been cleared by
493638 - the ISP; I am working on the record to explain opportunity for using
493639 - web mail to improve productivity, and with a measure of additional
493640 - confidentiality, per above. ref SDS 0 AQ92
493642 -  ..
493643 - Need feedback showing Gary's transfer ops have been modified to avoid
493644 - future occurrence, per planning yesterday on 030715. ref SDS 16 F59M
493645 -
493646 -     [On 030827 followed up. ref SDS 18 0001
493647 -
493648 -
493650 -  ..
493651 - Verify Only Google Maintains Secondary Cache of Files on Internet
493652 -
493653 - Discussed briefly Gary's understanding that only Google uses the cache
493654 - system which provides a secondary version of the file that has the
493655 - search string highlighted for ease in using the record.
493657 -  ..
493658 - Gary explained having tested about 12 search engines and only Google
493659 - uses this feature.
493661 -  ..
493662 - While talking we tested Netscape search engine and then Ask Jeeves.
493664 -  ..
493665 - Did a search using spec yesterday. ref SDS 16 0A3K
493667 -  ..
493668 - Both Netscape and Ask Jeeves came up with two records, neither of
493669 - which is cached.  This supports Gary's research.
493671 -  ..
493672 - Ask Jeeves lists a Welch SDS record that was not listed by Google
493673 - yesterday. ref SDS 16 0A3K
493674 -
493675 -         030125...................................... ref SDS 3 0001
493677 -  ..
493678 - So I deleted the record from the web, and added the URL to the
493679 - robots.txt file per above. ref SDS 0 3R6F
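A minimal sketch, in Python, of adding one more address to an existing robots.txt listing without creating a duplicate entry. The function name add_disallow and the path ending in 000000.HTM are hypothetical, and the sketch assumes the file holds a single "User-agent: *" group so that a new rule can simply be appended at the end.

```python
def add_disallow(robots_text, path):
    """Return robots_text with 'Disallow: path' appended unless listed."""
    listed = set()
    for line in robots_text.splitlines():
        rule = line.split("#", 1)[0].strip()   # drop trailing comments
        if rule.lower().startswith("disallow:"):
            listed.add(rule.split(":", 1)[1].strip())
    if path in listed:
        return robots_text                      # already excluded
    if robots_text and not robots_text.endswith("\n"):
        robots_text += "\n"
    return robots_text + "Disallow: " + path + "\n"


current = "User-agent: *\nDisallow: /sd/08/GLJDY/   #  Gary's personal SDS directory\n"
print(add_disallow(current, "/sd/08/example/000000.HTM"))  # hypothetical path
```

After the file is updated and posted, the request to update the index would still need to be submitted again, as described in this record.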
493681 -  ..
493682 - Then logged back onto the Google maintenance location, per above,
493683 - ref SDS 0 RJ6N, and entered the index and cache update op again.  This
493684 - probably is not necessary, since they need 24 hours to process a
493685 - request; but, hopefully does not hurt anything.
493687 -  ..
493688 - When I logged onto the maintenance site, ref SDS 0 RJ6N, there was a
493689 - list showing Google has already processed the request.
493691 -  ..
493692 - I entered another request to update the list per above, and got
493693 - another message saying this second request was accepted using same
493694 - language reported above. ref SDS 0 Y24N
493695 -
493696 -
493698 -  ..
4937 -
4938 -
4939 - 1807
4940 -
494001 - Submitted report to Gary linked to this record.
494002 -
494003 - Called Gary and notified him that he can review the record and
494004 - provide feedback for further action.
494005 -
494006 -
494007 -
494008 -
494009 -
494010 -
494011 -
494012 -
4941 -