Our wayback machine is being taken away ...
-
- Posts: 881
- Joined: Thu May 20, 2010 5:34 pm
Our wayback machine is being taken away ...
Hello, UOSA players and staff,
more and more "vintage" stratics pages from web.archive.org are being internally redirected to Stratics' (present-day) main UO page.
I have found a path through links that have not yet been compromised: there is a lot of era-accurate data/notions that could yet be salvaged.
What stuff, if anything, do we want to copy and store, either here or in our Wiki, for future reference? There are skill guides, various essays, and a lot of statistics data; here and in the wiki, we've got some of the substance of this stuff, but in almost no circumstance do we have complete information ...
Information retrieved would, naturally, require parsing for era-accuracy: some data/essays pre-date T2A, while others are post-UOR; in both cases, some of the data is indeed accurate for our own target era.
So, how should I, and/or all other interested parties, proceed?
SS
SighelmofWyrmgard wrote:FTFY.uosa44 wrote:For sale, by original owner:
1 Human Brain, never been used, only slightly damaged, still in original packaging.
$1, obo
SS
uosa44 wrote:The inability for this person to respond in such a crazy manner proves my point.
Re: Our wayback machine is being taken away ...
I was not aware of stuff going missing. I believe the "wstub" URLs use the HTTP Referer header to determine which version of the page you should be redirected to, which may cause issues.
I'm sure it wouldn't be that difficult to mirror the whole thing, every version included. Internally re-linking the pages would likely be non-trivial, though.
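A minimal sketch of the bookkeeping such a mirror would need, assuming captures are addressed as web.archive.org/web/<timestamp>/<original URL>. The function names, the one-directory-per-capture-date layout, and any timestamps passed in are illustrative assumptions, not part of an existing tool; the sketch does no downloading and no re-linking.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(timestamp: str, original_url: str) -> str:
    """Build the address of one archived capture (timestamp is YYYYMMDDhhmmss)."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def local_path(timestamp: str, original_url: str) -> str:
    """Map a capture to a relative file path, one directory per capture date.

    The layout (date/host/path) is an assumption made for this sketch.
    """
    parts = urlsplit(original_url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Directory-style URLs need a file name on disk.
        path += "index.html"
    return str(PurePosixPath(timestamp[:8]) / parts.netloc / path.lstrip("/"))
```

Keeping every version in its own dated directory is what would let the same page be stored many times over; rewriting the links inside each saved page to point at these local paths is the part that remains non-trivial.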
Re: Our wayback machine is being taken away ...
Lately it seems a lot of the stuff I've tried looking up was missing. I don't know if it was always missing or not, but ...
It's easy enough to spider the entire site if there are still pages available.
-
- Posts: 881
- Joined: Thu May 20, 2010 5:34 pm
Re: Our wayback machine is being taken away ...
There are actually two things going on right now:
- the application robots.txt would apply an internal block to the retrieval of the primary document (but the sidebar would still load valid internal redirects);
- some other "engine" is redirecting DNS-resolution to present-day uo stratics;
SS
Re: Our wayback machine is being taken away ...
Having used the archives for a long time, I haven't noticed any atypical behavior from them lately. Earlier today, I visited several of the stable dates from 1999, 2000, 2001, and 2002 without any issue. Given that, one thing to consider is this: often, a link on an archived page won't take you to a copy that was archived at the same time. Particularly with late 2002 pages, the links will lead to versions archived later on, which happen to contain the redirect links to current stratics pages. Beyond that, the archives are behaving normally.
One small note: robots.txt is a file hosted on the server at the time of the crawl that alerts the bot not to archive the page in question. Archived stratics pages ranging from mid 99 to late 99 will link to this particular page when stratics had the file set up to block any such crawl from occurring.
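To illustrate how a crawler consults such a file, here is a small sketch using Python's standard urllib.robotparser. The robots.txt contents and the bot name are made up for the example; they are not the real stratics file or the archive's actual crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def may_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With these made-up rules, anything under /cgi-bin/ is off limits to the bot while an ordinary page such as /menu.htm may be fetched.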
Useful links for researching T2A Mechanics
Stratics - UO Latest Updates - Newsgroup 1 - Noctalis - UO98.org
-
- Posts: 881
- Joined: Thu May 20, 2010 5:34 pm
Re: Our wayback machine is being taken away ...
I'm not "Henny Penny" running around crying out that the sky is falling: on page 2 of this very forum will be found a post in which I originally provided a link to wayback's valid, archived page; very shortly after, robots.txt imposed a "retrieval exclusion" (which can be worked around internally, as I explained in that post's edit).
The "retrieval exclusion" is being imposed against any-and-all externally launched connection queries (but, not necessarily internally-linked redirects ...) by the site-owner, as explicitly stated in the navigation-error message; this is not a "bad-source-document" type of problem.
The other "action" I've described is a DNS-resolution-bound redirect to present-day stratics uo homepage: accidentally or otherwise, this is a DNS-resolution redirect: it isn't a product of a faulty entry in my computer's Hosts file, a faulty entry in my machine's cache, or my router's cache, or my service provider's cache, or anybody's cache; it can only be the archive-server's DNS-resolution protocols; the odds of an entirely accidental redirect being so precise are astronomically fantastic ...
So, before all navigation becomes blocked, what shall we do?
SS
Re: Our wayback machine is being taken away ...
Kaivan wrote: One small note: robots.txt is a file hosted on the server at the time of the crawl that alerts the bot not to archive the page in question. Archived stratics pages ranging from mid 99 to late 99 will link to this particular page when stratics had the file set up to block any such crawl from occurring.
The last I knew, any present/future blocking via robots.txt will prevent archive.org from displaying those relevant past results. It used the past version of the file to determine what to mirror, but then uses the current file to determine if further restrictions should apply when returning the results. This is probably unlikely in the case of stratics, but I've had that issue with numerous other sites which were re-purposed/squatted.
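The two-stage behaviour described here (crawl-time rules decide what gets mirrored, present-day rules decide what gets shown) could be modelled roughly as follows. This is a sketch of the behaviour as described in this post, not of archive.org's actual code; the function names, bot name, and robots.txt contents are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check one robots.txt text against one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def capture_is_viewable(robots_at_crawl: str, robots_today: str,
                        agent: str, url: str) -> bool:
    """A capture is shown only if it was crawlable then AND is not blocked now."""
    was_crawled = allowed(robots_at_crawl, agent, url)
    still_permitted = allowed(robots_today, agent, url)
    return was_crawled and still_permitted


# Hypothetical rule sets: an empty file allows everything,
# while "Disallow: /" blocks the whole site.
OPEN_RULES = ""
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
```

Under this model, a page crawled while the site was open disappears from view as soon as a new owner publishes a blocking robots.txt, which matches the re-purposed/squatted case.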
In any case, I'll try to get it mirroring tonight for peace of mind. After that's done, I'll see about interlinking the uo.stratics.org and uoss.stratics.org entries (some is on one, but not the other), fixing links, making diffs, etc.
-
- Posts: 881
- Joined: Thu May 20, 2010 5:34 pm
Re: Our wayback machine is being taken away ...
Hi, Rammar,
the post on this forum that I cited connects to many documents that stratics last updated no later than Jan 01, 2000 (most are from spring-summer '99); alternatively, I could provide a valid external link to a Summer 2000 page that, itself, possesses valid internal links to essays et al. that are inaccessible via the first page; some of these items were also last updated no later than Jan 01, 2000.
PM me if I can assist you in any fashion.
SS
Re: Our wayback machine is being taken away ...
Looking at the robots.txt file from September 18th, 1999 shows that some of the directories were blocked at the time, and no archives of any of those pages exist. It is likely that the current day robots.txt file is being used to block access to pages; however, only a single change is apparent in their robots file since its first available archive in its near current form on November 11, 2006. Thus, no changes have been made to block any old content.
Secondly, regarding the link you provided in this thread, the results are typical for linking in that manner. What you're actually linking is the base page, uo.stratics.com/index.html. The main pages that load when you go to that page, as per the source code, are uo.stratics.com/topuoss.htm, uo.stratics.com/menu.htm, uo.stratics.com/top_mid.htm, and uo.stratics.com/cgi-bin/uoss_news.pl. This results in losing any links that you arrive at when you link the thread on the forums, because a link typed on a forum has no knowledge of what pages were loaded in place of the news.pl page in your web browser. This is apparent if I attempt to link to the weapons page (I was actually viewing the weapons page when I copied the link) using your method. This is not an indication that the page is not linkable, but that the page's URL is masked by the frame construction of the page. However, if we directly link to either the treasure maps or the weapons page we will successfully navigate to the page from an external link without issue.
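For anyone who wants to find the real, directly linkable addresses hiding behind a framed page, a small sketch with Python's standard html.parser follows. The sample frameset markup is a hypothetical reconstruction using file names mentioned in this post; it is not the actual stratics source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(base_url: str, html: str) -> list:
    """Return absolute URLs of the documents a frameset page loads."""
    collector = FrameCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


# Hypothetical frameset, for illustration only.
SAMPLE = """
<frameset rows="80,*">
  <frame src="topuoss.htm" name="top">
  <frameset cols="150,*">
    <frame src="menu.htm" name="menu">
    <frame src="cgi-bin/uoss_news.pl" name="main">
  </frameset>
</frameset>
"""
```

The address bar only ever shows the base page, so the URLs returned by frame_urls are the ones worth pasting into a forum post.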
Finally, with respect to the redirects to the stratics error page, there are two things to note. First, the error page that is reached is not on the current day stratics page, but in fact, on an archived page. Secondly, if we look at the generic search page we can see that there are two links provided, one for uo.stratics.com and another for uo.stratics.com/index. Starting in 2002, in all instances where you click on a link marked /index, you will always be redirected to an archived error page with a very similar archive date to that of the original link. This is because, starting in 2002, the page that was loaded no longer ended in /index.html but in /index.shtml. This is apparent in the source code for working pages from 2002.
Final note: the main archive pages from any given date don't necessarily match up with the archive date for other pages on the website, as I mentioned before.
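One way to picture the date mismatch: when no capture exists for the exact date requested, the nearest available one is served instead. The sketch below assumes that nearest-in-time rule and uses made-up timestamps in YYYYMMDDhhmmss form; the function names are illustrative.

```python
from datetime import datetime


def parse_ts(timestamp: str) -> datetime:
    """Parse a 14-digit capture timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def nearest_capture(requested: str, available: list) -> str:
    """Pick the available capture closest in time to the requested one."""
    target = parse_ts(requested)
    return min(available, key=lambda ts: abs(parse_ts(ts) - target))
```

So a link followed from a late 2002 page can land on a capture from the following year if that happens to be the closest one on record.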
Re: Our wayback machine is being taken away ...
Either way, as with any data that someone feels is important, there should be a backup copy.
-
- Posts: 881
- Joined: Thu May 20, 2010 5:34 pm
Re: Our wayback machine is being taken away ...
Kaivan,
I understand the details you incorporated into your last response; I did not go into any similar type of explanation, because there are many who would not be able to follow it.
I will reiterate the two things you refuse to acknowledge:
- some of the direct links NO LONGER WORK, because they have been blocked by robots.txt; of course I initially thought that the "this-forum-post-broken-link" was because of the indirect nature of embedding the link here -- then I checked ... by "no longer work" I mean direct navigation used to retrieve the document, but now is being blocked; for several months now, I have been bouncing from one "blocked" page to another unblocked page, in order to continue accessing this data.
- unless it was archived today, the Stratics homepage redirect is NOT a redirect to an archived page: the page contains Nov/2010 info.
SS
Re: Our wayback machine is being taken away ...
If any content had been blocked by a current day robots.txt it would be apparent in that robots.txt file. However, as I said, the robots.txt file on the current day stratics page is nearly identical to the one archived in November of 2006, and the one directory that's different didn't have any relevant content on it in the first place. Thus, no active attempt has been made, by anyone, to block access to any such content.
Regarding the redirects to current day stratics pages, I haven't been able to find any instances where a page is redirecting to the current day stratics page. All redirects that I was able to find either went to an archived error page (2002 pages), went to an external website called flashlink.com (mid to late 2003), caused a redirect error (all pages between 2004 and 2006), or went to the actual archive page (2007 and 2008). So if a redirect does exist, you'll have to point it out for us.
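A quick way to settle where a redirect actually ends up is to look at the final address. The sketch below only classifies a URL string and does not follow redirects itself; it assumes archived captures live under web.archive.org/web/<timestamp>/..., and the function name and labels are hypothetical.

```python
import re
from urllib.parse import urlsplit

ARCHIVE_HOSTS = {"web.archive.org"}
# /web/<4 to 14 digit timestamp><optional modifier>/<original URL>
CAPTURE_PATH = re.compile(r"^/web/(\d{4,14})[a-z_]*/(.+)$")


def classify(final_url: str):
    """Return (kind, timestamp) for the address a redirect landed on."""
    parts = urlsplit(final_url)
    if parts.netloc.lower() in ARCHIVE_HOSTS:
        match = CAPTURE_PATH.match(parts.path)
        if match:
            return ("archived", match.group(1))
        return ("archive-other", None)
    return ("live", None)
```

Anyone reporting a redirect could paste the final address through this and post the result along with the starting link.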
- Arkon
- UOSA Subscriber!
- Posts: 248
- Joined: Fri Jun 05, 2009 6:02 am
- Location: In your house stealing your stuff.
Re: Our wayback machine is being taken away ...
Kaivan wrote: If any content had been blocked by a current day robots.txt it would be apparent in that robots.txt file. However, as I said, the robots.txt file on the current day stratics page is nearly identical to the one archived in November of 2006, and the one directory that's different didn't have any relevant content on it in the first place. Thus, no active attempt has been made, by anyone, to block access to any such content.
Regarding the redirects to current day stratics pages, I haven't been able to find any instances where a page is redirecting to the current day stratics page. All redirects that I was able to find either went to an archived error page (2002 pages), went to an external website called flashlink.com (mid to late 2003), caused a redirect error (all pages between 2004 and 2006), or went to the actual archive page (2007 and 2008). So if a redirect does exist, you'll have to point it out for us.
Kaivan, I've been running across the same thing as SS. This is from trying to visit stratics main page of April 1999:
"If you want an image of the future, imagine a bag of milk being poured on a human face--forever."
Re: Our wayback machine is being taken away ...
That's an old redirect embedded into the old stratics page itself. It's been there for the last 4 years if not longer. If you're using IE, you'll be redirected to the main website under all circumstances, but under Firefox and possibly Chrome, clicking the X on the pop-up message will allow you to access the site normally.
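To check whether a saved page carries its own redirect, one could scan its source for the usual client-side mechanisms. This is a rough sketch under the assumption that the redirect is either a meta refresh tag or a script assigning to location; the patterns and the sample markup are illustrative and not taken from the real stratics page.

```python
import re

# <meta http-equiv="refresh" ...>
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*>', re.IGNORECASE)
# window.location = "...", location.href = '...', and similar assignments
JS_LOCATION = re.compile(
    r'\b(?:window\.|document\.|top\.)?location(?:\.href)?\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE)


def embedded_redirects(html: str) -> list:
    """List the client-side redirects found in a page's source."""
    found = []
    if META_REFRESH.search(html):
        found.append("meta-refresh")
    found.extend(f"script:{target}" for target in JS_LOCATION.findall(html))
    return found


# Hypothetical page source, for illustration only.
SAMPLE = '<script>alert("We have moved!"); window.location = "http://uo.stratics.com/";</script>'
```

A redirect found this way lives in the archived page itself, so it behaves the same no matter how the page was reached, though browsers may differ in whether closing the pop-up cancels it.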
- Arkon
- UOSA Subscriber!
- Posts: 248
- Joined: Fri Jun 05, 2009 6:02 am
- Location: In your house stealing your stuff.
Re: Our wayback machine is being taken away ...
I am using Firefox. And whether I clicked OK or the 'x', I got redirected to stratics' current home page.