

    
Topic: Uncommonly Dense Thread Browser
steve_h



Posts: 544
Joined: Jan. 2006

Posted: Sep. 15 2009,19:12

Searching for posts in the UD I & II threads is not as easy as it should be.

When Wesley added the following links to the Board Mechanics thread, I thought my problems were solved. They present the entire threads in two huge pages.
 - Complete UD I thread
 - Complete UD II thread

Remove the initial X to get the working links. (Warning: they may crash your browser.)

Since every browser I tried on Windows XP struggled to display those pages, I realised that my problems were not solved after all.

So I tried splitting the huge files into manageable chunks, one file per comment, named after the post number, date, and author. Surely the "Find in Files..." feature of so many editors, or the indexing service, would take care of things. Well, to an extent, yes, but the HTML formatting got in the way somewhat, and the search results were pretty much unreadable.
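One file per comment needs a naming scheme that sorts sensibly and survives Windows' filename rules. Below is a minimal Python sketch of one such scheme; the six-digit padding, the underscore separators, and the `comment_filename` helper are hypothetical choices for illustration, not the naming actually used for these files.

```python
import re

# Characters that Windows does not allow in file names.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|]')

def comment_filename(post_no, posted, author, ext=".html"):
    """Build a hypothetical per-comment file name: <number>_<date>_<author><ext>.

    post_no is zero-padded to six digits so that an alphabetical directory
    listing matches thread order (for threads of under a million comments).
    Characters Windows forbids are replaced with "_" in both the date and the
    author name; an empty author becomes "anonymous".
    """
    safe_date = _FORBIDDEN.sub("_", posted).strip()
    safe_author = _FORBIDDEN.sub("_", author).strip() or "anonymous"
    return f"{post_no:06d}_{safe_date}_{safe_author}{ext}"
```

For example, `comment_filename(1, "Sep. 15 2009,19:12", "steve_h")` gives `000001_Sep. 15 2009,19_12_steve_h.html`; the colon in the time is replaced because Windows rejects it in file names.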

I tried creating a new set of files with all of the HTML stripped out (no fonts, bold, italics, links, images, etc.), and the result was a bit better but still not very readable.

Then I put the text into an SQLite database and wrote a simple TCL script to search it and display the results.
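The search side of that setup can be sketched with Python's built-in `sqlite3` module (the original browser is a TCL script; this is only an illustration of the same idea). The one-row-per-comment `posts` table and its column names are hypothetical, not the layout of the real atbc database.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) a database with a hypothetical one-row-per-comment table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               post_no INTEGER PRIMARY KEY,  -- position in the thread
               posted  TEXT,                 -- date/time as shown on the page
               author  TEXT,
               body    TEXT                  -- comment text with HTML stripped
           )"""
    )
    return conn

def search(conn, term):
    """Return (post_no, posted, author) for comments whose body contains term.

    SQLite's LIKE is case-insensitive for ASCII letters only. The wildcards
    % and _ (and the escape character itself) are escaped so that the term
    is matched literally.
    """
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = conn.execute(
        "SELECT post_no, posted, author FROM posts "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY post_no",
        ("%" + escaped + "%",),
    )
    return cur.fetchall()
```

Because LIKE ignores ASCII case, a search for `ICANHAS` also finds `icanhascheezburger`. Escaping the wildcards matters for searches such as `100%`, which would otherwise match far more than intended.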

The result was still crap, because so much depends on knowing what is quoted and what isn't. So I arranged for <Q> and </Q> to be placed around the quoted stuff, then for the quoted stuff to be shown in different colors, then for hyperlinked text to be highlighted, and finally for images to be displayed (thumbnails only).
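Stripping the HTML while keeping track of what is quoted can be sketched with Python's standard `html.parser`. This assumes, hypothetically, that quoted passages are marked up as `<blockquote>` elements; the real ATBC markup may differ, in which case `QUOTE_TAGS` would need changing.

```python
from html.parser import HTMLParser

class QuoteMarkingStripper(HTMLParser):
    """Drop all HTML tags but keep their text, wrapping quotes in <Q>...</Q>.

    Assumption (hypothetical): quotes are <blockquote> elements.
    """
    QUOTE_TAGS = {"blockquote"}

    def __init__(self):
        # convert_charrefs=True turns &amp;, &eacute; etc. into real characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.QUOTE_TAGS:
            self.parts.append("<Q>")
        elif tag in ("br", "p"):
            self.parts.append("\n")  # keep some line structure

    def handle_endtag(self, tag):
        if tag in self.QUOTE_TAGS:
            self.parts.append("</Q>")

    def handle_data(self, data):
        self.parts.append(data)  # text from any tag (b, i, a, font...) is kept

def strip_html(html):
    parser = QuoteMarkingStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

One design caveat: a comment that literally contains the text "<Q>" becomes indistinguishable from a marker, so a real tool might prefer characters that never occur in posts.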

Finally, I added double-clicking on URLs and images to copy the URLs to the clipboard, and Ctrl+left-click to open them in a browser (IE is hardcoded, but it's a script, so you can edit it).

It's fairly basic and it's unlikely to be improved unless I get very bored indeed. Here's a screenshot:


If you want to try it, you can download:
- the database, atbc.zip (12.5MB)
- the (optional) thumbnail views of most of the images, thumbnails.zip (15.5MB)
- the browser script, browse.tcl

You may also need (or like) to download:
- A free TCL 8.5 interpreter from activestate.com
- The free, open-source SQLite command-line program, sqlite3.exe, from sqlite.org
- Free, open-source, ZIP-compatible archiving software from www.7-zip.org

Post bug reports, comments, improvements, missing images, and copyright infringements here. If any of my HTML-stripping has completely altered the meaning of any comment, then obviously I would like to know about it. Also, I've noticed that some characters, such as ä/ö/ü/è, are rather badly mangled. I don't need to know every page that contains character errors, but an example of each bad character would be nice.

Disclaimer: I've only tried this on Windows XP so far. I may need to choose a Unix-friendly compression algorithm and/or tweak the TCL script.

Edit:  Corrected thumbnail link
Edit:  b
Edit: Corrected database link, atbc.zip

  
Henry J



Posts: 5760
Joined: Mar. 2005

Posted: Sep. 15 2009,21:37

On the character encoding: Notepad has options for saving files in four character sets (ANSI, Unicode, Unicode big endian, and UTF-8). The last three are identified by a few bytes (a byte-order mark) stuck onto the front of the file. If the contents of the file don't match the prefix, characters outside of ASCII 0-127 will be messed up.

Henry
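The prefix check Henry describes can be sketched in Python: look at the first bytes for a byte-order mark, pick the matching codec, and fall back to a single-byte Windows code page when there is none. The fallback `cp1252` (Western European "ANSI") is an assumption; a machine configured for another language would use a different ANSI code page.

```python
import codecs

# Byte-order marks written by Notepad's "UTF-8", "Unicode" (UTF-16 LE) and
# "Unicode big endian" options. "ANSI" files carry no prefix at all.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),   # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16"),  # FE FF
)

def sniff_encoding(data, fallback="cp1252"):
    """Pick a codec from the file's leading bytes; fall back to an ANSI code page."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return fallback

def decode_text(data, fallback="cp1252"):
    """Decode bytes with the detected codec; the BOM itself is dropped."""
    return data.decode(sniff_encoding(data, fallback), errors="replace")
```

Getting this wrong produces the familiar garbage: UTF-8 bytes for "ä" decoded as cp1252 come out as "Ã¤", which may be one way characters like ä/ö/ü/è end up mangled.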

  
J-Dog



Posts: 4402
Joined: Dec. 2006

Posted: Sep. 15 2009,21:39

Steve H - You are the man!

Or, in the smart, shortened jargon prevalent on this Board - HOMO!

POTW - for this week and next week (and even the week after that I think :)

--------------
Come on Tough Guy, do the little dance of ID impotence you do so well. - Louis to Joe G 2/10

Gullibility is not a virtue - Quidam on Dembski's belief in the Bible Code Faith Healers & ID 7/08

UD is an Unnatural Douchemagnet. - richardthughes 7/11

  
deadman_932



Posts: 3094
Joined: May 2006

Posted: Sep. 15 2009,21:49

Uh, I is impressed. Very much so. Thank you for making the UD threads available and readable, steve h... One small problem: I can't access http://www.newfangledweb.com/atbc/atbc.db. It gives me a "webpage cannot be found" error.

--------------
AtBC Award for Thoroughness in the Face of Creationism

  
steve_h



Posts: 544
Joined: Jan. 2006

Posted: Sep. 16 2009,06:48

deadman,

Thanks, the link was wrong. The filename is thumbnails.zip, of course.

Also, if you remove atbc.db from the end of that URL, leaving just the directory name, you can see all of the files, just in case the editing process broke any other links.

  
khan



Posts: 1554
Joined: May 2007

Posted: Sep. 16 2009,11:22

Most impressive.
Thank you.

--------------
"It's as if all those words, in their hurry to escape from the loony, have fallen over each other, forming scrambled heaps of meaninglessness." -damitall

That's so fucking stupid it merits a wing in the museum of stupid. -midwifetoad

Frequency is just the plural of wavelength...
-JoeG

  
steve_h



Posts: 544
Joined: Jan. 2006

Posted: Sep. 16 2009,17:56

I just read some stuff in the UD III thread which contained stuff crap complete bollocks perfectly valid alternative viewpoints which had been humorously struck out. Such outstriking is systematically lost/expelled by my program,  substantially altering the legibility of some comments.

However, I need to think about how to fix this. I'm not a three thousand bug-free lines of code per hour "star programmer" and I've had a few drinks. During the meanwhile, the term "fixed that for you" may indicate that not all the previous text should be subject to strict grammatical analysis.

To summarize: D'oh !  Or as HJS dubbed into German would say, "Nein !"
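One possible fix for the lost strikeouts is to rewrite struck-out spans into a visible plain-text marker before the rest of the HTML is stripped. The sketch below assumes, hypothetically, that struck-out text arrives as `<s>`, `<strike>` or `<del>` elements; the `[struck: ...]` marker format is also just an illustrative choice.

```python
import re

# Hypothetical: struck-out text is wrapped in <s>, <strike> or <del> tags.
_STRIKE = re.compile(r"<(s|strike|del)\b[^>]*>(.*?)</\1\s*>",
                     re.IGNORECASE | re.DOTALL)

def mark_strikeouts(html):
    """Replace each struck-out span with "[struck: ...]" so it survives stripping.

    This is a regex pre-pass, not a parser: nested strike tags are not handled.
    """
    return _STRIKE.sub(lambda m: "[struck: " + m.group(2) + "]", html)
```

Running this before the tag-stripping step means a reader can still tell which words were meant to be crossed out, instead of seeing them run together with the replacement text.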

  
midwifetoad



Posts: 4003
Joined: Mar. 2008

Posted: Sep. 17 2009,09:02

As an experiment, I've told Google Chrome to save the first thread as a complete website. It has run overnight and is still going, but it seems to be creating one big html file, plus a subfolder with all the images.

I hope and imagine that it is changing all the image references to the subfolder. The html file is still being processed, but I have 850 images in the image subfolder. They will remain even if the html download fails to complete.

CPU and memory usage have leveled off (unlike IE8). My biggest problem is that we have nearly daily power outages.

--------------
Any version of ID consistent with all the evidence is indistinguishable from evolution.

  
carlsonjok



Posts: 3326
Joined: May 2006

Posted: Oct. 08 2009,13:17

I am in the process of downloading and setting this up, but it will probably be a while before I get around to finishing the job.  

However, Tarden Chatterbox and I have an important anthropological question needing to be answered and it seems to me that people who already have this utility up and running are in the perfect position to answer it.

Question:  Who posted the first LOLCat on ATBC?

--------------
It's natural to be curious about our world, but the scientific method is just one theory about how to best understand it.  We live in a democracy, which means we should treat every theory equally. - Steven Colbert, I Am America (and So Can You!)

  
Henry J



Posts: 5760
Joined: Mar. 2005

Posted: Oct. 08 2009,15:15

Quote (carlsonjok @ Oct. 08 2009,12:17)
Question:  Who posted the first LOLCat on ATBC?

And what punishment is to be inflicted on the perpetrator once identified? ;)

  
carlsonjok



Posts: 3326
Joined: May 2006

Posted: Oct. 08 2009,15:27

Quote (Henry J @ Oct. 08 2009,15:15)
 
Quote (carlsonjok @ Oct. 08 2009,12:17)
Question:  Who posted the first LOLCat on ATBC?

And what punishment is to be inflicted on the perpetrator once identified? ;)

None.  As I said, it is an anthropological question. It is all about the science!

Actually, I got the thread browser working, and when I searched on "icanhascheezburger", the earliest LOLCat it found was here. But I am pretty sure there are earlier LOLCats. They must just be on a different thread.

By the way, I am wondering if someone is going to periodically update the post and thumbnail zip files?


  
steve_h



Posts: 544
Joined: Jan. 2006

Posted: Oct. 08 2009,15:30

AFAICT it was one carlsonjok, back on page 521 of the first UD thread.

Here

Damn! The browser's links back to ATBC seem to be broken, because there's no ST=nnnn part. To get this link, I had to calculate the offset myself as (521-1) * 30 = 15600.

I don't feel like regenerating the DB from scratch right now though, so you'll have to do your own calculations for other posts.
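For doing those calculations yourself, here is a small Python sketch of the page/offset arithmetic. It assumes, as the (521-1) * 30 = 15600 calculation implies, 30 posts per page and an ST value equal to the 0-based offset of the first post on the page (so page 1 is ST=0); the helper names are hypothetical.

```python
POSTS_PER_PAGE = 30  # taken from the (521-1) * 30 calculation

def page_to_st(page, per_page=POSTS_PER_PAGE):
    """ST offset of the first post on a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * per_page

def st_to_page(st, per_page=POSTS_PER_PAGE):
    """1-based page number containing the post at 0-based offset st."""
    if st < 0:
        raise ValueError("offsets start at 0")
    return st // per_page + 1
```

So `page_to_st(521)` gives 15600, and any offset from 15600 to 15629 maps back to page 521.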

ETA:  I found this by searching for ICANHAS.

Incidentally, I used another nice free program, IrfanView, to produce the thumbnails. It also gives you an easy way to browse directories of images as thumbnails (including my thumbnail images). I didn't see any obvious LOLCats before yours.

  
steve_h



Posts: 544
Joined: Jan. 2006

Posted: Oct. 08 2009,15:50

Quote
By the way, I am wondering if someone is going to periodically update the post and thumbnail zip files?


Probably not. I checked the stats and, AFAICT, everyone who has downloaded the files has also posted in this thread.

So I have to put this down as a dismal failure. Sniff.

If I get really bored when we move on to UD IV, I may consider an update, but there hasn't been enough interest to justify frequent ones.

I think we should just pester Wes to implement a proper search at ATBC.

  
carlsonjok



Posts: 3326
Joined: May 2006

Posted: Oct. 08 2009,15:54

Quote (steve_h @ Oct. 08 2009,15:50)
Quote
By the way, I am wondering if someone is going to periodically update the post and thumbnail zip files?


Probably not. I checked the stats and, AFAICT, everyone who has downloaded the files has also posted in this thread.

So I have to put this down as a dismal failure. Sniff.

Too bad. I think it is pretty freaking cool. I can go back and relive my favorite moments.  Like the time you asked Joel Borofsky what it was like to be Dembski's research assistant.


  
Louis



Posts: 6436
Joined: Jan. 2006

Posted: Oct. 08 2009,16:13

I think what you have done is seriously impressive. I have no use for it yet, but I have bookmarked this thread in case I have to use it one day.

Does that count?

Louis

--------------
Bye.

  
steve_h



Posts: 544
Joined: Jan. 2006

Posted: Oct. 08 2009,16:26

Thanks. I will probably revisit this after I've had a sufficient break from it, and start to have fond memories of things on the current thread that I can no longer find. Though I doubt that I will ever get around to producing a fully automated process.

  
khan



Posts: 1554
Joined: May 2007

Posted: Oct. 08 2009,16:32

Quote (Louis @ Oct. 08 2009,17:13)
I think what you have done is seriously impressive. I have no use for it yet, but I have bookmarked this thread in case I have to use it one day.

Does that count?

Louis

As have I.


  
16 replies since Sep. 15 2009,19:12

    


