I would like to do something that may be ...

Gawdzilla Sama
Stabsobermaschinist
Posts: 151265
Joined: Thu Feb 26, 2009 12:24 am
About me: My posts are related to the thread in the same way Gliese 651b is related to your mother's underwear drawer.
Location: Sitting next to Ayaan in Domus Draconis, and communicating via PMs.

I would like to do something that may be ...

Post by Gawdzilla Sama » Wed Feb 06, 2013 9:59 pm

...impossible.

I want to merge a bunch of html files together so the file can be searched. The pages look like this: http://www.history.navy.mil/photos/imag ... 01395c.htm

Any suggestions?

Any suggestions that are sane?
Ein Ubootsoldat wrote:“Ich melde mich ab. Grüssen Sie bitte meine Kameraden.”

Tyrannical
Posts: 6326
Joined: Thu Dec 30, 2010 4:59 am

Re: I would like to do something that may be ...

Post by Tyrannical » Wed Feb 06, 2013 10:03 pm

Doesn't Google offer a search widget for indexing web sites?
A rational skeptic should be able to discuss and debate anything, no matter how much they may personally disagree with that point of view. Discussing a subject is not agreeing with it, but understanding it.

Thinking Aloud
Page Bottomer
Posts: 20111
Joined: Thu Feb 26, 2009 10:56 am

Re: I would like to do something that may be ...

Post by Thinking Aloud » Wed Feb 06, 2013 10:13 pm

Agh. Looks like you've got yourself something that was started in the days before databases. Short of using a spider-search as suggested above, I'm not sure there's a quick solution.

For a decent custom search you'd need to somehow import the data from each page as a record into a db and use that to serve pages dynamically. But I suspect that's going to be a job too far!

Re: I would like to do something that may be ...

Post by Gawdzilla Sama » Wed Feb 06, 2013 10:17 pm

Thinking Aloud wrote:Agh. Looks like you've got yourself something that was started in the days before databases. Short of using a spider-search as suggested above, I'm not sure there's a quick solution.

For a decent custom search you'd need to somehow import the data from each page as a record into a db and use that to serve pages dynamically. But I suspect that's going to be a job too far!
They're all the same; everything outside the table can be ditched.

So, basically, I want to import everything between "<P><TABLE" and "</TABLE></P>" into a file. I would want to do this to every html file in a directory.
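
For what it's worth, the extraction step described above can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not a tested tool: it assumes every page wraps its content in one `<P><TABLE ... </TABLE></P>` block, that tag case and whitespace between the tags may vary, and that the files are Latin-1 encoded. The names `extract_table` and `merge_tables` are made up for illustration.

```python
import re
from pathlib import Path
from typing import Optional

# Match from "<P><TABLE" through the first following "</TABLE></P>".
# Assumptions: tag case may vary (IGNORECASE) and the table spans
# several lines (DOTALL). Optional whitespace between tags is allowed.
TABLE_RE = re.compile(r"<P>\s*<TABLE.*?</TABLE>\s*</P>", re.IGNORECASE | re.DOTALL)


def extract_table(html: str) -> Optional[str]:
    """Return the first <P><TABLE ... </TABLE></P> block, or None if absent."""
    match = TABLE_RE.search(html)
    return match.group(0) if match else None


def merge_tables(source_dir: str, output_file: str) -> int:
    """Write the table block of every .htm/.html file in source_dir to output_file.

    Returns the number of pages that contributed a block.
    """
    out_path = Path(output_file).resolve()
    # Collect the pages first so the output file is never read as an input.
    pages = sorted(
        p for p in Path(source_dir).iterdir()
        if p.suffix.lower() in {".htm", ".html"} and p.resolve() != out_path
    )
    count = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for page in pages:
            block = extract_table(page.read_text(encoding="latin-1"))
            if block is None:
                continue  # page does not follow the expected layout
            # Keep the source file name so each block can be traced back.
            out.write(f"<!-- {page.name} -->\n{block}\n")
            count += 1
    return count
```

Calling `merge_tables` with a directory and an output file name would produce one searchable file for that directory; pages that don't match the pattern are skipped rather than guessed at.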

Re: I would like to do something that may be ...

Post by Tyrannical » Wed Feb 06, 2013 10:20 pm

Re: I would like to do something that may be ...

Post by Gawdzilla Sama » Wed Feb 06, 2013 10:22 pm

That's nice, but this is just the first step. The data would be imported into a database after this step is done.

There is no overall index for the Naval History and Heritage Command's Library of Online Images, something I want to fix.

Red Celt
Humanist Misanthrope
Posts: 1349
Joined: Sat Sep 08, 2012 8:30 pm
About me: Crow Philosopher
Location: Fife, Scotland

Re: I would like to do something that may be ...

Post by Red Celt » Wed Feb 06, 2013 10:55 pm

Gawdzilla Sama wrote:That's nice, but this is just the first step. The data would be imported into a database after this step is done.

There is no overall index for the Naval History and Heritage Command's Library of Online Images, something I want to fix.
Can be done with code (stripping out the table contents and writing that to a database), which could be used in web pages accessing the database. I'd offer my help, but I'm rustier than Davey Jones' cock ring.
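
As a hedged illustration of that idea, here is a small Python sketch that strips the tags from a table fragment and writes the text to a SQLite database, which can then be searched. Everything named here is an assumption for the example: the `captions` table, its columns, and the helper names are invented, and the regex tag stripper is deliberately crude (good enough for search text, not a real HTML parser).

```python
import html
import re
import sqlite3

TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment: str) -> str:
    """Drop HTML tags, decode entities, and collapse runs of whitespace."""
    text = html.unescape(TAG_RE.sub(" ", fragment))
    return " ".join(text.split())


def store_caption(conn: sqlite3.Connection, filename: str, fragment: str) -> None:
    """Insert (or overwrite) one page's text, keyed by its file name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS captions (filename TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO captions (filename, body) VALUES (?, ?)",
        (filename, strip_tags(fragment)),
    )
    conn.commit()


def search(conn: sqlite3.Connection, term: str) -> list:
    """Return the file names whose text contains term (LIKE ignores ASCII case)."""
    rows = conn.execute(
        "SELECT filename FROM captions WHERE body LIKE ? ORDER BY filename",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

A web page in front of the database would only need to run something like `search` and link back to the stored file names.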

klr
(%gibber(who=klr, what=Leprageek);)
Posts: 32964
Joined: Wed Mar 04, 2009 1:25 pm
About me: The money was just resting in my account.
Location: Airstrip Two

Re: I would like to do something that may be ...

Post by klr » Wed Feb 06, 2013 11:10 pm

Gawdzilla Sama wrote:That's nice, but this is just the first step. The data would be imported into a database after this step is done.

There is no overall index for the Naval History and Heritage Command's Library of Online Images, something I want to fix.
I do databases - and importing data into and out of databases - for breakfast, dinner and tea. Quite literally - that's my job.

"We need to talk ..." :whisper:
God has no place within these walls, just like facts have no place within organized religion. - Superintendent Chalmers

It's not up to us to choose which laws we want to obey. If it were, I'd kill everyone who looked at me cock-eyed! - Rex Banner

The Bluebird of Happiness long absent from his life, Ned is visited by the Chicken of Depression. - Gary Larson

:mob: :comp: :mob:

Re: I would like to do something that may be ...

Post by Gawdzilla Sama » Wed Feb 06, 2013 11:32 pm

klr wrote:
Gawdzilla Sama wrote:That's nice, but this is just the first step. The data would be imported into a database after this step is done.

There is no overall index for the Naval History and Heritage Command's Library of Online Images, something I want to fix.
I do databases - and importing data into and out of databases - for breakfast, dinner and tea. Quite literally - that's my job.

"We need to talk ..." :whisper:
The files are on Hyperwar. http://www.ibiblio.org/hyperwar/OnlineL ... rg11-2.htm

Re: I would like to do something that may be ...

Post by klr » Wed Feb 06, 2013 11:39 pm

Yup, I've seen 'em. I take it you'll want to retain a link to the picture along with each piece of text.

There are a number of sections making up the text that goes with each picture, so each of those might need to go into a separate field. Of course, there's probably no guarantee that the text for each and every picture is broken down the same way.

How many pictures are there in all do you think?
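
On keeping a link to the picture with each piece of text: a sketch of one way to pull the image links out of a page fragment, using Python's standard `html.parser`. It assumes the pictures are referenced through `<A HREF>` or `<IMG SRC>` attributes ending in a common image suffix; the class and function names are invented for the example.

```python
from html.parser import HTMLParser


class ImageLinkCollector(HTMLParser):
    """Collect image URLs from <a href> and <img src>, in document order."""

    IMAGE_SUFFIXES = (".jpg", ".jpeg", ".gif", ".png")

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and value.lower().endswith(self.IMAGE_SUFFIXES):
                if value not in self.links:  # skip repeats of the same link
                    self.links.append(value)


def image_links(fragment: str) -> list:
    """Return every distinct image link found in an HTML fragment."""
    collector = ImageLinkCollector()
    collector.feed(fragment)
    collector.close()
    return collector.links
```

The resulting list could go into its own field next to the text, so a page with several pictures keeps all of its links.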

Re: I would like to do something that may be ...

Post by Gawdzilla Sama » Wed Feb 06, 2013 11:45 pm

klr wrote:Yup, I've seen 'em. I take it you'll want to retain a link to the picture along with each piece of text.

There are a number of sections making up the text that goes with each picture, so each of those might need to go into a separate field. Of course, there's probably no guarantee that the text for each and every picture is broken down the same way.

How many pictures are there in all do you think?
Official U.S. Navy Photograph, now in the collections of the National Archives.

Online Image: 91KB; 740 x 605 pixels

Reproductions of this image may also be available through the National Archives photographic reproduction system.
Taking those three lines as an example: we would need everything above the first of them, and of the three only the second would be retained. Nothing below them would be needed.

I'm trying to figure out which drive the mirror is on here so I can get a rough file count. There is one HTML, one image and one thumbnail for each picture.
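
The trimming rule described above could look something like this in Python. It is a sketch that leans on two assumptions which may not hold for every page: that the line to keep always starts with "Online Image:", and that the credit line to drop is the single non-empty line directly above it. The function name `trim_caption` is invented.

```python
def trim_caption(text: str) -> dict:
    """Split a page's text into the description and the 'Online Image:' line.

    Keeps everything above the credit line, keeps the 'Online Image:' line,
    and discards the credit line itself plus anything after 'Online Image:'.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        if line.startswith("Online Image:"):
            # Assumption: the credit line is exactly one line, just above.
            description = "\n".join(lines[: max(i - 1, 0)])
            return {"description": description, "image_info": line}
    # No marker found: keep the whole text rather than guess where to cut.
    return {"description": "\n".join(lines), "image_info": None}
```

Pages that don't contain the marker come back untrimmed with `image_info` set to `None`, which makes them easy to list and check by hand.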

Re: I would like to do something that may be ...

Post by Gawdzilla Sama » Wed Feb 06, 2013 11:47 pm

[Attachment: Pictures.JPG (39.04 KiB)]

Re: I would like to do something that may be ...

Post by klr » Wed Feb 06, 2013 11:51 pm

I don't have FTP access yet, so I'm not sure what I can do at this point short of trying to download the entire site - probably not recommended ...

Some images also have a hi-res version:

http://www.ibiblio.org/hyperwar/OnlineL ... inct14.htm (third image down)

88,000+ photos. Since some of the HTML files have multiple images - see above - the number of HTML files will presumably be lower than that, even with two versions of some images.

Re: I would like to do something that may be ...

Post by Gawdzilla Sama » Wed Feb 06, 2013 11:55 pm

klr wrote:I don't have FTP access yet, so I'm not sure what I can do at this point short of trying to download the entire site - probably not recommended ...

Some images also have a hi-res version:

http://www.ibiblio.org/hyperwar/OnlineL ... inct14.htm (third image down)

88,000+ photos. Since some of the HTML files have multiple images - see above - the number of HTML files will presumably be lower than that, even with two versions of some images.
Did you get your account? I was told it was in process.

As for the files, I think it would be safe to deduct the number of folders, take off 2-3% (max) for hi-res copies, and divide what's left by three. We can keep the same format, just with a link to the hi-res version.
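
One reading of that arithmetic, as a tiny Python sketch. The order of operations (subtract folders, remove the hi-res share, then divide by three) is an interpretation, and the function name and any numbers fed to it are hypothetical; the real item and folder counts would come from the mirror itself.

```python
def estimate_pictures(total_items: int, folders: int, hires_share: float = 0.03) -> int:
    """Rough picture count from a directory listing's item and folder totals.

    Assumes one HTML page, one image and one thumbnail per picture, with
    hires_share (2-3% at most) of the files being extra hi-res copies.
    """
    files = total_items - folders               # folders are not files
    base_files = files * (1 - hires_share)      # set the hi-res copies aside
    return round(base_files / 3)                # three files per picture
```

With the default 3% this gives a lower bound; passing `hires_share=0.0` gives the upper bound, so the true figure should sit between the two.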

Re: I would like to do something that may be ...

Post by klr » Thu Feb 07, 2013 12:04 am

Ah, I see some emails from a few days ago that I'd missed. The details are there. I might need new software for this though, so it'll likely be tomorrow morning* when I get to it.

*Our politicians here in Ireland are going to be working very late tonight to debate important legislation, but I've no intention of staying up so late. :)