User:Funper/Sandbox

This project page concerns the acquisition of the Juilliard Manuscript Collection.

Project Members

Active: Funper

If you want to join the project, or think you can help by providing expertise, information or other knowledge, contact the user above.

Project Status

Ripping Manuscripts

In order to rip the JPEG files that make up a page in a score, you need the absolute path to files that are on the highest zoom level (i.e. maximum resolution).

With the help of Wireshark, the formula of these addresses has been figured out as follows:


www.juilliardmanuscriptcollection.org/zoomify/[WORK NAME]/[PAGE NAME]/TileGroup[N]/5-[X]-[Y].jpg

without brackets, where:

  • the number 5 designates the highest resolution available,
  • WORK NAME is the name of the work and can be retrieved from the panel on the right of a work page on the collection web site,
  • PAGE NAME is self-explanatory and can also be retrieved from there, but it can vary from page to page within a score (especially in larger ones), so pay attention,
  • N designates a grouping that has been made on the basis of the height of the score; it ranges from 0 to 2 (most likely not higher than that),
  • X and Y are the coordinates that designate the image fragments (tiles) making up these groups. Judging from the stitching commands further down, X is the horizontal position (column) and Y the vertical position (row). They range from 0 to 25 (possibly higher). Numbering is continuous across groups, e.g. if TileGroup0 goes from 0-5, then TileGroup1 continues from 6-15.
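
The address pattern above can be sketched as a small Python helper. This is only an illustration: the function name `tile_url` is made up for this page, and the work and page names are the ones from the Beethoven example used on this page.

```python
# Base address and zoom level as described in the pattern above.
BASE = "www.juilliardmanuscriptcollection.org/zoomify"
ZOOM = 5  # highest zoom level

def tile_url(work, page, group, x, y):
    """Return the address of one tile (hypothetical helper, not part of the site)."""
    return f"{BASE}/{work}/{page}/TileGroup{group}/{ZOOM}-{x}-{y}.jpg"

# First tile of the first page of the Beethoven example.
print(tile_url("BEET_ODEJ", "BEET_ODEJ_1st_movement_p000a", 0, 0, 0))
```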

Now it is easy to work out the path to the JPEG files of a particular page. If someone wants to rip e.g. Beethoven's manuscript of the Ninth Symphony, they could proceed as follows:

  • The filename of the first page is BEET_ODEJ_1st_movement_p000a.jpg. Without the .jpg extension, this is the PAGE NAME. It is natural to begin with this.
  • From this one can deduce that the name of the directory, i.e. WORK NAME, is BEET_ODEJ.

Thus the path of these particular JPEG files would be:

www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/BEET_ODEJ_1st_movement_p000a/TileGroup[N]/5-[X]-[Y].jpg

Only one way of obtaining these images is known right now; there might be others.

Making a list and downloading it through a download manager/website ripper

Make a .txt file. You are going to write ALL possible addresses in this file, so let N run from 0 to 2, and X and Y each from 0 to 25. Your .txt file should then contain about 2,000 entries (3 × 26 × 26 = 2028) like this:


www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/BEET_ODEJ_1st_movement_p000a/TileGroup0/5-0-0.jpg
www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/BEET_ODEJ_1st_movement_p000a/TileGroup0/5-0-1.jpg
www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/BEET_ODEJ_1st_movement_p000a/TileGroup0/5-0-2.jpg
www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/BEET_ODEJ_1st_movement_p000a/TileGroup0/5-0-3.jpg
...
www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/BEET_ODEJ_1st_movement_p000a/TileGroup0/5-0-25.jpg
...
www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/BEET_ODEJ_1st_movement_p000a/TileGroup0/5-25-25.jpg
...
www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/BEET_ODEJ_1st_movement_p000a/TileGroup1/5-25-25.jpg
...
www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/BEET_ODEJ_1st_movement_p000a/TileGroup2/5-25-25.jpg


On Windows, there is a little tool that helps to build a URL list like this. I use "Extreme Url Generator". It is not free, but it lets you add many variables to the URL. In this case we need three variables: N, X and Y.
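
As an alternative to a dedicated URL generator, the list can be produced with a few lines of Python. This is a minimal sketch under the ranges stated above (N from 0 to 2, X and Y from 0 to 25); the function names and the output file name are made up for illustration.

```python
from itertools import product

BASE = "www.juilliardmanuscriptcollection.org/zoomify"

def candidate_urls(work, page, groups=range(3), xs=range(26), ys=range(26)):
    """Yield every candidate tile address at zoom level 5 for one page."""
    for n, x, y in product(groups, xs, ys):
        yield f"{BASE}/{work}/{page}/TileGroup{n}/5-{x}-{y}.jpg"

def write_list(path, work, page):
    """Write one address per line to a text file and return how many were written."""
    urls = list(candidate_urls(work, page))
    with open(path, "w", encoding="ascii") as fh:
        fh.write("\n".join(urls) + "\n")
    return len(urls)
```

For example, `write_list("tiles.txt", "BEET_ODEJ", "BEET_ODEJ_1st_movement_p000a")` writes 3 × 26 × 26 = 2028 lines, most of which will not exist on the server.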


Now you need to feed this .txt file to a website ripper or download manager. I used HTTrack Website Copier. Run the program and make it try to download all files in the list. Not all of these are valid links, so try it a few times on different pages and see how many of them are actually used. Optionally, you can weed out the unused links, since it takes the program a long time to check such a large number of them (approximately two thirds are useless, depending on the page size).
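
The weeding-out step could be sketched as follows. The sketch assumes that the ripper saved each page into its own folder with TileGroupN subfolders (the layout the move commands on this page rely on); `used_tiles` and `prune` are hypothetical names. Since page sizes differ, it is safer to collect tiles from several downloaded pages before pruning.

```python
import re
from pathlib import Path

# Matches the group, X and Y numbers at the end of a tile path or address.
TILE = re.compile(r"TileGroup(\d+)/5-(\d+)-(\d+)\.jpg$")

def used_tiles(page_dirs):
    """Collect (group, x, y) for every tile file found under the given page folders."""
    found = set()
    for page_dir in page_dirs:
        for f in Path(page_dir).glob("TileGroup*/5-*-*.jpg"):
            m = TILE.search(f.as_posix())
            if m:
                found.add(tuple(int(g) for g in m.groups()))
    return found

def prune(urls, found):
    """Keep only the addresses whose (group, x, y) was seen on at least one page."""
    kept = []
    for url in urls:
        m = TILE.search(url)
        if m and tuple(int(g) for g in m.groups()) in found:
            kept.append(url)
    return kept
```

A page larger than any of the sampled ones would lose tiles with a pruned list, so the full list is still worth keeping as a fallback.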

When you have downloaded the first page, you can replace all the PAGE NAME strings in the .txt file (BEET_ODEJ_1st_movement_p000a) with the next one (i.e. BEET_ODEJ_1st_movement_p000b).
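
The replacement can be done in any text editor; as a sketch, the same step in Python looks like this. The helper name `retarget` is made up, and the surrounding slashes are matched so that a page name which happens to be a prefix of another one is not replaced by accident.

```python
def retarget(list_text, old_page, new_page):
    """Return the URL list with old_page replaced by new_page in every address."""
    return list_text.replace(f"/{old_page}/", f"/{new_page}/")

line = ("www.juilliardmanuscriptcollection.org/zoomify/BEET_ODEJ/"
        "BEET_ODEJ_1st_movement_p000a/TileGroup0/5-0-0.jpg")
print(retarget(line, "BEET_ODEJ_1st_movement_p000a", "BEET_ODEJ_1st_movement_p000b"))
```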

Repeat the procedure until all pages are downloaded.

Combining images into a single page

Use a program that supports panorama processing. I use IrfanView. You can do this manually, through the command prompt, or through scripts. Since the work takes a lot of time to do manually, and since I do not have enough (or any) scripting skills, I used the command prompt to do this relatively quickly.

I used this with IrfanView:

Example for /panorama:
Syntax: /panorama=(X,file1,...,fileN)
First parameter (X) is the direction: 1 = horizontal, 2 = vertical.
i_view32.exe /panorama=(2,c:\5-0-0.jpg,c:\5-0-1.jpg,c:\5-0-2.jpg,[etc])
Create vertical panorama image from other files.

Rename the download folder and put it in a convenient place, e.g. if it's the first page you could place it in D:\1. Open Notepad and write the following:

move D:\1\TileGroup0\*.* D:\1
move D:\1\TileGroup1\*.* D:\1
move D:\1\TileGroup2\*.* D:\1
"C:\Program Files\IrfanView\i_view32.exe" /panorama=(2,D:\1\5-0-0.jpg,D:\1\5-0-1.jpg,D:\1\5-0-2.jpg,D:\1\5-0-3.jpg,D:\1\5-0-4.jpg,D:\1\5-0-5.jpg,D:\1\5-0-6.jpg,D:\1\5-0-7.jpg,D:\1\5-0-8.jpg,D:\1\5-0-9.jpg,D:\1\5-0-10.jpg,D:\1\5-0-11.jpg,D:\1\5-0-12.jpg,D:\1\5-0-13.jpg,D:\1\5-0-14.jpg,D:\1\5-0-15.jpg,D:\1\5-0-16.jpg,D:\1\5-0-17.jpg,D:\1\5-0-18.jpg,D:\1\5-0-19.jpg,D:\1\5-0-20.jpg,D:\1\5-0-21.jpg,D:\1\5-0-22.jpg,D:\1\5-0-23.jpg,D:\1\5-0-24.jpg) /tifc=0 /convert=D:\1\0.tif /silent

If run in the command prompt, this will move all files to the same directory and combine the first column of tiles into a vertical picture (D:\1\0.tif), which is only a part of the full page. Continue this with the rest of the pictures, e.g. 5-1-0.jpg, 5-1-1.jpg ... etc., saving each strip under the next number (1.tif, 2.tif, ...).

When you're done, run this in the command prompt:

"C:\Program Files\IrfanView\i_view32.exe" /panorama=(1,D:\1\0.tif,D:\1\1.tif,D:\1\2.tif,D:\1\3.tif,D:\1\4.tif,D:\1\5.tif,D:\1\6.tif,D:\1\7.tif,D:\1\8.tif,D:\1\9.tif,D:\1\10.tif,D:\1\11.tif,D:\1\12.tif,D:\1\13.tif,D:\1\14.tif,D:\1\15.tif,D:\1\16.tif,D:\1\17.tif,D:\1\18.tif,D:\1\19.tif,D:\1\20.tif,D:\1\21.tif,D:\1\22.tif,D:\1\23.tif,D:\1\24.tif,D:\1\25.tif) /tifc=0 /convert=D:\page1.tif /silent

This will combine the vertical pictures horizontally. The resulting file (page1.tif) will be the complete page.
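
Typing these long command lines by hand is error-prone, so they could also be generated. The following is a minimal sketch that only builds the command text, using the same IrfanView options as the commands above; the helper names are made up, and the number of columns and rows has to be read off the downloaded tiles for each page.

```python
IVIEW = r'"C:\Program Files\IrfanView\i_view32.exe"'

def win_path(folder, name):
    """Join a Windows folder and a file name with a backslash."""
    return folder + "\\" + name

def panorama(direction, inputs, output):
    """Build one /panorama command line; direction 1 = horizontal, 2 = vertical."""
    files = ",".join(inputs)
    return f"{IVIEW} /panorama=({direction},{files}) /tifc=0 /convert={output} /silent"

def page_commands(folder, columns, rows, output):
    """Return one vertical-strip command per column, then the horizontal join."""
    cmds, strips = [], []
    for x in range(columns):
        tiles = [win_path(folder, f"5-{x}-{y}.jpg") for y in range(rows)]
        strip = win_path(folder, f"{x}.tif")
        cmds.append(panorama(2, tiles, strip))
        strips.append(strip)
    cmds.append(panorama(1, strips, output))
    return cmds
```

Writing the result of, say, `page_commands(r"D:\1", 26, 25, r"D:\page1.tif")` to a .bat file, one command per line, gives a script that can be run after the move commands.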

Batch convert all files in IrfanView, changing the dpi to 108, so that zooming works correctly in the PDF. Then combine all files into a PDF with your favorite PDF program. The resulting PDF file is around 1.5 megabytes per page.

Continue with the rest of the manuscript.


I propose another way to do this job, using "Contact Sheet II" in Photoshop.

1. Fix the width and height of all the little JPG files, in this case 256 × 256 pixels.

  Select all files matching 5-*-16 and 5-23-* (the last row and the last column of this page)
  and use the batch function to fix them at 256 × 256 pixels, anchored left and top.


2. Rename them all to 001~408.jpg. I use the rename tool in Canon Digital Photo Professional; you can do it in Photoshop too.


3. Use Photoshop's automate function, Contact Sheet:

  build 24 columns × 17 rows,
  Place: "down first",
  Width 6144 (24 × 256), Height 4352 (17 × 256), Resolution 100,
  "Auto Spacing" unchecked. Press OK. Done!

Stylistic Guidelines

When combining the pictures, keep the ordering of the pages. If the ordering is confusing or unnecessary in one way or another, post something on the project talk page about it and we will discuss it.

To facilitate tracking of submitted files, please add some future template to the "Misc. Notes" field of the file entry. Also, please mention Juilliard School as the scanner by adding some template to the "Scanner" field.

Music from Flash-Based Websites

I have written a small tutorial on how to do this exact thing. Maybe it will help you out. Generoso 19:34, 24 June 2011 (UTC)