Preliminary Guide
Applied Stacks - Open Infrastructure Documentation, Advocacy By Example.
This guide documents the preliminary conventions for the Applied Stacks 'Data Seed' project. The primary goal of this project is to seed the database by documenting the software components behind an interesting and useful subset of websites.
Site metadata
- Don't worry about entering site aliases.
- For each site, enter a one- or two-sentence description. Because the database is being released into the public domain, don't use material that is copyrighted by someone else (e.g., text from Wikipedia can't be used, since its content is licensed under the GFDL). The descriptions should be informative, but they don't have to be great. Just be sure to read each one over at least once to check for typos.
- Try to assign 2 or 3 tags to each site. Just use whatever keywords come to mind when you look at the site.
- For daily visitors, I've been using compete.com. Be sure to include the site-specific compete.com URL as a citation (e.g., for 'digg.com' this would be http://siteanalytics.compete.com/digg.com/?metric=uv).
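As a small illustration of the citation convention above, the sketch below builds the compete.com URL for a given domain. The function name `compete_citation_url` is hypothetical, and the URL pattern is assumed from the single digg.com example in this guide, so check the result against the live page before citing it.

```python
def compete_citation_url(domain: str) -> str:
    """Return the site-specific compete.com URL to cite for daily visitors.

    Assumption: every site follows the same pattern as the digg.com
    example in this guide (unique visitors, metric=uv).
    """
    # Normalize stray whitespace and capitalization in the domain name.
    cleaned = domain.strip().lower()
    return f"http://siteanalytics.compete.com/{cleaned}/?metric=uv"


if __name__ == "__main__":
    print(compete_citation_url("digg.com"))
```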
Documenting software components
- To start with, here are the core software components I'd like to focus on for each site:
- Programming Languages (e.g., PHP, Python, Ruby, Java, ASP)
- Web application frameworks (e.g., Ruby on Rails, Django, TurboGears, Zend)
- Content management systems (e.g., Drupal, Radiant)
- Databases (e.g., PostgreSQL, MySQL, SQLite, as well as alternatives like CouchDB and Amazon SimpleDB)
- If you come across an article or other reference material that lists other components used to build a site (e.g., Lucene, Gearman, memcached, mod_perl, mod_python, Mongrel), be sure to include them. However, it's not necessary to spend too much time tracking down such components.
- For each site that is manually entered, there should be at least two separate components listed. If one of the components is a web framework like Ruby on Rails or Django, which necessarily implies the use of a specific programming language, then at least three components should be listed. (I'll try to automatically insert the sites listed at places like djangosites.org. Doing so should be easy, but I'd like to get permission from the people who run such lists before I do so.)
- Don't enter JavaScript libraries (e.g., MooTools, jQuery, Prototype) since I'll be writing some code to get this information automatically for each website.
- Don't enter operating systems, since Netcraft already provides this information. Also, it should be possible to get some of this information automatically.
- Be sure to include a reference URL that backs up each software component that you list. The reference URL should point to whatever web page says that a site uses the given set of components (e.g., for Digg I have listed components like PHP, MySQL, and Memcached, and I back up this claim with the reference URL http://blog.digg.com/?p=168, since this page documents that Digg does in fact use all of these components).
- It's probably not necessary to fill out component notes for all websites. But please do if you feel there's something interesting about how a site uses a given component. Or, if a component is more obscure, it's probably good to mention what the component does.
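The minimum-component rule above can be sketched as a quick check to run before submitting an entry. This is only an illustration: the `IMPLIED_LANGUAGE` table and both function names are hypothetical, and the table covers just the frameworks named as examples in this guide.

```python
from typing import Iterable

# Hypothetical lookup: frameworks that imply a specific programming language.
# Names use the underscore convention for multi-word phrases.
IMPLIED_LANGUAGE = {
    "Ruby_on_Rails": "Ruby",
    "Django": "Python",
    "TurboGears": "Python",
    "Zend": "PHP",
}


def minimum_components(components: Iterable[str]) -> int:
    """Return 3 if any listed framework implies a language, otherwise 2."""
    return 3 if any(c in IMPLIED_LANGUAGE for c in components) else 2


def has_enough_components(components: Iterable[str]) -> bool:
    """Check a manually entered site against the minimum-component rule."""
    unique = set(components)  # duplicates don't count as separate components
    return len(unique) >= minimum_components(unique)


if __name__ == "__main__":
    print(has_enough_components(["PHP", "MySQL"]))      # no framework listed
    print(has_enough_components(["Django", "Python"]))  # framework implies Python
```

The reasoning behind the higher threshold is that a framework and its language are effectively one piece of information, so a third component is needed for the entry to say something more.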
Miscellaneous Notes
- All fields that ask for lists of entries are space-delimited, not comma-delimited. To include a phrase, connect its words with underscores (e.g., Ruby_on_Rails).
- E-mail me about any bugs you come across on the site. Also, let me know if you come across any annoyances or have ideas for how the site could be made better.
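To make the space-delimited list convention concrete, here is a minimal sketch of how a list of names could be turned into a field value and split back into entries. The function names are hypothetical and this is not the site's actual parsing code.

```python
from typing import Iterable, List


def to_token(name: str) -> str:
    """Join the words of a phrase with underscores, e.g. 'Ruby on Rails'."""
    return "_".join(name.split())


def format_list_field(names: Iterable[str]) -> str:
    """Build a space-delimited field value from a list of names."""
    return " ".join(to_token(name) for name in names)


def parse_list_field(raw: str) -> List[str]:
    """Split a field value into entries on whitespace.

    Underscores are left untouched, because names such as mod_perl
    contain underscores that are part of the name itself.
    """
    return raw.split()


if __name__ == "__main__":
    field = format_list_field(["Ruby on Rails", "MySQL", "mod_perl"])
    print(field)
    print(parse_list_field(field))
```

Note that the conversion is only safe in one direction: spaces can always become underscores, but an underscore in a stored entry may be either a word separator or part of the component's real name.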
