Machine/Discovery One: Difference between revisions
m add image |
m add notes |
||
| Line 1: | Line 1: | ||
<noinclude>{{TOCright}}</noinclude> | <noinclude>{{TOCright}}</noinclude> | ||
Discovery One is the name of the cluster of machines hosting activities.sugarlabs.org. Activities.sugarlabs.org is a system for encouraging developers of different skill levels to cooperatively develop, edit, and distribute learning activities for the Sugar Platform. | |||
This section of the wiki is about setting up and maintaining the infrastructure. For information about using and and improving activities.sl.o please see [[Activity_Library| Activity Library]]. | |||
===Design=== | |||
The prime design characteristics of a.sl.o are scalability and availability. As the a.sl.o userbase grows, each component can be scaled horizontally, across multiple physical machines. | |||
As of November 2009 activities.sl.o is server 500,000 activities per month using two machines located at Gnaps. The proxy(green) is on treehouse and the rest(red) is on sunjammer. | |||
[[Image:Aslo1.png]] | [[Image:Aslo1.png]] | ||
=== | ===Components=== | ||
* [[Machine/Discovery_One/Proxy | Proxy ]] | * [[Machine/Discovery_One/Proxy | Proxy ]] The Proxy is the public web face portion of a.sl.o. The Proxy both serves static content and acts as a firewall in fron to the rest of the system. | ||
* [[Machine/Discovery_One/Web | Web ]] | * [[Machine/Discovery_One/Web | Web ]] The Web nodes serve dynamically generated content and pass requests for activity downloads to the Content Delivery Network. | ||
* [[Machine/Discovery_One/Database | Database ]] | * [[Machine/Discovery_One/Database | Database ]] The Database maintains the data for the web nodes. | ||
* [[Machine/Sunjammer | Shared File System ]] The Shared File System maintains a consistant file structure for the web nodes and the Content Delivery Network. | |||
* [[Infrastructure_Team/Content_Delivery_Network | Content Delivery Network ]] The Content Delivery Network distributes and serves files from mirrors outside of the primary datacenter. | |||
===Scaling Stage 1=== | |||
Our first bottleneck in scaling a.sl.o is the cpu load of the web-nodes. Our first step will be to split the web nodes across multiple physical machines. | |||
====Considerations==== | |||
* Cloning web nodes. Each web node is an exact clone of eachother. The only difference is in assigned IP Address. Tested | |||
* Load balancing. Add Perlbal loadbalancing and Heartbeat HA monitoring to proxy. Tested | |||
* Common data base. Point web nodes to common database. tested. | |||
* Common file system. Point web nodes and CDN to common file system. In Progress. | |||
====Observations==== | |||
As of Nov 2009 | |||
* Proxy nodes | |||
** At peak loads catches ~ 20-25% of hits before they reach webnodes | |||
** Limiting factors inodes and memory | |||
** VM has 2G memory.... Starting to swap. | |||
* Web nodes | |||
** A Dual core 2.4 Opteron(Sunjammer) can handle our peak load at ~ 60% cpu | |||
** A Quad core 2.2 AMD(treehouse) can handle ~ 22 transactions per second. | |||
** Estimate less than 4GB of memory required per web node. | |||
* Memcached nodes (part of web nodes) | |||
** ~85 hit rate | |||
** 1.25G of assigned memory. | |||
*Database Nodes | |||
** Cpu load about 25% of web node -- one Database node should serve 4-5 web nodes. | |||
====Compromises==== | |||
This design sacrifices availability for simplicity. We have several possible single points of failure; Proxy, common file system, and database. | |||
[[Image:Aslo2.png]] | |||
===Scaling Stage 2+=== | |||
Sorry Bernie this bit is likely to give you a heart attack. | |||
As we split the web nodes across multiple physical machines, we we be able to add redundant components for High availability. | |||
====Considerations==== | |||
* Proxy - Loadbalancers. 2+ proxies on separate physical machines which share an IP. If a machine fails the other(s) pick up the load. | |||
* Web nodes - Individual nodes will be monitored by Heartbeat HA monitor living on the proxies. If a web node fails, it is dropped from the Load balancing rotation. | |||
* Memcached - Memcached is designed to be distributed. If a node fails it is dropped. | |||
* Database - Two machines in a Master-Master configuration. Under normal operation they operate as master-slave. If the master fails, the other takes over as master. | |||
* File system - TBD | |||
[[Image:Aslo3.png]] | |||
== Location == | == Location == | ||
Hosted by [[Machine/treehouse|treehouse]] | * Hosted by [[Machine/sunjammer|sunjammer]] | ||
* Hosted by [[Machine/treehouse|treehouse]] | |||
== Admins == | == Admins == | ||
| Line 23: | Line 77: | ||
This machine is a clone from the VM-Template base904.img on treeehouse and runs | This machine is a clone from the VM-Template base904.img on treeehouse and runs | ||
Ubuntu server 9.04. | Ubuntu server 9.04. | ||
{{Special:PrefixIndex/{{PAGENAME}}/}} | {{Special:PrefixIndex/{{PAGENAME}}/}} | ||
[[Category:Machine]] | [[Category:Machine]] | ||