<noinclude>{{TOCright}}</noinclude>
 
Discovery One is the name of the cluster of machines hosting activities.sugarlabs.org. Activities.sugarlabs.org is a system for encouraging developers of different skill levels to cooperatively develop, edit, and distribute learning activities for the Sugar Platform.
This section of the wiki is about setting up and maintaining the infrastructure, Discovery One, necessary to keep activities.sugarlabs.org running. For information about using and improving activities.sl.o, please see [[Activity_Library| Activity Library]].
===Design===
The prime design characteristics of a.sl.o are scalability and availability. As the a.sl.o userbase grows, each component can be scaled horizontally across multiple physical machines.

As of November 2009, activities.sl.o is serving 500,000 activities per month using two machines located at Gnaps. The proxy (green) is on treehouse and the rest (red) is on sunjammer.
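As a rough sanity check on that monthly figure, the implied average download rate is well under one per second (a sketch assuming a 30-day month; real traffic is bursty, so peak load is far above this average):

```shell
# Average downloads/sec implied by 500,000 activity downloads per month
# (assumes a 30-day month; peaks are far above this average).
awk 'BEGIN { printf "%.2f downloads/sec average\n", 500000 / (30*24*3600) }'
```

The peak transaction rates observed later on this page are roughly two orders of magnitude above this average, which is why peak capacity, not average load, drives the scaling plan.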
   
[[Image:Aslo1.png]]
 
===Components===
* [[Machine/Discovery_One/Proxy | Proxy ]] The Proxy is the public-facing web portion of a.sl.o. It both serves static content and acts as a firewall in front of the rest of the system.
* [[Machine/Discovery_One/Web | Web ]] The Web nodes serve dynamically generated content and pass requests for activity downloads to the Content Delivery Network.
* [[Machine/Discovery_One/Database | Database ]] The Database maintains the data for the web nodes.
* [[Machine/Sunjammer | Shared File System ]] The Shared File System maintains a consistent file structure for the web nodes and the Content Delivery Network.
* [[Infrastructure_Team/Content_Delivery_Network | Content Delivery Network ]]  The Content Delivery Network distributes and serves files from mirrors outside of the primary datacenter.
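Since the shared file system is an NFS export from sunjammer, a web node would mount it with an /etc/fstab entry along these lines (a sketch: the export path and mount point shown here are placeholders, not the real paths):

```
# /etc/fstab on a web node (hypothetical export path and mount point)
sunjammer.sugarlabs.org:/srv/aslo/shared  /srv/aslo/shared  nfs  rw,hard,intr  0  0
```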
===Scaling Stage 1===
Our first bottleneck in scaling a.sl.o is the CPU load of the web nodes. Our first step will be to split the web nodes across multiple physical machines.

====Considerations====
* Cloning web nodes. Each web node is an exact clone of the others; the only difference is the assigned IP address. Tested.
* Load balancing. Add Perlbal load balancing and Heartbeat HA monitoring to the proxy. Tested.
* Common database. Point the web nodes to a common database. Tested.
* Common file system. Point the web nodes and CDN to a common file system. In progress.
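The Perlbal load-balancing step could be configured roughly as follows (a sketch: the pool name, service name, backend IPs, and ports are hypothetical, not the real deployment):

```
# perlbal.conf (hypothetical names and addresses)
CREATE POOL aslo_web
  POOL aslo_web ADD 10.0.0.11:80
  POOL aslo_web ADD 10.0.0.12:80

CREATE SERVICE aslo_proxy
  SET listen         = 0.0.0.0:80
  SET role           = reverse_proxy
  SET pool           = aslo_web
  SET verify_backend = on
ENABLE aslo_proxy
```

Adding a cloned web node to the rotation then only requires one more `POOL aslo_web ADD` line.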
 
====Observations====
As of November 2009:
* Proxy nodes
** At peak loads, catch ~20-25% of hits before they reach the web nodes.
** Limiting factors: inodes and memory.
** The VM has 2 GB of memory and is starting to swap.
* Web nodes
** A dual-core 2.4 GHz Opteron (sunjammer) can handle our peak load at ~60% CPU.
** A quad-core 2.2 GHz AMD (treehouse) can handle ~22 transactions per second.
** Estimated memory requirement: less than 4 GB per web node.
* Memcached nodes (part of the web nodes)
** ~85% hit rate.
** 1.25 GB of assigned memory.
* Database nodes
** CPU load is about 25% of a web node's, so one database node should serve 4-5 web nodes.
====Compromises====
This design sacrifices availability for simplicity. We have several possible single points of failure: the proxy, the common file system, and the database.

[[Image:Aslo2.png]]
===Scaling Stage 2+===
Sorry Bernie, this bit is likely to give you a heart attack.

As we split the web nodes across multiple physical machines, we will be able to add redundant components for high availability.

====Considerations====
* Proxy - Load balancers. Two or more proxies on separate physical machines which share an IP address. If one machine fails, the other(s) pick up the load.
* Web nodes - Individual nodes will be monitored by the Heartbeat HA monitor living on the proxies. If a web node fails, it is dropped from the load-balancing rotation.
* Memcached - Memcached is designed to be distributed. If a node fails, it is dropped.
* Database - Two machines in a master-master configuration. Under normal operation they run as master-slave; if the master fails, the other takes over as master.
* File system - TBD
[[Image:Aslo3.png]]
    
== Location ==
 
* Hosted by [[Machine/sunjammer|sunjammer]]
* Hosted by [[Machine/treehouse|treehouse]]
== Admins ==
 
This machine is a clone from the VM-Template base904.img on treehouse and runs Ubuntu server 9.04.
      
{{Special:PrefixIndex/{{PAGENAME}}/}}
 
    
[[Category:Machine]]
 