Difference between revisions of "Service/mirrors"

From Sugar Labs
Latest revision as of 20:10, 9 August 2014


Introduction

A content delivery network (CDN), also called a content distribution network, is a system of computers that hold copies of data and are placed at various points in a network to maximize the bandwidth available to clients throughout that network. Each client fetches a copy of the data from a server near it rather than from a single central server, which avoids a bottleneck near that server.

Goals

  • Reduce bandwidth usage at the primary download server.
  • Improve quality of service for users.
  • Move content closer to users, thus reducing latency.

Architecture

The Sugar Labs Content Delivery Network uses MirrorBrain (http://www.mirrorbrain.org/) as a redirector. The redirector, which lives in a Sugar Labs data center, keeps track of which files are available on which mirror. When a user requests a file, the redirector points the user to a mirror that has it, and the download starts automatically.
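
The redirect can be observed from the command line. The following is a minimal sketch, assuming the redirector answers with an HTTP redirect that carries a Location header and that curl is installed. The function names extract_location and show_mirror_for are made up for this illustration.

```shell
# Print the value of the Location header from HTTP response headers on stdin.
# Header names are case-insensitive, so the name is lowercased before comparing.
extract_location() {
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Show where the redirector sends a request for the given URL.
# curl -sI fetches only the response headers, without following redirects.
show_mirror_for() {
    curl -sI "$1" | extract_location
}
```

Calling show_mirror_for with the URL of a file on the download server should print the mirror URL chosen for the caller, or nothing if the server answers the request itself.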

Mirrors

The current list of mirrors is available at http://mirrors.sugarlabs.org/

Considerations

Bandwidth

Running a mirror requires a lot of bandwidth. To get a sense of scale, look at the total bandwidth used by all the mirrors: http://stats.sugarlabs.org/download.sugarlabs.org/

If bandwidth is a problem, consider CloudFlare (http://www.cloudflare.com).

HDD Space

Hosting a mirror takes a lot of disk space. If space is limited, you can choose to mirror only some parts. For example, to exclude every directory except the activities (about 13 GB):

 rsync -avzh rsync://download.sugarlabs.org/pub --exclude 'dextrose' --exclude 'hexoquinasa' --exclude 'images' --exclude 'sources' --exclude 'docs' --exclude 'packages' --exclude 'soas' /rsync/download.sugarlabs.org
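
Before committing disk space, it may help to preview what a partial mirror would transfer. The sketch below builds the exclude options from a list and performs a dry run. The -n (dry run) and --stats options are standard rsync options. The names EXCLUDES, exclude_args and preview_partial_mirror are hypothetical, and the directory list is simply copied from the command above.

```shell
# Directories to skip, taken from the example command above.
EXCLUDES="dextrose hexoquinasa images sources docs packages soas"

# Print one --exclude=NAME option per line for the given names.
exclude_args() {
    for name in "$@"; do
        printf '%s\n' "--exclude=$name"
    done
}

# Dry run: -n lists what would be transferred without copying anything,
# and --stats prints a summary that includes the total file size.
preview_partial_mirror() {
    # Word splitting of $EXCLUDES and of the command substitution is
    # intended here; the directory names contain no spaces.
    rsync -avzhn --stats $(exclude_args $EXCLUDES) \
        rsync://download.sugarlabs.org/pub /rsync/download.sugarlabs.org
}
```

Running preview_partial_mirror prints the file list and a size summary. Dropping the n from the options turns the preview into the real transfer.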

Setting up a new mirror

For mirror administrators

All you need is a web server with enough bandwidth to serve the files. To set up a new mirror, the site administrator needs to:

  • First, create a directory to store the data:
 mkdir -p /rsync/download.sugarlabs.org
  • Then use rsync to download the data (warning: this takes a long time):
 rsync -avzh rsync://download.sugarlabs.org/pub /rsync/download.sugarlabs.org
  • Save the rsync command as a shell script and make it executable:
 printf '#!/bin/sh\n%s\n' "rsync -avzh rsync://download.sugarlabs.org/pub /rsync/download.sugarlabs.org" > /rsync/download.sugarlabs.org/sync.sh
 chmod 774 /rsync/download.sugarlabs.org/sync.sh
  • Next, make the sync run automatically with a cron job, for example every 2 hours. Note that loading a file with crontab replaces the current user's existing crontab:
 echo "0 */2 * * * /rsync/download.sugarlabs.org/sync.sh" > asloSyncCronJob.txt
 crontab asloSyncCronJob.txt

If you don't want it to sync every 2 hours, a cron tutorial explains how to change that value: https://www.digitalocean.com/community/tutorials/how-to-use-cron-to-automate-tasks-on-a-vps
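
Because the first sync can take longer than the cron interval, two runs may overlap. The following sketch shows one way to guard against that, plus a way to add the cron line without overwriting other entries. It is an illustration under assumptions, not part of the official setup: the lock path and the function names run_locked and add_cron_line are hypothetical.

```shell
# Hypothetical lock location; override by setting LOCKDIR beforehand.
LOCKDIR="${LOCKDIR:-/tmp/sugarlabs-mirror-sync.lock}"

# Run the given command only if no other sync holds the lock.
# mkdir either creates the directory or fails, atomically, so it
# serves as a simple lock without extra tools.
run_locked() {
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous sync still running, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$LOCKDIR"
    return "$status"
}

# Append a line to the current user's crontab instead of replacing it.
# Does nothing if exactly the same line is already installed.
add_cron_line() {
    if crontab -l 2>/dev/null | grep -qxF -- "$1"; then
        return 0
    fi
    { crontab -l 2>/dev/null; printf '%s\n' "$1"; } | crontab -
}
```

Inside sync.sh the rsync command would then be prefixed with run_locked, and the schedule could be installed with add_cron_line "0 */2 * * * /rsync/download.sugarlabs.org/sync.sh". If a run is killed before it removes the lock directory, the directory has to be deleted by hand.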

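  • Publish the files via HTTP. See your HTTP server's documentation for how to do that. You could set up a virtual host to serve these files, in nginx (https://www.digitalocean.com/community/tutorials/how-to-set-up-nginx-server-blocks-virtual-hosts-on-ubuntu-14-04-lts) or in Apache (https://www.digitalocean.com/community/tutorials/how-to-set-up-apache-virtual-hosts-on-ubuntu-12-04-lts).
  • Set up an rsync daemon so that we can check the status of your mirror. To do so, create an rsyncd.conf file and open it:
 sudo nano /etc/rsyncd.conf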

Then insert the following config:

 log file = /rsync/log
 [sugarlabs]
     path = /rsync/download.sugarlabs.org
     comment = PUT SOME INFORMATION HERE - LIKE A MOTD
     read only = true
     list = yes

Save and quit nano. Then start rsyncd so it can serve your files:

 rsync --daemon --config=/etc/rsyncd.conf
  • Tell the Sugar Labs System Administrators that you would like your mirror added to the rotation, and include the following information in the request:
    • Name and URL of the mirror operator (e.g. organization)
    • Name and email address of the administrative contact
    • ISO 3166-1 alpha-2 country code of the server location
    • HTTP base URL of the files on the mirror (typically http://mirrors.example.org/sugarlabs/)
    • rsync base URL of the files on the mirror (typically rsync://mirrors.example.org/sugarlabs/)

Please contact sysadmin AT sugarlabs DOT org if you are interested in hosting a mirror.
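
Before sending the request, it can be useful to confirm that the rsync daemon actually advertises the module. Asking an rsync daemon for rsync://HOST/ with no module name prints its module list, one module per line with the name first. The sketch below relies on that behaviour; the function names has_module and check_mirror_module are made up for this example.

```shell
# Succeed if a module with the given name appears in an rsync module
# listing read from stdin. Each listing line starts with the module name,
# followed by whitespace and the module's comment.
has_module() {
    awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# Ask the rsync daemon on the given host for its module list and
# look for the given module name in it.
check_mirror_module() {
    rsync "rsync://$1/" | has_module "$2"
}
```

For the configuration above, check_mirror_module mirrors.example.org sugarlabs should succeed once the daemon is running and reachable, with mirrors.example.org replaced by the real host name.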

For Sugar Labs sysadmins

To add a new mirror to the MirrorBrain redirector:

  • Choose a name for the mirror, usually the host name.
  • Register the mirror with MirrorBrain:
sudo -u mirrorbrain mb new <mirror name> --operator-name <operator name> \
 --operator-url <operator URL> -a <admin name> -e <admin email> \
 -c <country code> -H <base HTTP URL> -R <base rsync URL> -F <base FTP URL>
  • Scan and enable the mirror:
sudo -u mirrorbrain mb scan -e <mirror name>
  • Export the list of mirrors for mirmon (an hourly cron job does this, but if you don't want to wait...):
mb export --format=mirmon-apache | sudo -u mirrorbrain tee /srv/mirrorbrain/mirmon/mirrorlist-export
  • Finally, re-run mirmon to ensure it can check the health of the mirror (this is also done by a cronjob, but our patience is very short):
sudo -u mirrorbrain mirmon -v -get all -c /etc/mirmon.conf
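
Before registering a mirror, the submitted values can be given a quick sanity check. The helpers below are a hedged sketch and not part of MirrorBrain: the names valid_country_code and has_scheme are hypothetical, and the checks only look at the shape of the values, not at whether the country code is assigned or the URLs are reachable.

```shell
# Succeed if the argument looks like an ISO 3166-1 alpha-2 code,
# meaning exactly two letters.
valid_country_code() {
    case $1 in
        [A-Za-z][A-Za-z]) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the URL in $2 starts with the scheme in $1 followed by "://"
# and at least one more character, e.g. has_scheme rsync "rsync://host/".
has_scheme() {
    case $2 in
        "$1"://?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A sysadmin might run valid_country_code on the value intended for -c and has_scheme http and has_scheme rsync on the values intended for -H and -R before calling mb new. Note that has_scheme http rejects https URLs, since the scheme must match exactly.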