Difference between revisions of "Service/mirrors"

From Sugar Labs
m (fix command)
 
(38 intermediate revisions by 6 users not shown)
Line 1: Line 1:
<noinclude>{{TeamHeader|Infrastructure Team}}</noinclude>
+
<noinclude>{{TeamHeader|Infrastructure Team}}{{TOCright}}
{{TOCright}}
+
[[Category:Resource]]
  
==Introduction==
+
== Introduction ==
A content delivery network or content distribution network (CDN) is a system of computers containing copies of data, placed at various points in a network so as to maximize bandwidth for access to the data from clients throughout the network. A client accesses a copy of the data near to the client, as opposed to all clients accessing the same central server, thereby causing a bottleneck near that server.
+
A content delivery network or content distribution network (CDN) is a system of computers containing copies of data, placed at various points in a network so as to maximize bandwidth for access to the data from clients throughout the network. A client accesses a copy of the data near to the client, rather than all clients accessing the same central server, which would create a bottleneck near that server.
  
MirrorBrain, Bouncer, Fedora Mirror Manager, and Cacheboy are four possible choices for scaling up the Sugar Labs content delivery network.  Long term, Cacheboy looks like it might be the best fit for Sugar Labs, but the project must become more stable before it can become Sugar Labs' primary CDN.  Bouncer is currently used successfully by Mozilla, but its development ended at the end of 2008. Mozilla is currently investigating MirrorBrain, which looks promising.
+
== Goals ==
 +
* Reduce bandwidth at primary download server.
 +
* Improve quality of service for users.
 +
* Move content closer to users, thus reducing latency.
  
==Goals==
+
== Architecture ==
*Reduce bandwidth at primary download server.
+
The Sugar Labs Content Delivery Network uses [http://www.mirrorbrain.org/ MirrorBrain] as a redirector. The redirector, which lives in a Sugar Labs data center, keeps track of which files are available on which mirror. When a user requests a file, the redirector points the user to the correct mirror and automatically starts the file download.
*Improve quality of service for users.
 
*Move content closer to users.
 
  
==[http://mirrorbrain.org/ Mirrorbrain]==
+
== Mirrors ==
MirrorBrain is used by several major projects, including openSUSE and OpenOffice.  It is quite stable, under active development, and well documented.
+
The current list of mirrors is available at http://mirrors.sugarlabs.org/
  
===Installation===
+
== Considerations ==
The following recipe for installing MirrorBrain on Ubuntu 9.04 is based on information at http://mirrorbrain.org/docs/ .
 
  
====Install LAMP Server====
+
=== Bandwidth ===
Install a standard Ubuntu LAMP server
 
  
====Download mirrorbrain====
+
To run a mirror you need a lot of bandwidth! You should look at [http://stats.sugarlabs.org/download.sugarlabs.org/ the total bandwidth used by all the mirrors].
<code>
 
  wget http://mirrorbrain.org/files/releases/mirrorbrain-2.10.0.tar.gz
 
tar xzf mirrorbrain-2.10.0.tar.gz
 
</code>
 
  
====Install python dependencies====
+
If you have trouble with bandwidth, you should look at [http://www.cloudflare.com CloudFlare].
<code>
 
sudo apt-get install python-sqlobject python-psycopg2
 
</code>
 
  
The Python cmdln module is not prepackaged for Ubuntu, so it must be installed manually.
+
=== HDD Space ===
  
<code>
+
Hosting a mirror takes a lot of space. If you don't have a lot of space, you can choose to mirror only some parts. For example, exclude all directories except the activities (~13 GB):
wget http://cmdln.googlecode.com/files/cmdln-1.1.2.zip
 
  unzip cmdln-1.1.2.zip
 
  cd cmdln-1.1.2
 
sudo python setup.py install
 
</code>
 
  
====Install perl dependencies for scanner====
+
  rsync -avzh rsync://download.sugarlabs.org/pub --exclude 'dextrose' --exclude 'hexoquinasa' --exclude 'images' --exclude 'sources' --exclude 'docs' --exclude 'packages' --exclude 'soas' /rsync/download.sugarlabs.org
The mirrorbrain scanner is written in perl and requires several dependencies.
 
  
<code>
+
== Setting up a new mirror ==
sudo apt-get install libconfig-inifiles-perl libwww-perl libdbd-pg-perl libdatetime-perl libdigest-md4-perl
 
</code>
 
  
====Build, Install, and Configure Apache2 modules====
+
=== For mirror administrators ===
MirrorBrain requires several Apache modules, some of which must be built manually. Apache modules are built and installed using '''apxs2''' (APache eXtenSion tool), which is in the apache2-threaded-dev package.
+
All you need is a web server with enough bandwidth to serve the files. To set up a new mirror, the site administrator needs to:
  
<code>
+
* First, let's make a directory to store the data:
sudo apt-get install apache2-threaded-dev
 
</code>
 
  
=====Install and Configure mod_geoip=====
+
  mkdir /rsync
Mod_geoip is available as a prebuilt package.
+
  mkdir /rsync/download.sugarlabs.org
  
<code>
+
* Then let's use rsync to download the data (warning: this takes a long time):
sudo apt-get install libapache2-mod-geoip
 
</code>
 
  
Mod_geoip must be configured to find the GeoIP data set.
+
  rsync -avzh rsync://download.sugarlabs.org/pub /rsync/download.sugarlabs.org
  
<code>
+
* Save the rsync command as a shell script and make it executable:
sudo sh -c "cat > /etc/apache2/mods-available/geoip.conf << EOF
 
<IfModule mod_geoip.c>
 
  GeoIPEnable On
 
  GeoIPOutput Env
 
  GeoIPDBFile /var/lib/GeoIP/GeoIP.dat MMapCache
 
</IfModule>
 
EOF
 
"
 
</code>
 
  
Download GeoIP data set
+
  echo "rsync -avzh rsync://download.sugarlabs.org/pub /rsync/download.sugarlabs.org" > /rsync/download.sugarlabs.org/sync.sh
<code>
+
  chmod 774 /rsync/download.sugarlabs.org/sync.sh
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
 
sudo apt-get install gzip
 
gunzip GeoIP.dat.gz
 
sudo mkdir /var/lib/GeoIP
 
sudo cp GeoIP.dat /var/lib/GeoIP/GeoIP.dat
 
</code>
 
  
Enable module.
+
* Then let's make this sync automatically. We can use a cron job to do that.  You could make it sync every 2 hours:
<code>
+
 
sudo a2enmod geoip
+
  echo "0 */2 * * * /rsync/download.sugarlabs.org/sync.sh" > asloSyncCronJob.txt
</code>
+
  crontab asloSyncCronJob.txt
  
Restart Apache
+
If you don't want it to sync every 2 hours, have a look at [https://www.digitalocean.com/community/tutorials/how-to-use-cron-to-automate-tasks-on-a-vps a cron tutorial] to change that value.
<code>
 
sudo /etc/init.d/apache2 restart
 
</code>
 
  
=====Install and Configure mod_form=====
+
* Publish the files via HTTP. Look at your http server documentation on how to do that.  You could set up a virtual host to serve these files: [https://www.digitalocean.com/community/tutorials/how-to-set-up-nginx-server-blocks-virtual-hosts-on-ubuntu-14-04-lts in nginx] [https://www.digitalocean.com/community/tutorials/how-to-set-up-apache-virtual-hosts-on-ubuntu-12-04-lts in apache]
Mod_form must be built from scratch.
 
  
FIXME correct directory
+
* Set up an rsync mirror so we can view the status of your mirror. To do so, create an rsyncd.conf file and open it:
Download source
 
<code>
 
  wget http://apache.webthing.com/svn/apache/forms/mod_form.c
 
wget http://apache.webthing.com/svn/apache/forms/mod_form.h
 
</code>
 
  
build mod_form
+
  sudo nano /etc/rsyncd.conf
<code>
 
sudo apxs2 -cia mod_form.c
 
</code>
 
  
create loader
+
Then insert the following config:
<code>
 
sudo sh -c "cat > /etc/apache2/mods-available/form.load << EOF
 
LoadModule form_module /usr/lib/apache2/modules/mod_form.so
 
EOF
 
"
 
</code>
 
  
Enable module.
+
  log file = /rsync/log
<code>
 
sudo a2enmod form
 
</code>
 
  
Restart Apache
+
  [sugarlabs]
<code>
+
      path = /rsync/download.sugarlabs.org
sudo /etc/init.d/apache2 restart
+
      comment = PUT SOME INFORMATION HERE - LIKE A MOTD
</code>
+
      read only = true
 +
      list = yes
  
=====Configure mod_dbd=====
+
Save and quit nano. Then start rsyncd so it can serve your files:
configure mod_dbd
 
<code>
 
sudo sh -c "cat > /etc/apache2/mods-available/dbd.conf << EOF
 
  <IfModule mod_dbd.c>
 
    DBDriver pgsql
 
    # note that the connection string (which is passed straight through to
 
    # PGconnectdb in this case) looks slightly different - pass vs. password
 
    DBDParams "host=localhost user=mb password=12345 dbname=mb_sugar connect_timeout=15"
 
  </IfModule>
 
EOF
 
  "
 
</code>
 
  
Enable module.
+
  rsync --daemon --config=/etc/rsyncd.conf
<code>
 
sudo a2enmod dbd
 
</code>
 
  
Restart Apache
+
* Alert the [[Infrastructure_Team/Contacts|Sugar Labs System Administrators]] that they would like their mirror added to the rotation, including the following information in the request:
<code>
+
** Name and URL of the mirror operator (e.g. organization)
  sudo /etc/init.d/apache2 restart
+
** Name and email address of the administrative contact
</code>
+
** [http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 ISO 3166-1 alpha-2] country code of the server location
 +
** HTTP base URL of the files on the mirror (typically http://mirrors.example.org/sugarlabs/)
 +
** rsync base URL of the files on the mirror (typically rsync://mirrors.example.org/sugarlabs/)
  
====Install and configure mod_mirrorbrain====
+
Please contact sysadmin AT sugarlabs DOT org if you are interested in hosting a mirror.
Build mod_mirrorbrain
 
<code>
 
sudo apxs2 -cia mod_mirrorbrain.c
 
</code>
 
  
create module loader
+
=== For Sugar Labs sysadmins ===
<code>
 
sudo sh -c "cat > /etc/apache2/mods-available/mirrorbrain.load << EOF
 
LoadModule mirrorbrain_module /usr/lib/apache2/modules/mod_mirrorbrain.so
 
EOF
 
"
 
</code>
 
  
Enable module.
+
To add a new mirror to the MirrorBrain redirector:
<code>
 
sudo a2enmod mirrorbrain
 
</code>
 
  
Restart Apache
+
* Choose a name for the mirror, usually the host name.
<code>
+
* Register the mirror with MirrorBrain:
  sudo /etc/init.d/apache2 restart
+
  sudo -u mirrorbrain mb new <mirror name> --operator-name <operator name> \
</code>
+
  --operator-url <operator URL> -a <admin name> -e <admin email> \
 
+
  -c <country code> -H <base HTTP URL> -R <base rsync URL> -F <base FTP URL>
====Build and Install helper programs====
+
* Scan and enable the mirror:
build and install geoiplookup
+
  sudo -u mirrorbrain mb scan -e <mirror name>
<code>
+
* Export the list of mirrors for mirmon (an hourly cron job does this, but if you don't want to wait...):
gcc -Wall -lGeoIP -o geoiplookup_continent geoiplookup_continent.c
+
  mb export --format=mirmon-apache | sudo -u mirrorbrain tee /srv/mirrorbrain/mirmon/mirrorlist-export
sudo cp geoiplookup_continent /usr/bin/geoiplookup_continent
+
* Finally, re-run mirmon to ensure it can check the health of the mirror (this is also done by a cronjob, but our patience is very short):
</code>
+
  sudo -u mirrorbrain mirmon -v -get all -c /etc/mirmon.conf
 
 
Install scanner
 
<code>
 
sudo cp ../tools/scanner.pl /usr/bin/scanner
 
</code>
 
 
 
====Install postgres====
 
<code>
 
sudo apt-get install postgresql-8.4
 
</code>
 
 
 
=====Create the postgresql user account and database=====
 
Switch to user postgres
 
<code>
 
sudo su - postgres
 
</code>
 
 
 
Create user
 
<code>
 
createuser -P mb
 
Enter password for new role:
 
Enter it again:
 
Shall the new role be a superuser? (y/n) n
 
Shall the new role be allowed to create databases? (y/n) n
 
Shall the new role be allowed to create more new roles? (y/n) n
 
</code>
 
 
 
Create database
 
<code>
 
createdb -O mb mb_sugar
 
createlang plpgsql mb_sugar
 
</code>
 
 
 
Exit user postgres
 
<code>
 
exit
 
</code>
 
 
 
=====Edit host-based authentication=====
 
add line 'local mb_sugar mb trust' to the end of pg_hba.conf
 
 
 
FIXME should not be trust on production machine
 
<code>
 
sudo vim /etc/postgresql/8.4/main/pg_hba.conf
 
</code>
 
 
 
Restart the postgres server
 
<code>
 
sudo /etc/init.d/postgresql-8.4 restart
 
</code>
 
 
 
=====Import initial mirrorbrain data=====
 
Import table structure, and initial data
 
<code>
 
psql -U mb -f sql/schema-postgresql.sql mb_sugar
 
psql -U mb -f sql/initialdata-postgresql.sql mb_sugar
 
</code>
 
 
 
=====Create mirrorbrain user and group=====
 
<code>
 
  sudo groupadd -r mirrorbrain
 
sudo useradd -r -g mirrorbrain -s /bin/bash -c "MirrorBrain user" -d /home/mirrorbrain mirrorbrain
 
</code>
 
 
 
=====Create mirrorbrain.conf=====
 
<code>
 
sudo sh -c "cat > /etc/mirrorbrain.conf << EOF
[general]
instances = samba
[samba]
dbuser = mb
dbpass = 12345
dbdriver = postgresql
dbhost = 127.0.0.1
# optional: dbport = ...
dbname = mb_sugar
[mirrorprobe]
mailto = dfarning@sugarlabs.org
EOF
"
 
</code>
 
 
 
Set permissions and ownership
 
<code>
 
sudo chmod 0640 /etc/mirrorbrain.conf
 
sudo chown root:mirrorbrain /etc/mirrorbrain.conf
 
</code>
 
 
 
=====Test mirrorbrain=====
 
<code>
 
./mirrorbrain.py
 
</code>
 
 
 
====Create VirtualHost====
 
<code>
 
sudo sh -c "cat > /etc/apache2/sites-available/mirror << EOF
<VirtualHost your.host.name:80>
    ServerName samba.mirrorbrain.org
    ServerAdmin webmaster@mirrorbrain.org
    DocumentRoot /srv/samba/pub/projects
    ErrorLog    /var/log/apache/samba.mirrorbrain.org/logs/error_log
    CustomLog    /var/log/apache/samba.mirrorbrain.org/logs/access_log combined
    <Directory /srv/samba/pub/projects>
      MirrorBrainEngine On
      MirrorBrainDebug Off
      FormGET On
      MirrorBrainHandleHEADRequestLocally Off
      MirrorBrainMinSize 2048
      MirrorBrainHandleDirectoryIndexLocally On
      MirrorBrainExcludeUserAgent rpm/4.4.2*
      MirrorBrainExcludeUserAgent *APT-HTTP*
      MirrorBrainExcludeMimeType application/pgp-keys
      Options FollowSymLinks Indexes
      AllowOverride None
      Order allow,deny
      Allow from all
    </Directory>
</VirtualHost>
EOF
"
 
</code>
 
 
 
 
 
 
 
<noinclude>{{ GoogleTrans-en | es =show | bg =show | zh-CN =show | zh-TW =show | hr =show | cs =show | da =show | nl =show | fi =show | fr =show | de =show | el =show | hi =show | it =show | ja =show | ko =show | no =show | pl =show | pt =show | ro =show | ru =show | sv =show }}</noinclude>
 
== Subpages ==
 
 
 
{{Special:PrefixIndex/{{PAGENAMEE}}/}}
 
 
 
 
 
[[Category:Team]]
 

Latest revision as of 21:10, 9 August 2014


Introduction

A content delivery network or content distribution network (CDN) is a system of computers containing copies of data, placed at various points in a network so as to maximize bandwidth for access to the data from clients throughout the network. A client accesses a copy of the data near to the client, rather than all clients accessing the same central server, which would create a bottleneck near that server.

Goals

  • Reduce bandwidth at primary download server.
  • Improve quality of service for users.
  • Move content closer to users, thus reducing latency.

Architecture

The Sugar Labs Content Delivery Network uses MirrorBrain as a redirector. The redirector, which lives in a Sugar Labs data center, keeps track of which files are available on which mirror. When a user requests a file, the redirector points the user to the correct mirror and automatically starts the file download.
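
One way to observe the redirector at work is to request only the headers of a file and read the Location header in the response. The following is a minimal sketch, assuming curl is installed; the helper name redirect_target is made up for illustration, and the path in the usage note is a placeholder rather than a real file.

```shell
#!/bin/sh
# Hypothetical helper: extract the redirect target from raw HTTP
# response headers supplied on standard input.
# Header names are case-insensitive, so compare in lower case, and
# strip the carriage returns that end each HTTP header line.
redirect_target() {
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}
```

For example, `curl -sI http://download.sugarlabs.org/<path to a file> | redirect_target` would print the mirror URL if the redirector answers with a redirect. If it serves the file itself, nothing is printed.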

Mirrors

The current list of mirrors is available at http://mirrors.sugarlabs.org/

Considerations

Bandwidth

To run a mirror you need a lot of bandwidth! You should look at the total bandwidth used by all the mirrors.

If you have trouble with bandwidth, you should look at CloudFlare.

HDD Space

Hosting a mirror takes a lot of space. If you don't have a lot of space, you can choose to mirror only some parts. For example, exclude all directories except the activities (~13 GB):

 rsync -avzh rsync://download.sugarlabs.org/pub --exclude 'dextrose' --exclude 'hexoquinasa' --exclude 'images' --exclude 'sources' --exclude 'docs' --exclude 'packages' --exclude 'soas' /rsync/download.sugarlabs.org
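
If the exclude list changes often, it may be easier to keep the directory names as arguments and build the command from them. The sketch below only prints the resulting rsync command so it can be reviewed before it is run. The function name partial_mirror_command is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the rsync command for a partial mirror.
# First argument: destination directory.
# Remaining arguments: directory names to skip, one --exclude each.
partial_mirror_command() {
    dest=$1
    shift
    cmd="rsync -avzh rsync://download.sugarlabs.org/pub"
    for dir in "$@"; do
        cmd="$cmd --exclude '$dir'"
    done
    printf '%s %s\n' "$cmd" "$dest"
}

# Prints the same command as the example above; nothing is transferred.
partial_mirror_command /rsync/download.sugarlabs.org \
    dextrose hexoquinasa images sources docs packages soas
```

Adding rsync's --dry-run option to the printed command shows what would be transferred without copying anything.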

Setting up a new mirror

For mirror administrators

All you need is a web server with enough bandwidth to serve the files. To set up a new mirror, the site administrator needs to:

  • First, let's make a directory to store the data:
 mkdir /rsync
 mkdir /rsync/download.sugarlabs.org
  • Then let's use rsync to download the data (warning: this takes a long time):
 rsync -avzh rsync://download.sugarlabs.org/pub /rsync/download.sugarlabs.org
  • Save the rsync command as a shell script and make it executable:
 echo "rsync -avzh rsync://download.sugarlabs.org/pub /rsync/download.sugarlabs.org" > /rsync/download.sugarlabs.org/sync.sh
 chmod 774 /rsync/download.sugarlabs.org/sync.sh
  • Then let's make this sync automatically. We can use a cron job to do that. You could make it sync every 2 hours:
 echo "0 */2 * * * /rsync/download.sugarlabs.org/sync.sh" > asloSyncCronJob.txt
 crontab asloSyncCronJob.txt

If you don't want it to sync every 2 hours, have a look at a cron tutorial to change that value.

  • Publish the files via HTTP. Look at your HTTP server's documentation on how to do that. You could set up a virtual host to serve these files in nginx or in Apache.
  • Set up an rsync mirror so we can view the status of your mirror. To do so, create an rsyncd.conf file and open it:
 sudo nano /etc/rsyncd.conf

Then insert the following config:

 log file = /rsync/log
 [sugarlabs]
     path = /rsync/download.sugarlabs.org
     comment = PUT SOME INFORMATION HERE - LIKE A MOTD
     read only = true
     list = yes

Save and quit nano. Then start rsyncd so it can serve your files:

 rsync --daemon --config=/etc/rsyncd.conf
  • Alert the Sugar Labs System Administrators that they would like their mirror added to the rotation, including the following information in the request:
    • Name and URL of the mirror operator (e.g. organization)
    • Name and email address of the administrative contact
    • ISO 3166-1 alpha-2 country code of the server location
    • HTTP base URL of the files on the mirror (typically http://mirrors.example.org/sugarlabs/)
    • rsync base URL of the files on the mirror (typically rsync://mirrors.example.org/sugarlabs/)

Please contact sysadmin AT sugarlabs DOT org if you are interested in hosting a mirror.
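
A note on the cron step above: running crontab with a file name replaces the user's entire crontab, so any entries that already exist are lost. The following is a minimal sketch of one way to add the sync entry while keeping existing entries. The function name merge_cron_entry is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: merge one cron entry into existing crontab text.
# Reads the current crontab on stdin and writes the merged crontab to
# stdout, so running it twice does not duplicate the entry.
merge_cron_entry() {
    entry=$1
    # Pass through every line except an exact copy of the entry.
    # grep exits non-zero when it prints nothing; that is not an error here.
    grep -vxF -- "$entry" || true
    # Append the entry exactly once.
    printf '%s\n' "$entry"
}

# Usage (not run here, because it modifies the current user's crontab):
#   (crontab -l 2>/dev/null || true) \
#     | merge_cron_entry "0 */2 * * * /rsync/download.sugarlabs.org/sync.sh" \
#     | crontab -
```

The function works on plain text, so the merged result can be inspected before it is piped into `crontab -`.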

For Sugar Labs sysadmins

To add a new mirror to the MirrorBrain redirector:

  • Choose a name for the mirror, usually the host name.
  • Register the mirror with MirrorBrain:
sudo -u mirrorbrain mb new <mirror name> --operator-name <operator name> \
 --operator-url <operator URL> -a <admin name> -e <admin email> \
 -c <country code> -H <base HTTP URL> -R <base rsync URL> -F <base FTP URL>
  • Scan and enable the mirror:
sudo -u mirrorbrain mb scan -e <mirror name>
  • Export the list of mirrors for mirmon (an hourly cron job does this, but if you don't want to wait...):
mb export --format=mirmon-apache | sudo -u mirrorbrain tee /srv/mirrorbrain/mirmon/mirrorlist-export
  • Finally, re-run mirmon to ensure it can check the health of the mirror (this is also done by a cronjob, but our patience is very short):
sudo -u mirrorbrain mirmon -v -get all -c /etc/mirmon.conf
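
Before registering a mirror, it can help to check the details supplied in the request. The sketch below checks only the shape of the country code and the scheme of the base URLs. Both function names are made up for illustration, and neither check confirms that the mirror is actually reachable.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument is exactly two letters,
# the shape of an ISO 3166-1 alpha-2 code. This does not confirm that
# the code is assigned to a country.
is_country_code() {
    case $1 in
        [A-Za-z][A-Za-z]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed if the URL in $2 starts with the scheme
# in $1 (for example "http" or "rsync") followed by "://" and more text.
has_scheme() {
    case $2 in
        "$1"://?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_country_code de && has_scheme rsync rsync://mirrors.example.org/sugarlabs/` succeeds, while a three-letter code or an http:// URL passed where an rsync URL is expected fails.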