
Q&A with Antoviaque

pdehaye edited this page Sep 1, 2013 · 15 revisions

On Friday we (Paul-Olivier Dehaye and Marko Seric at the Mathematics Institute of the University of Zurich) held a Q&A session with Xavier Antoviaque. We had prepared questions and in the 4 hours allocated Antoviaque tried to answer as many of them as possible. The transcript is below, on the following topics:

  • Access control
  • Analytics
  • Backup strategy
  • Configuration with Ansible
  • Mako templates
  • Services management
  • Theming
  • Wiki
  • Additional unanswered questions

Access control

Question

As far as I know, anyone who knows the IP address of the Studio instance can create an account on Studio, which is a potential security risk: they could add custom Python homework and run arbitrary code on the server.

What is the canonical way to restrict access to Studio, such that legitimate users can still access it?

Answer

The way it seems to be done for edx.org is through basic HTTP auth, i.e. adding a login/password HTTP auth popup when someone attempts to access the CMS (Studio). It's a single login/password which is identical for everyone, and it simply prevents anyone who doesn't have it from accessing Studio altogether.

Also note that, for those who have the HTTP auth login/password, once they are on Studio the normal user account restrictions will apply. I.e., even if someone gained access to the shared login/password, they could only view and edit courses they have created, or to which they have been invited. So this initial HTTP auth is only one extra layer of security.

The way it's done is to configure nginx to ask for an HTTP password when proxying the CMS service, which is configured from the following template:

https://github.com/edx/configuration/blob/master/playbooks/roles/nginx/templates/cms.j2

which goes to /etc/nginx/sites-enabled on the VM.

To keep it simple, you can edit that template to add:

location / {
    ...
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/nginx.htpasswd;
}

The content of /etc/nginx/nginx.htpasswd is already set by ansible in https://github.com/edx/configuration/blob/master/playbooks/roles/nginx/tasks/main.yml#L17

using the configuration variable nginx_cfg.htpasswd from https://github.com/edx/configuration/blob/master/playbooks/roles/nginx/vars/main.yml#L20

The default value seems to be the login/password edx/edx. If you want something harder to guess, add the nginx_cfg config variables to your configuration-secure repository, with a different value for htpasswd.
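
If you do change the password, the new htpasswd entry has to be stored as a hash in a format nginx understands. The following is a minimal Python sketch that produces a salted SHA-1 ({SSHA}) entry, one of the schemes the nginx documentation lists for auth_basic_user_file. The function name, the user name "studio" and the passphrase are made up for illustration, and whether nginx_cfg.htpasswd expects exactly this one-line format is an assumption to check against the nginx role and your nginx version.

```python
import base64
import hashlib
import os


def ssha_htpasswd_line(user, password, salt=None):
    """Return a 'user:{SSHA}...' line for an nginx auth_basic_user_file."""
    if salt is None:
        salt = os.urandom(8)  # random salt so equal passwords hash differently
    # {SSHA} is base64(sha1(password + salt) + salt)
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    encoded = base64.b64encode(digest + salt).decode("ascii")
    return "{}:{{SSHA}}{}".format(user, encoded)


if __name__ == "__main__":
    # "studio" and the passphrase are placeholders, not values from the edX repos.
    print(ssha_htpasswd_line("studio", "a-long-random-passphrase"))
```

Salted SHA-1 is not a strong password hash, so this is only reasonable for the kind of shared "extra layer" password described above, combined with a long random passphrase.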

Analytics

Question

As far as I know, in edX code, documentation and UI, the word "analytics" appears multiple times:

  • there is an analytics tab for instructors within studio
  • insights repo on github
  • edxanalytics repo on github

These together have at least two different roles:

  • one to help the instructor teach (how many students answered this quiz/hw successfully?, etc)
  • the other one to help further paedagogy research (keep logs of all student/site interactions in order to answer questions such as "how often do students pause videos?" "do they go back to the video when doing the homework?", etc)

Correct so far?

Answer

That's correct, yes, as far as I can tell for the goals - giving feedback about how the classroom is doing, and processing data for research. The insights repository seems to have been designed with flexibility in mind, which would be useful for both scenarios.

I haven't attempted to install them, but insights and edxanalytics are actually closely related. In a nutshell, insights seems to be a generic analytics software, developed by edX but potentially usable with applications other than edX. edxanalytics is the code & templates used by insights when it is used with edX.

Also note that both seem to be experimental/development versions of what would be used in the future for running analytics on edX. I haven't looked at the code for this, so I'm not sure what's currently used, but the repositories seem to indicate that they are not in production use yet, so I'm not sure those two can be used yet.

Untouched questions

Do you know how to install insights or edxanalytics? If so, how long does it take to install? What does it allow us to do? Is it possible/recommended/en-/discouraged to install it separately or later?

Do you have an opinion about the scope of analytics and insights in the next few months (where is the future heading, will insights supersede the functionality of analytics, etc.)?

Backup strategy

Question

For backup and restore we were thinking about using the dump procedures provided by mongodb and mysql, respectively. Then we would use the backups from our IT department to back up the dumps. For restore we would use the restore procedures provided by mongodb and mysql, respectively.

Do you see anything wrong with the proposal? If so, could you give a few pointers about how to attack it?

Answer

This seems completely reasonable. When there is no pre-existing architecture, I usually recommend externalizing db+backup, as this part is not specific to edX, so it can be worth letting someone else worry about it. For example, when the installation is on AWS, that means using RDS for MySQL and mongolab for the mongo databases, and using their backup services to save them.

However, in your case, it makes sense to use your IT department's backup capabilities. If they also offer to run MySQL/Mongo, it might be worth taking advantage of that, just as you would with RDS/mongolab. If they are already running instances for other projects, you might also benefit from existing backup procedures.

If not, and you want to do the backups yourself, you will need to dump the contents of the different databases. The tools which will allow you to do this are mysqldump and mongodump.

Here are some sample command line calls - to adapt to your case:

$ mysqldump -u root --single-transaction $db > /opt/backup/sql/$db.sql
$ mongodump --db $db --out /opt/backup/mongodump/$db
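
If there are several databases to dump, the two calls above can be wrapped in a small script. The following Python sketch only builds the command lines, mirroring the sample calls; the function names and the database name "edxapp" are placeholders, and mysqldump's --result-file option stands in for the shell redirection. The target directories are assumed to exist already.

```python
import os
import subprocess


def build_dump_commands(mysql_dbs, mongo_dbs, backup_root="/opt/backup"):
    """Return one argv list per database, mirroring the sample calls."""
    commands = []
    for db in mysql_dbs:
        target = os.path.join(backup_root, "sql", db + ".sql")
        # --result-file replaces the "> file" redirection of the shell version.
        commands.append(["mysqldump", "-u", "root", "--single-transaction",
                         "--result-file=" + target, db])
    for db in mongo_dbs:
        target = os.path.join(backup_root, "mongodump", db)
        commands.append(["mongodump", "--db", db, "--out", target])
    return commands


def run_backups(commands):
    """Run each dump in turn and stop at the first failure."""
    for argv in commands:
        subprocess.check_call(argv)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # "edxapp" is a placeholder; list the databases your installation uses.
    for argv in build_dump_commands(["edxapp"], ["edxapp"]):
        print(" ".join(argv))
```

Calling run_backups(...) on the returned list performs the actual dumps; the resulting files are then what the IT department backup would pick up.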

In all cases, make sure you're correctly setting up the databases to point to external servers; don't keep the default edx_sandbox.yml values, which use the edxlocal role to set up local database servers on the same instance. You basically want to work with the assumption that you should be able to destroy your instances at any time without losing anything. Not only does this ensure that you back up everything that should be, it also ensures that you won't make changes directly on the instances, but rather always from the ansible playbooks. This way you don't accumulate entropy on the VMs, and all changes to the servers are versioned with git (you will also need to back up those repos, if you don't use github).

The best way to ensure this discipline, and make it easier to follow, is to set up an automated build. It would automatically recreate the instance from scratch every time a change is made to one of the repositories.

Configuration with Ansible

Situation

We want to use the configuration repo for setting up and configuring our machines.

We had (some) success with changing values in the folders

 playbooks/roles/<rolename>/vars/

for the different roles. But we don't have much experience with variable precedence, and in particular with parametrized variables put directly in the initial playbook and with the 'secure_dir' variable.

Question

We would ideally like to have some properties with our use of the configuration repo:

  • having all changes (variables etc.) made to the configuration within one file or directory (or two with 'secure_dir').
  • (if possible) also only having to call one playbook.

This would give us independence of the generic edx repo in terms of development/testing/documentation and be like an "internal use plugin".

Do you know if some kind of playbook or config-file/dir is possible? If yes, can you describe how to setup such a "plugin playbook"?

Answer

Yes, it's definitely better to keep the configuration variables which are specific to your own installation isolated. I actually use a separate repository for this purpose - this way you can keep the configuration variables in a private repository (since you don't want passwords to be available publicly), while still keeping the more general changes you make to the configuration repository available publicly, as required by the AGPL license of that repository (it would be illegal to keep those changes private).

  • Private: configuration-secure
  • Public: configuration

Then, using variable precedence rules (cf http://www.ansibleworks.com/docs/playbooks2.html#understanding-variable-precedence ) configuration-secure is structured like this:

|-- group_vars -> ../configuration/playbooks/group_vars/
|-- hosts_production.lst
|-- hosts_staging.lst
`-- vars
    `-- edxapp_vars.yml

edxapp_vars.yml will contain any variable coming from any of the roles variables (playbooks/roles/*/vars/main.yml) which you want to overwrite.

hosts_*.lst is an inventory file, containing something like:

[localhost]
edx.antoviaque.org

Using the group name localhost will allow us to use the sandbox playbook remotely without having to change the hostname.

group_vars is accessed by Ansible from the same directory as the inventory file, so the symlink allows you to use the default values if you don't need to change them.

There is one change you would still need to make in the configuration repository, which is to get it to load the overriding variables from a directory we can configure from the ansible command-line (the edx_sandbox.yml used to have this line, but it was removed from the official repo):

https://github.com/antoviaque/configuration/commit/703d06aafac8cedb0bf23ff86930778c075ee817

Assuming you put both configuration and configuration-secure repositories in the same directory, to run the edx_sandbox.yml playbook remotely on edx.antoviaque.org, you would run:

$ ansible-playbook -vvv --user=ubuntu -i ../../configuration-secure/hosts_staging.lst -s --extra-vars="secure_dir=../../configuration-secure" edx_sandbox.yml

Mako templates

Question

How does one use mako templates? (I'm not asking how to write a template, just where to put them and how to make the rest of edx interact with them!)

Example: I want every YouTube video to be followed by an html block that says "here was a youtube video!". For this I make a template called videohtml, let's say. Where do I put the template (which folder), and where do I signal that videohtml is a template name?

Answer

It will highly depend on the exact feature you want to alter - edx-platform is actually a collection of django applications, so there are different sets of templates and different ways for the django applications to load them. How templates are loaded from views is actually more of a Django question, even if the project doesn't use the default templating language that comes with Django.

If you haven't already, I would recommend spending some time getting familiar with the Django documentation on this subject, which is very comprehensive.

Once this is done, the following should make more sense:

The structure is different from a typical Django application. For the LMS, the templates are located in lms/templates/ (cf video.html for example) and the urls conf which will allow you to track down what handles a particular view (and thus calls the template you are looking for) is in lms/urls.py. A similar structure exists for the CMS in cms/.

We can follow up with additional questions on this, if there are some specific parts you want more precise explanations about.

Services management

Question

There is talk about "services" which are called cms, lms, etc. Are these Ubuntu upstart services? Should they automatically be created by the playbooks?

Answer

Yes, each role from the ansible playbooks provides different services, which are setup as Ubuntu upstart scripts.

For example, the edxapp role creates the following upstart services, using templates in playbooks/roles/edxapp/templates/*.conf.j2

# ls playbooks/roles/edxapp/templates/*.conf.j2
playbooks/roles/edxapp/templates/cms.conf.j2                 playbooks/roles/edxapp/templates/edx-worker-lms.conf.j2  playbooks/roles/edxapp/templates/lms-preview.conf.j2
playbooks/roles/edxapp/templates/edx-worker-cms.conf.j2      playbooks/roles/edxapp/templates/edx-workers.conf.j2     playbooks/roles/edxapp/templates/lms-xml.conf.j2
playbooks/roles/edxapp/templates/edx-worker-lms-xml.conf.j2  playbooks/roles/edxapp/templates/edxapp.conf.j2          playbooks/roles/edxapp/templates/lms.conf.j2

These templates are evaluated and copied to /etc/init on the VM being configured.

Question

How can we stop/start/restart "the services"? Is there a proposed procedure?

Answer

To stop a service:

# service <name> stop

To start a service:

# service <name> start

To restart:

# service <name> restart

Also, you can either start/stop the LMS & CMS services individually (e.g., service lms-xml restart), or all at the same time using the edxapp service:

# service edxapp restart
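
When several services have to be restarted in a fixed order, for instance from a deployment script, the same calls can be scripted. This is only a rough Python sketch: the function names are hypothetical, the service names passed in are assumed to match the upstart jobs installed on your VM, and nothing is executed unless dry_run is switched off.

```python
import subprocess

ALLOWED_ACTIONS = ("start", "stop", "restart")


def service_command(name, action):
    """Build the argv for 'service <name> <action>'."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported action: " + action)
    return ["service", name, action]


def restart_all(names, dry_run=True):
    """Restart each service in order; with dry_run, only return the commands."""
    commands = [service_command(name, "restart") for name in names]
    if not dry_run:
        for argv in commands:
            # Needs root privileges, like the '#' prompt in the calls above.
            subprocess.check_call(argv)
    return commands


if __name__ == "__main__":
    # Prints the commands without running them.
    for argv in restart_all(["lms", "cms"]):
        print(" ".join(argv))
```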

Theming

Question

How did Stanford get its theme to work on edX?

We are asking with the perspective of a course launch mid-September (internal to our institute). We don't really care that Stanford's solution is hacky. What we would like to do is be able to install the theme https://github.com/Stanford-Online/edx-theme with the convenience of a playbook. (if possible, or as a general procedure which we apply via playbooks or shellscripts) We can do the next step of modifying the stanford theme to make it suit our style.

We figured we at least need to change the variable edxapp_theme_name, for instance directly in playbooks/edx-east/roles/edxapp/vars/main.yml

We did that and it seems to be recognized: it copies Stanford-Online/edx-theme into a folder stanford in the directory /opt/wwc/themes. But nothing else much seems to happen...

Notes:

  • lms/envs/common.py also contains the variable 'USE_CUSTOM_THEME' and the function enable_theme (see its docstring).
  • rakelib/assets.rake contains THEME_NAME = ENV_TOKENS['THEME_NAME'].
  • One can add "THEME_NAME":"stanford" to env.json, but this still does not seem to make a difference.

Answer

You seem to be on the right track. I've investigated this a bit, and as far as I can tell without looking directly at your installation, your troubles might come from the way you set the theme name.

First, you actually have two other variables related to the theme name that you can set:

edxapp_theme_name: 'stanford'
edxapp_theme_source_repo: 'https://github.com/antoviaque/edx-theme.git'
edxapp_theme_version: 'HEAD'

You have already changed edxapp_theme_source_repo, but check that you also changed edxapp_theme_name.

Also, unless this was changed since the last time I installed a theme, you also need to set the THEME_NAME like this, still in your ansible playbooks (the files /opt/wwc/*.json are handled by ansible, so you shouldn't alter them directly):

generic_env_config: &edxapp_generic_env
    ...
    'THEME_NAME': 'stanford'

This will make ansible add the THEME_NAME variable to the env tokens. This is similar to what you did, but besides not doing it from ansible, you might have been doing the change in the wrong file, as the lms environment is found in /opt/wwc/lms*.env.json (there can be several variants of the LMS service on a single host, and thus different files).
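
To see whether the playbook actually wrote the token, you can inspect the generated files rather than edit them. A minimal Python sketch, assuming the env files are plain JSON with THEME_NAME as a top-level key (the function name is made up for illustration):

```python
import glob
import json


def theme_names(pattern="/opt/wwc/lms*.env.json"):
    """Map each matching env file to its THEME_NAME value (None if absent)."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            tokens = json.load(handle)
        result[path] = tokens.get("THEME_NAME")
    return result


if __name__ == "__main__":
    # Read-only check: the files themselves stay managed by ansible.
    for path, name in theme_names().items():
        print("{}: THEME_NAME = {!r}".format(path, name))
```

A file reporting None here would mean the THEME_NAME setting did not reach that LMS variant.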

Question

Is the theme applied during the playbook? After the setup? Recommendations?

Answer

The theme is not strictly speaking "applied" during the playbook - the original files from edx-platform will remain unchanged. Rather, the theme repository is fetched (as you observed) and the configuration variables get the code from edx-platform to load the templates and static files from the theme directory.

Note that the theming capabilities currently don't allow you to override everything from edx-platform, so depending on the design that you want to implement, you might also have to alter edx-platform directly (this is actually what Stanford did for some of their own changes).

Unanswered question

Say we update a theme. How do we play it onto a server? How difficult is it to play it onto a running machine? Is it just a matter of shutting down edx for maintenance, putting the themes in place, and restarting edx?

Wiki

Question

Is there an API for the edX wikis?

Answer

The wiki uses django-wiki, which does not have an API (yet). With that answer, Fanthomas90 on #django-wiki (TomLottermann on github) tells me:

Fanthomas90: django has some sort of model view controller principle although it is sort of mixed up a little. You have your models (which are the data entities described in a very nice classbased form), views (which are sort of the controllers) which provide the logic of extracting data and creating models and stuff and then the templates

[11:39am] Fanthomas90: (which are more like the views in the mvc principle), which are html with some django-specific tags... So what you want to see is the view of the article creation (e.g.).
[11:39am] Fanthomas90: You can find it in /wiki/views/article
[11:39am] Fanthomas90: the Create class represents it...
[11:40am] Fanthomas90: This is a class based view and features a form. So whatever is called in the form_valid mathod is what you want to dig into
[11:41am] Fanthomas90: and there yo can see, that benjaoming has already made it very easy with the models.URLPath.create_article() method to create articles...
[11:42am] Fanthomas90: The next step is to find out how to use the django environment and control this environment from a script.
[11:42am] Fanthomas90: I would recommend that you look into https://docs.djangoproject.com/en/dev/howto/custom-management-commands/

It does look like create_article gives an indication of how to populate a wiki from the command-line.

Additional topics: unanswered questions

If you know the answers to the following questions, feel free to let me know!

Student id signing hw

How could we transmit a student id/login name to a custom python grading script? (e.g. get some info from a python script associated to a text-box or input field)

Installation of xqueue

We want to install xqueue. We played the playbook, and then entered

<problem markdown="null">
  <p>
Some nice problem
</p>
  <p>A box of stuff to grade</p>
  <coderesponse url="http://fancy.url.com">
    <codeparam></codeparam>
    <textbox/>
  </coderesponse>
  <p/>
</problem>

as a problem. We test and get the response: "Error checking problem: no external queueing server is configured."
