2. User Guide

2.1. Install twikiget

To archive TWiki pages with twikiget, you only need to install the twikiget package, ideally in a new virtual environment:

$ # create new virtual environment
$ virtualenv ~/.virtualenvs/twikiget
$ source ~/.virtualenvs/twikiget/bin/activate
$ # install twikiget
$ pip install twikiget

2.2. Basic usage

$ # download twiki
$ twikiget archive https://twiki.cern.ch/twiki/bin/view/Main/ZhuTopAnalysis
$ ls
ZhuTopAnalysis.warc cache
$ # once the twiki is archived we can list its contents:
$ twikiget list ZhuTopAnalysis.warc
$ ...
$ # we can also view the raw content of each file:
$ twikiget view ZhuTopAnalysis.warc https://twiki.cern.ch/twiki/bin/view/Main/ZhuTopAnalysis
$ ...

2.3. CLI API

2.3.1. archive

Archive a TWiki page with attachments into a WARC archive.

Raw archived files are also saved to the directory given by the directory-prefix option (default: ./cache).

Options passed in wget-options override the twikiget defaults and should be used with caution.

archive [OPTIONS] URL

Options

--wget-options <wget_options>

additional options to pass to wget

-o, --out-warc-file <out_warc_file>

output file name for a WARC file

-P, --directory-prefix <directory_prefix>

output folder for raw files

Arguments

URL

Required argument
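The options above can be combined in a small script. The following is a minimal sketch, not part of twikiget: the helper name archive_pages and the file-naming scheme are illustrative assumptions, and only the flags themselves come from this guide. Whether twikiget adds an extension of its own to the name given with --out-warc-file is not stated here, so check the resulting file names after a first run.

```shell
#!/bin/sh
# archive_pages URL...
# Archive every TWiki URL given as an argument into its own WARC file
# and its own raw-file folder. Helper name and naming scheme are
# hypothetical; only the twikiget flags are taken from the guide.
archive_pages() {
    for url in "$@"; do
        # Use the last path component of the URL as the page name.
        name=${url##*/}
        twikiget archive \
            --out-warc-file "$name.warc" \
            --directory-prefix "$name-cache" \
            "$url" || return 1
    done
}

# Usage:
#   archive_pages https://twiki.cern.ch/twiki/bin/view/Main/ZhuTopAnalysis
```

The options are placed before the URL to match the synopsis archive [OPTIONS] URL.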

2.3.2. list

List files in the WARC archive.

The list can be filtered by HTTP content type and exported as JSON if needed.
Note that the content-type option accepts either the full name of a type (e.g. text/css) or, for a broader search, just its first part (e.g. text).

Examples:

$ twikiget list ExampleTwiki.warc

$ twikiget list ExampleTwiki.warc --content-type=text/html

$ twikiget list ExampleTwiki.warc --content-type=text

$ twikiget list ExampleTwiki.warc --json

list [OPTIONS] WARC_FILE

Options

--json

Get output in JSON format.

--content-type <filter_content_type>

Filter files in the archive by content type. The value can be either the full type name or just the beginning of it.

Arguments

WARC_FILE

Required argument
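The content-type filter described above behaves like a prefix match. The sketch below illustrates that reading with a hypothetical helper, matches_content_type; it is not twikiget's actual code, and the real command may treat details such as a trailing "; charset=..." parameter differently.

```shell
#!/bin/sh
# matches_content_type TYPE FILTER
# Succeeds when TYPE starts with FILTER, i.e. FILTER is either the full
# type name or only its leading part. Hypothetical helper for illustration.
matches_content_type() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Show which of a few sample types the filter "text" would keep.
for type in text/html text/css image/png; do
    if matches_content_type "$type" text; then
        echo "$type matches text"
    fi
done
```

Under this reading, --content-type=text keeps both text/html and text/css, while --content-type=text/css keeps only the latter.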

2.3.3. view

View raw content of one of the files in the WARC archive.

The view command is useful for inspecting the contents of a single file from the archive. Its output can be piped or redirected to view the file in a web browser or another suitable program. The FILE_URI argument can be copied from the output of twikiget list.

Examples:

$ twikiget view ExampleTwiki.warc https://example.com/twiki?raw=on
$ twikiget view ExampleTwiki.warc http://example.com/style.css

$ twikiget view ExampleTwiki.warc http://example.com/img.png > img.png

view [OPTIONS] WARC_FILE FILE_URI

Arguments

WARC_FILE

Required argument

FILE_URI

Required argument
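Because view writes the raw content to standard output, extracting a file is a matter of redirection. The following is a minimal sketch assuming a hypothetical helper, save_from_warc, that derives a local file name from the URI; the helper and its naming rule are not part of twikiget.

```shell
#!/bin/sh
# save_from_warc WARC_FILE FILE_URI
# Write one archived file into the current directory, named after the
# last path component of its URI. Hypothetical helper for illustration.
save_from_warc() {
    name=${2##*/}
    # Drop a query string such as "?raw=on" from the local file name.
    name=${name%%\?*}
    # Fall back to a fixed name when the URI ends with a slash.
    [ -n "$name" ] || name=index.html
    twikiget view "$1" "$2" > "$name"
}

# Usage:
#   save_from_warc ExampleTwiki.warc http://example.com/img.png
# Text content can instead be piped straight into a pager:
#   twikiget view ExampleTwiki.warc http://example.com/style.css | less
```

Redirecting to a file suits binary content such as images, while a pipe is convenient for text that only needs a quick look.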