With a non-empty root location, the canonical form of the root URL for a
kiwix server is now required to end with a slash (to match the situation
for an empty root location). This requirement enables usage of relative
URLs on the welcome page and resources/scripts loaded through that page.
A slashless root URL is redirected to the slashful version.
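
The redirect rule can be sketched roughly as follows. This is only an
illustration, not the libkiwix code: `slashlessRootRedirect` is a
hypothetical helper, and the root location is assumed to be passed without
a trailing slash.

```cpp
#include <string>

// Hypothetical helper (not part of libkiwix): returns the redirect target
// when the request URL is the slashless root URL, or an empty string when
// no redirect is needed.
// Assumption: `root` is either "" or starts with '/' and has no trailing '/'.
std::string slashlessRootRedirect(const std::string& root,
                                  const std::string& url)
{
  if (!root.empty() && url == root) {
    return root + "/";  // e.g. "/ROOT" -> "/ROOT/"
  }
  return "";  // already canonical, or not the root URL at all
}
```

With an empty root location the root URL is "/" already, so the helper
never asks for a redirect in that case.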
Now the root location is URI-encoded too.
In order to properly test this change the root location in the tests was
changed from "/ROOT" to "/ROOT#?" (or "/ROOT%23%3F" in URI-encoded form),
which is why this commit is so big.
This change doesn't make much sense on its own - the real goal is to
prepare some ground for easier implementation of URI-encoding of the root
location.
Testing of this functionality revealed that a query part containing +
symbols (as replacements for spaces in the parameter values) isn't
forwarded properly, because the + symbols get URI-encoded. This is a bug
in `RequestContext::get_query()`, whose result already contains
URI-encoded +'s.
- Before this change `InternalServer::build_redirect()` only URI-encoded the
article path, ignoring the book name and/or the root location components of
the URL.
- In order to be able to test this fix, corner_cases.zim was renamed to
contain a couple of special URL symbols in its filename. The
`create_corner_cases_zim_file` script was updated accordingly.
`false` is a pretty bad default value, as most users want to track
the real download.
By removing the default value, we force the user to make a choice.
We could have changed the default value to true, but that would have been
a silent API change and we don't want that.
The user may already have a pointer to the `Download`, and it is not protected
against concurrent access.
We could update the status of a newly created `Download` since, by definition,
no one has a pointer to it yet.
But it is better not to do that either:
- For consistency.
- Because the first status update may take a long time on Windows (because
of file preallocation). It is better not to block the downloader for that.
Special URI symbols occurring in the item path part of the search result
link were NOT encoded, because that would also encode the path separator (/)
symbol. Now that `urlEncode()` never encodes the / symbol, it is safe to
encode all other URI-special symbols in the path.
This change is a quick hack solving known issues with URI-encoding in
libkiwix.
This change removes the slash character from the list of URL separator
symbols in URL encoding/decoding utilities, and makes it a symbol that
is safe to leave unencoded.
Effects:
- `urlEncode()` never encodes the '/' symbol (even when it is requested
to encode the URL separator symbols too).
- `urlDecode(str)`/`urlDecode(..., false)` will now decode %2F to '/';
other encoded URL separator symbols are NOT decoded when the second
argument of `urlDecode()` is set to false (which is the default).
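
The "never encode the slash" rule can be illustrated with a simplified
stand-in encoder. This is NOT the libkiwix `urlEncode()`: the name
`encodePathSketch` is made up, and the set of characters left unencoded
(RFC 3986 unreserved characters plus '/') is an assumption of the sketch.

```cpp
#include <cctype>
#include <string>

// Simplified stand-in for a path encoder that never encodes '/'.
// Everything except ASCII letters, digits, '-', '_', '.', '~' and '/'
// is percent-encoded (an assumption, not the libkiwix rule set).
std::string encodePathSketch(const std::string& path)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : path) {
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~'
        || c == '/') {
      out += static_cast<char>(c);  // safe symbol, kept as is
    } else {
      out += '%';
      out += hex[c >> 4];           // high nibble
      out += hex[c & 0x0F];         // low nibble
    }
  }
  return out;
}
```

Because '/' always survives, a whole path can be passed through the encoder
in one go without splitting it on the path separator first.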
Without specifying the "Path" attribute of the cookie in the "Set-Cookie" header
we end up with multiple instances of the cookie for different URLs. We
want a single "global" cookie for kiwix-serve. Besides we want it to be
"permanent" rather than a session cookie, hence the large (1-year-long)
TTL value for the "Max-Age" attribute.
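
For illustration, a header value with both attributes could be composed like
this. `makePermanentCookie` is a hypothetical helper and the `path` argument
is an assumption; the commit message does not spell out the exact value used.

```cpp
#include <string>

// Hypothetical helper composing the value of a Set-Cookie header.
// `path` is the URL prefix that the cookie should cover (assumed here to be
// supplied by the caller).
std::string makePermanentCookie(const std::string& name,
                                const std::string& value,
                                const std::string& path)
{
  const long oneYearInSeconds = 365L * 24 * 60 * 60;  // 31536000
  return name + "=" + value
       + ";Path=" + path
       + ";Max-Age=" + std::to_string(oneYearInSeconds);
}
```

A single explicit "Path" makes the browser keep one cookie instance instead
of one per request directory, and "Max-Age" turns it from a session cookie
into a persistent one.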
From now on, the `lang` parameter of the /catalog/search,
/catalog/v2/entries, and /catalog/v2/partial_entries endpoints is
interpreted as a comma-separated list of languages.
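
A minimal sketch of how such a value can be split into individual languages
(the helper name `splitLanguages` is hypothetical, and dropping empty items
is an assumption of the sketch, not documented behaviour):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits "eng,fra" into {"eng", "fra"}. Empty items (as in "eng,,fra")
// are skipped; that choice is an assumption made for this sketch.
std::vector<std::string> splitLanguages(const std::string& lang)
{
  std::vector<std::string> result;
  std::istringstream stream(lang);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (!item.empty()) {
      result.push_back(item);
    }
  }
  return result;
}
```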
Before this change RequestContext::get_query() returned a reordered
query string (alphabetically sorted by the parameter names).
This fix facilitates testing of responses where the request URL appears
in the response.
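
The difference between the two behaviours can be sketched as below. The
names are hypothetical and the sketch ignores URI-encoding entirely; it only
shows why keeping the parameters in a name-ordered container reorders the
query, while a sequence preserves the request order.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string>> QueryParams;

// Joins name=value pairs with '&' (no URI-encoding in this sketch).
template<class Container>
std::string joinQuery(const Container& params)
{
  std::string query;
  for (const auto& p : params) {
    if (!query.empty()) query += '&';
    query += p.first + "=" + p.second;
  }
  return query;
}

// New behaviour: parameters come out in the order of the request.
std::string queryInRequestOrder(const QueryParams& params)
{
  return joinQuery(params);
}

// Old behaviour: a name-ordered container sorts the parameters.
std::string querySortedByName(const QueryParams& params)
{
  const std::multimap<std::string, std::string> sorted(params.begin(),
                                                       params.end());
  return joinQuery(sorted);
}
```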
Multizim search requires that all selected books be in the same
language.
No new URL query parameter was introduced for specifying the intended
search language - `books.filter.lang` can be used for that purpose.
The server_search unit-test was updated to use a slightly cheating
library xml file where the language of example.zim was tweaked from "en"
to "eng" in order to match that of zimfile.zim. Note that this change
drops from the tested server two other goofy ZIM files corner_cases.zim
and poor.zim that have been/are included in ServerTest.
During static resource preprocessing and compilation their cacheid
values are embedded into libkiwix and can be accessed at runtime.
If a static resource is requested without specifying any cacheid
it is served as dynamic content (with short TTL and the library id
used for the ETag, though using the cacheid for the ETag would
be better).
If a cacheid is supplied in the request it must match the cacheid of the
resource (otherwise a 404 Not Found error is returned) whereupon the
resource is served as immutable content.
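
The three cases can be summarized by a small decision function. This is a
sketch with made-up names; representing "no cacheid supplied" by an empty
string is an assumption.

```cpp
#include <string>

enum class StaticResourceOutcome {
  Dynamic,    // no cacheid in the request: short TTL, ETag-based caching
  Immutable,  // cacheid matches: serve as immutable content
  NotFound    // cacheid mismatch: 404 Not Found
};

// Hypothetical helper; an empty `requestedCacheId` stands for a request
// that did not specify any cacheid.
StaticResourceOutcome classifyStaticResourceRequest(
    const std::string& requestedCacheId,
    const std::string& resourceCacheId)
{
  if (requestedCacheId.empty()) {
    return StaticResourceOutcome::Dynamic;
  }
  return requestedCacheId == resourceCacheId
       ? StaticResourceOutcome::Immutable
       : StaticResourceOutcome::NotFound;
}
```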
Known issues:
- One issue is caused by the fact that some static resources don't get a
cacheid; this is resolved in the next commit.
- Interaction of this change with the support for dynamically customizing
static resources (via KIWIX_SERVE_CUSTOMIZED_RESOURCES env var) was
not addressed.
Before this fix the root URL for a book was assumed to resolve to the
main page. This was not true for ZIM files containing an entry at an
empty path or with a path equal to "/", resulting in issue #826. The
logic behind this behaviour is found in `kiwix::getEntryFromPath()`.
The fix to that issue is a little more general and will result in an
HTTP redirect in any case where `kiwix::getEntryFromPath(zim, path)`
returns an entry with a real path different from the requested one. In
particular, this will affect the behaviour on ZIM files with the old
namespace scheme, where the requested resource - if not found - is also
looked up in the 'A', 'I', 'J', and/or '-' namespaces. Now instead of
returning the contents of that other resource an HTTP redirect response
will be sent.
If `kiwix-serve` is run with the `--nosearchbar` option the toolbar is
disabled (hidden) in its viewer.
Note however that certain actions performed by the viewer merely with
the purpose of keeping the toolbar up-to-date are still carried out.
The `--nosearchbar` option of `kiwix-serve` (despite its misleading name)
was used to disable the entire taskbar. This commit accounts for the
existence of that option only partially:
1. Links to books on the welcome/library page are affected - by default
books are displayed in the viewer, but in a kiwix-serve instance run
with --nosearchbar books are loaded in the top window.
2. The `/viewer` endpoint is enabled unconditionally, so if anyone
enters the viewer URL in the address bar they will see books in the
viewer.
Made the viewer respect the `--blockexternal` and `--nolibrarybutton`
options of `kiwix-serve`. Those options are passed to the viewer
via the dynamically generated resource `/viewer_settings.js`.
The only place that the root link is now used is in /skin/index.js,
so added it in static/templates/index.html. But it seems that nothing
prevents us from switching from absolute paths to relative paths
in /skin/index.js, which will eliminate the need for the root link
altogether.
As a result of this change content is never decorated by kiwix-serve.
This resulted in compiler-aided discovery of all call sites where the
default values were used. OPDS/catalog requests now pass true for the
`raw` parameter, since XML content isn't supposed to undergo any
transformations.
Removed the isHomePage param from one of the variants of
`ContentResponse::build()`. The other overload is dangerous since
failing to review and update all of its call sites may result in changed
semantics. Will do it in a couple of separate commits.
The next goal is to redirect old-style /book/path/to/entry URLs to
/content/book/path/to/entry, which seemed pretty trivial.
However, given the current handling of some endpoint URLs, more work was
required to ensure that invalid endpoint URLs (e.g. "/random/number" or
"/suggest/fr") are not interpreted as content URLs. Previously, that was
not a user-observable issue, since the result would be an immediate 404
error (except in certain edge cases, like handling the request for
"/random/number" when there is a book with name "random" containing an
article at path "/number"). With redirection of URLs that were assumed
to refer to content a 404 error would be issued for the
transformed URL ("/content/random/number") which may be confusing.
Therefore this change is to ensure the correct routing of endpoint URL
handling.
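
The intended routing order can be sketched as follows. The names are
hypothetical, the endpoint list is deliberately incomplete (only names
mentioned in these messages), and the root location is ignored.

```cpp
#include <set>
#include <string>

struct RoutingDecision {
  bool isEndpoint;             // true: dispatch to the endpoint handler
  std::string redirectTarget;  // non-empty: redirect to this content URL
};

// Extracts "random" from "/random/number"; returns "" for "/" or "".
std::string firstUrlSegment(const std::string& url)
{
  const std::string::size_type start = url.find_first_not_of('/');
  if (start == std::string::npos) return "";
  const std::string::size_type end = url.find('/', start);
  return url.substr(start, end == std::string::npos ? end : end - start);
}

// Endpoint names are checked first, so that an invalid endpoint URL is
// rejected by its own handler instead of being treated as book content.
RoutingDecision routeSketch(const std::string& url)
{
  // Illustrative subset of endpoint names, not the full list.
  static const std::set<std::string> endpoints = {"random", "suggest",
                                                  "content"};
  const std::string segment = firstUrlSegment(url);
  if (segment.empty()) {
    return RoutingDecision{false, ""};  // root URL: nothing to decide here
  }
  if (endpoints.count(segment) != 0) {
    return RoutingDecision{true, ""};
  }
  return RoutingDecision{false, "/content" + url};  // old-style content URL
}
```

With this order "/random/number" reaches the endpoint handler (which can
report the error itself), while "/book/path/to/entry" is redirected.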
Book content is now served under /content/book/...
The old access to book content via a top-level URL /book/... is so far
preserved for backward compatibility.
Redirects were changed to use the new URL scheme. Links in the search results
still use the old scheme.
If the server is initialized with a library.xml file, then the id
specified in the XML file is used (rather than the UUID recorded in the
ZIM file).
Note that in test/data/library.xml the book ids are fake and
different from the real ZIM IDs; that file was created for testing
of the /catalog endpoint which doesn't access ZIM content, so the
same ZIM file zimfile.zim was added to library.xml three times as
three different books (with unique human-friendly ids). This explains
the diff in test/library_server.cpp.
During work on the kiwix-serve front-end, the edit-save-test cycle is
a multistep procedure:
1. build and install libkiwix
2. build kiwix-tools
3. run kiwix-serve
4. reload the web-page in the browser
When making changes in static resources that are served by kiwix-serve
unmodified, the steps 1-3 can be eliminated if kiwix-serve is capable of
serving resources from the file-system. This commit adds such a
functionality to kiwix-serve. Now, if during startup of kiwix-serve the
environment variable `KIWIX_SERVE_CUSTOMIZED_RESOURCES` is defined it is
assumed to point to a file where every line has the following format:
URL MIMETYPE RESOURCE_FILE_PATH
When a request is received by kiwix-serve and its URL matches any of the
URLs read from the customized resource file, then the resource data is
read from the respective file RESOURCE_FILE_PATH and served with
mime-type MIMETYPE.
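
A minimal sketch of parsing such a file is given below. It is not the
kiwix-serve code: the names are made up, fields are assumed to be separated
by whitespace (so paths with spaces are not supported), and malformed lines
are silently skipped.

```cpp
#include <map>
#include <sstream>
#include <string>

struct CustomizedResource {
  std::string mimeType;
  std::string filePath;
};

// Parses the text of a customized resource file, one
// "URL MIMETYPE RESOURCE_FILE_PATH" entry per line, into a URL-keyed map.
std::map<std::string, CustomizedResource>
parseCustomizedResources(const std::string& fileContents)
{
  std::map<std::string, CustomizedResource> result;
  std::istringstream lines(fileContents);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string url;
    CustomizedResource resource;
    if (fields >> url >> resource.mimeType >> resource.filePath) {
      result[url] = resource;  // a later line for the same URL wins
    }
  }
  return result;
}
```

A request handler would then look the request URL up in the map and, on a
hit, read `filePath` from disk and serve it with `mimeType`.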
Though this feature was introduced in order to facilitate the
development of the iframe-based content viewer, it can also be useful to
users who would like to customize the kiwix-serve front-end on their own
(without re-building all of kiwix-serve).
There is some overlap with a feature of the kiwix-compile-resources
script that also allows overriding resources. The differences are:
1. The new way of customizing front-end resources has all such resources
listed in a text file and there is a single environment variable
from which the path of that file is read. kiwix-compile-resources
associates a separate environment variable with each resource.
2. The new way uses regular paths to identify a resource. The
kiwix-compile-resources method encodes the resource path by replacing
any non-alphanumeric characters (including the path separator) with
underscores (so that the resulting resource identifier can be used
to construct the name of the environment variable controlling that
resource).
3. The new method allows adding new front-end resources. The old method
only allows modifying existing resources.
4. The new method allows (actually requires) specifying the URL at which
the overridden resource should be served (similarly, the MIME-type can/must
be specified, too). The old method only allows overriding the contents of
a resource.
5. The new method only allows overriding front-end resources that are
served without any preprocessing by kiwix-serve at runtime. The old
method allows overriding template resources as well (note that
internationalization/translation resources cannot be overridden using the
old method, either).
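
The path encoding mentioned in point 2 amounts to the transformation
sketched here. The function name is hypothetical, and how the resulting
identifier is turned into the name of the environment variable is not shown
because it is not described above.

```cpp
#include <cctype>
#include <string>

// Replaces every non-alphanumeric character (including the path separator)
// with an underscore, e.g. "skin/index.js" -> "skin_index_js".
std::string resourceIdentifierSketch(std::string resourcePath)
{
  for (char& c : resourcePath) {
    if (!std::isalnum(static_cast<unsigned char>(c))) {
      c = '_';
    }
  }
  return resourcePath;
}
```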
If we keep a reference to a `Reader`, it is better for that reference to
(share) own it. Otherwise the reader may be deleted after we create the searcher.
This is especially the case now that we create the `Reader` on demand
and don't store it in the library's cache.
We have to reuse the query the user gave us to generate the
pagination links.
At the search result rendering step we don't have access to the query object.
The best place to know which arguments are used to select books
(and so which arguments to keep in the pagination links) is when we
parse the query to select books.
Fix tests (pagination links) with book selector other than "books.id="
(pattern=jazz&books.query.lang=eng)
libzim's search is not thread-safe (mainly because xapian is not).
So we must protect our search objects from concurrent calls by multiple threads.
The best way to do this is to associate a mutex with the `zim::Searcher`
and lock the searcher each time we access an object derived from the
searcher (search, results, iterator, ...).
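
The idea of pairing a searcher with its own mutex can be sketched like this.
`GuardedSearcher` and `FakeSearcher` are hypothetical names used only to keep
the example self-contained; the real code works with `zim::Searcher`.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Stand-in for zim::Searcher (assumption: any non-thread-safe object).
struct FakeSearcher {
  int searchCount = 0;
  int search(const std::string& /*query*/) { return ++searchCount; }
};

// Owns a searcher together with the mutex that protects it. All access to
// the searcher (and to objects derived from it) goes through withLock().
template<class Searcher>
class GuardedSearcher {
public:
  template<class Action>
  auto withLock(Action action)
      -> decltype(action(std::declval<Searcher&>()))
  {
    std::lock_guard<std::mutex> guard(m_mutex);  // released on return
    return action(m_searcher);
  }

private:
  Searcher m_searcher;
  std::mutex m_mutex;
};
```

Everything that touches search results or iterators should happen inside
the callable passed to `withLock()`, so that the lock covers the whole
access and not just the creation of the search.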
When ConcurrentCache stores a shared_ptr, that shared_ptr may still be in use
after the ConcurrentCache has dropped it.
When we "recreate" a value to put in the cache, we don't want to actually
recreate it, but to copy the shared_ptr that is still in use.
To do so we use an (unlimited) store of weak_ptr (aka `WeakStore`).
Every shared_ptr added to the cache also has a weak_ptr ref stored
in the WeakStore, and we check the WeakStore before creating the value.
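
A stripped-down sketch of such a store is shown below. It only illustrates
the weak_ptr bookkeeping: the class name and interface are assumptions, and
the sketch is not thread-safe.

```cpp
#include <map>
#include <memory>

// Remembers values without keeping them alive. A value can be recovered
// for as long as someone else still holds a shared_ptr to it.
template<typename Key, typename Value>
class WeakStoreSketch {
public:
  // Returns the still-alive value for `key`, or nullptr if there is none.
  std::shared_ptr<Value> get(const Key& key)
  {
    const auto it = m_items.find(key);
    if (it == m_items.end()) return nullptr;
    std::shared_ptr<Value> value = it->second.lock();
    if (!value) m_items.erase(it);  // the value has expired, forget it
    return value;
  }

  void add(const Key& key, const std::shared_ptr<Value>& value)
  {
    m_items[key] = value;  // stored as weak_ptr, does not extend lifetime
  }

private:
  std::map<Key, std::weak_ptr<Value>> m_items;
};
```

On a cache miss the cache would first call `get()`; only if that returns
nullptr does the value have to be created from scratch.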
The prefix will be used to parse a "query to select books" in different contexts.
For now we have only one context: selecting books for the catalog search.
But we will also want to select books to do fulltext search on them
(will be done in a later commit).