This resulted in compiler aided discovery of all call sites where the
default values were used. For OPDS/catalog requests now passing true for the
`raw` parameter, since XML content isn't supposed to undergo any
transformations.
Removed the isHomePage param from one of the variants of
`ContentResponse::build()`. The other overload is dangerous since
failing to review&update all of its call site may result in changed
semantics. Will do it in a couple of separate commits.
The next goal is to redirect old-style /book/path/to/entry URLs to
/content/book/path/to/entry, which seemed pretty trivial.
However, given the current handling of some endpoint URLs, more work was
required to ensure that invalid endpoint URLs (e.g. "/random/number" or
"/suggest/fr") are not interpreted as content URLs. Previously, that was
not a user-observable issue, since the result would be an immediate 404
error (except in certain edge cases, like handling the request for
"/random/number" when there is a book with name "random" containing an
article at path "/number"). With redirection of URLs that were assumed
to refer to content a 404 error would be issued for the
transformed URL ("/content/random/number") which may be confusing.
Therefore this change is to ensure the correct routing of endpoint URL
handling.
Book content is now served under /content/book/...
The old access to book content via a top-level URL /book/... is so far
preserved for backward compatibility.
Redirects were changed to use the new URL scheme. Links in the search results
still use the old scheme.
If the server is initialized with a library.xml file, then the id
specified in the XML file is used (rather than the UUID recorded in the
ZIM file).
Note that in test/data/library.xml the book ids are fake and
different from the real ZIM IDs; that file was created for testing
of the /catalog endpoint which doesn't access ZIM content, so the
the same ZIM file zimfile.zim was added to library.xml three times as
three different books (with unique human-friendly ids). This explains
the diff in test/library_server.cpp.
During work on the kiwix-serve front-end, the edit-save-test cycle is
a multistep procedure:
1. build and install libkiwix
2. build kiwix-tools
3. run kiwix-serve
4. reload the web-page in the browser
When making changes in static resources that are served by kiwix-serve
unmodified, the steps 1-3 can be eliminated if kiwix-serve is capable of
serving resources from the file-system. This commit adds such a
functionality to kiwix-serve. Now, if during startup of kiwix-serve the
environment variable `KIWIX_SERVE_CUSTOMIZED_RESOURCES` is defined it is
assumed to point to a file where every line has the following format:
URL MIMETYPE RESOURCE_FILE_PATH
When a request is received by kiwix-serve and its URL matches any of the
URLs read from the customized resource file, then the resource data is
read from the respective file RESOURCE_FILE_PATH and served with
mime-type MIMETYPE.
Though this feature was introduced in order to facilitate the
development of the iframe-based content viewer, it can also be useful to
users who would like to customize the kiwix-serve front-end on their own
(without re-building all of kiwix-serve).
There is some overlap with a feature of the kiwix-compile-resources
script that also allows to override resources. The differences are:
1. The new way of customizing front-end resources has all such resources
listed in a text file and there is a single environment variable
from which the path of that file is read. kiwix-compile-resources
associates a separate environment variable with each resource.
2. The new way uses regular paths to identify a resource. The
kiwix-compile-resources method encodes the resource path by replacing
any non-alphanumeric characters (including the path separator) with
underscores (so that the resulting resource identifier can be used
to construct the name of the environment variable controlling that
resource).
3. The new method allows adding new front-end resources. The old method
only allows to modify existing resources.
4. The new method allows (actually requires) to specify the URL at which
the overriden resource should be served (similarly, the MIME-type can/must
be specified, too). The old method only allows to override the contents of
a resource.
5. The new method only allows to override front-end resources that are
served without any preprocessing by kiwix-serve at runtime. The old
method allows to override template resources as well (note that
internationalization/translation resources cannot be overriden using the
old method, either).
If we keep a reference to a `Reader` it is better to (share) owning
the reference. Else the reader may be deleted after we create the searcher.
This is especially the case now we are creating the `Reader` at demand
and we don't store it in the library's cache.
We have to reuse the query the user give us to generate the
pagination links.
At search result rendering step we don't have access to the query object.
The best place to know which arguments are used to select books
(and so which arguments to keep in the pagination links) is when we
parse the query to select books.
Fix tests (pagination links) with book selector other than "books.id="
(pattern=jazz&books.query.lang=eng)
libzim's search is not thread safe (mainly because xapian is not).
So we must protect our search objects from multi thread calls.
The best way to do this is to associate a mutex to the `zim::Searcher`
and lock the searcher each time we access object derivated from the
searcher (search, results, iterator, ...)
When ConcurrentCache store a shared_ptr we may have shared_ptr in used
while the ConcurrentCache has drop it.
When we "recreate" a value to put in the cache, we don't want to recreate
it, but copying the shared_ptr in use.
To do so we use a (unlimited) store of weak_ptr (aka `WeakStore`)
Every created shared_ptr added to the cache has a weak_ptr ref also stored
in the WeakStore, and we check the WeakStore before creating the value.
The prefix will be used to parse a "query to select book" in different context.
For now we have only one context : selecting books for the catalog search.
But we will want to select books to do fulltext search on them
(will be done in later commit)
- Throw a exception if we cannot extract from string.
(We throw the same exception as `std::sto*`)
- Add a specialization to extract string from string
- Add some unit test
The story of search_results.css
static/skin/search_results.css was extracted from
static/templates/no_search_result.html before the latter was dropped.
static/templates/no_search_result.html in turn seems to be a copied and
edited version of static/templates/search_result.html.
In the context of exploratory work on the internationalization of
kiwix-serve (PR #679) I noticed duplication of inline CSS across those
two templates and intended to eliminated it. That goal was not fully
accomplished (static/templates/search_result.html remained untouched)
because by that time PR #679 grew too big and the efforts were diverted
into splitting it into smaller ones. Thus search_results.css slipped
into one of those small PRs, without making much sense because nothing
really justifies preserving custom CSS in the "Fulltext search unavailable"
error page.
At the same time, it served as the only case where a link to a cacheable
resource is generated in C++ code (rather than found in a template).
This poses certain problems to the handling of cache-ids. A workaround
is to expel the URL into a template so that it is processed by
`kiwix-resources`. This commit merely demonstrates that solution. But
whether it should be preserved (or rather the "Fulltext search
unavailable" page should be deprived of CSS) is questionable.
In the absence of the "userlang" query parameter in the URL, the value
of the "Accept-Language" header is used. However, it is assumed that
"Accept-Language" specifies a single language (rather than a comma
separated list of languages possibly weighted with quality values).
Example:
Accept-Language: fr
// should work
Accept-Language: fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5
// The requested language will be considered to be
// "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5".
// The i18n code will fail to find resources for such a language
// and will use the default "en" instead.
At this point a potential issue has been revealed. Now we produce
the final HTML via 2-level template expansion
1. Render parameterized messages
2. Render the HTML template
In which templates we should use double mustache "{{}}" (HTML-escaping)
tags and where we may use triple mustache "{{{}}}" (non-escaping) tags?
Introduced a new resource compiler script kiwix-compile-i18n that
processes i18n string data stored in JSON files and generates sorted C++
tables of string keys and values for all languages.
The "Fulltext search unavailable" error page is now generated using the
static/templates/error.html template. Also added two test cases checking
that error page.
404.html no longer contains anything specific to the 404 error and will
henceforth serve (with some enhancements) as a general purpose error
page template.
It is better to directly try to get the `Search` from the cache instead
of getting the `Searcher` first which could be useless in Search already
exist.
SearchInfo is a small helper structure to store information about the
queried search. It regroup already existing information (`patternString`,
geo query, ...) in one structure.
It is also used as key in the cache instead of using a generated string.
Instead of passing the `bookName` and `bookTitle` parameters to
`Response::build_404()`, `withTaskbarInfo()` is applied to its result
when needed. Note, that in `InternalServer::handle_raw()`
`withTaskbarInfo()` was not utilized since the results of the `/raw`
endpoint are not supposed to be decorated with a taskbar.
This was done in preparation for removing the `bookName` and `bookTitle`
parameters from `Response::build_404()`, but since the new function
could already be put to some use in this commit that was done too.
Previously, the seachURL was not encoded.
This resulted in an XSS vulnerability, a concept of proof is:
start kiwix-serve
visit - http://192.168.18.1:8081/"><svg onload="alert(1)">
This would display an alert message.
This encodes the searchURL before passing it to searchSuggestionHtml
We create a cache for SuggestionSearcher very similar to that of FT
searcher. User can specify a custom cache size using the environment
variable SUGGESTION_SEARCHER_CACHE_SIZE. It has a default value of 10%
of the number of books in the library.
We use the new cache template to implement two kind of cache.
1: The Searcher cache is more general in terms of its usage. A Searcher
can be used for multiple searches without much change to itself. We
try to retrieve the searcher and perform searches using it whenever
possible, and if not we put a searcher into the cache. User can
specify a custom cache length by manipulating the environment
variable SEARCHER_CACHE_SIZE. It's default value is 10% of all the
books available.
2: The search cache is much more restricted in terms of usage. It's main
purpose is to avoid re-searching on the searcher during page changes
to generate SearchResultSet of various ranges. User can specify a
custom cache length using the environment variable SEARCH_CACHE_SIZE
with a default value of 2;
Adds a std::map<std::string, std::string> with display names for language codes not given by libicu
Fault language codes are taken from library.kiwix.org
As we still create a `Reader` in the deprecated code of `Library`,
we need a way to create a reader without raising a deprecated warning.
So we create a another constructor with a dummy argument and we use it.
As the `Entry` is still created by `Reader` we need a way to create a
entry without raising a deprecated warning.
To do so we create a second constructor with a dummy argument.
This second constructor is private and is not marked as deprecated so we
can use it.
The HumanReadableId can contains special char (`&`/`=`/...)
As it is used as to create a url in the opds template,
we must url encode it.
- We don't need to encode the book id as it is a uuid, it never contains
special char.
- We don't need to encode the book url as it is read from the library and
the url must already be correctly encoded in the library.xml.
(tests modified accordingly)
kiwix::fileExists only checks for file existence now
kiwix::fileReadable will check if the file is readable (implicitly checking for file existence also)
As the name suggests it, this endpoint is not smart :
It returns the content as it is and only if it is present
(no compatibility or whatever).
The only "smart" thing is to return a redirect if the entry is a redirect.
As we render the entry's xml in a separated steps, we need to pass the
rootLocation to all the internal rendering.
Testing with and without root is not so easy.
I've simply made all server tests using a ROOT prefix.
We can assume that if the ROOT is present everywhere we need it, it will not
when we don't need. (As long as we don't hardcode "ROOT" in the server.)
As a result of this clean-up the /suggest endpoint too stopped
generating confusing 404 Not Found errors (which, like in /meta's case
is not that important). Another functional change is that the "term"
parameter became optional.
Before this fix the /meta endpoint could return a 404 Not Found page
saying
The requested URL "/meta" was not found on this server.
Error cases producing such a result were:
- `/meta?content=NON-EXISTING-BOOK&name=metaname`
- `/meta?content=book&name=BAD-META-NAME`
Now a proper message is shown for each of those cases.
This fix is being done just for consistency (the /meta endpoint is not
a user-facing one and the scripts don't bother about error texts).
Now Response::build_404() takes the URL instead of the entire
RequestContext object. An empty url suppresses the
The requested URL "url" was not found on this server.
part of the error text.
Before this fix the /random endpoint could return a 404 Not Found page
saying
The requested URL "/random" was not found on this server.
Error cases producing such a result were:
- `/random?content=NON-EXISTING-BOOK` (can happen when a server is
restarted or the library is reloaded and the current book is no longer
available).
- Failure of the libkiwix routine for picking a random article.
Now a proper message is shown for each of those cases.
Library became thread-safe with the exception of `getBookById()`
and `getBookByPath()` methods - thread safety in those accessors is
rendered meaningless by their return type (they return a reference
to a book which can be removed any time later by another thread).
Introducing a mutex in `Library` necessitates manually implementing the
move constructor and assignment operator. It's better to still delegate
that work to the compiler to eliminate any possibility of bugs when new
data members are added to `Library`. The trick is to move the data into
an auxiliary class `LibraryBase` and derive `Library` from it.
Originally `LibraryManipulator` was an abstract class completely decoupled
from `Library`. Its `addBookToLibrary()` and `addBookmarkToLibrary()`
methods could be defined in an arbitrary way. Now `LibraryManipulator` has to be
bound to a library object, those methods are no longer virtual, they always
update the library and allow for some additional actions via virtual
functions `bookWasAddedToLibrary()` and `bookmarkWasAddedToLibrary()`.