IndexError: list index out of range error following "Build QA engine" steps and submitting example query #29

@ltfschoen

I'm using Elasticsearch 7.11.1 and Python 3.7.13.

In the "Build QA engine" section, when I answer the query prompt as follows:

Enter your query here: what does covid-19 cause    

It outputs an error:

WARNING:allennlp.data.fields.sequence_label_field:Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.
  See documentation for `non_padded_namespaces` parameter in Vocabulary.                                    
INFO:elasticsearch:GET http://localhost:9200/ [status:200 request:0.520s]                                   
INFO:elasticsearch:POST http://localhost:9200/elastic_index/_search [status:200 request:0.353s]             
The number of datapacks(including query) is 1         
Traceback (most recent call last):                    
  File "./examples/pipeline/inference/search_cord19.py", line 97, in <module>                               
    data_pack = next(nlp.process_dataset()).get_pack_at(1)                                                  
  File "/home/ubuntu/.pyenv/versions/3.7.13/lib/python3.7/site-packages/forte/data/multi_pack.py", line 491, in get_pack_at
    return self.packs[index]                          
IndexError: list index out of range                   
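
My reading of the traceback: the log reports only 1 datapack (the query itself), and get_pack_at(1) then indexes past the end of self.packs. Below is a minimal sketch of that failure mode and a guard around it. FakeMultiPack and first_result_or_none are hypothetical names made up for illustration; the class is a stand-in, not forte's real MultiPack, and it assumes the query pack sits at index 0.

```python
class FakeMultiPack:
    """Hypothetical stand-in that only mimics the list lookup seen in the traceback."""

    def __init__(self, packs):
        self.packs = list(packs)

    def get_pack_at(self, index):
        # Same lookup as the failing line: raises IndexError when index is too large.
        return self.packs[index]


def first_result_or_none(multi_pack):
    """Return the pack at index 1, or None when only the query pack is present."""
    # Assumption: index 0 holds the query, so a search hit needs at least 2 packs.
    if len(multi_pack.packs) < 2:
        return None
    return multi_pack.get_pack_at(1)
```

With a guard like this the script could report "no documents retrieved" instead of crashing, which would point more directly at the empty index.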

It seems the datasets are not being read at all, even though I indexed the provided sample datasets in the previous step with:

python examples/pipeline/indexer/cordindexer.py --data-dir ./data/document_parses/sample_pdf_json 

which printed the following almost immediately, so it doesn't seem to have indexed any data:

WARNING:root:Re-declared a new class named [ConstituentNode], which is probably used in import.                                                                                                                         
INFO:elasticsearch:GET http://localhost:9200/ [status:200 request:0.008s]                                                                                                                                               
/home/ubuntu/.pyenv/versions/3.7.13/lib/python3.7/site-packages/elasticsearch/connection/base.py:200: ElasticsearchWarning: [types removal] Specifying types in bulk requests is deprecated.                            
  warnings.warn(message, category=ElasticsearchWarning)                                                                                                                                                                 
INFO:elasticsearch:POST http://localhost:9200/_bulk?refresh=true [status:200 request:0.338s]

and that directory contains three dataset files:

  • 55736408816d3f956d830854659f24109444a36c.json
  • aadc3e716b6cb0e898953dff056124378b31483c.json
  • ffff73d17bc392ee68f3f16ef37d25579cb99322.json
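
One way to confirm whether the bulk request actually indexed anything is to ask Elasticsearch how many documents the index holds through its _count endpoint. This is a rough sketch using only the standard library. The host and the index name elastic_index are taken from the log lines above; count_documents and its urlopen parameter are hypothetical names introduced here.

```python
import json
import urllib.request


def count_documents(base_url, index, urlopen=urllib.request.urlopen):
    """Return the document count Elasticsearch reports for `index`."""
    url = f"{base_url.rstrip('/')}/{index}/_count"
    # The _count API answers with a JSON object that has a "count" field.
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))["count"]


# Example call against the local node from the logs (needs Elasticsearch running):
# count_documents("http://localhost:9200", "elastic_index")
```

A result of 0 would mean the indexer ran without storing any of the three files; a non-zero result would shift suspicion to the search step instead.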

I also noticed that the Indexer's config.yml lists the fields doc_id and content (https://github.com/petuum/composing_information_system/blob/main/examples/pipeline/indexer/config.yml#L3). However, the dataset files above don't contain those fields at all; most of the content is in the fields title, text, and section. If I update config.yml to the following, I get the same outcome:

create_index:
  batch_size: 10000
  fields:
    # - doc_id
    # - content
    - title
    - text
    - section
