# pippo.py
# %%
from engine.cooccorrenze import cooccorrenze
dataConfig = {
    'data_interface': 'sqlite3',
    # 'dbPath': '../../../../db/first_db/',
    # 'dbfile_default': 'test1.db',
    # 'listOcc': ['Occ00001', 'Occ00002', 'Occ00003']
    'dbPath': '../../../../db/ndg2.gat4/',
    'dbfile_default': 'corpus.db',
    'db_encoded': True,
    'texts_encoded': True,
    'listOcc': ['Occ00001', 'Occ00002', 'Occ00003', 'Occ00004', 'Occ00005', 'Occ00006', 'Occ00007', 'Occ00008', 'Occ00009', 'Occ00010', 'Occ00011', 'Occ00012', 'Occ00013', 'Occ00014']
}
prova = cooccorrenze(dataConfig)
ricerca_prova = [['cavaliere', 0, 0, 0], ['corte', 0, 0, 0]]
# prova.ricerca_cooccorrenze(ricerca_prova, intervallo, periodo, ordinate)
prova.ricerca_cooccorrenze(ricerca_prova, 10, 0, 0)
# %%
######
# KORA
'''
Suppose I have two different lists of Python dicts, all of which contain an integer field, for instance named "position". What is the fastest algorithm to find all pairs of objects satisfying the following conditions?
- Each pair contains a dict from the first list and a dict from the second list
- The absolute difference between the "position" field of the first member of the pair and that of the second member is less than a given integer value
'''
##########
# Chat GPT
'''
One possible approach is a nested loop that compares each element of the first list with each element of the second list and checks whether the absolute difference between their "position" fields is within the given value. This approach has a time complexity of O(n^2), which can be slow for large datasets.
A more efficient approach, since the positions are integers, is to index the elements of the second list by position in a hash table, then, for each element of the first list, probe every position within the allowed distance. With n elements in total and a maximum difference d, this runs in O(n * d) time, which is much faster than the nested loop when d is small.
Here's an example implementation of this approach:
'''
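# A hedged sketch of the quadratic baseline described above (the function
# name is invented for illustration); it is useful as a correctness
# reference for the faster versions:

```python
def find_pairs_bruteforce(list1, list2, max_diff):
    # Compare every element of list1 with every element of list2:
    # O(len(list1) * len(list2)) comparisons in total
    return [(a, b) for a in list1 for b in list2
            if abs(a['position'] - b['position']) <= max_diff]
```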
def find_pairs(list1, list2, max_diff):
    # Map each position to the list of elements of list2 at that position
    # (storing a single element per key would silently drop duplicates)
    dict2 = {}
    for obj in list2:
        dict2.setdefault(obj['position'], []).append(obj)
    # For each element of list1, probe every position within max_diff
    pairs = []
    for obj1 in list1:
        for pos in range(obj1['position'] - max_diff, obj1['position'] + max_diff + 1):
            for obj2 in dict2.get(pos, []):
                pairs.append((obj1, obj2))
    return pairs
'''
In this implementation, we first build a dictionary dict2 that maps each "position" value occurring in the second list to the list of elements at that position. We then iterate over the elements of the first list and, for each one, probe every position within the given maximum difference; any elements found at those positions form valid pairs, which we append to pairs.
This implementation has a time complexity of O(n * d), where n is the total number of elements in the two lists and d is the maximum difference. When d is small this is effectively linear, and much faster than the O(n^2) nested-loop approach.
'''
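# When max_diff is large, probing every offset is wasteful. A sketch of an
# alternative using the standard-library bisect module (the function name is
# invented for illustration): sort the second list once, then binary-search
# the window of admissible positions for each element of the first list,
# for O(n log n + p) total where p is the number of pairs reported.

```python
from bisect import bisect_left, bisect_right

def find_pairs_bisect(list1, list2, max_diff):
    # Sort list2 by position and keep the positions in a parallel list
    sorted2 = sorted(list2, key=lambda x: x['position'])
    positions = [obj['position'] for obj in sorted2]
    pairs = []
    for obj1 in list1:
        # Binary-search the window [position - max_diff, position + max_diff]
        lo = bisect_left(positions, obj1['position'] - max_diff)
        hi = bisect_right(positions, obj1['position'] + max_diff)
        pairs.extend((obj1, obj2) for obj2 in sorted2[lo:hi])
    return pairs
```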
######
# KORA
'''
Is there any approach which is faster than O(n)?
'''
##########
# Chat GPT
'''
No. Any algorithm must read every element of both lists at least once, so Omega(n) time is a lower bound; moreover, the output itself can contain up to n^2 pairs, so no algorithm can run faster than the time needed to write those pairs out.
It is possible, however, to remove the dependence on the maximum difference d. One way is to sort both lists on the "position" field, which takes O(n log n) with a comparison sort such as merge sort, and then sweep the sorted lists together, maintaining in the second list the window of positions close enough to the current element of the first list. The total cost is O(n log n + p), where p is the number of pairs reported.
Here's an example implementation of this approach:
'''
def find_pairs(list1, list2, max_diff):
    # Sort both lists on the "position" field
    list1 = sorted(list1, key=lambda x: x['position'])
    list2 = sorted(list2, key=lambda x: x['position'])
    pairs = []
    lo = 0  # start of the current window in list2
    for obj1 in list1:
        # Drop elements of list2 too far below obj1's position; since list1
        # is sorted, they can never match obj1 or any later element of list1
        while lo < len(list2) and list2[lo]['position'] < obj1['position'] - max_diff:
            lo += 1
        # Report every element of list2 within max_diff of obj1 (a simple
        # pointer advance here would miss elements that match several times)
        k = lo
        while k < len(list2) and list2[k]['position'] <= obj1['position'] + max_diff:
            pairs.append((obj1, list2[k]))
            k += 1
    return pairs
##########
# Chat GPT
'''
In this implementation, we first sort both lists on the "position" field using the sorted function with a lambda key. We then sweep the first list in order while maintaining a window into the second list: the window start lo is advanced past elements whose position is more than max_diff below the current element, and from there we scan forward, reporting every element within max_diff. Because both lists are sorted, elements dropped from the window can never match any later element of the first list, so no pair is missed.
This implementation runs in O(n log n + p) time, where p is the number of pairs reported. Since p can be as large as n^2, the worst case is still quadratic, but that cost is inherent in the size of the output rather than in the algorithm itself.
'''
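# A quick randomized sanity check of the sorted-sweep idea (helper names are
# invented for illustration; pairs are compared as sets of position pairs, so
# duplicate multiplicity is ignored), cross-checking it against a brute-force
# scan on small random inputs:

```python
import random

def pairs_bruteforce(list1, list2, max_diff):
    # Reference answer: all cross pairs within max_diff
    return {(a['position'], b['position']) for a in list1 for b in list2
            if abs(a['position'] - b['position']) <= max_diff}

def pairs_sweep(list1, list2, max_diff):
    # Sorted sweep with a sliding window over list2
    list1 = sorted(list1, key=lambda x: x['position'])
    list2 = sorted(list2, key=lambda x: x['position'])
    found, lo = set(), 0
    for obj1 in list1:
        while lo < len(list2) and list2[lo]['position'] < obj1['position'] - max_diff:
            lo += 1
        k = lo
        while k < len(list2) and list2[k]['position'] <= obj1['position'] + max_diff:
            found.add((obj1['position'], list2[k]['position']))
            k += 1
    return found

random.seed(0)
for _ in range(100):
    l1 = [{'position': random.randrange(50)} for _ in range(random.randrange(10))]
    l2 = [{'position': random.randrange(50)} for _ in range(random.randrange(10))]
    assert pairs_sweep(l1, l2, 4) == pairs_bruteforce(l1, l2, 4)
```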
# %%